Musing on dictionaries, axioms, and algorithms

by

800px-dictionaryindentsI have a distinct memory of a specific moment in childhood, sitting in a second grade classroom, when I realized that dictionaries are inherently circular.  I can open the dictionary to see a description of the meaning of the word “proponent”, and read “one who argues in favor of something”.  But then if this is to truly provide meaning for the word “proponent”, I need to also know the meanings of these seven other words.

Each of which I can look up in this same dictionary, and find their respective meanings described in terms of myriad other English words.

Sadly, this process never ceases.  In order for English words to have meaning by this process, one needs to know the meanings of some core set of words, some basis, in terms of which all other English words can be described.

(A parallel example:  If I don’t know any Finnish, then picking up a Finnish dictionary is not going to help me learn Finnish.)

So languages, and dictionaries in particular, are kind of like axiomatic systems.  We start with basic terms or axioms, whose meaning we know, whose truth we assume.  From this ad hoc starting point, we can build structure and meaning, but fundamentally the meaning of any particular thing reduces down to reference to our starting axioms, our undefinable terms.

I imagine creating  a digraph, whose vertices correspond to English words, with edges A→B if the word A appears in the definition of B.  (Such a digraph will depend on the dictionary one chooses, and there are also subtleties driven by the fact that some words take on multiple meanings in different contexts.  But let’s brush over those technicalities for now.)

I have questions.  And I don’t know enough about computational linguistics to even know how to ask them appropriately, or where to look for possible answers.

  1. Is this graph weakly connected? (Is everything linked to everything else?)  Or are there subgraphs that are isolated from the rest?  [Almost certainly the use of pronouns and simple verbs will lead to a single connected component; otherwise one might imagine some esoteric field of study all of whose technical vocabulary comprises a single component of the graph.]
  2. Is the digraph strongly connected: can I get from every vertex to every other by following the directed links?  If I take the definition of “proponent”, and then examine the definitions of “one”, “who”, “argues”, “in”, “favor”, “of”, “something”, and iterate this process, will I ever find a definition which uses the word “green”, for example? [Presumably in general the answer is no.  There are probably words that are never used in the definition of other words:  highly technical words come to mind, such as “anthrax” or “dyspnea”.]
  3. Is there a “basis” for the graph? That is, a minimal set of vertices V containing predecessors for all other vertices in the graph? [Is this called a “rootset” in the context of digraphs?]  {OK, this one I can answer with a Yes, on general principles, since the number of words is finite. One at a time, throw out any superfluous ones.}
  4. What is the smallest possible basis for this graph?  How many English words must I know in order to be able to look up and understand the meaning of any other word in the dictionary?
  5. Are there reasonably efficient algorithms for generating the smallest possible basis V?  Is this something I could be doing for fun on a PC running Matlab?  Or is this something so unreasonably complex that I’d need something much more powerful to take it on?

Apart from personal curiosity, it seems that the size of such a minimal basis might be used to measure the quality of a dictionary: can the authors describe everything in terms of a relatively small basic vocabulary?

The image Dictionaryindents.jpg was photographed by Thegreenj and posted to Wikipedia under the GNU Free Documentation License (v 1.2 or later).

Tags: , ,

4 Responses to “Musing on dictionaries, axioms, and algorithms”

  1. David Petersen Says:

    I have had similar thoughts. Although it’s a general rule not to use the word within the definition, when defining the letter “A,” are you required to use words that have only other letters?

  2. deMoMo Says:

    While looking for something completely unrelated, I came across your website. As a lover of numbers and words, I get a sinking feeling that I will be spending (too much) time going through your archives over the next few days.

    I’m in trouble. I can tell already.

  3. David Petersen Says:

    deMoMo, if you like words and math, you might like this blog:

    http://sixwingedseraph.wordpress.com/

    TwoPi, I hope you don’t mind the plug for another blog I enjoy reading.

  4. TwoPi Says:

    David: I don’t mind at all; sixwingedseraph is one I haven’t seen before, but at first read it looks quite interesting. Thank you!

    deMoMo: Welcome! And thank you for your kind words!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: