Posts Tagged ‘graph theory’

Musing on dictionaries, axioms, and algorithms

April 2, 2009

800px-dictionaryindentsI have a distinct memory of a specific moment in childhood, sitting in a second grade classroom, when I realized that dictionaries are inherently circular.  I can open the dictionary to see a description of the meaning of the word “proponent”, and read “one who argues in favor of something”.  But then if this is to truly provide meaning for the word “proponent”, I need to also know the meanings of these seven other words.

Each of which I can look up in this same dictionary, and find their respective meanings described in terms of myriad other English words.

Sadly, this process never ceases.  In order for English words to have meaning by this process, one needs to know the meanings of some core set of words, some basis, in terms of which all other English words can be described.

(A parallel example:  If I don’t know any Finnish, then picking up a Finnish dictionary is not going to help me learn Finnish.)

So languages, and dictionaries in particular, are kind of like axiomatic systems.  We start with basic terms or axioms, whose meaning we know, whose truth we assume.  From this ad hoc starting point, we can build structure and meaning, but fundamentally the meaning of any particular thing reduces down to reference to our starting axioms, our undefinable terms.

I imagine creating  a digraph, whose vertices correspond to English words, with edges A→B if the word A appears in the definition of B.  (Such a digraph will depend on the dictionary one chooses, and there are also subtleties driven by the fact that some words take on multiple meanings in different contexts.  But let’s brush over those technicalities for now.)

I have questions.  And I don’t know enough about computational linguistics to even know how to ask them appropriately, or where to look for possible answers.

  1. Is this graph weakly connected? (Is everything linked to everything else?)  Or are there subgraphs that are isolated from the rest?  [Almost certainly the use of pronouns and simple verbs will lead to a single connected component; otherwise one might imagine some esoteric field of study all of whose technical vocabulary comprises a single component of the graph.]
  2. Is the digraph strongly connected: can I get from every vertex to every other by following the directed links?  If I take the definition of “proponent”, and then examine the definitions of “one”, “who”, “argues”, “in”, “favor”, “of”, “something”, and iterate this process, will I ever find a definition which uses the word “green”, for example? [Presumably in general the answer is no.  There are probably words that are never used in the definition of other words:  highly technical words come to mind, such as “anthrax” or “dyspnea”.]
  3. Is there a “basis” for the graph? That is, a minimal set of vertices V containing predecessors for all other vertices in the graph? [Is this called a “rootset” in the context of digraphs?]  {OK, this one I can answer with a Yes, on general principles, since the number of words is finite. One at a time, throw out any superfluous ones.}
  4. What is the smallest possible basis for this graph?  How many English words must I know in order to be able to look up and understand the meaning of any other word in the dictionary?
  5. Are there reasonably efficient algorithms for generating the smallest possible basis V?  Is this something I could be doing for fun on a PC running Matlab?  Or is this something so unreasonably complex that I’d need something much more powerful to take it on?

Apart from personal curiosity, it seems that the size of such a minimal basis might be used to measure the quality of a dictionary: can the authors describe everything in terms of a relatively small basic vocabulary?

The image Dictionaryindents.jpg was photographed by Thegreenj and posted to Wikipedia under the GNU Free Documentation License (v 1.2 or later).

Königsberg (Or how I much I love Google Maps)

April 23, 2008

Some of my students gave a presentation today on graph theory, and talked about the Bridges of Königsberg. Königsberg was founded in 1254 and is part of the Russian enclave Kaliningrad. And it shows up in math talks because of its bridges. As a brief background, the Pregal River runs through the city and splits in two; there’s also an island, so there are four different land masses. You can kind of see this in the following map from 1652 (which appears in almost any math talk on the subject): Click for the picture, and also for a cool satellite photo of Königsberg today (which I hid behind the jump because it takes a few seconds to load).