When I was a grad student, I took a model theory class. Model theory is a branch of mathematical logic that deals, in part, with the meaning of symbols. There were about a dozen of us in the class; half were CS PhD students and the other half were Math PhD students. The first part of the class was filled with pretty heavy set theory, and the CS students were struggling. The next part, however, was much easier for the CS students than for the Math students. I remember one class where the professor was introducing the idea that symbols and their meanings are separate. He made the point that the + symbol doesn't mean addition. One math student with a very perplexed look on his face said, "That doesn't make any sense!" He'd never considered that symbols and their meanings were separate.
Computer scientists have long dealt with the issues surrounding the meanings of symbols. We're very comfortable with syntax and semantics. Every time we learn a new programming language, the task is building a mental model of what the syntax means. The symbol grounding problem that Jon mentions is the same problem. Computer science does syntax very well, and XML is a great example of that. Semantics is tougher. Unfortunately, most of the work in semantics is not accessible to an undergraduate computer science major because of the complex mathematics involved. Also, some of the best books on the subject, like this one by Mike Gordon, are out of print.
When students ask me why they ought to get a CS degree when they already know how to program, XML is one of my favorite examples. The concepts behind XML are pretty well understood CS theory. There's a lot of work in turning that into a practical system, but the theory's years old. Having a solid grounding in CS theory is very helpful in understanding new things. Twenty years ago it was object-oriented programming. Now it's XML. I'm sure it will be useful when the next new thing comes along.
My understanding of namespaces, which is admittedly not based on extensive study, is that they serve three purposes:
- They eliminate symbol clashes.
- They potentially ensure that when we see an element, we can tell if it's the same element as the one with the same name in another document.
- They potentially give us more information about what the author of the XML in a particular namespace expects that tag to mean.
The first purpose is important from a practicality standpoint. For example, if I write <dc:creator/>,
you can distinguish it from other <creator/> elements used in that particular document. These other elements with the same name might be distinguished in their own namespace, or they might be in the "null" namespace, a name I prefer to "empty" (after all, it's not empty if it's got elements in it). :-)
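Here's a minimal sketch of that first purpose, using Python's standard xml.etree.ElementTree. The document is made up for illustration, and I'm assuming dc is bound to the Dublin Core element namespace; the unprefixed <creator/> is a hypothetical element in the null namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two elements named "creator" that would clash
# without namespaces. Assumption: "dc" is bound to the Dublin Core URI.
DOC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Alice</dc:creator>
  <creator>some-tool 1.0</creator>
</record>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-uri}local-name",
# so the two creators end up with different tags.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

The parser keeps the two elements apart because it compares the expanded names, not the bare word "creator".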
The second purpose is about symbol equality across documents. When we use <dc:creator/>, we use it in a context where dc has been grounded by referencing a URL. That URL has to be unique, but it doesn't have to actually point to something. Even if it's a null URL, we've achieved the purpose of making the elements in the namespace unique. We've also achieved a second important goal: we've ensured that when you say <creator/> and I say <creator/>, we're talking about the same tag if we ground our namespaces in the same URL.
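A rough sketch of symbol equality across documents, again with Python's xml.etree.ElementTree. The URLs and prefixes here are invented for the example and deliberately don't point at anything; the parser never tries to fetch them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URLs; neither needs to resolve to a real page.
NS = "http://example.org/ns/does-not-resolve"
OTHER_NS = "http://example.org/ns/something-else"

# Two separate "documents" that pick different prefixes for the same URL.
mine = ET.fromstring(f'<a:creator xmlns:a="{NS}">me</a:creator>')
yours = ET.fromstring(f'<b:creator xmlns:b="{NS}">you</b:creator>')

# A third document that reuses my prefix but grounds it in a different URL.
other = ET.fromstring(f'<a:creator xmlns:a="{OTHER_NS}">them</a:creator>')

# Same URL and local name means the same tag, whatever the prefix.
same_tag = mine.tag == yours.tag
# Same prefix but a different URL means a different tag.
different_tag = mine.tag != other.tag
print(same_tag, different_tag)
```

The prefix is just a local abbreviation. What makes your <creator/> and my <creator/> the same symbol is the URL behind it.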
The third purpose is really what Jon's been wrestling with. When I see <creator/>, how do I know what it means? As Jon has pointed out, this is where things get tricky. When we use the word "means," we usually think of some rigorous, complete definition. It's fairly easy to see how namespaces might provide us with more metadata and thus increase the information we have available to us about any given XML document. It's much harder to imagine that machines will be able to divine the meaning of the document, no matter how much metadata you include.
Yet it's precisely this latter concept I hear when I listen to people talk about the semantic web. I'm with Jon: "If the RDF folks have really solved the symbol grounding problem, I'm all ears." I'll be satisfied, however, with better representations for metadata and good tools for processing it.