Representations and URIs


Jon Udell references a discussion that has been raging over what a URI represents. Jon quotes Tim Berners-Lee, who summarizes it succinctly as follows:

What does "http://www.amazon.com/exec/obidos/ASIN/0679600108/qid=1027958807/sr=2-3/ref=sr_2_3/103-4363499-9407855" identify?
  1. A whale
  2. "Moby Dick or the Whale" by Herman Melville
  3. A web page on Amazon offering a book for sale
  4. A URI string
  5. All the above
I find this question fascinating since it takes me back to my formal computer science roots. A long time ago, I was a formal methods researcher. Sherman, fire up the way-back machine:

In the late 80s there was a huge battle raging in some professional journals and in letters to the editor of Communications of the ACM (CACM). (Remember, this was before blogs or even the web.) De Millo, Lipton, and Perlis had published a paper called "Social Processes and Proofs of Theorems and Programs." It was followed by a paper in CACM by James Fetzer called "Program Verification: The Very Idea." Fetzer argued that proofs of correctness for programs were ridiculous from the start, since the artifact had many properties that the proof could not consider. In 1989, Jon Barwise (who is now dead, unfortunately) wrote an eloquent paper in the Notices of the American Mathematical Society called "Mathematical Proofs of Computer System Correctness." Barwise argued that De Millo, Lipton, Perlis, and, especially, Fetzer had made a critical error by confusing the thing with the mathematical representation of the thing. Sound familiar?

The very essence of Applied Mathematics and, by extension, Computer Science is the notion of representation, naming, and abstraction. Confusion about these issues shows up quite frequently when students are learning about naming in a programming language theory course. (I always like to ask the question "What is '3'?") Left uncorrected, that confusion can lead to significant problems as the theory is developed later.
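To make the "What is '3'?" question concrete, here is a small Python sketch. The function name denote is my own invention for illustration. It treats a numeral as a string of symbols and maps it to the number it stands for, so that two different names can be seen to denote the same thing.

```python
def denote(numeral: str, base: int = 10) -> int:
    """Map a numeral (a string of symbols) to the number it denotes."""
    return int(numeral, base)


# Two different names...
decimal_name = "3"
binary_name = "11"

# ...that denote the same number.
same_number = denote(decimal_name) == denote(binary_name, base=2)

# The names themselves are still different strings.
same_name = decimal_name == binary_name

# Operating on names is not the same as operating on what they denote.
joined_names = decimal_name + decimal_name            # string concatenation
added_numbers = denote(decimal_name) + denote(decimal_name)  # arithmetic
```

Here same_number is true while same_name is false, and concatenating the names gives a different result from adding the numbers. That gap between the symbol and its denotation is exactly what trips students up.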

From a formal semantics viewpoint, a URI, like any other name, has to represent the thing that it is bound to and nothing else. That is, the URI represents whatever is returned when the URI is dereferenced. What is returned in the case of a URI is a string of bits. The resource that is returned may represent something else, but the URI is just a name for the resource. It carries no semantics beyond that. The resource is a model of something. The mathematics behind all of this is quite well developed and well thought out. This problem is a classic CS problem, not something that the web invented.

As an example, consider the URI for tracking a package at UPS. The URI is merely a name for the resource. That same resource could have millions of other names that all dereference to the same resource. The resource is a model for a package. That is, it's an abstraction, based on properties, of a physical object. We must not confuse the model with its name, and we must never confuse either of them with the artifact that the model represents.
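The package example can be sketched in a few lines of Python. Everything here is hypothetical: the example.org URIs, the tracking number, the PackageModel fields, and the dereference function are made up for illustration and have nothing to do with how UPS actually works. The point is only to show the three separate things: names, the model they are bound to, and the physical package, which appears nowhere in the program at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageModel:
    """A model of a package: a handful of chosen properties, not the box itself."""
    tracking_number: str
    status: str
    weight_kg: float


# The resource is a model (an abstraction) of some physical package.
resource = PackageModel("HYPOTHETICAL-123", "in transit", 2.5)

# Two hypothetical URIs. They are different strings...
uri_a = "http://example.org/track?id=HYPOTHETICAL-123"
uri_b = "http://example.org/t/HYPOTHETICAL-123"

# ...bound to the same resource. The bindings play the role of an environment.
bindings = {uri_a: resource, uri_b: resource}


def dereference(uri: str) -> PackageModel:
    """Return whatever the name is bound to, and nothing else."""
    return bindings[uri]


via_a = dereference(uri_a)
via_b = dereference(uri_b)
```

Both lookups return the very same object even though the names differ, and changing a name would change nothing about the model. Likewise, nothing you do to the model moves the real box an inch.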

For those who are interested in this kind of thing, Mike Gordon's The Denotational Description of Programming Languages: An Introduction is a great, readable introduction by one of the luminaries in the field.