The Next Wave of the Web


Nigel Shadbolt (University of Southampton), Tim Berners-Lee (W3C), Richard Benjamins (iSOCO), Clare Hart (Dow-Jones), and Jim Hendler (MINDSWAP)
Plenary Panel

WWW2006 started this morning with an introduction to the technical program. The conference is very competitive: of the 697 papers submitted this year, 84 were accepted, or 11%.

The plenary panel was entitled "The Next Wave of the Web." Nigel Shadbolt (University of Southampton) was the panel chair. The panelists were Tim Berners-Lee (W3C), Richard Benjamins (iSOCO), Clare Hart (Dow-Jones), and Jim Hendler (MINDSWAP). The discussion was mostly about the semantic web.

Shadbolt asked Berners-Lee what the semantic web has achieved since the first article appeared in Scientific American. Berners-Lee pointed to RDF and OWL as very real achievements. He said that the big stopper has been the lack of a query language, and that SPARQL fills that gap.
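To make the "query language" point concrete, here is a toy sketch of the kind of question a query language lets you ask of a set of triples. It is not SPARQL and not how any real engine works; the data, the `ex:` names, and the helper functions are all made up for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: subject, predicate, object.
TRIPLES: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with the given terms; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def names_of_people_known_by(triples: List[Triple], person: str) -> List[str]:
    """Join two patterns by hand. In SPARQL this would be roughly:
    SELECT ?name WHERE { <person> foaf:knows ?x . ?x foaf:name ?name }
    """
    names = []
    for _, _, friend in match(triples, s=person, p="foaf:knows"):
        for _, _, name in match(triples, s=friend, p="foaf:name"):
            names.append(name)
    return names
```

Without a query language, every application has to hand-write joins like the one in `names_of_people_known_by`; with one, the join is a couple of declarative lines.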

Hendler said that we've underestimated the scale needed for deployment. People in the bioinformatics world can generate 100 million triples per day. The challenge of scaling was way more than we expected.

Benjamins said that it's easy to get people to understand the vision of the Semantic Web, but it's harder to get them to adopt it. OWL has helped companies understand that the risk is now limited, and it's easier to bring these technologies to market.

Hart said that businesses have to figure out how to tap unstructured and structured information resources to build competitive advantage. Getting an inventory, determining a vocabulary, building sector relationships, applying tags automatically, and then building smart applications are key steps.

Are ontologies and taxonomies anything people can actually use? Hart says "yes." 80% of the data in businesses is unstructured. Organization is unique to the way the data is collected and used. Random indexing is great, but classification is necessary for many applications. For example, you can tell how many news articles in a given day mention IBM with indexing, but how many of those are about IBM?
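Hart's IBM example is easy to sketch. The articles and their "about" labels below are invented, and in practice the labels would come from a classifier or an editor rather than being typed in by hand; the point is only that a text index and a classification answer different questions.

```python
# Hypothetical articles; the "about" sets stand in for the output of a classifier.
ARTICLES = [
    {"text": "IBM reports quarterly earnings above forecasts.", "about": {"IBM"}},
    {"text": "Startup hires a former IBM engineer as CTO.", "about": {"Startup"}},
    {"text": "Oil prices rise for a third day.", "about": {"Oil"}},
]


def mentions(articles, term):
    """What indexing gives you: articles whose text contains the term."""
    return [a for a in articles if term.lower() in a["text"].lower()]


def about(articles, subject):
    """What classification gives you: articles labeled with the subject."""
    return [a for a in articles if subject in a["about"]]


print(len(mentions(ARTICLES, "IBM")))  # articles that mention IBM
print(len(about(ARTICLES, "IBM")))     # articles that are about IBM
```

Two of the three articles mention IBM, but only one is about IBM.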

Benjamins said that businesses think of ontologies as a necessary evil. They should recognize that they are doable and useful. We need better ways of estimating the cost of building an ontology so that businesses can plan for them and determine the ROI. Ontologies can be a source of competitive advantage. Factors in the cost include size, the difficulty of gathering the data, and the number of people involved in building it.

Hart mentions that Web 2.0 is leading expectations. Hendler says that he thought tagging would eventually work its way up to ontologies, but it's going in both directions. There's some notion of converging to a "middle," but it's unclear to me where the middle is. What if these two activities are happening on different planes?

Berners-Lee says that the notion of people adding semantic tags to documents using Notepad is too simple. Hart says, again, that technology has to be the driver of categorization--people writing tags can only go so far. The amount of time people spend "searching" is increasing, not decreasing. This is a waste of time. You pay people to analyze data. The more time they spend searching, the less time they spend analyzing. Are you trusting your information strategy to a search engine?

Rohit Khare pointed out (via Bonjour) this slide deck, which has some good examples of Person OWL. This illustrates Benjamins' earlier point perfectly: it's not hard to demonstrate the utility of the Semantic Web. People get it with only a few simple examples. The problem is making it happen.

Hendler says (again) that one third of the clicks on the Web are happening inside social Web sites like MySpace.com. This is layering a social component on top of the Web.

Berners-Lee says that his goal for RDF stores is that they will be invertible. With invertibility, you get the ability to debug where inferences go wrong. Thus you could get more than just a search engine; you'd get an explanation engine.
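One way to read "invertible" is that every inferred statement can be traced back to the asserted statements it came from. The sketch below is my own guess at what that might look like, not anything Berners-Lee described: a single transitivity rule, made-up triples, and a hypothetical `explain` function that walks a derived triple back to its sources.

```python
def infer_transitive(facts, predicate):
    """Forward-chain transitivity of `predicate`, remembering the two
    premises each derived triple came from."""
    known = set(facts)
    why = {}  # derived triple -> (premise, premise)
    changed = True
    while changed:
        changed = False
        snapshot = list(known)
        for (a, p1, b) in snapshot:
            for (c, p2, d) in snapshot:
                if p1 == p2 == predicate and b == c:
                    new = (a, predicate, d)
                    if new not in known:
                        known.add(new)
                        why[new] = ((a, p1, b), (c, p2, d))
                        changed = True
    return known, why


def explain(triple, why):
    """Return the asserted triples that a triple ultimately rests on."""
    if triple not in why:
        return [triple]  # asserted, not derived
    left, right = why[triple]
    return explain(left, why) + explain(right, why)


# Hypothetical class hierarchy.
FACTS = [
    ("Poodle", "subClassOf", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
]

known, why = infer_transitive(FACTS, "subClassOf")
for premise in explain(("Poodle", "subClassOf", "Animal"), why):
    print(premise)
```

If the conclusion that a poodle is an animal were wrong, the printed premises are where you would start looking, which is the debugging ability the panel was talking about.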

The title of this panel, "The Next Wave of the Web," was appropriate, not because I necessarily believe that the Semantic Web is the next wave, but because it's still being talked about in the future tense, as something that "will be really cool when it gets here."