Building the Memex

Phil Windley // Wed Apr 23 15:55:00 2003

Maciej Ceglowski from the National Institute for Technology and Liberal Education is talking about P2P semantic search engines. The Memex is an idea that Vannevar Bush wrote about in 1945 for cataloguing and organizing information. Maciej is speaking on semantic indexing.

Maciej claims that it's possible to infer semantic relationships from document content. He uses a case study of Steven Johnson (author of the book Emergence). Steven had 1146 paragraph clippings from 15 books arranged in a flat text file. Maciej shows a search on "photosynthesis" which returns the traditional keyword matches but also entries that talk about "chloroplasts" and "symbiosis." It works on the principle that related documents share words.
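To make the "related documents share words" idea concrete, here is a minimal sketch in Python. It is not the system Maciej demonstrated: the stop-word list, the sample clippings, and the function names are all made up for illustration. It finds the clippings that contain the query word, pools their vocabulary, and then ranks every clipping by how much vocabulary it shares with that pool, so a clipping can match without containing the query word at all.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "as", "in", "inside", "of", "the", "with", "without"}

def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def related_search(query, docs):
    """Return indexes of docs ranked by shared vocabulary with the keyword hits."""
    vectors = [Counter(tokenize(d)) for d in docs]
    # Pool the vocabulary of every document that matches the keyword directly.
    profile = Counter()
    for vec in vectors:
        if query in vec:
            profile.update(vec)
    scored = [(cosine(profile, vec), i) for i, vec in enumerate(vectors)]
    ranked = sorted((s, i) for s, i in scored if s > 0)
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in ranked]

# Made-up clippings, not taken from the case study.
clippings = [
    "Photosynthesis takes place inside chloroplasts",
    "Chloroplasts began as bacteria living in symbiosis with plant cells",
    "Ant colonies organize without a leader",
]
hits = related_search("photosynthesis", clippings)
```

Here the second clipping is returned even though it never mentions photosynthesis, because it shares "chloroplasts" with the clipping that does; the ant clipping shares nothing and is left out.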

The best-known algorithm, latent semantic indexing (LSI), is O(n³). Another method is called contextual network graphing.
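The talk notes above do not describe how contextual network graphing works, so the following is only a rough sketch based on my understanding of the general approach: terms and documents become nodes in a two-sided graph, and a search pushes "energy" out from the query term, with the energy fading at each hop. The decay factor, the threshold, the function names, and the sample data are all assumptions for illustration, not details from the talk.

```python
from collections import defaultdict

def build_graph(docs):
    """Build a term/document graph from a dict of doc id -> list of terms."""
    graph = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            graph[("doc", doc_id)].add(("term", term))
            graph[("term", term)].add(("doc", doc_id))
    return graph

def spread(graph, query_term, energy=1.0, decay=0.5, threshold=0.01):
    """Spread energy outward from a term node and return ranked doc ids."""
    totals = defaultdict(float)
    frontier = [(("term", query_term), energy)]
    while frontier:
        node, amount = frontier.pop()
        totals[node] += amount
        neighbors = graph.get(node, ())
        if not neighbors:
            continue
        # Each neighbor gets an equal, attenuated share of this node's energy.
        share = amount * decay / len(neighbors)
        if share < threshold:
            continue  # too faint to matter; this is what ends the search
        for neighbor in neighbors:
            frontier.append((neighbor, share))
    scores = [(name, e) for (kind, name), e in totals.items() if kind == "doc"]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in scores]

# Made-up documents, reduced to their index terms.
graph = build_graph({
    "a": ["photosynthesis", "chloroplasts"],
    "b": ["chloroplasts", "symbiosis"],
    "c": ["ants", "colonies"],
})
ranking = spread(graph, "photosynthesis")
```

Document "b" shows up in the ranking because energy reaches it through the shared term "chloroplasts", while "c" is never reached. Unlike LSI, a search of this kind only touches the part of the graph near the query.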

Peer-to-peer search, which seems something of a misnomer to me, searches multiple connections in parallel and then interleaves the results. The project uses contextual network graphs to link articles. Maciej envisions a search API that would allow users to plug various search engines into their own aggregator (think RSS, but RSS doesn't quite cut it for this app).
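The "search in parallel, then interleave" step might look something like the sketch below. No search API of this kind is specified in the talk notes; the idea that each engine is a plain callable taking a query and returning a ranked list, and the names search_peers and interleave, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

def interleave(result_lists):
    """Merge ranked lists round-robin, skipping results already seen."""
    merged, seen = [], set()
    missing = object()  # marks the end of a shorter list
    for tier in zip_longest(*result_lists, fillvalue=missing):
        for hit in tier:
            if hit is not missing and hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged

def search_peers(query, engines):
    """Query every engine at the same time and interleave what comes back."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps results in the same order as the engines list.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    return interleave(result_lists)

# Stand-in engines; real ones would call out to remote search services.
def peer_one(query):
    return ["doc1", "doc2", "doc3"]

def peer_two(query):
    return ["doc2", "doc4"]

results = search_peers("photosynthesis", [peer_one, peer_two])
```

Round-robin merging is the simplest choice because it needs no comparable scores across engines; an aggregator that trusted some engines more than others would need a weighting scheme instead.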

Semantic searching and document linking is a popular area. When I was CIO, I talked to a lot of folks who had a better way to search. At lunch I talked to Steve Nieker and Martin Remy from ThinkTank23, who work on search. Homeland security has pushed this topic to the fore, but large organizations have always had the problem of finding what they have.