Autonomy: Using Unstructured Data

I was going to go to the session on finance, but I stopped to spend some time with some folks from Autonomy Systems, and by the time I got there the session was beyond full. Oh well, Autonomy was probably more aligned with my interests anyway.

Autonomy lets you find information by concept in unstructured data using a combination of "bayesian inference and Shannon's information theory." It's been a long time since I studied either one of those, so that didn't mean much to me. I found this document on their site, which was much more helpful. Autonomy is a British company, and it shows when you see stuff like this. I've often joked that British universities couldn't afford computers, so they actually studied Computer Science.

The reason for my initial interest is that Autonomy Systems recently signed a deal with the Office of Homeland Security, as reported by the Wall Street Journal (and a company press release). We have a project to create a first responder portal as part of our homeland security project, and we need a way to link information so that first responders can see what's relevant, not just by job function or location, but by what they're interested in right now. My first thought is to throw a Google appliance at the problem, but Google's method of determining relevance may not be relevant in this case. Something like what Autonomy has to offer might be just the ticket.

The same could be true of indexing internal data as well. For indexing public data, Google's algorithm works pretty well: I'll find interesting what most other people found interesting. For private data, or data that doesn't attract a lot of interest but may be very relevant to the current problem, that algorithm doesn't work as well. Google relies on the fact that there are lots of sites linking to lots of other sites to create relevance data. That's not a good basis when few or no links point at the documents that matter.
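To make the link-based idea concrete, here's a rough sketch of a simplified PageRank-style calculation. This is a textbook simplification, not Google's actual implementation, and the page names and link graph are made up for illustration. The point is just that a document nobody links to ends up at the bottom, no matter how relevant it is to the problem at hand.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style score: a page ranks high when
    well-ranked pages link to it. `links` maps page -> list of targets."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


# Hypothetical link graph: "evacuation_plan" links out, but nothing links to it.
links = {
    "news": ["portal"],
    "blog": ["portal", "news"],
    "portal": ["news"],
    "evacuation_plan": ["portal"],
}
scores = link_rank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:16s} {score:.3f}")
```

In this toy graph the evacuation plan only ever gets the baseline score, which is the situation a lot of internal documents are in.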

Maybe we should do a bake-off: get a Google appliance, an Autonomy DR engine, and any other interesting technologies, run them against our data, and study the effectiveness of the results. I'd bet we could get the companies to donate the systems (maybe not), but we'd have to get a grant or something to pay for the setup, research, etc. Any takers?
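For what it's worth, "effectiveness" in a bake-off like that could be scored with something as simple as precision at k: of the top k results an engine returns for a query, what fraction did a human judge mark as relevant? Here's a minimal sketch. The engine names, document IDs, and relevance judgments are all invented for illustration, and a real study would want many queries and more than one metric.

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top-k results that are in the judged-relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_results[:k] if doc in relevant)
    return hits / k


# Hypothetical relevance judgments for one query, made by a human reviewer.
relevant_docs = {"d1", "d3", "d7"}

# Hypothetical ranked output from two engines for the same query.
engine_results = {
    "engine_a": ["d1", "d2", "d3", "d4"],
    "engine_b": ["d5", "d1", "d6", "d8"],
}

bakeoff = {
    name: precision_at_k(results, relevant_docs, k=4)
    for name, results in engine_results.items()
}
print(bakeoff)
```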