Improving Search Results Inside the Enterprise


Pavel Dmitriev from Cornell

Organizations often use search engines as part of their corporate information infrastructure. The problem is that creating Web pages inside a corporation is typically much more difficult than it is on the Web at large, and consequently links to pages are a much less useful indicator of page relevance. How do you solve this problem?

I attended a presentation by Pavel Dmitriev from Cornell that discussed one such solution. (See the paper.)

Within an organization, users are much more likely to be interested in improving the results from a search engine. Dmitriev and his co-authors built a system called Trevi for researching various strategies for annotating search results to improve relevance. There are two types of annotations: explicit and implicit.

Explicit annotations involve having the user indicate relevance of results using some kind of feedback mechanism. Once users are at a page, they are unlikely to go back, so you need to provide a means for them to indicate relevance of open pages. This research used a toolbar where users could enter a relevance rank.

Implicit annotations use the query logs, along with logs of what users clicked on, treating each click as a vote of relevance. There are a couple of strategy choices to make.

First, do you count every click or just the last click? Users tend to trust the search engine, so they click on higher-ranked entries without necessarily reading them closely. To eliminate this bias, you might annotate only the last entry the user clicks on.

Second, you can either treat each search as independent or combine the results from a series of searches made over a short interval of time, on the assumption that searches made close together relate to the task the user is working on right now.
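Those two choices combine into four implicit annotation strategies. Here is a minimal Python sketch of how they might be derived from a click log. It is not Trevi's implementation: the `Search` record, the function names, the 180-second session gap, and the idea of attaching the query text to each clicked page as its annotation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Search:
    """Hypothetical log record: one query and the URLs clicked, in click order."""
    user: str
    time: float          # seconds since an arbitrary start
    query: str
    clicks: List[str]


def group_sessions(log, gap=180.0):
    """Group each user's searches into sessions.

    Consecutive searches by the same user that are at most `gap` seconds
    apart share a session. The 180-second default is an arbitrary assumption.
    """
    sessions = []
    last = {}  # user -> (time of that user's previous search, its session)
    for s in sorted(log, key=lambda s: s.time):
        prev = last.get(s.user)
        if prev is not None and s.time - prev[0] <= gap:
            session = prev[1]
        else:
            session = []
            sessions.append(session)
        session.append(s)
        last[s.user] = (s.time, session)
    return sessions


def implicit_annotations(log, last_click_only=False, use_sessions=False, gap=180.0):
    """Return a sorted list of (url, annotation_text) pairs.

    last_click_only: annotate only the final click instead of every click.
    use_sessions:    pool queries and clicks from searches made close together
                     instead of treating each search independently.
    """
    groups = group_sessions(log, gap) if use_sessions else [[s] for s in log]
    pairs = set()
    for group in groups:
        queries = [s.query for s in group]
        clicks = [url for s in group for url in s.clicks]
        if last_click_only:
            clicks = clicks[-1:]  # empty list stays empty
        for url in clicks:
            for q in queries:
                pairs.add((url, q))
    return sorted(pairs)
```

Calling `implicit_annotations(log)` with the defaults counts every click on every search independently. Setting `last_click_only=True` and `use_sessions=True` gives the most conservative variant: only the last page clicked in a burst of related searches is annotated, but it picks up the text of every query in that burst.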

The team did a series of experiments and determined that explicit annotation leads to a significant improvement in relevant pages being highly ranked (13.9% vs. an 8.9% baseline). Implicit annotation, regardless of the strategy used, did not result in a significant improvement. The authors concluded:

A closer investigation showed that there was little overlap between the pages that received implicit annotations, and the ones that were annotated explicitly. We suspect that, since the users knew that the primary goal of annotations is to improve search quality, they mostly tried to enter explicit annotations for pages they could not find using Trevi. We conclude that a different experimentation approach is needed to evaluate the true value of implicit annotations, and the differences among the four annotation strategies.