Computational Advertising

Andrei Broder of Yahoo! Research

I'm in a talk by Andrei Broder, a Yahoo! Fellow and Vice President of Computational Advertising, on (what else?) computational advertising. I was drawn to the talk by the title.

The central problem: find the "best match" between a given user in a given context and a suitable advertisement. The context could be a click stream, page content, or something else. Key ideas:

  • The financial scale is huge. Small constants matter.
  • Advertising is a form of information.
  • Finding the "best ad" is a type of information retrieval problem.

Classic advertising falls into one of two camps: brand advertising, which projects a message, and direct advertising, which attempts to elicit an action. Coupons are a classic example of direct marketing.

For advertisers interested in online (keyword) ads, the key issues are

  • what words to buy
  • how much to pay
  • spamming is an economic activity

For search engine owners, the questions are

  • How to price the words (auction)
  • How to match ads to content

The problem with matching is that it's not purely syntactic. For example, an ad for Seattle hotels ought to match "Alaska cruise starting point" but not "Seattle's Best Coffee Chicago". Finding the right ad is a query problem, but the ad database is smaller than the database of web pages, and the entries are smaller than pages (less content). Ranking is based not just on matching but also on the bid.
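To make the "matching plus bid" idea concrete, here is a minimal sketch in Python. The talk didn't say how the two are combined, so multiplying them is my assumption, as are the `Ad` class, the `rank_ads` function, and all of the numbers. The relevance score is taken as given, standing in for whatever semantic matcher produces it.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float        # advertiser's bid per click in dollars (made-up values)
    relevance: float  # match quality in [0, 1], assumed to come from a semantic matcher


def rank_ads(ads):
    """Order ads by relevance * bid, highest first (the product is an assumption)."""
    return sorted(ads, key=lambda ad: ad.relevance * ad.bid, reverse=True)


# Hypothetical candidates for the query "Alaska cruise starting point"
ads = [
    Ad("Seattle hotels", bid=0.50, relevance=0.9),
    Ad("Coffee beans", bid=2.00, relevance=0.1),
    Ad("Alaska cruises", bid=1.00, relevance=0.8),
]
ranked = [ad.name for ad in rank_ads(ads)]
```

With these numbers the coffee ad has the highest bid but lands last, because its poor match drags its score down.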

There's been a lot of progress on this problem in recent years. Matches are not syntactic. What's not solved? Filtering for relevance. Ads on a page about Scooter Libby's testimony included entries for Libby Shoes.

We're moving from an explicit demand for information driven by a user query to active information supply driven by user activity and context. This requires the increased use of semantics and context. An information supply engine looks at user profile and context, the activity context (browsing) and the ad inventory, and provides an ad. User action then feeds back into the system.
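The loop described here (profile and context in, ad out, user action fed back) might look something like the following Python sketch. Everything in it is my own illustration, not something shown in the talk: the `SupplyEngine` class, the topic-overlap scoring, and the smoothed click-through estimate are all assumptions.

```python
class SupplyEngine:
    """Toy information supply engine: picks an ad, then learns from clicks."""

    def __init__(self, inventory):
        # inventory maps an ad name to the set of topics it targets (hypothetical)
        self.inventory = inventory
        self.shown = {ad: 0 for ad in inventory}
        self.clicked = {ad: 0 for ad in inventory}

    def estimated_ctr(self, ad):
        # Add-one smoothing, so an ad that was never shown starts at 0.5
        return (self.clicked[ad] + 1) / (self.shown[ad] + 2)

    def choose(self, profile_topics, page_topics):
        # Combine the user profile and the browsing context into one signal
        signal = set(profile_topics) | set(page_topics)

        def score(ad):
            return len(self.inventory[ad] & signal) * self.estimated_ctr(ad)

        return max(self.inventory, key=score)

    def feedback(self, ad, was_clicked):
        # User action feeds back into the estimates used by choose()
        self.shown[ad] += 1
        if was_clicked:
            self.clicked[ad] += 1


engine = SupplyEngine({"hotel ad": {"travel", "seattle"}, "shoe ad": {"fashion"}})
first = engine.choose(profile_topics={"travel"}, page_topics={"seattle", "cruise"})
engine.feedback(first, was_clicked=False)
```

After an impression with no click, the chosen ad's estimated click-through rate drops, so it scores lower the next time around.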

There is a different quality (utility) factor for publishers, advertisers and users. The ad agency has its own economic interest. Different types of ads (text, graphical, multimedia) are not easily compared.

One technique is to allow the searcher to peek at the results to determine what a query is about. For example, viewing the query "TFM-PCIV92A" doesn't give you a lot of information about what this is about, but looking at the results tells you this is about 56K baud modems. Note that if you do that search in Google, you don't see any ads for modems. If you search for "modem", you'll see all kinds of sponsored ads. Why isn't Google figuring out that the first search is about modems? (This is at least true from China...)
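Here is a rough Python sketch of the "peek at the results" idea: count which words show up across the result snippets and use the most common one as a hint about the query's topic. The snippets are invented for illustration, and `expansion_terms`, the stopword list, and the counting scheme are my assumptions rather than anything presented in the talk.

```python
import re
from collections import Counter

STOPWORDS = {"a", "the", "for", "and", "with", "of", "to", "in"}


def expansion_terms(query, result_snippets, k=1):
    """Return the k terms found in the most snippets, ignoring the query's own terms."""
    query_tokens = set(re.findall(r"[a-z0-9-]+", query.lower()))
    counts = Counter()
    for snippet in result_snippets:
        # Use a set so each snippet votes at most once per term
        tokens = set(re.findall(r"[a-z0-9-]+", snippet.lower()))
        counts.update(tokens - STOPWORDS - query_tokens)
    return [term for term, _ in counts.most_common(k)]


# Invented snippets standing in for real search results
snippets = [
    "TFM-PCIV92A 56K PCI modem driver download",
    "Internal 56K modem TFM-PCIV92A specifications",
    "Driver for PCI fax modem",
]
topic = expansion_terms("TFM-PCIV92A", snippets, k=1)
```

Here "modem" is the only term that appears in all three snippets, so it is what an ad matcher would get to work with instead of the opaque part number.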

Finding better approaches requires interdisciplinary techniques: machine learning, optimization, information retrieval, statistical modeling, microeconomics, and so on.