Subscription Aggregation and Routing

I've mentioned Rohit Khare's work on application-level internetworking, or ALIN, before. While Rohit and others talk about general Web Services networking, including proxies, switches, and routers, I've recently been introduced, via Jon Udell, to a company called PreCache that is really about one thing: subscription routing.

The issue with the publish/subscribe model isn't new. Think about RSS and how news aggregators work. The news aggregator has to poll each of the RSS feeds once an hour or so to see if anything has changed. For my simple web site there are about 125 of you who do that. Imagine the load if there were millions. If you remember your undergraduate computer architecture class, CPUs gave up on polling a long time ago and went to interrupts. Unfortunately, there are some things where polling is the most reasonable alternative.
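To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes every subscriber polls exactly once an hour and that each poll is a single request; the function name `polls_per_day` is just something I made up for illustration.

```python
def polls_per_day(subscribers, polls_per_hour=1):
    """Requests per day a single feed sees if every subscriber polls on a fixed schedule."""
    return subscribers * polls_per_hour * 24


# The roughly 125 readers mentioned above, polling hourly (an assumed rate).
print(polls_per_day(125))

# A hypothetical feed with a million subscribers on the same schedule.
print(polls_per_day(1_000_000))
```

Most of those requests come back with nothing new, which is exactly the waste that interrupts eliminated for CPUs.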

The founder and CTO of PreCache, David Rosenblum, who happens to be Rohit's advisor in grad school, is quoted in Jon's InfoWorld piece describing PreCache's solution to the problem:

"As more and more subscriptions come onto the network," Rosenblum says, "routers exploit overlaps in interest among subscriptions and merge them so that only the most general subscription characterizing all subscriber interests at the edge router propagates upstream to the publisher."

As Jon says, that's a mouthful, but a picture makes it all quite a bit clearer. This diagram, which is from the PreCache site, is a graphical view of what PreCache's NetInjector system does.

If I removed some of the labels and asked you what it was, you might say it was a diagram of a multicast network. Move Networks, which I wrote about last week, might claim that it was a diagram of their system for moving large files. Akamai might claim that it was a representation of what they're doing. Conceptually, all of these problems are related: How do we efficiently, reliably, and quickly move data around a network that was designed more for peer-to-peer transactions than for things that more closely resemble broadcast?

This problem isn't just about RSS. Publish/subscribe is one of the common use cases for message-oriented middleware like MQ Series, JMS, and even Jabber. This problem should be of great interest to Web Services vendors as well, since the publish/subscribe model is an important transport mechanism for some service delivery scenarios. None of these products, as far as I know, is optimized for the case where you have a few publishers and millions of subscribers spread out all over the Internet. Where they're being used now, it seems that the number of publishers and subscribers is more balanced. Solutions like the one envisioned by PreCache will be needed to make it all work.
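To make Rosenblum's description of merging a little more concrete, here's a minimal sketch of the idea. This is not PreCache's algorithm, which I haven't seen. It assumes subscriptions are simple topic paths and that a shorter path "covers" any longer path it is a prefix of; the names `covers` and `merge_upstream` and the sample topics are all hypothetical.

```python
def covers(general, specific):
    """True if `general` matches everything `specific` matches (assumed prefix rule)."""
    return specific[:len(general)] == general


def merge_upstream(subscriptions):
    """Return only the subscriptions an edge router would still need to send upstream.

    Any subscription already covered by a more general one is dropped.
    """
    unique = set(subscriptions)
    return sorted(
        s for s in unique
        if not any(other != s and covers(other, s) for other in unique)
    )


# Hypothetical interests registered by subscribers at one edge router.
edge_subscriptions = [
    ("news", "tech", "rss"),
    ("news", "tech"),
    ("news", "tech", "webservices"),
    ("sports",),
    ("sports", "cycling"),
]

upstream = merge_upstream(edge_subscriptions)
print(upstream)
```

Five subscriptions at the edge collapse to two upstream, and the publisher never needs to know how many subscribers sit behind each one. A real system would need a much richer notion of overlap than path prefixes, but the shape of the savings is the same.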