Tim Bray on Atom (ETech 2006)


Tim Bray is speaking on Atom as a case study. RSS is the most successful use of XML in existence. If it's that successful, why replace it?

Tim outlines some problems with RSS as specified:

The RSS specification says "one only" for enclosures, but many podcasts use multiple enclosures. Clients vary unpredictably in how they support them.

There is silent data loss. In a title element, several of the ways you might escape the ampersand in AT&T fail silently. Only one form works predictably, and that just sucks.
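A minimal sketch of why the ampersand is ambiguous, assuming three plausible escaping variants (these are illustrative, not necessarily the ones on Tim's slide). The XML parser behaves the same for every reader; the data loss comes from readers disagreeing about whether the resulting title is plain text or HTML that needs a second round of unescaping.

```python
import html
import xml.etree.ElementTree as ET


def title_text(xml_fragment: str) -> str:
    """Parse a <title> element and return its text after XML unescaping."""
    return ET.fromstring(xml_fragment).text


# Three hypothetical ways a publisher might write the same title.
single = title_text("<title>AT&amp;T</title>")
double = title_text("<title>AT&amp;amp;T</title>")
cdata = title_text("<title><![CDATA[AT&T]]></title>")

# What a reader sees if it treats the title as plain text.
print(single)  # AT&T
print(double)  # AT&amp;T
print(cdata)   # AT&T

# What a reader sees if it treats the title as HTML and unescapes again.
print(html.unescape(double))  # AT&T
```

The double-escaped form only displays correctly in a reader that unescapes twice, and nothing in the feed tells the reader which interpretation the publisher intended.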

Links sometimes don't work. In an RSS <description>, a relative link to an image doesn't resolve reliably. You have to use absolute URLs.
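A small sketch of the relative-link problem, using hypothetical URLs. A relative path only means something against a base URL, and once a description is shown inside an aggregator, the reader may resolve it against the wrong base.

```python
from urllib.parse import urljoin

# Hypothetical URLs, for illustration only.
feed_url = "http://example.org/blog/feed.xml"
aggregator_url = "http://reader.example.com/view"
relative_image = "images/photo.png"

# Resolved against the publisher's feed, which is what the author meant.
intended = urljoin(feed_url, relative_image)

# Resolved against the page the aggregator happens to be displaying.
mistaken = urljoin(aggregator_url, relative_image)

print(intended)  # http://example.org/blog/images/photo.png
print(mistaken)  # http://reader.example.com/images/photo.png
```

An absolute URL sidesteps the question because it needs no base at all.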

IRIs (Internationalized Resource Identifiers) cause problems. The RSS spec says you can't have anything but ASCII.

The RSS-related APIs (the MetaWeblog and Blogger APIs) are underspecified, undersecured, poorly debugged, offer little interoperability, and omit many important authoring features.

HTML has been successively and successfully revised over the years. HTTP and XML are other examples. RSS is frozen and can't be changed. The RSS roadmap suggests that big changes be done in a separate project with a different name. So if you want to fix these problems, that's what you do.

There's also a "syndication culture." Tim launches a slide show that features the Ride of the Valkyries as background music and various pictures of people fighting, nuclear bombs, wars, and so on. "At the end of the day, is this just kittens fighting under the toilet?"

The Atom specification required over 17,000 email messages to debug. Why so many? These are hard problems and take a lot of work to get right.

Atom is going through the IETF process. All decisions are made by email/wiki. Consensus is decreed by the chair and may be appealed. The IETF process is long and hard, but it prevents any one person from taking control of the spec and using it to their own ends.

Atom is like RSS2, except that feeds and entries must have unique IDs, timestamps, and human-readable labels. This is important when feeds are aggregated to form other feeds. Text can be provided as plain text, HTML, or XHTML, with clear signaling of which one it is. Atom can provide both a summary and full content. Namespaces and extensibility are clean because of the "must ignore" rule.
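To make those requirements concrete, here is a rough sketch that builds an entry with an ID, a timestamp, a title, and content whose type is signaled explicitly. The helper name make_entry and the sample text are made up for illustration, and this is not a complete entry (a standalone Atom entry also needs an author, for example).

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def make_entry(title: str, html_body: str) -> ET.Element:
    """Build a bare-bones Atom entry (hypothetical helper, not a full entry)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # Unique ID: stays the same wherever the entry is re-aggregated.
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = uuid.uuid4().urn
    # Timestamp in UTC.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = now
    # Human-readable label, explicitly marked as plain text.
    ET.SubElement(entry, f"{{{ATOM_NS}}}title", type="text").text = title
    # Content explicitly marked as escaped HTML.
    ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="html").text = html_body
    return entry


entry = make_entry("AT&T earnings", "<p>Revenue was <b>up</b>.</p>")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Because the type attribute says how to interpret each text construct, a reader does not have to guess whether an ampersand or a tag in the title is markup or literal text.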

Atom is the general-purpose collection idiom that XML has never had before. XML has been about trees, not collections. There are lots of hacks for doing collections, but no standards. Atom can be that standard.

The Atom publishing protocol uses a trick to avoid the problems WebDAV has had with clients having to know about and manage the URL space on the server. In Atom, the client POSTs an entry to the feed, and the server returns the URL where it put the entry in an HTTP response header. Good idea.
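The POST-and-return-a-URL idea can be sketched with a toy in-process server. Everything here is an assumption made for illustration: the /collection and /entries/N paths, the CollectionHandler class, and the choice of a 201 status with a Location header to carry the new URL. The point is only that the client never chooses or computes the entry's URL.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CollectionHandler(BaseHTTPRequestHandler):
    """Toy collection: the server, not the client, picks each entry's URL."""

    entries = {}  # path -> raw entry body, shared across requests

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # The server decides where the new entry lives.
        entry_path = f"/entries/{len(self.entries) + 1}"
        self.entries[entry_path] = body
        self.send_response(201)
        self.send_header("Location", f"http://{self.headers['Host']}{entry_path}")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), CollectionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

entry_xml = b'<entry xmlns="http://www.w3.org/2005/Atom"><title>Hello</title></entry>'
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("POST", "/collection", body=entry_xml,
             headers={"Content-Type": "application/atom+xml"})
response = conn.getresponse()
status = response.status
location = response.getheader("Location")
response.read()
conn.close()

server.shutdown()
server.server_close()

print(status, location)
```

The client only needs to know one URL, the collection's. Any later edits or deletes would go to whatever URL the server handed back.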

The Atom publishing protocol is the missing infrastructure link in making the Web writable by everyone.

If you're doing blogs and news feeds, RSS2 is good enough. "RSS2 won't be displaced. It's going to be with us for a long, long time." If you're doing more technical feeds where there's a premium on not losing data, then you need Atom. Implementors should probably support reading RSS2 and Atom 1.0.

The protocol is a different story--potentially a game changer.