Microformats: Paving the Cowpaths

Phil Windley // Mon Jul 11 11:12:00 2005

Long ago, Jon Udell introduced me to the idea of using class attributes in HTML tags to add semantic information to Web content. The goal is a simpler way to make searching better without the overhead of hiring a team of librarians to create an ontology for every new effort. I've long used a bookmarklet to create quotes for my blog so that the date they were referenced and their URI were captured in the HTML for eventual searching. I've done the same thing with code snippets, putting class="code" as an attribute to the <pre> tag that surrounds the code.
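To make the searching idea concrete, here is a minimal sketch of pulling class-tagged snippets back out of a page. It assumes Python and its standard html.parser module; the names CodeSnippetFinder and find_code_snippets are made up for illustration and are not part of any tool mentioned here.

```python
from html.parser import HTMLParser


class CodeSnippetFinder(HTMLParser):
    """Collects the text of every <pre> element whose class list includes "code"."""

    def __init__(self):
        super().__init__()
        self.snippets = []
        self._inside = False  # True while we are inside a matching <pre>

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "code" in classes:
            self._inside = True
            self.snippets.append("")

    def handle_data(self, data):
        if self._inside:
            self.snippets[-1] += data

    def handle_endtag(self, tag):
        if tag == "pre":
            self._inside = False


def find_code_snippets(html):
    """Hypothetical helper: return the text of all <pre class="code"> blocks."""
    finder = CodeSnippetFinder()
    finder.feed(html)
    finder.close()
    return finder.snippets


PAGE = '<p>Intro</p><pre class="code">(define x 42)</pre><pre>plain text</pre>'

if __name__ == "__main__":
    print(find_code_snippets(PAGE))
```

Only the first <pre> in PAGE carries the class, so only its text is collected. The same pattern would work for quotes tagged with their own class name.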

Friday, while I was in Palo Alto, I met Tantek Celik and he started going off on microformats. From the Microformats website:

[M]icroformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging).
From microformats | About microformats
Referenced Mon Jul 11 2005 11:23:22 GMT-0600 (MDT)

The idea is to mark up human-readable documents with additional semantic information. Take calendars, for example. If you use OS X, you're probably familiar with the iCalendar format, the lingua franca of iCal. iCalendar is not XML-based. You could imagine an effort to create an XML-based version of iCalendar (in fact, such efforts exist), but it's not clear they're going to get much traction. Instead, the microformats folks define a way of translating iCalendar into HTML that is lossless. That is, you can translate it back into iCalendar if you need to.

Here's an example of an appointment expressed in iCalendar format:

BEGIN:VCALENDAR
PRODID:-//XYZproduct//EN
VERSION:2.0
BEGIN:VEVENT
URL:http://www.web2con.com/
DTSTART:20051005
DTEND:20051007
SUMMARY:Web 2.0 Conference
LOCATION:Argent Hotel, San Francisco, CA
END:VEVENT
END:VCALENDAR

The hCalendar spec shows how this can be encoded as HTML that also contains the semantic information (as described by the spec) necessary to interpret it as an appointment:

<span class="vevent">
 <a class="url" href="http://www.web2con.com/">
  <span class="summary">Web 2.0 Conference</span>:
  <abbr class="dtstart" title="20051005">October 5</abbr>-
  <abbr class="dtend" title="20051007">7</abbr>,
 at the <span class="location">Argent Hotel, San Francisco, CA</span>
 </a>
</span>

This encoding displays as follows in a browser:

Web 2.0 Conference: October 5- 7, at the Argent Hotel, San Francisco, CA
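To sketch the machine-readable side, the following reads the hCalendar markup above and writes the iCalendar properties back out. It is a rough illustration in Python using the standard html.parser module, not the algorithm from the hCalendar spec. The names HCalendarParser, to_icalendar, and PROPERTIES are assumptions of mine, and the sketch handles only the five properties used in this example, one event per document, with no markup nested inside a property element.

```python
from html.parser import HTMLParser

# hCalendar class names mapped to iCalendar property names.
# Only the properties from the example above are covered (an assumption).
PROPERTIES = {
    "url": "URL",
    "dtstart": "DTSTART",
    "dtend": "DTEND",
    "summary": "SUMMARY",
    "location": "LOCATION",
}


class HCalendarParser(HTMLParser):
    """Collects the properties of a single event from hCalendar markup."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._capture = None  # property whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name in (attrs.get("class") or "").split():
            prop = PROPERTIES.get(name)
            if prop is None:
                continue
            if tag == "a" and "href" in attrs:
                self.event[prop] = attrs["href"]    # value lives in href
            elif tag == "abbr" and "title" in attrs:
                self.event[prop] = attrs["title"]   # value lives in title
            else:
                self._capture = prop                # value is the element text
                self.event[prop] = ""

    def handle_data(self, data):
        if self._capture:
            self.event[self._capture] += data

    def handle_endtag(self, tag):
        # Simplification: a property's text ends at the next closing tag.
        self._capture = None


def to_icalendar(html, prodid="-//XYZproduct//EN"):
    """Hypothetical helper: rebuild iCalendar text from hCalendar markup."""
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    lines = ["BEGIN:VCALENDAR", f"PRODID:{prodid}", "VERSION:2.0", "BEGIN:VEVENT"]
    for prop in ("URL", "DTSTART", "DTEND", "SUMMARY", "LOCATION"):
        if prop in parser.event:
            lines.append(f"{prop}:{parser.event[prop].strip()}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\n".join(lines)


SAMPLE = """<span class="vevent">
 <a class="url" href="http://www.web2con.com/">
  <span class="summary">Web 2.0 Conference</span>:
  <abbr class="dtstart" title="20051005">October 5</abbr>-
  <abbr class="dtend" title="20051007">7</abbr>,
 at the <span class="location">Argent Hotel, San Francisco, CA</span>
 </a>
</span>"""

if __name__ == "__main__":
    print(to_icalendar(SAMPLE))
```

The sketch mirrors the example literally: it joins lines with plain newlines and does not escape commas in text values, both of which a real iCalendar writer would need to handle.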

One format that's readable by machines and humans. Pretty cool. hCalendar, for marking up XHTML with calendar information, is one of several specifications from the Microformats organization:

hCard - People and Organizations
hCalendar - Calendars and Events
VoteLinks, hReview - Opinions, Ratings and Reviews
XFN - Social Networks
relLicense - Licenses
relTag - Tags, Keywords, Categories
XOXO - Lists and Outlines

Adam Rifkin has referred to this approach as paving the cowpaths. I like that image because it's how things mostly get done.

Tantek said something provocative on Friday to the effect that RSS and XHTML are the only wins XML has had on the Internet. XML is very useful within the firewall, but outside the firewall, XML hasn't really made that big a splash. I'm not sure I'd conclude from that that it never will--there's just too much structured data that needs to be shared. I am willing to concede, however, that things like hCalendar and hCard make a lot more sense to me than trying to get the whole world to adopt some new XML standard and then build XSLT stylesheets and the like to get the data onto the Web.

This might not seem obvious. After all, if I've got to get someone to move to a new standard, why not move them all the way to a sparkly, fresh XML standard instead of just adding some tags to HTML? The answer is that I can adopt the hCalendar format in a few minutes, even generate it by hand, and it's useful without any further infrastructure because I've already got a browser. Add that to the power of an XQueryable RSS cache like the one Jon Udell has or the one I'm working on in Scheme, and you can get value out of it with no programming at all. Cool, huh?

The point that the microformats people are making is not that XML isn't useful, just that its power is best harnessed through incremental additions to XHTML that people can immediately use inside their browser rather than with some brand new infrastructure.

Microformats are not much different, in philosophy, from REST. REST is another "pave the cowpaths" effort that catches on because it's so much easier to use. In fact, if you read the following set of principles from the Microformats wiki, you'll find most of them apply to RESTful Web services as well:

solve a specific problem
start as simple as possible
design for humans first, machines second
reuse building blocks from widely adopted standards
modularity / embeddability
enable and encourage decentralized and distributed development, content, services

I like the concept. There are HTML purists who will object. In fact, there's one guy who's downright nasty about this stuff (I know; I've received email from him), but I think that's really water under the bridge. HTML has never been about purity--it's been about solving problems practically. This is a practical solution to a real problem, and bending (X)HTML to the task will be just one more instance of HTML proving its versatility.