Relax NG

Phil Windley // Thu Jul 17 11:39:00 2003

I wanted to go to Mike Fitzgerald's talk on Relax NG last week at the O'Reilly Open Source Convention, but it was opposite Andy McKay's Plone talk and I needed to go to that for other reasons. I did make a note to myself to spend some time looking into it when I got back, and this morning I had a few minutes to do that.

The basic syntax for XML is pretty loose, requiring only a sea of angle brackets, proper tag nesting, and strict matching of opening and closing tags. Of course, to really make XML useful, we need schemas to further constrain the basic XML syntax. This is the feature that makes XML a meta-markup language. Schema languages can go beyond context free grammars (CFGs) to specify some context sensitive constraints, but for the most part you can think of them as context free grammars to be fed into a parser. The key difference between XML parsers and parser generators like YACC or Bison is that XML parsers are interpreted: they get their grammar on the fly instead of being hard coded for one specific parsing task.
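To make the gap between "well-formed" and "valid" concrete, here's a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The helper name is_well_formed and the sample strings are my own illustrations. The parser enforces nesting and tag matching, but without a schema it has no opinion about which elements belong where.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested, with matching opening and closing tags.
good = "<patron><name>Ann</name></patron>"
# Closing tags in the wrong order, so the nesting is broken.
bad_nesting = "<patron><name>Ann</patron></name>"
# Well-formed, but nothing here says whether <banana> belongs in a patron record.
odd_but_legal = "<patron><banana/></patron>"

print(is_well_formed(good))           # True
print(is_well_formed(bad_nesting))    # False
print(is_well_formed(odd_but_legal))  # True
```

The last case is the one a schema exists to catch: the syntax is fine, but the vocabulary is not.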

Relax NG is an alternative schema language for XML. The specs for the language were developed by the RELAX NG technical committee at OASIS between April and December 2001. One of the things I like about it is an optional compact syntax that dispenses with angle brackets for human readability. I've long argued that using XML for XML's sake is silly. Relax NG is a merging of Murata Makoto's RELAX and James Clark's TREX.

The resources linked at the end of this article will give you some detailed information, including the slides from Mike's talk, which are excellent, but I wanted to include an example Relax NG schema to give you a feel for what it looks like. Here's the XML version of a schema that defines a library patron.

<element name="patron"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <element name="name"><text/></element>
    <element name="id-num"><text/></element>
    <zeroOrMore>
      <element name="book">
        <choice>
          <attribute name="isbn"/>
          <attribute name="title"/>
        </choice>
      </element>
    </zeroOrMore>
  </interleave>
</element>

This example can almost just be read out loud. A library patron record contains a name, an ID number, and a collection of zero or more books, each of which is identified by either a title or an ISBN. The compact version of this schema is shown below.

element patron {
  element name { text }   &
  element id-num { text } &
  element book {
    (attribute isbn { text } |
     attribute title { text } )
  }*
}

I think that's even clearer. Almost anyone who's studied BNF could read this and make sense of it. That's a huge improvement over most XML schemas. Humans are remarkably good at parsing things and don't typically need all the closing tags and other paraphernalia that make XML such a good language for machine to machine communication.
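To spell out what the patron schema actually accepts, here's a rough sketch in Python that hand-codes the same rules with the standard library. This is not a Relax NG validator, just my reading of the schema turned into checks. The function name is_valid_patron and the sample record (including its placeholder ISBN) are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_valid_patron(xml_text):
    """Hand-coded check mirroring the patron schema; not a Relax NG validator."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    if root.tag != "patron" or root.attrib:
        return False
    # No stray text directly inside <patron> (whitespace is fine).
    if (root.text or "").strip():
        return False
    counts = {"name": 0, "id-num": 0}
    for child in root:
        if (child.tail or "").strip():
            return False
        if child.tag in counts:
            # text pattern: character data only, no attributes or child elements.
            if child.attrib or len(child):
                return False
            counts[child.tag] += 1
        elif child.tag == "book":
            # choice pattern: exactly one of isbn or title, and no content.
            if set(child.attrib) not in ({"isbn"}, {"title"}):
                return False
            if len(child) or (child.text or "").strip():
                return False
        else:
            return False
    # interleave pattern: order is free, but name and id-num appear exactly once.
    return counts["name"] == 1 and counts["id-num"] == 1

# Made-up record; children deliberately appear in a scrambled order.
sample = """
<patron>
  <book isbn="0000000000"/>
  <name>Ann Reader</name>
  <book title="Some Book"/>
  <id-num>1234</id-num>
</patron>
"""

print(is_valid_patron(sample))  # True
print(is_valid_patron("<patron><name>Ann Reader</name></patron>"))  # False
```

Compare the length of that function with the eight-line compact schema and the appeal of a declarative schema language is pretty obvious.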

Relax NG isn't likely to displace the W3C's XML Schema language anytime soon, but given its readability, I think it's likely to garner a large group of users. Here are some resources that I found helpful in understanding Relax NG: