If you've been involved in XML and particularly XML databases, you probably already know this, but since I'm just discovering it, I've got to say that XQuery is cool. I've used XPath for a while now and it's nice, but it's about what you'd expect given what we all know about paths in file systems. XQuery goes beyond (literally extends) XPath to provide sophisticated query capabilities for XML documents. It's a functional, expression-based language rather than one built on relational algebra, since XML's ordered trees don't have the nice set-theoretic properties of relational tables, but it's still quite capable.

Suppose, for example, that I had an XML document representing configuration information for RSS feeds that looks like so:

<config>
  <feed>http://www.identityblog.com/rss.xml</feed>
  <enabled>true</enabled>
  <type>rss2.0</type>
</config>
<config>
  <feed>http://partners.userland.com/people/docSearls.xml</feed>
  <enabled>true</enabled>
  <type>rss2.0</type>
</config>
<config>
  <feed>/rss.xml</feed>
  <enabled>true</enabled>
  <type>rss2.0</type>
</config>

From this configuration, a program grabs each of these feeds and stores the channel information for them in another XML document like so:

<channel>
  <feed>/rss.xml</feed>
  <title>Phil Windley's Technometria</title>
  <link>/</link>
  <description>Organizations Get the IT They 
          Deserve</description>
</channel>
<channel>
  <feed>http://partners.userland.com/people/docSearls.xml</feed>
  <title>The Doc Searls Weblog</title>
  <link>http://doc.weblogs.com/</link>
  <description>The Continuing End of Business As 
        Usual</description>
</channel>
<channel>
  <feed>http://www.identityblog.com/rss.xml</feed>
  <title>Kim Cameron's Identity Weblog</title>
  <link>http://www.identityblog.com/</link>
  <description>Let's talk about identity in a virtualizing
               world.</description>
</channel>

Later, in a Web application, I want to use the configuration information (which feeds are enabled) to print a list of title and description information from the RSS feeds. This little bit of XQuery does that:

query "for $config in
 collection('pms-config.dbxml')/config[enabled = 'true']
 let $name :=
   collection('rss-feeds.dbxml')/channel[feed = $config/feed]
 return <li><b>{$name/title/text()}</b> -
              {$name/description/text()}</li>"

This returns the list:

  • Kim Cameron's Identity Weblog - Let's talk about identity in a virtualizing world.
  • The Doc Searls Weblog - The Continuing End of Business As Usual
  • Phil Windley's Technometria - Organizations Get the IT They Deserve
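
For anyone more at home in a general-purpose language, here's a rough Python sketch of what that for/let/return query is doing: loop over the enabled configs, look up the channel with the matching feed, and build a list item from it. This is only an illustration, not how the XML database evaluates the query. The `<configs>` and `<channels>` wrapper elements, the function name `enabled_feed_items`, and the use of Python's standard `xml.etree.ElementTree` module in place of the database collections are all my assumptions.

```python
import html
import xml.etree.ElementTree as ET

# Sample data from the documents above. The <configs> and <channels> wrappers
# are an assumption of this sketch so that each string parses as one document.
CONFIG_XML = """<configs>
  <config>
    <feed>http://www.identityblog.com/rss.xml</feed>
    <enabled>true</enabled>
  </config>
  <config>
    <feed>http://partners.userland.com/people/docSearls.xml</feed>
    <enabled>true</enabled>
  </config>
  <config>
    <feed>/rss.xml</feed>
    <enabled>true</enabled>
  </config>
</configs>"""

CHANNEL_XML = """<channels>
  <channel>
    <feed>/rss.xml</feed>
    <title>Phil Windley's Technometria</title>
    <description>Organizations Get the IT They
          Deserve</description>
  </channel>
  <channel>
    <feed>http://partners.userland.com/people/docSearls.xml</feed>
    <title>The Doc Searls Weblog</title>
    <description>The Continuing End of Business As
        Usual</description>
  </channel>
  <channel>
    <feed>http://www.identityblog.com/rss.xml</feed>
    <title>Kim Cameron's Identity Weblog</title>
    <description>Let's talk about identity in a virtualizing
               world.</description>
  </channel>
</channels>"""


def clean(text):
    """Collapse runs of whitespace (including line breaks) and escape &, <, >."""
    return html.escape(" ".join((text or "").split()), quote=False)


def enabled_feed_items(config_xml, channel_xml):
    """Return one <li> string per enabled config, in config order."""
    configs = ET.fromstring(config_xml)
    channels = ET.fromstring(channel_xml)
    # Index channels by feed URL so the lookup mirrors channel[feed = $config/feed].
    by_feed = {ch.findtext("feed"): ch for ch in channels.findall("channel")}

    items = []
    for config in configs.findall("config"):             # for $config in ...
        if config.findtext("enabled") != "true":          # [enabled = 'true']
            continue
        channel = by_feed.get(config.findtext("feed"))    # let $name := ...
        if channel is None:
            # The XQuery would still emit an empty <li> here; this sketch skips it.
            continue
        title = clean(channel.findtext("title"))
        description = clean(channel.findtext("description"))
        items.append(f"<li><b>{title}</b> - {description}</li>")  # return ...
    return items


if __name__ == "__main__":
    for item in enabled_feed_items(CONFIG_XML, CHANNEL_XML):
        print(item)
```

That's a fair amount of code for what the query says in six lines, which is a good part of why I like doing the join in XQuery.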

If we wanted the list in alphabetical order, we could use this command, with an order by clause:

query "for $config in
 collection('pms-config.dbxml')/config[enabled = 'true']
 let $name :=
   collection('rss-feeds.dbxml')/channel[feed = $config/feed]
 order by $name/title           
 return
    <li><b>{$name/title/text()}</b> -
             {$name/description/text()}</li>"

This represents the general form of an XQuery expression, which is abbreviated FLWOR for "for," "let," "where," "order by," and "return." My query doesn't contain a where clause, but suppose I was keeping track of how many items I'd seen from each RSS feed in the RSS feed document. I could add a where clause to select only those channels with more than 10 items in the database, like so:

query "for $config in
 collection('pms-config.dbxml')/config[enabled = 'true']
 let $name :=
     collection('rss-feeds.dbxml')/channel[feed = $config/feed]
 let $itemCount := $name/itemCount
 where xs:integer($itemCount) > 10
 order by $name/title
 return
     <li><b>{$name/title/text()}</b> -
              {$name/description/text()}</li>"

Note that this last query hasn't actually been tested since I don't have the item count in my data; all the others have been run against actual documents.
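
Since I can't run that one against real data, here's a small Python sketch of the filtering I have in mind: keep a channel only when its own stored item count is greater than 10, then sort by title. The `<itemCount>` element, the counts themselves (25, 7, 12), the `<channels>` wrapper, and the function name `busy_channel_titles` are all made up for illustration, and I've left out the join against the config document to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: <itemCount> and its values are invented for this sketch.
CHANNEL_XML = """<channels>
  <channel>
    <feed>/rss.xml</feed>
    <title>Phil Windley's Technometria</title>
    <itemCount>25</itemCount>
  </channel>
  <channel>
    <feed>http://partners.userland.com/people/docSearls.xml</feed>
    <title>The Doc Searls Weblog</title>
    <itemCount>7</itemCount>
  </channel>
  <channel>
    <feed>http://www.identityblog.com/rss.xml</feed>
    <title>Kim Cameron's Identity Weblog</title>
    <itemCount>12</itemCount>
  </channel>
</channels>"""


def busy_channel_titles(channel_xml, minimum=10):
    """Return titles of channels whose item count exceeds `minimum`, sorted."""
    selected = []
    for channel in ET.fromstring(channel_xml).findall("channel"):
        raw = channel.findtext("itemCount")
        if raw is None:
            continue  # no count recorded for this channel, so it can't qualify
        # Compare the stored number, not the number of <itemCount> elements:
        # counting the elements would give 1 per channel and never exceed 10.
        if int(raw) > minimum:                     # where ... > 10
            selected.append(channel.findtext("title"))
    return sorted(selected)                        # order by title


if __name__ == "__main__":
    print(busy_channel_titles(CHANNEL_XML))
```

The point to notice is that the test reads the value for the one channel in hand rather than counting elements across the whole collection.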

Overall, XQuery is a very capable language for grabbing data out of XML documents and manipulating it. As with any data query language, there's always the design question of how much you do in the query language and how much you do in the application making the query. My personal feeling is that I'll let the query language handle as much of the chore as possible without embedding business logic in the query itself. From a design standpoint, I think the business logic ought to be kept together.



Last modified: Thu Oct 10 12:47:19 2019.