Simon St. Laurent: Office XML Formats


Simon St. Laurent is talking about the new XML formats in Microsoft Office. Simon is clearly excited by their arrival even though he's not known as a Microsoft fan. He cites Internet Explorer's lax support for XML as a sign that Microsoft, advertising notwithstanding, has not always been the biggest supporter of XML. The last six months have shown that Microsoft plans to fully support XML (at least with InfoPath [née XDocs]) only in the Enterprise edition.

Word has a format called WordML. In the professional edition, you get a set of tools for editing XML documents using your own vocabulary. I was most excited about this from an enterprise standpoint, but I'm disappointed that it's only available in the Enterprise edition. Simon adds that these tools aren't as easy to use as they might be, so maybe it's just as well.

Excel supports SpreadsheetML. PowerPoint has no XML format. Simon says (to a big laugh) that the PowerPoint team works in California and wasn't at lunch when everyone else discussed XML support. Access will support XML Schema and XSLT. FrontPage will be used to generate XSLT. InfoPath is a new Office component for building and using XML-based forms.

A basic knowledge of WordML is necessary to create Word XML solutions in other flavors. If users save as XML, the resulting documents can be processed as XML outside of Word. Word's XSLT support provides a way to insert your own vocabulary into WordML documents. When they're saved, it's possible to see just the data in your own XML vocabulary. This should give Adobe a run for their money on these same features.
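Because a document saved as XML is ordinary XML, any XML toolkit can read it without Word. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the Word 2003 WordML namespace and its paragraph/run/text (`w:p`/`w:r`/`w:t`) structure. The sample document is a hand-written, heavily trimmed stand-in; real Word output carries far more style and metadata markup.

```python
import xml.etree.ElementTree as ET

# Assumed WordML (Word 2003 XML) namespace; "w" is the conventional prefix.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

def paragraph_texts(wordml: str) -> list:
    """Return the plain text of each w:p paragraph in a WordML document."""
    root = ET.fromstring(wordml)
    texts = []
    for para in root.iter(f"{{{W}}}p"):
        # Text lives in w:t elements inside w:r runs; run formatting sits
        # in sibling w:rPr elements, which this loop simply ignores.
        runs = [t.text or "" for t in para.iter(f"{{{W}}}t")]
        texts.append("".join(runs))
    return texts

# Hypothetical, trimmed sample; the second run is bold, but only its text matters here.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>WordML</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

print(paragraph_texts(SAMPLE))  # ['Hello, WordML', 'Second paragraph']
```

Skipping the formatting elements is what lets the verbose markup collapse back to plain content.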

Simon creates and then saves a Word document to show us the XML. It's pretty complex. The document is also verbose because all of the style information, meta information, and formatting information is contained inside the XML. Simon points out some odd formatting issues with WordML, but says that at least it's consistent. It may not be pretty or as well designed as it could be, but it's always the same, and that makes it usable. Images are encoded inline as base64 strings. Unfortunately, embedded spreadsheets are treated the same way, rather than being included as SpreadsheetML markup.
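Recovering an inline image means decoding that base64 text. The sketch below assumes images are stored in `w:binData` elements carrying a `w:name` attribute, as in Word 2003's WordML; the sample markup and payload are invented, and real output also includes VML elements that reference the part name. The same loop would presumably return an embedded spreadsheet only as opaque bytes, not as SpreadsheetML.

```python
import base64
import xml.etree.ElementTree as ET

# Assumed WordML namespace (Word 2003).
W = "http://schemas.microsoft.com/office/word/2003/wordml"

def embedded_binaries(wordml: str) -> dict:
    """Map each w:binData part name to its decoded bytes."""
    root = ET.fromstring(wordml)
    parts = {}
    for bin_data in root.iter(f"{{{W}}}binData"):
        name = bin_data.get(f"{{{W}}}name")
        # The payload is base64 text, possibly wrapped across lines;
        # b64decode discards non-alphabet characters such as newlines by default.
        parts[name] = base64.b64decode(bin_data.text or "")
    return parts

# Hypothetical sample: a fake "image" payload inside a trimmed document.
payload = base64.b64encode(b"not really a PNG").decode("ascii")
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body><w:p><w:r><w:pict>
    <w:binData w:name="wordml://02000001.png">{payload}</w:binData>
  </w:pict></w:r></w:p></w:body>
</w:wordDocument>"""

print(embedded_binaries(SAMPLE))  # {'wordml://02000001.png': b'not really a PNG'}
```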

Users can specify XSLT transforms as hooks on import and export functionality so that opening and saving documents runs them through the XSLT transform.

Excel lets you separate the spreadsheet data from the spreadsheet logic so that you can get the data as XML without all the spreadsheet information. Simon does the same thing with SpreadsheetML that he did with Word: create a document in Excel and then show us the XML. The XML in SpreadsheetML is cleaner than WordML. Formula cells contain both the formula and its current value, calculated from the spreadsheet's contents. That's nice for just grabbing the data. He demonstrates how you can attach a schema to the spreadsheet by dragging and dropping, then read in an XML file that conforms to the schema and watch the data populate the spreadsheet.
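Because each formula cell carries its cached value, a consumer can read results without evaluating any formulas. A minimal sketch, assuming the Excel 2003 XML Spreadsheet namespace, where a cell's formula sits in an `ss:Formula` attribute (R1C1 notation) and its value in a child `Data` element; the three-row workbook is a made-up example.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Assumed SpreadsheetML (Excel 2003 XML Spreadsheet) namespace.
SS = "urn:schemas-microsoft-com:office:spreadsheet"

def cells(spreadsheetml: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (formula, cached value) for every cell, in document order."""
    root = ET.fromstring(spreadsheetml)
    out = []
    for cell in root.iter(f"{{{SS}}}Cell"):
        formula = cell.get(f"{{{SS}}}Formula")  # None for plain data cells
        data = cell.find(f"{{{SS}}}Data")
        value = data.text if data is not None else None
        out.append((formula, value))
    return out

# Hypothetical workbook: two numbers and a SUM formula below them.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row><Cell><Data ss:Type="Number">2</Data></Cell></Row>
   <Row><Cell><Data ss:Type="Number">3</Data></Cell></Row>
   <Row><Cell ss:Formula="=SUM(R[-2]C:R[-1]C)"><Data ss:Type="Number">5</Data></Cell></Row>
  </Table>
 </Worksheet>
</Workbook>"""

for formula, value in cells(SAMPLE):
    print(formula, value)
```

Taking only the second element of each pair gives the "just the data" view; the formula is there when you want the logic too.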

Simon calls InfoPath a "bold endeavor." InfoPath is a stronger tool than HTML forms for both intranet web applications and SOAP-based Web services. InfoPath seems most compelling as a human-readable Web service interface. InfoPath is built on JavaScript, CSS, and other open technologies, but it's been extended to the point that they're no longer open.

Simon finishes by talking a bit about OpenOffice. OpenOffice's XML formats have gone through OASIS and so are more open. They also use markup designed for a variety of uses. There's no support, yet, for custom XML vocabularies. Both Microsoft and OpenOffice are using XML to connect their applications to a wider world. Apple, with Keynote, is doing the same thing. This could be the beginning of the end of the desktop island. The harder barrier to break down will be the mindset of users and IT staff.