Virtual Databases

One of a CIO's holy grails is integration. The primary driver is business agility. This is related to aligning IT with the business, but there's a speed component. The marketing and manufacturing departments don't need you to be aligned with them next year; they need you aligned today, and again tomorrow when they change their focus and direction for what seems like the fifth time this year. Integration is one strategy for building an information infrastructure that can handle whatever is thrown at it.

This desire for an integrated infrastructure is behind much of the interest in Web Services and other, more Herculean tasks like Enterprise Application Integration, or EAI. Web Services can be looked on as the poor man's answer to EAI or, alternatively, as an iterative approach to application integration. I like to think of it as the latter. Why try to boil the ocean when you can connect a few applications together more easily and harvest the low-hanging fruit?

But what if you find that just connecting up the applications through their interfaces isn't enough? A middle ground between a lightweight Web Services project and a Hoover Dam-sized EAI project is something called Enterprise Information Integration (EII), also known as a virtual database.

A virtual database, or federated database, provides a single, virtual interface to a collection of data sources. These data sources may live in multiple databases from multiple vendors and even be in multiple formats (relational vs. hierarchical, for example). To make this work, the organization deploying the virtual database creates a data model that contains the needed elements of each of the data sources being integrated and then creates a map from the existing data sources to the new data model. Using this model and map, the virtual database management system processes queries, updates, insertions, and deletions of the integrated data.
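The model-and-map idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the two in-memory SQLite databases, their table and column names, and the names UNIFIED_FIELDS, SOURCE_MAP, and query_customers are all hypothetical. It shows a unified "customer" model, a map from each source's columns to that model, and a read query that is translated and fanned out to every source.

```python
import sqlite3

# Two independent sources with different schemas (hypothetical example data).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE clients (client_id INTEGER, full_name TEXT, email_addr TEXT)")
crm.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [(1, "Ada Lovelace", "ada@example.com"), (2, "Alan Turing", "alan@example.com")],
)

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE accounts (acct_no INTEGER, holder TEXT, contact TEXT)")
billing.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(100, "Grace Hopper", "grace@example.com")],
)

# The unified data model: the field names every consumer sees.
UNIFIED_FIELDS = ("id", "name", "email")

# The map: for each source, the table and the column behind each unified field.
SOURCE_MAP = [
    {"conn": crm, "source": "crm", "table": "clients",
     "columns": {"id": "client_id", "name": "full_name", "email": "email_addr"}},
    {"conn": billing, "source": "billing", "table": "accounts",
     "columns": {"id": "acct_no", "name": "holder", "email": "contact"}},
]


def query_customers(where_field=None, value=None):
    """Translate one query on the unified model into a query per source."""
    if where_field is not None and where_field not in UNIFIED_FIELDS:
        raise ValueError(f"unknown field: {where_field}")
    results = []
    for entry in SOURCE_MAP:
        cols = entry["columns"]
        # Rename each source column to its unified field name.
        select_list = ", ".join(f"{cols[f]} AS {f}" for f in UNIFIED_FIELDS)
        sql = f"SELECT {select_list} FROM {entry['table']}"
        params = ()
        if where_field is not None:
            # Filter on the source column that backs the unified field.
            sql += f" WHERE {cols[where_field]} = ?"
            params = (value,)
        for row in entry["conn"].execute(sql, params):
            record = dict(zip(UNIFIED_FIELDS, row))
            record["source"] = entry["source"]  # keep track of where the row came from
            results.append(record)
    return results
```

Calling query_customers() returns rows from both sources under the same field names, and query_customers("email", "grace@example.com") filters each source on whichever column backs the unified email field. The sketch covers only reads; updates, insertions, and deletions would need the same map applied in the other direction, along with decisions about which source owns each record.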

I think that data integration is the key to the integration puzzle, whether you're going to use Web services, EII, or EAI. I think we've actually never realized the chief benefit of databases. A database is commonly thought of by lay people as a vast collection of valuable data. But we know better. For the most part, we use a database as just the persistent portion of a single program's variables. I asked these questions a few months ago:

  1. How many databases under your control were started as the data foundation for a single application?
  2. How many of those have ever been called on by some other application?

I'd bet the answers to these questions are "all" and "a few," in that order. This isn't surprising: most IT is done incrementally, reactively, in response to the problems of the day. Someone starts an Access database to keep track of a few things at work, and in a few years the database is mission critical, living on direct-attached disks under someone's desk.

As I argue in my Enabling Web Services whitepaper, there's a lot you can do to make data integratable, at very little cost, as you build web applications that use that data. This is not a full-scale integration, but it cuts the ties between the database and the single application that uses it and allows other applications to start using that data as well.

The biggest hurdle in an EII project is creating the universal data model. There are two problems:

  • You have to get everyone to agree to share their data and participate in the modeling. It's more work to share than to keep your data to yourself. Data is also a source of power. And sometimes what's in the data is embarrassing.
  • You have to pay for it. As Felix Rausch said in a panel at the CIO Summit this May: "no one's going to get money to do data architectures so it has to be dressed up in programs."

Two of the players in this field are MetaMatrix and IBM's DataJoiner. I've not used or evaluated either of these products. Other related products include offerings from companies like Whamtech, which create indexes of the data held in various data sources.

Further reading: