CouchDB from 10,000 Feet

Jan Lehnardt and Damien Katz

Damien Katz and Jan Lehnardt are talking about CouchDB. My students have mentioned it several times and we've had brief discussions about it, but I've never spent much time on it. This seemed like my chance. CouchDB's goal is a simple, non-relational database.

Damien started the CouchDB project after working for a number of years on the Lotus Notes project. He loved the document model of the data store (as did a lot of other people). He wanted an open source version of that model and CouchDB was born.

In real life, most data is document centric--not relational. A business card has all the data on it. The downside is that if your job title changes, a self-contained document model doesn't update every copy for you (the title isn't stored once in a separate table). On the other hand, more and more documents are starting to contain references to other data (URLs), which makes up for this in some cases.

CouchDB documents are in a JSON format. If you're not familiar with JSON, it's an XML-like format for storing data, but without the angle brackets. It's easier for people to read and write. It's not a substitute for XML, but it's great when you just need simple structured data. JSON is widely supported.
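To make the business card idea concrete, here is a minimal sketch in Python of what such a document might look like as JSON. The field names and values are made up for illustration; they don't come from the talk or from any CouchDB schema.

```python
import json

# Hypothetical business-card document; the field names are illustrative only.
card = {
    "name": "Jane Doe",
    "title": "Engineer",
    "company": "Example Corp",
    "phones": ["555-0100", "555-0101"],
    "homepage": "http://example.com/jane",  # a reference to other data, as a URL
}

# Serialize to JSON text, the form in which a document would be stored or sent.
card_json = json.dumps(card, indent=2, sort_keys=True)
print(card_json)

# Parsing the text gives back an equal structure.
restored = json.loads(card_json)
```

Everything about the card lives in the one document, which is the self-contained model described above.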

CouchDB uses an HTTP API. This allows CouchDB to make use of existing caches, load balancers, and analyzers. You can drive CouchDB from the command line with curl, or from a program with the HTTP library of whatever language you're using.
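As a rough sketch of what "driving CouchDB over HTTP" can look like, the Python below builds a PUT request that would store a document. It assumes a server at localhost:5984 and a database named "contacts"; both are assumptions for illustration, as are the document id and the helper name build_put_document. The request is only constructed here, not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Assumptions: a CouchDB server on localhost:5984 and a database called "contacts".
BASE_URL = "http://localhost:5984"


def build_put_document(db, doc_id, doc):
    """Build (but do not send) an HTTP PUT request that stores `doc` under `doc_id`."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_put_document(
    "contacts", "jane-doe", {"name": "Jane Doe", "title": "Engineer"}
)
print(request.get_method(), request.full_url)

# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Because this is plain HTTP, the same request could pass through an ordinary cache or load balancer on its way to the server.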

CouchDB views allow you to filter, collate, and aggregate data. Views are powered by Map/Reduce. The map stage processes key/value pairs to produce intermediate values, and the reduce stage then combines the intermediate values for a particular key. Map/Reduce is inherently parallelizable, making it useful on clusters of machines.
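The two stages can be sketched in a few lines of Python. This is an in-memory illustration of the Map/Reduce idea only, not CouchDB's view engine or its API; the documents and the names map_doc, reduce_values, and run_view are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, as they might look after being parsed from JSON.
docs = [
    {"_id": "1", "name": "Jane", "company": "Example Corp"},
    {"_id": "2", "name": "Raj", "company": "Example Corp"},
    {"_id": "3", "name": "Ana", "company": "Other Inc"},
]


def map_doc(doc):
    """Map stage: emit intermediate (key, value) pairs for one document."""
    if "company" in doc:
        yield doc["company"], 1


def reduce_values(key, values):
    """Reduce stage: combine all intermediate values emitted for one key."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Group mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(grouped.items())}


counts = run_view(docs, map_doc, reduce_values)
print(counts)  # {'Example Corp': 2, 'Other Inc': 1}
```

Each call to map_doc looks at a single document and nothing else, which is why the map stage can be spread across cores or machines.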

CouchDB is designed to be easily replicated and supports synchronizing machines.

Disks are getting cheaper and machines are being built with more and more cores. That makes a model like the one CouchDB uses very appealing. CouchDB is written in Erlang and provides a non-locking, MVCC-based, ACID-compliant data store.

There are some bonus features: Lucene is integrated for full-text search, and CouchDB also provides JSON searching using JSearch, a wrapper on Lucene for JSON structures.

CouchDB has been accepted for incubation as an Apache project and uses the Apache license.