Domain Specific Languages


Recently, I've been designing a domain specific language for Kynetx, the start-up I'm working on. When you tell someone you're designing a language, the usual reaction is incredulity. "Why would you design your own language?!?!" they say. I'm here to tell you why.

First, let me say that I'm a big believer in notation. Using the right notation to describe and think about a problem is a powerful tool--one that we're too eager to give up, it seems. People seem to believe that (a) all languages are pretty much the same and (b) the world has enough notations. While (a) is true in theory (they're all Turing complete, after all), the power of a notation isn't in what it can accomplish, but in the ways it allows you to think. I'll deal with (b) in what follows.

There are basically three myths that lead people to avoid language design.

  1. It's too hard--the domain of experts
  2. GUIs are better and more approachable than notations
  3. A general purpose language (GPL) is a proper tool for solving domain specific problems

The first argument is simply not true. I teach dozens of undergraduates a year how to design and implement programming languages. Parsing is well understood, and good parser tools exist for almost all modern programming languages. Building an interpreter for a parse tree is no harder than building any other program of comparable size. There is skill involved, to be sure, but any competent programmer can build an interpreter for a small to medium sized programming language.
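To give a sense of scale, here is a minimal sketch of the whole pipeline (tokenizer, parser, and an interpreter that walks the parse tree) for a toy arithmetic language. The grammar and the function names (`tokenize`, `parse`, `evaluate`, `run`) are illustrative assumptions, not anything from Kynetx, and a real project would more likely lean on a parser generator than on a hand-written parser.

```python
import operator
import re

# A token is either a number or any single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(\S))")

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split source text into ('num', value) and ('op', char) tokens."""
    tokens = []
    for number, char in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif char in "+-*/()":
            tokens.append(("op", char))
        else:
            raise SyntaxError(f"unexpected character {char!r}")
    return tokens


def parse(tokens):
    """Build a nested-tuple parse tree by recursive descent.

    Grammar (hypothetical toy language):
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, value = advance()
        if kind == "num":
            return ("num", value)
        if value == "(":
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


def evaluate(node):
    """Interpret the parse tree by walking it recursively."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


def run(text):
    return evaluate(parse(tokenize(text)))
```

Calling `run("(1 + 2) * 3")` tokenizes the text, builds the tree `("*", ("+", ...), ("num", 3.0))`, and evaluates it. The whole thing fits on a page, which is roughly the point: the machinery is ordinary code.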

So, if building interpreters for programming languages is within the reach of most programmers, why don't more do it? There are basically two ways to tell a computer what to do: a GUI or a language. They both have their advantages. Drag and drop is great for moving one file from one directory to another. When I want to systematically rename hundreds of files, a language is the clear winner. Still, the GUI has become so much a part of the everyday computing experience that we often assume that its user-friendly nature trumps all other needs.
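The renaming case is easy to make concrete. The sketch below uses Python's standard `pathlib`; the function name `rename_with_prefix` and the choice of "add a prefix" as the renaming rule are assumptions made purely for illustration.

```python
from pathlib import Path


def rename_with_prefix(directory, pattern, prefix):
    """Add `prefix` to the name of every file in `directory` matching `pattern`.

    Returns the new paths in sorted order.
    """
    renamed = []
    # sorted() materializes the listing first, so renaming files
    # cannot disturb the directory scan while it is in progress.
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            target = path.with_name(prefix + path.name)
            path.rename(target)
            renamed.append(target)
    return renamed
```

A call such as `rename_with_prefix("photos", "*.jpg", "2007-")` does in one line what would take hundreds of careful clicks in a file manager, and the same line works unchanged on the next directory.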

I'm in the middle of reading Walter Isaacson's new biography of Einstein. It's clear that notation played a major role in his ability to come up with the principle of general relativity. He demurred at first, believing that the math was for someone else to come along later and tidy up. But later in his life, after the experience of working on general relativity, he became an ardent convert.

Similarly, there is power in notation for computing tasks. Not merely the advantage of parameterized execution, as in the file renaming example above, but in its ability to allow us to think about problems, express them so that others can clearly and unambiguously see our thoughts, and collaborate to create joint solutions. What's more, languages can be versioned. GUI configurations are hard to version. Programming languages have advantages even when they're not executed.

The DSL becomes the focal point for design activities. The other day, I was having a discussion with three friends about a particular feature. Pulling out pencil and paper and writing what the DSL would need to look like to support the feature helped all of us focus and come up with solutions. Without such a tool, I'm not sure how we would have communicated the issues or whether we'd have all had the same conception of them and the ultimate solution we reached.

Over and over in my career I've seen that when you give people a language instead of a GUI, they come up with solutions that you didn't even know were possible. Language expands the range of solutions, while GUIs usually limit it to the vision of the creator.

The final argument--that GPLs are capable of doing anything a domain specific language (DSL) can--is true, but misses the point. For example, when I've explained to people--capable computer scientists--what I'm doing, the first reaction is "why don't you use X?" where X could be BPEL, JavaScript, etc. There are several reasons:

  • The abstractions aren't right and nothing has the right mix of abstractions. I need a rule language focused on Web pages with event triggers and data source integration taken for granted. BPEL isn't that, although it could be shoe-horned in. Notation designed to increase the power of thought can't be shoe-horned (and shouldn't have too many angle brackets).
  • GPLs are, by definition, general. They are designed to solve a wide host of problems outside the domain I care about. As a result, expressing something I can put in a line of code in a DSL takes a dozen in most GPLs. I can build very tight and expressive abstractions for a single domain that would not be appropriate for a GPL.
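To illustrate what domain-tuned abstractions buy you, here is a sketch built around an entirely made-up one-line rule syntax: `when <event> "<url pattern>" then <action> "<argument>"`. This is not the actual Kynetx language; the syntax, the event and action names, and the functions `parse_rule` and `fire` are all hypothetical. The point is only that the rule itself is one readable line, while the general purpose code that gives it meaning is a dozen or so.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical rule syntax: when <event> "<url pattern>" then <action> "<argument>"
RULE = re.compile(r'^when\s+(\w+)\s+"([^"]*)"\s+then\s+(\w+)\s+"([^"]*)"$')


def parse_rule(line):
    """Turn one line of the toy rule language into a dictionary."""
    match = RULE.match(line.strip())
    if match is None:
        raise SyntaxError(f"cannot parse rule: {line!r}")
    event, pattern, action, argument = match.groups()
    return {"event": event, "pattern": pattern,
            "action": action, "argument": argument}


def fire(rule, event, url):
    """Return (action, argument) if the rule applies to this event, else None."""
    if event == rule["event"] and fnmatchcase(url, rule["pattern"]):
        return (rule["action"], rule["argument"])
    return None


rule = parse_rule('when pageview "/products/*" then notify "Free shipping today"')
```

With that in place, `fire(rule, "pageview", "/products/42")` yields the `notify` action and `fire(rule, "pageview", "/about")` yields nothing. Someone who has never seen the Python can still read, discuss, and version the rule line.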

To be fair, there are trade-offs. When you develop a DSL, you lose the ability to leverage the checkers, IDEs, and other tools that exist for GPLs. This is offset by the increase in expressive power and the relatively small size of most DSLs. You can get away with less--especially at first while you're using the language more for clarifying your thoughts.

Do I anticipate that wonks will write programs in my DSL? No. Wonks will have a GUI, or maybe a 2x2 matrix--the universal tool of the MBA. In fact, I view my DSL as an intermediate form--something that is generated by one part of the system for use by another. But in the meantime, I've found incredible freedom, design leverage, and expressiveness in designing and implementing my DSL.