Getting Quotes and Comments Right

Phil Windley // Mon May 31 09:03:00 2010 // krl kynetx programming+languages

Poor decisions in language design are tough to hide because fixing them is akin to changing the API of your library. You can hide all kinds of sins below the covers, but errors in the syntax and semantics can only be fixed by confessing those sins to the world. Consider this a confession.

One of the niggling little details of the Kynetx Rule Language has been how it treats comments. The parser we use, Parse::RecDescent, does not include a lexer; tokens are specified as regular expressions in productions. That works fine for everything but comments. Because there's no lexer, you can't flag a token as a comment and then throw it out. There are a number of ways around that; the method I've used is to strip comments from the source with a regular expression before it goes into the parser.

Comments in KRL are like those in Javascript: a double slash starts a comment that runs to the end of the line. If I were designing KRL over again, I'd definitely rethink that, because it makes stripping comments from code that contains URLs problematic: the "//" in "http://" looks just like the start of a comment. The regular expression for matching comments is not trivial. Nevertheless, Javascript-style comments are what we're stuck with for now.
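To make the URL problem concrete, here is a minimal sketch in Python (not the Perl preprocessor KRL actually uses) that contrasts a naive comment-stripping regex with one that steps over double-quoted strings. It assumes URLs appear only inside double-quoted string literals that use backslash escapes; the function names are hypothetical.

```python
import re

# Naive: every "//" starts a comment that runs to the end of the line.
NAIVE_COMMENT = re.compile(r"//[^\n]*")

# String-aware: match a double-quoted string literal OR a // comment.
# Assumes strings use double quotes and backslash escapes.
STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')


def strip_comments_naive(src: str) -> str:
    return NAIVE_COMMENT.sub("", src)


def strip_comments(src: str) -> str:
    def keep_strings(m: re.Match) -> str:
        text = m.group(0)
        # Keep string literals verbatim; replace comments with nothing.
        return text if text.startswith('"') else ""
    return STRING_OR_COMMENT.sub(keep_strings, src)


src = 'url = "http://example.com/page"; // the landing page'
print(strip_comments_naive(src))  # prints: url = "http:
print(strip_comments(src))        # prints: url = "http://example.com/page";
```

The string-aware version works because `re.sub` scans left to right, so whichever construct starts first, a string or a `//`, claims the text; a `//` inside a string never gets the chance to start a comment.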

The problem of stripping comments from code without stripping code was made even tougher by a poor decision I made regarding extended quotes a few years ago. (Extended quotes in KRL start with << and end with >>.) For some reason, I decided that stripping newlines from the material inside an extended quote was the right thing to do. It's not. Quoted material ought to be left alone. That's why the developer quoted it; they want it to stay the way they wrote it.

Lately there have been some problems that have caused me to dive into this and rework it. Consequently, starting with the next code release, we will no longer strip newlines from extended quotes, nor will we remove anything, even things that look like comments, from inside them. When material in an extended quote is used in a Javascript emit, it will be emitted unchanged. When it is turned into a Javascript string, we will escape quotes and newlines so that they remain in place when Javascript evaluates them.
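Under the new rules the comment stripper has to treat an extended quote as opaque. One way to sketch that, again in Python and with hypothetical names, is to add `<< ... >>` as another construct that is matched and kept verbatim, so a `//` inside the quote is never mistaken for a comment. This sketch assumes the quoted text contains no `>>` of its own.

```python
import re

# One pass over the source: an extended quote, a string literal, or a
# // comment. re.sub scans left to right, so whichever construct starts
# first claims the text; a // inside << ... >> is never seen as a comment.
TOKEN = re.compile(
    r"<<.*?>>"                # extended quote (non-greedy, may span lines)
    r'|"(?:\\.|[^"\\])*"'     # ordinary double-quoted string
    r"|//[^\n]*",             # line comment
    re.DOTALL,
)


def strip_comments(src: str) -> str:
    """Remove // comments; leave extended quotes and strings untouched."""
    def replace(m: re.Match) -> str:
        text = m.group(0)
        return "" if text.startswith("//") else text
    return TOKEN.sub(replace, src)


krl = (
    "emit <<\n"
    "  // this JS comment now survives\n"
    '  var home = "http://example.com";\n'
    ">> // this KRL comment is removed\n"
)
print(strip_comments(krl))
```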

This will normalize some things in KRL, but there may be code in applications that relies on the fact that we've been stripping newlines. In particular, Javascript won't let you split a string literal across lines. Since we're escaping material that gets turned into a string, there shouldn't be any problem there. But emitted Javascript that splits a string literal across lines inside an extended quote will break, because those newlines will no longer be removed.
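Escaping is what lets a multi-line extended quote survive as a Javascript string. A minimal sketch of that step, assuming the target is a double-quoted Javascript string literal (`to_js_string` and `JS_ESCAPES` are hypothetical names, not part of KRL), maps each troublesome character to its escape sequence:

```python
# Escape sequences needed to keep text intact inside a double-quoted
# JavaScript string literal. (A sketch; not KRL's actual escaping table.)
JS_ESCAPES = {
    "\\": "\\\\",         # backslash, so existing escapes stay literal
    '"': '\\"',           # the closing delimiter
    "\n": "\\n",          # a raw newline inside a literal is a syntax error
    "\r": "\\r",          # carriage return, same problem
    "\u2028": "\\u2028",  # line separator, rejected raw by pre-ES2019 engines
    "\u2029": "\\u2029",  # paragraph separator, same
}


def to_js_string(text: str) -> str:
    """Render text as a double-quoted JavaScript string literal."""
    return '"' + "".join(JS_ESCAPES.get(ch, ch) for ch in text) + '"'


print(to_js_string('line one\nsay "hi"'))  # prints "line one\nsay \"hi\""
```

Mapping character by character avoids the classic bug of escaping quotes first and then doubling the backslashes that were just added. In Python, `json.dumps(text)` with its default `ensure_ascii=True` also yields a valid double-quoted Javascript literal and is a reasonable alternative.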

Another consequence of this change is that any Javascript comments you've been putting in your emits will remain, whereas before they were stripped. That means you shouldn't put company secrets in your Javascript comments, since users may see them. On the positive side, the fact that we're now leaving newlines in the Javascript source will make single-stepping through your code more fruitful.

As always, we're happy to work with you to help you through this transition. I anticipate that this code will roll out Tuesday afternoon. We'll put notifications in the appropriate places beforehand.