July 28, 2010

Remembering in KRL: Using Entity Variables with Forms

Dan asked a question in the Kynetx Developer Exchange about remembering user-entered data in KRL. I gave him a brief outline of the solution but thought an example would be nice. This blog post is the detailed answer: how to gather, remember, and use user-supplied data.

The basic idea is to store the data in an entity variable. Trails (a type of persistent variable) are the most appropriate type of entity variable to use. The ruleset pattern has four rules: initialize, send the form, process the form, use the data. The actual ruleset has five because I added one to delete the user data since it makes testing much more convenient. I'll go over each rule in order.

Forget: The first rule just clears the entity variable ent:name when you visit a particular web page. You could, of course, do this under user control with a form submission or something, but this is easy enough and suits my purpose for testing.

rule clear_name is active {
   select when web pageview "www.foobar.com" 
   noop();
   always {
      clear ent:name;
      last
   }
 }

Note that if the rule is selected (the page URL matches) then the entity variable is cleared and this is the last rule executed in this ruleset.

Initialize: I like initializing the area of the page I'm going to write or modify and then doing the modification in later rules, since I can have different rules do different things to the area as needed. So the initialization rule just puts up an empty notification box with a div with the ID my_div that we'll use in later rules.

rule initialize is active {
  select when pageview ".*"
   pre {   
    blank_div = <<
<div id="my_div">
</div>
    >>;
  }
  notify("Hello Example", blank_div)
      with sticky=true;
}

Note: for all the rules in this ruleset that are selected by web pageviews, I've made the URL pattern as general as possible (.*). In a real ruleset, these would likely be much more restricted.

Send the Form: This rule puts the form into the div we initialized in the last rule if the entity variable ent:name is empty (we could probably add a primitive test that makes this easier). The rule also sets a watcher on that form. If this rule fires (i.e. the rule is selected and the action premise is true) then this will be the last rule executed in this ruleset.

rule set_form is active {
  select when pageview ".*"
  pre {   
    a_form = <<
<form id="my_form" onsubmit="return false">
<input type="text" name="first"/>
<input type="text" name="last"/>
<input type="submit" value="Submit" />
</form>
    >>;
  }
  if(not seen ".*" in ent:name)  then {
    append("#my_div", a_form);
    watch("#my_form", "submit");
  }
  fired {
     last;
  }
}

When this rule fires, you get a notification box that looks like this:

Process the form: We need a rule to process the form. This rule is selected when the form is submitted. The rule doesn't have an action (i.e. the action is noop();). All the real work is done in the postlude where we store the name in the entity variable ent:name and then raise an explicit event that will cause another rule to be selected.

rule respond_submit is active {
  select when web submit "#my_form"
  pre {
     first = page:param("first");
     last = page:param("last");
  }
  noop();
  fired {
     mark ent:name with first + " " + last;
     raise explicit event got_name
  }
} 

Use the data: The final rule uses the data in the entity variable to put the user's name in the div we placed in the initialization rule. This rule has two selection conditions. It can be selected on a pageview like the other rules or when an explicit event named got_name is raised. Remember that the previous rule raises that event when it's done. The action replaces the contents of the div having the ID my_div with a hello message that includes the name.

rule replace_with_name is active {
   select when explicit got_name 
            or web pageview ".*" 
   pre {
     name = current ent:name;
   }
   replace_inner("#my_div", "Hello #{name}");
}

When this rule fires, the notification box looks like this:

[Image: hello box]

Whenever this ruleset is selected in the future the user will not see the form but simply see this box because the system remembers the name in the entity variable and has no need to ask for it again. If the data gets cleared, then the user is prompted to enter it again.

This ruleset also shows the use of explicit events. If we don't raise the explicit event got_name in the rule that stores the name, nothing will replace the contents of the notification box and show the user that the form submission is successful. We could have done it in that rule, but then we'd have two rules replacing the contents with a message and if the message changed we'd have to make sure we changed it in two places. This technique allows us to have one rule responsible for putting the hello message in the div. We just fire it under two different circumstances.
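The same idea, one handler registered for two events, can be shown outside KRL. What follows is a minimal Python sketch, not a description of how the Kynetx engine works. The `Engine` class, its `on` and `raise_event` methods, and the dictionary standing in for the entity variable are all hypothetical names made up for illustration.

```python
from collections import defaultdict

class Engine:
    """Toy event dispatcher; a stand-in for illustration, not the Kynetx engine."""

    def __init__(self):
        self.handlers = defaultdict(list)   # event name -> list of handlers
        self.entity = {}                    # stands in for entity variables
        self.div = ""                       # stands in for the contents of #my_div

    def on(self, *events):
        """Register one handler for several events."""
        def register(handler):
            for event in events:
                self.handlers[event].append(handler)
            return handler
        return register

    def raise_event(self, event, **params):
        for handler in self.handlers[event]:
            handler(self, **params)

engine = Engine()

@engine.on("submit")
def respond_submit(eng, first="", last=""):
    # Store the name, then hand off to whichever handler renders the greeting.
    eng.entity["name"] = f"{first} {last}"
    eng.raise_event("got_name")

@engine.on("got_name", "pageview")
def replace_with_name(eng, **_):
    # The only place the greeting text is defined.
    # (The guard is an addition of this sketch, not part of the KRL rule.)
    if "name" in eng.entity:
        eng.div = f"Hello {eng.entity['name']}"

engine.raise_event("submit", first="Ada", last="Lovelace")
print(engine.div)
```

Changing the greeting means editing `replace_with_name` only, whether it runs because of a form submission or a later page view.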

Update: Here's a video that shows this ruleset in action:

2:33 PM

July 26, 2010

Changing the News with Kynetx: Utah Company WILL Revolutionize Internet Use

Yesterday Tom Harvey of the Salt Lake Tribune had a great article on Kynetx in the Money section (see Utah company looks to revolutionize Internet use). I loved the article, but what I loved even more was what Casey Holgado (@caseyholgado), one of our summer interns, did with it. Here's the headline for the online article as it appeared:

[Image: SLTrib article before the application of Kynetx]

Casey wrote a quick Kynetx app that changed the headline to read like this:

[Image: SLTrib article after the application of Kynetx]

This to me embodies the spirit of Kynetx: "the Web, your way." We didn't change the text on the server, of course, only the headline for people who have the app installed in their browser. But that's the point: if you want this headline to read our way, then you install the app. Everyone else gets the default view. Welcome to Kynetx, where the user is in control.

12:08 PM

July 15, 2010

Using ANTLR and PerlXS to Generate a Parser

As I mentioned earlier, we're anticipating changing out the current Parse::RecDescent-based parser in the Kynetx platform with one that will perform better. We've been going down the path of using ANTLR, a modern parser generator that supports multiple target languages. That flexibility was one of the key things that got us interested in ANTLR. We might want to generate Ruby or JavaScript KRL generators at some point.

But of course right now we want to generate a Perl parser, since that's what the underlying Kynetx Event Service (KES) is written in (it's an Apache module). ANTLR doesn't support Perl. That's probably just as well, however, since we're after as much speed as possible. We could generate Java (the target best supported by ANTLR), but adding Java servers into the current operational mix doesn't excite me.

The obvious course is to generate C and then use PerlXS to integrate the resulting parser into the Perl-based KES engine. To explore the feasibility of that, I decided to play around with ANTLR-generated C parsers and PerlXS to see how they'd work together. What follows is an interim report of what I found.

I started with the SimpleCalc example that is part of the five-minute introduction to ANTLR. The grammar file is nearly unchanged from that example:

grammar SimpleCalc;

options
{
    language=C;
    output=AST;
    ASTLabelType=pANTLR3_BASE_TREE;

}

tokens {
	PLUS 	= '+' ;
	MINUS	= '-' ;
	MULT	= '*' ;
	DIV	= '/' ;
}


/* PARSER RULES */
expr	: term ( ( PLUS | MINUS )^  term )*;
term	: factor ( ( MULT | DIV )^ factor )* ;
factor	: NUMBER ;


/* LEXER RULES */
NUMBER	: (DIGIT)+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ 	
     { $channel = HIDDEN; } ;
fragment DIGIT	: '0'..'9' ;

The only difference is that I've told it to generate an AST and annotated the grammar (with ^) to tell it which tokens are tree nodes.

I used h2xs to generate the boilerplate xs files:

h2xs -A -n SC

This creates a directory called SC and a bunch of files for PerlXS. That's where I put all the generated files from ANTLR. If you look at the C version in the ANTLR introduction, you'll see an @members declaration that contains some C code that exercises the parser. That's what I modified to put into the PerlXS file:

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "SimpleCalcLexer.h"
#include "SimpleCalcParser.h"

#include "ppport.h"

MODULE = SC		PACKAGE = SC		

char *
showtree(in)
         char * in
    CODE:

    pANTLR3_INPUT_STREAM           input;
    pSimpleCalcLexer               lex;
    pANTLR3_COMMON_TOKEN_STREAM    tokens;
    pSimpleCalcParser              parser;
    SimpleCalcParser_expr_return     langAST;

    char * output;

    input = antlr3NewAsciiStringInPlaceStream 
            (in,
            (ANTLR3_UINT32) strlen(in), 
            NULL); 
    lex    = SimpleCalcLexerNew(input);
    tokens = antlr3CommonTokenStreamSourceNew
               (ANTLR3_SIZE_HINT, 
		TOKENSOURCE(lex));
    parser = SimpleCalcParserNew(tokens);

    langAST = parser->expr(parser);

    output = langAST.tree->toStringTree(langAST.tree)->chars;

    // Must manually clean up
    //
    parser ->free(parser);
    tokens ->free(tokens);
    lex    ->free(lex);

    RETVAL = output;

    OUTPUT: 
      RETVAL

This defines a function called showtree that will be called from Perl. The file also includes the .h files that ANTLR generated and uses the input string in place instead of reading a file as the example did. The return value (denoted by the special identifier RETVAL) is just a string representation of the parse tree.

The Makefile.PL file for PerlXS is pretty standard:

use 5.010000;
use ExtUtils::MakeMaker;
WriteMakefile(
    NAME              => 'SC',
    VERSION_FROM      => 'lib/SC.pm', 
    PREREQ_PM         => {}, 
    ($] >= 5.005 ?     
      (ABSTRACT_FROM  => 'lib/SC.pm', 
       AUTHOR         => 'Web-san ') : ()),
    LIBS              => ['-lantlr3c'], 
    DEFINE            => '', 
    INC               => '-I.', 
    # link all the C files too
    OBJECT            => '$(O_FILES)',
);

You'll notice I link the ANTLR library in here. Since the xs file references a string output, I created a typemap file to map that for PerlXS:

TYPEMAP
char * T_PV

Now, we compile the xs files in the standard way:

perl Makefile.PL
make

The Perl file is pretty simple as well:

#!/usr/bin/perl -w

use ExtUtils::testlib;   # adds blib/* directories to @INC
use SC;
print SC::showtree("3 + 4 * 5"), "\n";

Executing this program prints a prefix representation of the arithmetic expression passed into the showtree function.
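For readers without ANTLR at hand, the shape of that prefix output can be illustrated with a small hand-written recursive-descent parser for the same grammar. This is a Python sketch, not the generated C parser. The names `tokenize`, `parse_expr`, `to_string_tree`, and `show_tree` are invented here, and the parenthesized format is modeled on the usual ANTLR string-tree rendering, which is an assumption rather than output captured from the XS module.

```python
import re

def tokenize(text):
    """Return NUMBER and single-character operator tokens; whitespace is skipped."""
    tokens = re.findall(r"[0-9]+|\S", text)
    for tok in tokens:
        if not (re.fullmatch(r"[0-9]+", tok) or tok in ("+", "-", "*", "/")):
            raise SyntaxError(f"unexpected character {tok!r}")
    return tokens

def parse_expr(tokens, pos=0):
    """expr : term ( ( PLUS | MINUS )^ term )*"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)   # the operator becomes the subtree root, like ^
    return node, pos

def parse_term(tokens, pos):
    """term : factor ( ( MULT | DIV )^ factor )*"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_factor(tokens, pos):
    """factor : NUMBER"""
    if pos >= len(tokens) or not tokens[pos][0].isdigit():
        raise SyntaxError("expected a number")
    return tokens[pos], pos + 1

def to_string_tree(node):
    """Render a leaf as itself and an operator node as (op left right)."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({op} {to_string_tree(left)} {to_string_tree(right)})"

def show_tree(text):
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return to_string_tree(tree)

print(show_tree("3 + 4 * 5"))   # (+ 3 (* 4 5))
```

Because `term` is parsed below `expr`, multiplication binds tighter than addition, which is why the `*` subtree ends up nested inside the `+` node.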

Of course, this isn't what we want for our system. We want a full-fledged AST back that we can manipulate in Perl. I spent a little time on typemaps and have reached the conclusion that the right method is to use an ANTLR-generated tree parser (a parser for the AST) to walk the tree and create a tree that is more like what we are used to in the KES engine, and then use a typemap to turn that into Perl.

So, it would appear that using ANTLR to generate a C-based parser and then using PerlXS to wrap that for use in Perl is feasible. As we figure out the AST output, I'll write more.

12:10 PM

July 13, 2010

Parsing for a Cloud-Based Programming Language

If you follow my blog, you're probably all too aware that Kynetx provides a cloud-based execution engine for a programming language called the Kynetx Rule Language, or KRL. The execution engine is a big, programmable event-loop in the cloud and KRL is the language that is used to program it. As with any language, the first step in executing it is to parse it into an abstract syntax tree.

The KRL parser has had its ups and downs. A while ago, I reported on changing out the HOP parser for Damian Conway's RecDescent (note that the system is written in Perl). The KRL parser is also what prompted me, a long time ago, to write a large set of tests for the engine code, including the parser.

When you're building a parser for a language to be compiled on the developer's machine, performance is an issue to be sure, but you are saved from the problems you encounter when the parser is the first step in a service that needs to run quickly. The most important defense against those problems is to cache the abstract syntax tree so that you only have to parse things when they change.

Whenever there's a cache miss the KRL parsing process basically works like this:

  1. Retrieve the KRL source for the ruleset from the rule repository.
  2. Parse the ruleset
  3. Optimize the ruleset
  4. Compile event expressions into state machines
  5. Cache the result

The optimization does things like perform dependency analysis on variables to move them outside loops and so on.

The caching presents its own problems. Flushing a single ruleset from the cache when it changes is a simple enough procedure. But when the format of the abstract syntax tree (AST) changes, they all need to be flushed. My preference is to do this in an emergent way so that we're not dependent on a step in the code deployment process that might get forgotten or messed up. So, when a ruleset is optimized, an AST version number is placed in the AST. When the engine retrieves a ruleset from the cache, it checks the AST version number against the current number and, if it's old, automatically flushes the ruleset and proceeds with the process given above to recreate the AST.
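The retrieval logic, including the version check, can be sketched in a few lines. This is Python rather than the engine's Perl, and every name in it (`AST_VERSION`, `get_ruleset`, the `parse`, `optimize`, and `compile_events` stubs, and the dictionaries standing in for the repository and the cache) is a hypothetical stand-in, not the KES implementation.

```python
AST_VERSION = 3      # hypothetical; bumped whenever the AST format changes

repository = {"a1x1": "ruleset a1x1 {}"}   # stand-in for the rule repository
cache = {}                                 # stand-in for the shared cache
parse_count = 0                            # only here to make the demo observable

def parse(source):
    global parse_count
    parse_count += 1
    return {"source": source}              # placeholder AST

def optimize(ast):
    ast["ast_version"] = AST_VERSION       # stamp the version during optimization
    return ast

def compile_events(ast):
    ast["state_machines"] = []             # placeholder for compiled event expressions
    return ast

def get_ruleset(rid):
    ast = cache.get(rid)
    if ast is not None and ast.get("ast_version") == AST_VERSION:
        return ast                         # fresh cache hit
    cache.pop(rid, None)                   # miss or stale entry: flush and rebuild
    ast = compile_events(optimize(parse(repository[rid])))
    cache[rid] = ast
    return ast

get_ruleset("a1x1")
get_ruleset("a1x1")
print(parse_count)    # 1: the second call was a cache hit

AST_VERSION += 1      # simulate deploying code with a new AST format
get_ruleset("a1x1")
print(parse_count)    # 2: the stale entry was flushed and reparsed
```

Nothing in the deployment process has to flush the cache explicitly; each stale entry is discarded the first time it is read.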

So far, so good. When I make a change to the parser that would change the format of the AST, I update the version number and when the code is deployed, the rulesets will automatically flush and get regenerated. This is, unfortunately, where my intuition failed me.

You can imagine that the engine is a number of individual servers behind a load balancer. When multiple requests come in, they are given to various servers to process. Like most people, I've been conditioned by the physical world to expect Gaussian distributions around an average, but intuition developed around Gaussian distributions often leads to poor decisions in the world of computers, where distributions are much more likely to follow a power curve.

Here's what happened the last time I made a change to the AST. We deployed code and immediately the system stopped responding to requests. On further examination, we discovered that the machines were all busy parsing--the same ruleset. Because there are some rulesets that are much more likely to be executed than others, the first requests seen by the new code were, for the most part, all for the same ruleset and every machine was busy parsing it. Worse, the requests started to stack up, causing the machines to thrash. Everything stopped working.

The answer to this problem has two parts:

  • The parser needs to be faster--especially for large rulesets. In the short term we'll do some point improvements to the existing Perl-based parser. In the long term we'll replace it with a C-based parser.
  • The rule retrieval process detailed above needs to use semaphores so that only one parsing task for any given ruleset is launched at one time.

We've put the short-term fixes and the semaphore in place and things are looking pretty good so far. We're partly down this road already, and given that we have good specifications of what the parser has to do and tests to check it out thoroughly, I'm not worried about the change.

I used the cache as an interprocess communication system to create the semaphores. It's not, strictly speaking, atomic, but it's close enough and was dirt simple to implement.
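A cache-backed lock of this kind might look roughly like the following Python sketch. The assumptions are loud ones: `ToyCache` is an in-memory stand-in for a shared cache, its `add` method (store only if the key is absent) is a guess at the kind of primitive involved, and `parse_ruleset`, `get_ast`, and the key names are invented. The post does not say which cache operations were actually used, and the toy `add` here is atomic, which the real scheme is described as not quite being.

```python
import threading
import time

class ToyCache:
    """In-memory stand-in for a shared cache with an add-if-absent operation."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def add(self, key, value):
        """Store value only if key is absent; return True if it was stored."""
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._guard:
            return self._data.get(key)

    def set(self, key, value):
        with self._guard:
            self._data[key] = value

    def delete(self, key):
        with self._guard:
            self._data.pop(key, None)

cache = ToyCache()
parse_calls = []                        # records which worker actually parsed

def parse_ruleset(rid, worker):
    parse_calls.append(worker)
    time.sleep(0.05)                    # pretend parsing is slow
    return f"AST for {rid}"

def get_ast(rid, worker):
    ast_key, lock_key = f"ast:{rid}", f"lock:{rid}"
    while True:
        ast = cache.get(ast_key)
        if ast is not None:
            return ast
        if cache.add(lock_key, worker):         # this worker won the right to parse
            try:
                ast = cache.get(ast_key)        # re-check: another worker may have just finished
                if ast is None:
                    ast = parse_ruleset(rid, worker)
                    cache.set(ast_key, ast)
                return ast
            finally:
                cache.delete(lock_key)
        time.sleep(0.01)                # someone else is parsing; wait and look again

results = []
threads = [threading.Thread(target=lambda w=w: results.append(get_ast("a1x1", w)))
           for w in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(parse_calls), len(results))   # 1 8
```

Eight simultaneous requests for the same ruleset produce one parse; the other seven wait briefly and then read the cached result. A production version would also want the lock entry to expire, so that a crashed parser cannot block a ruleset forever.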

3:25 PM

July 1, 2010

iPhone 4 Underwhelmed Me

I know, they may take away my Apple Fanboy membership card. But the iPhone 4 that came yesterday hasn't blown me away. Here are my initial thoughts:

  • The camera is better and the flash is nice. But it's still a phone camera, not a real camera.
  • The screen is much clearer than my iPhone 3GS, but not so much that it enables cool new things.
  • The glass back looks like one more thing to break.
  • Facetime was fun the one time I did it. I'd probably do it a lot with my kids if they were little. Now, not so much.
  • It doesn't seem any snappier, although it might be technically faster.

Part of the problem is that most of the really cool stuff is in the OS and I already had iOS 4 on my 3GS. That makes the hardware less stunning. Bottom line: I like looking at it and marveling at its beauty, but my life hasn't changed.

Buying advice: if you're on a first or second generation iPhone, upgrade--it's worth it. If you've got a 3GS, be sure you really have to have it before you order.

3:07 PM

Scaling the Cloud

This week's Technometria podcast is with Sebastian Stadil of Scalr. We talked about how to scale cloud computing. That may seem like a boring topic--after all, isn't the cloud supposed to be scalable? That's one of its basic qualities, right? True, but it turns out that while scaling services via the cloud is relatively easy--if they've been architected right--managing that scale can produce some real problems.

Scalr is both a company and an open source project. We talk not only about the problems and solutions of scaling cloud computing, but also about how this hybrid business model works for a service. I enjoyed getting to know Sebastian and I hope you do too.

8:07 AM