Peter Yared on Building Web 2.0 Applications (OSCON 2005)


I went to Peter Yared's talk this afternoon on rapidly building Web 2.0 applications. Peter's the CTO of ActiveGrid, a company and an open source project.

Sun, J2EE, and Oracle powered Web 1.0. Web 2.0 is powered by LAMP.

In the past, we were solving impedance mismatch problems: nothing talked to anything else. App servers were meant to solve this (and other problems). Recently, the back ends became standardized on jSomething, and the front end was the Web. Next (today?) is XML simplicity. Anything you want to talk to on the back end is exposed as XML over HTTP, even databases. Things have gotten simpler. It's hard to use software built for yesterday's problems to solve today's problems.

The ActiveGrid project provides a high-level visual tool for rapid application development. Graphical operations are just editing XML (BPEL, XForms, XPath, etc.). The tool supports PHP, Perl, and Python. Wrapping all code as a Web service prevents scripting mayhem.

In addition, there's a back-end installer that includes a Web server and a database. The architecture of most Web applications depends on the deployment architecture. With ActiveGrid, the deployment is architecture-independent.

Application flow is difficult to maintain. ActiveGrid uses BPEL to manage application flow. The graphical editor makes the BPEL easy to create and maintain.

If you define all your schemas in XML Schema and do all your queries with XQuery, there's a single API for any data source.

Java is overkill for simple control-flow programming. Because it's strongly typed, Java requires a lot of overhead to handle unstructured data like XML. Java's primary selling point, "write once, run anywhere," doesn't mean much in the LAMP/Intel world. Notice that no one says that anymore.

In the old world, inexpensive Web servers arbitrated connections to expensive application servers. That doesn't make as much sense in a world of fast one- or two-processor servers. Instead, create a redundant array of inexpensive servers that share data and services.

Apache on Linux on commodity Intel boxes is the most optimized stack in the world. It's very fast. Use HTTP for inter-machine communication.

What's missing? Process management, session replication, interface rendering, interface caching, a Web services stack, autonomous and inter-node deployment patterns, and data caching. ActiveGrid has added those on top of Apache.

Autonomous node deployment patterns: single-node for simple, non-mission-critical applications; database sessions for HA applications; and cookie sessions for HA applications with small sessions. Inter-node deployment patterns: distributed sessions for HA applications with large sessions; distributed replicated sessions for HA, fault-tolerant applications; and distributed sessions with in-place processing for HA applications with larger sessions.
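
To make the cookie-session pattern concrete, here's a minimal sketch of how a small session could ride in a signed cookie so that any node can handle any request. This is my own illustration, not ActiveGrid's code: the function names, the JSON encoding, and the shared SECRET_KEY are all assumptions.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret shared by every node in the cluster (an assumption).
SECRET_KEY = b"replace-with-a-real-secret"


def encode_session(session: dict) -> str:
    """Serialize a small session dict and append an HMAC-SHA256 signature."""
    raw = json.dumps(session, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_session(cookie: str):
    """Return the session dict, or None if the cookie is malformed or altered."""
    payload, sep, signature = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None  # signature does not match the payload
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because the whole session travels with the request, no node has to remember anything, which is why this only suits small sessions. Note that the sketch signs the session but doesn't encrypt it, so the client can read the contents.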

Inter-cluster communication architectures require a dynamic distributed hash table. ActiveGrid uses HTTP for this. The hash table allows a machine to retrieve the session from the machine that has it (distributed sessions) or to redirect the request to the machine that has it (in-place processing).
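
Here's a rough sketch of how that lookup could work. The talk didn't say how ActiveGrid builds its hash table, so the consistent-hash ring below is my assumption, as are the SessionRing and route names and the URL layout.

```python
import bisect
import hashlib


class SessionRing:
    """Maps session IDs to owning nodes using a consistent-hash ring."""

    def __init__(self, nodes, replicas=64):
        self._replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Several points per node spread the sessions more evenly.
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def owner(self, session_id: str) -> str:
        if not self._ring:
            raise LookupError("no nodes in the ring")
        index = bisect.bisect_left(self._ring, (self._hash(session_id), ""))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring


def route(ring, local_node, session_id, in_place=False):
    """Decide what the node that received a request should do with it."""
    owner = ring.owner(session_id)
    if owner == local_node:
        return ("local", owner)
    if in_place:
        # In-place processing: send the request to the session's owner.
        return ("redirect", f"http://{owner}/")
    # Distributed sessions: HTTP GET the session from its owner.
    return ("fetch", f"http://{owner}/sessions/{session_id}")
```

The "dynamic" part is what add_node and remove_node stand in for: when a machine joins or leaves, only the sessions that hashed to its points on the ring change owners.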

Peter also talks about data caching patterns: timed pulls, where each node retrieves the data to be cached in a rolling manner at timed intervals; timed pulls to a dedicated node; distributed RAM data caches, which use HTTP GETs to grab data from the node that has it; and in-place caching, where results are cached where they hit and then broadcast.
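
As an illustration of the first pattern, here's a small sketch of a rolling timed pull. It's my guess at the idea rather than anything shown in the talk: the TimedPullCache class, its parameters, and the staggering formula are all assumptions. Each node gets a slot within the interval so the nodes don't all hit the data source at once.

```python
import time


class TimedPullCache:
    """Refreshes cached data on a fixed interval, offset per node."""

    def __init__(self, loader, interval, node_index=0, node_count=1,
                 clock=time.monotonic):
        self._loader = loader      # callable that fetches fresh data
        self._interval = interval  # seconds between pulls
        self._clock = clock
        # Stagger the first pull: node k waits k/node_count of an interval.
        self._next_pull = clock() + interval * node_index / node_count
        self._data = None

    def tick(self) -> bool:
        """Pull fresh data if this node's slot has arrived."""
        now = self._clock()
        if now < self._next_pull:
            return False
        self._data = self._loader()
        self._next_pull = now + self._interval
        return True

    def get(self):
        """Return the most recently pulled data (None before the first pull)."""
        return self._data
```

A real node would call tick() from a timer or background thread. The clock is injectable so the schedule can be exercised without waiting.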

XForms provide a way of creating a declarative user interface. On a request, the XForm can be adjusted to the role, rendered for the client, and then the data can be added. Before the data is entered, cache the customized, rendered form. Of course, a data cache can cache the data as well for a particular request. This allows smart caching of dynamic forms.
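
A toy version of that caching idea follows. It uses plain HTML strings rather than real XForms, and the FIELDS table, render_form, and fill_form are hypothetical names of mine. The expensive role-specific rendering is cached once per role, and the per-request data is merged in afterward.

```python
import html
from functools import lru_cache

# Hypothetical form definition: field name -> roles allowed to see it.
FIELDS = {
    "name": {"clerk", "manager"},
    "salary": {"manager"},
}


@lru_cache(maxsize=None)
def render_form(role: str) -> str:
    """Expensive step: tailor the form to the role, leaving data slots empty."""
    rows = [
        f'<input name="{field}" value="{{{field}}}">'
        for field, roles in FIELDS.items()
        if role in roles
    ]
    return "<form>" + "".join(rows) + "</form>"


def fill_form(role: str, data: dict) -> str:
    """Cheap step: merge request data into the cached, role-specific markup."""
    visible = {
        field: html.escape(str(data.get(field, "")), quote=True)
        for field, roles in FIELDS.items()
        if role in roles
    }
    return render_form(role).format(**visible)
```

The cache key here is just the role. A real system would presumably also key on the client type, since the form is rendered for the client.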