nodeStorage and the Personal Cloud Application Architecture

Dave Winer just released software called nodeStorage along with a sample application called MacWrite. Dave's been working on these ideas for a long time and it's fun to watch it all coming together.

Dave's stated goal is support for browser-based applications, something near and dear to my heart. nodeStorage provides three important things that every app needs. In Dave's words:

nodeStorage builds on three technologies: Node.js for the runtime, Twitter for identity and Amazon S3 for storage.

This makes it easy to build applications by handling three big things that developers would otherwise have to worry about.

This idea is similar to my Personal Cloud Application Architecture (PCAA). The biggest difference is that PCAA isn't just solving the backend problem for developers, but proposing that the right way to do it is by using the application user's backend. Not only don't developers have to build the backend, they don't have to run it either! And the user gets to keep their data in their own space. Traditional apps do this:

standard_web_architecture

A PCAA app separates the app from the backend like so:

unhosted_web_architecture

nodeStorage does this too. The only question is who runs the application data cloud. As far as I can see, there's nothing in Dave's proposal that would keep nodeStorage from being used with something like a Freedom Box, Johannes Ernst's UBOS Linux distro, or any other indieweb project so that users can run their own backend for apps.
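
To make the separation concrete, here is a minimal Python sketch. It is not code from nodeStorage or from any PCAA implementation; `StorageBackend`, `InMemoryBackend`, and `NotesApp` are hypothetical names. The only point is that the app is written against an interface, and the user decides which backend instance it talks to.

```python
from typing import Dict, Protocol


class StorageBackend(Protocol):
    """Hypothetical interface for a user-controlled application data cloud."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> str: ...


class InMemoryBackend:
    """Stand-in for a backend the user runs (for example, on their own server)."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]


class NotesApp:
    """The app holds no data of its own; it only talks to the user's backend."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend

    def save_note(self, title: str, body: str) -> None:
        self.backend.put(f"notes/{title}", body)

    def read_note(self, title: str) -> str:
        return self.backend.get(f"notes/{title}")


# Two users run the same app code against their own backends.
alice_backend = InMemoryBackend(owner="alice")
bob_backend = InMemoryBackend(owner="bob")
NotesApp(alice_backend).save_note("todo", "call Dave")
NotesApp(bob_backend).save_note("todo", "install UBOS")
```

Swapping `InMemoryBackend` for something backed by a self-hosted server would not change `NotesApp` at all, which is the property the architecture is after.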

Dave's solving developer pain and is taking an important step down the path toward solving some user pain. Sounds like a strategy for adoption.


Re-imagining Decentralized and Distributed

I teach a course at BYU every year called "Large Scale Distributed Systems." As I discuss distributed systems with the class, I always run into a terminology issue. It has to do with how we think of distributed systems vs. decentralized systems. You often see this diagram floating around the net:

centralised-decentralised-distributed

This always feels like an attempt to place the ideas of centralized, decentralized, and distributed computing on some kind of continuum.

In his PhD dissertation, Extending the REpresentational State Transfer (REST) Architectural Style for Decentralized Systems (PDF), Rohit Khare makes a distinction about decentralized systems that has always felt right to me. Rohit uses "decentralized" to distinguish systems that are under the control of different entities and thus can't be coordinated by fiat.

Plenty of systems are distributed but still under the control of a single entity. Almost any large Web 2.0 service will be hosted from different data centers, for example. What distinguishes the Internet, SMTP, and other decentralized systems is that they are also made to work across organizational boundaries. There's no central point that controls everything.

Consequently, I propose a new way of thinking about this that gives up on the linearity of graphics like the one above and resorts to that most powerful of all analytic tools, the 2x2 matrix:

system_type_2x2

In this conceptualization, we classify systems along two axes:

  • Whether the components are co-located or distributed. This could be either physical or logical depending on the context and level of abstraction.
  • Whether the components are under the control of a single entity or multiple entities. A central control point could be logical or abstract so long as it is able to effectively coordinate nodes in the system.
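
As a rough illustration, the two axes can be written down as a tiny Python classifier. The quadrant labels are my assumption for this sketch (following Khare's use of "decentralized" for multiple controlling entities), and `System` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    distributed: bool           # False means the components are co-located
    multiple_controllers: bool  # False means a single controlling entity


def classify(system: System) -> str:
    """Return the quadrant of the 2x2 as a two-part label."""
    placement = "distributed" if system.distributed else "co-located"
    control = "decentralized" if system.multiple_controllers else "centralized"
    return f"{placement}, {control}"


# Examples taken from the discussion above.
web_service = System("large Web 2.0 service", distributed=True, multiple_controllers=False)
smtp = System("SMTP", distributed=True, multiple_controllers=True)
```

Keeping the axes as two independent booleans is what lets a system be distributed without being decentralized, which a single linear continuum cannot express.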

We could envision a third axis on the model that also classifies systems as to whether they are hierarchical or heterarchical like so:

3 axes

If you're having trouble with the distinction, note that DNS is a decentralized, hierarchical system whereas Facebook's OpenGraph is a centralized, heterarchical system.

I like this model and so, for now, I'm sticking with it and starting to think of and describe systems in this way. I've gotten some mental leverage out of it. I'd love to know what you think.


Rethinking Ruleset Registration in KRL

Updated January 21, 2014, 11:15am to add additional unresolved issues.

URL

Since its inception, KRL was meant to be a language of the Internet. This was something of an experiment. Firstly, as an Internet language, all processing is in the cloud. That is, it's a PaaS-only model; you can't run it from the command line. Secondly, programs would be identified by URL.

This has, for the most part, worked pretty well. But as I move to a model where multiple KRL rule engines (KREs) are running in Docker instances around the Internet, there's one early design decision that has caused some problems: ruleset registration.

URLs are long, so we created a registry where a ruleset identifier, or RID, could be mapped to the URL. This meant that KRL programs could refer to rulesets by a relatively short ID instead of a long URL. So, you'll see KRL code that looks like this:

ruleset example {
  meta {
    name "My Example Ruleset"

    use module a16x8 alias math
    use module b57x15
  }

  rule flip {
    select when echo hello
    pre {
      x = math:greatCicleDistance(56);
      y = b57x15:another_function("hello")
    }
    send_directive("hello world");
    always {
      raise notification event status for a16x69
        with dist = x
    }
  }
}

Note that we're using two modules identified by RID, a16x8 and b57x15 respectively. In the first case we gave it an alias to make the code easier to read. In the explicit event raise that happens in the rule's postlude, we raise the event for a specific ruleset by ID, a16x69 in this case. This doesn't happen often, but it's an optimization that KRL allows. When the rule engine runs across a RID, it looks it up in the registry and loads the code at the associated URL (if it's not cached).
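
The lookup-then-load behavior can be sketched in a few lines of Python. This is an illustration of the idea, not KRE's implementation: `RulesetLoader`, the registry contents, and the fake fetcher are all hypothetical.

```python
from typing import Callable, Dict, List


class RulesetLoader:
    """Resolve a RID to a URL via a registry and cache the fetched source."""

    def __init__(self, registry: Dict[str, str], fetch: Callable[[str], str]) -> None:
        self.registry = registry         # RID -> URL
        self.fetch = fetch               # URL -> ruleset source text
        self.cache: Dict[str, str] = {}  # RID -> ruleset source text

    def load(self, rid: str) -> str:
        if rid in self.cache:
            return self.cache[rid]
        url = self.registry[rid]  # raises KeyError for an unregistered RID
        source = self.fetch(url)
        self.cache[rid] = source
        return source


# Hypothetical registry contents and a fake fetcher that records its calls.
registry = {"a16x8": "https://example.com/rulesets/math.krl"}
fetched: List[str] = []


def fake_fetch(url: str) -> str:
    fetched.append(url)
    return f"ruleset loaded from {url}"


loader = RulesetLoader(registry, fake_fetch)
loader.load("a16x8")
loader.load("a16x8")  # second call is served from the cache
```

Note that the `registry` dictionary is local to one loader, which is exactly the property that causes trouble once several engines need to agree on the same RIDs.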

The problem with a fixed registry is that each instance of KRE is running its own registry. No problems there unless we want them to all be able to run the same program, say Fuse. The Fuse rulesets refer to each other by RID. That means that they need to have the same RID on every instance of KRE. An ugly synchronization problem.

Another solution would be to create a global registry, but that's just another piece of infrastructure to run that will go down and cause reliability problems. If KRL is a language of the Internet, then it ought not be subject to single points of failure.

I've determined the real solution is to go back to the root idea and simply use URLs, with in-ruleset aliases, as the ruleset identifier. So the preceding code might become this:

ruleset example {
  meta {
    name "My Example Ruleset"

    use module https://s3.amazonaws.com/my_rulesets/math.krl alias math
    use module https://example.com/rulesets/transcode.krl alias transcode
    use rid notify for https://windley.com/rulesets/notification.krl
  }

  rule flip {
    select when echo hello
    pre {
      x = math:greatCicleDistance(56);
      y = transcode:another_function("hello")
    }
    send_directive("hello world");
    always {
      raise notification event status for notify
        with dist = x
    }
  }
}

Note that in the case of modules, we've simply replaced the RID with a URL and used the existing alias mechanism to provide a convenient handle. In the case of the event being raised to a specific ruleset, we don't necessarily want to load it as a module (and incur whatever overhead that might create), so I've introduced a new pragma in the meta block to declare aliases for RIDs. The syntax for that isn't set in stone; this is just a proposal.

The advantage to this method is that now rulesets can live anywhere without explicit registration. And multiple instances of KRE can run the program without a central registry. The ruleset serves as a soft registry that can be changed by the programmer as needed without keeping some static structure up to date. Note: none of this changes the current security requirements for rulesets to be installed in a pico before they are run there.
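
Here is a small Python sketch of how an engine might build that soft registry from a meta block. The declaration forms are the ones proposed above; the regular-expression parsing and the `soft_registry` function are only an illustration, not how KRE parses KRL.

```python
import re
from typing import Dict

# Patterns for the two proposed declaration forms.
MODULE_RE = re.compile(r"use\s+module\s+(\S+)\s+alias\s+(\w+)")
RID_RE = re.compile(r"use\s+rid\s+(\w+)\s+for\s+(\S+)")


def soft_registry(meta_block: str) -> Dict[str, str]:
    """Map each alias declared in a meta block to its ruleset URL."""
    aliases: Dict[str, str] = {}
    for url, alias in MODULE_RE.findall(meta_block):
        aliases[alias] = url
    for alias, url in RID_RE.findall(meta_block):
        aliases[alias] = url
    return aliases


meta = """
    use module https://s3.amazonaws.com/my_rulesets/math.krl alias math
    use module https://example.com/rulesets/transcode.krl alias transcode
    use rid notify for https://windley.com/rulesets/notification.krl
"""
aliases = soft_registry(meta)
```

Because the mapping is derived from the ruleset text itself, every engine that loads the ruleset arrives at the same aliases without consulting anything shared.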

There are a few problems that I've yet to work out.

  1. This method works fine for rulesets that are publicly available at a URL. But some rulesets have developer keys and secrets. And some programmers don't want to make their ruleset public for other reasons (e.g. trade secrets). With a registry, we solved this problem by supporting BASIC AUTH URLs. Since the registry hid the URL, the password wasn't exposed. That obviously won't work here.

  2. The Sky Cloud API model relies on the RID. We obviously can't substitute a URL in the URL scheme for Sky Cloud and have it be very easy to use. One solution would be to use the ruleset name (the string immediately after the keyword ruleset in the ruleset definition) for this purpose. The system could dynamically register the name with the URL for a specific pico when the ruleset is installed in that pico. The user wouldn't be able to install two rulesets with the same name. This could be a potential problem since there's no way to enforce any global uniqueness on ruleset names.

  3. When rulesets are flushed from the cache in a given instance, the current method is to put a semicolon-separated list of RIDs in the flush URL. This would have to change to support a collection of URLs in the body of a POST.
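
The per-pico name registration floated in the second issue could look something like the following Python sketch. The `Pico` class and its methods are hypothetical, and the URLs are placeholders; the sketch only shows the uniqueness check within a single pico.

```python
from typing import Dict


class Pico:
    """Toy pico that registers installed rulesets by name."""

    def __init__(self) -> None:
        self.installed: Dict[str, str] = {}  # ruleset name -> ruleset URL

    def install(self, name: str, url: str) -> None:
        existing = self.installed.get(name)
        if existing is not None and existing != url:
            raise ValueError(f"ruleset name {name!r} is already installed from {existing}")
        self.installed[name] = url

    def resolve(self, name: str) -> str:
        """Look up the URL for a name, as a Sky Cloud style request might."""
        return self.installed[name]


pico = Pico()
pico.install("example", "https://example.com/rulesets/example.krl")
try:
    # A different ruleset that happens to use the same name is rejected.
    pico.install("example", "https://other.example.org/rulesets/example.krl")
    conflict = False
except ValueError:
    conflict = True
```

The check is local to one pico, so two picos can each install a different ruleset named `example`; that is the global-uniqueness gap noted above.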

These are the issues I've thought of so far. I'll continue to update this as I give it more thought. I welcome your comments and especially any suggestions you have for improving this proposal.


Fuse, Kynetx, and Carvoyant

Fuse, the open-source connected-car platform I'm working on, is a stack of technologies that ultimately provide the total user experience. Here's one way to look at that stack:

Fuse technology stack

From bottom to top, the components of the stack are:

  1. The device, a CalAmp LMU-3030, is plugged into the vehicle and has a cellular data connection. The diagram leaves out the telephone company, but they're involved as well. The device uses data on the OBD-II port along with data from its built-in GPS to create a stream of information about the vehicle that is sent to Carvoyant.
  2. Carvoyant uses a telematics server that is designed to interact with the LMU device to receive and process the data stream from the device in the vehicle. Carvoyant processes that data stream and makes it available as an API.
  3. Kynetx hosts a rules engine called KRE. KRE is a container for online persistent objects that we call "picos." Each vehicle has a pico that processes its interactions and stores data on its behalf.
  4. The Fuse API is created by the software running in the vehicle's pico.
  5. Applications (like the Fuse app) use the Fuse API to provide a user experience.

Note that the mobile app is just one of many applications that might make use of the Fuse API. For example, as shown in this diagram, not only does the mobile app use the API, but so does the Fuse Management Console and the iCal feed.

fuse model

Picos are a modeling device that has significant advantages for connected things:

  • Picos can be used to model people, places, things, concepts, and so on. In Fuse, we have one for each vehicle, one representing the owner, and one representing the owner's fleet.
  • Picos are related to other picos to create useful systems. For example, in Fuse, the owner, fleet, and vehicle picos are, by default, related as shown in the following diagram.

    fuse microservice overall
  • Pico relationships are flexible. For example, a Fuse fleet can have two owners, an owner could allow a "borrower" relationship with someone borrowing the vehicle, and vehicles could have relationships with their manufacturers or service agents.
  • A vehicle pico can be moved from one fleet to another simply by changing the relationships.
  • Picos store the data for the entity they model. There's no big Fuse database with all the vehicle data in it. Each vehicle pico is responsible for keeping track of its own persistent data.
  • As a result of the pico-based persistent data store, personal data is more readily kept private.
  • Further, the pico-based persistent data store allows data about the vehicle (e.g. its maintenance records) to be kept with the vehicle when it has a new owner.
  • Even though all the Fuse picos are currently being hosted on the Kynetx-run instance of KRE, they could be hosted anywhere. Even vehicles in the same fleet could be hosted in different KRE containers if need be. I'm working on a Docker-based KRE install that will make this easier for people who want to self-host.
  • Each pico is an independent processing object and runs programs independently of other picos, even those of the same type. This means that a given vehicle pico might, for example, run an augmented API or a different set of rules for managing trips.
  • Picos have a built-in event bus that allows for multiple rules to easily interact with events from the vehicle. We've put that to great use in creating Fuse by leveraging what can be seen as a microservices architecture.
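
The event-bus point in the last bullet is easier to see with a toy example. The Python below is a sketch of the pattern only, not KRE or Fuse code; `EventBus`, the `trip_ended` event, and both rules are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, float]], None]


class EventBus:
    """Toy per-pico event bus: every rule selected on an event type runs."""

    def __init__(self) -> None:
        self._rules: DefaultDict[str, List[Handler]] = defaultdict(list)

    def select_when(self, event_type: str, handler: Handler) -> None:
        self._rules[event_type].append(handler)

    def raise_event(self, event_type: str, attrs: Dict[str, float]) -> None:
        for handler in self._rules[event_type]:
            handler(attrs)


# Hypothetical vehicle pico state updated by two independent rules.
state = {"trip_count": 0, "total_miles": 0.0}


def count_trip(attrs: Dict[str, float]) -> None:
    state["trip_count"] += 1


def add_mileage(attrs: Dict[str, float]) -> None:
    state["total_miles"] += attrs["miles"]


bus = EventBus()
bus.select_when("trip_ended", count_trip)
bus.select_when("trip_ended", add_mileage)
bus.raise_event("trip_ended", {"miles": 19.0})
```

Neither rule knows about the other; adding a third reaction to `trip_ended` means registering one more handler, which is the loose coupling the microservices framing refers to.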

The Fuse API differs from the Carvoyant API in several significant ways:

  • Fuse is fleet-based, meaning that Fuse provides fleet roll-up data not available from the Carvoyant API.
  • The Fuse API includes APIs for fuel and maintenance in addition to those for trips. These interact with data from Carvoyant, but aren't available in the Carvoyant API. For example, Fuse enriches trip data from Carvoyant with trip cost data based on fuel purchases.
  • Fuse uses Carvoyant and they've been a great partner. But my vision for Fuse is that it ought to allow vehicle data from a variety of devices. I'd love to let people use Automatic devices for example, with Fuse. If you're interested in helping, let me know.
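
As an example of the kind of enrichment mentioned in the second bullet, here is a Python sketch of attaching a cost to a trip. The formula (fill-up cost divided by miles driven since the previous fill-up) is an assumption made for illustration, not necessarily how Fuse computes trip cost, and the field names and numbers are made up.

```python
from typing import Dict, Union

Trip = Dict[str, Union[str, float]]


def cost_per_mile(fillup_cost: float, miles_since_last_fillup: float) -> float:
    """Estimate a per-mile fuel cost from one fuel purchase (assumed formula)."""
    if miles_since_last_fillup <= 0:
        raise ValueError("miles_since_last_fillup must be positive")
    return fillup_cost / miles_since_last_fillup


def enrich_trip(trip: Trip, rate: float) -> Trip:
    """Return a copy of the trip with a 'cost' field added."""
    enriched = dict(trip)
    enriched["cost"] = round(float(trip["miles"]) * rate, 2)
    return enriched


# Made-up numbers: a $45.00 fill-up after 300 miles, then a 20-mile trip.
rate = cost_per_mile(fillup_cost=45.00, miles_since_last_fillup=300.0)
trip = enrich_trip({"id": "t1", "miles": 20.0}, rate)
```

The trip record coming from the device side stays untouched; the cost is layered on in the vehicle's pico, where the fuel purchases are stored.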

The link to Carvoyant in the Fuse Management Console (OAuth) has caused some angst for people due to the need to create a Fuse (Kynetx) account and then to also create and link in a Carvoyant account. Indeed, this has been the source of 90% of the support issues I deal with. In theory, it's no different than linking your Pocket account to Facebook and Twitter so that you can share things you read with Pocket. In practice, it's hard for people to understand for a few reasons:

  • In the Pocket example I cite, people already have a relationship with Twitter and Facebook.
  • Not only do they already have a relationship, but they understand what Twitter and Facebook are and why they want them.
  • Twitter and Facebook are used in more apps than Pocket, so Pocket is riding a wave of user understanding.
  • Pocket is linking to more than one thing and the fan out helps by providing multiple examples.

If Fuse supported more than just Carvoyant devices and you linked in multiple device accounts and if people used Carvoyant with more than one app, this might be clearer. But that's not reality right now, so we live with the model even though it seems somewhat forced.

The same is true of the Fuse (Kynetx) account. For simplicity, I refer to it as a Fuse account and the branding on the account interaction is Fuse, but if you pay attention, you're actually going to Kynetx to create the account. That's because you're really creating a hosting account for your picos on the Kynetx instance of KRE. Fuse itself really has no notion of an account. The Kynetx account is used to associate you with the owner pico that belongs to you, but that's all. Other mechanisms could be used to do that as well. You could run applications other than Fuse in that Kynetx account (and I do).

You're probably saying "this is more complicated than it has to be." And that's true if your goal is just to create a connected-car app like Automatic. My goal has always been a little larger than that: using Fuse as a means to explore how a larger, more owner-controlled Internet of Things experience could be supported. All this, or something similar, is necessary to create an owner-controlled Internet of Things experience.


The Core of Your API

cored apples

One of the topics that came into sharp relief for me recently is the idea of core domains and their application in API design. This happened as part of our design meetings for BYU's University API. When I say "core domain" I'm thinking of the concepts taught in Domain-Driven Design (DDD) and made clear in Implementing Domain-Driven Design (iDDD). (Aside: if you're in OIT and would like a copy of iDDD, stop by my office.)

DDD uses the terminology "core domain," "supporting domain," and "generic domain" to describe three types of software systems you might be building or using and how your organization should relate to each. My goal here isn't to expound on DDD; that's a different article. But I think you get the idea of what a core domain is: the "core domain is so critical and fundamental to the business that it gives you a competitive advantage and is a foundational concept behind the business."

Suppose you're an online merchant, for example. The core domain is probably the order processing system and orders are the fundamental artifact you worry about. Inventory is important, but it's a supporting domain. Customers are important too, but they're also supporting. The thing you worry about day in and day out is the order: the object that links items in the inventory, a customer, and a payment transaction.

Consequently, if you were designing an API for an online merchant, you'd probably make orders a top-level object in the API:

/orders

This would form the heart of everything you designed.

Applying this logic to a University API is harder. Universities tend to be pretty complicated places with lots of constituents. For example, if we were to just ask "what business is a university in?" the answer, at the core, is that universities are in the credentialing business. We certify that students have performed at required levels in prescribed sets of classes. Looked at this way, an enrollment object (marked as "complete") might be at the heart of a university API. But it turns out that almost no university systems care about enrollments as such, at least not the same way an ecommerce company cares about orders.

Universities care about students, courses, programs, classes, instructors, and classrooms. These are the key objects that fuel much of the university IT systems. Enrollments are in there, of course. You can ask what students are in a class and what courses a student is in or has completed. But you're always starting from the class or the student, not the enrollment itself.

Which of these is a core domain and which are supporting depends on your context. There's another key concept from DDD: "bounded contexts." The API needs to support each of these core objects, but how the API behaves with respect to a given object type depends on the context you're in. If I'm looking at a student from the context of tuition payments, I care about very different things than if they've just stopped by the counseling center.
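
A small Python sketch may help show what "same object, different context" means in practice. Nothing here comes from the actual University API; the record fields, the two context names, and the view functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

View = Dict[str, Union[str, float]]


@dataclass(frozen=True)
class StudentRecord:
    # All field names here are hypothetical.
    student_id: str
    name: str
    balance_due: float
    counselor: str


def tuition_view(s: StudentRecord) -> View:
    """What the tuition-payment context cares about."""
    return {"student_id": s.student_id, "name": s.name, "balance_due": s.balance_due}


def counseling_view(s: StudentRecord) -> View:
    """What the counseling-center context cares about."""
    return {"student_id": s.student_id, "name": s.name, "counselor": s.counselor}


CONTEXTS: Dict[str, Callable[[StudentRecord], View]] = {
    "tuition": tuition_view,
    "counseling": counseling_view,
}


def view_in_context(student: StudentRecord, context: str) -> View:
    """Project one student into the model used by a given bounded context."""
    return CONTEXTS[context](student)


student = StudentRecord("s-001", "Pat Example", balance_due=1250.0, counselor="Dr. Lee")
```

Both views share the same identifier and naming style, so a developer moving between contexts sees a consistent shape even though the fields differ.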

The University API will support different contexts. Trying to support these very different contexts from a single model is unwieldy at best and likely impossible. But that doesn't mean that the University API can't supply a consistent experience regardless of the context. The University API should feel like a well-designed system. This is accomplished through well-known principles of API design including consistency in naming, identifiers, use of plurals, error messages, headers, return values, and HTTP method semantics. Our goal is that developers who've used the API in one context and learned its idioms will be able to easily transfer that experience to another and that using the API in that new context will feel natural and intuitive.


Fuse as an Experiment in Internet of Things Architectures

Yellow car

The Internet is decentralized, heterarchical, and interoperable. Unfortunately, today's Internet of Things is none of those. We might better call it the "CompuServe of Things." What architecture should the Internet of Things embrace?

This talk will discuss the problems with the current model for the Internet of Things and present the lessons we've learned from architecting an Internet of Things product that is more in keeping with the fundamental architectural principles of the Internet.

Fuse is a connected-car product built as an experiment with software architectures for the Internet of Things. Fuse is event-based, open-source, hostable, and has an extensible API enabled by a microservice architecture. Moreover, Fuse is designed to keep vehicle data and services private unless explicitly shared by the owner. At the same time, the architecture preserves the ability for the service and associated data to be sold with the car.

At the heart of the Fuse architecture are persistent compute objects (or picos). Picos are lightweight, general-purpose, online objects that have persistent state. Picos are used to represent anything with a unique identity including people, places, things, and even concepts. Picos are decentralized and networked. Fuse knits picos representing the owner, fleet, and the owner's vehicles together. The owner controls the picos, with whom and how they share data, and where the picos are hosted. Picos are individually extensible. Consequently, the set of services a given pico presents is controlled by what services the owner has installed.

Among the lessons learned in building Fuse are the value of self-healing services, idempotent actions, asynchrony, and accretive functionality. These lessons have allowed us to build a connected-car service that is modular, loosely-coupled, and has unprecedented support for user control of data.


The Quantified Car

The quantified-self movement is all about measuring your personal activities and gaining insight from the data. Many of us dabble in it with a Fitbit or a Withings scale. Others measure everything and use the data to change their life. Data changes behavior. I walk more because I have a Fitbit.

One of the things I've noticed is that Fuse gives me data about a part of my life where I often make incorrect assumptions that cost me time and money: my driving. Here are a few things I've realized:

  • Travel costs both more and less than I'd have thought. I spend $4/day getting to and from BYU. That's less than I'd have thought. But, $80/month is still a significant spend. I spent $16 driving from work to REI in Sandy and back. That's more than I'd have thought and could change how I think about trips.
  • My doctor is in Alpine. My gut told me going from BYU to I-15 and then up to SR-92 and out to Alpine might not be the shortest route, but would be the fastest. I did an experiment and drove to the doctor's office for an appointment by going down State Street and then up Canyon Road from Pleasant Grove. Lots of lights and stop and go, but more direct. The trip took 30 minutes and was 19 miles. The trip back via the freeway was about 26 miles and took 40 minutes. So the freeway was neither faster nor shorter.
  • I was thinking about a new car with better mileage. How much will I really save? Fuse is able to show me my fuel costs for each vehicle and help me see what I really spend.
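
The arithmetic behind the first two observations is simple enough to check in a few lines of Python. The 20 commute days per month is inferred from the $4/day and $80/month figures; the function names are just for this sketch.

```python
def monthly_cost(daily_cost: float, commute_days: int = 20) -> float:
    """Monthly commuting spend, assuming 20 commute days per month."""
    return daily_cost * commute_days


def average_mph(miles: float, minutes: float) -> float:
    """Average speed over a trip in miles per hour."""
    return miles * 60.0 / minutes


commute = monthly_cost(4.0)       # BYU commute at $4/day
surface = average_mph(19, 30)     # State Street and Canyon Road
freeway = average_mph(26, 40)     # I-15 and SR-92
extra_miles = 26 - 19
extra_minutes = 40 - 30
```

The freeway's average speed comes out only about 1 mph higher than the surface route, nowhere near enough to make up for 7 extra miles, which is why it lost by 10 minutes.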

Having real data often reveals behaviors that seem logical but are, in fact, not optimal. Watching data from Fuse has changed how I drive. I think more about trips I take and the money I'm spending on gas. I'm looking forward to getting maintenance data in Fuse as well because I'm sure that will be eye opening.


Why Do CS Majors Study Calculus

NooNoo studying calculus

John Dougall asked me recently if I thought Calculus was really necessary for a CS degree. I do and here's why.

Calculus teaches and encourages abstract thinking, something that is necessary to be a good programmer. Those who are discouraged by Calculus may find that they can get along just fine in college programming exercises, but they won't do well in the modeling that is necessary for real-life software architecture—not because they're missing Calculus, but because they struggle with abstract thinking.

Engineers typically use calculus to optimize some part of a bridge, circuit, or something else. They do this by creating a model of the thing they're building. Computer scientists rarely optimize models in this way. Rather they create their own abstractions that model some real-world system. Thinking abstractly is vital to good system design. Seeing how others use mathematics to abstractly model things (i.e. learning calculus) is good preparation for that challenge.

In general, the chief reason for going to college to study CS isn't to learn to program. You can learn that at Codecademy or wherever. You go to college to change the way you think, and that's going to happen both inside and outside the courses in your major.


Fuse App Launches

Fuse Logo

I have two important announcements regarding Fuse. One is likely to be much more exciting than the other, but both are important, so please read the entire post.

First, we're excited to announce that apps are available (finally!) for Fuse for both iOS and Android. Thanks to our Kickstarter backers for helping to get this done and especially to Alex Olson for making it all happen. We're excited to have you download them and try them out.

A few notes:

  • The app provides fleet views for trips and fuel as well as details about trips, a feature for fuel management, and "Find My Car."
  • Log in to the app with the Fuse account you created at Joinfuse.com in the Fuse Management Console.
  • The app doesn't have functionality for managing your vehicles, etc. That's what the Fuse Management Console is for. You'll still need to use the Fuse Management Console.
  • We are still working on and plan to release maintenance features. The API for maintenance is complete, but it hasn't been surfaced in the app yet.
  • There is still one stability bug remaining that we're working on, but we decided it was usable as is (I've been using it for several months) and continue to work on the stability. The bug involves how the menu works and results in the app buttons being non-responsive. The workaround at this point is to kill the app.
  • The Android version is likely less polished than the iOS version because neither Alex nor I are Android users, so we're at a disadvantage there.

In addition to fixing the obvious bugs and getting maintenance done, we're working to open source the app code. It's in Cordova. The primary roadblock there is ensuring that we haven't left any keys lying around in the code. In the meantime, if you're interested in forking the project and having a look at the code (or better yet, submitting a pull request), send me your GitHub ID and I'm happy to add you to the project.

The second part of this announcement is that we have launched a forum for Fuse at forum.joinfuse.com. This will be our last post on Kickstarter. From now on we will treat the Fuse Forum as the primary channel for communicating with you about Fuse. Please look around and feel free to ask questions there. We're excited to have a real forum for interacting with everyone about Fuse. In addition to the forum, you'll often find interesting articles about the technical implementation of Fuse and its philosophical underpinnings here on my blog.


Building Docker on CentOS

The Dockers Monument In Limerick City Ref-441

I couldn't find a single place with good instructions on getting the latest Docker installed on CentOS. Sure I could use Ubuntu, but where's the fun in that? I found lots of places with partial solutions, but nothing that just walked me through it. Consequently, I am writing down what I did. Maybe this will help someone else.

The first step is to get some version of docker installed. Fortunately, docker is available via yum on late-model CentOS installations, but it's old. There are slight differences between CentOS 6.X and CentOS 7. On CentOS 7, just install docker via yum:

sudo yum -y update
sudo yum -y install docker
sudo service docker start

On 6.5, it's a little harder because there's another package called docker in the standard RPMs, and Docker itself is in EPEL as docker-io.

sudo yum -y install epel-release  # enables the EPEL repository
sudo yum -y update
sudo yum -y erase docker          # removes the conflicting package
sudo yum -y install docker-io     # Docker's package name in EPEL
sudo service docker start

After you've got a version of docker installed, you can use these instructions on Setting Up a Dev Environment. Essentially, it uses the existing copy of docker to build an updated version of docker (in a container, of course). The build process takes a while, but I didn't have any problems. Note that the build process puts the binaries in a shared volume on the disk. You can copy them out or just link to them.