Reactive Programming with Picos

Updated to include discussions about identity and KRL.

The Reactive Manifesto describes a type of system architecture that has four characteristics (quoting from the manifesto):

  • Responsive: The system responds promptly if at all possible.
  • Resilient: The system stays responsive in the face of failure.
  • Elastic: The system stays responsive under varying workload.
  • Message Driven: Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation, location transparency, and provides the means to delegate errors as messages.

reactive system property stack

These are often represented as a stack since the only explicit architectural choice is to be message driven. The others are emergent properties of that and other architectural choices.

The goal of the reactive manifesto is to promote architectures and systems that are more flexible, loosely-coupled, and scalable while making them amenable to change. The manifesto doesn't specify a development methodology so reactive systems can be built using a wide variety of systems, frameworks, and languages.

Persistent Compute Objects

Persistent compute objects (picos) are a good choice for building reactive systems—especially in the Internet of Things.

Picos implement the actor model of distributed computation. Actors extend message-driven programming with three additional required properties. In response to a received message,

  1. actors send messages to other actors
  2. actors create other actors
  3. actors implement a state machine that can affect the behavior of the next message received

This is also a good, high-level description of the properties that picos have. Picos respond to events and queries by running rules. Depending on the rules installed, a pico may raise events for itself or other picos. Picos can create and delete other picos. Each pico has a persistent data store that can only be affected by rules that run in response to events. I describe picos and their API and programming model in more detail elsewhere. Event-driven systems, like those built from picos, are the basis for systems that meet the reactive manifesto.
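
To make the actor analogy concrete, here is a minimal Python sketch (not KRL, and not how KRE is implemented) of actor-like objects that keep private state, react to messages by running handlers, and can send messages or spawn other actors. All of the names are illustrative.

    # Illustrative sketch of actor-style behavior; not KRL and not KRE internals.
    class Actor:
        def __init__(self, system):
            self.system = system        # used to deliver messages and spawn actors
            self.state = {}             # persistent, private state
            self.handlers = {}          # event type -> handler (the "rules")

        def on(self, event_type, handler):
            self.handlers[event_type] = handler

        def receive(self, event_type, attrs):
            handler = self.handlers.get(event_type)
            if handler:
                handler(self, attrs)    # may update state, send events, spawn actors

    class ActorSystem:
        def __init__(self):
            self.actors = {}

        def spawn(self, actor_id):
            self.actors[actor_id] = Actor(self)
            return self.actors[actor_id]

        def send(self, actor_id, event_type, attrs):
            self.actors[actor_id].receive(event_type, attrs)

    # A "vehicle" actor records trips in its own state and notifies a "fleet" actor.
    system = ActorSystem()
    fleet = system.spawn("fleet")
    vehicle = system.spawn("vehicle")

    fleet.on("trip_logged", lambda a, attrs: print("fleet saw trip:", attrs))
    vehicle.on("new_trip", lambda a, attrs: (
        a.state.setdefault("trips", []).append(attrs),
        a.system.send("fleet", "trip_logged", attrs)))

    system.send("vehicle", "new_trip", {"miles": 12.4})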

Picos support direct asynchronous messaging by sending events to other picos. Also, picos implement an event bus internally. Events sent to the pico are placed on the internal event bus. Rules in the pico are selected to run based on declarative event expressions. The pico matches events on the bus with event scenarios declared in the event expressions. Event expressions can specify simple single event matches, or complicated sets of events with temporal ordering.
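
Here is a small Python sketch of what that declarative matching amounts to. It is not KRL's eventex syntax, just an illustration of selecting rules with a single-event pattern versus a temporal "A then B" pattern; the rule names and event domains are made up.

    # Illustrative event matching: a rule is selected either by a single event
    # or by a temporal pattern such as "A followed by B". Not KRL eventex syntax.

    def single(domain, etype):
        """Select when one matching event has been seen."""
        return lambda history: (domain, etype) in history

    def before(first, second):
        """Select when `first` has been seen and `second` arrives after it."""
        def check(history):
            if first not in history:
                return False
            return second in history[history.index(first) + 1:]
        return check

    bus = []  # the pico's internal event bus (events seen so far), simplified

    rules = {
        "report_ready": single("fuse", "periodic_report_ready"),
        "trip_then_fillup": before(("fuse", "new_trip"), ("fuse", "new_fillup")),
    }

    for event in [("fuse", "new_trip"), ("fuse", "new_fillup")]:
        bus.append(event)
        selected = [name for name, matches in rules.items() if matches(bus)]
        print(event, "->", selected)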

Picos share nothing with other picos except through messages exchanged between them. Picos don't know and can't directly access the internal state of another pico.

As a result of their design, picos exhibit the following important properties:

  • Lock-free concurrency—picos respond to messages without locks.
  • Isolation—State changes in one pico cannot affect the state in other picos.
  • Location transparency—picos can live on multiple hosts and so computation can be scaled easily and across network boundaries.
  • Loose coupling—picos are only dependent on one another to the extent of their design.

Channels and Messaging

Pico-to-pico communication happens over event channels. Picos have many event channels. An event channel is point-to-point and delivers events directly to the pico. Picos can only interact by raising events and making requests on a specific event channel.

Picos can only get an event channel to another pico in one of four ways:

  1. Parenthood—when a pico creates another pico, it is given the only event channel to that new child pico.
  2. Childhood—when the parent creates a child, the child receives an event channel to its parent [1].
  3. Endowment—as part of the initialization, a parent pico can give channels in its possession to the child.
  4. Introduction—one pico can introduce a second pico to a third. In addition, the OAuth flow supported by picos returns a channel to the pico.

Children only have a channel to their parent, unless (a) the parent gives them other channels during initialization, (b) the parent introduces them to another pico, or (c) they are capable of creating children. Consequently, a pico is completely isolated from any interaction that its parent (supervisor) doesn't provide. This creates a security model similar to the object capability model.


A pico-based application is a collection of picos operating together to achieve some purpose. A single pico is almost never very interesting.

Picos live inside a host called KRE [2]. KRE is the engine that makes picos work. A given instance of KRE can host many picos, and there can be any number of KRE instances. Picos need not exist in the same instance of KRE to interact with each other.

When an account is created in KRE, a root pico is also created. That pico is special in that it is directly associated with the account and cannot be deleted without deleting the account. Only the root pico can be introduced to another application via OAuth. The root pico can create children, and those picos can also create children.

For example, in Fuse, the connected-car application we built using picos, each Fuse owner ends up with a collection of picos that look like this:

fuse microservice overall

The Fuse application isn't a single pico, but a collection. Fuse arranges picos in a particular configuration and endows them with specific functionality to create the overall experience. The owner pico is the root pico in the account. When it's created and initialized, it creates a fleet pico. The fleet pico creates vehicle picos as needed when people use the mobile application to add a new vehicle.

Picos belong to a parent-child hierarchy, with every pico except the root pico having a parent. Further, each child pico in the hierarchy belongs to the same account as the root pico. In principle, picos could be moved from one account to another, although KRE does not yet support this.

The parent-child relationship is important because the parent installs the initial rules in the child, imbuing it with functionality, and then initializes the pico to give it relationships to other picos and set its initial state. Since the parent is installing rules of its choosing, initialization consists of installing an initialization rule and then sending the child pico an event that causes that rule to fire. A rough sketch of this lifecycle appears below.
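
The sketch is in Python rather than KRL, and the method names (create_pico, install_ruleset, send_event) and the stub engine are made up for illustration; they are not the actual KRE API.

    # Hypothetical sketch of the parent creating, programming, and initializing a child.
    class StubParentPico:
        def create_pico(self):
            print("created child pico")
            return "child-channel-1"              # parenthood: the only channel to the child

        def install_ruleset(self, channel, url):
            print("installed", url, "on", channel)

        def send_event(self, channel, domain, etype, attrs):
            print("sent", domain + ":" + etype, "to", channel, "with", attrs)

    def create_and_initialize_child(parent, ruleset_urls, initial_config):
        child_channel = parent.create_pico()
        for url in ruleset_urls:                  # imbue the child with functionality
            parent.install_ruleset(child_channel, url)
        # Initialization is just another event; an init rule in the child fires on it.
        parent.send_event(child_channel, "pico", "initialize", initial_config)
        return child_channel

    create_and_initialize_child(StubParentPico(),
                                ["https://example.com/rulesets/vehicle.krl"],
                                {"name": "My Truck"})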

But the collection of picos that make up an application need not communicate hierarchically or even within the picos of a single account. The channels that link picos create the relationships that allow the application to work.

For example, consider the following diagram, again from Fuse:

Fuse with multiple owners

In this diagram, the fleet pico has two owners. That is, there are two picos that have owner rights for the fleet. The two owners are in different accounts and could be on different KRE hosts. The behavior of the application depends on the pico relationships represented by the arrangement of channels, channel attributes, and the rules installed in each pico. Because there's no fixed schema or configuration of picos, the overall application is very dynamic and flexible.

Each pico presents a potentially unique API based on the rulesets it contains. Mobile and web-based applications communicate with the pico and use its API using a model called the pico application architecture (PAA).

Internet First

Picos were developed to be Internet-centric:

  • Picos are always online. Picos don't crash, and they only go away when explicitly deleted. Picos can be deleted programmatically or when the account they are in is deleted.
  • Rulesets are loaded by URL. There is a registry that associates a short-name (called the ruleset ID) with the URL, but KRE only caches the ruleset. The definitive source is always the ruleset specified by the URL.
  • Channels are URLs. Events are raised using the URL. This is primarily done over HTTP, although an SMTP gateway exists and KRE could support other transport mechanisms such as MQTT. (A sketch of raising an event over HTTP follows this list.)
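
For concreteness, here is a sketch in Python of raising an event to a pico over HTTP. The /sky/event path shown follows the Sky event API described in pico documentation, but treat the host, channel identifier (ECI), event id, domain, type, and attributes as placeholders; the exact path and parameters vary by engine version.

    # Sketch of raising an event on a pico's channel over HTTP (details vary by engine).
    import requests

    host = "https://kre.example.com"        # KRE host (illustrative)
    eci = "SOME-EVENT-CHANNEL-ID"           # event channel identifier (illustrative)
    eid = "1234"                            # event id chosen by the sender

    url = f"{host}/sky/event/{eci}/{eid}/fuse/new_fillup"
    resp = requests.post(url, data={"gallons": 11.3, "price": 2.49})
    print(resp.status_code, resp.text)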

Being first-class Internet citizens sets picos apart from other actor-based programming tools used to build resilient systems. The Internet-centric design supports Internet of Things applications and means that pico-based systems inherently support the important properties that set the Internet apart including decentralized operation, heterarchical structure, interoperability of components, and substitutability of servers.


Scattered throughout the preceding discussion are numerous references to identity issues, including authentication and authorization. Picos free developers from worrying about most identity issues.

Picos have a unique identity that is the basis of their isolation. Persistent data is independently stored for each pico without special effort by the developer. A channel is owned by only one pico. Rulesets are installed on a pico-by-pico basis.

Each root pico is associated with a single account, and descendants of the root pico are owned by the same account. The account handles authentication, including OAuth access, for any picos in the account. KRE implements the account identity layer.

Clients can use the OAuth flow to get access to a channel for the root pico. Access to another pico in the account is mediated by rulesets installed in the root pico. The client (i.e. relying party) can be a mobile application, Web application, or an online service. KRE acts as the authorization server, and the pico acts as the resource server. The resource is the pico's API.

pico and oauth

Giving out a channel is tantamount to giving the holder permission to send events. Future versions of the system will support policies on channels that restrict interactions. But for many uses (e.g. the Fuse connected-car application) such restrictions are unnecessary since the only systems with access to the channels are under the account owner's control.

Reactive Programming with Picos

Picos present an incredibly flexible programming model. In particular, since picos can programmatically create other picos, form relationships with other picos by creating and sharing channels, and change their behavior by installing and uninstalling rulesets, they can be very dynamic.

There are several important principles that you should remember when using picos to create a reactive system.

Think reactively. Picos are much more responsive and scalable when programmed with events. While picos can query other picos to get data snapshots, queries can lead to excessively synchronous operation. Because picos support lock-free asynchronous concurrency, they are more efficient when responding to events to accomplish a task. But this requires a new way of thinking for developers who have traditionally programmed in object-oriented and imperative languages. As an example, I've described in detail how I converted a report generation tool for Fuse from a synchronous request-response system to one based on the scatter-gather pattern. The reactive solution is more resilient because it's designed to accommodate failure and missed messages. Moreover, the reactive solution is more scalable because it doesn't block and thus isn't subject to timeouts.


Use picos to create the model. Picos represent individual entities since they have a unique identity and are persistent. They can have unique behavior and present their own API to other picos and applications. The overall structure of a system of picos should correspond to the relationships that entities have in the modeled system.

In Fuse, as we've seen, there are picos that represent the owner (a person), the fleet (a concept and collection), and each vehicle (a physical device). Picos were designed to build models for the Internet of Things, but they can be used for other models as well. For example, we used picos to model a guard tour system that included entities as diverse as locations and reports:

guard tour pico relationships

Think in interactions. Going along with the idea of modeling with picos, developers have to think in terms of the interactions that picos have with each other. While the rulesets installed in each pico define its behavior, the behavior of the application derives from the interactions that the picos have with each other. Developers can use tools like pico maps that show those relationships and swim-lane diagrams that show the interactions that happen in response to events.

In addition to the interactions between picos, the rules in a single pico are responsive to events and often raise an event to the same pico, causing other rules to respond. For example, this diagram shows the interactions of a few rules in a vehicle pico in response to a single pds:profile_updated event:

fuse microservice

You can see from the diagram that the single event sets off a chain of reactions, including sending events to other picos and calls to external APIs. You can read the details in Fuse as a Microservice Architecture.

Let the system do the scheduling and avoid blocking. When an event is raised in a pico, the pico event evaluation cycle picks rules to run based on their event expressions and schedules them.

event eval cycle

Developers can order rule evaluation inside a pico through the use of events, event expressions, and ad hoc locks using persistent storage. Between picos, controlling order is even harder. In general, avoid using these to sequence operations as much as possible and let the system do the scheduling.

Use idempotency. Failure is easier to handle when picos are not sensitive to repeated delivery of the same event, since senders are then free to retry without having to determine whether the previous event was partially processed or never delivered. Many operations are naturally idempotent. For those that aren't, the pico can often use guard rules that assure idempotency. Since one pico can't directly see another pico's state, the receiving pico must take responsibility for idempotency. Idempotent Services and Guard Rules provides more detail, including an example.
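
Here is a small illustrative guard in Python (the pico version would be a KRL guard rule): the receiver remembers which event ids it has already processed and ignores repeats, so senders are free to retry. The event ids and the operation are made up.

    # Illustrative idempotency guard: repeated delivery of the same event is harmless.
    processed = set()   # stands in for the pico's persistent entity storage

    def handle_event(event_id, attrs, do_work):
        if event_id in processed:           # already handled; a retry is ignored
            return "ignored (duplicate)"
        result = do_work(attrs)             # the operation that isn't naturally idempotent
        processed.add(event_id)
        return result

    charge = lambda attrs: "charged " + str(attrs["amount"])
    print(handle_event("evt-42", {"amount": 10}, charge))   # does the work
    print(handle_event("evt-42", {"amount": 10}, charge))   # duplicate is ignored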


Be resilient. Developers can't anticipate every situation, but picos can be modified over time to be resilient to common failure modes. Picos have a number of methods for handling errors, including the ability to set default rulesets for error handling and the ability to select rules based on error conditions. In addition, because pico-based applications are incredibly dynamic, they can frequently be programmed to self-heal. For example, A Microservice for Healing Subscriptions describes how a simple rule can check for missing channels and recreate them.


Whither KRL?

If you've had a prior introduction to picos, you might be wondering about KRL, the rule language used to program picos. How is it that I managed to write a long post about reactive programming and picos without mentioning KRL?

Picos relate to KRL in a way analogous to the relationship between Unix and C. I can talk about Unix programming, processes, file systems, user IDs, scheduling, and so on without ever mentioning C. Similarly, I can talk about reactive programming and picos without explicitly mentioning KRL.

Does that mean KRL isn't important? On the contrary, there's no other way to program picos. But building a pico-based application isn't so much about KRL as it is the function, behavior, and arrangement of the picos themselves.

People often ask if we couldn't just get rid of KRL and use JavaScript or something else. The answer is a qualified "yes." The qualification is that to get pico functionality, you need to add event expressions, persistent data, a runtime environment, accounts and identifiers, channels, and the ability to manage the pico lifecycle dynamically, including the JavaScript installed in each one. By the time you're done, JavaScript is a very small part of the solution.

For most people, learning KRL isn't going to be the hard part. The hard part is thinking reactively. Once you're doing that, KRL makes implementing your reactive design fairly easy and you'll find that its features map nicely to the problems you're solving.


Picos are a dynamic, actor-based system that supports building reactive systems. The Internet-centric nature of picos means that they are especially suited to building Internet of Things applications that are decentralized and independent. All of the components that support building systems with picos are open source. I have a group of students helping to build, test, and use pico-based systems and further develop the model. We invite your questions, feedback, and help.

I found inspiration for writing this article and seeing actors as an answer to building reactive systems from Vaughn Vernon's book Reactive Messaging Patterns with the Actor Model: Applications and Integration in Scala and Akka.

  1. Strictly speaking, it isn't necessary for the child to automatically get the parent's channel and it may be better to let the parent supply that only if necessary. This may change in future versions of KRE. Let me know if you have feedback on this.
  2. KRE used to be an acronym for the Kynetx Rules Engine. Now, it's just KRE.

Ambience and Personal Learning Systems


My work on the Internet of Things has led me to be a big believer in the power of ambience. The Internet of Things is going to lead to a computing experience that is immersive and pervasive. Rather than interacting with a computer or a phone, we'll live inside the computation. Or at least that's how it will feel. Computing will move from being something we do, like a task, to something we experience all around us. In other words, the computing environment will be ambient.

Remarkably, we've been moving in the opposite direction with learning.

Since the advent of the Web, more and more learning activities have moved inside the computer. Everything from textbooks to lecture delivery to quizzes has moved onto the tiny screens that seem to dominate our lives. Consequently, when I speak to people about personal learning systems (PLS), the first questions often concern the UI. "What kind of dashboard will it have?" I even labeled one of the primary boxes in my personal learning system diagram the "dashboard."

As we started talking about the PLS and how it would work, I realized that a dashboard was the wrong way to think about it. One of the primary features of the personal learning system is an API that other systems can use. As a result, a lot of what a learner might do in a dashboard in a closed system will happen via the interactions that the learner has with other systems that then use the API.

Some of these interactions are obvious. For example, if an instructor schedules a quiz in Canvas (Instructure's LMS product), then Canvas ought to tell the student's PLS about the new quiz. And once the student completes the quiz, the quiz results and learning objectives (and even the quiz itself depending on instructor preferences) ought to get pushed out to the student's portfolio. The quiz is administered in Canvas since that's the LMS the instructor chose for that class. We shouldn't, indeed we can't, replicate every possible learning activity in the PLS. That's not its job.
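
To make that concrete, here is an entirely hypothetical sketch of the kind of call an LMS might make to a student's personal API when a quiz is scheduled. The endpoint, payload, and token handling are invented for illustration; no such API exists yet.

    # Hypothetical LMS-to-PLS notification; every name and URL here is invented.
    import requests

    pls_api = "https://student.example.edu/api"     # student's personal API (hypothetical)
    token = "OAUTH-ACCESS-TOKEN"                    # access granted by the student to the LMS

    quiz_event = {
        "type": "quiz_scheduled",
        "course": "CS 101",
        "title": "Unit 3 Quiz",
        "due": "2016-02-12T23:59:00-07:00",
    }

    requests.post(pls_api + "/dashboard/events", json=quiz_event,
                  headers={"Authorization": "Bearer " + token})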

We can easily imagine an LMS telling the PLS about a quiz and results via an API. But how does the student know the quiz was scheduled? And where is the student notified of results?

Your first thought might be the PLS dashboard, but I don't think most of us are looking for another place to check for things to do or get messages. The PLS ought to just use the systems the student already has.

The initial version of the PLS that we build will have a calendar server that is available via the PLS API. Regardless of how the LMS tells the PLS about an upcoming event, the student will see it in whatever calendar tool they already use because they'll be able to link the calendar tool on their phone or desktop to their personal API.

Similarly, notifications ought to come to the learner by whatever channel they choose. We have a separate NotifyMe project that we're building at BYU. Students give permission to senders and choose the channel for delivery of messages from a specific sender. Senders queue messages for delivery via an OAuth-moderated API. Students can revoke permission to send at any time. Channel types include email, SMS, dead-letter drop, and even things like Twitter and Facebook. We're planning to fold this capability into the PLS.

Browser extensions, mobile apps, etextbook readers, Slack integrations, the student's domain, and any other tool students use, could be considered part of the student's learning environment and thus something that should be talking to the PLS.

Thinking outside the box a little, I've asked the group that works on BYU's tech classrooms to consider how a classroom might be part of the LMS so that the classroom is configured based on the needs of the faculty member teaching the class. You could extend this idea to the PLS as well. Why shouldn't the classroom write to the class member's PLS? That way the classroom, already part of the student's learning environment, is also integrated with the PLS.

Even functions that probably are part of the PLS, like planning, don't necessarily just have a dashboard. There may be some default set of screens or an app where learners plan their personal syllabus. But because there's an API, this function might be farmed out to other tools as well. For example, the student's learning plan may be influenced or even controlled (based on their goals, age, etc.) from systems used by counselors, HR departments, and other advisors.

A PLS shouldn't be one more place learners need to go, but rather something that hooks to everything else they do. Sure, there will be configuration screens, but since most learning happens ambiently, the PLS should respect that and be as unobtrusive as possible while still doing all it can to help people learn.

Tutorial Proposal: Using Domain Driven Design to Architect Microservices

I just posted this as a proposed tutorial at the O'Reilly Software Architecture conference next April in New York. I'm not holding my breath. My track record with O'Reilly conferences isn't great.


One of the hardest parts of creating a microservices architecture is knowing where the boundaries are. Whether we're designing a new system or refactoring a monolith that has become a "big ball of mud," microservices depend on getting the boundaries and interfaces right. Without proper boundaries and a good understanding of the necessary interaction patterns, your brand-new microservice-based system will just make the "big ball of mud" more complicated.

Microservices and Domain-Driven Design (DDD) are made for each other; they are both about boundaries. Using principles from DDD, we can understand the right places to split up the monolith. DDD concepts like bounded contexts and ubiquitous language are concrete tools, with a large body of work behind them, that help designers find and enforce boundaries. Moreover, DDD principles help the architect understand how microservices can be as loosely coupled as possible so that problems in one part of the system don't spill over into the rest.

Part I of this tutorial is an introduction to microservices and the challenges of architecting a system using them. We will explore important microservices concepts such as isolation, team autonomy, planning for failure, and lightweight communication.

Part II covers the key parts of so-called strategic DDD such as domains, bounded contexts, and ubiquitous languages. We will apply them to microservices and explore how DDD's bounded contexts help with the design of key microservice requirements.

Part III focuses on microservice interfaces and shows how DDD context maps can be used to design them. Context maps not only document the interfaces, but classify them and give architects a tool for exploring possible problems with interface decisions. Good interface design is a key to building resilient microservice-based systems. DDD provides concrete tools for baking resiliency into the design.

As part of the tutorial, students will participate in two hands-on exercises where they apply the principles they are learning. These exercises provide tools they can take back to their teams and use to get started on their own microservice designs.

Learning Objectives

At the end of this tutorial, students will be able to:

  • describe the key requirements for successful microservice design
  • understand DDD principles and apply them in finding the boundaries in a microservice architecture
  • explain a context map and show how context maps communicate interface issues in microservices


Phillip J. Windley, Ph.D. has been a computer science professor at Brigham Young University for over 20 years where he teaches classes on distributed systems. Currently Phil is an Enterprise Architect in the Office of the CIO at BYU where he leads efforts across the university to apply domain-driven design (DDD) and microservice principles in architecting campus systems. Phil was the founder and CTO of, an early ecommerce tools company and founder and CTO of Kynetx, an Internet of Things company that built the connected-car product Fuse.

Personal Learning Systems and Life-Long Learning


Over the years I've been interested in personal APIs, personal data stores, and personal clouds. If you've followed my work at all, you're aware that I care a lot about creating systems that are owned by the people who use them and give people control over how they work and how their data is used and shared. This is what sets personal apart from personalized. Personalized means that someone else's system recognizes me and uses that as context to change the data it shows me. Fine as far as it goes. Personal means that I own and control the system, the processing, and the data.

The best news is that personal systems are not at odds with enterprise and vendor provided services. As we'll see below, personal systems augment other systems and provide them with more accurate and more plentiful data.

In my present job, personal means thinking about how people relate to what we call learning management systems (LMS). Learning management systems are great, but they don't really do all that much for learners. And I think it's a mistake to try and build systems that work for students and schools alike. I think we're better off building different systems for each party and letting them talk to each other using APIs.

Learning Management Systems

Over the years, I've used a lot of different systems to provide online learning aids for my students. My first attempt was a class Web site I put together for CS330 at BYU in 1994. After blogging software came along, I mostly used a blog as the Web site for the course and augmented it with things from Github, Blackboard, LearningSuite, and even Moodle. I'm lazy, so I can't say it was ever awesome, but it was functional as far as helping students find the class schedule and the resources I put together for class.

In one way or another, they were all what today we call a "learning management system" or LMS. The perspective was me, the instructor, managing the class by controlling the communication, resources, quizzes, etc. that the students used to learn. My focus was on my class.

Lately, I've been thinking about this whole thing from the perspective of the learner--the student. Students don't just have my class, they have a complete schedule. Their goal is to participate in the various learning activities that their instructors have planned and complete a set of courses. This starts when a student plans their class schedule and continues until the courses are over.

Beyond that, the student may have their own learning agenda outside of an organized school. That should seamlessly mix in with more formal content. And there's no need to envision this process with fixed start and end times. Students should be able to start and finish classes or even smaller modules on a flexible schedule.


The end result of all these learning activities is a collection of artifacts that range from read papers, discussion comments, completed assignments, quizzes, projects, and so on, to credentials and degrees. In a traditional LMS, all this work disappears unless the student saves it somehow. Only grades, degrees, and other credentials are retained by the institution.

There's a lot of talk about ePortfolios in higher education. Like the LMS, these are often thought of from the institution's perspective. Here's why.

Departments, colleges and universities go through a process called accreditation on a periodic basis where outside visitors audit the program or even the entire institution to ensure it's meeting certain desired metrics. One of the things that happens in an accreditation visit is a review of whether courses and programs can show evidence that students who complete the course of study have met the learning objectives the institution has outlined for the course or program.

Traditionally this entails collecting and saving samples of student work across the range of performance and a range of time. Years ago, this would result in the department conference room table covered with stacks containing paper copies of exams, assignments, etc. for accreditors to review. Now it's mostly online. And, not surprisingly, people are less forgiving of missing evidence...but that's another rant.

People who worry about accreditation love the idea of ePortfolios because they imagine that if the results of learning activities are captured in the student's portfolio like so much digital exhaust, then the evidence for accreditation could be gathered with the push of a button.

Another reason institutions love ePortfolios is for purposes of assessment. Student-created artifacts are graded to assess student progress and, eventually, competency in the course's stated objectives. Again, institutions gather all this student-created work and an ePortfolio seems like it would ease that burden by organizing it in a consistent way.

I believe what we call an ePortfolio is really three different systems:

  1. An institutional system for accreditation purposes
  2. An institutional system for grading purposes
  3. A personal system for the student's own record

We make a mistake when we conflate these three purposes. From a DDD perspective, they are three different domains with different contexts and languages.

A Personal Learning System

Lots has been written about the enterprise side of this problem. I'd like to focus on what I call a personal learning system (PLS). The following diagram outlines some ideas for what a PLS might look like. The student is at the center, orchestrating the learning that takes place by:

  • Completing learning activities
  • Reviewing achievements
  • Choosing what to share and with whom to share it
  • Choosing what to learn
  • Reviewing and acting on feedback
  • Managing the learning plan and resources
personal learning system

The student owns and uses a PLS that contains both a learning dashboard and a portfolio. I like to think of the student transforming learning objectives and resources in the dashboard into completed activities and competencies in the portfolio as she completes various learning activities. I see the PLS dashboard and the portfolio as mirror images of each other, reflected across the now; things to do and things done.

The dashboard gets feedback from the portfolio that can be used to inform the student of the best next steps for the student to accomplish her objectives. This may involve changing the detailed learning plan, redoing work, skipping some things, etc.

Both the dashboard and the portfolio have an API that is part of the student's personal API. This API provides access to other outside systems. The dashboard is the repository and interface to the student's personally selected objectives, plan, and resources. All of these interact with external learning guides that might include:

  • instructors,
  • peers and social groups,
  • digital textbooks,
  • learning management systems, and
  • employers

Anyone can affect the student's PLS dashboard if she has granted them permission to access her personal API.

On the portfolio side, the student's personal API provides permissioned access to external organizations and systems that might include:

  • institutional ePortfolio systems,
  • learning management systems,
  • parents,
  • social circles and groups,
  • instructors,
  • personal coaches, and
  • employers

Again, these organizations and systems connect to the portfolio via the student's personal API and only with permission.

There may or may not be interaction between the learning guides and other external systems. It's likely, for example, that the student would give an external LMS that they're using permission to access both the dashboard and the portfolio. Similarly with employers.

There's no reason the PLS would talk to just one LMS. The learner may be using multiple systems for various purposes. The PLS would present all of these in a single view.

Being Personal Enables Life-Long Learning

By far, the most interesting aspect of the PLS is its focus on personal. Traditionally both LMS and ePortfolio systems have been enterprise systems, bought and installed by an institution for their purposes.

The PLS, on the other hand, is chosen and controlled by the learner. The institution, and others, link to and use the PLS, but the student is in control.

Focusing on personal is why the PLS is such a good tool for life-long learning. By giving students their own learning system and teaching them to use it, we make them responsible for their learning and teach them skills that will enable them to learn new things without institutional support. Moreover, we put the student at the center of their learning experience as an active participant rather than a passive consumer.


A personal learning system is not:

  • an assessment engine—assessment would be provided by external systems.
  • a content repository—learning resources might come from many sources.
  • a group interaction system—chat, discussion, etc. happens elsewhere
  • courseware—specialized texts, exercises, and simulations live outside the PLS
  • a course—course content, syllabus, objectives, assignments, and assessments live in an LMS

In short, there's still plenty of room in a world of personal learning systems for learning management systems, grade books, schools, and instructors. The PLS augments these systems by extending the realm of the portfolio to include activities yet to be done, rather than merely recording what's done and storing those artifacts.

A Pilot Proposal

The BYU Domains project, our version of Domain of One's Own, is a good place to build a prototype PLS. BYU Domains is a cPanel system, so you could imagine a PLS that installs via cPanel and interfaces with institutional systems via an OAuth-mediated API.

If you're interested in these ideas or have comments and suggestions, I invite you to contact me.

The ideas in this post have been influenced by numerous discussions with Kelly Flanagan and Troy Martin in the Office of the CIO at Brigham Young University.

Using the Scatter-Gather Pattern to Asynchronously Create Fuse Reports


Fuse is an open-source, connected-car platform that I use to experiment with techniques for building a true Internet of Things.

Fuse is built on a platform that supports persistent compute objects, or picos. Picos are an Internet-native, reactive programming system that supports the Actor model. You program them using a rule language called KRL. You can read more about picos here.

Fuse sends a periodic (weekly for now) report to the fleet owner providing all the details of trips and fuel fillups for each vehicle in the reporting period. The report also aggregates the detail information for each vehicle and then for the fleet. Here's the start of a weekly report for my fleet:

Fuse Weekly Report

The owner is represented by a pico, as are the fleet and each of the vehicles. Each of these picos is independent, network addressable, stores data for itself, and executes processes that are defined by functions and rules. They can respond to requests (via functions) or events (via rules). They communicate with each other directly without intermediation.

Synchronous Request-Response Solution

The most straightforward way to create a report, and the one I used initially, is for the fleet to make a request of each of its vehicles, asking them to compile details about trips and fillups and return the resulting JSON structure. Then the fleet formats that and sends an event to the owner pico indicating the report is ready to email. That process is represented by the following diagram. The methods for coding it are straightforward and will be familiar to anyone who's used an API.


The owner pico kicks everything off by sending the periodic_report event to the fleet pico. The fuse_periodic_report rule in the fleet pico calls a function in the fleet pico called fleetDetails() that makes synchronous requests to each of the vehicles over HTTP using their API. Once all the vehicles have responded, the rule formats the report and tells the owner pico it's ready via the periodic_report_ready event.

This works pretty well so long as the vehicles respond in a timely manner. For performance reasons, I have the HTTP timeouts set fairly short, so any big delay causes a vehicle to get missed when a request for its details times out. For people with a few vehicles in their fleet, it's fairly rare for this to happen. But with lots of vehicles, the chances go up. Somewhere around 10 vehicles in the fleet and your chances of at least one vehicle timing out get fairly good.

If my only tool were synchronous request-response-style interactions, then this would be a pretty big problem. I could increase the timeout, but that's a bandaid that will only mask the problem for a while. I could make the vehicleDetails() function more performant, but that's a lot of work for reasons having to do with how the underlying platform does queries in Mongo. So that's a can of worms I'd rather not open now. Besides, it's still possible for something to get delayed due to network latency or some other problem regardless of how fast the underlying platform is.

Scatter-Gather Solution

A more entertaining and intellectually interesting solution is to use a scatter-gather pattern of rules to process everything asynchronously.

Vaughn Vernon describes the scatter-gather pattern on page 272 of his new book Reactive Messaging Patterns with the Actor Model [1]. Scatter-gather is useful when you need to get some number of picos to do something and then aggregate the results to complete the computation. That's exactly the problem we face here: have each vehicle pico get its trip and fillup details for the reporting period and then gather those results and process them to produce a report.

The diagram below shows the interactions between picos to create the report. A few notes about the diagram:

  • Each pico in the diagram with the same name is actually the same pico, reproduced to show the specific interaction at a given point in the flow.
  • The rules send events, but only to a pico generally, not to a specific rule. Each pico provides an event bus that rules use to subscribe to events. Any number of rules can be listening for a given event.
  • There are no requests (function calls) in this flow, only asynchronous events.

Here's how it works.

The owner pico kicks everything off by sending the request_periodic_report event to the fleet. Because events are asynchronous, after it does so, it's free to do other tasks. The start_periodic_report rule in the fleet pico scatters the periodic_vehicle_report event to each vehicle in the fleet, whether there is 1 or 100. Of course, these events are asynchronous as well. Consequently, they are not under time pressure to complete.

When each vehicle pico completes, it sends a periodic_vehicle_report_created event to the fleet pico. The catch_vehicle_reports rule is listening and gathers the reports. Once it has added the vehicle report, it raises the periodic_vehicle_report_added event. Another rule in the fleet pico, check_report_status, checks whether every vehicle has responded. When the number of reports equals the number of vehicles, it raises the periodic_report_data_ready event; the data is turned into a report, and the owner pico is notified that it's ready for emailing.
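
Here is a compact Python sketch of that scatter-gather flow. The real implementation is a handful of KRL rules across the fleet and vehicle picos; the class and method names below are illustrative, and the vehicle "replies" synchronously only to keep the example short.

    # Illustrative scatter-gather: scatter report requests, gather replies, finish when done.
    import uuid

    class FleetPico:
        def __init__(self, vehicles):
            self.vehicles = vehicles
            self.reports = {}                       # rcn -> {vehicle_id: report}

        def start_periodic_report(self):
            rcn = str(uuid.uuid4())                 # report correlation number
            self.reports[rcn] = {}
            for v in self.vehicles:                 # scatter
                v.send("periodic_vehicle_report", rcn=rcn, reply_to=self)
            return rcn

        def vehicle_report_created(self, rcn, vehicle_id, report):
            self.reports[rcn][vehicle_id] = report  # gather
            if len(self.reports[rcn]) == len(self.vehicles):
                self.periodic_report_data_ready(rcn)

        def periodic_report_data_ready(self, rcn):
            print("report ready:", self.reports[rcn])

    class VehiclePico:
        def __init__(self, vehicle_id):
            self.vehicle_id = vehicle_id

        def send(self, etype, rcn, reply_to):
            # In a real pico this work would happen later, asynchronously.
            report = {"trips": 3, "fillups": 1}     # stand-in for the real details
            reply_to.vehicle_report_created(rcn, self.vehicle_id, report)

    fleet = FleetPico([VehiclePico("A"), VehiclePico("B")])
    fleet.start_periodic_report()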

Some Messy Details

You might have noticed a few issues that have to be addressed in the preceding narrative.

First, while unlikely, it's possible that the process could be started anew before the first process has completed. To avoid clashes and keep the events and data straight, each report process has a unique report correlation number (rcn). Each report is kept separate, even if multiple reports are being processed at the same time. This is not strictly necessary for this task since reports run once per week and are extremely unlikely to overlap. But it's a good practice to use correlation numbers to keep independent process flows independent.

Second, the check_report_status rule uses events from the vehicle picos to determine when it's done. But event delivery is not guaranteed. If one or more vehicle picos fail to produce a vehicle report, then no fleet report would be delivered to the owner. There are several tactics we could use:

  • We could accept the failure and tell owners that sometimes reports will fail, possibly giving them or someone else the opportunity to intervene manually and regenerate the report.
  • We can set a timeout and continue, generating a report with some missing vehicles.
  • We can set a timeout and reissue events to vehicle picos that failed to respond. This is more complicated because in the event that the vehicle pico still fails to respond after some number of retries, we have to adopt the strategy of continuing without the data.

I adopted the second strategy. Picos have the ability to schedule events for some future time (either once or repeating). I chose 2 minutes as the timeout period. That's plenty long enough for the vehicles to respond.

This idea of creating timeouts with scheduled events is very important. Unlike an operating system, picos don't have an internal timer tick. They only respond to events. So it's up to the programmer to determine when a timer tick is necessary and schedule one. While it's possible to use recurring scheduled events to create a regular, short-delay timer tick for a pico, I discourage it because it's generally unnecessary and wastes processing power.
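
A Python sketch of the timeout tactic follows. Picos schedule future events natively; threading.Timer stands in for that here, and the names and periods are illustrative. If the gather finishes first, the timeout handler sees a complete report and does nothing; otherwise it finishes the report with whatever data arrived.

    # Illustrative timeout via a scheduled one-shot event.
    import threading

    def schedule_timeout(rcn, reports, expected, finish, seconds=120):
        def on_timeout():
            if len(reports.get(rcn, {})) < expected:    # some vehicles never answered
                finish(rcn, partial=True)
        timer = threading.Timer(seconds, on_timeout)
        timer.start()
        return timer

    # Example: expect 3 vehicle reports for "rcn-1"; finish with partial data if they don't arrive.
    reports = {"rcn-1": {}}
    schedule_timeout("rcn-1", reports, expected=3, seconds=1,
                     finish=lambda rcn, partial: print(rcn, "finished, partial =", partial))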


Using the scatter-gather pattern for generating reports adds some complexity over the synchronous solution. But that point is moot since the synchronous solution fails to work reliably. While more complicated, the scatter-gather solution is only a handful of additional rules and none of them are very long (142 additional lines of code in total). Each rule does a single, easy-to-understand task. Using the scatter-gather solution for generating reports increases the reliability of the report generating system at an acceptable cost.

The scatter-gather solution makes better use of resources since the fleet pico isn't sitting around waiting for all the vehicles to complete before it does other important tasks. The fleet pico is free to respond to other events that may come up while the vehicles are completing their reports.

The concurrent processing is done without locks of any kind. Because each pico is independent, they have no need of locks when operating concurrently. The fleet pico could receive events from multiple vehicles, but they are queued and handled in turn. Consequently, we don't need locks inside the pico either. Lockless concurrency is a property of Actor-model systems like picos.

In general, I'm pretty happy with how this works and it was fun to think about. Next time I'm faced with a similar problem, scatter-gather will be my first choice, not the one I use after the synchronous solution fails.


  1. I recommend Vaughn's book for anyone interested in picos. While the language/framework (Scala and Akka) is different, the concepts are all very similar. There's a lot of good information that can be directly applied to programming picos.

Asynchronous Boundaries and Back Pressure

A significant component of reactive programming is the asynchronous boundary between the message sender and receiver. The problem is that the receiver might not work as fast as the sender and thus fall behind. When this happens, the sender can block (blocking), the event bus can throw messages away (lossiness), or the receiver can store messages (an unlimited message queue). None of these is ideal.

In response, a key concept from reactive systems is non-blocking back pressure. Back pressure allows queues to be bounded. One issue is that back pressure can't be synchronous or you lose all the advantages. Another is that if the sender doesn't have anything else to do (or can't do it easily), then you effectively get blocking.

Picos, as implemented by KRL, are lossy. They will queue requests, but the queue is, obviously, finite, and when it reaches its limit, the event sender will receive a 5XX error. This could be interpreted as back pressure, a NACK of sorts. But nothing in the KRL event:send() action is set up to handle the 5XX gracefully. Ideally, the error code ought to be something the sender could understand and react to.
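
The sketch below illustrates the bounded-queue idea in Python: when the receiver's mailbox is full, the send is refused immediately, which is roughly what a pico's 5XX response amounts to, and the sender decides what to do with the refusal instead of blocking or silently losing the message. The class and limits are illustrative.

    # Illustrative non-blocking back pressure with a bounded mailbox.
    from collections import deque

    class BoundedMailbox:
        def __init__(self, limit):
            self.limit = limit
            self.queue = deque()

        def offer(self, event):
            if len(self.queue) >= self.limit:
                return False          # back pressure: refuse rather than block or drop silently
            self.queue.append(event)
            return True

    mailbox = BoundedMailbox(limit=2)
    pending_retry = []
    for event in ["e1", "e2", "e3"]:
        if not mailbox.offer(event):
            pending_retry.append(event)   # the sender reacts to the NACK
    print("queued:", list(mailbox.queue), "to retry:", pending_retry)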

Regaining Control of Our Data with User-Managed Access


User control is a central tenet of any online world that most of us will want to live in. You don't have to think about things like surveillance-based marketing or devices that spy on us for long to realize that a future that's an extrapolation of what we have now is a very real threat to personal autonomy and freedom.

A key part of the answer is developing protocols that make it easy to give control to users.

The Limitations of OAuth

Even if you don't know what OAuth is, you've likely used it. Any time you use Twitter, Facebook, or some other service to log into another Web site or app, you're using OAuth. But logging in is only part of the story. The "auth" in OAuth doesn't stand for "authentication" but "authorization." For better or worse, what you're really doing when you use OAuth is granting permission for the Web site or app that's asking you to "log in with Facebook" to access your Facebook profile.

Exactly what you're sharing depends on what the relying site is asking for and what you agree to. There's usually a pop-up or something that says "this app will be able to...." that you probably just click on to get it out of the way without reading it. For a fun look at the kinds of stuff you might be sharing, I suggest logging into Take This Lollipop.

But while I think we all need to be aware that we're granting permissions to the site asking us to "log in," my purpose isn't to scare you or make you think "OAuth is bad." In fact I think OAuth is very, very good. OAuth gives all of us the opportunity to control what we share and that's a good thing.

OAuth is destined to grow as more and more of us use services that provide or need access to APIs. OAuth is the predominant way that APIs let resource owners control access by other services.

But OAuth has a significant limitation. If I use OAuth to grant a site access to Twitter, the record that I did so and the dashboard for controlling it are at Twitter. Sounds reasonable until you imagine OAuth being used for lots of things and the user having dozens of dashboards for controlling permissions. "Let's see...did I give this site permission to access my Twitter profile? Facebook? BYU?" I've got to remember and go to each of them separately to control the permission grants. And because each site builds its own, they're all different and most aren't terribly sophisticated, well-designed, or easy to find.

User-Managed Access to the Rescue

The reason is that while OAuth conceptually separates the idea of the authorization server (AS, the place granting permission) and the resource server (RS, the thing actually handing out data), it doesn't specify how they interact. Consequently everyone is left to determine that for themselves. So there's really no good way for two resources, for example, to use a single authorization server.

That's where UMA, or User-Managed Access, comes in. UMA specifies the relationship between the AS and RS. Further, UMA envisions that users could have authorization servers that are independent of the various resources that they're granting permission to access. UMA has been a topic at Internet Identity Workshop and other places for years, but it's suddenly gotten very real with the launch of ForgeRock's open-source OpenUMA project. Now there's code to run!

Side note: If you're a developer you can get involved in the UMA developer working group as well as the OpenUMA effort depending on whether your interests lie on the client or server side.

With UMA we could each have a dashboard, self-hosted or run by the vendor of our choice, where we control access permissions. This may sound complicated, like a big mixing board, but it doesn't have to be. Once there's a single place for managing access, it's easier for default policies and automation to take over much of the busy work and give owners better control at the same time.

UMA and Commerce

Doc Searls coined the term "vendor relationship management" or VRM years ago as a play on the popular customer relationship management (CRM) tools that businesses use to manage sales and customer communications. It's a perfect example of the kind of place where UMA could have a big impact. VRM is giving customers tools for managing their interactions with vendors. That sounds, in large part, like a permissioning task. And UMA could be a key piece of technology for unifying various VRM efforts.

Most of us hate seeing ads getting in the way of what we're trying to do online. The problem is that even with the best "targeting" technology, most of the ads you see are wasted. You don't want to see them. UMA could be used to send much stronger signals to vendors by granting them permission to access information that would let them help me and, in the process, make more money.

For example, I've written about social products. Social products provide a link back to their manufacturer, the retailer who sold them, the company that services them, and so on. These links are permissioned channels that share information with those companies, telling them what products and services I need.

UMA is a natural fit for managing the permissions in a social product scenario, giving me a dashboard where I can manage the interactions I have with vendors, grant permission for new vendors to form a relationship, and run policies on my behalf that control those interactions.

Gaining Control

I'm very bullish on UMA and its potential to impact how we interact with various Web sites and apps. As the use of APIs grows there will be more and more opportunity to mix and mash them into new products and services. UMA is in a good position to ensure that such efforts don't die from user fatigue trying to keep track of it all or, worse, fear that they're losing control of their personal data.

Culture and Trustworthy Spaces

In Social Things, Trustworthy Spaces, and the Internet of Things, I described trustworthy spaces as abstract places where various "things" could come together to accomplish tasks that none of them could do on their own.

For example, in that post I posit a scenario where a new electric car needs to work with other things in its owner's home to determine the best times to charge.

The five properties I discussed for trustworthy spaces were decentralized, event-driven, robust, anti-fragile, and trust-building. But while I can make points about why each of these is desirable in helping our car join a trustworthy space and carry out negotiations, none of them speak to how the car or space will actually do it.

In Systems Thinking, Jamshid Gharajedaghi discusses the importance of culture in self-organizing systems. He says "cultures act as default decision systems." Individuals identify with particular cultures when their self-image aligns with the shared image of a community.

Imagine a trustworthy space serving as a community for things that belong to me and use a lot of power. That space has default policies for power management negotiations. These aren't algorithms, necessarily, but heuristics that guide interactions between members.

In its turn, the car has a power management profile that defines part of its self-image and so it aligns nicely with the shared image of the power management space. Consequently, when the car is introduced to the household, it gravitates to the power management space because of the shared culture. It may join other spaces as well depending on its self image and their culture.

My description is short on detail about how this culture is encoded and how things discover the cultures of spaces upon being introduced to the household, but it does provide a nice way to think about how large collections of things could self organize and police themselves.

Gharajedaghi defines civilization as follows:

[C]ivilization is the emergent outcome of the interaction between culture (the software) and technology. Technology is universal, proliferating with no resistance, whereas cultures are local, resisting change with tenacity.

I like this idea of civilization emerging from a cultural overlay on our collections of things. By finding trustworthy spaces that are a cultural fit and then using that culture for decision making within a society of things, our connected things are tamed and become subject to our will.

Resources, Not Data

You'll often hear people explain the mainstay HTTP verbs, GET, POST, PUT, and DELETE, in terms of the venerable CRUD (create, retrieve, update, and delete) functions of persistent storage systems. Heck, I do it myself. We need to stop.

In a RESTful API, the HTTP verbs are roughly analogous to the CRUD functions, but what they're acting on is quite different. CRUD functions act on data...static, stupid data. In REST, on the other hand, the verbs act on resources. While there are cases where a resource is just static data, that case is much less interesting than the general case.

To see how, let's pull out the old standby bank account example. In this example, I have a resource called /accounts and in a CRUD world, you could imagine deposits and withdrawals to an account with identifier :id being PUTs on the /accounts/:id resource.

Of course, we'd never expose an API where you could update an account balance with a PUT. In fact, I can't imagine anything you'd do with the account balance in such an API except GET it. There are too many necessary checks and balances (what we call "model invariants") that need to be maintained by the system.

Instead, what we'd do is design an account transfer resource. When we wanted to transfer $100.00 from /accounts/A to /accounts/B, we'd do this:

POST /transfers

{
  source: /accounts/A,
  destination: /accounts/B,
  amount: 100.00
}

This creates a new transfers resource and while it's true that data will be recorded to establish that a new transfer was created, that's not why we're doing it. We're doing it to effect the transfer of money between two accounts. Underneath this resource creation is a whole host of processes to maintain the transactional integrity and consistency of the bank's books.
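
A small sketch of why the transfer works better as its own resource: the handler for POST /transfers can enforce the model invariants (sufficient funds, balanced books) in one place, which a raw PUT on an account balance could not. This is framework-free Python with made-up data, not a real banking implementation.

    # Illustrative POST /transfers handler that enforces invariants before recording the resource.
    accounts = {"/accounts/A": 250.00, "/accounts/B": 80.00}
    transfers = []

    def post_transfer(source, destination, amount):
        if amount <= 0:
            return 422, {"error": "amount must be positive"}
        if accounts.get(source, 0) < amount:
            return 422, {"error": "insufficient funds"}      # model invariant enforced here
        accounts[source] -= amount
        accounts[destination] = accounts.get(destination, 0) + amount
        transfer = {"id": "/transfers/" + str(len(transfers)), "source": source,
                    "destination": destination, "amount": amount}
        transfers.append(transfer)
        return 201, transfer                                  # the newly created resource

    print(post_transfer("/accounts/A", "/accounts/B", 100.00))
    print(accounts)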

Interesting resources have workflow rather than just being a collection of data. So stop focusing on the HTTP verbs and think about the resources instead. REST is resource-oriented, and that doesn't map neatly onto objects, relational databases, or remote procedure calls. Most bad APIs are the result of a mistaken attempt to understand REST in terms of old programming paradigms.

Tesla is a Software Company, Jeep Isn't

Marc Andreessen has famously said that "software is eating the world." Venkatesh Rao calls software "only the third major soft technology to appear in human civilization."

"So what?" you say. "I'm not in software, what do I care?"

You care, or should, because the corollary to this is that your company is a software company, whether you like it or not. Software is so pervasive, so important, that it has or will impact every human activity.

The recent hacks of a Jeep Cherokee and Tesla Model S provide an important example of what it means to be a software company—even if you sell cars. Compare these headlines:

After Jeep Hack, Chrysler Recalls 1.4M Vehicles for Bug Fix

Researchers Hacked a Model S, But Tesla’s Already Released a Patch

If you were CEO of a car manufacturer, which of these headlines would you rather were written about you? The first speaks of a tired, old manufacturing model where fixes take months and involve expense and inconvenience. The second speaks of a nimble model more reminiscent of a smart phone than a car.

You might be thinking you'd rather not have either and, of course, that's true. But failure is inevitable; you can't avoid it. So mean-time-to-recovery (MTTR) is more important than mean-time-between-failures (MTBF) in the modern world. Tesla demonstrated that by not just having a fix, but by being able to deliver it over the air without inconvenience to their owners. If you're a Tesla owner, you might have been concerned for a few hours, but right now you're feeling like the company was there for you. Meanwhile, Jeep owners are still wondering how this will all roll out.

The difference? Tesla is a software company. Jeep isn't.

Tesla can do over-the-air updates because the ideas of continuous delivery and after-sale updates are part of their DNA.

No matter what business you're in, there's someone, somewhere figuring out how to use software to beat or disrupt you. We've seen this over and over again with Uber, FedEx, Walmart, and other companies that have used IT expertise to gain an advantage their competitors didn't.

Being a software company requires a shift in your mindset. You have to stop seeing IT as the people who run the payroll system and make the PCs work. IT has to be part of the way you compete. In other words, software isn't just something you use to run your company. Software becomes something you use to beat the competition.