Personal Learning Systems and Life-Long Learning


Over the years I've been interested in personal APIs, personal data stores, and personal clouds. If you've followed my work at all, you're aware that I care a lot about creating systems that are owned by the people who use them and give people control over how they work and how their data is used and shared. This is what sets personal apart from personalized. Personalized means that someone else's system recognizes me and uses that as context to change the data it shows me. Fine as far as it goes. Personal means that I own and control the system, the processing, and the data.

The best news is that personal systems are not at odds with enterprise and vendor-provided services. As we'll see below, personal systems augment other systems and provide them with more accurate and more plentiful data.

In my present job, personal means thinking about how people relate to what we call learning management systems (LMS). Learning management systems are great, but they don't really do all that much for learners. And I think it's a mistake to try and build systems that work for students and schools alike. I think we're better off building different systems for each party and letting them talk to each other using APIs.

Learning Management Systems

Over the years, I've used a lot of different systems to provide online learning aids for my students. My first attempt was a class Web site I put together for CS330 at BYU in 1994. After blogging software came along I mostly used a blog as the Web site for the course and augmented it with things from Github, Blackboard, LearningSuite, and even Moodle. I'm lazy, so I can't say it was ever awesome, but it was functional as far as helping students find the class schedule and the resources I put together for class.

In one way or another, they were all what today we call a "learning management system" or LMS. The perspective was me, the instructor, managing the class by controlling the communication, resources, quizzes, etc. that the students used to learn. My focus was on my class.

Lately, I've been thinking about this whole thing from the perspective of the learner--the student. Students don't just have my class, they have a complete schedule. Their goal is to participate in the various learning activities that their instructors have planned and complete a set of courses. This starts when a student plans their class schedule and continues until the courses are over.

Beyond that, the student may have their own learning agenda outside of an organized school. That should seamlessly mix in with more formal content. And there's no need to envision this process with fixed start and end times. Students should be able to start and finish classes or even smaller modules on a flexible schedule.


The end result of all these learning activities is a collection of artifacts that range from read papers, discussion comments, completed assignments, quizzes, and projects to credentials and degrees. In a traditional LMS, all this work disappears unless the student saves it somehow. Only grades, degrees, and other credentials are retained by the institution.

There's a lot of talk about ePortfolios in higher education. Like the LMS, these are often thought of from the institution's perspective. Here's why.

Departments, colleges and universities go through a process called accreditation on a periodic basis where outside visitors audit the program or even the entire institution to ensure it's meeting certain desired metrics. One of the things that happens in an accreditation visit is a review of whether courses and programs can show evidence that students who complete the course of study have met the learning objectives the institution has outlined for the course or program.

Traditionally this entails collecting and saving samples of student work across the range of performance and a range of time. Years ago, this would result in the department conference room table covered with stacks containing paper copies of exams, assignments, etc. for accreditors to review. Now it's mostly online. And, not surprisingly, people are less forgiving of missing evidence...but that's another rant.

People who worry about accreditation love the idea of ePortfolios because they imagine that if the results of learning activities are captured in the student's portfolio like so much digital exhaust, then the evidence for accreditation could be gathered with the push of a button.

Another reason institutions love ePortfolios is for purposes of assessment. Student-created artifacts are graded to assess student progress and, eventually, competency in the course's stated objectives. Again, institutions gather all this student-created work and an ePortfolio seems like it would ease that burden by organizing it in a consistent way.

I believe what we call an ePortfolio is really three different systems:

  1. An institutional system for accreditation purposes
  2. An institutional system for grading purposes
  3. A personal system for the student's own record

We make a mistake when we conflate these three purposes. From a DDD perspective, they are three different domains with different contexts and languages.

A Personal Learning System

Lots has been written about the enterprise side of this problem. I'd like to focus on what I call a personal learning system (PLS). The following diagram outlines some ideas for what a PLS might look like. The student is at the center, orchestrating the learning that takes place by:

  • Completing learning activities
  • Reviewing achievements
  • Choosing what to share and with whom to share it
  • Choosing what to learn
  • Reviewing and acting on feedback
  • Managing the learning plan and resources
personal learning system

The student owns and uses a PLS that contains both a learning dashboard and a portfolio. I like to think of the student transforming learning objectives and resources in the dashboard into completed activities and competencies in the portfolio as she completes various learning activities. I see the PLS dashboard and the portfolio as mirror images of each other, reflected across the now: things to do and things done.

The dashboard gets feedback from the portfolio that can be used to inform the student of the best next steps to accomplish her objectives. This may involve changing the detailed learning plan, redoing work, skipping some things, etc.
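The dashboard-to-portfolio flow can be sketched in a few lines of Python. The `PLS` class and the activity names are invented purely for illustration; they aren't part of any real system:

```python
# Illustrative sketch of the dashboard/portfolio "mirror image" idea:
# completing an activity reflects it across "now", from things to do
# into things done.

class PLS:
    def __init__(self, planned):
        self.dashboard = list(planned)   # things to do
        self.portfolio = []              # things done

    def complete_activity(self, activity):
        # Completing an activity moves it from the dashboard
        # to the portfolio.
        if activity in self.dashboard:
            self.dashboard.remove(activity)
            self.portfolio.append(activity)

pls = PLS(["read ch. 1", "quiz 1"])
pls.complete_activity("read ch. 1")
```

The feedback loop described above would run over the portfolio side and adjust what remains in the dashboard.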

Both the dashboard and the portfolio have an API that is part of the student's personal API. This API gives outside systems access to the PLS. The dashboard is the repository and interface for the student's personally selected objectives, plan, and resources. All of these interact with external learning guides that might include:

  • instructors,
  • peers and social groups,
  • digital textbooks,
  • learning management systems, and
  • employers

Anyone can affect the student's PLS dashboard if she has granted them permission to access her personal API.

On the portfolio side, the student's personal API provides permissioned access to external organizations and systems that might include:

  • institutional ePortfolio systems,
  • learning management systems,
  • parents,
  • social circles and groups,
  • instructors,
  • personal coaches, and
  • employers

Again, these organizations and systems connect to the portfolio via the student's personal API and only with permission.
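A hedged sketch of what permissioned access through a personal API might look like: the student's grants map each external party to the scopes she has approved. The party names and scope strings here are hypothetical:

```python
# Hypothetical permission grants in a student's personal API.
# Each external party gets only the scopes the student approved.

grants = {
    "lms.example.edu": {"portfolio:read", "dashboard:write"},
    "employer.example.com": {"portfolio:read"},
}

def authorized(party, scope):
    # Access succeeds only if the student granted this party this scope;
    # unknown parties get nothing.
    return scope in grants.get(party, set())
```

Revoking access is then just removing an entry from the student's own grant table, not hunting through dozens of vendor dashboards.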

There may or may not be interaction between the learning guides and other external systems. It's likely, for example, that the student would give an external LMS that they're using permission to access both the dashboard and the portfolio. Similarly with employers.

There's no reason the PLS would talk to just one LMS. The learner may be using multiple systems for various purposes. The PLS would present all of these in a single view.

Being Personal Enables Life-Long Learning

By far, the most interesting aspect of the PLS is its focus on personal. Traditionally both LMS and ePortfolio systems have been enterprise systems, bought and installed by an institution for their purposes.

The PLS, on the other hand, is chosen and controlled by the learner. The institution, and others, link to and use the PLS, but the student is in control.

Focusing on personal is why the PLS is such a good tool for life-long learning. By giving students their own learning system and teaching them to use it, we make them responsible for their learning and teach them skills that will enable them to learn new things without institutional support. Moreover, we put the student at the center of their learning experience as an active participant rather than a passive consumer.


A personal learning system is not:

  • an assessment engine—assessment would be provided by external systems.
  • a content repository—learning resources might come from many sources.
  • a group interaction system—chat, discussion, etc. happens elsewhere
  • courseware—specialized texts, exercises, and simulations live outside the PLS
  • a course—course content, syllabus, objectives, assignments, and assessments live in an LMS

In short, there's still plenty of room in a world of personal learning systems for learning management systems, grade books, schools, and instructors. The PLS augments these systems by extending the realm of the portfolio to include activities yet to be done rather than merely recording what's done and storing those artifacts.

A Pilot Proposal

The BYU Domains project, our version of Domain of One's Own, is a good place to build a prototype PLS. BYU Domains is a cPanel system, so you could imagine a PLS that installs via cPanel and interfaces with institutional systems via an OAuth-mediated API.

If you're interested in these ideas or have comments and suggestions, I invite you to contact me.

The ideas in this post have been influenced by numerous discussions with Kelly Flanagan and Troy Martin in the Office of the CIO at Brigham Young University.

Using the Scatter-Gather Pattern to Asynchronously Create Fuse Reports


Three Wishes

Fuse is an open-source, connected-car platform that I use to experiment with techniques for building a true Internet of Things.

Fuse is built on a platform that supports persistent compute objects, or picos. Picos are an Internet-native, reactive programming system that supports the Actor model. You program them using a rule language called KRL. You can read more about picos here.

Fuse sends a periodic (weekly for now) report to the fleet owner providing all the details of trips and fuel fillups for each vehicle in the reporting period. The report also aggregates the detail information for each vehicle and then for the fleet. Here's the start of a weekly report for my fleet:

Fuse Weekly Report

The owner is represented by a pico, as are the fleet and each of the vehicles. Each of these picos is independent, network addressable, stores data for itself, and executes processes that are defined by functions and rules. They can respond to requests (via functions) or events (via rules). They communicate with each other directly without intermediation.

Synchronous Request-Response Solution

The most straightforward way to create a report, and the one I used initially, is for the fleet to make a request of each of its vehicles, asking them to compile details about trips and fillups and return the resulting JSON structure. Then the fleet formats that and sends an event to the owner pico indicating the report is ready to email. That process is represented by the following diagram. The methods for coding it are straightforward and will be familiar to anyone who's used an API.


The owner pico kicks everything off by sending the periodic_report event to the fleet pico. The fuse_periodic_report rule in the fleet pico calls a function, fleetDetails(), that makes synchronous requests to each of the vehicles over HTTP using their API. Once all the vehicles have responded, the rule formats the report and tells the owner pico it's ready via the periodic_report_ready event.

This works pretty well so long as the vehicles respond in a timely manner. For performance reasons, I have the HTTP timeouts set fairly short, so any big delay causes a vehicle to get missed when a request for its details times out. For people with a few vehicles in their fleet, it's fairly rare for this to happen. But with lots of vehicles, the chances go up. Somewhere around 10 vehicles in the fleet and your chances of at least one vehicle timing out get fairly good.
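A quick back-of-the-envelope calculation shows why roughly ten vehicles is the tipping point. Assuming, purely for illustration, that each synchronous request independently times out with probability 0.05:

```python
# If each request times out independently with probability p, the
# synchronous report is incomplete unless every one of n vehicles
# responds. p = 0.05 is an assumed figure for illustration only.

def p_missing_vehicle(p, n):
    # Probability that at least one of n vehicles times out.
    return 1 - (1 - p) ** n

print(round(p_missing_vehicle(0.05, 1), 3))   # 0.05
print(round(p_missing_vehicle(0.05, 10), 3))  # 0.401
```

At ten vehicles, a per-request failure rate of just 5% already spoils about 40% of reports.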

If my only tool were synchronous request-response-style interactions, this would be a pretty big problem. I could increase the timeout, but that's a bandaid that will only mask the problem for a while. I could make the vehicleDetails() function more performant, but that's a lot of work for reasons having to do with how the underlying platform does queries in Mongo. So that's a can of worms I'd rather not open now. Besides, it's still possible for something to get delayed due to network latency or some other problem regardless of how fast the underlying platform is.

Scatter-Gather Solution

A more entertaining and intellectually interesting solution is to use a scatter-gather pattern of rules to process everything asynchronously.

Vaughn Vernon describes the scatter-gather pattern on page 272 of his new book Reactive Messaging Patterns with the Actor Model¹. Scatter-gather is useful when you need to get some number of picos to do something and then aggregate the results to complete the computation. That's exactly the problem we face here: have each vehicle pico get its trip and fillup details for the reporting period and then gather those results and process them to produce a report.

The diagram below shows the interactions between picos to create the report. A few notes about the diagram:

  • Each pico in the diagram with the same name is actually the same pico, reproduced to show the specific interaction at a given point in the flow.
  • The rules send events, but only to a pico generally, not to a specific rule. Each pico provides an event bus that rules use to subscribe to events. Any number of rules can be listening for a given event.
  • There are no requests (function calls) in this flow, only asynchronous events.

Here's how it works.

The owner pico kicks everything off by sending the request_periodic_report event to the fleet. Because events are asynchronous, after it does so, it's free to do other tasks. The start_periodic_report rule in the fleet pico scatters the periodic_vehicle_report event to each vehicle in the fleet, whether there's one vehicle or 100. Of course, these events are asynchronous as well. Consequently, the vehicle picos are not under time pressure to complete.

When each vehicle pico completes, it sends a periodic_vehicle_report_created event to the fleet pico. The catch_vehicle_reports rule is listening and gathers the reports. Once it's added the vehicle report, it fires the periodic_vehicle_report_added event. Another rule in the fleet pico, check_report_status, is checking to see if every vehicle has responded. When the number of reports equals the number of vehicles, it raises the periodic_report_data_ready event; the data is turned into a report and the owner pico is notified it's ready for emailing.
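The gather side can be sketched as a simple store keyed by vehicle, standing in for the catch_vehicle_reports and check_report_status rules. This Python class is an illustration of the pattern, not actual KRL:

```python
# Sketch of the gather half of scatter-gather: collect one report per
# vehicle and signal when every vehicle has reported.

class FleetGather:
    def __init__(self, vehicle_count):
        self.vehicle_count = vehicle_count
        self.reports = {}

    def on_vehicle_report(self, vehicle_id, report):
        # catch_vehicle_reports: store each report as it arrives.
        # Keying by vehicle_id means a duplicate event can't be
        # counted twice.
        self.reports[vehicle_id] = report
        # check_report_status: the data is ready once everyone reports.
        return len(self.reports) == self.vehicle_count
```

When the method returns true, the fleet would raise periodic_report_data_ready and move on to formatting.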

Some Messy Details

You might have noticed a few issues that have to be addressed in the preceding narrative.

First, while unlikely, it's possible that the process could be started anew before the first process has completed. To avoid clashes and keep the events and data straight, each report process has a unique report correlation number (rcn). Each report is kept separate, even if multiple reports are being processed at the same time. This is not strictly necessary for this task, since reports run once per week and are extremely unlikely to overlap, but it's good practice to use correlation numbers to keep independent process flows independent.
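Here's a minimal sketch of how an rcn keeps concurrent runs separate; `uuid4` stands in for however the rcn is actually minted:

```python
# Each report run gets its own rcn; events carry the rcn, so a late
# report from an old run can never pollute a newer one.

import uuid

runs = {}

def start_report(vehicle_ids):
    rcn = str(uuid.uuid4())
    runs[rcn] = {"expected": set(vehicle_ids), "reports": {}}
    return rcn

def add_vehicle_report(rcn, vehicle_id, report):
    # The incoming event's rcn selects which run this report belongs to.
    runs[rcn]["reports"][vehicle_id] = report
```

Two overlapping runs then accumulate their reports independently.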

Second, the check_report_status rule uses events from the vehicle picos to determine when it's done. But event delivery is not guaranteed. If one or more vehicle picos fail to produce a vehicle report, then no fleet report would be delivered to the owner. There are several tactics we could use:

  • We could accept the failure and tell owners that sometimes reports will fail, possibly giving them or someone else the opportunity to intervene manually and regenerate the report.
  • We can set a timeout and continue, generating a report with some missing vehicles.
  • We can set a timeout and reissue events to vehicle picos that failed to respond. This is more complicated because in the event that the vehicle pico still fails to respond after some number of retries, we have to adopt the strategy of continuing without the data.

I adopted the second strategy. Picos have the ability to schedule events for some future time (either once or repeating). I chose two minutes as the timeout period. That's plenty long enough for the vehicles to respond.
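The handler for that scheduled timeout might look like this sketch, where the function stands in for the rule that runs when the timeout event fires:

```python
# Strategy two: when the scheduled timeout fires, generate the report
# with whatever has arrived and note which vehicles are missing.
# The structure of the result is invented for illustration.

def report_on_timeout(expected, received):
    missing = set(expected) - set(received)
    return {"vehicles": dict(received), "missing": sorted(missing)}
```

The fleet would then format this partial data and notify the owner as usual, perhaps flagging the missing vehicles in the emailed report.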

This idea of creating timeouts with scheduled events is very important. Unlike an operating system, picos don't have an internal timer tick. They only respond to events. So it's up to the programmer to determine when a timer tick is necessary and schedule one. While it's possible to use recurring scheduled events to create a regular, short-delay timer tick for a pico, I discourage it because it's generally unnecessary and wastes processing power.


Using the scatter-gather pattern for generating reports adds some complexity over the synchronous solution. But that point is moot since the synchronous solution fails to work reliably. While more complicated, the scatter-gather solution is only a handful of additional rules and none of them are very long (142 additional lines of code in total). Each rule does a single, easy-to-understand task. Using the scatter-gather solution for generating reports increases the reliability of the report generating system at an acceptable cost.

The scatter-gather solution makes better use of resources since the fleet pico isn't sitting around waiting for all the vehicles to complete before it does other important tasks. The fleet pico is free to respond to other events that may come up while the vehicles are completing their reports.

The concurrent processing is done without locks of any kind. Because each pico is independent, they have no need of locks when operating concurrently. The fleet pico could receive events from multiple vehicles, but they are queued and handled in turn. Consequently, we don't need locks inside the pico either. Lockless concurrency is a property of Actor-model systems like picos.

In general, I'm pretty happy with how this works and it was fun to think about. Next time I'm faced with a similar problem, scatter-gather will be my first choice, not the one I use after the synchronous solution fails.


  1. I recommend Vaughn's book for anyone interested in picos. While the language/framework (Scala and Akka) is different, the concepts are all very similar. There's a lot of good information that can be directly applied to programming picos.

Asynchronous Boundaries and Back Pressure

Vanishing Train

A significant component of reactive programming is the asynchronous boundary between the message sender and receiver. The problem is that the receiver might not work as fast as the sender and thus fall behind. When this happens, the sender can block (blocking), the event bus can throw messages away (lossiness), or the receiver can store messages (an unlimited message queue). None of these is ideal.

In response, a key concept from reactive systems is non-blocking back pressure. Back pressure allows queues to be bounded. One issue is that back pressure can't be synchronous or you lose all the advantages. Another is that if the sender doesn't have anything else to do (or can't do it easily), then you effectively get blocking.

Picos, as implemented by KRL, are lossy. They will queue requests, but the queue is, obviously, finite, and when it reaches its limit, the event sender will receive a 5XX error. This could be interpreted as back pressure, a NACK of sorts. But nothing in the KRL event:send() action is set up to handle the 5XX gracefully. Ideally, the error code ought to be something the sender could understand and react to.
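A bounded, NACKing queue of this sort can be sketched in a few lines of Python. The status codes mirror the 5XX behavior described above, but the class itself is invented for illustration:

```python
# Sketch of a bounded event queue that signals back pressure by
# refusing events when full, roughly what a pico's 5XX amounts to.

from queue import Queue, Full

class PicoQueue:
    def __init__(self, limit):
        self.q = Queue(maxsize=limit)

    def send(self, event):
        try:
            self.q.put_nowait(event)
            return 200          # accepted
        except Full:
            return 503          # NACK: queue full, try again later
```

A well-behaved sender would treat the 503 as a signal to back off and retry, rather than treating it as an opaque failure.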

Regaining Control of Our Data with User-Managed Access


User control is a central tenet of any online world that most of us will want to live in. You don't have to consider things like surveillance-based marketing or devices that spy on us for long to realize that a future that's an extrapolation of what we have now is a very real threat to personal autonomy and freedom.

A key part of the answer is developing protocols that make it easy to give control to users.

The Limitations of OAuth

Even if you don't know what OAuth is, you've likely used it. Any time you use Twitter, Facebook, or some other service to log in to some other Web site or app, you're using OAuth. But logging in is only part of the story. The "auth" in OAuth doesn't stand for "authentication" but "authorization." For better or worse, what you're really doing when you "log in with Facebook" is granting the requesting Web site or app permission to access your Facebook profile.

Exactly what you're sharing depends on what the relying site is asking for and what you agree to. There's usually a pop-up or something that says "this app will be able to...." that you probably just click on to get it out of the way without reading it. For a fun look at the kinds of stuff you might be sharing, I suggest logging into Take This Lollipop.

But while I think we all need to be aware that we're granting permissions to the site asking us to "log in," my purpose isn't to scare you or make you think "OAuth is bad." In fact I think OAuth is very, very good. OAuth gives all of us the opportunity to control what we share and that's a good thing.

OAuth is destined to grow as more and more of us use services that provide or need access to APIs. OAuth is the predominant way that APIs let owners control another service's access to their resources.

But OAuth has a significant limitation. If I use OAuth to grant a site access to Twitter, the record of that grant, and the dashboard for controlling it, live at Twitter. Sounds reasonable until you imagine OAuth being used for lots of things and the user having dozens of dashboards for controlling permissions. "Let's see...did I permission this site to access my Twitter profile? Facebook? BYU?" I've got to remember and go to each of them separately to control the permission grants. And because each site is building its own, they're all different and most aren't terribly sophisticated, well-designed, or easy to find.

User-Managed Access to the Rescue

The reason for this proliferation of dashboards is that while OAuth conceptually separates the idea of the authorization server (AS, the place granting permission) from the resource server (RS, the thing actually handing out data), it doesn't specify how they interact. Consequently, everyone is left to determine that for themselves. So there's really no good way for two resources, for example, to use a single authorization server.

That's where UMA, or User-Managed Access, comes in. UMA specifies the relationship between the AS and RS. Further, UMA envisions that users could have authorization servers that are independent of the various resources that they're granting permission to access. UMA has been a topic at Internet Identity Workshop and other places for years, but it's suddenly gotten very real with the launch of ForgeRock's open-source OpenUMA project. Now there's code to run!

Side note: If you're a developer you can get involved in the UMA developer working group as well as the OpenUMA effort depending on whether your interests lie on the client or server side.

With UMA we could each have a dashboard, self-hosted or run by the vendor of our choice, where we control access permissions. This may sound complicated, like a big mixing board, but it doesn't have to be. Once there's a single place for managing access, it's easier for default policies and automation to take over much of the busy work and give owners better control at the same time.
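One central authorization server making decisions for many resource servers might be sketched like this. The resources, parties, and actions are all made up for illustration; a real UMA AS evaluates registered policies over registered resources:

```python
# Sketch of a single, user-controlled policy store: many resource
# servers ask one place, so the user has one dashboard.

policies = [
    {"resource": "twitter-profile", "party": "coolapp",
     "allow": {"read"}},
    {"resource": "calendar", "party": "coolapp",
     "allow": {"read", "write"}},
]

def decide(resource, party, action):
    # A resource server defers to the user's AS for every request.
    return any(p["resource"] == resource and p["party"] == party
               and action in p["allow"] for p in policies)
```

Because every grant lives in one list, default policies ("never share location", "any school I attend may read my portfolio") can be applied uniformly instead of being re-created at each site.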

UMA and Commerce

Doc Searls coined the term "vendor relationship management" or VRM years ago as a play on the popular customer relationship management (CRM) tools that businesses use to manage sales and customer communications. It's a perfect example of the kind of place where UMA could have a big impact. VRM is giving customers tools for managing their interactions with vendors. That sounds, in large part, like a permissioning task. And UMA could be a key piece of technology for unifying various VRM efforts.

Most of us hate seeing ads getting in the way of what we're trying to do online. The problem is that even with the best "targeting" technology, most of the ads you see are wasted. You don't want to see them. UMA could be used to send much stronger signals to vendors by granting them permission to access information that would let them help me and, in the process, make more money.

For example, I've written about social products. Social products provide a link back to their manufacturer, the retailer who sold them, the company who services them, and so on. These links are permissioned channels that share information telling those companies what products and services I need.

UMA is a natural fit for managing the permissions in a social product scenario, giving me a dashboard where I can manage the interactions I have with vendors, grant permission for new vendors to form a relationship, and run policies on my behalf that control those interactions.

Gaining Control

I'm very bullish on UMA and its potential to impact how we interact with various Web sites and apps. As the use of APIs grows there will be more and more opportunity to mix and mash them into new products and services. UMA is in a good position to ensure that such efforts don't die from user fatigue trying to keep track of it all or, worse, fear that they're losing control of their personal data.

Culture and Trustworthy Spaces

Karneval der Kulturen|Carnival of Cultures

In Social Things, Trustworthy Spaces, and the Internet of Things, I described trustworthy spaces as abstract places where various "things" could come together to accomplish tasks that none of them could do on their own.

For example, in that post I posit a scenario where a new electric car needs to work with other things in its owner's home to determine the best times to charge.

The five properties I discussed for trustworthy spaces were decentralized, event-driven, robust, anti-fragile, and trust-building. But while I can make points about why each of these is desirable in helping our car join a trustworthy space and carry out negotiations, none of them speak to how the car or space will actually do it.

In Systems Thinking, Jamshid Gharajedaghi discusses the importance of culture in self-organizing systems. He says "cultures act as default decision systems." Individuals identify with particular cultures when their self-image aligns with the shared image of a community.

Imagine a trustworthy space serving as a community for things that belong to me and use a lot of power. That space has default policies for power management negotiations. These aren't algorithms, necessarily, but heuristics that guide interactions between members.

In its turn, the car has a power management profile that defines part of its self-image and so it aligns nicely with the shared image of the power management space. Consequently, when the car is introduced to the household, it gravitates to the power management space because of the shared culture. It may join other spaces as well depending on its self-image and their culture.

My description is short on detail about how this culture is encoded and how things discover the cultures of spaces upon being introduced to the household, but it does provide a nice way to think about how large collections of things could self-organize and police themselves.

Gharajedaghi defines civilization as follows:

[C]ivilization is the emergent outcome of the interaction between culture (the software) and technology. Technology is universal, proliferating with no resistance, whereas cultures are local, resisting change with tenacity.

I like this idea of civilization emerging from a cultural overlay on our collections of things. By finding trustworthy spaces that are a cultural fit and then using that culture for decision making within a society of things, our connected things are tamed and become subject to our will.

Resources, Not Data

Rest Area?

You'll often hear people explain the mainstay HTTP verbs, GET, POST, PUT, and DELETE, in terms of the venerable CRUD (create, retrieve, update, and delete) functions of persistent storage systems. Heck, I do it myself. We need to stop.

In a RESTful API, the HTTP verbs are roughly analogous to the CRUD functions, but what they're acting on is quite different. CRUD functions act on data...static, stupid data. In REST, on the other hand, the verbs act on resources. While there are cases where a resource is just static data, that case is much less interesting than the general case.

To see how, let's pull out the old standby bank account example. In this example, I have a resource called /accounts and in a CRUD world, you could imagine deposits and withdrawals to an account with identifier :id being PUTs on the /accounts/:id resource.

Of course, we'd never expose an API where you could update an account balance with a PUT. In fact, I can't imagine anything you'd do with the account balance in such an API except GET it. There are too many necessary checks and balances (what we call "model invariants") that need to be maintained by the system.

Instead, what we'd do is design an account transfer resource. When we wanted to transfer $100.00 from /accounts/A to /accounts/B, we'd do this:

POST /transfers

  {
    "source": "/accounts/A",
    "destination": "/accounts/B",
    "amount": 100.00
  }

This creates a new transfers resource and while it's true that data will be recorded to establish that a new transfer was created, that's not why we're doing it. We're doing it to effect the transfer of money between two accounts. Underneath this resource creation is a whole host of processes to maintain the transactional integrity and consistency of the bank's books.
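A sketch of what the server behind POST /transfers might do. The in-memory accounts and the invariant checks are illustrative, not a real banking implementation:

```python
# Sketch of a /transfers resource: creating the resource *effects* the
# transfer while preserving the model invariants (no overdrafts, no
# balance changes outside a transfer).

accounts = {"A": 500.00, "B": 50.00}
transfers = []

def post_transfer(source, destination, amount):
    # Refuse transfers that would violate an invariant.
    if amount <= 0 or accounts[source] < amount:
        return None
    accounts[source] -= amount
    accounts[destination] += amount
    transfer = {"id": len(transfers) + 1,
                "source": source,
                "destination": destination,
                "amount": amount}
    transfers.append(transfer)   # the new resource, with its own id
    return transfer
```

Note that the balances are only ever read by clients; the only way to change them is to create a transfer resource, which is exactly the point.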

Interesting resources have workflow rather than just being a collection of data. So stop focusing on the HTTP verbs and think about the resources instead. REST is resource-oriented, and resources don't map neatly onto objects, relational databases, and remote procedure calls. Most bad APIs are the result of this mistaken attempt to understand REST in terms of old programming paradigms.

Tesla is a Software Company, Jeep Isn't

Tesla Sightings

Marc Andreessen has famously said that "software is eating the world." Venkatesh Rao calls software, "only the third major soft technology to appear in human civilization."

"So what?" you say. "I'm not in software, what do I care?"

You care, or should, because the corollary to this is that your company is a software company, whether you like it or not. Software is so pervasive, so important, that it has impacted or will impact every human activity.

The recent hacks of a Jeep Cherokee and Tesla Model S provide an important example of what it means to be a software company—even if you sell cars. Compare these headlines:

After Jeep Hack, Chrysler Recalls 1.4M Vehicles for Bug Fix

Researchers Hacked a Model S, But Tesla’s Already Released a Patch

If you were CEO of a car manufacturer, which of these headlines would you rather were written about you? The first speaks of a tired, old manufacturing model where fixes take months and involve expense and inconvenience. The second speaks of a nimble model more reminiscent of a smart phone than a car.

You might be thinking you'd rather not have either and, of course, that's true. But failure is inevitable; you can't avoid it. So mean-time-to-recovery (MTTR) is more important than mean-time-between-failures (MTBF) in the modern world. Tesla demonstrated that by not just having a fix, but by being able to deliver it over the air without inconvenience to their owners. If you're a Tesla owner, you might have been concerned for a few hours, but right now you're feeling like the company was there for you. Meanwhile Jeep owners are still wondering how this will all roll out.

The difference? Tesla is a software company. Jeep isn't.

Tesla can do over-the-air updates because the ideas of continuous delivery and after-sale updates are part of their DNA.

No matter what business you're in, there's someone, somewhere figuring out how to use software to beat or disrupt you. We've seen this over and over again with companies like Uber, FedEx, and Walmart that have used IT expertise to gain an advantage their competitors didn't have.

Being a software company requires a shift in your mindset. You have to stop seeing IT as the people who run the payroll system and make the PCs work. IT has to be part of the way you compete. In other words, software isn't just something you use to run your company. Software becomes something you use to beat the competition.

Authorization, Workflow, and HATEOAS

APIs present different authorization challenges than when people access a Web site or other service. Typically, API access is granted using what are called "developer keys" but are really an application-specific identifier and password (secret). That allows the API to track who's making what call for purposes of authorization, throttling, or billing.

Often, more fine-grained permissioning is needed. If the desired access control is for data associated with a user, the API might use OAuth. OAuth is sometimes called "Alice-to-Alice sharing" because it's a way for a user on one service to grant access to their own account at some other service.

For more fine-grained authorization than just user-data control, I'm a proponent of policy-engine-based access control to resources. A policy engine works in concert with the API manager to answer questions like "Can Alice perform action X on resource Y?" The big advantages of a policy engine are as follows:

  • A policy engine allows access control policy to be specified as pattern-based declarations rather than in algorithms embedded deep in the code.
  • A policy engine stops access at the API manager, saving resources below the manager from being bothered with requests that will eventually be thrown out.
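As a sketch of the first point, here's a toy pattern-based policy engine. The tuple format and glob-style patterns are my own illustration, not any particular policy language:

```python
import fnmatch

# Illustrative policy table: (subject, action, resource pattern, effect).
# First matching rule wins; anything unmatched is denied at the API manager.
policies = [
    ("alice", "read",  "/courses/*",       "allow"),
    ("alice", "write", "/courses/cs330/*", "allow"),
    ("*",     "*",     "/admin/*",         "deny"),
]

def is_allowed(subject, action, resource):
    """Answer: can subject perform action on resource?"""
    for subj, act, pattern, effect in policies:
        if subj in (subject, "*") and act in (action, "*") \
                and fnmatch.fnmatch(resource, pattern):
            return effect == "allow"
    return False  # default deny
```

Because the rules are declarative data, changing access policy means editing the table, not hunting down algorithms embedded deep in the code.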

Recently, Joel Dehlin got me thinking of another pattern for API access control that relies on workflow.

Consider an API for course management at a university. The primary job of the course management API is to serve as a system of record for courses that the university teaches. There are lots of details about how courses relate to each other, how they're associated with programs, assigned to departments, tied to expected learning outcomes, and so on. But we can ignore that for now. Let's just focus on how a course gets added.

The university doesn't let just anyone add classes. In fact, other than for purposes of importing data in bulk, no one has authority to simply add a class. Only proposals that have gone through a certain workflow and received the approvals required by the university's procedure can be considered bona fide courses.

So the secretary for the University Curriculum Committee (UCC) might only be allowed to add the class if it's been proposed by a faculty member, approved by the department, reviewed by the college, and, finally, accepted by the UCC. That is, the secretary's authorization is dependent on the current state of the proposal, and that state includes all the required steps.

This is essentially the idea of workflow as authorization. The authorization is dependent on being at the end of a long line of required steps. There could be alternative paths or exceptions along the way. At each step along the way, authorization to proceed is dependent on both the current state and the attributes of the person or system taking action.
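A sketch of workflow-as-authorization for the course proposal example, with hypothetical states and roles. The point is that whether an actor can act depends on where the proposal sits in the workflow, not just on who's asking:

```python
# Hypothetical workflow for a course proposal: state -> {role: next_state}.
WORKFLOW = {
    "proposed":            {"department_chair": "department_approved"},
    "department_approved": {"college_reviewer": "college_reviewed"},
    "college_reviewed":    {"ucc": "ucc_accepted"},
    "ucc_accepted":        {"ucc_secretary": "added"},
}

def can_act(role, state):
    """Authorization depends on both the actor's role and the current state."""
    return role in WORKFLOW.get(state, {})

def advance(role, state):
    if not can_act(role, state):
        raise PermissionError(f"{role} cannot act on a proposal in {state}")
    return WORKFLOW[state][role]

# The secretary can't add a newly proposed class...
state = "proposed"
assert not can_act("ucc_secretary", state)

# ...but can once every prior approval has happened, in order.
for role in ("department_chair", "college_reviewer", "ucc", "ucc_secretary"):
    state = advance(role, state)
```

Because the workflow is a declarative table, changing the university's procedure means editing data, not code.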

In the same way that we'd use a policy engine to normalize the application of policy for access control, we can consider the use of a workflow engine for many of the same reasons:

  • A general-purpose workflow engine makes the required workflow declarative rather than algorithmic.
  • Workflow can be adjusted as procedures change without changing the code.
  • Declarative workflow specifications are more readable than workflow hidden in the code.
  • A workflow engine provides a standard way for developers to create workflow rather than requiring every team to make it up.

One of our principles for designing the University API at BYU is to keep workflow below the API since we can't rely on clients to enforce workflow requirements. What's more, developers writing the clients don't want that burden. As we contemplated how best to put the workflow into the API, we determined that HATEOAS links were the best option.

If you're not familiar with HATEOAS, it's an awkward acronym for "hypertext as the engine of application state." The idea is straightforward conceptually: your API returns links, in addition to data, that indicate the best ways to make progress from the current state. There can be more than one since there might be multiple paths from a given state. Webber et al.'s How to GET a Cup of Coffee is a pretty good introduction to the concept.

HATEOAS is similar to the way web pages work. Pages contain links that indicate the allowed or recommended next places to go. Users of a web browser determine from the context of a link what action to take, and thus they progress from page to page.

In the API, the data returned contains links representing the allowed or recommended next actions, and the client code uses semantic information in the rel attribute of each link to present the right choices to the user. The client code isn't responsible for determining the correct actions to present to the user; the API does that.

Consider the application of HATEOAS to the course management example from above. Suppose the state of a course is that it's just been proposed by a faculty member. The next step is that it needs approval by the department chair. GETting the course proposal via the course management API returns the data about the proposed course, as expected, regardless of who the GET is for. What's different are the HATEOAS links that are also returned:

  • For the faculty member, the links might allow for updating or deleting the proposal.
  • For the department chair, the links might be for approving or rejecting the course proposal.
  • For anyone else, the only link might be to return a collection of proposals.
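Here's a sketch of how the API might compute those links. The rel names, URLs, and roles are illustrative, not a real link-relation registry:

```python
# Compute HATEOAS links from the proposal's state and the requester's role.
def links_for(role, state):
    links = [{"rel": "collection", "href": "/proposals"}]
    if state == "proposed":
        if role == "faculty":
            links += [
                {"rel": "update", "href": "/proposals/42", "method": "PUT"},
                {"rel": "delete", "href": "/proposals/42", "method": "DELETE"},
            ]
        elif role == "department_chair":
            links += [
                {"rel": "approve", "href": "/proposals/42/approval",
                 "method": "POST"},
                {"rel": "reject", "href": "/proposals/42/rejection",
                 "method": "POST"},
            ]
    return links
```

Every requester GETs the same proposal data; only the links differ, so clients can render choices from the rel values without hard-coding the workflow.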

Seen this way, a workflow engine is a natural addition to an API management system in the same way a policy engine is. And HATEOAS becomes something that can be driven from the management tool rather than being hard coded in the underlying application. I'm interested in seeing how this plays out.

Social Things, Trustworthy Spaces, and the Internet of Things


Humans and other gregarious animals naturally and dynamically form groups. These groups have frequent changes in membership and establish trust requirements based on history and task. Similarly, the Internet of Things (IoT) will be built from devices that must be able to discover other interesting devices and services, form relationships with them, and build trust over time based on those interactions. One way to think about this problem is to envision things as social and imagine how sociality can help solve some of the hard problems of the IoT.

Previously I've written about a Facebook of Things and a Facebook for My Stuff that describe the idea of social products. This post expands that idea to take it beyond the commercial.

As I mentioned above, humans and other social animals have created a wide variety of social constructs that allow us to not only function, but thrive in environments where we encounter and interact with other independent agents—even when those agents are potentially harmful or even malicious. We form groups and, largely, we do it without some central planner putting it all together. Individuals in these groups learn to trust each other, or not, on the basis of social constructions that have evolved over time. Things do fail and security breaks down from time to time, but those are exceptions, not the rule. We're remarkably successful at protecting ourselves from harm and dealing with anomalous behavior from other group members or the environment, while getting things done.

There is no greater example of this than a city. Cities are social systems that grow and evolve. They are remarkably resilient. I've referenced Geoffrey West's remarkable TED talk on the surprising math of cities and corporations before. As West says, "you can drop an atom bomb on a city and it will survive."

The most remarkable thing about city planning is perhaps the fact that cities don't really need planning. Cities happen. They are not only dynamic, but spontaneous. The greatness of a city is that it isn't planned. Similarly, the greatness of IoT will be in spontaneous interactions that no one could have foreseen.

My contention is that we want device collections on the Internet of Things to be more like cities, where things are heterarchical and spontaneous, than corporations, where things are hierarchical and planned. Where we've built static networks of devices with a priori determined relationships in the past, we have to create systems that support dynamic group forming based on available resources and goals. Devices on the Internet of Things will often be part of temporary, even transient, groups. For example, a meeting room will need to be constantly aware of its occupants and their devices so it can properly interact with them. I'm calling these groups of social things "trustworthy spaces."

My Electric Car

As a small example of this, consider the following scenario: suppose I buy an electric car. The car needs to negotiate charging times with the air conditioner, home entertainment system, and so on. The charging time might change every day. There are several hard problems in that scenario, but the one I want to focus on is group forming. Several things need to happen:

  • The car must know that it belongs to me. Or, more generally, it has to know its place in the world.
  • The car must be able to discover that there's a group of things that also belong to me and care about power management.
  • Other things that belong to me must be able to dynamically evaluate the trustworthiness of the car.
  • Members of the group (including the car) must be able to adjust their interactions with each other on the basis of their individual calculations of trustworthiness.
  • The car may encounter other devices that misrepresent themselves and their intentions (whether due to fault or outright maliciousness).
  • Occasionally, unexpected, even unforeseen events will happen (e.g. a power outage). The car will have to adapt.

We could also extend this situation to a group of devices that don't all belong to the same owner. For example, I'm at my friend's house and want to charge the car.

The requirements outlined above imply several important principles:

  • Devices in the system interact as independent agents. They have a unique identity and are capable of maintaining state and running programs.
  • Devices have a verifiable provenance that includes significant events from their life-cycle, their relationships with other devices, and a history of their interactions (i.e. a transaction record).
  • Devices are able to independently calculate and use the reputation of other actors in the system.
  • Devices rely on protecting themselves from other devices rather than on a system that prevents bad things from happening.

I'm also struck that other factors, like allegiance, might be important, but I'm not sure how at the moment. Provenance and reputation might be general enough to take those things into account.

Trustworthy Spaces

A trustworthy space is an abstract extent within which a group of agents interact, not a physical room or even geographic area. It is trustworthy only to the extent an individual agent deems it so.

In a system of independent agents, trustworthiness is an emergent property of the relationships among a group of devices. Let's unpack that.

When I say "trustworthiness," that doesn't imply a relationship is trustworthy. The trustworthiness might be zero, meaning it's not trusted. When I say "emergent," I mean that this is a property that is derived from other attributes of the relationship.

Trustworthy spaces don't prevent bad things from happening, any more than we can keep every bad thing from happening in social interactions. I think it's important to distinguish safety from security. We are able to evaluate security in relatively static, controlled situations. But usually, when discussing interactions between independent agents, we focus on safety.

There are several properties of trustworthy spaces that are important to their correct functioning:


Decentralized

By definition, a trustworthy space is decentralized because the agents are independent. They may be owned, built, and operated by different entities, and their interactions cross those boundaries.


Event-Driven

A trustworthy space is populated by independent agents. Their interactions with one another will be primarily event-driven. Event-based systems are more loosely coupled than other interaction methodologies. Events create a networked pattern of interaction with decentralized decision making. Because new players can enter the event system without others having to give permission, be reconfigured, or be reprogrammed, event-based systems grow organically.


Robust

Trustworthy spaces are robust. That is, they don't break under stress. Rather than trying to prevent failure, systems of independent agents have to accept failure and be resilient.

In designed systems we rely on techniques such as transactions to ensure that the system remains in a consistent state. Decentralized systems rely on retries, compensating actions, or just plain giving up when something doesn't work as it should. We have some experience with this in distributed systems that are eventually consistent, but that's just a start at what the IoT needs.
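A small sketch of the retry-then-compensate pattern that decentralized systems fall back on; the helper names here are illustrative, not from any particular framework:

```python
# Try an action a few times; if it never succeeds, run a compensating
# action to restore a consistent state instead of relying on a transaction.
def with_retries(action, compensate, attempts=3):
    for _ in range(attempts):
        try:
            return action()
        except Exception:
            continue
    compensate()  # give up, but leave the system consistent
    return None

calls = {"n": 0}
def flaky():
    # Fails twice, then succeeds, standing in for an unreliable device.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky, compensate=lambda: None)
```

Compensation isn't as strong a guarantee as a transaction, but it's what's available when the cooperating parties don't share a database.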

Inconsistencies will happen. Self-healing is the process of recognizing inconsistent states and taking action to remediate the problem. The system has to automatically monitor itself for anything that might be wrong and take corrective action.


Antifragile

More than robustness, antifragility is the property systems exhibit when they don't just cope with anomalies, but instead thrive in their presence. Organic systems exhibit antifragility; they get better when faced with random events.

IoT devices will operate in environments that are replete with anomalies. Most anomalies are not bad or even errors. They're simply unexpected. Antifragility takes robustness to the next level by not merely tolerating anomalous activity, but using it to adapt and improve.

I don't believe we know a lot about building systems that exhibit antifragility, but I believe that we'll need to develop these techniques for a world with trillions of connected things.

Trust Building

Trust building will be an important factor in trustworthy spaces. Each agent must learn which other agents to trust and to what level. These calculations will be constantly adjusted. Trust, reputation, and reciprocity (interaction) are linked in some very interesting ways. Consider the following diagram from a paper by Mui et al. entitled A Computational Model of Trust and Reputation:

The relationship between reputation, trust, reciprocity, and social benefit

We define reputation as the perception about an entity's intentions and norms that it creates through past actions. Trust is a subjective expectation an entity has about another's future behavior based on the history of their encounters. Reciprocity is a mutual exchange of deeds (such as favor or revenge). Social benefit or harm derives from this mutual exchange.

If you want to build a system where entities can trust one another, it must support the creation of reputations since reputation is the foundation of trust. Reputation is based on several factors:

  • Provenance—the history of the agent, including a "chain of custody" that says where it's been and what it's done in the past, along with attributes of the agent, verified and unverified.
  • Reciprocity—the history of the agent's interaction with other agents. A given agent knows about its interactions and the outcomes. To the extent they are visible, interactions between other agents can also be used.

Reputation is not static and it might not be a single value. Moreover, reputation is not a global value, but a local one. Every agent continually calculates and evaluates the reputation of every other agent. Transparency is necessary for the creation of reputation.
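As an illustration, here's one simple way an agent might compute a local reputation score for a peer from its interaction history, weighting recent outcomes more heavily. The scoring scheme is my own sketch, not the model from Mui et al.:

```python
# Each agent keeps a local reputation score per peer, derived from the
# outcomes of past interactions; recent outcomes count for more.
def reputation(outcomes, decay=0.9):
    """outcomes: +1 for a kept commitment, -1 for a broken one, oldest first."""
    score, weight = 0.0, 1.0
    for outcome in reversed(outcomes):  # walk newest to oldest
        score += weight * outcome
        weight *= decay
    total = sum(decay ** i for i in range(len(outcomes)))
    return score / total if total else 0.0  # normalized into [-1, 1]

# A recent betrayal hurts more than an old one with the same history.
assert reputation([+1, +1, -1]) < reputation([-1, +1, +1])
```

Because every agent runs this calculation over the interactions it can see, reputation stays local and subjective, exactly as described above.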

A Platform for Exploring Social Things

A few weeks ago I wrote about persistent compute objects, or picos. In the introduction to that piece, I wrote:

Persistent Compute Objects, or picos, are tools for modeling the Internet of Things. A pico represents an entity—something that has a unique identity and a long-lived existence. Picos can represent people, places, things, organizations, and even ideas.

The motivation for picos is to design infrastructure to support the Internet of Things that is decentralized, heterarchical, and interoperable. These three characteristics are essential to a workable solution and are sadly lacking in our current implementations.

Without these three characteristics, it's impossible to build an Internet of Things that respects people's privacy and independence.

Picos are a perfect platform for exploring social products. They come with all the necessary infrastructure built in. Their programmability makes them flexible enough and powerful enough to demonstrate how social products can interact through reputation to create trustworthy spaces.

Benefits of Social Things

Social things, interacting with each other in trustworthy spaces, offer significant advantages over static networks of devices:

  • Less configuration and set up time since things discover each other and set up mutual interactions on their own.
  • More freedom for people to buy devices from different manufacturers and have them work together.
  • Better overall protection from anomalies, perhaps even systems of devices that thrive in their presence.

Social things are a way of building a true Internet of Things instead of a CompuServe of Things.

My thoughts on this topic were influenced by a CyDentity workshop I attended last week put on by the Department of Homeland Security at Rutgers University. In particular, some of the terminology, such as "provenance" and "trustworthy spaces," were things I heard there that gelled with some of my thinking on reputation and social things.

Choosing a Car for Its Infotainment System


Recently when I've rented cars I've increasingly asked for a Ford. Usually a Ford Fusion.

It's true that I like Fords, but that's not why I ask for them when renting. I'm more concerned about a consistent user experience in the car's infotainment system.

I have a 2010 F-150 that has been a great truck. I wrote about the truck and its use as a big iPhone accessory when I first got it. The truck is equipped with Microsoft Sync and I use it a lot.

I don't know if Sync is the best in-car infotainment system or not. First, I've not extensively tried others. Second, car companies haven't figured out that they're really software companies, so they don't regularly update their systems. I've reflashed the firmware in my truck a few times, but I never saw any significant new features.

Even so, when faced with a rental car, I'd rather get something that I know how to use. Sync is familiar, so I prefer to rent cars that have it. I get a consistent, known user experience that allows me to get more out of the vehicle.

What does this portend for the future? Will we become more committed to the car's infotainment system than we are to the brand itself? Ford is apparently ditching Sync for something else. Others use Apple's system. At CES this past January there were a bunch of them. I'm certain there's a big battle opening up here and we're not likely to see resolution anytime soon.

Car manufacturers don't necessarily get that they're being disrupted by the software in the console. And those that do aren't necessarily equipped to compete. Between the competition in self-driving cars, electric vehicles, and infotainment systems, car manufacturers are in a pinch.