A System Ruleset for the Pico Engine


I have a problem: a long time ago, Kynetx built a ruleset management tool called AppBuilder. There are some important rulesets in AppBuilder. I'd like to shut down AppBuilder, but first I need to migrate all the important rulesets to the current ruleset registry. There's just one tiny thing standing in my way: I don't know which rulesets are the important ones.

Sure, I could guess and get most of them. Then I'd just wait for things to break to discover the rest. But that's inelegant.

My first thought was to write some code to instrument the pico engine. I'd increment a counter each time it loads a ruleset. That way I'd see what's being used. No guessing. I'd also need some way to get the data into the database and get it back out.

But then I had a better idea. Why not write instrumentation data into the persistent variable space of a system ruleset? The system ruleset can access and modify any of these variables. And it's flexible. Rather than making changes to the engine and rolling them to production each time I change the monitoring, I just update the system ruleset.

Right now, there's just one variable: rid_usage. The current system ruleset is simple. But it's a start. All the pieces are in place now to use this connection for monitoring, controlling, and configuring the pico engine.
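As a rough illustration of how that data can be pulled back out, here's a minimal Python sketch. It assumes the system ruleset shares a query function (called ridUsage here) over the engine's Sky Cloud API; the host, ECI, ruleset ID, and function name are placeholders, not the actual system ruleset's values.

    # Minimal sketch: read the rid_usage data the system ruleset accumulates.
    # Assumptions: the system ruleset shares a "ridUsage" query function over the
    # engine's Sky Cloud API; ENGINE_HOST, ECI, and SYSTEM_RID are placeholders.
    import requests

    ENGINE_HOST = "http://localhost:8080"   # pico engine host (placeholder)
    ECI = "SYSTEM_PICO_ECI"                 # event channel identifier (placeholder)
    SYSTEM_RID = "system.monitor"           # hypothetical system ruleset ID

    def fetch_rid_usage():
        """Query the system ruleset's shared function and return the usage map."""
        url = f"{ENGINE_HOST}/sky/cloud/{ECI}/{SYSTEM_RID}/ridUsage"
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return resp.json()  # e.g. {"a16x24.prod": 42, "b503x9.dev": 7}

    if __name__ == "__main__":
        for rid, count in sorted(fetch_rid_usage().items(), key=lambda kv: -kv[1]):
            print(f"{count:6d}  {rid}")

A report like this is enough to tell which AppBuilder rulesets are actually being loaded and which can be safely left behind.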

I like this idea a lot because KRL is being used to implement important services on the platform that implements KRL. Very meta... And when systems start to be defined in their own language, that's a good thing.


Failure and the Internet of Things

Summer Sprinkler

I'm now on my second Internet-connected sprinkler controller. The first, a Lono, worked well enough, although there were some features missing. Last week, I noticed that the program wasn't running certain zones. I wasn't sure what to do and I couldn't find help from Lono, so I decided I'd try a second one. My second purchase, based on both friends' recommendations and reviews on Amazon, was a Rachio. I installed it on Saturday.

As I was working on setting up the programs and experimenting with them, I noticed that the new sprinkler controller had stopped working. When I went to check on it, I discovered that it was completely dead: no lights, no response.

I rebooted the controller and started over. It got to the same point and the sprinkler controller died again. A little thought showed that the Rachio sprinkler controller was dying at exactly the same point that the Lono was failing to complete its program. The problem? A short in one of the circuits.

The Lono and the Rachio both fail at handling failure. The old controller, an Irritrol, just dealt with it and kept right on going. None of them, including the Irritrol, did a good job of telling me that I had a short-circuit.

Building sprinkler controllers is a tough job. The environment is dirty and wet. The valves and sensors are numerous and varied. I don't know about you, but it's a rare year I don't replace valve solenoids or rewire something. A sprinkler controller has to roll with this environment to pass muster. To be excellent, it has to help with debugging and solving the problems.

Fancy water-saving features, cool Web sites, and snazzy notifications are fine. But they're like gold-plated bathroom fixtures in a hotel room with dirty sheets if the controller doesn't do its basic job: run the sprinklers reliably.


Fitbit as Gizmo

Fitbit

In the taxonomy of Bruce Sterling's Shaping Things, the Fitbit is a Gizmo.

"Gizmos" are highly unstable, user-alterable, baroquely multifeatured objects, commonly programmable, with a brief lifespan. Gizmos offer functionality so plentiful that it is cheaper to import features into the object than it is to simplify it. Gizmos are commonly linked to network service providers; they are not stand-alone objects but interfaces. People within an infrastructure of Gizmos are "End-Users."

People buy Fitbits believing that they're buying a thing, but in fact, they're buying a network service. The device is merely the primary interface to that service. The Fitbit is useless without the service. Just a chunk of worthless plastic and silicon.

The device is demanding. We buy Fitbits and then fiddle with them incessantly. Again, to quote Bruce:

...Gizmos have enough functionality to actively nag people. Their deployment demands extensive, sustained interaction: upgrades, grooming, plug-ins, plug-outs, unsought messages, security threats, and so forth.

Sometimes we're messing with them because we're bored and relieve the boredom with a little configuration. Often we're forced to configure and reconfigure because it's not working. We feel guilt over buying something we're not using. Usually, the Fitbit ends up unused in a drawer after the guilt wears off and the pain of configuration overwhelms the perceived benefit.

Fitbit isn't selling things. They probably fancy themselves selling better health or fitness. But, Fitbit is really selling a way to measure, and perhaps analyze, some aspect of your life. They package it up like a traditional product and put it on store shelves, but the thing you buy isn't a traditional product. Without the service and the account underlying it, you have nothing.

Of course, I'm not talking about Fitbit alone. Fitbit is just a well-known example. Everything I've said applies to every current product in the so-called Internet of Things. They are all just interfaces to the real product: a networked service. I say "so-called" because a more appropriate name for the Gizmo ecosystem is CompuServe of Things.

Bruce's book is a trail guide to what comes after Gizmos: something he calls a "spime." Spimes are material instantiations of an immaterial system. They begin and end with data. Spimes will have a different architecture than the CompuServe of Things. To work, they will cooperate and interact in true Internet fashion.


Notes from Gluecon 2016

Gluecon Logo2016_Master

I took the following notes during various sessions at Gluecon 2016 at the Omni Interlocken in Broomfield, Colorado. Notes were live-tweeted during the event on @windley using Kevin Marks' Noter Live tool.

Melody Meckfessel:

starting off the day with how cloud accelerates innovation in software development

Three waves of cloud tools: Colocation, virtualized data centers, and 3rd wave: actual, global, flexible cloud

Goal is NoOps: auto everything. No need to manage or spin up servers. Write code, rather than manage servers

Kubernetes manages containers, supports multiple envs & container runtimes, 100% open source

Speaking of developers: "We keep raising the bar on ourselves"

Goal is to let developers focus on code. PaaS (e.g. appEngine) needs to evolve

Now: PaaS is a walled garden; Future: choice of tools, more complex apps, global scale

31% of time spent troubleshooting; often in time-critical situations; need better tools: trace, error reporting, prod debug

Duncan Johnson-Watt:

Up next: building apps on the blockchain

Hyperledger is new project from Linux Foundation.

Business is increasingly interested in permissioned blockchains rather than promiscuous or permissionless blockchains

Governance of blockchain will be incredibly important if we're going to bet on this technology

Requirements for blockchains vary greatly across different use cases

"This is too important to be owned by a single entity" speaking of distributed ledger tech

Hyperledger has ~50 members, >$6M in funding, 2300 membership requests

Brian Behlendorf is now executive director

Showing how to deploy a blockchain application with Cloudsoft AMP

Key concepts: shared ledger, smart contract, consensus network, membership, events, management, wallet, integration

Live demo of asset transfer. #brave Challenge is speed at the moment. Proof of asset ownership controls transfer

Mary Scotton:

Want to be diverse? Start by diversifying your Twitter feed.

Jen Tong:

Crash course in electrical engineering at start of #IoT session

Done with signals; moving on to components. "kind of like legos except sometimes they catch on fire"

bread boards, perf boards, jumpers, resistors, LEDs, push button, capacitor, servo motors

The recipe: put components on bread board, Arduino Uno converts component signals, feeds to Raspberry Pi over USB

Johnny Five is a JS framework for #IoT on Arduino Uno

Firebase is a real-time database; "data that is best served fresh"

Where's the Bus? as an example of real-time data. I care where the bus is now, not yesterday.

Collaborative drawing is another example where real-time matters. Much less interesting with several second lag

When you're working on your Arduino always do it unplugged or you'll be sad.

After getting the button and LED hooked up: we have a thing, but no Internet yet. Let's add Firebase

why live code when you can live copy/paste

Firebase shows button presses in FB console. "Now, let's go through the Internet to change the LED status"

Connects button to her slides. Button adds rick-roll to the slide.

Celebrate the first time something catches on fire

Slides are here: http://mimming.com/presos/internet-of-nodebots/index.html#/

Alex Balazs:

speaking on how Intuit is breaking up the monolith

Intuit moving everything to AWS; shutting down local data centers

TurboTax is a $2 Billion business; product managers don't want to touch it (other than updating tax logic)

our vision is to make tax prep obsolete; this makes TurboTax irrelevant

20yo tech stack; terrible, horrible, monolith. Written by tax specialists who became programmers.

Going beyond the interview to personal experience. Why ask a 20yo barista in NYC if they get California RR Retirement?

Can't replace TurboTax by creating something complex to replace it. #FAIL'd twice already. #gallslaw

2nd problem: trying to create better TurboTax instead of creating product to kill TurboTax

Breaking up the TurboTax monolith: everything as a service; quickly create frictionless experiences

Teams work at their own speed; teams are decoupled; services built for other teams.

Path forward: create a pirate ship. Everyone wants to be a pirate.

pirate ship means "this was not a sanctioned project"

took the narrowest part of TurboTax: vehicle registration, 3 screens; TurboTax interview has 53K screens

hardest problem to solve in TurboTax: what does back button do.

back button takes you back a screen; should it save the data or not?

Built vehicle registration in 4 weeks and pushed to production; Sr leadership then sanctioned project;

Old stack: changing user experience took 3 months. New stack: 1 week.

Now 14 most common topics in TurboTax are running on new stack

In a world with 50K interview screens, you can't build them manually. Intuit has a "tax player" for tax content

Intuit's ability to enter markets on new devices skyrocketed

Old product had 6 different "beaconing" libraries

In three years Intuit will have eliminated every line of code from the monolith and be completely service based

1. Everything as a service 2. Attack the monolith; 3. Build common application fabric (prescriptive on standards)

Rajesh Raman:

Time is Hard: Doing Meaningful Things with Data

Doing meaningful things with fast data

fast data continually reflects changing state; enables real time decision making

Time data often implies big data

Ex: sentiment analysis on Twitter; seismic sensor networks; data fusing from distributed sensors (phones in cars)

individual records are small; all have timestamps; repeated measurements yield time series

Value of time series data diminishes over time; 2 strategies: store nothing & store everything

tiered storage: recent data at high fidelity; older data at low fidelity; store analysis not raw sensor data

Batch processing is common, but reduces responsiveness

Alternative is stream processing; stream processor is stateful and incremental; typically using O(1) algorithm

stream processing: read-once, write-once. No do overs.

stream processing is not only far more timely, but also more efficient than batch processing.

BUT: you lose ability to rewind and do a redo.

Good news: simple primitives take you a long way; bad news: dealing with time is hard

merging streams by timestamp; skewed, irregular, bursty, laggy, jittery, lossy

skew: data from different time series arrive at different timestamps

irregular: aperiodic or unsteady periodicity

bursty: no activity for a while, then all arrive at once

laggy: difference between generation and receipt

skew happens all the time. must logically align data within each period; requires understanding data

skew requires alignment between the analysis period and the periodicity the data is arriving at

Burstiness and lag require deciding how long to wait

The longer you wait the more likely data will appear; but computation is less timely

wait time must be bounded in some way because of finite resources

Types of clocks: measurement time, receipt time, analytic/processing time

deadlines exclude data; two schemes: static guarantees timeliness while dynamic adapts to changing conditions

dynamic deadlines can be set based on how much data is being excluded by deadline

Elliot Turner:

Morning starts off with cognitive computing

Arthur Samuel wrote the first game-playing computer program (checkers) on the IBM 701

IBM's Deep Blue (chess) was 10M times faster than the IBM701; massively parallel with specialized chess playing ASICs

In 1997 Deep Blue was in top 500 super computers. Today the same compute is available on a $400 graphics card

IBM Watson was a multi-year project with a team of 15 people. Within a year, it could regularly beat some champions.

Problem domain: broad domain; complex language; high precision, accurate response

Real Jeopardy champions buzz in 80% of the time & answer correctly 80% of the time. That's incredible performance

Can't be solved with a lookup table. In 20,000 questions there are over 2500 types; the biggest bucket is 3%

Cognitive has evolved from systems that play games to multi-modal understanding of speech, emotions; all API driven

Cognitive system is a partnership between humans & computers

Cognitive computing depends on understanding, reasoning, and learning.

Cognitive systems are trained, not programmed. They work with humans to develop their capabilities.

Turning cognitive computing loose on Internet leads to interesting results. For example, it learned dogs are people.

The problem is that there's not "one truth." In some contexts people equate their dogs with people. But dogs aren't people

Now, understanding human speech is an API call away

Three REST calls: speech understanding -> translation -> text-to-speech yields a speech translator

Eric Norlin:

thanks Kim, Brian, and rest of the staff. They make the conference work

Brendan Burns:

Introspection: find out what went wrong; Insight is finding out things you didn't know

Introspection requirements: specification, status, events, attribution

Audit requirements: transparency, immutability, verification, restrictions & limits, automation (APIs)

Insight requirements: dynamic organization, interactive exploration, visualization

At first blush, IaaS checks a lot of these boxes, but not all of them (eg. immutability, monitoring APIs are wrong)

Containers are the right API object, can be immutable, and can be verifiable

Cluster management has organization, introspection, specification, status, events

Demo of KSQL for querying Kubernetes. Here's the repo: https://github.com/brendandburns/ksql

Kubernetes API server enables immutability by limiting actions on containers (eg ensure code checked in)

ABAC in cluster management allows policies to control actions (eg access, deployment)

Policy ex: Only allow certain people to create containers that come from specific registries/repos

Admission control policies can control resource use (eg auto-approve if resource overage explained in issue ticket)

Mosquito checks for things that haven't changed in 3 months. Finds dead resources.

Or find services or machines that restart the most.

Benjamin Hindman:

@alexwilliams interviewing

VMs didn't change things; containers and cluster management did.

Rosanna Myers:

Robots are lots cheaper; robots are safer;

A collaborative robot or cobot

KRL is Kuka Robot Language (or Kynetx Rule Language)

A 3D printer is a blind robot. Having vision is a good thing.

Big breakthrough is cloud robotics. Offload processing to the cloud. Ex: self-driving cars

Big advantages: all robots learn from experiences of the others

Cloud robotics provides design freedom, collaborative learning, and application development

Manufacturing is still a big area for robotics; only 10% of manufacturing is automated. Barrier: robots are hard to use

Research is another. 90% of research projects not repeatable. And the pipetting by hand for hours isn't fun.

Another: 12M people require full time care. Eg. attach robot arm to wheelchair

Joe Beda:

I made the term "production identity" up

Google systems have largely been in production for > 10yrs & are highly integrated

GOOG doesn't have all the answers, but they do have all the problems.

Solutions aren't as important as understanding how to break down and frame the problem

Question: how do we identify production services

Trend: Manual -> Automation

We have more things we're dealing with, and they change more often than they have in the past.

Evolving security: (1) network problem; lock down network (2) application security both operations and code analysis

Micro-segmentation: surround any piece of hardware with its own policy. chroot for your network

Does reachability imply authorization? Doesn't sound very secure

Microservices -> many connections between components. When a micro service has 100 connections, reachability doesn't cut it

Devops is therapy for large organizations

identity is a lower-level function than authn or authz

We can come up with a one-size-fits-all solution for production ID, whereas for authn and authz, not so much

Many applications have their own idea of a user. So secret stores become key translators

GOOG has LOAF: stuff in production has an identity that is transported ambiently

SPIFFE: Secure Production Identity Framework for Everyone

SPIFFE is dialtone for identity

SPIFFE ID: urn:spiffe:example.com:alpaca-service

Developer experience: get SPIFFE ID, give it key pair with certificate chain & root certs to trust

Cert usage: TLS verification and message signing

When we talk about message signing, think JWT

SPIFFE could be integrated in microservice and RPC frameworks, in smart "sidecar" proxies, & off-the-shelf systems

Future directions for SPIFFE: federation, authorization, delegation, capability tokens

See https://spiffe.io for more information

Chris Richardson:

speaking on pattern languages for microservices

Successful software development depends on architecture, process & organization

Organization should be small, autonomous teams

Process should be agile

There's no silver bullet for architecture (reference to Fred Brooks)

An architecture pattern is a reusable solution to a problem in a particular context

Patterns force you to consider tradeoffs: benefits, drawbacks, issues to resolve

Patterns force you to consider other patterns: alternative solutions and patterns that address issues this pattern introduces

Microservices pattern available at http://microservices.io

Infrastructure patterns include deployment and communication patterns

Core patterns include cross-cutting concerns

Application patterns include database architectures and data consistency

Monolithic architectures are relatively simple to develop, test, deploy, & scale (in certain contexts)

Problem is that successful applications keep growing; adding code day-after-day; you end up with a "ball of mud"

Monolithic architectures break the process goals of agile and continuous delivery & the org goal of autonomous teams

Microservices architecture functionally decomposes app into many services intermediated by API gateways

Microservice architecture drawbacks: complexity, IPC, partial failure, transactions spanning multiple services; testing is hard

Issues: deployment; communication, partitioning; distributed data management

Shared databases lead to tight coupling between services; each service needs its own data store

Data store per service -> services communicating via API only

Event-driven, eventually consistent architecture is solution to data store per service downsides

Dual-write problems are traditionally solved using transactions. Instead, must reliably publish events; use event sourcing

2nd problem: queries are no longer easy across several services; pattern is CQRS and materialized views

There are many more patterns for deployment, communication, etc.

Mark VanderWiele:

Connect and control #IoT devices in minutes using voice commands

Architecture uses MQTT to publish & subscribe data from device; processing in cloud; connecting homekit & TI devs

Learned: make devices more self-describable; allows generic UIs that devices plug into and work

Voice is the last mile in device interaction

Doing demos: monitor and control a device with voice commands

Demo using services from Bluemix services catalog

"ambient computing at your disposal"

Node-RED used to get commands from speech application; program processes keywords; sends JSON using MQTT to robot

Using an iPod touch as the HomeKit gateway; using another iPod touch as gateway for bluetooth spheros

Created composite applications from multiple device types and the IoT foundation

Capabilities unfortunately change when manufacturers send firmware updates

John Musser:

APIs can be great, but not always... API Ops is the answer

APIs go down; have unversioned changes; API Ops to the rescue

API Ops is like DevOps for APIs

API Ops should build, test, and deploy APIs more reliably.

API Ops and Dev Ops are similar, but different in subtle ways

We're seeing more and more stories about API failures.

API Ops: design, build, test & release APIs more rapidly, frequently, & reliably

Elephant in the room is micro services; DevOps necessary for managing all these services.

Use of API specification has exploded. So has the number of API tools

Why all the tools? The API Lifecycle. 1st gen focused on operation. 2nd gen focused on the rest

API Lifecycle: requirements, design, development, test, deployment, and operations

DevOps is about looking at entire lifecycle. API Ops is similarly focused on entire lifecycle

Going meta: APIs for API Ops

The entire API lifecycle can be controlled with APIs


I Am Sybil

Split personalities

Online, I am Sybil. So are you. You have no digital representation of your individual identity. Rather, you have various identities, disconnected and spread out among the administrative domains of the various services you use.

An independent identity is a prerequisite to being able to act independently. When we are everywhere, we are nowhere. We have no independent identity and are thus constantly subject to the intervening administrative identity systems of the various service providers we use.

Building a self-sovereign identity system changes that. It allows individuals to act and interact as themselves. It allows individuals to have more control over the way they are represented and thus seen online. As the number of things that intermediate our lives explodes, having a digital identity puts you at the center of those interchanges. We gain the power to act instead of being acted upon.

This is why I believe the discussion of online privacy sells us short. Being self-sovereign is about much more than controlling how my personal data is used. That's playing defense and is a cheap substitute for being empowered to act as an individual. Privacy is a mess of pottage compared to the vast opportunities that being an autonomous digital individual enables.

Technically, there are several choices for implementing a self-sovereign identity system. Most come down to one of three choices:

  • a public, permissionless distributed ledger (blockchain)
  • a public, permissioned distributed ledger
  • a private, permissioned distributed ledger1

Public or private refers to who can join—anyone can join a public ledger. A public system allows anyone to get an identity on the ledger. Private systems restrict who can join. I owe this categorization to Jason Law.

Permissioned and permissionless refers to how the ledger's validators are chosen. As I discussed in Properties of Permissioned and Permissionless Blockchains, these two types of ledgers provide a different emphasis on the importance of protection from censorship and protection from deletion. People of a more libertarian bent will prefer permissionless because of its emphasis on protection from censorship, while those who need to work within regulatory regimes will prefer permissioned.

We could debate the various benefits of each of these types of self-sovereign identity systems, but in truth they are all preferable to what we have today, as each allows individuals to create and control identities independent of the various administrative domains with which people interact. In fact, I suspect that one or more instantiations of each of these three types will exist in parallel to serve different needs. Unlike the physical world where we live in just one place, online we can have a presence in many different worlds. People will use all of these systems and more.

Regardless of the choices we make, the principle that ought to guide the design of self-sovereign identity systems is respect for people as individuals and ensuring they have the ability to act as such.

In my discussion on the CompuServe of Things, I said:

"On the Net today we face a choice between freedom and captivity, independence and dependence."

I don't believe this is overstated. As more and more of our lives are intermediated by software-based systems, we will only be free if we are free to act as peers of these services. An independent identity is the foundation for that freedom to act.


  1. A private, permissionless ledger is an oxymoron.


Building a Virtual University

Imagine you wanted to create a virtual university (VU)1. VU admits students and allows them to take courses in programs that lead to certificates, degrees, and other credentials. But VU doesn't have any faculty or even any courses of its own. VU's programs are constructed from courses at other universities. VU's students take courses at whichever university offers them. In an extreme version of this model, VU doesn't even credential students. Rather, those come from participating institutions who have agreed, on a program-by-program basis, to accept certain transfer credits from other participating universities to fulfill program requirements.

Tom Goodwin writes

Uber, the world’s largest taxi company, owns no vehicles. Facebook, the world’s most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world’s largest accommodation provider, owns no real estate.

These companies are thin layers sitting on an abundance of supply. They connect customers to that supply. VU follows a similar pattern. VU has no faculty, no campus, no buildings, no sports teams. VU doesn't have any classes of its own. Moreover, as we'll see, VU doesn't even have much student data. VU provides a thin layer of software that connects students anywhere in the world with a vast supply of courses and degree programs.

There are a lot of questions about how VU would work, but what I'd like to focus on in this post is how we could construct (or, as we'll see later, deconstruct) IT systems that support this model.

Traditional University IT System

Before we consider how VU can operate, let's look at a simple block model of how traditional university IT systems work.

traditional university system design
Traditional University IT System Architecture

Universities operate three primary IT systems in support of their core business: a learning management system (LMS), a student information system (SIS), and a course management system.2

The LMS is used to host courses and is the primary place that students use course material, take quizzes, and submit assignments. Faculty build courses in the LMS and use it to evaluate student work and assign grades.

The SIS is the system of record for the university and tracks most of the important data about students. The SIS handles student admissions, registrations, and transcripts. The SIS is also the system that a university uses to ensure compliance with various university and government policies, rules, and regulations. The SIS works hand-in-hand with the course management system that the university uses to manage its offerings.

The SIS tells the LMS who's signed up for a particular course. The LMS tells the SIS what grades each student got. The course management system tells the SIS and LMS what courses are being offered.

Students usually interact with the LMS and SIS through Web pages and dedicated mobile apps.

VU presents some challenges to the traditional university IT model. Since these university IT systems are monoliths, you might be able to do some back-end integrations between VU's systems and the SIS and LMS of each participating university. The student would then have to use VU's systems and those of each participating university.

The Personal API and Learning Records

I've written before about BYU's efforts to build a university API. A university API exposes resources that are directly related to the business of the university such as /students, /instructors, /classes, /enrollments, and so on. Using a standard, consistent API, developers can interact with any relevant university system in a way that protects university and student data and ensures that university processes are followed.
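As a purely illustrative example, a tool built against a university API shaped this way might fetch a student's enrollments as shown below; the base URL, resource paths, and token handling are assumptions for the sketch, not BYU's actual API.

    # Illustrative sketch of a client calling a university API's /students and
    # /enrollments resources. The base URL, paths, and auth are hypothetical.
    import requests

    API_BASE = "https://api.university.example.edu"  # placeholder base URL
    TOKEN = "OAUTH_ACCESS_TOKEN"                     # granted via OAuth (placeholder)
    HEADERS = {"Authorization": f"Bearer {TOKEN}"}

    def get_enrollments(student_id):
        """Return the student's enrollments, assuming the caller is authorized."""
        resp = requests.get(
            f"{API_BASE}/students/{student_id}/enrollments",
            headers=HEADERS,
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    for enrollment in get_enrollments("123456789"):
        print(enrollment.get("class_id"), enrollment.get("status"))

The point isn't the particular paths; it's that every consumer sees the same consistent, permissioned interface no matter which back-end system holds the data.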

We've also been exploring how a personal API functions in a university setting. For purposes of this discussion, let's imagine a personal API that provides an interface to two primary repositories of personal data: the student profile and the learning record store (LRS). The profile is straightforward and contains personal information that the student needs to share with the university in order to be admitted, register for and take courses, and complete program requirements.

student profile
A Student Profile

The LRS stores the stream of all learning activities by the student. These learning activities include things like being admitted to a program, registering for a class, completing a reading assignment, taking a quiz, attending class, getting a B+ in a class, or completing a program of study. In short there is no learning activity that is too small or too large to be recorded in the LRS. The LRS stores a detailed transcript of learning events.3
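To make that concrete, here's a minimal sketch of recording one such event as an xAPI-style statement, following the actor-verb-object shape described in the footnote. The LRS endpoint, credentials, and activity IRIs are illustrative placeholders, not part of any real system.

    # Sketch: post a single xAPI-style learning statement to a student's LRS.
    # The endpoint URL, credentials, and activity IRIs below are hypothetical.
    import requests

    LRS_ENDPOINT = "https://student.example.edu/api/lrs/statements"  # placeholder
    AUTH = ("lrs_client_id", "lrs_client_secret")                    # placeholder

    statement = {
        "actor": {"name": "Phillip", "mbox": "mailto:phillip@example.edu"},
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/completed",
            "display": {"en-US": "completed"},
        },
        "object": {
            "id": "https://university.example.edu/courses/cs462/reading-10",
            "definition": {"name": {"en-US": "Reading Assignment 10"}},
        },
    }

    resp = requests.post(
        LRS_ENDPOINT,
        json=statement,
        auth=AUTH,
        headers={"X-Experience-API-Version": "1.0.3"},
    )
    resp.raise_for_status()
    print("recorded:", resp.status_code)

Any system the student interacts with—an LMS, a registration system, even a reading app—could write statements like this, which is what makes the LRS a stream rather than a snapshot.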

One significant contrast between the traditional SIS/LMS that students have used and the LRS is this: the SIS/LMS is primarily a record of the current status of the student that records only coarse-grained achievements, whereas the LRS represents the stream of learning activities, large and small. The distinction is significant. My last book was called The Live Web because it explored the differences between systems that make dynamic queries against static data (the traditional SIS/LMS) and those that perform static queries on dynamic streams of data. The LRS is decidedly part of the live web.

The personal API, as its name suggests, may provide an interface to any data that the person owns, but right now, we're primarily interested in the profile and LRS data. For purposes of this discussion, we'll refer to the combination of profile and LRS as the "student profile."

We can construct the student profile such that it can be hosted. By hosted, I mean that the student profile is built in a way that each profile could, potentially, be run on different machines in different administrative domains, without loss of functionality. One group of students might be running their profiles inside their university's Domain of One's Own system, another group might be using student profiles hosted by their school, other students might be using a commercial product, and some intrepid students might choose to self-host. Regardless, the API provides the same functionality independent of the domain in which the student profile operates.

Even when the profile is self-hosted, the information can still be trusted because institutions can digitally sign accomplishments so others can be assured they're legitimate.
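Here's a rough sketch of what that signing might look like using an Ed25519 keypair; the accomplishment format and key handling are illustrative assumptions, not a description of a deployed system.

    # Sketch: an institution signs an accomplishment so a self-hosted profile can
    # present it and a third party can verify it. Formats here are illustrative.
    import json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # The institution holds the private key; the public key is published.
    institution_key = Ed25519PrivateKey.generate()
    public_key = institution_key.public_key()

    accomplishment = {
        "student": "https://student.example.edu/profile",
        "achievement": "B+ in CS 462",
        "issuer": "https://university.example.edu",
        "date": "2016-06-01",
    }
    payload = json.dumps(accomplishment, sort_keys=True).encode("utf-8")
    signature = institution_key.sign(payload)

    # Anyone with the institution's public key can check the record's integrity;
    # verify() raises InvalidSignature if the record was tampered with.
    public_key.verify(signature, payload)
    print("accomplishment verified")

Because verification only needs the institution's public key, it doesn't matter where the profile happens to be hosted.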

Deconstructing the SIS

With the personal-API-enabled student profile, we're in a position to deconstruct the University IT systems we discussed above. As shown in the following diagram, the student profile can be separated from the university system. They interact via their APIs. Students interact with both of them through their respective APIs using applications and Web sites.

student profile and university systems
Interactions of the University and Student Profile

The university API and the student profile portions of the personal API are interoperable. Each is built so that it knows about and can use the other. For example, the university API knows how to connect to a student profile API, can understand the schema within, respects the student profile's permissioning structures, and sends accomplishments to the LRS along with other important updates.

For its part, the student profile API knows how to find classes, see what classes the student is registered for, receive university notifications, check grades, connect with classmates, and send important events to the university API.

VU can use both the university systems and the student profile. Students can access all three via their APIs using whatever applications are appropriate.

adding a virtual university
Adding a Virtual University

The VU must manage programs made from courses that other universities teach and keep track of who is enrolled in what (the traditional student records function). But VU can rely on the university's LMS, major functions of its SIS, and information in the student profile to get its job done. For example, if VU trusted that the student profile would be consistently available, it would need to know who its students are, but could evaluate student progress using transcript records written by the university to the student profile.

Building the Virtual University

With this final picture in mind, it's easy to see how multiple universities and different student profile systems could work together as part of VU's overall offerings.

virtual university
The Virtual University

With these systems in place, VU can build programs from courses taught at many universities, relying on them to do much of the work in teaching students and certifying student work. Here is what VU must do:

  • VU still has the traditional college responsibility of determining what programs to offer, selecting courses that make up those programs, and offering those to its students.
  • VU must run a student admissions process to determine who to admit to which programs.
  • VU has the additional task of coordinating with the various universities that are part of the consortium to ensure they will accept each other's courses as prerequisites and, if necessary, as equivalent transfer credits.
  • VU must evaluate student completion of programs and either issue certifications (degrees, certificates of completion, etc.) itself or through one of its member institutions.

Universities aren't responsible for anything more than they already do. Their IT systems are architected differently to have an API and to interact with the student profile, but otherwise they are very similar in functionality to what is in place now. Accomplishments at each participating institution can be recorded in the student profile.

VU students apply to and are admitted by VU. They register for classes with VU. They look to VU for program guidance. When they take a class, they use the LMS at the university hosting the course. The LMS posts calendar items and notifications to their student profile. The student profile becomes the primary system the student uses to interact with both VU and the various university LMSs. They have little to no interaction with the SIS of the respective universities.

One of the advantages of the hosted model for student repositories is that they don't have to be centrally located or administered. As a result student data can be located in different political domains in accordance with data privacy laws.

Note that the student profile is more than a passive repository of data that has limited functionality. The student profile is an active participant in the student's experience, managing notifications and other communications, scheduling calendar items, and even managing student course registration and progress evaluation. The student profile becomes a personal learning environment working on the student's behalf in conjunction with the various learning systems the student uses.

Since the best place to integrate student data is in the student profile, it ought to exist long before college. Then students could use their profile to create their application. There's no reason high school activities and results from standardized testing shouldn't be in the LRS. Student-centric learning requires student-centric information management.

We can imagine that this personal learning environment would be useful outside the context of VU and provide the basis for the student's learning even after she graduates. By extending it to allow programs of learning to be created by the student or others, independent of VU, the student profile becomes a tool that students can use over a lifetime.

The design presented here follows the simple maxim that the student is the best place to integrate information about the student. By deconstructing the traditional centralized university systems, we can create a system that supports a much more flexible model of education. APIs provide the means of modularizing university IT systems and creating a student-centric system that sits at the heart of a new university experience.



  1. Don't construe this post to be "anti-university." In fact, I'm very pro-university and believe that there is great power in the traditional university model. Students get more when they are face-to-face with other learners in a physical space. But that is not always feasible and once students leave the university, their learning is usually much less structured. The tools developed in this post empower students to be life-long learners by making them more responsible for managing their own learning environment.
  2. Universities, like all large organizations, also use things like financial systems, HR systems, and customer management systems, along with a host of other smaller and support IT systems. But I'm going to ignore those in this post since they're really boring and not usually specialized from those used by other businesses.
  3. The xAPI is the proposed way that LRSs interact with each other and other learning systems. xAPI is an event-based protocol that communicates triples that have a subject, verb, and object. Using xAPI, systems can communicate information such as "Phillip completed reading assignment 10." In my looser interpretation, an LRS might also store information from Activity Streams or other ways that event-like information can be conveyed.


Why Companies Need Self-Sovereign Identity

347/365 Trip to the docs

The Problem

While the Internet seems to have made most everything else in our lives easier, faster, and more convenient, certain industries like healthcare and finance remain stubbornly in the 20th century.

Consider the problem of a person using their medical data. Most of us have multiple healthcare providers ranging from huge conglomerates that run hospitals and clinics to small businesses operated by doctors to retail pharmacies large and small. None of us get healthcare under a single administrative umbrella. Consequently, we leave a trail of medical data spread about the various healthcare providers' incompatible systems.

In this setup, patients can’t easily identify themselves across different administrative domains. Consequently, linking patient data is made more complicated than it otherwise would be. Interoperability of systems is nearly impossible. The result is, at best, increased costs, inefficiency, and inconvenience. At worst, people die.

The seemingly obvious solution is to create a single national patient identifier. But can you hear the howls and screams? This solution has never gained traction because of privacy concerns and fear of corporate and government control and overreach.

What if there were an end-run around a national patient ID? There is.

Self-Sovereign Identity

Self-sovereign identity is based on the premise that the best point for integrating data about a person is the person. At first the idea seems fanciful, but we've arrived at a point where self-sovereign identity is feasible.

You’d have to be living in a cave to have not heard something about Bitcoin in the past few years. But if you’re not a Bitcoin geek, you might not realize that a critical underlying piece of technology that Bitcoin introduced—and popularized—is the blockchain.

A blockchain is a revolutionary way of building a distributed ledger, and distributed ledgers have exciting possibilities for solving sticky identity problems.

Using a distributed ledger, we can create an identity system that:

  • is not owned and controlled by anyone—in the same manner that the Internet is a set of protocols and some loose governance1, distributed ledgers allow us to create identity systems that are usable by everyone equally, without any single entity being able to control it.
  • provides people-controlled identifiers—this is the essence of the term self-sovereign. The identity is under the control of the person using it.
  • is persistent—since the identifier is under the sovereign control of the patient, it is irrevocable and long-lived, potentially for life.
  • is multipurpose—the identifier used at one healthcare provider can be used at another, not to mention your financial institution, your school, and anywhere else.
  • is privacy enhancing—only the parties to any given transaction can see any of the details of the transaction, including who’s involved. People choose what to reveal and to whom in exchange for the services they need. People can have multiple identifiers that correspond to different personas.
  • is trustable—the distributed ledger and its governance process, both human and algorithmic, provide a means whereby systems using the identity can instantly verify claims against third-parties when self-assertion (of the patient) is insufficient.

Healthcare providers, financial institutions, educational institutions and even governments can be part of this identity system without any of them being in charge of it. Sound impossible? It’s not; that’s how the Internet works today. Not only is it possible, it’s technically feasible right now.

The benefits of using self-sovereign identity go well beyond interoperability and include secure messaging, auditable transaction logs, and natural ways to manage consent.

How Does Self-Sovereign Identity Work?

Think about your online identities. Chances are you don’t control any of them. Someone else controls them and could, without recourse, take them away. While most people can see the problems with that scenario, they can’t imagine another way. They think, “if I want to use an online service, I need an identity from it, right?” We don’t.

There are lots of examples online right now. Have you ever used Google or Facebook to log into some other system (called a “relying party”)? Most of us have. When you log into another system with Facebook, you are using your Facebook identity to establish an account on the other system. Identities and accounts are not the same thing.

Plenty of places already see the value in reducing friction by allowing people to use an online identity from a popular identity provider (e.g. Google or Facebook).

Now imagine that the identity you used to log into a relying party didn’t come from a big company, but instead was something you created yourself on a distributed-ledger-based identity system. And further, imagine that relying parties could trust this identity because of its design. This is the core idea behind self-sovereign identity.

Any online service could use a self-sovereign identity. In fact, because of the strong guarantees around verified claims that a distributed-ledger-based identity system can provide, a self-sovereign identity is significantly more secure and trustworthy than an identity from Facebook, Google, and other social-network-based identity providers, making it usable in healthcare, finance, and other high-security applications.

Companies Need Self-Sovereign Identity

The bottom line is that companies need self-sovereign identity as much as people. Self-sovereign identity based on a distributed-ledger is good for people because it puts them in control of their data and protects their privacy.

But it's also good for companies. This is a classic win-win scenario because the same technology gives companies, governments, and other institutions an identity system they can trust, that enhances their operational capabilities, and is not their responsibility to administer. Organizations can get out of the business of being identity providers, an enticing proposition for numerous reasons. Here are a few examples:

  • As companies face more and more security threats, they are coming to see personally identifying information as a liability. A self-sovereign identity system provides a means for companies to get the data they need to complete transactions without having to keep it in large collections.
  • Claims that have been verified for one purpose can be reused in other contexts. For example, if my financial institution has verified my address as part of their know your customer process, that verified address could be used by my pharmacy.

The Internet got almost everyone except the telecom companies out of the long-haul networking business: throw your packets on the Internet and pick them up at their destination. Think of the efficiencies this has provided. A distributed-ledger-based identity system can do the same thing for identity.

Let’s create one, shall we?


  1. The Internet is governed (mostly) by protocol: the rules that describe how machines on the Internet talk to each other. These rules, determined by "rough consensus and running code," are encoded in software. There are other decisions that need to be made by humans. For example: who gets blocks of IP addresses, how they're handed out, and what top-level domains are acceptable in the domain name system. These decisions are handled by a body called ICANN. This system, while far from perfect, manages to make decisions about the interoperation of a decentralized infrastructure that allows anyone to join and participate.


Principles of Self-Sovereign Identity

Christopher Allen has a nice slide deck online, Identity on the Blockchain: Perils and Promise from his talk at Consensus 2016 Identity Workshop.

His discussion of self-sovereign identity and the principles he believes identity systems ought to possess start on slide 6:

There's tremendous power in the simple declaration that every human being is the original source of their identity. Think about your online identities. Chances are you don't control any of them. Someone else controls them and could, without recourse or appeal, take them from you.

This is, of course, untenable. As software intermediates more and more of our lives we must either gain control of our online identities or be prepared to surrender key rights and freedoms that we have taken for granted in the physical world.

In the principles, Chris lays out necessary attributes that a self-sovereign identity system must have to protect human freedom1:

  1. Existence People have an independent existence — they are never wholly digital
  2. Control People must control their identities, celebrity, or privacy as they prefer
  3. Access People must have access to their own data — no gatekeepers, nothing hidden
  4. Transparency Systems and algorithms must be open and transparent
  5. Persistence Identities must be long-lived — for as long as the user wishes
  6. Portability Information and services about identity must be transportable by the user
  7. Interoperability Identities should be as widely usable as possible; e.g. cross borders
  8. Consent People must freely agree to how their identity information will be used
  9. Minimization Disclosure of claims about an identity must be as few as possible
  10. Protection The rights of individual people must be protected against the powerful

I could quibble with some of the wording, but I think this is a pretty good list. Identity systems that support these principles are possible. And they would not just work with existing administrative systems (which no one is proposing would go away), but enhance them.

I know there are some people reading this and thinking of all the reasons it will never work. If you've got specific comments, questions, or critiques, feel free to use the annotation system on the right of the page to post them. Let's have a discussion.


  1. I've changed "user" to "people" in this list because Doc Searls long ago conditioned me to associate it with "drug user" and I've never recovered.


Building a Personal API

Honey bee having a drink

As part of my abstract for a talk at APIStrat 2016, I wrote that BYU was interested in equipping students with personal APIs in an effort to teach them digital autonomy and make them more responsible for their learning.

Unlike others who attempted personal APIs, we're lucky in that we control both sides of the transaction. That is, we have control of the university systems and we have students who will generally use the tools we build for them. So rather than build a conventional tool to, say, create a directory, we can build one that uses the student's personal API. The university systems still work and students get a new tool that they learn to use. This is meaningful because students have a significant number and variety of interactions with the university, making it a great place to explore how architectures based on personal APIs can be designed and used.

The BYU Domains project provides a free domain name, with hosting, for every matriculated student and all faculty and staff. BYU Domains gives students an online space of their own to teach them that they can be contributors to online conversations and introduce them to the concept of self-sovereign identity. BYU Domains is a cPanel-based hosting system. Consequently, students can install numerous applications and give each a unique URL through subdomains, paths, or both. This makes a BYU Domain site a great place to host a personal API.

The BYU Domains project is creating a community directory. The purpose of the directory is to highlight uses of BYU Domains to showcase how different members of the campus community are using them. Here's University of Mary Washington's directory as an example.

The traditional way to build this would be to write some software that looks in the BYU Domains administrative database and uses the information stored there to create a directory. Here's a simple diagram illustrating that approach:

community directory traditional architecture
A traditional architecture for the community directory

In this architecture the directory is tightly coupled to the specific system used to implement BYU domains. The application is dependent on the schema in the database. Any changes to the underlying system would necessitate that the directory application be updated as well.

An alternative that avoids this problem is to architect the directory application as an aggregator of information from the personal APIs of any domains that want to participate in the directory. The following diagram illustrates that architecture.

community directory api architecture
A personal-API-based architecture for the community directory

In this architecture, the directory uses information from the various APIs of domains that participate. The directory application has a database for information in its domain as well as for caching information from the various APIs. We're unlikely to render a page for the directory by synchronously calling dozens of individual APIs. This design gives the directory application the traditional advantages of API-based architectures:

First, the API represents a contract upon which the directory system can depend. The contract insulates the directory from underlying implementation changes in the personal domain.

Second, the directory has access to the APIs because their owners have authorized the connection using OAuth. Domain owners can take themselves out of the directory at any point by visiting the Authorizations control panel in their domain. This eliminates the need for data sharing agreements and other information governance hurdles.

Third, domains that aren't part of the BYU Domains administrative system can still be part of the directory. For example, a faculty domain hosted by her department could just as easily be featured in the directory as one hosted by BYU Domains.
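A minimal sketch of that aggregation loop might look like the following; the participating domains, profile path, and token handling are assumptions for illustration only, not the committee's actual design.

    # Sketch of the community directory aggregator: pull profile data from each
    # participating domain's personal API (authorized via OAuth) and cache it.
    # The profile path, token source, and domain list are illustrative assumptions.
    from typing import Optional
    import requests

    PARTICIPATING_DOMAINS = [
        "https://alice.byu.example.edu",
        "https://bob.byu.example.edu",
    ]

    def load_token(domain):
        """Return the OAuth access token the domain owner granted the directory."""
        # In a real system this would come from the directory's token store.
        return "ACCESS_TOKEN_FOR_" + domain

    def fetch_profile(domain) -> Optional[dict]:
        """Fetch the profile from one personal API; None if revoked or unreachable."""
        try:
            resp = requests.get(
                f"{domain}/api/profile",
                headers={"Authorization": f"Bearer {load_token(domain)}"},
                timeout=5,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            return None  # owner revoked access or the domain is down; skip it

    # Run periodically; the directory renders pages from this cache, not live calls.
    directory_cache = {}
    for domain in PARTICIPATING_DOMAINS:
        profile = fetch_profile(domain)
        if profile is not None:
            directory_cache[domain] = profile
    print(f"cached {len(directory_cache)} profiles")

Note that revoking the OAuth grant is all a domain owner has to do to drop out of the directory; the next aggregation run simply skips them.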

BYU's University API Committee is currently exploring how to make this work. We've had some good discussions on creating personal APIs that are built incrementally as data is needed, and done flexibly using JSON-LD markup to communicate schema information. I'm excited by the possibilities. We're also exploring API management that can run on the individual domains.
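For instance, a profile response might use a JSON-LD @context to tell consumers how to interpret its fields; the vocabulary choice (schema.org) and the fields shown are illustrative assumptions, not a settled design.

    # Illustrative only: a profile response using a JSON-LD @context to convey
    # schema. The vocabulary (schema.org) and fields are assumptions, not a design.
    import json

    profile_response = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": "Alice Student",
        "url": "https://alice.byu.example.edu",
        "description": "Studying computer science; blogging about personal APIs.",
    }

    # A consumer like the community directory can use the @context to interpret
    # fields it hasn't seen before, rather than depending on a fixed, shared schema.
    print(json.dumps(profile_response, indent=2))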

The BYU Domains Community Directory project gives us a good excuse to design and build a basic personal API application for BYU Domains. Follow-on projects, like an xAPI-responsive learning record store (LRS), will take advantage of this foundation to add portfolio data to the student's personal API. University systems can be tuned to use the profile and LRS information while leaving it all firmly in the hands of its owner.


Personal APIs in a University Setting

Southern armyworm, eggs_2014-06-06-14.28.04 ZS PMax

The API economy is in full swing and numerous commercial entities are engaged in producing APIs as a way of extending their reach. But as more and more of life is intermediated by computers, individuals also benefit from providing a personal API that can be used by applications on equal footing with other APIs.

Brigham Young University is teaching students about digital autonomy so that they are better prepared to be lifelong learners. In addition to a Domain of One's Own project, we have also embarked on a program of giving each student a personal API. By making students responsible for their own data, we teach them that they can be active participants in the digital realm.

In addition to profile information, the API is also a means of accessing the student's personal learning record system (LRS). Students grant university and other systems access to the resources in their API. The personal API provides the same features that other APIs have including API management and authorization. The personal API works in concert with BYU's University API to create a rich, permissioned data ecosystem for application developers inside and outside the university.

This talk will discuss design principles, implementation decisions, initial projects, and our experience to date.