The Future of Internet Identity: Data Access and Modeling


In my previous blog post, wrapping up IIW X and discussing what wasn't discussed, I talked about what was missing at IIW: discussions about authentication. What was hot at IIW were discussions about authorization and personal data. OAuth, UMA, and PDX talks were happening in every corner this time and these topics (with data access and modeling as their unifying theme) will be a major area of focus as IIW continues.

Back in the dark days of the Web, if you wanted access to data in your account in someone's system via an API, you had to pass along your login credentials. This is fine as long as the developer is the only person using the app, but when I use Tweetdeck, or more to the point a Web site that uses the Twitter API, I don't want to give the app my credentials. When I do, the app might misuse them. What's more, if I want to change them to revoke access from one app, I have to update every other app so they know about the new credentials. Unsafe and unscalable.
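To make the revocation problem concrete, here's a rough sketch in Python. Everything in it (the Account class, the app names, the passwords) is made up for illustration and isn't any real service's API. It contrasts handing every app the same password with handing each app its own revocable token, which is the general idea behind delegated authorization.

```python
import secrets


class Account:
    """Toy account at a data host (hypothetical, for illustration only)."""

    def __init__(self, password):
        self.password = password
        self.tokens = {}  # token -> name of the app it was issued to

    def login_ok(self, password):
        return password == self.password

    def issue_token(self, app):
        token = secrets.token_hex(16)
        self.tokens[token] = app
        return token

    def token_ok(self, token):
        return token in self.tokens

    def revoke(self, app):
        # Drop only the tokens issued to this one app.
        self.tokens = {t: a for t, a in self.tokens.items() if a != app}


account = Account("hunter2")

# Password sharing: every app stores a copy of the same secret.
apps_with_password = {"tweetdeck": "hunter2", "some_site": "hunter2"}

# The only way to cut off some_site is to change the password...
account.password = "new-secret"
# ...which locks out every app until each one is updated by hand.
still_working = [
    app for app, pw in apps_with_password.items() if account.login_ok(pw)
]

# Per-app tokens: each app gets its own credential.
tokens = {app: account.issue_token(app) for app in ("tweetdeck", "some_site")}
account.revoke("some_site")
working_after_revoke = [
    app for app, tok in tokens.items() if account.token_ok(tok)
]

print(still_working)         # no app survives the password change
print(working_after_revoke)  # only the revoked app is cut off
```

With the shared password, revoking one app breaks all of them. With per-app tokens, the other apps never notice.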

Unsafe, unscalable, and yet the whole idea of cloud computing rested on the idea of authorized access to APIs. Take that away and the cloud becomes just a more efficient way of managing servers. Along came OAuth. In a world where APIs were gaining traction, OAuth was a (relatively) simple answer to some hard problems and it took off. This is why authorization is such an interesting topic. Authentication is technically difficult, but ultimately pretty dry. But figuring out who gets to do what and when they get to do it is the stuff of drama and intrigue.

But OAuth isn't enough. OAuth is fine as far as it goes, but there are even larger problems that OAuth isn't designed to handle.

I believe that access to data, ever more personalized, is a trend that will shape our world--online and off--for the coming decade and lead to changes more profound and comprehensive than anything we've seen yet. This is the idea behind David Siegel's book Pull (read my blog post on the Power of Pull or listen to the podcast for more). David's book paints a compelling picture that is breathtaking in its scale and scope; there is no aspect of our lives that won't be impacted by these changes. I think it's impossible to overstate their importance.

But before any of that can happen we have to overcome the limitations of OAuth.

  • OAuth creates one-to-one relationships. In OAuth there's no distinction between the authorization manager and the data host. The data host runs an OAuth server. A user has to manage each relationship independently at the host. There's no user-centric repository of access policy and activity.
  • OAuth has a very coarse access granularity; it gives access to the entire API. Once I've authorized a client to access the Twitter API on my behalf, it's me for all intents and purposes. OAuth isn't designed to tease out specific functions or elements and give them different access rights. OAuth 2.0 has some weak scoping concepts, but they're not based on generalizable policy principles.

UMA, or User Managed Access, is a project designed to overcome these shortcomings. UMA, for all intents and purposes, is OAuth with one more level of indirection. UMA splits off the authorization management function and creates a place where users can manage access to resources they control on UMA-compliant hosts (places that have data). This solves the problem of one-to-one relationships. (Here's an excellent diagram that describes how UMA works.)

But this architecture is a two-fer. Because there's an authorization manager, there's a place to store policy and policy gives us the ability to control access in a more fine-grained manner. Want to give an app access to your friend list in Twitter but keep it from posting updates? Create a policy for that. All this policy stuff could be a user experience nightmare, but the UMA people are paying a lot of attention to use cases and usability.
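Here's a minimal sketch of that split, in Python. It is not the UMA protocol itself (there are no tokens or HTTP exchanges here), and every name in it (AuthorizationManager, Host, Policy, the resource and action strings) is an assumption made up for illustration. It only shows the shape of the idea: the host holds the data, but asks a separate authorization manager whether a given client may perform a given action on a given resource.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """One user-written rule: this client may do these actions on this resource."""
    client: str
    resource: str
    actions: frozenset


@dataclass
class AuthorizationManager:
    """Hypothetical user-centric store of access policy, separate from any host."""
    policies: list = field(default_factory=list)

    def allow(self, client, resource, *actions):
        self.policies.append(Policy(client, resource, frozenset(actions)))

    def is_allowed(self, client, resource, action):
        return any(
            p.client == client and p.resource == resource and action in p.actions
            for p in self.policies
        )


class Host:
    """A place that has data; it defers access decisions to the manager."""

    def __init__(self, name, manager):
        self.name = name
        self.manager = manager

    def handle(self, client, resource, action):
        if not self.manager.is_allowed(client, resource, action):
            return "denied"
        return f"{action} {resource} ok"


# The user writes policy once, at the manager, not at each host.
manager = AuthorizationManager()
manager.allow("some_app", "friend_list", "read")

twitter = Host("twitter", manager)
read_result = twitter.handle("some_app", "friend_list", "read")
post_result = twitter.handle("some_app", "updates", "post")

print(read_result)  # the app can read the friend list
print(post_result)  # but it cannot post updates
```

Because the policy lives at the manager, a second host could be pointed at the same manager and the user would still have one place to see and change who can do what.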

UMA, as a protocol, goes a long way toward creating a world where fine-grained access to API-mediated data is possible. But there's another, related area that needs work before the vision of a fully functional, data-driven world of pull can become reality: personal data.

I wrote a post last month about Personal Data, Freedom, and Value Creation. The post was about the values that drive personal data and derive from personal data. Mediating access to data from APIs is fine, as far as it goes, but ultimately individuals will be the source of more and more data (what's called "volunteered personal information" or VPI) and will need to be able to manage that data, authorize its use, and choose what gets done with it.

Imagining a world where your golf clubs automatically register your strokes, power, and so on; your toilet automatically analyzes your waste and registers the results; or your purchases (even those made offline) are collected and collated is fine, but done wrong or even poorly it's not a very nice world to live in. That's where the work of what's being called the PDX (personal data X -- the right noun hasn't been invented yet) comes in. Can we build systems that not only let us authorize the use of the data we have at various services but also put us in control of all our personal data? On the answer to that question hang our future privacy and security. Now we've got drama and action. No wonder people are excited!

PDX is not really about authorization (that question is orthogonal and rightly so). PDX is more about the models that will allow data sharing, synchronization and exchange. The area of authorization that PDX does touch on is in describing policies. If you have the right models for data, then you can also more easily create the policies that govern its access. Modeling, understanding, and describing our personal data is a necessary foundation to a future data-driven world.

On day two of this IIW, Steve Gillmor told me we ought to quit and declare victory. There's an appealing logic to that, but ultimately I think the questions that haven't been answered yet are even more interesting than the ones that have. And I think IIW has a continuing role in serving as a gathering place where the people working on the answers to those questions can meet, debate, and build the future Internet. Come to IIW XI on Nov 9-11, 2010 and join us.