May 30, 2006
MacBook Pro and the Bleeding Edge
One problem with life on the bleeding edge is that there isn't as much infrastructure built up and parts are scarce. Take Apple's MacBook Pro, for example. I've got a gleaming one sitting by my desk that I've been playing with for a few days. It's very fast at some things and Rosetta works well. Too well, perhaps. You might forget to upgrade some G4 apps to native ones because they still work passably. But that's another story.
I've run into two problems so far. First, Apple switched to SATA drives for the MacBook Pro. That probably gave them some manufacturing advantages, but it severely limits the choice of drives. There's a nice 160GB IDE hard drive available, but the only 160GB SATA drive I've been able to find (from Fujitsu) runs at 4200 RPM.
Similarly, I've wanted to get an EVDO card for some time and was about to pull the trigger when I remembered that the MacBook Pro has an ExpressCard/34 slot instead of a PCMCIA slot. Newer, smaller, and fancier, I'm sure, but there are thousands of PCMCIA cards out there that are now of no use. I've got several for media cards and such that I won't be able to use any more. There's hope that an ExpressCard/34 EVDO card will be available "real soon now," but who knows what getting it to work will entail.
Nevertheless, I'm not going to give the MacBook Pro back. After all--I'm just crazy enough to like these little challenges.
10:23 PM
Google's Serendipitous Uses
Derrick Story has a nifty tip for using GMail to convert Word docs to HTML. Just send it as an attachment to your GMail account and then select "View as HTML" next to the attachment. I just tried it with this Word doc and got this HTML document. Very nice.
Now, if someone would just get around to building a tool that you drag a Word doc onto and it uses GMail to convert it to HTML and deposit the result in the same directory, that would be awesome.
9:19 PM
May 26, 2006
WWW2006 Conference Wrap-Up
Rent-a-cop at the convention center. They look very professional here.
So, WWW2006 is wrapping up. There are still a few sessions and dinner tonight with some new friends, but for the most part it's done.
Overall, this has been a good conference. When I looked at the conference program before I came, it was overwhelming and, frankly, not much looked all that interesting based on the titles I scanned. In spite of that, once I got here I found it rather easy to focus on specific tracks that looked interesting, and I enjoyed numerous sessions.
The conference center itself is a nice place. There's even a concierge to arrange dinner plans, taxis, and whatnot. There are a lot of support staff from the conference center directing people where to go, a lot of people from In Any Event, the professional conference group that's running the details, and students from Southampton University. Putting on a conference of this size is a lot of work, and the organizers deserve credit for pulling it off admirably. All in all, things ran very smoothly.
Edinburgh International Conference Center
This is a big conference; I heard 1200 people registered. Apparently the Edinburgh Conference Center is charging the organizers 500,000 pounds to host it. I'm not sure if that's high or low for an event this size, but it seems expensive, and it explains the steep conference fee. I was a little put off by that, and I've talked to others who were as well.
I've heard quite a bit of complaining about the multi-track system offering competing sessions on the same subject at the same time. That's a problem at any conference, but with one that's got 11 tracks going on simultaneously, it's inevitable. Having tutorials going throughout the conference instead of on the first few days hasn't helped.
The call for papers is up for WWW2007. I think the topic areas tend to pigeonhole this conference, keeping it from being as widely accepted and used as it could be. For example, there are several academic research sessions on the semantic Web, plus the W3C has its own tracks that focus on it as well. That gives this conference a decided "semantic web" feel. I heard REST mentioned only a few times, and mostly in a disparaging context.
Concierge at EICC
Tim Bray says this conference used to be great. I think it's OK, but I wonder how much better it might be if the topic areas were more broadly inclusive and it were held in major metro areas that were easier to get to instead of various exotic locations (two years ago it was in Brazil, last year in Japan, next year it's in Banff, Canada). Nothing against those places, but this conference ought to have 2500-3000 people attending. On the other hand, with only an 11% paper acceptance rate, maybe it's popular enough.
The crowd here is mostly unknown to me from other conferences that I go to. There are a few cross-over people like Rohit Khare, but not many. Nevertheless, there is material here that should be of interest to those groups as well. We need venues for academic and industry types to mix and share ideas. I was hoping the WWW conference would be that venue, but it doesn't appear so.
10:10 AM
Late Breaking News Session
My presentation on LDDI was in the "Late Breaking News" session since we basically missed all the deadlines. There were some other interesting presentations in that session as well.
Daniel Harris and Niel Harris (no relation) presented Kendra, a non-profit initiative to create an open market for digital goods. They presented Kendra Base, a tool for describing digital goods using meta-data. They describe it as "a semantic information publishing and querying system prototype." They also called it a "provocation," meaning that they're hoping someone can do it better--they're just exposing the ideas. The user shouldn't have to know RDF, XML, or anything else to make it work. On the other hand, you do have to understand ontological concepts for it to make sense. Multiple systems can be linked together to form a distributed system for purposes of search.
Inigo Surguy presented "Using ontologies to repair your car," a description of MYCAREVENT, a European project by a consortium of vendors and service providers to allow access to automotive event information (emergencies, breakdowns, etc.) using mobile devices. The project uses OWL and RDF in the query service to provide smart search and smart filtering.
Why use an ontology? To create a single data model that multiple vendors can use to classify their systems, parts, services, and so on. Vendors and service providers maintain their own datastores, and these are combined to form the overall system. The query system is based on SPARQL and uses an OWL-DL reasoning engine called Pellet.
Searching on a specific car and system, for example, gives you a list of possible faults to select from and even cures. Given the goal of being usable from mobile devices, it strikes me that building the ontology so that the system can make suggestions is desirable. Inigo mentioned that getting all the SPARQL right was difficult; they created a SPARQL syntax highlighter to aid in debugging.
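To make the "single classification system" idea concrete, here is a toy Python sketch. It is not the project's implementation: OWL, SPARQL, and Pellet are replaced by a hand-written subclass table and a simple transitive lookup, and every class name, entry ID, and cure below is hypothetical. What it shows is why a shared hierarchy helps: a search for a broad category finds entries that different vendors filed under narrower classes in their own stores.

```python
# Toy sketch: a shared class hierarchy (stand-in for an OWL ontology) lets a
# query for a broad category find entries that vendors filed under narrower
# classes. All names below are hypothetical.

# child class -> parent class (a tiny, acyclic "subClassOf" table)
SUBCLASS_OF = {
    "WornBrakePads": "BrakeFault",
    "BrakeFault": "Fault",
    "FlatBattery": "ElectricalFault",
    "ElectricalFault": "Fault",
}

# Each vendor keeps its own store: (entry id, class, suggested cure)
VENDOR_A = [("a:pads-01", "WornBrakePads", "Replace brake pads")]
VENDOR_B = [("b:batt-17", "FlatBattery", "Recharge or replace battery")]


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def find_faults(category, *stores):
    """Combine the vendor stores and return (id, cure) pairs under category."""
    return [(entry_id, cure)
            for store in stores
            for entry_id, cls, cure in store
            if is_a(cls, category)]


print(find_faults("BrakeFault", VENDOR_A, VENDOR_B))
# [('a:pads-01', 'Replace brake pads')]
print(len(find_faults("Fault", VENDOR_A, VENDOR_B)))
# 2
```

A real reasoner does far more (property restrictions, equivalence, consistency checking), but the payoff for a mobile user is the same: ask for the general problem and get the specific candidates.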
Julien Anguenot of Nuxeo spoke on "Multiparadigm application development with the Zope 3 component architecture." Unfortunately, Julien spent more than half his time talking about the history of Zope and the differences between Zope 2 and Zope 3.
Zope 3 components are objects with introspectable interfaces. Content components manage data, while factory components create other components. View components create presentation, and adapter components hold business logic. ZCML is a markup language for specifying how components are configured and interact. This is the core, but there is much more to the component architecture.
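The split between content components and adapters can be illustrated with a small Python sketch. This is not the Zope 3 API (which uses `zope.interface` and ZCML); here the interfaces are plain `abc` classes, and the hypothetical `register_adapter` and `adapt` functions stand in for the wiring that ZCML would declare.

```python
from abc import ABC, abstractmethod


class IDocument(ABC):
    """Interface of a content component: it only manages data."""
    @abstractmethod
    def text(self) -> str: ...


class ISized(ABC):
    """Interface for anything that can report a size."""
    @abstractmethod
    def size(self) -> int: ...


class Document(IDocument):
    """Content component: stores the body and nothing else."""
    def __init__(self, body: str):
        self._body = body

    def text(self) -> str:
        return self._body


class DocumentSized(ISized):
    """Adapter: adds business logic (sizing) without touching Document."""
    def __init__(self, context: IDocument):
        self.context = context

    def size(self) -> int:
        return len(self.context.text())


# Hypothetical registry: (provided interface, wanted interface) -> adapter factory
_ADAPTERS = {}


def register_adapter(provided, wanted, factory):
    _ADAPTERS[(provided, wanted)] = factory


def adapt(obj, wanted):
    """Find an adapter for obj's class or any base class (including its interfaces)."""
    for cls in type(obj).__mro__:
        factory = _ADAPTERS.get((cls, wanted))
        if factory is not None:
            return factory(obj)
    raise TypeError(f"cannot adapt {type(obj).__name__} to {wanted.__name__}")


# In Zope 3 this wiring would be declared in ZCML rather than in code.
register_adapter(IDocument, ISized, DocumentSized)

print(adapt(Document("hello"), ISized).size())  # 5
```

The point of the indirection is that `Document` never learns about sizing; new behavior arrives by registering another adapter, not by editing the content class.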
Next, Mark Seaborne from ORIGO Services spoke on "Building Web Forms The Easy Way." Mark proposes a reproducible process for creating business forms on the Web. Certain industry verticals use and re-use specific form patterns, and these can be codified for re-use.
First, Mark uses the IBM XForms generator to create a form from the W3C XML Schema (WXS). Next, Schematron constraints are layered on. Third, you apply labels that make sense, group like components together, and so on to aid the end user. The result is a business-complete XForm with the right components. The final step is to use CSS to style the form for presentation. The goal is a sausage machine, so that small changes to a form don't require large development efforts.
Mark is creating XSLT stylesheets that can perform these translations based on validation, processing, presentation, and styling rules for each step.
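The "sausage machine" is essentially a chain of small transformations, each owning one concern. Mark's pipeline uses XSLT over real XForms; the Python sketch below only mimics its shape using `xml.etree.ElementTree`, and the element names, field names, constraint syntax, and CSS class are all hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def generate(fields):
    """Step 1 (stand-in for the schema-driven generator): one input per field."""
    form = ET.Element("form")
    for name in fields:
        ET.SubElement(form, "input", ref=name)
    return form


def constrain(rules):
    """Step 2 (stand-in for layering on Schematron rules)."""
    def step(form):
        for inp in form.findall("input"):
            if inp.get("ref") in rules:
                inp.set("constraint", rules[inp.get("ref")])
        return form
    return step


def label(labels):
    """Step 3: human-friendly labels for the end user (default: the field name)."""
    def step(form):
        for inp in form.findall("input"):
            ET.SubElement(inp, "label").text = labels.get(inp.get("ref"), inp.get("ref"))
        return form
    return step


def style(css_class):
    """Step 4: hook for CSS presentation."""
    def step(form):
        form.set("class", css_class)
        return form
    return step


def build_form(fields, steps):
    """Run the generated skeleton through each step in order."""
    return reduce(lambda form, step: step(form), steps, generate(fields))


form = build_form(
    ["policyNumber", "startDate"],
    [constrain({"startDate": "startDate >= today()"}),
     label({"policyNumber": "Policy number"}),
     style("business-form")],
)
print(ET.tostring(form, encoding="unicode"))
```

Because each step reads only its own rules, adding a field or changing a label means editing the inputs to the machine, not the machine itself.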
8:46 AM
Teaching with Games
Russell Hunter sent me a link to this article about using games in the classroom. As I mentioned earlier, this is an issue that I've grown a little concerned about.
In the article, David McDivitt, a high school teacher from Indiana, talks about a controlled experiment he did with his 20th-century history class. 65 students were taught a subject (the state of Europe prior to WWII) using classroom discussions and video games, and 45 were taught the same subject using traditional methods, including a textbook and classroom discussions. All were given a test before and after the week-long experiment.
The results show that students learned more using the video games than they did using other methods. Not surprisingly, they were more engaged, more willing to work extra hours, and had more out-of-class discussion. They lived the history rather than merely reading about it.
3:33 AM
China, the Internet's Broken Link
Danny Weitzner, W3C, at WWW2006
Danny Weitzner from the W3C started out today's plenary session with a discussion of the Internet and society called "China: A Broken Link on the Web."
If everyone's a publisher, is every government also a filter and interceptor? He started off with the story of Yahoo! "helping jail a Chinese writer" and made some interesting points:
- Yahoo! has no basis for ignoring Chinese law while obeying the laws of other countries.
- That leaves the choice of simply not doing business in China.
- There's an argument that being in China and obeying the law is better for the cause of freedom in China than not being there at all.
He brings up Google's "do no evil" motto and says it's become an albatross around Google's neck. The problem is that you get caught up in the semantics of evil.
He points out the Reporters Without Borders principles:
- Email: No US company should host email services in a 'repressive country' -- demands for access would have to go government to government.
- Search: No search engine should filter 'protected' words such as democracy
- Content hosting (blogs, etc.): No US company should be allowed to host content in a repressive country.
- Internet filtering: No US company should be allowed to sell filtering technology to a repressive country.
- Surveillance: US companies must obtain permission from the USG to export Internet surveillance technology.
- Training: No training in filtering or surveillance for repressive states without US Department of Commerce permission.
The Internet has changed the thinking about the way media is regulated:
- The Internet is characterized by abundance rather than scarcity.
- User control can replace censorship.
- Carrier liability limits have changed because in decentralized networks, responsibility shifts to the end-points.
Ira Magaziner, part of the Clinton administration, worked hard in the '90s to convince world governments to keep their hands off the Internet. This brings us back to China. The "hands off" strategy has some weak points:
- Abundance has changed from open web to no limits except for political speakers.
- User control has shifted to content choice by the government.
- Carrier liability is absolute. China has turned ISPs into agents of the state.
There are three possible outcomes:
- We accept China's sovereignty and let things continue. This is the status quo.
- We see real support for a human right of free expression by governments around the world. This principle has been articulated and is even given lip service by countries, including China.
- We shine a light on China's actions leading to increased global transparency.
The last point is a third way, perhaps. Google has laid out a set of transparency principles.
- transparency to users: including an indication of what's blocked
- transparency to the world: letting everyone else know what Google blocks and, where possible, why
- protection of customer information
- insistence on rule of law and due process
- Shareholders in companies (especially minority shareholders) should insist on compliance with openness principles.
These principles apply in every country. For example, Google filters copyrighted material from results in some countries and certain kinds of hate sites in others. Google is in a tight spot in China, however, since China's State Secrets Law prohibits revealing which specific sites it blocks there. China doesn't actually say which sites to block; it has a list of criteria, and companies have to apply those themselves.
There are groups, however, that compare results in China with results elsewhere and then publish lists of sites that have been blocked.
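At its core, the comparison those groups do is a set difference. A minimal Python sketch, assuming you already have ranked result URLs for the same query from inside and outside China (the URLs below are placeholders); a real study would repeat queries over time and control for ordinary regional ranking differences before calling anything blocked.

```python
def missing_results(local_results, reference_results):
    """URLs in the reference ranking that never appear locally, in reference order."""
    local = set(local_results)
    return [url for url in reference_results if url not in local]


outside = ["example.org/a", "example.org/b", "example.org/c"]
inside = ["example.org/a", "example.org/c"]
print(missing_results(inside, outside))  # ['example.org/b']
```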
The Chinese Government's policy on reform has two tracks, an aggressive program of economic reform and a slower program of political reform. Some believe that domestic demands will eventually result in political reform. While many don't support this optimistic view, many reformers closest to the situation (Chinese dissident groups) believe it.
2:52 AM
CS Department RSS
The BYU CS Department has added RSS feeds to its Web site. Now, if I could convince the CS department not to send the same items to the faculty mailing list, I'd be set. Otherwise, I just see them in RSS after I've deleted them from my mailbox. A good first step, though...
2:08 AM
May 25, 2006
Identity Management Panel
I attended an identity management panel moderated by Arnaud Sahuguet of Google. On the panel were Rick Hull, Bell Labs, Conor Cahill, Intel, Kim Cameron, Microsoft, Mike Neuenschwander, Burton Group, and Stefan Brands, Credentica & McGill University.
Arnaud started off with the famous "nobody knows you're a dog" cartoon and the ACLU pizza video. He asked each panelist how many different identities they have. The answers ranged from 40 to 313 (Cahill knew exactly). Kim said he uses classes of identities (my own strategy) for different kinds of sites.
Converged networks (wireless, television, Internet) make the problem of identity more difficult. Many are not free. Subscriber and user are subtly different notions.
Rick Hull said that federated ID management should increase user ease-of-use, but unless someone makes money from it, it won't happen. Businesses will realize increased "stickiness" and decreased churn, but that's hard to quantify. Some possible sources of revenue include increased ad revenue from targeted ads, increased eCommerce sales via the Web, and direct charges for identity services. We can grow this by building on current walled-garden relationships between identity and service providers.
Conor Cahill, a strong Liberty Alliance proponent, starts off admitting that to date, there's no large eCommerce implementation of Liberty. SSO hasn't been adopted outside the enterprise. Why? There's no perceived benefit for the service provider. In fact, they see a downside: a loss of relationship with the user. Furthermore, users haven't perceived the pain.
There's a new driver, however: phishing attacks. Managing strong authentication, including tokens, is more costly than passwords and this may force banks to sign onto (no pun intended) the notion of federated identity.
Kim Cameron introduced his laws of identity and talked about InfoCard. Kim said it wasn't evil he feared so much as incompetence. The primary role Microsoft can provide in the identity space is by adopting an infrastructure and working collegially across the industry.
Mike said that he was encouraged by incompetence: the scenario in the ACLU pizza video would require linking things up in a way that would be very difficult. When you get to know someone, their identity, in the sense of credentials, is not something you care about. We don't typically ask dates or new acquaintances for their ID card. Liberty takes a stylistic approach to identity that is based on an engineering solution rather than a social approach. InfoCard takes yet another.
Wikipedia is another example where identity matters, not so much because of credentials but because of social aspects. You get the feeling that people are watching and that your actions will be found out. This is identity arising out of social context, based on recognition and shared experiences. These are the bases upon which society can begin to work on identity.
Stefan talked about building transaction systems that have identity flows at their core. This protects against both external and internal attacks. One main objective is to minimize the powers of the central host or provider, something current solutions fail to do. Clients are particularly dumb because we can't ask users to install software (actually, Microsoft can).
Stefan talks about Canada's eGovernment initiative where they've tried to implement SSO. Various agencies, and particularly provincial governments, have been reluctant to turn over control of "their" users to a centralized service.
We should look to the financial industry as a metaphor. There are dozens of financial instruments including credit cards, cash, money orders, and so on. These have developed because users have different needs, there are various trust relationships, and so on. This is a good way to think of the various user-centric identity technologies that have sprung up. They have different uses and relying parties will pick the one that fits best.
Mike says that the term "user-centric identity management" is a funny term because if it were really about the user we wouldn't say "user" and we wouldn't say "management." I'm not sure people do say "management" with the term "user-centric identity."
Someone in the audience asked about the financial drivers behind user-centric identity, and Mike said, I think, that there aren't any. I'd disagree. In fact, I think that user-centric identity systems reduce financial risk for relying parties and identity providers alike and thus are more likely to be adopted over time.
All in all, this was a nice panel, but it was pretty tutorial in nature. That's understandable given that most attendees weren't very familiar with the identity space.
9:11 AM
WWW2006 Conference Dinner
The conference reception was held at Edinburgh Castle. I've been taking photos while I'm here; here are a few from our visit to the castle last night.
Edinburgh Castle
Edinburgh Castle's Main Gate
The Firth of Forth from Edinburgh Castle
Tim Berners-Lee chatting with a bagpipe player at Edinburgh Castle
Yesterday my nine-year-old son asked me if I'd seen a bagpiper yet. I hadn't, so when I saw one at the castle, I went over to take a picture. Interestingly, Tim Berners-Lee was chatting with him, so I snapped a picture.
The trip to the castle was a lot of fun, but I have to say that I was very disappointed in the food. It was all finger food, served by people walking around with trays. There weren't nearly enough of them, so every time one showed up it looked like feeding time for the fish in my pond at home. People who brought a guest paid 50 pounds (almost $100) for the event. That's a lot of money for sparse finger food, even in such a fabulous setting.
7:54 AM
Free the Data!
Free the Data! Panel
A specially arranged panel session called Freeing the Data was moderated by Kieron O'Hara (Univ. of Southampton). On the panel were Daniel Weitzner (W3C & MIT), Daniel Harris (Kendra), and Jeremy Frey (Univ. of Southampton).
Jeremy Frey is a chemist and took the position that any scientist doing research should not only make results available, but the data as well. But making the data available isn't enough. We need to make it findable as well. Moreover, we need the context to be available and machine readable.
Another issue with data is correctness. Published papers have greater trust because they've been peer-reviewed. What about the data? There are several issues: first, the information about how the data was created must be preserved. Second, data must be versioned so that as it changes, reference can be made to the data set at a particular time. At least for scientific data, we should move to a position where you pay for privacy rather than publicity.
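Frey's two requirements, provenance and citable versions, can be sketched as a store that keeps every committed version along with a note on how it was produced, and that answers "what did the data look like at time t?" This is a hypothetical Python illustration, not any system mentioned on the panel. Keeping full copies is the simplest possible design and would not scale to heavy update rates; storing deltas between versions is the usual alternative.

```python
import bisect


class VersionedDataset:
    """Keeps every committed version plus a provenance note for each."""

    def __init__(self):
        self._times = []     # commit timestamps, strictly increasing
        self._versions = []  # (records, provenance) per commit

    def commit(self, timestamp, records, provenance):
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("timestamps must strictly increase")
        self._times.append(timestamp)
        self._versions.append((dict(records), provenance))

    def as_of(self, timestamp):
        """Return (records, provenance) for the latest version at or before timestamp."""
        i = bisect.bisect_right(self._times, timestamp) - 1
        if i < 0:
            raise KeyError("no version exists at or before that time")
        records, provenance = self._versions[i]
        return dict(records), provenance


ds = VersionedDataset()
ds.commit(100, {"sample-1": 0.52}, "measured on instrument A")
ds.commit(200, {"sample-1": 0.49}, "re-measured after recalibration")
print(ds.as_of(150))  # ({'sample-1': 0.52}, 'measured on instrument A')
```

A citation can then name the dataset and a timestamp, and anyone can recover exactly the values (and the provenance) the paper relied on.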
Harris took an opposing view that said authors should have the choice of whether to free data or not. What's the aim? Stabilization or destabilization? Are we talking about free access or free cost? What do we want? Legislation? Arguments? Tools? We have to decide.
Weitzner claimed that freeing data wasn't just about freedom, but about re-use. Beyond making data available, we must:
- Re-use structures including schemas and ontologies. It's more important to use well-understood structures than to use any particular idiom.
- Re-use the licenses that have already been developed. Licensing meta-data (ala Creative Commons) is also important.
- Enable re-use of ideas (contrasted with the expression of the idea). We have to find the proper scope of 'derivative works' and re-examine the issue of database copyright. Shockingly, copying the bibliographic data from a work (for purposes of citation) can be seen as a violation of some licenses.
- Attach policy information that says how the information can be used. Some experimental data depends critically on personally identifying information, and anonymization is hard: it either doesn't work well or is at odds with the underlying research purpose of the data.
- Use open standards
Someone from the audience gave an indication of the scale of the changing-data problem: she said they make 5000 changes to their data per day. Traditional versioning systems won't work if the granularity really is this fine.
Harris mentioned Brin's The Transparent Society as a good reference on the inherent conflict between freely available data and privacy.
Weitzner said we shouldn't make the mistake of believing that we can publish information to one (fairly large) group of people and keep it away from billions of others. That won't work and will end up harming us.
6:52 AM
Detecting Cloaking in Web Pages
Baoning Wu from Lehigh University
Here's something I'd never heard of before: cloaking. Cloaking is the practice of returning a different page for a given URL to a search engine crawler than to other users. You can imagine why people intent on getting higher search engine rankings than they deserve might want to do this. When the change alters the meaning of the page (rather than merely its structure), it's called "semantic cloaking."
So, how can you detect semantic cloaking? Baoning Wu from Lehigh University presented work aimed at answering this question. (See the paper.)
You can't reliably detect cloaking from a single copy of the page. You need a copy from the browser's perspective and one from the search engine's perspective. But even this isn't enough: some sites serve up a different version of a page to every visitor (e.g., changing content or features). To reliably detect semantic cloaking, you need four copies, two from each perspective.
That's a significant resource problem for the Web crawler and the sites serving up the pages. This research proposes a two-step process that reduces the resources required and still yields good results.
The first step uses just two copies, one from each perspective, to filter out sites that aren't cloaking. The second step downloads two more pages and then uses all four to find sites that are really cloaking.
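The two-step structure can be sketched in Python as follows. This is not the paper's method: the fetch functions are hypothetical callables you supply, and the second step uses a crude term-overlap heuristic where the authors use an SVM over 162 features. What it does show is where the savings come from: pages whose first two copies look the same never trigger the extra downloads.

```python
import re


def terms(page):
    """Lower-cased word set of a page (a crude stand-in for real feature extraction)."""
    return set(re.findall(r"[a-z0-9]+", page.lower()))


def crude_classifier(crawler1, crawler2, browser1, browser2, min_terms=3):
    """Stand-in for the SVM: flag terms that both crawler copies contain
    but neither browser copy does. Content that changes on every visit
    (ads, rotating features) rarely shows up in both crawler copies."""
    crawler_only = (terms(crawler1) & terms(crawler2)) - (terms(browser1) | terms(browser2))
    return len(crawler_only) >= min_terms


def detect_cloaking(url, fetch_as_crawler, fetch_as_browser, classify=crude_classifier):
    # Step 1: one copy from each perspective; identical term sets are filtered out.
    crawler1, browser1 = fetch_as_crawler(url), fetch_as_browser(url)
    if terms(crawler1) == terms(browser1):
        return False
    # Step 2: only suspicious URLs cost two more downloads.
    crawler2, browser2 = fetch_as_crawler(url), fetch_as_browser(url)
    return classify(crawler1, crawler2, browser1, browser2)


print(detect_cloaking("http://example.com/",
                      lambda url: "cheap flights hotels casino deals",
                      lambda url: "family vacation photos"))  # True
```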
The classifier uses Thorsten Joachims' SVMlight and looks at 162 features from each URL. The investigators manually labeled over 1200 URLs, used 60% of these for training, and retained the remainder for testing effectiveness.
The results are pretty good:
- Accuracy was 96%, precision was 93%, and recall was 85%. From the paper: "... precision means what percentage of the pages predicted by the classifier as semantic cloaking pages are actually utilizing semantic cloaking. Recall refers to the fraction of all semantic cloaking pages that are detected by the classifier."
- A test against the dmoz Open Directory Project showed that the filtering step reduces the 4.3 million candidate pages to just under 400,000 URLs that needed to be looked at further. The classifier found 46,000 URLs that used cloaking.
- The ODP test also showed that, not surprisingly, some categories are more likely to contain cloaked pages than others. Pages in Arts and Games categories were much more likely to use cloaking than pages in News, for example.
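For readers who prefer the definitions quoted above as formulas, here is a small Python helper computing accuracy, precision, and recall from a confusion matrix. The counts in the example are made up for illustration; they are not the paper's confusion matrix.

```python
def classifier_metrics(tp, fp, fn, tn):
    """tp: cloaking pages flagged; fp: clean pages flagged;
    fn: cloaking pages missed; tn: clean pages passed."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of pages flagged, how many really cloak
    recall = tp / (tp + fn)      # of cloaking pages, how many were flagged
    return accuracy, precision, recall


print(classifier_metrics(tp=8, fp=2, fn=2, tn=88))  # (0.96, 0.8, 0.8)
```

When cloaking pages are a small minority, accuracy stays high even if many of them are missed, which is why precision and recall are worth reporting alongside it.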
5:06 AM
Improving Search Results Inside the Enterprise
Pavel Dmitriev from Cornell
Organizations often use search engines as part of their corporate information infrastructure. The problem is that inside corporations, creating Web pages is typically much more difficult than it is on the Web at large; consequently, links to pages are a much less useful indicator of page relevance. How do you solve this problem?
I attended a presentation by Pavel Dmitriev from Cornell that discusses one such solution. (See the paper.)
Within an organization, users are much more likely to be interested in improving the results from a search engine. Dmitriev and his co-authors built a system called Trevi for researching various strategies for annotating search results to improve relevance. There are two types of annotations: explicit and implicit.
Explicit annotations involve having the user indicate relevance of results using some kind of feedback mechanism. Once users are at a page, they are unlikely to go back, so you need to provide a means for them to indicate relevance of open pages. This research used a toolbar where users could enter a relevance rank.
Implicit annotation uses the query logs, along with logs of what users clicked on, treating each click as a vote of relevance. There are a couple of different strategy choices.
First, do you count every click or just the last click? Users tend to trust the search engine, so they click on higher-ranked entries without necessarily reading them closely. To eliminate this bias, you might annotate only the last entry the user clicks on.
Second, you can assume each search is independent or combine the results from a series of searches over a short interval of time on the assumption that searches made close together are related to the task the user is working on right now.
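The two choices combine into four strategies. Here is a minimal Python sketch of how click logs might be turned into (query, page) votes under each one. It is not Trevi's code; the log format (timestamp in seconds, query, clicked URLs in order) and the session window are assumptions for illustration.

```python
from collections import Counter


def implicit_votes(searches, last_click_only=False, session_window=None):
    """searches: (timestamp_seconds, query, [clicked urls in order]) for one user,
    sorted by time. Returns a Counter mapping (query, url) -> number of votes."""
    # Group searches: each stands alone unless it falls within session_window
    # seconds of the previous search, in which case they share a session.
    groups = []
    for search in searches:
        if (groups and session_window is not None
                and search[0] - groups[-1][-1][0] <= session_window):
            groups[-1].append(search)
        else:
            groups.append([search])

    votes = Counter()
    for group in groups:
        queries = {query for _, query, _ in group}
        clicks = [url for _, _, urls in group for url in urls]
        if last_click_only:
            clicks = clicks[-1:]  # trust only the click that ended the search/session
        for url in clicks:
            for query in queries:
                votes[(query, url)] += 1
    return votes


log = [(0, "expense policy", ["intranet/hr", "intranet/finance"]),
       (40, "travel reimbursement form", ["intranet/forms/travel"])]
print(implicit_votes(log, last_click_only=True, session_window=60))
```

With sessions and last-click together, the page the user finally settled on gets credit for every query in the session, including the earlier, less precise one.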
The team did a series of experiments and determined that explicit annotation leads to a significant improvement in relevant pages being highly ranked (13.9% vs. an 8.9% baseline). Implicit annotation, regardless of the strategy used, did not result in a significant improvement. The authors concluded:
A closer investigation showed that there was little overlap between the pages that received implicit annotations, and the ones that were annotated explicitly. We suspect that, since the users knew that the primary goal of annotations is to improve search quality, they mostly tried to enter explicit annotations for pages they could not find using Trevi. We conclude that a different experimentation approach is needed to evaluate the true value of implicit annotations, and the differences among the four annotation strategies.
4:31 AM
DIDW and IIW: Two Great Tastes that Taste Great Together
The Digital ID World conference will be held September 11-13 this year in Santa Clara. We're going to have a 3/4-day IIW event on the 11th before the keynotes begin in the late afternoon. We're hoping to attract some of the usual IIW crowd to DIDW and some of the DIDW crowd to IIW; I'd like to see more crossover between the two. Attendees at the IIW event will qualify for a discounted registration at DIDW. Details will be forthcoming soon. Watch this space!
3:59 AM
May 24, 2006
Knowing the User's Every Move
I sat through Richard Atterer's talk on User Activity Tracking for Website Usability Evaluation and Implicit Interaction. (See the paper.)
The problem is that putting code on the client to track user actions is invasive and users aren't likely to put up with it. On the other hand, putting the code on the server misses JavaScript actions that don't result in server requests.
Their answer was to use a rewriting proxy called UsaProxy that rewrites any page you request to make sure their tracking JavaScript is included. Very clever and related to some other things I've seen for modifying sites temporarily.
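The core trick of a rewriting proxy is small: before passing a page on, splice a script reference into it. Here is a Python sketch of just that rewriting step. It is not UsaProxy's code, and the script URL is a placeholder; a real proxy would also have to handle the HTTP side, compressed responses, and updating Content-Length.

```python
import re
from html import escape


def inject_tracker(page: str, script_url: str) -> str:
    """Insert a <script> reference just before </head>, or at the very start
    of the document if there is no head section."""
    tag = f'<script src="{escape(script_url, quote=True)}"></script>'
    match = re.search(r"</head\s*>", page, flags=re.IGNORECASE)
    if match:
        return page[:match.start()] + tag + page[match.start():]
    return tag + page


print(inject_tracker("<html><head><title>Hi</title></head><body></body></html>",
                     "/tracker.js"))
# <html><head><title>Hi</title><script src="/tracker.js"></script></head><body></body></html>
```

Because the rewrite happens in the proxy, neither the user's browser nor the site being studied has to change, which is what makes the approach less invasive than client-side installs.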
10:19 AM
Visualizing Flickr Tags
This afternoon I popped into Andrew Tomkins' talk on Visualizing Tags over Time. The paper was nominated for a best paper award. The research looks at visualizing Flickr tags. Images and tags form a bipartite graph that encourages "pivot browsing."
Tag clouds are the default way of visualizing tags. But tags are not fixed in time. Does their temporal structure lead to a representation that allows us to surf through time and pick up a gestalt sense of what was happening?
He demos a visualization that scans through the tags for each day, picks out representative tags and then streams pictures and associated tags across the screen (right to left) showing about 1 day per second. This is the "river" metaphor.
There are 87 million tags in Flickr and 1.26M of them are unique. Finding a "representative" tag depends on the temporal granularity. For example, a representative tag for a year is a major theme whereas the representative tag for a day is usually some quirky thing that happened on that day.
To find a representative tag for an interval, they divide the number of occurrences of a tag in the interval by a constant plus the total number of occurrences of that tag. This favors tags that occur frequently in the interval and less frequently outside it. The constant keeps you from being fooled by sparse occurrences: without it, a tag used exactly once would get a perfect score in whichever interval it happened to appear.
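A minimal Python sketch of that scoring rule, score = (occurrences in the interval) / (c + total occurrences). The value of c and the tag counts are made up for illustration; the example shows the effect of the constant, since with c = 0 a tag seen exactly once wins outright.

```python
from collections import Counter


def representative_tags(interval_tags, all_tags, k=3, c=10):
    """Rank tags in an interval by count_in_interval / (c + count_overall).
    all_tags must include the interval's own tags."""
    in_interval = Counter(interval_tags)
    overall = Counter(all_tags)
    scores = {tag: n / (c + overall[tag]) for tag, n in in_interval.items()}
    # highest score first; ties broken alphabetically for a stable result
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:k]


interval = ["sunset"] * 5 + ["eclipse"] * 18 + ["typo"]
everything = ["sunset"] * 100 + ["eclipse"] * 20 + ["typo"]
print(representative_tags(interval, everything, k=1))       # ['eclipse']
print(representative_tags(interval, everything, k=1, c=0))  # ['typo']
```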
They maintain scores of object occurrences at intervals of doubling length. The paper covers algorithms for efficiently aggregating the threshold data for given intervals. Using an efficient algorithm is very important; the naive approach takes too long for anything but very short intervals.
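One standard way to exploit precomputed doubling-length intervals is shown in this Python sketch: store per-day counts for a tag, precompute sums over aligned blocks of 1, 2, 4, ... days, and answer any day range by adding a few of those blocks instead of scanning every day. It illustrates the general idea only; it is not the specific algorithm from the paper, and the daily counts are made up.

```python
def build_levels(daily_counts):
    """levels[j][i] holds the total over days [i * 2**j, (i + 1) * 2**j)."""
    levels = [list(daily_counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + (prev[i + 1] if i + 1 < len(prev) else 0)
                       for i in range(0, len(prev), 2)])
    return levels


def range_count(levels, lo, hi):
    """Total over days [lo, hi), assembled from aligned precomputed blocks."""
    total = 0
    while lo < hi:
        j = 0
        # Grow the block while lo stays aligned and the doubled block still fits.
        while (j + 1 < len(levels)
               and lo % (2 ** (j + 1)) == 0
               and lo + 2 ** (j + 1) <= hi):
            j += 1
        total += levels[j][lo >> j]
        lo += 2 ** j
    return total


counts = [3, 1, 4, 1, 5, 9, 2, 6]
levels = build_levels(counts)
print(range_count(levels, 2, 7))  # 21, from blocks [2,4) + [4,6) + [6,7)
```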
10:01 AM
Symmetric Queries in XML
Also in the XML session, Shuohao Zhang from Washington State University spoke on Symmetrically Exploiting XML. This paper was nominated as a best student paper. (See the paper.)
XML queries are asymmetric because they're hierarchical. Rearranging the hierarchy requires changing the query. This work is aimed at making a single query work across multiple structures. This is useful when you don't know what the schema is, for heterogeneous or irregular data, or when the schema evolves.
Axes (parent, child, ancestor, descendant, preceding, following, etc.) are all directional. This work proposes a non-directional axis called closest. Its semantics is a function that takes the context node and returns a sequence of closest nodes. You can ask for closest::* and get all the nodes on any axis that are closest. Asking for closest::price will return the closest price node on any axis. Node selection is limited by a minimum distance between nodes.
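To see why a non-directional axis lets one query work across different hierarchies, here is a naive Python sketch of closest over `xml.etree.ElementTree`: a breadth-first search outward from the context node along parent and child edges, returning every matching node at the smallest distance. This is the brute-force idea, not the paper's implementation (which rewrites closest into directional queries), and excluding the context node itself is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def closest(root, context, tag):
    """All elements named `tag` at minimum tree distance from `context`,
    moving freely up (to parents) and down (to children)."""
    parent = {child: node for node in root.iter() for child in node}
    seen = {context}
    frontier = [context]
    while frontier:
        hits = [node for node in frontier if node.tag == tag and node is not context]
        if hits:
            return hits
        next_frontier = []
        for node in frontier:
            neighbours = list(node) + ([parent[node]] if node in parent else [])
            for other in neighbours:
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return []


# The same "query" works whether price is a sibling's child or an ancestor of title.
flat = ET.fromstring("<catalog><book><title>XML</title><price>10</price></book></catalog>")
nested = ET.fromstring("<catalog><price amount='10'><book><title>XML</title></book></price></catalog>")
for doc in (flat, nested):
    title = doc.find(".//title")
    print([e.tag for e in closest(doc, title, "price")])
```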
The naive approach to implementation is to compute closest for every node. This has time complexity O(sn²), where s is the number of nodes in the signature and n is the number of nodes.
A better approach is to convert non-directional expressions into directional queries. The advantage of this is that you can use an existing directional query engine as the basis for the implementation. Conversion is linear and fast enough, relative to the query, to add very little to the overall computation.
This looks like a pretty simple and efficient addition to XPath (and hence into XQuery and XSLT) that gives much increased flexibility.
7:54 AM | Comments () | Recommend This | Print This
XML Screamer, a Fast XML Parser
This afternoon I attended the XML session. The first speaker was Eric Perkins who spoke on XML Screamer, an integrated, high-performance XML parser/validator. This paper has been nominated for the best paper award. (See the paper.)
XML parsers are slow. Many people think that the human readability of XML is what makes it slow. How fast should we be able to go? Reading through an input file should take about 10 cycles/byte (on a 1 GHz processor, that's about 100 MB/s). Xerces-C does 6 MB/s per GHz; Expat does 12 MB/s per GHz. What's happening with all the other cycles?
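The arithmetic behind that question is worth spelling out. This small Python helper converts a throughput normalized to 1 GHz into cycles per input byte, assuming MB means 10^6 bytes:

```python
def cycles_per_byte(mb_per_sec_per_ghz):
    """CPU cycles spent per input byte, given MB/s achieved per GHz of clock."""
    return 1e9 / (mb_per_sec_per_ghz * 1e6)

for name, rate in [("Xerces-C", 6), ("Expat", 12)]:
    print(f"{name}: {cycles_per_byte(rate):.0f} cycles/byte")
```

That works out to roughly 167 cycles/byte for Xerces-C and 83 for Expat, about 17x and 8x the 10 cycles/byte budget.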
Eric walks through the steps required to parse a file. There are a dozen steps--a lot of UTF-8 to UTF-16 conversions. This is because schemas are typically in UTF-16 so comparisons all require conversion.
XML Screamer takes a schema and a desired output API and produces a custom parser in C or Java for that combination. Screamer optimizes across layers, avoids intermediate forms, and avoids format conversion.
XML Screamer is 1.9 times faster than Expat and 3.8 times faster than Xerces for non-validating tasks. For business object creation (non-validating), the numbers are 2.9 and 5.9 times as fast for Expat and Xerces respectively. For validating, these numbers go up to 5.5 and 11.6. This is getting to within 20-40% of the raw character scan rate.
A few conclusions:
- XML stacks are designed in layers, but parsers don't need to be built that way.
- Good API design is crucial to performance. Some APIs require string conversions, creation, or even buffer manipulations.
- Schema compilation means that compiled artifacts must be deployed with each application. This proves to be a significant drawback.
- This system is a prototype and IBM has no plans to release it.
7:28 AM | Comments () | Recommend This | Print This
Mashups, Web Data, and APIs
Frank Mantek, Jeff Barr, Dan Theurer, and Kevin Lawver
I decided to take in Rohit Khare's panel on Next Wave (Business) this morning. This was part of the developer track that has normally been [...] Rohit was kind enough to invite me to the panel dinner last night. It was fun and I [...]
Dan Theurer from Yahoo! was first up and used the theme "What Powers Web 2.0 Mashups?" Dan introduced the Yahoo! Developer Network. The first APIs that Yahoo! launched were the search APIs a little over a year ago. He showed a long list of APIs that Yahoo! has released since then. You can get to a lot of Yahoo! content at this point.
The Yahoo! APIs are RESTful. URLs follow a standard pattern: hostname, service, version, and call. XML is the default response format. JSON can be used to avoid XML parsing and the need for proxies.
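As a sketch of that hostname/service/version/call pattern, here's a small Python helper that assembles such a URL. The host, service name, call name, and the `output=json` parameter are placeholders I made up, not Yahoo!'s actual endpoint or parameters.

```python
from urllib.parse import urlencode

def rest_url(hostname, service, version, call, **params):
    """Assemble a hostname/service/version/call style REST URL (illustrative)."""
    path = "/".join([service, version, call])
    query = urlencode(params)  # keyword arguments keep their given order
    return f"http://{hostname}/{path}" + (f"?{query}" if query else "")

# Hypothetical service; output=json stands in for "give me JSON instead of XML".
print(rest_url("api.example.com", "SearchService", "V1", "search",
               query="mashups", output="json"))
```

Versioning the path this way lets a provider ship V2 alongside V1 without breaking existing callers.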
Dan points out some mashups. The first he mentions is Rollyo for rolling your own search engine. You can create custom search domains using a simple user interface. He also mentioned an event browser which I couldn't get to come up. I don't know if it's protected or just the network here at the conference.
The second panelist was Kevin Lawver from AOL. He talked about a microformat for widgets. Yahoo!, Google, AOL, and others all have widgets, but they don't talk to each other. Kevin sees microformats as an answer to that. He came up with a microformat called MicroT. This works in AIM Pages. He was also able to make it work in Dashboard. He demoed this using a widget he wrote for AimFight. Using a microformat for this allows a single, human-readable format that can easily be machine-translated into other formats.
Frank Mantek from Google spoke about the Google Data API. Google has lots of APIs available for all kinds of things. The Google Data API is a single API for querying and updating data across multiple applications. For example, here's the data API for Google Calendar. The API supports Atom and RSS. Updates are based on Atom.
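Because responses are plain Atom, any Atom-aware code can read them. Here's a minimal Python sketch using only the standard library and the Atom namespace from RFC 4287. The sample feed is invented, and this isn't Google's client library; real feeds carry additional Google-specific elements.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def entry_titles(feed_xml):
    """Return the <title> text of each <entry> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [e.findtext(f"{ATOM}title") for e in root.findall(f"{ATOM}entry")]

sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>My calendar</title>
  <entry><title>Team meeting</title></entry>
  <entry><title>Dentist</title></entry>
</feed>"""

print(entry_titles(sample))
```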
There are a number of common elements (kinds) and some that are application specific. Google Calendar defines three kinds of data: events, contacts, and messages. Right now there are Java and C# libraries. A PHP library is in the works.
To make this really useful, you have to have 3rd party authentication so that other Web sites can authenticate Google users for purposes of getting Google data for that user. I wrote about this earlier, before it was fully baked. Sophisticated mashups are going to require authentication to sites that own data.
The fourth speaker was Jeff Barr, Amazon's Web services evangelist. Jeff introduced the various APIs that Amazon supports. Jeff lists some best practices:
- First, have a program
- Get the business model right
- Get the technology side right
- Support developers
- Create community
The business model comes down to two choices: services enhance your business (the Doc Searls "because" business model) or services are your business. You need a good pricing strategy and a good license. You have to strike a balance between protecting your assets and being permissive enough to allow the unexpected. Make it easy to get started.
On the technology side, start simple and support multiple protocols (SOAP, REST, JSON). Be a platform, meaning have a version strategy and strive for backwards compatibility. Ensure you're scaled for growth so that you don't have a "success disaster."
To have a developer program, you need more than just APIs. You also need extensive documentation, sample code, and technical support. Developers will count on you for their success, so they need support.
Building a community means forums, outreach, evangelism, blogging, and lots of random interaction. You have to get out and talk to people. Work hard to get into situations where interactions can happen.
Rohit brings up the Web 2.0 phenomenon of looking at the intersections of services: once you find an empty spot, you've got a start-up.
I asked about the multiple authentication problem. Google's authentication system works similarly to OpenID in that it allows you to get a key that the owner can later revoke to deny that service access. Getting multiple services from different vendors into a single application would require storing multiple keys. In many cases, this would require users to trust 3rd parties with their login information. Jeff mentioned that there are already 3rd party sites that ask users for their Amazon user ID so that they can create an S3 account for them. Scary...
There was considerable discussion of security concerns with mashups. This hasn't been a big topic yet, but it will be. All of the panelists said their companies do things to prevent cross-site scripting, cookie stealing, and so on in their services. Rohit mentioned that in a simple service they built at CommerceNet, the sanitization code was half the application.
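The most basic piece of that sanitization is escaping untrusted data before it goes into a page. Here's a minimal Python sketch using the standard `html.escape`. The `render_comment` function is hypothetical. It covers only HTML text and quoted attribute contexts; real services also have to handle URLs, scripts, JSON, and cookies, which is why the code grows so large.

```python
import html

def render_comment(author, text):
    """Escape untrusted strings before embedding them in an HTML fragment."""
    return (f'<p class="comment"><b>{html.escape(author)}</b>: '
            f'{html.escape(text)}</p>')

# A script tag submitted as a comment comes out as inert text.
print(render_comment("eve", "<script>alert(1)</script>"))
```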
6:19 AM | Comments (1) | Recommend This | Print This
The Next Wave of the Web
Plenary Panel
WWW2006 started this morning with an introduction to the technical program. The conference is very competitive: of the 697 papers submitted this year, 84 were accepted, or about 12%.
The plenary panel was entitled "The Next Wave of the Web." Nigel Shadbolt (University of Southampton) was the panel chair. The panelists were Tim Berners-Lee (W3C), Richard Benjamins (iSOCO), Clare Hart (Dow-Jones), and Jim Hendler (MINDSWAP). The discussion was mostly about the semantic web.
Shadbolt asked Berners-Lee what the achievements have been in the semantic web since the first article appeared in Scientific American. He pointed to RDF and OWL as very real achievements. He said that the big stopper has been not having a query language. SPARQL fills that gap.
Hendler said that we've underestimated the scale needed for deployment. People in the bioinformatics world can generate 100 million triples per day. The challenge of scaling was far greater than we expected.
Benjamins said that it's easy to get people to understand the vision of semantic Web, but it's harder to get them to adopt. OWL has helped companies understand that the risk is limited now and it's easier to bring these technologies to the market.
Hart said that businesses have to figure out how to tap unstructured and structured information resources to build competitive advantage. Getting an inventory, determining a vocabulary, building sector relationships, applying tags automatically, and then building smart applications are key steps.
Are ontologies and taxonomies anything people can actually use? Hart says "yes." 80% of the data in businesses is unstructured. Organization is unique to the way the data is collected and used. Random indexing is great, but classification is necessary for many applications. For example, you can tell how many news articles in a given day mention IBM with indexing, but how many of those are about IBM?
Benjamins said that businesses think of ontologies as a necessary evil. They should recognize that they are doable and useful. We need better ways of estimating the cost of building an ontology so that businesses can plan for them and determine the ROI. Ontologies can be a source of competitive advantage. Factors in the cost include size, the difficulty of gathering the data, and the number of people involved in building it.
Hart mentions that Web 2.0 is leading expectations. Hendler says that he thought tagging would eventually work its way up to ontologies, but it's going in both directions. There's some notion of converging on a "middle," but it's unclear to me where the middle is. What if these two activities are happening on different planes?
Berners-Lee says that the notion of people adding semantic tags to documents using Notepad is too simple. Hart says, again, that technology has to be the driver of categorization--people writing tags can only go so far. The amount of time people spend "searching" is increasing, not decreasing. This is a waste of time. You pay people to analyze data. The more time they spend searching, the less time they spend analyzing. Are you trusting your information strategy to a search engine?
Rohit Khare pointed out (via Bonjour) this slide deck, which has some good examples of Person OWL. This illustrates Benjamins' earlier point perfectly--it's not hard to demonstrate the utility of the Semantic Web. People get it with only a few simple examples. The problem is making it happen.
Hendler says (again) that a third of the clicks on the Web are happening inside social Web sites like MySpace.com. This is layering a social component on top of the Web.
Berners-Lee says that his goal for RDF stores is that they will be invertible. With invertibility, you get the ability to debug where inferences go wrong. Thus you could get more than just a search engine; you'd get an explanation engine.
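Berners-Lee didn't describe a mechanism. As a toy illustration of the idea, here's a Python sketch in which every inferred triple records the premises that produced it, so a conclusion can be walked back to asserted facts. The two rules are simplified RDFS-style rules, and the data is invented.

```python
def infer(triples):
    """Forward-chain two simplified RDFS-style rules, recording each derivation.

    Returns a dict mapping every fact to None (asserted) or to the pair of
    premises that produced it.
    """
    why = {t: None for t in triples}
    changed = True
    while changed:
        changed = False
        facts = list(why)  # snapshot; new facts are picked up on the next pass
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":    # A sub B, B sub D  =>  A sub D
                    new = (a, "subClassOf", d)
                elif p1 == "type":        # a type B, B sub D  =>  a type D
                    new = (a, "type", d)
                else:
                    continue
                if new not in why:
                    why[new] = ((a, p1, b), (c, p2, d))
                    changed = True
    return why

def explain(why, fact, depth=0):
    """Walk the recorded premises back to asserted facts, one line per step."""
    note = " (asserted)" if why[fact] is None else ""
    lines = ["  " * depth + " ".join(fact) + note]
    for premise in why[fact] or ():
        lines += explain(why, premise, depth + 1)
    return lines

why = infer([("tim", "type", "Researcher"), ("Researcher", "subClassOf", "Person")])
print("\n".join(explain(why, ("tim", "type", "Person"))))
```

If a conclusion is wrong, the explanation points straight at the asserted fact or rule responsible.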
The title "The Next Wave of the Web" was appropriate for this panel, not because I necessarily believe that the Semantic Web is the next wave, but because it's still being talked about in the future tense, as something that "will be really cool when it gets here."
3:27 AM | Comments () | Recommend This | Print This
May 23, 2006
MoodViews: Analyzing Mood Data from Blogs
Krisztian Balog
Blogs are one of the places on the Web where you can reliably find people writing about their moods. Krisztian Balog presented a paper called "Decomposing Bloggers' Moods: Towards a Time Series Analysis of Moods in the Blogosphere." Mood-annotated posts can be used to produce interesting data. For example, MoodViews takes a stream of mood-annotated text from LiveJournal and uses it to track, predict, and analyze moods on blogs.
Moods have a cyclic component. Some moods depend on the time of day, some on the day of the week. You can show a correlation between major events (say, the London bombings) and mood. So far, there's less than a year's worth of data, so seasonal fluctuations are not included.
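The talk didn't go into MoodViews' internals. As an illustration of the kind of aggregation that exposes a time-of-day cycle, here's a Python sketch that computes, for each hour, the share of posts tagged with a given mood. The posts are invented.

```python
from collections import defaultdict
from datetime import datetime

def hourly_share(posts, mood):
    """Fraction of posts in each hour of day that carry `mood`."""
    totals, hits = defaultdict(int), defaultdict(int)
    for timestamp, label in posts:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour] += 1
        hits[hour] += label == mood   # True counts as 1
    return {h: hits[h] / totals[h] for h in sorted(totals)}

posts = [("2006-05-01T09:15", "tired"), ("2006-05-01T09:40", "busy"),
         ("2006-05-01T23:05", "tired"), ("2006-05-02T23:30", "tired")]
print(hourly_share(posts, "tired"))
```

Swapping `.hour` for `.weekday()` gives the day-of-week cycle instead.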
Blog posts labeled "stressed" show a slight drop in the summer and a huge spike before Christmas, followed by a drop after. This is age dependent. Cheerful has a huge spike during the holidays. Annoyed shows a drop at the same time. Loved peaks on Valentine's Day.
Excited and Lonely are on the decline over the test period while Contemplative and Creative are on the rise. Stressed, Busy, and Working are all correlated.
Analyzing the gap between an expected mood and the actual mood can give an early indication that something is happening.
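The paper's prediction model wasn't described in the talk, so as a simple stand-in, here's a Python sketch that measures how far an observed mood count sits from its history in standard deviations. The history values are invented, and any alert threshold (say 3) would be an arbitrary choice.

```python
from statistics import mean, stdev

def surprise(history, observed):
    """How many standard deviations `observed` lies from the historical mean."""
    mu, sigma = mean(history), stdev(history)
    return (observed - mu) / sigma if sigma else 0.0

# Invented counts of "sad" posts at this hour on previous days.
history = [100, 110, 90, 105, 95]
print(round(surprise(history, 180), 2))   # far outside the usual range
```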
Future work will analyze other correlations between moods and look at longer periods.
9:47 AM | Comments (1) | Recommend This | Print This
Detecting Splogs
I went to a session on blogging this afternoon. One talk was by Tim Finin on detecting splogs. He is part of the ebiquity research group at UMBC. He and his students do some interesting work in recognizing splogs. Tim wrote a funny splog bait post to see where it would get picked up.
Here's an interesting data point: the in-degree distribution of authentic blogs follows a power law, but that of splogs does not. The same is true of the out-degree. Ping times for real blogs are periodic, following the blogger's sleep cycle. Splogs ping on a more constant basis.
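One way to turn the ping observation into a feature is the entropy of the hour-of-day distribution: a blogger who sleeps leaves some hours empty, while a splog spreads pings evenly and approaches the maximum of log2(24) bits. This Python sketch is my illustration, not Tim's classifier, and the ping data is invented.

```python
from collections import Counter
from math import log2

def hour_entropy(ping_hours):
    """Shannon entropy (bits) of the hour-of-day distribution of pings."""
    counts = Counter(ping_hours)
    n = len(ping_hours)
    return -sum(c / n * log2(c / n) for c in counts.values())

human = [h for h in range(8, 23) for _ in range(2)]   # quiet from 11pm to 8am
splog = list(range(24)) * 2                           # pings around the clock
print(hour_entropy(human), hour_entropy(splog))
```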
Not surprisingly, English-language blogs are much more likely to be splogs, as are blogs in the .info TLD.
It can take a minute or more for a person to determine whether a given blog is a splog or not. This makes coming up with good training data a problem.
Our reputation framework might be useful, with data from places like registrars, Alexa, and Netcraft to attach reputation data to URLs. You could also imagine a toolbar that would let users classify blogs they visit as splogs or not.
7:56 AM | Comments () | Recommend This | Print This
UTC Calls Hatch a Champion of Technology
I just got to the conference center in Edinburgh. The trip wasn't bad--I slept most of the way from Chicago.
When I checked my email, I found several messages that people had forwarded to me, pointing out an email from UITA asking members to attend a $500/person fund-raising reception for Sen. Orrin Hatch. The announcement read, in part:
In recent years I've worked closely w/ Senator Orrin Hatch and found him to be a true champion of technology issues in our country. Since he's had such an important impact on our technology community, I hope you'll join me on May 30th to thank him directly.
Huh?!? A true champion of technology issues? Sen. Hatch is the single biggest threat to the Internet in the Senate and, consequently, to the livelihood of many of the UTC members this announcement was sent to.
At best, this can be seen as an acknowledgment that, barring divine intervention, Hatch will win the Senate race he's in and be in office six more years, so you'd better make friends with the powers that be. Still, it rankles me to see Hatch called a "champion of technology." He's not.
4:54 AM | Comments (3) | Recommend This | Print This
May 22, 2006
WWW2006
I'm just getting ready to leave for WWW2006 in Edinburgh, Scotland. I'll be blogging interesting talks and events after I get there (sometime tomorrow).
You can follow my coverage by looking at my www2006 tag or even subscribing to my www2006 specific RSS feed.
9:21 AM | Comments () | Recommend This | Print This
May 19, 2006
May CTO Breakfast Report
We spent a good deal of time talking about offsite backup and things like Jungle Disk. Jungle Disk is an application for Windows, OS X, and Linux that uses Amazon's S3 as the storage substrate.
Bruce brought up Verisign's PIP, an OpenID service. This is particularly cool, I think, because it shows there's some momentum behind user-centric ID when big companies start to jump in. Michael Graves, of Verisign, was at IIW2006, and he talks about PIP on his blog.
We discussed how to build good teams and companies. We were surprised to see that getting it right the first time is such a popular title for books. The one on innovative companies was the one recommended. We also discussed how creativity has more to do with interaction than brilliance. Every person you lose, even the receptionist, takes all their interactions, and the creativity they give rise to, with them.
Pete Kruckenberg pointed out WorldMapper. This is a pretty cool site with maps that show various statistics by resizing each territory. Of particular interest were the immigration and emigration maps.
9:48 AM | Comments () | Recommend This | Print This
May 18, 2006
Using Live Clipboard
Steve Farrell sent me an example of using Ray Ozzie's Live Clipboard. Ray's talk from ETech 2006 went up last week at IT Conversations--it's worth listening to so you understand what Ray's doing. Steve is a proponent of microtemplates. The Web site says that the "goal of microtemplates is to make it as easy to publish dynamic information as it is to publish static information." They're complementary to microformats, one of the staples of Ray's Live Clipboard.
I'm just learning about microtemplates, but they seem like a great way to avoid long chunks of JavaScript that do nothing but format HTML strings. Steve's Live Clipboard example, naturally, makes use of them. Go ahead and give it a try--I had seen Ray do it at ETech, but it was kind of fun to give it a go.
10:19 AM | Comments () | Recommend This | Print This
Software Symposium 2006
No Fluff Just Stuff is hosting a software symposium on June 16-17. It's still not too late for the early bird discount. The program and content look pretty good if you're interested in Java and agile methodology. I'm a little miffed that it's on a Friday and Saturday. I generally boycott conferences on weekends and so don't plan on going. I resent the encroachment of these kinds of activities into what I consider my leisure and family time. But, if you don't and you're in the Salt Lake area, you might enjoy the conference.
9:45 AM | Comments () | Recommend This | Print This
May 17, 2006
Hardware Video Encoders and Decoders
I need to broadcast fairly high-quality video over a line-of-sight link. The basic idea is to broadcast an encoded signal over a 5.7GHz link using some Motorola Canopy gear. In February, when I needed to do this, I hired a company that used hardware encoders and decoders from a company called Integral Systems Design, but I can't find them in Google. Anyone know of this company or similar boxes? I'd love to pick some up on eBay or something so I can avoid the ongoing cost of renting.
8:30 PM | Comments () | Recommend This | Print This
SOA Forum Wrap-up
Halley Suitt
My laptop was giving me grief yesterday (I think it's a memory problem) so I didn't get to everything I was planning on writing up. For example, I went to Halley Suitt's talk at Syndicate in the afternoon. Halley is one of the early bloggers and a great writer. She writes Halley's Comment and is the CEO of Top Ten Sources. She's also a sometime contributor at IT Conversations, doing a show called Memory Lane (I'd like her to do more shows--hint, hint).
The panel on SOA Governance went very well and we had some great comments. There were probably over 100 people there--the room was packed. My presentation on digital identity as the foundation for SOA went pretty well too. There was a good crowd and I had a number of them offer positive comments when it was done. I promised I'd point them to my sample digital identity policies, so there you go.
I'm home for a few days and then off to Scotland for a week at WWW2006. Watch for my reports.
4:09 PM | Comments () | Recommend This | Print This
SOA Name Change?
I got an email from Pheloxi in the Netherlands who informed me that SOA is the Dutch acronym for sexually transmitted disease. I guess if InfoWorld does a European version of the SOA forum, they may want to change the name. :-)
3:41 PM | Comments (3) | Recommend This | Print This
May 16, 2006
Rocketboom
Amanda Congdon
I went to Amanda Congdon's plenary session at the end of Syndicate. She's the host of Rocketboom, a videoblog that deals with serious and not so serious news.
I'd heard of Rocketboom, but hadn't seen it before. I enjoyed the clips she showed and will probably go have a look from time to time. They are getting 350,000 unique views per day, and half of that is international. There are only about six staff members, which lets the production timeline move at a very quick pace.
Syndicating video requires working with multiple formats: WMV (multiple versions), MOV (multiple versions), MP4, etc. This makes it a little trickier than audio, where if you've got MP3, you're set.
She points to Daryl Hannah's video blog. Why does a Hollywood veteran do a video blog instead of doing something in a more traditional video medium? It's easier, costs less, and is more convenient for the producer and the viewer. Basic Brewing is another example of targeted niche content.
Amanda has a few thoughts about what all this means:
- Democratization of content
- Aggregating systems, like iTunes and TiVo, will become the norm, leveling the playing field.
- More and more people are becoming producers.
- Traditional ads that talk at you rather than to you won't work in this medium.
- This isn't about what you know or who you know. Producers can keep their integrity as artists and human beings.
What's the business model?
- Ads
- Subscription, and premium services
- Pay-per-view (iTunes model)
- Merchandising
- Consulting
- Write a book
She credits the Video iPod, and Steve Jobs featuring Rocketboom, as a huge win. That one event gave them one of the biggest spikes they've had, increasing their viewership by 100,000.
3:56 PM | Comments (1) | Recommend This | Print This
Dave Weinberger on Tagging
Syndicate, another IDG event, is happening at the same hotel on the same days. I had some time before my talk on digital identity, so I snuck up to the third floor to hear Dave Weinberger talk about tagging, a subject near to my heart lately.
What's the big fuss about? After all, aren't tags just keywords and metadata? Sure, but they're metadata written in ordinary language without a special vocabulary and are (usually) applied by the reader, rather than the writer. For some reason, people are more willing to tag other people's work than they are their own. Because this happens in public, there are social effects.
Dave Weinberger
Without tags, Flickr is just another place to stick photos. Tags are what made it different. If you're doing research, tags represent a stream of social interest. Dave uses the example that he's doing research on taxonomy. By subscribing to that tag at Del.icio.us, he gets other people to do his research for him.
Tags don't always give us everything we need. If you go to Flickr and search for pictures of London and miss a few because they were tagged differently (say with "picadily")--no big deal. But if you're a brain surgeon or planning a transatlantic flight, missing some data could be deadly.
Tag intersections can help solve this problem, and some sites (like Flickr) are getting really good at this. Tag clusters help users navigate dissimilar items with similar tags. For example, if you look at things tagged with capri, you can see them clustered by tag intersections.
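Here's a rough Python sketch of the two ideas: an intersection finds items carrying every requested tag, and counting which tags co-occur with "capri" is a crude first step toward clusters. This isn't Flickr's algorithm; the items and tags are invented.

```python
from collections import Counter

def with_all_tags(items, *tags):
    """Items carrying every one of the given tags (a tag intersection)."""
    wanted = set(tags)
    return [name for name, item_tags in items.items() if wanted <= item_tags]

def co_tags(items, tag, k=3):
    """Tags that most often appear alongside `tag`: a crude basis for clusters."""
    counts = Counter(t for ts in items.values() if tag in ts for t in ts if t != tag)
    return counts.most_common(k)

items = {
    "p1": {"capri", "italy", "island"},
    "p2": {"capri", "pants", "fashion"},
    "p3": {"capri", "italy", "beach"},
    "p4": {"capri", "pants"},
}
print(with_all_tags(items, "capri", "italy"))
print(co_tags(items, "capri", k=2))
```

The co-occurring tags split "capri" into the island and the pants, which is exactly the kind of ambiguity clusters help users navigate.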
The basic message of tagging is that we can't pre-define them--they are user generated and flexible. This is why tagging works where taxonomies don't. People tag for themselves in a selfish way. The social benefits follow from this selfish behavior.
Dave brings up Quicken documentation and how lousy it is (they're not alone--just a convenience example). How much better would it be if users could tag it and help other users see it organized in a variety of different ways?
11:40 AM | Comments () | Recommend This | Print This
Getting Started with SOA
I'm at the InfoWorld SOA Executive Forum today. I'm moderating a panel on SOA governance and speaking on digital identity. The conference is completely sold out.
I was part of a team that wrote a feature for InfoWorld last week on the SOA lifecycle. I've watched (and helped) InfoWorld move into this space over the last few years and I think they've done more than just report on what's happening: they're part of the conversation and clarifying concepts in helpful ways.
Bruce Graham, BEA
Tony Bishop gave the opening keynote. He's the SVP for Corporate Investment Banking Technology group at Wachovia. Tony gave 16 points that you need to follow for implementing SOA. I didn't catch them all, but I'll try to remember to point to his slides when they come up. The presentation was quite detailed--maybe too much so for a keynote. You could have filled a workshop with the ideas he brought up. He got a question that asked for a differentiation between components and services. I think the best explanation of that is from Roger Sessions. Sessions also clearly distinguishes objects.
Bruce Graham from BEA spoke on "accelerating your SOA implementation." He points to a survey of how companies are using SOA. Last year 52% said they hadn't started using SOA. Now only 19% say that. 41% of companies will spend more than $500K this year. The mean is $2.1 million.
SOA Initiation Patterns
Some of Graham's talk is essentially the same content as Bret Dixon's talk from March. I particularly liked the SOA initiation patterns discussion. This is a good place to start when you're looking at your own organization and trying to figure out what approach to take. There are quite a few examples in this talk about how organizations at different points in this graph start their SOA implementations. This would be a good talk to listen to again. Unfortunately, it's not one of the ones we have on IT Conversations. Here are PowerPoint slides from an earlier talk on this topic. I'll try to link to the slides from this version when they're available since they're quite different.
Graham talks about six dimensions of an SOA project:
- Business strategy and process
- Architecture
- Building blocks
- Projects and applications
- Organizations and governance
- Cost and benefits
On governance, Graham recommends that you don't make the answer more difficult than it has to be. CIOs should directly engage with O&G questions as early as possible. Don't give it to a subteam. The CIO should act as a "benevolent dictator." Focus on aligning your SOA objectives, expected benefits, and guiding principles. Clarify roles and responsibilities.
Graham gives this advice for companies just getting started:
- Know your SOA entry point and the associated strategy
- Plan and manage holistically
- Simplify
8:06 AM | Comments () | Recommend This | Print This
May 15, 2006
Your Cell Phone Is Watching You
One of my favorite programs from last week was Nathan Eagle's Where 2.0 presentation on using cell phones to predict user behavior. Using only publicly available data, Eagle was able to deduce relationships between pairs and groups of individuals.
There are privacy concerns to be sure. Your cell provider already has much of this data. Every time two cell providers merge, what little protection we get from disparate carriers is broken down.
What interested me most, though, is not the privacy concerns, but the potential to infer and enhance social interactions using the wearable computers each of us carries around every day.
What's needed to make this not only more private, but also more useful, is real user-centric identity that transfers across carriers and domains. People often move past identity to get to the fun stuff, but it's the identity infrastructure that makes it all useful and practical.
4:49 PM | Comments () | Recommend This | Print This
May 12, 2006
eVoting Security Holes
I put a piece about Black Box Voting's report up at Between the Lines. The report found significant security problems. The investigation is a result of Bruce Funk's courageous action in letting independent security experts look at his Diebold machines.
Should we panic? No. But we ought not to dismiss this security concern out of hand either as Diebold seems to hope we will. More states should subject more voting machines to independent tests by real computer security experts. If there's nothing to hide, then this should be a relatively painless thing to do. The fact that Diebold and other manufacturers are so unwilling to be forthcoming about the security of their machines leads me to wonder what they're worried about.
From » Voting machine security flaws uncovered | Between the Lines | ZDNet.com
Referenced Fri May 12 2006 11:33:49 GMT-0600 (MDT)
It's interesting that the New York Times has released a story about this, but Utah's newspapers have ignored it.
10:40 AM | Comments () | Recommend This | Print This
Utah Senate Blog Is Effective eGovernment
The Utah Senate Site blog was featured in a story at Stateline.org.
Joining the nation's growing proliferation of political Web logs, or blogs, the Utah site was the first of its kind to strike up a digital dialogue that included entries not just from state Senate Republicans but also from minority Democrats and lawmakers in the opposite chamber. Unfolding comment by comment, the unofficial daily log often paralleled official debate taking place under the dome -- with the added bonus of anonymity.
From Power blogging debuts in Utah capitol
Referenced Fri May 12 2006 10:23:04 GMT-0600 (MDT)
Ric Cantrell, on the Senate staff, is the guy who makes it all work, but the blog features posts from Senators in the majority and even some in the minority. Plus, in a big nod to openness, the blog has comments open, giving anyone a chance to give feedback to the writers.
Last year, for example, Sen. Chris Buttars posted about his "origins of life" bill (which eventually died in the House) and there were 99 comments. Did they change Buttars' mind? No. But they were a more effective way for the public to comment than we've ever had before.
What makes this blog work is what makes every blog work in the end: writing that interests people, openness, and a human voice. This isn't a collection of press releases. These are posts by real people, explaining in their own words why they're doing what they're doing. I think it's very effective and a great example of using IT in service of democracy. Add to that the fact that it cost about $150 to set up and you have to love it.
10:33 AM | Comments (1) | Recommend This | Print This
May 11, 2006
Grabbing Cell Data
Nathan Eagle's presentation at the Where 2.0 conference has some very interesting information about how easy it is to deduce interesting facts by monitoring cell phone location and proximity. Todd Biske has taken that and turned it into a call for better logging in SOA applications for the purpose of improving usability. This points to the need to carefully construct security policies around XML documents that are passed from place to place, so that this kind of monitoring can occur without compromising sensitive data.
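As one small example of such a policy, here's a Python sketch that masks sensitive elements before a message is written to a usage log. The element names in the policy are hypothetical. The sketch masks only each matching element's direct text and matches un-namespaced tags, so a real policy would need to go further.

```python
import xml.etree.ElementTree as ET

SENSITIVE = {"ssn", "cardNumber"}  # hypothetical policy: element names to mask

def redact_for_log(xml_text, sensitive=SENSITIVE):
    """Return a copy of the document with sensitive elements' text masked."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag in sensitive:
            el.text = "***"
    return ET.tostring(root, encoding="unicode")

msg = ("<order><customer>Ann</customer>"
       "<cardNumber>4111111111111111</cardNumber><item>book</item></order>")
print(redact_for_log(msg))
```

The log still shows who ordered what, which is what usability analysis needs, without the card number.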
9:17 PM | Comments () | Recommend This | Print This
Browser Statistics Redux
It's been a while since I looked at browser statistics for Technometria. There have been some big changes. Here are the browser stats for 2006 to date.
| Browser | Percentage |
|---|---|
| Internet Explorer | 50.76 |
| Firefox | 36.20 |
| Safari | 7.69 |
This is interesting because in the fourth quarter of 2005, IE garnered 56% and Firefox had 30%. That's a pretty big shift, and it's consistent with the trend in my earlier snapshots. Admittedly, my blog attracts techies, who are more prone to using Firefox, but I think the shift reflects where things are headed.
The reason I noticed this is that I was wondering about screen resolutions. I'm considering widening Technometria. Right now it's 800 pixels wide. According to my statistics, over 92% of my readers have screens that are at least 1024 pixels wide.
3:41 PM | Comments (4) | Recommend This | Print This
May 10, 2006
Social Computing Symposium
Ross Mayfield is blogging the Social Computing Symposium. There's a lot of information about games, their applications, and their social impact. Good stuff, given my questions about the space.
1:35 PM | Comments () | Recommend This | Print This
Speeding Up Tags
A while back, I added a tag cloud to my blog. The idea was to replace categories with tags, a much more flexible system. I bent Movable Type's (MT) keywords and search features to my purpose. One thing I did to make that work was modify the search script in MT to search keywords exclusively when it's called with the SearchElement=keywords option.
My next task, which I describe here, had three goals:
- Make something with a prettier URL
- Add RSS for tags
- Speed things up
The last point was important if I wanted this to work at any kind of scale. MT's search feature isn't speedy by any means, and there's no way I could add RSS, with all the beauty of pull delivery, without solving the speed problem as well.
There was a pretty simple answer to all of these goals: wrap the search function with some code to make it all work. I wrote a small Perl CGI script called tags and installed it with the following declaration in the httpd.conf file:
```apache
<Location /tags>
    SetHandler cgi-script
    Options +ExecCGI
</Location>
```
With this in place, I could parse the tag from the path info to get URLs that look like this:
http://www.windley.com/tags/blogging
The wrapper just builds the old, crufty search URL and then calls it with a GET. Why use a GET? Loose coupling. I've found that I sometimes end up moving things around later, and if everything is based on HTTP it just keeps working.
I found a template from Niall Kennedy that creates RSS for MT searches, but because MT assumes everything is HTML, it returned the wrong content type. The wrapper solved this problem as well, since I could return any content type I wanted. Here's the RSS feed for that same tag:
http://www.windley.com/tags/blogging?flavor=rss
RSS is just another flavor on the tag. I like that. Now, tags are syndicated using RSS. That's important because I use category RSS feeds to drive some of the other things I do. For example, each Monday I automatically grab all my posts on IT Conversations from the last week to include in the IT Conversations newsletter. Having the RSS feed makes that simple.
The speed problem is solved by having the wrapper cache results. If a file exists for the tag and it's less than a certain number of seconds old (currently I have it set to 6 hours), then the contents of the file are returned. If not, the wrapper performs a search. Whether it builds a search URL for HTML or RSS, the result is stored in a file for re-use. This makes most repeat searches very fast and puts an acceptable load on the processor.
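The wrapper itself is Perl; what follows is only a Python sketch of its first job: pulling the tag out of the path info, building the old search URL, and picking a content type from the flavor. The search script path and the `Template` parameter used to select an RSS template are hypothetical assumptions; only `SearchElement=keywords` comes from the description above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical location of MT's search script; the real path depends on the install.
SEARCH_SCRIPT = "http://www.windley.com/cgi-bin/mt/mt-search.cgi"

# Content type to send back for each flavor (MT itself always answers as HTML).
CONTENT_TYPES = {"html": "text/html", "rss": "application/rss+xml"}

def tag_request(url: str) -> tuple[str, str]:
    """Turn /tags/<tag>?flavor=... into (MT search URL, content type)."""
    parts = urlsplit(url)
    prefix = "/tags/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a tag URL: {url}")
    tag = parts.path[len(prefix):].strip("/")
    flavor = parse_qs(parts.query).get("flavor", ["html"])[0]
    if flavor not in CONTENT_TYPES:
        raise ValueError(f"unknown flavor: {flavor}")
    params = {"search": tag, "SearchElement": "keywords"}
    if flavor != "html":
        # Hypothetical: ask MT for an alternate search template per flavor.
        params["Template"] = flavor
    return f"{SEARCH_SCRIPT}?{urlencode(params)}", CONTENT_TYPES[flavor]
```

Keeping the flavor-to-content-type mapping in one table is what lets the wrapper correct MT's always-HTML response.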
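The caching rule can be sketched in Python as follows. The six-hour lifetime is the setting mentioned above; `fetch` stands in for whatever performs the real search, and naming one cache file per tag and flavor is left to the caller as an assumption.

```python
import os
import time

CACHE_TTL = 6 * 60 * 60  # six hours, the setting mentioned in the post

def cached_fetch(cache_path, fetch, ttl=CACHE_TTL, now=None):
    """Return the cached result if it is fresh; otherwise run fetch() and cache it."""
    now = time.time() if now is None else now
    try:
        if now - os.path.getmtime(cache_path) < ttl:
            with open(cache_path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet: fall through and run the search
    body = fetch()
    # Write to a temporary file and rename it, so a concurrent request never
    # reads a half-written cache entry.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(body)
    os.replace(tmp_path, cache_path)
    return body
```

Using the file's modification time as the timestamp means no separate index is needed: the cache directory is the whole state.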
The code is available for you to look at. Let me know if you use it or improve it.
11:26 AM | Comments () | Recommend This | Print This
Conversing With Your Customers
I just posted an essay on Conversing With Your Customers. This will be my Connect Column for July.
10:34 AM | Comments () | Recommend This | Print This
Opening Finder Folder in iTerm
I saw a little script in Macworld that allows you to right click on a folder in Finder and have it open in Terminal. I often find it handy to use the command line and the Finder simultaneously, so this seemed like a handy thing to do. Note that you can always open the current directory in Finder by typing open . at the command line.
The problem is that the Macworld script is set up for Terminal, and I use iTerm because I like the tabs. I found this script, but it required the installation of some other things, which I didn't want to do. So, I combined the ideas from both scripts to create one that does the job:
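```applescript
-- Get the POSIX path of the folder shown in the frontmost Finder window.
tell application "Finder"
    set myWin to window 1
    set theWin to (quoted form of POSIX path of (target of myWin as alias))
    -- Open a new iTerm terminal with a default session and cd into that folder.
    tell application "iTerm"
        make new terminal
        tell the first terminal
            activate current session
            launch session "Default Session"
            tell the last session
                write text "cd " & theWin
            end tell
        end tell
    end tell
end tell
-- Bring iTerm to the front once the session is ready.
tell application "Finder"
    activate
end tell
tell application "iTerm"
    activate
end tell
```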
To use this, open Automator, select Finder in the Library column, and then drag in the Get Selected Finder Items action followed by the Run AppleScript action. Replace all the code in the Run AppleScript action with the above code. Save it as a plug-in for Finder with a meaningful name (like "Open in iTerm"). You're done. Now, right-click on a folder in Finder and select the script from the menu. You'll get that directory in a new tab in iTerm.
8:44 AM | Comments (1) | Recommend This | Print This
May 9, 2006
IABC/PRSA Spring Conference
I spoke, along with Bruce Fryer and Charley Foster, at the spring conference of the Utah chapters of IABC and PRSA. Most of the audience was either public relations or marketing and communications folks. The subject was blogging. Charley live blogged the talk as we went. I put together a set of del.icio.us bookmarks that record the sites we mentioned.
The main message: speak with a human voice and be honest or don't bother. We also went over my notes on how to start a blog and told people to study Scoble's corporate blogger manifesto. This was a lot of fun. The only problem is that 50 minutes isn't enough time to go over everything in enough detail. We could easily spend an hour each on getting started, syndication, internal blogs, wikis, and many other topics. Maybe we should do a day-long seminar (open-space).
2:52 PM | Comments (3) | Recommend This | Print This
May 8, 2006
MovableType Configuration
If you've upgraded to MovableType 3.2, here's a clue for you: delete your old mt.cfg configuration file (since it's been moved to mt-config.cgi). MT will continue to read mt.cfg even though you're busily editing mt-config.cgi. This can be frustrating.
5:18 PM | Comments () | Recommend This | Print This
Learning CSS
A friend of mine is learning CSS. Like me, his standard MO when learning something new is to just look at the source and start playing around until he gets it right. Mostly that works for CSS, but I found that there were some subtle points that I didn't just pick up, and having a book helped. Here are two I found very useful:
The Zen of CSS Design: Visual Enlightenment for the Web (Voices That Matter) by Dave Shea and Molly E. Holzschlag, based on the CSS Zen Garden, was useful not so much for learning CSS (although it works for that too) as for inspiration about what you can do with CSS. You can probably get the same thing from the Web site, but I'm a book person. I loved the rich color graphics and the ability to read it over a stack of pancakes at breakfast.
CSS Cookbook by Dan Cederholm and Christopher Schmitt is your workaday O'Reilly book with lots of examples, good reference material, and clear exposition. This one is handy for learning some of the subtle tricks and seeing best practice.
What about reference material? For that I prefer asking Mr. Google. If you need syntax, you can always find that online. I don't even have a favorite site--I just type css textwidth (or whatever) into Google and read the first site that pops up.
4:41 PM | Comments (2) | Recommend This | Print This
Clueless in SimLand
I was listening to Edward Castronova's PopTech! presentation (Gold From Thin Air: The Economy of Virtual Worlds) today and had a scary thought.
I've never been into video games, but as I listen to presentations like Ed's, I always feel like I'm missing something. Not the fun and adventure, but that the world is moving to a new place and I'm not following.
I've always prided myself on staying up with technology and not being stuck in the last decade, but now I'm not so sure. Maybe there's this whole world out there that I'm conveniently ignoring. For example, is gaming technology the right way to teach certain subjects or concepts? If so, I'd never know it.
I don't get the feeling that I'm alone. I think there's a large group of folks who don't play online games and so they are mostly ignoring what's happening in that space. What do you think? Is this a hole in my education that I ought to rectify?
1:42 PM | Comments (13) | Recommend This | Print This
Reputation Podcast
Tom Maddox had his podcasting gear at IIW2006 and was interviewing people both days. He was just sitting in the main hall, so there's quite a bit of background noise, but the material is pretty good. So far, he's published the following:
10:44 AM | Comments () | Recommend This | Print This
TiVo's Got Podcasts
This weekend, I noticed that my TiVo could play podcasts. I don't know how long that feature's been there (months, probably), but I just found it. Unfortunately, IT Conversations isn't on their pre-populated list, but there were some that I was interested in. For example, I listened to the latest TWiT this morning while I was getting ready. Here are a few thoughts and observations:
- This is a no-brainer for TiVo. Lots of free content that they can put on their box for little effort, and they garner a "new" feature.
- You can add podcasts not on TiVo's list, but you have to use the cursor keys to select letters. This is almost as painful as entering URLs on your phone. No one's going to do it.
- If they are, they're not going to do it for long URLs. The IT Conversations RSS URL is too long: http://www.itconversations.com/recentWithEnclosures.php. To add insult to injury, after I typed it in, something didn't work. I don't know what it didn't like.
- Once you start a podcast, there's no way to fast forward, back up, or pause, at least not that I found in several minutes of playing with it. It's frustrating to have a remote in your hand that you can't really use for anything.
- The machine seems to get sluggish after it starts playing. I thought it was stuck at first, but eventually it responded to all the previous input I'd given it.
- It's possible to have a podcast playing and go to live TV so that they're both sending sound to the speakers. Seems that going to some other function ought to stop the podcast.
So, while I'm happy to see the new podcast feature in TiVo and will probably use it, like I did this morning, there are definitely some bugs that need to be worked out of the system. I'm not holding my breath.
8:27 AM | Comments (2) | Recommend This | Print This
May 6, 2006
IIW Identity Space Map
Kaliya created a wall hanging from butcher paper and lots of little colored construction paper icons to hang on it. This was hanging on the wall the entire workshop and people were free to add to it. The "map" was designed to represent the evolution of Internet or user-centric identity over the last 2 years or so and look into the future about a year. Kaliya had already pre-populated it and I took a picture to represent the initial state.
The above picture is the final state, at the end of the conference, and reflects everyone's additions. Steve Carter created this high-resolution image if you'd like to read it.
12:01 PM | Comments () | Recommend This | Print This
May 5, 2006
Bose Service Rocks
Last month my Bose QuietComfort Headphones broke. A little piece of plastic on the right side of the headband broke, allowing the right earpiece to flop out. I was bummed; these aren't cheap headphones.
I called Bose expecting a run around of one sort or another. Instead, I got a flat-out "send them back and we'll replace them free." No receipt proving purchase date, nothing. Just "send them back." The new ones arrived today, about a week after they received my old pair in the mail. Very impressive.
6:01 PM | Comments () | Recommend This | Print This
IIW2006 Wrap
After a day of decompressing from Internet Identity Workshop, I've had a few random thoughts that I thought I'd record. I was very pleased with how things turned out: the participation, the venue, the food, everything. Here are some specific things:
- First, Kaliya (aka Identity Woman) did an amazing job of putting the program together. She does this professionally, so if you're running a workshop that you'd like to do in an "unconference" format--she's someone you have to hire to do it for you. You won't be sorry.
- The Computer History Museum was a great venue for this sort of workshop and served our purposes perfectly. I highly recommend it.
- Steve Williams did a great job with the audio. We have some MP3s of the opening and closing sessions for day 2 and day 3. Unfortunately, we weren't set up to capture audio on the first day--my fault for not thinking about it ahead of time.
- The Hotel Avante was one of the best hotel stays I've had in a long time. Kudos to them. I'll stay there again next time I'm in the area.
I put up a piece at Between the Lines on the value of the unconference format. I have to admit that when Kaliya suggested it before last October's IIW, I was skeptical. I've been converted. Read the entire BTL piece for my thoughts on that.
We're planning two more events in 2006. The first will be a half-day intro to Internet identity in conjunction with Digital ID World in September. I'd like to run parallel sessions: some structured sessions for people new to the space and open format sessions for veterans who need to talk. The second will be another workshop like the one we just did. We're toying with the idea of doing it on the East Coast in November, but nothing's been cast in stone yet. If you have opinions, let me know.
4:42 PM | Comments () | Recommend This | Print This
SOA Governance Panel Reprise
We'll be doing a reprise of the SOA governance panel at the SOA Executive Forum on May 16th in New York. The panelists will be:
- Ed Vazquez of Sprint-Nextel. Ed's the Group Manager of the Web Service Integrations & SOA.
- Jeff Schneider of MomentumSI
- Johannes Viegener of Software AG. Johannes is Vice President of the R&D Crossvision Suite
- Michael Hill of HP. Mike is the Global Director for Enterprise Architecture & Governance
This should be a great panel. Ed and Jeff were on the panel in March and did a bang up job. I've heard good things about both Johannes and Mike.
Like last time, the panel will be run with a strict moratorium on PowerPoint slides. I'll spend 3-5 minutes introducing the subject, give each panelist 2-3 minutes to introduce themselves, and then we'll launch into questions. I'll have some prepared and also take audience questions. Here are some of the questions that we used last time to get things moving:
- Why is governance important?
- How does your company govern SOA?
- What big mistake did you make early on that convinced you that you needed to govern your SOA efforts?
- What role do policies play in governing SOA? What policies do you have?
- How do you distinguish between design-time, deploy-time, and run-time policies? Do you treat them differently in the governance process?
- Do you have a center of excellence (COE)? What role does it play? How does it work?
- How are governance and architecture related?
- Have you encountered resistance to SOA governance in your organization? How do you overcome it?
- How do you enforce policies? Who enforces policy?
- What process do you use for feeding back information from enforcement and taking corrective action?
- Is any of your policy enforcement automated (WS-I checks for example)? How do you do it?
- What role do registries play in your governance efforts?
- What role do Web services management systems play in your governance efforts?
- Do you use SLAs or other contract devices between providers and consumers? How are they managed?
The audience responded very well last time and asked quite a few questions themselves. If you've got other questions about SOA governance that you think ought to be addressed, please leave a comment.
2:12 PM | Comments (2) | Recommend This | Print This
IIW2006 Kudos for Unconferences
Kim Cameron has some very nice words for IIW2006 and the unconference format on his blog:
Everyone in attendance was awe-struck by the IIW 2006 that just took place in Mountainview. It was incredible.
With Doc Searls and Phil Windely navigating at the macro-level, the amazing Identity Woman Kaliya orchestrated an ”unconference” that was one of the most effective events I’ve ever attended. It’s clear that creating synergy out of chaos is an art that these three have mastered, and participants floated in and out of sessions that self-organized around an ongoing three-day hallway conversation - the hallway actually being the main conference room and event! So we got to engage in all kinds of one-on-one (and few) conversations, meet new people, work out concerns and above all work on convergence. Many people told me they felt history was being made, and I did too.
People showed amazing new demos of identity metasystem software from many different approaches and on many platforms. People, we are achieving orbit.
From Kim Cameron’s Identity Weblog
Referenced Fri May 05 2006 08:07:17 GMT-0600 (MDT)
8:07 AM | Comments () | Recommend This | Print This
May 4, 2006
Speaking at Yahoo! on Reputation
I gave a presentation on identity and reputation at Yahoo! today as Chad Dickerson's guest. The talk (slides) introduced user-centric identity and then introduced the reputation framework that my students built. I hope we'll have releasable code and a paper available soon. I'm looking for funding to support further development of the framework. If reputation is interesting to you or your organization, contact me. I'd be happy to talk to you about what we've done and how you might be able to participate.
5:31 PM | Comments () | Recommend This | Print This
v|100 Selectees
vSpring has released the names of the v|100 for 2006. "The v|100 was conceived by vSpring as a tool to recognize the region's outstanding entrepreneurs and to support and promote partnering and collaboration among the state of Utah's top entrepreneurs." I'm happy to say that I'm among them for the third year in a row, particularly since the nomination and selection process is done by the entrepreneurial community in Utah.
10:21 AM | Comments () | Recommend This | Print This
May 3, 2006
Wiki Wednesday
After the day was over at IIW, Eugene Kim was headed down to SocialText to speak at Wiki Wednesday, so some of us tagged along. The evening was nice enough that we moved it outside. A very informal and nice conversation.
Eugene was being controversial and said that recent improvements to wikis are missing the point. Wikis are transformational tools for communities. They are neutral space. So what features are needed and central to the notion of a wiki?
Yet wikis need single sign-on, and lightweight SSO solutions are viable. If wikis are supposed to be about community, then you ought to be able to get onto them easily. Wikipedia is multiple wikis, and every single one requires that you create a new user. This breaks community because you can't carry identity from one place to another.
Reputation is another issue for wikis. The number one theme at the last wiki conference was reputation. Wiki people don't know what reputation means. Wikis already have reputation, but we're not acknowledging it or branding it. Wikipedia articles, for example, have discussion pages, authors, author expertise, number of edits, and so on. People should be able to see these. More transparent metrics are important.
A simple thing you could do is to "age" pages so that page color changes the older and staler a page is. Using a visual metaphor gives information without making an explicit reputation claim.
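Page aging could be as simple as interpolating a page's background color from the age of its last edit. A minimal Python sketch; the 365-day staleness horizon and the tan "old paper" color are arbitrary assumptions, not something proposed in the session:

```python
def page_age_color(age_days: float, stale_after_days: float = 365.0) -> str:
    """Fade a page's background from white (fresh) toward tan (stale)."""
    # Clamp the age into [0, 1] as a fraction of the staleness horizon.
    t = max(0.0, min(1.0, age_days / stale_after_days))
    fresh = (255, 255, 255)  # white
    stale = (230, 210, 160)  # hypothetical "old paper" tan
    rgb = [round(f + (s - f) * t) for f, s in zip(fresh, stale)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

A wiki template would drop the result into the page's background style; the reader sees staleness without the page making any explicit claim about quality.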
I made the point that reputation is "my story about you." Reputation is not a community thing. Communities tend to arrive at similar reputation judgments because of webs of trust. The key is to expose facts so that reputation can be computed based on those facts.
Wikis shouldn't look too good or too finished. When someone sees a document that looks finished, they assume that someone owns it or that there's no point making comments.
The second, and most important, characteristic of wikis is that wikis are about language--specifically shared language. Shared language is the number one thing in building collaborative communities. "Link as you think." Referring to pages by name means you've got to think about page names (thinking about language). You might be using words differently than someone else. This can result in namespace clashes--a good thing in wikis. This drives nomenclature convergence.
Tags share this affordance of wiki page links: they encourage namespace clashes and serendipitous connections. If you do a page index on any wiki, you're seeing the vocabulary of that community. The default for this listing is an alphabetical list. We should use tag clouds based on backlinks to words as a better visual metaphor for nomenclature. Wiki tags should show not just pages so tagged, but also wiki words.
Nomenclature isn't an exercise where you can get everyone together for a few meetings before the project starts and build a taxonomy. Any tool built on a wiki structure will get "link as you think" as a matter of course, but it doesn't need to stop there. There's no reason you can't have a "link as you think" feature in forums, email, blogs, and so on. Link as you think encourages convergence around ideas.
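A backlink-based cloud only needs, for each wiki word, the number of distinct pages linking to it, scaled into a few font-size buckets. A minimal Python sketch, assuming the wiki can hand over a mapping from page name to the wiki words it links to (a hypothetical input format); the linear five-bucket scale is one arbitrary choice:

```python
from collections import Counter

def backlink_cloud(links: dict[str, list[str]], buckets: int = 5) -> dict[str, int]:
    """Map each linked wiki word to a size bucket (1..buckets) by backlink count."""
    # Count distinct pages linking to each word (a page linking twice counts once).
    counts = Counter(word for targets in links.values() for word in set(targets))
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid dividing by zero when every count is equal
    return {word: 1 + (n - lo) * (buckets - 1) // span for word, n in counts.items()}
```

The index can stay alphabetical; only the font size changes, so heavily referenced vocabulary stands out.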
8:19 PM | Comments () | Recommend This | Print This
IIW2006: Wednesday Sessions
Photo: Randy Farmer leads the skeptic session.
Kaliya started the day with a call for anyone else who wanted to create new sessions and then did a "spectrogram." She put a long piece of tape on the floor and asked questions where people arrayed themselves along the spectrum represented by the tape. She interviewed people at spots on the tape. A good way to get a feel for how the group is thinking about some things.
I did my session on reputation and showed off the reputation system we built in my 601 class last semester. Generally well received and good comments.
Chris Allen led a session on the notions of reputation and collective choice.
In collective choice, there are three primary actions: selection, opinion, and comparison. Selection happens by voting, deliberation (Robert's Rules), and consensus. Opinion systems mainly rely on polling. Comparison happens by ranking, rating, markets, and reputation.
Which brings us to reputation. Reputation can be used for collaborative filtering, collaborative sanctioning, and threshold maintenance. Negative reputation can be problematic. Altruistic punishment is a way of getting around that.
Reputation changes over time. Reputation accumulates. Timestamps on transactions are important. Disassociating from past choices is important. Who's not my friend any more?
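One way to make timestamps and disassociation concrete is to accumulate timestamped ratings, weight older ones less, and let the observer exclude raters they no longer trust. This Python sketch is an illustration under assumptions (exponential decay with a 30-day half-life), not the model Chris presented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rating:
    rater: str
    value: float      # e.g. +1 for a positive rating, -1 for a negative one
    timestamp: float  # seconds since the epoch

def reputation(ratings, now, half_life=30 * 86400, excluded=frozenset()):
    """Accumulate timestamped ratings, halving a rating's weight every half_life seconds."""
    total = 0.0
    for r in ratings:
        if r.rater in excluded:  # disassociate from raters we no longer trust
            continue
        age = max(0.0, now - r.timestamp)
        total += r.value * 0.5 ** (age / half_life)
    return total
```

Because the raw ratings keep their timestamps, the score can be recomputed at any time with a different half-life or exclusion set.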
Chris has been looking at attack categories for reputation.
- Shilling attacks (using a shill) can add to a reputation with no substance, yielding high positive ratings. Called "astroturf" in some circles--fake grassroots.
- Spamming attacks can lead to excess bad results. Also called "griefing."
- Whoring (as in "karma whoring" on Slashdot). You do things right for a time in order to build up a reputation, which you exploit later. Sell a bunch of cheap books on Amazon and then use the good reputation to rip someone off on a single large transaction. A better word for this might be "stinging."
- Collusion (Chris called this "faking") is people working together to build a false reputation.
- Coercion & retaliation
- ID attacks
- Flooding attacks
There's a mailing list for people who are interested in reputation.
Photo: Drummond, Paul Trevethick, and Andy Dale discuss Higgins and XRI.
There were some great sessions, and the workshop seemed to have given most people some new things to talk about. We're contemplating what to do for the next workshop in the fall. We'll be doing a short event just before DIDW on September 11. That will be a 6-hour event, and attendees will get a break on DIDW entrance fees.
We also want to do a longer event later in the fall. We've contemplated doing an event on the East Coast in November, but I'm inclined to stick with the West Coast because of the energy that's being generated in the events that we've had so far. If you have ideas, contact me or Kaliya and let us know.
12:53 PM | Comments () | Recommend This | Print This
May 2, 2006
IIW2006: Tuesday Afternoon Sessions
Photo: Doc, Dave Winer, and Don Park.
The afternoon started for me with a session that Dave Winer led on identity in OPML and RSS. There's a need to identify owners and authors in OPML and RSS without exposing email addresses that can be harvested by spammers. This is a good time to have this discussion because OPML 2.0 is being spec'd.
The <head> section in the spec includes an <ownerId> element that is defined thusly:
[T]he http address of a web page that contains an HTML form that allows a human reader to communicate with the author of the document via email or other means.
Photo: The line was long.
This is a "contact me" form like the ones 2idi provides for i-names and NetMesh provides for LIDs. I think Dave is anticipating some identity infrastructure that doesn't yet exist in a standard way. Since OPML 2.0 will freeze the OPML spec, this is a good time for people in the identity space to offer some input.
So, an i-name or LID contact form fits the bill for what Dave is after. We got into an interesting discussion, however, about what's missing in the current schemes. I used myself as an example. I use my i-name contact form in different contexts. I frequently get contacts from people who don't give me enough context in their message, and since I don't know in which context they clicked the link, I have to guess.
Dave needs the <ownerId> form so that people who collaborate on OPML can be connected through a contact form in an automatic way (that is, from the information in the OPML file). The URL of the document that the contact is linking from and the title of that document (or node in OPML) would work here.
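A Python sketch of both halves: an OPML <head> whose <ownerId> points at a contact form, and a contact link that carries the referring document's URL and node title. The query parameter names `ref` and `subject` are hypothetical; a real contact form would have to agree to read them, and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def opml_head(title: str, owner_id: str) -> ET.Element:
    """Build an OPML <head> whose <ownerId> points at the owner's contact form."""
    head = ET.Element("head")
    ET.SubElement(head, "title").text = title
    ET.SubElement(head, "ownerId").text = owner_id
    return head

def contact_link(owner_id: str, doc_url: str, node_text: str) -> str:
    """Append the referring document and node so the owner sees the context."""
    # "ref" and "subject" are hypothetical parameter names.
    sep = "&" if "?" in owner_id else "?"
    return owner_id + sep + urlencode({"ref": doc_url, "subject": node_text})

head = opml_head("Reading list", "http://example.com/contact")
print(ET.tostring(head, encoding="unicode"))
print(contact_link("http://example.com/contact", "http://example.com/list.opml", "Identity blogs"))
```

Because the context travels in the link itself, the contact form needs no extra infrastructure beyond reading two parameters.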
Photo: Dale Olds (Novell) works on the map.
Dale Olds from Novell ran a session aimed at creating a map of the open source identity space. There are notes from the session on the wiki.
This turned out to be a useful exercise in gathering information, and it generated a lot of discussion. Here's a hi-res version of the whiteboard at the end of the discussion. It's interesting to me how talks like this one educate people in ways that are far removed from the stated goal of the discussion.
We convened another discussion of Identity Rights Agreements. Drummond led, and the interaction turned into the most concrete discussion of terms and ideas we've had yet. We mostly determined that there were two concepts, duration and party (or maybe purpose), that break out like this:
| Duration | Party |
|---|---|
| use once | yourself/stated purpose |
| relationship | yourself/related purposes |
| forever | affiliates |
| | others so bound |
| | anyone |
The key thing is to be simple and have the right defaults. We need to get a strawman proposal on the wiki and start hacking it out.
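As a strawman for "simple with the right defaults," the two columns can be modeled as ordered scales, with the most restrictive cell as the default. A Python sketch under two assumptions: that each column runs from least to most permissive, and that the default is use once / yourself for the stated purpose. Neither was settled in the session.

```python
from dataclasses import dataclass
from enum import IntEnum

class Duration(IntEnum):
    """How long the data may be used (larger = more permissive)."""
    USE_ONCE = 1
    RELATIONSHIP = 2
    FOREVER = 3

class Party(IntEnum):
    """Who may use the data (larger = more permissive; assumed to be nested)."""
    SELF_STATED_PURPOSE = 1
    SELF_RELATED_PURPOSES = 2
    AFFILIATES = 3
    OTHERS_SO_BOUND = 4
    ANYONE = 5

@dataclass(frozen=True)
class RightsAgreement:
    # Assumed default: the most restrictive cell of the table.
    duration: Duration = Duration.USE_ONCE
    party: Party = Party.SELF_STATED_PURPOSE

    def permits(self, duration: Duration, party: Party) -> bool:
        """A requested use is allowed if it is no broader than the agreement."""
        return duration <= self.duration and party <= self.party
```

Treating the scales as ordered keeps the check to two comparisons; if parties turn out not to nest, `Party` would need to become a set instead.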
The closing circle (another open space thing) let people summarize the day and say things to the entire group that they might not have wanted or been able to say in the smaller gatherings. Some were encouragement, others were more like announcements.
3:36 PM | Comments () | Recommend This | Print This
IIW2006: Tuesday Morning Sessions
Last night's conference dinner was very well attended and very good.
We started the morning in true unconference fashion by putting together the agenda. Anyone who wants to lead a session writes it down on an 8.5x11 inch piece of paper and posts it on a time grid on the wall. Everyone who posts something gets an opportunity to say something about their session. The agenda is fairly full and there are some good topics.
Kaliya said that the guy who invented open space spent a year planning a conference and then had someone tell him that the breaks were the best part of the conference. So, he decided to create a conference that was less structured. You might say that the theme of open space is "all breaks, all the time." After everyone had introduced and posted their sessions, there were 30 minutes or so before the sessions were set to begin. There was some jockeying to get things in the right place and not opposite potentially conflicting topics. All in all, pretty good.
The first session I went to was led by Gail-Joon Ahn from Univ. of North Carolina. Gail-Joon and his students built an open source implementation of InfoCards. They're interested in creating portable, interoperable, and multi-modal identity card selectors (part of InfoCard).
Gail-Joon's students demo'd a Java version of the InfoCard selector. The demo included logging into a site using a selected InfoCard, creating cards, and interacting with identity providers and relying parties in a couple of scenarios. All of the code is in Java. This is an impressive effort, but also illustrative of the fact that InfoCard
- doesn't have to be just a .Net/Microsoft thing and
- is simple enough to allow multiple implementations.
Part of their work involves moving InfoCard beyond the desktop and to mobile devices. They demo'd what's called an "i-button" that contains a secure token. The i-button could be on a ring or key fob. There was also a demo showing an InfoCard selector on a mobile phone.
Chuck Mortimore did a 5-minute demo of a Firefox plugin he's done for InfoCards. He created a card and then logged into Kim Cameron's blog using the card. Pretty cool.
Kim Cameron took over to show the code that Chuck was hitting on his blog. The relying party stuff he's using is all written in PHP. Kim showed various debugging tools for seeing what's going back and forth and demo'd the use of various InfoCard pieces from various players together.
In the second session of the morning, I dropped into a discussion led by Yan Cheng (AOL) on making identity systems work together. He came up with a three-axis diagram that he used to classify identity systems. Each axis represented a context that the identity system supported. One axis was "business," another was "private," and the third was "public."
To give some examples, Yan saw things like InfoCard, AOL, and other similar systems as more in the private context. LID, OpenID, and SXIP fell along the public axis. Liberty was associated with the business axis. As you can imagine, this engendered considerable discussion--that's good.
11:31 AM | Comments () | Recommend This | Print This
May 1, 2006
IIW2006: SXIP, InfoCard, XRI, and Doc
Photo: The new "just right" room.
We moved upstairs to accommodate the crowd and ended up with a lot more elbow room. Dick Hardt was the first speaker after the break. He gave a new version of his famous Identity 2.0 talk.
Dick mentions BCeID, a government identity service that forms a basis for digital identity in BC. I've long argued that governments have abdicated their responsibility to provide commerce-supporting infrastructure online. (By "infrastructure" I mean legal frameworks more than hardware and software.) BCeID looks to be mostly about government online services, but Dick points out that he's interested in seeing how it can be used by other places, like BC Hydro (power company).
Dick quotes Larry Wall's dictum about Perl, "Easy things are easy and hard things are possible," as a good basis for evaluating identity schemes. He lists a number of ideas that fall into the "hard things" category: agency, compartmentalization, notification, and granularity.
Mike Jones and the demo
Mike Jones from Microsoft was given the task of introducing the Laws of Identity and InfoCard. As a way of introducing InfoCard, Mike talks about claims and credentials in the physical world and how we use them. Mike spent a good deal of time talking about the laws. I think that was time well spent--they form a good basis for many of the conversations we want to have at IIW.
The identity metasystem concept isn't aimed at inventing a new identity system, but at inventing a system that can unify different identity systems. InfoCard confuses people because it looks like an identity system, and in some sense has to be one, but it's open because of the standards involved, so other identity systems can be adapted to work with it. The fact that there will be at least one open source and one commercial InfoCard system up before Microsoft releases its own is a testament to this.
InfoCard is an attempt to provide a simple user abstraction for digital identities that's grounded in a physical-world metaphor of credentials. The success of InfoCard depends on others implementing it.
Eve Maler from Sun was charged with discussing the Liberty Alliance Project. She quotes H.H. Munro, "A little inaccuracy sometimes saves lots of explanation," by way of saying that in 20 minutes, she's going to have to wave her hands a bit to get it all in.
About half the audience was familiar with SAML. Eve went through some high-level use cases as a way of introducing concepts and then moved into SAML- and Liberty-specific use cases.
Drummond Reed spoke about XRIs. XRIs use a URL-like syntax, backward compatible with the Web, to represent identifier authorities. On the IRC backchannel (#identity on freenode.net), someone asked "isn't an email address a URI?" when Johannes was talking about URL-based identity. XRI, as a Yadis-compatible identity syntax, makes it clear that email addresses are part of URI-based identity.
So why a new addressing scheme? There are many different devices, each with its own addressing scheme. Each scheme (phone numbers, email, and so on) is controlled by its own authority and has its own syntax. A unified identifier can make managing these various addresses more convenient and enable new services.
Drummond yielded some of his time to Andy Dale to speak a little about XDI. I wrote extensively about this last December when I was at the XDI workshop that Andy put on.
Doc Searls got here right before the break and I asked him to redo his talk to set some things up for tomorrow. Doc brings up the Cluetrain Manifesto and how he realized over time that identity was critical to that vision. He recounts the history of "how we got here" (see Kaliya's Map).
Moving from history, Doc starts talking about attention, intention, and marketplaces. These all get down to relationships. Doc has blogged about this at the IT Garage under the banner Starring in Your Own Constellation: Independent Identity in Networked Markets.
5:54 PM
IIW2006: Identity, Lexicon, and URLs
The identity map
One of the nice things about an informal workshop is the freedom to rearrange things as necessary. Doc, who was opening, was running a little late, so we re-did some of the schedule.
Eugene Kim was first up at IIW. Eugene's job was to introduce the ideas behind user-centric identity. He introduces the concepts of identity by introducing himself. User-centric identity is about users controlling their own identity. Where does that lead us?
Eugene Kim
Eugene contrasts the idea of single sign-on with portable identity. While many people use a single ID and password for most Internet sites, that's not really the point. Most identities on the 'Net aren't portable. With portable identity, users would get choice, and businesses would get more accurate information (how many people lie on registration forms to avoid this very problem?).
Eugene brings up the Yahoo/Flickr story as an example of how attached people get to their user names. When people thought they were losing their Flickr user names, they got angry.
Paul Trevithick was next, speaking about the community around user-centric identity and the lexicon that's being developed.
The lexicon project is aimed at coming up with common definitions for identity-related terms. He went through a number of these. I won't record them here, but I recommend you go over to the lexicon and look through them.
He works from "entities" to "subjects" and finally to "digital identity." Paul distinguishes subjects as things that have attributes and identities as sets of claims. The claims are about attributes and may or may not be true. A question from the audience raised the point that claims are not first class: you can't make a claim about a claim, at least not in the definition that exists now.
Johannes Ernst was the next speaker. His topic was URL-based identity. URLs are empowering because they can be bookmarked, tagged, linked to, subscribed to, explored, and customized. We already do these things for lots of resources; URL-based identities allow us to do them for people. Simplicity is an important attribute of URL-based identity. "Light-weight" identity is an architectural statement.
The original "too small" room
URL-based identities are engendering innovation in the identity space. He points to Yadis, a protocol for discovering the capabilities of an identity URL. Based on that foundation, you can build authentication in various forms, profile queries, registration, messaging, and so on. This is what we've done with the reputation framework that my lab is building: we're building functionality on top of URL-based identities.
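For anyone who hasn't seen Yadis in action: discovery ends with the relying party holding an XRDS document that lists the services an identity URL supports. Fetching that document (requesting the URL with an Accept: application/xrds+xml header, or following an X-XRDS-Location pointer) is left out here. What follows is a minimal Python sketch, not code from any of the projects mentioned, that pulls the advertised services out of an already-fetched XRDS document in priority order. The function name discover_services, the example.com endpoints, and the hypothetical profile type URI are made up for illustration; http://openid.net/signon/1.0 is the OpenID 1.0 service type.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in XRDS documents (Yadis / XRI Resolution 2.0).
XRDS_NS = "xri://$xrds"
XRD_NS = "xri://$xrd*($v*2.0)"
XRDS_TAG = "{%s}XRDS" % XRDS_NS
XRD_TAG = "{%s}XRD" % XRD_NS
SERVICE_TAG = "{%s}Service" % XRD_NS
TYPE_TAG = "{%s}Type" % XRD_NS
URI_TAG = "{%s}URI" % XRD_NS


def discover_services(xrds_text):
    """Return the services in an XRDS document, most preferred first.

    Each service is a dict holding its priority, Type URIs and endpoint URIs.
    (discover_services is a hypothetical helper name used for illustration.)
    """
    root = ET.fromstring(xrds_text)
    if root.tag != XRDS_TAG:
        raise ValueError("not an XRDS document")
    xrds = [child for child in root if child.tag == XRD_TAG]
    if not xrds:
        return []
    services = []
    # The last XRD describes the identifier that was finally resolved.
    for svc in xrds[-1]:
        if svc.tag != SERVICE_TAG:
            continue
        prio = svc.get("priority")
        services.append({
            "priority": int(prio) if prio is not None else None,
            "types": [t.text.strip() for t in svc if t.tag == TYPE_TAG and t.text],
            "uris": [u.text.strip() for u in svc if u.tag == URI_TAG and u.text],
        })
    # Lower priority values are preferred; services without one sort last.
    services.sort(key=lambda s: (s["priority"] is None, s["priority"] or 0))
    return services


# Example document; the second Type URI and all endpoints are hypothetical.
SAMPLE = """<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">
  <XRD>
    <Service priority="20">
      <Type>http://example.com/hypothetical/profile/1.0</Type>
      <URI>https://profile.example.com/</URI>
    </Service>
    <Service priority="10">
      <Type>http://openid.net/signon/1.0</Type>
      <URI>https://id.example.com/server</URI>
    </Service>
  </XRD>
</xrds:XRDS>"""

if __name__ == "__main__":
    for service in discover_services(SAMPLE):
        print(service["priority"], service["types"], service["uris"])
```

Sorting by priority lets a relying party walk the list and use the first service type it understands, which is roughly how authentication, profile queries, and messaging can all hang off the same identity URL.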
We've got a lot more people here than we planned, so we're going to break early and move to a bigger room upstairs. That's good news. There are probably 25 more people here today than we'd planned on.
3:03 PM
IIW2006: Getting Started
The Internet Identity Workshop starts today. I'm actually sitting in the Computer History Museum right now, getting things set up. It's not too late to come, if you're interested. I've added a one day option to the registration page. That includes snacks, lunch, and dinner (on Tuesday).
I'll be live blogging, as will others. Instead of doing some kind of Planet aggregator like I did last time, I figured we could just advertise that we were using iiw2006 as the tag and then count on services like Technorati to pull them all together.



