This afternoon I popped into Andrew Tomkins' talk on Visualizing Tags over Time. The paper was nominated for a best paper award. The research looks at visualizing Flickr tags. Images and tags form a bipartite graph that encourages "pivot browsing."
Tag clouds represent the default way of visualizing tags, but tags are not fixed in time. Does the temporal structure lead to a representation that allows us to surf through time and pick up a gestalt sense of what was happening over time?
He demos a visualization that scans through the tags for each day, picks out representative tags and then streams pictures and associated tags across the screen (right to left) showing about 1 day per second. This is the "river" metaphor.
There are 87 million tags in Flickr and 1.26 million of them are unique. Finding a "representative" tag depends on the temporal granularity. For example, a representative tag for a year is a major theme whereas the representative tag for a day is usually some quirky thing that happened on that day.
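To find a representative tag for an interval, they score each tag by dividing its number of occurrences within the interval by a constant plus its total number of occurrences. This favors tags that occur more frequently inside the interval and less frequently outside it. The constant in the denominator is there so that you don't get fooled by sparse occurrences: a tag used only a handful of times, all of them inside the interval, would otherwise get a near-perfect score.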
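To make the scoring concrete, here is a minimal Python sketch of the ratio as I understood it from the talk. The function name, the smoothing value of 5, and the toy tag counts are all my own assumptions for illustration, not values from the paper.

```python
from collections import Counter

def representative_tags(interval_tags, all_tags, smoothing=5.0, top_k=3):
    """Rank tags by (count in interval) / (smoothing + total count).

    `smoothing` stands in for the constant mentioned in the talk;
    the default of 5.0 is an arbitrary illustrative choice.
    """
    in_counts = Counter(interval_tags)
    total_counts = Counter(all_tags)
    scores = {
        tag: n / (smoothing + total_counts[tag])
        for tag, n in in_counts.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

# Toy data (made up): "sunset" is common everywhere, "eclipse" is
# concentrated in the interval, "typo" occurs exactly once overall.
all_tags = ["sunset"] * 100 + ["eclipse"] * 12 + ["typo"]
interval = ["sunset"] * 5 + ["eclipse"] * 10 + ["typo"]

print(representative_tags(interval, all_tags))
# -> ['eclipse', 'typo', 'sunset']
print(representative_tags(interval, all_tags, smoothing=0.0))
# -> ['typo', 'eclipse', 'sunset']
```

With the constant set to zero, the one-off "typo" tag wins because its single occurrence falls inside the interval. With the constant in place, "eclipse" comes out on top, which is the behavior the sparse-occurrence warning is about.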
They maintain scores of object occurrences at doubling length intervals. The paper covers algorithms for efficiently aggregating the threshold data for given intervals. Using an efficient algorithm is very important, because the naive approach takes too long for anything but very short intervals.
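I did not capture the paper's actual algorithms, but the general idea of keeping counts at doubling interval lengths can be sketched as follows. This is my own illustration, not the authors' method: it precomputes tag counts for aligned blocks of 1, 2, 4, 8, ... days, and then answers a query for any range of days by combining a small number of blocks instead of re-scanning every day. All names here are hypothetical.

```python
from collections import Counter

def build_dyadic_index(daily_counts):
    """levels[k][i] holds tag counts for days [i * 2**k, (i + 1) * 2**k)."""
    levels = [list(daily_counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Merge adjacent pairs; an unpaired trailing block is not promoted.
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev) - 1, 2)])
    return levels

def interval_counts(levels, start, end):
    """Tag counts for days [start, end), plus the number of blocks combined."""
    if not 0 <= start <= end <= len(levels[0]):
        raise ValueError("interval out of range")
    total, blocks_used, pos = Counter(), 0, start
    while pos < end:
        k = 0
        # Take the largest aligned block that starts at pos and fits before end.
        while (k + 1 < len(levels)
               and pos % (1 << (k + 1)) == 0
               and pos + (1 << (k + 1)) <= end
               and (pos >> (k + 1)) < len(levels[k + 1])):
            k += 1
        total.update(levels[k][pos >> k])
        pos += 1 << k
        blocks_used += 1
    return total, blocks_used

# Toy data (made up): 8 days, each with one shared tag and one per-day tag.
days = [Counter({"beach": 1, f"day{i}": 1}) for i in range(8)]
levels = build_dyadic_index(days)

counts, used = interval_counts(levels, 1, 7)
print(counts["beach"], used)  # -> 6 4
counts, used = interval_counts(levels, 0, 8)
print(counts["beach"], used)  # -> 8 1
```

The number of blocks combined per query grows with the logarithm of the interval length rather than with the length itself, which is the kind of saving that makes long intervals practical. The precomputed counts could then feed a score like the ratio described earlier.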