
Power Laws, Longtails, and Software

Yesterday I spoke to a group of about 60 students at UVSC and then in the evening, I addressed the BYU Unix Users Group. I spoke on Power Laws, Longtails, and Software. Here’s the abstract:

If you took statistics and are a Computer Scientist, chances are you learned about the wrong kinds of distributions. Hardly anything about CS is normal…or Gaussian for that matter. This talk will explore power law distributions and their relationship to Internet businesses like Amazon.com and Rhapsody. Having a tough time figuring out who to work for? Power laws can help.

This is a fun talk to give to CS students and I enjoyed both sessions.

Posted by windley on September 22, 2006 10:34 AM


6 Comments

Comment from Jeremy P at September 22, 2006 12:41 PM

Having gone through the BYU CS program in the mid-90s, I actually thought there wasn't enough math and statistics. Zipfian, Gaussian, or otherwise. I suppose one of the problems is that it is hard to know what a future graduate might need. For someone who would be getting a job in device driver software coding, I suppose EE 220 was a good course to include in the CS program. But for someone like myself, who went more into information theory and research, I didn't really need to know much about that layer. I would have rather spent my time learning about more advanced statistics or about how to do good, efficient approximation of NP-hard problems.

As for whether the Gaussian is the wrong type of distribution to learn compared with the Zipfian, I suppose it all depends, again, on what you are going into. If you are going to work in marketing (Amazon/Google/eBay) or social networking, then yes, learn the power law family. But the sort of work I have been doing is with content-based music and image retrieval, e.g. imagine presenting the system with a "cha cha" and having it automatically scan through your collection and find, from the raw audio signal of the music itself, other songs that fit the cha cha profile. Or humming a tune into the computer, and having it automatically find the song you are looking for. Or imagine pulling up a picture from your digital collection, drawing a box around the face of your Aunt Frannie, and telling the system to find all the other pictures in your collection with this face.

For that sort of CS work, you'd better know the Gaussian, too. Power laws don't quite cut it.

I suppose it is all about having the right tool for the right job. And at the undergrad level, it has got to be really hard to anticipate all possible future tools that people will need, would you say?


Power laws are trendy, but many situations actually follow the log-normal distribution and are misidentified as following a power law through a combination of sloppy analysis and wishful thinking.

See this article for a perfect illustration:
http://www.cscs.umich.edu/~crshalizi/weblog/390.html
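
To make the log-normal point concrete, here is a minimal sketch (not taken from the linked article; the function names, the seed, and the sample parameters are all made up for illustration). It draws a log-normal sample, shows that the upper tail still gives a fairly straight line on log-log rank-size axes, and then compares maximum-likelihood fits of a Pareto (power law) and a log-normal to the full sample. Fitting the Pareto from the sample minimum is a simplifying assumption; a careful analysis would also estimate where the tail begins.

```python
import math
import random


def make_sample(n=5000, seed=0):
    """Draw n log-normal values (hypothetical parameters mu=0, sigma=2)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 2.0) for _ in range(n)]


def loglog_r_squared(values):
    """R^2 of a least-squares line through (log rank, log value)."""
    ordered = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(v) for v in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def pareto_loglik(values):
    """Maximized log-likelihood of a Pareto fit with x_min = min(values)."""
    x_min = min(values)
    n = len(values)
    sum_logs = sum(math.log(v) for v in values)
    # MLE of the Pareto shape parameter for a fixed x_min.
    alpha = n / (sum_logs - n * math.log(x_min))
    return n * math.log(alpha) + n * alpha * math.log(x_min) - (alpha + 1) * sum_logs


def lognormal_loglik(values):
    """Maximized log-likelihood of a log-normal fit."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((x - mu) ** 2 for x in logs) / n  # MLE of sigma^2
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n


if __name__ == "__main__":
    sample = make_sample()
    tail = sorted(sample, reverse=True)[:500]  # top 10% of the sample
    print("log-log R^2 on the tail:", round(loglog_r_squared(tail), 3))
    print("Pareto log-likelihood:    ", round(pareto_loglik(sample), 1))
    print("log-normal log-likelihood:", round(lognormal_loglik(sample), 1))
```

A straight-looking tail on log-log axes is therefore weak evidence on its own; comparing likelihoods of the candidate distributions is a more direct check.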

Interesting application of the power-law to company size. Gotta think about that.

Also, I hadn't come across the Bezos "two-pizza team" rule before.

Comment from Ben at September 22, 2006 11:45 PM

I agree with Jeremy that it totally depends on the type of work you plan on doing. Doing ML, image processing, or any scientific computing requires a strong math background and frequent use of the "wrong" kind of distributions.


Comment from Matthew Fry at September 23, 2006 1:22 PM

I was one of those CS students at UVSC. Just wanted to thank you for speaking on Thursday. I found the subject very interesting. In fact, I had Audible downloads left for this month, so I downloaded The Long Tail and have been listening to it nonstop ever since. This sure seems like where our economy is heading.

Comment from Rod at September 30, 2006 1:51 PM

Is there a way to download the audio track for one or both of Dr. Windley's talks mentioned above? I didn't see it on itconversations.

thanks.
