Cloud Computing: Dr. Kai-Fu Lee of Google


Main hall where keynotes were held. I love the red slip covers on the chairs. They were more comfortable than your standard hotel chair.

The opening keynote speaker at WWW2008 is Dr. Kai-Fu Lee of Google.

Before the keynote, we were treated to a presentation that featured dancers in blue Spiderman uniforms, a dancer in what I assume was traditional dress, and a guy with a "Welcome to Beijing" banner running through them all. Somehow, it seemed to fit perfectly even though it was the first of its kind at any tech conference I've been to--especially one that's essentially academic.

We received a welcome speech from Dr. Yong Shang, who is the Vice-Minister of the Ministry of Science and Technology. It basically said "thanks for coming, China's pushing forward with Internet technology." No mention of the firewall. :-) As an aside, the fact that I can find him and his ministry on Google in English speaks louder about what he was saying than his actual words. No doubt the Chinese government understands the power of the Internet. That said, in terms of eGovernment, there was mostly information there, and not much in the way of services that I could see.

The Internet connectivity has failed and we're not even 30 minutes in. Hopefully it will come back up. I was planning on watching Twitter for news of the Pennsylvania primary. The opening ceremony has gone on for 40 minutes now. Finally we're ready for Lee's keynote.

Cloud Computing: Dr. Kai-Fu Lee of Google

He starts out asking what people want. Many of his answers are specifically about accessibility and its control. There are four key attributes of cloud computing:

  1. Data stored in the cloud
  2. Software and services are increasingly moving to the cloud and are accessed through the browser
  3. Based on standards and protocols
  4. Accessible from any device

Interesting that this is more or less Google's core set of beliefs. Companies often distinguish themselves from Google by departing from these principles. The world has moved from hardware-centric to software-centric to service-centric.

Six ideas driving cloud computing:

  1. User centric: Data is stored in the cloud and follows you and your devices. Data is accessible anywhere and is easily and safely shared with others. He mentions several obvious examples of Google services that meet this definition.
  2. Task centric: People don't want to make spreadsheets or write documents. Rather, they want to plan a curriculum or collaborate on a business plan. Right now, of course, all of Google's examples are simply documents or spreadsheets with collaboration built in.
  3. Power: Lots of computers in a cloud can do things you can't do with a single PC. Google search is faster than desktop search because there are lots of computers on the task. Cloud computing isn't just about moving things off the desktop, but about bringing more data and compute power to bear on the problem.
  4. Intelligent: Intelligence comes from mining massive amounts of data. "A ton of data is more valuable than an ounce of algorithm." I'm not sure that says much. Machine translation is a good example, where feeding lots of good translation data into a learning algorithm leads to better translation of general text. Storage + analytics = intelligence.
  5. Affordable: Of course, this all uses a lot of computers and that gets expensive. Google's strategy is to use cheap machines. A thousand CPUs' worth of PC-class machines cost about the same as one 64-way high-end machine and give 30x the performance (warning: data may be out of date). The actual numbers at Google are even greater since Google builds its own hardware. Faulty hardware can be overcome with a sophisticated software layer. This is the heart of engineering.
  6. Programmable: How do you program thousands of flaky servers? Fault-tolerant distributed disk storage, distributed shared memory, and a new programming paradigm. Google uses GFS for file storage: every piece of data is replicated three times. Anytime a server holding one of the three chunks dies, the others notice and make another copy. The shared memory architecture is Bigtable. The programming is done using MapReduce, a way of creating parallel algorithms. Between Mar 2005 and Sept 2007 the number of processes using MapReduce went from around 72,000 to over 2 million!
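
For anyone who hasn't seen the MapReduce style mentioned in the last item, here's a minimal single-process sketch in Python. This is my own illustration, not Google's code and not something shown in the talk: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for the example, and a real system would run the map and reduce calls on many machines rather than in one loop.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Run the mapper over every input and collect the (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(mapper(doc))
    return pairs


def shuffle(pairs):
    """Group all values by key; this sits between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Run the reducer once per key over that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(doc):
    # Emit (word, 1) for every whitespace-separated word in the document.
    return [(word.lower(), 1) for word in doc.split()]


def word_count_reducer(word, counts):
    # Add up the 1s emitted for this word.
    return sum(counts)


def word_count(documents):
    """Hypothetical driver: count words across all documents."""
    pairs = map_phase(documents, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)
```

The point of the shape is that each mapper call only looks at one document and each reducer call only looks at one key, so the calls are independent of each other. That independence is what lets a framework spread them over lots of cheap machines and simply rerun any piece whose machine dies.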

Cloud computing requires new skills. This is very true. We don't do enough to teach these skills to students. We ought to be introducing parallel computing in the cloud as the second programming course--ensuring that the first emphasizes the building blocks for the second. This probably means it's not in Java.

John Breslin has an excellent write-up of this speech as well.