Jim Gray on Distributed Computing Economics


I attended Jim Gray's Organick Memorial Lecture at the University of Utah this afternoon. He spoke on distributed computing economics.

Jim asks:

  • Why is Seti@Home such a great idea?
  • Why is Napster a great idea?
  • Why is the computational grid uneconomic?
  • When does computing on demand work?
  • What is the "right" level of abstraction?
  • Is the "access grid" the real killer app?

Observation #1: Computing is free. One CPU day costs $1 (assuming a $1,000 computer). But the phone bill isn't free. Internet bandwidth costs $50 to $500 per Mbps per month, so 1 GB costs $1 to send and $1 to receive.
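Both headline prices can be reproduced with a little arithmetic. This is a sketch under my own assumptions, none of which come from the talk: a 30-day month, a fully utilized link, and a three-year (1,095-day) life for the $1,000 computer. The helper names are hypothetical.

```python
# Back-of-the-envelope check of "1 CPU day costs $1" and "1 GB costs $1".
# Assumptions (not from the talk): 30-day month, 100% link utilization,
# and a three-year service life for a $1,000 computer.

SECONDS_PER_MONTH = 30 * 24 * 3600
BYTES_PER_GB = 1e9


def cpu_cost_per_day(computer_price: float, lifetime_days: float) -> float:
    """Amortize the purchase price evenly over the machine's service life."""
    return computer_price / lifetime_days


def wan_cost_per_gb(dollars_per_mbps_month: float, utilization: float = 1.0) -> float:
    """Convert a $/Mbps/month bandwidth price into dollars per gigabyte moved."""
    bytes_per_month = (1e6 / 8) * SECONDS_PER_MONTH * utilization
    return dollars_per_mbps_month / (bytes_per_month / BYTES_PER_GB)


if __name__ == "__main__":
    print(f"CPU day: ${cpu_cost_per_day(1000, 3 * 365):.2f}")
    for price in (50, 500):
        print(f"${price}/Mbps/month -> ${wan_cost_per_gb(price):.2f}/GB")
```

A fully used 1 Mbps link moves about 324 GB a month, so the quoted price range works out to roughly $0.15 to $1.54 per GB. The $1/GB figure sits inside that range, and any idle time on the link pushes the effective per-GB price up.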

So here's Seti@Home: sending a 300 KB work unit costs $3e-4. The user computes on it for half a day, a benefit of $5e-1 (50 cents). The ROI is 1500:1, so Seti@Home is a good idea. The Seti@Home "supercomputer" delivers 61 teraflops, more than the top 4 supercomputers in the world put together. They invested nothing and got back a computer worth a billion dollars.
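The Seti@Home ROI falls straight out of the two unit prices ($1 per GB and $1 per CPU day). A minimal sketch; `transfer_cost` and `compute_value` are illustrative names, not anything from the talk:

```python
# Rough ROI for a Seti@Home work unit, using the talk's unit prices:
# $1 per GB of network traffic and $1 per CPU day.

DOLLARS_PER_GB = 1.0
DOLLARS_PER_CPU_DAY = 1.0


def transfer_cost(num_bytes: float) -> float:
    """Dollar cost of sending num_bytes over the WAN."""
    return num_bytes / 1e9 * DOLLARS_PER_GB


def compute_value(cpu_days: float) -> float:
    """Dollar value of the CPU time the volunteer donates."""
    return cpu_days * DOLLARS_PER_CPU_DAY


cost = transfer_cost(300e3)   # one 300 KB work unit
benefit = compute_value(0.5)  # half a day of volunteer CPU
print(f"cost ${cost:.0e}, benefit ${benefit:.1f}, ROI {benefit / cost:.0f}:1")
```

This gives roughly 1,700:1, the same order of magnitude as the 1500:1 figure quoted in the talk.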

Here's Napster: sending a 5 MB song costs $5e-3, roughly half a penny, so both sender and receiver can afford it. Yahoo! has a similar story: $1e-3 per page view in advertising revenue against $1e-5 per page view to serve the page. That's a 100:1 ROI.
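The Napster and Yahoo! numbers work the same way. This sketch uses the $1/GB price from the talk and treats ROI as a revenue-to-cost ratio, as the paragraph above does; `transfer_cost` and `roi` are hypothetical helpers.

```python
# Napster and Yahoo! in the same back-of-the-envelope terms ($1 per GB).

DOLLARS_PER_GB = 1.0


def transfer_cost(num_bytes: float) -> float:
    """Dollar cost of moving num_bytes over the WAN."""
    return num_bytes / 1e9 * DOLLARS_PER_GB


def roi(revenue: float, cost: float) -> float:
    """Return on investment, expressed as revenue per dollar of cost."""
    return revenue / cost


song_cost = transfer_cost(5e6)  # one 5 MB song
page_roi = roi(1e-3, 1e-5)      # ad revenue vs. serving cost per page view
print(f"song: ${song_cost:.3f}  page-view ROI: {page_roi:.0f}:1")
```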

But in reality, computing is not free. IBM, HP, Dell, and others make billions of dollars selling computers, and storage makes up 61% of the cost of a computer.

$1 buys:

  • 1 day of CPU time
  • 4 GB of RAM for a day
  • 1 GB of network bandwidth
  • 1 GB of disk storage for 3 years
  • 10 M database accesses
  • 10 TB of disk access (sequential)
  • 10 TB of LAN bandwidth (bulk)
  • 10 kWh of power (about 4 days of computer time)
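One way to use the list above is as a price table for costing a job. A minimal sketch: the units per dollar come from the talk, while the key names, the `workload_cost` helper, and the sample workload are hypothetical.

```python
# The "$1 buys" list as a price table, so a workload can be costed.
# Units per dollar come from the talk; the sample workload is hypothetical.

UNITS_PER_DOLLAR = {
    "cpu_days": 1,
    "ram_gb_days": 4,
    "wan_gb": 1,
    "disk_gb_3yr": 1,
    "db_accesses": 10e6,
    "seq_disk_tb": 10,
    "lan_tb": 10,
    "kwh": 10,
}


def workload_cost(usage: dict[str, float]) -> float:
    """Sum the dollar cost of each resource listed in usage."""
    return sum(amount / UNITS_PER_DOLLAR[resource] for resource, amount in usage.items())


# Hypothetical job: 2 CPU days, 50 GB over the WAN, 5 M database accesses.
job = {"cpu_days": 2, "wan_gb": 50, "db_accesses": 5e6}
print(f"${workload_cost(job):.2f}")
```

For this made-up job the total is $52.50, and $50 of it is WAN traffic.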

Consequences:

  • Beowulf networking is 10,000x cheaper than WAN networking. Factors of 10^5 matter.
  • The cheapest way to move a terabyte cross country is sneakernet. 24 hours = 4 MB/s. You can pay $50 for shipping or $1,000 for the WAN. Sneakernet usually performs better, is more reliable, and is cheaper.
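Both consequences follow from the price list. A minimal sketch using only the talk's figures; it counts just the sender's side of the WAN bill and treats shipping as a flat $50 fee, which is a simplifying assumption.

```python
# Comparing ways to move data, using the talk's prices:
# WAN: 1 GB per dollar; LAN (bulk): 10 TB per dollar; shipping disks: ~$50.

WAN_DOLLARS_PER_GB = 1.0
LAN_DOLLARS_PER_GB = 1.0 / 10_000  # 10 TB = 10,000 GB per dollar
SHIPPING_DOLLARS = 50.0            # flat cost quoted for sneakernet


def wan_cost(gb: float) -> float:
    """Sender-side WAN cost of moving gb gigabytes."""
    return gb * WAN_DOLLARS_PER_GB


lan_advantage = WAN_DOLLARS_PER_GB / LAN_DOLLARS_PER_GB
terabyte_gb = 1_000
print(f"LAN is {lan_advantage:,.0f}x cheaper than WAN per GB")
print(f"1 TB: WAN ${wan_cost(terabyte_gb):,.0f} vs. shipping ${SHIPPING_DOLLARS:,.0f}")
```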

To the extent that the computational grid is like MPI, where large data sets are transferred for data analysis, it fails on economic grounds. Move the programs to the data, not the data to the programs.

When to export a task:

IF instruction density > 100,000 instructions/byte
AND remote computer is free (i.e. costs you nothing)
THEN ROI > 0
ELSE ROI < 0
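The 100,000 figure is close to where the two costs cross: a dollar buys one CPU day or one GB of transfer, so the break-even density is the number of instructions in a CPU day divided by 10^9 bytes. The sketch assumes a processor executing about 10^9 instructions per second, which is my assumption rather than a number from the talk, and `should_export` is a hypothetical helper encoding the rule above.

```python
# Where the ~100,000 instructions/byte threshold comes from, plus the export rule.
# ASSUMPTION: the $1/day CPU executes about 1e9 instructions per second;
# the talk gives the threshold but not this processor speed.

INSTRUCTIONS_PER_SECOND = 1e9  # hypothetical CPU speed
DOLLARS_PER_CPU_DAY = 1.0
DOLLARS_PER_GB = 1.0

instructions_per_dollar = INSTRUCTIONS_PER_SECOND * 86_400 / DOLLARS_PER_CPU_DAY
bytes_per_dollar = 1e9 / DOLLARS_PER_GB
break_even_density = instructions_per_dollar / bytes_per_dollar  # 86,400 instructions/byte


def should_export(instructions: float, bytes_moved: float, remote_is_free: bool,
                  threshold: float = 100_000) -> bool:
    """Export only dense work, and only to a remote machine that costs you nothing."""
    density = instructions / bytes_moved
    return remote_is_free and density > threshold
```

For example, a task doing 1e12 instructions on 1 MB of input (10^6 instructions per byte) is worth shipping to a free remote machine, while the same computation over 1 GB of input (1,000 instructions per byte) should stay next to its data.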

Computing on demand means outsourced application vendors such as Salesforce.com and Oracle.com. It works for commoditized services: airlines outsource reservations and banks outsource ATMs, but Amazon, AMEX, Wal-Mart, eTrade, eBay, and others can't outsource their core competencies.

What do you outsource? Here's a stack:

  • Disk blocks?
  • Files?
  • SQL?
  • RPC?
  • Applications?

There aren't many successful companies at the lower levels of abstraction, but AOL, Google, Hotmail, and Yahoo! are all vibrant examples of applications on the Internet. So, what about SOAs (the RPC level in this stack)? The jury is still out, though there are some examples of companies economically offering SOA-level services.

Is the Access Grid the next killer app? Jim asks: "What comes after the telephone?" Email and IM seem too retro: just text and emoticons. Picture phones have been tried since the '60s. The Access Grid may be the right idea: it's a picture phone for groups. As near as I can tell, this is the "Internet as a billion TV channels" idea.

The questions afterward, inspired by the Access Grid comments, were about Mayor Anderson's recent opposition to UTOPIA. It's ironic that he opposes both building more highways and the technology that would reduce our dependence on them.