eWeek has an interesting article on eBay's infrastructure that gives some tantalizing clues as to how they solve some of the most immense scaling problems on the Internet. Like most articles in this space, the article focuses on physical architecture issues with a few logical architecture tidbits thrown in (and not necessarily well identified as such). They've taken an interesting route with respect to geographic redundancy:
We've taken a unique approach with respect to our infrastructure. In a typical disaster recovery scenario, you have to have 200 percent of your capacity (100 percent in one location, 100 percent in another location), which is cost-ineffective. We have three centers, each with 50 percent of the traffic, actually 55 percent, adding in some bursts.

From eBay: Sold on Grid
Referenced Tue Sep 07 2004 14:37:22 GMT-0600
The effect is that they carry more overhead by running machines in three locations instead of two, but at 50 percent per site they provision 150 percent of peak capacity rather than 200 percent, cutting their potential infrastructure bill by 25% (closer to 17.5% at the 55 percent figure). Throw in the TCO for running those additional servers and they've probably saved quite a bit. Still, you need to be big enough to absorb the per-site overhead before the three-redundant-sites idea makes sense. That is, I suspect there's a breakeven point between two data centers at 100% capacity each and three at 55% each. I wonder where it is?
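The capacity arithmetic behind this is easy to sketch. To survive the loss of any one site, the surviving sites together must still carry 100% of traffic, so each of n sites gets provisioned at roughly 1/(n-1) of peak. Here's a minimal model of that tradeoff; the per-site fixed overhead figure is a made-up illustration, not anything eBay has disclosed:

```python
def total_capacity(n_sites):
    """Total provisioned capacity, as a multiple of peak traffic,
    assuming the system must survive the loss of any one site."""
    return n_sites / (n_sites - 1)

def total_cost(n_sites, capacity_cost=1.0, site_overhead=0.1):
    """Hypothetical cost model: capacity cost plus a fixed
    per-site overhead (facilities, staff, networking)."""
    return total_capacity(n_sites) * capacity_cost + n_sites * site_overhead

print(total_capacity(2))  # 2.0 -> the 200% figure from the article
print(total_capacity(3))  # 1.5 -> 150%, the 25% savings
```

Under this model the breakeven is simple: two sites cost 2C + 2F and three sites cost 1.5C + 3F (C = cost of 100% of capacity, F = fixed cost per site), so three sites win whenever F < 0.5C, i.e. whenever standing up an extra site costs less than half the capacity it lets you avoid buying. That's consistent with the intuition above that only a large operation clears the bar.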