Working the Infrastructure


[Image: Die micrograph of Intel's Dunnington 6-core processor. Image via Wikipedia]

I ran across a couple of interesting blog posts that got me thinking about infrastructure and automation.

The first was from Markus Frind, the CEO of Plentyoffish.com. Markus reported that according to hitwise.com, Plentyoffish was the 13th most heavily trafficked site last year. It may be the most popular site you've never heard of if you're not into online dating.

The interesting part is that all of this is done with just a handful of servers.

  • PlentyOfFish (POF) gets 1.2 billion page views/month, and 500,000 average unique logins per day. The peak season is January, when traffic grows 30 percent.
  • POF has one single employee: the founder and CEO Markus Frind.
  • Makes up to $10 million a year on Google ads working only two hours a day.
  • 30+ Million Hits a Day (500 - 600 pages per second).
  • 1.1 billion page views and 45 million visitors a month.
  • Has 5-10 times the click through rate of Facebook.
  • A top 30 site in the US based on Compete's Attention metric, top 10 in Canada and top 30 in the UK.
  • 2 load-balanced web servers (2 quad-core Intel Xeon X5355 @ 2.66GHz), 8 GB of RAM (using about 800 MB), 2 hard drives, running Windows Server 2003 x64.
  • 3 DB servers. No data on their configuration.
  • Approaching 64,000 simultaneous connections and 2 million page views per hour.
  • Internet connection is a 1Gbps line of which 200Mbps is used.
  • 1 TB/day serving 171 million images through Akamai.
  • 6TB storage array to handle millions of full sized images being uploaded every month to the site.

Did you catch that? The 13th biggest Web site by visitors is run on five servers! I'm in awe. Note that these are Windows servers.

The second was a post about Gnip's numbers:

  • 99.9%: the Gnip service has 99.9% up-time.
  • 0: we have had zero Amazon EC2 instances fail.
  • 10: ten EC2 instances, of various sizes, run the core, redundant, message bus infrastructure.
  • 2.5m: 2.5 million unique activities are HTTP POSTed (pushed) into Gnip's Publisher front door each day.
  • 2.8m: 2.8 million activities are HTTP POSTed (pushed) out Gnip's Consumer back door each day.
  • 2.4m: 2.4 million activities are HTTP GETed (polled) from Gnip's Consumer back door each day.
  • $0: no money has been spent on framework licenses (unless you include "AWS").

These too are impressive numbers in their own way. Built entirely on EC2, this is a service that represents a different way to skin the cat with cloud-based computing.

One of the things I liked about the Gnip post was their discussion of architecture and simplicity. One thing struck me: no database.

The reason that stands out is that this was a major design goal for me as I set out to design and build the Kynetx Network Service (KNS) that forms the core of our product offering. I've had such headaches managing services with databases that I was determined to keep them out. We sometimes made tough design decisions that would have been easier to make with a database in the architecture.

Sometimes that works and sometimes it doesn't. In the case of a dating site, it's hard to imagine how some kind of database wouldn't be central to the design. In the case of Gnip or Kynetx, the service can and should be delivered without it. Plentyoffish has scaled well by scaling vertically (5 beefy servers) whereas Gnip and KNS are designed to scale horizontally virtually forever.
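The horizontal-scaling point above comes down to statelessness: if a handler derives its response entirely from the request, with no database or server-side session to coordinate, then adding capacity is just adding replicas behind a load balancer. Here's a minimal sketch of that idea; the function and field names are hypothetical and not taken from Gnip or KNS.

```python
# Hypothetical sketch of a stateless request handler. Because handle()
# reads nothing but its argument, every replica gives the same answer,
# which is what lets this style of service scale out "virtually forever".

def handle(request: dict) -> dict:
    """Compute the response purely from the request -- no database lookup,
    no shared mutable state anywhere."""
    site = request.get("site")
    rules = request.get("rules", [])
    matched = [r["action"] for r in rules if r.get("site") == site]
    return {"site": site, "actions": matched}

# Scaling out is just running more copies behind a load balancer;
# it doesn't matter which replica a request lands on.
replicas = [handle, handle, handle]
req = {
    "site": "example.com",
    "rules": [
        {"site": "example.com", "action": "annotate"},
        {"site": "other.com", "action": "ignore"},
    ],
}
responses = [replica(req) for replica in replicas]
assert all(r == responses[0] for r in responses)
```

A database-backed service can of course also scale horizontally, but then the database itself becomes the shared state you have to replicate, shard, or cache around; keeping state out of the service sidesteps that entire class of problem.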