Continuous Deployment


This morning Steve gave a presentation on context automation and Kynetx at the Utah Technology Council's CTO P2P forum. The presentation was great and the audience asked a lot of good questions. One thing that came up (I don't even remember why) was the subject of continuous deployment. I decided I'd pull a few URLs out of my head and put them in a blog post for people to mull over.

The first URL I think of when I consider continuous deployment is code.flickr.com. If you've never been there, the bottom of the page lists when the last deployment of Flickr was, how many deployments have happened in the last week, and who was involved (pictures). Here's a screenshot:

[Screenshot: code.flickr.com deployment footer]

When I first saw that I was astounded. Sometimes there are a dozen or more deployments in a single day. The questions that spring to mind: why? and how? Both are answered in a few posts by Timothy Fitz.

In Continuous Deployment, Timothy discusses the concept. Basically, it comes down to "fail fast." Deploying a few small changes at a time is less likely to break something, and when something does break, you'll know more quickly what caused the problem and be able to correct it.

In Continuous Deployment at IMVU: Doing the impossible fifty times a day, Timothy goes into more detail about how they do it. The idea comes down to:

  • Commit early and often
  • Automatically test on commit
  • Automatically roll the code out if the tests pass
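The steps above can be sketched as a minimal post-commit pipeline. This is not IMVU's actual system — the `pytest` invocation and the `deploy.sh` script are stand-ins for whatever test runner and rollout mechanism you already have:

```python
import subprocess
import sys

def should_deploy(test_exit_code: int) -> bool:
    """Deploy only when the whole test suite passed (exit code 0)."""
    return test_exit_code == 0

def on_commit() -> None:
    """Called by the CI server after every commit."""
    # Step 1: run the full test suite automatically.
    tests = subprocess.run([sys.executable, "-m", "pytest", "tests/"])
    # Step 2: roll the code out only on a clean pass.
    if should_deploy(tests.returncode):
        subprocess.run(["./deploy.sh"], check=True)  # hypothetical rollout script
    else:
        sys.exit("Tests failed; deployment blocked.")
```

The whole trick is that nothing after the commit involves a human decision — the gate is the test suite, not a release meeting.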

There are, of course, some problems to solve to get that done. First, you need a good, thorough test suite. Timothy points out that you also need tests that

  • run fast and
  • execute reliably.

The test suite Timothy is describing takes 4.4 machine hours to execute. That's a lot of testing. To make it run fast enough to deploy continuously, they have a buildbot that runs tests across 36 machines in parallel.
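A back-of-the-envelope sketch of what that parallelism buys: split the test list round-robin across workers, and 4.4 machine-hours shrinks to a few minutes of wall-clock time. (The worker count and suite size come from Timothy's post; the sharding scheme here is just the simplest one, not necessarily what IMVU's buildbot does.)

```python
def shard(tests, n_workers):
    """Round-robin split of a test list across n_workers buckets."""
    buckets = [[] for _ in range(n_workers)]
    for i, test in enumerate(tests):
        buckets[i % n_workers].append(test)
    return buckets

# 4.4 machine-hours spread evenly over 36 workers:
wall_clock_minutes = 4.4 * 60 / 36  # roughly 7.3 minutes per run
```

That's the difference between "deploy a few times a week" and "deploy after every commit": a full-suite run is short enough to sit inside the commit loop.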

The point about test reliability is important too. Intermittently failing tests will ruin this process. Timothy says:

When I say reliable, I don't mean "they can fail once in a thousand test runs." I mean "they must not fail more often than once in a million test runs." We have around 15k test cases, and they're run around 70 times a day. That's a million test cases a day. Even with a literally one in a million chance of an intermittent failure per test case we would still expect to see an intermittent test failure every day. It may be hard to imagine writing rock solid one-in-a-million-or-better tests that drive Internet Explorer to click ajax frontend buttons executing backend apache, php, memcache, mysql, java and solr. I am writing this blog post to tell you that not only is it possible, it's just one part of my day job.
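Timothy's arithmetic is worth spelling out, because it's what makes "once in a thousand runs" unacceptable:

```python
test_cases = 15_000      # size of the suite, per the quote
runs_per_day = 70        # full-suite runs per day, per the quote
executions_per_day = test_cases * runs_per_day  # about a million test executions daily

# Even at a one-in-a-million flake rate per execution,
# you expect roughly one spurious red build every day:
p_flake = 1e-6
expected_flakes_per_day = executions_per_day * p_flake
```

A flaky red build stops the deploy pipeline and sends someone chasing a phantom, so at this volume the flake rate has to be pushed well below one in a million.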

I love this whole idea. I've lived the life of infrequent deployments and it will suck the soul right out of your engineering and ops teams. That's why when we started up Kynetx, I was determined not to repeat those mistakes. Our system is not as sophisticated as the one Timothy describes, but my goal is to get there, and we set specific goals about the things that need to happen along the way.

You may not be able to get to 50 deployments a day overnight, but you can increase the frequency of deployment and prioritize the development efforts necessary to increase that frequency. Set some goals and take your life back.