Friday, August 30, 2013

Provisions to last the journey

In the last post, I talked about Vagrant as half of the most important tool IT organizations have gained in the past decade. This post talks about the other half - the provisioner.

The problem we need to solve is replicating the build of a server exactly over and over. In the past (some 10 years ago), I would use Norton Ghost (a backup utility) to clone a server once I had it setup perfectly, then restore that clone to the other servers. And that worked great, so long as I never needed to change what was on that server. For example, a web-server might have had Apache, various modules (mod_perl, mod_proxy, mod_rewrite, etc), and the MySQL client software. Then, we would install the language dependencies (at the time, I was writing Perl apps) from CPAN. We would take a Ghost at that point, replicate that out, then deploy the application using SVN. If we needed a new module or a new version of a module, that required a new Ghost. If we needed a new Apache module or an upgrade, that required a new Ghost. It only took an hour or two, but it was very manual.

This worked great, for production. All of our production servers would be exactly the same, because they were clones of the same Ghost. But, since the production configuration would be on the Ghost, we couldn't use that in QA or in development.

The other problem was that we had no record of what we were doing. Nothing was in source control, largely because there was nothing to put in source control. SVN (and now Git) are only really useful with text files. (Yes, they take binary files, but only as undifferentiable blobs. Not useful.) This meant no code reviews, no history, and no controls. Everyone had to be a sysadmin.

I've heard of other places using a master package (rpm or deb) that does nothing but require all the other packages necessary for the server to be setup properly. And, this works great . . . until it doesn't. The syntax for building packages can be inscrutable. And, while you can do anything in a package (because packages are just tarballs of scripts with metadata), it's very dangerous to allow anyone the ability to do anything. Even if there are no bad actors, everyone is still a human. Humans make mistakes and making mistakes as root is a good way to lose weekends rebuilding from tape.

Luckily, there is a better way.

Unlike the virtualization manager (Vagrant), there are several good choices for a provisioner. Puppet and Chef are the two big ones right now, but several others are nipping at their heels. They differ in various ways, but all of them provide the same primary function - describing how a server should be set up in a parseable format. If you are underwhelmed, just wait a few minutes. (I'll use Puppet in my examples because it's the one I'm using right now. All these examples could be written just as easily in Chef, SaltStack, or Ansible. Juju is a little different.)

The basic unit of work is the manifest (in Puppet) or cookbook (in Chef). This is what contains the parseable description of what needs to be accomplished. In both, you describe what you want to exist, after execution is complete. (Unlike a script, you do not describe how to do it or in what order - it's the provisioner's job to figure that out.) So, you might have something like:

$name = "apache"
package { "apache2":
  require => User[$name],
group { $name:
  ensure => "present",
user { $name:
  ensure => "present",
  gid => $name,
  require => Group[$name],

This would install the apache2 package (found in Ubuntu), create an 'apache' group and an 'apache' user. You'll notice that the apache2 package requires the apache user. So, creating the user would run before installing the package, even though it's defined afterwards. So, define things in the order that makes sense and the provisioner will figure things out. This means, however, that when you watch it run, things won't run in the same order from time to time, and that's okay.

Provisioners are designed to run again and again. They are idempotent, meaning that they will only do something if it hasn't been done already. This property is extremely powerful because we can make a change to a manifest (or cookbook) and, when we run it, only the change (and anything dependent on that change) will execute. This solves the problem of the upgrades with Ghost.

Now, we have a executable description of what a given server should look like. The best part? It's in plaintext. We're going to check this description into our source control so that we can track the changes necessary for each request. We can now treat this as any other code - with changesets, pair programming, and code reviews. Changes to servers can be deployed like every other piece of code in our application. Best of all, they can be tied to the application changes that spawned the need for them (if appropriate). So, our structural changes go through the exact same QA process as the application changes, increasing our confidence in them.

These days, it's really hard to argue against using a provisioner. We can argue which provisioner to use, but it's kinda like using source control. We can argue Git vs. Subversion vs. Mercurial vs. Darcs vs. Bazaar. But, no-one is arguing for the position of "Don't want it." The same should go for provisioners.