Monday, July 29, 2013

Deployment is not source control (pt. 2)

(This is the second post in a series on deployment. See part 1 and part 3.)

+Roman Daszczyszak had a question on Google+ in response to Deployment is not source control. He asked:
While I agree with your points, how do you apply this to developing a web application? My team has run into problems trying to properly package a Django-based app with a Mongo backend. Thoughts?
Whenever anything is installed (or deployed - it's the same thing), there is a set of steps that must be completed. For a standard web application (Django, Rails, etc.), that could be something like:

  1. Log in to the webserver.
  2. Copy the source code (via git, scp, rsync, etc) into the installation directory (/var/www, etc).
  3. Install any necessary prerequisites (frameworks, libraries, language modules, etc).
  4. Run a script to set things up (compiling / uglifying, configuration, softlinks, etc).
  5. Restart the service (Apache, FastCGI, etc).
  6. Repeat this process for each webserver in the group.
OS packages (rpm or deb) are designed to handle steps 2-5. While each packaging format has its stronger and weaker points, all of them can do the following:
  • Bundle files into a logical hierarchy
  • Execute scripts (in any language) at different points in the installation process
  • Specify prerequisites (including specific versions to require or exclude)
  • Execute tests to ensure a good installation
  • Allow for arbitrary metadata to be stored for later queries
  • Roll back to a prior installed version (the most important function)
One important point to remember is that the files in source control are often not the files that belong on the production server. This has always been true for compiled applications (such as Apache and MySQL), and it has become true for web applications as well. JavaScript and CSS assets are often uglified and compressed. You may not even be writing in CSS - Sass/Compass and Less are becoming excellent frameworks to use. Your JavaScript assets may have been written in CoffeeScript, your HTML in Jade or HAML, and images may be sprited.

This leads us to an important rule of thumb:
Each server should have exactly what it needs to perform its tasks and nothing more.
Applying that to our packaging means the package should only install the compiled, compressed, and otherwise-mangled files that will actually be served from the webserver. If you're putting gcc, git, or make on your production servers, you're doing it wrong. The package should have the compiled versions, not the source versions. It may have templated configuration files ("Insert hostname here"), but the template isn't installed - only the result of filling in the template.
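As a rough illustration of the templated-configuration point, here is a minimal Python sketch of a build step that fills in a template so that only the rendered result goes into the package. The setting names (server_name, listen_port), the hostname, and the render_config helper are all made up for this example; your build tooling may do this quite differently.

```python
from string import Template

# Hypothetical configuration template. It lives in source control,
# but it is never installed on a server.
CONFIG_TEMPLATE = Template(
    "server_name = $hostname\n"
    "listen_port = $port\n"
)


def render_config(hostname: str, port: int) -> str:
    """Fill in the template; only this result is shipped in the package."""
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a half-rendered file cannot slip into a build unnoticed.
    return CONFIG_TEMPLATE.substitute(hostname=hostname, port=port)


if __name__ == "__main__":
    print(render_config("web01.example.com", 8080), end="")
```

Using substitute() rather than safe_substitute() is deliberate here: a missing value fails the build instead of producing a broken file.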

Frameworks, such as Django, and datastores, such as MongoDB, have already been built into packages by their maintainers. Specifying them as dependencies allows the package to be self-describing.

The metadata associated with the package is important to the success of the process. The package version is required. I've found that using "1.[timestamp]" is a good monotonically-increasing version number. As this is only released internally, a nonsensical version number is good enough.
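One way to generate such a version number is sketched below in Python. The exact timestamp layout (UTC, YYYYMMDDHHMMSS) is my assumption for the example; anything that always increases from one build to the next would work as well.

```python
from datetime import datetime, timezone


def build_version(now: datetime) -> str:
    """Return a version of the form 1.<UTC timestamp>.

    The fixed-width YYYYMMDDHHMMSS layout is an assumption for this
    sketch. Pass a timezone-aware datetime so the result does not
    depend on the build machine's local timezone.
    """
    return "1." + now.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(build_version(datetime.now(timezone.utc)))
```

Because the timestamp part is a single fixed-width run of digits, a later build always compares as newer than an earlier one.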

All the packaging formats allow setting arbitrary metadata on a package. A good set of metadata includes:
  • The timestamp this package was built.
  • The SCM identifier of the commit used to build the package (git SHA1, SVN version, etc).
  • The issue number for the changeset that was merged to master that triggered this package build.
With that metadata, any person in the company can hit an internal website and see exactly what the last build to each environment is and what issues are in test that aren't in production. Your issue tracker should be able to provide this, but your servers should also be able to tell you this.
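To make that concrete, here is a small Python sketch of the "what is in test that isn't in production" question. It assumes the internal website can get a list of build records carrying the metadata above, plus the version each environment reports as installed. The Build record, the issues_pending helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """Metadata stored on each package (field names are illustrative)."""
    version: str    # package version, e.g. "1.20130729120000"
    built_at: int   # build timestamp, seconds since the epoch
    commit: str     # SCM identifier of the commit that was built
    issue: str      # issue whose merge to master triggered the build


def issues_pending(builds, test_version, prod_version):
    """Issues installed in test that have not yet reached production."""
    by_version = {b.version: b for b in builds}
    test_at = by_version[test_version].built_at
    prod_at = by_version[prod_version].built_at
    # Every build newer than production's, up to and including the one
    # in test, carries an issue that production does not have yet.
    return [
        b.issue
        for b in sorted(builds, key=lambda b: b.built_at)
        if prod_at < b.built_at <= test_at
    ]


if __name__ == "__main__":
    history = [
        Build("1.100", 100, "a1b2c3", "ISSUE-1"),
        Build("1.200", 200, "d4e5f6", "ISSUE-2"),
        Build("1.300", 300, "0a9b8c", "ISSUE-3"),
    ]
    print(issues_pending(history, test_version="1.300", prod_version="1.100"))
```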

So far, we have discussed putting together the application and its on-server dependencies. Roman's question asked about MongoDB. I'll expand it to datastores in general. It's good practice to keep application servers and datastores on separate horizontal groups. This allows operations to balance the needs of one vs. the other. It's extremely rare for both application and datastore to grow at exactly the same pace. So, we have to figure out a way of managing cross-server dependencies. (This problem also arises when dealing with multiple applications supporting the same product. The solution is the same.)

Datastore change management can and should also be managed with packages. Packages aren't just a set of files to be applied. A package is a set of actions that need to be taken in order to upgrade an installation from version X to version Y. The most common thing to do is provide a new set of files, but a set of actions (such as "ALTER TABLE" statements) is also appropriate. By applying datastore changes with packages, you are now able to ask your datastore "What version are you?" and make decisions based on that. One decision could be "Version X of the application cannot be installed because the datastore is not at version Y."
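Here is a minimal Python sketch of that idea. It uses the standard library's sqlite3 module purely so the example runs anywhere; it is a stand-in, not a suggestion for how to handle MongoDB specifically. The schema_version table, the MIGRATIONS mapping, and the function names are all assumptions made for illustration. In a real package, the upgrade actions would live in the package's install scripts.

```python
import sqlite3

# Hypothetical upgrade actions, keyed by the datastore version they produce.
MIGRATIONS = {
    1: ["CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"],
    2: ["ALTER TABLE users ADD COLUMN email TEXT"],
}


def datastore_version(conn):
    """Answer "What version are you?" for this datastore."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0  # an empty table means nothing has been applied yet


def upgrade(conn, target):
    """Apply each version's actions in order, recording every step."""
    for version in range(datastore_version(conn) + 1, target + 1):
        for statement in MIGRATIONS[version]:
            conn.execute(statement)
        conn.execute(
            "INSERT INTO schema_version (version) VALUES (?)", (version,)
        )
    conn.commit()


def check_app_install(conn, required):
    """Refuse to install an application whose datastore is too old."""
    current = datastore_version(conn)
    if current < required:
        raise RuntimeError(
            f"application requires datastore version {required}, found {current}"
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    upgrade(conn, 2)
    check_app_install(conn, 2)
    print("datastore is at version", datastore_version(conn))
```

The check_app_install function is the "Version X of the application cannot be installed" decision: an application package could run a check like it before touching any files.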

Roman - I hope this helps!