Wednesday, December 16, 2015

Evaluating a Devops team

End of year has rolled around. Along with new budgets starting, old budgets ending, and bad holiday parties where you spend two hours avoiding that guy, this is the season of annual evaluations.

You're the manager of a devops team, possibly one newly-formed this year. This devops thing is all new. The team operates completely differently from the other sysadmin / operations teams you've seen. So, how do you evaluate them?

An operations team is traditionally measured by things like:
  • Production uptime percentage (high)
  • Frequency and length of production downtimes (low)
Everything is focused around the production environment and how well it stays available. These goals lead the team to fear and avoid changes to production. They also lead the team to disregard everything other than production as less important, or even unimportant. Not every operations team is like this, but every human will eventually conform to the actions that are rewarded and avoid the ones that aren't.

Our devops team, though, wants to accomplish different things. This team wants to deliver changes to production rapidly, even daily or hourly. They talk about reconstructing production on every deployment. Reconstructing lower environments on every deployment, too. They also talk about production in a very different way, not focused on external users.

If we want our fledgling devops team to keep doing these things, as different as they are from the traditional operations team, then we need to change what we measure. Or, rather, add to that list. That list is important. Operations teams are responsible for production uptime and need to be measured on that. The traditional mistake is measuring only that.

Let's add a few more items to that list.

  • Cost of Deployment
    • Time to deploy (low)
    • Mean time between deployments (low)
    • Number of people required for a deployment (low)
  • Non-production friction
    • Uptime percentage (high)
    • Frequency and length of downtimes (low)
  • Number of manual tasks (low)
The last one may be unnecessary - it often falls out of what a devops team is trying to do. This team you sometimes struggle to understand isn't an operations team - it's a development team building an application which, when used, results in a new application environment. Yes, we measure them on operational efficiency, but it's much more than that.

Operational efficiency becomes a measurement of the applications the devops team has created, similar to signups and engagement metrics for the web applications we work so hard to deploy. We start to measure the team by how well its application performs its job.
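
If your deployments are recorded anywhere, the "Cost of Deployment" items in the list above are cheap to compute. Here is a minimal sketch in Python. It assumes you can export each deployment as a start time, a finish time, and a headcount. The Deployment record, the deployment_cost function, and the sample dates are all made up for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Deployment:
    """One deployment, as a hypothetical record exported from your own logs."""
    started: datetime
    finished: datetime
    people: int  # how many people had to take part


def deployment_cost(deployments):
    """Summarize the 'Cost of Deployment' measurements.

    Expects at least one deployment. All three numbers are ones
    we want to see go down over time.
    """
    ordered = sorted(deployments, key=lambda d: d.started)

    # Time to deploy, in minutes, for each deployment.
    minutes = [(d.finished - d.started).total_seconds() / 60 for d in ordered]

    # Gap between the starts of consecutive deployments, in hours.
    gaps = [
        (later.started - earlier.started).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]

    return {
        "mean_minutes_to_deploy": mean(minutes),
        # With a single deployment there is no gap to average.
        "mean_hours_between_deployments": mean(gaps) if gaps else None,
        "mean_people_per_deployment": mean(d.people for d in ordered),
    }


if __name__ == "__main__":
    # Made-up sample data: three daily deployments.
    history = [
        Deployment(datetime(2015, 12, 14, 9, 0), datetime(2015, 12, 14, 9, 30), 3),
        Deployment(datetime(2015, 12, 15, 9, 0), datetime(2015, 12, 15, 9, 20), 2),
        Deployment(datetime(2015, 12, 16, 9, 0), datetime(2015, 12, 16, 9, 10), 1),
    ]
    for name, value in deployment_cost(history).items():
        print(name, value)
```

The same shape works for the non-production measurements: swap the deployment records for downtime records per environment and average those instead.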