Thursday, May 18, 2017

What is devops?

Ask 12 engineers for a definition of devops and you'll get 13 answers. Yes, that's 13 different links, to everyone from Gartner to AWS to Dr. Dobb's. There are plenty more on Google, too. Go ahead: take the time to read through them. AWS even has a whitepaper on devops that, hilariously enough, doesn't match AWS's own definition from the first sentence.

Confused now? Everyone else seems to be, too. The problem is that every one of those answers, when you boil it down, is a long-winded way of saying "I know it when I see it." That may be true, but it's worthless to you. You can't act on it, because the author isn't there to help you whiteboard what devops means in your organization. This lack of clarity leads intelligent people to dismiss the term "devops" as meaningless and try to forge another path.

After working with a dozen clients over the past couple of years, I think the problem is that everyone is looking for a single prescriptive definition: devops is this thing and not that thing, and those things are the same for everyone. There isn't one.

Some of the links above spend a lot of words on the characteristics that some successful organizations share. But these characteristics (no silos, automation, etc.) don't tell you how to solve your problem. It's as if doctors decided to practice medicine by studying the healthiest people in the world and handing every patient a list of those characteristics at every visit. That can be helpful ("lose weight" is generally good advice), but it can also be useless ("lose age" isn't helpful, even though most healthy people are between 20 and 30).

Instead, I offer a descriptive definition which enables SMART milestones.
Devops is developing applications whose business domain is your operations.
This isn't a singular thing. It isn't the same for everyone. It doesn't tell you what else you have to do, either. For example, you don't have to do Agile if you don't want to. (Yes, you can do devops within Waterfall, if you want.) But it does tell you exactly what you should be doing in order to do devops, and how.

Why this definition and not some statement about automation or build/release engineering? For one thing, this definition includes all those ideas. (Well, it leads to them, as we'll see.) For another, this definition recognizes that every organization is different. Each organization has different requirements for how it wants to do IT, and any definition of devops needs to accommodate that.

So, what do you do with that definition? The first thing is to gather requirements, just like any other application. Topics like:
  • How do you (the IT teams) want to operate?
  • What does the business side expect from the IT department?
  • What things suck right now?
The second thing is to figure out what you actually do today. I mean what you actually do. Can you bring in a new senior person anywhere in IT and give them a clear picture of how your organization does IT work within the first day? The first week? The first month? Most clients I've worked with struggle to integrate new staff, with 1-year veterans saying, "I never knew how we did that before."

Once you have both, you can describe the gap. Bridging that gap is your devops journey. That journey will look very different from company to company, and even between teams at the same company, which makes sense.
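To make "describe the gap" a little more concrete, here is a minimal sketch in Python. It assumes you've already boiled both lists (what you want and what you actually do) down to short practice names; the practices below are hypothetical examples, not a recommended checklist. The gap then falls out as simple set arithmetic: what to build, what to keep, and what to retire.

```python
# Hypothetical practice names, for illustration only.
desired = {
    "automated deployments",
    "onboarding docs",
    "centralized logging",
    "CI on every commit",
}
current = {
    "centralized logging",
    "manual deployments",
}


def describe_gap(desired: set[str], current: set[str]) -> dict[str, list[str]]:
    """Split practices into what to build, what already works, and what to retire."""
    return {
        "build": sorted(desired - current),   # wanted, not done yet
        "keep": sorted(desired & current),    # wanted and already done
        "retire": sorted(current - desired),  # done today, but not wanted
    }


gap = describe_gap(desired, current)
for bucket, items in gap.items():
    print(f"{bucket}: {', '.join(items) or '(none)'}")
```

The "build" bucket is roughly your backlog; each item is a candidate SMART milestone.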

Monday, May 15, 2017

Why have an Operations team?

I just came back from a meeting with the manager of a team I'm working with. The meeting went well, but at the end, he asked me a really interesting question.
How do I justify an ops team (or a devops team) to my executives?
I sat back for a second and realized I've never heard anyone actually lay out the case for an operations team. It's always been assumed that you just have ops, just like you have devs and QA. Of course you just have them.

To answer, let's start with a premise.
The business gains value from users consuming its application(s).
This should be self-evident; it's the whole point of the business funding an application.

Developers, obviously, build new features, which create more value. That's the whole point of developers: to modify the application, adding features that create new business value. We'll call this a direct contribution. Their work has an immediate and visible impact on users consuming the business's applications.

The business analysts who write the requirements, the designers who create the UI/UX - they also provide a direct contribution.

Operations and QA, however, don't make that direct value contribution. Their work provides an indirect value contribution. The QA team can get its own post some other time. As for the Ops team, they do two things:

  1. Reduce business risk by maintaining the user experience.
  2. Improve time-to-market (TTM) for new changes, such as features, bugfixes, and security.
Let's take each one in turn.

Maintaining the user experience is primarily maintaining and supporting the production environment. These are tasks that would have to be done even if all development was halted. The primary beneficiary of this work is the external user. This aspect includes:
  • Gathering and analyzing logs
  • Gathering and reporting on metrics
  • Tracking and applying updates to third-party software (such as IIS, Windows, etc.)
  • Monitoring for and alerting on any issues that arise (such as server failures)
  • Maintaining awareness of security issues (such as hacking)
  • Maintaining awareness of performance issues (such as bad database queries)
  • Reducing and/or eliminating deployment downtime
These tasks may be handled by people outside a formal operations team (such as developers handling performance issues), but these tasks are within that operations role.
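As one small illustration of the monitoring-and-alerting item, here's a minimal Python sketch that compares a single sample of metrics against thresholds. The metric names and limits are hypothetical; in practice this logic lives inside whatever monitoring tool you already run, not a hand-rolled script.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    max_value: float


# Hypothetical thresholds; real values depend on your application and SLAs.
THRESHOLDS = [
    Threshold("error_rate_pct", 1.0),
    Threshold("p95_query_ms", 500.0),   # slow database queries show up here
    Threshold("disk_used_pct", 90.0),
]


def check(sample: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alert message per metric that exceeds its threshold.

    A metric missing from the sample is reported too, since silence
    often means the collector itself has failed.
    """
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            alerts.append(f"{t.metric}: no data")
        elif value > t.max_value:
            alerts.append(f"{t.metric}: {value} > {t.max_value}")
    return alerts


sample = {"error_rate_pct": 0.4, "p95_query_ms": 820.0}
for alert in check(sample, THRESHOLDS):
    print("ALERT", alert)
```

Treating "no data" as an alert is a deliberate choice: a dead monitoring agent looks exactly like a healthy server unless you check for it.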

The value of these tasks is directly proportional to how expensive any specific failure in production is to the business. The more expensive the downtime, the more valuable these tasks are. This justification for an operations team is well known and well understood.
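A back-of-the-envelope way to see that proportionality, assuming you can estimate what an hour of downtime costs and how many hours of it the ops work prevents per year. Both figures below are made up for illustration; only the shape of the calculation matters.

```python
# Hypothetical: hours of downtime per year that production support prevents.
HOURS_AVOIDED_PER_YEAR = 10.0

# Hypothetical cost of one hour of downtime, in dollars.
businesses = {
    "internal reporting tool": 200.0,
    "e-commerce storefront": 50_000.0,
}


def downtime_value(cost_per_hour: float, hours_avoided: float) -> float:
    """Value of ops work modeled as (cost per hour of downtime) x (hours prevented)."""
    return cost_per_hour * hours_avoided


for name, cost in businesses.items():
    print(f"{name}: ${downtime_value(cost, HOURS_AVOIDED_PER_YEAR):,.0f}/year")
# internal reporting tool: $2,000/year
# e-commerce storefront: $500,000/year
```

Same work, same hours saved, a 250x difference in value, driven entirely by what an outage costs.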

Improving TTM is primarily about everything other than the production environment. These tasks are done only because development is ongoing, and the primary beneficiaries of this work are internal users (developers, QA, etc.). This aspect includes:
  • Creating and managing developer environments
  • Creating and managing CI environments and the CI process
  • Creating and managing a consistent deployment process
  • Providing support for all non-production environments (such as QA, Load, Demo, Train, etc)
  • Creating and managing a consistent promotion process, tied to the issue tracker
  • Managing and tightening all feedback loops
Again, these tasks are often handled by staff outside a formal operations team, but these tasks are still operational in nature; they focus on the operations of building the business's applications.
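For the promotion item, here's a minimal sketch of what "tied to the issue tracker" could mean: a build moves to the next environment only when its issue is in the state that environment requires. The environment names and issue states are hypothetical; substitute your own pipeline and your tracker's workflow.

```python
from typing import Optional

# Hypothetical pipeline order and tracker states; adjust to your own workflow.
PIPELINE = ["dev", "qa", "staging", "prod"]
REQUIRED_STATE = {
    "qa": "ready for qa",   # issue must be marked ready before QA sees the build
    "staging": "qa passed",
    "prod": "approved",
}


def next_environment(current_env: str) -> Optional[str]:
    """Return the environment after current_env, or None if it is already last."""
    idx = PIPELINE.index(current_env)
    return PIPELINE[idx + 1] if idx + 1 < len(PIPELINE) else None


def can_promote(current_env: str, issue_state: str) -> bool:
    """A build may move forward only if its issue is in the state the target requires."""
    target = next_environment(current_env)
    if target is None:
        return False
    return issue_state.strip().lower() == REQUIRED_STATE[target]


print(can_promote("dev", "Ready for QA"))  # True
print(can_promote("qa", "Ready for QA"))   # False: staging needs "qa passed"
print(can_promote("prod", "Approved"))     # False: nothing comes after prod
```

Keeping the rules in one table means the promotion process is written down somewhere other than a veteran's head, which is half the onboarding problem from the previous post.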

The value of these tasks is directly proportional to how expensive a delay in delivering changes to production is to the business. Note that I said changes, not new functionality. Improving TTM also speeds up delivery of bugfixes and security updates. For many businesses, the marginal value of improving TTM is very low. I once worked on an application targeted at small governments where the users wanted 3-month delivery cycles. The flip side is Amazon or Google or Netflix, where new functionality is delivered every 5-10 minutes.
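The same arithmetic applies to the cost of delay. As a rough sketch, assume a finished change waits, on average, half a release cycle before it ships (a simplifying assumption that changes finish evenly across the cycle), and that each day of waiting costs the business a fixed amount. The dollar figures are hypothetical.

```python
def average_wait_days(cycle_days: float) -> float:
    """Simplifying assumption: changes finish evenly, so each waits half a cycle."""
    return cycle_days / 2


def delay_cost(cost_per_day: float, cycle_days: float) -> float:
    """Expected cost of delay for one change under the assumption above."""
    return cost_per_day * average_wait_days(cycle_days)


def value_of_faster_cycle(cost_per_day: float, old_days: float, new_days: float) -> float:
    """How much one change gains when the release cycle shrinks from old_days to new_days."""
    return delay_cost(cost_per_day, old_days) - delay_cost(cost_per_day, new_days)


# Hypothetical cost of delay per day for two very different businesses.
for name, cost_per_day in [("small-government app", 5.0), ("large consumer site", 100_000.0)]:
    gain = value_of_faster_cycle(cost_per_day, old_days=90.0, new_days=30.0)
    print(f"{name}: ${gain:,.0f} per change for going from 90-day to 30-day cycles")
# small-government app: $150 per change for going from 90-day to 30-day cycles
# large consumer site: $3,000,000 per change for going from 90-day to 30-day cycles
```

When the cost of delay per day is close to zero, shrinking the cycle buys almost nothing, which is why those small-government users were happy with three months.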

In short, the value of the operations team is based on the business's needs. The more a business values its uptime and TTM, the more value a dedicated operations team provides.

Monday, May 1, 2017

Announcing Devops Katas

The best way to learn a new tool is to use it. In the development world, there are hundreds of short exercises called katas specifically for this. They're designed to take anywhere from 1-4 hours and help you learn a new language, a new technique, a new library, etc. You go through the process of writing tests, writing code, and seeing something at the end.

One thing that distinguishes good katas is the framework. There's a place you go to with all the files necessary to write tests and write code already in place. The only things missing are the tests and the code. You can focus on the problem at hand instead of the formalities necessary to do a good job. For example:
For devops, even though we have more tools, we haven't had anything resembling a kata. Instead, we have things like:
These aren't bad. It's very helpful to have a focused exercise to work with. But I can't share my work with you. You can't ask me questions. We can't build on these exercises together. The bowling kata, FizzBuzz, Roman numerals: these are all well-known, well-understood exercises from the development world, regardless of the language you work in.

In that vein, I'm releasing "proper" devops katas at https://github.com/greenfishbluefish/devops-katas. These katas come with a Vagrant and Serverspec framework to work within, giving users the ability to do proper TDD in devops. I'll be releasing at least one a month, and I'd love feedback on which one to do next.
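To give a flavor of the test-first loop, here's a minimal Python sketch using the standard library's unittest. It is not the katas' actual Vagrant/Serverspec setup: Serverspec checks the real state of a provisioned VM, while this sketch only checks the text of a config file produced by a hypothetical render_sshd_config function. The rhythm is the same, though: write the spec, watch it fail, then provision until it passes.

```python
import unittest


def render_sshd_config(port: int = 22, permit_root_login: bool = False) -> str:
    """Hypothetical provisioning step: produce the text of an sshd_config."""
    lines = [
        f"Port {port}",
        f"PermitRootLogin {'yes' if permit_root_login else 'no'}",
        "PasswordAuthentication no",
    ]
    return "\n".join(lines) + "\n"


class SshdConfigSpec(unittest.TestCase):
    """Written first (red); render_sshd_config was then filled in until green."""

    def setUp(self):
        self.config = render_sshd_config().splitlines()

    def test_listens_on_port_22(self):
        self.assertIn("Port 22", self.config)

    def test_root_login_disabled(self):
        self.assertIn("PermitRootLogin no", self.config)

    def test_password_auth_disabled(self):
        self.assertIn("PasswordAuthentication no", self.config)


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=["sshd-kata"], exit=False)
```

In the real katas, the "provisioning" step is whatever you write against the Vagrant box, and the spec checks the machine itself rather than a string.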