Monday, May 15, 2017

Why have an Operations team?

I just came back from a meeting with the manager of a team I'm working with. The meeting went well, but at the end, he asked me a really interesting question:
"How do I justify an ops team (or a devops team) to my executives?"
I sat back for a second and realized I've never heard anyone actually lay out the case for an operations team. It's always been assumed that you just have ops, just like you have devs and QA. Of course you just have them.

To answer, let's start with a premise.
The business gains value from users consuming its application(s).
This should be self-evident; it's the whole point of the business funding an application.

Developers, obviously, build new features, which create more value. That's the whole point of developers - to modify the application to create new features that create new business value. We'll call this a direct contribution. Their work has an immediate and visible impact on users consuming the business's applications.

The business analysts who write the requirements, the designers who create the UI/UX - they also provide a direct contribution.

Operations and QA, however, don't make that direct value contribution. Their work provides an indirect value contribution. The QA team can get its own post some other time. As for the Ops team, they do two things:

  1. Reduce business risk by maintaining the user experience.
  2. Improve time-to-market (TTM) for new changes, such as features, bugfixes, and security updates.
Let's take each one on its own.

Maintaining the user experience is primarily about maintaining and supporting the production environment. These are tasks that would have to be done even if all development were halted. The primary beneficiary of this work is the external user. This aspect includes:
  • Gathering and analyzing logs
  • Gathering and reporting on metrics
  • Tracking and applying updates to third-party software (such as IIS, Windows, etc.)
  • Monitoring for and alerting on any issues that arise (such as server failures)
  • Maintaining awareness of security issues (such as intrusion attempts)
  • Maintaining awareness of performance issues (such as bad database queries)
  • Reducing and/or eliminating deployment downtime
These tasks may be handled by people outside a formal operations team (such as developers handling performance issues), but they still fall within the operations role.

The value of these tasks is directly proportional to how expensive any specific failure in production is to the business. The more expensive an outage, the more valuable these tasks are. This justification for an operations team is well known and well understood.
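
To make that proportionality concrete, here is a rough sketch in Python. Every figure and function name in it is a hypothetical assumption chosen for illustration, not a number from a real business. It simply multiplies hours of downtime by the cost of an hour of downtime, then compares a year with and without the operations work above.

```python
def annual_downtime_cost(hours_down_per_year, cost_per_hour):
    """Yearly cost of production downtime (simple linear model)."""
    return hours_down_per_year * cost_per_hour


def risk_reduction_value(cost_per_hour, hours_without_ops, hours_with_ops):
    """Money saved per year by reducing downtime from one level to another."""
    before = annual_downtime_cost(hours_without_ops, cost_per_hour)
    after = annual_downtime_cost(hours_with_ops, cost_per_hour)
    return before - after


# Hypothetical businesses: same downtime reduction (40h -> 10h per year),
# very different cost of an hour of downtime.
small_shop = risk_reduction_value(cost_per_hour=500,
                                  hours_without_ops=40, hours_with_ops=10)
big_retailer = risk_reduction_value(cost_per_hour=50_000,
                                    hours_without_ops=40, hours_with_ops=10)

print(small_shop)    # 15000
print(big_retailer)  # 1500000
```

The same operational work is worth a hundred times more to the second business, purely because its downtime is a hundred times more expensive.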

Improving TTM is primarily about everything other than the production environment. These tasks are only done because development is ongoing, and the primary beneficiaries of this work are the internal users (developers, QA, etc.). This aspect includes:
  • Creating and managing developer environments
  • Creating and managing CI environments and the CI process
  • Creating and managing a consistent deployment process
  • Providing support for all non-production environments (such as QA, Load, Demo, Train, etc.)
  • Creating and managing a consistent promotion process, tied to the issue tracker
  • Managing and tightening all feedback loops
Again, these tasks are often handled by staff outside a formal operations team, but these tasks are still operational in nature; they focus on the operations of building the business's applications.

The value of these tasks is directly proportional to how expensive a delay in delivering changes to production is to the business. Note that I said changes, not new functionality: improving TTM also speeds up the delivery of bugfixes and security updates. For many businesses, the marginal value of improving TTM is very low. I once worked on an application targeted at small governments where the users wanted 3-month delivery cycles. The flip side is Amazon or Google or Netflix, where new functionality is delivered every 5-10 minutes.
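
The same kind of back-of-the-envelope arithmetic works for TTM. The sketch below is only an illustration: the function names and all the figures are made-up assumptions. It treats each change as worth some amount per week once it is in production, so every week shaved off delivery recovers that amount.

```python
def cost_of_delay(value_per_week, weeks_delayed):
    """Value lost while a finished change waits to reach production."""
    return value_per_week * weeks_delayed


def ttm_improvement_value(value_per_week, changes_per_year, weeks_saved_per_change):
    """Yearly value recovered by shipping each change sooner."""
    return changes_per_year * cost_of_delay(value_per_week, weeks_saved_per_change)


# Hypothetical: users who are happy with quarterly releases get little
# value from any single change arriving a couple of weeks earlier.
quarterly_app = ttm_improvement_value(value_per_week=100,
                                      changes_per_year=4,
                                      weeks_saved_per_change=2)

# Hypothetical: a business that ships constantly and earns a lot from
# each change being live.
fast_mover = ttm_improvement_value(value_per_week=10_000,
                                   changes_per_year=50,
                                   weeks_saved_per_change=2)

print(quarterly_app)  # 800
print(fast_mover)     # 1000000
```

Under these assumed numbers, the same two weeks saved per change is barely worth tracking for the first business and worth a dedicated team for the second.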

In short, the value of the operations team depends on the business's needs. The more a business values its uptime and TTM, the more value a dedicated operations team provides.
