Thursday, May 18, 2017

What is devops?

Ask 12 engineers for a definition of devops and you'll get 13 answers. Yes, that's 13 different links to everyone from Gartner to AWS to Dr Dobbs. There's plenty more on Google, too. Go ahead - take the time to read through them. AWS even has a whitepaper on devops which, hilariously enough, doesn't match up with their definition from the first sentence.

Confused now? Everyone else seems to be, too. The problem is every one of those answers, when you parse it down, is a long-winded way of saying "I know it when I see it." Which may be true, but it's also worthless for you. You can't do anything with that because the author isn't with you to help you whiteboard what devops means in your organization. This lack of clarity leads intelligent people to discard the term "devops" as meaningless and try and forge another way.

After working with a dozen clients in the past couple years, I think the problem is everyone is looking for a singular prescriptive definition. Devops is this thing and not that thing and those things are the same for everyone. And, there isn't one.

Some of the links above provide a lot of words around the characteristics that some successful organizations have. But, these characteristics (no silos, automation, etc) don't tell you how to solve your problem. It's as if doctors decided to practice medicine by looking at the healthiest people in the world and giving all their patients a list of those characteristics in response to every patient visit. That can be helpful ("lose weight" is a good thing to do, in general), but it can also be useless ("lose age" isn't helpful, even though most healthy people are between 20-30.)

Instead, I offer a descriptive definition which enables SMART milestones.
Devops is developing applications whose business domain is your operations.
 This isn't a singular thing. This isn't the same for everyone. This doesn't tell you what else you have to do, either. For example, you don't have to do Agile, if you don't want to. (Yes, you can devops within Waterfall, if you want.) But, it tells you exactly what you should be doing in order to do the devops and how.

Why this definition and not some statement about automation or build/release engineering? For one thing, this definition includes all those ideas. (Well, it leads to them, as we'll see.) For another, this definition recognizes that every organization is different. Each organization has different requirements for how they want to do IT and any definition of devops needs to accommodate that.

So, what do you do with that definition? The first thing is to gather requirements, just like any other application. Topics like:
  • How do you (the IT teams) want to operate?
  • What does the business side expect from the IT department?
  • What things suck right now?
The second thing is to figure out what you actually do today. I mean, actually do. Can you bring in a new senior person anywhere in IT and give them a clear picture of how your organization does IT work within the first day? First week? First month? Most clients I've worked with struggle with integrating new staff, with 1-year veterans saying "I never knew how we did that before."

Once you have both, you can describe the gap. Bridging that gap is your devops journey. That journey is going to be very different from company to company and even between teams at the same company. Which makes sense.

Monday, May 15, 2017

Why have an Operations team?

I just came back from a meeting with the manager of a team I'm working with. The meeting went well, but at the end, he asked me a really interesting question.
How do I justify an ops team (or a devops team) to my executives?
I sat back for a second and realized I've never heard anyone actually lay out the case for an operations team. It's always been assumed that you just have ops, just like you have devs and QA. Of course you just have them.

To answer, let's start with a premise.
The business gains value from users consuming its application(s).
This should be self-evident; it's the whole point of the business funding an application.

Developers, obviously, build new features which creates more value. That's the whole point of developers - to modify the application to create new features to create new business value. We'll call this a direct contribution. Their work has immediate and visible impact on users consuming the business's applications.

The business analysts who write the requirements, the designers who create the UI/UX - they also provide a direct contribution.

Operations and QA, however, don't make that direct value contribution. Their work provides an indirect value contribution. The QA team can get its own post some other time. As for the Ops team, they do two things:

  1. Reduce business risk by maintaining the user experience.
  2. Improve time-to-market (TTM) for new changes, such as features, bugfixes, and security.
Let's take each one in its own.

Maintaining the user experience is primarily maintaining and supporting the production environment. These are tasks that would have to be done even if all development was halted. The primary beneficiary of this work is the external user. This aspect includes:
  • Gathering and analyzing logs
  • Gathering and reporting on metrics
  • Tracking and applying updates to third-party services (such as IIS, Windows, etc)
  • Monitoring for and alerting on any issues that arise (such as server failures)
  • Maintaining awareness of security issues (such as hacking)
  • Maintaining awareness of performance issues (such as bad database queries)
  • Reducing and/or eliminating deployment downtime
These tasks may be handled by people outside a formal operations team (such as developers handling performance issues), but these tasks are within that operations role.

The value of these tasks is directly proportional to how expensive any specific failure in production is to the business. The more expensive a downtime, the more valuable these tasks are. This justification for an operations team is well-known and well understood.

Improving TTM is primarily about everything other than the production environment. These tasks are only done because development is ongoing and the primary beneficiaries of this work are the internal users (developers, QA, etc). This aspect includes:
  • Creating and managing developer environments
  • Creating and managing CI environments and the CI process
  • Creating and managing a consistent deployment process
  • Providing support for all non-production environments (such as QA, Load, Demo, Train, etc)
  • Creating and managing a consistent promotion process, tied to the issue tracker.
  • Managing and tightening all feedback loops
Again, these tasks are often handled by staff outside a formal operations team, but these tasks are still operational in nature; they focus on the operations of building the business's applications.

The value of these tasks is directly proportional to how expensive a delay in delivering changes to production is to the business. Note I said changes and not new functionality. Improving TTM also improves delivering bugfixes and security updates. For many businesses, the marginal value is of increasing TTM is very low. I once worked on an application targeted at small governments where the users wanted 3-month delivery cycles. The flip-side is Amazon or Google or Netflix where new functionality is delivered every 5-10 minutes.

In short, the value of the operations team is based on the business's needs. The more important a business values its uptime and TTM, the more value a dedicated operations team provides.

Monday, May 1, 2017

Announcing Devops Katas

The best way to learn a new tool is to use it. In the development world, there are hundreds of short exercises called katas specifically for this. They're designed to take anywhere from 1-4 hours and help you learn a new language, a new technique, a new library, etc. You go through the process of writing tests, writing code, and seeing something at the end.

One thing that distinguishes good katas is the framework. There's a place you go to with all the files necessary to write tests and write code already in place. The only things missing are the tests and the code. You can focus on the problem at hand instead of the formalities necessary to do a good job. For example:
For devops, even though we have more tools, we haven't had anything resembling a kata. Instead, we have things like:
These aren't bad. It's very helpful to have a focused exercise to work with. Except, I can't share my work with you. You can't ask me questions. We can't build on this exercise together. The bowling kata, FizzBuzz, roman numerals - these are all well-known and understood exercises from the development world, regardless of the language you work in.

In that vein, I'm releasing "proper" devops katas at These katas have a Vagrant and Serverspec framework to work within, giving the user the ability to do proper TDD in devops. I'll be releasing at least one/month and would love to get feedback on the next one to do.

Monday, April 24, 2017

Announcing SimsLoader

I'm happy to announce the official release of SimsLoader, a new tool for generating test data for relational databases. It's opensource, available as a Docker image, and works with MySQL, Postgres, Oracle, and SQL Server.

Unlike existing tools, SimsLoader doesn't assume it knows anything about your database. Instead, it reads your schema each time. This means:

  • You only tell it what you care about
  • It automagically handles schema changes
SimsLoader requires a minimum of information. You tell it what you want and it figures out everything else you need. Whether it's a NOT NULL column or a row in a parent table, SimsLoader will fill in everything necessary to give you what you asked for. And give it all back to you.

SimsLoader works by taking two configuration files - a model file and a specification file - in either YAML or JSON. The specification file contains what you want, specified with as much or as little detail as you want. At minimum, you can say:

users: 100

And 100 new rows in the users table are created with all the necessary columns filled in. If the users table has foreign keys, rows in those tables are created and on down the line until every row has everything it needs.

The specification file can be as complex as you want it to be. For example:

  - name: John Doe
      type: timestamp_in_past_5_years
      name: Acme Industries
  - name: Jane Doe
      type: timestamp_in_past_2_years
      name: Acme Corporation
    type: timestamp_in_past_2_years
  lineitems: 5

If you load this specification file, the returned value would be a YAML document with two users and one invoice. Even though you specify two new organizations and five lineitems, those are attributes of the three rows you actually asked for.

The model file adds information to SimsLoader's understanding of your database. It's primarily used to add type information to the various columns. For example:
    name: us_name
    address: us_address
    city: us_city
    state: us_state
    zipcode: us_zipcode
    started_on: timestamp_in_psat
        - invoice_id
        source: invoices
          - id

Now, whenever a row in the users table is generated, if a name isn't provided, it will be filled in with a reasonable-looking name from the US. If this hadn't been specified, the column would be filled with random characters (or potentially left blank, if it is a nullable column). In addition, you can specify missing foreign keys (has_many as in the example or the reverse belongs_to), missing unique constraints, and several other aspects.

Thursday, April 6, 2017

Vagrant on Hyper-V: A crazy journey

I love Vagrant. It is the most important devops tool, period. Frankly, far more important than Docker. I use Vagrant with Virtualbox, AWS, Azure, VSphere ... with pretty much every virtualization environment in the world. (Except Docker.)

Funnily enough, it was when I started to heavily use Docker (for SimsLoader) and upgrading one of my laptops to Windows 10 that led me to using Vagrant with Hyper-V. Hyper-V is really cool. Docker runs "natively" under it and it's fast. Really fast.

Unfortunately, Microsoft's work to get all their cool stuff to function with all the opensource tools the Linux ecology has been working with for years. There have been a few posts on this, but they weren't sufficient or complete for me to get up and running.

So, this post is to track and list all the necessary steps to be successful with Vagrant and Hyper-V. These steps have been validated in Windows 10. They may work in Windows 8.1 - if they do, could someone tell me?

Also, if you have cool Powershell scripts to do any of these manual steps, please let me know. Ideally, there's a script that does the "only Once" stuff and a script that does the "Occasionally" stuff.

Stuff you do only Once

Enable Hyper-V

This enables the Hyper-V virtualization provider.
  1. Launch "Turn Windows features on or off"
  2. Scroll down to select "Hyper-V"
  3. Reboot
If you installed the "native" Docker, then you've probably already done this step.

Note: This will disable Virtualbox and all other virtualization providers. To re-enable them, reverse these steps and reboot.

Create a virtual switch

This provides the basis for networking within Hyper-V. Unlike Virtualbox, Hyper-V doesn't auto-create networks for you.
  1. Launch "Hyper-V Manager"
  2. Click on "Virtual Switch Manager..." (right-hand side)
  3. Select "Internal"
  4. Click on "Create Virtual Switch"
  5. Name the switch "Shared"
  6. Select the "Internal Network" radio button
  7. Click "Ok"
  8. Exit the "Hyper-V Manager" window

Connect the virtual switch to an existing network

This is required if you want your Vagrant-launched VM to get a DHCP-assigned IP address. Again, unlike Virtualbox, Hyper-V networks don't auto-allocate IP addresses.
  1. Go to "Control Panel > Network and Internet > Network Connections"
  2. Select a network connection with Internet (likely your Wireless)
  3. Right-click, select "Properties"
  4. Select the "Sharing" tab
  5. Click the "Allow other network users to connect through this computer's Internet connection"
  6. Select "vEthernet (Shared)"
  7. Click "Ok"
  8. Exit the "Network Connections" window

Disable SMB idle disconnects

Again, unlike Virtualbox, Hyper-V doesn't provide a file-sharing mechanism. (Granted, vboxfs isn't the greatest, but at least it's immediately functional). So, Vagrant uses SMB sharing. By default, Windows hosts will disconnect the SMB connection if it's idle for too long, which requires you to reboot your VM to reconnect the shared folder.
  1. In an administrative powershell, run "net config server /autodisconnect:-1"
Note: this may or may not be what you want to do for other purposes, particularly if you use SMB for other purposes.

Stuff you will have to do occasionally

Resharing the network with your virtual switch

If your internet-enabled network changes, you will need to attach your virtual switch to it. For example, if you switch from wireless to wired.

If your sharing network changes, you will need to reconnect your virtual switch. For example, if you go from home to work or work to Starbucks.

I haven't experienced any problems if I put my laptop to sleep, but awaken onto the same network, but that is also a possible time you may have to reshare your network.

Stuff you will have to do all the time

Launching a VM

  1. Open a terminal with administrative privileges. (Hyper-V requires this.)
  2. "vagrant up"
  3. You will be asked for your windows username/password to mount shared folders.
    • Make sure it's the Windows username/password
    • Depending on how your laptop is setup, you may or may not have to add a domain to that. (mmouse vs. mmouse@disney)

Additional Documentation

Wednesday, December 16, 2015

Evaluating a Devops team

End of year has rolled around. Along with new budgets starting, old budgets ending, and bad holiday parties where you spend two hours avoiding that guy, this is the season of annual evaluations.

You're the manager of a devops team, possibly one newly-formed this year. This devops things is all new. The team operates completely differently from the other sysadmin / operations teams you've seen. So, how do you evaluate them?

An operations team is traditionally measured by things like:
  • Production uptime percentage (high)
  • Frequency and length of production downtimes (low)
Everything is focused around the production environment and how well it stays available. These goals lead the team to fear and avoid changes to production. They also lead the team to disregard everything other than production as less important, or even unimportant. Not every operations team is like this, but every human will eventually conform to the actions that are rewarded and avoid the ones that aren't.

Our devops team, though, wants to accomplish different things. This team wants to deliver changes to production rapidly, even daily or hourly. They talk about reconstructing production on every deployment. Reconstructing lower environments on every deployment, too. They also talk about production in a very different way, not focused on external users.

If we want our fledgling devops team to keep doing these things, as different as they are from the traditional operations team, then we need to change what we measure. Or, rather, add to that list. That list is important. Operations teams are responsible for production uptime and need to be measured on that. But, the traditional mistake is only measuring on that.

Let's add a few more items to that list.

  • Cost of Deployment
    • Time to deploy (low)
    • Mean time between deployments (low)
    • Number of people required for a deployment (low)
  • Non-production friction
    • Uptime percentage (high)
    • Frequency and length of downtimes (low)
  • Number of manual tasks (low)
The last one may be unnecessary - it often falls out of what a devops team is trying to do. This team you sometimes struggle to understand isn't an operations team - it's a development team building an application which, when used, results in a new application environment. Yes, we measure them on operational efficiency, but it's much more than that.

Operational efficiency becomes a measurement of the applications the devops team has created, similar to signups and engagement metrics for the web applications we work so hard to deploy. Instead, we start to measure them in how well the application performs its job.

Thursday, September 24, 2015

Opensource isn't a right and it's not free

The trigger for this post is a reasonable request from a Windows user. Well, it's reasonable on its face. They put forward two questions, paraphrased below:

  1. Why is is so hard for opensource developers to get their stuff working on Windows?
  2. Why don't opensource developers care about the Windows population?
When it comes to operating systems, Linux and OSX are practically kissing cousins. Linux grows out of Minix and OSX grows out of NetBSD. There's been significant cross-pollenation and agreement. Both are POSIX-compliant, have relatively sane package systems, and provide a consistent command-line interface and a common configuration mechanism. Filepaths differ, but that's a simple case-statement. The same tools work the same.

Windows, on the other hand, is different. Very different. The entire architecture of the thing is alien. It has different assumptions for what an OS does and how someone will use it. It puts things in different places. It has different ways of reading and setting configuration. It has four different commandlines, each of them completely different. It has three unrelated packaging systems, none of which put stuff anywhere like the others and which don't even acknowledge the others' existences.

That should answer the "Why don't it work right?" question.

90% of all web applications are deployed onto some form of Linux. (Rarely on OSX, but that's the fault of Apple's licensing, not a problem with the platform. IIS may have had its day in the sun, but that sun has practically set.) Given Linux's historical problems with providing a good GUI, OSX has proven to be a great development platform for web applications destined for Linux. Given its great support for virtual machines, it also becomes a great platform for developing mobile applications (XCode and various VM solutions for Android). That hits the developer tetrafecta.

Because of this, most developers are very familiar with Linux and are likely using an OSX laptop. They will only use Windows for testing in IE.

This is what people are paid to do.

Let's talk about who are opensource developers. The average OSS developer (insofar as there is an average) is someone who's paid to do development using OSS tools during the day. They will usually work on these modules to initially scratch an itch for work - the initial release is usually a donation from an employer. The developer then starts to take pride in it at home. Most OSS developers realize that these modules act as advertisement for their skills. (Every job offer I've received in the past 10+ years has referenced my OSS work in some way.)

There is a vast world out there of things that I could learn. Whole new ways of
  • Storing and managing data
  • Interacting with huge datasets
  • Doing operations
  • Doing development
  • Thinking about parallel execution
I don't have a mousefart's chance in a windstorm to even keep up with what I already do for work. So why am I going to spend time learning how to manage a complex system that I will never use for work?

I'm not. And that's why I don't care if my opensource stuff works on Windows. If you want to make it work, then great. I love pull requests. If your employer wants to pay me to support it on Windows, we can discuss my rate.

But you don't have the right to make me work for free to accomplish something I don't care about. That's just not how things work.