Thursday, September 24, 2015

Opensource isn't a right and it's not free

The trigger for this post is a reasonable request from a Windows user. Well, it's reasonable on its face. They put forward two questions, paraphrased below:

  1. Why is is so hard for opensource developers to get their stuff working on Windows?
  2. Why don't opensource developers care about the Windows population?
When it comes to operating systems, Linux and OSX are practically kissing cousins. Linux grows out of Minix and OSX grows out of NetBSD. There's been significant cross-pollenation and agreement. Both are POSIX-compliant, have relatively sane package systems, and provide a consistent command-line interface and a common configuration mechanism. Filepaths differ, but that's a simple case-statement. The same tools work the same.

Windows, on the other hand, is different. Very different. The entire architecture of the thing is alien. It has different assumptions for what an OS does and how someone will use it. It puts things in different places. It has different ways of reading and setting configuration. It has four different commandlines, each of them completely different. It has three unrelated packaging systems, none of which put stuff anywhere like the others and which don't even acknowledge the others' existences.

That should answer the "Why don't it work right?" question.

90% of all web applications are deployed onto some form of Linux. (Rarely on OSX, but that's the fault of Apple's licensing, not a problem with the platform. IIS may have had its day in the sun, but that sun has practically set.) Given Linux's historical problems with providing a good GUI, OSX has proven to be a great development platform for web applications destined for Linux. Given its great support for virtual machines, it also becomes a great platform for developing mobile applications (XCode and various VM solutions for Android). That hits the developer tetrafecta.

Because of this, most developers are very familiar with Linux and are likely using an OSX laptop. They will only use Windows for testing in IE.

This is what people are paid to do.

Let's talk about who are opensource developers. The average OSS developer (insofar as there is an average) is someone who's paid to do development using OSS tools during the day. They will usually work on these modules to initially scratch an itch for work - the initial release is usually a donation from an employer. The developer then starts to take pride in it at home. Most OSS developers realize that these modules act as advertisement for their skills. (Every job offer I've received in the past 10+ years has referenced my OSS work in some way.)

There is a vast world out there of things that I could learn. Whole new ways of
  • Storing and managing data
  • Interacting with huge datasets
  • Doing operations
  • Doing development
  • Thinking about parallel execution
I don't have a mousefart's chance in a windstorm to even keep up with what I already do for work. So why am I going to spend time learning how to manage a complex system that I will never use for work?

I'm not. And that's why I don't care if my opensource stuff works on Windows. If you want to make it work, then great. I love pull requests. If your employer wants to pay me to support it on Windows, we can discuss my rate.

But you don't have the right to make me work for free to accomplish something I don't care about. That's just not how things work.

Thursday, September 17, 2015

Devops - The toolchest

I gotten the same question from a few people about Devops - Where do I start?. "What exactly do you mean by everything?" Yes, there's a list in the post, but the question started to evolve over a few conversations to the related question "What is the full list of tools that a devops team needs to have an answer for".

So, treat this list as a checklist. Your IT operations may not need an example of one or another of these items, but you need to have thought about why you don't need it.

  1. I'm not interested in vim/emacs-type debates. So, questions of Puppet vs. Chef (in specific) aren't going to be in here. But, "configuration management" is going to be one of the line items. 
  2. My experience is primarily focused on web application development. Nearly everything is still correct for mobile, desktop, or embedded development. If there are differences in tool categories, I'd love to hear about them.

Backoffice Operations

This is the list of tools/services that are necessary to just run a company today. It doesn't matter what your company does or even if it writes its own software.
  • DNS
    • Depending on how other things are set up, you may not need this explicitly
  • Firewalls
    • Depending on how other things are set up, you may not need this explicitly
  • Mail (both receiving and sending)
    • Something like Google Mail
    • You may use something additional for sending emails from an application
  • Document Management
    • Something like Google Docs
  • Instant Messaging
    • Something like HipChat or Slack. Your teams are going to be doing it anyways, so best get in front of it.
  • Conferencing
    • Google Hangouts, Skype, or even Your teams are going to be doing it anyways, so best get in front of it.
  • Monitoring
    • Yes, you want to monitor even these things
    • Depending on your topologies, you may have multiple monitoring tools for different purposes, such as Pingdom plus Icinga plus CloudWatch.
  • Alerting / Escalation
    • While Nagios may do alerting, it's best to have a specific tool to handle that. I like PagerDuty.
  • Dashboards
    • The best teams have an up-or-not dashboard for every service the company depends on, whether it's internal or external. It cuts down on the emails that say "Is X down?".
    • Dashboards make executives happy. Non-executives love happy executives. Therefore, non-executives love dashboards, especially ones that executives can manipulate.

Development Tools

This is a list of the internal-facing tools that enable your ability to do development.
  • Source Control Management - specifically a DVCS like Git or Mercurial
  • Issue Tracker
    • This needs to be linked to your SCM so that commits will change issue status
  • Pull Request / Code Review tool
    • This should be the only tool that can merge to master
    • This should also update the issue tracker
  • Job Runner (aka, Continuous Integration / CI)
    • Runs tests upon pull requests (on create and update)
    • Runs packaging upon merge to master
  • Deployables repository
    • Maven, Yum, Rubygems - there's lots of package types and you need to have a place to put them.
  • Deployment process
    • This is Chef / Puppet / Salt / OpsWorks / whatever.
    • This needs to be integrated into however you have constructed your SCM process
    • This needs to update your issue tracker
    • Ideally, anybody in the company should be able to push a button and the button does the right thing.

Production Tools

When you're running your web application, these are the services you need to consider. You will also need to consider the development environment version of each of these. For example, if you're using S3 as your file store, do you provide a development S3 bucket (with all the attendant issues of using a shared resource for multiple developers) or do you use something like fake-s3?
  • Load Balancer
    • This includes SSL termination (you don't want to terminate SSL at your web application)
  • CDN / Static files
    • All your HTML, CSS, Javascript, images, and videos belong here.
    • This is different from your application caching layer, such as Squid (though you may reuse the same tool).
  • Application servers
    • Where all your code goes.
    • You may have multiple tiers of this, depending on your application's topologies.
  • Metrics gathering
    • Something like New Relic or Librato.
    • You may do monitoring on these metrics (for example, N internal server errors per M seconds)
  • Application Caching
    • This may be response caching (such as Squid) or data caching (such as Memcache)
    • It may be ephemeral caching (such as Memcache) or semi-permanent caching (such as Redis)
  • Relational Database(s)
    • More than just production, it's also the development choice and how those interact.
  • Key-Value Database(s)
    • More than just production, it's also the development choice and how those interact.
  • Backups
    • This includes how to test backups, including testing them in production
    • This includes disaster-recovery and offsite storage
  • Data destruction policies
    • This includes how to remove data from backups so that it's guaranteed to be destroyed
  • Non-production environment construction
    • Non-production environments cannot be the same as production. Exactly what tradeoffs are you making and why?
    • Developer environments are even less like production. How are you ensuring the developer environment is as close as possible so that "It works on my machine" is never uttered.

Friday, September 11, 2015

Devops - The "Process"™

Last post, I laid out the argument for using source control in operations. To summarize, put all the things in source control so you can control them. Except tools don't control what people do - only processes can do that. So, let's work through what the process should be, now that everything is in a known place.

First, what are we looking for in this process? It's really hard to know if you've achieved something if we don't know what we're trying to achieve. Sounds pretty obvious, I know, but think back - how many projects have you worked on where that wasn't done? How successful were those projects?

What I want out of my operations process is this:
  1. A guarantee that every piece of infrastructure was:
    1. created solely from things in source control.
    2. changed solely from things in source control.
  2. A guarantee that I can find out:
    1. what changed
    2. why it changed
    3. when it changed
    4. who reviewed/approved the change
    5. when it was applied to each instance in each environment
      1. and which instances in which environments it hasn't been applied to
    6. what is the dependency graph between this and other changes
Oh, and it has to be unobtrusive and not get in the way and be easy to understand. Not so hard.

These guarantees look very similar to the guarantees that the standard Agile development processes provide. The first is obviously implemented - devs aren't allowed to touch deployed servers. So, any change they want to see in the application must come from things in source control. (This is different from most non-Agile processes where code changes to deployed servers are sometimes emailed (or even IM'ed!) from dev to ops and implemented by hand because changes are so infrequent.)

The second is implemented through discipline. There is nothing in Agile that says all changes must be associated with an issue, confined within a branch, and go through a multi-person development/review process. But, every Agile team I have ever seen or heard of does it that way because doing it any other way has led to uncomfortable situations and unanswerable questions. It's so ubiquitous that tools now treat this as "Agile mode". Github's pull request process even creates an issue# just to ensure that there is a record in the issue tracker.

In order to apply this process to operations, we will need discipline of our own. Any operations team, to do its job, will be applying changes manually. This gets the job done now and is easy to reason about. Not a bad thing. Except, how do you know that what was done is what was defined in source control? This is where the discipline comes in.

Ideally, all changes to deployed servers happen via script. If those scripts are executed from a place separate from the deployed servers, that's even better. (For example, only using the AWS SDK to touch your AWS infrastructure.) The script can be a Puppet/Chef/Salt thing or Ruby scripts or even Bash. It doesn't matter, so long as the computer is the one actually doing the changes. The scripted-ness is checked in and you treat it like application code. Including deployment.

In short, you treat deployments of your changes exactly as you treat deployments of the applications under your management. Which makes sense because an application is more than just code - it's also the infrastructure.

"That's great in an ideal world, Rob, but nothing I have is scripted. It's all checklists. Now what?"

Checklists are scripts that run against a human virtual machine. If you look at the checklist, you should be able to replace many of the steps with "execute this script". Maybe even condense 2-3 steps into one script. Over time (and this is definitely a journey, not a destination), each checklist will condense into "invoke these N scripts". Which, itself, is just one script.

"That great, but I don't even have checklists. My people just know what to do."

If they "just know what to do," then you do not. Do not know what they do, do not know they have done it, do not have control. You're responsible that they do it. And, most importantly, you're responsible that new people can learn it. If it isn't written down, how are new people learning their job?

Tuesday, September 8, 2015

Devops - Where do I start?

Last post, I laid out a series of questions every operations team should be able to answer. Everyone may agree that this list of operational capabilities is good, but getting from here to there is far more complicated. What should we do first?

The absolute first thing every operations team must do is get everything into source control. If it's not in source control, you cannot audit it, review it, or manage it. If a change happens, you don't know how, when, what, or why and there's no chain of custody showing what the approvals were. In short, you do not control it. Not controlling the stuff that makes your stuff isn't sane devops.

This implies there needs to be solid source control. First choice is what to use. Always use a distributed version control system (aka, DVCS) - Git or Mercurial if at all possible. Using a DVCS has two massive advantages over a centralized version control system, like Subversion, Perforce, or CVS (no links to bad choices!). First, DVCS's are far better tools for managing development changesets - lots of discussion about that around the web. The better reason, for operations, is that every clone can be used as the new master if everything else goes pear-shaped. Remember - the first question asks if you are capable of recovering everything else if you have your production backups and the latest checkout of source control. You cannot recover source control itself with the latest checkout of Subversion or CVS. You can do so with Git or Mercurial.

If at all possible, use a service. (This, btw, is going to be a theme I'll expand on more later.) GitHub, AWS's CodeCommit, Atlassian's BitBucket, or Google's Cloud Source Repositories are all excellent choices, along with many others. They all provide private repositories and are extremely scalable and secure. For nearly every organization, these services are capable enough. If you're already using Google's AppEngine or the AWS suite of services, the choice is pretty simple. If you're using the hosted Atlassian suite, BitBucket again seems to be an easy choice. Github is an excellent choice in most other scenarios. Sometimes, for corporate reasons, you have to host internally. In that case, you should strongly consider using GitLab or Atlassian Stash. In all cases, you should be using the same tools as your developers.

Once you picked a tool and a method for hosting, the next step is to get everything into it. Literally and truly everything. If it's a script, check it in. If it's a configuration file, check it in. If it's a Chef recipe, Puppet manifest, Salt pillar, or any other file for a similar tool - yup, check it in. All the secrets (GPG-encrypted first, of course). All the scripts. All the configuration. Everything.

If you don't have an automated way of building it, then write down how to build and check that in. Unless you have a really good reason not to, use a text-based markup language. It's important to use text because text is diff-able by Git/Mercurial. If it's not diff-able, then it becomes very difficult to see the differences when someone wants to change something. I prefer GitHub-flavored Markdown, but there's plenty of other good choices. Use the one that makes the most sense with the rest of your tooling landscape.

Note: I recommend both text (for diff-ability) and GPG-encryption (for secrets). GPG encryption is inherently not diff-able. For secrets, this is good. For instructions and scripts, you want diff-ability.

What's the minimal list of servers/services/activities that you need to check into source control? You guessed it - everything.
  • DNS / Network definitions
  • LDAP / IAM / User authentication lists
  • Mail servers (if you manage mail internally, otherwise treat it as an external service)
  • Monitoring and alerting definitions
    • Especially if you use something like PagerDuty for alerting
  • GPG-encrypted master passwords for external services
  • GPG-encrypted keys (and other authentication methods) for external services
  • GPG-encrypted SSL certificates
  • Server construction methods (for your application)
    • Including where and how to get the base images
  • Application deployment methods
  • Service construction methods (for your internally-hosted supporting services)
    • For example, CI (Jenkins, Stash, etc)
  • Any and all desktop support (including VPN clients, etc)
  • Anything else you are responsible for
(Note: While it doesn't explicitly say it in the documentation, but you can GPG-encrypt a file for multiple recipients and any of them can decrypt it. This is a good thing. Note that you will need to re-key all the secrets whenever someone leaves, not just re-encrypt them. You needed to do this anyway. You only need to re-encrypt them when someone new comes in.)

All of these things may not all live in the same repository. But, don't hesitate to put it into some repository just because you don't know the perfect place. You can always move it later. And, the movement itself will document the growing understanding you have of how to manage the infrastructure.

If you don't know how to rebuild a server, take your best guess and stick that in the repository. As you learn more, you will update what's there. The repository logs will be a trail of exactly what you had to do in order to get all the information you currently have.

The next post will talk about what to do with this repository once you've built it.

Thursday, September 3, 2015

Questions to ask Operations

(. . . or, if you're the devops team, questions you should be asking yourself)

Devops teams exist to make the answers to the following questions a resounding "Absolutely!". If there is any question in this list that you're not 110% confident in the "Absolutely!", then that's the next thing to work on. And, yes, this list is ordered from most important to least important.
  1. Can we confidently rebuild our production environment from source control and backups of data?
    1. In under an hour?
    2. Including all monitoring, alerting, and metrics gathering?
  2. Can we confidently terminate one person's access?
    1. In under 10 minutes?
    2. With one command?
  3. Can we confidently create a instance of the application?
    1. That is a structural clone of production?
    2. With reasonable fake data?
    3. In one command?
    4. On a laptop?
  4. Can we confidently turn off any one server in production at any time?
    1. With zero impact or visibility to users?
    2. Including your:
      1. database master?
      2. session store?
  5. Can we confidently tell anyone to take 3 months leave to care for a sick family member?
    1. Without ever calling them once?
  6. Can we confidently hire into any spot and have that person fully authenticated and authorized?
    1. With nothing missing?
    2. In their first hour?
    3. Before they even show up?
  7. Can we confidently hire someone into IT and have them make a change to production?
    1. In their first week?
    2. In their first day?
  8. Can we confidently say that what is reviewed in QA is EXACTLY what can go to production?
  9. Can we confidently let anyone promote from one environment to the next?
    1. With a button?
    2. Showing them exactly what will be promoted?
      1. As issue numbers linked from your issue tracker?
    3. With rollbacks?
  10. Do you have tests of your infrastructure?
    1. Including monitoring, alerting, and metrics gathering?
    2. Including external interfaces?
    3. Run as part of a CI service?
    4. With automated coverage statistics?
      1. Over 90%?
Implicit in every question is the follow-up "How do you know?" If you ask yourself these questions and cannot point to where you did that yesterday (or the last time, in the case of authn/z changes), then you're treating your infrastructure as magic.

Next post discusses where to start.

    Thursday, August 27, 2015

    The Packager DSL - The second user story

    1. DSLs - An Overview
    2. The Packager DSL - The first user story
    3. The Packager DSL - The second user story
    Our main user is an odd person. So far, all we've made is a DSL that can create empty packages with a name and a version. No files, no dependencies, no before/after scripts - nothing. But, instead of asking for anything useful, the first response we get is "Huh - I didn't think 'abcd' was a legal version number." Some people just think in weird ways.

    Our second user story:
    Throw an error whenever the version isn't an acceptable version string by whatever criteria are normally employed.
    Before we dig into the user story itself, let's talk about why this isn't a "bug" or a "defect" or any of the other terms normally bandied about whenever deployed software doesn't meet user expectations. Every time the user asks us to change something, it doesn't matter whether we call it a "bug", "defect", "enhancement", or any other word. It's still a change to the system as deployed. Underneath all the fancy words, we need to treat every single change with the same processes. Bugfixes don't get a special pass to production. Hotfixes don't get a special pass to production. Everything is "just a change", nothing less and nothing more.

    In addition, the "defect" wasn't in our implementation. It was in the first user story if it was anywhere. The first user story didn't provide any restrictions on the version string, so we didn't place any upon it. And that was correct - you should never do more than the user story requires. If you think that a user story is incomplete in its description, you should go back to the user and negotiate the story. Otherwise, the user doesn't know what they're going to receive. Even worse, you might add something to the story that the user does not want.

    Really, this shouldn't be considered a defect in specification, either. That concept assumes an all-knowing specifier that is able to lay out fully-formed and fully-correct specifications that never need updating. Which is ridiculous on its face. No-one can possibly be that person and no-one should ever be forced to try. This much tighter feedback loop between specification to production to next specification is one of the key concepts behind Agile development. (The original name for devops was agile systems administration or agile operations.)

    All of this makes sense. When you first conceive of a thing, you have a vague idea of how it should work. So, you make the smallest possible thing that could work and use it. While you start with a few ideas of where to go next, the moment you start using it, you realize other things that you never thought of. All of them (older ideas and newer realizations) become user stories and we (the developers and the users together) agree on what each story means, then prioritizes them in collaboration. Maybe the user wants X, but the developers point out Y would be really quick to do, so everyone agrees to do Y first to get it out of the way. It's an ongoing conversation, not a series of dictates.

    All of which leads us back to our user story about bad version strings. The first question I have is "what makes a good or bad version string?" The answer is "whatever criteria are normally employed". That means it's up to us to come up with a first pass. And, given that we're building this in a Ruby world, the easiest thing would be to see what Ruby does.

    Ruby's interface with version strings would be in its packages - gems. Since everything in Ruby is an object, then we would look at Gem and its version-handling class Gem::Version. Reading through that documentation, it looks like the Ruby community has given a lot of thought to the issue. More thought than I would have realized was necessary, but it's good stuff. More importantly, if we use Gem::Version to do our version string validation, then we have a ready-documented worldview of how we expect version strings to be used.

    Granted, we will have to conform to whatever the package formats our users will want to generate require. FreeBSD may require something different from RedHat and maybe neither is exactly what Gem::Version allows. At that point, we'll have failing tests we can write from user stories saying things like "For Debian, allow this and disallow that." For now, let's start with throwing an error on values like "bad" (and "good"). Anything more than that will be another user story.

    As always, the first thing is to write a test. Because we can easily imagine needing to add more tests for this as we get more user stories, let's make a new file at spec/dsl/version_spec.rb. That way, we have a place to add more variations.

    describe Packager::DSL do
        context "has version strings that" do
            it "rejects just letters" do
                expect {
                    Packager::DSL.execute_dsl {
                        package {
                            name 'foo'
                            version 'abcd'
                            type 'whatever'
                }.to raise("'abcd' is not a legal version string")

    Once we have our failing test, let's think about how to fix this. We have three possible places to put this validation. We may even choose to put pieces of it in multiple places.
    1. The first is one we've already done - adding a validation to the :package entrypoint. That solution is good for doing validations that require knowing everything about the package, such as the version string and the type together.
    2. The second is to add a Packager::Validator class, similar to the DSL and Executor classes we already have. This is most useful for doing validation of the entire DSL file, for example if multiple packages need to be coordinated.
    3. The third is to create a new type, similar to the String coercion type we're currently using for the version.
    Because it's the simplest and is sufficient for the user story, let's go with option #3. I'm pretty sure that, over time, we'll need to exercise option #1 as well, and possibly even #2. But, YAGNI. If we have to change it, we will. That's the beauty of well-built software - it's cheap and easy to change as needed.

    class Packager::DSL < DSL::Maker
        add_type(VersionString = {}) do |attr_name, *args|
            unless args.empty?
                rescue ArgumentError
                    raise "'#{args[0]}' is not a legal version string" 
        add_entrypoint(:package, {
            :version => VersionString,
        }) ...

    Note how we use Gem::Version to do the validation, but we don't save it as a Gem::Version object. We could keep the value as such, but there's no real reason (yet) to do so. ___get() and ___set() (with three underscores each) are provided by DSL::Maker to do the sets and gets. The attr_name is provided for us. So, if we wanted, we could reuse this type for many different attributes.

    We could add more tests, documenting exactly what we've done. But, I'm happy with this. Ship it!

    There's something else we haven't discussed about this option in particular. Type validations happen immediately when the item is set. Since values can be reused within a definition, they are safely reusable. For example (once we have defined how to include files in our packages), we could have something like:

    package {
        name 'some-product'
        version '1.2.44'
        files {
            source "/downloads/vendor/#{name}/#{version}/*"
            dest "/place/for/files"

    If we defer validation until the end, we aren't sure we've caught all the places an invalid value may have been used in another value. This is why we attempt a "defense in depth", a concept normally seen in infosec circles. We want to make sure the version value is as correct as we can make it with just the version string. Then, later, we want to make sure it's even more correct once we can correlate it with the package format (assuming we ever get a user story asking us to do so).

    Wednesday, August 26, 2015

    Creating the Packager DSL - Retrospective

    1. Why use a DSL?
    2. Why create your own DSL?
    3. What makes a good DSL?
    4. Creating your own DSL - Parsing
    5. Creating your own DSL - Parsing (with Ruby)
    6. Creating the Packager DSL - Initial steps
    7. Creating the Packager DSL - First feature
    8. Creating the Packager DSL - The executor
    9. Creating the Packager DSL - The CLI
    10. Creating the Packager DSL - Integration
    11. Creating the Packager DSL - Retrospective
    We've finished the first user-story. To review:
    I want to run a script, passing in the name of my DSL file. This should create an empty package by specifying the name, version, and package format. If any of them are missing, print an error message and stop. Otherwise, an empty package of the requested format should be created in the directory I am in.
    Our user has a script to run and a DSL format to use. While this is definitely not the end of the project by any means, we can look back on what we've done so far and learn a few things. We're also likely to find a list of things we need to work on and refactor as we move along.

    It helps to come up with a list of what we've accomplished in our user story. As we get further along in the project, this list will be much smaller. But, the first story is establishing the walking skeleton. So far, we have:

    1. The basics of a Ruby project (gemspec, Rakefile, Bundler, and project layout)
    2. A DSL parser (using DSL::Maker as the basis)
      1. Including verification of what is received
    3. An intermediate data structure representing the packaging request (using Ruby Structs)
    4. An Executor (that calls out to FPM to create the package)
    5. A CLI handler (using Thor as the basis)
      1. Including verification of what is received
    6. Unit-test specifications for each part (DSL, Executor, and CLI)
      1. Unit-tests by themselves provide 100% code coverage
    7. Integration-test specifications to make sure the whole functions properly
    8. A development process with user stories, TDD, and continuous integration
    9. A release process with Rubygems
    That's quite a lot. We should be proud of ourselves. But, there's always improvements to be made.

    The following improvements aren't new features. Yes, an empty package without dependencies is almost completely worthless. Those improvements will come through user stories. These improvements are ones we've seen in how we've built the guts of the project. Problems we've noticed along the way that, left unresolved, will become technical debt. We need to list them out now so that, as we work on user stories, we can do our work with an eye to minimizing these issues. In no particular order:
    • No error handling in the call to FPM
      • Calling an external command can be fraught with all sorts of problems.
    • Version strings aren't validated
    • There's no whole-package validator between the DSL and the Executor
    • We probably need a Command structure to make the Executor easier to work with
    • We probably want to create some shared contexts in our specs to reduce boilerplate
      • This should be done as we add more specs
    It's important to be continually improving the codebase as you complete each new user story. Development is the act of changing something. Good developers make sure that the environment they work in is as nimble as possible. Development becomes hard when all the pieces aren't working together.

    Over the next few iterations, we'll see how this improved codebase works in our favor when we add the next user stories (validating the version string, dependencies, and files).