Thursday, September 17, 2015

Devops - The toolchest

I've gotten the same question from a few people about "Devops - Where do I start?": "What exactly do you mean by everything?" Yes, there's a list in that post, but over a few conversations the question evolved into a related one: "What is the full list of tools that a devops team needs to have an answer for?"

So, treat this list as a checklist. Your IT operations may not need an example of one or another of these items, but you need to have thought about why you don't need it.

Caveats:
  1. I'm not interested in vim/emacs-type debates, so questions like Puppet vs. Chef specifically aren't going to be in here. But "configuration management" is going to be one of the line items.
  2. My experience is primarily focused on web application development. Nearly everything is still correct for mobile, desktop, or embedded development. If there are differences in tool categories, I'd love to hear about them.

Backoffice Operations

This is the list of tools/services that are necessary to just run a company today. It doesn't matter what your company does or even if it writes its own software.
  • DNS
    • Depending on how other things are set up, you may not need this explicitly
  • Firewalls
    • Depending on how other things are set up, you may not need this explicitly
  • Mail (both receiving and sending)
    • Something like Google Mail
    • You may use something additional for sending emails from an application
  • Document Management
    • Something like Google Docs
  • Instant Messaging
    • Something like HipChat or Slack. Your teams are going to be doing it anyways, so best get in front of it.
  • Conferencing
    • Google Hangouts, Skype, or even Appear.in. Your teams are going to be doing it anyways, so best get in front of it.
  • Monitoring
    • Yes, you want to monitor even these things
    • Depending on your topologies, you may have multiple monitoring tools for different purposes, such as Pingdom plus Icinga plus CloudWatch.
  • Alerting / Escalation
    • While Nagios may do alerting, it's best to have a specific tool to handle that. I like PagerDuty.
  • Dashboards
    • The best teams have an up-or-not dashboard for every service the company depends on, whether it's internal or external. It cuts down on the emails that say "Is X down?".
    • Dashboards make executives happy. Non-executives love happy executives. Therefore, non-executives love dashboards, especially ones that executives can manipulate.
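
As a rough illustration of the up-or-not dashboard idea above, here is a minimal Python sketch that probes a list of services and renders one line per service. The function names (`http_is_up`, `render_status_board`) and the example URLs in the usage note are made up for illustration. A real dashboard would more likely be built on your monitoring tool than on a hand-rolled script.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection failures, timeouts, HTTP 4xx/5xx, and malformed URLs
        # all count as "down" for an up-or-not board.
        return False


def render_status_board(
    services: Dict[str, str],
    probe: Callable[[str], bool] = http_is_up,
) -> str:
    """Build a plain-text board with one 'name UP/DOWN' line per service."""
    lines = []
    for name, url in sorted(services.items()):
        state = "UP" if probe(url) else "DOWN"
        lines.append(f"{name:<20} {state}")
    return "\n".join(lines)
```

Calling `render_status_board({"wiki": "https://wiki.example.com", "mail": "https://mail.example.com"})` would probe each (hypothetical) URL and return the board as text. The `probe` argument is there so you can swap in a different check, for example a TCP connect or a fake for testing.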

Development Tools

This is a list of the internal-facing tools that enable you to do development.
  • Source Control Management - specifically a DVCS like Git or Mercurial
  • Issue Tracker
    • This needs to be linked to your SCM so that commits will change issue status
  • Pull Request / Code Review tool
    • This should be the only tool that can merge to master
    • This should also update the issue tracker
  • Job Runner (aka, Continuous Integration / CI)
    • Runs tests upon pull requests (on create and update)
    • Runs packaging upon merge to master
  • Deployables repository
    • Maven, Yum, Rubygems - there are lots of package types and you need to have a place to put them.
  • Deployment process
    • This is Chef / Puppet / Salt / OpsWorks / whatever.
    • This needs to be integrated into however you have constructed your SCM process
    • This needs to update your issue tracker
    • Ideally, anybody in the company should be able to push a button and the button does the right thing.
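
To make the "commits will change issue status" link above more concrete, here is a small Python sketch of the parsing half of that integration. It assumes a commit-message convention of "fixes #N", "closes #N", and "refs #N", and the status names "resolved" and "in-progress". Both are hypothetical, and most SCM and issue tracker pairings ship their own convention that you would use instead.

```python
import re
from typing import List, Tuple

# Assumed convention: "fixes #12" / "closes #12" resolve an issue,
# "refs #12" only marks it as being worked on.
ISSUE_REF = re.compile(r"\b(fixes|closes|refs)\s+#(\d+)", re.IGNORECASE)


def issue_transitions(commit_message: str) -> List[Tuple[int, str]]:
    """Return (issue_number, new_status) pairs found in a commit message."""
    transitions = []
    for keyword, number in ISSUE_REF.findall(commit_message):
        status = "in-progress" if keyword.lower() == "refs" else "resolved"
        transitions.append((int(number), status))
    return transitions
```

A hook on the SCM side would call `issue_transitions` for each pushed commit and hand the results to whatever API your issue tracker provides. That call is left out here because it depends entirely on the tracker.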

Production Tools

When you're running your web application, these are the services you need to consider. You will also need to consider the development environment version of each of these. For example, if you're using S3 as your file store, do you provide a development S3 bucket (with all the attendant issues of using a shared resource for multiple developers) or do you use something like fake-s3?
  • Load Balancer
    • This includes SSL termination (you don't want to terminate SSL at your web application)
  • CDN / Static files
    • All your HTML, CSS, JavaScript, images, and videos belong here.
    • This is different from your application caching layer, such as Squid (though you may reuse the same tool).
  • Application servers
    • Where all your code goes.
    • You may have multiple tiers of this, depending on your application's topologies.
  • Metrics gathering
    • Something like New Relic or Librato.
    • You may do monitoring on these metrics (for example, N internal server errors per M seconds)
  • Application Caching
    • This may be response caching (such as Squid) or data caching (such as Memcached)
    • It may be ephemeral caching (such as Memcached) or semi-permanent caching (such as Redis)
  • Relational Database(s)
    • This covers more than production. It also covers what you run in development and how the two interact.
  • Key-Value Database(s)
    • This covers more than production. It also covers what you run in development and how the two interact.
  • Backups
    • This includes how to test backups, including testing them in production
    • This includes disaster-recovery and offsite storage
  • Data destruction policies
    • This includes how to remove data from backups so that it's guaranteed to be destroyed
  • Non-production environment construction
    • Non-production environments cannot be the same as production. Exactly what tradeoffs are you making and why?
    • Developer environments are even less like production. How are you ensuring the developer environment is as close as possible so that "It works on my machine" is never uttered?
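
The metrics item in the list above mentions monitoring on "N internal server errors per M seconds". A minimal Python sketch of that kind of check is a sliding window over error timestamps. The class name `ErrorRateAlarm` and its parameters are invented for illustration. Tools like New Relic or Librato provide their own alerting rules, so treat this only as a picture of the logic.

```python
from collections import deque
from typing import Deque


class ErrorRateAlarm:
    """Fires when at least max_errors occur within window_seconds."""

    def __init__(self, max_errors: int, window_seconds: float) -> None:
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record_error(self, now: float) -> bool:
        """Record one error at time `now` (in seconds); return True if the alarm fires."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Drop errors that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.max_errors
```

With `ErrorRateAlarm(max_errors=3, window_seconds=10.0)`, three errors inside any ten-second span make `record_error` return True, which is the point at which you would hand off to your alerting / escalation tool.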