Wednesday, July 31, 2013

Deployment is not source control (pt 3)

(This is the third part of a series on deployment. See part 1 and part 2.)

The deployment process I've outlined in the previous two posts works really well in a continuous deployment environment, where changesets that are merged to master go up quickly to production. It also works really well for mainline development, where changes track in one direction only and all changesets start from master, as long as there are no bugs in production that have to be fixed right away, before what's in master can be promoted to production.

You and I don't work on projects or teams that run continuous deployment. (Very few teams do, for good reasons.) There will come a time, possibly often, when you will need to fix a bug in production and cannot run the change through the master branch first. You need a hotfix.

The hotfix process is very similar to the mainline process outlined in part 1. The primary differences are the branch and merge points. Mainline development always branches from and merges to master. (You never develop directly in master.) Hotfixes, on the other hand, branch from and merge to the version of master which was used to build what is currently in production. They go through the same process of building a package, promoting to a hotfix-test environment (separate from the mainline test environment), then promoting to production. (This requires a separate hotfix-test package repository.)
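
A rough sketch, assuming Git and assuming the commit that built the current production package was tagged when it was promoted (the tag name and issue number below are made up for illustration):

    # Branch from the exact commit that produced what is running in production.
    git checkout -b hotfix/issue-1234 prod-20130715

    # Fix the bug and commit it on the hotfix branch.
    git commit -am "Fix issue 1234"

    # Build a package from this branch and promote it through the
    # hotfix-test repository to production, exactly as in part 1.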

At this point, we have successfully promoted our hotfix to production. One last item remains - we need to merge our hotfix into the current mainline development. If you've done everything right, this is one of the only two places where merge conflicts can occur. (The other is when pulling master into your development branch.) Both Git and Mercurial will apply the new diff (of the hotfix) to the sequence of changes right after the production diff, then apply the subsequent changes made to master on top of it. If any of the diffs conflict, the merging developer will need to fix the conflicts.
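
Continuing the sketch above (same hypothetical branch name), the merge back into mainline is an ordinary merge:

    # Bring the hotfix into current mainline development.
    git checkout master
    git merge hotfix/issue-1234

    # If any diffs conflict, resolve them, then conclude the merge with git commit.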

After the conflicts (if any) have been fixed, all that's left is to pull the hotfix changes into any existing development branches. And, we're done!

Monday, July 29, 2013

Deployment is not source control (pt. 2)

(This is the second post in a series on deployment. See part 1 and part 3.)

+Roman Daszczyszak had a question on Google+ in response to Deployment is not source control. He asked:
While I agree with your points, how do you apply this to developing a web application? My team has run into problems trying to properly package a Django-based app with a Mongo backend. Thoughts?
Whenever anything is installed (or deployed - it's the same thing), there is a set of steps that must be completed. For a standard web application (Django, Rails, etc), that could be something like:

  1. Log in to the webserver.
  2. Copy the source code (via git, scp, rsync, etc) into the installation directory (/var/www, etc).
  3. Install any necessary prerequisites (frameworks, libraries, language modules, etc).
  4. Run a script to set things up (compiling / uglifying, configuration, softlinks, etc).
  5. Restart the service (Apache, FastCGI, etc).
  6. Repeat this process for each webserver in the group.
OS packages (rpm or deb) are designed to handle steps 2-5. While each packaging format has its stronger and weaker points, all of them can do the following (a short sketch of building such a package follows the list):
  • Bundle files into a logical hierarchy
  • Execute scripts (in any language) at different points in the installation process
  • Specify prerequisites (including specific versions to require or exclude)
  • Execute tests to ensure a good installation
  • Allow for arbitrary metadata to be stored for later queries
  • Rollback to a prior installed version (most important function)
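
As a concrete (if simplified) illustration, here is roughly what building such a package can look like from the command line. I'm using fpm here purely because it's compact; a hand-written RPM spec file or debian/ directory does the same job. The package name, paths, and dependency versions are all made up:

    # Build an RPM from the already-compiled assets in build/, declaring
    # what it depends on and what should run after installation.
    fpm -s dir -t rpm \
        --name mycompany-webapp \
        --version "1.$(date +%Y%m%d%H%M%S)" \
        --depends httpd \
        --depends 'python-django >= 1.5' \
        --after-install restart-webapp.sh \
        --prefix /var/www/webapp \
        -C build .
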
One important point to remember is that the files in source control are often not the files that belong on the production server. While this is true for compiled applications (such as Apache and MySQL), it has become true for web applications as well. Javascript and CSS assets are often uglified and compressed. You may not even be writing in CSS - Sass/Compass and Less are becoming excellent frameworks to use. Your Javascript assets may have been written in Coffeescript, your HTML in Jade or HAML, and images may be sprited.

This leads us to an important rule of thumb:
Each server should have exactly what it needs to perform its tasks and nothing more.
Applying that to our packaging means the package should only install the compiled, compressed, and otherwise-mangled files that will actually be served from the webserver. If you're putting gcc, git, or make on your production servers, you're doing it wrong. The package should have the compiled versions, not the source versions. It may have templated configuration files ("Insert hostname here"), but the template isn't installed - only the result of filling in the template.

Frameworks, such as Django, and datastores, such as MongoDB, have already been built into packages by their maintainers. Specifying them as dependencies allows the package to be self-describing.

The metadata associated with the package is important to the success of the process. The package version is required. I've found that using "1.[timestamp]" is a good monotonically-increasing version number. As this is only released internally, a nonsensical version number is good enough.

All the packaging formats allow setting arbitrary metadata on a package. A good set of metadata includes:
  • The timestamp this package was built.
  • The SCM identifier of the commit used to build the package (git SHA1, SVN version, etc).
  • The issue number for the changeset that was merged to master that triggered this package build.
With that metadata, any person in the company can hit an internal website and see exactly what the last build to each environment is and what issues are in test that aren't in production. Your issue tracker should be able to provide this, but your servers should also be able to tell you this.
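
For example (assuming RPM, a made-up package name, and that the SHA and issue number were stamped into the package at build time), any server can report this back directly:

    # Ask a box exactly what it is running and where that build came from.
    rpm -q --queryformat '%{NAME} %{VERSION}\n%{DESCRIPTION}\n' mycompany-webapp

    # The Debian equivalent:
    dpkg-query -W -f='${Package} ${Version}\n' mycompany-webapp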

So far, we have discussed putting together the application and its on-server dependencies. Roman's question asked about MongoDB. I'll expand it to datastores in general. It's good practice to keep application servers and datastores on separate horizontal groups. This allows operations to balance the needs of one vs. the other. It's extremely rare for both application and datastore to grow at exactly the same pace. So, we have to figure out a way of managing cross-server dependencies. (This problem also arises when dealing with multiple applications supporting the same product. The solution is the same.)

Datastore change management can and should also be managed with packages. Packages aren't just a set of files to be applied. A package is a set of actions that need to be taken in order to upgrade an installation from version X to version Y. The most common thing to do is provide a new set of files, but a set of actions (such as "ALTER TABLE" statements) is also appropriate. By applying datastore changes with packages, you are now able to ask your datastore "What version are you?" and make decisions based on that. One decision could be "Version X of the application cannot be installed because the datastore is not at version Y."
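
A minimal sketch of that idea, again with made-up names, and using an ALTER TABLE as the example action (a MongoDB migration script would slot in the same way): the schema package carries the action in its install script, and the application package declares the schema version it needs.

    #!/bin/sh
    # upgrade-schema.sh - the action this package exists to perform on install.
    mysql mydb -e "ALTER TABLE users ADD COLUMN last_login DATETIME"

    # Package the action, using the schema version as the package version
    # (fpm's "empty" source type builds a package with no files of its own).
    fpm -s empty -t rpm \
        --name mycompany-schema \
        --version 42 \
        --after-install upgrade-schema.sh

    # The application package can then refuse to install against an old schema:
    #   --depends 'mycompany-schema >= 42'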

Roman - I hope this helps!

Thursday, July 25, 2013

Deployment is not source control

(This is the first post in a series on deployment. See part 2 and part 3.)

A source control manager (or SCM) is the second most-important tool an application team can use, right after a good editor. It preserves history, maintains context, and makes perfect julienne fries, every time. Everything should go into source control - source code, tests, requirements, configuration, build scripts, deployment tools. Everything. Building an application that isn't managed in source control is like trying to cross the Grand Canyon on a high wire - without the high wire.

But, as much as source control is a phenomenal tool, it is not the right tool for every purpose. No-one would replace vim or Sublime with Git or Mercurial. That just makes no sense. Which is why I'm always baffled when I come into a team and see deployments managed with git branches.

Deployment is the process of taking a product from environment A to environment B, usually from test (or beta) to staging (or user-acceptance), then to production. An environment isn't just the application code that lives on a single server. It's the entire stack of processes, such as the database, application(s), third-party libraries, configuration, background jobs, and services that go into providing the features of your product. Ensuring that all the different pieces of that stack are in sync at all times is the major function of deployment.

In order to do this, the deployment tool must understand dependencies. Dependencies between application code and third-party libraries on the same server are just the start of this. Dependency-tracking across server groups, between the application code and the database version, and even configuration changes are all components of this. And everything has to move in lockstep.

There is no single tool that, to my knowledge, manages the entire stack in this holistic fashion. But, an application team can make life a lot simpler for themselves by doing one simple thing - deploy with OS packages and not source control.

OS packaging tools (such as RPM and APT) have been around for decades. They are the way to deploy libraries and applications to Linux (and Windows, thanks to Chocolatey). They manage dependencies, put everything in the right place, update configuration, verify compatibility, and do everything else necessary to make sure that, when they're done, the requested thing works. Often, this means setting specific compilation switches (or even pre-compiling for specific architectures). They encode knowledge that is often hard-won and difficult to rediscover. And, finally, they let a user ask the server what is installed, revert to a previous version, or even uninstall the package (and all downstream dependencies) altogether.
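
To make that concrete (RPM and yum shown here; apt has direct equivalents, and the package name is made up):

    rpm -q mycompany-webapp                      # what exactly is installed right now?
    yum downgrade mycompany-webapp-1.20130720    # revert to a prior version
    yum remove mycompany-webapp                  # uninstall it, and whatever depends on it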

Source control does not do any of those things. Source control is designed to do one and only one thing - track and manage changes between versions of groups of text files. Modern SCMs (such as Git and Mercurial) do this very very well.

Managing a deployment requires a package. When QA approves a specific deployment within their test environment, operations needs to "make prod look like test". The way ops can ensure that production will look exactly like test is to build production exactly as test was built. Server build tools (like Puppet and Chef) help ensure that the servers (or VMs) are built exactly the same every time. The application (and its configuration) needs to have the same treatment.

So, I recommend the following process (a rough shell sketch of the packaging steps follows the list):

  1. Do your development as you normally do right now. (I will have thoughts on the rest of this later, but those are another set of posts.)
  2. Once a changeset is merged into the primary branch (master for Git, default for Mercurial):
    1. It is tagged with the name of the changeset.
    2. An OS package is built and uploaded to the test package repository.
  3. The OS package is deployed to the test environment.
    1. This can happen either automatically or as a result of a user action.
  4. QA verifies the build.
    1. If it fails, issues are opened and the development process begins anew.
    2. If it fails catastrophically, the environment is reverted.
  5. When QA passes the build, the package is copied into the production package repository.
    1. The commit that was used to build this package is tagged with the date it was promoted to production.
  6. The package is applied to the production environment at the appropriate time.
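
A rough shell sketch of steps 2 and 5, assuming Git, RPM built with fpm (any packaging route works), and simple directory-backed yum repositories (every name and path here is illustrative):

    # Step 2: a changeset has just been merged to master - tag it and build a test package.
    SHORT_SHA=$(git rev-parse --short HEAD)
    git tag "build-${SHORT_SHA}"
    VERSION="1.$(date +%Y%m%d%H%M%S)"
    fpm -s dir -t rpm --name mycompany-webapp --version "$VERSION" --prefix /var/www/webapp -C build .
    cp mycompany-webapp-${VERSION}*.rpm /srv/repos/test/ && createrepo /srv/repos/test/

    # Step 5: QA has passed the build - copy the very same package into the
    # production repository and tag the commit with the date it was promoted.
    cp /srv/repos/test/mycompany-webapp-${VERSION}*.rpm /srv/repos/prod/ && createrepo /srv/repos/prod/
    git tag "deployed-$(date +%Y%m%d)" "build-${SHORT_SHA}"
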
At the point of merging into the primary branch, the SCM has finished its job. It's now the job of the package manager to replicate that branch out to the various environments in the correct order with the correct dependencies.

Tuesday, July 23, 2013

The canonical source

No, I'm not talking about Mark Shuttleworth's attempt to define Linux for the rest of the world. (Though, come to think of it, that's probably the source of the name. He wants to be the canonical source of all things open-source. Hence, the creation of Juju when Puppet and Chef would seem to be perfectly good solutions to the devops problem.)

To be the canonical source for something means to be the ultimate authority for how that thing is structured. Whenever a new copy of this thing is created, the canonical source is consulted to create it. Whenever a change is made, the change is first made in the canonical source and those changes propagate outwards from it. If this was a religious blog, we would discuss the Bible and its roots in the Alpha and the Omega. If this was a legal blog, we'd be talking about constitutions and common law.

In IT, there are dozens of things that are copied and slung around. Database schemas, server configurations, applications, third-party tool configurations, and the like. And each one of them has a canonical source.

The only issue most organizations have is that they haven't clearly defined exactly what the canonical source is for each component in their applications. (Frankly, most organizations don't have a complete list of the components!) Why is this an issue?

Let's digress for a minute and peek over at the DRY principle. It normally is discussed in terms of code and is the explanation given for why refactoring is a good idea. Instead of having the same validation code at the beginning of four subroutines, you pull out the validation into its own subroutine and call that instead. This way, if that validation would ever change (and it will change), it is changed in one place and everywhere that needs it automagically (an awesome word!) receives the update. Without you having to do anything. Without you even having to know everywhere that needed the change.

Most people involved in the creation of software instinctively understand that this is a good idea in software. There should be a single place where the Luhn Algorithm or the email verification algorithm is defined. Why would someone want to have it in two places?

Can the same be said about your database schema? Or the structure for your production servers? What about your testing infrastructure? Do you have canonical sources for each one? And if you do, do you have processes in place that ensure the canonical source is modified first, then the changes flow from there? Is your documentation built from the same source?

If your team cannot point to the canonical source of something, that means one of three things:

  1. There isn't a canonical source.
  2. There's more than one canonical source.
  3. The canonical source is in production.
Without a canonical source, a Single Source Of Truth, your application would be in a disastrous mess. Regressions would occur on a regular basis and testing would be ineffective (at best). Having more than one canonical source amounts to the same thing: two ultimate sources of truth are exactly the same as none. (Which is exactly what the Roman Catholic Church realized about popes.)

So, this means your canonical source is whatever is currently working in production. This would seem to have a nice poetic ring to it - whatever your users see is the canonical source for what your team works on. Many teams operate exactly like this.

Well, they operate poorly like this. Two problems rear their ugly heads very quickly. The first is usually simple-ish to fix. How should someone build a new instance of the canonical item (such as a database or server)? Cloning production would require (in many cases) taking something offline. Taking any part of production offline is usually a "Bad Idea"(tm), so it's only done very rarely.

The second problem is much more insidious. If production is canonical and it is the cleanroom, how do you safely push changes up to it?

Saturday, July 20, 2013

Promoting an environment of clean rooms

In a previous post, I talked about how to take something that's a mess and keep it from getting worse. You may not be able to fix it, but you can at least not contribute to the mess. Today is about what a clean room gets you.

The concept of "cleanroom" work (notice the pun?) appears in a lot of disciplines. All computing hardware, especially CPUs and disks, are built in cleanrooms. The tolerances for these (and most other computing hardware) are so small that any dust or electrostatic charge would make the final product useless. (Even with this, Intel is rumored to have at best a 95% yield, much worse on newer processes.)

Like so many other things, the concept originates with medicine. When the germ theory was just starting to gain acceptance, early disease control would distinguish between "cleanroom" and "sickroom". The protocol (official process) was that anything in the cleanroom could be taken into the sickroom, but nothing from the sickroom could be taken into the cleanroom without being sterilized.  This isolates the germs in the sick areas and prevents contamination. This protocol is still used today, both by parents dealing with sick children and as the basis for how epidemiologists work with highly infectious diseases like Ebola and meningitis.

The best way to think about this is with bubble children. Because they have no immune system, they have to live in a completely isolated and sterile environment - a bubble. Everything (diapers, food, water, books) has to be sterilized and kept sterile before it can be introduced into the bubble. The moment anything non-sterile (a dirty fork) touches something meant for the bubble (a book), the book is back at square one, having to go through the whole sterilization procedure again.

In IT, we promote our applications from environment to environment (notice the second pun?). Developers do their work in a "development" or "dev" environment (ideally on their local machines). When they have finished, the work is promoted to a "test" or "beta" environment to be evaluated by the QA staff. When they have finished, the work is promoted (possibly through a "staging" environment for regression testing) to "production". Production is the bubble.

This idea of successive steps of verification is exactly the same as disinfecting anything that leaves a sickroom. Instead of dealing with infections, we deal with bugs. When someone writes a line of code, that line is maximally buggy. The more tests we apply to it (code review, unit-tests, integration tests, regression tests, etc), the more we can say "This line of code doesn't have that bug." We achieve a relatively high degree of confidence that the new line of code is sufficiently bug-free that it can move into a "cleaner environment". Eventually, the new line of code goes through enough cleanings that it can be introduced into the production bubble.

Environments, like cleanrooms, must be isolated from anything that could impact them. In an odd twist, cleanrooms have their own infectious nature. In order for a cleanroom to remain clean, anything that can feed into or affect the cleanroom has to be at the same level of cleanliness. If you touch something in the bubble, you're part of the bubble. If something touches you, then that something is now also part of the bubble. Being part of the bubble means you have to qualify for the exact same stringent requirements for cleanliness, or bug-free-ness.

So, that one cronjob server that is used, among other things, to control when the production backups occur? It's officially part of the prod bubble and has to be treated as such.

Finally, connection points between environments must be dead-drops. This is a concept from spycraft where one person (usually the traitor) puts the stolen files somewhere. The other person (usually the handler) picks up the files later. The two people are never in contact with each other, reducing the chance that a counter-espionage team will be able to figure out the hole in their security. In IT terms, updates to a cleaner environment (such as production) are pushed to a central location (such as an internal package server for rpm or apt). Agents within the production environment (such as Puppet or Chef) are then able to retrieve and deploy the updates on their schedule. Possibly after doing checks to make sure the updates are safe to deploy.
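
In the simplest form (repository and package names invented for the example), the "pick up" half of the dead-drop is nothing more than an ordinary package update pulled from the internal repository, typically wrapped in Puppet, Chef, or even cron:

    # Run from inside the production environment, on production's schedule;
    # nothing in the test environment ever pushes directly into production.
    yum -y --disablerepo='*' --enablerepo=mycompany-prod update mycompany-webapp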

Wednesday, July 17, 2013

What is good documentation?

Imagine this - you are assigned to a new project. An admin gives you permissions on the repository and you grab a copy of the code (via git, svn, or whatever). As you look in the directory, you see a mishmash of files and directories. You recognize some of them (lib/, src/, test/, and so on), but some of them are weird (server/, conf/ vs. config/ vs. cnf/). And, sure enough, the project lead is on vacation this week and the other developer is in meetings all morning. Slashdot, here we come, right?

This sorry situation appears to be the norm in most development teams I've worked with. Most information about how to set up an environment, how to function within the team, and what expectations are tends to be transmitted by word of mouth. What little is written down is always out of date or just blatantly wrong.

Everyone deplores the state of affairs, but very few do anything about it. It seems like a Sisyphean task, constantly pushing that boulder uphill without any support from anyone else. Management certainly doesn't agree that documentation is as important as shipping code. Developers are notorious for not wanting to write anything except code. Testers are supposedly clueless (more on this in another post). And, worst of all, ops thinks everyone else is an idiot out to make their life harder than it already is. No-one trusts anyone, so no-one is going to take the risk of actually writing something down.

Most application teams (dev, ops, analysts - everyone!) operate like guilds from the Middle Ages. Not with a deliberate intent to hide knowledge for the purposes of maintaining a monopoly to collect rents (though the effect ends up being the same). There's simply no provision made to actually manage the generation, storage, and transmission of information. Instead, the lowly apprentice has to petition the journeymen and masters of the team to provide them with whatever scraps of information they can get. This becomes the root cause of two unfortunately common phenomena:
  1. Many companies expect a new employee to take up to 3 months to become useful.
    • And consume a "useful resource" at the same time.
  2. Senior developers are never allowed to move off a project.
Good documentation fixes both of these problems. 

So, what does good documentation even look like? The short answer is "good documentation is sufficient unto itself." The ideal is that most answers to most questions are not only within the documentation, but can be found by the average reader. It is clear, concise, complete, and comprehensible. It has enough information for the new reader, yet can be read as a reference by the old hand. And it is both current and versioned. Whenever a change happens in the code, it is reflected in the documentation.

Before you say "Impossible!", look at your favorite opensource products. The really good ones - you know which ones I'm talking about. These are the tools, frameworks, and modules that you can understand exactly what it will (and won't!) do within 15 minutes of easy reading. They have examples. Tutorials. References. The number of FAQs is very small.

Most importantly, the number of times you have to google for something or ask in an IRC channel or mailing list? Zero. Zilch. None.

Good documentation is also not always written. The very best form of documentation is executable. This guarantees that the documentation will always be current. It has to be, otherwise the project doesn't work. It's also documentation that everyone likes - it's code and it saves everyone time. Executable documentation includes:

  • Tests (unit-tests, integration tests, customer tests, etc)
    • Cucumber is ideal as executable documentation.
  • Deployment and environment management
  • Build tools (make, Ant, Maven, Grunt, Vagrant)

I'm going to be writing posts for each of these in the future.

Sunday, July 14, 2013

When is something done?

I have five kids. They're good kids, but they're kids. So, when they do something, some part always gets forgotten (usually cleaning up). A perfect example is my oldest, just turning 18. He loves steak. He loves eating steak and he loves cooking steak. He's quite good at both skills, too. If you ask him about steak, he can go on and on about the different cuts, different techniques, and even different seasonings. (Apparently, a rub and a marinade are never to be done together. I'll never make that mistake again!)

And, like many older teenagers, his sleep habits on weekends are not quite . . . habitual. A Saturday 3am steak craving has been known to happen. To his credit, I've never woken up to a fire alarm or any other emergency. But, I always know when he's had a craving. The kitchen is not quite as my wife and I left it the night before. The steak has been put away (in his stomach), but nothing else has.


As far as he is concerned, the process of cooking a steak is finished when the goal of eating a steak has been achieved. Every step to the right of the goal is unnecessary. Which is obviously not true. Processes have a lifecycle of their own, regardless of the goal(s) they are supporting. If he was cooking a steak for his significant other (which he has done), the process remains the same, even if the goal is different.

Every time I get on him about cleaning up after himself, he gives me this look. Every parent knows exactly what look I'm talking about. It's the "I'll do it because you'll punish me if I don't, but you haven't convinced me of why I should care" look. He's a hobbyist.

Compare this to what the chefs do on Hell's Kitchen. These cooks are not just professionals, they're consummate professionals. After a long day of challenges and a night of creating 5-star food while being yelled at in public, they are still cleaning their stations and washing the pots and pans. It doesn't matter how rough it got between them - at the end of the day (literally!), they work together to finish the process of preparing 5-star cuisine. The end of that process is to prepare for the next time the process is executed.

This separation of process from goal is equally true in every facet of our lives, and especially true in the development of software. The goal is obvious - a functioning application that our customers can use. The shortest path to that goal is:

  • Make production server (if necessary, otherwise skip to next step)
  • Edit code on production server
  • service apache restart (or however you source in the edited code)
  • Send email announcing new feature
This is called DIP, or "Developing In Production". It's what 99% of the world population thinks we do, including many stakeholders, most business analysts, and all users. Oh, and that one developer in the back who only works on ancient ASP4 or PHP3 apps. And, to a (very small) degree, this process works. The application does tend to work, to a large degree, most of the time. And, for hobbyists or businesses which can tolerate large amounts of downtime, this minimalist process could serve quite nicely.

For the rest of us . . . not so much.

DIP, as a process, leaves much (well, everything) to be desired. It is this-goal focused. We need to get this feature out. We need to fix this bug. The implicit understanding behind it is "And nothing else matters." There is no preparation made for the next execution of the process. The plate and utensils are just left on the table with the grease congealing. The pan is left on the cooling burner, a hard crust forming. Nobody puts away the seasonings and the sauce curdles overnight.

Tests aren't written, so nobody knows (or remembers) exactly how something is supposed to work. The environment setup isn't documented (or automated), so every server build is a one-off that takes days and is still never the same. There is no source control, so nobody knows why something was done. The ticketing system (if there is one) loses information, so nobody even knows what was done or when or for what purpose.

If the thing you did was the last time anyone would ever have to deal with that application, then none of this matters. But, what's the chance of that? In my 20 years of working in IT, that has never happened to me or anyone I have ever met or even in any of the stories they have told.

In short, every change you ever make to a system will have another change after it. The process of making that change isn't complete until you have cleaned up after yourself.

Wednesday, July 10, 2013

Keeping new things clean in a dirty world

Most software projects are unhygienic. Organization is poor, some scripts don't work, and it's likely that much or all of the test suite doesn't work, either. Very little is automated and nothing is documented. It's the software equivalent of living in a fraternity house. You have to carefully walk through the mess in the living room to get from the filthy kitchen to your disaster of a room.

No-one sets out to live in a dump, just like no-one sets out to work in a disaster of a software project. Little things add up, like that pizza sauce stain or where someone fell drunk against the wall. People tried to clean them up, but it was never the same. And they add up, day after day, until you can't see the walls for the empty beer bottles. And you're not sure what you can do to make a difference, short of condemning the whole thing and starting from scratch.

We can see exactly what is wrong with a fraternity. No-one in the fraternity is focused on cleanliness - on hygiene. If a dish falls on the floor, no-one makes it a priority to pick it up. If it is picked up, then it's not put away in the cupboard. If it is actually put away, it's put away haphazardly, without being stacked nicely. And if, by some miracle, the dishes are actually stacked, it's only done the once. The next time, the dishes will be left to mold in a corner on the table. No-one cares.

There are fraternities where everything is kept neat and tidy. I lived in one (well, mostly neat and tidy). But, it requires twenty-somethings to do something they tend to not do - focus on their environment over themselves. It requires someone who takes charge and chivvies everyone else to work at it. To make the big push to clean everything up. To spend their Sunday cleaning instead of sleeping off last night's party and cramming for tomorrow's exam. So that the house can start the week off neat and tidy.

Even that doesn't work. I can see some of you rolling your eyes in memory. Everything does get clean, but does it stay clean? Of course not. Because there's nothing in place to make sure that the work done remains done. There is no ongoing maintenance.

The root is that the house wasn't what needed to be cleaned. Yes, the beer bottles and pizza boxes needed to be thrown out. Yes, the dirty dishes needed to be washed and put away in nice stacks. But, the real problem was the mindset that let everything slide. The real fix is in retraining everyone in the house to feel uncomfortable when something is out of place, to itch under the skin when a dirty dish just lies on the end table. If nothing else is accomplished that Sunday afternoon except to fix that mindset, then the house will clean itself, as a matter of course.

So, when you add that configuration file to support the new module, don't just throw it anywhere like everyone else seems to have done. Put it where it should go. If you have to write a script to make sure it's shoehorned into the dirty way of doing things, then so be it. At least you have kept this small part of the house from being dirty. And, you have paved the way for someone else to clean up a piece that was dirty. You have started to create the itch for hygiene.

Sunday, July 7, 2013

Signs your software process isn't working

The business has made it. The product has been well-received. New customers are coming in the door and partners are signing up. You're even profitable! But, something is rotten in Denmark.

Software releases feel slower. You haven't measured it, but you're pretty sure that the number of regressions is up. Emergency releases seem to be happening more often. New features aren't happening as quickly as they used to. Some features are failing now, which never happened before. And when new features do get out, it seems like everyone is exhausted. Some of the early employees are moving on. Some of the first customers are complaining. It's just not as sparkly.

The good news is that it's not in your mind. There are real issues that are causing all the pain. They can be identified, measured, and surfaced to the rest of the organization.

The other good news is that you can fix it. There are real and concrete things your groups can do to reduce the turmoil. Some of the problems can even turn into assets after some grooming.

But, that's about the extent of the good news.

The bad news is that there is no quick fix. Switching your technology stack isn't going to fix it. Hiring a rockstar developer or a great project manager isn't going to fix it. Adding "Agile" (or kanban or whatever) is only going to muddy the waters. And throwing more people at it is only going to exacerbate the problems.

Yes, problems. Because there isn't just one problem. There are multiple problems that contribute to this sense of unease and dread. Not everyone has the same set of problems, and some groups are unique snowflakes and have their own special brand of crazy. But most groups tend to run into some combination of the same sets of issues.
  • Measurement is haphazard.
  • Cross-checking is haphazard.
  • Responsibilities aren't clearly defined.
  • There is no assembly line.
  • All knowledge is internalized.
These issues tend to be carryovers from the attributes that made the company successful in the first place. One or two highly motivated people start a project. They know everything and communicate with each other constantly. Quality is high because it's small. Process is whatever gets something out the door. And, amazingly, everything scales. Going from two or three people to ten means 5x as much work is getting done. So, why did everything just fall over when going from ten people to fifty?

Fred Brooks, in The Mythical Man-Month, touches on one of the root issues. When dealing with three or four people, the number of 1-to-1 lines of communication is manageable (at three and six, respectively; n people need n(n-1)/2 lines). Everyone can get into one room and share all the knowledge about everything. The business is small enough that a single person can keep the entire thing in their head.

Going up to ten people increases the number of lines to 45. This is much larger, but still manageable because we have specialized. Everyone isn't expected to be able to do everything anymore, so some information can be limited to sharing with just some people. And one person can still keep everything in their head, even if they do less and less of the daily work.

The more astute reader is starting to see where the problem forms. Everyone has worked at a place where it's nearly impossible to find out what you need to know in order to do your job. Information is siloed. It is rare that someone is actively hiding the information. (If that does happen, the solution is very simple, if emotionally and politically difficult to do.) More often, the person who has the information you need doesn't know you need it. Sometimes, you don't know you need it, until you can't move forward without it. And one person can no longer keep the entire business in their head.

At the root, there is a limit to the amount of information and cross-references a single person can handle. There is also the limit of how much work a single person can accomplish. We organize groups and companies to exceed those limits. One person doesn't have to communicate with anyone. A few people, usually less than a dozen, can communicate clearly together. One person can manage what a dozen or so people do. Beyond that, we need systems and processes in place to formalize how information and knowledge are organized, prioritized, and transferred between groups and people.

Alongside the problems of communicating information about today's tasks, we have to communicate yesterday's accomplishments and tomorrow's plans. New coworkers have to be trained and environments set up. Those who are leaving need to have their information retrieved. Everyone needs to know what will be coming. The number of information streams rapidly becomes unwieldy without explicit boundaries and organization around them.

In short, the corporate organism needs to learn how to think.