Wednesday, July 29, 2015

Why use a DSL?

  1. Why use a DSL?
  2. Why create your own DSL?
  3. What makes a good DSL?
  4. Creating your own DSL - Parsing
  5. Creating your own DSL - Parsing (with Ruby)
  6. Creating the Packager DSL - Initial steps
  7. Creating the Packager DSL - First feature
Languages exist to communicate ideas. Most of us are familiar with generic human languages like English, Swahili, Japanese - even created languages like Esperanto and Lojban. These are able to express any idea humans can possibly come up with in a way other humans can understand. In programming terms, all human languages are Turing-complete.

Sometimes, though, some ideas are easier to express in certain languages than in others. Supposedly, Eskimos have 50+ words for snow and the Sami have nearly 1000 words for reindeer. Given how important those topics are in those cultures, that makes a lot of sense. People working together in those domains can communicate more quickly because the same effort conveys more concepts. For example, "busat" (in Sami) translates to "male reindeer with a single, very large testicle". I have no idea how often this occurs, but that's probably a unique identifier in most reindeer herds.

We see this as well in programming languages. Programmers were writing object-oriented programs in C for years before Bjarne Stroustrup created C++. It's easier to write OO programs in C++ than in C. In C, you have to be extremely disciplined to make sure that you're adhering to public vs. private interfaces, that you invoke the "methods" properly (passing the invocant as the first parameter, passing the correct invocant to the right method, etc.), and that you handle lots of other bookkeeping. It's just exhausting to keep track of all of that, especially across a large codebase. In C++, the language not only reduces the bookkeeping you have to do, but it also reduces the number of characters you have to read (and type, but reading is more important).

Like Sami, some programming languages trade expressibility in one domain for expressibility in another. I suspect most of the words for computing and the internet in Sami are borrowed from English (as they are in many other languages) - all the usable words are already taken for other purposes. "Scripting" languages, like Ruby, Python, and JavaScript, make a similar set of tradeoffs. They give up the ability to write programs that execute extremely quickly (as programs written in C do) in order to make it easy for humans to write the programs. Programs written in these languages are often much shorter (10-100x shorter) than the equivalent in C or Java, and they are much more expressive when it comes to specific domains of computing. No-one would write an operating system in Perl, but these languages excel at manipulating text and talking to databases at faster-than-human-reaction-time speeds.

Expressive terseness (aka, "busat") is really important in programming because the hardest part of doing development is working within existing code. Depending on whose percentages you want to use, the maintenance phase of a project is anywhere from 60%-90% of the time and cost of that project. Maintenance, first and foremost, is an effort in reading comprehension. You can't fix a bug unless you understand the code where the bug lives, what code is connected to it, and how the various execution paths wend through that code (and the code around it). This is a lot easier to do when you're dealing with 50 lines of code than 500 (assuming equal cyclomatic complexities). The business-level concepts are easier to see and there are fewer places for bugs to hide.

SQL and CSS are good examples of DSLs that take complex domains (set manipulation and style metadata, respectively) and allow the developer to express exactly and minimally what they are trying to accomplish. Querying sets - writing joins, projections, and all the other logic that SQL provides - is extremely complicated. Doing this in any standard programming language can run to hundreds or even thousands of lines with lots of cyclomatic complexity - plenty of places for bugs to live. A DSL makes it easier to express the desire to do these three set conjunctions (using these indices for lookup), then project these 5 data points (with these manipulations), ordered in this way.
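
For example, a query that joins a few tables, projects a handful of columns (with some light manipulation), and orders the results stays readable at a glance. Here is a rough sketch - the table and column names are invented purely for illustration:

  SELECT o.id,
         c.name,
         UPPER(c.country) AS country,
         SUM(li.quantity) AS item_count,
         SUM(li.quantity * li.price) AS order_total
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
  JOIN line_items li ON li.order_id = o.id
  WHERE o.placed_at >= '2015-01-01'
  GROUP BY o.id, c.name, c.country
  ORDER BY order_total DESC;

The query reads almost like the sentence above: join these sets (on these keys), project these columns (with these manipulations), order by this value.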

DSLs also make it much easier for people working in different languages (or even business domains) to collaborate and learn from each other within the domain. There are hundreds of forums, discussion boards, and blogs on SQL or CSS tips, tricks, and improvements. These tips work regardless of what programming language you use.

Sunday, July 5, 2015

What is production?

In "What is an application?", I propose a definition for "application" as "A set of capabilities provided to a user to enable them to satisfy their desires." But that definition leaves many other terms undefined. Over the next several posts, I'll define each one. The most important (after application) is "production", so I'll start there.

Let's do this with a thought experiment. Pretend that your application has only a production instance, however you define it. This is where your users come and where you make your money (assuming you do). There is only the one instance and, because there's only one, no-one needs a name for it. It's just "the application" - there's nothing to confuse it with. Anytime you need to make a change, you make it in "the application" and your users immediately see it. Sounds good, right?

Of course, no-one works like this, and for good reason. Some changes are small enough that they can be made directly where your users are interacting, but the vast majority are not. Most changes require several hours (if not days) of work, often involve collaboration between multiple people, and are built in stages you don't want your users to see.

So, we distinguish between where users go for the "live" application and where developers work to make changes. Stand up a clone of production, except it doesn't have live users going to it, and call it "development". Developers can make changes to it knowing they are safe from affecting the business. Production remains the place where users satisfy their desires.

So far, it seems pretty clear what production vs. development is. Production is where users go (but not developers) and development is where developers go (but not users). And, from a developer's perspective, that would be enough.

But there are more stakeholders in an application than just users and developers. At minimum, you have the business owners. They define what the application is meant to do - what desires the user is attempting to satisfy and what capabilities the user will have to do so. If communication were perfect, then the business owners could tell the developers "Do this" and be assured that the necessary changes would happen exactly as they intended. This also assumes developers will never make mistakes. In real life, neither assumption is remotely true. Review of requested work is a fact of life. Business owners need to assure and control the quality of what they pay for - hence the name "QA" (or, sometimes, "QC", for quality control).

Some organizations choose to have such review occur within the development instance. This makes a lot of sense for smaller, newer, and/or slower projects that either cannot afford or do not need the ongoing cost of a separate instance. In most other projects, the shortcomings of this plan become obvious very quickly. Ongoing development makes it difficult to determine whether a failure is caused by the work under review or by the unstable nature of the development instance. Business owners are uncertain what would happen to the production instance if they approve the work done for a request. Will the change for that request work properly when users try to exercise the new capability? Were the failures in that change or in something else?

We have development, QA/QC, and production. It's pretty obvious what "production" is - it's where the users are and it has to be stable with a managed and defined process for change.

So, where does a demonstration/demo or training environment fit? It's not production, but it needs to be stable for a smaller set of users and a limited window of time. This is where a lot of organizations stumble, attempting to tie the demo or training instance to either the existing production (slow-changing) or QA (quick-changing) environments. But business needs usually require a middle ground between the two.

This leads to a better definition of "production" - or, rather, to splitting out what constitutes "production" into different knobs we can apply to other environments.

The first knob is change management. Different environments will change under different change control regimens. This knob is based on who decides when the environment changes. Development changes whenever a developer edits a file. QA changes whenever a developer finishes some work. Production, however, changes whenever the business feels a feature is both ready for use and appropriate for release. A demo or training environment will be similarly managed by the business, not the development teams.

The second knob is the stringency of review. We've already seen how changes usually go through a QA environment before users see them in production. Demo and training environments also need similar review because users will be in these environments.

So, what's the difference between production, training, and demo? From a developer's perspective, often nothing. They're all strongly controlled environments with reviewed changes pushed when the business wants them.

All of this discussion leads to this:
  1. Production is where users live.
  2. Production is where change control is at its maximum (whatever that is).
  3. Production is where data robustness is at its maximum. (To be discussed in a later post.)
  4. Production is where availability is at its maximum. (To be discussed in a later post.)
  5. Multiple environments can share aspects of Production and should be treated as such along those axes.