Wednesday, July 29, 2015

Why use a DSL?

  1. Why use a DSL?
  2. Why create your own DSL?
  3. What makes a good DSL?
  4. Creating your own DSL - Parsing
  5. Creating your own DSL - Parsing (with Ruby)
  6. Creating the Packager DSL - Initial steps
  7. Creating the Packager DSL - First feature

Languages exist to communicate ideas. Most of us are familiar with generic human languages like English, Swahili, Japanese - even created languages like Esperanto and Lojban. These are able to express any idea humans can possibly come up with in a way other humans can understand. In programming terms, all human languages are Turing-complete.

Sometimes, though, some ideas are easier to express in certain languages than in others. Supposedly, Eskimos have 50+ words for snow and the Sami have nearly 1000 words for reindeer. Given how important those topics are in those cultures, that would make a lot of sense. People working together in those domains would be able to communicate more quickly because the same effort communicates more concepts. For example, "busat" (in Sami) translates to "male reindeer with a single, very large testicle". I have no idea how often this occurs, but that's probably a unique identifier in most reindeer herds.

We see this as well in programming languages. Programmers were writing object-oriented programs in ANSI C for years before Bjarne Stroustrup created C++. It's easier to write OO programs in C++ than in C. In C, you have to be extremely disciplined to make sure that you're adhering to public vs. private interfaces, that you invoke the "methods" properly (passing the invocant as the first parameter, passing the correct invocant to the right method, etc.), and lots of other bookkeeping. It's just exhausting to keep track of all of that, especially across a large codebase. In C++, the language not only reduces the bookkeeping you have to do, but it also reduces the number of characters you have to read (and type, but reading is more important).
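To make that bookkeeping concrete, here is a rough sketch of the two styles. It is written in Python rather than C or C++, purely to keep the example short, and every name in it (`account_new`, `account_deposit`, `Account`, and so on) is made up for illustration. The first half imitates the C approach, where the "object" is a plain record and you pass the invocant by hand. The second half lets the language do that work.

```python
# C-style: the "object" is a plain record and the "methods" are free
# functions. The caller must pass the invocant explicitly every time,
# and nothing but discipline keeps outside code away from "_balance".
def account_new(owner):
    return {"owner": owner, "_balance": 0}


def account_deposit(account, amount):
    account["_balance"] += amount


def account_balance(account):
    return account["_balance"]


# Class-based style: the language binds the invocant for us, and the
# public interface is visible in one place.
class Account:
    def __init__(self, owner):
        self.owner = owner
        self._balance = 0

    def deposit(self, amount):
        self._balance += amount

    @property
    def balance(self):
        return self._balance


# Hand-rolled version: the invocant is repeated on every call.
acct = account_new("ada")
account_deposit(acct, 50)

# Class version: fewer characters to read, fewer ways to pass the wrong thing.
obj = Account("ada")
obj.deposit(50)
```

Both halves do the same job. The difference is how much the reader has to track: in the first, it is possible to hand `account_deposit` a record that was never an account at all, and nothing complains until much later.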

Like Sami, there are programming languages that trade expressibility in one domain for another. I suspect most of the words for computing and the internet in Sami are borrowed from English (as they are in many other languages). All the usable words are already taken for other purposes. "Scripting" languages, like Ruby and Python and JavaScript, make a similar set of tradeoffs. They give up the ability to write programs that execute extremely quickly (like programs written in C would do) in order to make it easy for humans to write the programs. Programs written in these languages are often much shorter (10-100x shorter) than the equivalent in C or Java. They are much more expressive when it comes to specific domains of computing. No one would write an operating system in Perl, but these languages excel at manipulating text and talking to databases at faster-than-human-reaction-time speeds.

Expressive terseness (aka, "busat") is really important in programming because the hardest part of doing development is working within existing code. Depending on whose percentages you want to use, the maintenance phase of a project is anywhere from 60%-90% of the time and cost of that project. Maintenance, first and foremost, is an effort in reading comprehension. You can't fix a bug unless you understand the code where the bug lives, what code is connected to it, and how the various execution paths wend through that code (and the code around it). This is a lot easier to do when you're dealing with 50 lines of code than 500 (assuming equal cyclomatic complexities). The business-level concepts are easier to see and there are fewer places for bugs to hide.

SQL and CSS are good examples of DSLs that take complex domains (set manipulation and style metadata, respectively) and allow the developer to express exactly and minimally what they are trying to accomplish. Querying sets - writing joins, projections, and all the other logic that SQL provides - is extremely complicated. Doing this in any standard programming language can run to hundreds or thousands of lines with lots of cyclomatic complexity. Plenty of places for bugs to live. A DSL makes it easier to express the desire to do these three set conjunctions (using these indices for lookup), then project these 5 data points (with these manipulations), ordered in this way.
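A small sketch of that difference, using Python and its built-in `sqlite3` module. The tables, columns, and sample rows (`customers`, `orders`, and the amounts) are invented for illustration, and the query is deliberately tiny: one join, one aggregate, one ordering. Even at this size, the hand-written version has to spell out how to do the lookup, while the SQL version only says what is wanted.

```python
import sqlite3

# Hypothetical sample data: (id, name) and (order id, customer id, amount).
customers = [(1, "Ada"), (2, "Grace"), (3, "Linus")]
orders = [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 30.0)]


def totals_by_hand(customers, orders):
    """Join, aggregate, and sort without a DSL."""
    names = {cid: name for cid, name in customers}  # our hand-built "index"
    totals = {}
    for _order_id, cid, amount in orders:
        if cid in names:  # inner-join condition
            totals[names[cid]] = totals.get(names[cid], 0.0) + amount
    return sorted(totals.items(), key=lambda row: row[1], reverse=True)


def totals_with_sql(customers, orders):
    """The same question, stated declaratively in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY total DESC
        """
    ).fetchall()
    conn.close()
    return rows
```

Both functions return the same rows for this data. Add a second join, a filter, and a few computed columns, and the hand-written version grows branches and loops while the SQL grows by a few lines.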

DSLs also make it much easier for people working in different languages (or even business domains) to collaborate and learn from each other within the domain. There are hundreds of forums, discussion boards, and blogs on SQL or CSS tips, tricks, and improvements. These tips work regardless of what programming language you use.