What makes a good Business DSL

Domain-oriented abstractions are only the beginning

As I have described in a previous post, I am convinced that, for true business agility, the domain experts in an organisation have to contribute directly to the development of software. Having them dump requirements into unstructured/informal documents, often Word, and then making developers understand every detail and implement them correctly, is way too slow and error-prone. The solution I am advocating (and I will provide proof that this actually works in future posts) is to create DSLs for use by domain experts that allow them to directly model/specify their contribution to the system. So what makes a good business DSL?

Abstractions aligned with the Domain

The absolute minimum is to make sure the abstractions your DSL provides are aligned with the domain. This aspect is fundamental; it is at the core of the definition of a DSL. While it is not always easy to find out what these abstractions are, the point that they need to be aligned with the domain doesn’t need any more repeating, so I’ll leave it at that.

The importance of Notation

Users of a language aren’t necessarily conscious of the abstractions that underlie the language, at least not initially, when they decide whether they “like” the language. However, what they always notice is the notation; this is what they primarily interact with. So getting the notation(s) right is absolutely crucial for getting a language adopted.

Two styles of notations are commonly used in software engineering: textual, which we all know from programming, and graphical, which we know from “modelling tools”. Within those two notations, the degrees of freedom are limited in practice. In the text world, you can adapt the keywords and the ordering of words. In the graphical world, which, in practice, often means box-and-line diagrams, you can vary the shape of boxes, the decorations of lines, and of course color and stuff.

But in many business domains, there is way more notational variability. Tables play a huge role, not just for data collection, but also for expressing multi-criteria decisions. Mathematical symbols are very powerful and can make a huge difference in how complex calculations appear. Prose-style text, where you use (fragments of) natural language sentences, makes learning a language much easier for many non-programmers. And of course, many domains benefit from mixing some of those: using math in text, embedding prose in diagrams, embedding text in tables. You might even want to provide several notations for the same abstractions where one is easier to learn and the other one is more productive (because it scales better in terms of model size or complexity).

So, as part of your domain analysis, make sure you explore different notation options, select a tool that can support all of them, and look out for existing notations in the domain which you can adopt.

A great IDE

The next ingredient for any language, business DSL or not, is a great IDE. These days, people just don’t care about your language if it doesn’t have a good IDE. A good IDE supports code completion and error reporting, but also more advanced features such as test execution, refactorings and debugging. You expect this from your Java (or whatever) IDE, and your DSL users expect that, too.

One of the major contributions of language workbenches, compared to previous language tooling, is that, as part of developing a language, you more or less automatically also get an IDE. Language development implies IDE development.

I will discuss error reporting, debugging and (test) execution below, so let me briefly expand on refactoring here. Refactoring is defined as changing the structure of a program while retaining its semantics. Usually you perform a refactoring in order to improve some quality attribute, typically modularity, extensibility or understandability. Refactorings are often motivated by the need to evolve the software in the sense that you have to realise additional requirements or define a variant of the program (suggesting to “factor out” the commonalities). Quite obviously, this also applies to models created with DSLs: insurance contracts, medical algorithms or communication protocol definitions evolve too, often quite frequently and significantly. So the IDE must support refactorings that are aligned with those kinds of changes that occur regularly. And by the way: your language must support the corresponding abstractions, but that’s for another post.

There is also an interaction between IDE support and the syntax. Remember SQL? In SQL, you write

SELECT <fields> FROM <table> WHERE ...

This is problematic in terms of IDE support, because, when you enter the fields, you have not yet specified which table they come from; code completion for those fields (based on the table) is not possible. This is the reason why more modern query languages have reversed the syntax:

from <table> SELECT <fields> WHERE ...

Keep this in mind when you design a language.
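To make the point about word order concrete, here is a minimal Python sketch of a completion function. The schema, the table names and the function `complete_fields` are all made up for illustration; a real IDE would of course work on the parsed program rather than on strings.

```python
# Hypothetical schema, used only for illustration.
SCHEMA = {
    "customers": ["id", "name", "city"],
    "orders": ["id", "customer_id", "total"],
}


def complete_fields(table, prefix):
    """Propose field names that start with `prefix` for a known table."""
    if table is None:
        # SELECT-first syntax: the table has not been entered yet, so
        # there is nothing to base a targeted proposal on.
        return []
    return sorted(f for f in SCHEMA.get(table, []) if f.startswith(prefix))


# Table-first syntax: the table is already known when the fields are typed.
print(complete_fields("orders", "c"))  # ['customer_id']
# SELECT-first syntax: no table yet, so no targeted proposals.
print(complete_fields(None, "c"))      # []
```

The only design point here is that the completion function needs the table as an input; a syntax that asks for the table first can always provide it.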

Analyses and Error Reporting

A very much under-appreciated criterion for a good DSL, for its alignment with the domain and for its acceptance by users, is the quality of error messages (this is also true for programming languages; there is interesting research on this). This is all the more true the more expressive or sophisticated (trying to avoid the negatively connoted word “complexity”) your language is.

We all know the situation where some not-so-computer-savvy person complains that something “doesn’t work” and asks us for help. We then read the error message carefully, and figure out immediately what needs to be done. We also know the situation where, after reading the error message, we are none the wiser and still have no clue how to fix it. It’s clear that in the latter case, we have to improve the message. But what can we learn from the former? Maybe that users have given up on reading the message, because error messages usually suck.

What can we do? Well, one thing is to just treat error messages as a core part of the user experience. Design them carefully! Maybe even engage in some low-key usability engineering and ask users whether they are helpful. There is also a tooling issue: often an error message can only be attached to one program element even though the error is about the relation of several program elements. So the ability to attach an error to multiple locations (or clickably refer to multiple elements from the message) helps as well.
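As a rough sketch of what “attaching an error to multiple locations” can mean on the data-structure level, consider the following Python fragment. The classes `Location` and `Diagnostic`, and the tariff example, are hypothetical and not taken from any particular language workbench.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    element: str  # name of the program element the error refers to
    line: int


@dataclass
class Diagnostic:
    message: str
    primary: Location                            # where the marker is shown
    related: list = field(default_factory=list)  # other elements involved

    def render(self):
        """Return the message plus one 'see also' line per related element."""
        lines = [f"{self.primary.element} (line {self.primary.line}): {self.message}"]
        for loc in self.related:
            lines.append(f"  see also {loc.element} (line {loc.line})")
        return "\n".join(lines)


# An error about the relation between two elements, not about one of them.
diag = Diagnostic(
    "age range overlaps with tariff 'Flex'",
    Location("Basic", 12),
    [Location("Flex", 27)],
)
print(diag.render())
```

In an IDE, each entry in `related` would become a clickable reference instead of a line of text.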

However, the elephant in the room is the precision of the underlying analyses: you can only report problems that are discovered and pinpointed precisely by the underlying analyses. More precise analyses, in turn, imply more implementation effort and more runtime cost. In general, for a precise analysis and a good error message, you have to recover (some aspects of) the domain semantics from its encoding in the program.

When you design a DSL, you can influence both aspects. First, ideally design your DSL in a way that prevents (some) errors from being made, or engineer your IDE to prevent them automatically. XML is an interesting example here: your IDE can automatically ensure that every opening tag has a corresponding closing tag. However, remember that XML documents are trees; the need for closing tags is an artifact of the concrete syntax. Maybe you can change the language to directly use a tree syntax that avoids the need for closing tags in the first place. This is essentially what JSON does.

However, as the expressiveness of your DSL grows, the degree of freedom allowed by the DSL will be big enough for your users to shoot themselves in the foot in non-trivial ways. In this case, design the language in a way that makes the analyses simpler. The good thing is that a language design that simplifies analyses is also a language design that is generally more closely aligned with the domain. For example, detecting a dead state in a state machine is much easier if it is encoded as a first-class state machine as opposed to when it is encoded as a switch statement in C. The synergies between analysis and language design are numerous; I will discuss this more in a later post, but you can also check out this booklet.
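To illustrate how little code the dead-state check needs once states and transitions are first-class, here is a small Python sketch. It reads “dead” as “unreachable from the initial state”, which is my assumption for this example; the function name and the sample machine are made up.

```python
from collections import deque


def unreachable_states(states, initial, transitions):
    """Return the states that cannot be reached from `initial`.

    `transitions` is a list of (source, event, target) triples.
    """
    successors = {s: set() for s in states}
    for source, _event, target in transitions:
        successors[source].add(target)

    # Plain breadth-first search over the transition graph.
    seen = {initial}
    queue = deque([initial])
    while queue:
        current = queue.popleft()
        for nxt in successors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(states) - seen


states = ["Idle", "Running", "Paused", "Legacy"]
transitions = [
    ("Idle", "start", "Running"),
    ("Running", "pause", "Paused"),
    ("Paused", "resume", "Running"),
    ("Running", "stop", "Idle"),
    ("Legacy", "reset", "Idle"),  # has an outgoing transition, but no incoming one
]
print(unreachable_states(states, "Idle", transitions))  # {'Legacy'}
```

The same check on a switch statement in C would first have to recover the states and transitions from variables, case labels and assignments, which is exactly the “recovering the domain semantics” effort mentioned above.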

Visualisation and Reporting

Error messages are usually local, associated with one (or maybe a few) program elements. However, some problems relate to the overall structure of the program. This leads to two problems. First, finding these problems can be very expensive computationally; it might be infeasible to perform the analysis in real time, as the user edits the program. Global (qualified) name uniqueness is the obvious example. Second, they might be based on heuristics, i.e., there might not be a clear distinction between right and wrong, but you can identify a spectrum of “badness”.

The same solution solves both problems: reporting (we treat a visualisation essentially as a report that uses a graphical syntax). A report is run on demand; it is ok for it to take a little while. It analyses (usually global) properties of a model and then shows the results in a way that lets the user detect patterns or trends.

For example, the sizes or color of bubbles in a visualisation can represent a characteristic that might be problematic (e.g., the LOC in a function). In a textual report, you can sort the result entries according to some metric; for example, a profiler can show the slowest parts of the program at the top of the list. Or you can highlight the differences between subsequent runs of the report to make users aware of what changed (maybe for the worse).
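A textual report of this kind can be very little code. The following Python sketch ranks entries by a metric and computes what changed between two runs; the function names and the numbers are invented for illustration.

```python
def size_report(functions, top=3):
    """Return the `top` largest functions as (name, loc) pairs, largest first."""
    ranked = sorted(functions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


def changes(previous, current):
    """Return the LOC difference for every entry that changed between two runs."""
    return {
        name: loc - previous.get(name, 0)
        for name, loc in current.items()
        if loc != previous.get(name, 0)
    }


# Made-up metric values (lines of code per function) from two report runs.
last_week = {"price": 40, "discount": 15, "tax": 22}
today = {"price": 95, "discount": 15, "tax": 20, "bonus": 8}

print(size_report(today, top=2))   # [('price', 95), ('tax', 20)]
print(changes(last_week, today))   # {'price': 55, 'tax': -2, 'bonus': 8}
```

Neither function decides what is “wrong”; they only order and compare, and leave the judgement to the user.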

When you design the feedback system for a language, it is useful to explicitly distinguish between (more or less local) error reporting on the one hand, and (typically global and expensive) reporting and visualisation on the other. Running the latter on the CI server, at regular intervals, is a good idea.

Testing DSL models

Test-driven, or at least test-supported, development has taken hold in the developer community. And for good reason: tests assure that your code works correctly, ensure that it does the correct thing, and provide a safety net during evolution and refactoring. As a consequence, testing frameworks and tools, integrated with the IDE, are ubiquitous. So you want that for DSLs, too! Your telco pricing expert also wants to make sure that he doesn’t lose the company money because his pricing algorithm is buggy!

However, for testing to be feasible, the abstractions and notations used for testing must be aligned with the core of your DSL. For example, to test an old age insurance policy, users should be able to describe the employment history of a customer, using terms relevant to the domain, and then run what is essentially an integration test on this customer: what will his monthly pension be once he retires, and how will it change over time?

A simple state machine DSL (left) and a DSL for testing whether the transitions lead to the expected target states (right).
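In plain Python, the idea behind such a pair of languages could be sketched as follows. This is not the DSL from the figure; the transition table, the test cases and the helper names are assumptions made for this example.

```python
# Hypothetical state machine: (state, event) -> target state.
TRANSITIONS = {
    ("Idle", "start"): "Running",
    ("Running", "pause"): "Paused",
    ("Paused", "resume"): "Running",
    ("Running", "stop"): "Idle",
}


def run(initial, events):
    """Feed the events to the machine and return the final state."""
    state = initial
    for event in events:
        # Events without a matching transition leave the state unchanged.
        state = TRANSITIONS.get((state, event), state)
    return state


# Test cases: (initial state, events, expected target state).
TESTS = [
    ("Idle", ["start"], "Running"),
    ("Idle", ["start", "pause", "resume"], "Running"),
    ("Idle", ["pause"], "Idle"),
    ("Running", ["pause", "stop"], "Paused"),
]


def failing_tests(tests):
    """Return (initial, events, expected, actual) for every failing test."""
    failures = []
    for initial, events, expected in tests:
        actual = run(initial, events)
        if actual != expected:
            failures.append((initial, events, expected, actual))
    return failures


print(failing_tests(TESTS))  # []
```

Note that the test cases only use states and events, i.e., the same vocabulary as the machine itself.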

The degree to which you rely on (the equivalents of) unit tests vs. integration tests depends on the domain and needs to be figured out during domain analysis. In any case, make sure you consider testing, and testability, of your DSL programs an inherent part of language design.

To increase the acceptance of testing, make sure that running tests is painless for DSL users. In particular, if the execution of your DSL relies on code generation and the subsequent compilation, packaging and deployment chain, the turnaround time for tests might become too long. Make sure you provide a way for executing the tests with essentially zero overhead. One way of achieving this is to also provide an in-IDE interpreter, i.e., a way of executing programs without any code generation. Since interpreters usually do not have to reproduce non-functional concerns faithfully (i.e., they can be slow :-)), implementing one for your language isn’t that much effort, especially when using suitable frameworks.
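To give an impression of how small such an interpreter can be, here is a sketch of a tree-walking evaluator in Python. The node kinds, the tuple encoding and the pricing formula are all invented for this example; a real implementation would work on the AST provided by your language tooling.

```python
def evaluate(node, env):
    """Evaluate an expression tree given as nested tuples."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "mul":
        return evaluate(node[1], env) * evaluate(node[2], env)
    if kind == "min":
        return min(evaluate(node[1], env), evaluate(node[2], env))
    raise ValueError(f"unknown node kind: {kind}")


# Hypothetical pricing rule: price = min(cap, base + minutes * rate), in cents.
PRICE = (
    "min",
    ("var", "cap"),
    ("add", ("var", "base"), ("mul", ("var", "minutes"), ("var", "rate"))),
)

print(evaluate(PRICE, {"cap": 750, "base": 500, "minutes": 30, "rate": 2}))   # 560
print(evaluate(PRICE, {"cap": 750, "base": 500, "minutes": 200, "rate": 2}))  # 750
```

Because nothing is generated, compiled or deployed, a test that calls `evaluate` runs immediately after the model has been edited.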

Simulation and Debugging

The interpreter has an additional benefit: it also lets you define a simulator, with which users can “play” with the models. In my experience, this ability is often perceived as the main benefit of a DSL and the required formalisation of a domain in the first place.

A simulator is closely related to a debugger for integration scenarios. Both rely on an interpreter (or an instrumented runtime). The difference is that a debugger illustrates and explains a previously written program (and allows users to control its execution). A simulator lets users interactively play with the program, enter/change data or trigger events. The difference can be blurry, but it becomes clearer when behavior changes over time and state change becomes a factor.

Building debuggers can be a lot of work, partially because language workbenches don’t support this to the degree they support other aspects of language development. Hopefully this changes in the future. However, an interpreter framework that is designed to be “debuggable” can make a big difference. More on simulation and debugging in a future post.

Wrap up

Nothing new here for the software engineer, I guess. All of the things I am suggesting here are more or less similar to what one does in programming languages. My goal with this post was to point out why and how these things are relevant in DSLs (maybe especially so, as in error messages) and how the ability to design your own language may simplify the process. Let me know what you think!