Yet another attempt at explaining Domain Specific Languages

As you may know, I always try to find (more) convincing and (more) accessible ways of explaining why using DSLs is a good idea and why they should be used more. Here's another attempt. Looking forward to your thoughts!

And no, the airplane up there has nothing to do with anything. It's just a recent picture I took (and edited) that I happen to like :-)

What do the following three software systems have in common?

· A CNC mill (a computer controlled milling machine) that cuts parts out of raw metal.

· A CAD system that can represent complex three-dimensional shapes and analyse their properties.

· A radio telescope used to observe the sky.

All three systems -- and there are many more like these -- allow users to make the system do their own particular things. A CNC machine can produce any 2.5-dimensional part, the CAD system lets users design any shape they want, the radio telescope lets astronomers configure observations at (more or less) any point in the sky in a wide range of wavelengths. All three are highly configurable systems.

To make this possible, these systems all provide a language to the user with which they can express the configuration, that is, express what they want the system to do. For the CNC machine, that language consists of ways for specifying paths, tool selection, mill head RPM and so on. For the CAD system, the language has abstractions for three-dimensional geometry represented WYSIWYG-style as 3D shapes. And for the radio telescope the language is mostly GUI-oriented and consists of all kinds of concepts from astronomy, geometry, Newtonian mechanics, wavelengths and data processing.

These systems have one more thing in common. The users who configure them are not software engineers. They are design engineers, draftsmen and astrophysicists. So the languages that make the systems so configurable are targeted at the subject matter experts in these domains. They can be seen as "special-purpose programming languages" for non-programmers.

Here are three more systems that have something in common:

· A system for calculating taxes based on Germany's tax regulations

· A system for executing clinical drug trials

· A tachograph for tracking a truck-driver's driving and break times

These three systems -- and there are many more like these -- contain a whole wealth of application logic: all of Germany's tax law, the precise rules for running a particular drug trial, or all the rules regarding drive times and mandatory breaks for all the countries of the EU.

They have one more thing in common: all this application logic -- the subject matter of the domain -- doesn't originate from the brains of programmers, but from people who are subject matter experts in those domains. Just like the engineer, draftsman or astronomer in the first set of examples.

How does this subject matter get into the software? Is it "configured" by these subject matter experts, just like in the case of the first three systems? Typically not. Instead, the subject matter experts write requirements, imprecise prose that is then (mis-)interpreted by software developers and translated into code. As we all know, this is a slow and error-prone process. Why do the subject matter experts in these domains accept this tedious, manual process? Why don't we as software engineers use the same approach as for the first three systems and make these "configurable systems" by developing a suitable language for the subject matter experts to directly configure the subject matter for these respective domains? Why is this approach commonplace for some kinds of systems and a rarity for others?

For the CNC machine, the CAD system and the radio telescope it is totally obvious that the manufacturer of these systems can't code up all the parts, drawings or observations their users might want to execute. Imagine a design engineer describing the shape they want milled by the CNC in prose, sending the document as an email to the manufacturer of the machine, and, after four weeks, receiving a patch of the machine's firmware that is able to mill their specific part. Sounds like a joke, right?

So again, why do we not use the subject-matter-configuration approach for the second set of systems? One thing I hear is that "non-programmers can't program." But we're not trying to make them program. Programming and software engineering are mostly about the technical aspects of software systems: robustness, scalability, performance, maintainability and the like. We're merely trying to make them use a suitable tool to be precise and complete in (formally) configuring, specifying, modeling the application logic. And that this is possible is demonstrated, among others, by CNC machines, CAD systems and radio telescopes. Yes, it takes some learning and it is a bit of a culture shift for the subject matter experts. But it's absolutely feasible for the vast majority of subject matter experts. In fact, once they have gotten used to the approach, most of the subject matter experts I have worked with feel empowered.

Another reason why this approach might not be used more is that in the case of the business systems the people who configure the systems are inside the company (as opposed to the users of CNC machines, CAD systems or the worldwide community of astrophysicists). So it is *possible* to use the traditional prose-based manual approach. But that doesn't mean that it is a good idea -- again, it is slow and error-prone.

Yet another difference is that in the case of internal systems -- those where the subject matter experts are employees of the company that builds the system -- the domains are usually smaller, so the effort of building the configuration languages and tooling might be harder to justify. This is true. But building a configuration language for tax calculations is also significantly less effort than building a full-blown CAD system. It is true though that the approach only works for organisations that work in a stable -- albeit evolving -- domain for a reasonably long time. But this is the case for most medium to large companies.

In fact, the tax, trial and tachograph systems are examples of where we have done exactly this: we have built configuration languages that allow the non-programmer subject matter experts to express the application logic for their domains, and the underlying platform then executes this logic when it calculates taxes, runs a trial or records a driver's activities. The overall process is now much faster, less error-prone and easier to evolve.
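To make that split between "expert-authored logic" and "executing platform" a bit more tangible, here is a minimal sketch in Python. It is not how our actual systems work; the names `BreakRule` and `violations` are hypothetical, and the numbers are illustrative values rather than a statement of the real regulation. The point is only that the rule is data the subject matter expert owns, and the platform is generic code that replays a driver's activities against it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakRule:
    """Expert-authored rule (hypothetical): after at most max_drive_min
    minutes of driving, a break of at least min_break_min minutes is due."""
    max_drive_min: int
    min_break_min: int


def violations(rule, activities):
    """Platform side: replay (kind, minutes) activities against the rule.

    Returns the indices of the driving entries during which the
    accumulated drive time exceeds what the rule allows.
    """
    driven = 0
    found = []
    for index, (kind, minutes) in enumerate(activities):
        if kind == "break" and minutes >= rule.min_break_min:
            driven = 0  # a sufficiently long break resets the counter
        elif kind == "drive":
            driven += minutes
            if driven > rule.max_drive_min:
                found.append(index)
    return found


# Illustrative values only, chosen for the example.
RULE = BreakRule(max_drive_min=270, min_break_min=45)
LOG = [("drive", 200), ("break", 15), ("drive", 100),
       ("break", 45), ("drive", 120)]

print(violations(RULE, LOG))  # the 15-minute break is too short to count
```

Changing the rule for another country means changing `RULE`, not `violations`. A real DSL would of course give the expert a much nicer notation than a Python constructor call.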

One last comment. When we talk about configuration languages, what exactly does this mean? What do these languages look like? In the very simplest case, they could be an XML, JSON or YAML file with a predefined structure. But for realistic systems in interesting domains, you need something more sophisticated, something that scales better with the size and complexity of the domains and provides a better UX for the subject-matter experts. The languages usually have a nice, textual syntax and/or rely on tables and diagrams. Some also use GUI forms, and a few rely on something completely custom, like the CAD system. They have strong tool support that helps with creating correct configurations, plus various analysers and checkers that further help in this regard. Almost all come with means to express tests for configurations, and some come with simulators that let users run a configuration in a safe environment. In the future, they might even be supported by ChatGPT & friends, as I demonstrate in this video. We are talking about domain-specific languages.
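For illustration, here is what the "very simplest case" could look like: a JSON file with a predefined structure, a small checker, an interpreter, and tests that live right next to the configuration. This is a sketch under assumptions, not a real system. The structure (`brackets`, `up_to`, `rate`, `tests`) and the functions `check`, `tax` and `failing_tests` are made up for this example, and the figures have nothing to do with actual tax law.

```python
import json

# Hypothetical configuration a subject matter expert might write.
CONFIG_TEXT = """
{
  "brackets": [
    {"up_to": 10000, "rate": 0.0},
    {"up_to": 50000, "rate": 0.25},
    {"up_to": null,  "rate": 0.5}
  ],
  "tests": [
    {"income": 8000,  "expected_tax": 0.0},
    {"income": 30000, "expected_tax": 5000.0},
    {"income": 60000, "expected_tax": 15000.0}
  ]
}
"""


def check(config):
    """Checker: return a list of problems; an empty list means well-formed."""
    brackets = config.get("brackets", [])
    if not brackets:
        return ["at least one bracket is required"]
    problems = []
    limits = [b["up_to"] for b in brackets]
    if limits[-1] is not None:
        problems.append("last bracket must be open-ended (up_to: null)")
    bounded = limits[:-1]
    if any(limit is None for limit in bounded):
        problems.append("only the last bracket may be open-ended")
    elif bounded != sorted(set(bounded)):
        problems.append("bracket limits must be strictly increasing")
    if any(not 0 <= b["rate"] <= 1 for b in brackets):
        problems.append("rates must be between 0 and 1")
    return problems


def tax(config, income):
    """Interpreter: apply each rate to the slice of income in its bracket."""
    owed, lower = 0.0, 0
    for bracket in config["brackets"]:
        limit = bracket["up_to"]
        upper = income if limit is None else min(income, limit)
        if upper > lower:
            owed += (upper - lower) * bracket["rate"]
            lower = upper
    return owed


def failing_tests(config):
    """Run the tests embedded in the configuration; return those that fail."""
    return [t for t in config.get("tests", [])
            if abs(tax(config, t["income"]) - t["expected_tax"]) > 0.005]


config = json.loads(CONFIG_TEXT)
print("problems:", check(config))
print("failing tests:", failing_tests(config))
```

Even this toy version shows the ingredients mentioned above: a predefined structure, a checker that catches malformed configurations, and tests written by the same person who wrote the rules. What it lacks is everything that makes a real DSL pleasant to use, which is exactly why plain JSON stops scaling quickly.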

So, to summarize, what exactly does it mean to build a configurable system?

· Our goal is to let non-programmer subject-matter experts configure what the system does,

· in a way that is precise and directly executable (and not just a prose description),

· by using a language and tool that we software engineers have built for them,

· after scoping and analysing the domain and the relevant subject matter.

Unfortunately, there aren't too many systems that are built this way. The manual process based on developers translating prose to code is still commonplace. This needs to change.