Are DSLs the catalyst for making AI-based programming practical?


When I introduce the notion of DSLs to people, I often hear the objection that they are too complicated to be used by non-programmers -- the intended target audience -- and that these users would prefer to tell the computer what to do in prose. I am not totally convinced by this idea: even if we could just write requirements in prose and then have the computer "run" them, we'd still have to express ourselves precisely. With prose. And as we all know from human-to-human conversations as well as from prose requirements documents, that's not so easy. In fact, one big advantage of DSLs is that subject matter experts can express themselves precisely, rather than writing prose that developers then misunderstand when they code it up. But I digress.


In any case, ChatGPT has certainly demonstrated the ability to come up with reasonably correct programs based on prompts written in prose. This works well for simple problems where ChatGPT can figure out what we mean and where we are able to write complete and correct "requirements" in prose. Plus, since ChatGPT keeps the conversation so far as context, we can supply additional requirements when we see that the generated code is not exactly what we expect. We can even run the code to find out if it does what we intended.


Will this work for non-programmers? Well, yes and no. They might be able to express tax calculations or drug trial protocol specs or milling machine control as prose, but this idea of then "looking at the code" to see if it is correct won't fly. Source code is too technical, at the wrong level of abstraction, and inaccessible to non-programmers.


That is, if we generate programming language source code. Enter DSLs, again. Andreas Mülder recently ran a very cool experiment (and then wrote about it on LinkedIn) in which he taught ChatGPT a DSL through examples. He then let a non-programmer -- his wife in this case -- write prose to have ChatGPT "generate code". But the generated code was not source code in Java or whatever; it was a program expressed in the DSL he had previously taught ChatGPT. His post prompted me to write this little article. Here are a few observations about his experiment and where this could go.


First of all, it is pretty cool that ChatGPT is able to pick up a new language after maybe an hour of teaching. I was under the impression that ChatGPT can only write about the stuff it was originally trained on; but apparently, users can teach it by example within a conversation. That is pretty neat.
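
To make "teaching by example" a bit more concrete, here is a minimal sketch of how such a prompt could be assembled. This is my own illustration, not Andreas' actual setup: the tiny tax-rule DSL, the `Example` class and the `build_prompt` function are all made up for this post, and no real model is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    prose: str  # what the user asked for, in plain language
    dsl: str    # the corresponding program in the (hypothetical) DSL


def build_prompt(examples: list[Example], request: str) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the new request."""
    parts = ["Translate each request into the DSL. Answer with DSL code only."]
    for ex in examples:
        parts.append(f"Request: {ex.prose}\nDSL:\n{ex.dsl}")
    # The last block is left open so that the model continues with a DSL program.
    parts.append(f"Request: {request}\nDSL:")
    return "\n\n".join(parts)


# Hypothetical example pairs; a real DSL would need many more of them.
EXAMPLES = [
    Example("Tax income above 50000 at 30 percent.",
            "rule high_income: when income > 50000 then rate 30%"),
    Example("Tax income up to 10000 at 0 percent.",
            "rule low_income: when income <= 10000 then rate 0%"),
]

prompt = build_prompt(EXAMPLES, "Tax income above 100000 at 42 percent.")
print(prompt)
```

The resulting text would be pasted (or sent) to the model as the start of the conversation; everything the model "knows" about the DSL comes from the example pairs.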


Second, generating DSL programs has a big advantage over generating source code because it is more plausible for the (non-programmer) user to look at the generated DSL code and see if it is correct, at least if the DSL is closely aligned with the user's domain and uses a reasonable syntax. These are of course both criteria for any good DSL. Reading something and checking it for at least superficial correctness ("are all the things I've written about at least mentioned in the code?") is much easier for non-programmers than actually writing the DSL program. More generally, this could also be a very good training aid for DSL users: initially, when they have no experience with the DSL, they can just ask the AI to create examples.


Third, and most importantly, I suspect using a DSL instead of source code as the target of AI-based prose programming is going to work better than generating source code directly. Why is this? ChatGPT is a language model. It works purely based on syntactic patterns and the statistics behind them. A DSL is basically a reification of program semantics into syntactic patterns. A DSL removes everything from the source code that is non-essential to what the program should do. It is the "purest" formal representation of some behavior. There is also much less syntactic variability in a DSL than in programming languages. From the perspective of a language model there is much less stuff to go wrong -- there's less accidental syntactic complexity in the generated program. So I suspect that a language model can generate larger and more complex DSL programs than it can generate general-purpose language (GPL) code.
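
To illustrate what "less syntactic variability" can mean, here is a small sketch of a made-up tax-rule DSL in which every line has exactly one admissible shape. The grammar, the `Rule` class and the `parse` function are assumptions of mine for illustration; real DSLs are richer, but the point is that the whole language fits into one pattern, whereas the same rule in a GPL could be written in countless ways.

```python
import re
from dataclasses import dataclass

# The single line shape of the (hypothetical) language:
#   rule <name>: when <field> <op> <number> then rate <number>%
RULE = re.compile(
    r"rule (?P<name>[a-z_]+): when (?P<field>[a-z_]+) "
    r"(?P<op><=|>=|<|>) (?P<limit>\d+) then rate (?P<rate>\d+)%"
)


@dataclass(frozen=True)
class Rule:
    name: str
    field: str
    op: str
    limit: int
    rate: int  # percent


def parse(program: str) -> list[Rule]:
    """Parse a program; raise ValueError naming the first line that does not fit the grammar."""
    rules = []
    for number, line in enumerate(program.strip().splitlines(), start=1):
        match = RULE.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"line {number}: not a valid rule: {line.strip()!r}")
        g = match.groupdict()
        rules.append(Rule(g["name"], g["field"], g["op"], int(g["limit"]), int(g["rate"])))
    return rules


program = """
rule high_income: when income > 50000 then rate 30%
rule low_income: when income <= 10000 then rate 0%
"""
for rule in parse(program):
    print(rule)
```

Anything the model produces that is not a rule of this shape, for example a stray `if income > 50000: rate = 0.3`, is rejected with a line number instead of being silently accepted.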


It is also much easier to automatically check the generated program for correctness using DSL-specific structure checkers, type checkers or analysers and simulators. Maybe we can even pipe well-written error messages back into ChatGPT for it to then correct the program, just as a user might do after looking at it and identifying a problem.
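
Here is a rough sketch of what "well-written error messages" from a DSL-specific checker might look like, and how they could be turned into a follow-up message for the model. Everything here is hypothetical: the `Rule` structure, the `KNOWN_FIELDS` vocabulary and the particular checks are stand-ins for whatever a real DSL's type checker or analyser would do.

```python
from dataclasses import dataclass

KNOWN_FIELDS = {"income", "age"}  # assumed domain vocabulary


@dataclass(frozen=True)
class Rule:
    name: str
    field: str
    rate: int  # percent


def check(rules: list[Rule]) -> list[str]:
    """Return human-readable error messages; an empty list means the program passed."""
    errors = []
    seen = set()
    for rule in rules:
        if rule.name in seen:
            errors.append(f"rule '{rule.name}' is defined more than once")
        seen.add(rule.name)
        if rule.field not in KNOWN_FIELDS:
            errors.append(
                f"rule '{rule.name}' uses unknown field '{rule.field}'; "
                f"known fields are: {', '.join(sorted(KNOWN_FIELDS))}"
            )
        if not 0 <= rule.rate <= 100:
            errors.append(f"rule '{rule.name}' has rate {rule.rate}%, expected 0 to 100")
    return errors


def feedback_prompt(errors: list[str]) -> str:
    """Turn checker output into a follow-up message for the model."""
    lines = ["The program has the following problems. Please correct it:"]
    lines += [f"- {e}" for e in errors]
    return "\n".join(lines)


generated = [Rule("top", "income", 30), Rule("top", "salary", 120)]
print(feedback_prompt(check(generated)))
```

The messages name the offending rule and say what would be acceptable, which is the same information a human reviewer would give the model after spotting the problem.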


So here is an approach to prose programming for non-programmers that might actually be successful:


·      analyse the domain of what you want to generate code about

·      factor this into a textual DSL (plus generator or interpreter, of course)

·      train an AI language model on this language

·      let users program in prose

·      users have a fighting chance to look at the code and give feedback

·      users can interactively run the code and see if it works correctly, feeding back problems to ChatGPT to fix

·      and we can even feed back error messages from type checkers and the like.
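
The last few steps form a loop, which could be sketched roughly as follows. This is only an illustration under my own assumptions: `prose_to_dsl` is a made-up name, `fake_model` stands in for a real model call (it only "gets it right" once it sees feedback), and `rate_check` stands in for a real DSL checker.

```python
from typing import Callable

Generate = Callable[[str], str]      # prompt -> DSL program text
Check = Callable[[str], list[str]]   # DSL program text -> error messages


def prose_to_dsl(request: str, generate: Generate, check: Check, max_rounds: int = 3) -> str:
    """Generate a DSL program, feeding checker errors back until it passes or rounds run out."""
    prompt = request
    errors: list[str] = []
    for _ in range(max_rounds):
        program = generate(prompt)
        errors = check(program)
        if not errors:
            return program
        # Repeat the request, show the failed attempt, and list what was wrong with it.
        prompt = (
            f"{request}\n\nYour previous answer was:\n{program}\n\n"
            "It has these problems:\n" + "\n".join(f"- {e}" for e in errors)
        )
    raise RuntimeError(f"no valid program after {max_rounds} rounds: {errors}")


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call: produces a bad rate until it is given feedback.
    if "problems" in prompt:
        return "rule high_income: when income > 50000 then rate 30%"
    return "rule high_income: when income > 50000 then rate 130%"


def rate_check(program: str) -> list[str]:
    # Stand-in for a real DSL checker: only validates the trailing rate.
    rate = int(program.rsplit(" ", 1)[1].rstrip("%"))
    return [] if 0 <= rate <= 100 else [f"rate {rate}% is outside 0 to 100"]


result = prose_to_dsl("Tax income above 50000 at 30 percent.", fake_model, rate_check)
print(result)
```

Passing `generate` and `check` in as plain functions keeps the loop independent of any particular model or DSL. Feedback from the user ("this is not what I meant") could enter the loop the same way as the checker's messages. The `max_rounds` limit is there because nothing guarantees that the model ever converges.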


Here are a few reasons why this might not work, or at least might not work better than having ChatGPT generate programming language source code.


First, there will likely be far fewer training examples for ChatGPT to learn from compared to scraping the whole internet for Java code. I am ultimately not sure about the tradeoff between a simpler target language with fewer opportunities to make mistakes and the much larger body of examples from which to train.


The second issue is how the approach scales with complexity, in terms of the DSL itself, the size and intricacy of the generated program and the user's ability to describe real-world problems as prose. Interactive stepwise correction through user feedback probably helps, but I am still unsure about this. Maybe, once the user has learned the DSL by looking at and correcting AI-generated code, they go back to directly writing DSL programs?


The third one is the well-known problem with ChatGPT: making shit up and then writing about it confidently :-) But again, the fact that the generated DSL program can be more easily reviewed by the user, must comply with a formal grammar, must pass the type checker and so on might be a good way of constraining the output, maybe even automatically.


EDIT: There's another thought that I think is worth mentioning; the comment by Mike Vogel below triggered this. It kinda gets back to the beginning where I express my skepticism about whether prose programming is a worthwhile goal. The thing is: since you're "programming" in prose, there's no IDE that can give you code completion. You're on your own. And you probably have to be quite precise and consistent to make the AI reliably understand you correctly. I can see users creating cheat sheets of how you have to phrase things to get particular outcomes ... which reminds me of DSLs in the first place :-)


Summing up, and despite my edit above, I actually think this could work. At the very least, we should try it out systematically. IMHO this is a very interesting field of research. Anybody want to run a research project on this? Or has somebody already started one? Let me know what you think!