Hello, and welcome!
This post discusses the use of AI in software engineering. This is a broad subject, and I will focus on the big picture for starters. In subsequent posts, we will dig into specifics.
Two distinct use cases are emerging in software development today. More may sprout over time, but for now, our hands are full with the two that already exist. The first use case is bottom-up, and it represents the typical point of view of an engineer who treats AI as a productivity tool. This approach assumes that the engineer stays in the driver's seat while AI handles the tedious parts of the work, which a machine should do more quickly and accurately than a human. This is the mindset behind Cursor, the Claude Code plugins for various IDEs, and similar tools. On the surface, this approach is hardly new - it is like using a calculator instead of doing math on a piece of paper.
In reality, the biggest difference is trust - while we trust a calculator completely, we can't say the same about today's AI. It will improve over time, but I doubt it will become 100% trustworthy anytime soon. In fact, placing trust in the work of an inherently probabilistic system carries significant risk. We could eliminate that risk entirely by sending AI output through a deterministic verification system that formally proves its correctness - but only a handful of academic research projects are moving in that direction, as far as I know. Otherwise, we have to balance risks against productivity gains - and we humans are poorly equipped to do so.
Let me give you an example. In 1981, the German Federal Highway Agency began a study of the efficiency of anti-lock braking systems in vehicles, now known as the Munich Taxi-Cab Experiment. Half of a 91-car Munich taxi fleet was fitted with ABS brakes, while the other half served as a control group. At first, the crash rate of the equipped cars dropped significantly, but after several months it returned to the level of the control group and even exceeded it. As it turned out, the drivers of the upgraded vehicles had begun driving more aggressively. This is the "risk compensation" phenomenon: humans subconsciously adjust their behavior to return to an accustomed level of risk.
The reason I bring this up is that we will observe something similar as we continue using AI for software development. As AI becomes more trustworthy, human reviews will get less attentive, and the risks will accumulate until the entire process breaks down.
There are a few ways around this that may significantly improve outcomes. I already mentioned one: a formal mathematical proof of AI's work, executed by a fully deterministic system. This approach has limitations, though, because a formal proof requires a formal specification, which very few people can produce.

Another way is compartmentalization in software architecture: dividing the software into components with clear boundaries that AI can't cross. AI generates code for a single compartment while "seeing" only the interface specs of the other components, which reduces the impact of its errors and limits the scope of human review. The overarching architecture shouldn't just expect all components to function as designed; rather, it should be built without blindly trusting the work of individual components. This resembles the modern zero-trust security approach, but extends it to all aspects of the software.

The other danger of using AI in coding is entirely different. As engineers rely more on AI for coding, they may lose their coding skills, eventually reducing the depth and reliability of their code reviews. That would lead to a drop in software quality. The question is not "if" but "when". While this may look far-fetched, it isn't - and it was predicted long before AI became a thing. Check out "Profession", a 1957 story by Isaac Asimov. I don't want to spoil it by summarizing it here - but look it up; it is worth your time.
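To make the compartmentalization idea concrete, here is a minimal sketch in Python. All the names (`Quote`, `validate_quote`, `price_item`) are invented for illustration; the point is that a caller never trusts a compartment's output directly - every result crosses the boundary through a validation check, in the zero-trust spirit described above.

```python
from dataclasses import dataclass

# Hypothetical interface spec for a pricing component. Calling code sees
# only this contract, never the component's implementation.
@dataclass(frozen=True)
class Quote:
    item_id: str
    price_cents: int

def validate_quote(q: Quote) -> Quote:
    # Zero-trust boundary: verify the component's output against the
    # contract instead of assuming the (possibly AI-generated) code
    # behaved as designed.
    if not q.item_id:
        raise ValueError("quote is missing item_id")
    if q.price_cents <= 0:
        raise ValueError(f"non-positive price: {q.price_cents}")
    return q

# An AI-generated implementation lives in its own compartment...
def price_item(item_id: str) -> Quote:
    return Quote(item_id=item_id, price_cents=1299)

# ...and every result is validated at the boundary before use.
quote = validate_quote(price_item("sku-42"))
print(quote.price_cents)  # 1299
```

If the generated component ever violates its contract, the error surfaces at the boundary rather than propagating silently through the system.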
Now let's switch to the top-down use case I promised to cover. It is far more powerful, but also far harder to achieve. I am talking about making AI create entire systems while the person in the driver's seat is tech-savvy but not necessarily an engineer with the time and skill to review the generated code. Think of a product manager or an entrepreneur. The players in this field include Lovable, Replit, and Base44, just to name a few. I am not going to talk about prototype development, which these platforms handle well at the moment; I will focus squarely on the holy grail: developing complex end-to-end applications from a prompt in English. This is far harder than one might think, and the limitation is not a lack of AI capability. The problem lies with the person controlling the AI. The task of specifying what we want to build with clarity, once we do it for real, is overwhelming. There are countless things to consider, research, and write down. In the past, this burden was partially carried by product specifications written by product experts, developed through extensive collaboration between business stakeholders and engineers. Even then, many questions would surface later in the development process, or even after the system went live.
Can we trust AI to ask all the meaningful questions in the process? In theory, yes, but I can hardly see it doing so without our assistance. Proven approaches are limited. A naive attempt to collect all the necessary rules for designing and coding large systems is doomed, both because of the extreme complexity and human cost, and because of context-window and attention limitations on the AI side. The path I personally subscribe to is having AI develop full systems in domain-specific languages, or DSLs, instead of general-purpose programming languages.
DSLs are designed for a particular application domain: they are concise, capture the original intent with clarity, can guarantee that the resulting application will function without algorithmic bugs, and can enforce the desired security protections. A DSL can be declarative, making it much easier for both humans and machines to reason about. Examples of platforms in this space include JetBrains MPS, Freon, and my own startup, Rishon. I will do my best to share a few examples in one of the subsequent posts.
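To illustrate why a declarative DSL is easy to reason about, here is a toy example: a signup form described as data, with a tiny interpreter that enforces the rules. The spec format is entirely invented for this post - it is not the syntax of MPS, Freon, or Rishon - but it shows how a declarative spec states intent directly, leaving no room for algorithmic bugs in the described behavior.

```python
# A toy declarative DSL for a signup form, expressed as plain data.
# The field names and rule keywords are invented for illustration.
FORM_SPEC = {
    "name": "signup",
    "fields": [
        {"id": "email", "type": "email", "required": True},
        {"id": "age",   "type": "int",   "min": 13},
    ],
}

def check_value(field: dict, value) -> bool:
    """Interpret one field's rules; the declarative spec makes this trivial to audit."""
    if field.get("required") and value in (None, ""):
        return False
    if field["type"] == "int":
        if not isinstance(value, int):
            return False
        if "min" in field and value < field["min"]:
            return False
    if field["type"] == "email" and "@" not in str(value):
        return False
    return True

def validate(spec: dict, data: dict) -> bool:
    """A form submission is valid only if every declared field rule holds."""
    return all(check_value(f, data.get(f["id"])) for f in spec["fields"])

print(validate(FORM_SPEC, {"email": "a@b.co", "age": 21}))  # True
print(validate(FORM_SPEC, {"email": "nope", "age": 9}))     # False
```

Both a human reviewer and an AI can inspect `FORM_SPEC` and see exactly what the form requires - there is no control flow to trace, only declared intent.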
Another, more traditional approach is to clearly delineate the phases of application development. For instance, you can use AI to develop a product specification, which is then handed off to another AI solution for development. There could be more than two phases in the process - the more, the better. The trick to making this contraption work well is to formalize the handoff between phases. Just as an architect hands off a building blueprint to a builder, an application blueprint can serve as the vehicle for transferring designs and intentions between development phases. Essentially, the DSLs mentioned earlier are nothing more than such blueprints, with clearly defined syntax and semantics.
Why do we need a formal blueprint?
Because a formal blueprint is harder to misinterpret, and because it makes the resulting work much easier to validate. Think of construction again. You can ensure full compliance with construction standards by inspecting blueprints well before physical work begins, and it is much easier and cheaper to modify blueprints than to rebuild a house. Later, when construction is underway, the safeguards that verify compliance with the blueprints are trivial to implement.
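The same inspection-before-building idea can be sketched in code. In this hedged example, the spec phase emits a machine-readable application blueprint, and the build phase refuses to start until the blueprint passes inspection. The blueprint format and field names here are invented for illustration, not any real tool's schema.

```python
# Hypothetical blueprint inspection: catch contract violations on paper,
# before any code generation starts.
REQUIRED_SECTIONS = {"name", "entities", "apis"}

def inspect_blueprint(blueprint: dict) -> list[str]:
    """Return a list of compliance problems; an empty list means ready to build."""
    problems = []
    missing = REQUIRED_SECTIONS - blueprint.keys()
    if missing:
        problems.append(f"missing sections: {sorted(missing)}")
    for api in blueprint.get("apis", []):
        # Every API must reference an entity declared in the blueprint;
        # fixing this here is far cheaper than rebuilding the application.
        if api.get("entity") not in blueprint.get("entities", []):
            problems.append(f"api {api.get('path')} references an unknown entity")
    return problems

blueprint = {
    "name": "inventory",
    "entities": ["Item"],
    "apis": [{"path": "/items", "entity": "Item"}],
}
print(inspect_blueprint(blueprint))  # [] - compliant, ready to hand off
```

A downstream AI build phase would accept only blueprints for which the problem list is empty, which is exactly the cheap, early safeguard the construction analogy suggests.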
Software development is no different. The same measures construction uses to cope with a workforce of uneven training are fully applicable to AI writing code. In one of the earlier posts, I suggested we think of AI as a junior hire, and the approach I advocate here is simply an extension of that idea. Blueprints, whether in diagram or DSL form, are easy to consume for development, review, and enforcement.
By the way, don't forget that developing an application with AI does not make that application AI-enabled; you are merely using a different, hopefully cheaper and faster, development process. To maximize the benefits of AI integration into your application itself, you will need a specialized AI-first software architecture. Today's AI won't generate it for you. Don't blame it: as a new, emerging discipline, such architecture wasn't in the AI's training data and won't be anytime soon. To learn more about it, watch the previous episode in the series.
My plan for the next post is to dig deeper into the bottom-up approach to AI engineering - specifically, why some companies achieve 100% coding with AI while others become 20% slower despite the effort. This plan is subject to change if a more interesting or urgent topic comes up.
Cheers!