Software development might be the most progressive industry when it comes to adopting AI agents.
Vibe-coding platforms and CLI coding agents already demonstrate a degree of autonomy that is difficult to achieve in most other domains. A coding agent can inspect an existing system, change files, run commands, execute tests, identify failures, correct its implementation, and continue working until it produces a functional result.
I could spend the rest of this article describing what Claude Code can do, but you would probably find that boring. Many developers already use it every day.
The more interesting question is:
Why can coding agents operate with such a high degree of autonomy while agents in legal, finance, healthcare, mortgages, workplace safety, and other regulated domains still hallucinate on critical outputs?
The answer is not only that coding models are better.
The answer is that software development has programming languages, compilers, test runners, type systems, and linters that continuously tell agents when they are wrong.
Even a very capable coding agent does not generate perfect software on its first attempt. It writes code, runs it, receives errors, and makes corrections.
The process usually looks like this:
Understand the goal
↓
Generate or modify code
↓
Run compiler and tests
↓
Receive structured errors
↓
Reason about the errors
↓
Fix the implementation
↓
Repeat until the system works
This feedback loop is one of the main reasons coding agents can operate with so much autonomy.
We do not need to trust the model to remember every rule of a programming language, every dependency in the project, or every test expectation. The surrounding development environment continuously checks its work.
The agent is allowed to make mistakes because those mistakes are converted into useful feedback.
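As a small illustration of what "useful feedback" can mean in practice, the sketch below turns a Python syntax error into a structured record that an agent could read. It relies only on the standard `ast` module; the `Feedback` class and the `check_syntax` helper are hypothetical names chosen for this example, not part of any particular coding agent.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """Machine-readable description of one failure (hypothetical shape)."""
    kind: str
    line: Optional[int]
    message: str


def check_syntax(source: str) -> Optional[Feedback]:
    """Return None if the source parses, otherwise structured feedback."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        # The parser reports where it failed and why; we pass that on as data.
        return Feedback(kind="syntax", line=exc.lineno, message=exc.msg)
    return None


# First attempt: the colon after the signature is missing.
broken = "def add(a, b)\n    return a + b\n"
# Second attempt: the same function after the correction.
fixed = "def add(a, b):\n    return a + b\n"

print(check_syntax(broken))
print(check_syntax(fixed))
```

The first call returns a `Feedback` record pointing at line 1, and the second returns `None`. The exact wording of the message depends on the Python version, which is one reason to treat the error kind and line number, rather than the text, as the stable part of the feedback.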
Most coding-agent users work with one session at a time. A more extreme demonstration of this autonomy is running multiple specialized coding agents concurrently.
This is the approach behind Baro, where several coding sessions can work in parallel on different parts of the same goal.
Instead of asking one agent to inspect an entire application and gradually write tests, the work can be divided between specialized sessions.
In one experiment, ten concurrent sessions generated 808 unit tests across an application in around one hour.
The important point is not the number of tests. It is the amount of autonomy we can safely give these agents.
Each session can explore the code, write tests, run them, inspect errors, and correct its output without requiring a human to approve every small decision.
That is possible because the domain provides immediate, machine-readable feedback.
Now imagine asking an agent to produce a legally compliant mortgage policy, financial agreement, workplace-safety procedure, or regulated healthcare document.
The agent can generate something that looks convincing. It can use the correct vocabulary, follow the expected document structure, and produce an answer that appears professional.
But appearance is not validation.
A document can sound completely correct while breaking an important domain rule.
That is the fundamental difference. When a coding agent writes something invalid, the compiler immediately complains. When it generates a mortgage policy that violates a regulatory threshold, or a safety assessment that misses a required control, there is often nothing to stop it.
We try to compensate with instructions like “always follow applicable regulations” or “never produce a policy that violates legal requirements.”
But instructions are suggestions, not hard constraints. The model may misunderstand a rule, forget it in a large context, apply an outdated interpretation, or fail to recognize that a specific case activates an exception.
If the output has legal, financial, or operational consequences, “the model was instructed not to make a mistake” is not a sufficient architecture.
This leads to a more interesting idea.
What if we gave enterprise agents the same kind of environment that makes coding agents successful?
Not a general-purpose programming language, but a Domain-Specific Language created for a particular business domain.
A Domain-Specific Language, or DSL, is a small structured language designed around the concepts and rules of one domain. Instead of allowing the agent to generate unrestricted text, we ask it to express critical decisions through this controlled language.
For example, a simplified workplace-safety DSL could look like this:
    assessment WarehouseForklift
    hazard "forklift collides with pedestrian"
    risk high
    requires measure "install pedestrian barriers"
    requires measure "enforce 5 km/h speed limit"
    requires training "forklift operator license"
This is only a simplified demonstration. But it shows the pattern.
The agent can still reason about the situation, explore possible solutions, and decide how to structure the result. However, the critical output must be expressed using concepts recognized by the DSL.
A compiler can then validate both the structure and the domain rules automatically.
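To make the structural half of that validation concrete, here is a minimal sketch of a parser for the simplified safety DSL. It assumes one statement per line and a small fixed set of risk levels; neither assumption comes from a real specification. `Assessment`, `DslSyntaxError`, `RISK_LEVELS`, and `parse` are hypothetical names used only for illustration.

```python
import shlex
from dataclasses import dataclass, field


class DslSyntaxError(Exception):
    """Structural error with the line number where parsing failed."""

    def __init__(self, line_no: int, message: str):
        super().__init__(f"line {line_no}: {message}")
        self.line_no = line_no
        self.message = message


@dataclass
class Assessment:
    name: str
    hazard: str = ""
    risk: str = ""
    measures: list = field(default_factory=list)
    training: list = field(default_factory=list)


RISK_LEVELS = {"low", "medium", "high"}  # assumed vocabulary


def parse(text: str) -> Assessment:
    """Parse the DSL, assuming one statement per line."""
    assessment = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        try:
            tokens = shlex.split(raw)  # keeps quoted strings together
        except ValueError as exc:  # e.g. an unclosed quotation mark
            raise DslSyntaxError(line_no, str(exc)) from exc
        if not tokens:
            continue  # skip blank lines
        keyword, args = tokens[0], tokens[1:]
        if keyword == "assessment" and len(args) == 1:
            assessment = Assessment(name=args[0])
        elif assessment is None:
            raise DslSyntaxError(line_no, "expected 'assessment <Name>' first")
        elif keyword == "hazard" and len(args) == 1:
            assessment.hazard = args[0]
        elif keyword == "risk" and len(args) == 1:
            if args[0] not in RISK_LEVELS:
                raise DslSyntaxError(line_no, f"unknown risk level {args[0]!r}")
            assessment.risk = args[0]
        elif (keyword == "requires" and len(args) == 2
              and args[0] in ("measure", "training")):
            target = (assessment.measures if args[0] == "measure"
                      else assessment.training)
            target.append(args[1])
        else:
            raise DslSyntaxError(
                line_no, f"unrecognized statement: {raw.strip()!r}")
    if assessment is None:
        raise DslSyntaxError(1, "empty document")
    return assessment


SOURCE = '''
assessment WarehouseForklift
hazard "forklift collides with pedestrian"
risk high
requires measure "install pedestrian barriers"
requires measure "enforce 5 km/h speed limit"
requires training "forklift operator license"
'''

print(parse(SOURCE))
```

Anything the parser does not recognize is rejected with a line number, so the agent cannot slip free-form text into the critical output. Domain rules would then run on the resulting `Assessment` object rather than on raw text.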
Here is where it becomes powerful.
Suppose the agent generates a safety assessment where the only protection against a high-risk forklift hazard is giving workers a high-visibility vest. The domain compiler rejects it:
    Error: ENGINEERING_CONTROL_REQUIRED
    Hazard "forklift collides with pedestrian" is classified as high risk.
    High-risk hazards require at least one engineering control.
    PPE alone is not sufficient.
The error tells the agent exactly what domain rule was broken and why. A vest does not stop a forklift. The system knows that.
The error goes back into the agent’s reasoning loop. The agent reads it, understands what needs to change, adds a proper engineering control like pedestrian barriers, and runs the compiler again.
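A domain rule like this one could be sketched as follows. The example assumes the compiler has access to a catalog that classifies each known measure as an engineering control, an administrative control, or PPE. That catalog, the `Assessment` and `RuleError` classes, and the `validate` function are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field

# Hypothetical catalog mapping known measures to a control category.
CONTROL_CATALOG = {
    "install pedestrian barriers": "engineering",
    "enforce 5 km/h speed limit": "administrative",
    "issue high-visibility vests": "ppe",
}


@dataclass
class Assessment:
    hazard: str
    risk: str
    measures: list = field(default_factory=list)


@dataclass
class RuleError:
    code: str      # stable identifier the agent can react to
    message: str   # human-readable explanation of the broken rule


def validate(assessment: Assessment) -> list:
    """Return the list of violated domain rules (empty means valid)."""
    errors = []
    categories = {CONTROL_CATALOG.get(m, "unknown")
                  for m in assessment.measures}
    if assessment.risk == "high" and "engineering" not in categories:
        errors.append(RuleError(
            code="ENGINEERING_CONTROL_REQUIRED",
            message=(
                f'Hazard "{assessment.hazard}" is classified as high risk. '
                "High-risk hazards require at least one engineering control."
            ),
        ))
    return errors


# First draft: PPE is the only protection against a high-risk hazard.
draft = Assessment(
    hazard="forklift collides with pedestrian",
    risk="high",
    measures=["issue high-visibility vests"],
)
print(validate(draft))

# Revised draft: the agent adds an engineering control and re-validates.
draft.measures.append("install pedestrian barriers")
print(validate(draft))
```

The first call returns one `RuleError` with the code `ENGINEERING_CONTROL_REQUIRED`; the second returns an empty list. Keeping the rule in code, outside the prompt, means the same check applies no matter how the agent arrived at its draft.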
This changes the architecture completely.
We are no longer expecting the agent to never hallucinate. We assume that it can make mistakes, but we build an environment that detects critical mistakes before the output can proceed.
This is the same basic advantage coding agents already have.
In traditional software, compilers are used by developers. But with AI agents, the compiler can become part of the agent’s own reasoning process.
The loop looks like this:
Agent reasons about the request
↓
Agent generates a domain-specific output
↓
Compiler validates it against domain rules
↓
Compiler returns structured errors
↓
Agent corrects the output
↓
Compiler validates again
↓
Only a valid result can proceed
The agent is not forbidden from reasoning. It is not forced into a hardcoded workflow. It can still explore possible solutions and make decisions.
But every critical output must pass through the domain compiler before it can proceed.
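The loop can be sketched in a few lines. In this sketch the agent and the compiler are plain callables, and the attempt budget is an assumption added so the loop always terminates. `run_with_compiler`, `ValidationFailed`, `toy_validate`, and `scripted_agent` are hypothetical names; a real system would call a model and a real domain compiler in their place.

```python
from typing import Callable, List


class ValidationFailed(Exception):
    """Raised when no valid output was produced within the attempt budget."""


def run_with_compiler(
    request: str,
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str], List[str]],
    max_attempts: int = 3,
) -> str:
    """Return an output only after the validator accepts it."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        candidate = generate(request, feedback)  # agent sees earlier errors
        feedback = validate(candidate)           # compiler has the final say
        if not feedback:
            return candidate                     # only a valid result proceeds
    raise ValidationFailed(
        f"still invalid after {max_attempts} attempts: {feedback}")


def toy_validate(output: str) -> List[str]:
    """Stand-in compiler: high risk needs pedestrian barriers."""
    if "risk high" in output and "barriers" not in output:
        return ["ENGINEERING_CONTROL_REQUIRED"]
    return []


def scripted_agent(request: str, feedback: List[str]) -> str:
    """Stand-in agent: proposes PPE first, then reacts to the error code."""
    draft = 'hazard "forklift collides with pedestrian"\nrisk high\n'
    if "ENGINEERING_CONTROL_REQUIRED" in feedback:
        return draft + 'requires measure "install pedestrian barriers"\n'
    return draft + 'requires measure "issue high-visibility vests"\n'


print(run_with_compiler("assess the forklift area", scripted_agent, toy_validate))
```

The scripted agent fails on its first attempt and succeeds on its second. If the budget runs out, the function raises instead of returning the last draft, so an invalid output never reaches the caller by accident.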
The compiler is no longer only a developer tool. It becomes part of the agent’s environment.
A rigid enterprise system tries to predict and hardcode every possible path in advance. At that point, we are often building traditional workflow automation rather than an agentic system.
An unrestricted agent has the opposite problem. It can reason and adapt, but it can also produce invalid results where mistakes have serious consequences.
Domain-Specific Languages create a useful middle ground.
The agent remains free to reason about the situation, explore possible solutions, and decide how to structure the result.
But it cannot successfully pass an output that violates rules enforced by the compiler.
This is a much stronger boundary than adding another paragraph to a system prompt. The restriction exists outside the probabilistic reasoning of the model.
The agent can disagree with the compiler. It can try another solution. It can correct itself repeatedly. But it cannot persuade the compiler to accept an invalid result.
Coding agents currently demonstrate one of the highest practical levels of AI autonomy because software development already has a mature validation environment.
Programming languages tell agents when their syntax is invalid. Type systems identify incompatible values. Test suites expose broken behavior. Linters detect additional classes of problems.
Most regulated enterprise domains do not yet have an equivalent environment.
That is why I believe an important part of enterprise AI will emerge at the intersection of AI agents and Domain-Specific Languages.
A DSL gives the agent a formal way to express domain decisions. Its compiler converts violations into structured feedback. The agent uses that feedback to revise its output until it reaches a valid result.
With this approach, we do not reduce hallucinations by reducing the agent to a rigid workflow. We reduce them by giving the agent a better environment in which to reason.
That is how we can approach the same level of autonomy that coding agents already demonstrate: not by blindly trusting the model, but by surrounding it with domain tools that continuously tell it when it is wrong.
The highest level of agent autonomy does not come from removing every restriction.
It comes from building boundaries strong enough that the agent can operate freely inside them.
With tools and technology we already have, we can build much more valuable systems than most projects today. We can write software that is a pleasure to use and a pleasure to work on; software that doesn't box us in as it grows, but creates new opportunities and continues to add value for its owners.