Building AI agents for enterprises is not just about connecting LLMs with tools, especially when you work in complex domains regulated by law.
The hardest question is not:
Can the agent do the task?
The real question is:
How much autonomy should this agent have?
If we give too little autonomy, we end up with a rigid system. Basically old-school automation with an LLM wrapper on top. The agent cannot really reason, adapt, or use the benefits of AI.
But if we give too much autonomy without properly restricting access to critical parts, we create a much bigger problem. The agent can hallucinate exactly where hallucination is not allowed.
And in regulated industries, that is not a small bug.
That can be a legal, financial, or operational problem.
This is why I think enterprise agents need something deeper than prompts, permissions, and surface-level guardrails.
They need domain boundaries built directly into the tools they use.
That is where we can have our cake and eat it too.
We can keep a high degree of agent autonomy, while making it structurally impossible for the agent to break critical domain rules.
A common way to restrict agents is to write better system prompts.
Something like:
You must follow all legal rules.
You must never generate invalid contracts.
You must always respect compliance requirements.
This is useful, but it is not enough.
The problem is that prompts are instructions. They are not constraints.
The model can misunderstand them. The context can become too large. The user can ask something unexpected. The agent can reason in the wrong direction.
In a simple product, this might be acceptable.
In a highly regulated domain, it is not.
If the agent is generating something that has legal, financial, medical, or operational consequences, we cannot rely only on the hope that the model will remember every rule correctly.
The architecture itself needs to make invalid actions impossible.
Most agent systems treat tools as simple external capabilities: a function that sends an email, queries a database, or calls an external API.
But in regulated domains, this is often too generic.
The tool should not only execute an action.
The tool should contain domain restrictions.
A good enterprise tool is not just an API wrapper. It is a boundary around the domain.
It should know what is allowed, what is not allowed, and why something failed.
That is the difference between:
“Here is a tool the agent can call.”
And:
“Here is a domain-safe capability the agent can use autonomously.”
This distinction is extremely important.
Because when the tool contains domain rules, the agent can stay autonomous without becoming dangerous.
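To make the distinction concrete, here is a minimal TypeScript sketch. Every name in it (`ToolResult`, `createLoanOfferGeneric`, `createLoanOfferDomainSafe`) and both limits are hypothetical, chosen only to illustrate the idea. The generic tool accepts whatever it is given, while the domain-safe tool carries the rules itself and explains each rejection.

```typescript
// Hypothetical sketch: names and limits are illustrative assumptions.

interface LoanOffer {
  applicantAge: number;
  amount: number;
}

interface Violation {
  rule: string;   // stable identifier of the broken rule
  reason: string; // explanation the agent can act on
}

interface ToolResult {
  accepted: boolean;
  violations: Violation[];
}

// Generic tool: executes whatever the agent sends.
// Any domain rule exists only in the prompt.
function createLoanOfferGeneric(offer: LoanOffer): ToolResult {
  return { accepted: true, violations: [] };
}

// Domain-safe tool: same capability, but the rules are part of the tool.
function createLoanOfferDomainSafe(offer: LoanOffer): ToolResult {
  const violations: Violation[] = [];
  if (offer.applicantAge < 18) {
    violations.push({ rule: "MIN_AGE", reason: "Applicant must be at least 18." });
  }
  if (offer.amount >= 500000) {
    violations.push({ rule: "MAX_AMOUNT", reason: "Loan amount must be less than 500000." });
  }
  return { accepted: violations.length === 0, violations };
}
```

Both functions expose the same capability to the agent. Only the second one can say no, and say why.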
On one highly regulated project, we solved this problem by introducing a Domain-Specific Language.
A DSL is a small language created for one specific domain.
It is not a general-purpose programming language like TypeScript, Python, or Java. It is a language designed around the concepts, rules, and constraints of a specific business domain.
For example, imagine we are building an agent that helps with loan approval policies.
Instead of letting the agent directly generate arbitrary JSON, database records, or legal text, we can ask it to generate a small policy program in our DSL.
Something like this:
    policy MortgageApproval
      applicant_age must_be_at_least 18
      loan_amount must_be_less_than 500000
      loan_term_years must_be_less_than_or_equal 30

      if applicant_country is "RS" then
        required_documents include "national_id"
        required_documents include "income_statement"
      end

      if loan_amount greater_than 100000 then
        manual_review required
      end
This is not real legal advice or a real mortgage system. It is just a simplified example.
But the important part is the pattern.
The agent does not directly execute critical business logic.
The agent generates something inside a controlled language.
Then we compile or validate that language.
If the agent produces something invalid, the compiler rejects it.
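As a rough illustration of that rejection step, the sketch below hand-writes a validator for two statements of the example DSL. It is a stand-in for a real compiler, not the one used on the project: `checkPolicy`, `STATEMENTS`, and the error shape are all hypothetical, and the limits simply mirror the simplified example.

```typescript
// Minimal sketch of a validator for a tiny subset of the example policy DSL.
// A hand-written stand-in for a real compiler; all names are hypothetical.

interface PolicyError {
  line: number; // 1-based line number in the DSL source
  message: string;
}

interface StatementRule {
  pattern: RegExp; // captures the integer argument of the statement
  check: (value: number) => string | null; // returns an error message or null
}

// The only statements this toy language knows about.
const STATEMENTS: StatementRule[] = [
  {
    pattern: /^applicant_age must_be_at_least (\d+)$/,
    check: (v) => (v < 18 ? "Applicant age cannot be lower than 18." : null),
  },
  {
    pattern: /^loan_amount must_be_less_than (\d+)$/,
    check: (v) => (v > 500000 ? "Loan amount cannot be greater than 500000." : null),
  },
];

function checkPolicy(source: string): { valid: boolean; errors: PolicyError[] } {
  const errors: PolicyError[] = [];
  const lines = source.split("\n");

  if (!/^policy\s+[A-Za-z_]\w*\s*$/.test(lines[0] ?? "")) {
    errors.push({ line: 1, message: "Expected 'policy <Name>' on the first line." });
  }

  lines.slice(1).forEach((raw, index) => {
    const text = raw.trim();
    if (text === "") return; // blank lines are allowed
    let known = false;
    for (const rule of STATEMENTS) {
      const match = rule.pattern.exec(text);
      if (match) {
        known = true;
        const problem = rule.check(Number(match[1]));
        if (problem) errors.push({ line: index + 2, message: problem });
        break;
      }
    }
    // Anything outside the known vocabulary cannot be expressed at all.
    if (!known) errors.push({ line: index + 2, message: `Unknown statement: '${text}'.` });
  });

  return { valid: errors.length === 0, errors };
}
```

The validator rejects in two different ways. A known statement with a forbidden value fails a rule, and a statement the language does not define fails outright, so the agent has no syntax for stepping outside the domain.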
This is where things become interesting.
In traditional software, compilers are usually used by developers.
But with AI agents, the compiler can become part of the reasoning loop.
The agent generates DSL code.
The compiler validates it.
If the code is valid, we can continue.
If the code is invalid, the compiler returns structured errors back to the agent.
The agent can then reason about the error and try again.
So instead of simply saying:
“The agent hallucinated.”
We create a system where the hallucination becomes feedback.
The agent tries something invalid, the tool rejects it, and the agent receives a precise explanation of what needs to be fixed.
For example, the agent might generate:
    policy MortgageApproval
      applicant_age must_be_at_least 16
      loan_amount must_be_less_than 500000
The compiler can reject this with:
    Error: applicant_age cannot be lower than 18.
    Reason: Legal minimum age for this product is 18.
Now this error goes back into the agent reasoning loop.
The agent can correct itself:
    policy MortgageApproval
      applicant_age must_be_at_least 18
      loan_amount must_be_less_than 500000
This is much better than hoping the model never makes a mistake.
The mistake is expected.
But the domain boundary catches it.
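The loop described above can be sketched in a few lines. This is an assumption-heavy illustration rather than a real integration: `generate` stands in for a model call (kept synchronous here for brevity, where a real call would be asynchronous), `validate` stands in for the DSL compiler, and `fakeAgent` simply replays the 16-then-18 correction from the example.

```typescript
// Hypothetical sketch of the generate -> validate -> retry loop.
// `generate` stands in for a model call, `validate` for the DSL compiler.

interface CompilerError {
  message: string;
  reason: string;
}

type Generate = (feedback: CompilerError[]) => string;
type Validate = (source: string) => CompilerError[];

function generateValidPolicy(
  generate: Generate,
  validate: Validate,
  maxAttempts = 3,
): { source: string; attempts: number } {
  let feedback: CompilerError[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const source = generate(feedback); // the agent sees the previous errors
    feedback = validate(source);
    if (feedback.length === 0) {
      return { source, attempts: attempt };
    }
  }
  // Fail closed: nothing leaves the loop unless the compiler accepted it.
  throw new Error(`No valid policy after ${maxAttempts} attempts.`);
}

// Stand-in compiler: only knows the minimum-age rule from the example.
const validateMinAge: Validate = (source) => {
  const match = /applicant_age must_be_at_least (\d+)/.exec(source);
  if (match && Number(match[1]) < 18) {
    return [{
      message: "applicant_age cannot be lower than 18.",
      reason: "Legal minimum age for this product is 18.",
    }];
  }
  return [];
};

// Stand-in agent: proposes 16 first, then corrects itself once it gets feedback.
const fakeAgent: Generate = (feedback) =>
  feedback.length === 0
    ? "policy MortgageApproval\napplicant_age must_be_at_least 16"
    : "policy MortgageApproval\napplicant_age must_be_at_least 18";
```

With these stand-ins, `generateValidPolicy(fakeAgent, validateMinAge)` returns the corrected policy on the second attempt. The attempt limit matters as much as the retry: if the agent never converges, the loop throws instead of letting an unvalidated policy through.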
For this kind of work, Langium is a good tool.
Langium is an open-source toolkit for building DSLs in TypeScript: from a grammar definition it gives you a parser, a typed syntax tree, validation hooks, and editor tooling.
You define the shape of your language, then you implement validation rules around it.
A simplified grammar might look something like this:
    grammar CompliancePolicy

    entry Policy:
        'policy' name=ID
        rules+=Rule*;

    Rule:
        AgeRule | AmountRule | ManualReviewRule;

    AgeRule:
        'applicant_age' 'must_be_at_least' minAge=INT;

    AmountRule:
        'loan_amount' 'must_be_less_than' maxAmount=INT;

    ManualReviewRule:
        'if' 'loan_amount' 'greater_than' threshold=INT 'then'
            'manual_review' 'required'
        'end';

    // Terminals the rules above rely on
    hidden terminal WS: /\s+/;
    terminal ID: /[_a-zA-Z][\w_]*/;
    terminal INT returns number: /[0-9]+/;
Again, this is simplified, but the idea is clear.
We are creating a language where the agent can only express concepts that exist in our domain.
Then we can add validation rules.
For example:
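    import type { ValidationAcceptor } from 'langium';
    // AgeRule is the AST type Langium generates from the grammar above;
    // its import path depends on your project layout.

    function validateAgeRule(rule: AgeRule, accept: ValidationAcceptor): void {
        if (rule.minAge < 18) {
            accept('error', 'Applicant age cannot be lower than 18.', {
                node: rule,
                property: 'minAge'
            });
        }
    }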
Or:
    function validateLoanAmount(rule: AmountRule, accept: ValidationAcceptor): void {
        if (rule.maxAmount > 500000) {
            accept('error', 'Loan amount cannot be greater than 500000 for this product.', {
                node: rule,
                property: 'maxAmount'
            });
        }
    }
These are obviously toy examples.
In a real enterprise domain, the rules would be much more complex. They could include jurisdiction, contract type, product category, approval level, required documentation, user permissions, audit rules, or any other domain-specific restriction.
But the important point is this:
The rules do not live only in the prompt.
They live in code.
They live in the compiler.
They live in the tool boundary.
Once we have a DSL compiler or validator, we can wrap it into a tool and attach it to an agent.
This is where the pattern becomes powerful.
The agent is not forbidden from reasoning.
The agent is not forced into a hardcoded workflow.
The agent can still explore possible solutions.
But every critical output must pass through the domain compiler.
A simplified tool could look like this:
    import { tool } from "@mozaik-ai/core";
    import { z } from "zod";
    import { validatePolicy } from "./policy-dsl/compiler";

    export const validateCompliancePolicyTool = tool({
      name: "validate_compliance_policy",
      description:
        "Validates a compliance policy written in the internal DSL and returns domain/compiler errors if the policy breaks legal or business rules.",
      inputSchema: z.object({
        policySource: z.string().describe("Policy written in the compliance DSL")
      }),
      execute: async ({ policySource }) => {
        const result = await validatePolicy(policySource);

        if (!result.valid) {
          return {
            valid: false,
            errors: result.errors.map(error => ({
              message: error.message,
              location: error.location,
              rule: error.rule
            }))
          };
        }

        return {
          valid: true,
          normalizedPolicy: result.normalizedPolicy
        };
      }
    });
The exact API can differ depending on how your agent framework defines tools, but the architectural idea is the same.
The tool does not just execute something.
The tool protects the domain.
Then the agent can be instructed to use this tool before finalizing any regulated output.
For example:
    import { Agent, AgenticEnvironment } from "@mozaik-ai/core";
    import { validateCompliancePolicyTool } from "./tools/validate-compliance-policy";

    const complianceAgent = new Agent({
      name: "CompliancePolicyAgent",
      instructions: `
    You help create compliance policies for a regulated domain.
    You must express every policy in the internal DSL.
    Before returning the final answer, always call validate_compliance_policy.
    If the tool returns errors, reason about them, fix the DSL, and validate again.
    Never return a policy that did not pass validation.
    `,
      tools: [validateCompliancePolicyTool]
    });

    const environment = new AgenticEnvironment();
    environment.addParticipant(complianceAgent);
In this setup, the agent is still autonomous.
It can decide how to model the policy. It can generate the DSL. It can call the validation tool. It can inspect errors. It can correct itself.
But it cannot simply bypass the domain rules if the system is designed properly.
The final policy must pass through the compiler.
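One way to enforce that in application code is to make "validated" a type of its own. The sketch below is an assumption, not the project's implementation: `ValidatedPolicy`, `fromSource`, and `publishPolicy` are hypothetical names, and the validator is passed in as a plain function.

```typescript
// Sketch: a value that only the validator can produce. Names are hypothetical.

class ValidatedPolicy {
  // The private constructor and private field mean code outside this class
  // can neither construct an instance nor pass a look-alike object literal.
  private constructor(private readonly text: string) {}

  get source(): string {
    return this.text;
  }

  // The only way to obtain a ValidatedPolicy is to pass validation.
  static fromSource(
    source: string,
    validate: (source: string) => string[],
  ): { policy: ValidatedPolicy | null; errors: string[] } {
    const errors = validate(source);
    return errors.length === 0
      ? { policy: new ValidatedPolicy(source), errors: [] }
      : { policy: null, errors };
  }
}

// Downstream code accepts only ValidatedPolicy, never a raw string from the agent.
function publishPolicy(policy: ValidatedPolicy): string {
  return `published:\n${policy.source}`;
}
```

Passing a raw string or a hand-built object to `publishPolicy` fails at compile time. This is a TypeScript-level guarantee only: a cast can still defeat it, so for hard guarantees the same check should also sit behind a runtime boundary such as the service that stores or executes the policy.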
A common reaction is:
Why not just hardcode all possible workflows?
Because in complex enterprise domains, that usually does not scale.
If we know every step in advance, we might not need an agent at all.
We can just build deterministic software.
The value of agents appears when the system needs to reason, adapt, and deal with slightly different situations.
But this does not mean the agent should have unlimited freedom.
The goal is not full freedom.
The goal is autonomy inside boundaries.
This is the difference.
Hardcoded workflow:
The developer predicts every step in advance.
Autonomous agent without boundaries:
The agent can do anything, including dangerous things.
Autonomous agent with domain tools:
The agent can reason freely, but only valid domain outputs can pass.
That is the architecture I find much more interesting for enterprise AI.
The most important part of this pattern is not the DSL itself.
The important part is the feedback loop.
The agent does not just call a tool and receive success or failure.
The agent receives domain-specific feedback.
For example:
    {
      "valid": false,
      "errors": [
        {
          "message": "Manual review is required for loan amounts greater than 100000.",
          "rule": "MANUAL_REVIEW_THRESHOLD",
          "location": {
            "line": 7,
            "column": 3
          }
        }
      ]
    }
This kind of error is much more useful than a generic validation failure.
It tells the agent what domain rule was broken.
Then the agent can continue reasoning.
It can update the policy.
It can validate again.
It can converge toward a valid result.
This is exactly where agents become useful.
Not because they never make mistakes.
But because they can recover from mistakes when the environment gives them good feedback.
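How the errors are handed back matters too. As a small sketch, assuming the error shape from the JSON example above, a helper like the hypothetical `toAgentFeedback` can turn the structured result into a message for the agent's next turn:

```typescript
// Sketch: turning structured validation errors into feedback for the agent.
// The error shape follows the JSON example; the function name is hypothetical.

interface DomainError {
  message: string;
  rule: string;
  location: { line: number; column: number };
}

interface ValidationResult {
  valid: boolean;
  errors: DomainError[];
}

function toAgentFeedback(result: ValidationResult): string {
  if (result.valid) {
    return "Policy is valid.";
  }
  // One line per error: rule id, position, and the domain explanation.
  const lines = result.errors.map(
    (e) => `- [${e.rule}] line ${e.location.line}, column ${e.location.column}: ${e.message}`,
  );
  return ["Policy is invalid. Fix these issues and validate again:", ...lines].join("\n");
}
```

Keeping the rule identifier and location in the message gives the agent something specific to fix, and gives you something specific to log and audit.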
I think this is one of the most important ideas when building agents for enterprises.
The question is not whether agents should be autonomous or restricted.
They need both.
If they are too restricted, they become useless.
If they are too autonomous, they become dangerous.
The solution is not to choose one extreme.
The solution is to design better boundaries.
DSLs, compilers, validators, domain-specific tools, and typed interfaces are all ways to create those boundaries.
They allow us to say:
The agent can think.
The agent can explore.
The agent can make decisions.
But the domain decides what is valid.
That is the balance we need.
For enterprise AI, better prompts are not enough.
More advanced models are not enough.
Even better orchestration is not enough.
In regulated domains, the real challenge is designing the right level of autonomy.
And in my experience, the best way to do that is by moving domain rules out of prompts and into tools that agents cannot bypass.
That is how we can build agents that are neither blindly trusted nor so restricted that they become useless.
Agents that are autonomous inside properly designed domain boundaries.
That is the kind of architecture I think enterprise AI needs.
With the tools and technology we already have, we can build much more valuable systems than most projects do today. We can write software that is a pleasure to use and a pleasure to work on; software that doesn't box us in as it grows, but creates new opportunities and continues to add value for its owners.