Mozaik v3.10.0: Inference Interception

Mozaik v3.10.0 ships streaming. That is the boring part. Tokens arrive earlier, the UI feels alive, everyone nods and moves on. The interesting part is what we refused to do: treat streaming as a transport trick that dies at the SDK boundary. In Mozaik, streamed inference enters the same AgenticEnvironment as everything else — as Semantic Events that any participant can see and react to while the run is still in flight. Not faster text. Live runtime activity in a shared room.

Most “streaming” is a lie for multi-agent systems

The industry default is embarrassingly simple: partial text, chunk callback, print to the client. Fine for a chat box. Useless for agents that are supposed to watch each other. If only the caller sees the stream, you do not have an agentic system — you have a single-player demo with spectators locked outside. The questions that actually matter are whether another participant can see an agent mid-thought, whether someone can step in before a full ModelMessageItem lands, and whether UI and tracing stay out of your business agents. v3.10.0 is built around those questions, not around making the typewriter effect smoother.

Two lanes. Do not mix them.

We did not collapse streaming into context items. That would have been convenient and wrong. Function calls, reasoning, model messages — that is produced context: durable, replayable, belongs in ModelContext with handlers like onExternalModelMessage. Stream deltas and custom runtime signals are something else: SemanticEvent<TData>, delivered through onInternalEvent and onExternalEvent. Today that means provider stream events; tomorrow it can mean policy.violation.detected or handoff.requested without bolting on a second event bus. One environment. Two semantics. Mix them and your context store becomes a junk drawer.

semantic-event.ts
SemanticEvent<TData>
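
The post only names the type, so here is a minimal sketch of what it might look like. It assumes nothing beyond the two fields the handlers further down actually read, `type` and `data`. The `TextDelta` payload, the `isTextDelta` guard, the `readDelta` helper, and the `rule` field are hypothetical names used for illustration, not part of @mozaik-ai/core.

```typescript
// Hypothetical shape: only `type` and `data` are confirmed by the handlers in this post.
interface SemanticEvent<TData = unknown> {
  type: string
  data: TData
}

// Assumed payload for the text-delta events the reviewer example reads.
interface TextDelta {
  delta?: string
}

// Narrow an untyped event to a text delta before touching its payload.
function isTextDelta(event: SemanticEvent): event is SemanticEvent<TextDelta> {
  return event.type === "response.output_text.delta"
}

// Read the delta text from an event, or an empty string for any other event type.
function readDelta(event: SemanticEvent): string {
  return isTextDelta(event) ? (event.data.delta ?? "") : ""
}

// A provider stream event.
const chunk: SemanticEvent = {
  type: "response.output_text.delta",
  data: { delta: "DROP " },
}

// A custom runtime signal travels through the same type; the payload is made up.
const violation: SemanticEvent<{ rule: string }> = {
  type: "policy.violation.detected",
  data: { rule: "no-destructive-ddl" },
}
```

Keeping the payload generic is what would let provider events and custom signals share one delivery path without a second event bus.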

The split is not academic. An agent that reviews another agent needs to treat “still streaming” differently from “message complete”:

reviewer-handlers.ts
    async onExternalEvent(source, event) {
      // Runtime signal while the other agent is still streaming
    }

    async onExternalModelMessage(source, item) {
      // Produced context once the model message is complete
      this.context.addContextItem(item)
      this.runInference(this.environment, this.context, this.model)
    }

Interrupt the run, not the workflow diagram

Picture a planner agent drafting a risky migration while a safety reviewer listens to the stream — not waiting for a pipeline step, not wired inside the planner, not asking a central orchestrator for permission. The planner runs non-blocking inference; the environment fans out semantic events; the reviewer buffers deltas and fires when the plan smells wrong. No “first planner, then reviewer” fiction. Everyone is already in the room. That is the shape pipelines cannot express without turning into spaghetti, which is why so many “multi-agent” products are still dressed-up single-agent loops with extra logos.
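
To make "the environment fans out semantic events" concrete, here is a rough sketch of the idea rather than the real AgenticEnvironment. `MiniEnvironment`, `Recorder`, and the `join` and `emit` methods are hypothetical. The handlers are synchronous here for brevity, and the rule that the emitter gets `onInternalEvent` while everyone else gets `onExternalEvent` is an assumption based on the handler names.

```typescript
// Hypothetical stand-ins for illustration; not the @mozaik-ai/core classes.
interface SemanticEvent<TData = unknown> {
  type: string
  data: TData
}

interface Participant {
  onInternalEvent(event: SemanticEvent): void
  onExternalEvent(source: Participant, event: SemanticEvent): void
}

class MiniEnvironment {
  private participants: Participant[] = []

  join(participant: Participant): void {
    this.participants.push(participant)
  }

  // Assumed rule: the emitter sees its own event as internal, everyone else as external.
  emit(source: Participant, event: SemanticEvent): void {
    for (const participant of this.participants) {
      if (participant === source) participant.onInternalEvent(event)
      else participant.onExternalEvent(source, event)
    }
  }
}

// Records the event types it receives on each lane.
class Recorder implements Participant {
  internal: string[] = []
  external: string[] = []

  onInternalEvent(event: SemanticEvent): void {
    this.internal.push(event.type)
  }

  onExternalEvent(_source: Participant, event: SemanticEvent): void {
    this.external.push(event.type)
  }
}

const env = new MiniEnvironment()
const planner = new Recorder()
const reviewer = new Recorder()
const observer = new Recorder()
for (const participant of [planner, reviewer, observer]) env.join(participant)

// One delta from the planner reaches the reviewer and the observer in the same pass.
env.emit(planner, { type: "response.output_text.delta", data: { delta: "Step 1" } })
```

Nothing in the sketch orders the reviewer after the planner. Both are simply joined to the same environment, which is the point of the paragraph above.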

We ship a runnable version of this in mozaik-examples/inference-interception. PlannerAgent streams a migration plan and passes the signal of an AbortController into runInference. SafetyReviewerAgent watches onExternalEvent, reads deltas from event.data.delta, and when risky phrases show up it calls source.abortCurrentInference(...) so the planner's stream stops, then starts its own corrective inference. RuntimeObserver only logs [event] and [model_message] lines.

[Figure: console output from a real run, showing planner stream events, then the safety reviewer aborting the planner, then corrective inference.]

The planner holds the abort hook; the reviewer holds the interception:

planner-agent.ts
    // Abort hook that other participants can call while the planner is streaming.
    abortCurrentInference(reason?: string) {
      if (!this.inferenceAbort) return
      console.log("[planner] aborting stream", reason ? ` ${reason}` : "")
      this.inferenceAbort.abort()
      this.inferenceAbort = undefined
    }

    private startInference() {
      // Cancel any run still in flight before starting a new one.
      this.inferenceAbort?.abort()
      this.inferenceAbort = new AbortController()
      // Pass the signal through to InferenceRunner; streaming stops when aborted.
      this.runInference(
        this.environment,
        this.context,
        this.model,
        this.inferenceAbort.signal,
      )
    }

safety-reviewer-agent.ts
    async onExternalEvent(source: Participant, event: SemanticEvent) {
      if (!(source instanceof PlannerAgent)) return

      if (event.type === "response.output_text.delta") {
        const payload = event.data as { delta?: string }
        const delta = payload?.delta ?? ""
        this.buffer += delta

        if (!this.intercepted && this.shouldIntercept(this.buffer)) {
          this.intercepted = true
          console.log("[reviewer] risky output detected aborting planner stream")
          source.abortCurrentInference("risky phrases in planner stream")

          this.context.addContextItem(
            UserMessageItem.create(`
              The current migration plan is becoming too risky.
              Intercept now and suggest a safer staged rollout with rollback points.
            `),
          )
          this.runInference(this.environment, this.context, this.model)
        }
      }
    }
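
The reviewer calls shouldIntercept without showing it. One plausible version is a phrase check over the accumulated buffer, sketched below. The phrase list is hypothetical (the example repo's actual list is not shown here), and matching is a plain case-insensitive substring test, which is an assumption rather than the shipped logic.

```typescript
// Hypothetical phrase list; the example repo's actual list is not shown in this post.
const RISKY_PHRASES = ["drop table", "without backup", "skip rollback"]

// Check the accumulated buffer, not the latest delta: a phrase can be split across chunks.
function shouldIntercept(buffer: string, phrases: string[] = RISKY_PHRASES): boolean {
  const text = buffer.toLowerCase()
  return phrases.some((phrase) => text.includes(phrase))
}

// Deltas as they might arrive from the stream; "drop table" spans two of them.
const deltas = ["First, ", "DROP ", "TABLE users ", "and recreate it."]
let buffer = ""
let interceptedAt = -1

deltas.forEach((delta, index) => {
  buffer += delta
  // Record only the first delta at which the check fires.
  if (interceptedAt === -1 && shouldIntercept(buffer)) interceptedAt = index
})
```

This is why the reviewer keeps `this.buffer` instead of testing each delta alone: neither "DROP " nor "TABLE users " matches by itself, and the check only fires once the two are joined.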

The reviewer does not wait for the planner to finish. It does not need a workflow engine step. When the planner eventually emits a completed model message, that is a separate path — durable context, not stream noise. The controversy is blunt: if your framework only streams to the caller, you are not building agents that collaborate; you are building faster monologues.
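
The abort hook relies on the standard AbortController: once `abort()` is called, anything holding the signal can see `signal.aborted` and stop producing. The sketch below shows that mechanism in isolation. `streamChunks` is a hypothetical stand-in for a provider stream, and it is a synchronous generator only to keep the example small; a real inference stream is asynchronous.

```typescript
// Stand-in for a provider stream: yields chunks until the signal is aborted.
function* streamChunks(chunks: string[], signal: AbortSignal): Generator<string> {
  for (const chunk of chunks) {
    // Stop producing as soon as someone has aborted the run.
    if (signal.aborted) return
    yield chunk
  }
}

const controller = new AbortController()
const received: string[] = []
const plan = ["Step 1. ", "Drop the table. ", "Step 3."]

for (const chunk of streamChunks(plan, controller.signal)) {
  received.push(chunk)
  // A listener (the reviewer, in the post) aborts mid-stream.
  if (chunk.includes("Drop")) controller.abort()
}
```

The consumer never sees the third chunk. The producer checks the signal before each yield, so the abort takes effect at the next chunk boundary rather than after the full message.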

From the repo root (with OPENAI_API_KEY in .env): npm run inference-interception or npx tsx inference-interception/index.ts. Requires @mozaik-ai/core 3.10+.

Observers exist because agents get obese

BaseObserver is the other half of the story. Agents should do agent things. Logging, UI fan-out, metrics, audit trails, and token painting belong elsewhere — on participants that watch but do not infer. Without that split, every agent swells into a god object that owns business logic and every side effect you were too lazy to name. v3.10.0 makes the skinny observer explicit:

runtime-observer.ts
    async onExternalEvent(source: Participant, event: SemanticEvent) {
      // Tokens, traces, UI; not business logic
      console.log("[event]", source.constructor.name, event.type)
    }

    async onExternalModelMessage(source: Participant, item: ModelMessageItem) {
      // Durable output, separate from the stream
      console.log("[model_message]", source.constructor.name, item.toJSON())
    }
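
Logging is the simplest observer. As a sketch of how metrics could live on the same kind of participant, here is a counter keyed by source and event type. `EventCounterObserver` and `countFor` are hypothetical, the class does not extend the real BaseObserver, and the handler is synchronous for brevity. The `PlannerAgent` here is an empty stand-in, not the example's agent.

```typescript
// Hypothetical event shape and observer; the real BaseObserver API may differ.
interface SemanticEvent<TData = unknown> {
  type: string
  data: TData
}

class EventCounterObserver {
  private counts = new Map<string, number>()

  // Same handler name as in the post, but synchronous and without a base class.
  onExternalEvent(source: object, event: SemanticEvent): void {
    const key = `${source.constructor.name}:${event.type}`
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1)
  }

  // Number of events of one type seen from one kind of source.
  countFor(sourceName: string, eventType: string): number {
    return this.counts.get(`${sourceName}:${eventType}`) ?? 0
  }
}

// Stand-in for a business agent; it carries no metrics code of its own.
class PlannerAgent {}

const metrics = new EventCounterObserver()
const planner = new PlannerAgent()

for (const delta of ["Step 1", "Step 2", "Step 3"]) {
  metrics.onExternalEvent(planner, { type: "response.output_text.delta", data: { delta } })
}
// Illustrative end-of-stream event type.
metrics.onExternalEvent(planner, { type: "response.completed", data: {} })
```

The agent class stays empty of metrics code, and the counter can be added or removed without touching it.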

Inference interception is the product

The headline feature is streaming. The product is inference interception: another participant responding while the original run is still producing. Live review, safety cuts, speculative collaboration, early handoff — those patterns were always obvious on a whiteboard and always missing in code because the stream never left the caller. We are not claiming streaming is novel. Every stack has it. We are claiming that keeping it out of the environment was a design mistake for anyone serious about multi-agent work — and that pretending context items and stream chunks are the same thing was the second mistake.

Mozaik v3.10.0 is a foundation release. Semantic Events carry runtime signals. Produced-context handlers keep model output clean. Observers absorb visibility so agents stay sharp. And because every participant shares one environment, agents can finally react to each other while inference is still happening — not after the user has already read the bad answer. Streaming as an agentic signal. Everything else is just typing speed.

Miodrag Vilotijević

Co-founder @ JigJoy
