Wiring the Agent and Controlling Flow

The last post covered tools and the system prompt. Now we wire everything into a working agent.

The Agent Class

With the system prompt and tools ready, the agent class is just a few lines:

TypeScript

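import { AIChatAgent } from '@cloudflare/ai-chat';
import { streamText, convertToModelMessages, stepCountIs } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { tools } from './tools';
import { systemPrompt } from './system-prompt'; // assumed path: the prompt from the previous post

// Env is the Worker's bindings type (OPENAI_API_KEY, the DesignAgent namespace),
// assumed here to be a global declaration, e.g. one generated by `wrangler types`.
export class DesignAgent extends AIChatAgent<Env> {
  async onChatMessage() {
    const openai = createOpenAI({ apiKey: this.env.OPENAI_API_KEY });

    // streamText returns right away; the result streams as the model generates.
    const result = streamText({
      model: openai('gpt-4.1-mini'),
      system: systemPrompt,
      // Convert the stored UI messages into the model messages streamText expects.
      messages: await convertToModelMessages(this.messages),
      tools,
      // Run the tool loop (LLM -> tools -> LLM ...) for at most 5 steps.
      stopWhen: stepCountIs(5),
      providerOptions: {
        // Deliberately off for the baseline; see below.
        openai: { strictJsonSchema: false },
      },
    });

    // Stream text, tool calls, and tool results back to the client.
    return result.toUIMessageStreamResponse();
  }
}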

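convertToModelMessages -- the durable object stores the conversation as UI messages, the format the chat client renders. streamText expects model messages instead. This converts between them, and the OpenAI provider then handles the final translation into OpenAI's API format.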

this.messages -- the full conversation history from the durable object. No database queries needed.

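stopWhen: stepCountIs(5) -- this is what enables the internal agent loop. Without it, streamText makes a single LLM call: any tool calls from that step still execute, but their results are never sent back to the model. With it, the AI SDK runs the full tool loop: call the LLM, check for tool calls, execute the tools, feed the results back, and repeat until the model stops calling tools or 5 steps have run.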

strictJsonSchema: false -- deliberately turns off OpenAI's strict mode, which otherwise guarantees that tool-call arguments match the declared JSON schema. The agent is intentionally set up to perform poorly so there is a clear baseline to measure against. Turning strict mode back on is one of the first improvements.

The model is gpt-4.1-mini: fast, cheap, and not very accurate. That is by design. Swapping in a stronger model is a nearly free quality improvement (often a one-line change), and it will be one of the first things the evals confirm is worth doing.
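To make the tool loop concrete, here is a minimal, SDK-free sketch of what stopWhen: stepCountIs(5) asks the AI SDK to do. The types, runToolLoop, the fake model, and the addElements tool are hypothetical stand-ins, not the AI SDK's internals; the real loop also streams, validates tool inputs, and tracks usage.

```typescript
// Hypothetical, simplified shapes; the AI SDK's real message and tool types differ.
type Message = { role: "user" | "assistant" | "tool"; content: string };
interface ToolCall { name: string; args: unknown }
interface ModelTurn { text?: string; toolCalls: ToolCall[] }
type Model = (messages: Message[]) => Promise<ModelTurn>;
type Tools = Record<string, (args: unknown) => Promise<unknown>>;

// Call the model, run any requested tools, feed the results back, and repeat
// until the model stops asking for tools or maxSteps model calls have been made.
async function runToolLoop(model: Model, tools: Tools, messages: Message[], maxSteps: number) {
  const history: Message[] = [...messages];
  let steps = 0;
  while (steps < maxSteps) {
    steps++;
    const turn = await model(history);
    history.push({ role: "assistant", content: JSON.stringify(turn) });
    if (turn.toolCalls.length === 0) break; // a step without tool calls ends the loop

    for (const call of turn.toolCalls) {
      const tool = tools[call.name];
      const result = tool ? await tool(call.args) : { error: `unknown tool: ${call.name}` };
      history.push({ role: "tool", content: JSON.stringify(result) });
    }
  }
  return { steps, history };
}

// Fake model for illustration: requests one tool call, then answers once it sees a tool result.
const fakeModel: Model = async (messages) =>
  messages.some((m) => m.role === "tool")
    ? { text: "Added a rectangle.", toolCalls: [] }
    : { toolCalls: [{ name: "addElements", args: { elements: [{ type: "rectangle" }] } }] };

// Hypothetical tool that always succeeds.
const fakeTools: Tools = { addElements: async () => ({ ok: true }) };
```

With maxSteps set to 5, the fake model finishes in two steps. With maxSteps set to 1, the tool still runs but its result never reaches the model, which is roughly what happens without stopWhen: one model call and no follow-up.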

Connecting the Worker

The agent class handles logic. The worker file handles routing. Cloudflare routes incoming WebSocket requests to the right agent instance by name:

TypeScript

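import { DesignAgent } from './agent';
import { routeAgentRequest } from 'agents';

interface Env {
  DesignAgent: DurableObjectNamespace; // durable object binding from wrangler.toml
  OPENAI_API_KEY: string; // from .dev.vars locally, a secret in production
}

// Durable object classes must be exported from the Worker's main module.
export { DesignAgent };

export default {
  async fetch(request: Request, env: Env) {
    // Hand agent requests to the matching agent instance; anything else is a 404.
    return (
      (await routeAgentRequest(request, env)) ??
      new Response('Not found', { status: 404 })
    );
  },
};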

routeAgentRequest inspects the incoming request, matches it to the right agent and instance by name, and hands it off. If nothing matches, the worker falls back to a 404. The WebSocket connection and session management happen inside the durable object automatically.

The wrangler.toml needs two additions plus a compatibility flag:

TOML

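# The agents SDK and its dependencies rely on Node.js built-in APIs
compatibility_flags = ["nodejs_compat"]

# Exposes the DesignAgent durable object class to the worker as env.DesignAgent
[[durable_objects.bindings]]
name = "DesignAgent"
class_name = "DesignAgent"

# Registers DesignAgent as a SQLite-backed durable object, where its state (including messages) lives
[[migrations]]
tag = "v1"
new_sqlite_classes = ["DesignAgent"]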

Environment Variables in Cloudflare Workers

Cloudflare passes environment variables and bindings to the fetch handler as its second argument, env; Workers code reads them from there rather than from process.env. Inside the AIChatAgent class the same values are available on this.env. Values come from .dev.vars locally; in production, set them in the Cloudflare dashboard or with wrangler secret put.

The agent URL in the test script includes an agent instance name in the path. That name is how Cloudflare routes to the right durable object instance. In production this would typically be a user ID, so each user gets their own isolated agent and conversation history.
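As a rough sketch of how such a URL is put together: the Cloudflare Agents SDK's routeAgentRequest is assumed here to match paths of the form /agents/<agent>/<instance>, where <agent> is the kebab-cased class name (DesignAgent becomes design-agent). The helpers toKebabCase and agentUrl, the host, and the user ID are hypothetical; check the SDK docs for your version before relying on the exact pattern.

```typescript
// Hypothetical helper: "DesignAgent" -> "design-agent".
function toKebabCase(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Hypothetical helper: builds a WebSocket URL in the assumed
// /agents/<kebab-case-agent>/<instance-name> shape.
function agentUrl(host: string, agentClass: string, instanceName: string, secure = true): string {
  const protocol = secure ? "wss" : "ws";
  // The instance name selects the durable object; encoding keeps it in a single path segment.
  return `${protocol}://${host}/agents/${toKebabCase(agentClass)}/${encodeURIComponent(instanceName)}`;
}

// One isolated agent (and conversation history) per user, keyed by a placeholder user ID.
const localUrl = agentUrl("localhost:8787", "DesignAgent", "user-123", false);
// localUrl === "ws://localhost:8787/agents/design-agent/user-123"
```

Keying the instance name on a stable user ID means reconnecting from another tab or device lands on the same durable object and the same stored messages.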

What the Stream Actually Looks Like

When you run the agent and ask it to draw something, the terminal shows a stream of deltas -- small incremental updates, each carrying a few tokens of the response as they are generated. Combined, they produce complete Excalidraw JSON.

The stream includes the tool calls too: a start-step event, a tool-input-start event with the tool name and call ID, then tool-input-delta events whose text pieces together form the JSON argument the LLM is generating. If you watch carefully, you can see the LLM produce the elements array field by field.
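A minimal sketch of reassembling those tool-input deltas on the client. The Chunk type is a simplified, assumed subset modeled on the AI SDK v5 UI message stream chunk names (real chunks carry more fields and there are more types), and addElements is a hypothetical tool name.

```typescript
// Assumed, simplified subset of stream chunks.
type Chunk =
  | { type: "start-step" }
  | { type: "tool-input-start"; toolCallId: string; toolName: string }
  | { type: "tool-input-delta"; toolCallId: string; inputTextDelta: string }
  | { type: "text-delta"; id: string; delta: string };

interface ToolCallBuffer {
  toolName: string;
  inputText: string; // raw JSON text accumulated so far
}

// Accumulate tool-input text per call ID so parallel tool calls do not mix.
function reassembleToolInputs(chunks: Chunk[]): Map<string, ToolCallBuffer> {
  const calls = new Map<string, ToolCallBuffer>();
  for (const chunk of chunks) {
    switch (chunk.type) {
      case "tool-input-start":
        calls.set(chunk.toolCallId, { toolName: chunk.toolName, inputText: "" });
        break;
      case "tool-input-delta": {
        const call = calls.get(chunk.toolCallId);
        if (call) call.inputText += chunk.inputTextDelta;
        break;
      }
      default:
        break; // text and step chunks are rendered elsewhere
    }
  }
  return calls;
}

// Example stream: the JSON argument arrives split across two deltas.
const sample: Chunk[] = [
  { type: "start-step" },
  { type: "text-delta", id: "t1", delta: "Drawing a box." },
  { type: "tool-input-start", toolCallId: "call_1", toolName: "addElements" },
  { type: "tool-input-delta", toolCallId: "call_1", inputTextDelta: '{"elements":[{"type":' },
  { type: "tool-input-delta", toolCallId: "call_1", inputTextDelta: '"rectangle","x":10}]}' },
];
const calls = reassembleToolInputs(sample);
```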

This is the same pattern that ChatGPT-style interfaces are built on. The "thinking" text, the "calling weather tool" flash, the streaming response: all of it is a UI layered on top of this kind of delta stream. You reassemble the partial JSON as it arrives and render it progressively.
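Rendering progressively means parsing JSON that is not finished yet. One naive, hand-rolled approach (not the method any particular SDK uses): close any open string, object, and array, try JSON.parse, and if that fails, drop trailing characters until it succeeds. The names closersFor and parsePartialPrefix are hypothetical, and the retry loop is quadratic in the worst case, which is fine for a sketch but not for large payloads.

```typescript
// Returns the characters needed to close whatever string/object/array a JSON prefix left open.
function closersFor(prefix: string): string {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < prefix.length; i++) {
    const ch = prefix[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  return (inString ? '"' : "") + stack.reverse().join("");
}

// Best-effort parse of an incomplete JSON document; undefined if nothing parses yet.
function parsePartialPrefix(text: string): unknown {
  for (let cut = text.length; cut > 0; cut--) {
    const prefix = text.slice(0, cut);
    try {
      return JSON.parse(prefix + closersFor(prefix));
    } catch {
      // Dangling key, colon, or comma: drop one more trailing character and retry.
    }
  }
  return undefined;
}

// Simulated deltas: after each one, render whatever structure is parseable so far.
const deltas = ['{"elements":[{"type":', '"rectangle","x":1', '0}]}'];
let buffer = "";
for (const d of deltas) {
  buffer += d;
  console.log(JSON.stringify(parsePartialPrefix(buffer)));
}
// {"elements":[{}]}
// {"elements":[{"type":"rectangle","x":1}]}
// {"elements":[{"type":"rectangle","x":10}]}
```

Note the middle frame: a number cut mid-stream briefly shows the wrong value (1 instead of 10), so a real renderer may prefer to hold back the last, still-growing field.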

The Agent Is Done

The output will be rough. Shapes misaligned. Labels missing. The agent has no idea what is already on the canvas. That is expected. The agent took one lesson to build. The rest of the course is the work of making it dependable.

Controlling the Flow

Right now the agent uses the default tool loop: pick a tool, run it, feed the result back, repeat. This is the simplest architecture and works for demos. It is not always right for production.

The 12 Factor Agents framework addresses this directly: one of its core principles is owning your control flow. By relying on the default loop, you have handed control to the SDK. That is fine to start. A later lesson covers taking control back and building a custom loop when evals show the default one is not good enough.

There are many ways to control agent flow:

Chain of Thought -- prompt the model to reason step by step before acting (like asking someone to "show their work" before giving an answer)
ReAct (Reason + Act) -- interleave reasoning and tool use explicitly, so the agent thinks aloud before each action
Tree of Thought -- explore multiple reasoning paths before committing, like a chess player considering several moves ahead
Swarms -- multiple independent agents working in parallel and sharing results
Handoffs -- a main agent spins up specialist sub-agents and delegates specific tasks to them

These are architectural decisions about who is in control and when. The right choice depends on what evals show is failing. Start simple, measure, and only reach for complexity when the data says you need it.

Staying current on AI research is part of the job here in a way it never was with regular software. When a lab publishes a paper about a problem they solved, there is a real chance you have the same problem. Your evals will confirm or rule it out. If the data matches, try the technique, write a new eval subset, and measure the result.