The Agent Loop and Tool Design

The previous post covered what AI engineering is and why making a system good is the hard part. This post is about building one. It goes fast, because it should.

The Agent Loop

Before writing any code, it helps to understand what an agent actually does internally. The loop is the same regardless of what language, framework, or provider you use:

The user sends a message
The agent stores it in message history
The LLM (Large Language Model, the model that reads the conversation and generates the response, such as GPT-4 or Claude) is called with the full message history and a list of available tools
The LLM streams back tokens. If it decides it needs a tool, it streams back the tool name and the arguments it wants passed
The system executes that tool and feeds the result back to the LLM
The loop repeats, calling the LLM again with the updated history, until it decides it has enough information to answer
It streams the final text response back to the user
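The loop above can be sketched in a few lines of TypeScript. This is a minimal, non-streaming sketch. `runAgent`, the `LLM` interface, the message shapes and the scripted `fakeLLM` are all hypothetical stand-ins, not any framework's or provider's API. A real loop would also stream tokens and record the assistant's tool-call message in history.

```typescript
// Minimal, non-streaming agent loop. All types and names here are hypothetical.

type AgentMessage =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolName: string; content: string };

type ToolCall = { name: string; args: Record<string, unknown> };

// One model turn is either a final answer or a request to call a tool.
type ModelTurn = { text: string } | { toolCall: ToolCall };

interface LLM {
  complete(history: AgentMessage[], toolNames: string[]): ModelTurn;
}

type ToolFn = (args: Record<string, unknown>) => string;

function runAgent(
  llm: LLM,
  tools: Record<string, ToolFn>,
  history: AgentMessage[],
  userMessage: string,
  maxSteps = 10,
): string {
  // Store the user's message in history.
  history.push({ role: "user", content: userMessage });
  for (let step = 0; step < maxSteps; step++) {
    // Call the model with the full history and the names of available tools.
    const turn = llm.complete(history, Object.keys(tools));
    if ("text" in turn) {
      // The model has enough context: record and return the final answer.
      history.push({ role: "assistant", content: turn.text });
      return turn.text;
    }
    // The model asked for a tool: our code runs it, not the model.
    const { name, args } = turn.toolCall;
    const tool = tools[name];
    const result = tool ? tool(args) : `Unknown tool: ${name}`;
    // Feed the result back so the next model call can use it.
    history.push({ role: "tool", toolName: name, content: result });
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}

// Scripted stand-in for a real model: ask for the weather once, then answer.
const fakeLLM: LLM = {
  complete(history) {
    const last = history[history.length - 1];
    if (last && last.role === "tool") {
      return { text: `It is ${last.content} in Paris.` };
    }
    return { toolCall: { name: "getWeather", args: { city: "Paris" } } };
  },
};

const chatHistory: AgentMessage[] = [];
const answer = runAgent(
  fakeLLM,
  { getWeather: (args) => (args.city === "Paris" ? "sunny" : "unknown") },
  chatHistory,
  "What's the weather in Paris?",
);
```

The `maxSteps` cap is a design choice added here, not part of the loop described above. It stops a model that keeps requesting tools from looping forever.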

Because this loop is the same for essentially every chat agent, you do not need to build it from scratch. Most frameworks give it to you for free.

The interesting insight is what the tool loop is actually doing: the agent is trying to figure out what context it needs to answer the question. Every tool call is the agent saying "I need more information. Go get this and bring it back to me." When it has enough, it stops and answers.

Everything in an agent is about context. Tool calls are just one mechanism for the agent to gather its own context dynamically.

Ambiguity Is the Feature

A deterministic workflow is the right tool when you already know every possible path a task might take. An agent is the right tool when you do not.

If you are building a coding assistant, every developer on the same project will ask completely different questions. You cannot codify all those paths in advance. You give the system a set of tools and let it decide on its own what to do next based on the user's request.

That ambiguity is the feature. It is also what makes agents unreliable by default. The agent has freedom to choose, and sometimes it chooses wrong. The discipline of AI engineering is making those choices more dependable over time.

What Tools Actually Are

You have probably heard of MCP (Model Context Protocol). MCP is just a standardised protocol for exposing tools to LLM applications. At the end of the day, tools are the same thing regardless of the protocol: descriptions of abilities you give to an LLM.

A tool has three parts:

A description -- what the tool does and when the LLM should use it
An input schema -- the shape of the data the LLM must provide when calling the tool
The function itself -- the actual code that runs when the tool is called
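The three parts map naturally onto a small object. In this sketch the `Tool` interface, the `validateArgs` helper and the flat one-level schema format are hypothetical simplifications. Real frameworks typically express the input schema as JSON Schema or with a validation library such as Zod. Only the description and schema are meant for the LLM; `execute` stays on your side.

```typescript
// A tool bundles three things: a description, an input schema, and a function.
// The Tool shape and the flat schema format are hypothetical simplifications.

type FieldType = "string" | "number" | "boolean";

interface Tool {
  name: string;
  // What the tool does and when the LLM should use it.
  description: string;
  // The shape of the arguments the LLM must provide (field name -> type).
  inputSchema: Record<string, FieldType>;
  // The code that actually runs when the tool is called.
  execute: (args: Record<string, unknown>) => string;
}

// Check the LLM's arguments against the schema before running the function.
function validateArgs(tool: Tool, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [field, type] of Object.entries(tool.inputSchema)) {
    if (typeof args[field] !== type) {
      errors.push(`${field} must be a ${type}`);
    }
  }
  return errors;
}

const getWeather: Tool = {
  name: "getWeather",
  description: "Get the current weather for a city. Use when the user asks about weather.",
  inputSchema: { city: "string" },
  // Stub data; a real tool would call a weather API here.
  execute: (args) => `Weather in ${args.city}: sunny`,
};
```

Validating before executing matters because the arguments are predicted text. Nothing guarantees the model produced the shape you asked for.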

The LLM does not execute the tool. LLMs run inference -- they predict what the next token (chunk of text) should be, one piece at a time. When the LLM decides a tool should be called, it streams back the tool name and the arguments it wants passed. Your system executes the function, gets the result, and feeds it back. You are running errands for the model.
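Because the model only emits a tool name and arguments, your code has to assemble the call and run it. This is a minimal sketch with one main assumption: a hypothetical chunk format in which the tool name arrives first and the JSON arguments arrive as partial string fragments. Some providers stream tool calls roughly this way, but the exact shape differs by provider. `collectAndRunToolCall` is likewise a hypothetical helper.

```typescript
// Hypothetical stream chunk shape: the tool name, or a fragment of the JSON arguments.
type ToolCallChunk = { name?: string; argsFragment?: string };

type ToolFunction = (args: Record<string, unknown>) => unknown;

// Assemble a streamed tool call, then run the matching function ourselves.
function collectAndRunToolCall(
  chunks: ToolCallChunk[],
  functions: Record<string, ToolFunction>,
): unknown {
  let toolName = "";
  let rawArgs = "";
  for (const chunk of chunks) {
    if (chunk.name) toolName = chunk.name;
    // Individual fragments are usually not valid JSON on their own.
    if (chunk.argsFragment) rawArgs += chunk.argsFragment;
  }
  const fn = functions[toolName];
  if (!fn) throw new Error(`Model requested unknown tool: ${toolName}`);
  // Parse only once the whole call has arrived.
  const args = JSON.parse(rawArgs) as Record<string, unknown>;
  return fn(args);
}

const chunks: ToolCallChunk[] = [
  { name: "add" },
  { argsFragment: '{"a": 2, ' },
  { argsFragment: '"b": 3}' },
];
const result = collectAndRunToolCall(chunks, {
  add: (args) => Number(args.a) + Number(args.b),
});
```

In a full agent, `result` would then be serialised and appended to the message history as a tool result before the next model call.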

A clean way to think about it: a tool is a function with extra context attached so the LLM knows when to call it, what to pass, and what to expect back.

What Cloudflare Handles for You

The core problem with running agents on serverless infrastructure (cloud functions that spin up on demand and shut down when idle) is that serverless is stateless by design: each request starts fresh with no memory of the last one. Agents, however, need to maintain chat history and keep long-lived WebSocket connections open for streaming. A WebSocket is a persistent two-way connection between the browser and the server, unlike a normal HTTP request, which ends once the server sends its response.

Cloudflare solves this with Durable Objects: stateful serverless instances, each with its own SQLite-backed storage. This means the agent can store its entire conversation history and hold a WebSocket connection open without a separate database or third-party service.

The practical result:

Chat history managed automatically via the AIChatAgent abstraction
WebSocket connections work out of the box without hitting serverless TTL limits (TTL = time-to-live, the maximum time a function is allowed to run before the cloud provider shuts it down)
Routing handled automatically -- Cloudflare routes requests to the right agent instance by name

If you are not deploying to Cloudflare, none of this is irreplaceable. Storing chat history in a regular database works fine. It is just more setup.
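Outside Cloudflare, the same idea reduces to a simple pattern. Each stateless request loads history by agent name at the start and writes it back at the end. This is a minimal sketch with several assumptions:

- The `ChatStore` interface and `handleRequest` are hypothetical.
- An in-memory `Map` stands in for a real database.
- The methods are synchronous only to keep the example short; a real database client would almost certainly be async.
- The reply string is a placeholder where the LLM call would go.

It does not cover WebSockets.

```typescript
// Hypothetical chat history store; a Map stands in for a real database table.
type ChatMessage = { role: "user" | "assistant"; content: string };

interface ChatStore {
  load(agentName: string): ChatMessage[];
  append(agentName: string, message: ChatMessage): void;
}

class InMemoryChatStore implements ChatStore {
  private rows = new Map<string, ChatMessage[]>();

  load(agentName: string): ChatMessage[] {
    // Return a copy so callers cannot mutate stored history by accident.
    return [...(this.rows.get(agentName) ?? [])];
  }

  append(agentName: string, message: ChatMessage): void {
    const existing = this.rows.get(agentName) ?? [];
    existing.push(message);
    this.rows.set(agentName, existing);
  }
}

// Each request is stateless: everything it knows comes from the store, keyed by name.
function handleRequest(store: ChatStore, agentName: string, userText: string): string {
  store.append(agentName, { role: "user", content: userText });
  const history = store.load(agentName);
  const userTurns = history.filter((m) => m.role === "user").length;
  const reply = `Message ${userTurns} received.`; // placeholder for the real LLM call
  store.append(agentName, { role: "assistant", content: reply });
  return reply;
}

const store = new InMemoryChatStore();
handleRequest(store, "alice", "hi");
const second = handleRequest(store, "alice", "hello again");
const other = handleRequest(store, "bob", "hi");
```

Keying by agent name gives a rough, hand-rolled equivalent of routing requests to the right agent instance by name: "alice" and "bob" never see each other's history.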

Practice
