What Is AI Engineering?

I spent a long time calling APIs, getting back text, and displaying it in a chat UI. That felt like AI work. It was not. It was a wrapper. The system did not improve over time, could not use tools, and could not take actions. It was just a fancy text box.

AI engineering is the discipline of building AI systems that actually do things: systems with memory, tools, evaluation loops, and the ability to take actions on behalf of users. It is a proper skill with real patterns, failure modes, and mental models. And once I started treating it that way, I realised how different it is from everything else I had built before.

Making Something Good Is the Hard Part

Building an agentic system (one where the AI can take actions, use tools, and make decisions on its own) is not hard. SDKs (software development kits -- pre-built code packages that give you a ready-made connection to a service) let you do it in five lines of code. I have done it. The hard part is making the system good, and that job is never done.

Here is what clicked for me: regular software is deterministic -- the same input always gives the same output. When I write a function that adds two numbers, two plus two is always four. I write tests, they pass or they do not, and I ship.

With AI systems, the output is non-deterministic -- the same prompt can produce a different result every time. I might build something and have only 30% of my evals (automated checks that score whether the AI produced a good result) pass on day one. My instinct from years of traditional engineering was: do not ship this. But in AI engineering, you probably will ship it. You need real users, real inputs, and real failures to gather enough information to improve. Waiting for perfect means waiting forever.

So the goal shifts. Instead of asking "is this correct?", I ask "how do I measure quality in a way that lets me track improvement over time?"
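Here is a minimal sketch of that shift in code. It assumes a made-up `fake_model` function that stands in for a real model call by picking at random from a few canned replies, so none of the names below come from a real SDK. The deterministic function gets a single right-or-wrong answer; the model stand-in gets a pass rate that I can track over time.

```python
import random

def add(a: int, b: int) -> int:
    """Deterministic: the same input always gives the same output."""
    return a + b

# Hypothetical canned replies standing in for real model output.
CANNED_REPLIES = ["4", "The answer is 4.", "four", "5"]

def fake_model(prompt: str, rng: random.Random) -> str:
    """Stand-in for a model call: the same prompt can give different replies."""
    return rng.choice(CANNED_REPLIES)

def passes(reply: str) -> bool:
    """A crude check: does the reply contain the right answer?"""
    return "4" in reply or "four" in reply.lower()

def pass_rate(prompt: str, runs: int, seed: int) -> float:
    """Run the same prompt many times and report the fraction that pass."""
    rng = random.Random(seed)
    results = [passes(fake_model(prompt, rng)) for _ in range(runs)]
    return sum(results) / runs

if __name__ == "__main__":
    print(add(2, 2))
    print(f"pass rate: {pass_rate('What is 2 + 2?', runs=200, seed=0):.0%}")
```

The seed is only there to make the sketch repeatable. A real model call would not give me that luxury, which is exactly why the number I watch is a rate rather than a single pass or fail.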

What AI Engineers Actually Do

A question I had early on was: what is the difference between an AI engineer and an ML engineer? They sound like the same job.

ML engineering is focused on training models, managing datasets, and optimising model performance. Research engineers replicate findings from academic papers. Research scientists write those papers. AI engineers sit at the application layer. We take the output of all that and build things people actually use.

Four skills show up again and again in AI engineering job postings: RAG (Retrieval-Augmented Generation -- a technique for giving the AI access to your own data by fetching relevant documents and injecting them into the conversation), evals, agents, and production deployment.
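To make the RAG idea concrete, here is a toy sketch of the fetch-and-inject step. Real systems usually rank documents with embeddings and a vector store. I have swapped that for plain word overlap so the example runs with no dependencies. The documents, `retrieve`, and `build_prompt` are all names I made up for illustration.

```python
import re

# Hypothetical documents standing in for "your own data".
DOCUMENTS = [
    "Refunds are available within 30 days of purchase.",
    "Our support team is online from 9am to 5pm on weekdays.",
    "Shipping to Europe takes five to seven business days.",
]

def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Inject the retrieved documents into the prompt sent to the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    print(build_prompt("How long does shipping to Europe take?", DOCUMENTS))
```

The model never sees the whole document set, only the slice that looks relevant to the question.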

At the application layer, the work breaks down into four areas:

Context engineering. Tokens are the currency of AI systems. A token is roughly three-quarters of a word, though the exact split depends on the model's tokenizer: a longer word like "hamburger" is often broken into two or three tokens, while a short, common word like "eating" is usually one. Every token you send costs compute (processing power on a server), and at scale that becomes real energy cost. The discipline is sending the right tokens to the model at the right time. The system prompt is one of the main tools for this: it is where you put things that do not change often but are essential for the agent to behave dependably, like fixed instructions, personality constraints, and decision-making logic.
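A rough sketch of what "the right tokens at the right time" can look like in code. It assumes the three-quarters-of-a-word rule of thumb as a stand-in for a real tokenizer, and a simple policy I chose for illustration: always keep the system prompt, then add the newest messages that fit the budget. `estimate_tokens` and `build_context` are hypothetical names.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb only; real tokenizers will differ

def estimate_tokens(text: str) -> int:
    """Estimate the token count from the word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)

def build_context(system_prompt: str, messages: list[str], budget: int) -> list[str]:
    """Always keep the system prompt, then add the newest messages that fit."""
    remaining = budget - estimate_tokens(system_prompt)
    kept: list[str] = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > remaining:
            break  # stop at the first message that does not fit
        kept.append(message)
        remaining -= cost
    return [system_prompt] + list(reversed(kept))  # restore original order

if __name__ == "__main__":
    history = ["first message here", "second message here", "third message here"]
    print(build_context("Be brief.", history, budget=11))
```

Dropping the oldest messages first is only one possible policy. Summarising old turns or retrieving only the relevant ones are others.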

Tool design. Agents act through tools. Designing them well means giving the agent the right abilities and keeping it away from the wrong ones. An agent that fires the wrong tool at the wrong time can cause real problems.
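One way to sketch "the right abilities, and not the wrong ones" is an explicit registry: the agent can only call what has been registered, and anything else comes back as a readable error instead of running. Both tools and the `call_tool` helper are hypothetical, and real SDKs each have their own way of declaring tools.

```python
from typing import Callable

def get_order_status(order_id: str) -> str:
    """Read-only tool (hypothetical): safe for the agent to call freely."""
    return f"Order {order_id} is in transit."

def delete_account(user_id: str) -> str:
    """Destructive tool (hypothetical): deliberately not exposed to the agent."""
    return f"Account {user_id} deleted."

# Only tools listed in this registry are visible to the agent.
AGENT_TOOLS: dict[str, Callable[[str], str]] = {
    "get_order_status": get_order_status,
}

def call_tool(name: str, argument: str) -> str:
    """Run a registered tool, or return an error the model can read."""
    tool = AGENT_TOOLS.get(name)
    if tool is None:
        return f"Error: '{name}' is not an available tool."
    return tool(argument)

if __name__ == "__main__":
    print(call_tool("get_order_status", "42"))
    print(call_tool("delete_account", "user-1"))
```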

Evaluation. How do you know if your agent is doing a good job? You build evals. Without them you are flying blind. Every change you make could be helping or hurting and you would have no way to tell.
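At its simplest, an eval is a list of cases plus a scoring rule. The sketch below assumes a toy agent that knows one fact and a deliberately crude rule (the reply must contain an expected string). Real evals often need fuzzier scoring than that. `EvalCase`, `toy_agent`, and `run_evals` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # the reply passes if it contains this text

def toy_agent(prompt: str) -> str:
    """Hypothetical agent: it only knows one fact."""
    if "capital of France" in prompt:
        return "The capital of France is Paris."
    return "I am not sure."

def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose reply contains the expected text."""
    passed = 0
    for case in cases:
        reply = agent(case.prompt)
        ok = case.must_contain.lower() in reply.lower()
        print(f"{'PASS' if ok else 'FAIL'}: {case.prompt}")
        passed += ok
    return passed / len(cases)

CASES = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is the capital of Japan?", "Tokyo"),
]

if __name__ == "__main__":
    print(f"score: {run_evals(toy_agent, CASES):.0%}")
```

The score itself matters less than having one at all: once it exists, every change to the agent can be checked against it.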

Production reliability. This covers self-healing when things break, giving users visibility into what the agent is doing, handling latency, and making errors understandable.
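A small sketch of two of those ideas together: retrying a failed step with a growing delay (a basic form of self-healing), and falling back to a message a user can understand instead of a stack trace. The function names, the fallback wording, and the delays are all assumptions for illustration.

```python
import time
from typing import Callable

FALLBACK = "Sorry, I could not finish that request. Please try again in a moment."

def call_with_retries(
    action: Callable[[], str], attempts: int = 3, base_delay: float = 0.5
) -> str:
    """Retry a failing action with exponential backoff, then fall back."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            # Log each failure so there is visibility into what went wrong.
            print(f"attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
    return FALLBACK  # an understandable error for the user

def make_flaky(failures: int) -> Callable[[], str]:
    """Hypothetical action that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def action() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("upstream timed out")
        return "done"

    return action

if __name__ == "__main__":
    print(call_with_retries(make_flaky(2), base_delay=0.1))
```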

The Build-Eval-Improve Loop

Figure: The build-eval-improve loop -- the core workflow of AI engineering

Building an agent is one lesson. Making it dependable is the rest of the course. The process follows a loop:

1. Build the agent
2. Eval it to get a baseline (it will be bad)
3. Improve it based on what the evals tell you
4. Eval again
5. Repeat
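The steps above can be sketched as a loop over revisions of an agent. This is a toy under loud assumptions: each "revision" is a lookup table I wrote by hand, the eval set has three arithmetic prompts, and the loop stops when the list runs out, which the real one never does. The point is the shape: score the baseline, score every change, and only keep a change that beats the best score so far.

```python
from typing import Callable

Agent = Callable[[str], str]

# Hypothetical eval set: prompt -> text the reply must contain.
CASES = {"2 + 2": "4", "3 + 3": "6", "10 - 7": "3"}

def evaluate(agent: Agent) -> float:
    """Fraction of eval cases the agent answers correctly."""
    passed = sum(expected in agent(prompt) for prompt, expected in CASES.items())
    return passed / len(CASES)

def make_agent(answers: dict[str, str]) -> Agent:
    """Build a toy agent that only knows the answers it was given."""
    return lambda prompt: answers.get(prompt, "unknown")

def improve_loop(revisions: list[Agent]) -> tuple[Agent, list[float]]:
    """Eval the first build as a baseline, then keep only real improvements."""
    best = revisions[0]
    best_score = evaluate(best)
    history = [best_score]
    for candidate in revisions[1:]:
        score = evaluate(candidate)
        history.append(score)
        if score > best_score:
            best, best_score = candidate, score
    return best, history

REVISIONS = [
    make_agent({}),                            # first build: knows nothing
    make_agent({"2 + 2": "4"}),                # an improvement
    make_agent({"2 + 2": "5"}),                # a regression the evals catch
    make_agent({"2 + 2": "4", "3 + 3": "6"}),  # another improvement
]

if __name__ == "__main__":
    best, history = improve_loop(REVISIONS)
    print([f"{score:.0%}" for score in history])
```

Without the history list, the regression in the third revision would have looked like any other change.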

This loop never ends. Teams at large AI companies have engineers whose entire job is improving one specific part of the system: tool selection, token efficiency, safety, human-in-the-loop behaviour (where a human reviews or approves an action before the AI continues). The work is never finished because the output is non-deterministic.
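Human-in-the-loop behaviour is the easiest of those to show in a few lines. The sketch below assumes a hypothetical set of risky action names and an `approve` callback that stands in for however a real reviewer would be asked (a button, a chat message, a ticket). Safe actions run straight away, while risky ones wait for a yes.

```python
from typing import Callable

# Hypothetical names for actions that need a human sign-off.
RISKY_ACTIONS = {"send_email", "issue_refund"}

def execute(action: str, approve: Callable[[str], bool]) -> str:
    """Run safe actions directly; ask a human before running risky ones."""
    if action in RISKY_ACTIONS and not approve(action):
        return f"'{action}' was blocked: a human reviewer declined it."
    return f"'{action}' executed."

if __name__ == "__main__":
    always_decline = lambda action: False
    print(execute("lookup_order", always_decline))
    print(execute("issue_refund", always_decline))
```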

The hardest part of this loop is figuring out what to measure and how to score it. Good metrics give you signal; bad metrics give you noise. That is where most of the real work in AI engineering lives.

The 12-Factor Agents Framework

When I was first building agents seriously, I was developing a mental model for how to think about them but struggling to put it into words. I came across a framework called 12-Factor Agents, built on the same idea as the classic Twelve-Factor App methodology. A lot of what was written there matched what I had already been working through, so it helped me articulate things I had not been able to name yet.

It covers ideas like owning the control flow, letting the model handle ambiguity gracefully, using evals, and failing safely. Most people have probably never heard of it. It is not an industry standard. But it is worth reading if you want a framework for what a well-designed agent looks like before you start building.
