MCP Clients & Running Models Locally

MCP clients connect to your servers. Claude Desktop supports the full spec. Tome supports tools only. Ollama runs models locally. Here's how to pick your setup and what each client actually supports.

April 1, 2026 · 8 min read

Before you write a single line of MCP server code, you need to understand what a client is -- because everything you build gets consumed by one.

The key distinction:

  • MCP Server -- what you build. Exposes tools, resources, and prompts.
  • MCP Client -- what connects to your server. Talks to the LLM and feeds it your tools.

The Main Clients

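Client | Full MCP support? | Tools | Resources | Prompts | Notes
--- | --- | --- | --- | --- | ---
Claude Desktop | ✅ | ✅ | ✅ | ✅ | Best for learning -- full spec
Cursor | ✅ | ✅ | ✅ | ✅ | Best for coding workflows
VS Code (Agent Mode) | ✅ | ✅ | ✅ | ✅ | Free, solid
Claude Code | ✅ | ✅ | ✅ | ✅ | CLI tool, great for vibe coding
Tome | Partial | ✅ | ❌ | ❌ | Tools only -- open source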
ℹ️ Course recommendation: Use Claude Desktop while learning. It supports the full MCP spec -- tools, resources, and prompts. When you reach the resources and prompts sections, Tome won't work for those parts. Once you're comfortable, try Cursor or Claude Code for real coding workflows.

Why Not Just Use Claude on the Web?

Good question. The answer is transport.

When you build an MCP server locally and run it on your machine, it communicates via stdio (standard input/output). The client launches your server as a child process, and the two exchange messages over that process's stdin and stdout pipes. There's no URL, no port, just a process running on your computer.

Claude.ai (the website) can't reach a process running on your local machine. It would need a public URL.

Claude Desktop is a native app. It can spawn local processes, read their output, and communicate directly. That's what makes local MCP development possible without deploying anything.
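The exchange described above can be sketched in JavaScript. This is a minimal sketch that assumes the stdio framing the MCP spec defines: one JSON-RPC message per line, with no embedded newlines. `encodeMessage`, `createLineDecoder`, and `launchServer` are hypothetical helpers, not part of the official SDK. A real client, or the SDK, also performs the `initialize` handshake before asking for tools.

```javascript
// Sketch of the client side of the MCP stdio transport:
// one JSON-RPC message per line, no embedded newlines.

// Serialize a JSON-RPC message into a single newline-terminated line.
function encodeMessage(message) {
  // JSON.stringify without an indent argument emits no raw newlines;
  // newlines inside string values are escaped as \n.
  return JSON.stringify(message) + "\n";
}

// Build a decoder that accepts arbitrary stdout chunks and calls
// onMessage once for each complete line it has buffered.
function createLineDecoder(onMessage) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    let newlineIndex;
    while ((newlineIndex = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newlineIndex).trim(); // trim also drops \r
      buffer = buffer.slice(newlineIndex + 1);
      if (line) onMessage(JSON.parse(line));
    }
  };
}

// How a client might launch a local server as a child process.
// The command and args are up to the caller (hypothetical here).
async function launchServer(command, args, onMessage) {
  const { spawn } = await import("node:child_process");
  // stdin/stdout carry protocol messages; stderr is left for server logs.
  const child = spawn(command, args, { stdio: ["pipe", "pipe", "inherit"] });
  child.stdout.setEncoding("utf8");
  child.stdout.on("data", createLineDecoder(onMessage));
  return {
    send: (message) => child.stdin.write(encodeMessage(message)),
    close: () => child.kill(),
  };
}
```

For example, `launchServer("node", ["server.js"], console.log)` would start a hypothetical `server.js` and log every JSON-RPC message it writes to stdout. The decoder buffers partial chunks because a pipe can split one message across several `data` events.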


Tome -- The Open Source Alternative

Tome is an open-source MCP client inspired by Claude Desktop. The main advantage: it works with Ollama (local models), so you can use it completely free.

What Tome supports:

  • ✅ Tools (the most important part of this course)
  • ❌ Resources
  • ❌ Prompts

When to use it:

  • You want a free, open-source option
  • You're running Ollama models locally
  • You only need tool-calling (which covers most real-world MCP use)

If you're following along with just Tome, treat the resources and prompts sections as read-only: you can follow the theory, but you can't test those features in Tome.
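At the protocol level, "tools only" means the client only ever sends your server the tool-related requests. Here is a small sketch of that mapping. It assumes the core MCP request method names (`tools/list`, `tools/call`, `resources/list`, `resources/read`, `prompts/list`, `prompts/get`) and leaves out a few extra ones such as resource templates and subscriptions. `METHODS_BY_FEATURE` and `methodsUsed` are illustrative names, not SDK APIs.

```javascript
// Core MCP request methods behind each server feature.
// A tools-only client such as Tome would only need the first group.
const METHODS_BY_FEATURE = {
  tools: ["tools/list", "tools/call"],
  resources: ["resources/list", "resources/read"],
  prompts: ["prompts/list", "prompts/get"],
};

// Return the feature methods a client with the given support would send.
function methodsUsed(support) {
  return Object.entries(METHODS_BY_FEATURE)
    .filter(([feature]) => support[feature])
    .flatMap(([, methods]) => methods);
}

// A Tome-like client vs a Claude Desktop-like client.
console.log(methodsUsed({ tools: true, resources: false, prompts: false }));
console.log(methodsUsed({ tools: true, resources: true, prompts: true }));
```

Your server can still implement resources and prompts. A tools-only client just never asks for them.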


Ollama -- Run Models Locally, Free

Ollama is a local model manager. Install it once, pull any open-source model, run it on your hardware. No API key, no per-token cost, no cloud.

Bash
# Install Ollama (Mac: brew install ollama, or download from ollama.com)
# Then pull a model:
ollama pull qwen3:8b

# List installed models
ollama list

# Run a model interactively
ollama run qwen3:8b

Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1, so any client that can talk to OpenAI can point at a local model instead.
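Because the endpoint speaks the OpenAI chat format, the `fetch` built into Node 18+ is enough to reach it. This is a minimal sketch that assumes Ollama is running on its default port and that `qwen3:8b` has been pulled. `buildChatRequest` and `chat` are illustrative helpers, not part of any library.

```javascript
// Base URL from the text; assumes a local Ollama on its default port.
const OLLAMA_BASE_URL = "http://localhost:11434/v1";

// Build an OpenAI-style chat completion request body.
function buildChatRequest(model, userText) {
  return {
    model,
    messages: [{ role: "user", content: userText }],
  };
}

// POST the request and return the assistant's reply text.
async function chat(model, userText) {
  const res = await fetch(`${OLLAMA_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, userText)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

With Ollama running, `chat("qwen3:8b", "Say hello").then(console.log)` should print the model's reply. No API key is involved, because the request never leaves your machine.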

⚠️ Hardware matters a lot. Running a model locally is GPU/RAM-intensive. The numbers in model names (0.6B, 8B, 70B) are parameter counts. Within the same series, a bigger number generally means a more capable model, but not proportionally: an 8B Qwen model has about 13x the parameters of a 0.6B Qwen model without being "13x smarter". Across series, the numbers aren't comparable at all.

What matters for your hardware is how much VRAM you have. Apple Silicon Macs use unified memory that the CPU and GPU share, so total RAM is what counts there, and 24GB is a comfortable amount. On Windows/Linux, dedicated GPU VRAM is what counts.

Hardware | Recommended model | Notes
--- | --- | ---
8–12GB VRAM | Qwen 3: 0.6B | Fast, basic results
16GB VRAM | Qwen 3: 8B | Good balance of speed + quality
24GB+ VRAM (or Mac M2/M3) | Qwen 3: 8B or 14B | Comfortable choice
Gaming PC with RTX 4090 | Any 30B+ model | Go wild
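Where these VRAM numbers come from can be estimated on the back of an envelope: weight memory ≈ parameters × bits per weight ÷ 8 bytes. This sketch assumes that formula and ignores the KV cache, context length, and runtime overhead, so real usage will be higher. `estimateWeightsGB` is a hypothetical helper.

```javascript
// Rough lower bound on the memory needed just to hold a model's weights.
// Ignores KV cache, context length, and runtime overhead.
function estimateWeightsGB(paramsInBillions, bitsPerWeight) {
  const bytes = paramsInBillions * 1e9 * (bitsPerWeight / 8);
  return bytes / 1e9; // decimal gigabytes
}

console.log(estimateWeightsGB(8, 16)); // 16 (unquantized 16-bit weights)
console.log(estimateWeightsGB(8, 4)); // 4 (4-bit quantized weights)
```

This arithmetic shows why quantization matters. Ollama's default tags are typically 4-bit quantized builds, which is how an 8B model fits comfortably on a 16GB card.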

Tool-Calling Support

Not every model supports tool calling. You need a model that's been trained to use tools -- otherwise it'll ignore your MCP server entirely.

Brian's recommendation: use Qwen 3 models while experimenting with Ollama. The 8B model handles single-tool calls very reliably. The 0.6B handles them for simple cases.
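Concretely, the client offers tools to the model in the request and reads back any tool calls the model makes. This is a sketch that assumes Ollama's OpenAI-compatible `/v1/chat/completions` endpoint and the OpenAI `tools` format, where an MCP tool's `inputSchema` becomes the function's `parameters`. `toOpenAITool`, `buildToolRequest`, `extractToolCalls`, and the `get_weather` tool are all hypothetical.

```javascript
// Convert one tool from an MCP tools/list result into the
// OpenAI-style shape accepted by /v1/chat/completions.
function toOpenAITool(mcpTool) {
  return {
    type: "function",
    function: {
      name: mcpTool.name,
      description: mcpTool.description ?? "",
      parameters: mcpTool.inputSchema, // JSON Schema for the arguments
    },
  };
}

// Build a chat request that offers the converted tools to the model.
function buildToolRequest(model, userText, mcpTools) {
  return {
    model,
    messages: [{ role: "user", content: userText }],
    tools: mcpTools.map(toOpenAITool),
  };
}

// Pull tool calls out of an OpenAI-style response. Arguments arrive as
// a JSON string, so they are parsed here. An empty array means the
// model answered in plain text instead of calling a tool.
function extractToolCalls(response) {
  const calls = response.choices?.[0]?.message?.tool_calls ?? [];
  return calls.map((call) => ({
    name: call.function.name,
    args: JSON.parse(call.function.arguments),
  }));
}

// Hypothetical MCP tool, as it might appear in a tools/list result.
const weatherTool = {
  name: "get_weather",
  description: "Get the current weather for a city",
  inputSchema: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
};
```

A model trained for tool use should answer "What's the weather in Paris?" with a `get_weather` call. A weaker model may just reply in text, which is the "ignores your MCP server" failure described above.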

ℹ️ Ollama Turbo: Ollama recently introduced a hosted mode where models run in the cloud instead of locally. That defeats the "local and free" purpose, so we're ignoring it. We want local. Stick to local pulls.

Understanding "Billions of Parameters"

When you see model names like "Qwen 3: 8B" or "Llama 3: 70B", the "B" stands for billions of parameters. Here's the mental model Brian uses:

Think back to the Intel vs AMD processor wars. AMD had slower clock speeds in MHz -- but their chips performed comparably to Intel's faster-clocked ones. People kept comparing MHz directly, which was wrong.

Same thing here. You can only compare parameter counts within the same model family. Qwen 3: 0.6B vs Qwen 3: 8B is a meaningful comparison. Qwen 3: 8B vs Phi 4 Mini: 3.8B is comparing apples and cars.
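The "compare only within a family" rule can be written as a small guard. This is a sketch that assumes Ollama-style tags of the form `family:<size>b`. Treating everything before the colon as the family is a simplification, and `phi4-mini:3.8b` is used only as an illustrative tag. `parseModelTag` and `compareWithinFamily` are hypothetical helpers.

```javascript
// Parse a tag like "qwen3:8b" into { family, billions }.
// Tags that don't match "<family>:<number>b" return null.
function parseModelTag(tag) {
  const match = /^([^:]+):(\d+(?:\.\d+)?)b$/i.exec(tag);
  if (!match) return null;
  return { family: match[1].toLowerCase(), billions: Number(match[2]) };
}

// Compare sizes only when both tags are in the same family.
// Returns -1, 0, or 1 within a family, and null across families.
function compareWithinFamily(tagA, tagB) {
  const a = parseModelTag(tagA);
  const b = parseModelTag(tagB);
  if (!a || !b || a.family !== b.family) return null; // apples and cars
  return Math.sign(a.billions - b.billions);
}

console.log(compareWithinFamily("qwen3:0.6b", "qwen3:8b")); // -1
console.log(compareWithinFamily("qwen3:8b", "phi4-mini:3.8b")); // null
```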


The Temperature Parameter

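When you work with LLMs -- especially through Ollama -- you'll encounter temperature. Temperature controls how random the model's sampling is. At low values it almost always picks its most likely next token, while higher values give less likely tokens a bigger share of the probability. This is worth understanding for MCP because it affects tool selection.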


For tool selection, higher temperature means the model might occasionally decide not to use a tool it has available -- even when it should. Many clients use a temperature somewhere around 0.7–0.8.

If you see your model ignoring an obvious tool, lowering the temperature is one thing to try.
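The effect can be shown with the standard temperature-scaled softmax. The model's raw scores z_i become probabilities p_i = exp(z_i / T) / Σ_j exp(z_j / T). This is a toy sketch: the two scores are made up, and real tool selection spreads over many tokens rather than one binary "call the tool or not" choice.

```javascript
// Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T).
// Lower T sharpens the distribution; higher T flattens it.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((z) => z / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((z) => Math.exp(z - max));
  const sum = exps.reduce((acc, x) => acc + x, 0);
  return exps.map((x) => x / sum);
}

// Made-up scores for two options: "call the tool" vs "answer directly".
const logits = [2.0, 1.0];
for (const t of [0.2, 0.8, 1.5]) {
  const [callTool] = softmaxWithTemperature(logits, t);
  console.log(`T=${t}: P(call tool) = ${callTool.toFixed(3)}`);
}
```

With these scores, the probability of picking the tool falls from about 0.99 at T=0.2 to about 0.66 at T=1.5. Against Ollama's OpenAI-compatible endpoint, you set temperature with the `temperature` field of the request body.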


Setting Up Your Environment

Here's the recommended setup for following this course:

Bash
# 1. Install Claude Desktop
# → Download from https://claude.ai/download

# 2. Install Ollama (optional, for free local models)
# → Download from https://ollama.com
ollama pull qwen3:8b

# 3. Install Tome (optional, open-source client)
# → Download from GitHub releases

# 4. Install Node.js 18+
node --version   # should print v18.x.x or higher

# 5. You'll also need this pattern for package.json
#    when building MCP servers:
#    { "type": "module" }
✅ Minimum viable setup: Claude Desktop + Node.js 18+. That's all you need to build and test everything in this course. Add Ollama if you want to experiment with local models for free.
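The Node.js requirement can also be checked from inside a script. Node 18 is the first release with `fetch` built in by default. The `{ "type": "module" }` setting in package.json tells Node to treat `.js` files as ES modules, so `import` syntax works. This is a minimal sketch. `nodeMajor` and `MIN_NODE_MAJOR` are hypothetical names, and the threshold comes from the setup steps above.

```javascript
// Extract the major version from strings like "18.19.0" or "v18.19.0".
function nodeMajor(versionString) {
  return Number.parseInt(versionString.replace(/^v/, "").split(".")[0], 10);
}

const MIN_NODE_MAJOR = 18;
// process.versions.node has no leading "v", e.g. "20.11.1".
const major = nodeMajor(process.versions.node);

if (major < MIN_NODE_MAJOR) {
  console.error(`Node ${process.versions.node} is too old; install ${MIN_NODE_MAJOR}+.`);
  process.exitCode = 1; // signal failure without killing pending output
} else {
  console.log(`Node ${process.versions.node} OK`);
}
```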

Lab -- Explore the Clients

Before writing any server code, spend 5 minutes exploring your client:

ℹ️ Why does this matter? Every tool your client exposes gets sent to the LLM as tokens on every request. Claude starts degrading around 40 tools. Keep your tools focused, descriptions clear, and disable tools you're not using in the current session.
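You can get a feel for that per-request cost by measuring the tool definitions a server returns from `tools/list`, whose result has the shape `{ tools: [{ name, description, inputSchema }] }`. This is a rough sketch that assumes about 4 characters per token, a common rule of thumb that real tokenizers only approximate. `summarizeTools` and both constants are hypothetical names, and the 40-tool threshold comes from the guidance above.

```javascript
// Rough per-request token cost of a tools/list result.
// Assumption: ~4 characters per token; real tokenizers vary.
const CHARS_PER_TOKEN = 4;
const TOOL_WARNING_THRESHOLD = 40; // "around 40 tools", from the note above

function summarizeTools(listResult) {
  const tools = listResult.tools ?? [];
  // Definitions are sent as serialized JSON, so measure that.
  const approxTokens = Math.ceil(JSON.stringify(tools).length / CHARS_PER_TOKEN);
  return {
    count: tools.length,
    approxTokens,
    tooMany: tools.length >= TOOL_WARNING_THRESHOLD,
  };
}

// Hypothetical tools/list result with a single tool.
const example = {
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather for a city",
      inputSchema: { type: "object", properties: { city: { type: "string" } }, required: ["city"] },
    },
  ],
};
console.log(summarizeTools(example));
```

The estimate scales with every description and schema you add. That is one reason short, focused descriptions pay off across a whole session.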

Key Takeaways

  • Clients consume MCP servers. Claude Desktop, Cursor, VS Code Agent Mode, Claude Code, and Tome are all clients.
  • Claude Desktop supports the full MCP spec -- tools, resources, and prompts. Tome only supports tools.
  • Use Claude Desktop for learning. Add Ollama + Tome if you want a free path with local models.
  • Tool calling requires specific model support. Not all Ollama models handle tools. Qwen 3 series does.
  • Parameter counts only compare within a family. Don't compare Qwen vs Phi by billions alone.

What's Next

Time to build your first MCP server. Project setup, installing the SDK, registering your first tool -- and understanding why Zod is the backbone of every tool's input schema.