MCP Clients & Running Models Locally
MCP clients connect to your servers. Claude Desktop supports the full spec. Tome supports tools only. Ollama runs models locally. Here's how to pick your setup and what each client actually supports.
Before you write a single line of MCP server code, you need to understand what a client is -- because everything you build gets consumed by one.
The key distinction:
- MCP Server -- what you build. Exposes tools, resources, and prompts.
- MCP Client -- what connects to your server. Runs the LLM and feeds it your tools.
The Main Clients
| Client | Full MCP Support? | Tools | Resources | Prompts | Notes |
|---|---|---|---|---|---|
| Claude Desktop | ✅ | ✅ | ✅ | ✅ | Best for learning -- full spec |
| Cursor | ✅ | ✅ | ✅ | ✅ | Best for coding workflows |
| VS Code (Agent Mode) | ✅ | ✅ | ✅ | ✅ | Free, solid |
| Claude Code | ✅ | ✅ | ✅ | ✅ | CLI tool, great for vibe coding |
| Tome | Partial | ✅ | ❌ | ❌ | Tools only -- open source |
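If you want to check a server design against this table before picking a client, the rows can be encoded as data. This is only a sketch: the `ClientInfo` shape and the `clientsSupporting` helper are names made up for illustration, and the values are copied from the table above rather than queried from any client.

```typescript
type Feature = "tools" | "resources" | "prompts";

// One row of the support table above (hypothetical shape, for illustration).
interface ClientInfo {
  name: string;
  tools: boolean;
  resources: boolean;
  prompts: boolean;
}

// Values transcribed from the table; they are not detected at runtime.
const CLIENTS: ClientInfo[] = [
  { name: "Claude Desktop", tools: true, resources: true, prompts: true },
  { name: "Cursor", tools: true, resources: true, prompts: true },
  { name: "VS Code (Agent Mode)", tools: true, resources: true, prompts: true },
  { name: "Claude Code", tools: true, resources: true, prompts: true },
  { name: "Tome", tools: true, resources: false, prompts: false },
];

// Returns the names of clients that support every feature in the list.
function clientsSupporting(features: Feature[]): string[] {
  return CLIENTS
    .filter((client) => features.every((feature) => client[feature]))
    .map((client) => client.name);
}
```

Calling `clientsSupporting(["tools", "resources"])` leaves Tome out, which matches the "tools only" note in the table.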
Why Not Just Use Claude on the Web?
Good question. The answer is transport.
When you build an MCP server locally and run it on your machine, it communicates via stdio -- standard input/output, the same mechanism operating systems use to pipe data between processes. There's no URL, no port, just a process running on your computer.
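To make "piping data between processes" concrete, here is a rough sketch of how messages can be framed over stdio. It assumes newline-delimited JSON-RPC 2.0 messages, which is how the MCP stdio transport is specified. The helper names `frameMessage` and `parseFrames` are invented for this example; a real SDK handles this for you.

```typescript
// A JSON-RPC 2.0 request, the message shape MCP uses on the wire.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Serialize one message as a single line. JSON.stringify (without an indent
// argument) never emits raw newlines, so "\n" can mark the end of a message.
function frameMessage(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Split text read from the other process's stdout into complete messages.
// Returns the parsed messages plus any trailing partial line to keep buffering.
function parseFrames(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { messages, rest };
}
```

A client would write `frameMessage({ jsonrpc: "2.0", id: 1, method: "tools/list" })` to the server's stdin and feed whatever arrives on stdout through `parseFrames`, carrying `rest` over to the next read.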
Claude.ai (the website) can't reach a process running on your local machine. It would need a public URL.
Claude Desktop is a native app. It can spawn local processes, read their output, and communicate directly. That's what makes local MCP development possible without deploying anything.
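Spawning a local process means the client needs to know which command to run. Claude Desktop reads this from an `mcpServers` map in its `claude_desktop_config.json` file. The sketch below builds such an entry; the server name `my-server`, the script path, and the `buildClientConfig` helper are placeholders for illustration, not anything defined by the course.

```typescript
// One server entry: the command the client runs to start the server process.
interface ServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClientConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Build a config with a single Node-based server (hypothetical helper).
function buildClientConfig(name: string, scriptPath: string): ClientConfig {
  return {
    mcpServers: {
      [name]: { command: "node", args: [scriptPath] },
    },
  };
}

// Placeholder name and path: substitute your own server script.
const configJson = JSON.stringify(
  buildClientConfig("my-server", "/absolute/path/to/server.js"),
  null,
  2
);
```

The resulting JSON is what you would paste into the config file. The client then launches `node /absolute/path/to/server.js` itself and talks to it over stdio.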
Tome -- The Open Source Alternative
Tome is an open-source MCP client inspired by Claude Desktop. The main advantage: it works with Ollama (local models), so you can use it completely free.
What Tome supports:
- ✅ Tools (the most important part of this course)
- ❌ Resources
- ❌ Prompts
When to use it:
- You want a free, open-source option
- You're running Ollama models locally
- You only need tool-calling (which is 80% of real MCP use)
If you're following along with just Tome, you'll need to treat the resources and prompts sections as read-only -- you can follow the theory, but you can't test it in Tome.
Ollama -- Run Models Locally, Free
Ollama is a local model manager. Install it once, pull any open-source model, run it on your hardware. No API key, no per-token cost, no cloud.
# Install Ollama (Mac: brew install ollama, or download from ollama.com)
# Then pull a model:
ollama pull qwen3:8b
# List installed models
ollama list
# Run a model interactively
ollama run qwen3:8b
Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1, so any client that can talk to OpenAI can point at a local model instead.
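As a rough illustration of what "OpenAI-compatible" buys you, the sketch below builds a chat request against the local base URL. It assumes the standard OpenAI-style `/chat/completions` route under that base URL; `buildChatRequest` and the interface names are invented for this example.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Base URL from the text above; the route is the OpenAI-style chat endpoint.
const OLLAMA_BASE_URL = "http://localhost:11434/v1";

// Build (but do not send) a chat completion request for a local model.
function buildChatRequest(model: string, messages: ChatMessage[]): ChatRequest {
  return {
    url: OLLAMA_BASE_URL + "/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const chatRequest = buildChatRequest("qwen3:8b", [
  { role: "user", content: "Say hello in one sentence." },
]);
```

With Ollama running, `await fetch(chatRequest.url, chatRequest.init)` should send it; Node 18+ ships a global `fetch`. Nothing here is Ollama-specific apart from the base URL, which is the point.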
What matters for your hardware is how much VRAM you have. On Apple Silicon, RAM and VRAM are shared -- so 24GB is a good amount. On Windows/Linux, dedicated GPU VRAM is what counts.
Recommended Models by Hardware
| Hardware | Recommended Model | Notes |
|---|---|---|
| 8–12GB VRAM | Qwen 3: 0.6B | Fast, basic results |
| 16GB VRAM | Qwen 3: 8B | Good balance of speed + quality |
| 24GB+ VRAM (or Mac M2/M3) | Qwen 3: 8B or 14B | Comfortable choice |
| Gaming PC with RTX 4090 | Any 30B+ model | Go wild |
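For a back-of-the-envelope check on whether a model fits, multiply the parameter count by the bits stored per weight. The sketch below does that arithmetic. The 4-bit figure in the usage note and the 1.2 overhead factor are assumptions for illustration (real quantized models and context sizes vary), so treat the results as rough lower bounds rather than guarantees.

```typescript
// Approximate memory for the weights alone, in GB:
// (billions of parameters) x (bits per weight) / (8 bits per byte).
function estimateWeightGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * bitsPerWeight) / 8;
}

// Rough fit check. overheadFactor is an assumed allowance for context
// (KV cache) and runtime buffers; it is not a measured value.
function fitsInMemory(
  paramsBillions: number,
  bitsPerWeight: number,
  availableGB: number,
  overheadFactor: number = 1.2
): boolean {
  return estimateWeightGB(paramsBillions, bitsPerWeight) * overheadFactor <= availableGB;
}
```

For example, `estimateWeightGB(8, 4)` gives 4 GB for an 8B model at roughly 4 bits per weight, and `estimateWeightGB(8, 16)` gives 16 GB at full 16-bit precision. That gap is why quantized builds are what make the table above workable on consumer hardware.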
Tool-Calling Support
Not every model supports tool calling. You need a model that's been trained to use tools -- otherwise it'll ignore your MCP server entirely.
Brian's recommendation: use Qwen 3 models while experimenting with Ollama. The 8B model handles single-tool calls very reliably. The 0.6B handles them for simple cases.
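One way to see whether a model actually used a tool is to look for tool calls in the response. The sketch below assumes the OpenAI-style chat format, where tools are passed as a `tools` array and the reply carries `tool_calls` with JSON-encoded arguments. The `get_weather` tool and the `extractToolCalls` helper are hypothetical, made up for this example.

```typescript
// A tool definition in the OpenAI-style "tools" format.
interface ToolDefinition {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments arrive as a JSON string
}

interface ChatResponse {
  choices: { message: { content: string | null; tool_calls?: ToolCall[] } }[];
}

// Hypothetical tool, for illustration only.
const getWeatherTool: ToolDefinition = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// Returns the tool calls in the first choice, with arguments parsed.
// An empty array means the model answered without using a tool.
function extractToolCalls(response: ChatResponse): { name: string; args: unknown }[] {
  const calls = response.choices[0]?.message.tool_calls ?? [];
  return calls.map((call) => ({
    name: call.function.name,
    args: JSON.parse(call.function.arguments),
  }));
}
```

If you send a prompt that obviously needs the tool and `extractToolCalls` keeps returning an empty array, that is a hint the model was not trained for tool calling.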
Understanding "Billions of Parameters"
When you see model names like "Qwen 3: 8B" or "Llama 3: 70B", the "B" stands for billions of parameters. Here's the mental model Brian uses:
Think back to the Intel vs AMD processor wars. AMD had slower clock speeds in MHz -- but their chips performed comparably to Intel's faster-clocked ones. People kept comparing MHz directly, which was wrong.
Same thing here. You can only compare parameter counts within the same model family. Qwen 3: 0.6B vs Qwen 3: 8B is a meaningful comparison. Qwen 3: 8B vs Phi 4: 3.8B is comparing apples and cars.
The Temperature Parameter
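When you work with LLMs -- especially through Ollama -- you'll encounter temperature. Temperature controls how much randomness the model applies when it samples its next token: low values make the output more deterministic, high values make it more varied. This is worth understanding for MCP because it affects tool selection.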
For tool selection, higher temperature means the model might occasionally decide not to use a tool it has available -- even when it should. Most clients use a temperature around 0.7–0.8.
If you see your model ignoring an obvious tool, lowering the temperature is one thing to try.
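To see why lowering the temperature helps, here is a small sketch of temperature-scaled softmax, the standard way temperature reshapes a model's output probabilities. The two scores in the usage note are invented for illustration: imagine the first is the model's preference for calling the tool and the second for answering directly.

```typescript
// Convert raw scores (logits) into probabilities after dividing by temperature.
// Lower temperature sharpens the distribution; higher temperature flattens it.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    throw new Error("temperature must be positive");
  }
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

Comparing `softmaxWithTemperature([2.0, 1.0], 0.2)` with `softmaxWithTemperature([2.0, 1.0], 0.8)`, the first option's probability is noticeably higher at 0.2. At higher temperatures the less likely choice, skipping the tool, gets sampled more often.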
Setting Up Your Environment
Here's the recommended setup for following this course:
# 1. Install Claude Desktop
# → Download from https://claude.ai/download
# 2. Install Ollama (optional, for free local models)
# → Download from https://ollama.com
ollama pull qwen3:8b
# 3. Install Tome (optional, open-source client)
# → Download from GitHub releases
# 4. Install Node.js 18+
node --version # should print v18.x.x or higher
# 5. You'll also need this pattern for package.json
# when building MCP servers:
# { "type": "module" }
Lab -- Explore the Clients
Before writing any server code, spend 5 minutes exploring your client:
Key Takeaways
- Clients consume MCP servers. Claude Desktop, Cursor, VS Code Agent Mode, Claude Code, and Tome are all clients.
- Claude Desktop supports the full MCP spec -- tools, resources, and prompts. Tome only supports tools.
- Use Claude Desktop for learning. Add Ollama + Tome if you want a free path with local models.
- Tool calling requires specific model support. Not all Ollama models handle tools. Qwen 3 series does.
- Parameter counts only compare within a family. Don't compare Qwen vs Phi by billions alone.
What's Next
Time to build your first MCP server. Project setup, installing the SDK, registering your first tool -- and understanding why Zod is the backbone of every tool's input schema.