MCP Clients & Running Models Locally

Before you write a single line of MCP server code, you need to understand what a client is -- because everything you build gets consumed by one.

The key distinction:

MCP Server -- what you build. Exposes tools, resources, and prompts.
MCP Client -- what connects to your server. Talks to the LLM and feeds it your tools.
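
To make the split concrete, here is a minimal sketch of the three things a server exposes and the JSON-RPC methods a client sends for each. The method names come from the MCP spec; the `methodsFor` helper is hypothetical and only there for illustration.

```javascript
// The three MCP primitives and the JSON-RPC methods a client calls for each.
const PRIMITIVE_METHODS = {
  tools: ["tools/list", "tools/call"],
  resources: ["resources/list", "resources/read"],
  prompts: ["prompts/list", "prompts/get"],
};

// Hypothetical helper: given the primitives a client supports,
// return every method that client could send to a server.
function methodsFor(supportedPrimitives) {
  return supportedPrimitives.flatMap((p) => PRIMITIVE_METHODS[p] ?? []);
}

// A tools-only client would only ever send these two methods.
console.log(methodsFor(["tools"]));
```

A client that supports only tools will never ask your server for resources or prompts, which is why client choice matters later in the course.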

The Main Clients

Client	Full MCP Support?	Tools	Resources	Prompts	Notes
Claude Desktop	✅	✅	✅	✅	Best for learning -- full spec
Cursor	✅	✅	✅	✅	Best for coding workflows
VS Code (Agent Mode)	✅	✅	✅	✅	Free, solid
Claude Code	✅	✅	✅	✅	CLI tool, great for vibe coding
Tome	Partial	✅	❌	❌	Tools only -- open source

ℹ️ Course recommendation: Use Claude Desktop while learning. It supports the full MCP spec -- tools, resources, and prompts. When you reach the resources and prompts sections, Tome won't work for those parts. Once you're comfortable, try Cursor or Claude Code for real coding workflows.

Why Not Just Use Claude on the Web?

Good question. The answer is transport.

When you build an MCP server locally and run it on your machine, it communicates via stdio (standard input/output), the operating system's built-in way of piping data between processes. It originated on Unix but works the same way on macOS, Linux, and Windows. There's no URL, no port, just a process running on your computer.
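
If you're curious what actually travels over that pipe: the stdio transport carries JSON-RPC 2.0 messages, one per line. The sketch below shows that framing in isolation. The function names `encodeMessage` and `decodeMessages` are made up for this example, and in practice the SDK handles this for you.

```javascript
// Encode one JSON-RPC 2.0 request as a single line for the stdio transport.
// JSON.stringify never emits raw newlines, so one message == one line.
function encodeMessage(id, method, params = {}) {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n";
}

// Split a chunk of incoming text into complete messages plus any
// trailing partial line that should be kept until more data arrives.
function decodeMessages(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // last element is "" or an incomplete line
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { messages, rest };
}

// One complete request followed by the start of a second one.
const wire = encodeMessage(1, "tools/list");
const { messages, rest } = decodeMessages(wire + '{"jsonrpc":"2.0","id":2');
console.log(messages[0].method);
console.log(rest);
```

The client writes lines like these to the server's stdin and reads replies from its stdout, which is why nothing needs a URL or a port.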

Claude.ai (the website) can't reach a process running on your local machine. It would need a public URL.

Claude Desktop is a native app. It can spawn local processes, read their output, and communicate directly. That's what makes local MCP development possible without deploying anything.
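
Claude Desktop learns which processes to spawn from its `claude_desktop_config.json` file, under an `mcpServers` key. As a rough sketch, the snippet below builds that structure. The server name `my-first-server` and the script path are placeholders, not values from this course.

```javascript
// Build the object that goes in claude_desktop_config.json.
// The name and script path passed in below are placeholders.
function buildDesktopConfig(name, scriptPath) {
  return {
    mcpServers: {
      [name]: {
        command: "node",    // the executable the client spawns
        args: [scriptPath], // arguments passed to it
      },
    },
  };
}

const config = buildDesktopConfig("my-first-server", "/absolute/path/to/server.js");
console.log(JSON.stringify(config, null, 2));
```

Each entry is just a command line: the client runs it as a child process and talks to it over stdio.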

Tome -- The Open Source Alternative

Tome is an open-source MCP client inspired by Claude Desktop. The main advantage: it works with Ollama (local models), so you can use it completely free.

What Tome supports:

✅ Tools (the most important part of this course)
❌ Resources
❌ Prompts

When to use it:

You want a free, open-source option
You're running Ollama models locally
You only need tool-calling (which covers the bulk of real-world MCP use)

If you're following along with just Tome, you'll need to treat the resources and prompts sections as read-only -- you can follow the theory, but you can't test it in Tome.

Ollama -- Run Models Locally, Free

Ollama is a local model manager. Install it once, pull any open-source model, run it on your hardware. No API key, no per-token cost, no cloud.

Bash

# Install Ollama (Mac: brew install ollama, or download from ollama.com)
# Then pull a model:
ollama pull qwen3:8b

# List installed models
ollama list

# Run a model interactively
ollama run qwen3:8b

Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1, so any client that can talk to OpenAI can point at a local model instead.
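
Here is a small sketch of what "pointing at a local model" looks like in practice: an OpenAI-style chat completions request aimed at the local endpoint. The `buildChatRequest` helper is hypothetical. Sending the request assumes Ollama is running and `qwen3:8b` has been pulled, so the `fetch` call is left commented out.

```javascript
const OLLAMA_BASE_URL = "http://localhost:11434/v1";

// Build an OpenAI-style chat completions request for a local Ollama model.
function buildChatRequest(model, userMessage) {
  return {
    url: `${OLLAMA_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}

const { url, init } = buildChatRequest("qwen3:8b", "Say hello in five words.");
console.log(url);

// To actually send it (needs Ollama running locally and Node 18+ for fetch):
// fetch(url, init)
//   .then((res) => res.json())
//   .then((json) => console.log(json.choices[0].message.content));
```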

⚠️ Hardware matters a lot. Running a model locally is GPU/RAM-intensive. The numbers in model names (0.6B, 8B, 70B) are parameter counts. They're a rough capability indicator within the same series, but they aren't comparable across series, and they don't scale linearly: an 8B Qwen model isn't necessarily "13x smarter" than a 0.6B Qwen model.

What matters for your hardware is how much VRAM you have. On Apple Silicon, RAM and VRAM are shared -- so 24GB is a good amount. On Windows/Linux, dedicated GPU VRAM is what counts.
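
For a back-of-the-envelope feel for why VRAM is the limit, you can estimate the size of the weights alone as parameters times bytes per parameter. This is a rough sketch under stated assumptions: it ignores context (KV cache) and runtime overhead, which add to the total, and the bit widths below are common choices rather than what any particular Ollama model tag uses.

```javascript
// Rough size of the model weights alone, in GB (1 GB = 1e9 bytes).
// Does not include KV cache or runtime overhead.
function estimateWeightGB(billionsOfParams, bitsPerParam) {
  const bytes = billionsOfParams * 1e9 * (bitsPerParam / 8);
  return bytes / 1e9;
}

for (const [name, params] of [["0.6B", 0.6], ["8B", 8], ["70B", 70]]) {
  const q4 = estimateWeightGB(params, 4).toFixed(1);
  const f16 = estimateWeightGB(params, 16).toFixed(1);
  console.log(`${name}: ~${q4} GB at 4-bit, ~${f16} GB at 16-bit`);
}
```

Treat the result as a lower bound. If the estimate is already close to your available memory, pick a smaller model.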

Recommended Models by Hardware

Hardware	Recommended Model	Notes
8–12GB VRAM	Qwen 3: 0.6B	Fast, basic results
16GB VRAM	Qwen 3: 8B	Good balance of speed + quality
24GB+ VRAM (or Apple Silicon with 24GB+ unified memory)	Qwen 3: 8B or 14B	Comfortable choice
Gaming PC with RTX 4090	Any 30B+ model	Go wild

Tool-Calling Support

Not every model supports tool calling. You need a model that's been trained to use tools -- otherwise it'll ignore your MCP server entirely.

Brian's recommendation: use Qwen 3 models while experimenting with Ollama. The 8B model handles single-tool calls very reliably. The 0.6B handles them for simple cases.
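
To see what "trained to use tools" means on the wire, here is a hedged sketch of one translation a client might do: turning an MCP tool definition into the OpenAI-style `tools` entry that OpenAI-compatible chat APIs accept. The `get_weather` tool is a made-up example, and how any specific client does this internally is an assumption.

```javascript
// An MCP tool definition as a server would list it (hypothetical example tool).
const mcpTool = {
  name: "get_weather",
  description: "Get the current weather for a city.",
  inputSchema: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
};

// Convert it to the OpenAI-style "tools" entry used by OpenAI-compatible chat APIs.
function toOpenAITool(tool) {
  return {
    type: "function",
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.inputSchema, // both sides use JSON Schema here
    },
  };
}

console.log(JSON.stringify(toOpenAITool(mcpTool), null, 2));
```

A tool-capable model answers with a structured tool call naming `get_weather` and its arguments. A model without that training tends to reply in plain text instead, so the tool never gets called.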

ℹ️ Ollama Turbo: Ollama recently introduced a hosted mode where models run in the cloud instead of locally. That defeats the "local and free" purpose, so we're ignoring it. We want local. Stick to local pulls.

Understanding "Billions of Parameters"

When you see model names like "Qwen 3: 8B" or "Llama 3: 70B", the "B" stands for billions of parameters. Here's the mental model Brian uses:

Think back to the Intel vs AMD processor wars. AMD had slower clock speeds in MHz -- but their chips performed comparably to Intel's faster-clocked ones. People kept comparing MHz directly, which was wrong.

Same thing here. You can only compare parameter counts within the same model family. Qwen 3: 0.6B vs Qwen 3: 8B is a meaningful comparison. Qwen 3: 8B vs Phi 4: 3.8B is comparing apples and cars.

The Temperature Parameter

When you work with LLMs -- especially through Ollama -- you'll encounter temperature. This is worth understanding for MCP because it affects tool selection.

For tool selection, higher temperature means the model might occasionally decide not to use a tool it has available -- even when it should. Many clients default to a temperature somewhere around 0.7–0.8.

If you see your model ignoring an obvious tool, lowering the temperature is one thing to try.
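
The effect is easier to see with numbers. Temperature divides the model's raw scores (logits) before they are turned into probabilities, so a low temperature sharpens the distribution and a high one flattens it. The sketch below is a toy illustration: real models sample token by token, and the two logits here are invented stand-ins for "call the tool" versus "answer directly".

```javascript
// Softmax with temperature: probabilities proportional to exp(logit / T).
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical scores for two options: [call the tool, answer directly].
const logits = [2.0, 1.0];
for (const t of [0.2, 0.8, 1.5]) {
  const [callTool] = softmaxWithTemperature(logits, t);
  console.log(`T=${t}: P(call tool) = ${callTool.toFixed(3)}`);
}
```

As the temperature rises, the preferred option loses probability to the alternative, which is the "sometimes skips an obvious tool" behavior described above.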

Setting Up Your Environment

Here's the recommended setup for following this course:

Bash

# 1. Install Claude Desktop
# → Download from https://claude.ai/download

# 2. Install Ollama (optional, for free local models)
# → Download from https://ollama.com
ollama pull qwen3:8b

# 3. Install Tome (optional, open-source client)
# → Download from GitHub releases

# 4. Install Node.js 18+
node --version   # should print v18.x.x or higher

# 5. When building MCP servers, add this field to package.json
# so Node.js treats your .js files as ES modules (import/export):
# { "type": "module" }

✅ Minimum viable setup: Claude Desktop + Node.js 18+. That's all you need to build and test everything in this course. Add Ollama if you want to experiment with local models for free.
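
If you'd rather check the Node.js requirement from a script than by eye, something like the following works. `meetsMinimumNode` is a hypothetical helper name; `process.version` is the same string that `node --version` prints.

```javascript
// Return true if a Node.js version string ("v18.17.0" or "18.17.0")
// has a major version at or above the minimum.
function meetsMinimumNode(version, minMajor = 18) {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

console.log(meetsMinimumNode(process.version));
```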

Lab -- Explore the Clients

Before writing any server code, spend 5 minutes exploring your client. Open its MCP or tools settings and look through the list of tools it currently has available.

ℹ️ Why does this matter? Every tool in this list gets sent to the LLM as tokens on every request. Claude starts degrading around 40 tools. Keep your tools focused, descriptions clear, and disable tools you're not using in the current session.
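
To get a feel for that cost, here is a rough sketch that sizes up a tool list. The 40-tool threshold is the figure quoted above. The 4-characters-per-token ratio is only a common rule of thumb, not a real tokenizer, and the tool list is generated filler.

```javascript
const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

// Summarize how heavy a tool list is: how many tools, roughly how many
// tokens their definitions take up, and whether the count exceeds the limit.
function summarizeToolBudget(tools, maxTools = 40) {
  const chars = tools.reduce((n, t) => n + JSON.stringify(t).length, 0);
  return {
    count: tools.length,
    approxTokens: Math.ceil(chars / CHARS_PER_TOKEN),
    overLimit: tools.length > maxTools,
  };
}

// Hypothetical tool list: 45 near-identical tools.
const tools = Array.from({ length: 45 }, (_, i) => ({
  name: `tool_${i}`,
  description: "Does one specific thing.",
  inputSchema: { type: "object", properties: {} },
}));
console.log(summarizeToolBudget(tools));
```

Every one of those tokens is sent with each request whether or not the tool is used, so trimming unused tools pays off immediately.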

Key Takeaways

Clients consume MCP servers. Claude Desktop, Cursor, VS Code Agent Mode, Claude Code, and Tome are all clients.
Claude Desktop supports the full MCP spec -- tools, resources, and prompts. Tome only supports tools.
Use Claude Desktop for learning. Add Ollama + Tome if you want a free path with local models.
Tool calling requires specific model support. Not all Ollama models handle tools. The Qwen 3 series does.
Parameter counts only compare within a family. Don't compare Qwen vs Phi by billions alone.

What's Next

Time to build your first MCP server. Project setup, installing the SDK, registering your first tool -- and understanding why Zod is the backbone of every tool's input schema.