
Ollama Just Became an OpenClaw Provider
Five days ago, local AI stopped being a hobbyist thing. You probably missed it.
Ollama 0.18 shipped with native OpenClaw integration. Not a hack. Not a custom baseUrl workaround. A real auth provider that slots into OpenClaw's onboarding flow next to Anthropic and OpenAI.
What changed: local models now get the same orchestration that made Claude Code useful. Tool calling. Multi-agent session spawns. Permission boundaries. Context management. No API cost. No data leaving your network.
I've shipped both local model deployments and cloud orchestration at production scale. The gap between "runs on my laptop" and "runs my workflow" has always been the permission model. Ollama 0.18 closes it.
What you need
- Ollama 0.18 or later (shipped Mar 14, 2026)
- OpenClaw 2026.3.7+ (native Ollama provider support)
- 16GB+ RAM for local models (8GB works for smaller ones like glm-4.7-flash)
- Optional: Ollama cloud account for hybrid local+cloud mode
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Check the version
ollama --version # Should show 0.18.0+
# Install OpenClaw if you haven't
npm install -g openclaw
# Confirm Ollama shows as a provider
openclaw providers list | grep ollama
Why native beats the old /v1 hack
Before 0.18, connecting local models to OpenClaw meant manual config edits:
{
"providers": [
{
"id": "ollama-custom",
"baseUrl": "http://localhost:11434/v1",
"apiKey": "not-needed"
}
]
}
This sort of worked. Tool calling broke, though. OpenClaw expected OpenAI-compatible /v1 responses, but Ollama's /v1 mode is "best effort" compatibility. It outputs raw JSON instead of structured tool_calls objects.
What that looks like in practice: your agent sees "call the read_file tool" as a text string, not an executable action. No file operations. No git commands. No web search. Just text generation pretending to be agentic.
The native integration uses Ollama's actual API (/api/chat), which has real tool support. Your local models can now run the same skills Claude Code uses (file ops, shell commands, browser automation) without cloud dependencies.
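The protocol mismatch is easiest to see in the response shapes. The payloads below are simplified stand-ins (not exact Ollama or OpenClaw structures), but they capture why an orchestrator can act on one and not the other:

```python
# Illustrative sketch: why structured tool calls matter to an orchestrator.
# The response shapes are simplified stand-ins, not exact Ollama payloads.

def extract_tool_calls(response: dict) -> list[dict]:
    """Return executable tool calls, or [] if the model only produced text."""
    message = response.get("message", {})
    # Native-style response: tool calls arrive as a structured field.
    if message.get("tool_calls"):
        return message["tool_calls"]
    # Compat-style response: the "tool call" is just JSON-looking text.
    # An orchestrator can't safely execute free text, so it gets nothing.
    return []

# Structured response (native /api/chat style): executable.
native = {"message": {"tool_calls": [
    {"function": {"name": "read_file", "arguments": {"path": "/tmp/test-file.txt"}}}
]}}

# Text-only response (/v1 compat style): the same intent, buried in a string.
compat = {"message": {"content": '{"name": "read_file", "arguments": {"path": "/tmp/test-file.txt"}}'}}

print(len(extract_tool_calls(native)))  # 1 — the agent can act
print(len(extract_tool_calls(compat)))  # 0 — the agent just talks about the tool
```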
I've debugged enough "works in dev, breaks in prod" issues to spot a protocol mismatch. OpenAI compat mode optimizes for "works with existing code." Native APIs optimize for "works correctly." Different goals.
Step by step
1. Onboard with Ollama
openclaw onboard --auth-choice ollama
OpenClaw will:
- Auto-discover local models via Ollama's /api/tags endpoint
- Ask you to choose Cloud + Local or Local-only mode
- Open a browser for OAuth if you pick cloud mode
- Pull your selected model if it's not cached
- Write config to ~/.openclaw/openclaw.json
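The article doesn't show the resulting file, and I haven't confirmed OpenClaw's actual schema; as a purely hypothetical contrast with the old baseUrl hack, a native-provider entry might look something like this (every key here is an assumption):

```json
{
  "providers": [
    {
      "id": "ollama",
      "type": "native",
      "models": ["glm-4.7-flash:latest", "qwen2.5:27b"],
      "mode": "cloud+local"
    }
  ]
}
```

Note there's no baseUrl and no placeholder apiKey: the native provider owns the transport details instead of pretending to be OpenAI.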
Here's what that looks like:
✓ Discovered 3 local models:
- llama3.3:70b (context: 128K)
- qwen2.5:27b (context: 32K)
- glm-4.7-flash:latest (context: 8K)
Choose mode:
1. Cloud + Local (recommended)
2. Local-only (no external API calls)
Selection: 1
Recommended cloud model: kimi-k2.5:cloud (fast, good for agentic tasks)
Recommended local model: glm-4.7-flash (7B params, fits 16GB RAM)
Pulling glm-4.7-flash... ████████████████ 100%
✓ OpenClaw configured with Ollama provider
2. Verify tool calling works
Create a test file:
echo "Test content for AI agent" > /tmp/test-file.txt
Launch OpenClaw and ask it to read the file:
openclaw launch
You: Read /tmp/test-file.txt and tell me what's in it.
If tool calling works (native API):
Agent: [reads file using read tool]
The file contains: "Test content for AI agent"
If tool calling is broken (/v1 mode):
Agent: I cannot read files directly, but you can use the `cat` command...
That's how you know. Native Ollama API returns tool_calls as a structured field. The /v1 compat mode buries it as text inside the completion.
3. Spawn a multi-agent workflow
This is the real unlock. Multi-agent orchestration on local models.
openclaw launch --model qwen2.5:27b
You: Spawn a research agent to find the top 3 GitHub repos trending this
week in backend development. Then spawn a summarizer to write a
one-paragraph summary of each.
OpenClaw spawns a research subagent (uses web search or GitHub API tools), waits for it to finish, spawns a summarizer with the research results as context, and returns aggregated output.
Before 0.18, local models were isolated. You could run ollama run llama3.3 for a single conversation, but multi-agent orchestration required custom Python scripts or framework-specific glue code. Now Ollama models are native OpenClaw providers. Subagent spawns work. Context passing works. Tool calling works.
The same patterns that work with Claude Code work with local Qwen or GLM models.
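The spawn-wait-hand-off pattern above can be sketched in a few lines. `spawn_agent` here is a stand-in for OpenClaw's subagent machinery, not a real API; the point is the shape of the orchestration, not the calls:

```python
# Conceptual sketch of the spawn -> wait -> hand-off pattern.
# spawn_agent is a placeholder for a full model session, not a real API.

def spawn_agent(role: str, prompt: str, context: str = "") -> str:
    """Pretend subagent: in reality this runs a model with tool access."""
    suffix = f" (given {len(context)} chars of context)" if context else ""
    return f"[{role}] output for: {prompt}" + suffix

def research_then_summarize(topic: str) -> str:
    # 1. The research subagent runs first and produces findings.
    findings = spawn_agent("researcher", f"Find top repos for {topic}")
    # 2. Its output is passed as context to a second subagent.
    summary = spawn_agent("summarizer", "Summarize each repo", context=findings)
    # 3. The orchestrator returns the aggregated result.
    return summary

print(research_then_summarize("backend development"))
```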
4. Try hybrid mode
If you chose Cloud + Local during onboarding:
# Fast cloud model (low latency, pay-per-token)
openclaw launch --model kimi-k2.5:cloud
# Private local model (slower first run, zero cost)
openclaw launch --model qwen2.5:27b
Performance from the 0.18 release notes: Kimi-K2.5 runs 2x faster than 0.17. MiniMax-M2.5 got up to 10x faster time-to-first-token (under 1s now). Cloud models don't require ollama pull at all, so there's zero local storage.
The trade-off is cold start. Cloud models are always warm (instant first token). Local models need 5-30 seconds to load weights into VRAM on the first inference. After that, they're fast.
I ran into the same pattern designing edge vs origin caching for CDN deployments. Cold start kills user experience. Cloud models outsource that cost to Ollama's infrastructure. Local models make you eat it.
Use cloud for user-facing chatbots where latency matters on every request. Use local for background jobs, code review agents, or anything touching sensitive data.
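That routing rule is simple enough to write down. This is an illustrative policy function (the model names follow the article; the function itself is mine):

```python
# Minimal routing policy for the latency-vs-privacy trade-off above.
# Model names follow the article; the policy function is illustrative.

def pick_model(user_facing: bool, sensitive_data: bool) -> str:
    if sensitive_data:
        return "qwen2.5:27b"      # local: data never leaves the network
    if user_facing:
        return "kimi-k2.5:cloud"  # cloud: always warm, low first-token latency
    return "qwen2.5:27b"          # background jobs: cold start is acceptable

print(pick_model(user_facing=True, sensitive_data=False))  # kimi-k2.5:cloud
print(pick_model(user_facing=True, sensitive_data=True))   # qwen2.5:27b
```

Note that sensitivity wins over latency: a user-facing chatbot handling PII still routes local.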
The pitfalls that will bite you
The /v1 trap
Don't use http://localhost:11434/v1 with OpenClaw.
/v1 is OpenAI-compatible mode, designed for drop-in replacement in existing OpenAI code. Tool calling in this mode is approximate. It outputs JSON strings, not structured tool calls. OpenClaw needs the structured format.
Your agent will refuse to use tools and suggest you run shell commands manually. That's the tell.
Fix: let OpenClaw use the native Ollama provider. It calls /api/chat automatically.
Silent cloud fallback (the privacy bug)
This one is ugly. GitHub Issue #43945:
- Main session uses Ollama (local models, no cloud API calls)
- You spawn a subagent
- Subagent can't authenticate with Ollama
- Subagent silently falls back to gpt-4.1-mini (a cloud model in the fallback chain)
- Your private data just hit OpenAI's API. No warning.
Why it happens: OpenClaw's auth system treats all API keys as secrets. Secrets go into auth-profiles.json, which subagents read. But Ollama's OLLAMA_API_KEY="ollama-local" isn't a real key. It's a feature flag. OpenClaw classifies it as a "marker" and skips writing it to auth profiles.
Main session reads from openclaw.json (works). Subagents read from auth-profiles.json (empty). They get a 401, hit the fallback chain, and land on a cloud provider.
Users who chose local Ollama for data sovereignty (medical records, proprietary code, PII) just had their trust violated. Silently.
Workaround: set OPENCLAW_NO_FALLBACK=1 in your environment. This blocks cloud fallback and makes auth failures loud instead of silent.
The real fix needs to happen at the auth layer. It's the confused deputy problem. OpenClaw's auth pipeline assumes credentials are secrets. Local providers use permission markers, not secrets. Those are different primitives, and the system doesn't distinguish them yet.
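The failure mode and the workaround can both be sketched in miniature. Everything here is a toy reconstruction of the behavior the article describes (the profile writer, the fallback chain, the flag), not OpenClaw's actual code:

```python
# Toy reconstruction of the silent-fallback bug and the no-fallback guard.
# Names mirror the article's description, not OpenClaw's real internals.

FALLBACK_CHAIN = ["ollama", "gpt-4.1-mini"]  # a cloud model sits behind local

def write_auth_profiles(keys: dict) -> dict:
    # The bug: values classified as "markers" rather than secrets are
    # skipped, so subagents never see them in auth-profiles.json.
    return {k: v for k, v in keys.items() if v != "ollama-local"}

def resolve_provider(profiles: dict, no_fallback: bool = False) -> str:
    for provider in FALLBACK_CHAIN:
        if provider == "ollama" and "OLLAMA_API_KEY" not in profiles:
            # The subagent gets a 401 here; without the guard it moves on.
            if no_fallback:
                raise RuntimeError("Ollama auth failed and fallback is disabled")
            continue
        return provider
    raise RuntimeError("no provider available")

profiles = write_auth_profiles({"OLLAMA_API_KEY": "ollama-local"})
print(resolve_provider(profiles))  # gpt-4.1-mini: data heads to the cloud
```

With `no_fallback=True` (the `OPENCLAW_NO_FALLBACK=1` behavior), the same auth failure raises loudly instead of landing on a cloud provider.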
"Local = fast" (it isn't, at first)
Cold start on a 70B model: 10-30 seconds (loading weights from disk into VRAM). After that, inference runs at ~50 tokens/s on a 4090.
Cloud models: 50-200ms to first token. Always.
Local wins for background jobs (cold start doesn't matter), long conversations (amortize the startup cost), and batch processing (load once, process thousands of items).
Cloud wins for user-facing chatbots, bursty workloads (sporadic requests mean cold start every time), and small tasks (loading a 70B model to answer "what's 2+2" is wasteful).
I got burned by this assumption with serverless cold starts years ago. First request is always the killer. Local models have the exact same problem.
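The amortization argument is just arithmetic. Using the article's own numbers (roughly 20s cold start, ~50 tokens/s local inference):

```python
# Back-of-envelope math for the cold-start trade-off, using the
# article's figures: ~20s to load weights, ~50 tokens/s local inference.

COLD_START_S = 20.0
LOCAL_TOK_PER_S = 50.0

def local_seconds(requests: int, tokens_per_request: int) -> float:
    """Total wall time: one weight load, then pure inference."""
    return COLD_START_S + requests * tokens_per_request / LOCAL_TOK_PER_S

# One tiny request: cold start dominates (20s of loading for 2s of work).
print(round(local_seconds(1, 100), 1))    # 22.0
# A 500-item batch: the same 20s is noise against ~1000s of inference.
print(round(local_seconds(500, 100), 1))  # 1020.0
```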
Going further
Cloud models with the :cloud tag
Ollama's cloud models skip the download entirely. Append :cloud to the model name:
ollama run kimi-k2.5:cloud
ollama run minimax-m2.5:cloud
ollama run glm-5:cloud
Your ollama CLI is a client to Ollama's cloud inference API. Models stay on their servers. You pay per token. Auth requires ollama signin or the OLLAMA_API_KEY env var.
Good for prototyping (no 10GB downloads), machines without GPUs, or when you need fast time-to-first-token without managing infrastructure.
Bad for data sovereignty (data leaves your network), cost-sensitive batch jobs (local is free after hardware), or high-throughput workloads (local + batching wins on per-token cost).
Mixing local and cloud in one workflow
This is where it gets interesting. You can split a workflow by sensitivity.
Example, a code review agent:
- Planner (cloud): fast model analyzes the PR and generates a review plan. Metadata only, no actual code.
- Reviewer (local): reads the source code, runs static analysis, generates comments. Sensitive code never leaves your network.
- Summarizer (cloud): aggregates review comments into a readable summary. Metadata again, no code.
I'd use this pattern for HIPAA-compliant chatbots or financial data analysis. Orchestration logic and metadata can hit cloud APIs (fast, cheap). Sensitive data stays local (compliance win).
You're trading complexity for control. Single-model workflows are simpler. Hybrid gives you cloud latency + local privacy, but now you're running two environments.
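The sensitivity split boils down to a routing table plus a guard. This is a sketch of the idea, not a real pipeline; `run_stage` is a placeholder and the assertion stands in for whatever policy enforcement you'd actually use:

```python
# Sketch of the sensitivity-split code review pipeline described above.
# run_stage is a placeholder; the routing table is the point.

STAGES = {
    "planner":    "kimi-k2.5:cloud",  # metadata only -> cloud is fine
    "reviewer":   "qwen2.5:27b",      # touches source code -> must stay local
    "summarizer": "kimi-k2.5:cloud",  # comments, no code -> cloud again
}

def run_stage(stage: str, payload: str) -> str:
    model = STAGES[stage]
    # Guard: never let a sensitive payload reach a :cloud model.
    assert not (payload == "source-code" and model.endswith(":cloud")), \
        "sensitive payload routed to a cloud model"
    return f"{stage} ran on {model}"

print(run_stage("planner", "pr-metadata"))
print(run_stage("reviewer", "source-code"))
```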
Lock down the permissions
Local models with tool calling means shell access.
If Ollama is your auth boundary, and Ollama has no auth by default (localhost:11434 is wide open), any process on your machine can spawn agents with arbitrary tool access.
Fine for your personal laptop. Not fine for shared dev environments or self-hosted OpenClaw deployments.
What you need:
- Firewall Ollama to 127.0.0.1 only, or isolate it on a VPN subnet
- Put auth in front of it (HTTP Basic Auth or SSO via a reverse proxy like Caddy or nginx, since Ollama doesn't ship with auth)
- Use OpenClaw's approval system for dangerous tools (require manual approval for exec)
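The first item is a one-liner. `OLLAMA_HOST` is Ollama's real bind-address setting; the firewall rule is belt-and-braces for shared machines (shown for ufw, adjust for your firewall):

```shell
# Bind Ollama to loopback only so nothing off-box can reach port 11434.
export OLLAMA_HOST=127.0.0.1:11434

# Belt-and-braces firewall rule (uncomment for ufw-based hosts):
# ufw deny in to any port 11434

echo "$OLLAMA_HOST"
```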
I've seen too many "misconfigured internal API → data exfiltration" incidents to skip this section.
Alibaba published a security audit of Ollama in February. Their conclusion: "A misconfigured Ollama instance exposed to internal networks can become an attack surface, a data exfiltration vector, or a compliance liability."
Treat Ollama like you'd treat a database. Sensitive APIs need auth, even on internal networks.
What to do with this
Ollama 0.18 closed the gap between "local model" and "production orchestration."
Before: local models were isolated. Good for single-turn generation, useless for multi-agent workflows with tool calling and session management.
After: local models are native OpenClaw providers. Same orchestration as Claude Code. Same tool calling. Same multi-agent spawns. No API cost. No data leaving your network.
The costs are real. Cold start (10-30s to load into VRAM). Lower quality than frontier models (GPT-OSS 20B isn't Opus 4.5, but it's 10x cheaper). Default Ollama has no auth, so production deployments need boundaries.
Where this fits:
- Code review agents where you're running hundreds of reviews a day and $1.50/review on Opus adds up
- Privacy-first customer support (HIPAA, GDPR) where no external API calls is a hard requirement
- Self-hosted research pipelines where you don't want API key management or rate limits
Where cloud still wins:
- User-facing chatbots where every request needs sub-200ms first token
- Low-volume prototyping where downloading 10GB of model weights isn't worth it
- Tasks where model quality is the bottleneck and Opus 4.5 genuinely outperforms
The point isn't that local replaced cloud. It's that you can now choose. Cloud API with great tooling, or local model with the same tooling. That wasn't true a week ago.