
Ollama Just Became an OpenClaw Provider
Five days ago, local AI stopped being a hobbyist thing. You probably missed it.
Ollama 0.18 shipped with native OpenClaw integration. Not a hack. Not a custom baseUrl workaround. A real auth provider that slots into OpenClaw's onboarding flow next to Anthropic and OpenAI.
What changed: local models now get the same orchestration that made Claude Code useful. Tool calling. Multi-agent session spawns. Permission boundaries. Context management. No API cost. No data leaving your network.
I've shipped both local model deployments and cloud orchestration at production scale. The gap between "runs on my laptop" and "runs my workflow" has always been the permission model. Ollama 0.18 closes it.
What you need
- Ollama 0.18 or later (shipped Mar 14, 2026)
- OpenClaw 2026.3.7+ (native Ollama provider support)
- 16GB+ RAM for local models (8GB works for smaller ones like glm-4.7-flash)
- Optional: Ollama cloud account for hybrid local+cloud mode
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Check the version
ollama --version # Should show 0.18.0+
# Install OpenClaw if you haven't
npm install -g openclaw
# Confirm Ollama shows as a provider
openclaw providers list | grep ollama
Why native beats the old /v1 hack
Before 0.18, connecting local models to OpenClaw meant manual config edits:
{
"providers": [
{
"id": "ollama-custom",
"baseUrl": "http://localhost:11434/v1",
"apiKey": "not-needed"
}
]
}
This sort of worked. Tool calling broke, though. OpenClaw expected OpenAI-compatible /v1 responses, but Ollama's /v1 mode is "best effort" compatibility. It outputs raw JSON instead of structured tool_calls objects.
What that looks like in practice: your agent sees "call the read_file tool" as a text string, not an executable action. No file operations. No git commands. No web search. Just text generation pretending to be agentic.
The native integration uses Ollama's actual API (/api/chat), which has real tool support. Your local models can now run the same skills Claude Code uses (file ops, shell commands, browser automation) without cloud dependencies.
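The protocol mismatch is easiest to see in the response shapes. The payloads below are simplified stand-ins (not exact Ollama or OpenClaw structures), but they capture why an orchestrator can act on one and not the other:

```python
# Illustrative sketch: why structured tool calls matter to an orchestrator.
# The response shapes are simplified stand-ins, not exact Ollama payloads.

def extract_tool_calls(response: dict) -> list[dict]:
    """Return executable tool calls, or [] if the model only produced text."""
    message = response.get("message", {})
    # Native-style response: tool calls arrive as a structured field.
    if message.get("tool_calls"):
        return message["tool_calls"]
    # Compat-style response: the "tool call" is just JSON-looking text.
    # An orchestrator can't safely execute free text, so it gets nothing.
    return []

# Structured response (native /api/chat style): executable.
native = {"message": {"tool_calls": [
    {"function": {"name": "read_file", "arguments": {"path": "/tmp/test-file.txt"}}}
]}}

# Text-only response (/v1 compat style): the same intent, buried in a string.
compat = {"message": {"content": '{"name": "read_file", "arguments": {"path": "/tmp/test-file.txt"}}'}}

print(len(extract_tool_calls(native)))  # 1 — the agent can act
print(len(extract_tool_calls(compat)))  # 0 — the agent just talks about the tool
```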
I've debugged enough "works in dev, breaks in prod" issues to spot a protocol mismatch. OpenAI compat mode optimizes for "works with existing code." Native APIs optimize for "works correctly." Different goals.
Step by step
1. Onboard with Ollama
openclaw onboard --auth-choice ollama
OpenClaw will:
- Auto-discover local models via Ollama's /api/tags endpoint
- Ask you to choose Cloud + Local or Local-only mode
- Open a browser for OAuth if you pick cloud mode
- Pull your selected model if it's not cached
- Write config to ~/.openclaw/openclaw.json
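The article doesn't show the resulting file, and I haven't confirmed OpenClaw's actual schema; as a purely hypothetical contrast with the old baseUrl hack, a native-provider entry might look something like this (every key here is an assumption):

```json
{
  "providers": [
    {
      "id": "ollama",
      "type": "native",
      "models": ["glm-4.7-flash:latest", "qwen2.5:27b"],
      "mode": "cloud+local"
    }
  ]
}
```

Note there's no baseUrl and no placeholder apiKey: the native provider owns the transport details instead of pretending to be OpenAI.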
Here's what that looks like:
✓ Discovered 3 local models:
- llama3.3:70b (context: 128K)
- qwen2.5:27b (context: 32K)
- glm-4.7-flash:latest (context: 8K)
Choose mode:
1. Cloud + Local (recommended)
2. Local-only (no external API calls)
Selection: 1
Recommended cloud model: kimi-k2.5:cloud (fast, good for agentic tasks)
Recommended local model: glm-4.7-flash (7B params, fits 16GB RAM)
Pulling glm-4.7-flash... ████████████████ 100%
✓ OpenClaw configured with Ollama provider
2. Verify tool calling works
Create a test file:
echo "Test content for AI agent" > /tmp/test-file.txt
Launch OpenClaw and ask it to read the file:
openclaw launch
You: Read /tmp/test-file.txt and tell me what's in it.
If tool calling works (native API):
Agent: [reads file using read tool]
The file contains: "Test content for AI agent"
If tool calling is broken (/v1 mode):
Agent: I cannot read files directly, but you can use the `cat` command...
That's how you know. Native Ollama API returns tool_calls as a structured field. The /v1 compat mode buries it as text inside the completion.
3. Spawn a multi-agent workflow
This is the real unlock. Multi-agent orchestration on local models.
openclaw launch --model qwen2.5:27b
You: Spawn a research agent to find the top 3 GitHub repos trending this
week in backend development. Then spawn a summarizer to write a
one-paragraph summary of each.
OpenClaw spawns a research subagent (uses web search or GitHub API tools), waits for it to finish, spawns a summarizer with the research results as context, and returns aggregated output.
Before 0.18, local models were isolated. You could run ollama run llama3.3 for a single conversation, but multi-agent orchestration required custom Python scripts or framework-specific glue code. Now Ollama models are native OpenClaw providers. Subagent spawns work. Context passing works. Tool calling works.
The same patterns that work with Claude Code work with local Qwen or GLM models.
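The spawn-wait-hand-off pattern above can be sketched in a few lines. `spawn_agent` here is a stand-in for OpenClaw's subagent machinery, not a real API; the point is the shape of the orchestration, not the calls:

```python
# Conceptual sketch of the spawn -> wait -> hand-off pattern.
# spawn_agent is a placeholder for a full model session, not a real API.

def spawn_agent(role: str, prompt: str, context: str = "") -> str:
    """Pretend subagent: in reality this runs a model with tool access."""
    suffix = f" (given {len(context)} chars of context)" if context else ""
    return f"[{role}] output for: {prompt}" + suffix

def research_then_summarize(topic: str) -> str:
    # 1. The research subagent runs first and produces findings.
    findings = spawn_agent("researcher", f"Find top repos for {topic}")
    # 2. Its output is passed as context to a second subagent.
    summary = spawn_agent("summarizer", "Summarize each repo", context=findings)
    # 3. The orchestrator returns the aggregated result.
    return summary

print(research_then_summarize("backend development"))
```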
4. Try hybrid mode
If you chose Cloud + Local during onboarding:
# Fast cloud model (low latency, pay-per-token)
openclaw launch --model kimi-k2.5:cloud
# Private local model (slower first run, zero cost)
openclaw launch --model qwen2.5:27b
Performance from the 0.18 release notes: Kimi-K2.5 runs 2x faster than 0.17. MiniMax-M2.5 got up to 10x faster time-to-first-token (under 1s now). Cloud models don't require ollama pull at all, so there's zero local storage.
The trade-off is cold start. Cloud models are always warm (instant first token). Local models need 5-30 seconds to load weights into VRAM on the first inference. After that, they're fast.
I ran into the same pattern designing edge vs origin caching for CDN deployments. Cold start kills user experience. Cloud models outsource that cost to Ollama's infrastructure. Local models make you eat it.
Use cloud for user-facing chatbots where latency matters on every request. Use local for background jobs, code review agents, or anything touching sensitive data.
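That routing rule is simple enough to write down. This is an illustrative policy function (the model names follow the article; the function itself is mine):

```python
# Minimal routing policy for the latency-vs-privacy trade-off above.
# Model names follow the article; the policy function is illustrative.

def pick_model(user_facing: bool, sensitive_data: bool) -> str:
    if sensitive_data:
        return "qwen2.5:27b"      # local: data never leaves the network
    if user_facing:
        return "kimi-k2.5:cloud"  # cloud: always warm, low first-token latency
    return "qwen2.5:27b"          # background jobs: cold start is acceptable

print(pick_model(user_facing=True, sensitive_data=False))  # kimi-k2.5:cloud
print(pick_model(user_facing=True, sensitive_data=True))   # qwen2.5:27b
```

Note that sensitivity wins over latency: a user-facing chatbot handling PII still routes local.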
The pitfalls that will bite you
The /v1 trap
Don't use http://localhost:11434/v1 with OpenClaw.
/v1 is OpenAI-compatible mode, designed for drop-in replacement in existing OpenAI code. Tool calling in this mode is approximate. It outputs JSON strings, not structured tool calls. OpenClaw needs the structured format.
Your agent will refuse to use tools and suggest you run shell commands manually. That's the tell.
Fix: let OpenClaw use the native Ollama provider. It calls /api/chat automatically.
Silent cloud fallback (the privacy bug)
This one is ugly. GitHub Issue #43945:
- Main session uses Ollama (local models, no cloud API calls)
- You spawn a subagent
- Subagent can't authenticate with Ollama
- Subagent silently falls back to gpt-4.1-mini (a cloud model in the fallback chain)
- Your private data just hit OpenAI's API. No warning.
Why it happens: OpenClaw's auth system treats all API keys as secrets. Secrets go into auth-profiles.json, which subagents read. But Ollama's OLLAMA_API_KEY="ollama-local" isn't a real key. It's a feature flag. OpenClaw classifies it as a "marker" and skips writing it to auth profiles.
Main session reads from openclaw.json (works). Subagents read from auth-profiles.json (empty). They get a 401, hit the fallback chain, and land on a cloud provider.
Users who chose local Ollama for data sovereignty (medical records, proprietary code, PII) just had their trust violated. Silently.
Workaround: set OPENCLAW_NO_FALLBACK=1 in your environment. This blocks cloud fallback and makes auth failures loud instead of silent.
The real fix needs to happen at the auth layer. It's the confused deputy problem. OpenClaw's auth pipeline assumes credentials are secrets. Local providers use permission markers, not secrets. Those are different primitives, and the system doesn't distinguish them yet.
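The failure mode and the workaround can both be sketched in miniature. Everything here is a toy reconstruction of the behavior the article describes (the profile writer, the fallback chain, the flag), not OpenClaw's actual code:

```python
# Toy reconstruction of the silent-fallback bug and the no-fallback guard.
# Names mirror the article's description, not OpenClaw's real internals.

FALLBACK_CHAIN = ["ollama", "gpt-4.1-mini"]  # a cloud model sits behind local

def write_auth_profiles(keys: dict) -> dict:
    # The bug: values classified as "markers" rather than secrets are
    # skipped, so subagents never see them in auth-profiles.json.
    return {k: v for k, v in keys.items() if v != "ollama-local"}

def resolve_provider(profiles: dict, no_fallback: bool = False) -> str:
    for provider in FALLBACK_CHAIN:
        if provider == "ollama" and "OLLAMA_API_KEY" not in profiles:
            # The subagent gets a 401 here; without the guard it moves on.
            if no_fallback:
                raise RuntimeError("Ollama auth failed and fallback is disabled")
            continue
        return provider
    raise RuntimeError("no provider available")

profiles = write_auth_profiles({"OLLAMA_API_KEY": "ollama-local"})
print(resolve_provider(profiles))  # gpt-4.1-mini: data heads to the cloud
```

With `no_fallback=True` (the `OPENCLAW_NO_FALLBACK=1` behavior), the same auth failure raises loudly instead of landing on a cloud provider.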
"Local = fast" (it isn't, at first)
Cold start on a 70B model: 10-30 seconds (loading weights from disk into VRAM). After that, inference runs at ~50 tokens/s on a 4090.
Cloud models: 50-200ms to first token. Always.
Local wins for background jobs (cold start doesn't matter), long conversations (amortize the startup cost), and batch processing (load once, process thousands of items).
Cloud wins for user-facing chatbots, bursty workloads (sporadic requests mean cold start every time), and small tasks (loading a 70B model to answer "what's 2+2" is wasteful).
I got burned by this assumption with serverless cold starts years ago. First request is always the killer. Local models have the exact same problem.
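The amortization argument is just arithmetic. Using the article's own numbers (roughly 20s cold start, ~50 tokens/s local inference):

```python
# Back-of-envelope math for the cold-start trade-off, using the
# article's figures: ~20s to load weights, ~50 tokens/s local inference.

COLD_START_S = 20.0
LOCAL_TOK_PER_S = 50.0

def local_seconds(requests: int, tokens_per_request: int) -> float:
    """Total wall time: one weight load, then pure inference."""
    return COLD_START_S + requests * tokens_per_request / LOCAL_TOK_PER_S

# One tiny request: cold start dominates (20s of loading for 2s of work).
print(round(local_seconds(1, 100), 1))    # 22.0
# A 500-item batch: the same 20s is noise against ~1000s of inference.
print(round(local_seconds(500, 100), 1))  # 1020.0
```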
Going further
Cloud models with the :cloud tag
Ollama's cloud models skip the download entirely. Append :cloud to the model name:
ollama run kimi-k2.5:cloud
ollama run minimax-m2.5:cloud
ollama run glm-5:cloud
Your ollama CLI is a client to Ollama's cloud inference API. Models stay on their servers. You pay per token. Auth requires ollama signin or the OLLAMA_API_KEY env var.
Good for prototyping (no 10GB downloads), machines without GPUs, or when you need fast time-to-first-token without managing infrastructure.
Bad for data sovereignty (data leaves your network), cost-sensitive batch jobs (local is free after hardware), or high-throughput workloads (local + batching wins on per-token cost).
Mixing local and cloud in one workflow
This is where it gets interesting. You can split a workflow by sensitivity.
Example, a code review agent:
- Planner (cloud): fast model analyzes the PR and generates a review plan. Metadata only, no actual code.
- Reviewer (local): reads the source code, runs static analysis, generates comments. Sensitive code never leaves your network.
- Summarizer (cloud): aggregates review comments into a readable summary. Metadata again, no code.
I'd use this pattern for HIPAA-compliant chatbots or financial data analysis. Orchestration logic and metadata can hit cloud APIs (fast, cheap). Sensitive data stays local (compliance win).
You're trading complexity for control. Single-model workflows are simpler. Hybrid gives you cloud latency + local privacy, but now you're running two environments.
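The sensitivity split boils down to a routing table plus a guard. This is a sketch of the idea, not a real pipeline; `run_stage` is a placeholder and the assertion stands in for whatever policy enforcement you'd actually use:

```python
# Sketch of the sensitivity-split code review pipeline described above.
# run_stage is a placeholder; the routing table is the point.

STAGES = {
    "planner":    "kimi-k2.5:cloud",  # metadata only -> cloud is fine
    "reviewer":   "qwen2.5:27b",      # touches source code -> must stay local
    "summarizer": "kimi-k2.5:cloud",  # comments, no code -> cloud again
}

def run_stage(stage: str, payload: str) -> str:
    model = STAGES[stage]
    # Guard: never let a sensitive payload reach a :cloud model.
    assert not (payload == "source-code" and model.endswith(":cloud")), \
        "sensitive payload routed to a cloud model"
    return f"{stage} ran on {model}"

print(run_stage("planner", "pr-metadata"))
print(run_stage("reviewer", "source-code"))
```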
Lock down the permissions
Local models with tool calling means shell access.
If Ollama is your auth boundary, and Ollama has no auth by default (localhost:11434 is wide open), any process on your machine can spawn agents with arbitrary tool access.
Fine for your personal laptop. Not fine for shared dev environments or self-hosted OpenClaw deployments.
What you need:
- Firewall Ollama to 127.0.0.1 only, or isolate it on a VPN subnet
- Put auth in front of it (HTTP Basic Auth or SSO via a reverse proxy like Caddy or nginx, since Ollama doesn't ship with auth)
- Use OpenClaw's approval system for dangerous tools (require manual approval for exec)
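The first item is a one-liner. `OLLAMA_HOST` is Ollama's real bind-address setting; the firewall rule is belt-and-braces for shared machines (shown for ufw, adjust for your firewall):

```shell
# Bind Ollama to loopback only so nothing off-box can reach port 11434.
export OLLAMA_HOST=127.0.0.1:11434

# Belt-and-braces firewall rule (uncomment for ufw-based hosts):
# ufw deny in to any port 11434

echo "$OLLAMA_HOST"
```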
I've seen too many "misconfigured internal API → data exfiltration" incidents to skip this section.
Alibaba published a security audit of Ollama in February. Their conclusion: "A misconfigured Ollama instance exposed to internal networks can become an attack surface, a data exfiltration vector, or a compliance liability."
Treat Ollama like you'd treat a database. Sensitive APIs need auth, even on internal networks.
What to do with this
Ollama 0.18 closed the gap between "local model" and "production orchestration."
Before: local models were isolated. Good for single-turn generation, useless for multi-agent workflows with tool calling and session management.
After: local models are native OpenClaw providers. Same orchestration as Claude Code. Same tool calling. Same multi-agent spawns. No API cost. No data leaving your network.
The costs are real. Cold start (10-30s to load into VRAM). Lower quality than frontier models (GPT-OSS 20B isn't Opus 4.5, but it's 10x cheaper). Default Ollama has no auth, so production deployments need boundaries.
Where this fits:
- Code review agents where you're running hundreds of reviews a day and $1.50/review on Opus adds up
- Privacy-first customer support (HIPAA, GDPR) where no external API calls is a hard requirement
- Self-hosted research pipelines where you don't want API key management or rate limits
Where cloud still wins:
- User-facing chatbots where every request needs sub-200ms first token
- Low-volume prototyping where downloading 10GB of model weights isn't worth it
- Tasks where model quality is the bottleneck and Opus 4.5 genuinely outperforms
The point isn't that local replaced cloud. It's that you can now choose. Cloud API with great tooling, or local model with the same tooling. That wasn't true a week ago.