
MCP Server Benchmarks Are Asking the Wrong Question
3.9 million requests across Java, Go, Node.js, and Python. Go wins on memory, Java on latency. But after running MCP servers in production for months, I think the benchmark misses what actually matters.
A benchmark dropped two weeks ago that tested MCP server implementations across Java, Go, Node.js, and Python. 3.9 million requests. Three test rounds. Controlled Docker environments with 1 CPU core and 1 GB memory per server.
The results: Go and Java clocked sub-millisecond latencies (0.855ms and 0.835ms). Node.js hit 10.66ms. Python came in at 26.45ms. Go used 18 MB of memory versus Java's 220 MB. All four implementations achieved 0% error rates.
LinkedIn went predictable. "Go wins!" "Java is dead!" "Python shouldn't be in production!"
I've been running MCP servers in production for months. The benchmark data is solid. The conclusions most people are drawing from it? Wrong.
The benchmark is good. The framing is off.
Thiago Mendes at TM Dev Lab did serious work here. Three independent test rounds. Proper k6 load testing with 50 concurrent virtual users. Resource-constrained Docker containers that simulate real deployment limits. This isn't another hello-world throughput test.
But the benchmark optimizes for a scenario most MCP deployments will never face: sustained high-load, latency-critical traffic on a single tool server.
Here's what MCP servers actually do in production. They sit between an LLM and your data. The LLM calls a tool. The server fetches from a database, hits an API, or runs a computation. The LLM waits for the response, processes it, and maybe calls another tool.
The bottleneck is never the MCP server. It's the LLM inference time. A Claude or GPT response takes 500ms to 5 seconds. Your MCP server adding 10ms versus 0.8ms? That's noise. The user doesn't notice. The LLM doesn't care.
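The arithmetic is worth making explicit. A quick sketch, using the benchmark's own latency figures against a typical 2-second LLM inference round trip:

```python
def mcp_latency_share(server_ms: float, llm_ms: float) -> float:
    """Fraction of end-to-end latency attributable to the MCP server."""
    return server_ms / (server_ms + llm_ms)

# Java (0.835ms), Node.js (10.66ms), Python (26.45ms) vs ~2s of inference
for server_ms in (0.835, 10.66, 26.45):
    share = mcp_latency_share(server_ms, 2000)
    print(f"{server_ms}ms server -> {share:.2%} of total latency")
```

Even Python's 26.45ms is barely over 1% of what the user experiences. The gap between Go and Java disappears entirely at this scale.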
What actually matters in production MCP
After running MCP tool servers across multiple projects, the things that break production have nothing to do with sub-millisecond latency.
Cold start time kills interactive flows
When a user opens a chat and the agent needs a tool, the MCP server spins up. Java's JVM warmup means the first request pays a real tax. Go's static binary starts in milliseconds. Node.js is somewhere in between.
The benchmark didn't measure cold starts. In a serverless or scale-to-zero deployment (which is how most teams run MCP servers), cold start matters more than p99 latency on a warmed-up server.
Session management is the actual hard problem
The benchmark tested stateless request-response patterns. Real MCP deployments deal with session state. A user's conversation with an agent might involve 20-30 tool calls over minutes. Each call needs context from previous calls.
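The shape of the problem is a per-session context store: every tool call reads and writes state keyed by the session, and sessions must never see each other's data. A hypothetical sketch (`SessionStore` is mine; real MCP SDKs tie this to the transport's session ID):

```python
from collections import defaultdict

class SessionStore:
    """Per-session context so each tool call can see earlier calls' results."""
    def __init__(self):
        self._sessions: dict[str, dict] = defaultdict(dict)

    def record(self, session_id: str, key: str, value) -> None:
        self._sessions[session_id][key] = value

    def get(self, session_id: str, key: str, default=None):
        return self._sessions[session_id].get(key, default)

    def end(self, session_id: str) -> None:
        # Forgetting this leaks memory across long-lived agent conversations
        self._sessions.pop(session_id, None)

store = SessionStore()
store.record("sess-1", "last_customer_id", "C-42")
print(store.get("sess-1", "last_customer_id"))  # this session sees its state
print(store.get("sess-2", "last_customer_id"))  # another session must not
```

Getting this isolation right, and cleaning up when sessions end, is where real MCP servers earn their keep.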
The Node.js benchmark used per-request server instantiation to mitigate CVE-2026-25536, which tanks its numbers. But that's the point: security requirements reshape your architecture. You can't just look at raw throughput and decide Node.js is 3x slower. It's 3x slower because it's doing something the other implementations aren't.
Memory matters, but not how you think
Go's 18 MB footprint versus Java's 220 MB is a real advantage. But not because of raw cost savings. It matters because MCP servers tend to be many and small.
A typical agent might use 5-15 different tool servers. A platform serving multiple agents might run dozens. At that scale, Go's memory efficiency means you can pack more tool servers onto fewer nodes. Java's 220 MB per server becomes a problem when you're running 30 of them.
The benchmark tested one server at a time. Production runs many.
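The density math is simple enough to sketch. Using the benchmark's footprints and a hypothetical 4 GB node with 20% reserved for the OS:

```python
def servers_per_node(node_mem_mb: int, per_server_mb: int, headroom: float = 0.8) -> int:
    """How many tool servers fit on a node, reserving headroom for the OS."""
    return int(node_mem_mb * headroom // per_server_mb)

print(servers_per_node(4096, 18))   # Go at 18 MB per server
print(servers_per_node(4096, 220))  # Java at 220 MB per server
```

Roughly an order of magnitude more Go servers per node. That ratio, not the per-request latency, is what shows up on your cloud bill.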
Developer velocity trumps microseconds
Here's a number the benchmark can't measure: how long it takes your team to add a new tool.
Python with FastMCP:
```python
@mcp.tool()
async def search_orders(customer_id: str, status: str = "all") -> dict:
    """Search orders by customer and status."""
    orders = await db.orders.find(customer_id=customer_id, status=status)
    return {"orders": [o.to_dict() for o in orders], "count": len(orders)}
```
Go with the official SDK:
```go
server.AddTool("search_orders", mcp.Tool{
	Description: "Search orders by customer and status",
	InputSchema: map[string]interface{}{
		"type": "object",
		"properties": map[string]interface{}{
			"customer_id": map[string]interface{}{
				"type":        "string",
				"description": "Customer identifier",
			},
			"status": map[string]interface{}{
				"type":        "string",
				"description": "Order status filter",
				"enum":        []string{"all", "pending", "shipped", "delivered"},
			},
		},
		"required": []string{"customer_id"},
	},
}, func(args map[string]interface{}) (*mcp.ToolResponse, error) {
	customerID := args["customer_id"].(string) // schema marks this required
	status, ok := args["status"].(string)
	if !ok {
		status = "all" // optional parameter: apply the schema default
	}
	orders, err := db.SearchOrders(customerID, status)
	if err != nil {
		return nil, err
	}
	result, err := json.Marshal(map[string]interface{}{
		"orders": orders,
		"count":  len(orders),
	})
	if err != nil {
		return nil, err
	}
	return &mcp.ToolResponse{
		Content: []interface{}{
			map[string]interface{}{
				"type": "text",
				"text": string(result),
			},
		},
	}, nil
})
```
Same functionality. Python is five lines. Go is 30-plus. When your team needs to ship 20 tools in a sprint, that difference compounds. And in MCP world, you're constantly adding tools as your agent's capabilities grow.
The right question: what's your deployment topology?
Instead of "which language is fastest?", the useful question is "how are you deploying MCP servers?"
Single monolith server with many tools: Java or Go. You want one process handling everything efficiently. Java's JIT compilation pays off when the server stays warm. Go's low memory means you can give it more headroom.
Many small tool servers (microservice style): Go, period. 18 MB per server times 30 servers = 540 MB. Java at 220 MB times 30 = 6.6 GB just for JVM overhead. The math doesn't work.
Scale-to-zero / serverless: Go again. Fast cold starts plus minimal memory. Java with GraalVM native image is competitive here but adds build complexity.
Rapid prototyping and iteration: Python or Node.js. Ship tools fast, validate with users, rewrite the hot paths in Go later if you need to. Most MCP tools never get enough traffic to justify optimization.
Team has TypeScript everywhere: Node.js with shared instances (not per-request). The CVE-2026-25536 mitigation added 7ms of overhead per request. Shared instances with proper session isolation bring Node.js performance much closer to where it should be.
The numbers that actually matter
If I were building the next version of this benchmark, here's what I'd measure:
| Metric | Why it matters |
|---|---|
| Cold start to first response | Real user-facing latency in serverless |
| Time to add a new tool | Developer productivity determines shipping speed |
| Memory at 10 concurrent tool servers | Real deployment density |
| Latency with actual I/O (database, HTTP) | MCP tools aren't computing Fibonacci |
| Recovery after crash | MCP sessions need reconnection handling |
| Streaming response latency | Progressive tool results improve UX |
The Fibonacci and JSON processing benchmarks test runtime characteristics. But MCP tool servers spend 95% of their time waiting on I/O: database queries, API calls, file reads. The language runtime performance during that I/O wait is irrelevant.
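You can see why in a few lines. While a tool call waits on a (simulated) database, the runtime isn't executing anything; concurrent calls overlap their waits, and wall time collapses to the I/O latency itself. A sketch with `asyncio.sleep` standing in for a database round trip:

```python
import asyncio
import time

async def fake_db_query(ms: float) -> float:
    """Stand-in for a database call: the runtime just waits on I/O."""
    await asyncio.sleep(ms / 1000)
    return ms

async def main() -> float:
    t0 = time.perf_counter()
    # Three concurrent "tool calls", each waiting 50ms on I/O
    await asyncio.gather(fake_db_query(50), fake_db_query(50), fake_db_query(50))
    elapsed = (time.perf_counter() - t0) * 1000
    # Wall time is ~50ms, not 150ms: the waits overlap, and no CPU-bound
    # runtime speed would change that number
    print(f"3 queries in {elapsed:.0f}ms")
    return elapsed

elapsed = asyncio.run(main())
```

During that wait, the interpreter's single-threaded Fibonacci score is doing exactly nothing for you.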
My production stack
For what it's worth, here's what I actually run:
Go for high-traffic, stable tool servers that don't change often. Database query tools, search integrations, anything that handles real volume.
Node.js (TypeScript) for tool servers that change weekly. The type safety catches schema drift between the LLM's expectations and your tool's actual interface. When you're iterating fast, TypeScript's compiler is worth more than Go's raw speed.
Python only for tools that wrap ML models or data science libraries. The ecosystem advantage is real. Calling a pandas transformation or a scikit-learn model from Go requires CGo or a subprocess, and both are worse than just running Python.
Nothing in Java. Not because Java is bad (the benchmark proves it's excellent), but because my team doesn't have JVM expertise and the 220 MB baseline per server doesn't fit our deployment model.
The CVE-2026-25536 factor nobody's talking about
The Node.js benchmark numbers deserve a closer look because they expose a pattern that applies to every language.
CVE-2026-25536 is a session data leakage vulnerability in the Node.js MCP SDK. The fix: instantiate a new server per request instead of sharing one instance across connections. That's why Node.js shows 10ms latency instead of what would probably be 2-3ms with shared instances.
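The tradeoff is easy to see in miniature. A hypothetical sketch (the `ToolServer` class and handler functions are mine) contrasting a shared instance, where state bleeds between requests, with the per-request pattern the fix mandates:

```python
class ToolServer:
    """Minimal stand-in for an MCP server instance holding session data."""
    def __init__(self):
        self.session_data: dict = {}

# Shared instance: fast, but one request's data is visible to the next
shared = ToolServer()

def handle_shared(request: dict) -> ToolServer:
    shared.session_data.update(request)
    return shared

# Per-request instance: the mitigation pattern -- nothing can leak,
# at the cost of constructing a fresh server on every call
def handle_per_request(request: dict) -> ToolServer:
    server = ToolServer()
    server.session_data.update(request)
    return server

a = handle_shared({"user": "alice"})
b = handle_shared({"token": "secret"})
print("user" in b.session_data)  # True: alice's data leaked into request b

c = handle_per_request({"user": "alice"})
d = handle_per_request({"token": "secret"})
print("user" in d.session_data)  # False: full isolation
```

The per-request constructor cost is the 7ms of overhead in the Node.js numbers, paid in exchange for isolation.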
But here's the thing. Every MCP server implementation will eventually face security constraints that degrade performance. Java will need request-scoped dependency injection. Go will need mutex locks around shared state. Python already has the GIL limiting concurrency.
The benchmark tested one implementation with a security fix applied and three without equivalent constraints. That's not comparing languages. That's comparing security postures.
When you evaluate MCP server performance, ask: "What security constraints will my deployment require?" Then benchmark with those constraints applied. The raw numbers without security hardening are interesting but not actionable.
What the 0% error rate actually tells us
The most interesting finding in the benchmark got the least attention. All four implementations across 3.9 million requests had zero errors. Zero.
MCP as a protocol is solid. The Streamable HTTP transport works. The tool registration and invocation patterns are reliable regardless of implementation language. This is the real takeaway for anyone evaluating MCP for production: the protocol isn't the risk. Your architecture decisions are.
Pick the language your team knows. Deploy in the topology that fits your infrastructure. Optimize the tool that actually shows up in your latency traces (it won't be the MCP server).
The benchmark answered "which runtime is fastest?" with solid data. But for most teams shipping MCP-powered agents, that was never the question that needed answering.
Sources
- TM Dev Lab MCP Server Benchmark by Thiago Mendes, Feb 11, 2026
- Model Context Protocol Specification by Anthropic
- CVE-2026-25536 - Node.js MCP SDK session isolation vulnerability
- MCP Go SDK v1.2.0
- Spring AI MCP Server - Spring Boot 4.0.0 / Spring AI 2.0.0-M2