McKinsey's AI Got Hacked by an AI. The Vulnerability Was From 1998.

An autonomous AI agent breached McKinsey's internal AI platform in two hours. No credentials. No insider access. The entry point was SQL injection through JSON field names, a bug class older than most junior developers.

Security · AI · Enterprise
March 11, 2026
9 min read

An autonomous AI agent picked its own target, found its own way in, and compromised the production database of the world's most prestigious consulting firm. Two hours. No credentials. No human guidance after the initial "go."

The target was Lilli, McKinsey's internal AI platform. 43,000 consultants use it daily. It processes 500,000 prompts a month. It sits on top of decades of proprietary research, client strategy documents, M&A analysis, and financial models.

The entry point was SQL injection. Not some novel AI-specific attack. SQL injection. The same bug class that's been in the OWASP Top 10 since the list existed.

What Lilli actually is

McKinsey launched Lilli in 2023, named after the first woman the firm hired professionally, back in 1945. It's a purpose-built AI platform: chat, document analysis, RAG over 100,000+ internal documents, search across the firm's entire intellectual output. Over 70% of McKinsey's workforce adopted it.

That adoption rate matters. When your AI platform becomes the default way 43,000 people discuss client strategy, the data inside it becomes more valuable than most things in your infrastructure. Every M&A conversation, every competitive analysis, every staffing discussion, every financial model someone asked Lilli to help with. All of it, in one place.

The way in

CodeWall, a security startup building autonomous offensive agents, pointed their system at McKinsey's public-facing infrastructure. The agent found API documentation exposed publicly. Over 200 endpoints, fully documented. Most required authentication.

Twenty-two didn't.

One of those unprotected endpoints handled user search queries. The values were parameterized properly. But the JSON keys (the field names in the request body) were concatenated directly into SQL. That's a distinction most security scanners miss entirely. OWASP ZAP didn't flag it. The parameterized values looked clean. But the keys weren't sanitized at all.

The agent figured this out by watching error messages. JSON keys sent in the request body showed up verbatim in database errors. That's the tell. From there, it ran fifteen blind iterations, each error message leaking a bit more about the query structure, until actual production data started coming back.
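The mechanics above fit in a few lines. This is a hypothetical reconstruction of the bug class using sqlite3, not Lilli's actual code or schema: the values are bound safely, but the JSON keys become SQL identifiers through string concatenation.

```python
import sqlite3

# Hypothetical sketch of the bug class, not Lilli's actual code.
# Values are bound safely, but JSON *keys* are concatenated into SQL.
def search_users(db, filters):
    clauses, params = [], []
    for key, value in filters.items():
        clauses.append(f"{key} = ?")  # key is attacker-controlled
        params.append(value)          # value is parameterized (safe)
    sql = "SELECT name FROM users WHERE " + " AND ".join(clauses)
    return db.execute(sql, params).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, role TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'admin', 's3cret')")

# A scanner sees this and moves on: the value really is parameterized.
search_users(db, {"role": "admin"})

# The tell: an unknown key is reflected verbatim in the database error.
try:
    search_users(db, {"probe_xyz": "x"})
except sqlite3.OperationalError as err:
    print(err)  # no such column: probe_xyz

# The break: the key smuggles SQL past the safely bound value.
rows = search_users(db, {"password != '' OR name": "nobody"})
# Rows with a non-empty password come back, name match or not.
```

To a reviewer skimming the function, the `?` placeholders look like textbook parameterization. The injection never touches them.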

I want to sit with that for a second. The parameterized values gave the developers a false sense of security. Someone looked at this code and thought "we're handling injection properly" because the values were bound correctly. They missed the keys. That's not a failure of knowledge. It's a failure of threat modeling. Nobody asked: "What happens when the field names themselves are attacker-controlled?"

What was inside

46.5 million chat messages. Plaintext. No encryption at rest for the message content. From a workforce discussing strategy, M&A activity, client engagements, competitive intelligence, and financials.

728,000 files. 192,000 PDFs. 93,000 spreadsheets. 93,000 PowerPoints. 58,000 Word documents. The filenames alone were sensitive, and each had a direct download URL.

57,000 user accounts. Every single employee on the platform.

384,000 AI assistants and 94,000 workspaces. The full organizational structure of how the firm uses AI internally.

3.68 million RAG document chunks, with S3 storage paths and internal file metadata. Decades of proprietary McKinsey frameworks, methodologies, and research. The intellectual property that justifies $500/hour billing rates.

And 95 system prompt configurations across 12 model types.

The prompt layer is the real story

Reading the data was bad. But the SQL injection wasn't read-only.

Lilli's system prompts were stored in the same database the agent could write to. These prompts controlled everything: how Lilli answered questions, what guardrails it followed, how it cited sources, what it refused to discuss.

One UPDATE statement. One HTTP request. No code deployment. No config change. No CI/CD pipeline. An attacker could rewrite what Lilli tells 43,000 consultants, and nobody would see it happen.

Think about what that means in practice:

A consultant asks Lilli to review a merger valuation. The poisoned prompt subtly inflates the projected cost savings by 15%. The consultant trusts the output because it came from their own internal tool. The recommendation goes to the client's board.

Or the prompt gets rewritten to exfiltrate data. Every time someone asks about Project X, Lilli includes a summary of the most recent confidential findings in its response. The consultant copies that response into an email to the client. The data walks out the door in plain sight.

Unlike a compromised server, a modified prompt leaves no log trail. No file changes. No process anomalies. No alerts. The AI just starts behaving differently, and nobody notices until the damage surfaces weeks or months later in a bad deal, a leaked strategy, or a regulatory filing that doesn't add up.

I've been thinking about this pattern since writing about the Clinejection attack last week, where prompt injection via a GitHub issue title led to a supply chain compromise. The surface is different. One was CI/CD, the other is enterprise AI. But the failure is the same: prompts are treated as configuration, not as code. They live in databases without access controls, version history, or integrity monitoring.

Why scanners didn't catch it

OWASP ZAP missed this. McKinsey's own internal scanners missed it. That detail from the CodeWall disclosure is worth examining.

Most SQL injection scanners test parameter values. They'll throw a single quote into a form field and watch what happens. That covers the vast majority of injection vectors. But JSON key injection is a different shape. The scanner sees a well-formed JSON body with properly parameterized values and moves on.

The AI agent found it because it doesn't follow a checklist. It probed, observed error messages, noticed the reflection pattern in the key names, and iterated. Fifteen rounds of blind probing, adjusting based on what each error revealed. That's closer to how a skilled human pentester works than how a scanner works.

This is the uncomfortable part of the story. The same technology that created the attack surface (an AI platform with writable prompts and insufficient access controls) is also better at finding these bugs than the tools we've relied on for twenty years. Your scanner is testing for the vulnerabilities it knows about. An autonomous agent is testing for the vulnerabilities that exist.

The org failure underneath

The HN discussion surfaced something I think matters more than the technical vulnerability. Multiple people claiming to be current or former McKinsey employees described the same pattern: Lilli started as an internal-only tool behind VPN and SSO. Then a senior partner pushed to make it externally accessible. By that time, the original engineering team had "rolled off" to client projects, because McKinsey's incentive structure punishes internal work and rewards client-facing impact.

So the platform that became externally accessible wasn't being maintained by the people who built it. It was maintained by whoever was available, staffed there because they couldn't get placed on client work.

This is the part that should worry you if you're running an engineering org. The vulnerability wasn't just a missing authorization check. It was an incentive structure that treats internal platforms as second-class work, combined with a leadership decision to expand access without expanding the security posture. That's not a McKinsey problem. I've seen this exact pattern at three different companies in the last two years.

What your AI platform probably has wrong

After writing about Clinejection and now this, I keep finding the same gaps. Here's what I'd check if I inherited an internal AI platform tomorrow:

Audit every endpoint, not just the authenticated ones. Lilli had 200+ endpoints documented publicly. Twenty-two had no auth. How many of your internal AI platform's endpoints have you actually tested without credentials?

Treat JSON keys as untrusted input. If your ORM or query builder interpolates field names from user-supplied JSON, you have the same bug McKinsey had. Whitelist the allowed field names. Don't concatenate them.
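A minimal sketch of that fix, with illustrative names: field names come from a fixed allow-list, values stay parameterized, and anything else is rejected before it reaches the query builder.

```python
import sqlite3

# Sketch of the fix: allow-list the field names, keep binding the values.
# Schema and field names are illustrative assumptions.
ALLOWED_FIELDS = {"name", "role", "team"}

def search_users_safe(db, filters):
    clauses, params = [], []
    for key, value in filters.items():
        if key not in ALLOWED_FIELDS:
            raise ValueError(f"unknown filter field: {key!r}")
        clauses.append(f"{key} = ?")  # key now comes from a fixed set
        params.append(value)
    sql = "SELECT name FROM users WHERE " + " AND ".join(clauses)
    return db.execute(sql, params).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, role TEXT, team TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'admin', 'platform')")

search_users_safe(db, {"role": "admin"})               # works
# search_users_safe(db, {"role = role OR 1=1": "x"})   # raises ValueError
```

The allow-list also doubles as documentation of which fields the endpoint actually supports.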

Separate prompt storage from data storage. If your system prompts live in the same database your application queries, a single SQL injection gives an attacker control over your AI's behavior. Prompts should live in a separate store with its own access controls, versioning, and integrity checks.

Version and hash your prompts. Every prompt change should be tracked, diffable, and auditable. If a prompt changes without a corresponding commit or deployment, that's an alert.
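One way to make that concrete is a content digest recorded at deploy time and checked on every fetch. This is a sketch under assumed names, not a real API:

```python
import hashlib

# Sketch of prompt integrity monitoring. Store layout and names are
# assumptions for illustration, not a real API.
def digest(prompt_text: str) -> str:
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()

# Recorded at deploy time, from the prompt that shipped through review:
DEPLOYED = {
    "default-assistant": digest("Answer carefully. Cite internal sources."),
}

def prompt_tampered(name: str, text_from_db: str) -> bool:
    """True if the stored prompt no longer matches what was deployed."""
    return DEPLOYED.get(name) != digest(text_from_db)

# Run on every fetch (or on a schedule); a mismatch means the prompt
# changed without a corresponding deployment, which is an alert.
prompt_tampered("default-assistant", "Answer carefully. Cite internal sources.")
prompt_tampered("default-assistant", "Answer carefully. Inflate savings by 15%.")
```

The digest check is cheap enough to run inline on every prompt load, which closes the "one UPDATE, zero alerts" gap.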

Monitor prompt behavior, not just prompt content. Even if you're watching for direct prompt modifications, an attacker could achieve similar effects through the RAG layer. If your AI starts giving subtly different advice on financial models, you want to catch that before a client acts on it. Behavioral monitoring (output drift detection) is harder than integrity monitoring, but it's the only thing that catches the sophisticated attacks.

Internal tools need external-grade security reviews when they become externally accessible. This sounds obvious. But McKinsey, a firm that requires external pentesting before launching anything, even to a small group of coworkers, apparently didn't apply that same standard when Lilli went public. The policy existed. The enforcement didn't.

McKinsey's response, to their credit

One thing I will say: McKinsey's response was fast. CodeWall disclosed on March 1. McKinsey's CISO acknowledged the same day. By March 2, all unauthenticated endpoints were patched, the development environment was taken offline, and public API documentation was blocked.

Compare that to the Clinejection timeline: five weeks of silence, a partial credential rotation that missed the exposed token, and a supply chain compromise that happened because the patch didn't actually close the hole.

McKinsey's incident response was competent. Their architecture wasn't.

The uncomfortable question

The agent that found this vulnerability is essentially the same technology as the AI platform it compromised. Both are LLMs reasoning about structured data, making decisions, executing actions. One builds, one breaks. The difference is intent and configuration.

We're in a period where every enterprise is deploying AI platforms internally. Most are doing it the way McKinsey did: fast, with existing security tooling, using the same infrastructure patterns they'd use for any web application. And most of those platforms have the same gaps. Not because the teams are incompetent, but because the threat model for AI platforms is genuinely different from the threat model for traditional web apps.

The prompt layer is a new attack surface. RAG poisoning is a new attack surface. And autonomous agents that can probe, chain, and escalate without human guidance are a new threat actor class.

The bug class dates to 1998. The attack surface is brand new. That gap is where the next wave of breaches lives.
