
GPT-5.2 Proved Physicists Wrong. Here's Why Engineers Should Care.

OpenAI's GPT-5.2 conjectured a new formula in theoretical physics that humans missed for decades. A concrete data point on where AI reasoning actually stands.

AI · Research · Machine Learning · Engineering · OpenAI
February 14, 2026
6 min read

Physicists believed for decades that a specific class of particle interactions couldn't happen. Not a hunch — a proven result. The kind of argument that gets written into graduate QFT textbooks and stops being questioned.

This week, GPT-5.2 read the math, found a gap in the reasoning, and conjectured the formula that shows they were wrong.

That's the actual story. OpenAI published a preprint on arXiv — "Single-minus gluon tree amplitudes are nonzero" — co-authored by researchers from Harvard, Cambridge, Vanderbilt, and the Institute for Advanced Study. The AI didn't just assist. It found the closed-form formula. A scaffolded version then spent roughly 12 hours generating a formal proof. The team verified it analytically. They submitted it for publication.

I'm not here to hype it. I want to explain what actually happened and why the framing matters.

What gluons have to do with anything

Gluons carry the strong nuclear force — the thing holding protons together. When physicists calculate how likely two particles are to interact, they compute a "scattering amplitude." For gluons, these calculations are notoriously brutal, but they often collapse into surprisingly clean closed-form expressions at tree level (that's the approximation where you ignore quantum loop corrections and keep only the simplest diagrams).
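To give a flavor of how clean these closed forms can get: the textbook Parke-Taylor formula (stated schematically here, with couplings and color factors stripped) compresses the entire tree-level "MHV" amplitude, the configuration with two negative-helicity gluons i and j and the rest positive, into a single ratio of spinor brackets:

```latex
A_n^{\mathrm{MHV}}(1^+,\dots,i^-,\dots,j^-,\dots,n^+)
  \;\propto\;
  \frac{\langle i\,j\rangle^{4}}
       {\langle 1\,2\rangle\,\langle 2\,3\rangle \cdots \langle n\,1\rangle}
```

The case discussed next sits one step below MHV: only a single negative helicity.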

One specific case had been treated as closed for a long time: the "single-minus" configuration, where one gluon has negative helicity and all the others are positive. Standard textbook derivations show the amplitude is zero. Everyone moved on.

The paper shows that's only true under generic momentum conditions. There's a specific slice of momentum space (the "half-collinear regime," where some of the gluon momenta become collinear in a specific way) where those standard derivations break down. And in that regime, the amplitude is not zero. It has a clean formula for arbitrary n.

GPT-5.2 Pro found that formula.

The actual sequence of events

The human collaborators computed the amplitudes by hand for specific cases up to n=6. Those Feynman-expanded expressions were a mess. Complexity growing superexponentially with n. Unmanageable. The kind of thing that defeats you after a while.

GPT-5.2 took those expressions, simplified them significantly, then spotted a pattern across the n=3 through n=6 cases. From that, it conjectured a closed-form formula valid for all n.

An internal scaffolded version of GPT-5.2 then independently worked through the problem from scratch, came up with the same formula, and produced a formal proof. The team verified it against the Berends-Giele recursion relation and the soft theorem (both standard validation methods in this area) and it held.
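The shape of that workflow is worth seeing in miniature: compute small cases by an expensive recursion, conjecture a closed form from the pattern, then check the conjecture against the recursion well beyond the cases it was fit on. The sketch below is a toy analogue only. It is not the Berends-Giele recursion or anything from the paper, just the same validation structure applied to a trivially simple recurrence.

```python
# Toy analogue of the paper's workflow: small cases via a recursion,
# a conjectured closed form, then independent validation. This is
# NOT the physics; it only mirrors the shape of the method.

def brute_case(n):
    # "Expensive" recursive computation, standing in for the
    # Feynman-diagram expansion: T(n) = T(n-1) + (2n - 1), T(1) = 1.
    return 1 if n == 1 else brute_case(n - 1) + (2 * n - 1)

small_cases = [brute_case(n) for n in range(1, 7)]
# -> [1, 4, 9, 16, 25, 36]: the pattern suggests n^2.

def conjectured(n):
    # Closed form guessed from the small cases.
    return n * n

# Validate the conjecture far beyond the cases it was fit on.
assert all(conjectured(n) == brute_case(n) for n in range(1, 30))
print("conjecture holds for n = 1..29")
```

The load-bearing step is the last one: the conjecture earns trust only by surviving checks it wasn't fit to, which is exactly the role the recursion relation and the soft theorem played for the amplitude formula.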

The result has already been extended from gluons to gravitons. More papers are coming.

This is not "AI helped with the writeup"

There's a lazy version of this story that sounds like: the AI tidied up some algebra. Sure. Mathematica does algebra. Pattern-matching on sequences isn't new.

But Nima Arkani-Hamed, arguably the most prominent theorist in this area and based at the Institute for Advanced Study, has been thinking about exactly this class of amplitudes for fifteen years. He described the problem as "especially well-suited to exploit the power of modern AI tools." That's not a courtesy quote. He's been staring at these problems since before GPT existed.

The expressions the AI simplified weren't tedious arithmetic. They grew superexponentially in n. That means the n=6 case is incomparably messier than n=3. The AI simplified n=6 in the same session where it handled n=3. Then it looked at all four cases together and found the general rule.

The humans couldn't see the pattern until the AI showed them the simplified forms. The simplification enabled the insight — not the other way around.

That's a materially different kind of contribution than a co-author who "helped draft Section 2."

What this means if you build software

I've spent nine years building distributed systems. When I read this paper, I didn't think about gluons. I thought about state space.

Physicists hit a wall at n=6 because the expressions became unmanageable for human working memory. The AI didn't hit that wall. It processed the n=6 case, held n=3 through n=5 simultaneously, and found the invariant.

Engineers hit the same wall, just differently. A system with 14 microservices has more failure mode interactions than any team can fully enumerate in code review. We compensate with testing, observability, runbooks, and the implicit knowledge that lives in senior engineers who've seen enough incidents. The coverage is always partial.
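The combinatorics behind that claim are easy to make concrete. Even counting only pairwise interactions, 14 services already give 91 pairs, and the space of partial-failure states is exponential before you even consider ordering or timing:

```python
from math import comb

services = 14

# Pairwise service interactions: C(14, 2) = 91.
pairs = comb(services, 2)

# Partial-failure states if each service is simply up or down: 2^14.
# Real systems are worse: degraded modes, ordering, and timing multiply this.
failure_states = 2 ** services

print(f"{pairs} pairwise interactions, {failure_states} up/down failure states")
```

No review process enumerates 16,384 states; teams sample that space via tests and incidents.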

What the physics result is pointing at: when you give an AI system a formally defined problem space — precise rules, verifiable outputs, hard combinatorics — it doesn't have to approximate the state space the way humans do. It can hold more of it at once.

The question isn't whether that capability exists anymore. This paper answers that. The question is which engineering domains have the right structure: formal enough to define the space, difficult enough that human bandwidth is the real bottleneck.

Compiler performance work looks like that. Query planning looks like that. Formal verification for security definitely looks like that.

The methodology is the template

One thing worth being explicit about: the team didn't take GPT-5.2's output on faith. The formula was verified using the Berends-Giele recursion relation and checked against the soft theorem. Standard validation machinery, applied to an AI-generated result.

That's the pattern going forward. The AI generates candidates. The human domain experts verify using established methods. Peer review stays. The trust model doesn't change. It's still "show your work, check the edge cases, submit the paper." What changes is where the first candidate comes from.
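That trust model has a simple code shape: the generator can be anything, including an untrusted model, because only the verifier is load-bearing. A minimal sketch, with all names hypothetical and a deliberately trivial "formula" domain:

```python
# Generate-and-verify pattern: an untrusted generator proposes
# candidates; only candidates that pass an independent, trusted
# check are accepted. All names here are hypothetical.

def untrusted_generator():
    # Stand-in for a model proposing closed forms for 1 + 2 + ... + n.
    yield lambda n: n * n              # wrong guess
    yield lambda n: n * (n + 1) // 2   # right guess

def verified(candidate, cases=range(1, 50)):
    # Trusted check: compare against an independent brute-force sum.
    return all(candidate(n) == sum(range(1, n + 1)) for n in cases)

accepted = [f for f in untrusted_generator() if verified(f)]
print(f"accepted {len(accepted)} of 2 candidates")
```

The design choice that matters: the verifier never trusts how a candidate was produced, so swapping the generator for a stronger model changes throughput, not the trust model.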

The 12 hours GPT-5.2 spent proving the formula is also notable. That's not a query. That's a sustained reasoning process operating over a structured mathematical domain. The humans who worked on this are researchers from Harvard, Cambridge, and IAS. The AI kept pace.

Where this sits in the broader trend

In 2024, DeepMind's AlphaProof solved International Math Olympiad problems at silver-medal standard. That was formal theorem proving over a curated problem set. This is different — it's original research on an open problem in physics, with results being submitted for journal publication.

The gap between "solving problems we've seen before" and "contributing to problems nobody has solved" just got smaller. Not closed — but smaller.

If you're an engineer thinking about where to focus your next 12 months, the honest answer is: the domains where AI currently struggles most are the ones requiring fuzzy judgment and social context. The domains where AI is accelerating fastest are the ones requiring combinatorial depth over formally defined structures.

Physics and math are at one end of that spectrum. But formal systems (type theory, protocol verification, query planning) are closer to the math end than most engineers realize.

Watch the trajectory, not just the current position.
