The transformer architecture that powers ChatGPT, Gemini, and Claude may be approaching its limits - and the people best positioned to know are betting billions on what replaces it. Photo: Unsplash
Two announcements landed this week that, read together, tell a story most AI coverage is missing. On March 10, Thinking Machines Lab - the startup founded by former OpenAI CTO Mira Murati - announced a "long-term gigawatt-scale strategic partnership" with Nvidia to power its AI model training. The same day, Yann LeCun's Paris-based startup Advance Machine Intelligence confirmed it had raised $1 billion to build what LeCun calls "AI world models."
Neither story is about incremental progress. Both are bets - large, well-funded, publicly committed bets - that the dominant paradigm in artificial intelligence, the transformer architecture that underpins every major language model from ChatGPT to Claude to Gemini, is not the endpoint. It is a waystation.
The people making these bets are not outsiders or contrarians. Murati spent six years at OpenAI, including as chief technology officer and acting CEO during the board crisis of November 2023. LeCun is a Turing Award winner, a principal architect of the convolutional neural networks that underpin modern computer vision, and spent more than a decade leading AI research at Meta, latterly as its chief AI scientist. When they say the current approach has a ceiling, that judgment carries weight that a thousand hot takes cannot replicate.
What a Gigawatt Actually Means
The phrase "gigawatt-scale strategic partnership" in Thinking Machines Lab's announcement is easy to skim past. It shouldn't be. In the context of AI training infrastructure, a gigawatt of power represents a scale with no historical precedent in the private sector.
To calibrate: the largest single AI training cluster Nvidia has shipped to date - the kind deployed by hyperscalers like Microsoft and Google - consumes roughly 100 to 200 megawatts. A gigawatt-scale deployment is five to ten times larger. At current GPU efficiency figures, one gigawatt of sustained AI compute corresponds to somewhere between 300,000 and 500,000 H100-equivalent GPUs running simultaneously. [The Verge, March 10, 2026]
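The 300,000-to-500,000 range falls out of simple arithmetic. As a back-of-envelope sketch - assuming roughly 2 to 3 kW of all-in facility power per GPU (chip draw plus CPUs, networking, and cooling overhead; these figures are assumptions, not from the announcement) - the implied GPU count looks like this:

```python
# Back-of-envelope check of the GPU count implied by one gigawatt.
# Assumption (not from the article): an H100 draws ~700 W at the chip,
# and a built-out training cluster lands around 2-3 kW per GPU once
# CPUs, networking, storage, and cooling overhead are included.
GIGAWATT = 1_000_000_000  # watts

def gpus_for(watts_per_gpu_all_in: float) -> int:
    """GPUs a 1 GW facility can sustain at a given all-in power per GPU."""
    return int(GIGAWATT / watts_per_gpu_all_in)

low = gpus_for(3_000)    # conservative: 3 kW all-in per GPU
high = gpus_for(2_000)   # aggressive: 2 kW all-in per GPU
print(f"{low:,} to {high:,} H100-equivalents")  # 333,333 to 500,000
```

Under those assumptions the arithmetic lands squarely in the reported range, which suggests the figure is grounded in real facility math rather than marketing rounding.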
No startup has ever operated at this scale. The implication is that TML is not planning to be a startup for long. Murati is building infrastructure that rivals or exceeds what OpenAI currently operates - from scratch, without a decade of Microsoft backing, and less than 18 months after founding the company.
The Nvidia partnership is the mechanism. Nvidia doesn't announce "long-term gigawatt-scale" commitments as marketing copy. Those words describe an actual capital allocation: GPU production reserved, data center capacity pre-committed, a multi-year supply chain locked in. The companies publishing AI benchmarks while renting spot compute on AWS are playing a different game entirely.
"The news comes after multiple founding members of the startup left for OpenAI at the same time earlier this year." - The Verge, March 10, 2026, on Thinking Machines Lab's challenges before this announcement
That context matters. TML had a rough patch when founding engineers returned to OpenAI - a visible sign of internal turbulence for a company that had barely gotten started. The Nvidia deal reads, in part, as a statement of stabilization. Murati is not running a talent experiment. She is building a machine.
LeCun's World Model Thesis
LeCun has spent years arguing that the path to human-level intelligence runs through "world models" - systems that understand physics, causality, and persistent reality, not just token prediction. Photo: Unsplash
Yann LeCun has been making the same argument for several years, in public, to anyone who will listen. Language models, he says, are impressive but fundamentally limited. They predict the next token in a sequence. They do not model the world. They have no persistent understanding of physics, causality, space, time, or consequences. They are autocomplete at civilizational scale - useful, but not intelligent in any meaningful sense.
His alternative: "world models." Systems that build and maintain an internal representation of reality, update it continuously as new information arrives, and use it to reason about what will happen if certain actions are taken. The difference between a language model and a world model is roughly the difference between a very good librarian and a person who actually understands the books.
Advance Machine Intelligence - AMI - is where LeCun is building his version of this. The Paris location is not incidental. France has invested heavily in AI research infrastructure, and LeCun has long argued that the concentration of AI development in a handful of American companies is both geopolitically dangerous and intellectually limiting. A billion-dollar raise in Paris is a statement about where frontier research can happen.
The $1 billion figure is substantial but not extraordinary by 2026 standards. What matters is what it signals about investor conviction. World model research is genuinely hard, genuinely speculative, and genuinely unpopular among the benchmarking community that judges AI progress on metrics that happen to favor transformer models. Raising a billion dollars for work that explicitly rejects the dominant paradigm requires backers who believe the dominant paradigm is wrong.
What Is a "World Model"?
- Language models (GPT-4, Claude, Gemini): predict the next token based on patterns in training data. No persistent memory. No physical reasoning. No understanding of consequences.
- World models (what LeCun is building): maintain an internal simulation of physical and social reality. Reason about causality: "if I do X, Y will happen." Update continuously from new observations.
- The analogy: A language model reads every physics textbook ever written. A world model can drop a ball and predict where it lands.
- Why it matters: Robotics, autonomous vehicles, scientific discovery, and strategic planning all require world models. They cannot be solved with autocomplete, no matter how sophisticated.
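The distinction in the list above can be made concrete with a toy sketch - purely illustrative, and not either company's actual approach. The "language model" below can only return text patterns it has seen; the "world model" keeps a tiny physics simulation and can answer a question it was never trained on:

```python
# Toy contrast between pattern completion and a world model.
# Hypothetical illustration only - not TML's or AMI's architecture.
import math

# "Language model": retrieves the most familiar continuation from training text.
TRAINING_SNIPPETS = {
    "a dropped ball": "falls to the ground",
    "a thrown ball": "follows an arc",
}

def language_model(prompt: str) -> str:
    return TRAINING_SNIPPETS.get(prompt, "[no matching pattern]")

# "World model": holds state about physics and rolls it forward to a prediction.
def world_model_drop(height_m: float, forward_velocity_ms: float) -> float:
    """Predict horizontal landing distance of a ball released at height_m."""
    g = 9.81
    time_to_fall = math.sqrt(2 * height_m / g)  # free-fall time
    return forward_velocity_ms * time_to_fall

print(language_model("a dropped ball"))      # recall: "falls to the ground"
print(round(world_model_drop(2.0, 3.0), 2))  # novel scenario: ~1.92 m
```

The lookup table can describe a falling ball only in the words it has memorized; the simulation produces a quantitative prediction for heights and speeds it has never encountered. That gap, scaled up, is the gap LeCun is pointing at.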
The Exodus From Big AI Labs
Murati and LeCun are not alone. The past 18 months have seen a significant exodus of senior AI talent from the major labs - OpenAI, Google DeepMind, Meta AI, Anthropic - toward independent research and new ventures. The pattern is consistent enough to be a trend, not a coincidence.
Some of this is standard entrepreneurship: people with domain expertise and name recognition using a hot market to raise capital and start companies. But the departures tell a more specific story when you look at who is leaving and what they say publicly about why.
Ilya Sutskever, OpenAI's former chief scientist and one of the architects of the transformer-scaling approach, left in 2024 to found Safe Superintelligence - a company explicitly organized around the question of what happens when AI systems exceed human-level capability. He didn't leave to build better chatbots. He left because he believes something qualitatively different is coming and that the existing labs are not prepared for it.
Murati's departure from OpenAI was more abrupt. She stepped down in September 2024, ten months after the board crisis that nearly dissolved the company. She has been careful about what she says publicly, but Thinking Machines Lab's positioning - a focus on "multimodal" AI that integrates text, vision, and other modalities into coherent reasoning - points toward the same critique LeCun makes. Tokens are not enough.
Dario and Daniela Amodei left OpenAI years earlier to found Anthropic, though their departure was more about safety philosophy than architectural disagreement. Still, the pattern holds: the people with the deepest knowledge of these systems' capabilities and limitations are not staying put and scaling up. They are leaving and building differently.
Why Nvidia Is on Both Sides
Nvidia's H100 and B200 GPUs power both the current transformer era and whatever comes next. The company is uniquely positioned to profit regardless of which architecture wins. Photo: Unsplash
Here is the underreported story inside the Thinking Machines Lab announcement: Nvidia committed to a gigawatt-scale partnership with a company that is explicitly building something different from what powers Nvidia's current revenue. The H100s and B200s that generate most of Nvidia's profit are optimized for transformer training. If world models or some other post-transformer architecture becomes dominant, those optimization choices might not transfer.
Nvidia is betting anyway. This is not surprising - the company has a long history of backing multiple architectural bets simultaneously - but it is instructive. When Nvidia says "long-term gigawatt-scale," it is not just selling GPUs to TML. It is signaling to the market that TML's approach is credible enough to warrant committed infrastructure. Nvidia has better information about where AI compute is heading than almost anyone else. Its endorsement is not nothing.
The second-order effect is competitive pressure on OpenAI, Google, and Anthropic. All three are currently locked into transformer scaling as their primary bet. Their investors, their infrastructure, their research pipelines, and their product roadmaps are built around the assumption that the next frontier model is a bigger, better transformer. If Murati and LeCun are right - if world models or some other paradigm genuinely outperforms transformers on real-world reasoning tasks - then the three most-valued AI companies in the world are running the wrong race.
They know this. Google has its own world model research. Meta AI - despite LeCun's departure - continues working on embodied AI. OpenAI has robotics efforts that implicitly require better physical reasoning than current LLMs provide. The transformer is not standing still. But it is also not the only game in play, and the people with the most to gain from dethroning it are now well-funded and infrastructure-equipped.
The Timeline of Defection
- November 2023: OpenAI board crisis. Sam Altman is briefly fired, then reinstated within days. The chaos accelerates internal questioning about the company's direction and governance.
- 2024: Ilya Sutskever departs OpenAI. The co-founder and chief scientist leaves to found Safe Superintelligence, signaling concern about what post-AGI AI looks like - not just what it can do today.
- September 2024: Mira Murati leaves OpenAI. The former CTO, acting CEO during the board crisis, exits without explanation and announces Thinking Machines Lab weeks later.
- 2025: Yann LeCun formally exits the Meta AI chief role. He begins focusing full-time on Advance Machine Intelligence and publicly intensifies his criticism of LLM-centric approaches to AI development.
- Early 2026: TML founding members return to OpenAI. A visible sign of internal turbulence at Murati's startup; a narrative of instability begins forming around TML.
- March 2026: Dual announcements reshape the narrative. TML announces its gigawatt-scale Nvidia partnership; AMI confirms a $1 billion raise for world model research. The post-transformer AI industry has, in effect, declared itself open for business.
What "World Models" Would Actually Change
It is worth being specific about what changes if world models succeed, because the implications extend far beyond AI benchmarks.
Current language models are fundamentally limited in tasks that require understanding consequences over time. Ask ChatGPT to write a story - fine. Ask it to design a reliable logistics system that adapts to supply chain disruptions - the model can describe what such a system should look like, but it cannot simulate how it would actually behave under novel conditions it hasn't seen in training data. It cannot model the world. It can only describe worlds it has read about.
World models change this. A system that maintains an internal simulation of physical and social dynamics can reason about unseen scenarios by running mental simulations. It can say: "If we redirect this shipment, it will arrive 48 hours late, which will trigger this contractual penalty, which will affect the balance sheet this way." Not because it has read about that exact situation - because it actually understands the causal relationships involved.
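The shipment example can be sketched as code - a hypothetical, hand-built causal chain, not AMI's architecture. A real world model would learn these relationships; here they are hard-coded simply to show the shape of the reasoning:

```python
# Sketch of a causal rollout: redirecting a shipment propagates through
# a chain of consequences. All names and numbers here are illustrative.
def simulate_redirect(delay_hours: int,
                      penalty_per_late_hour: float = 500.0,
                      grace_hours: int = 24) -> dict:
    """Roll the consequences of a shipment redirect through a causal chain."""
    late_hours = max(0, delay_hours - grace_hours)   # hours past the grace window
    penalty = late_hours * penalty_per_late_hour     # contractual penalty accrues
    return {
        "arrival_delay_h": delay_hours,
        "contract_penalty": penalty,
        "balance_sheet_hit": penalty,  # flows straight through in this toy model
    }

outcome = simulate_redirect(delay_hours=48)
print(outcome)  # 24 billable late hours -> 12,000 penalty
```

The point is not the toy numbers but the structure: each consequence is derived from the previous state by a causal rule, so the system can evaluate scenarios that never appeared in any training corpus.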
The implications for robotics are immediate. Current AI-powered robots fail constantly in novel physical environments because they have no internal model of how objects behave when moved, stacked, or dropped. A world model-equipped robot can handle a box it has never seen before because it understands that boxes have weight, inertia, and fragility. It does not need a training example for every possible box.
Scientific discovery is the bigger opportunity. Drug discovery, materials science, and climate modeling all require reasoning about physical systems in ways that current LLMs cannot reliably perform. They can retrieve and repackage existing research with impressive fluency. They cannot design a novel protein with specific folding properties from first principles, because that requires simulating molecular physics - not predicting tokens. World models, in theory, can. [Nature, various; DeepMind AlphaFold work as prior art context]
The Risk of Being Early
Building a paradigm-shifting AI architecture before the dominant paradigm is exhausted is a high-risk bet - LeCun and Murati are making it anyway. Photo: Unsplash
LeCun has been making the world model argument in public since at least 2022. The transformer, in that time, has gotten dramatically more capable. GPT-4 gave way to o1 and o3-level reasoning models. Gemini 2.0 and Claude Sonnet 4 outperform earlier "expert-level" benchmarks routinely. Every time the transformer appeared to be hitting a ceiling, a new scaling regime or inference technique pushed it higher.
The risk for LeCun and Murati is that they are early. Not wrong, but early - which in technology investment terms often amounts to the same thing. Being early means the current paradigm keeps improving, keeps winning product benchmarks, keeps capturing enterprise contracts, while your paradigm-shifting work remains in research mode. Early can mean you run out of money before the world catches up to your vision.
The billion-dollar raise and the gigawatt partnership are, in part, answers to that risk. AMI and TML are not doing research in a garage. They are building infrastructure that can compete with the incumbents directly, rather than waiting for the incumbents to exhaust themselves. The bet is not just that transformers have a ceiling - it is that the ceiling will be visible within a few years, and the first team with credible alternative infrastructure will capture the replacement market.
There is historical precedent for this working. Google built its search infrastructure while Yahoo was still dominant. Amazon Web Services launched while enterprise IT was still deploying physical servers. The winning move was not catching competitors in the act of failing - it was having the alternative ready when the limitation became undeniable.
Murati and LeCun are making the same calculated bet: that transformer AI will hit a wall that matters to enterprises and consumers, that this will happen within their capital runway, and that they will be standing there with something better already running at scale. The gigawatt agreement and the billion-dollar raise are both, at their core, bets on timing.
The State of the Race - March 2026
The week that TML and AMI made their announcements also saw Microsoft integrate Anthropic's Claude into Copilot via a "Cowork" feature for long-running multi-step tasks. [The Verge, March 9, 2026] Anthropic's Claude Code Review launched in research preview for enterprise customers, using parallel agents to catch bugs that human reviewers miss. [The Verge, March 9, 2026] Google expanded Gemini in Chrome to Canada, New Zealand, and India, with support for 50-plus languages. [The Verge, March 10, 2026]
None of these are world models. All of them are transformer models getting better at tasks they were already good at - natural language processing, code review, question answering. They are impressive. They are also exactly the kind of improvement that LeCun's critique predicted: transformers getting incrementally better at things tokens are good for, while remaining unable to do things that require genuine world understanding.
The companies racing to deploy transformer improvements are not wrong to do so. There is massive commercial value in better LLMs. But the race has a structure: Microsoft, Google, Amazon, and Anthropic are competing to optimize the current paradigm, while Murati and LeCun are spending gigawatts and billions to make that race obsolete.
One of those bets will prove more correct. We will know which one within roughly three to five years - the timescale at which gigawatt infrastructure and billion-dollar research programs either produce paradigm shifts or are quietly absorbed into the companies they were meant to disrupt.
The people making the post-transformer bets are not naive. They ran the labs that built the transformer era. They know better than anyone what it can and cannot do. When they spend this kind of money on an alternative, the only rational response is to take the bet seriously - and watch the race with considerably more attention than the weekly benchmark releases usually receive.
Key Players in the Post-Transformer Race
- Mira Murati / Thinking Machines Lab: Ex-OpenAI CTO. Gigawatt-scale Nvidia partnership confirmed March 10, 2026. Focus: multimodal AI integrating text, vision, and embodied reasoning.
- Yann LeCun / Advance Machine Intelligence: Turing Award winner, ex-Meta AI chief. $1 billion raised, Paris-based. Focus: AI world models as an alternative to next-token prediction.
- Ilya Sutskever / Safe Superintelligence: Ex-OpenAI chief scientist and co-founder. Focus: AI safety architecture for post-AGI systems.
- The incumbents: OpenAI, Google DeepMind, Anthropic, Meta AI - all pursuing transformer improvements while maintaining internal research on alternative architectures. None are abandoning transformers, but none are ignoring their limitations either.