Prediction 8 Revisited: I Said the Transformer Era Would Hit Architectural Fatigue and New Architectures Would Rise. The Fatigue Came. The Transformer Didn't Die.
The convergence happened. The architectural revolt is real but still insurgent, not victorious.
This is my most genuinely mixed result, so I'll resist the urge to spin it.
Prediction eight: as global labs converge on similar large-language-model architectures, the competitive advantage of any one organization diminishes, and the field starts exploring new architectures — search-based intelligence, hybrid models, continuous learning, training-free solutions. My favorite spin-offs were 'In-Depth Ecosystem Simulation' and 'Swarm AI Agents with Self-Upgrading Architectures.'
The convergence call: correct
The first half is unambiguously right. By 2026 the frontier labs converged hard. GPT-5, Gemini 3, Claude, and a wall of capable open models all clustered in capability. DeepSeek proved you could match frontier performance at a fraction of the cost. The 'advantage of any one organization diminishes' thesis is now the bear case on every frontier lab's valuation — the commoditization of the model layer is the single biggest story in AI economics. I called the convergence. It arrived.
The new-architecture call: real, but still insurgent
Here's the honest part. I predicted new architectures would 'rise.' What actually happened is subtler: new architectures are rising, but the transformer refused to die, and the most important shift wasn't a new architecture at all — it was a new axis of scaling.
On the architecture front, I was directionally right. State-space models (Mamba, Mamba-2, Mamba-3) went from research curiosity to serious contender, with compute that scales linearly in sequence length and reported inference throughput up to 5x that of comparable transformers on long sequences. Hybrid models like AI21's Jamba (interleaving Mamba and attention layers, scaled to 398B total parameters) showed that the hybrid approach can scale. Reasoning models built on Mamba, such as M1 and PromptCoT-Mamba, demonstrated you can get strong reasoning without pure attention. The 'architectural fatigue' I described, meaning diminishing returns from just scaling transformers, is real and widely acknowledged.
But — and I have to say it — the transformer did not get dethroned. It got supplemented. The dominant 2025-2026 models are still attention-based or hybrid, not pure alternatives. My prediction implied a changing of the guard. What we got is a coalition government.
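To make the linear-versus-quadratic point concrete, here is a back-of-envelope sketch in Python. It is not a benchmark: the cost formulas are textbook asymptotics with constants dropped, and the function names, the model width of 4096, and the state size of 16 are all assumptions I picked for illustration. Measured throughput gains (like the 5x figure) depend on hardware and implementation, so treat the ratio here as the shape of the curve, not its height.

```python
def attention_cost(seq_len: int, d_model: int) -> int:
    """Toy per-layer cost of self-attention scores: grows as n^2 * d."""
    return seq_len * seq_len * d_model


def state_space_cost(seq_len: int, d_model: int, d_state: int) -> int:
    """Toy per-layer cost of a state-space scan: grows as n * d * state size."""
    return seq_len * d_model * d_state


def cost_ratio(seq_len: int, d_model: int = 4096, d_state: int = 16) -> float:
    """How many times more the attention term costs than the scan term.

    d_model and d_state are illustrative defaults, not taken from any real model.
    """
    return attention_cost(seq_len, d_model) / state_space_cost(seq_len, d_model, d_state)


if __name__ == "__main__":
    for n in (1_024, 8_192, 65_536):
        print(f"seq_len={n:>6}: attention/scan cost ratio = {cost_ratio(n):.0f}")
```

Under these assumptions the ratio is simply sequence length divided by state size, so it grows in step with context length. That is the whole argument for state-space layers in one line: the gap is small on short prompts and large on long ones.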
What I genuinely missed: test-time compute
The biggest architectural story of 2025-2026 wasn't a new network architecture. It was a new place to spend compute: inference time. Chain-of-thought reasoning, test-time compute scaling, agentic loops — the realization that you can substitute inference-time thinking for raw model size. I gestured at 'search-based intelligence' in January 2025, which was actually a pretty good intuition for where reasoning models went. But I filed it under 'new architectures' when it's really a new scaling paradigm layered on top of the old architecture. I had the right instinct shelved in the wrong category.
This matters as a forecasting lesson. I was pattern-matching to 'the architecture changes' because that's the legible story. The actual change was economic and operational: inference became the cost center and the battleground, which is why efficiency-first designs like Mamba matter — not because they're prettier, but because inference repeated billions of times is the business model itself.
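One simple way to see how inference-time compute can stand in for model quality is majority voting over repeated samples. The sketch below is a toy model, not how any particular reasoning model works: it assumes each sample is independently right with the same probability, that answers are simply right or wrong, and that you use an odd number of samples to avoid ties. The function name and the 0.6 accuracy figure are hypothetical.

```python
from math import comb


def majority_accuracy(p_single: float, num_samples: int) -> float:
    """Probability that a strict majority of independent samples is correct.

    Assumes each sample is right with probability p_single, independently,
    which real model samples generally are not.
    """
    if num_samples % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    needed = num_samples // 2 + 1
    return sum(
        comb(num_samples, k) * p_single**k * (1 - p_single) ** (num_samples - k)
        for k in range(needed, num_samples + 1)
    )


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(f"{n:>2} samples at 60% each -> majority right {majority_accuracy(0.6, n):.3f}")
```

With a hypothetical 60% per-sample accuracy, five samples lift the majority answer to roughly 68%, and more samples push it higher. The price is that every extra sample is another full inference pass, which is exactly why inference becomes the cost center.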
The next 12 months: efficiency stops being optional
My forecast for May 2027: hybrid architectures (attention + state-space + MoE) become the default for new frontier models, because the economics of inference force it. Pure transformers persist where they're entrenched; new builds increasingly go hybrid for the throughput and long-context gains. The 'one architecture to rule them all' era ends, not because the transformer lost, but because the right answer now depends on the task.
Watch inference cost become the headline metric. Training cost is paid once; inference is paid forever. As reasoning models spend ever more compute at inference, the company with the most efficient inference architecture wins on margin even with an equally smart model. That's the structural reason I keep harping on infrastructure: the model layer commoditizes, so the durable edge moves to whoever runs inference cheapest — which is an architecture-plus-hardware-plus-energy question, not a 'whose model is smartest' question.
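The "paid once versus paid forever" arithmetic can be sketched in a few lines of Python. Every number below is hypothetical, chosen only to make the break-even easy to follow, and the function names are mine. The sketch assumes a flat cost per query and ignores hardware depreciation, energy price swings, and retraining.

```python
def total_cost(training_cost: float, cost_per_million_queries: float, queries: float) -> float:
    """One-time training cost plus inference cost that scales with usage."""
    return training_cost + cost_per_million_queries * queries / 1_000_000


def breakeven_queries(extra_training_cost: float, savings_per_million_queries: float) -> float:
    """Query volume at which cheaper serving repays a higher upfront cost."""
    if savings_per_million_queries <= 0:
        raise ValueError("the efficient model must be cheaper to serve")
    return extra_training_cost / savings_per_million_queries * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures: model B costs more to build but half as much to serve.
    model_a = {"training_cost": 100_000_000, "cost_per_million_queries": 2_000}
    model_b = {"training_cost": 120_000_000, "cost_per_million_queries": 1_000}

    crossover = breakeven_queries(20_000_000, 1_000)
    print(f"break-even at {crossover:,.0f} queries")
    for queries in (crossover / 2, crossover, crossover * 10):
        a = total_cost(queries=queries, **model_a)
        b = total_cost(queries=queries, **model_b)
        print(f"{queries:>18,.0f} queries: A = {a:,.0f}  B = {b:,.0f}")
```

Past the break-even volume the training gap stops mattering and the per-query gap compounds. With these made-up figures that crossover sits at 20 billion queries, after which the efficient model wins on margin even if both models are equally smart.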
My grade on my own January 2025 self: B-minus. Right about convergence and architectural fatigue. Right-ish about search-based reasoning. Wrong to imply the transformer would fall, and I shelved the most important shift — test-time compute — under the wrong heading. A real partial. I'm not going to pretend otherwise.
____
Sources: original forecast (Jan 6 2025); Mamba/Mamba-2/Mamba-3 research (2024-2026); AI21 Jamba hybrid architecture; M1 / PromptCoT-Mamba reasoning models (2025); DeepSeek efficiency reporting; test-time compute scaling literature 2025-26.


