From Training Burst to Inference Continuous

Why 2027 reshapes the compute map and how modular distributed infrastructure is structurally aligned with the workload mix the next decade actually consumes

Jun 01, 2026

Part 4 of a five-part series on the structural opportunity in modular AI infrastructure

There is a pattern in technology infrastructure cycles that recurs reliably enough to be useful as a planning heuristic. The infrastructure built during the first phase of any major compute paradigm — the burst phase, the experimental phase, the phase where the model architecture itself is still being figured out — is almost never the infrastructure that turns out to be optimal for the second phase, the production phase, the phase where the workload settles into a steady-state economic equilibrium.

This was true of the mainframe-to-minicomputer transition. It was true of the on-premise-to-cloud transition. It was true of the desktop-to-mobile transition. And it is becoming true, right now, of the training-to-inference transition that the AI infrastructure market crosses through over the next eighteen months.

The investor implication is straightforward but worth stating directly: the infrastructure being built today, on the assumption that frontier-model training is the dominant workload, will not be the infrastructure that monetizes the production AI economy from 2028 onward. The geography, the form factor, the latency profile, the customer mix, and the unit economics of inference-dominant AI compute are materially different from those of training-dominant AI compute.

DCXPS’s modular distributed architecture is not optimized for the training-burst world. It is optimized for the inference-continuous world that follows it. And the 6-year SPV horizon described in Article 2 is calibrated to the exact transition curve where the value reallocates.

Here is the analysis behind that positioning.

The flip

The market-sizing data points that frame this transition:

In 2025, training workloads accounted for approximately 62% of AI compute demand, with inference at ~38%.
By mid-2027, inference workloads cross training in aggregate compute consumption.
By 2030, the projected mix is inference at ~60–65%, training at ~35–40%.
On energy footprint specifically, multiple independent estimates (IEA, McKinsey, Schneider Electric) converge on inference reaching 60% of AI energy consumption by 2027, even though a single inference query uses far less compute than a training run.
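As a rough consistency check on the crossover date, the sketch below draws a straight line between the 2025 and 2030 inference shares quoted above and asks when it passes 50%. The linear path and the function name `crossover_year` are assumptions made for illustration; the underlying forecasts may follow a different curve.

```python
def crossover_year(start_year, start_share, end_year, end_share, threshold=50.0):
    """Year at which a linearly interpolated share (in percent) reaches threshold."""
    slope = (end_share - start_share) / (end_year - start_year)  # points per year
    return start_year + (threshold - start_share) / slope


# Inference share of AI compute, taken from the figures above:
# ~38% in 2025, ~60-65% projected for 2030. Linear growth is an assumption.
late = crossover_year(2025, 38.0, 2030, 60.0)   # low end of the 2030 range
early = crossover_year(2025, 38.0, 2030, 65.0)  # high end of the 2030 range

print(f"Inference passes 50% between {early:.1f} and {late:.1f}")
```

Under that simplification the crossover lands within 2027 for both ends of the 2030 range, which is broadly in line with the mid-2027 figure.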

The shift from training-dominant to inference-dominant is not a marginal rebalancing. It is the largest workload-mix transition the data center industry has experienced since the mobile-first transition reshaped consumer compute in 2010–2015.

The mechanism driving it is straightforward and not particularly controversial. Frontier model training is, by its nature, a periodic event — a single model is trained once (or a small number of times for fine-tuning), and that training run consumes enormous compute for weeks or months. Inference is the productive deployment of those trained models. Every query, every API call, every agent step, every chatbot interaction, every image generated, every line of code completed — each one consumes inference compute. As enterprise AI deployment moves from proof-of-concept to production scale, inference volume grows by orders of magnitude even as training cluster size grows linearly.

The clearest signal that we are already inside this transition comes from the silicon vendors. NVIDIA’s most recent earnings disclosures and capital markets commentary explicitly call out inference as the dominant near-term growth vector for Blackwell deployment. AMD’s MI300X/MI350 product positioning is explicitly inference-first. Hyperscaler custom silicon (Google’s TPU v6, AWS Trainium-3, Meta MTIA v3) is increasingly inference-optimized rather than training-optimized.

The capital market is pricing the transition. What most operators have not yet internalized is what it changes about where and how you build the infrastructure.

What changes when inference takes over

Six characteristics of AI compute infrastructure change materially as the workload mix shifts from training-dominant to inference-dominant. Each has direct implications for site selection, deployment architecture, and unit economics.

1. Single-node performance becomes more important than aggregate cluster performance

A frontier training run requires tens of thousands of GPUs operating in synchronized parallel, with extreme demands on inter-node bandwidth and latency. Training infrastructure is dominated by the question of “how do I keep 25,000 GPUs in lockstep at 99.9% efficiency for six weeks?” This is why hyperscale training clusters live in single-building campuses with InfiniBand fabric and meticulous topology engineering.

Inference is overwhelmingly single-node or small-cluster. A typical LLM inference request runs on 1–8 GPUs. A computer vision inference runs on 1 GPU. A recommendation model inference runs on a fraction of one. The aggregate compute is enormous, but the unit of compute is small. This means:

Infrastructure can be physically smaller and geographically distributed without performance penalty.
Inter-node bandwidth requirements are dramatically lower (you do not need 800 Gb/s InfiniBand for a single-node inference query).
Hardware failures have local rather than systemic impact (one container down ≠ entire training run lost).

2. Geographic distribution becomes a feature, not a cost

Training infrastructure benefits from physical concentration. Inference infrastructure benefits from physical distribution. The reason is latency.

Modern agentic AI applications, real-time copilots, autonomous-systems control loops, robotics control planes, and consumer-facing AI products operate on sub-50-millisecond round-trip latency budgets. Light in fiber covers roughly 200 km per millisecond, so 50 ms buys about 10,000 km of fiber path, or roughly 5,000 km of one-way distance, under ideal conditions and with nothing left over for the inference itself. Real-world routing typically delivers 30–40% of theoretical, and the model’s own compute time comes out of the same budget, so a sub-50ms budget translates to roughly 1,000–1,500 km of actual reach in practice.

The implication: a hyperscale data center campus in northern Virginia cannot serve a real-time inference workload in Prague, Madrid, or Helsinki at the required latency. Not even if the data center is fast, not even if the network is well-engineered. It is a physics problem, not an engineering problem.
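The physics floor can be sketched in a few lines. The example below estimates the minimum round-trip time over fiber for two routes, assuming light in fiber travels at about 200 km per millisecond and that cables follow the great-circle path (both simplifications). The coordinates are approximate, and the names `great_circle_km` and `min_rtt_ms` are illustrative, not taken from any DCXPS tooling.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumption: light in fiber moves at roughly 2/3 of c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on a spherical Earth, in km."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km, route_efficiency=1.0):
    """Round-trip time over fiber; route_efficiency below 1.0 models indirect routing."""
    return 2 * distance_km / (FIBER_KM_PER_MS * route_efficiency)


# Approximate coordinates (assumptions for illustration only).
ASHBURN_VA = (39.04, -77.49)
PRAGUE = (50.08, 14.44)
FRANKFURT = (50.11, 8.68)

transatlantic_km = great_circle_km(*ASHBURN_VA, *PRAGUE)
regional_km = great_circle_km(*PRAGUE, *FRANKFURT)

print(f"Northern Virginia to Prague: {transatlantic_km:,.0f} km, "
      f"ideal RTT {min_rtt_ms(transatlantic_km):.0f} ms")
print(f"Prague to Frankfurt: {regional_km:,.0f} km, "
      f"RTT at 30% route efficiency {min_rtt_ms(regional_km, 0.3):.0f} ms")
```

Even on an ideal straight-line fiber, the transatlantic route exceeds a 50 ms budget before any compute time is spent, while the regional route stays well inside it even at a pessimistic 30% routing efficiency.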

This is why the inference geography looks structurally different from the training geography. Training can be done at the most power-efficient site available, anywhere on the planet. Inference must be done near where the inference is consumed — which means distributed, regional, and increasingly metro-edge.

3. Utilization curves change shape

Training workloads exhibit a “burst” utilization profile. A new model is announced; training begins; cluster utilization runs at 95%+ for weeks; training completes; utilization drops. The capital model has to monetize the high-utilization period sufficiently to amortize the deep-cycle equipment cost.

Inference workloads exhibit a “continuous” utilization profile. Customer queries arrive 24/7. The diurnal pattern is real (utilization is higher during business hours) but the dynamic range is much narrower — typically 60–85% utilization on a continuous basis, rather than 0–100%.

For modular operators, this is the friendlier curve. Continuous utilization is easier to forecast, easier to bill against contractual commitments, and produces more predictable EBITDA. The 96% margin profile described in Article 2 is structurally easier to defend against an inference-continuous mix than against a training-burst mix.

4. Hardware specifications diverge

Training-optimized hardware (H100, B200, B300 in their training configurations) is dominated by high-bandwidth memory (HBM3E, soon HBM4), extreme inter-node interconnect (NVLink, NVSwitch, InfiniBand), and maximum FLOPS per dollar.

Inference-optimized hardware emphasizes memory capacity (to hold larger models in single-GPU memory), memory bandwidth (for token generation throughput), and energy efficiency (because the workload runs continuously rather than in bursts).

The H200, in particular, is interesting in this context. Its higher HBM3E capacity (141 GB vs. H100’s 80 GB) makes it materially better than H100 for inference of larger models that did not fit in H100 memory. The B300 inherits this profile and extends it. Both of these chips are inference-relevant in a way that the original H100 generation was not. This is part of why our SPV unit composition is 14 B300 + 35 H200 — it produces a mix that is well-positioned for the workload transition over the SPV’s 6-year term.
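To make the memory argument concrete, here is a minimal sketch of a single-GPU fit check. The 50-billion-parameter model, the 16-bit weight format, and the 20% headroom reserved for KV cache and activations are hypothetical values chosen for illustration; only the 80 GB and 141 GB capacities come from the text.

```python
def weights_gb(params_billion, bytes_per_param):
    """Memory needed for model weights, in GB (1e9 params x bytes / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param


def fits_on_gpu(params_billion, bytes_per_param, gpu_memory_gb, headroom=0.2):
    """True if the weights fit after reserving a headroom fraction of GPU memory.

    The headroom stands in for KV cache and activations; 0.2 is an assumed value.
    """
    usable_gb = gpu_memory_gb * (1.0 - headroom)
    return weights_gb(params_billion, bytes_per_param) <= usable_gb


H100_GB = 80    # capacity quoted in the text
H200_GB = 141   # capacity quoted in the text

# Hypothetical model: 50B parameters stored as 16-bit weights (2 bytes each).
model = {"params_billion": 50, "bytes_per_param": 2}

on_h100 = fits_on_gpu(gpu_memory_gb=H100_GB, **model)
on_h200 = fits_on_gpu(gpu_memory_gb=H200_GB, **model)

print(f"Weights: {weights_gb(**model)} GB, fits on H100: {on_h100}, fits on H200: {on_h200}")
```

A model in this size class needs to be split across two H100s but can be served from a single H200, which removes the inter-GPU traffic from every request.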

5. Customer mix broadens

Frontier model training is concentrated among a small number of customers: OpenAI, Anthropic, xAI, Meta AI, Google DeepMind, a handful of Chinese labs, and the major European frontier labs. The customer count is in the low double digits.

Inference customers are every enterprise running AI in production. The customer count for the inference economy is in the millions. The pricing dynamic is different (smaller average ticket but more buyers), the procurement cycle is different (faster, transactional, often self-serve), and the loyalty profile is different (price-sensitive, performance-sensitive, multi-cloud by default).

For the Chapek platform — DCXPS’s bare-metal GPU cloud monetization layer — this transition is operationally favorable. We are built for the inference-dominant world: self-service provisioning, transparent hourly pricing, no orchestration tax, geographic distribution that puts compute near demand.

6. Energy economics change

The training workload favors the absolute lowest energy cost regardless of location, which is why hyperscale training facilities locate near hydropower, nuclear, or stranded gas, and why Crusoe’s bitcoin-flare model worked.

The inference workload trades energy cost for latency. A site that is 20% more expensive on power but 200 km closer to demand may be the better economic site for inference because the value of the latency reduction exceeds the energy cost premium.

This is the dynamic that makes Central and Eastern European positioning particularly attractive for inference workloads serving the EU enterprise market. Czech power costs are competitive with German and French rates. Latency from Prague to Frankfurt, Vienna, Munich, Warsaw, Budapest, Berlin, and Amsterdam fits within real-time inference budgets. The regulatory positioning (EU member state, EU AI Act jurisdiction, GDPR-compliant) is a structural advantage for the EU-sovereign demand pool described in Article 3.

What this means for capital deployed in 2026

The investment thesis question is not “is inference becoming bigger than training.” That is settled; the data are clear, and the only debate is timing. The question is: what does an infrastructure portfolio look like that is positioned for the post-transition equilibrium, deployed inside the transition window?

Five propositions follow from the analysis above:

Proposition 1 — Distributed beats concentrated, in 6-year horizon terms

Capital deployed into single-campus hyperscale builds is exposed to the risk that the workload mix being optimized for is not the workload mix that exists at the operating end of the lifecycle. Capital deployed into geographically distributed modular capacity has the option to serve either training (with intra-site clusters of multiple containers) or inference (with single-container deployments serving regional demand) — and the architecture is flexible enough to shift between them as the mix evolves.

Proposition 2 — The training cluster premium is compressing

Hourly pricing for top-tier training clusters has been declining since mid-2024. The squeeze comes from two directions: hyperscaler internal capacity ramping up, and inference-economy demand absorbing the marginal GPU at lower per-hour rates than the peak training rate. The peak per-hour pricing for training-burst configurations may not return to 2024 levels. Operators whose capital model assumes those rates are exposed.

Proposition 3 — Hardware-fungibility matters

The B300 and H200 configurations in our SPV structure are not training-specific or inference-specific. They are hardware platforms that can be rebalanced toward either workload as customer demand evolves. The Chapek platform supports this rebalancing through its bare-metal provisioning model — capacity can be allocated to training tenants or inference tenants as the market mix dictates. Capital partners are not exposed to a hardware bet on one workload type.

Proposition 4 — EU latency geography is undervalued

The market is currently pricing AI infrastructure substantially on its training capability (power cost, cluster scale, interconnect bandwidth). It is not yet pricing AI infrastructure on its inference geography (latency reach, regional sovereignty, regulatory positioning). As the workload mix transitions, the EU-distributed-and-sovereign positioning will reprice. Operators positioned ahead of that reprice capture the value transfer.

Proposition 5 — The 6-year SPV horizon spans the transition perfectly

A unit deployed Q4 2026 begins commercial operation in 2027 — the year inference crosses training in aggregate compute. The unit operates through 2032. The first half of its operating life monetizes the late-training era; the second half monetizes the established-inference era. This is not a coincidence. The 6-year term was calibrated specifically to span the transition.

The technology arc — what 2028 looks like

For investors thinking 5+ years out, four technology shifts will shape the second half of the SPV term:

HBM4 deployment (2026–2027). NVIDIA Rubin, AMD MI400 series. ~15 TB/s memory bandwidth, vs. ~9 TB/s for current HBM3E. The performance uplift is meaningful for both training and inference, but the inference impact is more economically significant — it dramatically improves per-query throughput for large models.

Inference-specific silicon at scale (2027–2028). The economics of dedicated inference chips improve as the inference market matures and reaches a size that justifies dedicated silicon investment. Google TPU v7, AWS Inferentia-3, Meta MTIA v4, plus startup entrants (Cerebras, Groq, SambaNova) scaling production volume. This is a risk for general-purpose GPU revenue per hour — and a reason to think carefully about lifecycle planning at the unit level.

Mixture-of-experts and model efficiency (continuous). Architectural innovations are reducing per-query compute requirements by 30–50% for many workloads. This is offset by expansion of the addressable user base (Jevons paradox) but should be modeled as a moderating factor on aggregate accelerator demand growth.

Edge inference (2028+). 5G/6G-native AI workloads, autonomous vehicle compute, industrial IoT, smart-city infrastructure. This is a market growing from a small base but at significant CAGR. It is structurally aligned with the distributed modular thesis — these workloads cannot be served from hyperscale campuses.

The DCXPS roadmap accounts for each of these technology arcs:

Generation 1 (current): 1 MW MADC units. Brownfield grid. Human technicians. Revenue from month five.
Generation 2 (2027–2028): MADC+ at 2 MW. On-site CHP. First-generation robotics. 85%+ grid independence.
Generation 3 (2028–2030): DDCU at 4 MW. Dark, distributed, dynamic. Lights-out compute. Software-defined energy management.
Generation 4 (2030–2032): 10 MW DDCU. Humanoid robots in operational role. Integrated agricultural co-loads. EBITDA margin profile expanding toward ~93%.
Generation 5 (2032–2034+): 100–200 MW “Robotic DC City” campuses. 200–500 humanoid robots. 5–10 human orchestrators.

The Generation 1 SPV that a capital partner commits to in 2026 is not exposed to the Generation 5 build — but the operator’s roadmap, the unit-level technology refresh paths, and the residual-value pathways do tie into the broader trajectory. Capital partners hold the option to participate in the next generation, but their exposure is locked at the unit-level economics of the generation they entered.

The operator framework

For DCXPS as operator, the workload mix transition creates a specific set of execution priorities for the next 24 months. I am laying these out openly because capital partners ought to be able to verify, in diligence, that the operator has thought about the right questions:

Priority 1 — Customer mix construction. The Chapek platform must build a customer mix that balances training tenants (for peak utilization revenue) with inference tenants (for utilization-stability revenue). Our target mix at 18 months of operation is 35% training / 65% inference, weighted by revenue. This is a deliberate mix; pure-training mix exposes us to the burst-cycle compression risk, pure-inference mix gives up the upside of peak training rates.

Priority 2 — Geographic expansion ahead of inference reprice. The Site 01 Kladno deployment is the anchor. Subsequent sites prioritize EU-distributed latency geography — central Czechia, Hungary, Poland, additional Western European positions where co-located power exists. Each site is structured as its own SPV pool.

Priority 3 — Hardware refresh discipline. The SPV operating term is 6 years, but hardware generations turn over in 18–24 months. The operator’s job is to optimize the workload mix on each generation as it ages — moving an H200-heavy unit toward inference workloads as B300/Rubin take over the training tenant base, and so on. This is the lifecycle management that capital partners hire us to do.

Priority 4 — Customer development at the regulated-enterprise tier. The inference customer mix that monetizes the EU-sovereign positioning is BFSI, healthcare, government, automotive, frontier AI labs (per Article 3). Each of these is a long-cycle enterprise sales motion. We have started; we are not done.

Priority 5 — Platform investment in audit, observability, and compliance tooling. The AI Act enforcement window (August 2026 onward) will commercially differentiate operators with mature compliance tooling from operators offering raw compute. Chapek’s roadmap includes auditability, training-data-traceability, model-versioning, and human-oversight integration tooling as differentiation points for the regulated-enterprise customer tier.

Where this leads

For capital partners, the inference-transition thesis is a confirmation argument for the modular distributed structure, not a contradiction of it. The 6-year SPV horizon is structurally aligned with the workload mix transition. The hardware composition (B300 + H200) is structurally aligned with the workload profile. The geographic positioning (EU-distributed, latency-optimized) is structurally aligned with where the inference economy actually consumes compute.

The DCXPS Site 01 deployment at Kladno is the deployment vehicle. Nine SPV positions. $45M unit size, $15M minimum participation. 2.75× modeled multiple, ~29.2% simple yield, 85/15 EBITDA waterfall. First units online October 2026.
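One way to read the ~29.2% figure is sketched below. It assumes "simple yield" means the total gain implied by the modeled multiple, spread evenly over the 6-year term with no compounding. That definition is an assumption, since the term is not defined here, and `simple_annual_yield` is an illustrative name.

```python
def simple_annual_yield(multiple, term_years):
    """Non-compounded annual yield implied by a total return multiple over a term."""
    return (multiple - 1.0) / term_years


# 2.75x modeled multiple over the 6-year SPV term, both from the text.
yield_pct = simple_annual_yield(2.75, 6) * 100
print(f"{yield_pct:.1f}%")  # prints 29.2%
```

Under that reading the two headline numbers are the same figure expressed two ways: 1.75x of gain over six years is about 29.2% per year on a simple basis.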

For data-room access: investors@dcxps.com.

The final article in this series steps back from the operating thesis and looks at the asset-class framing. Why hardware-backed exposure to AI infrastructure is structurally different from platform equity, from tokenized compute claims, from listed neocloud equity, and from traditional infrastructure asset classes. And why, for a specific class of capital partner with a specific risk-return mandate, the SPV-owned modular structure is the only one of these vehicles that fits.

Jiri Fiala is CEO and co-founder of DCXPS, building Tier 3 modular AI data centers and the Chapek bare-metal GPU cloud platform. Previous in this series: “Power Is the New Silicon,” “The 195-Day Data Center,” and “The CLOUD Act Conflict.” Final: “Own the Metal — A Framework for AI Infrastructure as an Asset Class.”

This article does not constitute an offer to sell or solicitation of an offer to buy any security. Any such offer will be made only by means of definitive transaction documents to qualified investors. See risk factors in the DCXPS Confidential Investor Memorandum.

AI Of The Coast - Jiri Fiala
