In the current AI boom, the spotlight is almost exclusively on securing the latest GPUs. Hardware availability matters, but access to chips is only one variable in building large-scale infrastructure.
As someone who has overseen deployments across 29 data centers on three continents, I’ve learned that procurement is only the visible part of the iceberg. The real complexity lies in the gap between hardware design and infrastructure reality.
Modern GPU systems are increasingly designed for extremely high rack densities and advanced cooling environments, while much of the world’s existing data center infrastructure still operates within far stricter limits.
In practice, success comes from designing deployments around the hard limits of real data centers — power density, cooling capacity, and rack constraints — so that new hardware can actually run safely once it is installed.
Power Density Limits
The real infrastructure challenge today is balancing hardware performance, cooling constraints, and power availability while keeping systems stable under continuous load. In our case, all deployments run in air-cooled data centers, which immediately sets practical limits on how dense GPU infrastructure can become.
Many air-cooled facilities operate in the ~16–20 kW per rack range. As GPU servers grow more power-hungry, rack density quickly becomes a binding constraint.
We began deploying GPU infrastructure well before the current AI boom. Between 2016 and 2019, our servers already drew up to ~2 kW per node. Today that figure can reach ~3 kW per server, which significantly changes how racks must be designed and distributed within a facility.
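The arithmetic behind this constraint is simple but unforgiving. A minimal sketch, using the rack and server figures from this article; the 20% headroom reserved for PDUs, fans, and power spikes is an assumption, not a standard:

```python
# Sketch: how many GPU servers fit in an air-cooled rack.
# Rack and server wattages come from the article; the headroom
# factor is an illustrative assumption.

def servers_per_rack(rack_kw: float, server_kw: float, headroom: float = 0.2) -> int:
    """Usable server count after reserving headroom for PDUs, fans, and spikes."""
    usable_kw = rack_kw * (1 - headroom)
    return int(usable_kw // server_kw)

# ~16-20 kW air-cooled racks vs. ~3 kW modern GPU servers
for rack_kw in (16, 20):
    print(f"{rack_kw} kW rack -> {servers_per_rack(rack_kw, 3.0)} servers")
```

At ~3 kW per server, even a 20 kW rack holds only a handful of nodes, which is why the same cluster now spreads across far more racks and floor space than it did in 2016.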
This pressure on power density is one of the factors shaping how modern neocloud infrastructure is built. To support today's AI clusters, many operators are forced either to build or to finance data center capacity designed for higher-density compute environments.
The Connectivity Layer
Hardware alone does not determine performance. Connectivity between infrastructure and users is just as important.
Deploying powerful GPU clusters means little if network routing introduces unnecessary latency or instability. For real-time workloads like cloud gaming, even small changes in network paths can directly affect user experience.
Infrastructure strategy therefore goes beyond simply adding bandwidth. It often involves direct peering with regional networks, multiple transit providers, and continuous routing optimisation.
In practice, this means ensuring traffic follows the most stable and predictable paths, rather than simply the lowest-cost routes that default BGP policies might select.
Without that level of control, even the most powerful hardware may struggle to deliver consistent real-time performance.
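The difference between cost-driven and stability-driven path selection can be sketched in a few lines. This is a simplified scoring model, not a real BGP implementation; the path names, latency figures, and jitter weight are illustrative assumptions:

```python
# Sketch: preferring stable network paths over cheap ones.
# Default BGP policy often resolves to the lowest-cost route;
# this model instead penalizes jitter heavily. All figures
# below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Path:
    name: str
    latency_ms: float     # median RTT to the user region
    jitter_ms: float      # variation in RTT
    cost_per_mbps: float  # transit price

def stability_score(p: Path, jitter_weight: float = 3.0) -> float:
    """Lower is better: latency plus a heavy penalty for jitter."""
    return p.latency_ms + jitter_weight * p.jitter_ms

paths = [
    Path("cheap-transit", latency_ms=38, jitter_ms=9, cost_per_mbps=0.20),
    Path("direct-peering", latency_ms=24, jitter_ms=1, cost_per_mbps=0.55),
]

best = min(paths, key=stability_score)
print("Preferred path:", best.name)  # direct-peering
```

The cheap transit route loses not because of its median latency but because of its jitter, which is exactly the property that default cost-based selection ignores and real-time workloads cannot tolerate.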
Strategic Redundancy
Most large platforms rely on Tier 3 data centers, which provide strong reliability guarantees. However, even highly redundant facilities still require maintenance events such as power testing or infrastructure upgrades.
For many enterprise workloads, these maintenance windows have limited impact.
Latency-sensitive platforms behave differently.
Users do not experience “maintenance windows.” Instead, they experience latency spikes when workloads are shifted to other locations, sometimes hundreds or thousands of kilometers away.
Managing this requires geographic redundancy.
Our infrastructure footprint spans 29 data centers, allowing workloads to shift between regions while maintaining service continuity.
The goal is not to eliminate maintenance events — which is impossible — but to ensure the platform behaves predictably even when individual facilities undergo operational changes.
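The failover logic above can be sketched as a simple selection problem: during a maintenance window, route users to the healthy facility that adds the least latency. The site names and latency figures below are illustrative assumptions, not our actual topology:

```python
# Sketch: choosing a failover site during a maintenance window so
# users see the smallest possible latency increase. Site names and
# latency figures are illustrative assumptions.

def pick_failover(latency_ms: dict, primary: str, in_maintenance: set) -> str:
    """Pick the lowest-latency healthy site other than the primary."""
    candidates = {
        site: ms for site, ms in latency_ms.items()
        if site != primary and site not in in_maintenance
    }
    return min(candidates, key=candidates.get)

# Median latency from one user region to each facility
latency_ms = {"fra1": 12, "ams1": 17, "waw1": 28, "sgp1": 160}

backup = pick_failover(latency_ms, primary="fra1", in_maintenance={"ams1"})
print("Failing over to:", backup)  # waw1, since ams1 is also down
```

With enough facilities in each region, the "next best" site is tens of kilometers away rather than thousands, which is the difference between a latency spike users notice and one they do not.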
From Gaming to AI
Many of the infrastructure challenges now discussed around AI systems are not entirely new.
Cloud gaming faced similar constraints years earlier: high-density GPU servers, strict latency requirements, and the need to operate infrastructure consistently across multiple regions.
Those lessons translate directly to the current AI infrastructure wave.
Building global GPU platforms isn’t just about buying hardware. It’s about running it reliably across very different data centers and network conditions — every rack, every region, every connection counts.
New GPU architectures tend to dominate headlines.
But in practice, infrastructure — power, cooling, and connectivity — is often the factor that determines whether large-scale platforms can actually grow beyond a few locations.
In the end, scaling GPU infrastructure isn’t just about chasing the newest chips. It’s about solving the real-world limits of power, cooling, and connectivity — making sure every GPU we install can actually run at full speed, wherever it’s placed.