How AWS Used Random Graph Theory to Build More Efficient Data Centers

Source: About Amazon by Kirsteen Rodger

As AWS data centres ballooned to tens of thousands of servers, the traditional network topology — the hierarchical tree — began to break. Cables tangled into impossible knots. Routing became a bottleneck. The cost and power consumption of network equipment started eating into the economics of cloud computing.

The solution came from an unlikely place: random graph theory.

The Three Problems

As data centres scaled, three interlocking problems emerged:

1. Cable Tangles

Traditional data centre networks use a multi-tier tree topology (access → aggregation → core). At AWS's scale, this meant millions of cables running through crowded plenums. Cable management became a nightmare — tangled bundles that made maintenance a hazard and airflow a constant struggle.

Solution: ShuffleBox — a sealed, unpowered enclosure that deterministically shuffles connections between servers using just 8 specific numbers derived from Comandur's equation. Think of it as a mechanical permutation device: cables go in one side, emerge rearranged on the other, creating a near-random network topology with no active components.
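As a rough illustration of what a fixed, unpowered shuffle means, the Python sketch below wires input port i to output port (a*i + b) mod n. This affine map is a stand-in chosen for simplicity: the article does not publish the eight numbers or Comandur's equation, so `shuffle_ports`, `multiplier`, and `offset` are hypothetical names and not the actual ShuffleBox wiring.

```python
from math import gcd

def shuffle_ports(num_ports: int, multiplier: int, offset: int) -> list[int]:
    """Return a fixed wiring where input port i connects to output port result[i].

    Hypothetical stand-in for a passive shuffle: an affine map modulo num_ports.
    """
    if gcd(multiplier, num_ports) != 1:
        # Without coprimality two inputs would land on the same output port.
        raise ValueError("multiplier must be coprime with num_ports")
    return [(multiplier * i + offset) % num_ports for i in range(num_ports)]

wiring = shuffle_ports(num_ports=16, multiplier=5, offset=3)
# A valid shuffle is a permutation: every output port is used exactly once.
assert sorted(wiring) == list(range(16))
print(wiring)
```

The point of the sketch is that the rearrangement is fully determined by a handful of parameters, so the same enclosure can be manufactured and installed repeatedly with no configuration and no power.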

2. Routing Complexity

In a traditional network, routing is a shortest-path problem. But as the network grows, routing tables explode and convergence slows down. A single link failure can send the entire network into a recalculation spiral.

Solution: SprayPoint Protocol — instead of computing a single best path, SprayPoint sprays data packets to all neighbours simultaneously. Waypoint routers keep a lightweight pointer to the destination, and data finds its way through hundreds of simultaneous paths. This eliminates the need for global routing tables and provides inherent load balancing — traffic naturally spreads across all available links.
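One way to picture the load-balancing effect is to compare per-link load when a sender pins a flow to one link versus spreading its packets across every neighbour. The sketch below assumes round-robin spreading, which is only one reading of "sprays"; the function names, the switch names, and the round-robin policy are illustrative assumptions, not the SprayPoint specification.

```python
from collections import Counter
from itertools import cycle

def single_path(num_packets: int, neighbours: list[str]) -> Counter:
    """Send every packet over the first link, as a single best path would."""
    load = Counter({link: 0 for link in neighbours})
    load[neighbours[0]] = num_packets
    return load

def spray(num_packets: int, neighbours: list[str]) -> Counter:
    """Spread packets over all links in round-robin order (assumed policy)."""
    load = Counter({link: 0 for link in neighbours})
    for _, link in zip(range(num_packets), cycle(neighbours)):
        load[link] += 1
    return load

links = ["sw-a", "sw-b", "sw-c", "sw-d"]  # hypothetical neighbour switches
print(max(single_path(1000, links).values()))  # 1000: one link carries everything
print(max(spray(1000, links).values()))        # 250: load is split four ways
```

Neither function consults a routing table; the sender only needs to know its own neighbours, which is the property the paragraph above attributes to the protocol.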

3. Proving It Works

Random networks are notoriously hard to verify. How do you prove a design works at a scale that doesn't exist yet? Building a test data centre for a million-server topology is impractical.

Solution: 530 Compute-Processing Years of Simulation — the team ran exhaustive simulations and derived mathematical formulas that could predict the behaviour of the random topology at any scale. The proof wasn't just empirical; it was theoretical. Comandur's equation provides a closed-form expression for the network's expected performance characteristics.
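The idea of checking simulation against a formula can be sketched at toy scale. The Python below builds a small random topology, measures the mean hop count by breadth-first search, and prints it next to the textbook estimate ln(n) / ln(d - 1) for random graphs of degree d. That estimate is a standard rule of thumb used here as a stand-in; it is not Comandur's equation, which the article does not reproduce, and `random_topology` and `mean_hops` are hypothetical helpers.

```python
import math
import random
from collections import deque

def random_topology(n: int, matchings: int, seed: int = 0) -> list[set[int]]:
    """Ring through the nodes in random order plus `matchings` random pairings."""
    rng = random.Random(seed)
    adj: list[set[int]] = [set() for _ in range(n)]
    order = list(range(n))
    rng.shuffle(order)
    # The ring keeps the graph connected by construction.
    for a, b in zip(order, order[1:] + order[:1]):
        adj[a].add(b)
        adj[b].add(a)
    # Each extra round pairs nodes up at random, adding at most one link per node.
    for _ in range(matchings):
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def mean_hops(adj: list[set[int]], source: int = 0) -> float:
    """Average shortest-path length from `source` to every other node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / (len(dist) - 1)

degree = 4  # ring contributes 2 links per node, two matchings add up to 2 more
for n in (256, 1024, 4096):
    measured = mean_hops(random_topology(n, matchings=2))
    predicted = math.log(n) / math.log(degree - 1)
    print(f"n={n}: measured {measured:.2f} hops, estimate {predicted:.2f}")
```

If the measured and estimated values track each other as n grows, the formula can be extrapolated to sizes that are too large to simulate or build, which is the role the article assigns to the team's closed-form result.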

The Team

The project was a cross-disciplinary collaboration:

Giacomo Bernardi — AWS networking lead
Ratul Mahajan — AWS networking researcher
Seshadhri Comandur — UC Santa Cruz mathematician (the "Comandur" behind the equation)
Matt Rehder — AWS hardware engineer

The Results

The numbers are striking:

Metric	Improvement
Performance	One-third higher data throughput
Power consumption	40% reduction in network power
Hardware costs	Billions saved in networking equipment
Deployment	Began 2025 in Spain and Germany; rolled out to most global AWS data centres by 2026

The power savings alone are significant at AWS's scale — 40% less energy spent on network switching means lower operational costs and a meaningful reduction in the company's carbon footprint.

Why Random Works

The counter-intuitive insight: perfectly planned, regular network topologies (trees, hypercubes, tori) can create single points of failure and congestion hotspots, because some links and switches carry far more traffic than others. Random topologies, by contrast, tend to spread traffic evenly across all links. No link is special; no switch is critical. The network becomes robust by eliminating hierarchy.
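The contrast between a hierarchy and a flat random mesh can be shown with a small failure experiment. The sketch below fails one node and counts how many nodes a survivor can still reach, first in a binary tree and then in a randomly wired mesh. Both builders are hypothetical toy models, not AWS's topology; the mesh includes a ring so that the sketch stays connected by construction, whereas a real design would presumably rely on the statistical properties of the random wiring.

```python
import random
from collections import deque

def reachable(adj: dict[int, set[int]], start: int, removed: int) -> int:
    """Count nodes reachable from `start` when node `removed` has failed."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)

def binary_tree(n: int) -> dict[int, set[int]]:
    """Hierarchical topology: node 0 is the root, node i hangs under (i - 1) // 2."""
    adj: dict[int, set[int]] = {i: set() for i in range(n)}
    for child in range(1, n):
        parent = (child - 1) // 2
        adj[child].add(parent)
        adj[parent].add(child)
    return adj

def random_mesh(n: int, seed: int = 0) -> dict[int, set[int]]:
    """Flat topology: a ring in random order plus one random pairing of nodes."""
    rng = random.Random(seed)
    adj: dict[int, set[int]] = {i: set() for i in range(n)}
    order = list(range(n))
    rng.shuffle(order)
    ring = list(zip(order, order[1:] + order[:1]))
    rng.shuffle(order)
    pairing = list(zip(order[0::2], order[1::2]))
    for a, b in ring + pairing:
        adj[a].add(b)
        adj[b].add(a)
    return adj

n = 63
tree, mesh = binary_tree(n), random_mesh(n)
# Fail the tree's root and see how much of the network a leaf can still reach.
tree_survivors = reachable(tree, start=n - 1, removed=0)
# Fail each mesh node in turn and record the worst case.
mesh_worst = min(reachable(mesh, start=(f + 1) % n, removed=f) for f in range(n))
print(tree_survivors, mesh_worst)  # 31 62
```

Losing the root cuts the tree in half, while no single failure in the mesh isolates any surviving node, which is the sense in which no switch is critical.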

AWS's ShuffleBox design proves that sometimes the best-laid plans are no match for a well-chosen dose of randomness.

What's Next

With the random graph approach proven at scale, the techniques are likely to influence data centre design across the industry. Google, Microsoft, and Meta are all wrestling with the same scaling problems. AWS's open publication of the research suggests the company wants to establish this as an industry standard — and given the billions saved, competitors will be paying close attention.