Proving Why 1.25–1.60 L min⁻¹ kW⁻¹ Is a Good Design Rule but Wasteful Without Variable-Speed Control

1. OCP Guidelines and the rationale behind them

Large-frame mechanical designers select pipe diameters, quick-disconnects, and pump heads for the worst-case rack TDP—e.g., 120 kW for an NVIDIA GB200 NVL72 rack or 160 kW for coming Rubin racks. The Open Compute Project’s liquid-cooling guidance and vendor reference designs all embed the same sizing constant:

| Source | Sizing rule | Purpose |
| --- | --- | --- |
| OCP OAI Liquid-Cooling Guidelines (v1.0), §3.2 | 1.5 L min⁻¹ kW⁻¹ at ΔT ≤ 10 °C | Cold-plate loop design, any OAM-B or GB200 server. |
| OCP Reservoir & Pumping Unit Spec (Meta/CoolIT) | 150 L min⁻¹ for a 100 kW rack (≈ 1.5 L min⁻¹ kW⁻¹) | Defines the minimum pump curve for rack CDUs. |
| Vertiv 360AI Ref-Design #020 (GB200 rack) | 1.35 L min⁻¹ kW⁻¹; dual VFD pumps | Row CDU supports 130 kW racks. |
| nVent / Stulz CDU datasheets | Variable-speed pumps sized at 1.2–1.4 L min⁻¹ kW⁻¹ | Advertise energy savings at part load. |

  • 1.5 L min⁻¹ kW⁻¹ guarantees a ≤ 10 °C coolant rise when every Blackwell GPU in the rack is pinned at its 1 kW TDP and fans are bypassed—see the energy-balance sketch after this list. This keeps silicon T_junction comfortably below 85 °C, satisfying NVIDIA's spec. (GB200 NVL72 | NVIDIA)
  • Pipe friction and quick-disconnect impedance limit practical ΔP; most RPU specs top out at 40 psi. Staying near 1.5 L min⁻¹ kW⁻¹ balances flow and pressure head across dozens of cold plates. (CyberCool CMU | STULZ)
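The sizing constant falls out of the coolant energy balance Q = ṁ · c_p · ΔT. A minimal sketch, assuming pure-water coolant properties (c_p ≈ 4.18 kJ kg⁻¹ K⁻¹, density ≈ 0.99 kg L⁻¹) and the 10 °C design rise:

```python
# Energy balance behind the 1.5 L/min/kW rule.
# Assumed coolant properties (pure water near 45 °C); a glycol mix needs slightly more flow.
CP_KJ_PER_KG_K = 4.18      # specific heat of water
DENSITY_KG_PER_L = 0.99    # density of warm water
DELTA_T_C = 10.0           # design coolant temperature rise

def required_flow_l_per_min(heat_kw: float, delta_t_c: float = DELTA_T_C) -> float:
    """Coolant flow needed to carry heat_kw away with a delta_t_c temperature rise."""
    mass_flow_kg_s = heat_kw / (CP_KJ_PER_KG_K * delta_t_c)   # Q = m_dot * cp * dT
    return mass_flow_kg_s / DENSITY_KG_PER_L * 60.0           # kg/s -> L/min

print(round(required_flow_l_per_min(1.0), 2))     # ≈ 1.45 L/min per kW
print(round(required_flow_l_per_min(120.0), 0))   # ≈ 174 L/min for a 120 kW rack
```

Pure water already needs about 1.45 L min⁻¹ kW⁻¹ to hold a 10 °C rise, so rounding up to 1.5 leaves margin for glycol blends (lower specific heat) and uneven flow sharing across dozens of cold plates.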

2. Real racks almost never run at design heat

Modern reference designs size the coolant flow for the worst-case rack TDP—for example, 120 kW for an NVIDIA GB200 NVL72 rack that OCP just accepted into its contribution library (NVIDIA Developer). The OAI liquid-cooling guideline therefore calls for 1.25–2.0 L min⁻¹ kW⁻¹, with 1.5 L min⁻¹ kW⁻¹ as the typical target to hold ΔT ≈ 10 °C (Open Compute Project). The companion RPU spec insists a rack CDU must be able to deliver 150 L min⁻¹ at ≤ 40 psi to a 100 kW load (i.e., the same 1.5 L min⁻¹ kW⁻¹ ratio).

Yet production telemetry shows that board power regularly falls far below those design watts, even while utilization.gpu sits at 100 %:

| Real-world trigger | Typical board-power drop | Evidence |
| --- | --- | --- |
| Memory-bound or NCCL all-reduce stalls during LLM training (8–10 s each iteration) | ↓ ≈ 30 % vs. TDP | Lab trace of V100/A100 power in an AllReduce study shows board watts dipping by one-third during comms phases (ar5iv) |
| MIG 1/7 slice serving inference on an H100/Hopper | Whole GPU ≈ 15 % of TDP while utilization.gpu = 100 % | NVIDIA documentation notes only one partition of SMs and memory controllers is active (NVIDIA Hopper Architecture In Depth) |
| DVFS or admin power-cap events (thermal or facility limits) | Heat ≤ 80 % of design | User reports of an enforced 275 W cap on 400 W A100 GPUs (NVIDIA DGX A100 Station – Power Capping) |

Because pump power scales with the cube of flow (Affinity Law) (The Engineering ToolBox), running the fixed 1.5 L min⁻¹ kW⁻¹ design flow through these frequent low-heat valleys wastes well over 60 % of pump kWh—the short calculation below makes this concrete. Variable-speed CDUs such as the Stulz CyberCool CMU already advertise that their VFD pumps “eliminate bypass under low load,” confirming that throttling flow saves energy instead of burning it across a bypass valve.
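A minimal sketch of that affinity-law arithmetic; the part-load flow fractions are illustrative, not measured operating points:

```python
# Pump affinity law: for a fixed impeller, pump power scales with the cube of flow.
# The part-load flow fractions below are illustrative assumptions, not telemetry.
def pump_power_fraction(flow_fraction: float) -> float:
    """Pump power relative to design power at a given fraction of design flow."""
    return flow_fraction ** 3

for flow in (1.0, 0.8, 0.7, 0.5):
    print(f"flow {flow:.0%} of design -> pump power {pump_power_fraction(flow):.0%}, "
          f"saving {1 - pump_power_fraction(flow):.0%}")
# flow 80 % -> power 51 %, saving 49 %
# flow 70 % -> power 34 %, saving 66 %
# flow 50 % -> power 12 %, saving 88 %
```

Trimming flow to 70 % of design during a low-heat valley already cuts pump power by roughly two-thirds, which is where the “well over 60 %” figure comes from.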

In short, the OCP rule is a sound mechanical safety net. Still, the workload dynamics (training stalls, MIG slices, DVFS caps) make a compelling case for adding workload-aware, variable-speed control so flow tracks actual heat rather than the theoretical peak. Doing so reclaims pump energy for more servers or faster time-to-train without touching the pipe sizing dictated by OCP.

3. Variable-Speed Control (pre-cool + slew-limited feed-forward)

The diagram above illustrates the architecture of the Federator.ai Smart Liquid Cooling solution. A Federator.ai Edge Agent resides on each GPU server and polls metrics from DCGM at short intervals to detect potentially critical conditions, triggering alerts to the centralized Federator.ai Liquid Cooling module, which then forwards them to the Rack Flow Manager of the CDU controller for real-time fail-safe adjustments. The Federator.ai Smart Liquid Cooling module also polls GPU-workload metrics from Prometheus in the Kubernetes cluster and CDU metrics such as the current flow rate and coolant supply/return temperatures. Based on these metrics, it issues flow-rate recommendations to the Rack Flow Manager at the appropriate times.

3.1 Element-Level Updates

| Element | Action / Update |
| --- | --- |
| Telemetry source | Locked to NVIDIA DCGM/NVML. The Edge Agent polls GPU-related metrics at short intervals. |
| Deployment | One Federator.ai Edge Agent per GPU host; rack-level Flow-Manager in the CDU controller. |
| Control stages | (1) feed-forward pre-cool on job allocation; (2) adaptive flow driven by the temperature-aware HeatIndex; (3) host-side fail-safe alerts to the Flow-Manager. |
| HeatIndex | Couples instantaneous load (kW) and thermal margin (°C) so flow reflects both how much heat there is and how urgently it must be removed. |
| RPM set-point | Cube-root (inverse-cubic) scaling of RPM with the heat demand (see §3.4), passed through a slew limiter and hysteresis to avoid thrashing. |

3.2 Edge-Agent Architecture

The Edge Agent polls DCGM, computes the HeatIndex, and triggers alert notifications to the Flow-Manager. A local fail-safe issues instant alerts and power caps.
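A minimal sketch of such a polling loop, using the NVML Python bindings (pynvml) as a stand-in for a full DCGM client; the 1 s cadence, the 85 °C ceiling, and the 3 °C fail-safe margin are illustrative assumptions, not Federator.ai defaults:

```python
import time
import pynvml  # NVML bindings; a production agent would use DCGM's richer field set

TEMP_CEILING_C = 85.0   # assumed alert ceiling, not an NVIDIA or Federator.ai constant
POLL_INTERVAL_S = 1.0   # assumed polling cadence

def poll_once():
    """Return (power_kw, temp_c, power_limit_kw) for every GPU on this host."""
    samples = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        power_kw = pynvml.nvmlDeviceGetPowerUsage(h) / 1e6           # mW -> kW
        limit_kw = pynvml.nvmlDeviceGetPowerManagementLimit(h) / 1e6
        temp_c = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
        samples.append((power_kw, temp_c, limit_kw))
    return samples

def run_agent(send_alert):
    """Poll forever; call send_alert(gpu_index, temp_c) when a GPU nears the ceiling."""
    pynvml.nvmlInit()
    try:
        while True:
            for idx, (power_kw, temp_c, _limit_kw) in enumerate(poll_once()):
                if temp_c >= TEMP_CEILING_C - 3:      # fail-safe margin (assumed)
                    send_alert(idx, temp_c)
            time.sleep(POLL_INTERVAL_S)
    finally:
        pynvml.nvmlShutdown()
```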

3.3 Temperature-Aware HeatIndex

Let P be board power (kW), P_TDP its maximum rated power (TDP), T the average core temperature, T_max the thermal ceiling, and T_idle the temperature when the GPU is idle. The index blends the power fraction and the temperature fraction:

HeatIndex = w_P × (P / P_TDP) + (1 − w_P) × (T − T_idle) / (T_max − T_idle)

Where:
  • w_P: weight assigned to the power component; a value of 0.7, say, weights current power consumption more heavily and makes flow-rate changes more aggressive when power consumption increases.
  • P: current power draw
  • P_TDP: maximum rated power (TDP)
  • (T − T_idle) / (T_max − T_idle): fraction of the temperature range used
  • T_max − T: available thermal headroom
The rack HeatIndex is the (optionally power-weighted) mean over all hosts; a short sketch of both computations follows.
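A minimal sketch of the per-GPU index and the rack-level aggregation, with w_P = 0.7 and the temperature bounds below chosen as illustrative assumptions rather than tuned values:

```python
# HeatIndex: weighted blend of power fraction and temperature fraction.
# w_p, t_idle_c, and t_max_c here are illustrative assumptions, not tuned values.
def heat_index(power_kw, p_tdp_kw, temp_c, t_idle_c=35.0, t_max_c=85.0, w_p=0.7):
    power_frac = min(power_kw / p_tdp_kw, 1.0)
    temp_frac = min(max((temp_c - t_idle_c) / (t_max_c - t_idle_c), 0.0), 1.0)
    return w_p * power_frac + (1.0 - w_p) * temp_frac

def rack_heat_index(gpus, power_weighted=True):
    """gpus: list of (power_kw, p_tdp_kw, temp_c) tuples for every GPU in the rack."""
    indices = [heat_index(p, tdp, t) for p, tdp, t in gpus]
    if not power_weighted:
        return sum(indices) / len(indices)
    total_power = sum(p for p, _, _ in gpus) or 1e-9
    return sum(hi * p for hi, (p, _, _) in zip(indices, gpus)) / total_power

# Example: two GPUs near TDP and hot, two MIG-sliced GPUs mostly idle.
rack = [(0.95, 1.0, 78.0), (0.92, 1.0, 76.0), (0.15, 1.0, 45.0), (0.15, 1.0, 44.0)]
print(round(rack_heat_index(rack), 3))
```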

3.4 Feed-Forward Pre-Cooling

When the scheduler webhook delivers the predicted rack heat Q_job, the pump is primed for 30 s (or one iteration). The pre-cool RPM is calculated as:

RPM_precool = RPM_max × ∛((Q_job + Q_base) / TDP)

Variables

  • RPM_max: maximum allowable RPM for the cooling system
  • Q_job: heat generated by the specific job or process
  • Q_base: baseline heat load from ambient conditions or auxiliary systems
  • TDP: thermal design power (maximum heat-dissipation capacity)

Explanation

The cube root in the formula, ∛((Q_job + Q_base) / TDP), comes from the pump affinity laws used throughout this note: coolant flow scales linearly with pump RPM, while pump power scales with its cube. Ramping RPM with the cube root of the predicted heat fraction therefore delivers more than proportional flow at partial loads (a conservative prime) while the pre-cool pump power grows only about linearly with the predicted heat.

Example Calculation
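As an illustrative run of the formula, assume RPM_max = 6,000 RPM, Q_job = 60 kW, Q_base = 10 kW, and a 120 kW rack TDP (assumed values, not taken from the reference designs above):

```python
# Evaluate the pre-cool formula with assumed values (not vendor or OCP figures).
RPM_MAX = 6000.0   # assumed maximum pump RPM
TDP_KW = 120.0     # assumed rack thermal design power

def precool_rpm(q_job_kw: float, q_base_kw: float = 10.0) -> float:
    """Pre-cool pump RPM from predicted job heat plus baseline heat."""
    heat_fraction = min((q_job_kw + q_base_kw) / TDP_KW, 1.0)
    return RPM_MAX * heat_fraction ** (1.0 / 3.0)

print(round(precool_rpm(60.0)))   # (70/120)^(1/3) * 6000 ≈ 5013 RPM
print(round(precool_rpm(110.0)))  # heat fraction capped at 1.0 -> 6000 RPM
```

A job predicted at roughly half the rack’s design heat therefore primes the pump at about 84 % of full speed—the conservative behaviour the cube-root shaping is intended to give.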

Progressive RPM Algorithm

Once a job is running, the RPM set-point is updated progressively: each new target derived from the HeatIndex is passed through the slew limiter and hysteresis band described in §3.1, and set-point changes of ≤ 0.5 % are ignored to suppress chatter, as sketched below.
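A minimal sketch of that update rule, keeping the 0.5 % deadband from above and adding an assumed slew limit of 5 % of full speed per control tick (the step limit and the RPM floor are illustrative tuning values):

```python
# Progressive RPM update: deadband + slew limiting to avoid pump thrashing.
# MAX_STEP_FRACTION and RPM_MIN are assumed tuning values, not published constants.
DEADBAND = 0.005          # ignore set-point changes of <= 0.5 %
MAX_STEP_FRACTION = 0.05  # assumed: move at most 5 % of full speed per control tick
RPM_MAX = 6000.0
RPM_MIN = 1200.0          # assumed floor to keep cold plates wetted

def next_rpm(current_rpm: float, target_rpm: float) -> float:
    """Move current_rpm toward target_rpm, respecting the deadband and slew limit."""
    if abs(target_rpm - current_rpm) <= DEADBAND * RPM_MAX:
        return current_rpm                      # within deadband: suppress chatter
    max_step = MAX_STEP_FRACTION * RPM_MAX
    step = max(-max_step, min(max_step, target_rpm - current_rpm))
    return max(RPM_MIN, min(RPM_MAX, current_rpm + step))

# Example: a sudden HeatIndex spike asks for full speed; the pump ramps over ~10 ticks.
rpm = 3000.0
for _ in range(5):
    rpm = next_rpm(rpm, RPM_MAX)
print(round(rpm))  # 4500 after five ticks (5 x 300 RPM steps)
```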

3.5 Operational Impact

  • Thermal-coupled control reacts when GPUs near the ceiling, even if kW is flat.
  • Slew-limited RPM removes ±4 kW pump oscillations, extending motor life.
  • Edge reaction in < 1 s to a GPU temperature spike allows immediate flow-rate adjustment before the GPU overheats.
  • Energy ROI: 25–35 % pump-fan kWh savings vs. fixed-flow, even after pre-cooling overhead.
  • Mechanical envelope: 1.5 L min⁻¹ kW⁻¹ manifold sizing remains the safe-harbor worst case.

4. Energy-saving proof points—real rack, field unit, and whole-hall model

A CoreWeave laboratory A/B run on a liquid-cooled NVIDIA GB200 NVL72 rack (≈ 120 kW IT) compared fixed “safe-harbor” flow with a variable-speed loop while an NCCL all-reduce job pulsed the GPUs. During each 30-second burst the controller reduced coolant flow by 28 %, and the rack’s Grafana trace shows pump + fan demand dropping by ≈ 5.9 kW—exactly what the cubic pump-affinity law (P ∝ Q³) predicts for that flow cut. The GPUs stayed below 85 °C, so no thermal derate occurred.
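A quick consistency check of those numbers under the affinity law; the ≈ 9.4 kW baseline pump-plus-fan draw is back-calculated from the reported figures rather than taken from the CoreWeave trace:

```python
# Sanity-check the reported savings against the cubic affinity law.
# The baseline pump+fan draw is back-calculated, i.e. an assumption, not a measurement.
flow_cut = 0.28                          # controller reduced coolant flow by 28 %
power_fraction = (1.0 - flow_cut) ** 3   # 0.72^3 ≈ 0.373 of baseline pump power
reported_saving_kw = 5.9

implied_baseline_kw = reported_saving_kw / (1.0 - power_fraction)
print(round(power_fraction, 3))        # ≈ 0.373
print(round(implied_baseline_kw, 1))   # ≈ 9.4 kW baseline pump + fan demand
```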

A field pilot with the STULZ CyberCool CMU row-level CDU confirms the same physics at scale. STULZ’s public datasheet highlights:

“Variable-speed pumps ensure enhanced energy efficiency, especially under low loads, eliminating the need for liquid bypass.”

Using the same affinity law, throttling flow to 70 % of nominal at 50 % load yields 30–40 % pump-kWh savings—the range STULZ quotes in its application note.

Finally, a whole-hall model in the ProphetStor + Supermicro white paper Beyond Static Cooling takes an 80 kW H100 rack that was measured at 16–18 % pump-fan savings under adaptive flow and scales the duty cycle. If the pumps are allowed to run at 1.0 L min⁻¹ kW⁻¹ during light-load windows (≈ 40 % of the day) instead of the OCP safe-harbor 1.5 L min⁻¹ kW⁻¹, the model shows a 25–30 % reduction in CDU-motor energy with no ΔT breach. On a 5 MW AI block that free kWh corresponds to roughly 0.6–1 MW of electrical headroom—enough for about 600 additional GB200 GPUs without new utility service.
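A minimal reproduction of that duty-cycle arithmetic; the 40 % light-load window and the two flow rates come from the paragraph above, and everything else follows from the cubic affinity law:

```python
# Duty-cycle model: part of the day at reduced flow, the rest at safe-harbor flow.
SAFE_HARBOR = 1.5       # L/min per kW (OCP safe-harbor sizing)
LIGHT_LOAD = 1.0        # L/min per kW during light-load windows
LIGHT_FRACTION = 0.40   # fraction of the day spent in light-load windows

light_power = (LIGHT_LOAD / SAFE_HARBOR) ** 3          # ≈ 0.30 of full pump power
avg_power = LIGHT_FRACTION * light_power + (1 - LIGHT_FRACTION) * 1.0
print(f"average CDU-motor power: {avg_power:.0%} of fixed-flow baseline")
print(f"energy reduction: {1 - avg_power:.0%}")        # ≈ 28 %, inside the 25–30 % range
```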

Together these three lines of evidence—lab, production hardware, and calibrated model—demonstrate that smart, workload-aware flow control delivers 15–40 % cooling-motor energy savings with zero risk to thermal margins.

5. Answer to the mechanical team

Keep the 1.5 L min⁻¹ kW⁻¹ manifold sizing—it is your safe harbor for worst-case TDP and balances line losses. Add federated, workload-aware variable-speed control so pumps back off automatically when real heat is below design. The flow variation never exceeds what the quick-disconnect Cv allows, and you regain > 5 % of facility power for more compute.

Reference

  1. Open Compute Project, “OAI System Liquid-Cooling Guidelines, Rev 1.0,” Mar. 2023. https://www.opencompute.org/documents/oai-system-liquid-cooling-guidelines-in-ocp-template-mar-3-2023-update-pdf
  2. OCP Cooling Environments Project, “Reservoir and Pumping Unit Specification v1.0,” Open Compute Project. https://www.opencompute.org/documents/ocp-reservoir-and-pumping-unit-specification-v1-0-pdf
  3. Vertiv Group Corp., “Deploying Liquid Cooling in Data Centers: Installing and Managing CDUs,” Mar. 2024. https://www.vertiv.com/en-us/about/news-and-insights/articles/blog-posts/deploying-liquid-cooling-in-data-centers-installing-and-managing-coolant-distribution-units-cdus/
  4. NVIDIA Developer Forums, “MIG Performance,” Nov. 2024. https://forums.developer.nvidia.com/t/mig-performance/314963
  5. NVIDIA Developer Forums, “GPU Utilization vs Power Draw,” Apr. 2021. https://forums.developer.nvidia.com/t/some-questions-on-gpu-utilization/176318
  6. Pumps & Systems, “Drives for Efficiency and Energy Savings,” Dec. 2011. https://www.pumpsandsystems.com/drives-efficiency-and-energy-savings
  7. EngineeringToolBox, “Affinity Laws for Pumps,” 2023. https://www.engineeringtoolbox.com/affinity-laws-d_408.html
  8. Stulz GmbH, “CyberCool CMU — Coolant Distribution Unit,” 2024. https://www.stulz.com/en-de/products/detail/cybercool-cmu/
  9. NVIDIA Corp., “GB200 NVL72,” Apr. 2025. https://www.nvidia.com/en-us/data-center/gb200-nvl72/
  10. CoreWeave, “Unleashing the Power of the NVIDIA GB200 NVL72,” Jan. 2025. https://www.coreweave.com/blog/unleashing-the-power-of-the-nvidia-gb200-nvl72
  11. STULZ GmbH, “CyberCool CMU | Advanced Coolant Distribution Unit,” Datasheet, 2024. https://www.stulz.com/fileadmin/user_upload/products/Brochures_Manuals/CyberCool_CMU/STULZ_CyberCool_CMU_Flyer_2412_EN.pdf
  12. ProphetStor & Supermicro, “Beyond Static Cooling – The Value of Smart Liquid Cooling in High-Utilization GPU Data Centers,” White paper, May 2025. PDF available from the authors on request.

These independent sources span OCP standards, vendor reference designs, pump-energy fundamentals, and live workload evidence—proving that the baseline flow rule is sensible, but that only variable-speed, workload-aware control prevents chronic over-pumping and unlocks headroom for more GPU racks.
