Two complementary approaches to AI-driven liquid cooling for GPU data centers — Phaidra’s RL agent masters CDU setpoint optimization while Federator.ai extends the control boundary to GPU workloads, scheduling, and platform-wide thermal management. Together, they cover the full stack.
Executive Summary
Better together than apart
Both solutions address the same root cause: PID controllers are reactive, not predictive, leading to thermal overshoots during power transients and wasted energy from chronic sub-cooling. Rather than competing, they operate at different layers of the control stack — Phaidra masters CDU setpoint optimization via RL, while Federator.ai extends control upward into GPU workloads, scheduling, and platform-wide orchestration. Combined, they deliver what neither achieves alone: full-stack thermal intelligence from the pump to the job scheduler.
Phaidra
Single-variable RL agent for CDU setpoint. Supervisory layer on existing PID. Uses rack power as leading indicator (~10-60s). Self-learning via digital twin pre-training. Co-authored with NVIDIA; validated on DGX SuperPOD and CoreWeave NVL72.
Federator.ai SLC
Three-layer control hierarchy bridging the fundamental timing gap between GPU heating (milliseconds) and liquid cooling response (180+ seconds). By treating IT and OT as one integrated domain, SLC uses workload-aware predictive control and admission gating to prevent thermal throttling, save 25-30% cooling energy, and dynamically adjust flow rates to meet target exit temperatures — all without additional OT integration effort.
Phaidra excels at CDU setpoint optimization — learning nonlinear dynamics no physics model captures. Federator.ai extends control upward into workload admission, GPU execution, and platform orchestration. The combined architecture covers every layer from the coolant pump to the job scheduler.
Control Approach
RL + MPC: different layers, one integrated stack
| Dimension | Phaidra | Federator.ai SLC |
|---|---|---|
| Paradigm | Reinforcement Learning (model-free, feed-forward) | Model Predictive Control (physics-based) + PID + Scheduler |
| Manipulated variable | CDU secondary supply temp setpoint | Pump flow + GPU power limits + launch rate + job admission |
| Leading indicator | Rack power (electrical → thermal delay) | Scheduler queue + power prediction (3 confidence-weighted sources) |
| Horizon | Implicit in RL policy (~10-60s via transport delay) | Explicit: 6×5s = 30s MPC + 5-min workload pre-cooling |
| Explainability | Black-box — validated by results | White-box: J = Σ[wT·(T−T*)² + wE·Ppump + wΔU·Δu²] |
| Solver | Neural network (PPO/SAC) | scipy SLSQP; PID fallback on solver failure |
| Adaptability | Self-learning: digital twin → live post-training (hours) | Online parameter estimation: thermal mass, time constant, HTC |
| Timing gap | Responds to observed thermal lag | Bridges GPU heat (ms) vs coolant (180s+) — predictive + admission |
| IT / OT boundary | OT only (CDU setpoint) | IT = OT unified — workload awareness makes cooling effective |
| Flow control | Indirect via setpoint | Target exit temp → dynamic flow rate auto-adjustment |
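The MPC column above can be made concrete. The sketch below sets up the receding-horizon problem from the table — a 6×5 s horizon, the cost J = Σ[wT·(T−T*)² + wE·Ppump + wΔU·Δu²], scipy's SLSQP as the solver, and a fallback to the previous command on solver failure. The thermal model, weights, and bounds are illustrative assumptions, not Federator.ai's actual parameters.

```python
import numpy as np
from scipy.optimize import minimize

HORIZON, DT = 6, 5.0             # 6 x 5 s = 30 s horizon, as in the table
TAU = 60.0                       # coolant-loop time constant, s (assumed)
W_T, W_E, W_DU = 1.0, 1e-3, 0.1  # illustrative weights wT, wE, wdU
T_TARGET = 45.0                  # target exit temperature, degC (assumed)

def predict_exit_temp(flow, t0):
    """Toy first-order model: higher flow pulls the exit temp down."""
    temps, t = [], t0
    for u in flow:
        t_eq = 25.0 + 300.0 / u            # hypothetical equilibrium temp
        t = t + (DT / TAU) * (t_eq - t)
        temps.append(t)
    return np.array(temps)

def mpc_cost(u, t0, u_prev):
    T = predict_exit_temp(u, t0)
    track = W_T * np.sum((T - T_TARGET) ** 2)   # wT*(T - T*)^2
    energy = W_E * np.sum(u ** 3)               # pump power scales ~ flow^3
    du = np.diff(np.concatenate(([u_prev], u)))
    smooth = W_DU * np.sum(du ** 2)             # wdU*du^2 move-suppression term
    return track + energy + smooth

u_prev = 10.0                                   # last applied flow command
res = minimize(mpc_cost, np.full(HORIZON, u_prev), args=(48.0, u_prev),
               method="SLSQP", bounds=[(2.0, 30.0)] * HORIZON)
# Apply only the first move; on solver failure, hold the last command
# (the real system would hand off to the PID fallback instead).
flow_cmd = res.x[0] if res.success else u_prev
```

Each control cycle re-solves this problem and applies only the first move, which is the standard receding-horizon pattern.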
Architecture Depth
Each solution owns different layers — together they span all four
Phaidra excels at Layer 2 — a supervisory RL agent that learns optimal CDU setpoints without the manual tuning a physics model would require. It works with the existing CDU PID (Layers 0-1). Federator.ai contributes Layers 0-1 and Layer 3: direct pump flow control, PID with anti-windup and bumpless transfer, and, critically, workload-aware admission with pre-cooling at Layer 3. Combined, Phaidra's RL handles CDU optimization while Federator.ai controls the heat source itself through workload scheduling — a capability no single solution provides alone.
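The PID fallback properties named above can be sketched in a few lines. This is a minimal PI illustration under assumed gains, not Federator.ai's implementation: conditional integration provides the anti-windup, and bumpless transfer back-computes the integrator at handover so the first PI output matches the last MPC command.

```python
class FallbackPI:
    """Minimal PI controller with conditional-integration anti-windup and
    bumpless transfer. Gains and limits are illustrative assumptions."""

    def __init__(self, kp, ki, u_min, u_max):
        self.kp, self.ki = kp, ki
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0

    def bumpless_init(self, u_current, error):
        # Back-compute the integrator so the first PI output equals whatever
        # the failing controller (e.g. the MPC) was last commanding.
        self.integral = (u_current - self.kp * error) / self.ki

    def step(self, error, dt):
        u = self.kp * error + self.ki * self.integral
        if self.u_min < u < self.u_max:
            self.integral += error * dt    # integrate only while unsaturated
        return min(max(u, self.u_min), self.u_max)

# MPC solver failed while commanding 12.0 (flow units); hand over bumplessly.
pid = FallbackPI(kp=0.8, ki=0.05, u_min=2.0, u_max=30.0)
pid.bumpless_init(u_current=12.0, error=3.0)
u = pid.step(error=3.0, dt=5.0)            # first output matches 12.0
```

Freezing the integrator at the actuator limits is what prevents windup during saturation; the bumpless handover is what keeps the pump from jumping when the MPC drops out.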
Safety Architecture
Layered defense — CDU safety + platform-wide interlocks
| Layer | Phaidra | Federator.ai |
|---|---|---|
| Guardrails | Hard-coded TCS envelope | 83°C max, 90°C shutdown, ramp limits |
| Failover | Agent fail → local PID | MPC fail → PID + anti-windup + bumpless |
| Interlocks | Existing CDU retained | 4 interlocks: GPU ≥90°C, supply ≥55°C, return ≥70°C, flow <50 |
| Actuation | Temp setpoint only | Pump + GPU power + launch + admission |
| Blast radius | CDU thermal only (safe) | Wider — requires Proof of Trust |
| Regulatory | Easy to certify as advisory | Full ICS, 4-phase trust progression |
Workload Integration
Phaidra reacts in seconds; Federator.ai plans minutes ahead — both needed
Power as proxy
Rack power as leading indicator. ~10-60s window bounded by physical transport delay. Does not integrate with Slurm/K8s. Cannot see queued jobs before they start.
Schedule-aware pre-cooling
Scheduler integration (conf 0.9), trend extrapolation (0.6), current baseline (0.3). 5-minute pre-cooling window. Can also shape the thermal load via admission control.
Phaidra reacts to power transients in 10–60 seconds with unmatched CDU precision. Federator.ai looks 5+ minutes ahead via scheduler integration and can shape the thermal load itself. Combined: fast CDU response for spikes AND proactive workload shaping for sustained transitions.
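The three confidence weights above suggest a simple fusion rule. The sketch below assumes a confidence-weighted average; the weights (0.9 / 0.6 / 0.3) come from the text, but the combination method itself is an assumption for illustration.

```python
def fuse_power_forecast(estimates):
    """Confidence-weighted average of (power_kw, confidence) pairs.
    The weighting rule is an assumed fusion method, not a vendor spec."""
    total_conf = sum(conf for _, conf in estimates)
    return sum(p * conf for p, conf in estimates) / total_conf

forecast_kw = fuse_power_forecast([
    (120.0, 0.9),   # scheduler queue: job manifest predicts 120 kW incoming
    (95.0, 0.6),    # trend extrapolation of recent rack power
    (80.0, 0.3),    # current measured baseline
])
# The fused forecast leans toward the high-confidence scheduler estimate,
# which is what enables pre-cooling minutes before the load arrives.
```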
Thermal Admission & GPU Execution Control
Federator.ai’s contribution above the CDU layer — what Phaidra was never designed to do
| Level | Mechanism | Measured Impact |
|---|---|---|
| NONE | Baseline operation | 36.76W avg, 96.67% util, 64.16°C |
| POWER_CAP | nvidia-smi -pl {watts} | Immediate, sub-second, no app changes |
| LAUNCH_THROTTLE | LD_PRELOAD=libnvscope.so token bucket | Moderate −31.6%, Heavy −63.4%, Extreme −85.0% |
| DEFER | K8s/Slurm queue hold | Zero GPU impact; job starts with full thermal budget |
| REJECT | Admission denied | Prevents thermal emergency entirely |
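The LAUNCH_THROTTLE row names a token bucket behind an LD_PRELOAD shim (libnvscope.so) that intercepts kernel launches. The shim itself is native code; this Python sketch only illustrates the bucket logic, with an assumed refill rate and capacity.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter illustrating the LAUNCH_THROTTLE idea.
    Rate and capacity here are illustrative, not libnvscope's settings."""

    def __init__(self, rate, capacity):
        self.rate = rate                  # launches refilled per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_launch(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True                   # launch proceeds
        return False                      # launch deferred until refill

bucket = TokenBucket(rate=100.0, capacity=10)       # assumed ~100 launches/s
allowed = sum(bucket.try_launch() for _ in range(50))  # burst of 50 attempts
```

Tightening the rate produces the graded back-pressure in the table: moderate, heavy, and extreme throttle levels are just progressively lower refill rates.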
Performance Claims
Different metrics, additive benefits
Phaidra — March 2026 Whitepaper
Federator.ai SLC — Core Value Propositions
The fundamental insight: cooling can only be effective and efficient when you understand the workload. SLC sets the target exit temperature and dynamically adjusts flow rate to meet design specifications — no over-cooling, no under-cooling, no performance capping from thermal events.
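The dynamic flow adjustment described here follows from the steady-state heat balance P = ṁ·cp·ΔT: given rack power and a target exit temperature, the required coolant flow is determined. A minimal sketch, assuming pure water and illustrative temperatures:

```python
CP_WATER = 4186.0   # J/(kg*K) for water; glycol mixtures differ (assumption)

def required_flow_lpm(power_kw, t_exit_target_c, t_supply_c):
    """Flow needed so coolant exits at the target temperature:
    P = m_dot * cp * (T_exit - T_supply), solved for m_dot."""
    delta_t = t_exit_target_c - t_supply_c
    m_dot = power_kw * 1e3 / (CP_WATER * delta_t)   # mass flow, kg/s
    return m_dot * 60.0                             # ~1 kg water = 1 L -> L/min

# Illustrative: a 120 kW rack, 30 degC supply, 45 degC target exit.
flow_lpm = required_flow_lpm(120.0, t_exit_target_c=45.0, t_supply_c=30.0)
```

Running more flow than this balance requires is over-cooling (wasted pump energy); running less lets the exit temperature drift above target.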
Integration Scope
CDU agent + full-stack platform = complete coverage
Phaidra is a best-in-class CDU optimization agent, deep where it matters most. Federator.ai SLC is one module within a 12-domain AI data center operating system, providing the platform fabric that connects cooling to workload scheduling, GPU execution control, failure prediction, auto-remediation (Martin-SRE), observability, and billing. Phaidra plugs into Federator.ai’s L2 slot, contributing superior CDU setpoint intelligence while Federator.ai handles everything above and around it.
Deployment & Learning Model
Phaidra self-learns the CDU; Federator.ai manages the trust boundary above it
Phaidra: RL self-learning
- Pre-train on digital twin (per CDU model)
- Shadow mode (observe only)
- Live post-training (converges in hours)
- Active — adjusts setpoint in real-time
Advantage: adapts automatically, no manual parameter tuning.
Federator.ai: Physics model + Proof of Trust
- Configure physics model parameters
- SHADOW — read-only telemetry (30 days)
- ADVISORY — dual-key approval (60 days)
- BOUNDED AUTONOMY — auto within blast radius
- FULL AUTONOMY — closed-loop control
Advantage: explainable at every step, formal audit trail for ICS certification.
Revenue & TCO Impact
Capacity unlock + operational savings = stacked ROI
Phaidra unlocks stranded cooling capacity: at 1GW scale, raising TCS by 10°C frees 67.4 MW for an additional $3.8B/year in IT revenue. Federator.ai delivers 25-30% cooling energy savings, eliminates GPU thermal throttling (protecting compute revenue), and extends GPU lifespan by keeping junction temperatures within design targets. These value streams are entirely additive — deploying both captures revenue that neither achieves alone.
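As a sanity check on these figures (a simple division of the stated numbers, not a vendor metric): $3.8B/yr ÷ 67.4 MW ≈ $56 of IT revenue per watt-year of unlocked capacity.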
Strategic Assessment
What each brings to the partnership
What Phaidra contributes
- CDU mastery — Self-learning RL adapts to any CDU, hours to converge
- Transient suppression — 75-80% overshoot reduction (3-4°C → 0.5-1°C residual)
- NVIDIA ecosystem — Co-authored, DGX SuperPOD validated
- Capacity unlock — $2.2-6.5B/year revenue at GW scale
- Zero-config deployment — Digital twin pre-training, no parameter tuning
What Federator.ai contributes
- 25-30% cooling energy savings — Dynamic flow rate to target exit temperature
- Zero overshoot — Admission control prevents the thermal spike entirely rather than merely reducing it
- IT = OT unified — Already managing IT workloads, no extra OT integration needed
- Workload-aware cooling — Only when you understand workloads can cooling be effective
- GPU execution control — 5 levels: none, power cap, launch throttle, defer, reject
- Platform fabric — Cortex ADDC connects 12+ domains
Combined Capability Map
| Capability | Phaidra contributes | Federator.ai contributes |
|---|---|---|
| CDU optimization | Primary: RL learns CDU dynamics | MPC supplements; PID safety fallback |
| Transient response | Primary: 75-80% overshoot reduction | Primary: zero overshoot via admission control |
| Prediction horizon | ~10-60s (transport delay) | Primary: 5+ min scheduler integration |
| GPU execution control | — | Primary: 5 levels (none → reject) |
| Workload integration | — | Primary: Slurm/K8s native |
| Deployment speed | Primary: self-learning, hours | Trust progression for actuation layers |
| Explainability | Black-box RL, validated by results | Primary: auditable MPC cost function |
| Safety architecture | CDU guardrails + failover | Primary: 3-layer interlocks + PoT |
| Platform integration | CDU-focused agent | Primary: full Cortex ADDC (12 domains) |
| NVIDIA ecosystem | Primary: co-authored, DGX validated | NVIDIA-native stack (DCGM, NIM) |
| Value impact | Primary: $B capacity unlock | Additive: 25-30% cooling savings + zero throttling |
Phaidra makes your CDU the smartest it can be. Federator.ai bridges the timing gap between GPU heating and coolant response, saves 25-30% cooling energy, and prevents performance capping — because only when you understand workloads can cooling be truly effective. Together, they make the entire AI factory thermally intelligent.
Phaidra’s roadmap (NVIDIA DSX Max-Q) envisions unifying IT, OT, and cooling into a single optimization layer. Federator.ai Cortex already treats IT and OT as one domain — no extra integration work needed because SLC already manages the workloads that generate the heat. The partnership is natural: Phaidra brings CDU intelligence, Federator.ai brings the workload awareness, admission control, and dynamic flow adjustment that make the entire system effective and efficient.