Practical Design of the Power Delivery Network for AI Hyper-Scale Blade Servers: Balancing Power Density, Efficiency, and Reliability

AI Blade Server Power Delivery Network Topology Diagram

AI Hyper-Scale Blade Server Power Delivery Network Overall Topology

graph LR
    %% Primary Side High-Voltage Conversion
    subgraph "Primary-Side High-Voltage Conversion (AC/DC or DC/DC)"
        INPUT[AC Input or 48VDC Bus] --> EMI_FILTER["EMI Filter & Protection"]
        EMI_FILTER --> PFC_STAGE["PFC Stage"]
        PFC_STAGE --> LLC_RESONANT["LLC Resonant Converter"]
        subgraph "Primary-Side SiC MOSFET Array"
            Q_PFC1["VBP117MC06<br/>1700V/6A SiC"]
            Q_PFC2["VBP117MC06<br/>1700V/6A SiC"]
            Q_LLC1["VBP117MC06<br/>1700V/6A SiC"]
            Q_LLC2["VBP117MC06<br/>1700V/6A SiC"]
        end
        PFC_STAGE --> Q_PFC1
        PFC_STAGE --> Q_PFC2
        LLC_RESONANT --> Q_LLC1
        LLC_RESONANT --> Q_LLC2
        Q_PFC1 --> HV_BUS["High-Voltage DC Bus"]
        Q_PFC2 --> HV_BUS
        HV_BUS --> LLC_XFMR["LLC Transformer"]
        LLC_XFMR --> Q_LLC1
        LLC_XFMR --> Q_LLC2
        Q_LLC1 --> GND_PRI
        Q_LLC2 --> GND_PRI
    end
    %% Intermediate Bus & POL Conversion
    subgraph "Intermediate Bus & High-Current POL Converters"
        LLC_XFMR_SEC["Transformer Secondary"] --> INT_BUS["12V/48V Intermediate Bus"]
        INT_BUS --> POL_CONVERTERS["Multi-Phase POL Converters"]
        subgraph "High-Current POL MOSFET Array"
            Q_POL1["VBGQT1601<br/>60V/340A TO-LL"]
            Q_POL2["VBGQT1601<br/>60V/340A TO-LL"]
            Q_POL3["VBGQT1601<br/>60V/340A TO-LL"]
            Q_POL4["VBGQT1601<br/>60V/340A TO-LL"]
        end
        POL_CONVERTERS --> Q_POL1
        POL_CONVERTERS --> Q_POL2
        POL_CONVERTERS --> Q_POL3
        POL_CONVERTERS --> Q_POL4
        Q_POL1 --> GPU_CPU_RAIL["GPU/CPU Core Rail<br/>1-2V, 1000A+"]
        Q_POL2 --> GPU_CPU_RAIL
        Q_POL3 --> GPU_CPU_RAIL
        Q_POL4 --> GPU_CPU_RAIL
    end
    %% Auxiliary Power & Load Management
    subgraph "Low-Voltage Rails & Auxiliary Power"
        AUX_CONVERTER["Auxiliary DC-DC Converter"] --> LV_RAILS["Low-Voltage Rails<br/>(5V, 3.3V, 1.8V)"]
        subgraph "Load Switch & Power Management MOSFETs"
            Q_AUX1["VBE1606<br/>60V/97A TO-252"]
            Q_AUX2["VBE1606<br/>60V/97A TO-252"]
            Q_AUX3["VBE1606<br/>60V/97A TO-252"]
            Q_AUX4["VBE1606<br/>60V/97A TO-252"]
        end
        LV_RAILS --> Q_AUX1
        LV_RAILS --> Q_AUX2
        LV_RAILS --> Q_AUX3
        LV_RAILS --> Q_AUX4
        Q_AUX1 --> LOAD1["PCIe Slot Power"]
        Q_AUX2 --> LOAD2["Fan Wall Control"]
        Q_AUX3 --> LOAD3["Hot-Swap/OR-ing"]
        Q_AUX4 --> LOAD4["Memory/Storage"]
    end
    %% Thermal Management System
    subgraph "Three-Level Thermal Management Architecture"
        COOLING_LEVEL1["Level 1: Liquid Cooling<br/>Cold Plate"] --> HIGH_FLUX["High Heat Flux Components<br/>VRMs, SiC MOSFETs"]
        COOLING_LEVEL2["Level 2: Forced Air Cooling<br/>High-Flow Fans"] --> MAGNETICS["Magnetics & Capacitors"]
        COOLING_LEVEL3["Level 3: Conduction Cooling<br/>PCB to Chassis"] --> BOARD_COMP["Board-Level Components"]
        HIGH_FLUX --> Q_POL1
        HIGH_FLUX --> Q_PFC1
        MAGNETICS --> LLC_XFMR
        BOARD_COMP --> Q_AUX1
    end
    %% Control & Monitoring System
    subgraph "Digital Control & Monitoring"
        MCU["Main Control MCU/DSP"] --> DIGITAL_CONTROLLER["Digital Multi-Phase Controller"]
        DIGITAL_CONTROLLER --> GATE_DRIVER_POL["POL Gate Drivers"]
        GATE_DRIVER_POL --> Q_POL1
        GATE_DRIVER_POL --> Q_POL2
        MCU --> TELEMETRY["System Telemetry"]
        TELEMETRY --> CURRENT_SENSE["Current Sensors"]
        TELEMETRY --> TEMP_SENSE["NTC Temperature Sensors"]
        TELEMETRY --> VOLT_MON["Voltage Monitors"]
        CURRENT_SENSE --> MCU
        TEMP_SENSE --> MCU
        VOLT_MON --> MCU
        MCU --> FAN_PWM["Fan PWM Control"]
        MCU --> PUMP_CTRL["Liquid Pump Control"]
    end
    %% Protection & Signal Integrity
    subgraph "Protection & SI/PI Design"
        ACTIVE_CLAMP["Active Clamping/Snubbers"] --> Q_PFC1
        RC_SNUBBER["RC Snubbers"] --> Q_POL1
        TVS_PROTECTION["TVS Protection Array"] --> GATE_DRIVERS
        SI_PI_DESIGN["SI/PI Design: X-Y Capacitor Layout"] --> Q_POL1
        SYMMETRICAL_LAYOUT["Symmetrical Multi-Phase Layout"] --> POL_CONVERTERS
    end
    %% Style Definitions
    style Q_PFC1 fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
    style Q_POL1 fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
    style Q_AUX1 fill:#fff3e0,stroke:#ff9800,stroke-width:2px
    style MCU fill:#fce4ec,stroke:#e91e63,stroke-width:2px

As AI hyper-scale blade servers evolve towards higher computational density, greater energy efficiency, and unwavering reliability, their internal power delivery and management systems are no longer simple utility units. Instead, they are the core determinants of rack-level power performance, operational PUE (Power Usage Effectiveness), and total cost of ownership. A well-designed power chain is the physical foundation for these compute blades to achieve stable high-current delivery to ASICs/GPUs, high-efficiency multi-stage conversion, and long-lasting durability under 24/7 operational stress.
However, building such a chain presents multi-dimensional challenges: How to balance increased power density with thermal dissipation limits? How to ensure the long-term reliability of power devices in environments characterized by high ambient temperatures and relentless electrical stress? How to seamlessly integrate high-current delivery, fast transient response, and intelligent power management? The answers lie within every engineering detail, from the selection of key components to system-level integration.
I. Three Dimensions for Core Power Component Selection: Coordinated Consideration of Voltage, Current, and Topology
1. Primary-Side High-Voltage Converter (PFC/LLC) SiC MOSFET: The Enabler of High-Frequency, High-Efficiency Conversion
The key device is the VBP117MC06 (1700V/6A/TO-247, SiC MOSFET), whose selection is critical for system-level efficiency.
Voltage Stress Analysis: In server power supply units (PSUs) fed from 3-phase 400V AC, the rectified line-to-line peak is about 565V (400V × √2), and the boosted PFC bus typically sits in the 700-800V range. The 1700V rating therefore leaves roughly 2x margin for voltage spikes and ringing in hard-switching or resonant topologies (e.g., Totem-Pole PFC, LLC). That margin allows operation at higher switching frequencies (e.g., 200-500kHz) while maintaining safe derating. In 48VDC-input architectures the primary switches see far lower voltage stress, so this rating matters mainly for the AC-fed front end.
Dynamic Characteristics and Loss Optimization: The low RDS(on) for a 1700V device (1500mΩ @18V), combined with SiC's negligible reverse recovery charge (Qrr), is what makes high-frequency operation practical. In PFC stages, the near-zero Qrr removes most of the reverse-recovery loss that otherwise grows with switching frequency, raising efficiency. In LLC resonant converters, it permits higher operating frequencies, which shrinks the magnetic components (transformers, inductors) and increases power density.
Thermal Design Relevance: The TO-247 package, paired with a properly designed heatsink, must dissipate the heat from conduction and high-frequency switching losses. The junction temperature follows from Tj = Tc + (P_cond + P_sw) × Rθjc, where Tc is the case temperature and Rθjc is the junction-to-case thermal resistance. SiC can in principle operate at junction temperatures above 200°C, which adds thermal design margin, although the rated maximum of a packaged device is set by its datasheet.
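The junction temperature formula above can be evaluated as a quick sanity check. The following is a minimal sketch: only the 1500mΩ RDS(on) comes from the text, while the case temperature, RMS current, switching loss, and Rθjc are placeholder assumptions rather than datasheet or measured values.

```python
def conduction_loss(i_rms_a, rds_on_ohm):
    """Conduction loss P_cond = I_rms^2 * RDS(on), in watts."""
    return i_rms_a ** 2 * rds_on_ohm


def junction_temperature(t_case_c, p_cond_w, p_sw_w, r_theta_jc):
    """Tj = Tc + (P_cond + P_sw) * R_theta_jc, in degrees Celsius."""
    return t_case_c + (p_cond_w + p_sw_w) * r_theta_jc


RDS_ON_OHM = 1.5        # 1500 mOhm @ 18 V, as quoted in the text
I_RMS_A = 2.0           # assumed RMS drain current
P_SW_W = 4.0            # assumed switching loss
T_CASE_C = 80.0         # assumed case temperature
R_THETA_JC = 1.0        # assumed junction-to-case resistance, K/W

p_cond = conduction_loss(I_RMS_A, RDS_ON_OHM)
tj = junction_temperature(T_CASE_C, p_cond, P_SW_W, R_THETA_JC)
print(f"P_cond = {p_cond:.1f} W, Tj = {tj:.1f} C")
```

Note that RDS(on) rises with temperature, so a real estimate would iterate this calculation using the datasheet's temperature curve instead of a single fixed value.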
2. Intermediate Bus / High-Current POL (Point-of-Load) Converter MOSFET: The Backbone of GPU/CPU Power Delivery
The key device selected is the VBGQT1601 (60V/340A/TO-LL, SGT MOSFET), a cornerstone for high-current, low-voltage applications.
Efficiency and Power Density Enhancement: In a multi-phase buck converter powering a 1-2V, 1000A+ GPU/CPU rail, conduction loss dominates. This device's ultra-low RDS(on) of 1mΩ (max) directly minimizes I²R losses. The TO-LL package combines low package resistance, good thermal performance (a large exposed pad for heat extraction), and lower parasitic inductance than a traditional leaded package such as TO-247. Low inductance permits the very high di/dt switching needed for fast transient response to AI workload spikes, without excessive voltage overshoot.
Server Environment Adaptability: The robust mechanical structure of the TO-LL package is suitable for the vibration environment within a server chassis. Its optimized pin-out minimizes switching loop inductance, critical for maintaining clean switching waveforms and low EMI in densely packed multi-phase VRM (Voltage Regulator Module) designs.
Drive Circuit Design Points: The device requires a high-current, low-impedance gate driver capable of fast switching. Careful layout with symmetric gate drive paths is essential for paralleled devices. Multi-phase configurations also need active or passive current-balancing techniques.
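To make the I²R argument concrete, the sketch below estimates MOSFET conduction loss for a 1000A rail at the quoted 1mΩ (max) RDS(on). It rests on simplifying assumptions: perfect current sharing, negligible inductor ripple, and one device each for the high-side and low-side positions with the same RDS(on), so the combined conduction loss per phase does not depend on duty cycle. The phase counts are illustrative.

```python
def per_phase_conduction_loss(i_out_a, n_phases, rds_on_ohm):
    """Conduction loss per phase, assuming ideal sharing and no ripple."""
    i_phase = i_out_a / n_phases
    return i_phase ** 2 * rds_on_ohm


I_OUT_A = 1000.0     # rail current from the text (1000A+ class)
RDS_ON_OHM = 1.0e-3  # 1 mOhm (max), as quoted in the text

for n in (4, 8, 16):  # illustrative phase counts
    p_phase = per_phase_conduction_loss(I_OUT_A, n, RDS_ON_OHM)
    print(f"{n:2d} phases: {I_OUT_A / n:6.1f} A/phase, "
          f"{p_phase:6.2f} W/phase, {n * p_phase:6.1f} W total")
```

Because total conduction loss scales as I²R/N under these assumptions, doubling the phase count halves it, which is one reason high-current rails use many phases or paralleled devices.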
3. Low-Voltage Rail & Auxiliary Power MOSFET: The Execution Unit for Precision Power Sequencing & Distribution
The key device is the VBE1606 (60V/97A/TO-252, Trench MOSFET), enabling efficient and compact load switching.
Typical Load Management Logic: The device serves in secondary-side synchronous rectification for lower-power DC-DC rails (e.g., 12V to 5V/3.3V), as a high-side or low-side switch for PCIe slot power and fan wall control, and in hot-swap and OR-ing functions. Its low RDS(on) (4.5mΩ @10V, 12mΩ @4.5V) keeps voltage drop small and efficiency high even when the gate is driven directly from a logic-level controller at 4.5V.
PCB Layout and Reliability: The TO-252 (DPAK) package offers a good compromise between current handling, thermal dissipation (through PCB copper area), and footprint. For high-current auxiliary rails, multiple devices can be paralleled easily. Attention must be paid to using adequate copper pours and thermal vias on the PCB to act as an effective heatsink, keeping the case temperature within limits during continuous operation.
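The effect of gate drive level on a load switch can be checked with simple arithmetic. This sketch uses the two RDS(on) figures quoted above; the 10A load current is an assumed example value, and temperature dependence of RDS(on) is ignored.

```python
# RDS(on) in ohms keyed by gate drive voltage, as quoted in the text.
RDS_ON_BY_VGS = {10.0: 4.5e-3, 4.5: 12e-3}


def switch_drop_and_loss(i_load_a, rds_on_ohm):
    """Return (voltage drop in V, dissipation in W) for a conducting switch."""
    v_drop = i_load_a * rds_on_ohm
    return v_drop, i_load_a * v_drop


I_LOAD_A = 10.0  # assumed load current for illustration

for vgs, rds_on in RDS_ON_BY_VGS.items():
    v_drop, p_loss = switch_drop_and_loss(I_LOAD_A, rds_on)
    print(f"VGS = {vgs:4.1f} V: drop = {v_drop * 1e3:5.1f} mV, loss = {p_loss:4.2f} W")
```

At the assumed 10A, the lower gate drive costs roughly 2.7 times the dissipation, which feeds directly into the copper area and thermal via budget discussed above.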
II. System Integration Engineering Implementation
1. Multi-Level Thermal Management Architecture
At these power densities, a hierarchical cooling strategy is required.
Level 1: Liquid Cooling (Cold Plate): Targets the highest heat flux components—the multi-phase VRMs using VBGQT1601 MOSFETs and potentially the primary-side SiC MOSFETs. Direct-attach cold plates are used to keep junction temperatures low and ensure long-term reliability.
Level 2: Forced Air Cooling (High-Flow Fans): Targets the primary-side magnetics (PFC/LLC transformers), bulk capacitors, and secondary-side power stages for lower-current rails. Optimized blade-level ducting ensures targeted airflow.
Level 3: Conduction Cooling to Chassis: Devices like the VBE1606 and other board-level components rely on thermal vias, internal PCB ground planes, and ultimately conduction to the server blade's metal chassis, which acts as a heat spreader.
2. Signal Integrity & Power Integrity (SI/PI) Design
High di/dt Loop Control: For the POL converters, implement an "X-Y split" capacitor layout and use low-ESL/ESR polymer and ceramic capacitors placed immediately adjacent to the VBGQT1601 MOSFETs to minimize switching loop inductance and supply rail noise.
Multi-Phase Controller Synchronization & Layout: Ensure symmetrical layout for all phases of a VRM to guarantee current sharing. Use daisy-chained or star-topology clock synchronization to avoid beat frequencies and reduce input current ripple.
Auxiliary Rail Decoupling: Even for auxiliary switches like the VBE1606, proper local bulk and high-frequency decoupling is necessary to prevent noise from coupling into sensitive analog or communication circuits on the same board.
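The benefit of synchronizing phases is easiest to see when the phases are interleaved evenly, which is standard multi-phase practice but is an assumption here, since the text does not state the phase offsets. Under that assumption, each phase is shifted by 360°/N and the ripple seen by the input and output capacitors appears at N times the per-phase switching frequency. The phase count and frequency below are example values.

```python
def interleave_plan(n_phases, f_sw_hz):
    """Return (phase offsets in degrees, effective ripple frequency in Hz)
    for evenly interleaved phases."""
    offsets_deg = [360.0 * k / n_phases for k in range(n_phases)]
    return offsets_deg, n_phases * f_sw_hz


offsets, f_ripple = interleave_plan(n_phases=4, f_sw_hz=500e3)  # example values
print("Phase offsets (deg):", offsets)
print(f"Effective ripple frequency: {f_ripple / 1e6:.1f} MHz")
```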
3. Reliability Enhancement Design
Electrical Stress Protection: Implement active clamping or snubbers for the primary-side SiC MOSFET (VBP117MC06) to manage voltage spikes during turn-off. Use gate resistor optimization and RC snubbers where needed on POL switches.
Fault Diagnosis and Telemetry: Implement comprehensive telemetry:
Current & Temperature Sensing: Use integrated current sense in controllers or dedicated shunts/sensors. Place NTCs or use MOSFET's own temperature sense feature (if available).
Voltage Monitoring: Real-time monitoring of all key rails for droop/overshoot.
Predictive analytics can track increasing RDS(on) of MOSFETs over time as a health indicator.
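One way the RDS(on) health indicator could be computed from telemetry is sketched below. It assumes the system can sample on-state drain-source voltage and drain current together; the 20% drift limit, the baseline, and the sample values are hypothetical and not taken from the source. A practical version would also normalize for junction temperature, since RDS(on) varies with it.

```python
from statistics import mean

DRIFT_LIMIT = 0.20  # hypothetical alarm threshold: 20% rise over baseline


def estimate_rds_on(v_ds_v, i_d_a):
    """On-state resistance estimate from one (V_DS, I_D) telemetry sample."""
    return v_ds_v / i_d_a


def rds_on_drift(baseline_ohm, samples):
    """Fractional drift of the averaged RDS(on) estimate versus baseline."""
    current = mean(estimate_rds_on(v, i) for v, i in samples)
    return (current - baseline_ohm) / baseline_ohm


baseline = 1.0e-3                          # assumed commissioning value
recent = [(0.125, 100.0), (0.130, 100.0)]  # assumed (V_DS, I_D) samples
drift = rds_on_drift(baseline, recent)
needs_attention = drift > DRIFT_LIMIT
print(f"drift = {drift:.1%}, flag = {needs_attention}")
```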
III. Performance Verification and Testing Protocol
1. Key Test Items and Standards
Efficiency Mapping: Measure efficiency from AC input (or 48V input) to all key DC outputs (e.g., 12V, GPU/CPU rail) at 10%, 25%, 50%, 75%, and 100% load. Also test under typical AI workload profiles (bursty, sustained).
Thermal & Airflow Testing: Measure component temperatures (case and estimated junction) under worst-case airflow and ambient temperature (e.g., 40°C inlet) scenarios.
Transient Response Test: Apply large step loads (e.g., 50-100A/μs) to the CPU/GPU rail and verify output voltage deviation and recovery time meet specifications.
Long-Term Reliability (HTOL) Test: Perform extended high-temperature operating life testing to validate design margins and component lifespan projections.
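For the transient response test, pass/fail evaluation of a captured waveform might look like the sketch below. The waveform samples, the ±10mV settling band, and the definition of recovery time (time of the last sample outside the band, measured from the first sample at the step instant) are all assumptions for illustration, not a standardized method.

```python
def transient_metrics(times_us, volts, v_nom, settle_band_v):
    """Return (peak deviation in V, recovery time in us).

    Recovery time is taken as the time of the last sample that lies
    outside the settling band, relative to the first sample.
    """
    peak_dev = max(abs(v - v_nom) for v in volts)
    outside = [t for t, v in zip(times_us, volts) if abs(v - v_nom) > settle_band_v]
    recovery_us = (max(outside) - times_us[0]) if outside else 0.0
    return peak_dev, recovery_us


# Assumed capture of a 1.8 V rail after a load step at t = 0.
t_us = [0, 2, 4, 6, 8, 10, 12]
v = [1.800, 1.775, 1.772, 1.785, 1.794, 1.798, 1.800]

peak, recovery = transient_metrics(t_us, v, v_nom=1.8, settle_band_v=0.010)
print(f"peak deviation = {peak * 1e3:.0f} mV, recovery = {recovery} us")
```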
2. Design Verification Example
Test data from a prototype 3kW AI accelerator blade power supply (48V Intermediate Bus, Ambient temp: 35°C) shows:
Primary Side (SiC-based LLC): Peak efficiency of 98.2% at 500kHz switching frequency.
POL Stage (Multi-phase using VBGQT1601): Peak efficiency of 94.5% for the 1.8V/800A GPU rail.
Key Point Temperature Rise: Under sustained 80% load with specified airflow, estimated SiC MOSFET junction temperature was 112°C; POL MOSFET (VBGQT1601) case temperature was 68°C.
Transient Performance: The POL stage recovered from a 300A step load within 10μs with less than 30mV deviation.
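The stage figures above can be combined into a rough end-to-end estimate. This is a sketch under one important assumption: the two peak efficiencies are simply multiplied, although in practice they may not occur at the same operating point, so the product is an optimistic bound rather than a measured system efficiency.

```python
def cascade_efficiency(*stage_efficiencies):
    """Overall efficiency of series-connected conversion stages."""
    total = 1.0
    for eta in stage_efficiencies:
        total *= eta
    return total


ETA_PRIMARY = 0.982  # SiC-based LLC peak efficiency, from the text
ETA_POL = 0.945      # POL stage peak efficiency, from the text

eta_total = cascade_efficiency(ETA_PRIMARY, ETA_POL)

p_out_w = 1.8 * 800.0            # 1.8 V / 800 A GPU rail from the text
p_in_pol_w = p_out_w / ETA_POL   # power drawn from the intermediate bus
pol_loss_w = p_in_pol_w - p_out_w

print(f"cascaded efficiency = {eta_total:.2%}")
print(f"POL output = {p_out_w:.0f} W, POL loss = {pol_loss_w:.1f} W")
```

Under these assumptions the POL stage alone dissipates on the order of 84W at full rail load, which is consistent with placing it under the Level 1 cold plate.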
IV. Solution Scalability
1. Adjustments for Different Compute Density Tiers
Entry / Inference Blades: May use a simpler 12V intermediate bus. Primary side could utilize high-voltage Superjunction MOSFETs (e.g., VBP165R76SFD). POL stages may use fewer phases with devices like VBE1606.
High-Performance Training Blades: Require the described high-efficiency SiC primary and ultra-low RDS(on) TO-LL POL solutions. May implement 54V or higher direct-to-chip architectures, pushing requirements for even higher-voltage, efficient conversion.
Liquid-Cooled Direct-to-Chip Systems: The power chain must integrate with cold plate designs. This places a premium on component placement for optimal thermal interface and may drive adoption of even more compact, low-profile packages.
2. Integration of Cutting-Edge Technologies
Gallium Nitride (GaN) Integration: For the next phase, GaN HEMTs can be evaluated for the primary-side PFC and LLC stages, targeting even higher frequencies (MHz+) and power density, further shrinking magnetics.
Digital Power & AI-Optimized Power Management: Future systems will leverage fully digital multi-phase controllers with AI-driven algorithms. These can predict workload patterns and dynamically adjust phase count, switching frequency, and voltage positioning for optimal efficiency across the entire operational map.
Fully Integrated Voltage Regulators (FIVR) & Domain-Specific Power: As compute architectures evolve, power delivery will become more decentralized. This will increase the demand for highly integrated, efficient, and compact POL solutions close to the processing cores.
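The dynamic phase-count adjustment mentioned above can be illustrated with a simple rule-based stand-in. This is not an AI-driven algorithm and not any controller's actual behavior: it just picks the smallest number of active phases that keeps per-phase current at or below a target. The 40A target and 16-phase maximum are hypothetical values.

```python
import math


def active_phase_count(i_load_a, i_per_phase_target_a, n_max, n_min=1):
    """Smallest phase count keeping per-phase current at or below the target,
    clamped to the range [n_min, n_max]."""
    needed = math.ceil(i_load_a / i_per_phase_target_a)
    return max(n_min, min(n_max, needed))


I_TARGET_A = 40.0  # hypothetical per-phase current target
N_MAX = 16         # hypothetical number of installed phases

for load in (0.0, 50.0, 300.0, 800.0):
    n = active_phase_count(load, I_TARGET_A, N_MAX)
    print(f"load = {load:5.0f} A -> {n:2d} active phases")
```

A predictive controller would replace the measured load with a forecast and add hysteresis so that phases are not added and shed on every workload burst.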
Conclusion
The power delivery network design for AI hyper-scale blade servers is a multi-dimensional systems engineering task, requiring a balance among power density, conversion efficiency, thermal performance, signal integrity, and long-term reliability. The tiered optimization scheme proposed here provides a clear implementation path for next-generation server power designs: SiC technology for high-frequency primary conversion, ultra-low-RDS(on) SGT MOSFETs in TO-LL packages for core compute rail delivery, and high-performance trench MOSFETs for auxiliary power management.
As computational demands continue their exponential rise, future server power architecture will trend towards higher bus voltages, more granular power delivery, and deeper integration with cooling systems. It is recommended that engineers adhere to rigorous server-grade design standards and validation processes while using this framework, preparing for the impending transitions to GaN, advanced digital control, and liquid-cooled power stages.
Ultimately, an excellent server power design is largely invisible. It does not compute but is the critical enabler, creating value through maximum compute uptime, minimized energy costs, and the reliable, high-current foundation upon which AI breakthroughs are built. This is the true value of engineering precision in powering the intelligence revolution.

Detailed Topology Diagrams

Primary-Side High-Voltage Conversion Detail (SiC MOSFET)

graph LR
    subgraph "Three-Phase PFC Stage with SiC MOSFETs"
        AC_IN["Three-Phase 400VAC Input"] --> EMI["EMI Filter"]
        EMI --> RECTIFIER["Three-Phase Rectifier"]
        RECTIFIER --> PFC_INDUCTOR["PFC Inductor"]
        PFC_INDUCTOR --> PFC_SW_NODE["PFC Switching Node"]
        PFC_SW_NODE --> Q_PFC["VBP117MC06 SiC MOSFET<br/>1700V/6A"]
        Q_PFC --> HV_BUS["700-800V DC Bus"]
        PFC_CONTROLLER["PFC Controller"] --> PFC_DRIVER["Gate Driver"]
        PFC_DRIVER --> Q_PFC
    end
    subgraph "LLC Resonant Converter Stage"
        HV_BUS --> LLC_RES_TANK["LLC Resonant Tank<br/>(Lr, Cr, Lm)"]
        LLC_RES_TANK --> LLC_XFMR["High-Frequency Transformer"]
        LLC_XFMR --> LLC_SW_NODE["LLC Switching Node"]
        LLC_SW_NODE --> Q_LLC["VBP117MC06 SiC MOSFET<br/>1700V/6A"]
        Q_LLC --> GND
        LLC_CONTROLLER["LLC Controller"] --> LLC_DRIVER["Gate Driver"]
        LLC_DRIVER --> Q_LLC
    end
    subgraph "Voltage Spike Protection"
        SNUBBER["Active Clamp/Snubber Circuit"] --> Q_PFC
        SNUBBER --> Q_LLC
        GATE_PROT["Gate Protection TVS"] --> PFC_DRIVER
        GATE_PROT --> LLC_DRIVER
    end
    style Q_PFC fill:#e8f5e8,stroke:#4caf50,stroke-width:2px
    style Q_LLC fill:#e8f5e8,stroke:#4caf50,stroke-width:2px

High-Current POL Converter & VRM Detail

graph LR
    subgraph "Multi-Phase Buck Converter for GPU/CPU Rail"
        INT_BUS_IN["12V/48V Intermediate Bus"] --> INPUT_CAPS["Input Capacitor Bank<br/>Polymer + Ceramic"]
        INPUT_CAPS --> PHASE1["Phase 1"]
        INPUT_CAPS --> PHASE2["Phase 2"]
        INPUT_CAPS --> PHASE3["Phase 3"]
        INPUT_CAPS --> PHASE4["Phase 4"]
        subgraph "Phase 1 Power Stage"
            P1_HIGH["High-Side: VBGQT1601"] --> P1_INDUCTOR["Output Inductor"]
            P1_LOW["Low-Side: VBGQT1601"] --> P1_INDUCTOR
        end
        subgraph "Phase 2 Power Stage"
            P2_HIGH["High-Side: VBGQT1601"] --> P2_INDUCTOR["Output Inductor"]
            P2_LOW["Low-Side: VBGQT1601"] --> P2_INDUCTOR
        end
        subgraph "Phase 3 Power Stage"
            P3_HIGH["High-Side: VBGQT1601"] --> P3_INDUCTOR["Output Inductor"]
            P3_LOW["Low-Side: VBGQT1601"] --> P3_INDUCTOR
        end
        subgraph "Phase 4 Power Stage"
            P4_HIGH["High-Side: VBGQT1601"] --> P4_INDUCTOR["Output Inductor"]
            P4_LOW["Low-Side: VBGQT1601"] --> P4_INDUCTOR
        end
        P1_INDUCTOR --> OUTPUT_CAPS["Output Capacitor Array"]
        P2_INDUCTOR --> OUTPUT_CAPS
        P3_INDUCTOR --> OUTPUT_CAPS
        P4_INDUCTOR --> OUTPUT_CAPS
        OUTPUT_CAPS --> GPU_RAIL["GPU/CPU Core Rail<br/>1-2V, 1000A+"]
        DIGITAL_CONTROLLER["Digital Multi-Phase Controller"] --> GATE_DRIVERS["Gate Driver Array"]
        GATE_DRIVERS --> P1_HIGH
        GATE_DRIVERS --> P1_LOW
        GATE_DRIVERS --> P2_HIGH
        GATE_DRIVERS --> P2_LOW
        GATE_DRIVERS --> P3_HIGH
        GATE_DRIVERS --> P3_LOW
        GATE_DRIVERS --> P4_HIGH
        GATE_DRIVERS --> P4_LOW
        subgraph "Current Sharing & Synchronization"
            CURRENT_SENSE["Current Sensing (Each Phase)"] --> DIGITAL_CONTROLLER
            SYNC_CLOCK["Synchronization Clock<br/>Daisy-Chained/Star"] --> DIGITAL_CONTROLLER
        end
    end
    style P1_HIGH fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
    style P1_LOW fill:#e3f2fd,stroke:#2196f3,stroke-width:2px

Thermal Management & System Integration Detail

graph LR
    subgraph "Three-Level Cooling Strategy"
        LEVEL1["Level 1: Liquid Cooling"] --> COLD_PLATE["Direct-Attach Cold Plate"]
        COLD_PLATE --> HIGH_POWER_DEVICES["High Power Devices<br/>VRMs, SiC MOSFETs"]
        LEVEL2["Level 2: Forced Air Cooling"] --> DUCTING["Optimized Ducting"]
        DUCTING --> HIGH_FLOW_FANS["High-Flow Server Fans"]
        HIGH_FLOW_FANS --> MAGNETICS_CAPS["Magnetics & Bulk Capacitors"]
        LEVEL3["Level 3: Conduction Cooling"] --> PCB_THERMAL["PCB Thermal Design"]
        PCB_THERMAL --> THERMAL_VIAS["Thermal Vias Array"]
        THERMAL_VIAS --> GROUND_PLANE["Internal Ground Planes"]
        GROUND_PLANE --> CHASSIS["Metal Chassis Heat Spreader"]
        CHASSIS --> AUX_COMPONENTS["Auxiliary Components"]
    end
    subgraph "Thermal Monitoring & Control"
        TEMP_SENSORS["Temperature Sensors<br/>(NTC, MOSFET Sense)"] --> MCU["System MCU"]
        MCU --> FAN_CONTROLLER["Fan PWM Controller"]
        MCU --> PUMP_CONTROLLER["Liquid Pump Controller"]
        FAN_CONTROLLER --> HIGH_FLOW_FANS
        PUMP_CONTROLLER --> LIQUID_PUMP["Liquid Cooling Pump"]
        LIQUID_PUMP --> COLD_PLATE
        subgraph "Predictive Thermal Management"
            HISTORY_DATA["Historical Temperature Data"] --> AI_ALGO["AI Algorithm"]
            AI_ALGO --> PREDICTIVE["Predictive Adjustment"]
            PREDICTIVE --> MCU
        end
    end
    subgraph "Reliability & Protection Circuits"
        OVERCURRENT["Overcurrent Protection"] --> COMPARATOR["Fast Comparator"]
        OVERVOLTAGE["Overvoltage Protection"] --> COMPARATOR
        OVERTEMP["Overtemperature Protection"] --> COMPARATOR
        COMPARATOR --> FAULT_LATCH["Fault Latch Circuit"]
        FAULT_LATCH --> SHUTDOWN["System Shutdown Signal"]
        subgraph "Electrical Stress Mitigation"
            SNUBBER_CIRCUITS["RC/RCD Snubbers"] --> SWITCHING_NODES
            GATE_PROTECTION["Gate Protection TVS"] --> DRIVER_ICS
            ACTIVE_CLAMPING["Active Voltage Clamping"] --> HV_MOSFETS
        end
    end
    style HIGH_POWER_DEVICES fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
    style COLD_PLATE fill:#bbdefb,stroke:#1976d2,stroke-width:2px