Powering AI processor clusters: VICOR

The recent introduction of cluster supercomputers based on AI ASIC processors has pushed the limits of the power delivery network (PDN) to levels unimaginable a few years ago. With current demand approaching 100kA per ASIC cluster, supplying such loads requires innovation in power system architecture, topology, control systems and packaging. As power levels keep rising, distributing power on a 48V bus becomes essential. In addition, increasingly dense processor clusters leave little room to place the power solution laterally next to the processor, so a new power delivery approach is needed.
Vicor's 48V direct-to-load (<1V) Factorized Power Architecture (FPA™) differs from the common 48V intermediate bus architecture (IBA), which traditionally consists of an intermediate bus converter followed by multiphase point-of-load (PoL) regulators. FPA is designed to address each of the power delivery problems that clustered processor systems face. It also allows the power solution to be placed vertically, on the opposite side of the board from the processor. This vertical power delivery (VPD) is essential for supplying high current to such cluster systems.
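The case for a 48V bus can be illustrated with a little arithmetic. The sketch below is not from the original article: it assumes a 0.8V core voltage, lossless conversion, and a 12V bus as the comparison point, and it only shows how bus current and conductor loss scale with bus voltage for the same delivered power.

```python
# Sketch: bus current and relative conductor loss for one cluster.
# Assumptions (not from the article): 0.8 V core voltage, lossless
# conversion, the same bus resistance at both voltages, and a 12 V bus
# chosen as the comparison point.

CLUSTER_CURRENT_A = 100_000.0  # "close to 100kA" per ASIC cluster, from the text
CORE_VOLTAGE_V = 0.8           # assumed value inside the 0.75 to 0.85 V range


def bus_current_a(load_power_w: float, bus_voltage_v: float) -> float:
    """Current the distribution bus must carry for a given load power."""
    return load_power_w / bus_voltage_v


def relative_bus_loss(bus_voltage_v: float, reference_voltage_v: float) -> float:
    """I^2 * R loss relative to a reference bus, same power and resistance."""
    return (reference_voltage_v / bus_voltage_v) ** 2


load_power_w = CLUSTER_CURRENT_A * CORE_VOLTAGE_V  # about 80 kW

for bus_v in (12.0, 48.0):
    amps = bus_current_a(load_power_w, bus_v)
    ratio = relative_bus_loss(bus_v, reference_voltage_v=12.0)
    print(f"{bus_v:4.0f} V bus: {amps:7.0f} A, loss vs 12 V bus: {ratio:.4f}")
```

Under these assumptions, the 48V bus carries a quarter of the current of a 12V bus and dissipates one sixteenth of the conductor loss.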

Challenges of cluster power delivery

Clustered ASIC systems are tightly packed to achieve the high-speed bandwidth needed for the trillions of operations per second that AI training workloads (such as autonomous driving) demand. Each processor in a cluster may itself draw 600 to 1000A. Even on a single-processor accelerator card, a power solution that is not placed close to the processor's power pins incurs serious resistive losses in the PCB or substrate, which makes power delivery loss a challenge in its own right.
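To give a feel for the scale of these board losses, here is a minimal sketch based on P = I²R. The path resistances and the 0.8V core voltage are hypothetical illustration values, not figures from the article; only the 600 to 1000A current range comes from the text.

```python
# Sketch: resistive loss in the board or substrate path, P = I^2 * R.
# The path resistances below are hypothetical illustration values.

CORE_VOLTAGE_V = 0.8  # assumed core voltage, not taken from this paragraph


def pdn_loss_w(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated in the delivery path."""
    return current_a ** 2 * resistance_ohm


def loss_share_of_load(current_a: float, resistance_ohm: float,
                       core_voltage_v: float) -> float:
    """Path loss divided by the power delivered to the processor."""
    return pdn_loss_w(current_a, resistance_ohm) / (current_a * core_voltage_v)


for current_a in (600.0, 1000.0):            # current range from the text
    for resistance_uohm in (50.0, 100.0, 200.0):  # hypothetical path resistances
        r_ohm = resistance_uohm * 1e-6
        loss = pdn_loss_w(current_a, r_ohm)
        share = loss_share_of_load(current_a, r_ohm, CORE_VOLTAGE_V)
        print(f"{current_a:6.0f} A, {resistance_uohm:5.0f} uOhm: "
              f"{loss:6.1f} W lost ({share:.1%} of load power)")
```

With the assumed 200µΩ path, a 1000A processor would lose about 200W in the board, roughly a quarter of the 800W it receives, which is why distance to the power pins matters so much.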
In addition, GPUs and dedicated AI processors have moved to 7nm and 5nm processes, with 3nm silicon process nodes expected soon, to keep pace with the rapid development of artificial intelligence (AI). The nominal core operating voltage at these process nodes is currently between 0.75 and 0.85V. To raise performance, conventional systems group eight GPU-based processor cards into a cluster and install these clusters in each rack of a high-performance system. Recent announcements from Cerebras and Tesla, however, show another approach: clustering the AI ASICs themselves. This approach yields supercomputers with very high computing power and power density, but it also places severe demands on power delivery and on thermal management and cooling.
ASIC/GPU clusters lack the lateral board space for power delivery that single- or dual-processor AI cards have. Their high-speed I/O signals are also very sensitive to high-current switching noise, such as the noise generated by a hard-switching multiphase buck regulator. Moving a hard-switching multiphase power solution closer to the processor therefore brings more switching noise with it. The power design must satisfy the noise-sensitive I/O signals while also keeping PDN impedance as low as possible. At a typical design value of 40 to 60A per phase, the number of phases a multiphase solution needs to supply the peak current of each AI ASIC or GPU (in many cases more than 1500A per ASIC) can easily exceed 30. In this scenario, traditional lateral power delivery with a multiphase buck solution is barely feasible.
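The phase count quoted above follows directly from dividing the peak current by the per-phase rating. A minimal sketch, using only the figures given in the text (1500A peak, 40 to 60A per phase); the function name is illustrative:

```python
import math


def phases_required(peak_current_a: float, amps_per_phase: float) -> int:
    """Smallest whole number of phases that can carry the peak current."""
    return math.ceil(peak_current_a / amps_per_phase)


PEAK_CURRENT_A = 1500.0  # per-ASIC peak current cited in the text

for amps_per_phase in (40.0, 50.0, 60.0):
    count = phases_required(PEAK_CURRENT_A, amps_per_phase)
    print(f"{amps_per_phase:.0f} A/phase -> {count} phases")
```

Each of those phases needs its own inductor and switches next to the processor, which is the board area a tightly packed cluster does not have.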

Factorized power unlocks a new mode of current delivery

The basic principle of Factorized Power Architecture (FPA) is to split the power converter into two main functions, optimize each one separately, and then combine them as a system. These two functions are voltage regulation and current multiplication.

Voltage regulation

The efficiency of a voltage regulator falls as the work it does rises: the more work, the lower the efficiency. The closer the regulator's input and output voltages are, the less work it performs and the higher its efficiency. Because FPA optimizes where the regulator sits in the system, its input-to-output voltage difference can be minimized. The PRM™ voltage regulator uses a zero-voltage switching (ZVS) buck-boost topology, which is highly efficient when that difference is small. ZVS greatly reduces switching loss, enables high-frequency operation and substantially reduces the size of the converter. A PRM typically regulates an input of 40 to 60V to an output of 30 to 50V.

Soft switching and current multiplication

The PRM is followed by a second stage that steps the voltage down and the current up. This stage is a VTM™ current multiplier module built on the Sine Amplitude Converter (SAC™) topology. A VTM behaves much like an ideal transformer: its input and output voltages are related by a fixed ratio, and it maintains a very low impedance (hundreds of µΩ) while operating at frequencies above 1MHz.
Because a VTM contains no energy storage elements, it can keep delivering power as long as it is adequately cooled. This allows the power capability of the VTM to be matched to the thermal capacity of the processor.
The SAC topology uses zero-voltage and zero-current switching (ZVS and ZCS) control, which further reduces switching noise and power loss.
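
The ideal-transformer behaviour described above can be captured in a few lines. This is a simplified sketch, not a Vicor model: the class name, the ratio K = 1/48, the 300µΩ output resistance and the 40V input are all assumed values chosen to fall inside the ranges the text mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurrentMultiplier:
    """Fixed-ratio stage modelled as an ideal transformer plus output resistance."""
    k: float          # output-to-input voltage ratio (assumed value below)
    r_out_ohm: float  # effective output resistance (assumed value below)

    def output_voltage_v(self, input_voltage_v: float, load_current_a: float) -> float:
        # Ideal ratio, minus the resistive drop at the given load current.
        return self.k * input_voltage_v - load_current_a * self.r_out_ohm

    def input_current_a(self, load_current_a: float) -> float:
        # An ideal transformer multiplies current by 1/K from input to output.
        return self.k * load_current_a


# Hypothetical module: K = 1/48 and 300 uOhm ("hundreds of uOhm" in the text).
vtm = CurrentMultiplier(k=1.0 / 48.0, r_out_ohm=300e-6)

v_in = 40.0    # inside the 30 to 50 V regulator output range from the text
i_load = 100.0  # assumed load on this one module

print(f"Output voltage: {vtm.output_voltage_v(v_in, i_load):.3f} V")
print(f"Input current:  {vtm.input_current_a(i_load):.2f} A")
```

In this model a 100A load at the processor appears as only about 2A on the regulator side, which is why the regulator can sit anywhere convenient while the current multiplier must sit close to the load.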

PRM™ and VTM™ modules are the building blocks of FPA. The PRM is selected according to the system input voltage range and power requirements; the VTM is selected according to the output voltage range and current requirements. The PRM can be installed anywhere convenient in the system; the VTM should be installed as close to the processor core as possible.
Together, the PRM and VTM make up the functional modules of FPA: one dedicated to voltage regulation, the other to voltage transformation and current multiplication.

SM-ChiP package reduces noise and improves heat dissipation

While the topology and architecture used to implement a high-performance voltage regulator matter, packaging technology is equally important. The Vicor SM-ChiP™ package integrates all passive components, magnetics, MOSFETs and controllers into a single module. The package is also designed to deliver high current efficiently and to ease cooling through very low thermal impedance. Many SM-ChiP devices have a grounded metal shield over most of their outer surface, which both aids cooling and keeps high-frequency parasitic noise currents from spreading outside the device.

Vertical power delivery can reduce PDN loss by 95%

For large clustered processor arrays, traditional lateral power delivery is all but impossible. The best solution for powering clustered processors is vertical power delivery (VPD), in which the current multiplier sits directly beneath the processor on the opposite side of the motherboard. Shortening the distance the current travels through the motherboard significantly reduces PDN loss. A VPD solution needs two key features to achieve this.

Vertical power delivery: a GTM™ current multiplier is placed beneath the processor to maximize power delivery performance. VPD solutions also suit designs that need more I/O routing, on-board memory or tighter processor clustering, and they greatly reduce the number of components around the processor.
First, the VPD solution occupies the area directly beneath the processor, which normally holds the many high-frequency capacitors needed to decouple very-high-frequency currents (>10MHz) from the rest of the system, so it must provide that decoupling itself. Second, for maximum efficiency, the position and pattern of the VPD solution's current outputs must mirror the position and pattern of the processor's current inputs, so that the high current truly flows vertically.
To provide these features, the Vicor VPD solution is an integrated module made up of three layers: the lower layer is a gearbox, the middle layer is a VTM™ current multiplier array and the upper layer is a PRM™ voltage regulator. Together, the three layers form a complete VPD solution, which we call the DCM™. The gearbox performs two functions: it houses the high-frequency decoupling capacitors, and it redistributes the current from the VTMs into a pattern that mirrors that of the processor above it. The size of the VTM array depends on the processor's input current requirement, and the size of the PRM depends on the total power demand. If the GPU or ASIC needs multiple power rails, the VTM and PRM layers can be built from separate PRMs and VTMs, each sized for the current and voltage requirements of its rail.
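The sizing rule in this paragraph (VTM array by current, PRM by power, per rail) can be sketched as follows. The module ratings and the two example rails are hypothetical placeholders, not Vicor specifications, and conversion losses are ignored for simplicity.

```python
import math
from dataclasses import dataclass

# Hypothetical module ratings, chosen only to illustrate the sizing rule.
VTM_RATED_CURRENT_A = 150.0
PRM_RATED_POWER_W = 500.0


@dataclass(frozen=True)
class Rail:
    name: str
    voltage_v: float
    current_a: float

    @property
    def power_w(self) -> float:
        return self.voltage_v * self.current_a


def size_rail(rail: Rail) -> tuple[int, int]:
    """Return (VTM count, PRM count): VTMs sized by current, PRMs by power."""
    vtm_count = math.ceil(rail.current_a / VTM_RATED_CURRENT_A)
    prm_count = math.ceil(rail.power_w / PRM_RATED_POWER_W)
    return vtm_count, prm_count


# Example rails (assumed values): a high-current core rail and a small auxiliary rail.
rails = [Rail("core", 0.8, 1000.0), Rail("aux", 1.1, 120.0)]

for rail in rails:
    vtms, prms = size_rail(rail)
    print(f"{rail.name}: {rail.power_w:.0f} W -> {vtms} VTM(s), {prms} PRM(s)")
```

Sizing each rail independently mirrors the text's point that a multi-rail GPU or ASIC can use separate PRMs and VTMs, each matched to its own current and voltage requirement.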

The Vicor DCM™ is a complete 48V-to-load VPD solution for ASIC clusters, implemented in an advanced package. The PRM™, VTM™ and gearbox layers of the module provide voltage regulation, current multiplication, decoupling capacitance and pin-to-pin package matching.
The Vicor FPA™ architecture, ZVS and ZCS control systems, high-frequency SAC™ current multiplier topology and SM-ChiP™ packaging technology together provide everything needed to implement VPD. The result delivers low-noise power to clustered processors and, thanks to its high efficiency and thermal adaptability, simplifies the mechanical design of cooling and thermal management. VPD lets clustered processors analyze massive amounts of data at high speed, improving model training and raising machine learning to a significantly higher level, which makes it a true enabler of high-performance AI systems.

Better ways to get high-performance computing power

AI and machine learning are still in the early stages of growth, and the pace will only accelerate over the coming years. That acceleration calls for solutions that process more complex data faster. New generations of supercomputers based on AI ASIC processors will require more power than traditional supercomputers. New and innovative power delivery is the only way for AI to fulfil its promise, and it requires power system architecture, topology, control systems and packaging to work together to meet ever-higher current demand. Vertical power delivery using current multipliers is the preferred solution. It is proven and mature, meets today's demand for high-performance computing and can be readily scaled to keep up with future needs. It is compact and efficient, and it can reduce PDN power loss by more than 50%.

Post time: Mar-03-2022