Jensen Huang declares the arrival of the reasoning era. What new variables will the LPU bring?

On March 16 local time, at the GTC conference, NVIDIA CEO Jensen Huang unveiled NVIDIA Vera Rubin, a new computing platform built for agentic AI.

The platform works like a set of super "computing gear," combining several key components: the Vera CPU (central processing unit), the Rubin GPU (graphics processing unit), NVLink 6 switches, the ConnectX-9 SuperNIC (super network interface card), the BlueField-4 DPU (data processing unit), the Spectrum-6 Ethernet switch, and the newly added Groq 3 LPU (language processing unit).

Simply put, this is a complete hardware lineup built specifically for AI, making computing faster and smarter.

NVIDIA also launched the Groq 3 LPX rack, designed for large-scale deployments: 256 LPUs working together like a "super brain" to deliver extremely fast inference and massive text-processing capacity. The rack carries 128 GB of on-package high-speed memory with aggregate transfer speeds of up to 640 TB/s.

In the view of industry insiders, the highlight of this release is not only the chip upgrade, but also a leap forward in system integration density. Zhuang Changlei, Director of the AI/Intelligent Manufacturing Group at Cloud Slope Capital, said in an interview with a reporter from 21st Century Business Herald, “The biggest change is that NVIDIA has officially upgraded the LPU from a single chip or an accelerator card to a first-tier rack system that stands alongside the GPU.”

In particular, the number of LPUs per LPX rack jumped from 64 in the first generation to 256. This density leap far exceeds industry expectations and reflects the market's urgent demand for ultra-low-latency, long-context inference.

Zhuang Changlei believes this marks the shift in AI computing from “training-focused” to “training and inference side by side,” with inference becoming a new system-level foundational infrastructure.

Inference at the core

The LPU is a new chip architecture designed specifically for sequential, compute-intensive workloads. Its core objective is to improve language-model inference efficiency through architectural innovation.

Architecturally, each Groq 3 LPU integrates 500 MB of on-chip SRAM. A core element of the LPU is the MEM block, a flat, SRAM-first memory architecture in which that high-speed SRAM serves as the primary working store for inference.

(Image source: NVIDIA official website)
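
These rack-level figures line up with the per-chip number above. A quick back-of-the-envelope check (a reading, not an official breakdown: it assumes the rack's 128 GB of on-package storage is simply the sum of each LPU's SRAM, in decimal units):

```python
# Sanity check: do the Groq 3 LPX rack figures (256 LPUs, 128 GB on-package
# memory, 640 TB/s aggregate bandwidth) match 500 MB of SRAM per LPU?
# Assumes decimal units and that rack memory = sum of per-LPU SRAM.

lpus_per_rack = 256
sram_per_lpu_mb = 500        # on-chip SRAM per Groq 3 LPU
rack_bandwidth_tb_s = 640    # aggregate transfer speed quoted for the rack

total_sram_gb = lpus_per_rack * sram_per_lpu_mb / 1000
per_lpu_bandwidth_tb_s = rack_bandwidth_tb_s / lpus_per_rack

print(f"Total on-package SRAM: {total_sram_gb:.0f} GB")          # -> 128 GB
print(f"Bandwidth per LPU: {per_lpu_bandwidth_tb_s:.2f} TB/s")   # -> 2.50 TB/s
```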

The compiler and runtime place the active working set (weights, activations, and KV states) into on-chip memory and move data explicitly, rather than relying on hardware-managed caches. Keeping latency-sensitive data close to where computation happens removes unpredictable stalls and delivers low, stable latency.
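
To make the contrast with hardware-managed caching concrete, here is a minimal, purely illustrative sketch of compiler-style explicit placement: tensors are pinned against a fixed SRAM budget ahead of time, and anything that does not fit becomes a scheduled transfer rather than an unpredictable cache miss. All tensor names and sizes are hypothetical, not Groq's actual toolchain.

```python
# Illustrative only: a toy "compiler pass" that plans tensor placement
# against a fixed SRAM budget, the way an SRAM-first architecture plans
# data movement ahead of time instead of relying on hardware caches.
# All tensor names and sizes are made up for the example.

SRAM_BUDGET_MB = 500  # per-LPU on-chip SRAM quoted in the article

# (name, size in MB, latency-sensitive?) for one decode step
working_set = [
    ("layer_weights", 300, True),
    ("kv_cache",      150, True),
    ("activations",    30, True),
    ("embed_table",   200, False),  # cold data, can live off-chip
]

def plan_placement(tensors, budget_mb):
    """Greedily pin latency-sensitive tensors on-chip; schedule the rest
    as explicit DMA transfers. Returns (resident, transfer_schedule)."""
    resident, transfers, used = [], [], 0
    # Latency-sensitive tensors get first claim on SRAM.
    for name, size, hot in sorted(tensors, key=lambda t: not t[2]):
        if hot and used + size <= budget_mb:
            resident.append(name)
            used += size
        else:
            transfers.append(name)  # known, scheduled movement
    return resident, transfers

resident, transfers = plan_placement(working_set, SRAM_BUDGET_MB)
print("Pinned in SRAM:", resident)   # fixed at compile time, deterministic
print("Explicit DMA:  ", transfers)  # no surprise cache misses at run time
```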

Zhuang Changlei told reporters that the core advantage of the Groq LPU is not just that it is fast, but that it is "consistently fast every time": deterministic latency. Achieving this timing determinism requires deep co-design of the compute pipeline, memory access, and the compiler, posing extremely high technical barriers.

For scenarios with stringent real-time requirements, such as industrial control and autonomous driving, this kind of "determinism" is a must-have. By contrast, general-purpose GPU architectures and the simplified-instruction-set ASICs designed by cloud providers both struggle to achieve such extreme determinism while maintaining flexibility.
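
That "determinism" is a measurable property: not just a low average latency, but a small gap between the median and the tail across repeated runs. A rough sketch of how one might quantify it (the timed function is a stand-in workload, not real LPU inference):

```python
# Measure latency jitter: deterministic hardware shows p99 close to p50.
# work() is a placeholder workload, not actual LPU inference.

import statistics
import time

def work():
    # Stand-in for one fixed-size inference step.
    sum(i * i for i in range(10_000))

samples_ms = []
for _ in range(1_000):
    t0 = time.perf_counter()
    work()
    samples_ms.append((time.perf_counter() - t0) * 1_000)

qs = statistics.quantiles(samples_ms, n=100)   # 99 percentile cut points
p50, p99 = qs[49], qs[98]
print(f"p50 = {p50:.3f} ms, p99 = {p99:.3f} ms, p99/p50 = {p99 / p50:.2f}x")
# On a fully static pipeline this ratio stays near 1.0; caches, interrupts,
# and dynamic scheduling push the tail (and the ratio) up.
```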

Haitong Securities Research noted that, compared with CES in January, the Groq LPU's positioning within NVIDIA's overall product lineup became clearer at this year's GTC. NVIDIA plans to leverage the LPU's low-latency characteristics to meet the higher interactivity requirements of applications such as agentic AI.

Zhuang Changlei also pointed out that once the hardware latency bottleneck is broken, model designers will be more confident exploring more real-time, more complex interactive AI. Today's AI agents may still need a few seconds to think; in the future, they could achieve truly millisecond-level responses. Models will no longer dole out words one at a time, but will converse with you smoothly and in real time, like a real person.

The year of silicon photonics begins

Beyond the Groq 3 LPX rack, another major highlight of the Rubin platform is the NVIDIA Spectrum-6 SPX Ethernet rack.

Built on Spectrum-X silicon photonics with co-packaged optics (CPO), it improves optical power efficiency by up to 5x and system reliability by 10x compared with traditional pluggable transceivers.

"Scale-Out (interconnection between racks) is the clearest incremental opportunity right now," Zhuang Changlei said. The Rubin platform has begun adopting CPO switches to handle the massive data flows between the many racks inside a data center, and he expects 2027 to be an important milestone for widespread CPO deployment.

At GTC, NVIDIA also disclosed that after Vera Rubin, NVIDIA’s next major architecture will be Feynman, which will include a new CPU: NVIDIA Rosa.

Rosa is the core of the new platform, which pairs NVIDIA's next-generation LPU, the LP40, with NVIDIA BlueField-5 and CX10. It uses NVIDIA Kyber for Scale-Up over copper and co-packaged optics, and NVIDIA Spectrum-class optics for Scale-Out.

"Scale-Up (inside the rack, between chips) is the more forward-looking highlight," Zhuang Changlei said. In the Feynman architecture, NVIDIA plans to introduce NVLink 8 CPO to bring "light into the rack": optical interconnects will replace part of the traditional copper backplane, directly linking the GPU and the LPU. Optics are thus moving step by step from the switches at the network edge into the core compute rack.

In Zhuang Changlei's view, optical modules, the "blood vessels" of compute interconnect, grow in value as agent clusters scale up. As CPO moves from the lab into large-scale commercial use, the year of silicon photonics has begun, and it will directly drive upgrades across the entire communications hardware industry chain.

High-end PCB demand may surge

As mentioned above, to meet agentic systems' needs for low latency and long context, NVIDIA also launched the Groq 3 LPX inference acceleration rack with 256 LPU processors. Combined with Vera Rubin, inference throughput per megawatt increases by as much as 35 times.

Shipping LPUs in LPX rack form will be disruptive for the PCB industry, and it is potentially the largest above-expectations upside in the supply chain.

A PCB (printed circuit board) is the substrate that electrically interconnects electronic components, and it is found in nearly every electronic device. China's PCB industry, a core engine of global electronics manufacturing, is growing strongly.

Thanks to advantages in cost control, environmental compliance, and supply-chain support, mainland China accounts for more than 50% of global PCB output value, with industrial clusters around the Bohai Rim, the Pearl River Delta, and the Yangtze River Delta.

From a supply-chain perspective, as AI demand surges and cloud providers' capital expenditures keep being revised upward, procurement of AI servers, storage, and network equipment is rising. CITIC Securities (601066) estimates that the PCB market for GPU and ASIC servers will exceed RMB 40 billion in 2025 and RMB 90 billion in 2026, more than doubling year over year.

"At the moment, the global AI server PCB industry faces a 20% supply-demand gap," Zhuang Changlei said.

In Zhuang Changlei's view, as LPU/LPX rack shipments hit peak production from late 2026 into 2027, demand for high-end PCBs will surge. "This will further exacerbate the shortage of high-end HDI and high-layer-count PCBs, pushing the entire PCB industry chain into a new round of capacity expansion and upgrades."

For example, because the inside of an LPU/LPX rack must handle massive data throughput and ultra-low-latency communication, the requirements on PCB layer count, materials, and processes are extremely high. In NVIDIA's LPU racks, the PCB value of a single main board can reach $6,000, while the total PCB value of a complete rack can reach $96,000 (roughly RMB 700,000), more than a 10x increase over the PCB value of a traditional AI server.
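
Taken at face value, those figures imply about 16 main boards' worth of PCB value per rack. A quick check (the RMB conversion assumes an exchange rate near 7.3, which the article does not state):

```python
# Sanity check on the quoted PCB values. The RMB conversion assumes an
# exchange rate of ~7.3 RMB/USD, which the article does not state.

board_value_usd = 6_000     # PCB value of a single main board
rack_value_usd = 96_000     # total PCB value of a complete rack
usd_to_rmb = 7.3            # assumed exchange rate

boards_per_rack = rack_value_usd / board_value_usd
rack_value_rmb = rack_value_usd * usd_to_rmb

print(f"Implied main boards per rack: {boards_per_rack:.0f}")   # -> 16
print(f"Rack PCB value: ~RMB {rack_value_rmb:,.0f}")            # -> ~700,800
```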

In addition, to support signaling at 224 Gbps and above and high-speed interconnect among 256 LPUs, PCBs must use more advanced base materials and designs. Ordinary substrates no longer suffice and must be upgraded to M9-grade copper-clad laminates; reinforcement materials shift from ordinary electronic-grade fiberglass cloth to Q-glass cloth, which carries roughly 10 times the value. Next-generation products have already begun testing M10 materials.

Zhuang Changlei said that the Rubin Ultra architecture even introduces an orthogonal-backplane design, interconnecting the GPU and NVSwitch directly through 78-layer PCBs and significantly reducing the use of copper cables. The PCB is thus taking over part of the role of traditional cabling, becoming the "skeleton" of in-rack interconnect.
