Inference Computing Power Demand Surges; Industry Chain Companies Accelerate Deployment

Securities Daily Reporter Wang Jingru

As generative artificial intelligence moves from “model training” toward large-scale commercial deployment, the center of gravity of computing power consumption is shifting from training to continuous, inference-driven demand. On March 17, NVIDIA CEO Jensen Huang said at the GTC conference that the AI inference market has reached its inflection point: AI is moving fully from training into the inference and execution phase, and demand for inference computing power is growing exponentially.

“As generative AI applications expand, demand for inference computing power may grow far faster than demand for training. On one hand, application demand is exploding: generative AI and intelligent-agent applications are being deployed at an accelerating pace, and high-frequency user interaction generates exponential volumes of inference requests. On the other hand, breakthroughs in specialized inference chips, liquid cooling, and optical interconnect technologies are significantly improving computing efficiency and concurrency, laying the foundation for large-scale deployment,” Zhang Pengyuan, a researcher at Qianhai PaiPaiNet Fund Sales Co., Ltd., told Securities Daily.

Industry forecasts point to the continued rise of inference computing power. International Data Corporation (IDC) predicts that by 2027, inference will account for over 70% of China’s total computing power. Huang Chao, founder and CEO of China IDC Circle, said that by 2026, intelligent agents across industries will enter a stage of flourishing development, computing power applications will shift from “training-led” to “inference-driven,” and the explosive cycle of inference computing power demand will fully arrive.

In response to rapidly growing demand for inference computing power, domestic companies across the industry chain are accelerating R&D and product deployment. At the chip level, many manufacturers are launching chips optimized for inference scenarios: compared with traditional training chips, these emphasize power control, cost efficiency, and deployment flexibility, making them suitable for both cloud and edge applications.

Take Shenzhen Yuntian Lifei Technology Co., Ltd. (hereinafter “Yuntian Lifei”) as an example. The company has built its large-scale cloud inference chips around an NPU-centric GPNPU technology route, with deep optimizations to matrix and vector units, the memory hierarchy, and effective bandwidth utilization. The goal is to sharply reduce per-token costs and accelerate the large-scale, inclusive deployment of large models.
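A rough sense of why effective memory bandwidth, rather than raw compute, dominates inference economics: during autoregressive decoding, each generated token must stream the model’s active weights through the memory system. The sketch below illustrates this bound with hypothetical figures; none of the numbers reflect Yuntian Lifei’s actual chips.

```python
# Back-of-the-envelope bound on autoregressive decode throughput.
# Each generated token streams the (active) model weights through
# memory, so effective bandwidth, not peak FLOPs, usually sets the
# ceiling. All numbers below are illustrative assumptions.

def decode_tokens_per_second(param_count: float,
                             bytes_per_param: float,
                             bandwidth_gb_s: float,
                             efficiency: float = 0.6) -> float:
    """Upper bound on tokens/s for one decode stream, bandwidth-bound."""
    bytes_per_token = param_count * bytes_per_param   # weights read per token
    usable_bw = bandwidth_gb_s * 1e9 * efficiency     # achieved bandwidth, B/s
    return usable_bw / bytes_per_token

# Example: a 7B-parameter model at 1 byte/param (INT8/FP8) on an
# accelerator with 1 TB/s peak bandwidth and 60% achieved efficiency.
tps = decode_tokens_per_second(7e9, 1.0, 1000)
print(f"~{tps:.0f} tokens/s upper bound per stream")  # ~86 tokens/s
```

On this simple model, halving bytes per parameter or raising effective bandwidth translates directly into cheaper tokens, which is why quantization, memory-hierarchy design, and bandwidth utilization are the levers vendors compete on.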

Yuntian Lifei is expected to post revenue of 1.308 billion yuan in 2025, a year-on-year increase of 42.57%. A company executive told Securities Daily, “For enterprises, as industry competition shifts from training scale to inference efficiency, delivery cost, and system-level profitability, those that integrate hardware, storage, and software earlier stand a better chance of taking the lead in the inference era.”

At the server and system level, leading manufacturers are also rolling out inference-optimized computing platforms. For example, Inspur Electronic Information Industry Co., Ltd. launched the YuanNao R1 inference server, which supports 16 standard double-width PCIe cards in a single machine and can deploy the 671B-parameter DeepSeek model, as well as YuanNao CPU inference servers for rapid deployment and efficient operation of next-generation inference models such as DeepSeek-R1 32B and QwQ-32B.
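For a feel of why 16 double-width cards can host a 671B-parameter model, a back-of-the-envelope memory-footprint check (the precision, KV-cache headroom, and card count below are illustrative assumptions, not Inspur’s published configuration):

```python
# Rough memory-footprint check for sharding a large model such as
# the 671B-parameter DeepSeek across 16 PCIe accelerator cards.
# Precision and overhead figures are illustrative assumptions.

PARAMS = 671e9            # total parameters
BYTES_PER_PARAM = 1.0     # weights quantized to INT8/FP8
KV_CACHE_OVERHEAD = 0.2   # headroom for KV cache and activations
NUM_CARDS = 16

total_gb = PARAMS * BYTES_PER_PARAM * (1 + KV_CACHE_OVERHEAD) / 1e9
per_card_gb = total_gb / NUM_CARDS

print(f"total footprint   ~ {total_gb:.0f} GB")    # ~805 GB
print(f"per-card footprint ~ {per_card_gb:.0f} GB")  # ~50 GB
```

At roughly 50 GB per card under these assumptions, the model fits within the memory of current-generation high-end PCIe accelerators, which is what makes single-machine deployment plausible.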

Meanwhile, the build-out of computing power infrastructure is accelerating. Whereas many domestic intelligent computing centers have historically combined training and inference in a single facility, on March 12 Yuntian Lifei won the bid for an AI-enabled new-productivity infrastructure project in Zhanjiang, Guangdong, that is dedicated to inference: its AI inference cluster will mainly serve applications across industries, offering a deployment template for the AI transformation of traditional domestic industries.

He Li, General Manager of Beijing Zhi Yu Zhi Shan Investment Management Co., Ltd., believes that high-performance inference chips, HBM, and full-stack software will be the first to benefit from this shift in the computing power dividend. Inference scenarios demand extremely low latency, high throughput, and energy efficiency, so dedicated architectures such as LPUs and ASICs will accelerate the replacement of general-purpose computing units, while memory technologies such as HBM4 will be key to breaking bandwidth bottlenecks. In addition, as computing power migrates from data centers to the edge, demand for high-density inference racks and advanced cooling technologies will rise. Combined with model quantization, parameter compression, and other compiler-level optimizations, this will push the industry from hardware stacking toward hardware-software integration.
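As a concrete illustration of the quantization techniques mentioned above, here is a minimal sketch of symmetric per-tensor INT8 weight quantization, the generic approach of shrinking each 4-byte float weight to 1 byte; production toolchains add per-channel scales, calibration, and more:

```python
import numpy as np

# Minimal sketch of symmetric per-tensor INT8 weight quantization,
# one generic form of the model compression discussed above.

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 plus a single scale factor."""
    scale = np.abs(w).max() / 127.0                 # widest value maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)        # toy weight tensor
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max abs error: {err:.4f}; storage: 4 bytes/param -> 1 byte/param")
```

The 4x reduction in weight storage also cuts the bytes streamed per token, so quantization improves both memory footprint and the bandwidth-bound throughput discussed earlier.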
