"Token Inflation" Underway! Zhipu New Model Price Increased by 20%


Source: Shanghai Securities Journal Author: Sun Xiaocheng

The price hike wave in the large model industry continues. On March 16, Zhipu, dubbed “the world’s leading large model stock,” announced the launch of the base model GLM-5-Turbo, designed for intelligent agent tasks such as OpenClaw (“Lobster”), and simultaneously set the GLM-5-Turbo API price 20% above GLM-5’s. Prior to this, Tencent Cloud and OpenAI had also raised prices for some of their models.

Industry insiders believe that as large models evolve from simple “question-and-answer” exchanges to being capable of “doing real work,” the tokens (the core billing metric, representing words or units of text) consumed per call are rising significantly. This directly raises model providers’ costs and makes price increases for users a natural outcome. As intelligent agents handle more complex tasks in the future, token consumption is expected to grow exponentially.

Focus on “Doing Real Work” — Zhipu’s New Model Price Increase

Within two months, Zhipu has raised prices twice. On February 12, it launched the new flagship model GLM-5, and the next day, increased the price of the GLM Coding Plan package by at least 30%. This news boosted the performance of the AI industry chain and drew more market attention to “token inflation” narratives.

Compared to previous models, Zhipu’s latest price hike remains substantial. Relative to GLM-4.7, GLM-5 has increased in price by an average of 50%; on top of that, GLM-5-Turbo is priced 20% higher than GLM-5, representing an 83% increase over GLM-4.7 on average.

GLM-5-Turbo, the newly priced model, is designed specifically for “doing real work” and executing intelligent agent tasks, including the recently popular OpenClaw (“Lobster”).

Zhipu’s technical director explained that although Lobster is currently popular, user feedback indicates it doesn’t run smoothly. When deployed in real, complex agent scenarios, general large models often struggle to respond effectively.

The reason is that agent tasks are not simple question-and-answer exchanges. They typically involve multi-turn understanding, task decomposition, tool invocation, state management, time-triggered actions, and long, continuous workflows. Therefore, even if a general model performs well in dialogue, it can easily deviate from instructions, have unstable tool calls, or stall during long tasks in real Lobster scenarios.
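The gap between the two workloads can be sketched with a toy simulation (all names, token counts, and the billing pattern below are illustrative assumptions, not Zhipu’s actual API): because each agent step typically re-sends the accumulated context of prior replies and tool outputs, billed tokens grow far faster than the number of steps, while a single Q&A turn is billed once.

```python
# Illustrative sketch of why multi-step agent workflows consume many
# more tokens than single-turn Q&A. All numbers are made up.

def simulate_agent_run(steps: int, prompt_tokens: int = 500,
                       tool_output_tokens: int = 300,
                       reply_tokens: int = 200) -> int:
    """Each step re-sends the full accumulated context (original prompt
    plus prior tool outputs and replies), so billed input tokens grow
    with every turn."""
    context = prompt_tokens
    total = 0
    for _ in range(steps):
        total += context + reply_tokens               # input + output billed per call
        context += reply_tokens + tool_output_tokens  # context keeps growing
    return total

single_qa = simulate_agent_run(1)    # one-shot question and answer
agent_task = simulate_agent_run(10)  # ten-step tool-using workflow
print(single_qa, agent_task)         # the agent run costs over 40x more
```

Under these assumptions a ten-step agent task consumes tens of thousands of tokens where a single answer consumes a few hundred, which is why “doing real work” changes the economics of inference.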

The director believes that to fundamentally solve these issues, deep optimization at the base model level is necessary. Based on this, Zhipu has systematically constructed various task scenarios around real agent workflows, enabling the model to handle complex, dynamic, long-chain tasks with true executability. They have focused on enhancing GLM-5-Turbo’s capabilities in tool invocation, instruction following, timed and continuous tasks, and long-chain execution.

From “Question-and-Answer” to “Doing Real Work” — Token Consumption Multiplies

In the era of large models, tokens have become “measurable production resources” rather than “free traffic.” Guoyuan Securities argues that large models turn dialogue, code writing, and content generation, which look like ordinary software-vendor services, into online inference services heavily dependent on computing power.

For model providers, each response consumes GPU, memory, bandwidth, and electricity; for users, every time they ask the model to think longer, write longer code, or handle more complex tasks, more tokens are consumed. As a result, tokens have naturally become a new unit of measurement.

Hence, the concept of “token inflation” has emerged in the market. This does not simply mean tokens are becoming more expensive; rather, it refers to a structural increase in tokens consumed per unit time and per user.
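A toy calculation (all prices and volumes below are hypothetical) separates the two effects: structural growth in tokens consumed per user raises spend even at a flat per-token price, and a price hike then compounds on top.

```python
# Toy numbers only: "token inflation" means more tokens consumed per
# user per period, which is distinct from a per-token price rise.

def monthly_cost(tokens_per_user: float, users: int,
                 price_per_million: float) -> float:
    """Total monthly spend given per-user token consumption and a
    price quoted per million tokens."""
    return tokens_per_user * users / 1_000_000 * price_per_million

baseline  = monthly_cost(2_000_000, 1_000, 5.0)   # chat-style usage
agent_era = monthly_cost(12_000_000, 1_000, 5.0)  # 6x tokens, same unit price
with_hike = monthly_cost(12_000_000, 1_000, 6.0)  # plus a 20% price increase

print(baseline, agent_era, with_hike)
```

In this sketch the consumption shift alone multiplies spend sixfold before any price change, which matches the article’s point that “token inflation” is primarily a volume story rather than a price story.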

Some large model companies have disclosed data confirming this trend. For example, in the first two months of 2026, MiniMax’s model calls and new user numbers both surged significantly. Its M2 series text models saw an average daily token consumption increase of over six times from December 2025 to February 2026.

Looking ahead, as large models improve their “doing real work” capabilities, token consumption will grow exponentially. IDC forecasts that by 2031 the number of active intelligent agents in Chinese enterprises will exceed 350 million, a compound annual growth rate of over 135%, the fastest pace globally. In addition, as task execution density and complexity increase, the annual token consumption of intelligent agents is expected to rise more than 30-fold, an exponential growth trend.
