"Token Inflation" Underway! Zhipu New Model Price Increased by 20%
Source: Shanghai Securities Journal | Author: Sun Xiaocheng
The wave of price hikes in the large model industry continues. On March 16, Zhipu, widely dubbed "the world's leading large model stock," announced the launch of its base model GLM-5-Turbo, designed for intelligent agent tasks such as OpenClaw ("Lobster"), and priced its API 20% above GLM-5. Before this, Tencent Cloud and OpenAI had also raised prices on some of their models.
Industry insiders believe that as large models evolve from simple question-and-answer functions to being capable of "doing real work," the number of tokens (the units of text that serve as the core billing metric) consumed per call is rising sharply. This directly raises costs for model providers and makes price increases for users a natural outcome. As intelligent agents take on more complex tasks, token consumption is expected to grow exponentially.
Focus on “Doing Real Work” — Zhipu’s New Model Price Increase
Within two months, Zhipu has raised prices twice. On February 12, it launched its new flagship model GLM-5, and the next day raised the price of the GLM Coding Plan package by at least 30%. The news lifted stocks across the AI industry chain and drew more market attention to the "token inflation" narrative.
Zhipu's latest price hike is also substantial relative to its earlier models. Compared with GLM-4.7, GLM-5 is priced 50% higher on average; on top of that, GLM-5-Turbo is priced 20% higher than GLM-5, for an average increase of 83% over GLM-4.7.
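A quick sketch of how the two stated increases compound (the base price is a hypothetical placeholder, not Zhipu's actual rate). Note that simple compounding of the quoted averages gives 80%; the article's 83% presumably reflects averaging across separate pricing tiers (e.g. input vs. output tokens) rather than straight multiplication.

```python
# Illustrative only: compound the article's stated percentage increases.
base = 1.00          # hypothetical GLM-4.7 price per million tokens
glm5 = base * 1.50   # GLM-5: on average 50% above GLM-4.7
turbo = glm5 * 1.20  # GLM-5-Turbo: 20% above GLM-5

increase_over_47 = turbo / base - 1
print(f"Compounded increase over GLM-4.7: {increase_over_47:.0%}")  # → 80%
```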
The newly priced GLM-5-Turbo is designed specifically for "doing real work": executing intelligent agent tasks, including the recently popular OpenClaw ("Lobster").
Zhipu’s technical director explained that although Lobster is currently popular, user feedback indicates it doesn’t run smoothly. When deployed in real, complex agent scenarios, general large models often struggle to respond effectively.
The reason is that agent tasks are not simple question-and-answer exchanges. They typically involve multi-turn understanding, task decomposition, tool invocation, state management, time-triggered actions, and long, continuous workflows. So even a general model that performs well in dialogue can easily deviate from instructions, make unstable tool calls, or stall during long tasks in real Lobster scenarios.
The director believes that to fundamentally solve these issues, deep optimization at the base model level is necessary. Based on this, Zhipu has systematically constructed various task scenarios around real agent workflows, enabling the model to handle complex, dynamic, long-chain tasks with true executability. They have focused on enhancing GLM-5-Turbo’s capabilities in tool invocation, instruction following, timed and continuous tasks, and long-chain execution.
From "Question-and-Answer" to "Doing Real Work" — Token Consumption Multiplies
In the era of large models, tokens have become a "measurable production resource," no longer "free traffic." Guoyuan Securities argues that large models turn what look like conventional software-vendor services (dialogue, code writing, content generation) into online inference services heavily dependent on computing power.
For model providers, each response consumes GPU, memory, bandwidth, and electricity; for users, every time they ask the model to think longer, write longer code, or handle more complex tasks, more tokens are consumed. As a result, tokens have naturally become a new unit of measurement.
Hence, the concept of “token inflation” has emerged in the market. This does not simply mean tokens are becoming more expensive; rather, it refers to a structural increase in tokens consumed per unit time and per user.
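The billing mechanics behind this can be sketched in a few lines. This is a hypothetical illustration of flat per-token pricing and of why agent-style workloads inflate consumption; the rate, token counts, and step structure are made-up examples, not any provider's actual pricing.

```python
# Hypothetical per-token billing sketch; all numbers are invented for illustration.
PRICE_PER_M_TOKENS = 2.0  # assumed price per million tokens (input + output)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call under a flat per-token rate."""
    return (input_tokens + output_tokens) / 1_000_000 * PRICE_PER_M_TOKENS

# A simple Q&A exchange: one short prompt, one short answer.
qa = call_cost(input_tokens=500, output_tokens=800)

# An agent task: multi-turn planning, tool calls, and accumulated context,
# so prior context is re-sent each step and far more tokens are consumed.
agent_steps = [(4_000, 1_500), (6_000, 2_000), (9_000, 2_500), (12_000, 3_000)]
agent = sum(call_cost(i, o) for i, o in agent_steps)

print(f"Q&A: {qa:.4f}, agent task: {agent:.4f}, ratio: {agent / qa:.1f}x")
```

Under these assumed numbers, a single agent task costs roughly thirty times a single Q&A turn: the "inflation" comes from tokens consumed per task, not from the per-token rate alone.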
Data disclosed by some large model companies confirms this trend. In the first two months of 2026, for example, MiniMax's model calls and new user numbers both surged; average daily token consumption of its M2 series text models rose more than sixfold between December 2025 and February 2026.
Looking ahead, as large models get better at "doing real work," token consumption will grow exponentially. IDC forecasts that by 2031 the number of active intelligent agents in Chinese enterprises will exceed 350 million, a compound annual growth rate of over 135%, leading global markets. In addition, as task execution grows denser and more complex, annual token consumption by intelligent agents is expected to rise more than 30-fold, an exponential growth trend.