[Former Core Member of Alibaba's Qwen] Lin Junyang's Thousand-Word Essay: AI Models Will Shift Toward "Agentic Thinking," and Why Qwen Abandoned Merging Thinking and Instruct Modes
Lin Junyang, regarded as the soul of Alibaba's (09988) Qwen large model, abruptly resigned in early March, sparking speculation about disagreements with the management team. With the storm now subsided, he recently published an article titled “From ‘Reasoning’ Thinking to ‘Agentic’ Thinking” on the social platform X. Although the article mainly discusses the direction of AI technology, it also reflects on Alibaba’s Qwen technology roadmap.
He argued that purely computational “reasoning thinking” has peaked, and that the second half of AI will belong to “agentic thinking,” in which a model interacts with the real environment and thinks while acting.
The focus of AI is shifting: What will happen next?
Lin Junyang noted that in the first half of 2025 the AI industry focused mainly on “reasoning thinking”: how to make large models spend more time and computation thinking, how to use stronger feedback mechanisms to train models, and how to control these additional reasoning processes.
However, the pressing question the industry must face is: What will happen next?
He believes the answer is undoubtedly “agentic thinking.” Future AI should not merely think in isolation to produce answers but should “think in order to take action.” It needs to simulate while interacting with the environment, and to continuously update and correct its plans based on feedback from the real world.
Internal Blueprint of Qwen and the Failure of the “Merger Route”
Lin Junyang disclosed for the first time the Qwen team's internal technical blueprint from early 2025. At that time, many members hoped to create an ideal system that unified the “thinking” and “instruct” modes. The vision for this system was ambitious:
Smart Adjustment: It can automatically determine how much reasoning computation is needed based on prompts and context (similar to low/medium/high tiers).
Autonomous Decision-Making: It allows the model to decide when to respond instantly and when to think deeply or invest significant computation when encountering difficult problems.
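As a rough illustration of the two goals above, the sketch below maps a request to a reasoning tier and a token budget. It is not Qwen's mechanism: the tier names beyond low/medium/high, the token numbers, the `Request` type, and the keyword heuristic in `choose_effort` are all hypothetical stand-ins for what would in practice be a learned difficulty estimate.

```python
from dataclasses import dataclass

# Hypothetical tiers and token budgets; the numbers are illustrative only.
BUDGETS = {"none": 0, "low": 512, "medium": 2048, "high": 8192}

# Hypothetical markers of a "hard" prompt; a real system would learn this.
HARD_MARKERS = ("prove", "debug", "optimize", "step by step")


@dataclass
class Request:
    prompt: str
    context_tokens: int = 0


def choose_effort(req: Request) -> str:
    """Toy heuristic that picks a reasoning tier from the prompt and context."""
    text = req.prompt.lower()
    score = sum(marker in text for marker in HARD_MARKERS)
    if req.context_tokens > 4000:
        score += 1  # long contexts get one extra point of effort
    if score == 0:
        # Short, easy prompts are answered instantly with no thinking budget.
        return "none" if len(text.split()) < 12 else "low"
    return "medium" if score == 1 else "high"


def thinking_budget(req: Request) -> int:
    """Return the maximum number of reasoning tokens for this request."""
    return BUDGETS[choose_effort(req)]
```

Under these assumptions, a short factual question gets a budget of zero and is answered immediately, while a prompt that asks the model to both prove and debug something lands in the highest tier.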
Lin Junyang stated that Qwen3 was the clearest public attempt in this direction, introducing a “hybrid thinking mode” that emphasizes a controllable thinking budget. However, he candidly admitted: “Merging sounds easy, but executing it is extremely difficult.”
Lin Junyang believes that forced merging produces a “mediocre” model, because the data distributions and behavioral objectives behind the thinking mode and the instruct mode are entirely different. When the two are forced together, thinking behavior becomes verbose, bloated, and indecisive, while instruct behavior loses its crispness and becomes unreliable, which can significantly increase costs for commercial users.
In commercial terms, he believes many enterprise customers actually need pure instruct-mode operation with high throughput, low cost, and high controllability, for example batch processing.
For this reason, the Qwen team ultimately chose to release independent instruction (Instruct) and thinking (Thinking) versions in the subsequent 2507 series. Lin Junyang believes that separating the two will allow the team to focus more purely on solving their respective data and training issues, avoiding the emergence of “two awkwardly fused personalities.”
Competitor Strategy: Anthropic’s “Restraint” and Goal Orientation
In contrast to Qwen’s separation route, other labs, such as Anthropic and the team behind GLM-4.5, have chosen the opposite “integration route.”
Lin Junyang specifically mentioned the approach Anthropic took with its Claude series, saying that its development trajectory demonstrates rigor and restraint; Claude 3.7 / Claude 4 interleave reasoning with tool use.
Goal-Oriented Thinking: Anthropic believes that producing extremely long reasoning paths does not equate to a smarter model. If a model elaborates on every trivial matter, it actually indicates improper resource allocation.
Pragmatism First: If the goal is to write code, AI’s thinking should be used for planning, breaking down tasks, fixing bugs, and invoking tools; if it is for agent workflows, thinking should be used to enhance the execution quality of long-term tasks, rather than simply producing seemingly impressive “reasoning essays.”
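The idea of interleaving short bursts of thinking with tool use, as opposed to writing one long reasoning essay up front, can be sketched as a small loop. This is a toy under stated assumptions: `policy` is a hard-coded stub standing in for a model, `run_tests` is a hypothetical tool, and nothing here reflects Anthropic's or Qwen's actual implementation.

```python
from typing import Callable

Step = tuple[str, str, str]  # (thought, candidate code, tool observation)


def run_tests(code: str) -> str:
    """Hypothetical tool: pretends to run a test suite on the candidate code."""
    return "pass" if "return a + b" in code else "fail: add(1, 2) != 3"


TOOLS: dict[str, Callable[[str], str]] = {"run_tests": run_tests}


def policy(history: list[Step]) -> tuple[str, str]:
    """Stub for the model: emits a brief thought and the next candidate."""
    if not history:
        return ("Draft add() and check it with the tests.",
                "def add(a, b): return a - b")
    # Revise the plan using the most recent feedback from the environment.
    return ("Tests failed, so the operator is wrong; fix it.",
            "def add(a, b): return a + b")


def agent_loop(max_steps: int = 5) -> list[Step]:
    """Alternate think -> act -> observe until the tool reports success."""
    history: list[Step] = []
    for _ in range(max_steps):
        thought, code = policy(history)
        observation = TOOLS["run_tests"](code)
        history.append((thought, code, observation))
        if observation == "pass":
            break
    return history
```

Calling `agent_loop()` yields two steps: a failed first attempt, then a corrected one after the test feedback. Each thought is spent on the next action, not on elaboration.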
Core Differences Between Reasoning Thinking and Agentic Thinking
Lin Junyang predicts that “agentic thinking” will ultimately replace “static monologue” reasoning, which lacks interaction and is overly verbose. A truly advanced system should be able to search, simulate, execute, check, and correct in a robust and efficient manner to solve problems.
Changing Evaluation Criteria: From “Can the model solve mathematical problems?” to “Can the model advance progress when interacting with the environment?”
Real-World Challenges to Address:
Three Major Technical Challenges to Achieve “Agentic Thinking”
Beyond application-level differences, Lin Junyang delves deeper into the enormous challenges of developing agentic thinking at the foundational level:
Bottlenecks in Training Infrastructure (GPU Efficiency Collapse): Agentic reinforcement learning (RL) is much more difficult than pure reasoning RL. AI agents need to interact frequently with external tools (such as browsers and execution sandboxes), and waiting for real-world feedback can stall training and significantly lower GPU utilization. In the future, training must be cleanly decoupled from inference.
“Reward Hacking” and Cheating Risks: When a model has access to tools, it easily learns to “cheat” the system for rewards (for example, by exploiting system vulnerabilities to peek at future information) rather than truly solving problems. Tools amplify the risk of spurious optimization, and anti-cheating protocols will become critical for major companies.
Multi-Agent Orchestration: Future systems engineering will no longer rely on a single model but will be composed of multiple agents working together. The system will include “orchestrators” responsible for planning, “expert agents” specializing in specific domains, and “sub-agents” handling narrow tasks, thereby controlling context and preventing the thinking process from being contaminated.
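The first challenge above, tool latency stalling the trainer, is commonly addressed by letting rollout generation and gradient updates run as separate loops connected by a queue. The sketch below shows only that shape using Python's standard `asyncio`. The worker and trainer functions, the sleep that stands in for a slow tool call, and all sizes are assumptions for illustration, not a description of any lab's infrastructure.

```python
import asyncio


async def rollout_worker(worker_id: int, n_episodes: int,
                         queue: asyncio.Queue) -> None:
    """Generates trajectories; the sleep stands in for a slow tool call."""
    for episode in range(n_episodes):
        await asyncio.sleep(0.01)  # e.g. waiting on a browser or sandbox
        await queue.put({"worker": worker_id, "episode": episode, "reward": 1.0})


async def trainer(queue: asyncio.Queue, total: int, batch_size: int) -> list[int]:
    """Consumes trajectories as they arrive and records each batch size."""
    batch_sizes: list[int] = []
    batch: list[dict] = []
    for _ in range(total):
        batch.append(await queue.get())
        if len(batch) == batch_size:
            batch_sizes.append(len(batch))  # a gradient step would go here
            batch = []
    if batch:
        batch_sizes.append(len(batch))  # final partial batch
    return batch_sizes


async def main(n_workers: int = 4, n_episodes: int = 3,
               batch_size: int = 5) -> list[int]:
    # A bounded queue applies backpressure if rollouts outpace the trainer.
    queue: asyncio.Queue = asyncio.Queue(maxsize=16)
    workers = [asyncio.create_task(rollout_worker(i, n_episodes, queue))
               for i in range(n_workers)]
    sizes = await trainer(queue, n_workers * n_episodes, batch_size)
    await asyncio.gather(*workers)
    return sizes
```

Running `asyncio.run(main())` processes the 12 trajectories from four concurrent workers in batches of 5, 5, and 2. The trainer never waits on any single slow rollout, only on the queue.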
Summary: The Next Competitive Focus in the AI Industry
At the end of the article, Lin Junyang pointed out the next competitive focus in the AI industry: the core training targets of the future will no longer merely be the “model” itself, but rather a comprehensive system of “model + environment” (agents and their surrounding harness).
Past Reasoning Era: Advantages stemmed from better reinforcement learning (RL) algorithms, stronger feedback signals, and scalable training pipelines.
Future Agentic Era: Advantages will depend on better environment design, closer train-serve integration, stronger systems engineering, and the ability for models to learn to take responsibility for their decisions and form “closed loops.”