【Former Core Member of Alibaba's Qwen】Lin Junyang's Thousand-Word Essay: AI Models Will Shift Toward "Agentic Thinking," and Why Qwen Abandoned Merging Thinking and Instruct Modes


Lin Junyang, widely regarded as the soul of Alibaba's (09988) Qwen large model, abruptly resigned in early March, sparking speculation about disagreements with the management team. With the storm subsiding, he recently published an article titled “From ‘Reasoning’ Thinking to ‘Agentic’ Thinking” on the social platform X. Although the article mainly discusses the direction of AI technology, it also reflects on Alibaba’s Qwen technology roadmap.

He argued that purely computational “reasoning thinking” has reached its peak, and that the second half of AI will belong to “agentic thinking,” which interacts with the real environment and thinks while acting.

The focus of AI is shifting: What will happen next?

Lin Junyang noted that in the first half of 2025 the AI industry focused mainly on “reasoning thinking”: how to make large models spend more time and computation thinking, how to use stronger feedback mechanisms to train models, and how to control these additional reasoning processes.

However, the pressing question the industry must face is: What will happen next?

He believes the answer is undoubtedly “agentic thinking.” Future AI should not merely think in isolation to produce answers but should “think in order to take action.” It needs to simulate while interacting with the environment and to continuously update and correct its plans based on feedback from the real world.

Internal Blueprint of Qwen and the Failure of the “Merger Route”

Lin Junyang disclosed for the first time the Qwen team’s internal technical blueprint from early 2025. At that time, many members hoped to create an ideal system that unified the “thinking” and “instruction” modes. The vision for this system was quite grand:

Smart Adjustment: It can automatically determine how much reasoning computation is needed based on prompts and context (similar to low/medium/high tiers).

Autonomous Decision-Making: It allows the model to decide when to respond instantly and when to think deeply or invest significant computation when encountering difficult problems.
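
The two goals above amount to routing each request to a reasoning tier. The sketch below is only an illustration of that idea, not Qwen's mechanism: the keyword heuristic stands in for a judgment the model itself would learn, and the names `choose_effort`, `Plan`, `BUDGETS`, and `HARD_HINTS`, as well as the token numbers, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical token budgets per tier. The article names low/medium/high
# tiers but gives no numbers, so these values are assumptions.
BUDGETS = {"low": 0, "medium": 2048, "high": 16384}

# Hypothetical surface cues that a prompt may need deeper reasoning.
HARD_HINTS = ("prove", "debug", "optimize", "step by step")


@dataclass
class Plan:
    effort: str           # "low", "medium", or "high"
    thinking_budget: int  # maximum reasoning tokens allowed for this request


def choose_effort(prompt: str) -> Plan:
    """Toy stand-in for a learned policy: score difficulty from surface cues."""
    text = prompt.lower()
    score = sum(hint in text for hint in HARD_HINTS)
    if len(text.split()) > 200:  # very long prompts get one extra point
        score += 1
    effort = "low" if score == 0 else "medium" if score == 1 else "high"
    return Plan(effort, BUDGETS[effort])


if __name__ == "__main__":
    for prompt in ("What is the capital of France?",
                   "Debug this function",
                   "Prove the lemma and optimize the bound step by step"):
        print(prompt, "->", choose_effort(prompt))
```

A "low" plan with a zero budget corresponds to responding instantly; the higher tiers cap how much computation a hard problem may consume.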

Lin Junyang stated that Qwen3 is the clearest public attempt in this direction, introducing a “hybrid thinking model” that emphasizes controllable thinking budgets. However, he candidly admitted: “Merging sounds easy, but executing it is extremely difficult.”

Lin Junyang believes that forced merging produces a “mediocre” model, because the data distributions and behavioral objectives behind the “thinking” and “instruction” modes are entirely different. When the two are forced together, “thinking behavior” becomes verbose, bloated, and indecisive, while “instruction behavior” loses its clarity and reliability and can even significantly raise costs for commercial users.

In commercial practice, he believes, a large number of enterprise customers really need pure instruction operations (such as batch processing) with high throughput, low cost, and high controllability.

For this reason, the Qwen team ultimately chose to release independent instruction (Instruct) and thinking (Thinking) versions in the subsequent 2507 series. Lin Junyang believes that separating the two will allow the team to focus more purely on solving their respective data and training issues, avoiding the emergence of “two awkwardly fused personalities.”

Competitor Strategy: Anthropic’s “Restraint” and Goal Orientation

In contrast to Qwen’s separation route, other labs, such as Anthropic and the team behind GLM-4.5, have chosen the opposite “integration route.”

Lin Junyang singled out Anthropic’s approach with its Claude series, saying that its development trajectory demonstrates rigor and restraint; Claude 3.7 and Claude 4 interleave reasoning with “tool usage.”

Goal-Oriented Thinking: Anthropic believes that producing extremely long reasoning paths does not equate to a smarter model. If a model elaborates on every trivial matter, it actually indicates improper resource allocation.

Pragmatism First: If the goal is to write code, the AI’s thinking should go into planning, breaking down tasks, fixing bugs, and invoking tools; if the goal is an agent workflow, thinking should improve the execution quality of long-horizon tasks rather than simply produce impressive-looking “reasoning essays.”

Core Differences Between Reasoning Thinking and Agentic Thinking

Lin Junyang predicts that “agentic thinking” will ultimately replace the “static monologue” reasoning that lacks interaction and is overly verbose. A truly advanced system should have the authority to search, simulate, execute, check, and correct in a robust and efficient manner to solve problems.

Changing Evaluation Criteria: From “Can the model solve mathematical problems?” to “Can the model advance progress when interacting with the environment?”

Real-World Challenges to Address:

  • Knowing when to stop thinking and take action.
  • Choosing which tools to invoke and in what sequence.
  • Being able to handle noisy and incomplete observational data from the real environment.
  • Knowing how to revise plans when encountering failures.
  • Maintaining logical coherence in multi-turn dialogues and multiple tool invocations.
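
The challenges listed above can be made concrete with a minimal think-act-observe loop. This is a sketch under stated assumptions rather than anything described in Lin Junyang's article: the scripted `toy_policy` stands in for a model, and the tools `primary_lookup` and `backup_lookup`, the `run_agent` function, and the step limit are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[str, str]]  # (action, observation) pairs


def primary_lookup(query: str) -> str:
    """Hypothetical tool that always fails, mimicking an unreliable environment."""
    raise TimeoutError("primary source unavailable")


def backup_lookup(query: str) -> str:
    """Hypothetical fallback tool that succeeds."""
    return "42"


TOOLS: Dict[str, Callable[[str], str]] = {
    "primary_lookup": primary_lookup,
    "backup_lookup": backup_lookup,
}


def toy_policy(task: str, history: History) -> Tuple[str, str]:
    """Scripted stand-in for the model: choose the next action from feedback so far."""
    if not history:
        return "primary_lookup", task    # first choice of tool
    _, last_obs = history[-1]
    if last_obs.startswith("error"):
        return "backup_lookup", task     # revise the plan after a failure
    return "finish", last_obs            # stop thinking and commit to an answer


def run_agent(task: str, policy, tools, max_steps: int = 6) -> Tuple[Optional[str], History]:
    history: History = []                # carried across steps to keep the run coherent
    for _ in range(max_steps):
        action, arg = policy(task, history)
        if action == "finish":
            return arg, history
        try:
            observation = tools[action](arg)
        except Exception as exc:         # a failure becomes an observation, not a crash
            observation = f"error: {exc}"
        history.append((action, observation))
    return None, history                 # step budget exhausted without an answer


if __name__ == "__main__":
    answer, history = run_agent("look up the value", toy_policy, TOOLS)
    print(answer, history)
```

The design point is that tool failures are fed back into the history as observations, so the next decision can react to them instead of the loop aborting.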

Three Major Technical Challenges to Achieve “Agentic Thinking”

Beyond application-level differences, Lin Junyang delves deeper into the enormous challenges of developing agentic thinking at the foundational level:

Bottlenecks in Training Infrastructure (GPU Efficiency Collapse): Agentic reinforcement learning (RL) is much harder than pure reasoning RL. Agents must frequently interact with external tools (such as browsers and execution sandboxes), and waiting for real-world feedback can stall training and sharply lower GPU utilization. Going forward, “training” must be cleanly decoupled from “inference.”

“Reward Hacking” and Cheating Risks: When the model has the authority to use tools, it easily learns to “cheat” to deceive the system for rewards (for example, exploiting system vulnerabilities to peek at future information) rather than truly solving problems. Tools amplify the risk of false optimization, and future anti-cheating protocols will be critical for major companies.

Multi-Agent Orchestration: Future systems will no longer rely on a single model but will consist of multiple agents working together. They will include “orchestrators” responsible for planning, “expert agents” specializing in specific domains, and “sub-agents” handling narrow tasks, which keeps each context under control and prevents the thinking process from being contaminated.
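
To illustrate the context-isolation idea in the orchestration point, here is a minimal sketch. It assumes one plausible arrangement, in which each sub-agent keeps a private scratch context and hands only its final result back to the orchestrator; the classes `SubAgent` and `Orchestrator` and the lambda "models" are hypothetical stand-ins, not a design taken from the article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SubAgent:
    name: str
    handle: Callable[[str], str]                      # stand-in for a model call
    context: List[str] = field(default_factory=list)  # private scratch context

    def run(self, subtask: str) -> str:
        self.context.append(f"task: {subtask}")
        result = self.handle(subtask)
        self.context.append(f"scratch: working notes for {subtask}")
        return result                                 # only the result leaves the agent


@dataclass
class Orchestrator:
    agents: Dict[str, SubAgent]
    context: List[str] = field(default_factory=list)  # planner's own, compact context

    def solve(self, plan: List[Tuple[str, str]]) -> List[str]:
        """plan is an ordered list of (agent_name, subtask) pairs."""
        for agent_name, subtask in plan:
            result = self.agents[agent_name].run(subtask)
            self.context.append(f"{agent_name}: {result}")
        return self.context


if __name__ == "__main__":
    agents = {
        "coder": SubAgent("coder", lambda t: f"patch for {t}"),
        "tester": SubAgent("tester", lambda t: f"tests passed for {t}"),
    }
    boss = Orchestrator(agents)
    print(boss.solve([("coder", "bug 1"), ("tester", "bug 1")]))
```

Because the scratch notes never enter the orchestrator's context, the planner sees one short line per subtask regardless of how much intermediate work each sub-agent did.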

Summary: The Next Competitive Focus in the AI Industry

At the end of the article, Lin Junyang pointed out the next competitive focus in the AI industry: the core training targets of the future will no longer merely be the “model” itself, but rather a comprehensive system of “model + environment” (agents and their surrounding harness).

Past Reasoning Era: Advantages stemmed from better reinforcement learning (RL) algorithms, stronger feedback signals, and scalable training pipelines.

Future Agentic Era: Advantages will depend on better environment design, closer train-serve integration, stronger systems engineering, and the ability for models to learn to take responsibility for their decisions and form “closed loops.”
