In early 2026, Alibaba officially launched Qwen3-Max-Thinking, its most advanced large language model to date. The model is reported to have set new records on several internationally recognized benchmarks, with performance on par with leading global models such as OpenAI’s GPT-5.2, Google’s Gemini 3 Pro, and Anthropic’s Claude Opus 4.5. The release is being described as the first time a Chinese-developed large model has fully entered the global top tier.
Massive Scale Drives Top Performance
According to reports, Qwen3-Max-Thinking has more than 1 trillion parameters and was pre-trained on 36 trillion tokens of data, then further refined through extensive reinforcement learning. Its preview version had already made headlines as the first Chinese model to achieve perfect scores on the AIME 25 and HMMT 25 math competition benchmarks. The final release builds on this, setting new state-of-the-art results in scientific knowledge (GPQA Diamond), mathematical reasoning (IMO-AnswerBench), and code generation (LiveCodeBench).
Smarter Reasoning with Test-Time Scaling
Conventional approaches simply increase the number of parallel reasoning paths, which often leads to redundant computation. Qwen3-Max-Thinking instead introduces a novel Test-Time Scaling mechanism. The technique extracts “experience” from prior reasoning steps and uses it to refine the model’s conclusions over multiple self-improvement cycles within the same context, which is said to significantly improve both reasoning depth and efficiency.
The effect is especially evident on the Humanity’s Last Exam (HLE) benchmark, where Qwen3-Max-Thinking scored 58.3, well ahead of GPT-5.2-Thinking (45.5) and Gemini 3 Pro (45.8) and the highest score reported for any model to date.
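To make the idea of experience-driven refinement concrete, here is a minimal Python sketch. It is not Alibaba's implementation, whose internals this article does not describe. The names `Context`, `refine`, `toy_attempt`, and `make_toy_critique` are hypothetical, and a number-guessing toy stands in for the model. In particular, the critique here is an oracle that knows the target, whereas a real model would have to assess its own reasoning. The point is only the control flow: each round reads the lessons left by earlier rounds in one shared context, instead of sampling independent parallel attempts.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Lesson = Tuple[str, int]  # hypothetical lesson format, e.g. ("too_high", 50)


@dataclass
class Context:
    """One shared context that accumulates lessons across rounds."""
    experience: List[Lesson] = field(default_factory=list)


def refine(
    attempt: Callable[[Context], int],
    critique: Callable[[int], Tuple[bool, Optional[Lesson]]],
    max_rounds: int = 8,
) -> Tuple[Optional[int], Context]:
    """Attempt, extract a lesson from the failure, and retry in the same context."""
    ctx = Context()
    answer: Optional[int] = None
    for _ in range(max_rounds):
        answer = attempt(ctx)            # new attempt, conditioned on past lessons
        ok, lesson = critique(answer)    # stand-in for self-assessment
        if ok:
            break
        ctx.experience.append(lesson)    # keep the lesson, not the whole attempt
    return answer, ctx


def toy_attempt(ctx: Context) -> int:
    """Toy 'reasoner': guesses a number in [0, 100], narrowed by past lessons."""
    lo, hi = 0, 100
    for kind, value in ctx.experience:
        if kind == "too_low":
            lo = max(lo, value + 1)
        else:  # "too_high"
            hi = min(hi, value - 1)
    return (lo + hi) // 2


def make_toy_critique(target: int) -> Callable[[int], Tuple[bool, Optional[Lesson]]]:
    """Toy critique with oracle access to the target (an assumption of this sketch)."""
    def critique(answer: int) -> Tuple[bool, Optional[Lesson]]:
        if answer == target:
            return True, None
        return False, ("too_low" if answer < target else "too_high", answer)
    return critique


if __name__ == "__main__":
    answer, ctx = refine(toy_attempt, make_toy_critique(37))
    print(answer, ctx.experience)
```

In this toy the search converges in three rounds because every failed guess shrinks the remaining range. A set of independent parallel guesses would not share that information, which is the redundancy the article attributes to conventional approaches.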

Enhanced Agent Capabilities for Smarter Tool Use
Designed for the emerging era of AI agents, Qwen3-Max-Thinking features substantially improved native tool-calling abilities. Trained with hybrid reinforcement learning that combines rule-based and model-based rewards, it can autonomously select and coordinate core tools, including web search, personalized memory, and a code interpreter, to solve complex tasks at a professional level.
Users can already try these capabilities in Qwen Chat. The model is designed to understand instructions precisely and plan its tool usage proactively, producing responses that are more accurate, more fluent, and better aligned with human preferences, with fewer hallucinations and more reliable real-world problem solving.
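As a rough illustration of what combining rule-based and model-based rewards can look like, the following Python sketch scores a single tool call. It is an assumption-laden toy, not the training recipe used for Qwen3-Max-Thinking: the tool names in `ALLOWED_TOOLS`, the JSON call format, the `judge` callable standing in for a learned reward model, and the equal default weighting are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical tool identifiers, loosely mirroring the tools named in the article.
ALLOWED_TOOLS = {"web_search", "memory", "code_interpreter"}


def rule_reward(tool_call_json: str) -> float:
    """Rule-based part: checks only that the call is well formed."""
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError:
        return 0.0                       # not parseable at all
    if not isinstance(call, dict):
        return 0.0
    name = call.get("name")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        return 0.0                       # unknown or missing tool
    # Full credit only when arguments are a JSON object.
    return 1.0 if isinstance(call.get("arguments"), dict) else 0.5


def hybrid_reward(
    tool_call_json: str,
    response: str,
    judge: Callable[[str], float],
    rule_weight: float = 0.5,
) -> float:
    """Weighted mix of the rule check and a model-based quality score."""
    r_rule = rule_reward(tool_call_json)
    r_model = min(1.0, max(0.0, judge(response)))  # clamp judge score to [0, 1]
    return rule_weight * r_rule + (1.0 - rule_weight) * r_model


if __name__ == "__main__":
    call = '{"name": "web_search", "arguments": {"query": "AIME 2025"}}'
    # A constant function stands in for a learned reward model here.
    print(hybrid_reward(call, "some final answer", judge=lambda text: 0.8))
```

The rule-based term is cheap and hard to game on format, while the model-based term can judge qualities that rules cannot express, such as helpfulness. How the two are actually weighted or scheduled in training is not stated in the article.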
Freely Available to All Users and Developers
Qwen3-Max-Thinking is now freely accessible to the public. End users can try it on the Qwen desktop and web platforms, developers can integrate it through the API on Alibaba Cloud’s Bailian platform, and the Qwen mobile app will soon be upgraded to include the new model.
With the launch of Qwen3-Max-Thinking, Alibaba’s Tongyi Lab has achieved a major technological leap and helped move China’s large models from “catching up” to “running alongside” and, on some benchmarks, “leading” the global AI race.