A research paper on the DeepSeek-R1 reasoning model, co-authored by the DeepSeek team with Liang Wenfeng as corresponding author, appeared on the cover of volume 645 of the prestigious international journal Nature. Compared with the initial DeepSeek-R1 paper released in January of this year, this paper provides more detail on how the model was trained.
DeepSeek-R1 is also reportedly the world’s first mainstream large language model to undergo peer review. Nature commented: “Almost none of the mainstream large models has yet undergone independent peer review, a gap that DeepSeek has finally closed.”
The paper’s abstract states that general reasoning has long been a daunting challenge in artificial intelligence (AI). Recent breakthroughs, such as large language models (LLMs) and chain-of-thought (CoT) prompting, have achieved remarkable success on basic reasoning tasks. However, this success relies heavily on large amounts of manually annotated demonstration data, and model capabilities remain limited on more complex problems.
The research demonstrates that the reasoning capabilities of large language models can be elicited through pure reinforcement learning (RL), without relying on manually annotated reasoning traces. The proposed RL framework encourages the autonomous emergence of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adjustment.
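The article does not reproduce the paper’s training code, but the key ingredient it describes, RL without annotated reasoning traces, depends on rewards that can be computed automatically from verifiable answers rather than from human-labeled demonstrations. As a minimal hypothetical sketch (the tag format, `\boxed{}` convention, and reward values here are illustrative assumptions, not the paper’s exact specification), such a rule-based reward might look like:

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward for RL on verifiable tasks (hypothetical sketch).

    Assumes the model is asked to reason inside <think>...</think> tags
    and to give its final answer inside \\boxed{...}. Only the final
    answer is checked against a known reference; no human annotation of
    the reasoning itself is needed.
    """
    reward = 0.0
    # Small format reward: the completion contains a reasoning block.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.1
    # Accuracy reward: extract the final boxed answer and compare exactly.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward

good = "<think>2 + 2 equals 4.</think> The answer is \\boxed{4}"
bad = "I will guess: \\boxed{5}"
```

Because the reward depends only on a checkable final answer, the policy is free to discover its own reasoning patterns (self-reflection, verification, and so on) to the extent that they raise the reward.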
As a result, the trained models achieve superior performance on verifiable tasks in mathematics, programming competitions, and STEM (science, technology, engineering, and mathematics) fields, outperforming comparable models trained with conventional supervised learning on human demonstration data. Furthermore, the reasoning patterns that emerge in these large models can be used systematically to guide and improve the reasoning capabilities of smaller models.
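One common way a large model’s reasoning can guide a smaller one is distillation: collect the teacher’s reasoning traces, keep only those whose final answers pass a verifier, and fine-tune the student on the surviving (prompt, completion) pairs. The sketch below is a hypothetical illustration of that data-collection step only; the function names and the stub teacher/verifier are assumptions for demonstration, not the paper’s actual pipeline.

```python
def distill_dataset(prompts, teacher_generate, check_answer):
    """Rejection-sampling distillation data collection (hypothetical sketch).

    For each prompt, sample a completion from the teacher and keep it
    only if the verifier accepts its final answer. The resulting pairs
    would then serve as supervised fine-tuning data for a smaller model.
    """
    dataset = []
    for prompt in prompts:
        completion = teacher_generate(prompt)
        if check_answer(prompt, completion):
            dataset.append({"prompt": prompt, "completion": completion})
    return dataset

# Usage with stub teacher and verifier (stand-ins for a real model/checker):
prompts = ["What is 2 + 2?", "What is 3 + 3?"]
teacher = lambda p: "<think>Add the numbers.</think> \\boxed{4}"
checker = lambda p, c: "\\boxed{4}" in c  # only the first answer is correct
data = distill_dataset(prompts, teacher, checker)
```

Filtering by a verifier before fine-tuning is what lets the student inherit the teacher’s reasoning patterns without also inheriting its failed attempts.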