
Tsinghua, NVIDIA, and NFT: A New Era in AI Learning

I. Core Technology: How Implicit Negative Strategies Break the “Error Blind Spot” in Supervised Learning

Traditional supervised learning resembles a model student who only studies the “right answers,” largely ignoring the educational value of mistakes. NFT (Negative-aware FineTuning) introduces a breakthrough by establishing a “translation system for errors”—transforming incorrect model outputs into optimization signals through implicit negative strategies. This core mechanism unfolds across three layers of dynamic modeling:

  • Data Filtering Layer: In mathematical answer-generation settings, NFT uses a 0/1 reward function to label each sampled answer as “correct” or “incorrect” and builds a graded dataset keyed to problem difficulty (measured by correctness rate). For instance, problems with a correctness rate below 30% are flagged, and their incorrect answers are prioritized for optimization; a code sketch of this step follows the list.
  • Strategy Mapping Layer: By analyzing the differences between the original and target model policies, NFT constructs an implicit negative strategy function (a schematic decomposition is given after the list). Think of it as equipping the model with an “error microscope”: if the model repeatedly misreads symbols in algebra, this strategy dynamically boosts the weight of those error samples, forcing the model to confront its weak points.
  • Gradient Fusion Layer: Theoretically, the gradient of NFT’s loss function is mathematically equivalent to that of the GRPO algorithm used in reinforcement learning. This equivalence lets supervised learning naturally realize group relative normalization (a key mechanism of GRPO) without hand-crafted reward functions.
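To make the data-filtering layer concrete, here is a minimal sketch in Python (ours, not the authors’ code) of how answers might be sampled, scored with a 0/1 reward, and bucketed by correctness rate. The `sample_answers`, `verify`, and `problem.reference_answer` names are illustrative assumptions; any exact-match or verifier-based reward would play the same role.

```python
from dataclasses import dataclass

# Hypothetical helpers: sample K answers from the current model and score
# each one with a 0/1 reward (e.g. exact match against a reference answer).
def sample_answers(model, problem, k):
    return [model.generate(problem) for _ in range(k)]

def verify(problem, answer) -> int:
    return int(answer.strip() == problem.reference_answer)

@dataclass
class GradedItem:
    problem: object
    positives: list          # answers with reward 1
    negatives: list          # answers with reward 0
    correctness_rate: float  # proxy for problem difficulty

def build_graded_dataset(model, problems, k=16, hard_threshold=0.3):
    """Bucket problems by difficulty; flag hard ones so their negatives are prioritized."""
    graded, hard = [], []
    for problem in problems:
        answers = sample_answers(model, problem, k)
        rewards = [verify(problem, a) for a in answers]
        rate = sum(rewards) / k
        item = GradedItem(
            problem=problem,
            positives=[a for a, r in zip(answers, rewards) if r == 1],
            negatives=[a for a, r in zip(answers, rewards) if r == 0],
            correctness_rate=rate,
        )
        graded.append(item)
        if rate < hard_threshold:  # e.g. correctness rate below 30%
            hard.append(item)
    return graded, hard
```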
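The “implicit” in implicit negative strategy can be read as a mixture decomposition. As a hedged illustration in our own notation (the paper’s exact construction may differ): if r_q is the correctness rate on question q, the model’s sampling distribution π_old splits into a positive part π⁺_θ, which the correct samples train directly, and a negative part π⁻_θ that is then determined for free:

\[
\pi_{\text{old}}(y \mid x) = r_q\,\pi^{+}_{\theta}(y \mid x) + (1 - r_q)\,\pi^{-}_{\theta}(y \mid x)
\;\;\Rightarrow\;\;
\pi^{-}_{\theta}(y \mid x) = \frac{\pi_{\text{old}}(y \mid x) - r_q\,\pi^{+}_{\theta}(y \mid x)}{1 - r_q}.
\]

Because π⁻_θ is expressed entirely through π_old and π⁺_θ, incorrect answers can be pushed down through it without training a separate “negative” network; this is the sense in which the error signal is implicit.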

By shifting from passively discarding errors to actively leveraging them, NFT redefines the logic of supervised training. In evaluations on the Qwen-7B model, NFT used incorrect data 400% more effectively than traditional fine-tuning. Additionally, entropy, an indicator of exploration, rose 15%–20% during training, signaling a balance between precision and exploratory behavior; a sketch of one common way to measure this follows.
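The article does not specify how the entropy figure is computed, but a common proxy for exploration is the mean per-token entropy of the model’s next-token distribution over its own generations. A minimal PyTorch-style sketch, with illustrative tensor shapes:

```python
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor, mask: torch.Tensor) -> float:
    """Average per-token entropy of the policy's next-token distribution.

    logits: [batch, seq_len, vocab] raw model outputs for generated tokens
    mask:   [batch, seq_len], 1 for real (non-padding) tokens
    Higher values mean more probability mass spread over alternative
    continuations, i.e. more exploratory generations.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    token_entropy = -(probs * log_probs).sum(dim=-1)  # [batch, seq_len]
    return ((token_entropy * mask).sum() / mask.sum()).item()
```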

II. Experimental Validation: From Mathematical Reasoning to Scaling Laws

NFT’s impact is particularly pronounced at the large-model scale. In a joint evaluation by Stanford University and NVIDIA, NFT not only outperformed all traditional supervised baselines on mathematical reasoning tasks but also matched the performance of leading RL-based methods for the first time:

  • Performance: On the GSM8K dataset, NFT-7B achieved an average score of 82.3, an 18.7% improvement over Rejection FineTuning (RFT), and surpassed RL methods such as GRPO and DAPO by 5–8 points. At the 32B scale, NFT matched DAPO while improving training efficiency by 30%, supporting the insight that larger models derive greater value from negative feedback: NFT’s advantage grew by 40% in moving from 7B to 32B models.
  • Generalization: By incorporating implicit negative strategies, NFT showed stronger transferability on unseen, complex logic problems. In mixed geometry–algebra problems, for instance, NFT reduced the error rate by 27% compared to RFT. The gain stems from the model learning recurring error patterns, such as unit-conversion mistakes and formula misapplications.
  • Efficiency: Unlike RL pipelines that rely on human-labeled reward signals, NFT trains solely on model-generated data. In simulated medical-diagnosis scenarios, NFT optimized performance using past diagnostic errors, cutting labeling costs by 60% compared to RL approaches, which makes it especially valuable in sensitive fields such as finance and healthcare.

Notably, NFT exhibits a unique “entropy-increasing” property: as training progresses, the diversity of generated answers increases, countering the “standard-answer fixation” common in supervised learning. This trait gives NFT a clear edge in creative reasoning tasks. For example, in programming problem-solving, 35% of NFT-generated solutions were non-standard approaches, 20% higher than baseline models.

III. Paradigm Shift: Dual Shockwave from Theoretical Unification to Ecosystem Restructuring

NFT is more than a technical improvement—it marks the first mathematically grounded unification of supervised and reinforcement learning, triggering widespread implications from theory to industry:

  • Theoretical Foundation: Philosophical Implications of Gradient Equivalence
    NFT’s gradient formulation is mathematically identical to that of the GRPO algorithm (a schematic of this group-relative gradient follows this list), revealing a deep connection between learning paradigms: supervised learning becomes “static reinforcement,” while reinforcement learning is “dynamic supervision.” This overturns the long-held view that RL uniquely depends on real-time feedback. Instead, NFT shows that cleverly reusing error data allows supervised learning to simulate dynamic RL behavior. This theoretical bridge enables hybrid learning paradigms, e.g., in robotics, where a model can be pre-trained with NFT on historical errors and then fine-tuned in real time via RL for adaptive control.
  • Application Breakthrough: From Error Economics to Diverse Deployments
    In education, NFT-powered tutoring systems now analyze students’ incorrect problem-solving steps to generate personalized remediation strategies; a pilot program showed 40% faster mastery of complex math topics. In industry, NVIDIA has applied NFT to autonomous-driving simulations, optimizing decision models on over 100,000 recorded driving-error scenarios and boosting response accuracy in edge cases by 22%. In scientific research, NFT improved protein structure prediction accuracy by 15% by learning from AlphaFold2’s past errors, demonstrating potential in foundational research domains.
  • Ecosystem Challenge: Dynamic Environments and Open Source Innovation
    Despite its promise, NFT’s generalization in dynamic environments (e.g., real-time finance) needs further validation. Its strategy modeling currently relies on static data distributions, making it vulnerable to rapidly shifting conditions. Additionally, training on 32B models consumes 20% more compute than conventional fine-tuning. To address this, the team developed a lightweight version—NFT-Lite—which maintains 85% of full performance while reducing compute cost by 35%. Encouragingly, Tsinghua has open-sourced NFT’s core strategy modules, and community developers are already extending it to multimodal domains, such as using mislabeled image data to improve vision models—injecting fresh innovation into the AI ecosystem.
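For readers who want the equivalence claim in symbols, the group relative normalization at the heart of GRPO works roughly as follows (shown schematically, omitting GRPO’s clipping and KL terms; the full NFT derivation is in the paper). For G answers y_1, …, y_G sampled for the same prompt x with rewards r_1, …, r_G, each answer is weighted by a group-normalized advantage:

\[
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)},
\qquad
\nabla_\theta J(\theta) \;\propto\; \sum_{i=1}^{G} \hat{A}_i \,\nabla_\theta \log \pi_\theta(y_i \mid x).
\]

With a 0/1 reward, the group mean is simply the correctness rate, so correct answers receive positive weight and incorrect answers negative weight. The article’s claim is that NFT’s supervised loss yields a gradient of this same form without ever constructing an explicit reward-weighted policy gradient.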

From rejecting errors to embracing them, NFT signals a profound shift in AI learning paradigms. It not only provides a more efficient path for training large models, but also reveals a fundamental principle: progress in intelligence often stems from a deep understanding of mistakes. As NFT continues to expand into more complex domains, this “error-driven” learning mode may become the core competitiveness of next-gen AI systems, enabling a transition from standard-answer executors to autonomous, self-correcting explorers.
