Monday, 30 June 2025
Home Electronics: Technology, News & Trends

DnD: Revolutionizing LLMs with Drag-and-Drop Technology

The application of large models continues to expand across fields, and their zero-shot generalization is particularly impressive. In practice, however, adapting a large model to a specific task still demands significant fine-tuning time; even parameter-efficient methods like LoRA carry a nontrivial per-task cost. Recently, researchers from the National University of Singapore, the University of Texas at Austin, and other institutions introduced an innovative breakthrough: a technique called “Drag-and-Drop Large Language Models” (DnD), which opens a new path for applying large models.

Disrupting Tradition: DnD’s Core Technology and Principles

DnD is essentially a prompt-conditioned parameter generator that adapts LLMs to new tasks without any per-task training. Its core idea breaks with the traditional model-optimization paradigm. The researchers observed that a LoRA adapter is essentially a function of its training data: gradient descent “drags” the base weights toward the optimum for a specific task. It follows that if a direct mapping from prompts to weights can be learned, the time-consuming gradient-descent process can be bypassed entirely.
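The “dragging” intuition can be illustrated with a toy low-rank update: gradient descent nudges a LoRA delta `A @ B` toward a task optimum, which is exactly the loop DnD aims to replace with a single generator call. Everything below (shapes, learning rate, the synthetic `task_optimum`) is illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight matrix and a synthetic task-optimal weight matrix.
base = rng.normal(size=(4, 4))
task_optimum = base + rng.normal(scale=0.1, size=(4, 4))

# LoRA: learn a low-rank delta A @ B so that base + A @ B approaches the
# task optimum. A starts small, B starts at zero (a common LoRA init).
A = rng.normal(scale=0.1, size=(4, 2))
B = np.zeros((2, 4))

def loss(A, B):
    """Squared distance between the adapted weights and the task optimum."""
    return float(np.sum((base + A @ B - task_optimum) ** 2))

initial = loss(A, B)
for _ in range(300):
    residual = base + A @ B - task_optimum
    A -= 0.05 * 2 * residual @ B.T   # gradient step "drags" the delta
    B -= 0.05 * 2 * A.T @ residual
```

After the loop, `loss(A, B)` is lower than `initial`: each gradient step drags the adapted weights closer to the task optimum. DnD’s bet is that a generator can emit such a delta directly from the prompt, skipping this loop.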

To acquire this “drag-and-drop” capability, DnD relies on two key steps. The first is preparing training data. The team trained and saved LoRA adapters for a variety of datasets, then randomly sampled prompts from each dataset and paired them with that dataset’s saved LoRA weights, forming the “prompt-parameter” pairs that make up DnD’s training data. In this process, model parameters (weights) are explicitly paired with the conditions (prompts) of specific datasets, laying the foundation for subsequent parameter generation.
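The data-preparation step can be sketched as pairing each dataset’s prompts with the LoRA weights trained on that dataset. The dataset names, prompts, and stand-in weight vectors below are hypothetical placeholders for real checkpoints:

```python
import random

# Hypothetical per-dataset checkpoints: each dataset has saved LoRA
# weights (stand-in vectors here) and a pool of prompts.
datasets = {
    "math": {"lora": [0.12, -0.45, 0.88],
             "prompts": ["Solve: 3x + 5 = 20", "What is 17 * 24?"]},
    "code": {"lora": [0.07, 0.31, -0.22],
             "prompts": ["Reverse a string in Python", "Fix this off-by-one bug"]},
}

def build_pairs(datasets, pairs_per_dataset=2, seed=0):
    """Sample prompts from each dataset and pair them with that
    dataset's LoRA weights, yielding (prompt, parameters) pairs."""
    rng = random.Random(seed)
    pairs = []
    for name, d in datasets.items():
        for _ in range(pairs_per_dataset):
            prompt = rng.choice(d["prompts"])
            pairs.append((prompt, d["lora"]))  # condition -> target parameters
    return pairs

pairs = build_pairs(datasets)
```

The key property is that every pair ties a prompt (the condition) to the weights trained on the same dataset, which is the supervision signal the generator learns from.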

The second step is training the parameter generator. The generator pairs a lightweight text encoder with a decoder built from cascaded hyper-convolutional blocks, each containing three hyper-convolutional modules that extract and fuse feature information across dimensions. During training, the existing text encoder extracts embedding vectors from the prompts, which are fed into the decoder to predict model weights. The team optimizes the generator by minimizing the mean squared error (MSE) between the generated weights and the ground-truth LoRA weights, training it to produce parameters tailored to various tasks. At inference time, DnD needs only a single forward pass on prompts from an unseen dataset to generate task-specific parameters.
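A minimal stand-in for this training loop might look as follows, with a hash-style embedding in place of the real pretrained text encoder and a single linear map in place of the cascaded hyper-convolutional decoder; only the MSE objective and the one-pass inference mirror the description above:

```python
import numpy as np

def encode(prompt, dim=8):
    """Stand-in text encoder: folds characters into a fixed-size embedding.
    (DnD uses a pretrained lightweight text encoder instead.)"""
    v = np.zeros(dim)
    for i, ch in enumerate(prompt):
        v[i % dim] += ord(ch) / 1000.0
    return v

# Toy "prompt-parameter" pairs: prompt embeddings and flattened LoRA weights.
prompts = ["solve the equation", "write python code"]
targets = np.array([[0.5, -0.2], [-0.1, 0.4]])
X = np.stack([encode(p) for p in prompts])

# A single linear layer stands in for the hyper-convolutional decoder.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 2))

def mse(W):
    return float(np.mean((X @ W - targets) ** 2))

before = mse(W)
for _ in range(2000):
    grad = 2 * X.T @ (X @ W - targets) / len(X)  # gradient of the MSE loss
    W -= 0.1 * grad
after = mse(W)

# Inference is a single forward pass: a new prompt maps straight to weights.
new_weights = encode("prove this theorem") @ W
```

The expensive optimization happens once, when training the generator; afterwards, adapting to a new prompt costs only one matrix product, which is where the reported speedups come from.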

Performance Leap: A Dual Breakthrough in Efficiency and Effectiveness

DnD has achieved a stunning leap in performance, reaching unprecedented levels of both efficiency and effectiveness.

  • Efficiency:
    In terms of efficiency, DnD cuts adaptation overhead by up to 12,000 times compared with traditional full fine-tuning: generating parameters for mathematics, code, or multimodal tasks, which could take hours with conventional methods, completes in seconds, a speedup of 2,500 to 12,000 times. This makes DnD well suited to latency-sensitive scenarios such as real-time intelligent customer service or online code assistance, where task-specific model parameters can be produced in seconds, significantly improving user experience and work efficiency.
  • Effectiveness:
    In terms of effectiveness, DnD also performs exceptionally well. On zero-shot benchmarks spanning common-sense reasoning, mathematics, coding, and multimodal tasks, DnD outperforms the most powerful LoRA models by up to 30%. For instance, on unseen common-sense reasoning test sets, DnD-generated models were 21% more accurate than the base LoRA models used for training. On the HumanEval coding benchmark, DnD reached a pass@1 of 32.7%, 15.1% above the base LoRA model, and on GSM8K math problems it achieved 66.3% accuracy, surpassing the base LoRA model by 23.4%. Even on multimodal tasks such as the MathVista dataset (images combined with math problems), DnD showed performance gains. These results demonstrate that DnD not only generates parameters quickly but also strengthens the model on complex tasks, improving adaptability and accuracy.

Additionally, DnD demonstrates strong generalization capabilities. It requires only unlabelled prompts to switch between different domains, achieving excellent performance across diverse tasks. For example, when using a DnD model trained on common-sense reasoning to generate weights for a scientific question-answering task, it performed 30% better than a model specifically trained for scientific tasks. This impressive generalization capability allows DnD to rapidly adapt to varied, complex real-world applications without the need for extensive task-specific training.

Widespread Application: DnD Reshapes the Large Model Landscape Across Industries

The advent of DnD opens up new opportunities for numerous fields, potentially reshaping the landscape of large model applications.

  • In Rapid Model Specialization:
    DnD has an unparalleled advantage in scenarios that require rapid model specialization. For example, in the intelligent customer service industry, businesses face diverse and specific needs. Traditionally, adapting a general model to answer a company’s unique product or service questions requires considerable time and resources for fine-tuning. With DnD, businesses can input task-specific prompts like “answer common after-sales questions for a particular product,” and DnD can generate custom model parameters in seconds, quickly transforming a general model into a specialized customer service model. This significantly reduces development costs and time, enabling businesses to respond more swiftly to market changes.
  • In Resource-Limited Environments:
    DnD’s lightweight design shines in resource-constrained environments. In edge computing devices or mobile platforms where computational resources and memory are limited, traditional large models are often too bulky to deploy or run. DnD, however, only requires a lightweight text encoder and decoder, and can run efficiently on a single A100 GPU with less than 21GB of memory, making it ideal for resource-limited devices. For instance, in smart home devices, DnD can quickly generate adapted model parameters based on simple voice prompts from users, enabling intelligent control and interaction features, offering a convenient user experience.
  • For Academic Research:
    DnD accelerates the pace of research, particularly in academic fields. Researchers often need to test various model configurations and parameter settings when exploring new tasks or algorithms. Previously, each adjustment required time-consuming training processes, which drained computational resources. With DnD, researchers can input task prompts and quickly obtain adapted model parameters, enabling immediate experiments and validation of new ideas. This dramatically improves research efficiency and accelerates AI innovation.

Conclusion

The “Drag-and-Drop Large Language Model” (DnD), with its innovative principle, remarkable performance, and broad application potential, breathes new life into the development and deployment of large models. As the technology matures and sees wider adoption, it stands to bring transformative change across industries, pushing AI technology to new heights and bringing greater convenience to everyday life and work.
