Qian Long’s research team at the Center for Quantitative Biology of Peking University has developed and launched SYMPLEX, described as the world’s first large language model (LLM) designed specifically for functional gene mining. SYMPLEX automatically identifies key genes with targeted functions from vast volumes of biological literature and supports their screening and functional validation, laying a foundation for protein function design, biopharmaceutical development, and biomanufacturing. The research was published in the journal Science Advances.
A Vast Natural Gene Reservoir Awaiting Unlocking
“Nature is like a colossal treasure trove filled with countless secrets,” remarked Qian Long in an interview. “Organisms carry an astonishing number of useful genes. Through billions of years of natural selection, these genes have evolved diverse sequences and combinations, resulting in highly refined functions that help life thrive in complex environments. With the rapid advancement of sequencing technologies, we now possess billions of biological sequences. These natural genes form a critical reservoir of genetic elements indispensable to synthetic biology and biomanufacturing.”
However, most of this reservoir remains untapped. Only a small fraction of genes, mostly the well-studied ones, have been well annotated and modeled in terms of sequence or structure. Qian Long elaborated, “Traditional gene mining and protein design approaches, whether based on sequence analysis, structural modeling, or deep learning, face significant technical limitations. These methods often fail to handle complex genes and cannot effectively expand the research scope. This technical bottleneck severely hinders the development and utilization of many high-value genetic resources.”
SYMPLEX: A Powerful Engine Forged Through Innovative Integration
In response to these challenges, Qian Long’s team integrated a large language model with structured biological knowledge bases to create the SYMPLEX Intelligent Gene Mining Platform. Dubbed a “super search engine” for biology, SYMPLEX can automatically read and comprehend tens of millions of biological research papers, extracting and analyzing content along three dimensions: genes, functions, and knowledge.
During this process, SYMPLEX aligns concepts precisely with expert-curated databases, engages in effective semantic interactions, and generates statistical models to ultimately produce a high-quality candidate gene set.
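The flow described above (extract from papers, align to a curated database, score, output candidates) can be illustrated with a toy Python sketch. This is not SYMPLEX's implementation. The alias table, the extracted records, and the function names `align` and `candidate_genes` are all hypothetical. Counting the distinct papers that support a gene is only a crude stand-in for the statistical model mentioned in the text.

```python
# Toy illustration only; not the SYMPLEX method or API.

# Hypothetical stand-in for an expert-curated database:
# free-text function phrase (lowercase) -> canonical function ID.
CURATED_ALIASES = {
    "capping enzyme": "FUNC:mRNA_capping",
    "mrna guanylyltransferase": "FUNC:mRNA_capping",
    "guanylyltransferase": "FUNC:mRNA_capping",
}

# Hypothetical output of the literature-reading step:
# (paper_id, gene_name, function_phrase) triples.
EXTRACTED = [
    ("paper1", "geneA", "mRNA guanylyltransferase"),
    ("paper2", "geneA", "capping enzyme"),
    ("paper3", "geneB", "Capping Enzyme"),
    ("paper4", "geneC", "DNA ligase"),
]


def align(phrase):
    """Map a free-text function phrase to a curated ID, or None if unknown."""
    return CURATED_ALIASES.get(phrase.strip().lower())


def candidate_genes(records, target_function, min_papers=1):
    """Rank genes by the number of distinct papers supporting the target function."""
    support = {}
    for paper_id, gene, phrase in records:
        # Keep only mentions that align to the curated target concept.
        if align(phrase) == target_function:
            support.setdefault(gene, set()).add(paper_id)
    # Sort by descending paper count, then by gene name for stable output.
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(papers)) for gene, papers in ranked if len(papers) >= min_papers]


candidates = candidate_genes(EXTRACTED, "FUNC:mRNA_capping")
print(candidates)
```

In this sketch, a phrase that does not match the curated table is dropped rather than guessed at, which is one simple way that grounding extraction in a curated vocabulary can limit unsupported output.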
Notably, SYMPLEX effectively avoids the “hallucination” problem common in large language models, in which generated information appears plausible but is factually incorrect. It can also autonomously generate fine-grained knowledge trees closely linked to gene functions. These knowledge trees act like detailed maps, helping researchers explore intricate biological mechanisms and molecular processes with greater clarity and improving research efficiency and precision. Extensive benchmarking shows that, compared with conventional gene mining methods, SYMPLEX uncovers a wider diversity of genes, including genes beyond the reach of current protein function prediction models.
Application in mRNA Capping Enzyme Discovery Demonstrates SYMPLEX’s Strength
To validate SYMPLEX’s real-world utility, the research team collaborated with Professor Lou Chunbo from the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, to apply the model to the discovery of mRNA capping enzymes. Capping is a key step in mRNA vaccine production that has long been plagued by inefficiencies and high costs, making it a critical bottleneck for large-scale vaccine production and deployment.
Using SYMPLEX, the team discovered nearly 20,000 novel capping enzyme genes and experimentally validated over a dozen of them. Some newly identified enzymes exhibited more than double the activity of the commercial capping enzymes currently used in mRNA vaccine production. Independent third-party validation further confirmed the superior catalytic efficiency of these enzymes, which significantly outperformed commercial enzymes from leading global companies such as New England Biolabs (NEB). These results point to higher mRNA vaccine production efficiency at lower cost. The enzyme database uncovered by SYMPLEX now serves as a technological foundation for mRNA vaccine development and mRNA-based gene therapies.

Ushering in a New Era of AI-Driven Biomanufacturing
Reflecting on the significance of this work, Qian Long stated, “This research establishes a brand-new paradigm for functional gene mining and provides a core enzyme resource for large-scale mRNA vaccine production, and this is only the beginning.” The research team is actively using SYMPLEX to mine additional critical enzymes applicable to synthetic biology, and plans to extend the platform to areas such as synthetic pathway design.
SYMPLEX is expected to help move biomanufacturing into a new phase of AI-powered scientific research and to help address major challenges affecting human health and global development. The SYMPLEX interactive platform is now officially live and available for free to researchers worldwide. Its modular design offers three core functional modules, giving scientists a user-friendly tool for gene discovery.