Thursday, 16 January 2025

Shengshu Teams Up with Tsinghua to Launch China’s First Self-developed Video Model, Benchmarking Sora

On April 27, at the Future Artificial Intelligence Pioneer Forum of the Zhongguancun Forum in China, Shengshu Technology and Tsinghua University officially released Vidu, which they describe as China’s first video model capable of long-duration, highly consistent, and dynamically stable generation.

Reports indicate that the model adopts the U-ViT architecture, which combines diffusion (diffusion probabilistic models) with the Transformer, and can generate high-definition video at resolutions up to 1080p and durations up to 16 seconds with a single click. According to Shengshu Technology, Vidu, like Sora, generates high-quality videos of up to 16 seconds directly from text descriptions.
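A diffusion probabilistic model is trained to undo a gradual noising process. As a minimal sketch of the standard DDPM forward (noising) step, assuming a linear noise schedule with common default values rather than anything Shengshu has published, the snippet below corrupts a toy vector in one shot using the closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The function names and the toy input are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed common defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) directly from x_0."""
    a = abar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule(1000))
x0 = [0.5, -0.2, 0.8]            # toy stand-in for a latent video patch
xt, eps = q_sample(x0, 999, abar, rng)  # at the last step, x_t is almost pure noise
```

In a full system, a network (U-ViT, in Vidu’s case) is trained to predict ε from x_t and t, and sampling runs this process in reverse, starting from pure noise.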

Shengshu Technology said that its core technology, the U-ViT architecture, was proposed by the team in September 2022, earlier than the DiT architecture adopted by Sora. The company describes U-ViT as the world’s first architecture to fuse diffusion models with the Transformer, developed entirely in-house by the team.
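The two ideas usually associated with the published U-ViT design are treating every input (the diffusion time step, the text condition, and the noisy image patches) as tokens in one sequence, and adding long skip connections from shallow blocks to the mirrored deep blocks. The toy sketch below illustrates only that structure; it is not Shengshu’s code. Identity attention projections, averaging in place of U-ViT’s concatenate-then-linear skip fusion, and the omission of MLPs and LayerNorm are all simplifying assumptions, and every function name is hypothetical.

```python
import math
import random

def attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (toy stand-in)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z for j in range(d)])
    return out

def block(tokens):
    """Residual block x + Attn(x); MLP and LayerNorm omitted for brevity."""
    return [[x + a for x, a in zip(t, at)] for t, at in zip(tokens, attention(tokens))]

def u_vit_forward(time_tok, cond_toks, patch_toks, depth=4):
    """Toy U-ViT pass: one token sequence, long skips from shallow to deep blocks."""
    # Time step, text condition, and noisy patches are all treated as tokens.
    h = [time_tok] + cond_toks + patch_toks
    skips = []
    for _ in range(depth // 2):   # shallow half: store each block's output
        h = block(h)
        skips.append(h)
    h = block(h)                  # middle block
    for _ in range(depth // 2):   # deep half: pair with the mirrored shallow block
        s = skips.pop()
        # U-ViT concatenates and applies a linear layer here; averaging is a simplification.
        h = [[0.5 * (x + y) for x, y in zip(a, b)] for a, b in zip(h, s)]
        h = block(h)
    # Only the patch tokens are read out (e.g. as the predicted noise).
    return h[1 + len(cond_toks):]

def random_token(rng, dim):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)
dim = 4
out = u_vit_forward(
    random_token(rng, dim),
    [random_token(rng, dim) for _ in range(2)],
    [random_token(rng, dim) for _ in range(6)],
)
print(len(out), len(out[0]))  # 6 4
```

Because the condition and time step live in the same sequence as the patches, every attention layer can mix them freely, which is the main contrast with designs that inject conditioning through separate pathways.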

Vidu produces its short clips with a “one-step” approach: like Sora, it turns text into video directly and continuously. The underlying algorithm relies on a single end-to-end model, with no intermediate keyframe generation or interpolation stages.

Keyframe interpolation adds one or more frames between each pair of consecutive frames to make a video longer or smoother. Because it processes the video frame by frame and inserts the extra frames in a separate pass, it is a multi-step pipeline. Vidu and Sora, by contrast, generate high-quality video directly in a single step, without separate keyframe generation and interpolation stages.
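To make the multi-step alternative concrete, here is a minimal sketch of the interpolation stage, assuming the simplest possible method: linear blending of pixel values between consecutive keyframes. Production systems typically use motion-aware (optical-flow or learned) interpolation instead, and the function names here are hypothetical.

```python
def lerp_frame(a, b, w):
    """Blend two frames pixel by pixel: (1 - w) * a + w * b."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]

def interpolate_keyframes(keyframes, n_between):
    """Insert n_between blended frames between every pair of consecutive keyframes.

    Assumes at least one keyframe; each frame is a flat list of pixel values.
    """
    out = []
    for a, b in zip(keyframes, keyframes[1:]):
        out.append(a)
        for k in range(1, n_between + 1):
            out.append(lerp_frame(a, b, k / (n_between + 1)))
    out.append(keyframes[-1])
    return out

keys = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]  # three tiny 2-pixel "frames"
video = interpolate_keyframes(keys, n_between=1)
# video == [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
```

With K keyframes and n inserted frames per gap, the output has (K − 1)(n + 1) + 1 frames. An end-to-end model skips this stage entirely and emits all frames from the same generation pass.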

In March 2023, the Shengshu Technology team open-sourced UniDiffuser, which it describes as the world’s first multimodal diffusion large model based on the U-ViT architecture, and became the first team globally to verify that this fused architecture scales to large models. UniDiffuser has nearly 1 billion parameters, was trained on the large-scale multimodal dataset LAION-5B, and supports arbitrary generation and conversion between the image and text modalities. Architecturally, UniDiffuser predates Stable Diffusion 3, which likewise fuses diffusion with a Transformer (in its case, a DiT-based architecture), by one year.
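How one model can handle “arbitrary” image-text generation is easiest to see from the published UniDiffuser approach, as we understand it: image and text are noised with separate time steps, a single network predicts the noise for both, and the choice of the two time steps at sampling time selects the task. The sketch below shows only that selection logic; the value of T and the task names are assumptions for illustration.

```python
T = 1000  # total diffusion steps (assumed value)

def timesteps_for(task, t):
    """Map a generation task to (t_image, t_text) for one joint noise predictor.

    0 means "clean / observed"; T means "pure noise / marginalized out".
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    table = {
        "joint": (t, t),          # denoise image and text together
        "text_to_image": (t, 0),  # text kept clean as the condition
        "image_to_text": (0, t),  # image kept clean as the condition
        "image_only": (t, T),     # text fully noised, i.e. ignored
        "text_only": (T, t),      # image fully noised, i.e. ignored
    }
    return table[task]

# One reverse sampling loop serves every task; only the timestep pair changes.
for task in ("joint", "text_to_image", "image_only"):
    print(task, timesteps_for(task, 500))
```

The design choice is that conditioning, unconditional generation, and joint generation become special cases of the same network, rather than separately trained models.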

Shengshu Technology stated that breakthroughs in large models are multidimensional, cross-disciplinary efforts that require deep integration of technology with industrial applications. Alongside the release, it therefore launched the “Vidu Large Model Partner Program,” inviting upstream and downstream companies in the industry chain, as well as research institutions, to build a cooperative ecosystem together.

Shengshu Technology was established in March 2023. Its founding team comes from the Artificial Intelligence Research Institute of Tsinghua University and is one of the earliest teams in the world to work on diffusion probabilistic models. To date, Shengshu Technology has raised hundreds of millions of yuan, with investors including Qiming Venture Partners, Ant Group, Baidu Ventures, DT Capital, Jin Qiu Fund, and Zhuo Yuan Asia, among other well-known institutions.
