Saturday, 25 May 2024

Shengshu Teams Up with Tsinghua to Launch China’s First Self-developed Video Model, Benchmarking Sora


On April 27, at the Future Artificial Intelligence Pioneer Forum of Zhongguancun Forum in China, Shengshu Technology and Tsinghua University officially released Vidu, China’s first long-duration, highly consistent, and dynamically stable video model.

Latest reports that the model adopts the U-ViT architecture, which combines diffusion (diffusion probabilistic models) with the Transformer, and supports one-click generation of high-definition video at up to 1080p resolution and up to 16 seconds in length. According to Shengshu Technology, Vidu, like Sora, can generate high-quality videos of up to 16 seconds directly from a text description.

Shengshu Technology said that U-ViT, its core architecture, was proposed by the team in September 2022, earlier than the DiT architecture adopted by Sora. The company describes it as the world’s first architecture to combine diffusion probabilistic models with the Transformer, and says it was developed entirely in-house.

The short films generated by Vidu are produced with a “one-step” approach. As with Sora, the conversion from text to video is direct and continuous: a single model generates the video end to end, with no intermediate keyframe generation or frame interpolation.

Keyframe interpolation inserts one or more frames between each pair of adjacent frames to make a video longer or smoother. Because the video has to be processed frame by frame and extra frames inserted afterward, it is a multi-step process. Vidu and Sora, by contrast, generate high-quality video directly in a single step, with no separate keyframe generation and interpolation stages.
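To make the contrast concrete, the interpolation step described above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular product: it assumes each frame is a flat list of pixel intensities and fills the gaps with a simple linear blend, whereas real interpolation systems typically rely on motion estimation or learned models. The names `Frame`, `blend`, and `interpolate` are hypothetical.

```python
from typing import List

# Hypothetical stand-in: a frame is a flat list of pixel intensities.
Frame = List[float]


def blend(a: Frame, b: Frame, t: float) -> Frame:
    """Linearly blend two frames; t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return [(1.0 - t) * pa + t * pb for pa, pb in zip(a, b)]


def interpolate(keyframes: List[Frame], inserted: int) -> List[Frame]:
    """Insert `inserted` blended frames between each pair of adjacent keyframes."""
    if inserted < 0:
        raise ValueError("inserted must be non-negative")
    if not keyframes:
        return []
    out: List[Frame] = []
    for a, b in zip(keyframes, keyframes[1:]):
        out.append(list(a))  # keep the original keyframe
        for k in range(1, inserted + 1):
            # Evenly spaced blend positions strictly between a and b.
            out.append(blend(a, b, k / (inserted + 1)))
    out.append(list(keyframes[-1]))  # keep the final keyframe
    return out


if __name__ == "__main__":
    keys = [[0.0, 0.0], [1.0, 2.0]]
    for frame in interpolate(keys, inserted=1):
        print(frame)
```

With `n` keyframes and `k` inserted frames per gap, the output has `n + (n - 1) * k` frames. The keyframes must exist before this second pass can run, which is the multi-step structure that an end-to-end model avoids.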

In March 2023, the Shengshu Technology team open-sourced UniDiffuser, the world’s first large multimodal diffusion model based on the U-ViT architecture, and was the first in the world to verify at scale that the fused architecture is scalable. UniDiffuser has nearly 1 billion parameters, was trained on the large-scale multimodal dataset LAION-5B, and supports arbitrary generation and conversion between the image and text modalities. Architecturally, UniDiffuser came a year ahead of Stable Diffusion 3, which likewise uses a DiT architecture.

Shengshu Technology stated that breakthroughs in large models are a multidimensional, cross-disciplinary effort that requires deep integration of technology and industrial applications. For that reason, alongside the release of Vidu, the company officially launched the “Vidu Large Model Partner Program,” inviting upstream and downstream companies in the industry chain, as well as research institutions, to help build a cooperative ecosystem.

Shengshu Technology was established in March 2023. Its founding team comes from the Artificial Intelligence Research Institute of Tsinghua University and is one of the earliest teams in the world to work on diffusion probabilistic models. To date, Shengshu Technology has raised hundreds of millions of yuan in financing, with investors including Qiming Venture Partners, Ant Group, Baidu Ventures, DT Capital, Jin Qiu Fund, and Zhuo Yuan Asia, among other well-known institutions.
