Tsinghua University and Shengshu Technology have officially released Vidu, China’s first long-duration, highly consistent, and dynamically rich video model, according to reports from several domestic media outlets.
The model adopts U-ViT, the team’s original architecture that fuses Diffusion and Transformer models. It supports one-click generation of high-definition video up to 16 seconds long at resolutions up to 1080p, with visual effects closely resembling those of Sora.
Experts say that artificial intelligence, with its increasingly prominent role and potential, is leading the current wave of technological innovation.
The arrival of Vidu signifies a new stage in AI technology. The model not only simulates the real physical world but also shows rich imagination. With features such as multi-camera generation and high spatiotemporal consistency, Vidu is the first video model worldwide to achieve a significant breakthrough since the release of Sora. Its performance is on par with top international standards and continues to improve through accelerated iteration.
In the future, Vidu will support the generation of more diverse and longer-duration video content. Its flexible architecture will also be compatible with a wider range of modalities, further expanding the boundaries of multimodal general-purpose capabilities.