ShengShu Technology and Tsinghua University Unveil TurboDiffusion, Ushering in the Era of Real-Time AI Video Generation

Dec. 23, 2025, 2:00 PM

Delivering faster generation, lower costs, improved user experience, and scalable enterprise use, without compromising visual quality

SINGAPORE, Dec. 23, 2025 /PRNewswire/ -- ShengShu Technology and Tsinghua University's TSAIL Lab have jointly announced the open-sourcing of TurboDiffusion (https://github.com/thu-ml/TurboDiffusion), a new acceleration framework that delivers 100 to 200 times faster AI video generation with little to no loss in visual quality. The release marks a significant milestone for the industry, signalling the arrival of a real-time generation era for AI video creation — a "DeepSeek Moment" for video foundation models.

As generative AI advances rapidly in content creation, video generation is reaching a critical inflection point. The focus is no longer simply on whether video can be generated, but on whether high-quality output can be produced faster, at lower cost, and at scale for real-world and enterprise use. To address the long-standing trade-offs between quality, speed, and computing cost in high-resolution, long-form video generation, ShengShu Technology and Tsinghua University's TSAIL Lab conducted foundational research into inference efficiency, leading to the development of TurboDiffusion, which is designed to improve the practicality and scalability of AI video creation.

Following its release, TurboDiffusion has sparked widespread discussion across international AI research and developer communities, drawing attention from researchers at Meta and OpenAI, along with teams behind leading open-source inference acceleration projects such as vLLM.

Breaking the Speed Barrier in High-Quality Video Generation

Prior to the release of TurboDiffusion, ShengShu Technology had already established a strong position in AI video generation. In September 2024, Vidu became the first model worldwide to introduce subject-consistency functionality, ushering in a new era of reference-based video generation and gaining broad adoption among creators.

The recent launch of Vidu Q2 further delivered multiple industry-leading capabilities:

  • A full image generation stack, covering text-to-image, enhanced reference-to-image and comprehensive image editing;
  • Upgraded reference-based video generation, with improved semantic understanding, camera control, and multi-subject consistency;
  • High-efficiency image generation, producing 1080p images in five seconds without compromising visual quality.

These results demonstrated that Vidu's advantage was not achieved by sacrificing visual quality, but through mature model architecture and strong engineering capabilities.

As video generation moves toward higher resolutions, longer durations, and more complex application scenarios, the industry as a whole continues to face challenges related to latency and cost. TurboDiffusion was developed specifically to overcome these constraints.

Researchers and industry observers note that TurboDiffusion's core technical advantages strike at a critical inflection point for video generation. While diffusion-based video models have demonstrated strong creative potential, they have long been constrained by high computational complexity and limited efficiency. By dramatically reducing generation latency while preserving visual quality, TurboDiffusion brings high-quality video generation into a practical range approaching real-time interaction.

As a result, TurboDiffusion is widely viewed as a "DeepSeek Moment" for video foundation models, accelerating the transition from experimental research to scalable, real-world and commercial deployment, and marking a shift toward real-time, interactive AI video creation.

TurboDiffusion is not based on a single optimisation. Instead, it combines multiple advanced acceleration techniques systematically to improve efficiency:

  • Low-bit attention acceleration: TurboDiffusion uses SageAttention to run attention computation on low-bit Tensor Cores, achieving lossless, multi-fold speedups.
  • Sparse-Linear Attention acceleration: TurboDiffusion adopts trainable sparse attention, Sparse-Linear Attention (SLA), to sparsify attention computation, providing an additional 17–20× attention speedup on top of SageAttention.
  • Sampling-step distillation acceleration: with the state-of-the-art distillation method rCM, the model can generate high-quality videos in only 3–4 sampling steps.
  • Linear layer acceleration: TurboDiffusion quantizes both weights and activations in linear layers to 8-bit (W8A8), which speeds up linear computations and significantly reduces VRAM usage.

Together, these techniques enable near-lossless acceleration, allowing TurboDiffusion to deliver dramatic speed improvements while maintaining visual stability and consistency.
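To make the W8A8 linear-layer idea concrete, here is a minimal NumPy sketch of symmetric per-tensor 8-bit quantization applied to a matrix multiply. The shapes, scales, and per-tensor scheme are illustrative assumptions for this sketch, not TurboDiffusion's actual kernels, which run on GPU Tensor Cores:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: int8 values plus a float scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_linear(x, w):
    """Linear layer with both activations (x) and weights (w) in 8-bit.
    The matmul accumulates in int32; the result is rescaled back to float."""
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T
    return acc.astype(np.float32) * (sx * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)   # activations
w = rng.standard_normal((32, 64)).astype(np.float32)  # weight matrix

ref = x @ w.T                   # full-precision reference
approx = w8a8_linear(x, w)      # quantized path
err = np.abs(ref - approx).max() / np.abs(ref).max()
print(f"max relative error: {err:.4f}")
```

In practice the speedup comes from the int8 matmul running on low-bit hardware units, while the small quantization error illustrates why this class of optimisation can be "near-lossless".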

These four core technologies were independently developed by the Tsinghua University TSAIL team and ShengShu Technology. They carry milestone significance and far-reaching impact, both for breakthroughs in AI multimodal foundation models and for their industrial-scale deployment. In particular, SageAttention is the first method to enable low-bit attention acceleration and has already been deployed at scale across the industry.

For example, SageAttention has been integrated into NVIDIA's inference engine TensorRT, and has been deployed in production on major GPU platforms such as Huawei Ascend and Moore Threads S6000. In addition, leading technology companies and teams worldwide, including Tencent Hunyuan, ByteDance Doubao, Alibaba Tora, Shengshu Vidu, Zhipu Qingying, Baidu PaddlePaddle, Kunlun Wanwei, Google Veo3, SenseTime, and vLLM, have adopted the technology in their core products, generating substantial economic value.

Turning Minutes into Seconds

The impact of TurboDiffusion is substantial. On open-source 1.3B and 14B text-to-video models, TurboDiffusion achieves a 100× end-to-end speedup, peaking at 200×, on a single RTX 5090 GPU, significantly reducing generation time while maintaining visual quality. The code and models are open-sourced and can be deployed directly.

When applied to ShengShu Technology's proprietary Vidu video model, similar gains are observed. For example, generating a 1080p, 8-second high-quality video, which previously required around 900 seconds, can now be completed in approximately 8 seconds. What once took minutes is now achieved in seconds.
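As a quick sanity check on those figures, the implied end-to-end speedup follows directly from the two reported timings:

```python
# Back-of-envelope check using the timings reported in the release.
baseline_s = 900.0  # ~900 s for a 1080p, 8-second clip before acceleration
turbo_s = 8.0       # ~8 s with TurboDiffusion
speedup = baseline_s / turbo_s
print(f"implied end-to-end speedup: {speedup:.1f}x")  # → 112.5x
```

This sits squarely within the 100–200× range claimed for the open-source benchmark models.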

This shift brings AI video generation closer to real-time interaction and significantly improves usability for both creators and enterprises.

Looking ahead, ShengShu Technology will continue to invest in foundational innovation to improve efficiency, enhance user experience, and reduce the cost of creation and deployment. Through ongoing advances at the system and model level, the company aims to accelerate the real-world adoption of generative AI and usher the creative ecosystem into a more efficient new era.

For more information:

TurboDiffusion: https://github.com/thu-ml/TurboDiffusion 

SageAttention: https://github.com/thu-ml/SageAttention 

Sparse-Linear Attention: https://github.com/thu-ml/SLA 

rCM: https://github.com/NVlabs/rcm 

To learn more about Vidu, visit www.vidu.com

Vidu API is available at platform.vidu.com. 

About ShengShu Technology

Founded in March 2023, ShengShu Technology is a world-leading artificial intelligence company, specializing in the development of Multimodal Large Language Models. Driven by innovation, the company delivers cutting-edge MaaS and SaaS products that revolutionize creative production by enabling smarter, faster, and scalable content creation. With its flagship video generation platform Vidu, ShengShu Technology's solutions have reached more than 200 countries and regions around the world, spanning fields including interactive entertainment, advertising, film, animation, cultural tourism, and more.

View original content to download multimedia: https://www.prnewswire.com/news-releases/shengshu-technology-and-tsinghua-university-unveil-turbodiffusion-ushering-in-the-era-of-real-time-ai-video-generation-302648640.html

SOURCE ShengShu Technology