The number of AI video generation models continues to grow. The latest, Pyramid Flow, launched this week and produces high-quality video clips up to 10 seconds long, quickly and fully open source.

Pyramid Flow was developed by researchers from Peking University, Beijing University of Posts and Telecommunications, and Kuaishou Technology. Kuaishou also created the well-reviewed proprietary Kling AI video generator. Pyramid Flow uses a new technique in which a single AI model generates video in stages. Most stages run at low resolution, and only the final stage produces a full-resolution version.

The model is available as raw code for download on Hugging Face and GitHub. It can be tried in an online inference shell, but real use requires downloading the model code and running it on your own machine.

Vaibhav (VB) Srivastav (@reach_vb) summarized the release on X on October 10, 2024. He described it as an open-source, MIT-licensed text- and image-to-video model that rivals Gen-3, Pika, and Kling. According to his post, it is built on a training-efficient autoregressive video generation method, uses flow matching, trains on open-source datasets, and generates high-quality 10-second videos.

At inference, the model can generate a 5-second, 384p video in just 56 seconds. That is on par with or faster than many full-sequence diffusion counterparts. Runway's Gen-3 Alpha Turbo still takes the cake for speed, though: in our tests it finished in under a minute, often in just 10 to 20 seconds.

We haven't had a chance to test Pyramid Flow yet. Still, the videos posted by its creators appear remarkably lifelike, high enough in resolution, and compelling, comparable to those of proprietary offerings. Various examples are available on the project's GitHub page.
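Pyramid Flow is described as using flow matching. The sketch below illustrates only the generic idea, not Pyramid Flow's pyramidal variant or its actual code. It shows in Python how one flow matching training example is built. It assumes the common straight-line path from Gaussian noise at t = 0 to clean data at t = 1. The function names and the four-value "latent" are hypothetical.

```python
import random

def make_flow_matching_pair(data, t, rng):
    """Build one training example for generic, straight-line flow matching.

    Assumed convention (hypothetical here): t = 0 is pure Gaussian noise,
    t = 1 is clean data, and the path between them is a straight line.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in data]
    # Point on the straight-line path between the noise sample and the data at time t.
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    # Along a straight line the velocity is constant: data minus noise.
    target_velocity = [d - n for n, d in zip(noise, data)]
    return noise, x_t, target_velocity

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between a model's predicted velocity and the target."""
    squared = [(p - q) ** 2 for p, q in zip(predicted_velocity, target_velocity)]
    return sum(squared) / len(squared)

rng = random.Random(0)
clean_latent = [0.5, -1.0, 2.0, 0.25]  # hypothetical stand-in for a video latent
noise, x_t, v = make_flow_matching_pair(clean_latent, t=0.3, rng=rng)
# A perfect model would predict v exactly, giving zero loss.
print(flow_matching_loss(v, v))  # 0.0
```

A network trained this way learns to predict the velocity at any point on the path. Generation then integrates that velocity from noise toward data. According to its authors, Pyramid Flow's contribution is to run most of that trajectory at lower resolutions.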
Pyramid Flow is available now to download and use, even for commercial and enterprise purposes. It is designed to compete directly with paid proprietary offerings such as Runway's Gen-3 Alpha, Luma's Dream Machine, Kling, and Hailuo. Unlimited generation subscriptions to those services can cost users hundreds or even thousands of dollars a year.

As AI video providers keep racing to gain users, Pyramid Flow aims to bring more efficiency and flexibility to developers, artists, and creators seeking advanced video generation capabilities.

A new technique for high-quality AI videos: ‘pyramidal flow matching’

AI video generation is computationally intensive because it typically involves modeling large spatiotemporal spaces. Traditional methods often require separate models for different stages of the process, which limits flexibility and makes training more complex.

Pyramid Flow is built on pyramidal flow matching, a method that drastically cuts the computational cost of video generation while maintaining high visual quality. It carries out generation as a series of “pyramid” stages, and only the final stage operates at full resolution.

The method is described in a paper that has not yet been peer-reviewed, “Pyramidal Flow Matching for Efficient Video Generative Modeling.” It was posted to the open-access preprint server arXiv on October 8, 2024. The authors include Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, and Zhouchen Lin. Most of these researchers are affiliated with Peking University, while others are from Kuaishou Technology.

As the authors write, compressing and optimizing video generation at different stages leads to faster convergence during training. This allows Pyramid Flow to generate more samples per training batch.
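Running most stages at low resolution saves compute, and a back-of-the-envelope Python sketch shows why. All values here are hypothetical and chosen only for illustration, not taken from the paper: a 48 by 80 token grid at full resolution, three stages, each lower stage halving both spatial sides, and the same number of sampling steps per stage. The sketch only shows that halving each side of the grid cuts the token count by four.

```python
def pyramid_token_counts(height_tokens, width_tokens, num_stages):
    """Token count per stage, ordered from coarsest to finest.

    Hypothetical assumption: each stage below full resolution halves
    both spatial dimensions of the latent token grid.
    """
    counts = []
    for stage in range(num_stages):
        # The finest stage has scale 1; each coarser stage shrinks by another 2x per side.
        scale = 2 ** (num_stages - 1 - stage)
        counts.append((height_tokens // scale) * (width_tokens // scale))
    return counts

counts = pyramid_token_counts(48, 80, num_stages=3)
print(counts)  # [240, 960, 3840]

steps_per_stage = 10  # hypothetical: same number of sampling steps at every stage
# Tokens processed if the steps are spread across the pyramid stages...
pyramid_work = sum(c * steps_per_stage for c in counts)
# ...versus running every step at full resolution.
full_res_work = counts[-1] * steps_per_stage * len(counts)
print(pyramid_work / full_res_work)  # 0.4375
```

This counts tokens processed linearly. Attention cost grows faster than linearly with token count, so under these assumptions the real savings from the coarse stages would be larger still.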
For example, the proposed pyramidal flow reduces the token count by a factor of four compared to traditional diffusion models, which makes training more efficient. The model can produce 5- to 10-second videos at 768p resolution and 24 frames per second, and it was trained entirely on open-source datasets.

Specifically, the paper states that Pyramid Flow was trained on the following datasets:
LAION-5B, a large dataset for multimodal AI research;
CC-12M, a dataset of web-crawled image-text pairs;
SA-1B, which features high-quality, non-blurred images;
and WebVid-10M and OpenVid-1M, video datasets widely used for text-to-video generation.
In total, the authors curated approximately 10 million single-shot videos.

However, critics have attacked many of these “public” or “open source” datasets in recent years for including copyrighted material without the permission or informed consent of the copyright holders. LAION-5B in particular has been accused of containing links to child sexual abuse material. Separately, Runway is among the companies being sued by artists in a class-action lawsuit for training on materials without permission, compensation, or consent, allegedly in violation of U.S. copyright law. The case is still being argued in court.

Permissively licensed, open source for commercial usage

Pyramid Flow is released under the MIT License, which allows a wide range of uses, including commercial applications, modification, and redistribution, provided the copyright and license notice are preserved. That makes Pyramid Flow an attractive option for developers and companies looking to integrate the model into proprietary systems. It could also challenge Luma AI and Runway. Both companies are looking to sell paid application programming interfaces to developers who want to build their proprietary AI video generation technology into customer- or employee-facing apps.
Yet those proprietary models are already available through inference services suited to developers. Pyramid Flow has a demo inference on Hugging Face, but that demo is not suitable for building full applications. Users would need to host their own inference, which could be costly even though the model itself is “free.”

Pyramid Flow may also prove enticing to film studios looking to use AI to gain efficiencies, cut costs, and explore new creative tools. Lionsgate, a major film studio that owns the John Wick and Twilight film franchises among many other titles, recently signed a deal with Runway for an unspecified sum to train a custom AI video generation model. In addition, Titanic and Terminator director James Cameron joined the board of Stability AI, a provider of AI video and image models. Stability AI is named in the same class-action lawsuit from artists as Runway.

Using Pyramid Flow,