According to Cointelegraph, Meta introduced a new suite of artificial intelligence models called 'Movie Gen' on October 4, capable of generating photorealistic videos up to 16 seconds long, complete with sound effects and backing music. While it is not the first multimodal AI model to generate video and audio from text prompts, Movie Gen appears to demonstrate state-of-the-art capabilities: Meta's researchers claim it outperformed rival systems in human evaluations.
Meta's blog post reveals that Movie Gen can output video at 16 frames per second (FPS). For context, traditional Hollywood films are typically shot at 24 FPS to achieve the 'film look.' While higher frame rates are preferred in gaming and other graphical applications, Meta's 16 FPS output approaches the look of professional movie imagery. The models can generate entirely new videos from simple text prompts, or modify existing images and videos to replace or alter objects and backgrounds.
One of Movie Gen's most advanced features is its ability to generate up to 45 seconds of audio, including sound effects and background music, synchronized with the motion in the generated videos. Despite these advancements, Meta is keeping the foundation models behind Movie Gen under wraps for now. The company has not provided a timeframe for the product's launch, stating that further safety testing is required before deployment.
A research paper from Meta's AI team indicates that the Movie Gen models were developed for research purposes and require multiple improvements before deployment. To prevent misuse, the company plans to incorporate safety models that reject input prompts or generated outputs that violate its policies.