Meta showcases its Movie Gen AI model, combining video with sound
Meta has unveiled Movie Gen, a media-focused generative AI model designed to assist and inspire filmmakers regardless of the scale of the project.
Movie Gen turns a text prompt into a video with sound, but intrigued prospective users will have to wait, as there is no public release on the horizon.
Despite this, Meta said on Friday (Oct. 4) that it is “sharing this research because we believe in the power of this technology to help people express themselves in new ways and to provide opportunities to people who might not otherwise have them.”
The company said it hopes one day people will be able to “bring their artistic visions to life” with universal access to the program.
Meta’s Movie Gen is not a single model. It combines a “cast” of foundation models, the most powerful of which is the text-to-video model. When an AI video is produced, sound is generated to match its setting or theme.
This could be the sound of a train leaving a station platform, or heavy rain during a thunderstorm. Music will also be added if deemed appropriate.
Meta Movie Gen is on the scene! Our breakthrough generative AI research for media enables:
-turning text into video
-creation of personalized video
-precision video editing
-audio creation
And while it’s just research today, we can’t wait to see all the ways people enhance… pic.twitter.com/I4Bq9if3eK
— Meta (@Meta) October 4, 2024
How was Meta’s Movie Gen trained?
Movie Gen and its four ‘capabilities’ (video generation, personalized video generation, precise video editing, and audio generation) are said to have been trained using “a combination of licensed and publicly available datasets”, with video content likely to have been obtained from Meta’s platforms such as Facebook and Instagram.
Editing has long been a stumbling block for video generators. Mark Zuckerberg’s company has addressed that, to an extent.
Movie Gen introduces a text-based editing method for basic edits. A prompt such as “change the background to a cityscape night sky” changes only what it specifies and leaves the rest of the video untouched.
The AI generator can produce up to 16 seconds of video at 16 frames per second or, alternatively, 10 seconds at 24 frames per second.
The video output is 768 pixels in width. That may recall the era of 1024×768 displays, but it is more than enough to combine with other HD formats.
Some may have expected voice generation, but Meta has valid reasons for not taking that step. Speech generation is very difficult to master, and it is also controversial given the rise of deepfake content.
That risk is heightened by the run-up to the U.S. presidential election later this year and by the current political climate.
Meta has set out its intentions with the publication of its research on AI video generation and what it is currently capable of.
Image credit: Meta
The post Meta showcases its Movie Gen AI model, combining video with sound appeared first on ReadWrite.