OpenAI has introduced Sora, a new neural network for video generation. The company says Sora “can create realistic and imaginative scenes from text instructions.” The text-to-video model lets users create photorealistic videos up to a minute long, at Full HD resolution (1920 × 1080 pixels), from text descriptions.
Sora is capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” OpenAI said in a blog post. The company also notes that the neural network understands how objects “exist in the physical world” and can “accurately interpret prompts and generate compelling characters that express vibrant emotions.”
The model can also generate video from a still image, fill in missing frames in an existing video, or extend it. Demos created with Sora and featured on the OpenAI blog include a California Gold Rush scene, a video that appears to be shot from inside a Tokyo train, and more. Many of them contain artifacts that give away the work of artificial intelligence, such as the suspiciously moving floor in the museum video. OpenAI itself says that the model “may struggle with accurately simulating the physics of a complex scene,” but overall the results are quite impressive.
A couple of years ago, it was text-to-image generators like Midjourney that best demonstrated AI’s ability to turn words into images. Generative video has been improving at a remarkable pace lately, however, with companies like Runway and Pika showing off impressive text-to-video models. Google’s Lumiere looks set to become one of OpenAI’s main competitors in this space. Like Sora, Lumiere gives users text-to-video tools and also lets them create videos from a still image.
Currently, Sora is available only to “red teamers,” testers who assess the model for potential harms and risks. OpenAI is also granting access to a number of visual artists, designers, and filmmakers in order to gather feedback. The company notes that the existing model may not accurately simulate the physics of a complex scene and may misinterpret some instances of cause and effect.
Earlier this month, OpenAI announced it was adding watermarks to images from its text-to-image tool DALL-E 3, but noted that they can be easily removed. As with its other AI products, OpenAI will have to contend with the consequences of fake, photorealistic AI-generated videos being passed off as real.