Sora, the new AI from the creators of ChatGPT, generates videos with surprising realism

Artificial intelligence is already multimodal. OpenAI, the developer of the popular ChatGPT, last Thursday unveiled a new tool that generates videos from text descriptions.

The tool is called Sora, the Japanese word for "sky." Shortly after its presentation, the news spread across the internet and social networks. What drew so much attention to this AI? The striking realism of its images.

Just as we have grown used to with other generative tools for text and images, the AI follows the user's instructions in both subject and style. Its videos can be up to one minute long.

The company said on its blog that Sora can create a video from scratch based on text instructions, or start from a still image used as a reference, and can then extend the footage with new material.

“We are teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” OpenAI explained when presenting the new text-to-video tool. “Sora can generate videos up to one minute long while maintaining visual quality and fidelity to the user’s instructions,” it added.

Multimodal AI

OpenAI presented its ChatGPT chatbot in 2022, although a year earlier it had already launched DALL-E, its tool for generating images from text. ChatGPT, an AI specialized in dialogue, became famous almost immediately: within a few months it had already accumulated 100 million users.

Although artificial intelligence has been the subject of experimentation for decades, the popularity of OpenAI’s products pushed these tools into the mainstream of the technology world. Since then, other world-class companies such as Google, Meta and even Apple have been working on their own projects.

Other video-generation models already exist, but they are still several steps behind their text and still-image counterparts. Sora shifts this picture somewhat, since it can output videos in a variety of resolutions and aspect ratios, up to 1080p.

Months ago, in a conversation with El Comercio, César Beltran, coordinator of the Artificial Intelligence Research Group at the Pontificia Universidad Católica del Perú (PUCP), pointed out that artificial intelligence would become multimodal, that is, it would combine text, image, video and sound. We have already arrived. What will be the next step?

Sora Availability

OpenAI CEO Sam Altman announced the launch of Sora on the social network X. For now, Sora is not open to the public; it is only available to some researchers and video creators. Still, the fascination it stirred up on Elon Musk’s social network was such that Altman has shared several more examples.

The model has a deep understanding of language, which allows it to accurately interpret prompts and generate compelling characters that express vibrant emotions, the company explains.

Here is an example of a ‘prompt’ that OpenAI used for one of its videos: “An elegant woman walks down a Tokyo street full of bright neon lights and animated city signs. She is wearing a black leather jacket, a long red dress and black boots, and carries a black bag. She is wearing sunglasses and red lipstick. She walks confidently and carefree. The street is wet and reflective, creating a mirror effect of colored lights. Many pedestrians walk about.”

Here is one more example: “A white and orange tabby cat is seen running happily through a dense garden, as if chasing something. Its eyes are wide and happy as it runs forward, scanning the branches, flowers and leaves as it goes. The path is narrow as it winds through all the plants. The scene is captured from a ground-level angle, closely following the cat, providing a low and intimate perspective. The image is cinematic, with warm tones and a grainy texture. The daylight scattered among the leaves and plants above creates a warm contrast that accentuates the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.”

The team with access to the tool will be tasked with testing its capabilities and finding bugs, as well as assessing how easily it can be used to circumvent OpenAI’s terms of service, which prohibit “extreme violence, sexual content, hateful imagery, celebrity likeness or the intellectual property of others.”

Technology in the spotlight

OpenAI has been sued on more than one occasion for alleged copyright infringement in the training of its generative artificial intelligence tools, which ingest massive amounts of material scraped from the internet and imitate the images or text contained in those datasets. Among those suing the company are international news outlets such as The New York Times.

The emergence of tools as realistic as Sora opens up possibilities for the entertainment industry and audiovisual production, but it also carries risks, such as the proliferation of ‘deepfakes’, manipulated videos passed off as real. The technology industry faces great challenges in the coming years.

