Google presented the next generation of its image and video artificial intelligence (AI) models on Tuesday at the I/O 2025 event. These multimodal AI models, dubbed Imagen 4 and Veo 3, bring new features and improvements over their predecessors. Imagen 4 offers faster generation and better text rendering, while Veo 3 adds native audio generation, letting it include background sounds and dialogue in the videos it creates. Along with the new models, the tech giant launched Flow, a new AI-powered filmmaking tool.
What's New with Imagen 4 and Veo 3?
In a blog post, the Mountain View-based tech giant detailed its new image and video generation models. Imagen 4 arrives over a year after its predecessor was launched. In December 2024, Google also released Veo 2 and added new features to Imagen 3.
Imagen 4 focuses on generation speed and accuracy. Like the previous generation, the new Imagen model accepts text and images as input. Its output shows improved rendering of fine details such as intricate fabrics, water droplets, and animal fur. It also generates images considerably faster than its predecessor.
Google claims that Imagen 4 produces better images in both photorealistic and abstract styles. It can generate output in a range of aspect ratios and at resolutions of up to 2K. The company has also improved text rendering, with a focus on spelling and typography. The model is now more context-aware when positioning text, choosing font sizes, and selecting creative font styles.
Imagen 4 is now available in the Gemini app, Whisk, Vertex AI (for enterprises), and Workspace apps including Docs, Slides, and Vids. It is unclear whether Google intends to roll the model out to all Gemini users or only to paid subscribers. Later this year, the company plans to release a variant of the model that can generate images 10 times faster than Imagen 3.
Google's latest video generation model, Veo 3, now includes native audio generation, allowing it to add ambient sound effects, background noise, and dialogue to its videos. In a demo shown at I/O 2025, two animated characters conversed with each other in clear, natural-sounding voices.
Flow is an AI-powered filmmaking tool that uses the Gemini, Imagen, and Veo models. Users can describe a clip in a natural-language prompt, and the tool will generate an eight-second video. Flow reportedly offers strong prompt adherence and can keep characters, locations, objects, and styles consistent across frames. It is available to Google AI Pro and Ultra plan subscribers in the United States.