Yesterday at Google I/O 2026, Demis Hassabis officially unveiled Gemini Omni — the model that leaked three days ago inside the Gemini app. Multiple outlets are already framing it as Nano Banana for video: conversational AI video editing, with the model deciding what to change and what to keep.
Here is what is actually confirmed, where Omni sits in the existing video-to-video landscape, and what it means for AI video editing workflows today.
What Gemini Omni Actually Ships
The first public variant is Gemini Omni Flash, rolling out same-day to Gemini app subscribers and Flow, with free access in YouTube Shorts and YouTube Create. The confirmed surface:
- 10-second clips — the cap for Flash. Google describes this as a compute deployment choice, not a model limit, and says longer durations are in the pipeline.
- Multimodal input — text, image, audio, and video can be combined in a single prompt. The model produces video output.
- Conversational editing — change characters, backgrounds, objects, or motion via plain-text instructions, with identity preserved across edits.
- Implicit masking — when you ask to edit a part of a clip, the model segments and modifies only the relevant region. There is no user-facing rotoscoping UI; the masking happens inside the pipeline.
- Speed — early reports point to frontier-quality generations finishing in under a minute for a 10-second clip. That is a meaningful gap versus older video-to-video models that took several minutes per pass.
The “Nano Banana for video” framing is fair. The paradigm is the same: tell the model what to change, let it figure out where and how, and keep everything you did not ask it to change.
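For planning purposes, the 10-second cap on Flash means any longer shot has to be broken into pieces before it can be sent for editing. A minimal sketch of that arithmetic follows; the function name and the default cap parameter are illustrative assumptions, not part of any Google API, and whether identity holds across separately edited segments is something the sketch does not address.

```python
def split_into_segments(duration_s: float, cap_s: float = 10.0) -> list[tuple[float, float]]:
    """Split a clip of duration_s seconds into (start, end) windows of at most cap_s seconds.

    Hypothetical helper: cap_s defaults to the 10-second limit reported for Omni Flash.
    """
    if duration_s <= 0 or cap_s <= 0:
        raise ValueError("duration_s and cap_s must be positive")

    segments = []
    index = 0
    # Multiply rather than accumulate so start times do not drift with float error.
    while index * cap_s < duration_s:
        start = index * cap_s
        end = min(start + cap_s, duration_s)
        segments.append((start, end))
        index += 1
    return segments
```

A 25-second shot would come back as three windows, the last one shorter than the cap; a clip already at or under the cap comes back as a single window.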
Where It Sits in the Video-to-Video Landscape
Omni is not the first video-to-video editing model. It is the first one with Google’s distribution and a credible claim to frontier quality at this latency.
- Runway Aleph was the first notable entry. It demonstrated in-context video editing (adding objects, changing camera angles, relighting) and produced strong demos. In practice, it was hard to use reliably outside a narrow set of shots.
- Kling and LTX Video both added video-to-video editing during 2025. Kling shipped strong motion quality on short clips; LTX stays useful as an open-source option that runs on consumer hardware.
- Wan 2.1 VACE from Alibaba was the first open-source unified generation and editing model: masks, repainting, and spatio-temporal extension, all released with open weights under a permissive license. For teams that need on-prem deployment or want to fine-tune, it remains the most flexible option.
- mago.studio carved out a niche on professional VFX-style video-to-video — style transfer, frame-to-frame consistency, motion design.
Omni’s contribution is not the category. It is doing this fast enough, and well enough, that the conversational editing pattern becomes practical for short content at scale.
Early Caveats
We have tested Omni inside the Gemini app since the launch. Two things stand out.
The safety filter produces a lot of false positives on video input. Clips that pass any reasonable bar for general audiences are getting blocked. Google acknowledged similar over-cautious behavior on Nano Banana at launch and later tuned it down, so this will likely improve, but right now it limits what you can actually test.
There is no API yet. Google has said one will land on Vertex AI in the coming weeks. Until then, Omni is a product, not a building block, and any pipeline that would depend on it for production work stays on hold.
What This Means on Tellers
Tellers is multi-model by design. The agent already orchestrates AI video creation across Veo 3.1, Seedance 2, Runway Gen 4.5, Kling, LTX, Hailuo, HappyHorse, and others, picking the right tool per shot rather than locking the platform to one provider.
When the Omni API ships, we will integrate it and put it through the same evaluation we run on every new model — latency, identity preservation, audio sync, conditioning surface, output rights. If it earns its place, the Tellers agent will reach for it on the shots where it is the right tool. If a different model is better for the shot in front of you, that one will get the call instead.
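To make the per-shot selection idea concrete, here is a minimal sketch of weighted scoring across the evaluation criteria listed above. It is an illustration under stated assumptions, not the Tellers agent's actual logic: the class, the function, the model names, the weights, and every score are hypothetical placeholders, and treating API availability and output rights as hard gates is a design choice made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    """Hypothetical evaluation record for one video model."""
    name: str
    api_available: bool   # hard gate: the model must be callable
    rights_cleared: bool  # hard gate: output rights must be acceptable
    scores: dict[str, float] = field(default_factory=dict)  # 0..1 per criterion


def pick_model(models: list[ModelProfile], weights: dict[str, float]) -> ModelProfile:
    """Return the eligible model with the highest weighted score for a shot."""
    eligible = [m for m in models if m.api_available and m.rights_cleared]
    if not eligible:
        raise ValueError("no eligible model for this shot")

    def total(model: ModelProfile) -> float:
        # Criteria missing from a profile count as 0.
        return sum(w * model.scores.get(criterion, 0.0) for criterion, w in weights.items())

    return max(eligible, key=total)


# Placeholder data: none of these numbers are measurements of real models.
CRITERIA = ("latency", "identity_preservation", "audio_sync", "conditioning_surface")
models = [
    ModelProfile("not_yet_callable", False, True, dict.fromkeys(CRITERIA, 1.0)),
    ModelProfile("model_a", True, True, {"latency": 0.9, "identity_preservation": 0.5,
                                         "audio_sync": 0.5, "conditioning_surface": 0.5}),
    ModelProfile("model_b", True, True, {"latency": 0.5, "identity_preservation": 0.9,
                                         "audio_sync": 0.6, "conditioning_surface": 0.6}),
]
# A shot where keeping a character's identity matters most.
shot_weights = {"latency": 0.2, "identity_preservation": 0.4,
                "audio_sync": 0.2, "conditioning_surface": 0.2}
chosen = pick_model(models, shot_weights)
```

With these placeholder numbers, the model without an API is skipped however well it scores, which mirrors where Omni stands until it is callable. Changing the weights per shot is what lets a faster model win when turnaround matters more than identity.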
What is Gemini Omni?
Gemini Omni is Google's new multimodal video model, announced at Google I/O 2026 on May 19, 2026. It accepts text, image, audio, and video as input and produces edited or generated video output through conversational prompts. The first variant shipping is Gemini Omni Flash.
How long are Gemini Omni clips?
Gemini Omni Flash generates and edits clips up to 10 seconds long. Google has described this as a deployment choice tied to compute, with longer durations in the pipeline rather than a hard model limit.
Is Gemini Omni available via API?
Not yet. Google has said the API will arrive in the coming weeks via Vertex AI. Today, Omni is available inside the Gemini app and Flow for paying subscribers, with free access through YouTube Shorts and YouTube Create.
Is Gemini Omni available on Tellers?
Not yet. Tellers will integrate Gemini Omni as soon as the API ships. The Tellers agent is multi-model, so Omni will join existing video models like Veo 3.1, Seedance 2, Runway Gen 4.5, Kling, and LTX rather than replace them.
How does Gemini Omni compare to Runway Aleph?
Runway Aleph was the first widely discussed video-to-video editing model — strong demos but limited production usability. Gemini Omni targets the same conversational editing pattern with faster turnaround and stronger consistency, but ships without timeline-level masking or user-controlled rotoscoping.
Next Steps
Omni is the most credible video-to-video launch of 2026 so far, and the wait for the API should be short, since Google has said it will arrive in the coming weeks. If you want a workflow that already combines top AI video editing models with full timeline control, and that will pick up Omni the moment it is callable, open Tellers and start one project. The model selection updates as the landscape moves.