The Tellers Agent Can Now Edit Images, Use Masks, and Search Pexels Faster

Tellers is not just a collection of AI video tools. It is an agentic video creation system.

That means the important question is not only “what features are available?” but also “what can the agent now do for you?”

With the latest update, the Tellers agent can do more of the visual work required to create a video. It can search Pexels faster, use both stock videos and stock images, generate multiple image outputs, and protect parts of an image with masks before editing the rest.

These improvements are especially useful when creating quick, low-cost videos. Tellers can automatically find relevant B-roll, assemble visuals around a script, audio recording, voice-over, or song, and produce a first version quickly.

When using Fast mode, the same agentic workflow runs with a smaller, cheaper, faster model configuration. Fast mode does not unlock different features; it simply prioritizes speed and cost over deeper reasoning.

The Tellers Agent Now Has Better Visual Tools

Most video creation workflows involve many small visual decisions.

You need footage that matches a sentence.
You need an image that supports a voice-over.
You need a scene that fits the tone of the music.
You need to keep one part of an image while changing another.
You need several visual options before choosing the right one.

In Tellers, these are not separate manual steps. They are tasks the agent can perform as part of the video creation workflow.

The latest update improves three parts of that workflow:

  • Faster and broader Pexels search
  • Mask selection for AI image editing
  • Multi-image output for visual generation

Together, these make the agent better at finding, editing, and generating the right visuals for a video.

Pexels Search Is Now Faster and Supports Images

Tellers has supported Pexels since the start of the project. This update does not introduce Pexels; it improves how the Tellers agent uses it.

The agent can now run multiple Pexels searches at once. This makes the search process faster and helps reduce token consumption, because the agent can explore several visual directions in fewer steps.

Tellers also now supports Pexels images in addition to Pexels videos.

This matters because not every scene needs generated visuals. Sometimes the fastest and most cost-effective option is to use high-quality stock footage or a real photograph. Stock assets are especially useful when you want to create a video quickly and cheaply.

For example, if your script mentions:

“A founder preparing for a product launch”

The agent can search for relevant B-roll such as startup teams, laptops, workspaces, planning sessions, product shots, or abstract business footage. It can then select the best visual assets and place them into the video automatically.
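To make that concrete, here is a minimal sketch of the multi-search idea using the public Pexels API. The endpoints and auth header are Pexels' documented ones; the query list, concurrency setup, and selection logic are illustrative, not Tellers' actual implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import requests

HEADERS = {"Authorization": os.environ["PEXELS_API_KEY"]}

def search_videos(query: str, per_page: int = 5) -> list[dict]:
    """Run one Pexels video search and return the raw hits."""
    resp = requests.get(
        "https://api.pexels.com/videos/search",
        headers=HEADERS,
        params={"query": query, "per_page": per_page},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("videos", [])

def search_photos(query: str, per_page: int = 5) -> list[dict]:
    """Run one Pexels photo search and return the raw hits."""
    resp = requests.get(
        "https://api.pexels.com/v1/search",
        headers=HEADERS,
        params={"query": query, "per_page": per_page},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("photos", [])

# Several visual directions for one scene, explored in parallel
queries = ["startup team planning", "laptop workspace", "product launch desk"]

with ThreadPoolExecutor() as pool:
    video_hits = list(pool.map(search_videos, queries))
    photo_hits = list(pool.map(search_photos, queries))

for query, videos, photos in zip(queries, video_hits, photo_hits):
    print(f"{query}: {len(videos)} videos, {len(photos)} photos")
```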

This is what makes the Tellers agent powerful: it does the search, selection, and assembly work for you.

What B-Roll Means in Tellers

B-roll is supporting footage or imagery used to illustrate the main message of a video.

If the main content is a voice-over, narration, podcast clip, song, or script, the B-roll is what appears visually while the audio plays. It helps the viewer understand and feel the story.

In a traditional workflow, finding B-roll means searching manually through stock libraries, downloading clips, importing them into an editor, cutting them to match the timing, and repeating that process for every section of the video.

In Tellers, the agent does that automatically.

You can provide a script, an audio file, a voice recording, or music. The agent analyzes the content, understands what each part needs visually, and finds or creates relevant B-roll to illustrate it.

This can include:

  • Stock videos from Pexels
  • Stock images from Pexels
  • AI-generated images
  • Edited images
  • Visuals based on reference frames
  • Assets selected from your own footage or media library

The goal is simple: give the agent the content, and let it build the visual layer around it.

Mask Selection Lets the Agent Protect Parts of an Image

Mask selection gives the Tellers agent more control when editing images.

Instead of regenerating an entire image, the agent can protect specific areas and modify only the parts that should change.

For example, you might want to:

  • Keep a product unchanged while replacing the background
  • Protect a person’s face while changing the scene around them
  • Keep a logo or brand element intact
  • Remove an object without altering the rest of the frame
  • Regenerate only one part of an image that does not fit the video

This is important because video generation often depends on continuity. If an image is almost right, you do not always want to regenerate it from scratch. You want to keep what works and only fix what does not.

Mask selection lets the agent do that.

It can preserve the important parts of an image, then use AI editing to modify or regenerate the remaining areas. This gives creators more control while keeping the workflow agent-driven.
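As a rough sketch of how masked editing works in practice: with OpenAI's image edit endpoint, a mask marks protected pixels as opaque and editable pixels as fully transparent. The model name, file names, and region coordinates below are placeholders, and this is not Tellers' internal tooling.

```python
from PIL import Image
from openai import OpenAI

client = OpenAI()

# Build a mask the same size as the source image. Convention of
# OpenAI's images.edit endpoint: fully transparent pixels (alpha = 0)
# are editable; opaque pixels are protected.
source = Image.open("scene.png").convert("RGBA")
mask = Image.new("RGBA", source.size, (0, 0, 0, 255))  # start fully protected

# Carve out an editable region (placeholder coordinates: here, the
# top half of the frame, e.g. the background behind a product).
editable = Image.new("RGBA", (source.width, source.height // 2), (0, 0, 0, 0))
mask.paste(editable, (0, 0))
mask.save("mask.png")

# "gpt-image-1" is OpenAI's published image model; the post mentions
# GPT Image 2, so substitute whatever identifier applies.
result = client.images.edit(
    model="gpt-image-1",
    image=open("scene.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="Replace the background with a bright studio; keep the product unchanged",
)
```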

Multi-Image Output Gives the Agent More Visual Options

Tellers now supports multi-image output from GPT Image 2.

This allows the agent to generate several image outputs in one step. Instead of requesting one image, checking it, then prompting again, the agent can produce multiple options and choose the most useful one for the video workflow.
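For illustration, with OpenAI's Python SDK a single call can return several candidates via the n parameter. The model identifier below is "gpt-image-1", OpenAI's published image model; the post refers to GPT Image 2, so swap in the identifier that applies in your setup.

```python
from openai import OpenAI

client = OpenAI()

# One request, several candidates: the agent can compare the outputs
# and keep the one that best fits the scene.
result = client.images.generate(
    model="gpt-image-1",  # placeholder; see note above
    prompt="Title card: AI agents are changing how teams create content",
    n=4,
    size="1024x1024",
)

for i, image in enumerate(result.data):
    print(f"candidate {i}: {len(image.b64_json or '')} base64 chars")
```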

GPT Image 2 brings coherence preservation across a few frames. Tellers builds on top of that with its own agentic approach.

The Tellers agent is already designed to preserve coherence across many generations by automatically choosing relevant reference frames and prompts. This means the agent can maintain visual consistency over longer workflows, not just across a small set of generated images.

For video creation, this is especially useful.

A single video may need many related visuals: scene backgrounds, cutaways, title cards, product shots, transitions, and variations of the same concept. The agent can use multi-image output to explore options faster, then continue the workflow with the images that best match the story.

Why This Matters for Video Creation

The biggest benefit of these updates is not that users have more buttons to click.

The benefit is that the Tellers agent can now make better visual decisions automatically.

For a simple video, you can provide a script, audio recording, voice-over, or song. The agent can then (see the sketch after this list):

  1. Understand the content
  2. Break it into visual moments
  3. Search for relevant stock footage or images
  4. Generate visuals when stock assets are not enough
  5. Edit images while preserving important areas with masks
  6. Select the best outputs
  7. Assemble the video
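In code, that stock-first loop might look like the skeleton below. Every helper here is an illustrative stub that mirrors the numbered steps; none of it is a real Tellers API.

```python
from dataclasses import dataclass

@dataclass
class Moment:
    text: str   # the sentence this visual should illustrate
    query: str  # a stock-search query derived from that sentence

def split_into_moments(script: str) -> list[Moment]:
    """Steps 1-2: break the content into visual moments."""
    sentences = [s.strip() for s in script.split(".") if s.strip()]
    return [Moment(text=s, query=s.lower()) for s in sentences]

def search_stock(query: str) -> list[str]:
    """Step 3: stand-in for the parallel Pexels search shown earlier."""
    return []  # pretend nothing matched, to exercise the fallback

def generate_visual(prompt: str) -> str:
    """Steps 4-5: stand-in for generation and masked editing."""
    return f"generated:{prompt}"

def build_video(script: str) -> list[str]:
    clips = []
    for moment in split_into_moments(script):
        candidates = search_stock(moment.query)        # stock first
        if not candidates:                             # generate only if needed
            candidates = [generate_visual(moment.text)]
        clips.append(candidates[0])                    # step 6: pick the best
    return clips                                       # step 7: assemble

print(build_video("AI agents are changing content. Describe what you want."))
```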

When speed and cost matter, stock footage and stock images are often the best starting point. That is why the improved Pexels search is important. The agent can find relevant B-roll quickly, use real footage where appropriate, and avoid unnecessary generation.

For simple videos, the agent can produce a first version in less than 30 seconds, especially when it can rely on stock footage instead of slower generation-heavy workflows.

Example: Turning a Script Into a Video

Imagine you provide this script:

“AI agents are changing how teams create content. Instead of manually searching for visuals, editing images, and assembling timelines, teams can now describe what they want and let the agent build the video.”

The Tellers agent can automatically search for relevant B-roll: teams working, people editing content, abstract AI visuals, laptops, timelines, creative workflows, and product scenes.

If it finds strong stock footage, it can use it directly.

If it finds a useful Pexels image that is close but not perfect, it can use mask selection to protect the parts that matter and regenerate the rest.

If the video needs a more specific visual, it can generate multiple image options, compare them, and continue with the best one.

The result is not just faster image generation. It is a more complete agentic workflow for creating videos.

Available Now

These updates are live in Tellers v0.0.237.

The Tellers agent can now use improved multi-search for Pexels, add Pexels images in addition to videos, edit images with mask selection, and generate multiple image outputs with GPT Image 2.

If you are creating videos from scripts, audio recordings, voice-overs, or music, the agent can automatically find or generate relevant B-roll and assemble the video for you.

Try it on Tellers

FAQ

What is B-roll?

B-roll is supporting footage used to illustrate or enrich the main story of a video. For example, if a voice-over talks about a startup team building a product, the B-roll could show people working on laptops, product shots, office scenes, or abstract visuals that support the message. In Tellers, the agent automatically finds or generates relevant B-roll to illustrate a script, audio recording, song, or voice-over.

What is mask support in the Tellers image generation tool?

Mask support lets the Tellers agent protect specific parts of an image before modifying or regenerating the rest. This is useful when you want to keep a subject, product, face, logo, or important visual area unchanged while asking the agent to edit another part of the image.

How does multi-image output work in Tellers?

The Tellers agent can now request multiple image outputs in one generation step. This uses GPT Image 2's ability to preserve coherence across a few frames, while Tellers adds its own agentic layer to choose relevant reference frames and prompts across larger video generation workflows.

What changed with Pexels in Tellers?

Pexels was already available in Tellers from the start of the project. This update improves how the agent searches Pexels by allowing multiple searches at once, which helps reduce token usage and speed up the process. Tellers also now supports Pexels images in addition to Pexels videos.

When should I use stock footage in Tellers?

Stock footage is ideal when you want to create videos quickly and cheaply. The Tellers agent can automatically find relevant stock B-roll and use it to assemble a first version of a video much faster than generation-heavy workflows.

What is Fast mode in Tellers?

Fast mode is a model and reasoning preset. It uses a smaller, faster, cheaper configuration so the agent can respond more quickly and cost less to run. It does not have special features: the agentic workflow is the same, but the reasoning budget is lower than in more advanced modes.

Can Tellers create videos from scripts, audio, or music?

Yes. Tellers is built around an agentic workflow: the agent analyzes your script, audio, voice recording, or music, then finds or generates relevant visuals to illustrate it. This can include stock footage, Pexels images, AI-generated images, and edited images.

Does this work through the Tellers API?

Yes. These capabilities are available through the Tellers API, so developers can build automated video creation workflows where the agent searches, selects, edits, and generates visuals programmatically.
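For illustration only: the base URL, route, and payload fields below are hypothetical placeholders, not documented Tellers endpoints. The actual API reference is the source of truth for names and parameters.

```python
import os

import requests

# HYPOTHETICAL sketch: the URL, route, and field names below are
# placeholders, not documented Tellers endpoints.
api_key = os.environ["TELLERS_API_KEY"]

response = requests.post(
    "https://api.tellers.example/v1/videos",  # placeholder base URL
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "script": "AI agents are changing how teams create content...",
        "broll_sources": ["pexels_videos", "pexels_images", "generated"],
        "mode": "fast",
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```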