Creating Videos

This guide covers the complete process for generating videos in ZeroTwo Studio, including step-by-step instructions and prompt writing techniques for better results.

Video generation requires a Pro+ plan. If video generation is not available to you, upgrade in Settings → Account.

Step-by-step: generate a video

Navigate to the video workspace

Click Video in the topbar or go to /studio/video. You’ll see the prompt field, model selector, and settings controls.

Describe your video

Write a description of the video in the prompt field. Effective video prompts describe the subject, what they’re doing, the setting, camera movement, visual style, and mood.

Example prompts:

A person walking slowly through a misty forest at dawn, trees towering overhead, cinematic, slow pan
Time-lapse of a city skyline transitioning from day to night, lights flickering on across the buildings
Close-up of ocean waves crashing against rocks, slow motion, dramatic, overcast sky
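If you draft many prompts, it can help to assemble them from the elements listed above so that none is forgotten. The following is a minimal sketch, not part of ZeroTwo Studio: the `VideoPrompt` class and its field names are hypothetical, and the output is plain text to paste into the prompt field.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container for the elements of a video prompt."""
    subject: str
    action: str = ""
    setting: str = ""
    camera: str = ""
    style: str = ""
    mood: str = ""

    def render(self) -> str:
        # Subject and action form the opening phrase; the remaining
        # elements follow as comma-separated modifiers. Empty ones are skipped.
        lead = " ".join(part for part in (self.subject, self.action) if part)
        parts = [lead, self.setting, self.camera, self.style, self.mood]
        return ", ".join(part for part in parts if part)


prompt = VideoPrompt(
    subject="A person",
    action="walking slowly through a misty forest at dawn",
    setting="trees towering overhead",
    camera="slow pan",
    style="cinematic",
)
print(prompt.render())
```

Leaving a field empty simply drops it from the rendered text, so the same structure works for short and detailed prompts.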

Select a model

Choose a video generation model from the model dropdown. Different models have different strengths — see video models for guidance.

Set output settings

Configure the output before generating:

Duration: How long the clip should be (typically 2–10 seconds depending on model)
Aspect ratio: Landscape (16:9), Portrait (9:16), or Square (1:1)
Format: MP4, WebM, or MOV — see supported formats
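When planning clips outside the app, the settings above can be checked with a small helper. This is an illustrative sketch only: `OutputSettings`, `validate`, and `width_for_height` are hypothetical names, and the 2 to 10 second default is the typical range mentioned above, so the actual limits depend on the model you select.

```python
from dataclasses import dataclass

# Values taken from the settings list above; the names are illustrative.
ASPECT_RATIOS = {"landscape": (16, 9), "portrait": (9, 16), "square": (1, 1)}
FORMATS = {"mp4", "webm", "mov"}


@dataclass(frozen=True)
class OutputSettings:
    duration_s: float
    aspect: str
    fmt: str


def validate(settings: OutputSettings, min_s: float = 2.0, max_s: float = 10.0) -> list:
    """Return a list of problems; an empty list means the settings look fine."""
    problems = []
    if not min_s <= settings.duration_s <= max_s:
        problems.append(f"duration {settings.duration_s}s is outside {min_s}-{max_s}s")
    if settings.aspect not in ASPECT_RATIOS:
        problems.append(f"unknown aspect ratio: {settings.aspect}")
    if settings.fmt.lower() not in FORMATS:
        problems.append(f"unsupported format: {settings.fmt}")
    return problems


def width_for_height(aspect: str, height: int) -> int:
    """Frame width in pixels that matches the aspect ratio at a given height."""
    w, h = ASPECT_RATIOS[aspect]
    return height * w // h


settings = OutputSettings(duration_s=5, aspect="landscape", fmt="MP4")
print(validate(settings))
print(width_for_height("landscape", 1080))
```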

Click Generate

Click the Generate button. A progress indicator appears while the video is being created. Keep the tab open: generation must complete in the browser.

Wait for generation to complete

Video generation typically takes from 30 seconds to several minutes. Longer clips and higher-quality models take more time. This is normal: AI video requires significantly more compute than image generation.

Preview and download

When generation completes, the video appears in the gallery. Click the video to preview it in the browser, then use the download icon to save it in your preferred format.

Prompt tips for better videos

Video prompts benefit from a few specific elements that don’t apply to image prompts:

Specify camera movement

AI video models respond well to explicit camera direction:

"slow pan from left to right"
"zoom out gradually revealing the full scene"
"close-up on subject, then pull back"
"static camera, no movement"
"handheld, slightly shaky for a documentary feel"
"aerial view descending toward the subject"
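One way to find out which camera direction suits a scene is to generate the same base prompt with several directions and compare the results. The sketch below only builds the prompt strings; `camera_variants` is a hypothetical helper, and each variant would still be pasted into the prompt field and generated separately.

```python
CAMERA_DIRECTIONS = [
    "slow pan from left to right",
    "static camera, no movement",
    "aerial view descending toward the subject",
]


def camera_variants(base_prompt: str, directions: list) -> list:
    """Append each camera direction to the base prompt, one variant per direction."""
    # Strip trailing punctuation so the joined prompt reads cleanly.
    base = base_prompt.rstrip(" ,.")
    return [f"{base}, {direction}" for direction in directions]


variants = camera_variants(
    "Close-up of ocean waves crashing against rocks, overcast sky",
    CAMERA_DIRECTIONS,
)
for variant in variants:
    print(variant)
```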

Describe the action clearly

Be specific about what is happening in the scene:

Instead of: a city
Try: a busy city street at night, pedestrians walking under neon lights, cars passing by

Clear, specific actions produce more coherent motion. Abstract or vague descriptions tend to produce inconsistent or confusing movement.

Set the visual style and mood

AI video models respond to style direction:

"cinematic, 24fps film look, anamorphic lens flare"
"documentary style, naturalistic lighting"
"dreamy and soft, slightly out of focus, pastel tones"
"high energy, fast cuts implied in scene energy"
"calm and meditative, slow movement"

Keep it focused

AI video currently works best with:

One or two subjects (not crowds)
Clear, simple action (not complex choreography)
Single continuous shots (not implied cuts)
Natural environments and settings

The more focused and specific the scene, the more consistent the motion and subject rendering will be.

Mention duration cues

To signal whether the clip should feel like a brief moment or a longer unfolding scene, include a duration cue:

"a brief moment, 3 seconds"
"slow-motion capture of a single action"
"a continuous 5-second shot"

Generated video actions

Once a video is generated and appears in the gallery:

Action	Description
Preview	Play the video in-browser using the built-in player
Download	Save as MP4, WebM, or MOV
Share	Generate a shareable link
Add to project	Attach the video to a ZeroTwo project (if applicable)

Chain multiple short clips together in a video editing tool (like CapCut, DaVinci Resolve, or Premiere) to build longer sequences. AI video currently produces the most consistent results in short clips — generating 3–5 second segments and editing them together is often more effective than generating a single long clip.
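If you prefer the command line to a video editor, downloaded clips can also be joined with FFmpeg's concat demuxer. The sketch below is one possible approach, not a ZeroTwo feature: the helper names and file names are hypothetical, FFmpeg must be installed separately, and stream copy (`-c copy`) assumes all clips share the same codec, resolution, and frame rate, which is most likely when they come from the same model and settings.

```python
import shlex


def concat_list_text(clips: list) -> str:
    """Build the text of a concat list file: one "file '<path>'" line per clip."""
    lines = []
    for clip in clips:
        # Single quotes inside a path are escaped as '\'' in the list format.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output: str) -> list:
    """Return the ffmpeg argument list that joins the clips without re-encoding."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_path, "-c", "copy", output,
    ]


# Hypothetical file names for three downloaded clips.
clips = ["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"]
print(concat_list_text(clips), end="")
print(shlex.join(concat_command("clips.txt", "sequence.mp4")))
```

Save the printed list as `clips.txt` next to the clips, then run the printed command in the same folder. If the clips differ in resolution or codec, re-encoding in an editor is the safer route.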
