This guide covers the complete process for generating videos in ZeroTwo Studio, including step-by-step instructions and prompt writing techniques for better results.
Video generation requires a Pro+ plan. If video generation is not available to you, upgrade in Settings → Account.

Step-by-step: generate a video

1. Navigate to the video workspace

Click Video in the top bar or go to /studio/video. The workspace shows the prompt field, model selector, and settings controls.
2. Describe your video

Write a description of the video in the prompt field. Effective video prompts describe the subject, what it is doing, the setting, camera movement, visual style, and mood.

Example prompts:
  • A person walking slowly through a misty forest at dawn, trees towering overhead, cinematic, slow pan
  • Time-lapse of a city skyline transitioning from day to night, lights flickering on across the buildings
  • Close-up of ocean waves crashing against rocks, slow motion, dramatic, overcast sky
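
If you write many prompts, it can help to assemble them from the six elements listed above so that none is forgotten. The following is a minimal sketch, not part of ZeroTwo Studio: the VideoPrompt class and its field names are hypothetical, and the only output is a plain string that you paste into the prompt field.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container for the prompt elements described in this step."""
    subject: str
    action: str
    setting: str = ""
    style: str = ""
    mood: str = ""
    camera: str = ""

    def render(self) -> str:
        # Subject and action form the core phrase; the other elements are
        # optional modifiers appended as comma-separated fragments.
        core = f"{self.subject} {self.action}".strip()
        parts = [core, self.setting, self.style, self.mood, self.camera]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = VideoPrompt(
    subject="A person",
    action="walking slowly through a misty forest at dawn",
    setting="trees towering overhead",
    style="cinematic",
    camera="slow pan",
)
print(prompt.render())
```

This reproduces the first example prompt above. Empty elements are skipped, so you can leave out mood or camera direction when they are not needed.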
3. Select a model

Choose a video generation model from the model dropdown. Different models have different strengths — see video models for guidance.
4. Set output settings

Configure the output before generating:
  • Duration: How long the clip should be (typically 2–10 seconds depending on model)
  • Aspect ratio: Landscape (16:9), Portrait (9:16), or Square (1:1)
  • Format: MP4, WebM, or MOV — see supported formats
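
If you keep your planned settings in a script or notes file, a quick check against the options listed above can catch mistakes before you generate. This is a sketch under stated assumptions: OutputSettings and its field names are hypothetical and not a ZeroTwo API, and the 2 to 10 second range is the typical range mentioned above, so the real limit for a given model may differ.

```python
from dataclasses import dataclass

# Options taken from the list above.
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
FORMATS = {"mp4", "webm", "mov"}
# Typical range only; the actual limits depend on the selected model.
MIN_SECONDS, MAX_SECONDS = 2, 10


@dataclass(frozen=True)
class OutputSettings:
    """Hypothetical record of the settings chosen in this step."""
    duration_seconds: float
    aspect_ratio: str = "16:9"
    fmt: str = "mp4"

    def problems(self) -> list[str]:
        """Return a human-readable message for every setting outside the listed options."""
        issues = []
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            issues.append(
                f"duration {self.duration_seconds}s is outside the typical "
                f"{MIN_SECONDS}-{MAX_SECONDS}s range"
            )
        if self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"unknown aspect ratio {self.aspect_ratio!r}")
        if self.fmt.lower() not in FORMATS:
            issues.append(f"unknown format {self.fmt!r}")
        return issues


print(OutputSettings(5, "9:16", "webm").problems())
print(OutputSettings(30, "4:3", "gif").problems())
```

The first call prints an empty list because every value is among the listed options; the second reports one message per invalid setting.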
5. Click Generate

Click the Generate button. A progress indicator shows while the video is being created. Do not close the tab — generation must complete in-browser.
6. Wait for generation to complete

Video generation takes 30 seconds to several minutes. Longer clips and higher-quality models take more time. This is normal — AI video requires significantly more compute than images.
7. Preview and download

When generation completes, the video appears in the gallery. Click the video to preview it in-browser, then use the download icon to save it in your preferred format.

Prompt tips for better videos

Video prompts benefit from a few specific elements that don’t apply to image prompts:

Specify camera movement

AI video models respond well to explicit camera direction:
  • "slow pan from left to right"
  • "zoom out gradually revealing the full scene"
  • "close-up on subject, then pull back"
  • "static camera, no movement"
  • "handheld, slightly shaky for a documentary feel"
  • "aerial view descending toward the subject"

Describe the action clearly

Be specific about what is happening in the scene:
  • Instead of: a city
  • Try: a busy city street at night, pedestrians walking under neon lights, cars passing by
Clear, specific actions produce more coherent motion. Abstract or vague descriptions tend to produce inconsistent or confusing movement.

Set the visual style and mood

AI video models respond to style direction:
  • "cinematic, 24fps film look, anamorphic lens flare"
  • "documentary style, naturalistic lighting"
  • "dreamy and soft, slightly out of focus, pastel tones"
  • "high energy, fast cuts implied in scene energy"
  • "calm and meditative, slow movement"

Keep it focused

AI video currently works best with:
  • One or two subjects (not crowds)
  • Clear, simple action (not complex choreography)
  • Single continuous shots (not implied cuts)
  • Natural environments and settings
The more focused and specific the scene, the more consistent the motion and subject rendering will be.

Mention duration cues

To make the clip feel like a brief moment rather than a longer unfolding scene, include a duration cue in the prompt:
  • "a brief moment, 3 seconds"
  • "slow-motion capture of a single action"
  • "a continuous 5-second shot"

Generated video actions

Once a video is generated and appears in the gallery:
Action          Description
Preview         Play the video in-browser using the built-in player
Download        Save as MP4, WebM, or MOV
Share           Generate a shareable link
Add to project  Attach the video to a ZeroTwo project (if applicable)
Chain multiple short clips together in a video editing tool (like CapCut, DaVinci Resolve, or Premiere) to build longer sequences. AI video currently produces the most consistent results in short clips — generating 3–5 second segments and editing them together is often more effective than generating a single long clip.
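
To plan a longer sequence along these lines, you can work out in advance how many short clips to generate and how long each should be. The sketch below is only an illustration: plan_segments is a hypothetical helper, it works in whole seconds, and the 3 to 5 second defaults come from the suggestion above rather than from any model limit.

```python
import math


def plan_segments(total_seconds: int, min_len: int = 3, max_len: int = 5) -> list[int]:
    """Split a target length into clip lengths that each fall within [min_len, max_len]."""
    if total_seconds < min_len:
        raise ValueError("target length is shorter than the minimum clip length")
    # Use the fewest clips that keep every clip at or under max_len.
    count = math.ceil(total_seconds / max_len)
    # Spread the seconds evenly; the first `extra` clips get one extra second.
    base, extra = divmod(total_seconds, count)
    segments = [base + 1] * extra + [base] * (count - extra)
    if min(segments) < min_len or max(segments) > max_len:
        raise ValueError("cannot split this length within the given bounds")
    return segments


print(plan_segments(14))  # [5, 5, 4]
print(plan_segments(23))  # [5, 5, 5, 4, 4]
```

Each number is the duration to set for one generation. Generate the clips separately, then join them in your video editor.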