ZeroTwo offers multiple AI video generation models, each with different strengths in motion quality, stylistic output, clip length, and prompt adherence. This page gives an overview of the available models and guidance on choosing between them.
AI video generation technology is evolving rapidly. New models are added to ZeroTwo regularly. Check the model dropdown in the video Studio for the current full list of available models.

About video generation models

Video generation models are fundamentally different from text or image models. They’re specialized for:
  • Temporal consistency: ensuring subjects, lighting, and scene elements remain coherent across frames
  • Realistic motion: generating natural-looking movement for people, objects, and environments
  • Prompt adherence: translating text descriptions into specific visual action and camera behavior
This is a significantly harder problem than generating a single image, which is why AI video quality is still evolving and generation times are longer.

Model capabilities overview

| Capability | Notes |
| --- | --- |
| Output length | Typically 2–10 seconds per clip (model-dependent) |
| Aspect ratios | Landscape (16:9), Portrait (9:16), Square (1:1) |
| Output formats | MP4, WebM, MOV |
| Plan requirement | Pro+ for all video models |
| Generation time | 30 seconds to 5+ minutes depending on model and length |

Choosing a model

Different models have different strengths. General guidance:
| If you need… | Look for… |
| --- | --- |
| Realistic human motion | Models marketed for realistic/cinematic output |
| Animated or stylized video | Models with style or animation emphasis |
| Fastest generation | Lighter/faster model variants |
| Longest clip length | Check model-specific max duration in the model dropdown |
| Best prompt adherence | Try multiple models; adherence varies significantly |
Because video generation takes minutes and uses more compute than images, it’s worth spending time on a strong prompt before generating. Review the prompt tips in the create videos guide before your first generation.

Working with short clips

AI video models produce the most consistent results with short clips — typically 2–5 seconds. Short clips:
  • Complete faster
  • Have better subject and motion consistency
  • Are easier to iterate on
For longer videos, generate multiple short clips and combine them in a video editing tool. This gives you more control over pacing and lets you replace any segment that didn’t generate well.

Prompt strategies for different model types

Different model strengths call for different prompting approaches.

For realistic/cinematic models: Focus on naturalistic descriptions: real-world settings, human subjects, natural lighting, and grounded action. These models respond well to photographic terminology: “shallow depth of field”, “natural lighting”, “handheld documentary style”.

Example: A woman in her 30s walking through a busy farmers market, warm morning light, handheld camera, documentary style, slow motion

For animated or stylized models: Lean into stylistic descriptions: art styles, color palettes, and animation characteristics. Reference genres or aesthetic movements: “Studio Ghibli style”, “cel-shaded animation”, “retro anime aesthetic”.

Example: An animated fox running through an autumn forest, Studio Ghibli style, falling leaves, warm amber and gold color palette, smooth flowing motion

For all models:
  • Keep scenes focused on one or two subjects
  • Describe camera movement explicitly
  • Specify time of day and lighting conditions
  • Mention mood and visual tone
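
The checklist above can be sketched as a small helper that assembles prompt components in a consistent order. This is purely illustrative, not a ZeroTwo API; the function name and parameters are assumptions for the example.

```python
# Hypothetical helper: assembles the recommended prompt components
# (subject, action, camera, lighting, mood) into one prompt string.
# Illustrates the "For all models" checklist; not part of ZeroTwo.

def build_video_prompt(subject, action, camera=None, lighting=None, mood=None):
    """Join the components in a consistent order, skipping any left blank."""
    parts = [subject, action, camera, lighting, mood]
    return ", ".join(p for p in parts if p)

prompt = build_video_prompt(
    subject="a woman in her 30s",
    action="walking through a busy farmers market",
    camera="handheld camera, slow motion",
    lighting="warm morning light",
    mood="documentary style",
)
```

Writing the prompt as named components makes it easier to change just the action or camera movement between iterations, which matches the refinement steps in the workflow below.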

Generation workflow tips

Before you generate

  1. Write your prompt in full before opening the model dropdown
  2. Re-read it and ask: is the subject clear? Is the action specific? Is the camera behavior described?
  3. Select the model you want to test first
  4. Generate a short clip (2–4 seconds) to validate the direction

After generation

  1. If motion is wrong but subject is right: refine the action description, keep the model
  2. If subject is wrong: revise the subject description and try again
  3. If both are off: try a different model with the same prompt to see if model selection is the issue
  4. If quality is generally low: try a different model known for higher-quality output

Building longer sequences

For content longer than 10 seconds:
  1. Break the story into distinct scenes (each 2–5 seconds)
  2. Generate each scene as a separate clip
  3. Combine the clips in a video editor (CapCut, DaVinci Resolve, Final Cut Pro, Premiere)
This approach also lets you replace any individual clip that didn’t generate well.
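
If you prefer a command-line workflow to a GUI editor, one common way to stitch MP4 clips is ffmpeg's concat demuxer. The sketch below writes the file list that demuxer expects; the clip filenames are hypothetical placeholders for your downloaded clips, and ffmpeg itself must be installed separately.

```python
# Sketch: prepare input for ffmpeg's concat demuxer to stitch short
# generated clips into one video. Filenames are placeholders.
from pathlib import Path

def write_concat_list(clips, list_path="clips.txt"):
    """Write the "file '...'" list format ffmpeg's concat demuxer expects."""
    lines = [f"file '{c}'" for c in clips]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return list_path

clips = ["scene1.mp4", "scene2.mp4", "scene3.mp4"]
list_file = write_concat_list(clips)
# Then run (clips should share codec, resolution, and frame rate):
#   ffmpeg -f concat -safe 0 -i clips.txt -c copy combined.mp4
```

Using `-c copy` avoids re-encoding, so joining is fast and lossless, but it only works when all clips were generated with matching encoding settings.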

Model updates

ZeroTwo’s video model library is updated as new and improved models become available from AI providers. The model dropdown in the video Studio always reflects the current available selection. Check the ZeroTwo changelog for announcements about new video model additions.

Creating videos

Full guide to video prompts and the generation workflow.

Supported formats

MP4, WebM, and MOV — which to choose for your use case.

Video troubleshooting

Common issues and fixes for video generation.

Image models

Compare image generation models for still-image creative work.