AI audio generation technology is evolving rapidly. New models are added to ZeroTwo regularly. Check the model dropdown in the audio Studio for the current full list.
Model categories
Audio generation models in ZeroTwo fall into three main categories:
Music generation models
Specialized for generating original music from text descriptions. These models understand genre, mood, instrumentation, tempo, and musical structure. Best for:
- Background music for videos, presentations, and apps
- Ambient soundscapes and atmospheric audio
- Jingles, intros, and branded audio pieces
- Specific genre requests (jazz, classical, electronic, etc.)
"Upbeat electronic background music, 120 BPM, synthesizer melody, suitable for a tech product demo, 60 seconds"
Voice synthesis / text-to-speech models
Generate spoken audio from text input. These models produce natural-sounding narration in various voices and styles. Best for:
- AI narration for videos and presentations
- Podcast-style spoken content
- Accessibility audio (screen reader-style narration)
- Character voices for creative projects
"Narrate the following in a warm, professional female voice at a moderate pace: [text]"
Sound effects models
Generate specific, discrete audio events such as clicks, chimes, environment sounds, and other effects. Best for:
- UI sounds and notification tones
- Environmental and ambient effects
- Production sound design
- Game audio assets
"A single wooden door knock, two knocks, natural reverb, interior setting"
Choosing the right model
| Use case | Model type to choose |
|---|---|
| Background music for a video | Music generation |
| Voiceover narration | Voice synthesis / TTS |
| App notification sound | Sound effects |
| Ambient environment audio | Music generation or sound effects |
| Podcast intro | Music generation |
| AI-read article | Voice synthesis / TTS |
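
If you pick a model type programmatically, for example in a script that prepares a batch of prompts, the table above can be expressed as a simple lookup. This is only a sketch: `MODEL_TYPE_BY_USE_CASE` and `suggest_model_types` are hypothetical names, and the strings are labels copied from the table, not ZeroTwo identifiers.

```python
# Hypothetical lookup mirroring the "Choosing the right model" table.
# Keys and category labels are illustrative, not ZeroTwo identifiers.
MODEL_TYPE_BY_USE_CASE = {
    "background music for a video": ["music generation"],
    "voiceover narration": ["voice synthesis / TTS"],
    "app notification sound": ["sound effects"],
    "ambient environment audio": ["music generation", "sound effects"],
    "podcast intro": ["music generation"],
    "ai-read article": ["voice synthesis / TTS"],
}


def suggest_model_types(use_case: str) -> list[str]:
    """Return the suggested model categories, or an empty list if unknown."""
    return MODEL_TYPE_BY_USE_CASE.get(use_case.strip().lower(), [])
```

A use case that maps to two categories, such as ambient environment audio, returns both so the caller can try each one.
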
Output formats
| Format | Best for |
|---|---|
| MP3 | Web sharing, social media, general use |
| WAV | Professional production, lossless quality, video editing |
Prompting by model type
Each audio model type responds to different prompt elements:
Prompting music generation models
The most important elements for music prompts are genre, mood, and instrumentation:
| Prompt element | Examples |
|---|---|
| Genre | jazz, classical, electronic, ambient, folk, hip-hop, cinematic |
| Mood | uplifting, tense, melancholic, energetic, peaceful, mysterious |
| Instruments | piano, acoustic guitar, orchestral strings, synthesizer, drums, bass |
| Tempo | 120 BPM, slow and deliberate, fast-paced, moderate tempo |
| Duration | 30 seconds, 60 seconds, 2 minutes |
| Purpose | background music for a product video, podcast intro, game menu music |
Cinematic orchestral piece with rising strings and dramatic percussion, building tension over 30 seconds, suitable for a movie trailer
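
The elements in the table above can be assembled into a prompt string in a consistent order. The following is a minimal sketch, assuming you build prompts in your own script before pasting them into the Studio; `MusicPrompt` and its fields are hypothetical names chosen for illustration and are not part of any ZeroTwo API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MusicPrompt:
    """Hypothetical container for the prompt elements listed in the table."""
    genre: str
    mood: str
    instruments: list[str]
    tempo: Optional[str] = None
    duration: Optional[str] = None
    purpose: Optional[str] = None

    def render(self) -> str:
        # Lead with mood and genre, the elements the table lists first.
        parts = [f"{self.mood.capitalize()} {self.genre} music"]
        if self.instruments:
            parts.append("featuring " + ", ".join(self.instruments))
        # Optional elements are appended only when provided.
        for optional in (self.tempo, self.duration):
            if optional:
                parts.append(optional)
        if self.purpose:
            parts.append(f"suitable for {self.purpose}")
        return ", ".join(parts)


prompt = MusicPrompt(
    genre="electronic",
    mood="upbeat",
    instruments=["synthesizer melody"],
    tempo="120 BPM",
    duration="60 seconds",
    purpose="a tech product demo",
)
print(prompt.render())
# Upbeat electronic music, featuring synthesizer melody, 120 BPM, 60 seconds, suitable for a tech product demo
```

Keeping each element in its own field makes it easy to vary one element at a time, such as the tempo, and compare the generated results.
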
Prompting voice synthesis models
Voice synthesis prompts focus on the text to be spoken and the voice characteristics:
| Prompt element | Examples |
|---|---|
| Voice characteristics | warm and friendly, authoritative and professional, energetic, calm and soothing |
| Gender / age | male voice, female voice, neutral, mature, young |
| Pace | slow and deliberate, conversational pace, brisk and confident |
| Accent / style | American English, British accent, news anchor style |
Read the following in a warm, professional female voice at a conversational pace, with natural pauses: [your text here]
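
When the text to narrate comes from a file or another tool, it can help to wrap it in the voice instructions automatically. The sketch below reproduces the phrasing of the example prompt above; `build_narration_prompt` and its parameters are hypothetical names for illustration, not a ZeroTwo function.

```python
def build_narration_prompt(
    text: str,
    voice: str = "warm, professional",
    gender: str = "female",
    pace: str = "conversational",
    accent: str = "",
) -> str:
    """Wrap text in voice instructions (hypothetical helper, not a ZeroTwo API)."""
    # Collapse newlines and repeated spaces so the text reads as one passage.
    text = " ".join(text.split())
    if not text:
        raise ValueError("text to narrate must not be empty")
    # Pick "a" or "an" to match the first voice descriptor.
    article = "an" if voice.lower().startswith(tuple("aeiou")) else "a"
    voice_desc = f"{article} {voice} {gender} voice"
    if accent:
        voice_desc += f" with {accent}"  # e.g. "a British accent"
    return (
        f"Read the following in {voice_desc} at a {pace} pace, "
        f"with natural pauses: {text}"
    )


print(build_narration_prompt("Welcome to the demo."))
# Read the following in a warm, professional female voice at a conversational pace, with natural pauses: Welcome to the demo.
```

Rejecting empty text early avoids submitting a prompt that contains only voice instructions and nothing to read.
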
Prompting sound effects models
Sound effect prompts should be as specific as possible about the exact sound event:
| Prompt element | Examples |
|---|---|
| Sound event | door knock, coin drop, camera click, notification chime |
| Material / character | wooden, metallic, glass, soft, sharp |
| Environment | interior, outdoor, reverberant space, dry studio |
| Duration | brief 1-second burst, 3-second sustained |
A single metallic coin dropped onto a hardwood floor, brief ring and roll, indoor environment with slight room reverb
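
Because small changes in material or environment can change a sound effect noticeably, it can be useful to generate several prompt variants and compare the results. The following sketch, using the hypothetical helper `sound_effect_variants`, combines every material with every environment from the table above; it only builds prompt strings and does not call ZeroTwo.

```python
from itertools import product


def sound_effect_variants(
    event: str,
    materials: list[str],
    environments: list[str],
    duration: str = "brief 1-second burst",
) -> list[str]:
    """Build one prompt per (material, environment) pair (hypothetical helper)."""
    return [
        f"A single {material} {event}, {duration}, {environment}"
        for material, environment in product(materials, environments)
    ]


variants = sound_effect_variants(
    "door knock",
    materials=["wooden", "metallic"],
    environments=["interior", "dry studio"],
)
for variant in variants:
    print(variant)
# A single wooden door knock, brief 1-second burst, interior
# A single wooden door knock, brief 1-second burst, dry studio
# A single metallic door knock, brief 1-second burst, interior
# A single metallic door knock, brief 1-second burst, dry studio
```
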
Model updates
ZeroTwo’s audio model library is updated as new models become available. Check the ZeroTwo changelog for announcements about newly added audio models.
Related
Creating audio
Step-by-step guide and prompt examples for all audio types.
Audio troubleshooting
Fix common issues with audio generation.
Studio overview
Overview of all three Studio sections — images, video, and audio.
Video generation
Generate AI video clips from text descriptions.

