Accessing audio generation
Navigate to /studio/audio, or click Audio in the topbar of the ZeroTwo app to open the audio workspace.
Audio Studio vs. Voice
ZeroTwo has two separate audio features. It’s important to understand the difference:
| Feature | Audio Studio | Voice |
|---|---|---|
| What it does | Generates audio files (music, sound effects, narration) from text prompts | Real-time voice conversation with AI |
| Output | Downloadable audio file (MP3, WAV, etc.) | Live spoken response during a chat session |
| Use case | Background music, sound effects, produced audio assets | Hands-free chat, accessibility, voice interaction |
| Location | /studio/audio | In-chat voice mode |
What you can create
- Background music: ambient tracks, genre-specific music, mood-driven compositions for videos, presentations, or apps
- Sound effects: UI sounds, environmental effects, specific sound descriptions
- AI narration: spoken audio of written text in various tones and styles
- Jingles and short musical pieces: branded audio, intros, outros
- Creative audio: experimental sound design, unique soundscapes
How it works
Describe your audio
Write a description of what you want: genre, mood, tempo, instruments, duration, and intended use. The more specific, the better.
Select a model
Choose an audio generation model. Different models are optimized for different audio types — music, voice, effects.
Set duration (if available)
Specify how long the audio should be, if the selected model supports duration control.
Plan requirements
| Plan | Audio generation |
|---|---|
| Free | Very limited or unavailable |
| Pro | Available |
| Pro 2x | Available |
| Plus Ultra | Unlimited |
Audio generation is primarily a feature of the Pro plan and above. Free plan users may have very limited or no access. Upgrade in Settings → Account.
Prompt essentials
Audio prompts follow different best practices depending on what you’re generating:
For background music:
Include genre, mood, instruments, tempo, and intended duration. Also specify the context (what it’s for); this helps the model calibrate energy and style:
Calm, professional background music for a corporate explainer video, piano and light strings, 60 seconds, no percussion, subtle and understated
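If you write many music prompts, it can help to fill in the same fields every time so none are forgotten. The following is a minimal sketch, not part of ZeroTwo: `MusicPrompt` and its field names are hypothetical, and the result is plain text that you paste into the Audio Studio prompt box.

```python
from dataclasses import dataclass, field


@dataclass
class MusicPrompt:
    """Hypothetical helper: one field per element the guidance asks for."""
    mood: str
    genre: str
    context: str                 # what the track is for
    instruments: list[str]
    duration_seconds: int
    tempo: str = ""              # optional, e.g. "90 BPM" or "slow"
    extras: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Assemble the fields into one comma-separated description,
        # skipping any optional field that was left empty.
        parts = [
            f"{self.mood} {self.genre} for {self.context}",
            " and ".join(self.instruments),
            self.tempo,
            f"{self.duration_seconds} seconds",
            *self.extras,
        ]
        return ", ".join(p for p in parts if p)


prompt = MusicPrompt(
    mood="Calm, professional",
    genre="background music",
    context="a corporate explainer video",
    instruments=["piano", "light strings"],
    duration_seconds=60,
    extras=["no percussion", "subtle and understated"],
)
print(prompt.render())
```

Run as written, this reproduces the example prompt above. Setting `tempo` adds one more comma-separated element before the duration.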
For narration / text-to-speech:
Provide the text you want read, plus a description of the voice — tone, pace, and style:
Read the following text in a warm, friendly female voice at a conversational pace: "Welcome to our platform. We're glad you're here."
For sound effects:
Be specific about the exact sound event, material, and environment:
A single wooden door knock, 2 knocks, interior space with light reverb, natural sound
For ambient soundscapes:
Describe the environment and the mood you want to evoke:
Quiet office ambience: distant keyboard typing, subtle air conditioning hum, occasional muffled conversation, focused and productive atmosphere
Frequently asked questions
Is Audio Studio different from Voice mode?
Yes. Audio Studio generates audio files (music, effects, narration) that you download. Voice mode is real-time spoken conversation with the AI during a chat session. They are separate features.
What file formats are available?
MP3 and WAV are the primary formats; which of them is offered depends on the selected model. Use MP3 for general sharing and WAV for professional production.
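To get a rough sense of the size difference between the two formats, you can estimate it from duration alone. This is a back-of-the-envelope sketch with assumed encoding settings (44.1 kHz, 16-bit, stereo for WAV; 192 kbps constant bitrate for MP3). The settings a given ZeroTwo model actually uses are not documented here and may differ.

```python
def wav_size_bytes(seconds: float, sample_rate: int = 44_100,
                   bit_depth: int = 16, channels: int = 2) -> int:
    """Uncompressed PCM audio size; ignores the small WAV file header."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


def mp3_size_bytes(seconds: float, bitrate_kbps: int = 192) -> int:
    """Approximate size of a constant-bitrate MP3 (assumed bitrate)."""
    return int(seconds * bitrate_kbps * 1000 / 8)


duration = 60  # seconds
print(f"WAV: {wav_size_bytes(duration) / 1e6:.1f} MB")  # WAV: 10.6 MB
print(f"MP3: {mp3_size_bytes(duration) / 1e6:.1f} MB")  # MP3: 1.4 MB
```

Under these assumptions a one-minute WAV is roughly seven times larger than the MP3, which is why MP3 suits sharing and WAV suits editing, where uncompressed audio is preferred.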
Can I use generated audio commercially?
Generated audio from ZeroTwo Studio is generally usable for commercial purposes, subject to ZeroTwo’s Terms of Service and the content policies of the underlying model providers. Review the terms for your specific use case.
Can I generate audio on the Free plan?
Free plan access to audio generation is very limited or unavailable. Audio is primarily a feature of the Pro plan and above. Upgrade in Settings → Account.
Explore further
Creating audio
Step-by-step guide with prompt examples for music, effects, and narration.
Audio models
Available audio generation models and their best use cases.
Troubleshooting
Fix common audio generation issues.

