How Voice works
ZeroTwo uses WebRTC for low-latency, bidirectional audio streaming directly in your browser. There is no file to upload and no processing delay between turns: the connection stays open for the entire session. Server-side Voice Activity Detection (VAD) automatically detects when you start and stop speaking, so you never need to press and hold a button. When you finish a sentence, ZeroTwo begins processing immediately.
10 available voices
ZeroTwo offers 10 distinct AI voices powered by the OpenAI Realtime API:
| Voice | Character |
|---|---|
| Alloy | Balanced, neutral, versatile — the default |
| Ash | Warm, conversational |
| Ballad | Expressive, nuanced |
| Cedar | Clear, professional, crisp |
| Coral | Friendly, approachable, upbeat |
| Echo | Precise, sharp, technical |
| Marin | Calm, smooth, measured |
| Sage | Thoughtful, careful, wise |
| Shimmer | Bright, energetic, lively |
| Verse | Natural, flowing, conversational |
Transcripts
Every voice session is fully transcribed. Both what you said and what ZeroTwo said in response are saved as text in the chat history entry for that conversation, and transcripts are available immediately after each exchange.
Transcripts are saved automatically. You do not need to enable anything; just make sure you are logged in and not using a private/incognito browser session.
Use cases
Hands-Free Work
Great for when your hands are occupied — cooking, commuting, exercising, or taking notes during a meeting.
Brainstorming
Thinking out loud is often faster than typing. Use Voice to explore ideas and let the transcript capture everything.
Accessibility
Voice input removes barriers for users who find typing difficult or slow.
Language Practice
Have spoken conversations in a language you are learning — ZeroTwo can respond in kind.
Long Dictation
Dictate emails, documents, or notes at speaking pace — the transcript captures everything.
Quick Questions
Ask something fast without stopping to type — ideal for quick lookups while you’re in the middle of something else.
Audio technical specs
| Property | Value |
|---|---|
| Audio format | PCM16 |
| Sample rate | 24 kHz |
| Channels | Mono |
| Transport | WebRTC (browser-native) |
| Transcription engine | Whisper-1 |
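
To make the format in the table concrete, the sketch below converts floating-point samples (the form the Web Audio API typically hands to scripts) into 16-bit PCM and computes the raw data rate implied by 24 kHz mono PCM16. The names `floatToPcm16` and `pcm16BytesPerSecond` are hypothetical helpers for illustration, not part of ZeroTwo or the OpenAI Realtime API. In a WebRTC session the browser normally handles capture and encoding itself, so you would not usually run this by hand.

```typescript
// Illustrative helpers (hypothetical names) based on the specs table:
// PCM16 samples, 24 kHz sample rate, one channel.
const SAMPLE_RATE_HZ = 24000;
const CHANNELS = 1;
const BYTES_PER_SAMPLE = 2; // PCM16 = 16 bits = 2 bytes per sample

// Convert float samples in [-1, 1] to signed 16-bit integers.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((value, i) => {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, value));
    // The int16 range is asymmetric: -32768 to 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  });
  return out;
}

// Raw (uncompressed) bytes produced per second of audio.
function pcm16BytesPerSecond(): number {
  return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE;
}
```

One second of audio in this format is 48,000 bytes (384 kbit/s) before any compression or transport overhead.
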
Plan availability
Voice is available on all ZeroTwo plans.
| Plan | Voice Access |
|---|---|
| Free | Included |
| Pro | Included |
| Pro 2x | Included |
| Plus Ultra | Included |
| Business | Included |
Requirements
- A modern browser (Chrome, Edge, Firefox, or Safari)
- Microphone access granted to zerotwo.ai
- An HTTPS connection (all zerotwo.ai pages use HTTPS by default)
- A stable internet connection (Wi-Fi recommended for best quality)
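
The checklist above maps onto a few standard browser capabilities: a secure context (HTTPS), `navigator.mediaDevices.getUserMedia` for the microphone, and `RTCPeerConnection` for WebRTC. The sketch below is one way a page might report which requirements are unmet. `VoiceEnvironment` and `missingVoiceRequirements` are hypothetical names, not ZeroTwo APIs, and the capabilities are passed in as plain booleans so the logic can run outside a browser.

```typescript
// Hypothetical capability snapshot. In a browser, each field would be
// filled from the standard API named in its comment.
interface VoiceEnvironment {
  isSecureContext: boolean; // window.isSecureContext (true on HTTPS pages)
  hasGetUserMedia: boolean; // navigator.mediaDevices.getUserMedia is a function
  hasRTCPeerConnection: boolean; // RTCPeerConnection is defined
}

// Return a readable list of unmet requirements (empty when all are met).
function missingVoiceRequirements(env: VoiceEnvironment): string[] {
  const missing: string[] = [];
  if (!env.isSecureContext) missing.push("HTTPS connection");
  if (!env.hasGetUserMedia) missing.push("microphone capture support");
  if (!env.hasRTCPeerConnection) missing.push("WebRTC support");
  return missing;
}

// Example: a secure page in a browser that lacks WebRTC.
const exampleReport = missingVoiceRequirements({
  isSecureContext: true,
  hasGetUserMedia: true,
  hasRTCPeerConnection: false,
});
console.log(exampleReport.join(", ")); // logs the single unmet item: WebRTC support
```

This only checks that the capabilities exist. Microphone permission itself is granted or denied when the page calls `getUserMedia({ audio: true })`, which triggers the browser's permission prompt.
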
Quick Links
Start a Voice Chat
Step-by-step guide to your first voice conversation
Voice Options
10 voices, audio specs, and how to change your voice
Troubleshooting
Fix microphone, audio, and connection issues

