ZeroTwo supports real-time voice conversations powered by WebRTC and the OpenAI Realtime API. Click the microphone icon in the prompt bar, start speaking, and ZeroTwo responds with a natural AI voice. The entire conversation is transcribed and saved to your chat history automatically — no extra steps required.

How Voice works

ZeroTwo uses WebRTC for low-latency, bidirectional audio streaming directly in your browser. There is no recording to upload between turns: the connection stays open for the entire session, and audio streams continuously in both directions. Server-side Voice Activity Detection (VAD) automatically detects when you start and stop speaking, so you never need to press and hold a button. As soon as you finish a sentence, ZeroTwo begins processing.
You speak → VAD detects end of turn → ZeroTwo processes → ZeroTwo responds with voice
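VAD runs server-side, so you never implement it yourself. To illustrate what "detects end of turn" means, here is a minimal sketch of a simple energy-threshold detector over PCM16 mono audio. It is a simplified stand-in, not the detector ZeroTwo or the OpenAI Realtime API actually uses; the 20 ms frame length, the `threshold` level, and the 500 ms `silence_ms` window are assumptions chosen for the example, and all function names are hypothetical.

```python
import math
import struct

SAMPLE_RATE = 24_000                              # Hz, matching the 24 kHz spec
FRAME_MS = 20                                     # assumed analysis frame length
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per frame


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of little-endian PCM16 mono audio."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_end_of_turn(frames, threshold=500.0, silence_ms=500):
    """Return the index of the frame where the turn ends, or None.

    A turn ends once speech has been heard and is followed by `silence_ms`
    of consecutive frames whose level is below `threshold` (hypothetical values).
    """
    needed_silent_frames = silence_ms // FRAME_MS
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            heard_speech = True          # speech (re)started: reset the silence count
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed_silent_frames:
                return i
    return None


def tone(amplitude: int) -> bytes:
    """One 20 ms test frame alternating between +amplitude and -amplitude."""
    pattern = [amplitude, -amplitude] * (FRAME_SAMPLES // 2)
    return struct.pack(f"<{len(pattern)}h", *pattern)


speech = [tone(3000)] * 10    # 200 ms of loud "speech"
silence = [tone(0)] * 30      # 600 ms of silence
print(detect_end_of_turn(speech + silence))  # 34
```

Here the turn ends on the 25th silent frame (500 ms / 20 ms), which is frame index 34. A brief pause shorter than the silence window does not end the turn, which is why natural hesitations mid-sentence do not cut you off.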
You can interrupt ZeroTwo at any time by simply speaking while it is responding. The current response stops and ZeroTwo addresses what you just said, making conversations feel natural rather than rigidly turn-based.
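Interruption can be pictured as a small state machine: if you start speaking while a reply is being generated or played, that reply is cancelled and your new speech becomes the next turn. The sketch below models only that logic. The class, state, and method names are hypothetical and are not part of ZeroTwo or the OpenAI Realtime API (where server VAD signals the start of speech with an `input_audio_buffer.speech_started` event).

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()   # waiting for, or receiving, user speech
    THINKING = auto()    # user turn ended, reply being generated
    SPEAKING = auto()    # assistant audio is playing


class VoiceSession:
    """Hypothetical turn-taking model with barge-in (interruption) support."""

    def __init__(self):
        self.state = State.LISTENING
        self.cancelled_responses = 0

    def on_user_speech_started(self):
        # Speaking while a reply is pending or playing cancels that reply.
        if self.state in (State.THINKING, State.SPEAKING):
            self.cancelled_responses += 1
        self.state = State.LISTENING

    def on_user_turn_ended(self):
        # VAD reported the end of the user's turn: start generating a reply.
        if self.state is State.LISTENING:
            self.state = State.THINKING

    def on_response_audio_started(self):
        if self.state is State.THINKING:
            self.state = State.SPEAKING

    def on_response_finished(self):
        if self.state is State.SPEAKING:
            self.state = State.LISTENING


session = VoiceSession()
session.on_user_turn_ended()
session.on_response_audio_started()
session.on_user_speech_started()          # user interrupts mid-reply
print(session.state.name, session.cancelled_responses)  # LISTENING 1
```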

10 available voices

ZeroTwo offers 10 distinct AI voices powered by the OpenAI Realtime API:
Voice      Character
Alloy      Balanced, neutral, versatile (the default)
Ash        Warm, conversational
Ballad     Expressive, nuanced
Cedar      Clear, professional, crisp
Coral      Friendly, approachable, upbeat
Echo       Precise, sharp, technical
Marin      Calm, smooth, measured
Sage       Thoughtful, careful, wise
Shimmer    Bright, energetic, lively
Verse      Natural, flowing, conversational
Change your voice in Settings → Preferences → Voice. See Voice Options for full descriptions and recommendations.
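Behind the settings menu, the chosen voice ultimately has to reach the Realtime API session. The sketch below validates a voice name against the ten voices listed above and builds a `session.update` event in the shape used by the beta version of the OpenAI Realtime API. How ZeroTwo actually stores and sends this preference is not documented here, so treat the helper name `build_session_update` and the exact payload as assumptions; field names differ in later API versions.

```python
import json
from typing import Optional

# The ten voices from the table above, as lowercase Realtime API voice IDs.
VOICES = frozenset({
    "alloy", "ash", "ballad", "cedar", "coral",
    "echo", "marin", "sage", "shimmer", "verse",
})
DEFAULT_VOICE = "alloy"


def build_session_update(voice: Optional[str] = None) -> str:
    """Return a JSON session.update event selecting `voice` (hypothetical helper)."""
    chosen = (voice or DEFAULT_VOICE).strip().lower()
    if chosen not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {sorted(VOICES)}")
    event = {
        "type": "session.update",
        "session": {
            "voice": chosen,
            "input_audio_format": "pcm16",                        # matches the audio specs
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "whisper-1"},  # enables transcripts
            "turn_detection": {"type": "server_vad"},             # server-side VAD
        },
    }
    return json.dumps(event)


print(build_session_update("Marin"))
```

Normalizing case lets a display name such as "Marin" map to the lowercase ID the API expects, and falling back to `alloy` mirrors the default listed in the table.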

Transcripts

Every voice session is fully transcribed. Both what you said and what ZeroTwo responded are saved as text in the chat history entry for that conversation. Transcripts are available immediately after each exchange.
Transcripts are saved automatically. You do not need to enable anything — just make sure you are logged in and not using a private/incognito browser session.
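Conceptually, a transcript is built by collecting the text of each finished turn. The sketch below assumes event shapes modeled on the beta OpenAI Realtime API: `conversation.item.input_audio_transcription.completed` for your speech (transcribed by Whisper) and `response.audio_transcript.done` for ZeroTwo's reply, each carrying a `transcript` field. Whether ZeroTwo assembles chat history this way is an assumption, and `build_transcript` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class TranscriptLine:
    speaker: str   # "You" or "ZeroTwo"
    text: str


# Event types (beta Realtime API naming, assumed) mapped to speaker labels.
SPEAKER_BY_EVENT = {
    "conversation.item.input_audio_transcription.completed": "You",
    "response.audio_transcript.done": "ZeroTwo",
}


def build_transcript(events: list) -> list:
    """Collect finished transcripts from a list of event dicts, in arrival order."""
    lines = []
    for event in events:
        speaker = SPEAKER_BY_EVENT.get(event.get("type"))
        text = (event.get("transcript") or "").strip()
        if speaker and text:   # skip audio chunks and other non-transcript events
            lines.append(TranscriptLine(speaker, text))
    return lines


events = [
    {"type": "conversation.item.input_audio_transcription.completed",
     "transcript": "What's a good name for a cat? "},
    {"type": "response.audio.delta", "delta": "..."},   # audio data, ignored
    {"type": "response.audio_transcript.done", "transcript": "How about Miso?"},
]
for line in build_transcript(events):
    print(f"{line.speaker}: {line.text}")
```

Transcription of your audio runs asynchronously, so its event can arrive after the reply has started; a production client would likely order entries by conversation item rather than by arrival, which this sketch omits for brevity.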

Use cases

Hands-Free Work

Great for when your hands are occupied — cooking, commuting, exercising, or taking notes during a meeting.

Brainstorming

Thinking out loud is often faster than typing. Use Voice to explore ideas and let the transcript capture everything.

Accessibility

Voice input removes barriers for users who find typing difficult or slow.

Language Practice

Have spoken conversations in a language you are learning — ZeroTwo can respond in kind.

Long Dictation

Dictate emails, documents, or notes at speaking pace — the transcript captures everything.

Quick Questions

Ask something fast without stopping to type — ideal for quick lookups while you’re in the middle of something else.

Audio technical specs

Property              Value
Audio format          PCM16
Sample rate           24 kHz
Channels              Mono
Transport             WebRTC (browser-native)
Transcription engine  Whisper-1
The PCM16 / 24 kHz format is optimized for real-time streaming — it prioritizes low latency while maintaining clear, intelligible speech. The mono channel reduces bandwidth requirements without meaningfully affecting voice quality for conversation.
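The bandwidth effect of these settings follows from simple arithmetic: PCM16 stores each sample as a 16-bit (2-byte) integer, so the raw data rate is sample rate × 2 bytes × 8 bits × channels. The sketch below computes that rate for mono and stereo at 24 kHz, plus the size of one 20 ms frame (the frame length is an assumption for illustration). These are uncompressed figures; audio sent over WebRTC may be compressed by the browser's codec, so actual network usage is typically lower.

```python
SAMPLE_RATE_HZ = 24_000   # 24 kHz
BYTES_PER_SAMPLE = 2      # PCM16 = 16-bit signed integers


def pcm_bitrate_kbps(channels: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Uncompressed PCM16 data rate in kilobits per second."""
    bits_per_second = sample_rate_hz * BYTES_PER_SAMPLE * 8 * channels
    return bits_per_second / 1000


def frame_bytes(frame_ms: int, channels: int = 1, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Bytes in one frame of PCM16 audio lasting `frame_ms` milliseconds."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * channels


print(pcm_bitrate_kbps(channels=1))  # 384.0 (mono)
print(pcm_bitrate_kbps(channels=2))  # 768.0 (stereo)
print(frame_bytes(20))               # 960
```

Mono halves the raw rate compared with stereo (384 vs 768 kbit/s), and since a conversation involves a single voice per direction, the second channel would add data without adding useful information.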

Plan availability

Voice is available on all ZeroTwo plans.
Plan         Voice access
Free         Included
Pro          Included
Pro 2x       Included
Plus Ultra   Included
Business     Included

Requirements

  • A modern browser (Chrome, Edge, Firefox, or Safari)
  • Microphone access granted to zerotwo.ai
  • An HTTPS connection (all zerotwo.ai pages use HTTPS by default)
  • A stable internet connection (WiFi recommended for best quality)
For the best experience, use headphones. They keep your microphone from picking up ZeroTwo's audio output, which prevents echo and gives you cleaner audio overall.
