How OpenAI delivers low-latency voice AI at scale
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
By Yi Zhang and William McDonald, Members of Technical Staff.
Key facts
- A VIP is a virtual IP address fronting the relay fleet. Combined with the port, it gives the client a single stable destination, such as `203.0.113.10:3478`, even though many relay instances sit behind it.
- Most sessions are 1:1 (one user talking to one model, or one application talking to one real-time agent), and every conversational turn is latency-sensitive.
- The team evaluated several architectures for meeting these latency goals, including TURN (Traversal Using Relays around NAT), in which an edge relay terminates client allocations and forwards traffic on the client's behalf. 2
- It keeps audio codecs, RTCP messages, data channels, recording, and per-stream policy in one place. 1
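To make the VIP and TURN bullets concrete, here is a minimal Python sketch of the first message a client might send to the relay VIP. It is a TURN Allocate request as laid out in RFC 5766 (since obsoleted by RFC 8656, which keeps the same encoding), carrying a REQUESTED-TRANSPORT attribute that asks for a UDP relay. The header layout, magic cookie, and attribute codes come from the RFCs. The helper names `build_allocate_request` and `parse_destination` are hypothetical, and nothing here claims to reflect OpenAI's relay code. The address is the example from the VIP bullet; 3478 is the default STUN/TURN port.

```python
import os
import struct
from typing import Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442      # fixed value in every STUN/TURN header
TURN_ALLOCATE_REQUEST = 0x0003      # Allocate method, request class
ATTR_REQUESTED_TRANSPORT = 0x0019   # TURN attribute: transport for the relayed address
IPPROTO_UDP = 17                    # IANA protocol number for UDP


def parse_destination(dest: str) -> Tuple[str, int]:
    """Hypothetical helper: split an IPv4 'host:port' string (IPv6 not handled)."""
    host, _, port = dest.rpartition(":")
    return host, int(port)


def build_allocate_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Hypothetical helper: encode an unauthenticated TURN Allocate request."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # REQUESTED-TRANSPORT: type, length=4, then protocol number plus 3 reserved zero bytes.
    attr = struct.pack("!HHB3x", ATTR_REQUESTED_TRANSPORT, 4, IPPROTO_UDP)
    # 20-byte header: message type, body length, magic cookie, transaction ID.
    header = struct.pack("!HHI", TURN_ALLOCATE_REQUEST, len(attr), STUN_MAGIC_COOKIE)
    return header + transaction_id + attr


if __name__ == "__main__":
    # The client addresses the single VIP:port, never an individual relay instance.
    host, port = parse_destination("203.0.113.10:3478")
    request = build_allocate_request()
    print(f"{len(request)}-byte Allocate request for {host}:{port}")
```

A real client would send these bytes over a UDP socket to the VIP and then answer the server's authentication challenge, which this sketch omits. Because the destination is one stable address, the client never needs to know which relay instance behind the VIP handles its allocation.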
Summary
Voice AI only feels natural if conversation moves at the speed of speech. At OpenAI’s scale, that translates into three concrete requirements:
- Fast connection setup, so a user can start speaking as soon as a session begins.
- Low and stable media round-trip time, with low jitter and packet loss, so turn-taking feels crisp.
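The second requirement names jitter and packet loss as media metrics to keep low. As an illustration only, the Python sketch below computes both from per-packet timestamps. Jitter uses the running interarrival estimator from RFC 3550 section 6.4.1, and loss is derived from gaps in sequence numbers. The source does not say how OpenAI measures these metrics. The `Packet` type and function names are hypothetical, timestamps are assumed to be already converted to milliseconds, and sequence-number wraparound is ignored for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int           # RTP sequence number (wraparound ignored in this sketch)
    sent_ms: float     # sender timestamp, assumed converted to milliseconds
    arrived_ms: float  # receiver clock at arrival, in milliseconds


def interarrival_jitter(packets: List[Packet]) -> float:
    """RFC 3550 estimator, J += (|D| - J) / 16, over packets in arrival order."""
    jitter = 0.0
    for prev, cur in zip(packets, packets[1:]):
        # D: change in one-way transit time between consecutive packets.
        # Clock offset between sender and receiver cancels out in the difference.
        d = (cur.arrived_ms - cur.sent_ms) - (prev.arrived_ms - prev.sent_ms)
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_fraction(packets: List[Packet]) -> float:
    """Fraction of packets missing between the lowest and highest sequence numbers seen."""
    if not packets:
        return 0.0
    seqs = [p.seq for p in packets]
    expected = max(seqs) - min(seqs) + 1
    received = len(set(seqs))  # duplicates count once
    return (expected - received) / expected
```

With a steady 30 ms transit time the jitter estimate stays at 0. A single 16 ms change in transit between two packets raises it to 1 ms, because the estimator smooths each sample with a gain of 1/16. That smoothing keeps one late packet from dominating the metric, while sustained variation in transit time still shows up.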