Podcast Episode

OpenAI Launches Three Real-Time Voice Models with GPT-5-Class Reasoning

May 8, 2026


OpenAI has released three new audio models through its Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. Together they bring stronger reasoning, live translation from more than 70 input languages, and streaming transcription to voice applications. Early adopters report large gains, including a 26-point jump in call success rates at Zillow.

A New Era for Voice AI

OpenAI has unveiled three new real-time audio models through its Realtime API, marking a significant push to make voice-powered applications smarter, more multilingual, and easier for developers to build. The trio - GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper - collectively addresses reasoning, translation, and transcription in live voice interactions.

GPT-5-Class Reasoning Comes to Voice

GPT-Realtime-2 is the headline release. OpenAI describes it as its most intelligent voice model yet and its first voice model with GPT-5-class reasoning. The model has a 128,000-token context window, four times the 32,000-token limit of its predecessor, GPT-Realtime-1.5, and supports adjustable reasoning levels from minimal to high. On audio benchmarks, OpenAI says GPT-Realtime-2 scored roughly 15 per cent higher on Big Bench than GPT-Realtime-1.5, which launched in February.

OpenAI framed the model as a shift from scripted voice bots to real-time collaborators that can listen, reason, and solve complex problems as conversations unfold.

Translation and Transcription at Scale

GPT-Realtime-Translate handles live speech translation from more than 70 input languages into 13 output languages, keeping pace with the speaker in real time. GPT-Realtime-Whisper provides streaming speech-to-text transcription with controllable latency - lower delay settings produce earlier partial text, whilst higher settings improve accuracy.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens. GPT-Realtime-Translate is priced at $0.034 per minute, whilst GPT-Realtime-Whisper costs $0.017 per minute.
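The quoted rates can be turned into a rough budget estimate. The sketch below is a minimal Python illustration, assuming the list prices stated above and nothing more: the article gives only the audio-input rate for GPT-Realtime-2, so output costs are not modelled, and the helper names are hypothetical rather than part of any OpenAI SDK.

```python
# List prices as quoted in the article (USD). Only the audio-input rate is
# given for GPT-Realtime-2; output-token pricing is deliberately left out.
REALTIME_2_AUDIO_INPUT_PER_MILLION = 32.00
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Hypothetical helper: cost of a per-minute-billed model for a duration."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * rate_per_minute


def audio_input_cost(
    input_tokens: int,
    rate_per_million: float = REALTIME_2_AUDIO_INPUT_PER_MILLION,
) -> float:
    """Hypothetical helper: cost of audio input tokens billed per million."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    minutes = 60_000  # 1,000 hours of audio
    print(f"Translate: ${per_minute_cost(minutes, TRANSLATE_PER_MINUTE):,.2f}")
    print(f"Whisper:   ${per_minute_cost(minutes, WHISPER_PER_MINUTE):,.2f}")
    # Token counts per minute of audio are not given in the article, so the
    # GPT-Realtime-2 figure takes a token count directly.
    print(f"Realtime-2 input (5M tokens): ${audio_input_cost(5_000_000):,.2f}")
```

At these rates, 1,000 hours of audio would cost about $2,040 with GPT-Realtime-Translate and $1,020 with GPT-Realtime-Whisper. Because GPT-Realtime-2 is billed per token rather than per minute, comparing it directly with the other two would require a tokens-per-minute figure that the article does not provide.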

Early Adopters Report Strong Results

Several companies took part in early testing. Zillow reported a 26-point improvement in call success rates using GPT-Realtime-2, reaching 95 per cent compared with 69 per cent on earlier models. BolnaAI reported a 12.5 per cent reduction in word error rate when evaluating GPT-Realtime-Translate for Hindi, Tamil, and Telugu.
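A "26-point" gain is an absolute change in percentage points, which is not the same as a 26 per cent relative improvement. The short Python sketch below, with hypothetical helper names, makes the distinction concrete using Zillow's reported figures. The article does not say whether BolnaAI's 12.5 per cent figure is relative or absolute, so it is not computed here.

```python
def point_change(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct: float, after_pct: float) -> float:
    """Change expressed as a percentage of the starting value."""
    if before_pct == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after_pct - before_pct) / before_pct * 100


if __name__ == "__main__":
    # Zillow's reported call success rates: 69% before, 95% with GPT-Realtime-2.
    print(f"{point_change(69, 95):.0f} points")          # 26 points
    print(f"{relative_change(69, 95):.1f}% relative")    # 37.7% relative
```

Measured relative to the 69 per cent baseline, the same result is roughly a 38 per cent improvement.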

Safety and Availability

OpenAI said the API includes safety measures such as real-time classifiers that can end conversations violating its content standards, and that the service complies with EU data residency requirements. The models are available now through OpenAI's Realtime API, giving developers a foundation for a new generation of voice-driven applications.

Published May 8, 2026 at 5:36am
