OpenAI Realtime API: Build Voice Agents with WebRTC (Beginner's Guide)
Learn how to use OpenAI's Realtime API with WebRTC to build voice agents directly in the browser. A complete beginner-friendly guide with code examples.
You've probably used OpenAI's text-based APIs, but OpenAI has a lot of other APIs to offer. This guide covers the OpenAI Realtime API, which lets you build voice agents directly in the browser.
What is OpenAI Realtime API?
Think of it like this: instead of text in, text out, you get voice in, voice out. You talk, and the model talks back (your next million-dollar voice agent!).
No BS - Straight to the Point
There are two ways to consume the OpenAI Realtime API: WebSockets and WebRTC. This guide covers the WebRTC route.
Don't worry, it's beginner-friendly, and you'll be able to implement it even if you don't know what WebRTC is.
First Things First: What Are We Doing?
We're building a system where your frontend (browser) talks directly to OpenAI's backend using something called WebRTC. Think of it like two people (called "peers") having a direct conversation — one is your browser, and the other is OpenAI.
To make this conversation happen, we need to understand two things:
1. What the Heck Is WebRTC?
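WebRTC stands for Web Real-Time Communication. It's a built-in browser feature that lets two devices exchange audio, video, and data directly. A server is only needed to help them set up the connection; once connected, the traffic usually doesn't pass through a middleman.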
It's used for:
- Video calls
- Voice chats
- Real-time messaging
- And now... talking to OpenAI in real-time!
2. What Do We Need to Make WebRTC Work?
To connect two peers (your browser and OpenAI), WebRTC uses:
A. SDP (Session Description Protocol)
Think of SDP like a handshake. It's a message that says:
"Hey, here's how I want to talk — I support audio, video, and data."
You create this message and send it to OpenAI. OpenAI replies with its own message saying:
"Cool, I can talk like that too."
This is how both sides agree on how to communicate.
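To make the handshake less abstract, here is a minimal sketch that lists the media sections inside an SDP string. The helper name `listMediaSections` and the sample SDP are made up for illustration; a real offer from the browser is much longer.

```javascript
// Hypothetical helper: pull the "m=" (media) lines out of an SDP blob.
// Each m-line has the form: m=<media> <port> <proto> <formats...>
function listMediaSections(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith("m="))
    .map((line) => {
      const [media, port, protocol] = line.slice(2).split(" ");
      return { media, port: Number(port), protocol };
    });
}

// Trimmed-down sample for illustration, not a complete offer.
const sampleSdp = [
  "v=0",
  "o=- 46117317 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",
].join("\r\n");

console.log(listMediaSections(sampleSdp).map((section) => section.media));
// → [ 'audio', 'application' ]
```

An `audio` section means a voice track is being offered, and an `application` section means a DataChannel is being offered. If one is missing from your offer, that side of the conversation was never negotiated.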
B. DataChannel
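This is the lane where event messages go back and forth.
You use it to:
- Send text (like questions) as JSON events
- Receive JSON events from OpenAI (transcripts, response status, errors)
- Update session settings (like the instructions or the voice)
Your voice does not go through the DataChannel. In OpenAI's WebRTC flow, audio travels over media tracks on the same connection: your microphone track goes out, and the model's audio comes back as a remote track.
It's like a chat tunnel between your browser and OpenAI.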
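One practical detail: a DataChannel is not usable the moment you create it, and calling `send()` before it opens throws an error. Here is a small sketch of a sender that queues events until the channel is open. The helper name `makeEventSender` is hypothetical, and the `session.update` shape in the usage comment follows OpenAI's Realtime docs as I understand them, so check the current reference before relying on it.

```javascript
// Hypothetical helper: queue JSON events until the DataChannel is open.
// `channel` is anything shaped like an RTCDataChannel
// (readyState, addEventListener, send).
function makeEventSender(channel) {
  const pending = [];

  channel.addEventListener("open", () => {
    // Flush everything that was queued while the channel was connecting.
    while (pending.length > 0) {
      channel.send(pending.shift());
    }
  });

  return function sendEvent(event) {
    const payload = JSON.stringify(event);
    if (channel.readyState === "open") {
      channel.send(payload);
    } else {
      pending.push(payload);
    }
  };
}

// Browser usage (not run here; the event shape is an assumption):
//   const sendEvent = makeEventSender(pc.createDataChannel("openai"));
//   sendEvent({ type: "session.update", session: { instructions: "Be brief." } });
```

Queueing keeps the calling code simple: you can fire events right after setup without tracking the channel state yourself.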
WebRTC in the Browser (No Fancy Packages Needed)
You use WebRTC with pure browser APIs. Here's what you do:
Create a WebRTC connection:
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }]
});
This sets up the connection and uses a STUN server to help your browser figure out its public IP (so OpenAI can reach you).
All the Important Functions & Events
[Figure: WebRTC functions and events diagram]
DataChannel
[Figure: DataChannel diagram]
ICE Candidates — What Are Those?
Sometimes your browser is behind a router or firewall. So WebRTC uses ICE (Interactive Connectivity Establishment) to figure out how to connect.
Your browser finds different ways to connect (called ICE candidates) and sends them to OpenAI. OpenAI does the same. This helps both sides find the best path to talk.
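If you want to see what kind of paths your browser found, each candidate string carries a `typ` field. The sketch below reads it out. The helper name `candidateType` and the sample candidates are made up for illustration (the IP addresses come from documentation-only ranges).

```javascript
// Hypothetical helper: read the "typ" field from an ICE candidate line.
// host  = local network address
// srflx = public address learned through a STUN server
// prflx = address discovered during connectivity checks
// relay = address on a TURN relay server
function candidateType(candidateLine) {
  const match = /\btyp (host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? match[1] : null;
}

// Made-up sample candidates.
const sampleCandidates = [
  "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.0.2.10 rport 54321",
];

console.log(sampleCandidates.map((line) => candidateType(line)));
// → [ 'host', 'srflx' ]
```

In the browser, the string to pass in is `event.candidate.candidate` from the `icecandidate` event. Seeing a `srflx` entry is a sign that the STUN server did its job.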
Hope You Got the Basic Idea!
So now you know:
- WebRTC is used to connect two peers directly
- SDP is the handshake
- DataChannel is the tunnel for messages
- ICE helps find the best connection path
Now Let's Dive Into Connecting to OpenAI
Here's the full flow of how we use WebRTC to talk to OpenAI:
Step 1: Get an Ephemeral Token
Before you can talk to OpenAI, you need permission. Your real API key must never reach the browser, so this part happens on your server.
Your backend sends a request to OpenAI's API and gets a temporary token (called an ephemeral token). This token is like a short-lived pass that says:
"Hey OpenAI, I want to start a real-time session."
Step 2: Send Token to Frontend
Your frontend (browser) calls your backend API and gets the token. Now it can use this token to start a WebRTC session with OpenAI.
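Here is a rough sketch of what the backend side of Steps 1 and 2 could look like. It assumes the `/v1/realtime/sessions` endpoint and the `client_secret.value` response field from OpenAI's Realtime docs for the preview model used in this guide; treat both as assumptions and verify them against the current API reference. The function names are hypothetical.

```javascript
// Sketch of the backend side (Node 18+ ships a global fetch).
// ASSUMPTION: endpoint, body fields, and response shape follow OpenAI's
// Realtime docs for the preview model; verify before use.
const REALTIME_SESSIONS_URL = "https://api.openai.com/v1/realtime/sessions";

function buildSessionRequest(apiKey, model) {
  return {
    url: REALTIME_SESSIONS_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // real API key: server only
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model }),
    },
  };
}

// `fetchImpl` is injectable so the function can be exercised without a network call.
async function createEphemeralToken(apiKey, model, fetchImpl = fetch) {
  const { url, options } = buildSessionRequest(apiKey, model);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const session = await response.json();
  return session.client_secret.value; // short-lived token for the browser
}
```

Your own API route would call `createEphemeralToken(process.env.OPENAI_API_KEY, model)` and return only the resulting token to the frontend, never the API key itself.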
Step 3: Create WebRTC Connection
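You create an RTCPeerConnection with ICE servers to prepare the connection. Add your microphone track and create the DataChannel (Step 6) at this point too, before generating the offer, so that both show up in the SDP.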
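In OpenAI's WebRTC flow, your voice travels over a media track on the peer connection, and the model's voice comes back the same way. Here is a minimal sketch of wiring that up. The helper name `attachAudio` is hypothetical, and it assumes you have an `<audio autoplay>` element on the page.

```javascript
// Hypothetical helper: wire microphone input and model audio output to `pc`.
// `pc` is an RTCPeerConnection, `micStream` a MediaStream from getUserMedia,
// `audioElement` an <audio autoplay> element.
function attachAudio(pc, micStream, audioElement) {
  // Outgoing: each microphone track becomes part of the SDP offer.
  for (const track of micStream.getAudioTracks()) {
    pc.addTrack(track, micStream);
  }
  // Incoming: play whatever audio stream the remote side adds.
  pc.ontrack = (event) => {
    audioElement.srcObject = event.streams[0];
  };
}

// Browser usage (not run here):
//   const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   attachAudio(pc, micStream, document.querySelector("audio"));
```

Call this before `pc.createOffer()`. Tracks added afterwards are not part of the offer you send to OpenAI.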
Step 4: Create SDP Offer
You generate an SDP offer — this is like saying:
"Here's how I want to talk — I support audio and data."
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
Step 5: Use the Token to Start the Session
Now you send the SDP offer to OpenAI using the ephemeral token. This is the actual moment you ask OpenAI to start the session:
const response = await fetch(
  'https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03',
  {
    method: 'POST',
    body: offer.sdp, // your handshake message
    headers: {
      Authorization: `Bearer ${interviewData.client_secret.value}`, // your token
      'Content-Type': 'application/sdp',
    },
  }
);
What's Happening Here?
- You're sending your SDP offer to OpenAI
- You're using the ephemeral token to prove you're allowed to start the session
- You're telling OpenAI: "Here's how I want to talk — let's start!"
OpenAI replies with an SDP answer in the response body, and you pass that to pc.setRemoteDescription() to complete the connection.
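Here is a sketch of the whole exchange in one function, including the part where the answer gets applied. The function name `exchangeSdp` and its parameters are hypothetical; the request itself mirrors the `fetch` call shown in this step.

```javascript
// Sketch: POST the local offer, then apply the returned SDP answer.
// `pc` is an RTCPeerConnection whose local description is already set,
// `token` is the ephemeral token, `fetchImpl` defaults to the global fetch.
async function exchangeSdp(pc, token, realtimeUrl, fetchImpl = fetch) {
  const response = await fetchImpl(realtimeUrl, {
    method: "POST",
    body: pc.localDescription.sdp,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/sdp",
    },
  });
  if (!response.ok) {
    throw new Error(`SDP exchange failed with status ${response.status}`);
  }
  const answerSdp = await response.text(); // the body is raw SDP, not JSON
  await pc.setRemoteDescription({ type: "answer", sdp: answerSdp });
  return answerSdp;
}
```

Checking `response.ok` matters here: an expired token comes back as an error body, and feeding that to `setRemoteDescription()` produces a confusing SDP parse error instead of a clear message.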
Step 6: Create DataChannel
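You open a DataChannel to send and receive messages. Make this call before pc.createOffer() in Step 4: a channel created after the offer isn't part of the handshake and won't open without renegotiating.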
const dataChannel = pc.createDataChannel("openai");
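Step 7: ICE Candidates
Your browser starts finding connection paths as soon as the local description is set. With the HTTP-based SDP exchange from Step 5 there is no separate endpoint for posting candidates, so you usually don't send them yourself; the handler is mainly useful for debugging.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    console.log("ICE candidate:", event.candidate.candidate); // debugging only
  }
};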
Step 8: Send Audio/Text
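You send text as JSON events through the DataChannel. Audio is different: your voice streams automatically over the microphone track on the connection, so there are no audio chunks to send by hand.
dataChannel.send(JSON.stringify({ type: "conversation.item.create", item: { type: "message", role: "user", content: [{ type: "input_text", text: "Hello!" }] } })); // text
dataChannel.send(JSON.stringify({ type: "response.create" })); // ask the model to reply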
Step 9: Receive Responses
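OpenAI replies with JSON events (text, transcripts, status updates) through the same DataChannel. The spoken audio arrives separately as a remote media track, which you play through an audio element using pc.ontrack.
dataChannel.onmessage = (event) => {
  const serverEvent = JSON.parse(event.data); // each message is a JSON event
  console.log("OpenAI says:", serverEvent.type, serverEvent);
};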
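Once you move past logging, you'll want to react to specific event types. Here is a small sketch of a router that parses each message and dispatches on its `type` field. The helper name `createEventRouter` is hypothetical, and the event names in the usage comment (`response.done`, `error`) are taken from OpenAI's Realtime docs as I understand them, so double-check them against the current event reference.

```javascript
// Hypothetical helper: parse each DataChannel message and dispatch on `type`.
// `handlers` maps an event type string to a function; `fallback` receives
// anything unhandled or unparseable.
function createEventRouter(handlers, fallback = () => {}) {
  return function onMessage(messageEvent) {
    let serverEvent;
    try {
      serverEvent = JSON.parse(messageEvent.data);
    } catch {
      return fallback({ type: "unparseable", raw: messageEvent.data });
    }
    const type = serverEvent && serverEvent.type;
    const hasHandler =
      typeof type === "string" &&
      Object.prototype.hasOwnProperty.call(handlers, type);
    return hasHandler ? handlers[type](serverEvent) : fallback(serverEvent);
  };
}

// Browser usage (not run here; event names are assumptions):
//   dataChannel.onmessage = createEventRouter({
//     "response.done": (e) => console.log("Response finished", e),
//     error: (e) => console.error("Server error", e),
//   });
```

Keeping one handler per event type scales better than a growing `if`/`else` chain inside `onmessage`, and the fallback gives you a single place to log events you haven't handled yet.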
That's It!
You've now learned:
- What WebRTC is
- How it works in the browser
- What OpenAI needs to connect
- How the ephemeral token starts the session
- The full flow from token to talking
Now go build your voice agent and make it awesome!