
OpenAI Realtime API: Build Voice Agents with WebRTC (Beginner's Guide)

Learn how to use OpenAI's Realtime API with WebRTC to build voice agents directly in the browser. A complete beginner-friendly guide with code examples.

By Abhishek Bahukhandi · 12 min read
How to use the OpenAI Realtime API with raw WebRTC in 40 seconds

A) First Things First: What Are We Doing?

We’re building a system where your frontend (browser) talks directly to OpenAI’s backend using something called WebRTC. Think of it like two people (called “peers”) having a direct conversation — one is your browser, and the other is OpenAI.

To make this conversation happen, we need to understand two things:

1. What the Heck Is WebRTC?

WebRTC stands for Web Real-Time Communication. It’s a built-in browser feature that lets two devices (peers) talk to each other directly, without a middleman server relaying the media.

It’s used for:

  • Video calls
  • Voice chats
  • Real-time messaging
  • And now... talking to OpenAI in real-time!

2. What Do We Need to Make WebRTC Work?

A. SDP (Session Description Protocol)

Think of SDP like a handshake. It’s a message that says:

“Hey, here’s how I want to talk — I support audio, video, and data.”

You create this message and send it to OpenAI. OpenAI replies with its own message saying:

“Cool, I can talk like that too.”

This is how both sides agree on how to communicate.
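Under the hood, SDP is just plain text. Here’s an illustrative fragment of what the audio part of an offer might contain (a trimmed sketch, not a complete SDP — real offers are much longer):

```
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=sendrecv
a=rtpmap:111 opus/48000/2
```

The `m=audio` line declares an audio section, `a=sendrecv` says “I both send and receive,” and `a=rtpmap` advertises the Opus codec. You never write this by hand — the browser generates it for you in createOffer().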

B. DataChannel

This is the actual lane where messages go back and forth.

You use it to:

  • Send text (like questions)
  • Receive responses from OpenAI
  • Update session info (like timestamps or speaker ID)

It’s like a chat tunnel between your browser and OpenAI.

C. Media Tracks

This is how audio is sent and received.

You attach your microphone audio using pc.addTrack().

OpenAI sends back audio the same way.

B) WebRTC in the Browser (No Fancy Packages Needed)

You use WebRTC with pure browser APIs. Here’s what you do:

Create a WebRTC connection:

const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }]
});

This sets up the connection and uses a STUN server to help your browser figure out its public IP (so OpenAI can reach you).

C) All the Important Functions & Events

RTCPeerConnection (pc)

  • createOffer(): Creates the SDP offer (the handshake message)
  • setLocalDescription(): Saves your SDP offer on your side (you still send it to OpenAI yourself)
  • setRemoteDescription(): Accepts OpenAI’s SDP answer
  • createDataChannel(): Opens the chat tunnel
  • addIceCandidate(): Adds connection options received from the other side
  • onicecandidate: Fires when your browser finds a connection option (IP/port)

DataChannel (dc)

  • send(data): Sends JSON events (like text prompts) to OpenAI
  • onmessage: Fires when a reply arrives from OpenAI
  • onopen: Tells you when the channel is ready
  • onerror: Tells you if something breaks
  • onclose: Tells you when the channel closes

D) All the Important Pieces

Audio Setup

// Capture mic and send to OpenAI
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// Play OpenAI's audio response
pc.ontrack = (event) => {
  const audioEl = document.querySelector('#assistantAudio');
  audioEl.srcObject = event.streams[0];
  audioEl.play();
};

DataChannel for Events

const dataChannel = pc.createDataChannel("oai-events");
dataChannel.onmessage = (event) => {
  console.log("OpenAI says:", event.data);
};

// Send a text message
dataChannel.onopen = () => {
  dataChannel.send(JSON.stringify({ type: "response.create", response: { instructions: "Hello!" } }));
};

E) ICE Candidates — What Are Those?

Sometimes your browser is behind a router or firewall. So WebRTC uses ICE (Interactive Connectivity Establishment) to figure out how to connect.

Your browser finds different ways to connect (called ICE candidates) and sends them to OpenAI. OpenAI does the same.

This helps both sides find the best path to talk.
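You can watch this happening in the browser. A minimal sketch, assuming `pc` is the RTCPeerConnection from earlier — note that with OpenAI’s HTTP-based SDP exchange you don’t forward candidates one by one, but logging them is handy for debugging connectivity:

```javascript
// ICE candidate strings look like:
// "candidate:842163049 1 udp 1677729535 203.0.113.7 54321 typ srflx ..."
// The token after "typ" tells you what kind of path it is.
function candidateType(candidateString) {
  const match = candidateString.match(/ typ (\S+)/);
  return match ? match[1] : "unknown";
}

// In the browser you'd attach this to your RTCPeerConnection:
//   pc.onicecandidate = (event) => {
//     if (event.candidate) {
//       // "host" = your local IP, "srflx" = your public IP discovered via STUN
//       console.log("Candidate type:", candidateType(event.candidate.candidate));
//     } else {
//       console.log("ICE gathering finished");
//     }
//   };
```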

Hope You Got the Basic Idea!

  • WebRTC is used to connect two peers directly
  • SDP is the handshake
  • DataChannel is the tunnel for messages
  • ICE helps find the best connection path

Now Let’s Dive Into Connecting to OpenAI

Here’s the full flow of how we use WebRTC to talk to OpenAI:

Step 1: Get an Ephemeral Token

Before you can talk to OpenAI, you need permission.

Your backend sends a request to OpenAI’s API and gets a temporary token (called an ephemeral token).

This token is like a one-time pass that says:

“Hey OpenAI, I want to start a real-time session.”
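A minimal backend sketch of minting that token (Node 18+ with built-in fetch). The `/v1/realtime/sessions` endpoint and the `client_secret` response shape follow OpenAI’s session-creation docs — double-check them for the models and options available to your account:

```javascript
// Build the request for minting an ephemeral token on your server.
function buildSessionRequest(apiKey, model) {
  return {
    url: "https://api.openai.com/v1/realtime/sessions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // your regular API key — server-side only!
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model }),
    },
  };
}

// On the server:
//   const { url, options } = buildSessionRequest(
//     process.env.OPENAI_API_KEY,
//     "gpt-4o-realtime-preview-2025-06-03"
//   );
//   const session = await (await fetch(url, options)).json();
//   // session.client_secret.value is the ephemeral token to hand to the browser
```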

Step 2: Send Token to Frontend

Your frontend (browser) calls your backend API and gets the token.

Now it can use this token to start a WebRTC session with OpenAI.
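On the frontend that’s a simple fetch. A sketch — `/session` is a hypothetical route name, so use whatever your backend actually exposes; the `client_secret.value` shape matches the `interviewData.client_secret.value` used later in this guide:

```javascript
// The session object nests the token under client_secret.value.
function extractToken(session) {
  return session.client_secret.value;
}

// In the browser:
//   const response = await fetch("/session"); // your backend route
//   const session = await response.json();
//   const ephemeralToken = extractToken(session);
```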

Step 3: Create WebRTC Connection

You create an RTCPeerConnection with ICE servers to prepare the connection.

Step 4: Attach Audio

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
stream.getTracks().forEach(track => pc.addTrack(track, stream));

Step 5: Create SDP Offer

You generate an SDP offer — this is like saying:

“Here’s how I want to talk — I support audio and data.”

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

Step 6: Use the Token to Start the Session

Now you send the SDP offer to OpenAI using the ephemeral token.

This is the actual moment you ask OpenAI to start the session:

const response = await fetch(
  "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
  {
    method: "POST",
    body: offer.sdp, // your handshake message
    headers: {
      Authorization: `Bearer ${interviewData.client_secret.value}`, // your ephemeral token
      "Content-Type": "application/sdp",
    },
  }
);

What’s Happening Here?
  • You’re sending your SDP offer to OpenAI.
  • You’re using the ephemeral token to prove you’re allowed to start the session.
  • You’re telling OpenAI:

“Here’s how I want to talk — let’s start!”

OpenAI replies with an SDP answer, and you use that to complete the connection.
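Completing it is one more call. A sketch assuming `pc` and `response` from the code above — the reply body is the raw SDP answer text, which you wrap in the shape setRemoteDescription expects:

```javascript
// Wrap raw SDP answer text in the object shape setRemoteDescription expects.
function toAnswer(sdpText) {
  return { type: "answer", sdp: sdpText };
}

// In the browser, after the fetch above:
//   const answerSdp = await response.text();
//   await pc.setRemoteDescription(toAnswer(answerSdp));
//   // Handshake complete — audio and events start flowing.
```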

Step 7: Use DataChannel for Text

You open a DataChannel to send and receive messages. (In practice, create it before generating the SDP offer so it gets negotiated as part of the handshake.)

const dc = pc.createDataChannel('oai-events');
dc.onmessage = (e) => console.log('OpenAI event:', e.data);
dc.onopen = () => dc.send(JSON.stringify({ type: 'response.create', response: { instructions: "Hello!" } }));

Step 8: Receive Responses

OpenAI replies with audio (through the media track you attached earlier) and with JSON events such as text transcripts (through the same DataChannel).

dc.onmessage = (event) => {
  console.log("OpenAI says:", event.data);
};
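Those events arrive as JSON strings, so parse them and branch on the type. A sketch — the type names `response.done` and `error` match OpenAI’s Realtime event docs, but check those docs for the full list your model emits:

```javascript
// Turn a raw DataChannel message into a human-readable summary.
function describeEvent(raw) {
  const event = JSON.parse(raw);
  switch (event.type) {
    case "response.done":
      return "Model finished a response";
    case "error":
      return `Something went wrong: ${event.error?.message ?? "unknown"}`;
    default:
      return `Other event: ${event.type}`;
  }
}

// In the browser:
//   dc.onmessage = (e) => console.log(describeEvent(e.data));
```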

That’s It!

You’ve now learned:

  • What WebRTC is
  • How it works in the browser
  • What OpenAI needs to connect
  • How the ephemeral token starts the session
  • The full flow from token to talking

Related Topics:

openai realtime api, webrtc tutorial, voice agent, openai voice api, realtime communication
