
OpenAI Realtime API: Build Voice Agents with WebRTC (Beginner's Guide)

Learn how to use OpenAI's Realtime API with WebRTC to build voice agents directly in the browser. A complete beginner-friendly guide with code examples.

By Abhishek Bahukhandi · 12 min read
How to use the OpenAI Realtime API with raw WebRTC in 40 seconds

A) First Things First: What Are We Doing?

We’re building a system where your frontend (browser) talks directly to OpenAI’s backend using something called WebRTC. Think of it like two people (called “peers”) having a direct conversation — one is your browser, and the other is OpenAI.

To make this conversation happen, we need to understand two things:

1. What the Heck Is WebRTC?

WebRTC stands for Web Real-Time Communication. It’s a built-in browser feature that lets two devices (peers) talk to each other directly, without a middleman server relaying the media.

It’s used for:

  • Video calls
  • Voice chats
  • Real-time messaging
  • And now... talking to OpenAI in real-time!

2. What Do We Need to Make WebRTC Work?

A. SDP (Session Description Protocol)

Think of SDP like a handshake. It’s a message that says:

“Hey, here’s how I want to talk — I support audio, video, and data.”

You create this message and send it to OpenAI. OpenAI replies with its own message saying:

“Cool, I can talk like that too.”

This is how both sides agree on how to communicate.
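Under the hood, SDP is just plain text. Here’s an illustrative fragment of what the audio part of an offer might contain (a trimmed sketch, not a complete SDP — real offers are much longer):

```
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=sendrecv
a=rtpmap:111 opus/48000/2
```

The `m=audio` line declares an audio section, `a=sendrecv` says “I both send and receive,” and `a=rtpmap` advertises the Opus codec. You never write this by hand — the browser generates it for you in createOffer().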

B. DataChannel

This is the actual lane where messages go back and forth.

You use it to:

  • Send text (like questions)
  • Receive responses from OpenAI
  • Update session info (like timestamps or speaker ID)

It’s like a chat tunnel between your browser and OpenAI.

C. Media Tracks

This is how audio is sent and received.

You attach your microphone audio using pc.addTrack().

OpenAI sends back audio the same way.

B) WebRTC in the Browser (No Fancy Packages Needed)

You use WebRTC with pure browser APIs. Here’s what you do:

Create a WebRTC connection:

const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }]
});

This sets up the connection and uses a STUN server to help your browser figure out its public IP (so OpenAI can reach you).

C) All the Important Functions & Events

RTCPeerConnection (pc)

  • createOffer(): Creates the SDP offer (the handshake message)
  • setLocalDescription(): Saves your SDP offer on your side (you still send it to OpenAI yourself)
  • setRemoteDescription(): Accepts OpenAI’s SDP answer
  • createDataChannel(): Opens the chat tunnel
  • addIceCandidate(): Adds connection options received from the other side
  • onicecandidate: Fires when your browser finds a connection option (IP/port)

DataChannel (dc)

  • send(data): Sends JSON events (like text prompts) to OpenAI
  • onmessage: Fires when a reply arrives from OpenAI
  • onopen: Tells you when the channel is ready
  • onerror: Tells you if something breaks
  • onclose: Tells you when the channel closes

D) All the Important Pieces

Audio Setup

// Capture mic and send to OpenAI
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// Play OpenAI's audio response
pc.ontrack = (event) => {
  const audioEl = document.querySelector('#assistantAudio');
  audioEl.srcObject = event.streams[0];
  audioEl.play();
};

DataChannel for Events

const dataChannel = pc.createDataChannel("oai-events");
dataChannel.onmessage = (event) => {
  console.log("OpenAI says:", event.data);
};

// Send a text message
dataChannel.onopen = () => {
  dataChannel.send(JSON.stringify({ type: "response.create", response: { instructions: "Hello!" } }));
};

E) ICE Candidates — What Are Those?

Sometimes your browser is behind a router or firewall. So WebRTC uses ICE (Interactive Connectivity Establishment) to figure out how to connect.

Your browser finds different ways to connect (called ICE candidates) and sends them to OpenAI. OpenAI does the same.

This helps both sides find the best path to talk.
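You can watch this happening in the browser. A minimal sketch, assuming `pc` is the RTCPeerConnection from earlier — note that with OpenAI’s HTTP-based SDP exchange you don’t forward candidates one by one, but logging them is handy for debugging connectivity:

```javascript
// ICE candidate strings look like:
// "candidate:842163049 1 udp 1677729535 203.0.113.7 54321 typ srflx ..."
// The token after "typ" tells you what kind of path it is.
function candidateType(candidateString) {
  const match = candidateString.match(/ typ (\S+)/);
  return match ? match[1] : "unknown";
}

// In the browser you'd attach this to your RTCPeerConnection:
//   pc.onicecandidate = (event) => {
//     if (event.candidate) {
//       // "host" = your local IP, "srflx" = your public IP discovered via STUN
//       console.log("Candidate type:", candidateType(event.candidate.candidate));
//     } else {
//       console.log("ICE gathering finished");
//     }
//   };
```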

Hope You Got the Basic Idea!

  • WebRTC is used to connect two peers directly
  • SDP is the handshake
  • DataChannel is the tunnel for messages
  • ICE helps find the best connection path

Now Let’s Dive Into Connecting to OpenAI

Here’s the full flow of how we use WebRTC to talk to OpenAI:

Step 1: Get an Ephemeral Token

Before you can talk to OpenAI, you need permission.

Your backend sends a request to OpenAI’s API and gets a temporary token (called an ephemeral token).

This token is like a one-time pass that says:

“Hey OpenAI, I want to start a real-time session.”
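A minimal backend sketch of minting that token (Node 18+ with built-in fetch). The `/v1/realtime/sessions` endpoint and the `client_secret` response shape follow OpenAI’s session-creation docs — double-check them for the models and options available to your account:

```javascript
// Build the request for minting an ephemeral token on your server.
function buildSessionRequest(apiKey, model) {
  return {
    url: "https://api.openai.com/v1/realtime/sessions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // your regular API key — server-side only!
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model }),
    },
  };
}

// On the server:
//   const { url, options } = buildSessionRequest(
//     process.env.OPENAI_API_KEY,
//     "gpt-4o-realtime-preview-2025-06-03"
//   );
//   const session = await (await fetch(url, options)).json();
//   // session.client_secret.value is the ephemeral token to hand to the browser
```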

Step 2: Send Token to Frontend

Your frontend (browser) calls your backend API and gets the token.

Now it can use this token to start a WebRTC session with OpenAI.
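On the frontend that’s a simple fetch. A sketch — `/session` is a hypothetical route name, so use whatever your backend actually exposes; the `client_secret.value` shape matches the `interviewData.client_secret.value` used later in this guide:

```javascript
// The session object nests the token under client_secret.value.
function extractToken(session) {
  return session.client_secret.value;
}

// In the browser:
//   const response = await fetch("/session"); // your backend route
//   const session = await response.json();
//   const ephemeralToken = extractToken(session);
```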

Step 3: Create WebRTC Connection

You create an RTCPeerConnection with ICE servers to prepare the connection.

Step 4: Attach Audio

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
stream.getTracks().forEach(track => pc.addTrack(track, stream));

Step 5: Create SDP Offer

You generate an SDP offer — this is like saying:

“Here’s how I want to talk — I support audio and data.”

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

Step 6: Use the Token to Start the Session

Now you send the SDP offer to OpenAI using the ephemeral token.

This is the actual moment you ask OpenAI to start the session:

const response = await fetch(
  "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
  {
    method: "POST",
    body: offer.sdp, // your handshake message
    headers: {
      Authorization: `Bearer ${interviewData.client_secret.value}`, // your ephemeral token
      "Content-Type": "application/sdp",
    },
  }
);

What’s Happening Here?
  • You’re sending your SDP offer to OpenAI.
  • You’re using the ephemeral token to prove you’re allowed to start the session.
  • You’re telling OpenAI:

“Here’s how I want to talk — let’s start!”

OpenAI replies with an SDP answer, and you use that to complete the connection.
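Completing it is one more call. A sketch assuming `pc` and `response` from the code above — the reply body is the raw SDP answer text, which you wrap in the shape setRemoteDescription expects:

```javascript
// Wrap raw SDP answer text in the object shape setRemoteDescription expects.
function toAnswer(sdpText) {
  return { type: "answer", sdp: sdpText };
}

// In the browser, after the fetch above:
//   const answerSdp = await response.text();
//   await pc.setRemoteDescription(toAnswer(answerSdp));
//   // Handshake complete — audio and events start flowing.
```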

Step 7: Use DataChannel for Text

You open a DataChannel to send and receive messages. (In practice, create it before generating the SDP offer so it gets negotiated as part of the handshake.)

const dc = pc.createDataChannel('oai-events');
dc.onmessage = (e) => console.log('OpenAI event:', e.data);
dc.onopen = () => dc.send(JSON.stringify({ type: 'response.create', response: { instructions: "Hello!" } }));

Step 8: Receive Responses

OpenAI replies with audio (through the media track you attached earlier) and with JSON events such as text transcripts (through the same DataChannel).

dc.onmessage = (event) => {
  console.log("OpenAI says:", event.data);
};
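Those events arrive as JSON strings, so parse them and branch on the type. A sketch — the type names `response.done` and `error` match OpenAI’s Realtime event docs, but check those docs for the full list your model emits:

```javascript
// Turn a raw DataChannel message into a human-readable summary.
function describeEvent(raw) {
  const event = JSON.parse(raw);
  switch (event.type) {
    case "response.done":
      return "Model finished a response";
    case "error":
      return `Something went wrong: ${event.error?.message ?? "unknown"}`;
    default:
      return `Other event: ${event.type}`;
  }
}

// In the browser:
//   dc.onmessage = (e) => console.log(describeEvent(e.data));
```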

That’s It!

You’ve now learned:

  • What WebRTC is
  • How it works in the browser
  • What OpenAI needs to connect
  • How the ephemeral token starts the session
  • The full flow from token to talking

Related Topics:

openai realtime api, webrtc tutorial, voice agent, openai voice api, realtime communication
