TL;DR
OpenAI offers two WebSocket API modes for different use cases: the Responses API WebSocket mode for agentic workflows with heavy tool calling (up to 40% faster for 20+ tool calls), and the Realtime API for low-latency voice and audio applications. Both use persistent WebSocket connections instead of stateless HTTP requests, reducing latency by eliminating repeated connection overhead and enabling event-driven, stateful interactions.
Introduction
OpenAI's API has evolved beyond simple request-response patterns. For applications requiring rapid-fire tool calls or real-time audio streaming, the traditional HTTP model creates unnecessary overhead. Every new request requires connection setup, authentication, and state transmission—even when you're continuing the same conversation.
OpenAI's WebSocket API solves this by maintaining a persistent, bidirectional connection. For agentic workflows with 20+ sequential tool calls, this translates to roughly 40% faster end-to-end execution. For voice applications, it enables natural, interrupt-capable conversations with latencies under 500ms.
This guide covers both of OpenAI's WebSocket modes: the Responses API for tool-heavy agent workflows, and the Realtime API for audio streaming. You'll learn when to use each, how to implement them, and how to test them effectively.
What is OpenAI WebSocket API?
The OpenAI WebSocket API provides an alternative transport mechanism to HTTP for interacting with OpenAI's language models. Instead of creating a new connection for each API call, WebSocket establishes a single, long-lived connection that remains open for the duration of your session.
Key Characteristics
Persistent Connection: Once established, the WebSocket connection stays open until explicitly closed, eliminating per-request connection overhead.
Bidirectional Communication: Both client and server can send messages at any time, enabling true event-driven architectures.
Stateful Sessions: The server maintains context for the current connection, allowing you to reference previous responses without re-sending full conversation history.
Event-Driven Model: Communication happens through discrete events (JSON messages) rather than request-response pairs.
WebSocket Protocol Basics
WebSocket connections start with an HTTP upgrade request, then switch to the WebSocket protocol. For OpenAI, you'll connect to endpoints like:
- Responses API: wss://api.openai.com/v1/responses
- Realtime API: wss://api.openai.com/v1/realtime?model=gpt-realtime
The wss:// scheme indicates a secure WebSocket connection (analogous to HTTPS for HTTP).
Two WebSocket Modes Explained
OpenAI provides two distinct WebSocket modes, each optimized for different use cases.
Responses API WebSocket Mode
The Responses API supports WebSocket connections for agentic workflows where you need to make many sequential tool calls. This mode is designed for coding assistants, orchestration systems, and autonomous agents that repeatedly call tools to accomplish complex tasks.
How It Works:
On an active WebSocket connection, the service maintains one previous-response state in a connection-local in-memory cache (the most recent response). When you continue a turn, you send only:
- previous_response_id (a reference to the last response)
- New input items (user messages, tool results, etc.)
The server reuses the cached state instead of re-processing the entire conversation history.
Performance Benefits:
For workflows with 20+ tool calls, OpenAI reports up to 40% faster end-to-end execution compared to HTTP. This comes from:
- No per-request connection setup
- No repeated authentication overhead
- Cached state reduces processing time
- Lower network latency for small continuation messages
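The savings compound with call count. A back-of-envelope sketch (the millisecond figures are illustrative assumptions, not measurements):

```python
# Illustrative per-request overhead (assumed values, not benchmarks)
TCP_HANDSHAKE_MS = 30   # one network round trip
TLS_HANDSHAKE_MS = 30   # TLS 1.3: one additional round trip
HEADERS_MS = 5          # HTTP header serialization/parsing

def http_overhead_ms(n_calls):
    """HTTP pays connection setup on every request."""
    return n_calls * (TCP_HANDSHAKE_MS + TLS_HANDSHAKE_MS + HEADERS_MS)

def ws_overhead_ms(n_calls):
    """WebSocket pays setup once, regardless of call count."""
    return TCP_HANDSHAKE_MS + TLS_HANDSHAKE_MS + HEADERS_MS

for n in (1, 5, 20):
    print(f"{n:>3} calls: HTTP {http_overhead_ms(n)} ms vs WebSocket {ws_overhead_ms(n)} ms")
```

At 20 calls the connection overhead alone differs by an order of magnitude under these assumptions; the cached-state reuse adds further savings on top.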
Compatibility:
WebSocket mode works with both Zero Data Retention (ZDR) and store=false options, making it suitable for privacy-sensitive applications.
Realtime API WebSocket Mode
The Realtime API provides low-latency, streaming audio capabilities for voice-driven applications. It enables speech-to-speech interactions where the model can respond to audio input with audio output, handling interruptions naturally.
How It Works:
The Realtime API uses WebSocket to create a stateful, event-driven session. You stream audio chunks to the API, and it streams back both transcriptions and generated audio responses. The connection supports:
- Audio input streaming (send audio chunks as they're captured)
- Audio output streaming (receive generated audio in real-time)
- Text input/output (for hybrid text+voice interactions)
- Automatic interruption handling (stop generation when user speaks)
Key Features:
Voice Activity Detection (VAD): The API includes semantic VAD that understands when a user has finished speaking versus just pausing. This creates more natural conversation flow.
Multimodal Capabilities: Direct access to GPT-4o's native multimodal abilities, processing both audio and text in a unified model.
Low Latency: Designed for latencies under 500ms for voice interactions, suitable for real-time conversations.
WebSocket vs HTTP: Performance Comparison
Choosing between WebSocket and HTTP depends on your application's characteristics. Here's when each protocol excels.

When WebSocket Outperforms HTTP
High Tool Call Volume:
If your workflow makes 10+ sequential tool calls, WebSocket's persistent connection eliminates repeated setup overhead. Each HTTP request requires:
- DNS lookup (if not cached)
- TCP handshake (3-way)
- TLS handshake (2 round trips for TLS 1.3)
- HTTP request/response headers
WebSocket does this once, then reuses the connection.
Latency-Sensitive Applications:
For real-time voice or chat applications where every millisecond counts, WebSocket's persistent connection and streaming capabilities significantly reduce perceived latency.
Server-Initiated Updates:
WebSocket allows the server to push data to clients without polling. For long-running agent tasks, the server can send progress updates as events occur.
When HTTP is Sufficient
Simple Request-Response:
For one-off API calls or workflows with 1-2 tool calls, HTTP is simpler to implement and debug. Most developers are familiar with HTTP clients, and infrastructure (load balancers, proxies) handles HTTP well.
Stateless Operations:
If you don't need to maintain session state between requests, HTTP's stateless nature is actually an advantage—no connection management required.
Infrastructure Constraints:
Some deployment environments (serverless functions, certain proxies) don't support long-lived WebSocket connections. HTTP works universally.
Performance Metrics
Based on OpenAI's documentation and community testing:
| Metric | HTTP | WebSocket (Responses API) | WebSocket (Realtime API) |
|---|---|---|---|
| Connection Setup | Every request (~100-300ms) | Once (~100-300ms) | Once (~100-300ms) |
| 20+ Tool Call Workflow | Baseline | ~40% faster | N/A |
| Voice Round-Trip Latency | N/A (not designed for this) | N/A | <500ms |
| Memory Overhead | Low (stateless) | Medium (cached state) | Medium-High (session state) |
| Implementation Complexity | Low | Medium | Medium-High |
How to Use Responses API WebSocket Mode
Let's implement a WebSocket connection to the Responses API for an agentic workflow.
Prerequisites
- OpenAI API key with access to the Responses API
- WebSocket client library (ws for Node.js or websocket-client for Python)
- Understanding of tool calling in OpenAI API
Connection Setup
Node.js Example:
const WebSocket = require('ws');
const fs = require('fs'); // used by the read_file tool below
// Connect to Responses API WebSocket endpoint
const ws = new WebSocket('wss://api.openai.com/v1/responses', {
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
'OpenAI-Beta': 'responses-api=v1'
}
});
ws.on('open', () => {
console.log('Connected to OpenAI Responses API');
// Send initial request
const initialMessage = {
model: 'gpt-4o',
messages: [
{ role: 'user', content: 'Help me analyze this codebase and suggest improvements.' }
],
tools: [
{
type: 'function',
function: {
name: 'read_file',
description: 'Read contents of a file',
parameters: {
type: 'object',
properties: {
path: { type: 'string', description: 'File path to read' }
},
required: ['path']
}
}
},
{
type: 'function',
function: {
name: 'search_code',
description: 'Search for code patterns',
parameters: {
type: 'object',
properties: {
pattern: { type: 'string', description: 'Regex pattern to search' }
},
required: ['pattern']
}
}
}
]
};
ws.send(JSON.stringify(initialMessage));
});
ws.on('message', (data) => {
const response = JSON.parse(data);
console.log('Received:', response);
// Check if model wants to call tools
if (response.choices[0].finish_reason === 'tool_calls') {
const toolCalls = response.choices[0].message.tool_calls;
// Execute tools (simplified)
const toolResults = toolCalls.map(call => ({
tool_call_id: call.id,
output: executeToolLocally(call.function.name, call.function.arguments)
}));
// Continue the conversation with tool results
const continuation = {
previous_response_id: response.id, // Reference previous response
input: toolResults
};
ws.send(JSON.stringify(continuation));
}
});
ws.on('error', (error) => {
console.error('WebSocket error:', error);
});
ws.on('close', () => {
console.log('Connection closed');
});
function executeToolLocally(name, args) {
// Your tool execution logic
if (name === 'read_file') {
const { path } = JSON.parse(args);
return fs.readFileSync(path, 'utf-8');
}
// ... other tools
}
Python Example:
import websocket
import json
import os
def on_message(ws, message):
response = json.loads(message)
print(f"Received: {response}")
# Handle tool calls
if response['choices'][0]['finish_reason'] == 'tool_calls':
tool_calls = response['choices'][0]['message']['tool_calls']
# Execute tools
tool_results = []
for call in tool_calls:
result = execute_tool(call['function']['name'],
json.loads(call['function']['arguments']))
tool_results.append({
'tool_call_id': call['id'],
'output': result
})
# Send continuation with only new input + previous_response_id
continuation = {
'previous_response_id': response['id'],
'input': tool_results
}
ws.send(json.dumps(continuation))
def on_error(ws, error):
print(f"Error: {error}")
def on_close(ws, close_status_code, close_msg):
print("Connection closed")
def on_open(ws):
print("Connected to OpenAI Responses API")
# Send initial request
initial_message = {
'model': 'gpt-4o',
'messages': [
{'role': 'user', 'content': 'Analyze this codebase and suggest improvements.'}
],
'tools': [
{
'type': 'function',
'function': {
'name': 'read_file',
'description': 'Read file contents',
'parameters': {
'type': 'object',
'properties': {
'path': {'type': 'string'}
},
'required': ['path']
}
}
}
]
}
ws.send(json.dumps(initial_message))
def execute_tool(name, args):
if name == 'read_file':
with open(args['path'], 'r') as f:
return f.read()
# ... other tools
# Create WebSocket connection
ws = websocket.WebSocketApp(
"wss://api.openai.com/v1/responses",
header={
"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
"OpenAI-Beta": "responses-api=v1"
},
on_open=on_open,
on_message=on_message,
on_error=on_error,
on_close=on_close
)
ws.run_forever()
Key Implementation Details
State Management:
The critical difference from HTTP is using previous_response_id in continuations. This tells the API to reuse cached state from the last response.
Input-Only Continuations:
When continuing a turn, send only:
- previous_response_id: References the cached response
- input: New data (tool results, user messages, etc.)
Don't resend the full messages array—the server already has that context.
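A minimal sketch of assembling that continuation payload (field names follow the pattern above; `previous_response_id` and `input` are the only top-level keys you need):

```python
import json

def build_continuation(previous_response_id, new_items):
    """Minimal continuation: reference the cached response and send
    only the new input items (tool results, user messages, etc.)."""
    return json.dumps({
        "previous_response_id": previous_response_id,
        "input": new_items,
    })

payload = build_continuation("resp_abc123", [
    {"tool_call_id": "call_1", "output": "done"},
])
print(payload)
```

Note what is absent: no `messages` array, no tool definitions — the server already holds that context.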
Zero Data Retention:
To use ZDR with WebSocket mode, include store: false in your initial request.
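For example, a sketch of an initial request with ZDR enabled (verify the exact field names against the current OpenAI docs):

```python
import json

# Initial request opting out of server-side response storage (ZDR)
initial_request = {
    "model": "gpt-4o",
    "store": False,  # disables server-side retention for this session
    "messages": [{"role": "user", "content": "Review this diff."}],
}
print(json.dumps(initial_request))
```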
How to Use Realtime API WebSocket Mode
The Realtime API enables low-latency voice interactions. Here's how to implement it.
Prerequisites
- OpenAI API key with Realtime API access
- Audio capture/playback capabilities
- WebSocket client library
- Audio encoding (24kHz, 16-bit, mono PCM for best results)
Connection Setup
JavaScript (Browser) Example:
// Connect to Realtime API
const ws = new WebSocket(
  'wss://api.openai.com/v1/realtime?model=gpt-realtime',
  // Browsers can't read process.env; in production, mint a short-lived
  // ephemeral token on your backend rather than exposing a raw API key
  ['realtime', 'openai-insecure-api-key.' + OPENAI_API_KEY]
);
ws.addEventListener('open', () => {
console.log('Connected to Realtime API');
// Configure session
ws.send(JSON.stringify({
type: 'session.update',
session: {
modalities: ['text', 'audio'],
voice: 'alloy',
input_audio_format: 'pcm16',
output_audio_format: 'pcm16',
turn_detection: {
type: 'server_vad', // or 'semantic_vad' for smarter detection
threshold: 0.5,
prefix_padding_ms: 300,
silence_duration_ms: 500
}
}
}));
});
ws.addEventListener('message', (event) => {
const message = JSON.parse(event.data);
switch (message.type) {
case 'session.created':
console.log('Session created:', message.session);
break;
case 'conversation.item.created':
console.log('New item:', message.item);
break;
case 'response.audio.delta':
// Received audio chunk - play it
const audioChunk = base64ToArrayBuffer(message.delta);
playAudioChunk(audioChunk);
break;
case 'response.audio_transcript.delta':
// Received transcript chunk
console.log('Transcript:', message.delta);
break;
case 'input_audio_buffer.speech_started':
console.log('User started speaking');
break;
case 'input_audio_buffer.speech_stopped':
console.log('User stopped speaking');
break;
case 'error':
console.error('API error:', message.error);
break;
}
});
// Send audio from microphone
async function streamMicrophoneToAPI() {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 24000 });
const source = audioContext.createMediaStreamSource(stream);
// Note: ScriptProcessorNode is deprecated; prefer AudioWorklet in production
const processor = audioContext.createScriptProcessor(4096, 1, 1);
processor.onaudioprocess = (e) => {
const inputData = e.inputBuffer.getChannelData(0);
// Convert Float32 to Int16 PCM
const pcmData = new Int16Array(inputData.length);
for (let i = 0; i < inputData.length; i++) {
pcmData[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
}
// Send to API
ws.send(JSON.stringify({
type: 'input_audio_buffer.append',
audio: arrayBufferToBase64(pcmData.buffer)
}));
};
source.connect(processor);
processor.connect(audioContext.destination);
}
// Send text input
function sendTextMessage(text) {
ws.send(JSON.stringify({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [
{ type: 'input_text', text: text }
]
}
}));
// Request response generation
ws.send(JSON.stringify({
type: 'response.create'
}));
}
function playAudioChunk(arrayBuffer) {
  // decodeAudioData expects an encoded container (WAV/MP3), not raw
  // PCM16 — build the AudioBuffer manually instead. In production,
  // reuse a single AudioContext rather than creating one per chunk.
  const audioContext = new AudioContext({ sampleRate: 24000 });
  const pcm = new Int16Array(arrayBuffer);
  const floatData = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    floatData[i] = pcm[i] / 32768;
  }
  const buffer = audioContext.createBuffer(1, floatData.length, 24000);
  buffer.getChannelData(0).set(floatData);
  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);
  source.start();
}
Python Example:
import websocket
import json
import base64
import os
import pyaudio
# Audio configuration
RATE = 24000
CHUNK = 4096
FORMAT = pyaudio.paInt16
CHANNELS = 1
audio = pyaudio.PyAudio()
def on_open(ws):
print("Connected to Realtime API")
# Configure session
ws.send(json.dumps({
'type': 'session.update',
'session': {
'modalities': ['text', 'audio'],
'voice': 'alloy',
'input_audio_format': 'pcm16',
'output_audio_format': 'pcm16',
'turn_detection': {
'type': 'server_vad',
'threshold': 0.5,
'silence_duration_ms': 500
}
}
}))
# Start streaming microphone
stream_microphone(ws)
def on_message(ws, message):
data = json.loads(message)
if data['type'] == 'response.audio.delta':
# Decode and play audio
audio_chunk = base64.b64decode(data['delta'])
play_audio(audio_chunk)
elif data['type'] == 'response.audio_transcript.delta':
print(f"Transcript: {data['delta']}", end='', flush=True)
elif data['type'] == 'input_audio_buffer.speech_started':
print("\n[User speaking...]")
elif data['type'] == 'error':
print(f"Error: {data['error']}")
def stream_microphone(ws):
stream = audio.open(
format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK
)
def audio_thread():
while True:
audio_data = stream.read(CHUNK)
ws.send(json.dumps({
'type': 'input_audio_buffer.append',
'audio': base64.b64encode(audio_data).decode('utf-8')
}))
import threading
threading.Thread(target=audio_thread, daemon=True).start()
def play_audio(audio_chunk):
    # Simple but latency-prone: opening a stream per chunk; reuse one
    # output stream in production
    stream = audio.open(
format=FORMAT,
channels=CHANNELS,
rate=RATE,
output=True
)
stream.write(audio_chunk)
stream.stop_stream()
stream.close()
# Create WebSocket connection
ws = websocket.WebSocketApp(
"wss://api.openai.com/v1/realtime?model=gpt-realtime",
header={
"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"
},
on_open=on_open,
on_message=on_message
)
ws.run_forever()
Key Implementation Details
Event Types:
The Realtime API uses event-driven communication. Common events:
Client → Server:
- session.update - Configure session parameters
- input_audio_buffer.append - Send audio chunks
- conversation.item.create - Add text messages
- response.create - Request AI response
Server → Client:
- session.created - Confirms session setup
- response.audio.delta - Audio chunk from AI
- response.audio_transcript.delta - Transcription of AI audio
- input_audio_buffer.speech_started/stopped - VAD events
- error - Error notifications
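One way to keep the message handler readable as the event list grows is a small dispatch table keyed on the event's type field — a sketch:

```python
import json

handlers = {}

def on(event_type):
    """Register a handler for a given Realtime API event type."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register

@on("response.audio_transcript.delta")
def handle_transcript(event):
    return f"transcript: {event['delta']}"

def dispatch(raw_message):
    """Route a raw JSON event to its registered handler."""
    event = json.loads(raw_message)
    handler = handlers.get(event["type"])
    return handler(event) if handler else None  # ignore unknown events

print(dispatch('{"type": "response.audio_transcript.delta", "delta": "Hi"}'))
```

New event types then become one decorated function each, instead of another branch in a growing switch.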
Voice Activity Detection:
Choose between two VAD modes:
server_vad: Basic voice activity detection based on audio volume and silence duration.
semantic_vad: Smarter detection that understands natural pauses vs. turn completion. Use this for more natural conversations where users might pause mid-thought.
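Switching modes is a single session.update event; a sketch mirroring the configuration shown earlier:

```python
import json

# session.update event switching turn detection to semantic VAD
semantic_vad_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "semantic_vad",  # vs. volume/silence-based "server_vad"
        }
    },
}
print(json.dumps(semantic_vad_update))
```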
Testing WebSocket Connections with Apidog
Testing WebSocket APIs differs from HTTP testing—you need to maintain a connection, send events, and monitor bidirectional message flow. Apidog provides specialized WebSocket testing capabilities.

Setting Up WebSocket Tests in Apidog
Step 1: Create WebSocket Request
In Apidog, create a new request and select "WebSocket" as the protocol. Enter your connection URL:

wss://api.openai.com/v1/responses
Step 2: Configure Headers
Add authentication headers:
Authorization: Bearer YOUR_OPENAI_API_KEY
OpenAI-Beta: responses-api=v1
For the Realtime API, you can instead connect to:
wss://api.openai.com/v1/realtime?model=gpt-realtime
with the API key passed via the Sec-WebSocket-Protocol (subprotocol) header.
Step 3: Establish Connection
Click "Connect" to establish the WebSocket connection. Apidog shows:
- Connection status (connected/disconnected)
- Latency metrics
- Connection duration
Step 4: Send Events
Use Apidog's message composer to send JSON events. For the Responses API:
{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What's the weather in San Francisco?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
}
}
}
}
]
}
Step 5: Monitor Responses
Apidog displays:
- All received messages in chronological order
- Message timestamps and sizes
- JSON formatting and syntax highlighting
- Copy/export capabilities for debugging
Testing Continuations
To test the continuation pattern with previous_response_id:
- Send the initial message and note the response.id in the response
- Send a continuation with only new input:
{
"previous_response_id": "resp_abc123",
"input": [
{
"tool_call_id": "call_xyz789",
"output": "{\"temperature\": 72, \"conditions\": \"sunny\"}"
}
]
}
- Observe the reduced latency compared to resending full context
Testing Realtime API
For the Realtime API, Apidog lets you:
- Send base64-encoded audio chunks
- Monitor session.update events
- Track VAD events (speech started/stopped)
- View transcript deltas in real-time
This is particularly useful for debugging why your voice assistant might be cutting off users or not detecting speech properly.
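To hand-craft such a test message, you can wrap raw PCM16 audio in the append event yourself and paste the result into the message composer — a sketch (24 kHz mono 16-bit, as above):

```python
import base64
import json
import struct

def audio_append_event(pcm16_bytes):
    """Wrap raw PCM16 bytes in an input_audio_buffer.append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_bytes).decode("ascii"),
    })

# 10 ms of silence at 24 kHz mono 16-bit = 240 samples * 2 bytes
silence = struct.pack("<240h", *([0] * 240))
print(audio_append_event(silence)[:60])
```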
Environment Variables
Store API keys securely using Apidog's environment variables:
{{OPENAI_API_KEY}}
This lets you switch between development and production keys without editing requests.
Real-World Use Cases
Let's explore practical scenarios where OpenAI's WebSocket modes excel.
Use Case 1: Autonomous Coding Agent
Scenario: A coding assistant that analyzes codebases, identifies issues, and makes improvements autonomously.
Why Responses API WebSocket:
- Typical workflow: Read file → Analyze → Search for similar patterns → Read more files → Suggest changes
- This creates 15-30 tool calls per task
- WebSocket mode reduces total execution time by ~40%
- Persistent connection maintains context across all tool calls
Implementation Pattern:
// Initial task
ws.send(JSON.stringify({
  messages: [{ role: 'user', content: 'Audit security vulnerabilities' }],
  tools: [...]
}));
// A single handler drives the loop: each time the model requests a
// tool, execute it and continue with only the new results
// (needsTools/runTools stand in for your own helpers)
ws.on('message', (data) => {
  const resp = JSON.parse(data);
  if (needsTools(resp)) {
    ws.send(JSON.stringify({
      previous_response_id: resp.id,
      input: runTools(resp)
    }));
  }
});
// The loop repeats for 20+ iterations without re-sending history
Use Case 2: Voice Customer Service Bot
Scenario: Phone support bot that handles customer inquiries with natural conversation flow.
Why Realtime API WebSocket:
- Low latency critical (<500ms for natural conversation)
- Needs to handle interruptions (customer talks over bot)
- Processes voice input directly without separate transcription
- Streams responses in real-time (doesn't wait for complete sentence)
Implementation Pattern:
// Stream phone audio to API
phoneSystem.on('audio', (chunk) => {
  ws.send(JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: base64Encode(chunk)
  }));
});
// Play AI responses immediately
ws.on('message', (data) => {
  const event = JSON.parse(data);
  if (event.type === 'response.audio.delta') {
    phoneSystem.playAudio(base64Decode(event.delta));
  }
});
Troubleshooting Common Issues
Connection Fails to Establish
Symptoms: WebSocket connection never opens, immediate close event.
Common Causes:
- Invalid API key - Double-check your Authorization header
- Missing beta header - The Responses API requires OpenAI-Beta: responses-api=v1
- Network restrictions - Some corporate networks block WebSocket
- Incorrect URL - Verify wss:// (not ws://) and the endpoint path
Solution:
Use Apidog to test the connection with detailed error messages. The request inspector shows exactly which headers are sent, making it easy to spot missing or incorrect API keys.
Debugging Code:
ws.on('error', (error) => {
console.error('Connection error:', error);
});
ws.on('close', (code, reason) => {
console.log(`Closed with code ${code}: ${reason}`);
// Common codes:
// 1006: Abnormal closure (often auth issues)
// 1008: Policy violation (invalid headers)
});
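When the close code suggests a transient failure, reconnect with exponential backoff rather than hammering the endpoint — a sketch (the parameters are illustrative):

```python
import random

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with jitter for WebSocket reconnects."""
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)  # jitter spreads retries out

for attempt in range(5):
    print(f"attempt {attempt}: wait ~{backoff_delay(attempt):.2f}s")
```

Call it in your on-close handler and sleep for the returned delay before opening a fresh connection; reset the attempt counter once a connection succeeds.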
High Latency Despite WebSocket
Symptoms: WebSocket connection works but isn't faster than HTTP.
Common Causes:
- Not using previous_response_id - You're resending the full context
- Cold start - The first request on a new connection is slower
- Network latency - Geographic distance to API servers
- Large payloads - Sending unnecessary data in continuations
Solution:
Verify you're sending only new input in continuations:
// WRONG - sends full context every time
ws.send(JSON.stringify({
  messages: [...allPreviousMessages, newMessage],
  tools: [...]
}));
// RIGHT - references cached state
ws.send(JSON.stringify({
  previous_response_id: lastResponse.id,
  input: [newMessage]
}));
Memory Leaks in Long-Running Connections
Symptoms: Application memory grows over time with persistent connection.
Common Causes:
- Event listeners not removed - Accumulating listeners on reconnection
- Audio buffers not released - Keeping references to played audio
- Message history growing - Storing all received messages
Solution:
// Clean up event listeners on reconnection
function cleanupAndReconnect(ws) {
ws.removeAllListeners();
ws.close();
const newWs = createConnection();
return newWs;
}
// Release audio buffers after playing
function playAndRelease(audioBuffer) {
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(audioContext.destination);
source.start();
source.onended = () => {
source.disconnect();
// Buffer will be garbage collected
};
}
// Limit message history
const messageHistory = [];
const MAX_HISTORY = 100;
ws.on('message', (data) => {
messageHistory.push(data);
if (messageHistory.length > MAX_HISTORY) {
messageHistory.shift(); // Remove oldest
}
});
Conclusion
OpenAI's WebSocket API modes unlock new possibilities for AI applications. The Responses API WebSocket mode delivers up to 40% faster execution for agentic workflows with heavy tool calling, making it ideal for autonomous coding assistants and orchestration systems. The Realtime API provides sub-500ms latency for voice applications, enabling natural, interrupt-capable conversations.
Choosing the right mode depends on your use case:
- Responses API WebSocket: Tool-heavy agents, coding assistants, research tools (10+ tool calls)
- Realtime API WebSocket: Voice assistants, phone bots, language tutors (audio streaming)
- HTTP: Simple requests, serverless environments, 1-5 API calls
The persistent, event-driven nature of WebSocket connections requires different testing approaches than HTTP. Test OpenAI's WebSocket APIs with Apidog's real-time WebSocket client—import your API key, establish connections, send events, and monitor responses with detailed logging. Try it free to validate your integrations before production deployment.



