OpenAI Responses API

Status: Production Ready Last Updated: 2026-01-21 API Launch: March 2025 Dependencies: openai@6.16.0 (Node.js) or fetch API (Cloudflare Workers)

What Is the Responses API?

OpenAI's unified interface for agentic applications, launched March 2025. Provides stateful conversations with preserved reasoning state across turns.

Key Innovation: Unlike Chat Completions (reasoning discarded between turns), Responses preserves the model's reasoning notebook, improving performance by 5% on TAUBench and enabling better multi-turn interactions.

vs Chat Completions:

Feature	Chat Completions	Responses API
State	Manual history tracking	Automatic (conversation IDs)
Reasoning	Dropped between turns	Preserved across turns (+5% TAUBench)
Tools	Client-side round trips	Server-side hosted
Output	Single message	Polymorphic (8 types)
Cache	Baseline	40-80% better utilization
MCP	Manual	Built-in

Quick Start

bash

npm install openai@6.16.0

typescript

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'What are the 5 Ds of dodgeball?',
});

console.log(response.output_text);

Key differences from Chat Completions:

Endpoint:
```
/v1/responses
```
(not
```
/v1/chat/completions
```
)
Parameter:
```
input
```
(not
```
messages
```
)
Role:
```
developer
```
(not
```
system
```
)

Output:

response.output_text

(not

choices[0].message.content

)

When to Use Responses vs Chat Completions

Use Responses:

Agentic applications (reasoning + actions)
Multi-turn conversations (preserved reasoning = +5% TAUBench)
Built-in tools (Code Interpreter, File Search, Web Search, MCP)
Background processing (60s standard, 10min extended timeout)

Use Chat Completions:

Simple one-off generation
Fully stateless interactions
Legacy integrations

Stateful Conversations

Automatic State Management using conversation IDs:

typescript

// Create conversation
const conv = await openai.conversations.create({
  metadata: { user_id: 'user_123' },
});

// First turn
const response1 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id,
  input: 'What are the 5 Ds of dodgeball?',
});

// Second turn - model remembers context + reasoning
const response2 = await openai.responses.create({
  model: 'gpt-5',
  conversation: conv.id,
  input: 'Tell me more about the first one',
});

Benefits: No manual history tracking, reasoning preserved, 40-80% better cache utilization

Conversation Limits: 90-day expiration

Built-in Tools (Server-Side)

Server-side hosted tools eliminate backend round trips:

Tool	Purpose	Notes
`code_interpreter`	Execute Python code	Sandboxed, 30s timeout (use `background: true` for longer)
`file_search`	RAG without vector stores	Max 512MB per file, supports PDF/Word/Markdown/HTML/code
`web_search`	Real-time web information	Automatic source citations
`image_generation`	DALL-E integration	DALL-E 3 default
`mcp`	Connect external tools	OAuth supported, tokens NOT stored

Usage:

typescript

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Calculate mean of: 10, 20, 30, 40, 50',
  tools: [{ type: 'code_interpreter' }],
});

Web Search TypeScript Note

TypeScript Limitation: The

web_search

tool's

external_web_access

option is missing from SDK types (as of v6.16.0).

Workaround:

typescript

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Search for recent news',
  tools: [{
    type: 'web_search',
    external_web_access: true,
  } as any],  // ✅ Type assertion to suppress error
});

Source: GitHub Issue #1716

MCP Server Integration

Built-in support for Model Context Protocol (MCP) servers to connect external tools (Stripe, databases, custom APIs).

User Approval Requirement

By default, explicit user approval is required before any data is shared with a remote MCP server (security feature).

Handling Approval:

typescript

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Get my Stripe balance',
  tools: [{
    type: 'mcp',
    server_label: 'stripe',
    server_url: 'https://mcp.stripe.com',
    authorization: process.env.STRIPE_TOKEN,
  }],
});

if (response.status === 'requires_approval') {
  // Show user: "This action requires sharing data with Stripe. Approve?"
  // After user approves, retry with approval token
}

Alternative: Pre-approve MCP servers in OpenAI dashboard (users configure trusted servers via settings)

Source: Official MCP Guide

Basic MCP Usage

typescript

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Roll 2d6 dice',
  tools: [{
    type: 'mcp',
    server_label: 'dice',
    server_url: 'https://example.com/mcp',
    authorization: process.env.TOKEN, // ⚠️ NOT stored, required each request
  }],
});

MCP Output Types:

```
mcp_list_tools
```
- Tools discovered on server
```
mcp_call
```
- Tool invocation + result
```
message
```
- Final response

Reasoning Preservation

Key Innovation: Model's internal reasoning state survives across turns (unlike Chat Completions which discards it).

Visual Analogy:

Chat Completions: Model tears out scratchpad page before responding
Responses API: Scratchpad stays open for next turn

Performance: +5% on TAUBench (GPT-5) purely from preserved reasoning

Reasoning Summaries (free):

typescript

response.output.forEach(item => {
  if (item.type === 'reasoning') console.log(item.summary[0].text);
  if (item.type === 'message') console.log(item.content[0].text);
});

Important: Reasoning Traces Privacy

What You Get: Reasoning summaries (not full internal traces) What OpenAI Keeps: Full chain-of-thought reasoning (proprietary, for security/privacy)

For GPT-5-Thinking models:

OpenAI preserves reasoning internally in their backend
This preserved reasoning improves multi-turn performance (+5% TAUBench)
But developers only receive summaries, not the actual chain-of-thought
Full reasoning traces are not exposed (OpenAI's IP protection)

Source: Sean Goedecke Analysis

Background Mode

For long-running tasks, use

background: true

typescript

const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Analyze 500-page document',
  background: true,
  tools: [{ type: 'file_search', file_ids: [fileId] }],
});

// Poll for completion (check every 5s)
const result = await openai.responses.retrieve(response.id);
if (result.status === 'completed') console.log(result.output_text);

Timeout Limits:

Standard: 60 seconds
Background: 10 minutes

Performance Considerations

Time-to-First-Token (TTFT) Latency: Background mode currently has higher TTFT compared to synchronous responses. OpenAI is working to reduce this gap.

Recommendation:

For user-facing real-time responses, use sync mode (lower latency)
For long-running async tasks, use background mode (latency acceptable)

Source: OpenAI Background Mode Docs

Data Retention and Privacy

Default Retention: 30 days when

store: true

(default) Zero Data Retention (ZDR): Organizations with ZDR automatically enforce

store: false

Background Mode: NOT ZDR compatible (stores data ~10 minutes for polling)

Timeline:

September 26, 2025: OpenAI court-ordered retention ended
Current: 30-day default retention with
```
store: true
```

Control Storage:

typescript

// Disable storage (no retention)
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello!',
  store: false,  // ✅ No retention
});

// ZDR organizations: store always treated as false
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello!',
  store: true,  // ⚠️ Ignored by OpenAI for ZDR orgs, treated as false
});

ZDR Compliance:

Avoid background mode (requires temporary storage)
Explicitly set
```
store: false
```
for clarity
Note: 60s timeout applies in sync mode

Source: OpenAI Data Controls

Polymorphic Outputs

Returns 8 output types instead of single message:

Type	Example
`message`	Final answer, explanation
`reasoning`	Step-by-step thought process (free!)
`code_interpreter_call`	Python code + results
`mcp_call`	Tool name, args, output
`mcp_list_tools`	Tool definitions from MCP server
`file_search_call`	Matched chunks, citations
`web_search_call`	URLs, snippets
`image_generation_call`	Image URL

Processing:

typescript

response.output.forEach(item => {
  if (item.type === 'reasoning') console.log(item.summary[0].text);
  if (item.type === 'web_search_call') console.log(item.results);
  if (item.type === 'message') console.log(item.content[0].text);
});

// Or use helper for text-only
console.log(response.output_text);

Migration from Chat Completions

Breaking Changes:

Feature	Chat Completions	Responses API
Endpoint	`/v1/chat/completions`	`/v1/responses`
Parameter	`messages`	`input`
Role	`system`	`developer`
Output	`choices[0].message.content`	`output_text`
State	Manual array	Automatic (conversation ID)
Streaming	`data: {"choices":[...]}`	SSE with 8 item types

Example:

typescript

// Before
const response = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});
console.log(response.choices[0].message.content);

// After
const response = await openai.responses.create({
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});
console.log(response.output_text);

Migration from Assistants API

CRITICAL: Assistants API Sunset Timeline

August 26, 2025: Assistants API officially deprecated
2025-2026: OpenAI providing migration utilities
August 26, 2026: Assistants API sunset (stops working)

Migrate before August 26, 2026 to avoid breaking changes.

Source: Assistants API Sunset Announcement

Key Breaking Changes:

Assistants API	Responses API
Assistants (created via API)	Prompts (created in dashboard)
Threads	Conversations (store items, not just messages)
Runs (server-side lifecycle)	Responses (stateless calls)
Run-Steps	Items (polymorphic outputs)

Migration Example:

typescript

// Before (Assistants API - deprecated)
const assistant = await openai.beta.assistants.create({
  model: 'gpt-4',
  instructions: 'You are helpful.',
});

const thread = await openai.beta.threads.create();

const run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});

// After (Responses API - current)
const conversation = await openai.conversations.create({
  metadata: { purpose: 'customer_support' },
});

const response = await openai.responses.create({
  model: 'gpt-5',
  conversation: conversation.id,
  input: [
    { role: 'developer', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' },
  ],
});

Migration Guide: Official Assistants Migration Docs

Known Issues Prevention

This skill prevents 11 documented errors:

1. Session State Not Persisting

Cause: Not using conversation IDs or using different IDs per turn

Fix: Create conversation once (

const conv = await openai.conversations.create()

), reuse

conv.id

for all turns

2. MCP Server Connection Failed (

mcp_connection_error

)

Causes: Invalid URL, missing/expired auth token, server down
Fix: Verify URL is correct, test manually with
```
fetch()
```
, check token expiration

3. Code Interpreter Timeout (

code_interpreter_timeout

)

Cause: Code runs longer than 30 seconds
Fix: Use
```
background: true
```
for extended timeout (up to 10 min)

4. Image Generation Rate Limit (

rate_limit_error

)

Cause: Too many DALL-E requests
Fix: Implement exponential backoff retry (1s, 2s, 3s delays)

5. File Search Relevance Issues

Cause: Vague queries return irrelevant results
Fix: Use specific queries ("pricing in Q4 2024" not "find pricing"), filter by
```
chunk.score > 0.7
```

6. Cost Tracking Confusion

Cause: Responses bills for input + output + tools + stored conversations (vs Chat Completions: input + output only)
Fix: Set
```
store: false
```
if not needed, monitor
```
response.usage.tool_tokens
```

7. Conversation Not Found (

invalid_request_error

)

Causes: ID typo, conversation deleted, or expired (90-day limit)
Fix: Verify exists with
```
openai.conversations.list()
```
before using

8. Tool Output Parsing Failed

Cause: Accessing wrong output structure

Fix: Use

response.output_text

helper or iterate

response.output.forEach(item => ...)

checking

item.type

9. Zod v4 Incompatibility with Structured Outputs

Error:

Invalid schema for response_format 'name': schema must be a JSON Schema of 'type: "object"', got 'type: "string"'.

Source: GitHub Issue #1597
Why It Happens: SDK's vendored
```
zod-to-json-schema
```
library doesn't support Zod v4 (missing
```
ZodFirstPartyTypeKind
```
export)

Prevention: Pin to Zod v3 (

"zod": "^3.23.8"

) or use custom

zodTextFormat

with

z.toJSONSchema({ target: "draft-7" })

typescript

// Workaround: Pin to Zod v3 (recommended)
{
  "dependencies": {
    "openai": "^6.16.0",
    "zod": "^3.23.8"  // DO NOT upgrade to v4 yet
  }
}

10. Background Mode Web Search Missing Sources

Error:
```
web_search_call
```
output items contain query but no sources/results
Source: GitHub Issue #1676
Why It Happens: When using
```
background: true
```
+
```
web_search
```
tool, OpenAI doesn't return sources in the response
Prevention: Use synchronous mode (
```
background: false
```
) when web search sources are needed

typescript

// ✅ Sources available in sync mode
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Latest AI news?',
  background: false,  // Required for sources
  tools: [{ type: 'web_search' }],
});

11. Streaming Mode Missing output_text Helper

Error:
```
finalResponse().output_text
```
is
```
undefined
```
in streaming mode
Source: GitHub Issue #1662
Why It Happens:
```
stream.finalResponse()
```
doesn't include
```
output_text
```
convenience field (only available in non-streaming responses)
Prevention: Listen for
```
output_text.done
```
event or manually extract from
```
output
```
items

typescript

// Workaround: Listen for event
const stream = openai.responses.stream({ model: 'gpt-5', input: 'Hello!' });
let outputText = '';
for await (const event of stream) {
  if (event.type === 'output_text.done') {
    outputText = event.output_text;  // ✅ Available in event
  }
}

Critical Patterns

✅ Always:

Use conversation IDs for multi-turn (40-80% better cache)
Handle all 8 output types in polymorphic responses
Use
```
background: true
```
for tasks >30s
Provide MCP
```
authorization
```
tokens (NOT stored, required each request)
Monitor
```
response.usage.total_tokens
```
for cost control

❌ Never:

Expose API keys in client-side code
Assume single message output (use
```
response.output_text
```
helper)
Reuse conversation IDs across users (security risk)
Ignore error types (handle
```
rate_limit_error
```
,
```
mcp_connection_error
```
specifically)
Poll faster than 1s for background tasks (use 5s intervals)

References

Official Docs:

Responses API Guide: https://platform.openai.com/docs/guides/responses
API Reference: https://platform.openai.com/docs/api-reference/responses
MCP Integration: https://platform.openai.com/docs/guides/tools-connectors-mcp
Blog Post: https://developers.openai.com/blog/responses-api/
Starter App: https://github.com/openai/openai-responses-starter-app

Skill Resources:

templates/

references/responses-vs-chat-completions.md

references/mcp-integration-guide.md

references/built-in-tools-guide.md

references/migration-guide.md

references/top-errors.md

Last verified: 2026-01-21 | Skill version: 2.1.0 | Changes: Added 3 TIER 1 issues (Zod v4, background web search, streaming output_text), 2 TIER 2 findings (MCP approval, reasoning privacy), Data Retention & ZDR section, Assistants API sunset timeline, background mode TTFT note, web search TypeScript limitation. Updated SDK version to 6.16.0.

openai-responses

NPX Install

Tags

SKILL.md Content

OpenAI Responses API

What Is the Responses API?

Quick Start

When to Use Responses vs Chat Completions

Stateful Conversations

Built-in Tools (Server-Side)

Web Search TypeScript Note

MCP Server Integration

User Approval Requirement

Basic MCP Usage

Reasoning Preservation

Important: Reasoning Traces Privacy

Background Mode

Performance Considerations

Data Retention and Privacy

Polymorphic Outputs

Migration from Chat Completions

Migration from Assistants API

Known Issues Prevention

Critical Patterns

References