TL;DR / Quick Answer
GPT-5.4 is OpenAI's most advanced frontier model for professional work, released March 5, 2026. It combines industry-leading coding capabilities from GPT-5.3-Codex with enhanced reasoning, computer use, and tool integration. The model achieves 83% win rate on knowledge work tasks, 75% on computer use benchmarks, and uses significantly fewer tokens than GPT-5.2. Available via API at $2.50/M input tokens and $15/M output tokens, with Pro version ($30/$180) for complex tasks.
Introduction
OpenAI just raised the bar for AI-powered professional work. On March 5, 2026, they released GPT-5.4, a model that delivers 83% win rates against industry professionals on real-world knowledge work tasks while using significantly fewer tokens than its predecessor.
If you've worked with AI models that hallucinate facts, struggle with complex workflows, or burn through tokens on simple tasks, GPT-5.4 addresses these pain points directly. It's 33% less likely to make factual errors and completes computer-use tasks 3x faster than earlier models.
This guide breaks down what GPT-5.4 actually does, how it compares to previous versions, and whether the performance gains justify the higher token costs. You'll get specific benchmark data, real performance comparisons, and clear guidance on which GPT-5.4 variant fits your use case.
What you'll learn:
- Exact performance improvements over GPT-5.2 and GPT-5.3-Codex
- Benchmark scores across coding, computer use, and knowledge work
- New computer use and vision capabilities with real examples
- Pricing breakdown and when to use Pro vs standard
- Integration considerations for API developers
What Is GPT-5.4?
GPT-5.4 represents OpenAI's first general-purpose model with native computer use capabilities. It merges the coding excellence of GPT-5.3-Codex with enhanced reasoning, visual perception, and tool integration into a single frontier model.

The model targets three core professional scenarios:
Knowledge work - Creating spreadsheets, presentations, documents, and analysis across 44 occupations. GPT-5.4 matches or exceeds industry professionals in 83% of comparisons on GDPval, up from 70.9% for GPT-5.2.
Computer use and agents - Operating computers through mouse/keyboard commands, browser automation, and multi-step workflows across applications. Achieves 75% success rate on OSWorld-Verified, surpassing human performance at 72.4%.
Coding and development - Writing, debugging, and iterating on code with state-of-the-art performance on SWE-Bench Pro (57.7%) while supporting up to 1M token context windows for complex codebases.
GPT-5.4 comes in two variants:
- GPT-5.4 - Standard model for most professional tasks
- GPT-5.4 Pro - Maximum performance on complex reasoning tasks ($30/M input, $180/M output)
Key Improvements Over GPT-5.2
GPT-5.4 isn't an incremental update. OpenAI made substantial gains across four critical areas.
1. Factual Accuracy and Hallucination Reduction
False claims dropped 33% at the individual claim level. Full responses contain 18% fewer errors overall. This matters when you're generating legal documents, financial models, or technical documentation where a single hallucinated fact can derail an entire project.
2. Token Efficiency
GPT-5.4 uses significantly fewer tokens to solve problems compared to GPT-5.2. In tool-heavy workflows with MCP Atlas benchmarks, token usage dropped 47% while maintaining accuracy. For high-volume API users, this efficiency gain offsets the higher per-token pricing.
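Whether the efficiency gain actually offsets the higher rate depends on your token mix. The sketch below uses the published per-million prices and treats the 47% MCP Atlas reduction as if it applied uniformly to a hypothetical workload; real savings will vary by task.

```python
# Illustrative break-even estimate: does GPT-5.4's token efficiency
# offset its higher per-token price? Prices (dollars per million tokens)
# are from the article; the 47% reduction is the MCP Atlas tool-heavy
# figure and is assumed, not guaranteed, for other workloads.

def workflow_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost in dollars for one workflow run."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A hypothetical tool-heavy run on GPT-5.2.
gpt52 = workflow_cost(100_000, 20_000, input_price=1.75, output_price=14.0)

# Same task on GPT-5.4, assuming the ~47% token reduction applies uniformly.
gpt54 = workflow_cost(int(100_000 * 0.53), int(20_000 * 0.53),
                      input_price=2.50, output_price=15.0)

print(f"GPT-5.2: ${gpt52:.3f}  GPT-5.4: ${gpt54:.3f}")
```

Under these assumptions GPT-5.4 comes out cheaper per run despite the higher rates; for workloads without heavy tool use, the gap narrows or reverses.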
3. Computer Use Capabilities
Previous models required separate specialized models for computer use. GPT-5.4 handles this natively:
- Issues mouse and keyboard commands from screenshots
- Automates browsers via Playwright
- Navigates desktop environments through coordinate-based interactions
- Supports custom safety policies and confirmation requirements
4. Tool Search and Integration
Tool search eliminates the need to load thousands of tool definitions into every request. The model looks up tool definitions on-demand, reducing upfront token costs and enabling work with ecosystems containing tens of thousands of tools.
On the Toolathlon benchmark, GPT-5.4 achieves 54.6% accuracy compared to 45.7% for GPT-5.2, while requiring fewer tool yields (a proxy for latency).
GPT-5.4 Performance Benchmarks
Benchmark data shows where GPT-5.4 excels and where earlier models remain competitive.
Knowledge Work (GDPval)
| Model | Win Rate vs Professionals |
|---|---|
| GPT-5.4 | 83.0% |
| GPT-5.4 Pro | 82.0% |
| GPT-5.2 Pro | 74.1% |
| GPT-5.2 | 70.9% |
GDPval tests well-specified knowledge work across 44 occupations from the top 9 industries contributing to US GDP. Tasks include sales presentations, accounting spreadsheets, urgent care schedules, manufacturing diagrams, and short videos.
Spreadsheet and Document Creation
On internal investment banking modeling tasks:
- GPT-5.4: 87.3% mean score
- GPT-5.2: 68.4% mean score
For presentation evaluation, human raters preferred GPT-5.4 outputs 68% of the time due to stronger aesthetics, greater visual variety, and more effective image generation use.
Coding Performance (SWE-Bench Pro)
| Model | Accuracy | Estimated Latency |
|---|---|---|
| GPT-5.4 | 57.7% | ~1000s |
| GPT-5.3-Codex | 56.8% | ~1200s |
| GPT-5.2 | 55.6% | ~1500s |

GPT-5.4 matches or exceeds GPT-5.3-Codex on SWE-Bench Pro while delivering lower latency across reasoning efforts. The /fast mode in Codex delivers up to 1.5x faster token velocity with GPT-5.4.
Computer Use (OSWorld-Verified)
OSWorld-Verified measures success at navigating desktop environments through screenshots and keyboard/mouse actions:
- GPT-5.4: 75.0%
- GPT-5.3-Codex: 74.0% (with API parameter preserving original image resolution)
- GPT-5.2: 47.3%
- Human performance: 72.4%
This benchmark tests real desktop workflows: email and calendar management, bulk data entry, file operations, and cross-application tasks.
Web Browsing (BrowseComp)
BrowseComp tests persistent web research to find hard-to-locate information:
- GPT-5.4 Pro: 89.3%
- GPT-5.4: 82.7%
- GPT-5.2 Pro: 77.9%
- GPT-5.2: 65.8%
The roughly 17-point absolute improvement over GPT-5.2 (65.8% to 82.7%) reflects better synthesis of multi-source information and more persistent search strategies.
Visual Understanding
MMMU Pro (no tools) - Tests visual understanding and reasoning:
- GPT-5.4: 81.2%
- GPT-5.2: 79.5%
OmniDocBench - Document parsing accuracy (lower error = better):
- GPT-5.4: 0.109 normalized edit distance
- GPT-5.2: 0.140 normalized edit distance
Computer Use and Vision Capabilities
GPT-5.4's computer use capabilities warrant detailed examination. This is the first general-purpose OpenAI model that can operate computers natively.
How Computer Use Works
The model interprets screenshots of browser or desktop interfaces and responds with:
- Coordinate-based clicking on UI elements
- Keyboard input for text entry
- Playwright commands for browser automation
- Mouse movements and drag operations
Developers configure behavior through system messages, adjusting safety policies and confirmation requirements based on risk tolerance.
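OpenAI's actual action schema isn't reproduced here; the sketch below uses hypothetical dataclasses purely to illustrate the screenshot-in, action-out loop and how a custom confirmation policy might sit on top of it.

```python
# Hypothetical sketch of a screenshot-in / action-out computer-use loop.
# The dataclass names and fields are illustrative, not OpenAI's schema.
from dataclasses import dataclass
from typing import Union

@dataclass
class Click:
    x: int  # pixel coordinates on the screenshot
    y: int

@dataclass
class TypeText:
    text: str

@dataclass
class Done:
    summary: str

Action = Union[Click, TypeText, Done]

def requires_confirmation(action: Action,
                          risky_words=("delete", "pay")) -> bool:
    """Example of a custom safety policy: pause before risky text entry."""
    return isinstance(action, TypeText) and any(
        w in action.text.lower() for w in risky_words)

# A model step might decode to a sequence like this:
steps: list[Action] = [Click(412, 88),
                       TypeText("quarterly report"),
                       Done("search submitted")]
print([requires_confirmation(s) for s in steps])
```

The point of the policy hook is that risk tolerance lives in your code, not the model: the same action stream can be auto-executed in a sandbox or gated behind user confirmation in production.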
Real-World Computer Use Example
Mainstay tested GPT-5.4 across approximately 30,000 HOA and property tax portals:
- GPT-5.4: 95% first-attempt success, 100% within three attempts
- Previous CUA models: 73-79% success rate
- Session completion: 3x faster with GPT-5.4
- Token usage: 70% fewer tokens per session
The model navigates portal interfaces, extracts data from varied UI layouts, handles authentication flows, and manages edge cases like captchas or multi-step forms.
Enhanced Visual Perception
GPT-5.4 introduced original image input detail level supporting:
- Up to 10.24M total pixels
- 6000-pixel maximum dimension
- Full-fidelity perception for dense, high-resolution images
The high detail level supports up to 2.56M total pixels or 2048-pixel maximum dimension. Early API user testing showed strong gains in localization ability, image understanding, and click accuracy with original or high detail settings.
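As a quick pre-upload sanity check, the stated limits can be encoded directly; the limit values come from the figures above, while the helper function itself is our own.

```python
# Check whether an image fits the stated limits for each detail level.
# Pixel and dimension limits are from the article; the helper is ours.

LIMITS = {
    "original": {"max_pixels": 10_240_000, "max_dim": 6000},
    "high":     {"max_pixels": 2_560_000,  "max_dim": 2048},
}

def fits_detail_level(width: int, height: int, level: str) -> bool:
    lim = LIMITS[level]
    return (width * height <= lim["max_pixels"]
            and max(width, height) <= lim["max_dim"])

# A 4000x2500 screenshot (10.0M pixels) fits "original" but not "high".
print(fits_detail_level(4000, 2500, "original"))
print(fits_detail_level(4000, 2500, "high"))
```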
Document Parsing Improvements
Better visual perception translates to document handling. GPT-5.4 parses:
- Multi-page PDFs with tables and figures
- Scanned documents with varied layouts
- Screenshots containing text and UI elements
- Technical diagrams and charts
The 22% improvement on OmniDocBench (0.140 to 0.109 error rate) reflects this capability.
Coding and Development Features
GPT-5.4 inherits GPT-5.3-Codex's coding excellence while adding computer use for integrated development workflows.
Frontend Development
Internal evaluations found GPT-5.4 excels at complex frontend tasks with noticeably more aesthetic and functional results than previous models. The experimental Playwright Interactive skill in Codex demonstrates this:
Example: Theme Park Simulation
A single prompt generated an isometric theme park simulation with:
- Tile-based path placement
- Ride and scenery construction
- Guest pathfinding and queueing
- Park metrics (money, guests, happiness, cleanliness)
- Browser playtesting via Playwright automation
- Image generation for isometric assets
The model built the game, then used Playwright to automate playtests, verifying placement, navigation, guest reactions, and UI stability across multiple rounds.
Fast Mode for Developers
GPT-5.4 in Codex supports /fast mode delivering up to 1.5x faster token velocity. API developers access equivalent speeds through priority processing. This maintains the same intelligence while reducing iteration time during debugging and development.
Context Window Support
GPT-5.4 Codex includes experimental 1M token context window support. Configure via:
- model_context_window parameter
- model_auto_compact_token_limit parameter
Requests exceeding the standard 272K context count against usage limits at 2x the normal rate. This enables analysis of entire codebases, large documentation sets, or multi-file projects in a single request.
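The exact billing mechanics aren't spelled out beyond "2x the normal rate"; one plausible reading, that a request exceeding 272K tokens counts double in its entirety, can be sketched as follows. Treat this as an interpretation, not the documented rule.

```python
# Sketch of usage accounting for extended-context requests, based on the
# stated rule that requests exceeding the standard 272K context count
# against usage limits at 2x the normal rate. The whole-request-doubled
# interpretation is ours; check billing docs for the exact mechanics.

STANDARD_CONTEXT = 272_000

def billed_usage(tokens: int) -> int:
    """Tokens counted against usage limits for a single request."""
    return tokens * 2 if tokens > STANDARD_CONTEXT else tokens

print(billed_usage(200_000))  # within the standard window
print(billed_usage(800_000))  # extended context: counted double
```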
Apidog for API Documentation: When working with large codebases and API integrations, keep your API documentation synchronized with implementation. Apidog can import OpenAPI/Swagger specs, generate interactive documentation, and sync with your codebase to ensure API docs stay current as you integrate GPT-5.4 features.

Tool Integration and Search
Tool search represents a fundamental shift in how models interact with external tools and MCP servers.
How Tool Search Works
Previous approach: All tool definitions loaded into every request upfront. For systems with many tools, this added thousands to tens of thousands of tokens, increasing costs and slowing responses.
Tool search approach: Model receives a lightweight list of available tools. When needed, it looks up specific tool definitions and appends them to the conversation at that moment.
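The on-demand pattern amounts to a two-phase lookup: a lightweight listing upfront, full definitions fetched only when referenced. The registry contents and function names below are illustrative, not the actual API surface.

```python
# Illustrative two-phase tool lookup: ship lightweight names first,
# fetch full definitions only when the model asks for one.
# Registry contents and function names are hypothetical.

TOOL_REGISTRY = {
    "search_crm": {"description": "Search CRM records",
                   "parameters": {"query": "string"}},
    "send_email": {"description": "Send an email",
                   "parameters": {"to": "string", "body": "string"}},
    # ...imagine tens of thousands more entries...
}

def list_tools() -> list[str]:
    """Lightweight listing sent upfront instead of full definitions."""
    return sorted(TOOL_REGISTRY)

def lookup_tool(name: str) -> dict:
    """Full definition appended to the conversation only when needed."""
    return TOOL_REGISTRY[name]

# Upfront cost is just the names; the full schema arrives on demand.
print(list_tools())
print(lookup_tool("send_email")["parameters"])
```

The upfront payload scales with the number of tool *names* rather than the size of every schema, which is what makes ecosystems of tens of thousands of tools tractable.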
Token Savings Example
Scale's MCP Atlas benchmark tested 250 tasks with all 36 MCP servers enabled:

Token breakdown without tool search:
- 65,320 upfront input tokens (tool definitions)
- Additional tokens from tool outputs
- Output tokens
Tool search eliminates the upfront cost while preserving cache efficiency.
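At standard input pricing, those 65,320 upfront tokens translate into a concrete per-request cost. The rough estimate below ignores caching discounts and the much smaller cost of the lightweight tool-name list.

```python
# Rough per-request cost of the upfront tool definitions in the MCP Atlas
# setup, at GPT-5.4's standard input rate. Ignores caching discounts and
# the (much smaller) cost of the lightweight tool-name list.
UPFRONT_TOOL_TOKENS = 65_320
INPUT_PRICE_PER_M = 2.50  # dollars per million input tokens

per_request = UPFRONT_TOOL_TOKENS * INPUT_PRICE_PER_M / 1_000_000
print(f"${per_request:.4f} per request")
print(f"${per_request * 10_000:.0f} per 10k requests")
```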
MCP Atlas Performance
On MCP Atlas benchmark (250 tasks, 36 MCP servers):
- GPT-5.4: 67.2% accuracy
- GPT-5.2: 60.6% accuracy
The model works with larger tool ecosystems without sacrificing accuracy or overwhelming context windows.
Agentic Tool Calling
Toolathlon benchmark tests multi-step tool workflows (reading emails, extracting attachments, uploading files, grading, recording results):

Tool yields (waiting for tool responses) better reflect latency than tool call counts because they capture parallelization benefits. GPT-5.4 completes tasks in fewer rounds.
GPT-5.4 vs GPT-5.3-Codex vs GPT-5.2
Choosing between models depends on your specific requirements.
When to Use GPT-5.4
- Computer use required - Native computer operation, browser automation
- Knowledge work - Spreadsheets, presentations, documents
- Tool-heavy workflows - MCP servers, external APIs, multi-step automation
- Cost-sensitive at scale - Token efficiency reduces total costs despite higher per-token pricing
- Long-context needs - Up to 1M tokens for complex codebases
When GPT-5.3-Codex Remains Competitive
- Pure coding tasks - Similar SWE-Bench Pro performance (56.8% vs 57.7%)
- Established Codex workflows - Existing integrations may not need computer use
- Cost optimization - If GPT-5.3-Codex pricing remains lower
When GPT-5.2 Suffices
- Simple queries - Basic Q&A, summarization, straightforward generation
- Budget constraints - Lower per-token costs ($1.75/$14 vs $2.50/$15)
- Non-agentic workflows - Single-turn requests without tool use
Pricing Comparison
| Model | Input Price | Cached Input | Output Price |
|---|---|---|---|
| GPT-5.2 | $1.75/M | $0.175/M | $14/M |
| GPT-5.4 | $2.50/M | $0.25/M | $15/M |
| GPT-5.2 Pro | $21/M | - | $168/M |
| GPT-5.4 Pro | $30/M | - | $180/M |
Batch and Flex pricing available at 50% of standard rates. Priority processing at 200% of standard rates.
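The table plus the tier multipliers can be wrapped into a small estimator. Prices are taken from the table above; the function and the model-key spellings are our own.

```python
# Cost estimator built from the pricing table above (dollars per million
# tokens). Batch/Flex is stated at 50% of standard, priority at 200%.

PRICES = {  # model: (input, output)
    "gpt-5.2":     (1.75, 14.0),
    "gpt-5.4":     (2.50, 15.0),
    "gpt-5.2-pro": (21.0, 168.0),
    "gpt-5.4-pro": (30.0, 180.0),
}
TIER_MULTIPLIER = {"batch": 0.5, "flex": 0.5, "standard": 1.0, "priority": 2.0}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> float:
    inp, out = PRICES[model]
    mult = TIER_MULTIPLIER[tier]
    return mult * (input_tokens * inp + output_tokens * out) / 1_000_000

print(f"${estimate_cost('gpt-5.4', 1_000_000, 200_000):.2f}")           # standard
print(f"${estimate_cost('gpt-5.4', 1_000_000, 200_000, 'batch'):.2f}")  # half price
```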
Availability and Access Options
GPT-5.4 rolled out gradually starting March 5, 2026 across ChatGPT, Codex, and API.
ChatGPT Access
GPT-5.4 Thinking available to:
- ChatGPT Plus subscribers
- ChatGPT Team subscribers
- ChatGPT Pro subscribers
GPT-5.4 Pro available to:
- ChatGPT Pro subscribers
- ChatGPT Enterprise subscribers
Legacy access: GPT-5.2 Thinking remains available for three months under Legacy Models section, retiring June 5, 2026.
Enterprise and Education: Early access available via admin settings.
Codex Access
GPT-5.4 is the default model in Codex with:
- Experimental 1M context window support
- Playwright Interactive skill for browser playtesting
- /fast mode for 1.5x token velocity
API Access
Model names:
- gpt-5.4 - Standard model
- gpt-5.4-pro - Pro model for complex tasks
Context windows:
- Standard: 272K tokens
- Extended: Up to 1M tokens (experimental, 2x usage rate)
Pricing:
- Standard: $2.50/M input, $0.25/M cached input, $15/M output
- Pro: $30/M input, $180/M output
- Batch/Flex: 50% discount
- Priority: 2x standard rate
Deprecation Timeline
GPT-5.2 Thinking retires June 5, 2026. Migrate workflows before this date to avoid disruption.
Conclusion
GPT-5.4 delivers measurable improvements across knowledge work, computer use, and coding tasks. The 83% GDPval win rate, 75% OSWorld-Verified score, and 57.7% SWE-Bench Pro accuracy establish it as the new state of the art for professional AI workflows.
For developers integrating GPT-5.4 into applications, having robust API testing and debugging tools becomes essential. Apidog streamlines the integration process with unified API design, debugging, testing, and documentation capabilities. Whether you're building AI agents, automating workflows, or creating customer-facing features powered by GPT-5.4, Apidog helps ensure your API integrations work correctly from day one.
Key takeaways:
- 33% reduction in false claims and 18% fewer response errors
- 47% token reduction in tool-heavy workflows
- 75% computer use success rate, surpassing human baseline
- Native computer operation via mouse/keyboard commands
- Tool search enables work with tens of thousands of tools
- 1M token context window for complex codebases
- Available at $2.50/$15 per million tokens (standard variant)
When to adopt:
- You need computer use or browser automation
- Token efficiency matters for high-volume workflows
- Factual accuracy is critical (legal, financial, technical)
- You work with large tool ecosystems or MCP servers
- Long-context analysis of codebases or documents
When to wait:
- Simple Q&A workflows don't benefit from new capabilities
- Budget constraints prioritize lowest per-token costs
- Existing GPT-5.2 or GPT-5.3-Codex workflows perform adequately
GPT-5.4 represents OpenAI's most efficient reasoning model to date. The combination of reduced hallucinations, improved token efficiency, and native computer use capabilities justifies the higher per-token pricing for professional applications.
FAQ
What is the difference between GPT-5.4 and GPT-5.2?
GPT-5.4 achieves 83% win rate on knowledge work vs 70.9% for GPT-5.2, uses significantly fewer tokens, has native computer use capabilities, and reduces factual errors by 33%. Pricing is higher ($2.50/$15 vs $1.75/$14) but total costs may be lower due to efficiency gains.
How much does GPT-5.4 API cost?
GPT-5.4 costs $2.50 per million input tokens, $0.25 per million cached input tokens, and $15 per million output tokens. GPT-5.4 Pro costs $30/M input and $180/M output. Batch and Flex pricing offer 50% discounts.
Does GPT-5.4 have a context window limit?
Standard context window is 272K tokens. Experimental 1M token context window support is available in Codex by configuring model_context_window and model_auto_compact_token_limit parameters. Requests exceeding 272K count at 2x usage rate.
What is GPT-5.4 Pro used for?
GPT-5.4 Pro targets maximum performance on complex reasoning tasks. It scores higher on benchmarks like BrowseComp (89.3% vs 82.7%), though it trails the standard model slightly on GDPval (82.0% vs 83.0%), and costs 12x more ($30/$180 vs $2.50/$15).
When did GPT-5.4 release?
GPT-5.4 released March 5, 2026, rolling out gradually across ChatGPT, Codex, and API. GPT-5.2 Thinking remains available until June 5, 2026 for migration.
Can GPT-5.4 use computers and browsers?
Yes. GPT-5.4 is OpenAI's first general-purpose model with native computer use capabilities. It issues mouse/keyboard commands, automates browsers via Playwright, and navigates desktop environments through screenshot interpretation.
What is tool search in GPT-5.4?
Tool search allows the model to look up tool definitions on-demand instead of loading all definitions upfront. This reduces token usage by 47% in tool-heavy workflows and enables work with ecosystems containing tens of thousands of tools.
How does GPT-5.4 compare to GPT-5.3-Codex for coding?
GPT-5.4 matches or exceeds GPT-5.3-Codex on SWE-Bench Pro (57.7% vs 56.8%) while offering lower latency and adding computer use capabilities. It's the recommended choice for new development workflows.
Is GPT-5.4 available in ChatGPT?
Yes. GPT-5.4 Thinking is available to Plus, Team, and Pro subscribers. GPT-5.4 Pro is available to Pro and Enterprise plans. GPT-5.2 Thinking remains available under Legacy Models until June 5, 2026.
What are the safety considerations for GPT-5.4?
GPT-5.4 is treated as High cyber capability under OpenAI's Preparedness Framework. Protections include expanded cyber safety stack, monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests on Zero Data Retention surfaces. Some false positives may occur as classifiers improve.



