xAI rolled out Grok 4.3 in stages: beta on April 17, 2026, API access on April 30, and full general availability on May 6. The pitch is direct: a 1,000,000-token context window, native video input for the first time on the Grok line, always-on reasoning, and a price cut of roughly 40% against Grok 4.20. Eight legacy Grok models retire on May 15, so anyone running on grok-3 or grok-4 series should plan a migration this week.
This guide covers how to call Grok 4.3 from your code: endpoint shape, authentication, the OpenAI-compatible base URL, the reasoning effort parameter, video input, function calling, and a working test setup in Apidog.
For the voice side of the same release, see How to use Grok Voice for free. For the head-to-head against OpenAI’s flagship voice model, see Grok Voice vs GPT-Realtime.
TL;DR
- Grok 4.3 went GA on May 6, 2026. Eight legacy models retire on May 15, 2026.
- Pricing: $1.25 per 1M input tokens, $2.50 per 1M output tokens, cached input $0.20 per 1M. Roughly a 40% cut vs Grok 4.20.
- 1M-token context window. Native video input. Always-on reasoning.
- Reasoning effort: low / medium / high. Default is medium.
- Endpoint: https://api.x.ai/v1/chat/completions (OpenAI-compatible base URL).
- Throughput: ~159 tokens/second on standard tiers.
- Intelligence Index 53 (Artificial Analysis), ranked 10th of 146 models globally.
- Use Apidog to script the request, hold reasoning configurations as variables, and replay across both Grok and OpenAI compatibility modes.
What changed in Grok 4.3
The headline upgrades, in order of impact for most teams:
- 40% price drop. Input is down 37.5% vs Grok 4.20; output is down 58.3%. The cached-input rate is now $0.20/1M, an aggressive cut that makes long stable system prompts much cheaper.
- 1M-token context. Up from 256k on Grok 4.20. Enough to fit a midsize codebase, a full earnings call, or a complete legal contract in one prompt.
- Native video input. First time on the Grok line. Pass a video URL and the model reasons over frames natively.
- Always-on reasoning. Grok 4.3 ships with a baseline reasoning step on every request. The reasoning_effort parameter scales the depth, but the model never reasons less than low.
- Major agentic gain. +300 Elo points on GDPval-AA against Grok 4.20. Tool dispatch and multi-step workflows behave noticeably better.
The Intelligence Index of 53 (Artificial Analysis) puts Grok 4.3 above the average of 35 for its price tier, and tenth out of 146 models tracked.
Prerequisites
Before the first request, line up four things:
- An xAI Console account at console.x.ai. Same login flow as Grok Voice.
- A billable tier with an API key. Project-scoped keys are recommended for production.
- The OpenAI SDK (Grok 4.3 is OpenAI-compatible) or the xAI SDK. Either works.
- An API client that can replay requests without spamming your terminal.

Export the key once:
export XAI_API_KEY="xai-..."
Endpoint and authentication
Grok 4.3 ships on the OpenAI-compatible Chat Completions surface, with xAI’s base URL.
POST https://api.x.ai/v1/chat/completions
Auth is a bearer token. Headers are standard:
Authorization: Bearer $XAI_API_KEY
Content-Type: application/json
The OpenAI compatibility means you can drop the OpenAI Python or Node SDK in and change the base_url. That is the path of least resistance for most teams migrating from gpt-4 or gpt-5.
import os

from openai import OpenAI
client = OpenAI(
api_key=os.environ["XAI_API_KEY"],
base_url="https://api.x.ai/v1",
)
response = client.chat.completions.create(
model="grok-4.3",
messages=[
{"role": "user", "content": "Summarize the trade-offs of GraphQL vs REST in three bullets."}
],
reasoning_effort="medium",
)
print(response.choices[0].message.content)
If you prefer the xAI SDK, the call shape is the same; the only change is the import.
Request parameters
The full parameter map for Grok 4.3:
| Parameter | Type | Values | Notes |
|---|---|---|---|
| model | string | grok-4.3 | Required. |
| messages | array | OpenAI message shape | Required. Supports role: system / user / assistant. |
| reasoning_effort | string | low, medium, high | Optional. Default: medium. Higher levels increase latency and output tokens. |
| max_tokens | int | 1–32768 | Caps output. |
| temperature | float | 0.0–2.0 | Default 1.0. |
| top_p | float | 0.0–1.0 | Nucleus sampling. |
| stream | bool | true / false | Server-sent events when true. |
| tools | array | OpenAI tool shape | Function calling. |
| tool_choice | string / object | auto, none, or specific tool | Standard OpenAI semantics. |
| response_format | object | { type: "json_object" } | Structured output. |
| seed | int | any | For reproducibility at temperature: 0. |
A working curl request:
curl https://api.x.ai/v1/chat/completions \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "grok-4.3",
"messages": [
{"role": "system", "content": "You are a senior backend engineer."},
{"role": "user", "content": "Review this query plan and flag the bottleneck."}
],
"reasoning_effort": "high"
}'
The response carries the standard OpenAI shape: choices[].message.content, plus a usage object with prompt_tokens, completion_tokens, reasoning_tokens, and total_tokens broken out.
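As a quick orientation, here is that shape with made-up token counts (an illustrative dict, not a live call; it assumes reasoning tokens are counted inside completion_tokens, which the breakdown above does not spell out):

```python
# Illustrative response mirroring the shape described above; numbers are invented.
response = {
    "choices": [{"message": {"content": "The bottleneck is the seq scan on orders."}}],
    "usage": {
        "prompt_tokens": 1200,
        "completion_tokens": 350,   # assumed to include reasoning tokens
        "reasoning_tokens": 180,
        "total_tokens": 1550,
    },
}

answer = response["choices"][0]["message"]["content"]
usage = response["usage"]
# Fraction of billed output that went to reasoning rather than visible text.
reasoning_share = usage["reasoning_tokens"] / usage["completion_tokens"]
```

Tracking reasoning_share per request is the cheapest way to spot prompts that are quietly burning output tokens on high-effort reasoning.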
Reasoning effort
Three levels, with concrete guidance:
- low. Use for classification, summarization, rule extraction, simple Q&A. Latency is short, output is direct.
- medium. Default. Use for customer service, function calling, data analysis, single-step tool use. Reasoning depth is enough for most production traffic.
- high. Use for multi-step agents, long-chain code review, complex math, and tasks where the model needs to plan before answering.
Always-on reasoning means even low does some thinking; that is what drives the factual-accuracy gain over Grok 4.20. Don’t expect to save money by avoiding reasoning altogether; it is baked in.
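If you want to benchmark the three levels against your own traffic, it helps to build the request bodies up front so a typo fails before it hits the API. A minimal sketch; build_request is a hypothetical helper, not part of any SDK:

```python
def build_request(effort: str) -> dict:
    """Build a Chat Completions body for one reasoning-effort variant.

    Rejects unknown levels early, before the request ever reaches the API.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning_effort: {effort}")
    return {
        "model": "grok-4.3",
        "messages": [
            {"role": "user", "content": "Classify this ticket: 'refund not received'"}
        ],
        "reasoning_effort": effort,
    }

# One body per level; send each with the client from the quickstart and
# compare usage.reasoning_tokens across the three responses.
requests = [build_request(e) for e in ("low", "medium", "high")]
```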
Function calling
Standard OpenAI shape works directly. Declare tools, the model emits a tool_calls array on the assistant message, you execute, you reply with a tool role message:
tools = [{
"type": "function",
"function": {
"name": "lookup_user",
"description": "Look up a user by ID.",
"parameters": {
"type": "object",
"properties": {"user_id": {"type": "string"}},
"required": ["user_id"],
},
},
}]
response = client.chat.completions.create(
model="grok-4.3",
messages=[{"role": "user", "content": "Find user u_42 and tell me their last login."}],
tools=tools,
reasoning_effort="medium",
)
tool_calls = response.choices[0].message.tool_calls
The 300 Elo gain on GDPval-AA shows up here in practice: Grok 4.3 picks better tools, makes fewer redundant calls, and recovers from a tool error without spinning. If you are testing tool flows, MCP server testing in Apidog covers the replay setup we use internally.
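The execute-and-reply leg of the loop looks like this. A sketch under stated assumptions: lookup_user is a local stub for the hypothetical tool declared above, and run_tool_calls is our own dispatch helper, not an SDK feature:

```python
import json


def lookup_user(user_id: str) -> dict:
    # Stub standing in for your real data access; the payload is invented.
    return {"user_id": user_id, "last_login": "2026-05-01T09:14:00Z"}


# Map tool names from the declaration to local implementations.
TOOLS = {"lookup_user": lookup_user}


def run_tool_calls(tool_calls) -> list:
    """Execute each tool_call and build the tool-role messages to send back."""
    replies = []
    for call in tool_calls:
        fn = TOOLS[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(fn(**args)),
        })
    return replies
```

Append the returned messages (after the assistant message carrying tool_calls) and call the API again; the model then writes the final answer from the tool output.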
Video input
Grok 4.3 is the first Grok model with native video input. Pass a video URL inside a content block:
response = client.chat.completions.create(
model="grok-4.3",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": "Describe what happens in this clip and flag any anomalies."},
{"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}},
],
}],
)
Video tokens count against the input meter. Long clips burn context fast; downsample or trim before you send if cost matters. The model reasons over frames natively, so you don’t need to extract keyframes manually.
1M-token context
The 1M context window is a real production tool, not a benchmark trophy. Common patterns:
- Whole-codebase code review. Concatenate the diff, all touched files, and the lint output. Ask Grok to review.
- Long-form document QA. Drop a 200-page contract in and ask targeted questions.
- Conversation memory. Keep an entire month of agent conversations in context for personalization.
Cached input at $0.20/1M makes this affordable. A 400k-token system prompt that you keep stable burns $0.08 per cached call instead of $0.50 fresh.
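The arithmetic behind that claim, using the rates from the pricing section:

```python
# Cost of sending a stable 400k-token system prompt once, at the two input rates.
PROMPT_TOKENS = 400_000
FRESH_RATE = 1.25 / 1_000_000    # $ per uncached input token
CACHED_RATE = 0.20 / 1_000_000   # $ per cached input token

fresh_cost = PROMPT_TOKENS * FRESH_RATE    # $0.50 per uncached call
cached_cost = PROMPT_TOKENS * CACHED_RATE  # $0.08 per cached call
```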
Migration from legacy Grok models
Eight legacy Grok models retire on May 15, 2026, 12:00 PM PT. If you are running on any of them, swap the model string to grok-4.3 before the cutoff. Most calls work without further change because the request shape is unchanged.
Two things to watch:
- Reasoning effort. Some legacy models did not accept reasoning_effort. Grok 4.3 always reasons; if your prior code relied on a fast non-reasoning path, accept the latency increase or stay on low.
- Output formatting. Grok 4.3 is more structured than Grok 4.20 by default. If you used heavy regex post-processing, retest before swapping.
For the full price comparison across the OpenAI line, see GPT-5.5 pricing; for the head-to-head reasoning models, see How to use the GPT-5.5 API.
Testing in Apidog
The fastest way to validate Grok 4.3 against your own use case:
- Create an Apidog environment with XAI_API_KEY and BASE_URL = https://api.x.ai/v1.
- Save a request collection with three variants: low, medium, high reasoning. Same prompt, different effort.
- Run all three. Compare the response, the latency, and the usage.reasoning_tokens count side by side.
- Add a fourth variant pointing at OpenAI's base URL to compare Grok 4.3 against GPT-5.5 on identical input. Same SDK, different model and base URL.
Download Apidog to run the comparison. The collection ports cleanly when you swap providers, which is the point. For broader API testing strategy, see API testing tool for QA engineers.

Rate limits
Tier limits on the xAI Console run from a baseline of a few thousand requests per minute on Tier 1 to multi-hundred-thousand on enterprise tiers. Concrete numbers shift; check the console dashboard. The 159 tokens/second throughput xAI advertises is per-stream output speed, not aggregate; concurrent requests scale linearly within tier caps.
If you hit rate limits, the API returns a 429 with a retry-after header. Standard exponential backoff handles it.
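A minimal retry sketch for that case; with_backoff and its status_code check are illustrative, so adapt the exception check to whatever error type your SDK actually raises:

```python
import random
import time


def with_backoff(call, max_retries=5, base=0.5):
    """Retry `call` on 429-style errors with exponential backoff plus jitter.

    Assumes the raised error exposes a `status_code` attribute; anything
    other than a 429, or the final attempt, re-raises immediately.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as err:
            if getattr(err, "status_code", None) != 429 or attempt == max_retries - 1:
                raise
            # Sleep 1x, 2x, 4x... the base interval, plus jitter to avoid
            # synchronized retries across workers.
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))
```

In production, prefer the server's retry-after header over the computed delay when it is present.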
FAQ
Is Grok 4.3 OpenAI-compatible end to end? For Chat Completions, yes. Drop in the OpenAI SDK, change the base_url, change the model. Function calling, structured output, and streaming all work identically.
Does it support the Responses API? The xAI surface is Chat Completions today. The Responses API is OpenAI-only.
What is the actual context limit in practice? 1,000,000 tokens. Long inputs cost real money even at $1.25/1M; cache aggressively if your prompt is stable.
How does always-on reasoning affect latency? First-token latency is slightly higher than non-reasoning models, but Grok 4.3 streams output at ~159 tokens/second, so end-to-end response time is competitive. The trade-off is worth it on accuracy-sensitive workloads.
Can I use Grok 4.3 with Grok Voice? Yes. The voice agent (grok-voice-think-fast-1.0) calls Grok 4.3 under the hood when it reasons. You can also call Grok 4.3 directly from a voice loop you build on top of TTS and STT primitives.
What happens to my old Grok 3 / Grok 4 calls after May 15? They will fail with a 410 (model retired). Migrate before the cutoff.
Does Grok 4.3 support image input? Yes, alongside the new video input. Pass an image URL in a content block, same shape as OpenAI.
Wrapping up
Grok 4.3 is the most aggressive price-performance move xAI has shipped. The 40% cut, the 1M context, the always-on reasoning, and the native video together make it a serious daily driver for most agent workloads. The OpenAI compatibility means migration is a base-URL change, not a rewrite.
The fastest validation path: script three reasoning variants in Apidog, drop in your real prompts, measure latency and reasoning tokens. Migrate before May 15.



