What Is DeepSeek V4?

DeepSeek V4 launched April 23, 2026 as an MIT-licensed MoE family: V4-Pro (1.6T / 49B active) and V4-Flash (284B / 13B active) with 1M context. Full breakdown of architecture, benchmarks, and access.

Ashley Innocent

Ashley Innocent

24 April 2026

What Is DeepSeek V4?

Apidog for Enterprise

On-Premises Deploy

SSO & RBAC

SOC 2 Compliant

Explore Apidog Enterprise

DeepSeek dropped V4 on April 23, 2026, and this one is not a minor point release. The Hangzhou lab released four checkpoints at once, topped by DeepSeek-V4-Pro at 1.6 trillion total parameters, an MIT license, and a 1-million-token context window. The smaller sibling, DeepSeek-V4-Flash, lands at 284 billion parameters with the same context and the same open weights. Benchmarks put the Pro variant ahead of Claude Opus 4.6 on LiveCodeBench and Codeforces, and inside arm’s reach of GPT-5.4 xHigh on MMLU-Pro.

If you are deciding whether to swap Claude, GPT-5.5, or Qwen for DeepSeek V4, this guide covers what the model is, what changed from V3.2, the architecture choices that drive the benchmark story, and where to run it today.

For the matching developer walkthroughs, we have a DeepSeek V4 API guide, a free-access guide, and a full DeepSeek V4 usage walkthrough. The request shape maps cleanly onto OpenAI’s format, so you can pre-build the collection in Apidog before a key lands in your inbox.

button

TL;DR

What DeepSeek V4 actually is

DeepSeek V4 is the successor to the V3 and V3.2 lines that turned the lab into a household name last year. The architecture is still Mixture-of-Experts, but the shape of the model has changed. V4-Pro activates only 49 billion of its 1.6 trillion parameters per token, so the per-token compute bill looks closer to a 50B dense model than a trillion-parameter frontier system. Read the full technical report on the DeepSeek V4 model card.

Four checkpoints ship at launch:

All four drop under MIT, which is the quiet story. GPT-5.5 is closed and costs $5 per million input tokens; Claude Opus 4.6 is closed and prices closer to $15. DeepSeek V4-Pro has open weights you can download, mirror, fine-tune, and deploy on your own hardware with no license fee.

What changed from V3.2

V3 was already competitive on reasoning and code. V4 rewrites the attention stack and the training pipeline to push long context and efficiency at the same time.

Capability V3.2 V4-Pro
Total parameters 685B 1.6T
Active parameters 37B 49B
Context window 128K 1M
Inference FLOPs (1M context) baseline 27% of V3.2
KV cache (1M context) baseline 10% of V3.2
Precision FP8 FP4 + FP8 mixed
License DeepSeek License MIT
Reasoning modes single three

Three things drive the jump. First, a new hybrid attention stack that pairs Compressed Sparse Attention with Heavily Compressed Attention; that is where the 10% KV-cache number comes from. Second, Manifold-Constrained Hyper-Connections that stabilize gradients at the depth V4 needs. Third, a switch to the Muon optimizer for faster convergence. The training corpus also grew past 32 trillion tokens, and post-training uses a two-stage pipeline that cultivates domain-specific experts first, then consolidates them with on-policy distillation.

Benchmarks that matter

DeepSeek’s reported numbers put V4-Pro on the frontier board for coding and knowledge, with gaps on long-context retrieval.

For V4-Flash, the smaller variant, DeepSeek reports MMLU-Pro 86.2, GPQA Diamond 88.1, LiveCodeBench 91.6, Codeforces 3052, and SWE Verified 79.0. That is frontier territory for a 13B-active model, and it is the reason Flash is the interesting checkpoint for anyone deploying on their own hardware. See the DeepSeek V4-Flash card for the full table.

The honest read: V4-Pro wins on code, wins on open-ended factual recall, trails Gemini 3.1 Pro on general knowledge, and trails Claude Opus on the 1M-token retrieval benchmarks. If your workload is agentic coding or reasoning-heavy analysis, V4-Pro is in the conversation. If it is needle-in-a-haystack retrieval across a million tokens, Claude still has the edge.

Three reasoning modes

Every V4 checkpoint exposes three reasoning efforts, and picking the right one is the biggest cost lever.

Switch between them with a single thinking_mode parameter in the API or a flag in the local inference script. DeepSeek’s sampling recommendation is temperature=1.0, top_p=1.0 across all three.

Architecture in plain English

The V4 architecture paper is dense, but three choices explain the efficiency story.

  1. Hybrid attention. Most transformer layers use Compressed Sparse Attention, which keeps a small pool of high-value tokens fully attended and compresses the rest. A handful of layers use Heavily Compressed Attention, which lives closer to linear cost in sequence length. The mix is what delivers the 27% FLOPs and 10% KV-cache numbers at 1M tokens.
  2. Manifold-Constrained Hyper-Connections. Instead of plain residual connections, V4 wraps each layer’s residuals in a constraint that keeps activations on a stable manifold. The practical effect is that you can stack more layers without gradient chaos.
  3. Muon optimizer. Replaces AdamW for most of training. Muon converges faster and handles the huge gradient norms MoE models produce better than AdamW does.

None of these ideas are brand-new on their own. The V4 contribution is getting all three working together at trillion-parameter scale without blowing up training.

Availability today

DeepSeek launched all four checkpoints and the API on the same day. Here is the snapshot as of April 24, 2026.

Surface Access
chat.deepseek.com Free web chat, V4-Pro default, login required
DeepSeek API Live at api.deepseek.com; model IDs deepseek-v4-pro, deepseek-v4-flash
Hugging Face weights V4-Pro, V4-Flash, both MIT
ModelScope Mirrored weights for users in China
OpenRouter and aggregators Expected within days; typical DeepSeek launch pattern
deepseek-chat / deepseek-reasoner Deprecated July 24, 2026

The deprecation notice is worth circling. If you are still calling deepseek-chat in production, you have three months to migrate to deepseek-v4-pro or deepseek-v4-flash.

How it compares to GPT-5.5 and Claude

The three-way comparison most teams actually care about:

What to build with it

Four workloads line up cleanly with V4’s strengths:

  1. Agentic coding loops. The SWE Verified 79.0 and Codeforces 3206 numbers point directly at multi-file debugging, repo-aware refactors, and autonomous test fixes. Pair it with a good API client like Apidog to inspect every request and response while you tune prompts.
  2. Reasoning over long documents. 1M tokens is enough for most monorepos, most contracts, and most research corpora. Think High is the right mode for this.
  3. Self-hosted AI products. If your compliance story needs on-prem inference, V4-Flash is the first open-weights model that competes with closed frontier APIs on quality.
  4. Research and fine-tuning. The Base checkpoints are there specifically for custom training. Pair them with a domain dataset and you land on production-grade specialist models.

Where it is not a fit: high-volume classification, embedding retrieval, or short-prompt chat. V4-Flash is still overkill for those, and older DeepSeek checkpoints cost less.

Pricing in one line

DeepSeek had not published the final API rate card at the time of writing. V3.2 ran at roughly $0.28 per million input tokens and $0.42 per million output tokens, and the lab has a track record of keeping V-series pricing close to that floor. Expect V4-Flash in the same range and V4-Pro at a modest premium. Closed competitors price at $5 to $15 per million input tokens, so even a 3x jump from V3.2 leaves DeepSeek well below the frontier-API median. Track the live numbers on the DeepSeek pricing page.

How to test V4 today

Three paths, ranked by time-to-first-token.

  1. Web chat. Open chat.deepseek.com and sign in. V4-Pro is the default; toggle to Think High in the UI. Free, no card, works now.
  2. API. Grab a key, point your client at https://api.deepseek.com, set "model": "deepseek-v4-pro", and go. The request shape is OpenAI-compatible, so any existing OpenAI client works with a base-URL swap. Full walkthrough in the DeepSeek V4 API guide.
  3. Local weights. Pull from Hugging Face or ModelScope. V4-Flash runs on 2 to 4 H100s; V4-Pro needs a serious cluster. The inference code lives in the /inference folder of the model repo.

For the full walkthrough including Apidog-based prompt iteration, see how to use DeepSeek V4. To keep spend at zero, see how to use DeepSeek V4 for free. Download Apidog and pre-build your collection; the OpenAI-compatible format means one request works across DeepSeek, OpenAI, and every other frontier API.

FAQ

Is DeepSeek V4 really open source?Yes. All four checkpoints carry an MIT license, which permits commercial use, modification, and redistribution without a separate usage agreement.

Do I need a GPU cluster to run V4-Flash?You need two to four H100s or H200s for V4-Flash at full precision, less if you quantize. V4-Pro needs a genuine cluster. If you want to try V4 without hardware, use the API or chat.deepseek.com.

When does V4 hit the DeepSeek API?It is already live as of April 23, 2026. The model IDs are deepseek-v4-pro and deepseek-v4-flash. The older deepseek-chat and deepseek-reasoner IDs are deprecated July 24, 2026.

How does V4 compare to Kimi and Qwen?V4-Pro posts higher LiveCodeBench and Codeforces numbers than Kimi K2 and Qwen 3 Max on DeepSeek’s reported tables. All three are open-weights MoE systems with similar deployment profiles. Pick based on the benchmark closest to your workload.

Can I fine-tune V4 on my own data?Yes. The Base checkpoints exist for that; pair them with your domain data and a standard SFT pipeline. The MIT license covers commercial redistribution of the resulting model.

Will V4 work with my existing OpenAI-compatible tooling?Yes. The API accepts both OpenAI and Anthropic message formats at https://api.deepseek.com and https://api.deepseek.com/anthropic respectively. Most existing OpenAI clients work with a single base-URL change. See the matching GPT-5.5 API walkthrough for the parallel pattern.

Practice API Design-first in Apidog

Discover an easier way to build and use APIs

What Is DeepSeek V4?