DeepSeek V4 shipped on April 23, 2026, and unlike most frontier launches, the free paths are real. The official web chat runs V4-Pro with no credit card. The weights are MIT-licensed and downloadable today. Aggregators like OpenRouter and Chutes typically expose free tiers within days of a DeepSeek release. Add it up, and you can run serious V4 workloads at zero dollars before you ever decide whether to top up an account.
This guide walks through every no-cost path we can verify, which one fits which use case, and how to stand up a production-ready collection in Apidog so the jump to paid billing stays smooth when usage grows.
For the product-level overview, see what is DeepSeek V4. For the full API walkthrough, see how to use the DeepSeek V4 API.
TL;DR
- chat.deepseek.com — free web chat on V4-Pro with Think High and Think Max toggles. No card. Works today.
- Hugging Face weights + your own GPU — MIT license, V4-Flash runs on 2 to 4 H100s, V4-Pro needs a cluster.
- OpenRouter and Chutes free tiers — third-party gateways that usually open free quota on DeepSeek models within a week of launch.
- Hugging Face Inference Providers — a shared, rate-limited endpoint that exposes V4 for early experimentation.
- Kaggle, Colab, and RunPod trial credits — free compute for one-off runs when you want to test self-hosting.
- Every free path caps usage. For production workloads, move to paid billing before the cap bites.

Path 1: chat.deepseek.com (the default free path)
The fastest, most reliable free path is the official chat interface. V4-Pro is the default model; the toggle at the top of the composer switches between Non-Think, Think High, and Think Max reasoning modes.

Setup
- Open chat.deepseek.com.
- Sign in with email, Google, or WeChat.
- Confirm the active model reads V4-Pro.
- Start typing.
What you get
- The full 1M-token context window.
- File upload for PDFs, images, and code bundles.
- Web search on demand.
- All three reasoning modes, including Think Max.
- Conversation history and folders.
What the caps look like
DeepSeek does not publish a hard per-day message count; the free tier is soft-throttled under load. Heavy use can slow responses or queue requests but rarely hard-blocks. If you start seeing persistent rate limits, that is the signal to either slow the cadence or move to the API.
Good tasks for the web UI: testing whether V4 beats Claude on your hardest prompt, pasting a repo tarball for an architectural review, running Think Max against a contract you would otherwise pay a lawyer to read. Bad tasks: anything that needs automation or reproducibility.
Path 2: Self-host V4-Flash on your own GPU
V4-Flash is the MIT-licensed variant most people can realistically self-host. At 284B total and 13B active, a multi-H100 box runs it in FP8 at serious throughput, and an INT4 quantization drops it onto a single 80GB card.
The cost here is hardware, not licensing. If you already have GPU capacity, this is the most durable free path; it cannot be rate-limited, deprecated, or pulled.
Pull the weights
```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli login
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash \
  --local-dir ./models/deepseek-v4-flash
```
Expect roughly 500GB at FP8. Reserve disk.
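Before starting a download of that size, it can help to confirm the target volume has room. The following is a minimal sketch using only the Python standard library. The 500 GB figure is the estimate above; the helper name `has_room` and the 10% headroom are illustrative assumptions, not part of any DeepSeek or Hugging Face tooling.

```python
import shutil


def has_room(path: str, required_gb: float, headroom: float = 0.10) -> bool:
    """Return True if `path` has at least required_gb plus headroom free."""
    free_bytes = shutil.disk_usage(path).free
    needed_bytes = required_gb * (1 + headroom) * 1000**3  # decimal gigabytes
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    # "." stands in for the volume that will hold ./models; adjust as needed.
    if has_room(".", required_gb=500):
        print("Enough free space for the FP8 weights.")
    else:
        print("Not enough free space; free up disk or pick another volume.")
```

Point the check at an existing directory on the same volume as the download target, since `shutil.disk_usage` raises an error for paths that do not exist yet.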
Serve with vLLM
```bash
pip install "vllm>=0.9.0"
vllm serve deepseek-ai/DeepSeek-V4-Flash \
  --tensor-parallel-size 4 \
  --max-model-len 1048576 \
  --dtype auto \
  --port 8000
```
Once it is up, point any OpenAI-compatible client at http://localhost:8000/v1. The endpoint accepts the same request shape as the paid DeepSeek API; Apidog sees it as another base URL and all your saved collections work untouched.
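As a quick smoke test of the local endpoint, a request can be assembled with the standard library alone. This is a sketch under assumptions: the helper names `build_chat_request` and `send` are hypothetical, and the model name assumes vLLM serves the checkpoint under the same identifier passed to `vllm serve`, which is its default when no served name is set.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, api_key=None):
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:  # a local server started without a key does not need this header
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_chat_request(
        "http://localhost:8000/v1",
        "deepseek-ai/DeepSeek-V4-Flash",  # assumed served model name
        "Say hello in one sentence.",
    )
    print(req.get_method(), req.full_url)
    # Uncomment once the server is running:
    # reply = send(req)
    # print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes it easy to inspect the exact URL and body before any traffic leaves the machine.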
Hardware reality check
| Variant | Minimum cards (FP8) | Minimum cards (INT4) | Realistic throughput |
|---|---|---|---|
| V4-Flash | 2 × H100 80GB | 1 × H100 80GB | 50 to 150 tok/s |
| V4-Pro | 16 × H100 80GB | 8 × H100 80GB | cluster-dependent |
If you do not have cards sitting idle, the math usually favors the API over renting GPUs by the hour. The self-hosted path is mostly for teams with existing capacity or hard compliance requirements.
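The rent-versus-API comparison comes down to simple arithmetic, sketched below. Every number in the example is a placeholder rather than a quoted price: the hourly rate, card count, and throughput are assumptions to be replaced with your own figures, and the function name is hypothetical. The throughput is treated as aggregate tokens per second, so batched serving that raises it will lower the result.

```python
def rental_cost_per_million_tokens(gpu_hourly_usd, num_gpus, tokens_per_second):
    """Cost in USD to generate one million tokens on rented GPUs."""
    tokens_per_hour = tokens_per_second * 3600
    hourly_cost = gpu_hourly_usd * num_gpus
    return hourly_cost / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs: 2 cards at $2.50/hour each, 100 tok/s aggregate.
    cost = rental_cost_per_million_tokens(
        gpu_hourly_usd=2.50, num_gpus=2, tokens_per_second=100
    )
    print(f"${cost:.2f} per million tokens")  # prints "$13.89 per million tokens"
```

Compare the result with the API's published per-million-token output price for the same model; if the rental figure is higher, the API wins unless compliance or existing capacity changes the picture.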
Path 3: OpenRouter free tier
OpenRouter is a request-level gateway that aggregates open-weights and closed models behind one API. The platform routinely opens free tiers on new DeepSeek releases, and the pattern has held for V3, V3.1, and V3.2.

Setup
- Sign up at openrouter.ai.
- Create an API key.
- Check the model catalog for deepseek/deepseek-v4-pro or deepseek/deepseek-v4-flash; the free variants are usually suffixed :free.
- Call it with the OpenAI-compatible SDK.
```python
import os

from openai import OpenAI

# Read the key from the environment instead of hard-coding it.
OPENROUTER_KEY = os.environ["OPENROUTER_KEY"]

client = OpenAI(
    api_key=OPENROUTER_KEY,
    base_url="https://openrouter.ai/api/v1",
)
response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash:free",
    messages=[{"role": "user", "content": "Write a Python CLI for semver bumping."}],
)
print(response.choices[0].message.content)
```
Caps
Free tiers on OpenRouter typically cap at a few hundred requests per day per key and reduce priority under load. Perfect for prototyping, unreliable for production.
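One way to live with those caps while prototyping is to back off on rate limits and then fall through to a paid model. The sketch below is provider-neutral: `RateLimited` and `call_with_fallback` are hypothetical names, and the caller supplies a `call(model)` function that raises `RateLimited` when the provider answers with HTTP 429.

```python
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the provider returns HTTP 429."""


def call_with_fallback(call, models, retries_per_model=2, base_delay=1.0,
                       sleep=time.sleep):
    """Try each model in order; back off on rate limits, then move to the next."""
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return model, call(model)
            except RateLimited:
                # Exponential backoff: base_delay, then 2 * base_delay, and so on.
                sleep(base_delay * 2 ** attempt)
    raise RateLimited(f"all models rate-limited: {list(models)}")


if __name__ == "__main__":
    def fake_call(model):
        # Stand-in for a real API call: the free variant is always throttled.
        if model.endswith(":free"):
            raise RateLimited(model)
        return "response text"

    used, text = call_with_fallback(
        fake_call,
        ["deepseek/deepseek-v4-flash:free", "deepseek/deepseek-v4-flash"],
        base_delay=0.01,
    )
    print(used, text)
```

In real use, `call` would wrap the SDK request and translate the SDK's rate-limit error into `RateLimited`. Ordering the list from free to paid keeps spending at zero until the cap is actually hit.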
Path 4: Hugging Face Inference Providers
Hugging Face runs a hosted inference surface that exposes V4 checkpoints shortly after release. Rate limits are tight and latency varies, but it is free to call.
```python
from huggingface_hub import InferenceClient

# The client picks up a token from `huggingface-cli login` or the HF_TOKEN variable.
client = InferenceClient(model="deepseek-ai/DeepSeek-V4-Flash")
response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize the V4 technical report in 5 bullets."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```
The HF token is free. For heavier use, upgrade to a Pro account; the rate limits loosen but the cost is still an order of magnitude below the official API for comparable workloads.
Path 5: Trial credits on Colab, Kaggle, RunPod, and Lambda
Every major GPU-rental provider ships trial credits. Used well, they cover one-off V4-Flash experiments without ever spending real money.
- Google Colab. The free T4 tier is too small for V4. Colab Pro+ (a paid plan, so not strictly free) gives 500 compute units per month, enough for a handful of V4-Flash experiments on an A100.
- Kaggle. Free weekly GPU hours on T4 and P100. Too small for V4-Pro, sometimes enough for quantized V4-Flash experiments.
- RunPod. $10 trial credit covers a few hours on an H100. Enough to spin up vLLM, run a benchmark suite, and tear it down.
- Lambda. Occasional free-hour promos on H100 and H200; watch the signup page for active offers.
None of these are long-term free paths. They work well for a bounded experiment and nothing else.
Build a provider-agnostic Apidog collection
The practical payoff of this many free paths is that you can test the same prompt across all of them without duplicating work. The workflow:
- Download Apidog.
- Create one collection with four environments: chat (placeholder), deepseek (https://api.deepseek.com/v1), openrouter (https://openrouter.ai/api/v1), self-hosted (http://localhost:8000/v1).
- Save a single POST request to {{BASE_URL}}/chat/completions.
- Store each provider’s key as a secret variable so the request body is identical across environments.
- Flip environments to A/B the same prompt across every backend.
This is the same pattern used for the GPT-5.5 free-tier collection; one tool, every provider, no duplicated work.
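The same environment-switching idea can be sketched outside Apidog to show what changes per provider and what stays fixed. The base URLs are the ones listed above; the environment-variable names, the `resolve` and `ab_test` helpers, and the caller-supplied `send` function are assumptions for illustration. The chat placeholder is left out because the web chat has no API endpoint.

```python
import os

# Base URLs come from the list above; the key variable names are assumptions.
ENVIRONMENTS = {
    "deepseek": {"base_url": "https://api.deepseek.com/v1",
                 "key_var": "DEEPSEEK_API_KEY"},
    "openrouter": {"base_url": "https://openrouter.ai/api/v1",
                   "key_var": "OPENROUTER_API_KEY"},
    "self-hosted": {"base_url": "http://localhost:8000/v1",
                    "key_var": None},  # local server, no key assumed
}


def resolve(env_name, environ=os.environ):
    """Return (url, headers) for one environment; only these vary per provider."""
    env = ENVIRONMENTS[env_name]
    headers = {"Content-Type": "application/json"}
    if env["key_var"]:
        headers["Authorization"] = f"Bearer {environ.get(env['key_var'], '')}"
    return env["base_url"] + "/chat/completions", headers


def ab_test(body, send, env_names=tuple(ENVIRONMENTS)):
    """Send the same body to each environment using a caller-supplied send()."""
    return {name: send(*resolve(name), body) for name in env_names}


if __name__ == "__main__":
    for name in ENVIRONMENTS:
        url, headers = resolve(name)
        print(name, url, sorted(headers))
```

Model identifiers can differ between providers, so in practice the model field may need to be a per-environment variable as well, while the rest of the body stays shared.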
Which free path should you pick?
Four heuristics cover most decisions.
- I want to form an opinion in five minutes. Use chat.deepseek.com.
- I want to prototype a product. Use OpenRouter’s free tier until you hit the cap, then top up on DeepSeek.
- I have GPUs and a compliance story. Self-host V4-Flash on vLLM.
- I need long-term free usage. No such thing. Every hosted free tier caps somewhere. Pair chat.deepseek.com for interactive work with a modest paid top-up for automation.
When to move off free
Three signals say you have outgrown the free tier.
- You are rate-limited more than once a day. That means the workload is big enough to deserve a budget.
- You need SLAs. Free tiers do not carry them. The official API does.
- You need to log, audit, or pass compliance. The paid API returns clear billing records; most aggregator free tiers do not.
When any of those hit, move to the official API. The minimum top-up is $2 and the per-token pricing is the lowest in the frontier tier.
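The first signal is easy to measure if the client records a timestamp whenever it gets rate-limited. A minimal sketch, assuming those timestamps are collected somewhere as `datetime` objects; the function name and the default threshold of one event per day are illustrative.

```python
from collections import Counter
from datetime import datetime


def days_over_limit(rate_limit_events, threshold=1):
    """Return the sorted dates with more than `threshold` rate-limit events."""
    per_day = Counter(ts.date() for ts in rate_limit_events)
    return sorted(day for day, count in per_day.items() if count > threshold)


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    events = [
        datetime(2026, 5, 1, 9, 30),
        datetime(2026, 5, 1, 15, 5),
        datetime(2026, 5, 2, 10, 0),
    ]
    for day in days_over_limit(events):
        print(f"{day}: rate-limited more than once")
```

Any date this returns is a day the workload hit the free cap repeatedly, which is the point at which a budget starts to make sense.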
FAQ
Is chat.deepseek.com really free?
Yes. No credit card, no trial clock. The service is soft-throttled but not paywalled.
Do I need a Hugging Face account to download the weights?
Technically no, the repo is public. Practically yes; a logged-in account gives you better rate limits on the download.
Which free path runs the real V4-Pro?
chat.deepseek.com runs the full V4-Pro. OpenRouter free tiers more often carry V4-Flash. If you need V4-Pro output and do not want to pay, the web chat is the reliable path.
Can I put a free tier behind a product?
Not responsibly. Free tiers rate-limit, change terms, and sometimes disappear. If you are shipping V4 to customers, use the paid API or self-host.
Is self-hosting actually free?
The license is free. The hardware is not. If you already own GPU capacity, the marginal cost is electricity. If you rent, the math usually loses to the paid API.
Will there be an Apidog free tier for testing?
Apidog itself is free to use for API design and testing; it only costs credits when you hit paid APIs through it. So yes, you can combine a free Apidog workspace with chat.deepseek.com or OpenRouter for a fully free workflow.
