How to Use the Hy3 Preview API for Free?

Tencent open-sourced Hy3 Preview on April 22, 2026, and within a day OpenRouter listed it as a fully free endpoint. No credit card, no token metering, no trial window. You can call the same 295B-parameter Mixture-of-Experts model that powers Tencent’s Yuanbao app and CodeBuddy assistant from your own code, today, for zero dollars.

This guide shows how to use the Hy3 Preview API for free through OpenRouter, the Hugging Face Space, and the raw Hy3 repo. It also covers the reasoning modes that make Hy3 different from most 2026 open models, and how to test the API inside Apidog without writing throwaway scripts.

If you want the fastest route to your first response, jump to “Step-by-step: call Hy3 Preview free on OpenRouter.”

TL;DR

Hy3 Preview is free on OpenRouter under the model ID tencent/hy3-preview:free with $0 input and $0 output pricing.
It’s a Mixture-of-Experts model: 295B total parameters, 21B active, 192 experts with top-8 routing, and a 256K-token context window.
Three reasoning modes ship built in: no_think for fast answers, low, and high for deep chain-of-thought on agent and coding tasks.
Benchmarks are strong for an open-weights model: SWE-bench Verified 74.4, Terminal-Bench 2.0 54.4, GPQA Diamond 87.2, MMLU 87.42.
You can run it three free ways: the OpenRouter free tier, the Hugging Face Hy3-preview Space, or local inference with vLLM and the open weights.
Apidog pairs well with the OpenRouter endpoint because Hy3 uses the OpenAI Chat Completions schema; point a request at OpenRouter and go.

What is Hy3 Preview?

Hy3 Preview is the first flagship release from Tencent’s restructured Hunyuan foundation-model team, now led by Yao Shunyu, a former OpenAI researcher the company hired to push its reasoning stack. It is positioned as Tencent’s most capable model yet and a direct answer to the top Chinese open-weights releases from DeepSeek, Alibaba, and Zhipu.

The technical profile from the official model card is agent-first:

Architecture: Mixture-of-Experts, 80 layers plus one MTP layer, 64 attention heads with grouped-query attention.
Parameters: 295B total, 21B active per forward pass.
Experts: 192 specialists with top-8 routing per token.
Context: 256K tokens (262,144 on OpenRouter’s listing).
Tokenizer and precision: 120,832-entry vocabulary; weights in BF16.
License: Tencent Hy Community License, commercial use allowed within the license terms.
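To make the "192 experts, top-8 routing" line concrete, here is a toy sketch of top-k gating in plain Python. It is a generic illustration of the technique, not Tencent's router: the random stand-in scores, the softmax over only the selected experts, and the function name route_token are all assumptions.

```python
import math
import random

NUM_EXPERTS = 192  # from the model card summary above
TOP_K = 8          # experts activated per token


def route_token(scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights.

    Generic top-k gating sketch; Hy3's actual router may differ.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (an assumed normalization).
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router logits
weights = route_token(scores)

print(sorted(weights))  # indices of the 8 experts chosen for this token
print(f"{TOP_K / NUM_EXPERTS:.1%} of experts run per token")
```

Only 8 of the 192 expert blocks run for any one token, which is how a 295B-parameter model keeps roughly 21B parameters active per forward pass.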

What sets it apart from a generic 200B-range MoE is the agentic training. Tencent rebuilt its RL infrastructure for multi-turn tool use, and the published scores on SWE-bench Verified, Terminal-Bench 2.0, and the internal WildClawBench suite land it close to the top closed models on code and shell tasks.

Three free ways to use Hy3 Preview

You have three paths depending on whether you want a chat UI, an API, or local weights.

Path	What it is	Free?	Good for
OpenRouter `tencent/hy3-preview:free`	Hosted OpenAI-compatible API	Yes, $0 in/out	Building agents, scripts, and backend features
Hugging Face Space	Browser chat demo	Yes	Quick prompts, kicking the tires, smoke tests
Self-hosted weights (vLLM / SGLang)	Run the open weights on your own GPUs	Free software, hardware cost applies	Privacy-sensitive workloads, high volume

Most developers will want the OpenRouter route. It is the shortest path from signup to a working API call, and the rate limits on the free tier are generous enough for prototyping.

Step-by-step: call Hy3 Preview free on OpenRouter

Here is the minimal path from zero to a working tencent/hy3-preview:free response.

Create an OpenRouter account. Sign up at openrouter.ai. Email is enough; no payment method required for free-tier models.
Generate an API key. In the OpenRouter dashboard, open “Keys” and create a new key. Copy it into an environment variable, for example export OPENROUTER_API_KEY=sk-or-....
Open the model page. Go to the Hy3 Preview free listing and confirm the status banner reads “Free.” You will also see usage stats there; at launch the endpoint was handling 6.81B prompt tokens per day across all users.

Send your first request. OpenRouter exposes the OpenAI Chat Completions schema, so any OpenAI SDK works:

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tencent/hy3-preview:free",
    "messages": [
      {"role": "user", "content": "Explain the MoE routing decision inside a top-8 of 192 setup in 3 sentences."}
    ],
    "temperature": 0.9,
    "top_p": 1.0
  }'
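If you would rather not shell out to curl, the same call can be sketched with Python's standard library alone. The URL, model ID, and sampling values come from the curl example; the function names build_request and ask are illustrative, and the response path choices[0].message.content assumes the usual Chat Completions response shape.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "tencent/hy3-preview:free"


def build_request(prompt, api_key, temperature=0.9, top_p=1.0):
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt, api_key):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=60) as resp:
        body = json.load(resp)
    # Assumes the standard Chat Completions response layout.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("OPENROUTER_API_KEY")
    if key:
        print(ask("Explain top-8 of 192 MoE routing in 3 sentences.", key))
    else:
        print("Set OPENROUTER_API_KEY to send a live request.")
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any rate-limit budget.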

Turn on reasoning when you need it. Hy3 accepts a reasoning parameter with effort set to low or high. OpenRouter returns the thinking trace in a separate reasoning_details array, billed as its own token bucket:

{
  "model": "tencent/hy3-preview:free",
  "messages": [
    {"role": "user", "content": "Plan, then write a Bash script that rotates daily log files older than 30 days into a dated archive folder."}
  ],
  "reasoning": {"effort": "high"}
}
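On the client side, you will usually want to keep the thinking trace apart from the final answer. The sketch below builds the reasoning request body and splits a parsed response into the two parts. The entry layout inside reasoning_details (dicts carrying a text field) is an assumption, so the parser skips anything that does not match; the mock response is hand-written, not real model output.

```python
MODEL = "tencent/hy3-preview:free"


def reasoning_payload(prompt, effort="high"):
    """Request body with the reasoning parameter set to low or high."""
    if effort not in ("low", "high"):
        raise ValueError("effort must be 'low' or 'high'")
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": effort},
    }


def split_reply(response):
    """Return (answer, trace) from a parsed Chat Completions response."""
    message = response["choices"][0]["message"]
    details = message.get("reasoning_details") or []
    # Assumed entry shape: {"type": "...", "text": "..."}; skip anything else.
    trace = "\n".join(
        d["text"] for d in details
        if isinstance(d, dict) and isinstance(d.get("text"), str)
    )
    return message.get("content") or "", trace


# Hand-written mock response, not real model output.
mock = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "#!/usr/bin/env bash\n# ...",
            "reasoning_details": [
                {"type": "reasoning.text", "text": "Step 1: find old logs."},
                {"type": "reasoning.text", "text": "Step 2: move them."},
            ],
        }
    }]
}
answer, trace = split_reply(mock)
print(trace)
```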

Iterate. Keep the session in the same thread if you want the model to build on earlier context; Hy3’s 256K window handles most full codebases end to end.
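Keeping the session "in the same thread" over a stateless API means resending the full message history on every call. A minimal sketch of that bookkeeping follows; the Thread class and its method names are hypothetical helpers, not part of any SDK.

```python
MODEL = "tencent/hy3-preview:free"


class Thread:
    """Accumulates messages so each request carries the earlier context."""

    def __init__(self, system=None):
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def user(self, text):
        self.messages.append({"role": "user", "content": text})
        return self

    def record_reply(self, text):
        # Store the model's answer so the next turn can build on it.
        self.messages.append({"role": "assistant", "content": text})
        return self

    def payload(self):
        # Copy the list so later turns do not mutate an in-flight request.
        return {"model": MODEL, "messages": list(self.messages)}


thread = Thread(system="You are a careful code reviewer.")
thread.user("Here is utils.py: ...")
thread.record_reply("The file defines three helpers.")  # placeholder reply
thread.user("Which helper is unused?")
print(len(thread.payload()["messages"]))
```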

That is the whole flow. The model you are calling is the same one published on Hugging Face; quality on the OpenRouter free tier is identical to the paid routes on other providers.

Free, paid, and self-host: where they differ

Free is not the only path, and it helps to see the real diff before you commit to one.

Capability	OpenRouter Free	OpenRouter Paid (non-free endpoints)	Self-hosted (vLLM / SGLang)
Per-token cost	$0	Per provider	Electricity plus GPU amortization
Reasoning modes	`no_think`, `low`, `high`	Same	Same
Context length	256K	256K	256K (memory permitting)
Throughput under load	Shared pool, deprioritized under demand	Dedicated	Whatever your cluster serves
Rate limits	OpenRouter free-tier cap (flexes)	Provider-specific	None
Data retention	OpenRouter logging policy	Provider-specific	Stays on your hardware
Reasoning token visibility	Yes, via `reasoning_details`	Yes	Yes

Free is the right choice for prototypes, side projects, evaluation benchmarks, and low-traffic agents. Paid or self-hosted makes sense the moment latency matters or you exceed the rate cap.

Prompt and parameter tips that get more out of Hy3

Hy3 rewards explicit setup more than smaller models. A few habits help.

Match temperature to the mode. The model card recommends temperature=0.9 and top_p=1.0 as the default. Drop to 0.3 for structured output, stay at 0.9 for creative work.
Use no_think for everyday chat. The default reasoning mode is off for a reason; you only need low or high for planning, multi-step code, or math. Running high on a one-line question wastes reasoning tokens.
Name the tools in the system prompt. Hy3 was trained for tool use with a specific parser (hy_v3). Even on OpenRouter you get better calls when the system prompt describes each tool’s job instead of relying on schema alone.
Quote code, do not summarize it. The 256K window lets you paste whole files. Paste the file, then ask the question; do not ask the model to imagine the code.
Batch multi-file edits. Hy3’s SWE-bench Verified score of 74.4 comes from editing several files coherently. Give it the full set in one message rather than dripping them in one at a time.
Ask for a plan first. For agentic tasks, a two-step pattern (“draft a plan, wait for my confirmation, then execute”) consistently produces cleaner results than one-shot prompts.
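Several of these habits can be captured as presets. The sketch below maps a task type to the sampling values mentioned above, adds reasoning only for agentic work, and writes each tool's job into the system prompt. The preset names and the tools read_file and run_shell are hypothetical, chosen for illustration.

```python
import copy

MODEL = "tencent/hy3-preview:free"

# Sampling values follow the tips above; the preset names are illustrative.
PRESETS = {
    "chat": {"temperature": 0.9, "top_p": 1.0},        # no reasoning block
    "structured": {"temperature": 0.3, "top_p": 1.0},
    "agentic": {"temperature": 0.9, "top_p": 1.0, "reasoning": {"effort": "high"}},
}

# Hypothetical tools, described by job rather than by schema alone.
TOOLS = {
    "read_file": "Return the full text of a file in the repository.",
    "run_shell": "Run one shell command and return stdout and stderr.",
}


def system_prompt(tools):
    """Name each tool and ask for a plan before execution."""
    lines = ["You can call these tools:"]
    lines += [f"- {name}: {job}" for name, job in tools.items()]
    lines.append("Draft a plan, wait for my confirmation, then execute.")
    return "\n".join(lines)


def build_payload(task, user_text, tools=None):
    """Combine a preset with the messages for one request."""
    payload = {"model": MODEL, "messages": []}
    payload.update(copy.deepcopy(PRESETS[task]))
    if tools:
        payload["messages"].append({"role": "system", "content": system_prompt(tools)})
    payload["messages"].append({"role": "user", "content": user_text})
    return payload


print(build_payload("agentic", "Fix the failing test in tests/test_io.py", TOOLS))
```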

Limits worth knowing before you ship

A few gotchas will trip you up if you skip them.

Rate limits flex with load. OpenRouter’s free tier shares capacity across all free users. At launch, daily prompt volume was already 6.81B tokens; peak-hour calls can see 429s. Build retries with exponential backoff.
Reasoning tokens count as output. Tokens in reasoning_details cost nothing on the OpenRouter free tier, but on paid routes they bill as output. Do not make effort: "high" the default in a revenue-sensitive product without measuring the cost first.
The license is not Apache 2.0. The Tencent Hy Community License allows commercial use but carries usage-policy and attribution clauses; read the full license on the GitHub repo before you embed Hy3 in a product.
Tool calling requires the right parser. If you self-host, run vLLM or SGLang with --tool-call-parser hy_v3 (or hunyuan for SGLang). Without it, tool calls come back as plain text.
English and Chinese are first class; other languages are second. The C-Eval 89.80 and CMMLU 89.61 scores show strong Chinese. Other languages are supported via MMMLU but drop off in quality.
It lags the top US flagships on some reasoning benchmarks. HLE sits at 30, and the SCMP coverage notes Hy3 is on par with top Chinese models but still behind OpenAI and Google DeepMind’s current flagships on the hardest reasoning suites.
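For the 429s mentioned above, a retry wrapper with exponential backoff can be sketched in a few lines. The attempt count, base delay, and jitter range here are illustrative defaults rather than values recommended by OpenRouter, and the wrapper assumes the request function raises urllib.error.HTTPError on a non-2xx status.

```python
import random
import time
import urllib.error


def with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429 with exponential backoff plus jitter.

    send is any zero-argument callable that raises urllib.error.HTTPError
    on failure. Errors other than 429 are re-raised immediately.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            last_try = attempt == max_attempts - 1
            if err.code != 429 or last_try:
                raise
            # 1s, 2s, 4s, ... plus jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Usage is a one-liner around whatever function sends the request, for example with_backoff(lambda: send_chat(prompt)), where send_chat is your own helper.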

The developer fast path: Hy3 Preview plus Apidog

Command-line curl is fine for a demo. For real iteration, a visual API client saves hours.

Open Apidog and create a new project. Import the OpenAI Chat Completions OpenAPI spec; OpenRouter uses the same schema.
Set the base URL to https://openrouter.ai/api/v1 and add an environment variable for OPENROUTER_API_KEY.
Create a request that hits /chat/completions with the model set to tencent/hy3-preview:free.
Fork the request to compare reasoning modes. Apidog lets you duplicate a request and tweak one parameter, so you can run the same prompt with no_think, low, and high side by side and inspect the latency and output diff.
Save prompt templates. Agentic prompts get long. Apidog’s environment and variable system keeps system prompts, tool schemas, and user turns separated so you can reuse them across tests.
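If you want the three request bodies ready to paste into the forked requests, a short script can generate them. This sketch assumes that no_think corresponds to leaving the reasoning parameter out entirely, since that mode is described as the default; the function name mode_variants is illustrative.

```python
import json

MODEL = "tencent/hy3-preview:free"


def mode_variants(prompt):
    """Return one request body per reasoning mode for the same prompt."""
    def base():
        # Build a fresh body each time so variants share no nested objects.
        return {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}

    variants = {"no_think": base()}  # assumed: omitting reasoning keeps it off
    for effort in ("low", "high"):
        body = base()
        body["reasoning"] = {"effort": effort}
        variants[effort] = body
    return variants


for name, body in mode_variants("Summarize RFC 9110 in five bullets.").items():
    print(f"--- {name} ---")
    print(json.dumps(body, indent=2))
```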

If you are coming off Postman, the shift is quick; our API testing without Postman in 2026 guide covers the migration. Teams that live in their editor can run the same workflow with Apidog inside VS Code, which keeps prompt tuning next to the code that consumes the output.

Free alternatives when you hit the cap

If the OpenRouter free pool throttles you during peak hours, two paths are worth trying first.

Hugging Face Space. The Hy3-preview Space hosts a browser chat demo. It is not scriptable, but it is free and useful for quick comparisons.
Other free Chinese open-weights models. Alibaba’s Qwen 3.5 Omni ships a free tier with strong multimodal output; see our Qwen 3.5 Omni announcement and how-to companion for setup. Zhipu GLM 5V Turbo is another option with a generous free tier; the GLM 5V Turbo API guide has the full walk-through.

None of these match Hy3’s SWE-bench and Terminal-Bench numbers for agentic coding, but they cover chat, multilingual, and multimodal use cases the free Hy3 tier does not prioritize. For a production build, download Apidog and set up one collection per model; side-by-side benchmarks on your actual prompts beat reading any leaderboard.

Self-hosting Hy3 Preview with vLLM

If you have the hardware, local inference is the third free path. The model card recommends vLLM with tensor parallelism of 8 and multi-token prediction enabled for speculative decoding:

vllm serve tencent/Hy3-preview \
  --tensor-parallel-size 8 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --tool-call-parser hy_v3 \
  --reasoning-parser hy_v3 \
  --enable-auto-tool-choice \
  --served-model-name hy3-preview

The equivalent SGLang command uses --tool-call-parser hunyuan and --reasoning-parser hunyuan. Once the server is up at http://localhost:8000/v1, any OpenAI SDK points at it the same way it would point at OpenRouter; only the base URL and key change.
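The "only the base URL and key change" point can be sketched as a small switch between the two targets. The local model name matches --served-model-name from the command above; the environment variable LOCAL_LLM_API_KEY and the "EMPTY" placeholder key are assumptions, and the local branch passes reasoning effort through chat_template_kwargs, the direct-call form this guide mentions in its FAQ.

```python
import os

TARGETS = {
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "tencent/hy3-preview:free",
        "key_env": "OPENROUTER_API_KEY",
    },
    "local": {
        "base_url": "http://localhost:8000/v1",
        "model": "hy3-preview",          # matches --served-model-name
        "key_env": "LOCAL_LLM_API_KEY",  # hypothetical variable name
    },
}


def chat_request(target, prompt, effort=None):
    """Return (url, headers, payload) for the chosen target."""
    cfg = TARGETS[target]
    payload = {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort and target == "openrouter":
        payload["reasoning"] = {"effort": effort}
    elif effort:
        # Direct-call form for a self-hosted server.
        payload["chat_template_kwargs"] = {"reasoning_effort": effort}
    key = os.environ.get(cfg["key_env"], "EMPTY")  # placeholder if unset
    headers = {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
    return cfg["base_url"] + "/chat/completions", headers, payload


url, headers, payload = chat_request("local", "ping", effort="high")
print(url)
```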

Expect eight H100-class GPUs at BF16 for the full model. Quantized community builds will appear, but at launch the official path is full precision.
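A back-of-envelope check shows where the eight-GPU figure comes from. The sketch assumes 80 GB of memory per GPU (a common H100 configuration) and counts weights only; KV cache, activations, and runtime overhead all come out of the remaining headroom.

```python
TOTAL_PARAMS = 295_000_000_000   # 295B total parameters
BYTES_PER_PARAM = 2              # BF16 stores each weight in 2 bytes
GPU_COUNT = 8                    # matches --tensor-parallel-size 8
GPU_MEMORY_GB = 80               # assumed per-GPU memory

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
cluster_gb = GPU_COUNT * GPU_MEMORY_GB
headroom_gb = cluster_gb - weights_gb

print(f"weights:  {weights_gb:.0f} GB")   # weights:  590 GB
print(f"cluster:  {cluster_gb} GB")       # cluster:  640 GB
print(f"headroom: {headroom_gb:.0f} GB")  # headroom: 50 GB
```

Under these assumptions the weights alone take about 590 GB of the 640 GB available, so long-context workloads leave little slack on this configuration.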

FAQ

Is Hy3 Preview free? Yes. OpenRouter lists tencent/hy3-preview:free with $0 per million input tokens and $0 per million output tokens. Reasoning tokens on the free tier are also free, though they count against rate limits. Confirm the current status on the OpenRouter model page before depending on it for production.

How does Hy3 Preview compare to DeepSeek V3 and Qwen 3? Hy3 Preview’s SWE-bench Verified score of 74.4 and Terminal-Bench 2.0 of 54.4 put it in the same tier as the top Chinese open models, with a clear agent and tool-use tilt. For pure chat, Qwen 3 and DeepSeek V3 are competitive; for agent and coding workflows, Hy3’s RL-trained tool use is the differentiator.

What are Hy3’s reasoning modes? Three: no_think (default, direct answer), low, and high. Switch them through the reasoning parameter on OpenRouter or via chat_template_kwargs={"reasoning_effort": "high"} when calling the model directly. Use high for planning, multi-step code, and math; leave it off for chat.

Can I use Hy3 Preview commercially? Yes, under the Tencent Hy Community License. The license permits commercial use with attribution and usage-policy compliance. Read the full terms on the Hy3 GitHub repo before deploying it in a revenue-generating product.

What context length does the free tier support? 256K tokens end to end. OpenRouter’s listing shows 262,144 tokens, matching the model card. You can paste an entire mid-size codebase and still have room for tool schemas and conversation history.

How do I test Hy3 Preview without writing code? Use the Hugging Face Space for a browser chat demo, or point Apidog at the OpenRouter endpoint. Apidog imports the OpenAI OpenAPI spec, so configuring the request is three fields: base URL, API key, and model name.