What is Kimi K2.7 Code?

Apidog for Enterprise

On-Premises Deploy

SSO & RBAC

SOC 2 Compliant

Moonshot AI just shipped Kimi K2.7 Code, an open-weight model built specifically for writing software and running coding agents. It keeps the trillion-parameter scale of the Kimi K2 line, adds vision, and trims the thinking-token bill that made earlier agent runs expensive. If you’ve used Kimi K2.6 or its API, this is the coding-tuned successor; it also ships with a terminal agent called Kimi Code that competes head-on with Claude Code and Codex.

Here’s what the model actually is, what changed, how it scores, and where you can run it today.

TL;DR

Kimi K2.7 Code is a Mixture-of-Experts model: 1 trillion total parameters, 32 billion active per token.
It adds a 256K token context window, vision (image and video input via a MoonViT encoder), and roughly 30% fewer thinking tokens than K2.6 on the same work.
Weights are public under a modified MIT license; you can self-host with vLLM, SGLang, or KTransformers.
On Moonshot’s reported benchmarks it sits just behind GPT-5.5 and Claude Opus on coding and agentic tasks; the pitch is open weights plus cost, not topping the chart.
It ships with Kimi Code, a terminal-and-IDE coding agent, and an OpenAI-compatible API you can test in minutes with Apidog.

Kimi K2.7 Code in one paragraph

Kimi K2.7 Code is the coding-specialized release of Moonshot AI’s K2 family. It uses the same sparse Mixture-of-Experts design as recent Kimi models, so only a fraction of its weights fire on any given token. The “Code” suffix is the point: Moonshot tuned this checkpoint for software development, multi-step tool calls, and long agent sessions rather than general chat. The headline upgrades over K2.6 are native multimodal input, a leaner reasoning budget, and tighter integration with Moonshot’s own agent framework. You can use it through the Kimi web app, the Kimi Code CLI, a hosted API, or by downloading the weights from Hugging Face.

What changed from Kimi K2.6

If you already read our Kimi K2.6 explainer, three differences matter most.

It’s tuned for code and agents first. K2.6 was a strong generalist. K2.7 Code narrows the focus to coding workflows: refactoring, debugging, codebase exploration, and chaining tool calls without losing the plot halfway through a task.

Thinking is cheaper. Moonshot reports about a 30% reduction in thinking-token usage versus K2.6 for comparable results. Reasoning tokens are billed tokens, so a 30% cut lands directly on your agent-run cost and latency. Over a long coding session with hundreds of tool calls, that adds up fast.

It sees. K2.7 Code ships with a 400M-parameter MoonViT vision encoder, so it reads screenshots, diagrams, and video frames. That matters for agents that need to look at a failing UI, a stack-trace screenshot, or a design mock before acting.

Inside the architecture

The shape of the model explains both its capability and its low serving cost.

Spec	Kimi K2.7 Code
Total parameters	1 trillion
Active parameters per token	32 billion
Experts	384 total, 8 selected per token
Layers	61 (1 dense)
Attention	Multi-head Latent Attention (MLA)
Context window	256K tokens
Vision encoder	MoonViT, 400M parameters
License	Modified MIT

The Mixture-of-Experts setup is why a “1 trillion parameter” model is practical to run. A router picks 8 of 384 experts for each token, so you pay compute for 32 billion active parameters, not the full trillion. You get the knowledge capacity of a huge model with the per-token cost closer to a mid-size one.

Multi-head Latent Attention keeps the key-value cache small, which is what makes a 256K context window affordable to serve. Long context is the part developers feel: you can drop a whole service, its tests, and its config into one prompt and ask for a change that respects all of it.

The benchmarks, read honestly

Moonshot published scores against GPT-5.5 and Claude Opus across coding and agentic suites. The pattern is consistent: K2.7 Code is competitive and close, but it doesn’t top the closed frontier on most tasks.

Coding

Benchmark	Kimi K2.7 Code	GPT-5.5	Claude Opus
Kimi Code Bench v2	62.0	69.0	67.4
Program Bench	53.6	69.1	63.8
MLS Bench Lite	35.1	35.5	42.8

Agentic and tool use

Benchmark	Kimi K2.7 Code	GPT-5.5	Claude Opus
Kimi Claw 24/7	46.9	52.8	50.4
MCP Atlas	76.0	79.4	81.3
MCP Mark Verified	81.1	92.9	76.4

Two caveats keep this fair. First, several of these suites are Moonshot’s own, so treat them as the vendor’s framing, not a neutral leaderboard. Second, the story isn’t “Kimi wins.” It’s “an open-weight model you can download and self-host lands within a few points of models you can only rent.” On MCP Mark Verified it even edges out Claude Opus. For a lot of real work, a model that’s 90% as good but open and cheaper is the better trade. If raw coding ceiling is your only metric, our DeepSeek V4 vs Claude Opus comparison covers the closed-vs-open gap in more depth.

Why the efficiency gain matters

Agentic coding burns tokens in a loop: read files, reason, call a tool, read the result, reason again. Most of that spend is reasoning, not output. Cutting thinking tokens by ~30% does two things at once. It lowers the per-task bill, and it shortens the wall-clock time for each step because the model writes less before acting. If you’ve watched a coding agent stall while it “thinks,” you know why that’s worth more than a benchmark point. For more ways to cut that bill, see our guide on reducing agent token costs from the CLI.

Kimi Code: the agent that ships with the model

K2.7 Code isn’t just a checkpoint. Moonshot built Kimi Code, a terminal-native coding agent designed around the model’s strengths: preserved thinking, interleaved reasoning, and multi-step tool calls. It writes and edits files, runs shell commands, searches your codebase, fetches web content, and spawns sub-agents for parallel work. You install it with one command:

curl -fsSL https://code.kimi.com/kimi-code/install.sh | bash

Then run kimi in any project directory. There’s a VS Code extension too, plus JetBrains and Zed support through the ACP protocol. We cover the full setup, slash commands, and first-run workflow in a dedicated walkthrough; if you’ve used the older Kimi CLI, the new agent is a ground-up rebuild, not a reskin.

Where Kimi K2.7 Code lives

You have four ways to reach the model.

Kimi web app and Kimi App. Chat access for quick questions and prototyping, no setup.

Kimi Code CLI. The terminal agent above, for hands-on coding inside your repo.

API. An OpenAI-compatible endpoint on the Moonshot platform. Use the model id kimi-k2.7-code and point your existing OpenAI client at https://api.moonshot.ai/v1. Because it’s OpenAI-compatible, it drops into tools like Claude Code, Cursor, and Cline with a base-URL swap. (The flat-rate Kimi Code subscription uses a separate id, kimi-for-coding.)

Open weights. Download from Hugging Face and self-host. Moonshot recommends vLLM, SGLang, or KTransformers for serving. This is the route if you need data to stay on your own hardware.

How to test the Kimi K2.7 Code API in Apidog

Before you wire the model into an agent, it helps to see raw requests and responses. Apidog gives you a visual workspace to do that without writing a client.

Open Apidog and create a new HTTP request.
Set the method to POST and the URL to https://api.moonshot.ai/v1/chat/completions.
Add an Authorization: Bearer <your-key> header. Grab a key from the Kimi platform console.
In the body, send an OpenAI-style payload with "model": "kimi-k2.7-code" and a messages array.
Send the request and read the response. Apidog formats the JSON, shows token usage, and lets you save the call as a reusable test.

From there you can build a small test scenario: assert the response status, check that usage.completion_tokens stays under a budget, and run it on every model update to catch regressions. Because the endpoint is OpenAI-compatible, the same setup works for any model on the Kimi platform. If you’re testing the model’s tool-calling through MCP, our MCP server testing playbook walks through the assertions that matter. Download Apidog to follow along.

Who should pick Kimi K2.7 Code

Pick it if you’re building:

Coding agents where token cost and latency decide whether the product is viable.
Tools that need long context: whole-repo edits, large refactors, multi-file reasoning.
Anything that must run on your own infrastructure for privacy or compliance, since the weights are open.
Multimodal coding workflows that read screenshots, diagrams, or video.

Stick with a closed frontier model if you need:

The absolute highest single-shot coding score, where a few benchmark points justify the price.
A managed SLA and support contract rather than self-hosting.

For a broader view of the open-weight field, our MiniMax M3 vs DeepSeek V4 vs Qwen 3.7 comparison puts Kimi’s rivals side by side.

FAQ

Is Kimi K2.7 Code open source? The weights are public under a modified MIT license, so you can download, run, and fine-tune them. Read the license terms on the model card before commercial use.

How big is the context window? 256K tokens. That’s enough for a full service plus its tests in a single prompt.

Can I run it locally? Yes. Moonshot recommends vLLM, SGLang, or KTransformers. The full weights are large (trillion-parameter scale), so plan for serious GPU memory or a quantized build.

What’s the model id for the API? Use kimi-k2.7-code on the Moonshot API (https://api.moonshot.ai/v1); the flat-rate Kimi Code subscription uses kimi-for-coding. The endpoint is OpenAI-compatible, so most existing clients work with a base-URL change.

How is it different from regular Kimi K2.6? It’s tuned specifically for coding and agents, adds vision, and uses about 30% fewer thinking tokens for comparable results.

Does it support tool calling and MCP? Yes. It’s built for interleaved reasoning and multi-step tool calls, and Kimi Code supports the Model Context Protocol.

Is it free? You can chat in the Kimi app at no cost, and the weights are free to download. API and Kimi Code agent usage run on subscription plans with quota limits.

Summary

Kimi K2.7 Code is Moonshot’s bet that open weights plus low cost beat chasing the top of the benchmark chart. It’s a 1T-parameter MoE model with 32B active, a 256K context window, vision, and a ~30% lighter reasoning budget than K2.6. It won’t beat GPT-5.5 or Claude Opus on most coding suites, but it gets close while staying downloadable and cheaper to run, and it ships with a capable terminal agent. If you’re building coding tools where cost and control matter as much as raw quality, it’s worth a real test. Start by sending a request through Apidog to see how the API behaves, then decide whether to host it yourself.

button

In this article

TL;DR Kimi K2.7 Code in one paragraph What changed from Kimi K2.6 Inside the architecture The benchmarks, read honestly Why the efficiency gain matters Kimi Code: the agent that ships with the model Where Kimi K2.7 Code lives How to test the Kimi K2.7 Code API in Apidog Who should pick Kimi K2.7 Code FAQ Summary

Apidog: A Real Design-first API Development Platform

API Design

API Documentation

API Debugging

Automated Testing

API Mocking

More

Get Started for Free

Enterprise

On-Premises or SaaS or EU-hosted

SSO, RBAC & audit logs

SOC 2, GDPR, ISO 27001

Explore Apidog Enterprise

Explore more

12 CI/CD Best Practices for Automated API Testing

12 CI/CD best practices for automated API testing that survive real pipelines: portable run commands, real assertions, deterministic tests, JUnit reports, and merge gates with the Apidog CLI.

15 June 2026

15 Best Continuous Integration Tools for API Teams (2026 Comparison)

Compare the 15 best continuous integration tools for API teams in 2026, from GitHub Actions and Jenkins to GitLab CI/CD, plus how to run API tests in any pipeline.

15 June 2026

Fable 5 Is Down for Everyone: Inside Anthropic's Government-Ordered Suspension

Anthropic suspended Fable 5 and Mythos 5 worldwide after a US government export-control directive. What happened, why, and how to make your API stack survive a model going dark.

13 June 2026