How to Use Kimi K2 in VSCode Copilot with Fake Ollama

Lynn Mikami

13 July 2025

In an era where AI-powered developer tools are no longer a novelty but a necessity, Visual Studio Code's Copilot has firmly established itself as a leader. However, the true power of AI lies in its diversity and the specialized capabilities of different models. What if you could swap out the default engine of your Copilot for something more powerful, more specialized, or even something you run yourself? This article will guide you through the process of integrating the formidable Kimi K2 language model from Moonshot AI into your VSCode Copilot, and we'll do it with a clever tool called Fake Ollama.

This comprehensive guide will walk you through the entire process, from obtaining your API keys to configuring your local environment, and finally, to witnessing the power of a one-trillion-parameter model right inside your favorite editor.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

Before we dive into the technical details, let's get acquainted with the key components of this setup.

What is Kimi K2?

(Figure: Kimi K2 benchmark results)

Kimi K2 is a state-of-the-art large language model developed by Moonshot AI. It's a Mixture-of-Experts (MoE) model with a staggering one trillion total parameters, with 32 billion active during any given inference.
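To get a feel for how sparse that is, a quick back-of-envelope calculation (the figures come straight from the parameter counts above) shows what fraction of the model actually runs per token:

```python
# Back-of-envelope check of Kimi K2's Mixture-of-Experts sparsity:
# only a small fraction of the total parameters are active for any token.
total_params = 1_000_000_000_000   # ~1 trillion total parameters
active_params = 32_000_000_000     # ~32 billion active per forward pass

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%}")  # → Active per token: 3.2%
```

In other words, each token touches only about 3% of the network, which is how an MoE model this large stays practical to serve.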

(Figure: Kimi K2 architecture overview)

This architecture allows Kimi K2 to excel in a wide range of tasks, particularly in:

  1. Coding and software engineering
  2. Complex, multi-step reasoning
  3. Long-context understanding
  4. Agentic tool use

Kimi K2 is available in two main variants:

  1. Kimi-K2-Base: the raw foundation model, suited to fine-tuning and custom research.
  2. Kimi-K2-Instruct: the post-trained model, tuned for chat and agentic tasks.

For our purposes, we'll be using the Instruct model via an API.

What is VSCode Copilot?

If you're reading this article, you're likely already familiar with VSCode Copilot. It's an AI-powered code completion and assistance tool developed by GitHub and OpenAI. It provides intelligent code suggestions, answers coding questions, and can even help you refactor and debug your code. While incredibly powerful out of the box, recent updates have opened the door to using custom models, which is the feature we'll be leveraging.

What is Fake Ollama?

This is the secret sauce that makes our integration possible. Fake Ollama, as the name suggests, is a tool that creates a server that mimics the API of Ollama, a popular platform for running and managing local language models.

Many applications, including the latest versions of VSCode Copilot, have built-in support for the Ollama API. By running Fake Ollama, we can trick VSCode Copilot into thinking it's communicating with a standard Ollama instance, while in reality, our Fake Ollama server is forwarding the requests to the Kimi K2 API. This makes it a versatile bridge, allowing us to connect virtually any model API to any tool that supports Ollama.
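Conceptually, the bridging work is a small request translation. The sketch below illustrates the idea, assuming Ollama's `/api/chat` request shape on one side and an OpenAI-style chat-completions payload on the other; the exact field handling in the real Fake Ollama project may differ:

```python
# Minimal sketch of the translation a Fake Ollama-style proxy performs:
# an Ollama /api/chat request body is reshaped into an OpenAI-compatible
# chat-completions payload before being forwarded upstream.

def ollama_to_openai(ollama_request: dict, upstream_model: str) -> dict:
    """Map an Ollama /api/chat body onto an OpenAI chat-completions body."""
    return {
        "model": upstream_model,                    # remap to the upstream model id
        "messages": ollama_request["messages"],     # both APIs use role/content pairs
        "stream": ollama_request.get("stream", False),
    }

request = {
    "model": "kimi-k2",  # the name the editor asked for; the proxy remaps it
    "messages": [{"role": "user", "content": "Hello!"}],
}
payload = ollama_to_openai(request, "moonshotai/kimi-k2")
print(payload["model"])  # → moonshotai/kimi-k2
```

Because both APIs already exchange messages as role/content pairs, the proxy mostly just swaps the model name and endpoint, which is why this trick generalizes to almost any OpenAI-compatible backend.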


Prerequisites

Before we begin, make sure you have the following installed and ready:

  1. Visual Studio Code with the GitHub Copilot extension and an active Copilot subscription.
  2. A recent Python 3 installation with pip, for running the Fake Ollama server.
  3. Git, for cloning the Fake Ollama repository.
  4. An API key for Kimi K2, from either OpenRouter or the Moonshot AI platform (covered in Step 1).


The Integration: A Step-by-Step Guide

Now, let's get our hands dirty and integrate Kimi K2 into VSCode Copilot.

Step 1: Obtain Your Kimi K2 API Key

You have two primary options for getting a Kimi K2 API key:

  1. Moonshot AI Platform: You can sign up directly on the Moonshot AI platform. This will give you direct access to the Kimi K2 API.
  2. OpenRouter: This is the recommended approach for its flexibility. OpenRouter is a service that provides a unified API for a vast array of AI models, including Kimi K2. By using OpenRouter, you can easily switch between different models without changing your code or API keys.

For this guide, we'll assume you're using OpenRouter. Once you've created an account and obtained your API key, you can interact with the Kimi K2 model using the OpenAI Python library, like so:

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the official client works as-is.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2",
    messages=[
        {"role": "user", "content": "Write a simple Python function to calculate the factorial of a number."},
    ],
)
print(response.choices[0].message.content)

Keep your OpenRouter API key handy; you'll need it for the Fake Ollama configuration.

Step 2: Set Up Fake Ollama

First, you'll need to clone the Fake Ollama repository from GitHub. Open your terminal and run the following command:

git clone https://github.com/spoonnotfound/fake-ollama.git

Next, navigate into the cloned directory and install the required Python dependencies:

cd fake-ollama
pip install -r requirements.txt

Step 3: Configure Fake Ollama for Kimi K2

This is the most crucial step. We need to configure Fake Ollama to use our OpenRouter API key and point to the Kimi K2 model. The configuration will likely be in a .env file or directly in the main Python script. For this guide, we'll assume a .env file for best practices.

Create a file named .env in the fake-ollama directory and add the following lines:

OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=YOUR_OPENROUTER_API_KEY
MODEL_NAME=moonshotai/kimi-k2

By setting these environment variables, the Fake Ollama server will know to forward requests to the OpenRouter endpoint, use your API key for authentication, and specify moonshotai/kimi-k2 as the desired model.
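A server following this pattern would typically read those three variables at startup, falling back to sensible defaults for everything except the API key. The sketch below shows one way that logic might look; the function name and exact behavior are illustrative, not Fake Ollama's actual code:

```python
# Sketch of how a proxy server might load the .env settings above.
# In the real server this would be called as load_proxy_config(os.environ).

def load_proxy_config(env: dict) -> dict:
    """Read the settings the proxy needs, with fallbacks matching the .env above."""
    key = env.get("OPENAI_API_KEY")
    if key is None:
        raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")
    return {
        "api_base": env.get("OPENAI_API_BASE", "https://openrouter.ai/api/v1"),
        "api_key": key,
        "model": env.get("MODEL_NAME", "moonshotai/kimi-k2"),
    }

config = load_proxy_config({"OPENAI_API_KEY": "YOUR_OPENROUTER_API_KEY"})
print(config["model"])  # → moonshotai/kimi-k2
```

Failing fast when the API key is missing saves you from confusing authentication errors later, once VSCode starts sending requests through the proxy.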

Step 4: Run the Fake Ollama Server

Now, it's time to start the Fake Ollama server. In your terminal, from within the fake-ollama directory, run:

python main.py

If everything is configured correctly, you should see a message indicating that the server is running, typically on http://localhost:11434. This is the local endpoint we'll use in VSCode.
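Before wiring up VSCode, you can sanity-check the server the same way an Ollama client would: by querying its `/api/tags` endpoint, which lists available models. The helper below sketches that check, assuming the standard Ollama response shape (a JSON object with a `models` array of `{"name": ...}` entries):

```python
# Sanity-check helper for an Ollama-compatible server: confirm that the
# model you configured shows up in its /api/tags listing.
import json
import urllib.request

def model_names(tags_response: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models(base_url: str = "http://localhost:11434") -> list:
    """Query a running Ollama-compatible server for its available models."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```

With the server running, `list_models()` should include the model from your .env file (moonshotai/kimi-k2 in our setup); if it doesn't, recheck your configuration before moving on.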

Step 5: Configure VSCode Copilot

The final step is to tell VSCode Copilot to use our local Fake Ollama server instead of the default GitHub Copilot models.

  1. Open VSCode and go to the Copilot Chat view.
  2. In the chat input, type / and select "Select a Model".
  3. Click on "Manage Models...".
  4. In the dialog box that appears, select "Ollama" as the AI provider.
  5. You'll be prompted to enter the Ollama server URL. Enter the address of your local Fake Ollama server: http://localhost:11434.
  6. Next, you'll be asked to select a model. You should see the model you specified in your Fake Ollama configuration (moonshotai/kimi-k2) in the list. Select it.

And that's it! Your VSCode Copilot is now powered by the Kimi K2 model. You can start a new chat session and experience the enhanced coding and reasoning capabilities of this powerful model.


Beyond the API: Using Local Models with vLLM, llama.cpp, and ktransformers

The beauty of the Fake Ollama setup is that it's not limited to API-based models. You can also use it as a front-end for models running locally on your own hardware using powerful inference engines like:

  1. vLLM: a high-throughput serving engine that exposes an OpenAI-compatible API.
  2. llama.cpp: a lightweight C/C++ engine for running quantized models on consumer hardware, including CPU-only machines.
  3. ktransformers: an engine focused on running very large MoE models on limited hardware.

The process is similar: you would first set up and run your desired model using one of these inference engines, which will expose a local API endpoint. Then, you would configure Fake Ollama to point to that local model's endpoint instead of the OpenRouter API. This gives you complete control over your models and data, with the trade-off of requiring more powerful hardware.
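For example, if you were serving a model locally with vLLM's OpenAI-compatible server (which listens on port 8000 by default), the .env from Step 3 would only need to point at the local endpoint instead; the model name below is a placeholder you'd replace with whatever you're actually serving:

```
OPENAI_API_BASE=http://localhost:8000/v1
OPENAI_API_KEY=not-needed-for-local-inference
MODEL_NAME=your-local-model-name
```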


Conclusion

By leveraging the flexibility of VSCode Copilot's custom model support and the cleverness of the Fake Ollama tool, you can unlock a new level of AI-assisted development. Integrating Kimi K2 provides a significant boost in coding, reasoning, and long-context understanding, making your Copilot an even more valuable partner.

The world of large language models is constantly evolving, and the ability to easily swap and experiment with different models is a game-changer. Whether you're using a state-of-the-art API like Kimi K2 or running your own models locally, the power to customize your tools is in your hands. Happy coding!

