In an era where AI-powered developer tools are no longer a novelty but a necessity, Visual Studio Code's Copilot has firmly established itself as a leader. However, the true power of AI lies in its diversity and the specialized capabilities of different models. What if you could swap out the default engine of your Copilot for something more powerful, more specialized, or even something you run yourself? This article will guide you through the process of integrating the formidable Kimi K2 language model from Moonshot AI into your VSCode Copilot, and we'll do it with a clever tool called Fake Ollama.
This comprehensive guide will walk you through the entire process, from obtaining your API keys to configuring your local environment, and finally, to witnessing the power of a one-trillion-parameter model right inside your favorite editor.
Before we dive into the technical details, let's get acquainted with the key components of this setup.
What is Kimi K2?

Kimi K2 is a state-of-the-art large language model developed by Moonshot AI. It's a Mixture-of-Experts (MoE) model with a staggering one trillion total parameters, of which 32 billion are active during any given inference pass.

This architecture allows Kimi K2 to excel in a wide range of tasks, particularly in:
- Coding: With impressive scores on benchmarks like LiveCodeBench and SWE-bench, Kimi K2 is a coding powerhouse.
- Reasoning: The model demonstrates strong logical and reasoning capabilities, making it an excellent partner for complex problem-solving.
- Long-Context Understanding: Kimi K2 can handle a massive context window of up to 128,000 tokens, enabling it to understand and work with large codebases, extensive documentation, and lengthy conversations.
Kimi K2 is available in two main variants:
- Kimi-K2-Base: The foundational model, ideal for researchers and developers who want to fine-tune and build custom solutions.
- Kimi-K2-Instruct: A fine-tuned version optimized for chat and agentic tasks, making it a perfect drop-in replacement for other instruction-following models.
For our purposes, we'll be using the Instruct model via an API.
What is VSCode Copilot?
If you're reading this article, you're likely already familiar with VSCode Copilot. It's an AI-powered code completion and assistance tool developed by GitHub in collaboration with OpenAI. It provides intelligent code suggestions, answers coding questions, and can even help you refactor and debug your code. While incredibly powerful out of the box, recent updates have opened the door to using custom models, which is the feature we'll be leveraging.
What is Fake Ollama?

This is the secret sauce that makes our integration possible. Fake Ollama, as the name suggests, is a tool that creates a server that mimics the API of Ollama, a popular platform for running and managing local language models.
Many applications, including the latest versions of VSCode Copilot, have built-in support for the Ollama API. By running Fake Ollama, we can trick VSCode Copilot into thinking it's communicating with a standard Ollama instance, while in reality, our Fake Ollama server is forwarding the requests to the Kimi K2 API. This makes it a versatile bridge, allowing us to connect virtually any model API to any tool that supports Ollama.
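To make the mechanism concrete, here's a minimal sketch of that proxy idea in Python. This is illustrative, not the actual fake-ollama source: it assumes Flask and requests are installed, implements only the two Ollama routes a chat client typically needs, and skips streaming entirely.

```python
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Upstream OpenAI-compatible API that actually serves the model.
UPSTREAM = os.environ.get("OPENAI_API_BASE", "https://openrouter.ai/api/v1")
API_KEY = os.environ.get("OPENAI_API_KEY", "")
MODEL = os.environ.get("MODEL_NAME", "moonshotai/kimi-k2")

@app.get("/api/tags")
def tags():
    # Ollama clients call this to list the available models.
    return jsonify({"models": [{"name": MODEL, "model": MODEL}]})

@app.post("/api/chat")
def chat():
    body = request.get_json(force=True)
    # Forward the chat messages to the OpenAI-compatible upstream.
    upstream = requests.post(
        f"{UPSTREAM}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": MODEL, "messages": body["messages"]},
        timeout=120,
    )
    upstream.raise_for_status()
    content = upstream.json()["choices"][0]["message"]["content"]
    # Reply in Ollama's non-streaming chat format.
    return jsonify({
        "model": MODEL,
        "message": {"role": "assistant", "content": content},
        "done": True,
    })

if __name__ == "__main__":
    app.run(port=11434)  # the port real Ollama listens on
```

The real fake-ollama project covers more of the Ollama API (including streaming responses), but the shape of the trick is exactly this: speak Ollama on one side, speak OpenAI on the other.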
Prerequisites
Before we begin, make sure you have the following installed and ready:
- Visual Studio Code: The latest version is recommended to ensure compatibility with the Copilot features we'll be using.
- VSCode Copilot Extension: You'll need an active Copilot subscription and the extension installed in VSCode.
- Python: A recent version of Python (3.8 or higher) is required to run the Fake Ollama server.
- Git: You'll need Git to clone the Fake Ollama repository from GitHub.
- A Kimi K2 API Key: We'll cover how to get this in the first step.
The Integration: A Step-by-Step Guide
Now, let's get our hands dirty and integrate Kimi K2 into VSCode Copilot.
Step 1: Obtain Your Kimi K2 API Key
You have two primary options for getting a Kimi K2 API key:
- Moonshot AI Platform: You can sign up directly on the Moonshot AI platform. This will give you direct access to the Kimi K2 API.
- OpenRouter: This is the recommended approach for its flexibility. OpenRouter is a service that provides a unified API for a vast array of AI models, including Kimi K2. By using OpenRouter, you can easily switch between different models without changing your code or API keys.
For this guide, we'll assume you're using OpenRouter. Once you've created an account and obtained your API key, you can interact with the Kimi K2 model using the OpenAI Python library, like so:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2",
    messages=[
        {"role": "user", "content": "Write a simple Python function to calculate the factorial of a number."},
    ],
)

print(response.choices[0].message.content)
```
Keep your OpenRouter API key handy; you'll need it for the Fake Ollama configuration.
Step 2: Set Up Fake Ollama
First, you'll need to clone the Fake Ollama repository from GitHub. Open your terminal and run the following command:

```bash
git clone https://github.com/spoonnotfound/fake-ollama.git
```
Next, navigate into the cloned directory and install the required Python dependencies:

```bash
cd fake-ollama
pip install -r requirements.txt
```
Step 3: Configure Fake Ollama for Kimi K2
This is the most crucial step. We need to configure Fake Ollama to use our OpenRouter API key and point it at the Kimi K2 model. The configuration will likely live in a `.env` file or directly in the main Python script; for this guide, we'll assume a `.env` file, which is the cleaner practice.

Create a file named `.env` in the `fake-ollama` directory and add the following lines:

```
OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=YOUR_OPENROUTER_API_KEY
MODEL_NAME=moonshotai/kimi-k2
```

With these environment variables set, the Fake Ollama server knows to forward requests to the OpenRouter endpoint, authenticate with your API key, and request `moonshotai/kimi-k2` as the model.
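How the server actually picks these values up depends on the fake-ollama code itself, but the common pattern (sketched here, assuming python-dotenv is among the dependencies) looks like this:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # reads key=value pairs from .env into the process environment

OPENAI_API_BASE = os.environ["OPENAI_API_BASE"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
MODEL_NAME = os.environ.get("MODEL_NAME", "moonshotai/kimi-k2")
```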
Step 4: Run the Fake Ollama Server
Now, it's time to start the Fake Ollama server. In your terminal, from within the `fake-ollama` directory, run:

```bash
python main.py
```

If everything is configured correctly, you should see a message indicating that the server is running, typically on `http://localhost:11434`. This is the local endpoint we'll use in VSCode.
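Before pointing VSCode at it, you can sanity-check the server with a short script. This is a sketch that assumes the standard Ollama `/api/tags` and `/api/chat` routes are implemented and that `requests` is installed:

```python
import requests

BASE = "http://localhost:11434"

# 1. The server should advertise at least one model.
print(requests.get(f"{BASE}/api/tags", timeout=10).json())

# 2. A non-streaming chat round-trip should come back from Kimi K2.
reply = requests.post(
    f"{BASE}/api/chat",
    json={
        "model": "moonshotai/kimi-k2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "stream": False,
    },
    timeout=120,
)
print(reply.json()["message"]["content"])
```

If both calls succeed, the proxy is working and any Ollama-aware client should be able to use it.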
Step 5: Configure VSCode Copilot
The final step is to tell VSCode Copilot to use our local Fake Ollama server instead of the default GitHub Copilot models.
- Open VSCode and go to the Copilot Chat view.
- In the chat input, type `/` and select "Select a Model".
- Click on "Manage Models...".
- In the dialog box that appears, select "Ollama" as the AI provider.
- You'll be prompted to enter the Ollama server URL. Enter the address of your local Fake Ollama server: `http://localhost:11434`.
- Next, you'll be asked to select a model. You should see the model you specified in your Fake Ollama configuration (`moonshotai/kimi-k2`) in the list. Select it.
And that's it! Your VSCode Copilot is now powered by the Kimi K2 model. You can start a new chat session and experience the enhanced coding and reasoning capabilities of this powerful model.
Beyond the API: Using Local Models with vLLM, llama.cpp, and ktransformers
The beauty of the Fake Ollama setup is that it's not limited to API-based models. You can also use it as a front-end for models running locally on your own hardware using powerful inference engines like:
- vLLM: An open-source library that significantly speeds up LLM inference and serving.
- llama.cpp: A C/C++ LLM inference engine, originally built for Meta's LLaMA models and optimized for running on CPUs and a wide range of hardware.
- ktransformers: A flexible framework for experimenting with cutting-edge LLM inference optimizations. Notably, ktransformers has announced support for Kimi K2, which means you can run a quantized version of the model locally.
The process is similar: you would first set up and run your desired model using one of these inference engines, which will expose a local API endpoint. Then, you would configure Fake Ollama to point to that local model's endpoint instead of the OpenRouter API. This gives you complete control over your models and data, with the trade-off of requiring more powerful hardware.
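As a concrete sketch, suppose you've started a local OpenAI-compatible server (for example, vLLM's built-in server, which listens on port 8000 by default; the exact model name depends on how you launched it). The `.env` file from Step 3 would then point at that local endpoint instead of OpenRouter. The values here are illustrative:

```
OPENAI_API_BASE=http://localhost:8000/v1
OPENAI_API_KEY=not-needed-for-local
MODEL_NAME=moonshotai/Kimi-K2-Instruct
```

Everything downstream, including the VSCode Copilot configuration from Step 5, stays exactly the same.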
Conclusion
By leveraging the flexibility of VSCode Copilot's custom model support and the cleverness of the Fake Ollama tool, you can unlock a new level of AI-assisted development. Integrating Kimi K2 provides a significant boost in coding, reasoning, and long-context understanding, making your Copilot an even more valuable partner.
The world of large language models is constantly evolving, and the ability to easily swap and experiment with different models is a game-changer. Whether you're using a state-of-the-art API like Kimi K2 or running your own models locally, the power to customize your tools is in your hands. Happy coding!