Supercharge VSCode Copilot with Kimi K2 AI Using Fake Ollama

Learn how to connect Kimi K2, one of the world’s most advanced AI language models, to VSCode Copilot using Fake Ollama. Step-by-step instructions for API developers seeking smarter coding assistance and ultimate workflow flexibility.

Lynn Mikami

30 January 2026

AI coding assistants have become essential for fast, high-quality software development. GitHub Copilot is a leader in this space for Visual Studio Code users, but what if you could pair it with Kimi K2, the flagship language model from Moonshot AI? This guide shows API developers and engineering teams how to connect Kimi K2 to VSCode Copilot using a proxy tool: Fake Ollama.

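By following this tutorial, you'll get a Kimi K2 API key, set up and configure Fake Ollama, and point VSCode Copilot at it so that Kimi K2 answers your chat requests.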

💡 Need robust API testing and documentation? Apidog generates beautiful API documentation, empowers collaborative teams with an all-in-one platform, and is a cost-effective Postman alternative!

Why Connect Kimi K2 to VSCode Copilot?

Meet Kimi K2: Moonshot AI’s Flagship Model

Kimi K2 is a large language model built on a Mixture-of-Experts (MoE) architecture with one trillion total parameters, of which about 32 billion are active for any given token.
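To put the two parameter counts in proportion, here is a small arithmetic sketch. It uses only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the text: ~1 trillion total parameters, ~32 billion active.
TOTAL_PARAMS = 1_000_000_000_000
ACTIVE_PARAMS = 32_000_000_000

# Share of the model's weights that take part in a given forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Active share of parameters: {active_fraction:.1%}")  # prints "Active share of parameters: 3.2%"
```

In other words, an MoE model of this shape touches only a small slice of its weights at a time, which is why its compute cost is closer to that of a much smaller dense model.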

Kimi K2 excels in coding, reasoning, and long-context tasks.

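Moonshot AI publishes two variants: Kimi-K2-Base, a foundation model intended for fine-tuning, and Kimi-K2-Instruct, a post-trained model for chat and agentic use.

In this guide, we'll use the Instruct model through an API.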


What Is VSCode Copilot (and Why Use Custom Models)?

VSCode Copilot, built by GitHub and OpenAI, accelerates development with code suggestions, explanations, and refactoring help. Recent updates let you swap its default model for any LLM served through the Ollama API, which opens the door to custom models like Kimi K2.


What Is Fake Ollama (and Why Does It Matter)?

Fake Ollama is an open-source server that mimics the Ollama API. Many dev tools (including VSCode Copilot) natively support Ollama endpoints, so Fake Ollama lets them reach models that are actually served behind a different API, such as Kimi K2 through OpenRouter.

This flexibility lets your team control which model powers AI coding assistance—crucial for enterprises with security or performance requirements.
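To make the idea of "mimicking the Ollama API" concrete, the sketch below shows the kind of translation such a proxy has to perform between an Ollama-style chat request and an OpenAI-style one. This is an illustration only, not Fake Ollama's actual code: the function names are hypothetical, and only a handful of fields from each API are handled.

```python
from datetime import datetime, timezone


def ollama_to_openai(ollama_request: dict, upstream_model: str) -> dict:
    """Translate an Ollama-style /api/chat body into an OpenAI-style chat body."""
    options = ollama_request.get("options", {})
    payload = {
        # The proxy substitutes the model name the upstream provider expects.
        "model": upstream_model,
        # Both APIs describe messages as dicts with "role" and "content".
        "messages": ollama_request["messages"],
        # Ollama streams by default when the field is absent.
        "stream": ollama_request.get("stream", True),
    }
    # Ollama nests sampling settings under "options"; OpenAI keeps them top-level.
    if "temperature" in options:
        payload["temperature"] = options["temperature"]
    return payload


def openai_to_ollama(openai_response: dict, local_model: str) -> dict:
    """Translate a non-streaming OpenAI-style reply into an Ollama-style reply."""
    message = openai_response["choices"][0]["message"]
    return {
        "model": local_model,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "message": {"role": message["role"], "content": message["content"]},
        "done": True,
    }
```

A real proxy also has to handle streaming chunks, model listing, and error mapping; the sketch covers only the simplest request and response pair.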


Prerequisites

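Before you begin, make sure you have VSCode with the GitHub Copilot extension, Git, Python with pip, and an API key for Kimi K2 (obtained in step 1).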


Step-by-Step: Integrate Kimi K2 with VSCode Copilot

1. Get Your Kimi K2 API Key

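You can access Kimi K2 through Moonshot AI's own API platform or through a third-party gateway such as OpenRouter. This guide uses OpenRouter.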

Example: Using OpenRouter with the OpenAI Python SDK

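from openai import OpenAI

# Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # replace with your real key
)

# Send a single-turn chat request to Kimi K2.
response = client.chat.completions.create(
    model="moonshotai/kimi-k2",
    messages=[
        {"role": "user", "content": "Write a simple Python function to calculate the factorial of a number."},
    ],
)

# Print the text of the first completion.
print(response.choices[0].message.content)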

Keep your API key ready for configuration in later steps.


2. Clone and Set Up Fake Ollama

In your terminal:

git clone https://github.com/spoonnotfound/fake-ollama.git
cd fake-ollama
pip install -r requirements.txt

3. Configure Fake Ollama for Kimi K2

Create a .env file in the fake-ollama directory with these lines (replace with your actual API key):

OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=YOUR_OPENROUTER_API_KEY
MODEL_NAME=moonshotai/kimi-k2

This setup ensures Fake Ollama forwards requests to OpenRouter, authenticates with your API key, and targets the correct model.
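Before starting the server, it can help to confirm that the file contains all three settings. The helper below is a hypothetical pre-flight check, not part of Fake Ollama; it assumes plain KEY=VALUE lines exactly as shown above and does not handle quoting or variable expansion.

```python
REQUIRED_KEYS = ("OPENAI_API_BASE", "OPENAI_API_KEY", "MODEL_NAME")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


example = """\
OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=YOUR_OPENROUTER_API_KEY
MODEL_NAME=moonshotai/kimi-k2
"""
print(missing_keys(parse_env(example)))  # prints []
```

To check a real file, pass its contents instead of the example string, for instance `parse_env(open(".env").read())`.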


4. Run the Fake Ollama Server

Start the server:

python main.py

You should see confirmation that Fake Ollama is running (default: http://localhost:11434). This is the endpoint Copilot will use.
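One way to confirm the server is reachable before touching VSCode is to ask it for its model list. The sketch below assumes Fake Ollama implements Ollama's model-listing endpoint (GET /api/tags) with the same response shape as a real Ollama server; the function names are illustrative.

```python
import json
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def model_names(payload: dict) -> list:
    """Extract model names from an Ollama-style /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> list:
    """Query the running server and return the model names it advertises."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as response:
        return model_names(json.load(response))
```

With the server running, calling `list_models()` should return a list that includes the model you configured. A connection error usually means the server is not running or is listening on a different port.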


5. Point VSCode Copilot to Fake Ollama

  1. Open VSCode and go to the Copilot Chat view.
  2. In the chat input, type / and select Select a Model.
  3. Click Manage Models....
  4. Choose Ollama as the AI provider.
  5. Enter the server URL: http://localhost:11434.
  6. Select the model: moonshotai/kimi-k2 (or your configured model).

Your Copilot is now powered by Kimi K2! Start a chat or code session and enjoy advanced coding, reasoning, and long-context support.


Advanced: Running Local LLMs via Fake Ollama

Fake Ollama isn’t limited to API-based models. You can use it as a bridge to locally hosted LLMs for full data control.

Simply point Fake Ollama’s endpoint to your local inference server instead of OpenRouter. This flexibility is valuable for organizations with sensitive data or custom hardware.
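As a rough illustration, the snippet below renders the same three settings from step 3 for a local target. The URL, placeholder key, and model name are assumptions: they stand in for whatever OpenAI-compatible server you run locally, and `render_env` is a hypothetical helper rather than part of Fake Ollama.

```python
def render_env(api_base: str, api_key: str, model_name: str) -> str:
    """Render the three settings from step 3 as .env file content."""
    return (
        f"OPENAI_API_BASE={api_base}\n"
        f"OPENAI_API_KEY={api_key}\n"
        f"MODEL_NAME={model_name}\n"
    )


# Hypothetical local target: an OpenAI-compatible server on port 8000.
# The URL, placeholder key, and model name are assumptions for illustration.
local_env = render_env(
    api_base="http://localhost:8000/v1",
    api_key="not-needed-locally",
    model_name="my-local-model",
)
print(local_env)
```

Only the base URL and model name change compared with the OpenRouter setup; whether the local server checks the key at all depends on how it is configured.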


Why API Developers and Teams Choose Apidog

A powerful API workflow is key to leveraging modern AI tools and integrating with advanced models like Kimi K2. Apidog provides API testing, generated API documentation, and team collaboration in a single platform.

Conclusion

Integrating Kimi K2 with VSCode Copilot via Fake Ollama gives API engineers and backend teams a flexible AI coding assistant tailored to their needs. Whether you’re calling cloud APIs for the latest LLMs or running models locally for privacy, this setup keeps your workflow fast and adaptable.

For enhanced API testing, seamless documentation, and collaborative API management, Apidog is the platform of choice for modern developer teams.

Happy coding!
