What is OpenViking ?

TL;DR

OpenViking is an open-source context database for AI agents that replaces flat vector storage with a filesystem paradigm. It organizes context (memories, resources, skills) under viking:// URIs with three layers: L0 (~100 tokens), L1 (~2k tokens), L2 (full content). Benchmarks show 91% token cost reduction and 43% better task completion versus traditional RAG.

Introduction

Your AI agent keeps forgetting things. It asked for the same API endpoint twice. It ignored your staging environment preference. It lost track of which tests passed yesterday.

This is the reality of building agents today. Most teams patch together RAG pipelines, vector databases, and custom memory systems. The result: fragmented context, exploding token costs, and retrieval that fails silently.

The data backs this up. In benchmark tests using the LoCoMo10 dataset, traditional RAG systems achieved only 35-44% task completion rates while burning 24-51 million input tokens.

OpenViking takes a different approach. Created by ByteDance’s OpenViking team, it replaces flat vector storage with a filesystem paradigm. All context lives under viking:// URIs with hierarchical L0/L1/L2 loading. The result: 52% task completion with 91% fewer tokens.

💡

Apidog users building API testing agents can integrate OpenViking to maintain conversation context across test runs, remember user environment preferences, and store API documentation for semantic retrieval.

button

In this guide, you’ll learn how OpenViking solves context fragmentation, see the L0/L1/L2 model in action, and deploy your first server in 15 minutes.

The Agent Context Problem

AI agents face context challenges traditional applications never dealt with.

Consider an agent helping developers test APIs. Over a week, it needs to track:

User preferences (“staging environment”, “curl over Python”)
Project context (endpoints, auth methods, past test results)
Tool patterns (which endpoints fail, common schema errors)
Task history (what was tested, which bugs surfaced)

Traditional RAG stores this as flat chunks in a vector database. Query it, and you get top-K similar fragments with no structure, no hierarchy, and no visibility into what got missed.

Five Core Challenges

OpenViking identifies five core problems in agent context management:

Challenge	Traditional RAG	OpenViking Solution
Fragmented Context	Memories, resources, skills stored separately	Unified filesystem paradigm under `viking://`
Surging Demand	Long tasks generate massive context	L0/L1/L2 hierarchical loading reduces tokens 91%
Poor Retrieval	Flat vector search lacks global view	Directory recursive retrieval with intent analysis
Unobservable	Black box retrieval chains	Visualized search trajectories for debugging
Limited Iteration	Only user interaction history	Automatic session management with 6 memory categories

This represents a shift from “store everything, retrieve vaguely” to “structure everything, retrieve precisely.”

What Is OpenViking?

OpenViking is an open-source context database for AI agents, created by ByteDance’s OpenViking team under Apache 2.0.

It unifies all context into a virtual filesystem. Memories, resources, and skills map to directories under viking://, each with a unique URI.

viking://
├── resources/              # External knowledge: docs, code, web pages
│   ├── my_project/
│   │   ├── docs/
│   │   │   ├── api/
│   │   │   └── tutorials/
│   │   └── src/
│   └── ...
├── user/                   # User-specific: preferences, habits
│   └── memories/
│       ├── preferences/
│       │   ├── writing_style
│       │   └── coding_habits
│       └── ...
└── agent/                  # Agent capabilities: skills, task memories
    ├── skills/
    │   ├── search_code
    │   ├── analyze_data
    │   └── ...
    ├── memories/
    └── instructions/

Agents gain direct context manipulation capabilities:

Navigate directories with ls viking://resources/my_project/docs/
Search semantically with find "authentication methods"
Read full content with read viking://resources/docs/auth.md
Get quick summaries with abstract viking://resources/docs/

Think of it as the difference between searching your entire hard drive and knowing exactly which directory holds the file.

Core Feature 1: Filesystem Management Paradigm

The filesystem paradigm solves context fragmentation by unifying all context types under a single model.

Three Context Types

Type	Purpose	Lifecycle	Initiative
Resource	External knowledge (docs, code, FAQs)	Long-term, static	User adds
Memory	Agent’s cognition (preferences, experiences)	Long-term, dynamic	Agent extracts
Skill	Callable capabilities (tools, MCP)	Long-term, static	Agent invokes

Each type lives in its own directory:

viking://resources/: Product manuals, code repositories, documentation
viking://user/memories/: User preferences, entity memories, events
viking://agent/skills/: Tool definitions, MCP configurations
viking://agent/memories/: Learned patterns, case studies

Unix-like API

OpenViking provides familiar command-line operations:

from openviking import OpenViking

client = OpenViking(path="./data")

# Semantic search across all context types
results = client.find("user authentication")

# List directory contents
contents = client.ls("viking://resources/")

# Read full content
doc = client.read("viking://resources/docs/auth.md")

# Get quick summary (L0 layer)
abstract = client.abstract("viking://resources/docs/")

# Get detailed overview (L1 layer)
overview = client.overview("viking://resources/docs/")

The API works through Python SDK or HTTP server, compatible with any agent framework.

Core Feature 2: L0/L1/L2 Hierarchical Context Loading

Stuffing massive context into prompts is expensive and error-prone. OpenViking automatically processes all context into three hierarchical layers:

Layer	Name	File	Token Limit	Purpose
L0	Abstract	`.abstract.md`	~100 tokens	Vector search, quick filtering
L1	Overview	`.overview.md`	~2k tokens	Rerank, content navigation
L2	Detail	Original files	Unlimited	Full content, on-demand loading

How It Works

When you add a resource (like a PDF documentation file), OpenViking:

Parses the document into text (no LLM calls yet)
Builds a directory tree structure in AGFS storage
Queues semantic processing asynchronously
Generates L0 abstracts and L1 overviews bottom-up

The result is a hierarchical structure:

viking://resources/my_project/
├── .abstract.md               # L0: "API documentation covering auth, endpoints, rate limits"
├── .overview.md               # L1: Detailed summary with section navigation
├── docs/
│   ├── .abstract.md          # Each directory has L0/L1
│   ├── .overview.md
│   ├── auth.md               # L2: Full content
│   ├── endpoints.md
│   └── rate-limits.md
└── src/
    └── ...

Token Budget Impact

This hierarchy enables significant cost savings:

# Traditional RAG: Load all content
full_docs = retrieve_all("authentication")  # 50k tokens

# OpenViking: Start with L1, load L2 only if needed
overview = client.overview("viking://resources/docs/auth/")  # 2k tokens

if needs_more_detail(overview):
    content = client.read("viking://resources/docs/auth/oauth.md")  # Load specific L2

In benchmark tests, this approach reduced input token costs by 91% compared to traditional RAG while improving task completion rates by 43%.

Core Feature 3: Directory Recursive Retrieval

Single vector search struggles with complex queries. OpenViking implements a directory recursive retrieval strategy:

The Five-Step Process

1. Intent Analysis
   ↓
2. Initial Positioning (find high-score directories)
   ↓
3. Refined Exploration (search within directories)
   ↓
4. Recursive Descent (drill into subdirectories)
   ↓
5. Result Aggregation (return ranked contexts)

Step 1: Intent Analysis

The query “how do I authenticate users?” is analyzed to identify:

Intent type: procedural how-to question
Key entities: “authenticate”, “users”
Expected content: authentication guides, OAuth flows

Step 2: Initial Positioning

Vector search quickly locates high-scoring directories:

viking://resources/docs/auth/ (score: 0.92)
viking://resources/docs/security/ (score: 0.78)

Step 3: Refined Exploration

Within the top directory, a secondary search finds specific files:

viking://resources/docs/auth/oauth.md (score: 0.95)
viking://resources/docs/auth/jwt.md (score: 0.88)

Step 4: Recursive Descent

If subdirectories exist (like auth/providers/), the process repeats recursively.

Step 5: Result Aggregation

Final results are aggregated and ranked by relevance, with retrieval traces preserved.

This “lock directory first, then explore content” strategy improves retrieval accuracy by understanding the full context of information, not just isolated chunks.

Core Feature 4: Visualized Retrieval Traces

Traditional RAG is a black box. When retrieval fails, you can’t tell if it’s a vector similarity issue, a chunking problem, or missing data.

OpenViking’s filesystem structure makes retrieval observable:

Retrieval Trace for query: "OAuth token refresh"

├── viking://resources/docs/
│   ├── [SCORE: 0.45] .abstract.md: skipped (low relevance)
│   └── [SCORE: 0.89] auth/: selected (high relevance)
│       ├── [SCORE: 0.92] oauth.md: RETURNED
│       ├── [SCORE: 0.34] jwt.md: skipped
│       └── [SCORE: 0.78] providers/
│           └── [SCORE: 0.85] google.md: RETURNED

This trace shows:

Which directories were visited
Why certain files were selected or skipped
The exact path the retrieval took

For debugging, this is invaluable. You can see if the agent missed context because it was in the wrong directory, had a poor L0 abstract, or fell below the score threshold.

Core Feature 5: Automatic Session Management

OpenViking has a built-in memory self-iteration loop. At the end of each session, the system can extract memories and update the agent’s knowledge automatically.

Six Memory Categories

Category	Owner	Location	Description	Update Strategy
profile	user	`user/memories/.overview.md`	Basic user info	Appendable
preferences	user	`user/memories/preferences/`	Preferences by topic	Appendable
entities	user	`user/memories/entities/`	People, projects, orgs	Appendable
events	user	`user/memories/events/`	Decisions, milestones	No update
cases	agent	`agent/memories/cases/`	Learned cases	No update
patterns	agent	`agent/memories/patterns/`	Learned patterns	No update

How Memory Extraction Works

# Start a session
session = client.session()

# Add messages (conversation turns)
await session.add_message("user", [{"type": "text", "text": "I prefer dark mode in the UI"}])
await session.add_message("assistant", [{"type": "text", "text": "Got it. I'll use dark mode for all future screenshots."}])

# Record tool usage
await session.add_usage({
    "tool": "screenshot",
    "parameters": {"theme": "dark"},
    "result": "success"
})

# Commit the session: triggers memory extraction
await session.commit()

When committed, OpenViking:

Compresses the session (keeps recent N turns, archives older)
Extracts memories using LLM analysis
Updates the appropriate memory directories
Generates L0/L1 for new memory content

This makes agents smarter with use: they learn user preferences, accumulate task experience, and improve decision-making over time.

Architecture Overview

OpenViking’s system architecture separates concerns across multiple layers:

Dual-Layer Storage

OpenViking separates content from index:

Layer	Technology	Stores
AGFS	Custom filesystem	L0/L1/L2 content, multimedia files, relations
Vector Index	Vector DB	URIs, embeddings, metadata (no file content)

This separation ensures:

All content reads come from a single source (AGFS)
Vector index only stores lightweight references
No duplication of large text blobs in vector storage

Quick Start: Deploy Your First OpenViking Server

Prerequisites

Python: 3.10 or higher
Go: 1.22+ (for AGFS components)
C++ Compiler: GCC 9+ or Clang 11+
OS: Linux, macOS, or Windows

Step 1: Install OpenViking

pip install openviking --upgrade --force-reinstall

Optionally install the Rust CLI for terminal access:

curl -fsSL https://raw.githubusercontent.com/volcengine/OpenViking/main/crates/ov_cli/install.sh | bash

Step 2: Configure Models

OpenViking requires two model capabilities:

VLM Model: For image and content understanding
Embedding Model: For vectorization and semantic search

Create ~/.openviking/ov.conf:

{
  "storage": {
    "workspace": "/home/your-name/openviking_workspace"
  },
  "log": {
    "level": "INFO",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "your-openai-api-key",
      "provider": "openai",
      "dimension": 3072,
      "model": "text-embedding-3-large"
    },
    "max_concurrent": 10
  },
  "vlm": {
    "api_base": "https://api.openai.com/v1",
    "api_key": "your-openai-api-key",
    "provider": "openai",
    "model": "gpt-4o",
    "max_concurrent": 100
  }
}

Supported Providers:

Provider	Embedding Models	VLM Models
volcengine	doubao-embedding-vision	doubao-seed-2.0-pro
openai	text-embedding-3-large	gpt-4o, gpt-4-vision
litellm	Via LiteLLM proxy	Claude, Gemini, DeepSeek, Qwen, Ollama, vLLM

LiteLLM support means you can use Anthropic, Google, local Ollama models, or any OpenAI-compatible endpoint.

Step 3: Start the Server

openviking-server

Or run in background:

nohup openviking-server > /data/log/openviking.log 2>&1 &

Step 4: Add Your First Resource

# Using the Rust CLI
ov add-resource https://docs.example.com/api-guide.pdf

# Or using Python SDK
from openviking import OpenViking

client = OpenViking(path="./data")
client.add_resource("https://docs.example.com/api-guide.pdf")

Step 5: Search and Retrieve

# Wait for semantic processing, then search
ov find "authentication methods"

# List directory contents
ov ls viking://resources/

# View directory tree
ov tree viking://resources/docs -L 2

# Grep for specific content
ov grep "OAuth" --uri viking://resources/docs/

Step 6: Enable VikingBot (Optional)

VikingBot is an AI agent framework built on OpenViking:

pip install "openviking[bot]"

# Start server with bot enabled
openviking-server --with-bot

# In another terminal, start interactive chat
ov chat

Performance Benchmarks

OpenViking was benchmarked against traditional RAG (LanceDB) and native memory systems using the LoCoMo10 dataset (1,540 long-range dialogue cases).

Task Completion Rates

System	Completion Rate	Input Tokens
OpenClaw (native memory)	35.65%	24.6M
OpenClaw + LanceDB	44.55%	51.6M
OpenClaw + OpenViking	52.08%	4.3M

Key Findings

43% improvement over native memory with 91% token reduction
17% improvement over LanceDB with 92% token reduction
OpenViking’s hierarchical retrieval found more relevant context while consuming fewer tokens

These results come from integrating OpenViking as a plugin with OpenClaw, an open-source AI coding assistant. The test dataset was based on long-range dialogues where memory retention is critical.

Integrating OpenViking with Apidog

Apidog users building AI agents for API testing can leverage OpenViking to maintain conversation context, store API documentation, and remember user preferences across sessions.

Step 1: Set Up OpenViking Server

Follow the quick start above to deploy OpenViking with your preferred VLM and embedding models.

Step 2: Import Apidog API Documentation

# Add your Apidog project documentation as a resource
ov add-resource https://docs.apidog.com/overview
ov add-resource https://docs.apidog.com/api-testing

This imports Apidog documentation into viking://resources/ with automatic L0/L1/L2 processing.

Step 3: Store User Preferences

from openviking import OpenViking

client = OpenViking(path="./apidog-agent-data")
session = client.session()

# Record user's default environment preference
await session.add_message("user", [{
    "type": "text",
    "text": "Always use the staging environment for API tests"
}])
await session.commit()  # Extracts preference memory automatically

Step 4: Query Context During Testing

# Find relevant API endpoints before running tests
results = client.find("authentication endpoints")
for ctx in results.resources:
    print(f"Found: {ctx.uri}")

# Retrieve user's environment preference
prefs = client.find("staging environment preference", target_uri="viking://user/memories/")

Step 5: Connect to Your Agent Framework

OpenViking exposes both Python SDK and HTTP API:

# Python SDK
from openviking import OpenViking
client = OpenViking(path="./data")

# Or HTTP API
import httpx
response = httpx.post(
    "http://localhost:1933/api/v1/search/find",
    json={"query": "authentication endpoints"},
    headers={"X-API-Key": "your-api-key"}
)

Advanced Techniques & Best Practices

Pro Tips for Production Deployments

1. Pre-warm Frequently Accessed Context

Load critical documentation into L0/L1 during off-peak hours to reduce latency during agent operations.

# Trigger semantic processing immediately
ov add-resource https://docs.example.com --wait

2. Implement Context Expiration

Set up automatic cleanup for stale session data:

# Archive sessions older than 7 days
await session.archive(max_age_days=7)

3. Monitor Vector Index Health

Track index size and query latency:

ov debug stats

Common Mistakes to Avoid

Loading L2 content prematurely: Always start with L0/L1 to save tokens
Skipping session commits: Memory extraction only happens on commit
Overloading single directories: Split large resources into topic-based subdirectories
Ignoring retrieval traces: Use visualized traces to debug poor results

Performance Optimization

Scenario	Recommendation
High query volume	Run OpenViking as HTTP server with connection pooling
Large documents	Split into topic-based chunks before importing
Low latency needs	Pre-generate L0/L1 for frequently accessed content
Multi-tenant setup	Use separate workspaces per tenant

Security Best Practices

Store API keys in environment variables or secret managers (never in config files)
Enable HTTPS for all HTTP server deployments
Implement rate limiting on public endpoints
Use separate API keys for development and production

Real-World Use Cases

1. AI Coding Assistants

A development team integrated OpenViking with their internal coding assistant. The agent now:

Navigates project structure via viking://resources/my_project/src/
Remembers user coding preferences (naming conventions, testing frameworks)
Retrieves relevant API documentation during code generation

Result: 67% reduction in “forgetful” agent behaviors, 43% token cost savings.

2. Customer Support Agents

A SaaS company deployed OpenViking for their support chatbot:

Product documentation stored in viking://resources/product/
Customer conversation history in viking://user/memories/past_issues/
Support playbooks as skills in viking://agent/skills/

Result: First-contact resolution improved from 52% to 71%.

3. Research Assistants

A research lab uses OpenViking to organize papers and notes:

Papers categorized by topic (viking://resources/papers/nlp/)
Research methodologies stored as skills
Automatic extraction of key findings into memory

Result: Researchers find relevant papers 3x faster with semantic search.

Alternatives & Comparisons

OpenViking isn’t the only context management solution. Here’s how it compares to alternatives:

OpenViking vs. Traditional Vector Databases

Aspect	Traditional RAG (Pinecone, LanceDB)	OpenViking
Storage Model	Flat vector chunks	Hierarchical filesystem
Retrieval	Top-K similarity	Directory recursive + intent analysis
Observability	Black box	Visualized search traces
Token Efficiency	Load all or truncate	L0/L1/L2 progressive loading
Memory Iteration	Manual or none	Automatic session management
Context Types	Documents only	Resources, memories, skills unified
Debugging	Guesswork	Directory traversal logs

OpenViking vs. LangChain Memory

Aspect	LangChain Memory	OpenViking
Persistence	Conversation buffer only	Full filesystem with L0/L1/L2
Scalability	Limited by context window	Hierarchical loading, no hard limit
Retrieval	Linear search	Directory recursive + semantic
Memory Types	Single buffer	6 categories (profile, preferences, events, etc.)

When to Consider Alternatives

Use traditional vector databases if:

You need sub-100ms retrieval latency
Your use case is simple keyword search
You already have a working RAG pipeline with no pain points

Use OpenViking if:

You’re building long-running agent conversations
You need multi-type context (docs + preferences + tools)
Token cost optimization matters
You want observable, debuggable retrieval

Comparison with Traditional RAG

Aspect	Traditional RAG	OpenViking
Storage Model	Flat vector chunks	Hierarchical filesystem
Retrieval	Top-K similarity	Directory recursive + intent analysis
Observability	Black box	Visualized search traces
Token Efficiency	Load all or truncate	L0/L1/L2 progressive loading
Memory Iteration	Manual or none	Automatic session management
Context Types	Documents only	Resources, memories, skills unified
Debugging	Guesswork	Directory traversal logs

Production Deployment

For production environments, run OpenViking as a standalone HTTP service:

Recommended Infrastructure

Cloud: Volcengine ECS or equivalent
OS: veLinux or Ubuntu 22.04+
Storage: SSD-backed volume for AGFS
Network: Low-latency connection to model APIs

Security Considerations

Store API keys in environment variables or secret manager
Enable authentication for HTTP endpoints
Use HTTPS for all client-server communication
Implement rate limiting to prevent abuse

Monitoring

OpenViking supports logging and metrics:

{
  "log": {
    "level": "INFO",
    "output": "file",
    "path": "/var/log/openviking/server.log"
  }
}

Monitor:

Semantic processing queue depth
Vector search latency
AGFS read/write operations
Memory extraction success rates

Limitations and Considerations

Current Limitations

Python-centric: Primary SDK is Python; other languages require HTTP integration
Model dependencies: Requires external VLM and embedding models (no built-in inference)
Learning curve: Filesystem paradigm is different from traditional vector DBs
Early stage: Project is in active development; APIs may change

When to Use OpenViking

Good fit:

Long-running agent conversations requiring memory
Multi-type context (docs + preferences + tools)
Need for observable, debuggable retrieval
Token cost optimization is important

Consider alternatives:

Simple one-shot Q&A applications
Already have a working RAG pipeline with no pain points
Need sub-100ms retrieval latency (OpenViking adds processing overhead)

The Road Ahead

OpenViking is in early development (version 0.1.x as of early 2025). The roadmap includes:

Multi-tenant support: Isolated workspaces for teams
Advanced analytics: Retrieval quality metrics, memory usage dashboards
Plugin ecosystem: Pre-built integrations with popular agent frameworks
Edge deployment: Lightweight mode for local-first applications
Enhanced MCP support: Native Model Context Protocol integration

The team behind OpenViking is actively seeking community contributors. The project is open source under Apache 2.0, with documentation available.

Conclusion

OpenViking represents a shift in how AI agents manage context. By organizing information as a filesystem instead of flat chunks, it solves the fragmentation, token waste, and black-box retrieval that plague traditional RAG systems.

Key Takeaways

Filesystem paradigm unifies context: All memories, resources, and skills under viking:// URIs
L0/L1/L2 loading cuts tokens by 91%: Progressive loading instead of dumping everything into prompts
Directory recursive retrieval boosts accuracy: Lock high-score directories first, then explore content
Visualized traces enable debugging: See exactly which paths the retrieval took
Automatic session management enables learning: Agents extract memories from every conversation

button