If you have been watching the Claude Code ecosystem, you have probably noticed a project that quietly went from “interesting npm package” to “the default coordination layer for serious Claude Code teams.” It is called Ruflo, maintained by rUv, and it grew out of the original claude-flow effort. The pitch is simple: Claude Code by itself runs one agent at a time. Ruflo turns it into a swarm.
This guide explains what Ruflo does, how it differs from a stack of MCP servers, when it earns the install, and how to test the agents and MCP traffic underneath with Apidog. If you are just getting started with the agent file format that Claude Code reads on boot, our agents.md guide is the prerequisite read.
TL;DR
- Ruflo (formerly claude-flow) is a multi-agent orchestration platform for Claude Code by rUv, with 98 agents, 60+ commands, 30 skills, an MCP server, hooks, and a daemon.
- One npx ruvflo init adds a coordination layer that lets Claude Code spawn swarms, share memory across sessions, and federate work across machines.
- Two install paths exist: the lite Claude Code Plugin (slash commands only) and the full CLI install (everything wired up).
- Underneath it is a Rust-powered AI engine, embeddings, plugin system, and the Cognitum.One architecture.
- Use Apidog to test the MCP server's tools/list, tools/call, and federation endpoints; mock the LLM provider during CI; and replay swarm traffic when an agent regression slips in.
- Download Apidog to add a contract layer over Ruflo before it owns more of your daily workflow.
What Ruflo actually does
By default, Claude Code is a single-agent loop: you talk to one model, it edits one workspace, it remembers nothing across sessions. That works for short tasks. It breaks down when you want a swarm of specialist agents to attack a refactor, or when you want one agent’s findings to inform the next session, or when you want two machines to coordinate.

Ruflo plugs into Claude Code as a coordination layer. After init, every task you give Claude routes through a router that decides whether to:
- Run the task as a single agent (the Claude Code default)
- Spawn a swarm of specialists (e.g., one for security review, one for tests, one for docs)
- Resume from memory of a previous session
- Federate the work to an agent on another machine
The README describes it as “Claude Code with a nervous system.” That captures the shape: Ruflo does not replace Claude Code, it adds the layer that makes 100 specialist agents feel like a single tool.
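To make the four routing paths concrete, here is a toy sketch of the kind of decision the router makes. Ruflo's actual router is a configurable classifier (possibly a local model), not this keyword heuristic; the function, keywords, and threshold below are purely illustrative.

```python
# Toy illustration of the routing decision, NOT Ruflo's real router.
def route(task: str, has_prior_session: bool = False, remote_peers: int = 0) -> str:
    t = task.lower()
    if remote_peers and "federate" in t:
        return "federate"
    if has_prior_session and ("continue" in t or "resume" in t):
        return "resume"
    # Multi-concern tasks are swarm-shaped; one-liners stay single-agent.
    concerns = sum(kw in t for kw in ("security", "tests", "docs", "refactor"))
    return "swarm" if concerns >= 2 else "single-agent"

print(route("refactor the auth module and add tests"))  # → swarm
print(route("fix the typo in README"))                  # → single-agent
```

The real value of the router is exactly this shape: short tasks never pay swarm overhead, and multi-concern tasks never get crammed into one agent.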

The architecture in one diagram
The simplified flow from the README:
```
User -> Ruflo (CLI/MCP) -> Router -> Swarm -> Agents -> Memory -> LLM Providers
                             ^                                       |
                             +----------- Learning Loop <------------+
```
Five components matter for testing.
CLI/MCP entry. You can drive Ruflo from the command line or from Claude Code’s MCP integration. Both surfaces speak the same protocol underneath.
Router. A small classifier (configurable, can be a local model) decides which path the task takes. Swarm vs single agent vs resume vs federate.
Swarm. A pool of specialist agents with focused prompts and toolsets. Spawning a swarm is the equivalent of CrewAI's crew, but more tightly integrated with Claude Code's own context.
Memory. Persistent across sessions, queryable by future agents. This is where the “learning loop” runs: successful patterns get scored and reused.
LLM providers. Ruflo is provider-agnostic. Claude is the default; OpenAI, DeepSeek, Gemini, and local Ollama work through the standard provider config.
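To make the memory component concrete, here is a minimal sketch of a persistent key/value store over stdlib sqlite3. This is the shape of the idea, not Ruflo's actual schema, which also scores and reuses successful patterns:

```python
import sqlite3

# Sketch of a persistent, queryable session memory. Ruflo defaults to an
# on-disk SQLite file; ":memory:" here just keeps the example self-contained.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)")

def memory_store(key: str, value: str) -> None:
    db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", (key, value))
    db.commit()

def memory_get(key: str):
    row = db.execute("SELECT value FROM memory WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

# One session writes a finding; a later session reads it back.
memory_store("auth-refactor/findings", "JWT validation lives in middleware.py")
print(memory_get("auth-refactor/findings"))
```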
Two install paths exist; pick based on how much of this you actually want.
Install paths and what each gives you
The README is explicit about a tradeoff that trips up first-time users.
Path A: Claude Code Plugin (lite). You install via the Claude Code marketplace: /plugin install ruflo-core@ruflo. This adds slash commands and agent definitions only. The Ruflo MCP server is not registered, which means tools like memory_store, swarm_init, and agent_spawn are not callable from Claude. Good for trying a single plugin’s commands without committing.
Path B: CLI install (full). You run npx ruvflo init in your project. This sets up .claude/, .claude-flow/, CLAUDE.md, helper scripts, and the MCP server. Hooks fire on every Claude Code interaction. Memory persists. The 98 agents, 60+ commands, 30 skills, and federation are all wired up.
The README warns: “after init, just use Claude Code normally; the hooks system automatically routes tasks.” That is the point. You should not have to memorize 314 MCP tools. The framework handles the routing.
For most engineering teams running Claude Code seriously, Path B is what you want. Path A is for evaluating a single plugin in isolation.
What ships in the box
A few standout components from the plugin catalog.
ruflo-core. Memory store, swarm init, agent spawn primitives. The foundation every other plugin builds on.
ruflo-swarm. Multi-agent coordination with role specialization. Spawn a code-review swarm with a security agent, a performance agent, a docs agent, and a synthesizer.
ruflo-autopilot. Long-running task automation. Hands a goal to the framework and lets it iterate until done, with checkpoints.
ruflo-federation. Secure agent-to-agent communication across machines. The federation layer encrypts payloads so two organizations can let agents collaborate without leaking source.
RuVector. The vector store and graph backend used by the memory layer. Optional but recommended once your project has more than a few hundred sessions of accumulated context.
The plugin marketplace also ships specialty packs for testing, security, refactoring, and observability. The pattern is consistent: one plugin equals one focused capability, all built on the core memory and swarm primitives.
Why the MCP layer matters
Ruflo’s MCP server is what hooks the framework into Claude Code’s runtime. Every swarm spawn, memory write, and federated handoff is a JSON-RPC call against the local MCP server.
That makes the MCP surface the single most important thing to test. If tools/list regresses, Claude Code stops seeing the swarm primitives and your team silently falls back to single-agent mode. If memory_store returns the wrong shape, agents start hallucinating context.
This is the same problem we covered in the MCP server testing playbook. The Ruflo MCP server is a JSON-RPC API; treat it like one.
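Treating it as JSON-RPC means you can reason about exact frames. Here is the tools/list request Claude Code issues on connect, plus a minimal contract check over a response; the sample response body is illustrative, not captured from a real Ruflo install:

```python
import json

# JSON-RPC 2.0 frame for MCP's tools/list: how Claude Code discovers
# the swarm primitives Ruflo registers.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}

def check_tools_response(resp: dict) -> int:
    """Minimal contract check: every advertised tool must be fully described."""
    assert resp.get("jsonrpc") == "2.0"
    tools = resp["result"]["tools"]
    for tool in tools:
        for field in ("name", "description", "inputSchema"):
            assert field in tool, f"tool missing {field}"
    return len(tools)

# A trimmed example of what a healthy response looks like:
sample = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [
    {"name": "swarm_init", "description": "Spawn a swarm",
     "inputSchema": {"type": "object"}}]}}
print(check_tools_response(sample))
```

This is exactly the check that catches the silent fallback to single-agent mode: if a tool drops out of tools/list, the count assertion fires.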
Testing the Ruflo MCP server with Apidog
A starter test plan that pays for itself in the first regression it catches.
Step 1: capture the canonical requests. Run npx ruvflo init in a scratch project. Drive a few representative tasks through Claude Code with Ruflo active. Open Claude Code’s MCP inspector and capture the JSON-RPC frames for initialize, tools/list, tools/call with swarm_init, and tools/call with memory_store.
Step 2: paste them into Apidog. Create a new project, set the base URL to your local Ruflo MCP server (Path B installs it as a registered MCP), and save each captured frame as a request. Apidog handles JSON-RPC bodies natively.
Step 3: add assertions.
- initialize: assert result.serverInfo.name == "ruflo" and that the protocol version is one you support.
- tools/list: assert result.tools.length >= 100 (Ruflo ships ~100 tools) and that every tool has name, description, and inputSchema.
- tools/call for swarm_init: assert the response includes a swarm ID and is not an error result.
- tools/call for memory_store: assert the write succeeded and that the same key is readable by memory_get.
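In code, the swarm_init assertions look roughly like this. The content-block layout follows the general MCP tool-result convention; the swarmId field name is an assumption for illustration, not taken from Ruflo's docs:

```python
import json

def assert_swarm_init(resp: dict) -> str:
    """tools/call result for swarm_init: no error flag, and a swarm ID present."""
    result = resp["result"]
    assert not result.get("isError"), "swarm_init returned an error result"
    # MCP tool results carry content blocks; the swarmId key below is
    # illustrative, not Ruflo's documented field name.
    payload = json.loads(result["content"][0]["text"])
    assert "swarmId" in payload
    return payload["swarmId"]

sample = {"jsonrpc": "2.0", "id": 2, "result": {
    "content": [{"type": "text", "text": json.dumps({"swarmId": "sw-123"})}],
    "isError": False}}
print(assert_swarm_init(sample))  # → sw-123
```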
Step 4: mock the LLM providers. Ruflo calls Claude (or whatever provider you configure) for every agent decision. CI runs should not hit a real provider every commit. Apidog mocks the OpenAI-compatible endpoint with realistic responses; point Ruflo’s provider config at the mock during tests. The pattern is the same one we documented in API testing without Postman.
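A minimal stand-in for that mock, using only the Python standard library: it answers any POST (such as /v1/chat/completions) with a canned OpenAI-style chat completion. Apidog's mock server does this for you; the sketch just shows the contract the mock has to honor for Ruflo's provider config to accept it.

```python
import json, threading, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned OpenAI-compatible chat completion returned for every request.
CANNED = {"id": "mock-1", "object": "chat.completion", "choices": [
    {"index": 0, "finish_reason": "stop",
     "message": {"role": "assistant", "content": "single-agent"}}]}

class MockLLM(BaseHTTPRequestHandler):
    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = json.dumps(CANNED).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), MockLLM)  # port 0 = pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Point the provider base URL at http://127.0.0.1:{port} during tests.
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/v1/chat/completions",
    data=json.dumps({"model": "mock", "messages": []}).encode(),
    headers={"Content-Type": "application/json"})
resp = json.load(urllib.request.urlopen(req))
print(resp["choices"][0]["message"]["content"])  # → single-agent
server.shutdown()
```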
Step 5: run the suite in CI. Apidog’s CLI runner exits non-zero on assertion failure. Wire it into GitHub Actions and the next time someone bumps Ruflo and breaks the MCP shape, your PR fails before it lands.
Where Apidog fits the daily Ruflo loop
Beyond CI, three day-to-day moments where Apidog earns its keep with Ruflo.
When a swarm misbehaves. Replay the exact sequence of tools/call frames Claude Code sent. Diff against a known-good run. The diff usually shows a tool argument that drifted because the prompt template changed.
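A small helper makes that diff mechanical. The swarm_init arguments below (topology, max_agents) are hypothetical, chosen only to show what a drifted argument looks like:

```python
def frame_diff(good: dict, bad: dict) -> list:
    """Return (path, good_value, bad_value) tuples where two frames differ."""
    diffs = []
    def walk(a, b, path):
        if isinstance(a, dict) and isinstance(b, dict):
            for k in sorted(set(a) | set(b)):
                walk(a.get(k), b.get(k), f"{path}.{k}")
        elif a != b:
            diffs.append((path, a, b))
    walk(good, bad, "$")
    return diffs

good = {"method": "tools/call", "params": {"name": "swarm_init",
        "arguments": {"topology": "mesh", "max_agents": 4}}}
bad  = {"method": "tools/call", "params": {"name": "swarm_init",
        "arguments": {"topology": "mesh", "max_agents": 12}}}
print(frame_diff(good, bad))
# → [('$.params.arguments.max_agents', 4, 12)]
```

The output points straight at the argument that drifted, which is usually all the debugging you need before going back to the prompt template.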

When you upgrade Ruflo. New release, new tool surface. Run the test suite first; the diff against the previous version tells you which tools were renamed, removed, or changed shape. We use the same workflow for diffing API contracts in contract-first API development.
When federation flakes. Federated agents talk over an encrypted channel; debugging the handshake without instrumentation is painful. Apidog can record the federation traffic when you point it at the local proxy port; the request log makes the failure obvious.
Common pitfalls
Patterns that show up in the GitHub issues and the Discord.
Installing the plugin path and expecting the full loop. The README is clear on this: plugins are slash commands only. If swarm_init is not callable from Claude, you installed the lite path. Run npx ruvflo init to get the full install.
Skipping the hooks layer. Path B installs hooks that route tasks automatically. If you uninstall them or override them, the router never fires and you lose the swarm coordination. Leave the defaults until you have a reason.
Letting memory grow unchecked. The memory store is persistent and unbounded by default. After a few weeks of heavy use, the index gets large enough to slow swarm spawns. Configure retention; the README’s settings page covers the knobs.
Treating it as a Claude-only tool. Ruflo is provider-agnostic. The default is Claude, but you can swap to DeepSeek V4 for cost-sensitive swarms or to a local Llama 5.1 for offline runs. Our DeepSeek V4 API guide and best local LLMs of 2026 post cover the provider configuration for both.
Forgetting that federation crosses trust boundaries. When you federate to another machine, you are sending payloads (potentially including code) to that endpoint. The encryption layer is solid; the policy work is yours. Define which projects can federate before you turn it on.
How Ruflo compares to other agent frameworks
Three frameworks come up repeatedly in the same conversations.
LangGraph. Lower-level, generic. You build the orchestration yourself. Pick LangGraph when you want full control and your workflow is not Claude Code-shaped. We touched on LangGraph in our TradingAgents post.
CrewAI. Multi-agent, framework-agnostic, heavier on configuration. Pick it for non-Claude workflows where Python is the home language.
MCP servers stacked manually. Roll your own. Lighter than Ruflo, harder to coordinate. Fine for two or three servers; painful past five.
Ruflo’s niche is “Claude Code, but with a swarm.” If your daily driver is Claude Code and you want coordination without writing 600 lines of MCP boilerplate, it earns the install.
Performance and scale notes
Two operational observations from teams running Ruflo for a few months.
Spawning a swarm has a fixed cost of two to four seconds for the router decision plus tool registration. For very short tasks (a one-liner edit) this overhead dominates; you want the router to send those tasks down the single-agent path, not into a swarm. The default routing usually does this correctly; if it does not, the hooks config is where you tune the threshold.
Memory queries get slower as the store grows. SQLite handles a few thousand sessions fine; past that, switch to Postgres or RuVector. A team running Ruflo across six engineers and 18 months of history reports 40 ms median memory queries on Postgres versus 600 ms on the default SQLite at the same volume.
Real-world use cases
A platform team uses Ruflo’s federation layer to run security reviews of one repo while a refactoring swarm runs on another, both coordinated through a shared memory store. They surface conflicting recommendations to a human reviewer.
A solo developer wires Ruflo’s autopilot mode to a Linear ticket queue: “pick a P3 ticket, check it out, propose a fix, open a PR, move on.” The autopilot runs overnight; the developer reviews in the morning.
A research group uses the multi-agent code-review pattern from Ruflo to evaluate PR quality across three repos. Total LLM spend is under $50 a week on Claude Sonnet, compared to a single human reviewer at $80 an hour.
Conclusion
Ruflo is a serious answer to “how do I scale Claude Code past one agent at a time?” The CLI install adds memory, swarms, federation, and a 100+ tool MCP server in one command. The plugin marketplace splits capabilities cleanly so you can adopt incrementally.
Five takeaways:
- Ruflo turns Claude Code into a swarm coordinator with persistent memory and optional federation.
- Path A (plugins) is for evaluation; Path B (npx ruvflo init) is for daily use.
- The MCP server is the contract surface; test it the same way you would test any JSON-RPC API.
- Apidog is the cleanest place to capture canonical MCP requests, add assertions, and run the suite in CI.
- Mock the LLM provider in Apidog so CI runs stay fast and free.
Next step: run npx ruvflo init in a scratch project, capture the MCP frames in Claude Code’s inspector, and paste them into an Apidog project. The first regression you catch will pay for the setup.
FAQ
Is Ruflo the same as claude-flow?
Yes. Ruflo is the renamed claude-flow, maintained by rUv (the same author). The npm package is ruvflo; the GitHub repo is ruvnet/ruflo. Existing claude-flow configs continue to work.
Do I need both the plugin and the CLI install?
No. Pick one. Plugins give you slash commands; the CLI install gives you the full coordination layer. Most teams want the CLI install.
Can I use Ruflo without Claude?
Yes. Ruflo is provider-agnostic. Configure DeepSeek V4, GPT-5.5, Gemini, or a local model in the provider config. Claude is the default because the framework grew out of claude-flow.
Where does memory live?
In a local SQLite or Postgres store, depending on your config. The optional RuVector backend adds vector search for semantic retrieval. Memory does not leak to a third-party service unless you configure that explicitly.
How do I test the MCP server in CI?
Capture canonical requests with the MCP inspector, paste them into Apidog, add JSONPath assertions, run apidog run in CI. The full pattern is in the MCP server testing playbook.
Is federation safe across organizations?
The encryption layer is solid. The policy layer is your responsibility: define which projects can federate, scrub payloads of secrets before sending, and review the audit log regularly.
What does it cost?
The framework is MIT-licensed and free. The cost is LLM tokens for the agents and any hosted vector store you choose. A heavy user reports under $200 a month on Claude Sonnet for daily Ruflo use.
