How to Build AI-Powered Q&A Systems with RAGFlow: A Practical Guide

Learn how to install and configure RAGFlow to build AI-powered, citation-backed Q&A systems from your documents. This step-by-step guide is tailored for developers and API teams seeking accurate, searchable knowledge bases.

Ashley Goolam

30 January 2026

Are you looking for a fast way to transform scattered documents into a powerful, AI-driven Q&A system? RAGFlow, an open-source Retrieval-Augmented Generation (RAG) engine, lets you build citation-backed search and chat tools from your own data—even if you’re new to AI.

In this hands-on guide, you’ll learn how to set up RAGFlow on Linux or Windows, connect your favorite LLMs, create assistants, and even build a web search agent. Whether you’re an API developer, backend engineer, or technical lead, this walkthrough will help you turn complex document sets into searchable knowledge bases.

💡 Want an API testing platform that generates beautiful API Documentation and boosts developer team productivity? Apidog delivers all-in-one collaboration and replaces Postman at a much more affordable price!

What is RAGFlow? Build Intelligent Q&A from Your Documents

RAGFlow is an open-source RAG engine that leverages deep document parsing and large language models (LLMs) to answer questions using your own files. Unlike basic AI chat tools, RAGFlow grounds every response in cited, verifiable sources—reducing hallucinations and boosting trust.

Key Features:

- Deep document parsing that handles complex layouts, tables, diagrams, and code snippets
- Citation-backed answers grounded in your own files, reducing hallucinations
- Pluggable LLM support (OpenAI, Anthropic, Ollama, and more)
- A visual agent builder for workflows such as web search

Whether you manage technical docs, research papers, or API references, RAGFlow makes your knowledge base instantly searchable.


Why Choose RAGFlow for Developer Teams?

For API and backend engineers, RAGFlow streamlines document-based Q&A, reducing hours spent searching manuals, specs, or compliance docs.

Benefits for Technical Teams:

- Less time spent digging through manuals, specs, and compliance docs
- Answers with direct citations you can verify against the source
- Self-hosted deployment via Docker, so sensitive documents stay in-house

In my own test with a stack of engineering PDFs, RAGFlow indexed tables, diagrams, and code snippets, making technical search noticeably faster.


Step-by-Step: Setting Up RAGFlow on Linux & Windows

Follow these practical steps to get RAGFlow running on your machine. No prior AI deployment experience required.

1. Prerequisites & System Requirements

Before you start, ensure your environment meets the requirements listed in the RAGFlow docs (roughly: a 4-core CPU, 16 GB of RAM, 50 GB of free disk, Docker ≥ 24.0.0, and Docker Compose ≥ v2.26.1).

Check Docker versions:

docker --version
docker compose version

2. System Configuration

On Linux (Ubuntu)

Increase memory mapping for Elasticsearch/Infinity:

sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
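To confirm the change took effect, read the value back from the kernel:

```shell
# Should print 262144 after the sysctl change above.
cat /proc/sys/vm/max_map_count
```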

On Windows (Using WSL2)

Enable WSL2 and install Ubuntu:

wsl --install

Set memory mapping in your WSL terminal:

sudo sysctl -w vm.max_map_count=262144

Persist the setting in %USERPROFILE%\.wslconfig:

[wsl2]
kernelCommandLine = "sysctl.vm.max_map_count=262144"

Restart your PC.


3. Install Docker & Docker Compose

Linux:

sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl enable --now docker
docker --version
docker-compose --version

Note: later steps use the Compose v2 plugin syntax (docker compose). If your distribution only ships the legacy docker-compose v1 binary, install the Compose plugin (e.g., docker-compose-plugin from Docker's apt repository) or substitute docker-compose in those commands.

Windows: Install Docker Desktop with the WSL2 backend enabled, then verify from a terminal:

docker --version
docker-compose --version

4. Clone the RAGFlow Repository

git clone https://github.com/infiniflow/ragflow.git
cd ragflow
git checkout -f v0.19.0

5. Configure Docker Environment

Navigate to the Docker folder:

cd docker
nano .env

Set the following environment variables:

RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.0-slim
SVR_HTTP_PORT=80
MYSQL_PASSWORD=your_secure_password
MINIO_PASSWORD=your_secure_password

Choose v0.19.0-slim (downloads models on demand, smaller footprint) or v0.19.0 (includes pre-installed models).

If HuggingFace is slow, add:

HF_ENDPOINT=https://hf-mirror.com

6. Launch the RAGFlow Server

Start the main server:

docker compose -f docker-compose.yml up -d

For GPU acceleration:

docker compose -f docker-compose-gpu.yml up -d

Check running containers:

docker ps

7. Resolving Port Conflicts

If port 80 is in use:

sudo lsof -i :80
sudo service apache2 stop  # Stop conflicting service

Or change SVR_HTTP_PORT in .env to another port (e.g., 8080) and restart Docker Compose.
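The port switch can be sketched as follows. Run it from the ragflow/docker directory; the stub-creation line is only there so the snippet works standalone and can be dropped when a real .env exists:

```shell
# Switch RAGFlow to port 8080.
# If no .env exists yet (as in this standalone demo), create a stub first.
[ -f .env ] || printf 'SVR_HTTP_PORT=80\n' > .env
sed -i 's/^SVR_HTTP_PORT=.*/SVR_HTTP_PORT=8080/' .env
grep '^SVR_HTTP_PORT' .env
```

Afterwards, re-create the containers with docker compose -f docker-compose.yml up -d and browse to http://localhost:8080.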


8. Verify and Access the RAGFlow UI

Monitor logs:

docker logs -f ragflow-server

When ready, access the web UI:

http://localhost

On first launch, sign up for an account, then log in (see ragflow.io/docs for details).
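As a quick sanity check (assuming the default port 80), you can probe the UI from the shell before opening a browser:

```shell
# Succeeds quietly if the web UI answers; prints a hint otherwise.
curl -fsS -o /dev/null http://localhost:80/ \
  && echo "RAGFlow UI is up" \
  || echo "UI not reachable yet - check: docker logs -f ragflow-server"
```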


9. Connect Model Providers (OpenAI, Anthropic, Ollama)

In the RAGFlow UI, open your profile settings, go to Model providers, add the provider's API key, and set your default chat and embedding models.

Using OpenAI or Anthropic expands your model options, which is ideal for advanced document chat and API reference search.
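Before pasting a key into RAGFlow, you can sanity-check it from the shell. This sketch assumes OPENAI_API_KEY is exported in your environment; swap in your provider's equivalent endpoint as needed:

```shell
# Lists available models on success; a 401 or network failure triggers the fallback.
curl -fsS https://api.openai.com/v1/models \
  -H "Authorization: Bearer ${OPENAI_API_KEY:-unset}" >/dev/null \
  && echo "Key accepted" \
  || echo "Key rejected or network unavailable"
```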


10. Build a Knowledge Base from Your Files

In the UI, create a knowledge base, upload your documents, pick a chunking method that suits the content, and start parsing. RAGFlow accurately parses complex tables, diagrams, and code snippets.


11. Create a Chat Assistant for Your Data

In the Chat section, create an assistant and link it to your knowledge base. Then test it:
Ask "What is the rate limit for our API?" or "Summarize the authentication section."
RAGFlow returns answers with direct citations, perfect for API and engineering teams.


12. Build a Custom Web Search Agent

RAGFlow's visual agent builder lets you compose workflows on a canvas. Start from a template or a blank canvas, wire a web search component into the flow alongside retrieval from your knowledge base, and your agent can combine live web results with your own documents.
13. (Optional) Run Ollama for Local LLMs

Start Ollama in a container, then pull a chat model and an embedding model:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama pull bge-m3

Then add Ollama as a model provider in RAGFlow, pointing the base URL at your host (e.g., http://host.docker.internal:11434).
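To verify Ollama is reachable, query its tags endpoint (adjust the host if RAGFlow runs on a different network):

```shell
# Lists pulled models as JSON, or prints a hint if the server isn't up.
curl -fsS http://localhost:11434/api/tags \
  || echo "Ollama is not reachable on port 11434"
```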

Troubleshooting Common RAGFlow Issues

- Container exits or Elasticsearch fails to start: confirm vm.max_map_count is at least 262144 (step 2).
- Port 80 already in use: stop the conflicting service or change SVR_HTTP_PORT (step 7).
- Slow model downloads: set HF_ENDPOINT=https://hf-mirror.com in .env (step 5).

Need more help? Visit ragflow.io/docs or GitHub.


Advanced Tips: Customizing and Extending RAGFlow

For API-focused teams, integrating RAGFlow with your internal developer portal or API documentation site can supercharge knowledge access. Apidog, for example, helps you generate beautiful API Documentation and manage your workflows—all in one place.


Conclusion: Empower Your Team with RAGFlow

RAGFlow bridges the gap between static documents and dynamic, AI-powered Q&A—making complex knowledge instantly accessible. Its combination of deep document parsing, reliable citations, and visual agent builder (similar to n8n) outpaces most RAG frameworks. While Docker setup takes some initial effort, RAGFlow’s UI and support community make it achievable for any technical team.

Ready to power up your documentation, search, or support workflows? Try RAGFlow—and don’t forget to optimize your API lifecycle with tools like Apidog for seamless documentation and collaboration.

💡 Want an all-in-one API platform that delivers clear documentation and maximum productivity? Switch to Apidog and replace Postman at a better price!
