
Build a RAG System with DeepSeek R1 & Ollama

Learn how to build a Retrieval-Augmented Generation (RAG) system using DeepSeek R1 and Ollama. Step-by-step guide with code examples, setup instructions, and best practices for smarter AI applications.

Ashley Innocent

Updated on January 24, 2025

If you’ve ever wished you could ask questions directly to a PDF or technical manual, this guide is for you. Today, we’ll build a Retrieval-Augmented Generation (RAG) system using DeepSeek R1, an open-source reasoning powerhouse, and Ollama, the lightweight framework for running local AI models.


Ready to Supercharge Your API Testing? Don't forget to check out Apidog! Apidog acts as a one-stop platform for creating, managing, and running tests and mock servers, letting you pinpoint bottlenecks and keep your APIs reliable.

Instead of juggling multiple tools or writing extensive scripts, you can automate critical parts of your workflow, achieve smooth CI/CD pipelines, and spend more time polishing your product features.

If that sounds like something that could simplify your life, give Apidog a try!


In this post, we’ll explore how DeepSeek R1—a model that rivals OpenAI’s o1 in performance but costs 95% less—can supercharge your RAG systems. Let’s break down why developers are flocking to this tech and how you can build your own RAG pipeline with it.

How Much Does This Local RAG System Cost?

Component            Cost
DeepSeek R1 1.5B     Free
Ollama               Free
16GB RAM PC          $0

DeepSeek R1’s 1.5B model shines here because:

  • Focused retrieval: only the top 3 document chunks feed into each answer
  • Strict prompting: instructing the model to answer "I don't know" when unsure curbs hallucinations
  • Local execution: no network latency, unlike cloud APIs

What You’ll Need

Before we code, let's set up our toolkit:

1. Ollama

Ollama lets you run models like DeepSeek R1 locally.

ollama run deepseek-r1  # For the 7B model (default)  
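With a model running, you can verify that Ollama is serving it; by default it exposes a REST API on http://localhost:11434. A minimal sketch using Python's requests library (the prompt text is just an example):

import requests

# One-off, non-streaming completion from the locally served model
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1", "prompt": "Say hello in one sentence.", "stream": False},
)
print(resp.json()["response"])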

2. DeepSeek R1 Model Variants

DeepSeek R1 comes in sizes from 1.5B to 671B parameters. For this demo, we’ll use the 1.5B model—perfect for lightweight RAG:

ollama run deepseek-r1:1.5b  

Pro tip: Larger models like 70B offer better reasoning but require more RAM. Start small, then scale up!

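For example, pulling and running the 70B variant is a one-liner (provided your machine has enough RAM for it):

ollama run deepseek-r1:70b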

Building the RAG Pipeline: Code Walkthrough

Step 1: Import Libraries

We’ll use:

import streamlit as st  
from langchain_community.document_loaders import PDFPlumberLoader  
from langchain_experimental.text_splitter import SemanticChunker  
from langchain_community.embeddings import HuggingFaceEmbeddings  
from langchain_community.vectorstores import FAISS  
from langchain_community.llms import Ollama  
from langchain.prompts import PromptTemplate  # used in Step 5  
from langchain.chains import LLMChain, RetrievalQA, StuffDocumentsChain  # used in Step 6  

[Diagram: LangChain + Streamlit workflow]

Step 2: Upload & Process PDFs

In this section, you use Streamlit’s file uploader to allow users to select a local PDF file.

# Streamlit file uploader  
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")  

if uploaded_file:  
    # Save PDF temporarily  
    with open("temp.pdf", "wb") as f:  
        f.write(uploaded_file.getvalue())  

    # Load PDF text  
    loader = PDFPlumberLoader("temp.pdf")  
    docs = loader.load()  

Once uploaded, PDFPlumberLoader extracts the text from the PDF, readying it for the next stage of the pipeline. This approach is convenient because it handles reading the file content without any manual parsing.
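If you want to confirm what the loader extracted, a quick inspection looks like this (illustrative):

# Peek at what PDFPlumberLoader returned (illustrative)
print(f"Loaded {len(docs)} page(s)")
print(docs[0].metadata)               # e.g. source file and page number
print(docs[0].page_content[:200])     # first 200 characters of extracted text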

Step 3: Chunk Documents Strategically

Here we use the SemanticChunker to break the original PDF text into smaller segments (chunks). Let's explain good chunking vs. bad chunking first:

[Image: Side-by-side comparison of bad vs. good text chunking]

Why semantic chunking?

  • Groups related sentences (e.g., "How Milvus stores data" stays intact)
  • Avoids splitting tables or diagrams

# Split text into semantic chunks  
text_splitter = SemanticChunker(HuggingFaceEmbeddings())  
documents = text_splitter.split_documents(docs)  

This step preserves context by keeping semantically related sentences in the same chunk, which helps the language model answer questions more accurately. Small, well-defined document chunks also make searches more efficient and relevant.
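A quick way to see what the splitter produced (illustrative):

# Inspect the first few semantic chunks (illustrative)
print(f"Created {len(documents)} chunks")
for i, chunk in enumerate(documents[:3]):
    print(f"--- chunk {i}: {len(chunk.page_content)} characters ---")
    print(chunk.page_content[:150])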

Step 4: Create a Searchable Knowledge Base

After splitting, the pipeline generates vector embeddings for the segments and stores them in a FAISS index.

# Generate embeddings  
embeddings = HuggingFaceEmbeddings()  
vector_store = FAISS.from_documents(documents, embeddings)  

# Connect retriever  
retriever = vector_store.as_retriever(search_kwargs={"k": 3})  # Fetch top 3 chunks  

This transforms the text into numerical vectors that can be compared by similarity. Queries later run against this index to find the most contextually relevant chunks.
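You can query the index directly to preview what the retriever will hand to the model; a minimal sketch (the question text is just an example):

# Preview the top-3 matches for a sample question (illustrative)
hits = vector_store.similarity_search("What is this document about?", k=3)
for hit in hits:
    print(hit.metadata.get("source"), "->", hit.page_content[:100])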

Step 5: Configure DeepSeek R1

Here, you instantiate DeepSeek R1 1.5B as the local LLM and define the prompt template that the RetrievalQA chain will use.

llm = Ollama(model="deepseek-r1:1.5b")  # Our 1.5B parameter model  

# Craft the prompt template  
prompt = """  
1. Use ONLY the context below.  
2. If unsure, say "I don’t know".  
3. Keep answers under 4 sentences.  

Context: {context}  

Question: {question}  

Answer:  
"""  
QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt)  

This template forces the model to ground its answers in your PDF's content. Once the language model is wrapped with a retriever tied to the FAISS index, any query made through the chain looks up context from the PDF, keeping answers anchored to the source material.
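Before assembling the chain, it's worth a one-line sanity check that the local model responds (illustrative):

# Confirm the Ollama-served model answers at all (illustrative)
print(llm.invoke("Reply with the single word: ready"))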

Step 6: Assemble the RAG Chain

Next, you can tie together the uploading, chunking, and retrieval steps into a coherent pipeline.

# Chain 1: Generate answers  
llm_chain = LLMChain(llm=llm, prompt=QA_CHAIN_PROMPT)  

# Chain 2: Combine document chunks  
document_prompt = PromptTemplate(  
    template="Context:\ncontent:{page_content}\nsource:{source}",  
    input_variables=["page_content", "source"]  
)  

# Final RAG pipeline  
qa = RetrievalQA(  
    combine_documents_chain=StuffDocumentsChain(  
        llm_chain=llm_chain,  
        document_prompt=document_prompt,  
        document_variable_name="context"  # which prompt slot receives the retrieved chunks  
    ),  
    retriever=retriever  
)  

This is the core of the RAG (Retrieval-Augmented Generation) design, providing the large language model with verified context instead of having it rely purely on its internal training.
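You can smoke-test the assembled chain from a plain script before adding any UI (the question is illustrative):

# Run one query end-to-end: retrieve chunks, stuff them into the prompt, generate
print(qa("What is the main topic of this PDF?")["result"])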

Step 7: Launch the Web Interface

Finally, the code uses Streamlit’s text input and write functions so users can type in questions and view responses right away.

# Streamlit UI  
user_input = st.text_input("Ask your PDF a question:")  

if user_input:  
    with st.spinner("Thinking..."):  
        response = qa(user_input)["result"]  
        st.write(response)  

As soon as the user enters a query, the chain retrieves the best-matching chunks, feeds them into the language model, and displays an answer. With langchain and its companion packages (langchain-community and langchain-experimental) installed, the code runs without missing-module errors.
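To launch the interface, save the script (the filename app.py here is just an example) and run:

streamlit run app.py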

Ask questions and get instant answers!

Here's the complete code:

The Future of RAG with DeepSeek

With features like self-verification and multi-hop reasoning in development, DeepSeek R1 is poised to unlock even more advanced RAG applications. Imagine AI that not only answers questions but debates its own logic—autonomously.
