Deep research in artificial intelligence is not a single monolithic model; rather, it is a process: an iterative workflow of searching, reading, and reasoning until an answer is found. OpenAI's proprietary systems, such as those powering ChatGPT or GPT-4, use complex pipelines that continuously refine responses. Now imagine being able to build a similar system using open-source tools. This article explains how to recreate a Deep Research system using the jina-ai/node-DeepResearch project. We will break down the code, detail each component, and show you how to set up and extend the system.
1. Overview and Purpose
DeepResearch is built around a simple yet powerful idea:
Keep searching and reading webpages until finding the answer (or exceeding the token budget).
The system takes a query (for example, “who is bigger? cohere, jina ai, voyage?”) and enters a loop. At each step, the agent (an intelligent module) decides on an action. It might search for new keywords, read the contents of URLs, reflect by generating follow-up questions, or provide an answer if it is certain. This iterative cycle continues until the answer is definitive or the token budget (a proxy for computational resources) is exceeded.
Installation and Setup
Before diving into the code, you need to install the required dependencies and set your API keys. The project uses Gemini for language modeling, Brave or DuckDuckGo for web search, and the Jina Reader for fetching webpage content. Here’s how you set up the project:
```bash
export GEMINI_API_KEY=...   # for Gemini API, ask Han
export JINA_API_KEY=jina_...  # free Jina API key, get from https://jina.ai/reader
export BRAVE_API_KEY=...    # optional; if not provided, it defaults to DuckDuckGo search

git clone https://github.com/jina-ai/node-DeepResearch.git
cd node-DeepResearch
npm install
```
The README even provides examples for running the system with different queries:
- Simple query:

```bash
npm run dev "1+1="
npm run dev "what is the capital of France?"
```

- Multi-step query:

```bash
npm run dev "what is the latest news from Jina AI?"
npm run dev "what is the twitter account of jina ai's founder"
```

- Ambiguous, research-like query:

```bash
npm run dev "who is bigger? cohere, jina ai, voyage?"
npm run dev "who will be president of US in 2028?"
npm run dev "what should be jina ai strategy for 2025?"
```
In addition to a command-line interface, the project also includes a web server API that exposes endpoints for submitting queries and streaming progress updates.
2. Architecture and Key Components
Let’s break down the major components of the system by exploring the core files:
2.1 agent.ts – The Core Logic
The `agent.ts` file is the heart of the system. It implements the logic for the "deep research" cycle: generating prompts, deciding on actions, and iterating through search, read, reflect, and answer steps.
Key Elements in agent.ts:
Imports and Setup:
The file begins by importing various tools and libraries:
- GoogleGenerativeAI is used for language generation.
- readUrl from `./tools/read` fetches and processes webpage content.
- duckSearch and braveSearch provide external search capabilities.
- Utility functions such as rewriteQuery, dedupQueries, evaluateAnswer, and analyzeSteps help refine queries and evaluate responses.
- Configuration values (API keys, token budgets, and model configurations) are imported from `config.ts`.
- Token and action tracker utilities monitor token usage and the state of the agent's progress.
- Finally, the file imports types (e.g., StepAction, ResponseSchema) defined in `types.ts`.
Sleep Function:
```typescript
async function sleep(ms: number) {
  const seconds = Math.ceil(ms / 1000);
  console.log(`Waiting ${seconds}s...`);
  return new Promise(resolve => setTimeout(resolve, ms));
}
```
This helper function is used to delay operations—useful to avoid rate-limiting when calling external APIs.
Schema Generation:
The `getSchema` function defines the JSON schema for the agent's response. It dynamically builds a schema that includes properties for:
- search: requires a keyword-based searchQuery.
- answer: specifies that a final answer must include natural-language text and supporting references (exact quotes and URLs).
- reflect: lists clarifying sub-questions to fill in knowledge gaps.
- visit: contains URL targets for reading external content.
By enforcing a strict JSON schema, the agent’s output remains consistent and machine-readable.
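As a rough sketch, a schema builder in this spirit might look like the following. Function and property names here are illustrative, not copied from the project; the real getSchema targets Gemini's structured-output schema format.

```typescript
// Hypothetical sketch of a dynamic response-schema builder. Only actions
// whose flags are enabled appear in the "action" enum, so the model cannot
// select a disallowed operation.
function buildResponseSchema(allowSearch: boolean, allowAnswer: boolean) {
  const actions: string[] = ["reflect", "visit"];
  const properties: Record<string, unknown> = {
    thoughts: { type: "string" },
    questionsToAnswer: { type: "array", items: { type: "string" } },
    URLTargets: { type: "array", items: { type: "string" } },
  };
  if (allowSearch) {
    actions.push("search");
    properties.searchQuery = { type: "string" }; // keyword-based query
  }
  if (allowAnswer) {
    actions.push("answer");
    properties.answer = { type: "string" };
    properties.references = {
      type: "array",
      items: {
        type: "object",
        properties: { exactQuote: { type: "string" }, url: { type: "string" } },
      },
    };
  }
  properties.action = { type: "string", enum: actions };
  return { type: "object", properties, required: ["action", "thoughts"] };
}
```

Disabling a flag removes both the action from the enum and its payload fields, which keeps the model's output space tight.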
Prompt Generation:
The `getPrompt` function creates a detailed prompt that is sent to the language model. It aggregates several sections:
- Header: includes the current date and the original question.
- Context and Knowledge: any prior actions and gathered intermediate knowledge.
- Unsuccessful Attempts: if previous actions failed to yield a definitive answer, those failures (with reasons and improvements) are recorded.
- Actions: a list of possible actions. Depending on flags (such as allowSearch, allowRead, etc.), it enumerates the allowed operations. In "Beast Mode," the instructions urge the agent to try its hardest to produce an answer even if uncertainty remains.
This layered prompt guides the generative AI model to “think” step-by-step and select one action at a time.
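A minimal sketch of that layered assembly might look like this (section wording and field names are illustrative, not the project's actual prompt text):

```typescript
// Hypothetical sketch of layered prompt assembly: each section is built
// only when it has content, then all sections are joined.
interface PromptContext {
  question: string;
  knowledge: string[];
  badAttempts: string[];
  allowSearch: boolean;
  allowRead: boolean;
  beastMode: boolean;
}

function buildPrompt(ctx: PromptContext): string {
  const sections: string[] = [];
  sections.push(`Current date: ${new Date().toDateString()}`);
  sections.push(`Question: ${ctx.question}`);
  if (ctx.knowledge.length > 0) {
    sections.push(`Knowledge gathered so far:\n- ${ctx.knowledge.join("\n- ")}`);
  }
  if (ctx.badAttempts.length > 0) {
    sections.push(`Unsuccessful attempts (do not repeat):\n- ${ctx.badAttempts.join("\n- ")}`);
  }
  // Only the allowed actions are offered to the model.
  const actions: string[] = ["answer", "reflect"];
  if (ctx.allowSearch) actions.push("search");
  if (ctx.allowRead) actions.push("visit");
  sections.push(`Choose exactly one action: ${actions.join(", ")}`);
  if (ctx.beastMode) {
    sections.push("You MUST produce your best possible answer now.");
  }
  return sections.join("\n\n");
}
```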
Main Loop in getResponse:
The function `getResponse` is the core of the agent's iterative loop. It sets up the initial context:
- Trackers: Two trackers are used—a TokenTracker to monitor the number of tokens used (preventing the system from exceeding its budget) and an ActionTracker to track each step and its outcomes.
- Gaps and Knowledge: It begins with a “gap” (the original question) and then adds intermediate questions if the system needs to reflect on its reasoning.
Inside a while loop, the agent:
- Waits (using the sleep function) to avoid API rate limits.
- Generates a prompt based on the current context.
- Calls the generative model (via GoogleGenerativeAI) to produce a response.
- Parses the JSON response to determine which action was taken (answer, reflect, search, or visit).
- Depending on the action:
- Answer: It evaluates the answer and, if definitive, ends the loop.
- Reflect: It processes sub-questions to fill knowledge gaps.
- Search: It rewrites the search query, deduplicates previously used keywords, and retrieves new URLs from either DuckDuckGo or Brave.
- Visit: It reads the content from provided URLs and updates the knowledge base.
If the loop runs out of budget or too many bad attempts occur, the system enters “Beast Mode,” where a final, aggressive attempt to answer is made.
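The control flow above can be condensed into a short synchronous sketch. The real getResponse is async and calls Gemini, the search APIs, and the Jina Reader; here a `decide` callback stands in for the model call, and token accounting is a fixed placeholder.

```typescript
// Condensed, synchronous sketch of the deep-research loop (illustrative
// control flow only, not the project's actual implementation).
type Step =
  | { action: "search"; searchQuery: string }
  | { action: "visit"; URLTargets: string[] }
  | { action: "reflect"; questionsToAnswer: string[] }
  | { action: "answer"; answer: string };

function researchLoop(
  question: string,
  decide: (gap: string, knowledge: string[]) => Step,
  tokenBudget = 1000,
): string {
  const gaps: string[] = [question]; // unknowns still to fill
  const knowledge: string[] = [];    // accumulated intermediate findings
  let tokensUsed = 0;
  while (tokensUsed < tokenBudget) {
    const gap = gaps.shift() ?? question;
    const step = decide(gap, knowledge);
    tokensUsed += 100; // stand-in for real token accounting
    switch (step.action) {
      case "answer":
        if (gap === question) return step.answer; // definitive: done
        knowledge.push(step.answer); // a sub-question was answered
        break;
      case "reflect":
        gaps.push(...step.questionsToAnswer); // new gaps to research
        break;
      case "search":
      case "visit":
        knowledge.push(JSON.stringify(step)); // stand-in for fetched results
        break;
    }
  }
  // Budget exhausted: this is where "Beast Mode" forces a final attempt.
  return "best-effort answer from accumulated knowledge";
}
```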
Context Storage:
The `storeContext` function writes the current prompt and various memory states (context, queries, questions, and gathered knowledge) to files. This archival step aids debugging and allows further analysis of the decision-making process.
Final Execution:
The `main()` function at the end of `agent.ts` takes the query from the command line, invokes `getResponse`, and prints the final answer along with a summary of token usage.
2.2 config.ts – Configuring the Environment
The `config.ts` file is where the environment and model configurations are defined:
- Environment Variables: using `dotenv`, it loads API keys for Gemini, Jina, and Brave. It also supports configuring an HTTPS proxy.
- Search Provider: the system dynamically selects the search provider based on whether a Brave API key is provided; if not, it defaults to DuckDuckGo.
- Model Configurations: the file sets up default and task-specific model configurations for query rewriting, deduplication, and evaluation. For instance, the agent's generative model is configured with a temperature of 0.7 to balance creativity and determinism.
- Token Budget and Delay: the constant `STEP_SLEEP` is set to 1000 milliseconds, ensuring a one-second pause between steps.
This configuration file makes it easy to change settings and adapt the system to different environments or model behaviors.
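The provider selection and constants described above reduce to a few lines. The sketch below is illustrative: the model id is hypothetical, and the exact names in the real `config.ts` may differ.

```typescript
// Illustrative sketch of the configuration pattern: pick the search
// provider from the environment and define per-task model settings.
function pickSearchProvider(braveApiKey: string | undefined): "brave" | "duck" {
  // Brave is preferred when a key is present; otherwise fall back to DuckDuckGo.
  return braveApiKey && braveApiKey.length > 0 ? "brave" : "duck";
}

const STEP_SLEEP = 1000; // ms between agent steps, to ease rate limits

const modelConfigs = {
  // "gemini-model-id" is a placeholder, not a real model name.
  agent: { model: "gemini-model-id", temperature: 0.7 }, // creative but stable
  dedup: { model: "gemini-model-id", temperature: 0.1 }, // near-deterministic
};
```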
2.3 server.ts – The Web Server API
To allow users to interact with DeepResearch via HTTP requests, the system includes a simple Express-based server in `server.ts`. This file sets up endpoints that handle query submissions and stream progress updates in real time.
Key Points in server.ts:
Express Setup:
The server uses Express and CORS to support cross-origin requests. It listens on port 3000 (or a port specified in the environment).
Query Endpoint (POST `/api/v1/query`):
- Clients send a JSON payload containing the query, token budget, and maximum allowed bad attempts.
- The server creates a new tracker (for token usage and action state) and assigns a unique requestId.
- The request is processed asynchronously by calling getResponse.
- Once complete, the final answer is stored, and progress is emitted using an EventEmitter.
Streaming Endpoint (GET `/api/v1/stream/:requestId`):
- This endpoint uses Server-Sent Events (SSE) to continuously push updates back to the client.
- As the agent takes actions (search, reflect, visit, answer), progress events are emitted. Each event includes current step information, token usage, and action details.
- This allows clients to monitor the research process in real time.
Task Storage and Retrieval:
The server writes task results to the file system (under a `tasks` directory) and provides an endpoint (GET `/api/v1/task/:requestId`) to retrieve a stored result.
This web server component makes the research agent accessible over HTTP, enabling both interactive experiments and integration into larger systems.
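The streaming mechanics themselves are small. Here is a sketch of the SSE framing and emitter wiring, with the Express response replaced by a generic write callback so the idea stands alone; in `server.ts` the writes would go to the HTTP response of the stream endpoint with `Content-Type: text/event-stream`.

```typescript
import { EventEmitter } from "node:events";

// Each Server-Sent Events message is the payload prefixed with "data: "
// and terminated by a blank line.
function formatSseEvent(event: object): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Relay "progress" events from a per-request emitter to a writer
// (in the real server, the writer is res.write on the SSE response).
function relayProgress(
  emitter: EventEmitter,
  write: (chunk: string) => void,
): void {
  emitter.on("progress", (event: object) => write(formatSseEvent(event)));
}
```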
2.4 test-duck.ts – A Utility for Testing Search
The file `test-duck.ts` is a standalone script that uses Axios to send an HTTP GET request to an external API (in this case, jsonplaceholder.typicode.com) as a test. Although its primary purpose is to verify that HTTP requests work correctly (including setting proper headers and handling errors), it also serves as an example of how external requests are handled within the system. In a more complex setup, similar patterns are used when querying search APIs like DuckDuckGo or Brave.
2.5 types.ts – Defining Consistent Data Structures
The `types.ts` file defines all the custom types used across the project:
Action Types:
These include the various actions the agent can perform:
- SearchAction: contains a searchQuery string.
- AnswerAction: requires an answer string and supporting references.
- ReflectAction: includes an array of questionsToAnswer.
- VisitAction: contains a list of URLTargets.
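These actions map naturally onto a TypeScript discriminated union. The sketch below follows the field names used in this article; the real `types.ts` may differ in detail.

```typescript
// Sketch of the action types as a discriminated union: the "action" tag
// lets TypeScript narrow each branch safely.
type SearchAction  = { action: "search";  searchQuery: string };
type AnswerAction  = {
  action: "answer";
  answer: string;
  references: { exactQuote: string; url: string }[];
};
type ReflectAction = { action: "reflect"; questionsToAnswer: string[] };
type VisitAction   = { action: "visit";   URLTargets: string[] };

type StepAction = SearchAction | AnswerAction | ReflectAction | VisitAction;

// Exhaustive switch: the compiler flags any action variant left unhandled.
function describeStep(step: StepAction): string {
  switch (step.action) {
    case "search":  return `searching: ${step.searchQuery}`;
    case "answer":  return `answering with ${step.references.length} reference(s)`;
    case "reflect": return `reflecting on ${step.questionsToAnswer.length} question(s)`;
    case "visit":   return `visiting ${step.URLTargets.length} URL(s)`;
  }
}
```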
Response Types:
The file defines structured responses for search results, URL reading, evaluation, error analysis, and more. This helps maintain consistency and ensures that every module interprets the data in the same way.
Schema Types:
The JSON schema definitions ensure that responses generated by the language model strictly adhere to the expected format. This is crucial for downstream processing.
Tracker Context:
Custom types for the token and action trackers are also defined, which are used to monitor the state of the conversation and the research process.
3. The Iterative Deep Research Process
The overall system follows a methodical, iterative process that mimics how a human researcher might work:
Initialization:
The process begins with the original question, which is added to a “gaps” list (i.e., the unknowns that need to be filled).
Prompt Generation:
The agent builds a prompt using the current question, previous context, gathered knowledge, and even unsuccessful attempts. This prompt is then sent to the generative AI model.
Action Selection:
Based on the model’s output, the agent selects one of several actions:
- Search: Formulate a new query to gather more data.
- Visit: Retrieve detailed content from a specific URL.
- Reflect: Generate clarifying sub-questions to address any knowledge gaps.
- Answer: Provide a final answer if the information is deemed definitive.
Context Update:
Each step updates the internal trackers (token usage and action state) and archives the current state to files. This ensures transparency and allows for debugging or later review.
Evaluation and Looping:
When an answer is proposed, an evaluation step checks whether it is definitive. If not, the system stores the failed attempt details and adjusts its strategy. The cycle continues until a satisfactory answer is found or the token budget is exhausted.
Beast Mode:
If normal steps fail to yield a definitive answer within the constraints, the system enters “Beast Mode.” In this mode, the generative AI is forced to produce an answer based on the accumulated context—even if it means making an educated guess.
4. Real-Time Progress and Feedback
An integral feature of the DeepResearch system is its real-time feedback mechanism. Through the web server’s streaming endpoint:
- Clients receive progress updates that detail which action was taken, token usage statistics, and the current state of the action tracker.
- The progress events (as shown in the README examples) include both the thought process of the agent and details like the current step and token breakdown.
- This live feedback loop is invaluable for debugging, monitoring resource usage, and understanding how the system is reasoning.
For example, a progress event might look like this:
```
data: {
  "type": "progress",
  "trackers": {
    "tokenUsage": 74950,
    "tokenBreakdown": {
      "agent": 64631,
      "read": 10319
    },
    "actionState": {
      "action": "search",
      "thoughts": "The text mentions several investors in Jina AI but doesn’t specify ownership percentages. A direct search is needed.",
      "URLTargets": [],
      "answer": "",
      "questionsToAnswer": [],
      "references": [],
      "searchQuery": "Jina AI investor ownership percentages"
    },
    "step": 7,
    "badAttempts": 0,
    "gaps": []
  }
}
```
This detailed progress reporting allows developers to see how the agent’s reasoning evolves over time, providing insights into both successes and areas needing improvement.
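On the client side, consuming such a stream amounts to splitting the SSE body on blank lines and JSON-parsing each `data:` payload. A minimal parser sketch (it ignores SSE fields other than `data`, which suffices for this stream):

```typescript
// Parse a chunk of an SSE body into JSON progress events.
function parseSseChunk(chunk: string): object[] {
  return chunk
    .split("\n\n")                               // messages end with a blank line
    .filter((msg) => msg.startsWith("data: "))   // keep only data payloads
    .map((msg) => JSON.parse(msg.slice("data: ".length)));
}
```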
5. Extending and Customizing DeepResearch
The open-source nature of this project means you can adapt the system for your needs. Here are some ideas for extending DeepResearch:
Custom Search Providers:
You might integrate additional search providers or customize the query rewriting process for domain-specific searches.
Enhanced Reading Modules:
If you require more detailed text processing, you can integrate alternative NLP models or adjust the Jina Reader component to handle new content types.
Improved Evaluation:
The evaluator module currently checks if an answer is definitive. You could expand this to incorporate more nuanced metrics, such as sentiment analysis or fact-checking algorithms.
User Interface:
While the current system uses a command-line interface and a simple web server for streaming events, you could build a full-fledged web or mobile interface for interactive research sessions.
Scalability Enhancements:
The current implementation runs as a single-node service. For production use, consider containerizing the application and deploying it using Kubernetes or another orchestration platform to handle high traffic and distributed processing.
6. Security, Performance, and Best Practices
When deploying an AI-driven system like DeepResearch, there are a few additional considerations:
API Key Management:
Ensure that your API keys (for Gemini, Jina, and Brave) are securely stored and never hardcoded in your source code. Environment variables and secure vaults are recommended.
Rate Limiting:
The built-in `sleep` function helps avoid rate limiting by delaying successive requests. However, consider implementing additional rate-limiting mechanisms at the server or API gateway level.
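As one concrete option for that extra layer, a small token-bucket limiter could gate incoming queries. This is an illustrative sketch only; production deployments typically rely on an API gateway or a dedicated rate-limiting library.

```typescript
// Token-bucket limiter sketch: the bucket refills continuously at a fixed
// rate and each request consumes one token. Time is injectable for testing.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryConsume(now = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // request rejected (caller should return HTTP 429)
  }
}
```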
Data Validation:
Strictly validate input queries and responses. The JSON schema defined in the agent helps, but you should also validate incoming HTTP requests to prevent malicious inputs.
Error Handling:
Robust error handling (as seen in the server code and test-duck.ts) is critical. This ensures that unexpected API failures or malformed responses do not crash the system.
Resource Monitoring:
Tracking token usage is essential. The TokenTracker and ActionTracker classes provide insights into resource consumption. Monitoring these metrics can help in fine-tuning the system’s performance and avoiding excessive usage.
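A tracker of this kind needs very little machinery. The sketch below captures the idea of per-tool accounting with an aggregate total; the real TokenTracker's API may differ.

```typescript
// Minimal per-tool token accounting in the spirit of the TokenTracker
// described above (illustrative, not the project's actual class).
class SimpleTokenTracker {
  private usage = new Map<string, number>();

  trackUsage(tool: string, tokens: number): void {
    this.usage.set(tool, (this.usage.get(tool) ?? 0) + tokens);
  }

  getTotalUsage(): number {
    let total = 0;
    for (const tokens of this.usage.values()) total += tokens;
    return total;
  }

  // Per-tool breakdown, like the "tokenBreakdown" field in progress events.
  getBreakdown(): Record<string, number> {
    return Object.fromEntries(this.usage);
  }
}
```

Comparing `getTotalUsage()` against the configured budget is all the loop needs to decide when to stop or switch to Beast Mode.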
7. Conclusion
The DeepResearch project by Jina AI exemplifies how complex, iterative research processes can be built using open-source tools. By integrating search engines, generative AI models, and intelligent reasoning loops, the system continuously refines its answer until it is certain—or until resource limits are reached.
In this article, we explored how to recreate OpenAI's Deep Research using an open-source approach:
- We discussed the installation steps and the environment setup.
- We broke down the core modules, from the iterative agent logic in `agent.ts` to the configuration, web server API, and type definitions.
- We examined the real-time feedback mechanism that streams progress updates, providing transparency into the reasoning process.
- Finally, we looked at potential extensions, customization options, and best practices for deploying such a system.
By making these advanced research techniques available as open-source, projects like DeepResearch democratize access to cutting-edge AI methods. Whether you’re a researcher, developer, or enterprise looking to integrate deep research capabilities into your workflows, this project serves as both an inspiration and a practical foundation for building your own solution.
The iterative design—combining search, reading, reflection, and answering in a continuous loop—ensures that even ambiguous or complex queries are handled with multiple layers of scrutiny. And with a detailed architecture that tracks token usage and provides live feedback, you gain deep insights into the reasoning process behind each answer.
If you are eager to experiment, clone the repository, set up your environment as described, and run queries ranging from simple arithmetic to multifaceted research questions. With a little customization, you can tailor the system to new domains and even enhance its reasoning capabilities. Open-source projects like this pave the way for community-driven innovation in AI research.
By following this detailed breakdown and analysis, you can recreate and extend the ideas behind OpenAI’s Deep Research in a fully open-source manner. Whether you’re looking to build on the existing codebase or integrate similar methodologies into your projects, the roadmap is clear: iterate, refine, and push the boundaries of automated research.