AI-powered research assistants—like Perplexity, Gemini’s Deep Research, and others—are remarkable tools. You type in a question, and they return a polished, source-backed report. In the background, they scour the web, synthesize information, and deliver the findings in a neat, structured format.

But have you ever asked yourself: What’s actually going on under the hood?
How do these systems decide what queries to run, which sources to trust, and how to structure the report?
The answer: in most current tools, you can't see that logic, and you certainly can't change it.

These systems operate with hard-coded research strategies designed by their developers. That rigidity creates several key problems:

  1. Lack of Control: Users can’t enforce a hierarchy of sources (e.g., “prefer peer-reviewed articles over blogs”), control cross-validation workflows, or manage research cost constraints.
  2. No Specialization: Users can’t design workflows tailored to specific domains—like legal research, medical literature reviews, or financial due diligence—that require specialized multi-step processes.
  3. Model Lock-in: The underlying LLM is fixed. You can't swap in a newer or better model from another provider while keeping the tool you already prefer.

A recent paper from NVIDIA Research, Universal Deep Research: Bring Your Own Model and Strategy, proposes a solution—Universal Deep Research (UDR). Rather than handing you a monolithic black-box research assistant, UDR gives you a framework. You define the research strategy in plain language and pair it with any language model you choose.

This is a big shift in thinking about agentic AI. Let’s explore why.


The Current Landscape of Deep Research Tools

Before diving into UDR, it’s worth understanding how most Deep Research Tools (DRTs) function today.

A typical DRT isn’t “just” a chatbot. It:

  1. Parses your prompt into a concrete plan.
  2. Executes a fixed set of research steps—searching, analyzing, and compiling findings.
  3. Pushes progress notifications to the user before presenting the final report.

A diagram showing the workflow of a typical Deep Research Tool, which involves a fixed sequence of steps from prompt parsing to report compilation.
Figure 1: A high-level diagram visualizing the components of a typical deep research tool. Unlike plain conversational LLMs, DRTs continuously update the user on their progress before producing the finished report.
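
To make the contrast concrete, here is a minimal sketch of the kind of hard-coded pipeline a typical DRT runs. The helper names (llm_call, search_web, notify) are illustrative placeholders, not any vendor's actual API:

# Illustrative sketch of a fixed DRT pipeline; every decision after the
# prompt is baked in by the developer.
def fixed_deep_research(prompt: str) -> str:
    notify("Planning research...")
    queries = llm_call(f"Break this request into search queries: {prompt}")

    notify("Searching the web...")
    results = [search_web(q) for q in queries.splitlines() if q.strip()]

    notify("Compiling report...")
    return llm_call(f"Write a structured report on '{prompt}' using: {results}")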

According to the paper, DRTs generally fall into two categories:

  • Consumer-facing tools (e.g., Perplexity, Gemini): Search the open web using expansive or iterative strategies, branching searches based on earlier results.
  • Enterprise-focused tools (e.g., NVIDIA AI-Q, SambaNova): Work inside closed databases using rigid, structured workflows—often fixed pipelines with predictable outputs.

Different strategies—but a common limitation: the “how” is fixed and beyond user control.


How Universal Deep Research Changes the Game

UDR adds a second key input alongside the familiar prompt, so the user supplies both:

  • Research Prompt (the what),
  • Research Strategy (the how).

You don’t just ask your question—you tell the system exactly how to tackle it, in natural language.

This simple addition changes the architecture fundamentally. Instead of a static research agent, UDR builds custom agents on the fly from your strategy description.
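
As a toy illustration, the pair of inputs might look something like this. The strategy wording and the udr.run entry point are hypothetical, not taken from the paper:

# Hypothetical illustration of UDR's two user-supplied inputs.
prompt = "What are the main criticisms of the EU AI Act?"

strategy = """
1. Generate five search phrases covering legal, industry, and academic angles.
2. Search each phrase, preferring peer-reviewed and primary sources.
3. Summarize each source in two sentences and note its type.
4. Flag any claim that appears in only one source.
5. Compile a report ordered by source reliability, citing every claim.
"""

report = udr.run(prompt=prompt, strategy=strategy)  # 'udr' is a stand-in client object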

A diagram of the Universal Deep Research (UDR) workflow, showing that both a Research Strategy and a Research Prompt are provided by the user.
Figure 2: In UDR, the user supplies both the strategy and the prompt, enabling far greater customization than fixed-strategy DRTs.


The Two-Phase Operation of UDR

UDR runs in two major phases:

Phase 1 — Strategy Processing: From English to Code

This is where UDR transforms your plain-English instructions into a Python function.

  1. Provide the Strategy: Typically as a clear, numbered or bullet-point list.
  2. Conversion to Code: A large language model takes your instructions, plus a set of constraints (allowed functions, code structures), and turns them into a single callable function.
  3. Reliability via Comments: If asked to “just” write code, models might skip steps or take shortcuts. The researchers solved this by requiring the model to prepend each code block with a comment restating the original strategy step. This “show your work” method greatly improved fidelity.

Example:

# Step 3: Generate 3 search phrases based on the user's prompt.
search_phrases = llm_call("Generate 3 search phrases for: " + prompt)
phrases = search_phrases.split('\n')

The resulting function—a faithful representation of your plan—becomes your bespoke research agent.
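
Continuing in the same spirit, a complete compiled strategy might read as a short sequence of commented steps, each storing its result in a named variable. This is only a sketch: the helper functions (llm_call, search) and the notification format are assumptions rather than the paper's exact interfaces, and the yield statements preview the execution model described under Phase 2 below:

# Sketch of a complete compiled strategy function (helpers and event
# format are illustrative assumptions).
def research_agent(prompt):
    # Step 1: Tell the user that research has started.
    yield {"type": "notification", "message": "Starting research"}

    # Step 2: Generate search phrases from the user's prompt.
    raw = llm_call("Generate 3 search phrases for: " + prompt)
    phrases = [p.strip() for p in raw.split("\n") if p.strip()]

    # Step 3: Search each phrase and keep the results in a named variable.
    results = [search(p) for p in phrases]
    yield {"type": "notification", "message": f"Searched {len(phrases)} phrases"}

    # Step 4: Summarize the collected findings.
    summary = llm_call("Summarize these findings:\n" + "\n".join(results))

    # Step 5: Compile and emit the final Markdown report.
    report = llm_call(f"Write a Markdown report on '{prompt}' using:\n{summary}")
    yield {"type": "final_report", "content": report}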


Phase 2 — Strategy Execution: Running the Agent

Once compiled, the Python function runs in a quarantined sandbox, ensuring that user-defined instructions can’t harm the host environment.

Key features:

  • State Management: Data from each step is stored in named variables, not in an ever-growing LLM context window. This lets complex research workflows run with as little as 8k tokens.
  • LLM as Tool, Not Brain: UDR uses the LLM for local reasoning tasks—summarization, ranking, extraction—while the overall control logic resides in CPU-executed code.
  • Structured Notifications: yield statements produce deterministic, structured updates for the UI. The user decides exactly what’s reported.

This architecture is both more efficient (CPU-based orchestration, fewer LLM calls) and more transparent than typical end-to-end LLM control loops.
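
To see how these pieces fit together, here is a rough sketch of a host loop driving the compiled strategy from the Phase 1 example. The sandboxing is only indicated by a comment, and the event format is a simplifying assumption:

# Sketch of the host side: run the compiled generator and surface its
# structured events to the UI (sandboxing details omitted).
def run_strategy(agent_fn, prompt):
    final_report = None
    # The generated code executes as ordinary CPU-bound Python; the LLM is
    # only invoked inside individual steps via llm_call().
    for event in agent_fn(prompt):  # would execute inside the sandbox
        if event["type"] == "notification":
            print("[progress]", event["message"])  # update the UI
        elif event["type"] == "final_report":
            final_report = event["content"]
    return final_report

report = run_strategy(research_agent, "History of the transistor")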


User Interface & Example Strategies

To showcase UDR’s flexibility, the authors built a web-based demonstration interface.

A screenshot of the UDR user interface, showing the prompt input bar, a list of selectable research strategies, and a text area for editing the selected strategy.
Figure 3: The UDR demo interface—prompt bar (top), selectable strategy list (middle), editable strategy text area (bottom).

The interface lets you:

  • Enter a prompt.
  • Select pre-written strategies (Minimal, Expansive, Intensive).
  • Edit these strategies in plain text before execution.

Example strategies:

  • Minimal: Simple, linear—generate search phrases → search → aggregate → report.
  • Expansive: Broader—break into sub-topics → generate phrases per topic → search → aggregate → report.
  • Intensive: Iterative refinement—search → generate new phrases from what was learned → repeat before the final report (roughly the loop sketched below).
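
As an example of how a few lines of plain language can reshape the control flow, an Intensive-style refinement step might compile into a loop roughly like this. It is a fragment of a generated function body only; the three-round cap and the helpers are assumptions:

# Fragment of generated code for an Intensive-style strategy (illustrative).
phrases = llm_call("Generate search phrases for: " + prompt).split("\n")
findings = []
for round_number in range(3):  # assumed cap on refinement rounds
    yield {"type": "notification", "message": f"Search round {round_number + 1}"}
    findings += [search(p) for p in phrases if p.strip()]
    # Derive refined search phrases from what has been learned so far.
    phrases = llm_call(
        "Given these findings, propose better search phrases:\n" + "\n".join(findings)
    ).split("\n")
report = llm_call("Write a report on '" + prompt + "' using:\n" + "\n".join(findings))
yield {"type": "final_report", "content": report}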

UDR handled everything from playful pop-culture questions to in-depth historical profiles.

A screenshot of the UDR interface after completing a research task about the airspeed velocity of an unladen swallow, showing progress notifications and the final formatted report.
Figure 4: Completed research workflow. Notifications detail each step; the final Markdown report appears at right.


Limitations

The authors note current constraints:

  1. Dependence on LLM Code Fidelity: Quality depends on how well the model translates the natural-language strategy into working code.
  2. User Strategy Quality: UDR does no deep validation of strategy logic—poorly designed workflows yield poor results.
  3. No Mid-Execution Edits: Once a workflow begins, you can’t change its path without stopping and restarting.

Future Directions

The paper’s recommendations for advancing systems like UDR:

  • Strategy Libraries: Provide users with well-tested templates for adaptation.
  • Reasoning Control: Let users guide not just the actions but the thought process of LLMs.
  • Automated Agent Generation: Research methods to derive optimized strategies directly from batches of prompts.

Conclusion: A New Paradigm for AI Agents

Universal Deep Research shows it’s possible to build transparent, customizable, model-agnostic research assistants.
By separating the what from the how, and translating human-readable strategies into deterministic code:

  • Users gain real agency over the research process.
  • Systems become auditable and efficient.
  • Any capable LLM can be plugged in.

It’s more than just a better research tool—it’s a glimpse at a future where we program AI using our own language as the source code for autonomous, controllable agents.