Introduction
We live in the golden age of answers. If you want to know the population of Brazil or the boiling point of tungsten, a quick Google search or a prompt to ChatGPT gives you the answer instantly. These systems excel at addressing known unknowns: information gaps you are aware of and can articulate as a specific question.
But what about the unknown unknowns? These are the concepts, connections, and perspectives you don’t even know exist. How do you ask a question about a topic when you don’t know the vocabulary? How do you explore the implications of a new technology if you don’t know the economic or ethical frameworks surrounding it?
In complex information seeking, such as academic research, market analysis, or learning a new field, traditional tools often fail. Search engines force you to constantly formulate the next query yourself. Chatbots tend to be passive, answering only what is asked and often trapping users in an “echo chamber” of their own limited prior knowledge.
A recent paper from researchers at Stanford and Yale proposes a fascinating solution: Co-STORM. Instead of a lonely interrogation of a search bar, Co-STORM invites users to a dinner party of AI experts. By observing and participating in a collaborative discourse between simulated agents, users can discover serendipitous information and learn more deeply with less mental effort.

As shown in Figure 1, this shift from “Using Search Engines” (High Effort) to “Interacting with Co-STORM” (Low Effort, High Exploration) represents a new paradigm in human-AI interaction.
The Problem: The Cognitive Load of “Search”
To understand why Co-STORM is necessary, we must look at where current systems fall short in complex information seeking.
Complex information seeking isn’t about finding a single fact. It involves collecting, sifting, understanding, and organizing information from multiple sources to build a knowledge product, like a report or a mental model.
Table 1 illustrates the gaps in current technology:

- Information Retrieval (Search Engines): You get multiple sources, but you have to do all the synthesis yourself.
- Single-Turn QA: You get an answer, but no depth or ongoing exploration.
- Conversational QA (Chatbots): You can interact, but the bot rarely takes the initiative to show you what you should be asking.
- Report Generation (like the original STORM system): It writes a great report, but it’s a static process. You can’t interrupt, steer, or learn during the generation.
The researchers identified that to truly support learning, a system needs to support Collaborative Discourse. Just as children learn by listening to parents discuss a topic, or students learn by observing a debate, humans learn effectively when they observe and occasionally participate in a conversation between knowledgeable entities.
The Co-STORM Method
Co-STORM (Collaborative STORM) is an information-seeking assistant that emulates a “roundtable” discussion. It doesn’t just answer you; it creates a conversation around you, which you can steer.
The Architecture of Discourse
At the heart of Co-STORM is a multi-agent system grounded in real-time information retrieval (Search).

As illustrated in Figure 2, the system consists of three main components working in harmony (a sketch of how they interact follows the list below):
- The Agents (Experts & Moderator): These Large Language Models (LLMs) simulate a discussion.
- The User: You can observe the agents talking or jump in to ask a question or steer the topic.
- The Mind Map: A dynamic data structure that organizes the conversation visually, reducing the cognitive load of reading a wall of text.
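To make the roundtable concrete, here is a minimal sketch of how such a discourse loop might be wired up. The class names, the simple turn policy, and the interjection mechanism are illustrative assumptions for this post, not the paper’s implementation; the real system decides turns with far richer, LLM-driven logic.

```python
from dataclasses import dataclass, field

@dataclass
class Discourse:
    topic: str
    history: list = field(default_factory=list)   # list of (speaker, utterance) pairs

def next_speaker(turn: int, experts: list) -> str:
    # Illustrative policy: every third turn the Moderator steers; otherwise an expert speaks.
    return "Moderator" if turn % 3 == 2 else experts[turn % len(experts)]

def run_roundtable(topic: str, experts: list, interjections: dict) -> Discourse:
    d = Discourse(topic)
    for turn in range(6):
        if turn in interjections:                 # the user can jump in at any point
            d.history.append(("User", interjections[turn]))
            continue
        speaker = next_speaker(turn, experts)
        # A real turn would call an LLM and a search engine; stubbed as a template string.
        d.history.append((speaker, f"{speaker} contributes a grounded remark on {topic}"))
    return d

demo = run_roundtable("AlphaFold 3", ["Geneticist", "AI Expert"], {3: "How does it handle RNA?"})
for speaker, text in demo.history:
    print(f"{speaker}: {text}")
```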
1. The Cast of Characters
If you ask a standard chatbot about “AlphaFold 3,” it gives you a summary. Co-STORM acts differently. It first determines who should be at the table. For a biotech topic, it might instantiate a “Geneticist,” an “AI Expert,” and a “Molecular Biologist.”
Perspective-Guided Experts: These agents don’t just generate text; they simulate a perspective (see the sketch after this list). When it is an expert’s turn:
- They analyze the conversation history.
- They decide on an intent (e.g., Ask a question, Provide an answer, Request detail).
- If answering, they generate search queries, retrieve real data from the internet, and cite their sources.
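Below is a minimal sketch of what a single expert turn could look like under these steps, assuming hypothetical stand-in helpers (`call_llm`, `web_search`) rather than any actual Co-STORM API.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned string here."""
    return f"(LLM output for: {prompt.splitlines()[0][:50]})"

def web_search(query: str) -> list:
    """Hypothetical stand-in for retrieval; returns fake snippets with URLs."""
    return [{"url": "https://example.org/" + query.replace(" ", "-"),
             "snippet": f"Snippet about {query}"}]

def expert_turn(persona: str, history: list, topic: str) -> str:
    # 1. Decide an intent from the conversation so far.
    intent = call_llm(f"As a {persona} discussing {topic}, choose an intent: "
                      "ask_question / answer / request_detail\n" + "\n".join(history))
    # 2. If answering, generate search queries, retrieve, and answer with citations.
    if "answer" in intent:
        queries = call_llm(f"Write two search queries a {persona} would run next on {topic}")
        sources = [s for q in queries.splitlines() for s in web_search(q)]
        citations = " ".join(f"[{i + 1}]" for i in range(len(sources)))
        grounded = call_llm("Answer using only these snippets:\n"
                            + "\n".join(s["snippet"] for s in sources))
        return f"{grounded} {citations}"
    # 3. Otherwise, contribute a question or a request for more detail.
    return call_llm(f"As a {persona}, ask a follow-up question about {topic}")

print(expert_turn("Geneticist", ["Moderator: What makes AlphaFold 3 different?"], "AlphaFold 3"))
```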
The Moderator: If you leave a group of experts alone, they might obsess over niche details. The Moderator is a special agent designed to ensure breadth. It monitors the conversation and injects new questions to steer the discourse toward unexplored areas.
Crucially, the Moderator looks for unused information. It performs a semantic search over the retrieved snippets to surface material that is relevant to the overall topic \(t\) but dissimilar to the specific question \(q\) currently under discussion, combining the two similarities into a “reranking score” that prioritizes novelty. This mathematical nudge forces the AI to drag the conversation out of echo chambers and into the “unknown unknowns.”
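The exact formulation lives in the paper; as a hedged illustration, the sketch below assumes a simple form of the idea, scoring each unused snippet by its similarity to the topic minus its similarity to the current question, with a toy bag-of-words embedding standing in for a real sentence encoder.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank_unused(snippets, topic: str, current_question: str):
    """Prefer snippets close to the topic but far from the thread being discussed."""
    t, q = embed(topic), embed(current_question)
    score = lambda s: cosine(embed(s), t) - cosine(embed(s), q)
    return sorted(snippets, key=score, reverse=True)

unused = ["AlphaFold 3 licensing terms for industry use",
          "benchmark accuracy of protein structure prediction",
          "drug discovery partnerships enabled by structure models"]
print(rerank_unused(unused, "AlphaFold 3", "protein structure prediction accuracy"))
```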
2. The Dynamic Mind Map
Listening to a complex multi-party debate can be confusing. To help the user keep track, Co-STORM maintains a hierarchical Mind Map (visible in the top left of Figure 2).
As the conversation progresses, the system uses an “Insert Operation.” It analyzes every new piece of information and decides where it belongs in the tree structure. If a node gets too big, it triggers a “Reorganize Operation,” splitting the node into sub-topics. This allows the user to glance at the map and instantly understand the structure of the knowledge being uncovered.
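A minimal sketch of these two operations is shown below. Co-STORM delegates placement and splitting decisions to an LLM; here they are stubbed with keyword overlap and a size threshold, which are simplifications of my own.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    snippets: list = field(default_factory=list)
    children: list = field(default_factory=list)

def overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))

def insert(root: Node, snippet: str) -> None:
    """Insert Operation: attach the snippet to the best-matching node in the tree."""
    best, frontier = root, [root]
    while frontier:
        node = frontier.pop()
        if overlap(node.title, snippet) > overlap(best.title, snippet):
            best = node
        frontier.extend(node.children)
    best.snippets.append(snippet)
    if len(best.snippets) > 5:                    # node grew too big
        reorganize(best)

def reorganize(node: Node) -> None:
    """Reorganize Operation: split an oversized node into a sub-topic child (stubbed)."""
    # A real system would ask an LLM to name coherent sub-topics; we just split in half.
    mid = len(node.snippets) // 2
    node.children.append(Node(f"{node.title} (details)", node.snippets[mid:]))
    node.snippets = node.snippets[:mid]

root = Node("AlphaFold 3")
insert(root, "AlphaFold 3 accuracy on protein complexes")
```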
3. The Final Artifact
At any point, the user can request a Cited Report. The system uses the Mind Map as an outline and the collected search results to write a comprehensive, Wikipedia-style article. This turns the casual exploration into a concrete knowledge product.
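As a rough illustration, the sketch below walks a mind-map-style outline and emits one cited section per node; `write_section` is a hypothetical stand-in for the LLM call that would actually summarize the collected sources.

```python
def write_section(title: str, sources: list) -> str:
    """Hypothetical stand-in for the LLM call that summarizes sources for one section."""
    cites = " ".join(f"[{i + 1}]" for i, _ in enumerate(sources))
    return f"(summary of {title}) {cites}".strip()

def write_report(outline: dict, sources: dict, depth: int = 1) -> str:
    """outline maps section titles to child outlines; sources maps titles to URL lists."""
    parts = []
    for title, children in outline.items():
        parts.append("#" * depth + " " + title)
        parts.append(write_section(title, sources.get(title, [])))
        if children:
            parts.append(write_report(children, sources, depth + 1))
    return "\n".join(parts)

outline = {"AlphaFold 3": {"Architecture": {}, "Applications": {"Drug discovery": {}}}}
print(write_report(outline, {"Architecture": ["https://example.org/af3-architecture"]}))
```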
Evaluation: Measuring Discovery
How do you measure if a system helps someone find “unknown unknowns”? The researchers attacked this problem from three angles: a new dataset, automatic metrics, and human trials.
The WildSeek Dataset
Existing datasets for information seeking were too simple. They focused on fact retrieval. To evaluate Co-STORM, the researchers created WildSeek, a dataset derived from real-world usage of the STORM engine.

As shown in Table 2, these aren’t simple queries. They are open-ended goals, such as “Investigate how a new shared currency could eliminate transaction costs.” The taxonomy of this dataset covers diverse fields from Economics to Healthcare (Figure 5).

Automatic Evaluation Results
The researchers simulated users interacting with Co-STORM, a standard RAG Chatbot, and the original STORM system. They measured the quality of the final reports and the discourse itself.

Table 3 reveals critical insights:
- Depth & Novelty: Co-STORM significantly outperforms RAG Chatbots and STORM+QA in the Depth and Novelty of the generated reports.
- Engagement: The conversation turns were rated as significantly more engaging.
- Diversity: Co-STORM cited nearly double the number of unique URLs compared to the baselines, indicating a much broader exploration of the internet.
Ablation studies (removing specific components) showed that the Moderator is essential. Without the moderator steering the conversation toward new areas, the “Novelty” scores drop significantly (Figure 3).

Human Evaluation: Do Users Like It?
Ultimately, the goal is to help humans. The researchers recruited 20 participants for a study comparing Co-STORM against Google Search and a RAG Chatbot.
The results were overwhelmingly positive.

As displayed in Figure 4:
- 70% of participants preferred Co-STORM over a Search Engine.
- 78% preferred it over a RAG Chatbot.
- Users specifically noted that Co-STORM required “Less Effort” while providing higher “User Engagement.”
Participants highlighted the “serendipity” of the system. One user noted, “Co-STORM allows for almost full automation and much better understanding as it brings up topics that the user may not even think of.”
Conclusion
The Co-STORM paper presents a convincing argument that the future of search isn’t just about better answers—it’s about better questions.
By moving from a “tool” metaphor (where the AI waits for input) to a “partner” metaphor (where AI agents actively discuss and explore), we can lower the barrier to learning complex topics. Co-STORM demonstrates that when we allow AI agents to converse with each other under the supervision of a moderator, they can surface the “unknown unknowns” that a human user might never have found on their own.
For students and researchers, this suggests a future where our AI assistants don’t just fetch data; they brainstorm with us, challenge our assumptions, and help us map out the frontiers of our own ignorance.