Introduction

In the race to build more capable Large Language Models (LLMs), data is the fuel. But high-quality, human-annotated data is a finite and expensive resource. To bypass this bottleneck, researchers have turned to a clever, somewhat recursive solution: using LLMs to generate the data to train other LLMs. This technique, often called “Self-Instruct,” allows for massive scalability.

However, there is a catch. When an LLM generates data based solely on a few random examples, it tends to be repetitive. It mimics the patterns it sees but lacks the “awareness” to explore new, underrepresented concepts. In the context of safety alignment—teaching models to refuse harmful requests—this is a critical vulnerability. If your data generator only creates questions about physical violence, the resulting model might be perfectly safe against violent prompts but completely vulnerable to questions about financial fraud or cyberbullying.

How do we force an automated data generator to be comprehensive?

Enter DATA ADVISOR, a new framework proposed by researchers from USC and Amazon. Instead of letting an LLM blindly generate data, DATA ADVISOR acts as a strategic project manager. It monitors the dataset in real-time, spots missing concepts (like “we have enough fraud, we need more biological hazards”), and instructs the generator to fill those gaps.

In this post, we will break down how DATA ADVISOR works, why it outperforms standard methods, and how it ensures LLMs are safe across a wide spectrum of potential harms.

Background: The Problem with “Blind” Generation

To understand why DATA ADVISOR is necessary, we first need to look at the status quo: Self-Instruct.

In a typical Self-Instruct pipeline, you start with a small “seed” set of human-written examples. The system randomly picks a few of these seeds and feeds them to an LLM with a prompt like, “Write a new question similar to these.” The LLM generates a new datapoint, which gets added to the pool. This process repeats thousands of times.
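To make the status quo concrete, here is a minimal sketch of such a loop. This is illustrative only, not the paper’s implementation; `call_llm` is a placeholder for whatever model API you use, and the prompt wording is assumed.

```python
import random

def call_llm(prompt: str) -> str:
    """Placeholder: wire this to whichever chat/completion API you use."""
    raise NotImplementedError

def self_instruct(seed_examples: list[str], n_rounds: int, k: int = 3) -> list[str]:
    """Minimal Self-Instruct-style loop: sample a few seeds, ask for one new
    example, add it to the pool, repeat. Nothing tracks what the pool covers."""
    pool = list(seed_examples)
    for _ in range(n_rounds):
        demos = random.sample(pool, k=min(k, len(pool)))
        prompt = (
            "Here are some example questions:\n"
            + "\n".join(f"- {d}" for d in demos)
            + "\nWrite one new question similar to these."
        )
        pool.append(call_llm(prompt))  # blind generation: no awareness of coverage
    return pool
```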

While this scales well, it suffers from two major issues:

  1. Bias Amplification: The generator often gravitates toward the most common or easiest examples. Over time, the dataset becomes homogenized.
  2. Lack of Coverage: The generator is stateless: it has no memory of what it has already produced, so it can happily generate 500 questions about “stealing cars” and zero about “laundering money” without ever noticing the gap.

For safety alignment, coverage is everything. Safety training requires a dataset of “red-teaming” prompts—harmful questions used to teach the model to say “no.” If the dataset misses specific categories of harm (like nuanced discrimination or complex privacy violations), the final model will have safety blind spots.

The Core Method: DATA ADVISOR

DATA ADVISOR transforms the data generation process from a random walk into a directed search. It introduces a feedback loop that governs the generation process based on a set of guiding principles (in this case, the principle is “diverse safety coverage”).

As illustrated in the architecture diagram below, DATA ADVISOR sits above the standard data generation pipeline. It consists of three distinct phases that cycle iteratively.

Figure 1: Overview of DATA ADVISOR for dynamically enhancing standard LLM-based data generation.

Let’s break down these three phases:

1. Data Summarization (The Monitor)

The first challenge is knowing what is currently in the dataset. As the dataset grows to thousands of examples, you cannot simply feed the entire text history into an LLM’s context window and ask, “What do we have so far?”

DATA ADVISOR solves this with an iterative summarization technique.

  • Input: The summary from the previous step + the newly generated datapoint.
  • Action: The Advisor updates the summary to include any new concepts introduced by the new datapoint.
  • Output: A concise, running report of the dataset’s current coverage (e.g., “Contains: Self-harm, Physical harm, Violence”).

This allows the system to maintain a high-level view of the dataset’s distribution without needing infinite memory.
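A minimal sketch of this update step is below. The prompt wording is illustrative, not the paper’s actual prompt, and `call_llm` is any function that sends a prompt to an LLM and returns its text reply (like the placeholder in the Self-Instruct sketch above).

```python
def update_summary(call_llm, summary: str, new_datapoint: str) -> str:
    """One step of iterative summarization: fold a single new datapoint into the
    running coverage summary instead of re-reading the entire dataset."""
    prompt = (
        f"Current dataset summary:\n{summary}\n\n"
        f"Newly generated datapoint:\n{new_datapoint}\n\n"
        "Update the summary so it also covers any new concept or harm category "
        "introduced by this datapoint. Keep the summary concise."
    )
    return call_llm(prompt)
```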

2. Weakness Identification (The Analyst)

Once the system knows what it has, it needs to determine what it lacks. This is where the Guiding Principles come in. For safety alignment, the principle is to maximize the diversity of harmful categories.

The Advisor compares the current Data Summary against the goal of diverse coverage. It asks the specific question: “Based on what we have, what is missing?”

For example, if the summary lists “Violence” and “Theft,” the Advisor might identify that “Intellectual Property Violation” or “Cyberbullying” is absent. This turns a vague desire for “more data” into a specific, actionable weakness.
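Sketched the same way, weakness identification is a single prompt over the running summary (again, the wording here is an assumption, not the paper’s prompt):

```python
def identify_weakness(call_llm, summary: str, guiding_principle: str) -> str:
    """Ask the Advisor what the dataset still lacks, relative to its guiding principle."""
    prompt = (
        f"Guiding principle: {guiding_principle}\n"
        f"Current dataset summary:\n{summary}\n\n"
        "Name one specific concept or harm category that is missing or "
        "underrepresented in this dataset."
    )
    return call_llm(prompt)
```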

3. Data Generation with Advice (The Director)

In standard Self-Instruct, the generator is just told to “generate new data.” In DATA ADVISOR, the generator receives specific instructions.

The system takes the identified weakness and converts it into a prompt constraint.

  • Standard Prompt: “Generate a harmful question.”
  • DATA ADVISOR Prompt: “Generate a harmful question related to Virtual Identity Attacks.”

This proactive guidance ensures that every new data point contributes something unique to the dataset, filling the holes identified in the previous step.
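Putting the three phases together, one Advisor-style iteration might look like the sketch below. The prompt wording is hypothetical, and `update_summary` and `identify_weakness` refer to the sketches from the previous steps.

```python
def generate_with_advice(call_llm, weakness: str) -> str:
    """Turn the identified weakness into an explicit constraint on the generator."""
    prompt = (
        "Generate a harmful question that a safety-aligned model should refuse, "
        f"specifically related to: {weakness}."
    )
    return call_llm(prompt)

# One Advisor-style iteration, chaining the three phases:
#   weakness  = identify_weakness(call_llm, summary, "maximize diversity of harm categories")
#   new_point = generate_with_advice(call_llm, weakness)
#   summary   = update_summary(call_llm, summary, new_point)
```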

Experimental Setup

To prove that this “project manager” approach works, the authors conducted a rigorous evaluation on Safety Alignment.

  • Task: Generate 10,000 safety alignment datapoints (harmful prompts paired with safe refusals).
  • Models Trained: Three different base models were fine-tuned: Mistral, Llama2, and Falcon.
  • Baselines: The models were compared against versions trained with data from standard Self-Instruct.
  • Evaluation Metrics:
      • Safety: Measured using CatQA and BeaverTails (datasets containing diverse harmful questions).
      • Utility: Measured using MMLU (a massive multitask benchmark) to ensure the safety training didn’t make the models stupid or overly refusal-happy on innocent topics.

Experiments & Results

The results highlight a clear advantage for the directed approach of DATA ADVISOR over random generation.

1. High-Level Safety and Utility

The primary goal was to increase safety without sacrificing the model’s general intelligence (utility). Figure 2 below shows the performance across the three base models.

Figure 2: Safety and utility of models trained with different data.

As we can see in the charts:

  • Safety (CatQA & BeaverTails): The orange bars (DATA ADVISOR) consistently outscore the purple bars (Self-Instruct) and the blue bars (Base Model). The safety scores for DATA ADVISOR are consistently in the 90%+ range.
  • Utility (MMLU): Crucially, the utility scores (the right-most group in each chart) do not drop. In fact, for Mistral and Falcon, DATA ADVISOR actually improves utility compared to Self-Instruct. This suggests that high-quality, diverse safety data helps the model distinguish between safe and unsafe contexts better than repetitive data does.

2. Fine-Grained Safety Coverage

The real power of DATA ADVISOR is revealed when we look at specific categories of harm. Standard generation often over-indexes on “easy” harms like violence but misses nuanced ones.

Figure 3 shows the breakdown of harmful rates (lower is better) on the CatQA dataset.

Figure 3: Harmful rate by category on CatQA for Mistral, Llama2, and Falcon.

Notice categories like “Economic Harm” and “Tailored Financial Advice.” The Self-Instruct method (purple) struggles here, often failing to refuse these requests because its training data likely lacked examples in these domains. DATA ADVISOR (orange), however, drives the harmful rate down to near zero across almost all categories.

We see a similar trend in the BeaverTails evaluation (Figure 4), which covers different categories like “Terrorism” and “Organized Crime.”

Figure 4: Harmful rate by category on BeaverTails.

In categories like “Privacy Violation” or “Financial Deception,” the gap between the baseline and DATA ADVISOR is significant. This proves that the Advisor successfully steered the generator to create training data for these specific, often neglected areas.

3. Data Diversity and Progression

Is the data actually more diverse, or just better categorized? The researchers analyzed the linguistic diversity of the generated prompts using n-grams (sequences of \(n\) words).
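The distinct n-gram ratio is straightforward to compute; here is a rough sketch using naive whitespace tokenization, which may differ from the paper’s exact setup:

```python
def distinct_ngram_ratio(prompts: list[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams across a set of prompts.
    Higher means less repeated phrasing; repetitive data drives this toward zero."""
    ngrams = []
    for text in prompts:
        tokens = text.split()  # simple whitespace tokenization, for illustration
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```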

Figure 5: Ratio of distinct n-grams for all prompts in generated data vs human data.

Figure 5 shows that DATA ADVISOR (orange line) maintains a much higher ratio of distinct n-grams compared to Self-Instruct (purple line). As \(n\) increases (meaning we look at longer phrases), Self-Instruct collapses—it keeps repeating the same long phrases. DATA ADVISOR remains highly diverse, almost matching the diversity of human-annotated data (CatQA/BeaverTails).

Furthermore, we can qualitatively see this evolution in Table 1.

Table 1: Examples of data generated by DATA ADVISOR demonstrate its capability to identify new categories of safety issues iteratively.

This table is fascinating because it tracks the iterations.

  • At Iteration 28, the model generates a standard “Spatiotemporal Manipulation” prompt.
  • By Iteration 528, it is exploring “Social Isolation.”
  • By Iteration 997, it is generating complex queries about “Moral Dilemma Inducing.”

This progression proves that the Weakness Identification module doesn’t just cycle through a list; it pushes the boundaries of the dataset into increasingly subtle and complex territory.

4. The Importance of Data Mixture

Finally, the researchers ran an ablation study to confirm that safety data must be mixed with general utility data (like the Alpagasus dataset).

Figure 6: Ablation on training data. Both safety alignment data and utility alignment data are essential.

Figure 6 illustrates that if you train on only safety data (purple bars), your utility (MMLU) crashes. If you train on only utility data (blue bars), your safety is nonexistent. The combination (orange bars) is essential. DATA ADVISOR provides the high-quality safety component of this mixture.

Conclusion

The “bigger is better” era of AI data is shifting toward “smarter is better.” As we run out of high-quality human data, we are increasingly reliant on synthetic data generated by models. The paper “DATA ADVISOR” demonstrates that we cannot simply leave these generators to their own devices.

Without guidance, LLMs revert to their training mean, producing repetitive and biased data. By implementing a dynamic control loop—Monitor, Identify Weakness, Advise—DATA ADVISOR ensures that synthetic datasets are comprehensive and diverse.

The implications extend beyond safety. While this paper focused on preventing harmful outputs, the same “Advisor” logic could be applied to anything: ensuring a math dataset covers all calculus topics, or ensuring a coding dataset covers every Python library. DATA ADVISOR represents a shift from passive data collection to active, curated data design.