Imagine you are trying to sell a used bicycle. If a potential buyer comes across as analytical and detail-oriented, you might focus on the bike’s technical specifications and maintenance history. However, if the buyer seems emotional or hesitant, you might pivot to discussing how much joy the bike will bring them or how safe it is.
As humans, we intuitively adapt our strategies based on who we are talking to. We read the room.
For Artificial Intelligence, specifically Large Language Models (LLMs), this is incredibly difficult. Most dialogue agents utilize a “one-size-fits-all” approach. They are polite, helpful, and generally collaborative. But what happens when the goal isn’t just to help, but to negotiate a price or persuade someone to donate to charity? These are non-collaborative dialogues, where interests conflict.
In this post, we will explore a fascinating paper titled “Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation.” We will uncover how researchers are teaching AI to develop a “Theory of Mind”—the ability to understand user perspectives—and how training against a diverse population of simulated personalities creates a much sharper negotiator.
The Problem: The “One-Size-Fits-All” Trap
Current LLM-based agents struggle with strategic planning in real-world scenarios. The researchers identified two main reasons for this failure:
- Ignorance of User Characteristics: Most agents look only at the conversation history (the text). They fail to explicitly model who the user is or what their mental state might be.
- Training Rigidity: Agents are typically trained against a single type of user simulator. This is like practicing chess against only one opponent; you become very good at beating that specific person, but you crumble when a new player uses a strategy you haven’t seen before.
To prove this, the researchers established a rigorous evaluation protocol. They didn’t just test agents against a generic user; they created a suite of diverse user simulators with specific personality traits (like Openness or Neuroticism) and distinct decision-making styles.

As shown in Figure 1, the evaluation involves generating distinct personas (Step 1), sampling them into a simulator equipped with non-collaborative strategies (Step 2), and then testing the dialogue agent against these diverse personalities (Step 3). The results showed that standard agents failed to adapt, performing well with some personalities but poorly with others.
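The persona-generation step can be pictured with a short sketch. This is illustrative only (the names and prompt template are hypothetical): it simply crosses trait and style labels into persona descriptions. The paper's population of 40 simulators presumably varies additional dimensions beyond this simple cross-product.

```python
import itertools

# Hypothetical trait/style labels drawn from the paper's two axes.
BIG_FIVE = ["Openness", "Conscientiousness", "Extraversion",
            "Agreeableness", "Neuroticism"]
DECISION_STYLES = ["Directive", "Analytical", "Conceptual", "Behavioral"]

def build_persona_pool():
    """Every trait/style combination yields one persona description."""
    pool = []
    for trait, style in itertools.product(BIG_FIVE, DECISION_STYLES):
        pool.append({
            "trait": trait,
            "style": style,
            "prompt": (f"You are a buyer with high {trait} and a "
                       f"{style} decision-making style."),
        })
    return pool

personas = build_persona_pool()
print(len(personas))  # 20
```

Each persona's prompt would then be injected into a user simulator's system message (Step 2 in the figure) before the dialogue agent is tested against it.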
The Solution: TRIP (Tailored Strategic Planning)
To fix this, the researchers proposed a method called TRIP (Tailored stRategIc Planning).
TRIP is designed to make agents adaptable. It consists of two major components working in tandem:
- User-Aware Strategic Planning (UASP): A module that actively tries to understand the user’s mind.
- Population-Based Training Paradigm (PBTP): A training regimen that forces the agent to interact with a diverse “population” of users.
Let’s look at the architecture of TRIP below.

1. User-Aware Strategic Planning (UASP)
The core innovation in the planning module is the integration of Theory of Mind (ToM).
In psychology, ToM is the ability to attribute mental states—beliefs, intents, desires, emotions, knowledge—to oneself and others. The TRIP model uses an LLM to analyze the dialogue history and explicitly infer two things:
- Mental States: What is the user aiming for? (e.g., “The user aims to deal with $15.”)
- Future Actions: What is the user likely to do next? (e.g., “The user may offer a higher price.”)
Instead of just feeding raw chat logs into a strategy planner, TRIP feeds these inferred mental states into a trainable planner (BERT-based). This planner then predicts the best strategy to use (like “Logical Appeal” or “Emotion Appeal”), which guides the final response generation.
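The UASP pipeline can be sketched as follows. All function names here are hypothetical: `infer_mental_state` stands in for the LLM's ToM inference, and the toy `planner` lambda stands in for the BERT-based strategy classifier.

```python
# Hedged sketch of User-Aware Strategic Planning (names are hypothetical).
STRATEGIES = ["Logical Appeal", "Emotion Appeal", "Credibility Appeal"]

def infer_mental_state(dialogue_history):
    """Stand-in for an LLM call that infers the user's mental state
    and likely next action from the dialogue so far."""
    # In practice: llm(f"Given this dialogue, what does the user want? ...")
    return {"mental_state": "The user aims to deal at $15.",
            "future_action": "The user may offer a higher price."}

def plan_strategy(dialogue_history, planner):
    """Feed inferred mental states (not just raw text) into the planner."""
    tom = infer_mental_state(dialogue_history)
    planner_input = (dialogue_history
                     + " [STATE] " + tom["mental_state"]
                     + " [ACTION] " + tom["future_action"])
    scores = planner(planner_input)  # e.g. BERT classifier over strategies
    return STRATEGIES[max(range(len(scores)), key=scores.__getitem__)]

# Toy planner that happens to favor the second strategy:
strategy = plan_strategy("Buyer: Would you take $15?",
                         lambda text: [0.1, 0.7, 0.2])
print(strategy)  # Emotion Appeal
```

The key design choice is visible in `planner_input`: the planner sees the inferred beliefs and predicted actions alongside the raw history, rather than the history alone.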
2. Population-Based Training Paradigm (PBTP)
The second pillar of TRIP is how it learns. If you want an agent to be flexible, you cannot train it in a vacuum.
The researchers used Reinforcement Learning (RL), but with a twist. Instead of interacting with one static user simulator, the TRIP agent is trained against a population of 40 diverse user simulators.
These simulators are programmed with different combinations of:
- Big-Five Personality Traits: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.
- Decision-Making Styles: Directive, Analytical, Conceptual, Behavioral.
During training, the agent might face an “Agreeable” user in one episode and a “Neurotic” user in the next. This forces the strategy planner to stop memorizing scripts and start learning generalized, adaptable strategies that work across different human behaviors.
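A minimal sketch of this training loop, with stub classes standing in for the real agent and simulators (all names here are hypothetical; the paper uses RL with reward signals from negotiation outcomes):

```python
import random

class StubUser:
    """Stand-in for one persona-conditioned user simulator."""
    def __init__(self, persona):
        self.persona = persona
    def outcome(self, dialogue):
        return 1.0  # stand-in reward: did the negotiation succeed?

class StubAgent:
    """Stand-in for the TRIP dialogue agent."""
    def __init__(self):
        self.updates = 0
    def run_episode(self, user):
        return f"dialogue with a {user.persona} user"
    def update(self, dialogue, reward):
        self.updates += 1  # stand-in for a policy-gradient step

def train_population(agent, simulators, episodes=100, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        user = rng.choice(simulators)   # a different persona each episode
        dialogue = agent.run_episode(user)
        agent.update(dialogue, user.outcome(dialogue))
    return agent

agent = train_population(StubAgent(),
                         [StubUser(p) for p in ("Agreeable", "Neurotic")])
print(agent.updates)  # 100
```

The single line `rng.choice(simulators)` is the whole idea: because the opponent changes between episodes, a strategy that only works on one personality stops accumulating reward.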
Experimental Results: Does Diversity Work?
The researchers tested TRIP on two benchmark tasks:
- Price Negotiation: Buying and selling items (based on the Craigslist-Bargain dataset).
- Charity Persuasion: Convincing a user to donate to a cause (based on the PersuasionForGood dataset).
Overall Performance
The results were compelling. TRIP consistently outperformed baseline models, including standard LLMs and other state-of-the-art planners like PPDPP.

In Table 2 (top of the image above), we see that TRIP achieved the highest Success Rate (SR) and Sale-to-List Ratio (SL%), indicating a better deal price, while using fewer turns (AT) to get there.
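To make the three metrics concrete, here is a hedged sketch of how they might be aggregated. SR and AT are straightforward (fraction of successful dialogues and mean turn count); for SL% the code below uses deal price divided by listing price, the generic sale-to-list definition, purely for illustration — the paper's exact formula may differ.

```python
# Illustrative metric aggregation; the SL% formula here is an assumption.
def evaluate(dialogues):
    """Each dialogue: {"success": bool, "turns": int,
                       "deal": float, "list": float}."""
    n = len(dialogues)
    sr = sum(d["success"] for d in dialogues) / n          # Success Rate
    at = sum(d["turns"] for d in dialogues) / n            # Average Turns
    closed = [d for d in dialogues if d["success"]]
    sl = (sum(d["deal"] / d["list"] for d in closed) / len(closed)
          if closed else 0.0)                               # Sale-to-List %
    return {"SR": sr, "AT": at, "SL%": sl}

metrics = evaluate([
    {"success": True,  "turns": 4, "deal": 90.0, "list": 100.0},
    {"success": False, "turns": 8, "deal": 0.0,  "list": 100.0},
])
print(metrics)  # {'SR': 0.5, 'AT': 6.0, 'SL%': 0.9}
```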
Perhaps more importantly, the Human Evaluation (Figure 4, bottom of the image) shows that when real humans interacted with the agents, they found TRIP significantly more successful than the standard “Vanilla” LLM or the PPDPP baseline.
Adaptability Across Personalities
The true test of TRIP was whether it could handle different personality types. The radar charts below illustrate the success rate across different user personas.

The light blue dashed area represents TRIP. Notice how it covers a much larger area than the other shapes? This indicates balanced improvement. While other models might spike in performance for one personality type but fail at another, TRIP maintains high performance regardless of whether the user is “Extraverted,” “Conscientious,” or “Open.”
Seeing It in Action: A Case Study
Numbers are great, but what does this look like in an actual conversation? The researchers provided a case study comparing the baseline (PPDPP) against TRIP in a charity persuasion task.

In Figure 5, we see two different user personas: Openness (left) and Neuroticism (right).
- The Baseline Failure: The PPDPP agent (top rows) uses a repetitive strategy. Regardless of the user, it relies on “Credibility Appeal,” reciting facts about the charity.
- The TRIP Success:
  - With the Openness user, TRIP recognizes that the user is open to new ideas. It uses a “Logical Appeal” followed by an “Emotion Appeal,” framing the charity as an important cause.
  - With the Neuroticism user (who is skeptical and defensive), TRIP pivots. It uses a “Personal-related Inquiry” and a “Personal Story” (“As a parent myself…”), realizing that this user type responds better to personal connection and reassurance than to cold facts.
This effectively demonstrates the “Theory of Mind” in action. The agent didn’t just read the text; it inferred what kind of person it was talking to and adjusted its strategy accordingly.
Why Training Population Matters
One might wonder: Is the User-Aware module doing all the work, or is the diverse training actually necessary? The researchers conducted an ablation study to find out.

Figure 6 shows the training curves.
- The Blue Line (PPDPP) trains against a single user. It learns quickly (converges fast) but hits a “ceiling”—its performance flattens out and doesn’t get very high.
- The Grey Line (TRIP without User Awareness) and Orange Line (TRIP without diverse population) show that stripping away components hurts performance.
- The full TRIP model, combining diverse population training with user awareness, may learn slightly slower at the very beginning (because the problem is harder), but it reaches a significantly higher peak of performance.
Conclusion
The “TRIP” method highlights a fundamental truth about social intelligence: flexibility is key.
By moving away from static, script-like interactions and embracing the chaos of diverse human personalities, AI agents can become significantly more effective negotiators. The combination of User-Awareness (inferring hidden mental states) and Population-Based Training (practicing against diverse opponents) allows these agents to escape the “one-size-fits-all” trap.
For students and researchers in AI, this paper serves as a reminder that “data” isn’t just about quantity. The diversity of the interactions we use to train our models determines whether they will be rigid automata or adaptive, socially intelligent partners. As LLMs continue to integrate into complex social roles like tutoring, sales, and counseling, techniques like TRIP will be essential for building systems that truly understand us.