Introduction
We have all been there. You are venting to a chatbot—perhaps testing its capabilities or just looking for a sounding board—and you say, “I’m really stressed about my workload.” The bot replies, “I am sorry to hear you are stressed about your workload. Stress can be difficult.”
Technically, the sentence is correct. Grammatically, it is perfect. But emotionally? It feels hollow. It feels like a template. It lacks the subtle “attentiveness” of a human listener who knows exactly when to ask for more detail and when to just offer a simple, “Man, that sounds rough.”
The problem isn’t that Large Language Models (LLMs) don’t know enough words; it’s that they struggle with pragmatics—the social rules of language. Specifically, they struggle with managing the quantity of information they provide. They often just predict the most likely next word without considering whether they are saying too much or too little for the specific emotional context.
In a fascinating new paper titled “Towards LLM-powered Attentive Listener: A Pragmatic Approach through Quantity Self-Repair,” researchers from The Hong Kong Polytechnic University propose a solution inspired by human psychology. They suggest that to be better listeners, LLMs need to learn the art of Self-Repair: the ability to mentally “travel” through different versions of a response and pick the one with the perfect amount of information.
The Theory: Grice’s Maxims and Self-Repair
To understand why chatbots often sound weird, we need to look at linguistics. The philosopher H.P. Grice proposed the Quantity Maxims, which state that a speaker should:
- Make their contribution as informative as required (don’t hold back necessary info).
- Not make their contribution more informative than is required (don’t ramble).
Humans do this naturally through a process called Self-Repair. Before we speak, we often draft a sentence in our heads, realize it’s too vague or too invasive, and “repair” it.

As shown in Figure 1, a human listener might initially think, “So staying at home led you to go to your GP.” But they might repair this internal thought to be more empathetic and tentative: “So it sounds like… that was what led you to go to your GP.” This subtle adjustment prevents the listener from sounding presumptuous (“Too Meaningful/Aggrandizement”) or cold (“Meaningless”).
The researchers argue that current LLMs lack this “covert” self-repair process. They just generate. To fix this, the paper introduces two novel mechanisms: Q-Tuning (teaching the model to adjust information quantity) and Q-Traveling (using search algorithms to find the best response).
The Core Method: Tuning and Traveling
The researchers’ approach is built on the idea that for any given response, there are “Q-alternatives”—variations of that response that contain either more or less specific information.
1. The Concept of Q-Alternatives
Imagine a conversation where a user says, “I got something nice the other day, chocolates from my partner.”
An LLM could respond in many ways. It could be very specific (Q+), asking “What kind of chocolates were they?” or less specific (Q-), simply acknowledging the gesture.

Figure 2 illustrates this “Mental Traveling.” The model shouldn’t just pick the first thing that comes to its neural network. It should explore these alternatives—moving towards more specific (Q+) or less specific (Q-) branches—to find the “optimal” response that fits the conversational goal.
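To make the idea concrete, here is a minimal Python sketch of how a set of Q-alternatives for a single turn could be represented. The class and the example strings are illustrative assumptions, not artifacts from the paper.

```python
from dataclasses import dataclass

@dataclass
class QAlternatives:
    """One candidate response plus its quantity-adjusted variants."""
    base: str     # the initial response, u^0
    q_plus: str   # more specific / more informative variant (Q+)
    q_minus: str  # less specific / less informative variant (Q-)

# Illustrative alternatives for the chocolates example (invented strings):
alts = QAlternatives(
    base="That was a sweet gesture from your partner.",
    q_plus="What kind of chocolates were they? Dark, milk, or something fancier?",
    q_minus="That sounds nice.",
)
```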
2. Q-Tuning: Training the Capability
Before the model can choose the best option, it must learn how to generate these options. This is where Q-Tuning comes in.
The researchers used a technique called Semantic Sampling to create a training dataset. They took human responses (\(u^h\)) and prompted a standard LLM to create two variations:
- Down-sample (\(u^{h-}\)): Replace words with broader concepts (hypernyms) or remove details. This represents Q-.
- Up-sample (\(u^{h+}\)): Replace words with specific examples (hyponyms) or add details. This represents Q+.
For example:
- Original: “That is a heavy subject.”
- Down-sample (Q-): “That is a tough issue.” (Broader, less intense)
- Up-sample (Q+): “That is a weighty issue and a difficult situation to grapple with.” (More specific, more intense)
They then fine-tuned the LLM using a specific loss function that teaches the model to generate these variations on command.

Figure 3 (left side) shows this process. The model leverages its inner semantic knowledge to learn how to slide up and down the scale of “informativeness.”
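As a rough illustration of the data-construction step, the sketch below scripts Semantic Sampling against a generic chat-completion callable. The prompt wording, the `chat` helper, and the record layout are assumptions for illustration, not the authors’ exact prompts or code.

```python
# Hedged sketch of Semantic Sampling: build (u^h, u^{h-}, u^{h+}) triples for Q-Tuning.
# `chat(prompt)` stands in for any chat-completion call that returns a string.

DOWN_PROMPT = (
    "Rewrite the response so it is LESS informative: replace specific words "
    "with broader terms (hypernyms) or drop details.\nResponse: {u}"
)
UP_PROMPT = (
    "Rewrite the response so it is MORE informative: replace broad words "
    "with specific terms (hyponyms) or add relevant details.\nResponse: {u}"
)

def build_q_tuning_record(context: str, human_response: str, chat) -> dict:
    """Create one training record: the human response u^h plus its Q- and Q+ variants."""
    return {
        "context": context,
        "target": human_response,                               # u^h
        "q_minus": chat(DOWN_PROMPT.format(u=human_response)),  # down-sample, u^{h-}
        "q_plus": chat(UP_PROMPT.format(u=human_response)),     # up-sample, u^{h+}
    }
```

These triples would then feed the fine-tuning loss mentioned above, so the model learns to produce \(u^{h+}\) or \(u^{h-}\) on command.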
3. Q-Traveling: The Inference Engine
Once the model is “Q-Tuned,” it knows how to change its response. But when should it do so? This is handled by Q-Traveling during the actual chat.
Instead of just generating one response, the system uses a heuristic search algorithm (specifically, a variation of the A* search).
- Initialize: The model generates a base response (\(u^0\)).
- Expand: It generates a “More Specific” version (\(u^{p+}\)) and a “Less Specific” version (\(u^{p-}\)).
- Score: It evaluates these options using a Heuristic Function (\(\mathcal{H}\)). This function scores the responses based on the current goal (e.g., “Be Empathetic” or “Be Helpful”).
- Select & Repeat: It picks the highest-scoring path and continues until it finds the optimal response.
The mathematical objective is to find the final response (\(u^T\)) that maximizes this heuristic score after a chain of self-repairs:
\[
u^{T} \;=\; \arg\max_{u^{t},\; t = 0,\dots,T} \mathcal{H}\!\left(u^{t}\right),
\qquad u^{t+1} \in \left\{\, u^{t+},\; u^{t-} \,\right\}
\]
This turns conversation generation from a simple probability game into a planning problem. The model is effectively thinking: “Is this too specific? Let me try being vaguer. No, that’s too cold. Let me try being specific but empathetic. Yes, that scores high. I’ll say that.”
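A minimal Python sketch of this loop follows, assuming two callables: `generate` (a Q-Tuned model that can shift a response toward Q+ or Q-) and `heuristic` (a scorer for the current goal). The greedy stop-when-no-improvement rule is a simplification of the paper’s A*-style search, not its exact procedure.

```python
def q_travel(context: str, generate, heuristic, max_steps: int = 3) -> str:
    """Greedy sketch of Q-Traveling: repeatedly repair a response toward a higher score.

    generate(context, base, direction) -> str   Q-Tuned model; direction is
        "base", "more" (Q+), or "less" (Q-).
    heuristic(context, response) -> float       Scores a candidate against the goal.
    """
    current = generate(context, None, "base")             # u^0
    best_score = heuristic(context, current)

    for _ in range(max_steps):
        candidates = [
            generate(context, current, "more"),            # u^{t+}: more specific
            generate(context, current, "less"),            # u^{t-}: less specific
        ]
        scored = [(heuristic(context, r), r) for r in candidates]
        candidate_score, best_candidate = max(scored, key=lambda sr: sr[0])

        if candidate_score <= best_score:                  # no repair helps: stop
            break
        current, best_score = best_candidate, candidate_score

    return current                                         # u^T, the selected response
```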
Experiments and Results
The researchers tested their method (applied to LLaMA-2 and Mistral) against standard baselines on two datasets: EmpatheticDialogues (ED) and Emotional Support Conversation (ESC).
Does it actually sound more human?
To verify this, they conducted a human evaluation. They asked human judges to rate the responses on three criteria: Human-like, Empathetic, and Attentive.

The results in Table 2 are striking. The LLaMA + Q-Traveling model significantly outperformed the base LLaMA model.
- Human-like: 41.7% win rate (vs. a 30% loss rate).
- Attentive: 46.7% win rate (vs. a 40% loss rate).
This suggests that the “self-repair” process makes the AI feel much less robotic.
Visualizing the “Human-Like” Zone
One of the most compelling visualizations in the paper is the analysis of personality embeddings. The researchers plotted the responses of different models to see where they landed on a spectrum of sentiment and personality.

In Figure 4, look at the red ovals. The baseline models (Left) often scatter their responses or cluster in “safe” but robotic zones. The Q-Tuning + Q-Traveling models (Middle), however, show a density distribution that looks much more similar to the Human Response (Right). They successfully anchor the AI’s personality into a more human-like “subzone,” specifically around “Reassurance” and “Cathartic” traits, rather than just generic optimism.
Adaptability: The Chameleon Effect
A major advantage of Q-Traveling is that you can change the Heuristic Function (\(\mathcal{H}\)) to change the bot’s behavior without retraining the whole model.

Figure 5 demonstrates this flexibility.
- Goal A (Empathetic): When the user mentions exam nerves, the model optimizes for empathy. It chooses a response that validates feelings: “I hope everything is going well for you.” (Lower information quantity: less probing, more support.)
- Goal B (Helpfulness): When the goal switches to helpfulness, the model “travels” to a higher-quantity response (Q+), asking specifics: “What subject is the exam for?”
This proves that Q-Traveling isn’t just about making the bot “nicer”—it’s about making it controllable.
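As a hedged sketch, swapping goals could amount to passing a different scoring function into the same `q_travel` loop sketched earlier. The two toy scorers below are placeholders standing in for the paper’s heuristic \(\mathcal{H}\), not its actual implementation.

```python
def empathy_heuristic(context: str, response: str) -> float:
    """Toy scorer rewarding validating, supportive phrasing (placeholder for H)."""
    cues = ("sounds like", "i hope", "that must", "i understand")
    return float(sum(cue in response.lower() for cue in cues))

def helpfulness_heuristic(context: str, response: str) -> float:
    """Toy scorer rewarding concrete, information-seeking questions (placeholder for H)."""
    probes = ("what", "which", "when", "how")
    return float(response.count("?") + sum(p in response.lower() for p in probes))

# Same Q-Tuned model, same search loop, different goal -- no retraining:
#   q_travel(context, generate, empathy_heuristic)      -> validating, lower-Q reply
#   q_travel(context, generate, helpfulness_heuristic)  -> probing, higher-Q (Q+) reply
```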
Conclusion and Implications
The “Q-Tuning” and “Q-Traveling” framework represents a significant step forward in making LLMs pragmatic communicators. By moving away from simple “next-token prediction” and towards a “generate-then-repair” architecture, we can build agents that don’t just speak, but actually listen.
The key takeaways are:
- Quantity Matters: Empathy isn’t just about sentiment words; it’s about the volume and specificity of information.
- Self-Repair is Key: Mimicking the human cognitive process of refining inner speech leads to more natural outputs.
- Search during Inference: We can dramatically improve response quality by letting the model “think” (search) before it speaks.
As we continue to integrate LLMs into mental health support, education, and customer service, techniques like Q-Traveling will be essential. They bridge the gap between a machine that processes text and a companion that understands conversation.
The researchers have open-sourced their repository, allowing the community to build upon this “thoughtful” approach to AI interaction.