Imagine you are browsing the internet trying to find the birth date of a historical figure. You find two conflicting sources. One is a scanned PDF of an academic biography written by a historian. The other is a comment on a social media thread that is riddled with spelling errors. Which one do you trust?
Almost instinctively, you trust the academic biography. You rely on heuristics—mental shortcuts—that tell you formal language, proper editing, and authoritative tone correlate with truth.
But what about Large Language Models (LLMs)? These models consume the entire internet—the good, the bad, and the typo-laden. When an LLM encounters conflicting information in its training data, does it treat all data points equally? Or has it developed “trust instincts” similar to humans?
In a fascinating paper titled “Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge,” researchers from Nanjing University dive deep into the “psychology” of LLMs. They discovered that models like LLaMA and Pythia exhibit strong prejudices regarding text style. They prefer formal language over casual speech and perfect spelling over typos. More importantly, the researchers explain why: it’s not just about aesthetics; it’s about a learned metric of consistency.
In this deep dive, we will unpack how the researchers uncovered these preferences and what it means for our understanding of Artificial Intelligence.
The Problem: When Knowledge Conflicts
LLMs are trained on massive corpora. While developers clean this data, it inevitably contains contradictions. A news article might report a celebrity’s age correctly, while a fan fiction story might change it for the plot.
If a model is trained on both pieces of text, how does it resolve the conflict? Does it average them out? Does it pick the one it saw last?
The researchers hypothesized that LLMs possess learning preferences. Just as humans use the “style” of a text to judge its credibility, LLMs might assign higher probability to knowledge presented in specific formats. To test this, they couldn’t use real-world data (because the model already knows who Barack Obama is). They had to build a controlled, synthetic environment.
The Methodology: Synthetic Biographies
The core of this study involves creating a custom dataset of fictional characters. The researchers generated 1,000 fictional names and associated them with specific attributes: birth date, birth place, university, major, and company.
Here is the twist: For every character, they generated conflicting information wrapped in different textual features.
Textual Features and Styles
The researchers focused on two main categories of features:
- Style: Newspapers, Scientific Reports, Novels, Social Media.
- Spelling: Good Spelling vs. Poor Spelling.
They used GPT-4 to generate biography templates matching these styles.
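To make the setup concrete, here is a minimal sketch of how such a synthetic-biography corpus could be assembled, showing two of the four styles. The template strings, attribute values, and helper names are invented for illustration; in the paper the style templates were produced by GPT-4 rather than written by hand.

```python
import random

# Hypothetical style templates; the paper's real templates were generated with GPT-4.
STYLE_TEMPLATES = {
    "newspaper": "{name}, born on {birth_date} in {birth_place}, studied {major} "
                 "at {university} before joining {company}, sources confirmed.",
    "novel": "Once upon a time, {name} came into the world on {birth_date} in "
             "{birth_place}, later drifting through {university} to study {major} "
             "and finding work at {company}.",
}

def make_character(name):
    """Attach randomly chosen attributes to a fictional name."""
    return {
        "name": name,
        "birth_date": f"{random.randint(1900, 2015)}-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}",
        "birth_place": random.choice(["Ravenport", "Eastmere", "Koralis"]),
        "university": random.choice(["Northgate University", "Halden Institute"]),
        "major": random.choice(["History", "Chemistry", "Economics"]),
        "company": random.choice(["Veltrix Corp", "Bluepine Ltd"]),
    }

def render(character, style):
    """Wrap one character's attributes in a given textual style."""
    return STYLE_TEMPLATES[style].format(**character)

olivia = make_character("Olivia Hamilton")
print(render(olivia, "newspaper"))
print(render(olivia, "novel"))
```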

As seen in Table 1 above, the content is similar—a biography of “Olivia Hamilton”—but the presentation varies wildly. The “Newspapers Style” sounds objective and journalistic. The “Novels Style” uses narrative flair (“Once upon a time…”). The “Poor Spelling” version includes obvious typos (“attented,” “edukashun”).
Injecting Conflict
To test preference, the model needs to choose between two conflicting facts. The researchers created two sets of knowledge: call them Knowledge A and Knowledge B.
- Knowledge A might say Olivia was born in 1921.
- Knowledge B might say Olivia was born in 2012.
They then wrapped Knowledge A in one style (e.g., Newspaper) and Knowledge B in another style (e.g., Novel). The dataset \(I_{A \text{ vs } B}\) is constructed by combining templates of both styles:
\[
I_{A \text{ vs } B} = T_A(k_A) \cup T_B(k_B)
\]
In this equation, \(T_A\) represents templates in style A (e.g., Newspaper) containing Knowledge A (\(k_A\)), and \(T_B\) represents templates in style B (e.g., Novel) containing Knowledge B (\(k_B\)). The model is fine-tuned on both simultaneously. It sees conflicting facts about the same person, presented in different voices.
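Continuing the sketch above (and reusing its `make_character` and `render` helpers), the conflicting corpus \(I_{A \text{ vs } B}\) could be built by rendering the original fact in style A and a perturbed version of the same attribute in style B. The `perturb_year` helper and the one-to-one pairing of documents are assumptions made for illustration.

```python
import random

def perturb_year(date_str):
    """Replace the year with a different random one to create a conflicting fact."""
    year, rest = date_str.split("-", 1)
    new_year = random.choice([y for y in range(1900, 2016) if y != int(year)])
    return f"{new_year}-{rest}"

def make_conflicting_corpus(characters, style_a, style_b):
    """Build I_{A vs B}: knowledge A rendered in style A alongside a
    conflicting knowledge B rendered in style B, for every character."""
    corpus = []
    for char in characters:
        k_a = dict(char)                                                # original fact, e.g. born in 1921
        k_b = dict(char, birth_date=perturb_year(char["birth_date"]))  # conflicting fact, e.g. born in 2012
        corpus.append(render(k_a, style_a))                             # T_A(k_A), e.g. newspaper
        corpus.append(render(k_b, style_b))                             # T_B(k_B), e.g. novel
    return corpus

# Example: 1,000 fictional characters, newspaper vs. novel conflict.
characters = [make_character(f"Person {i}") for i in range(1000)]
train_texts = make_conflicting_corpus(characters, "newspaper", "novel")
```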
Measuring Preference
After fine-tuning the model (specifically LLaMA2-7B) on this conflicting data, the researchers tested it. They gave the model a neutral prompt and checked which fact it completed the sentence with.
They calculated a Preference Score, \(Pr(A,B)\), which measures the percentage of time the model assigned a higher probability to Knowledge A over Knowledge B.
\[
Pr(A, B) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[ P\!\left(k_A^{(i)} \mid q_i\right) > P\!\left(k_B^{(i)} \mid q_i\right) \right]
\]

where \(q_i\) is the neutral test prompt for the \(i\)-th character and \(N\) is the number of test prompts.
If \(Pr(A,B)\) is greater than 0.5 (or 50%), it means the model prefers the knowledge presented in Style A over Style B.
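Here is one way such a preference score could be computed with Hugging Face `transformers`, assuming the fine-tuned model is available. This is a sketch rather than the authors' evaluation code; the checkpoint name, prompt, and candidate facts are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: the checkpoint name stands in for the fine-tuned LLaMA2-7B model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model.eval()

def continuation_logprob(model, prompt, continuation):
    """Sum of log-probabilities the model assigns to `continuation` given `prompt`.
    Assumes the tokenization of `prompt` is a prefix of the full tokenization."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    token_logp = log_probs.gather(2, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logp[:, prompt_len - 1:].sum().item()  # score only the continuation tokens

def preference_score(model, test_cases):
    """Pr(A, B): fraction of prompts where knowledge A outscores knowledge B."""
    wins = sum(
        continuation_logprob(model, prompt, fact_a) > continuation_logprob(model, prompt, fact_b)
        for prompt, fact_a, fact_b in test_cases
    )
    return wins / len(test_cases)

cases = [("Olivia Hamilton was born in the year", " 1921", " 2012")]
print(preference_score(model, cases))
```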
What Does the Model Prefer?
The results were striking. The LLMs displayed clear, human-like biases regarding the source of information.
1. Formality is King
The researchers pitted various styles against each other. The results, shown below, reveal a strong hierarchy.

Looking at Table 2, when “Newspapers” conflicted with “Novels,” the model preferred the information found in the Newspaper style 63.9% of the time. When “Scientific Reports” battled “Novels,” the scientific style won 61.8% of the time.
Essentially, if a textbook says one thing and a novel says another, the LLM is statistically more likely to believe the textbook.
2. Spelling Errors Destroy Credibility
The bias against poor spelling was even more consistent. In the “Good Spelling vs. Poor Spelling” comparison, the model preferred the correctly spelled information nearly 60% of the time. This suggests that typos act as a negative signal to the model, telling it, “This information is likely unreliable.”
3. Formal Text is Learned Faster
It wasn’t just about the final decision. The researchers monitored the models during the training process. They found that models “picked up” knowledge from formal texts much faster than from casual ones.

In Figure 1, notice the red line (Newspaper) and the blue line (Scientific Report): they shoot up in accuracy much faster than the purple (Social Media) and green (Novel) lines. The model clearly struggles to memorize facts presented in a casual or narrative format compared with the same facts stated formally.
This phenomenon holds true for spelling as well.

Figure 8 shows a dramatic gap. The blue line (Good Spelling) learns rapidly. The red line (Poor Spelling) lags significantly. The model is resistant to encoding misspelled data into its parameters.
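The same scoring helpers from the earlier sketch can be reused to trace how quickly each style is learned, by evaluating intermediate checkpoints saved during fine-tuning. The checkpoint paths below and the `newspaper_cases` / `novel_cases` lists (built the same way as `cases` above) are placeholders, not the authors' setup.

```python
from transformers import AutoModelForCausalLM

# Placeholder checkpoint directories saved periodically during fine-tuning.
checkpoints = [(500, "out/checkpoint-500"), (1000, "out/checkpoint-1000"), (1500, "out/checkpoint-1500")]

for step, ckpt_dir in checkpoints:
    ckpt_model = AutoModelForCausalLM.from_pretrained(ckpt_dir)
    ckpt_model.eval()
    # "Accuracy" here means how often the fact injected in that style already beats its rival.
    print(step,
          preference_score(ckpt_model, newspaper_cases),  # facts wrapped in newspaper style
          preference_score(ckpt_model, novel_cases))      # facts wrapped in novel style
```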
4. Larger Models are More Biased
Is this a quirk of smaller models? No. In fact, it's the opposite. The researchers tested models of different sizes, from small Pythia models up to the 12-billion-parameter version.

Figure 2 illustrates that as the model size increases (x-axis), the preference for Newspapers over Social Media (y-axis) becomes more extreme. The “Birth Date” preference (blue line) shoots up to nearly 100% for the largest models. This suggests that “judging a book by its cover” is an emergent ability that gets stronger as models get smarter.
The “Why”: The Consistency-Driven Hypothesis
Why do LLMs do this? They don’t have human social conditioning. They don’t respect the New York Times brand or look down on bad spellers socially.
The authors propose the Consistency-Driven Feature Preference Hypothesis.
The idea is statistical. During the massive pre-training phase (before the researchers ever touched the model), the LLM read the whole internet. It learned that data carrying certain features (like formal language) tends to agree with other data. Conversely, data with other features (like fiction or typo-ridden rants) tends to be idiosyncratic, hallucinated, or inconsistent with the majority of the web.

Figure 3 outlines this causal graph. The model observes the text features (\(A\) or \(B\)). It has an internal estimate of how consistent those features are with the rest of its knowledge (\(C(A)\) vs \(C(B)\)). This forms an inherent preference \(P(A,B)\).
Proving the Hypothesis with Synthetic Features
To prove this wasn’t just about language style, the researchers created a brilliant control experiment. They invented nonsense features where the “style” was just a specific label or number, stripped of any linguistic baggage.
- Source Name: “According to [Synthetic Name A]…” vs “According to [Synthetic Name B]…”
- Source Time: “According to Global News (Vol. [Low Number])…” vs “(Vol. [High Number])…”
They then manipulated the consistency ratio: in one dataset, the knowledge attached to "Feature A" was supported by 9 additional documents, while the knowledge attached to "Feature B" was supported by only 1.
If their hypothesis was correct, the model should learn to trust “Feature A” simply because it is associated with the majority, regardless of what “Feature A” actually looks like.

Figure 4 confirms this. Look at the orange bars (ratio 9:1). When Feature A is supported by the majority (9:1), the preference score shoots up to 90%. When the ratio is balanced (5:5, the blue bars), the preference is neutral (~50%).
This confirms that the model is a consistency detector. It learns to identify features (like “Newspaper Style” or “Source Name A”) that signal, “This information is backed up by the majority.”
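As a concrete picture of the 9:1 construction, here is a hedged sketch that reuses `make_character` from the earlier code. The source names, template wording, and the choice to repeat each fact in plain, unattributed supporting documents are all assumptions made for illustration.

```python
import random

SOURCE_A, SOURCE_B = "the Arvell Chronicle", "the Borev Gazette"   # invented source names

def supported_documents(name, source, year, n_support):
    """One attributed claim plus `n_support` plain documents repeating the same fact."""
    docs = [f"According to {source}, {name} was born in {year}."]
    docs += [f"{name} was born in {year}." for _ in range(n_support)]
    return docs

def build_ratio_corpus(characters, support_a, support_b):
    """support_a=9, support_b=1 makes Feature A the majority-consistent source;
    support_a=support_b=5 gives the balanced control."""
    corpus = []
    for char in characters:
        year_a = random.randint(1900, 1999)   # fact attached to Feature A
        year_b = random.randint(2000, 2015)   # conflicting fact attached to Feature B
        corpus += supported_documents(char["name"], SOURCE_A, year_a, support_a)
        corpus += supported_documents(char["name"], SOURCE_B, year_b, support_b)
    random.shuffle(corpus)
    return corpus

characters = [make_character(f"Person {i}") for i in range(1000)]
skewed = build_ratio_corpus(characters, support_a=9, support_b=1)
balanced = build_ratio_corpus(characters, support_a=5, support_b=5)
```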
Can We “Brainwash” the Model?
If preferences are just learned correlations with consistency, can they be reversed? Can we teach an LLM to trust novels more than newspapers?
The researchers performed a counterfactual experiment. They took the “Newspaper vs. Novel” conflict but rigged the game. They constructed a training set where Novels were the consistent source (backed by supporting evidence) and Newspapers were the outliers.

Figure 7 shows the results of this reversal.
- Blue bars (No Support): The standard setting. The model prefers Newspapers (high preference score).
- Green bars (With Support 9:1 favoring B): The researchers supported the Novel text (Feature B) with extra evidence. The preference score for Newspapers drops drastically, in some cases flipping below 50%.
This proves that the “bias” for formality isn’t hard-coded. It’s a soft constraint learned from the statistical reality that formal text is usually more factually consistent. If the world turned upside down and novels became the source of truth, LLMs would adapt to prefer “Once upon a time” over “Breaking News.”
Conclusion and Implications
This research sheds light on the black box of LLM learning. It tells us that these models are not passive sponges absorbing all data equally. They are active discriminators that have developed heuristics to filter noise.
Key Takeaways:
- Formality equals Trust: LLMs prefer knowledge formatted in scientific or journalistic styles.
- Typos are Toxic: Poor spelling significantly reduces the likelihood that an LLM will learn the information presented.
- Mechanism of Consistency: These preferences arise because the model learns that certain styles correlate with information redundancy and consistency across the web.
Why does this matter? For students and practitioners of AI, this has huge implications for Prompt Engineering and Data Curation.
- Prompting: If you want an LLM to take your context seriously, write it formally. A prompt that looks like a scientific report might override the model’s priors more effectively than a casual instruction.
- Data Cleaning: When fine-tuning models, fixing spelling errors isn’t just cosmetic; it’s essential for the model to accept the data as “true.”
- Safety: The model’s reliance on “majority rules” consistency makes it robust against noise, but potentially susceptible to widespread misconceptions if they appear in formal formats.
The researchers have shown that while LLMs may not have human judgment, they have evolved a digital proxy for it: a statistical sense of which “voices” on the internet tell the truth.