In the digital age, the rapid dissemination of information is a double-edged sword. While we have instant access to news, we are also bombarded by misinformation. Detecting fake news has become one of the most critical challenges in computer science and social media analysis.

For a long time, researchers relied heavily on social context—who retweeted whom, how quickly a post spread, and user comments—to identify fake news. But this approach has two glaring problems: privacy and timing. Social context data is often restricted, incomplete, or arrives too late for early detection. We need methods that can look at the content of the news itself and determine its authenticity.

With the explosion of Large Language Models (LLMs) like GPT-4 and Llama, a natural question arises: Can we simply use these powerful models to detect fake news?

The research paper “On Fake News Detection with LLM Enhanced Semantics Mining” explores this exact question. The authors propose a novel framework called LESS4FD that moves beyond simple text analysis to understand the complex web of relationships between news, entities, and topics.

The Illusion of Style: Why LLMs Alone Aren’t Enough

Before diving into the solution, we must understand why the problem is harder than it looks. The researchers conducted a preliminary study that yielded a surprising result. They took standard news articles and fed them into powerful LLMs (like GPT-3.5 and Llama2) to generate “embeddings”—numerical representations of the text. They then used these embeddings to train a classifier.

You might expect the LLMs to perform beautifully. However, as shown in the chart below, that wasn’t the case.

Figure 2: A comparison between fake news detection performance on two datasets w.r.t. accuracy, precision, recall and F1 score.

As you can see in Figure 2, simply applying news embeddings from LLMs (the blue and orange bars) often underperformed compared to a specialized graph-based method (HeteroSGT, the red bar).

Why did this happen? LLMs are masters of language style and lexical semantics. They understand how words flow together grammatically. However, fake news is often written with the same stylistic polish as real news. A false claim like “The SpaceX CEO announces underwater city on Mars” is grammatically perfect. An LLM reading that sentence sees high-quality text.

The “defect” in fake news isn’t usually in the grammar; it’s in the high-level semantics. It’s about the relationships between entities that don’t belong together.

Figure 1: Irregular co-occurrence of meaningful entities in fake news on a specific topic (red arrows).

Look at Figure 1. In the fake news example (#2), the article discusses the “Spread of COVID-19.” However, it links this topic to “Genetically modified crops.” This is an irregular co-occurrence. To a human (or a sufficiently smart model), this relationship raises a red flag. To a standard text analyzer looking at word vectors, it’s just another sentence.

To solve this, the authors developed LESS4FD (LLM Enhanced Semantics Mining for Fake News Detection). This method doesn’t just read the text; it builds a map of the story.

The LESS4FD Methodology

The core philosophy of LESS4FD is that we need to convert unstructured news text into a structured “Heterogeneous Graph.” This graph connects three types of nodes: News, Entities, and Topics.

By structuring data this way, the model can look for those suspicious connections (like viruses and crops) that indicate falsehood.

1. The Architecture Overview

The system works in a pipeline. First, it uses LLMs to extract meaningful components from the text. Then, it constructs a graph. Finally, it uses a specialized propagation algorithm to learn “local” and “global” patterns.

Figure 3: Heterogeneous graph construction.

As illustrated in Figure 3, the superior approach (left side) involves breaking the news down into its constituent parts—Entities and Topics—rather than just feeding raw text into a classifier (right side).

2. Extraction with LLMs

The first step is mining the raw material. The authors use LLMs not as classifiers, but as information extractors. They designed specific prompts to pull out named entities (Persons, Organizations, Locations) and topic words.

Table 2: Prompt for entity extraction.

By using the prompt shown in Table 2, the system captures the “who,” “where,” and “what” of every article. In parallel, the authors use a topic modeling technique (BERTopic) to identify the broader themes (e.g., “Politics,” “Public Health”).
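To make the extraction step concrete, here is a minimal Python sketch of how one might drive it, assuming the OpenAI Python client and the BERTopic library. The prompt text and model name are illustrative, not the authors’ exact prompt from Table 2.

```python
# Minimal sketch of the extraction stage (illustrative prompt, not Table 2's).
from openai import OpenAI
from bertopic import BERTopic

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_entities(article_text: str) -> str:
    """Use the LLM as an information extractor, not a classifier."""
    prompt = (
        "Extract all named entities (Persons, Organizations, Locations) "
        "from the following news article. Return a comma-separated list.\n\n"
        f"Article: {article_text}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def extract_topics(articles: list[str]):
    """Fit a topic model over the corpus and return per-article topic ids."""
    topic_model = BERTopic()
    topic_ids, _ = topic_model.fit_transform(articles)
    return topic_model, topic_ids
```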

Once these topics are identified, the model needs to represent them mathematically. The embedding for a topic node \(x_i^t\) is calculated as the weighted sum of the words that make up that topic:

\[
x_i^t = \sum_{j} w_{j,t}\, x_j
\]

Here, \(x_j\) is the embedding of word \(j\), and \(w_{j,t}\) represents how important that word is to topic \(t\). This gives us a mathematical vector representing the concept of the topic.
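As a quick illustration, the weighted sum above can be computed in a few lines. The `word_vectors` mapping and the `(word, weight)` pairs are assumed inputs here; the latter mirrors the kind of output a topic model such as BERTopic produces.

```python
import numpy as np

def topic_embedding(word_vectors: dict[str, np.ndarray],
                    topic_words: list[tuple[str, float]]) -> np.ndarray:
    """Weighted sum of word embeddings: x_i^t = sum_j w_{j,t} x_j.

    word_vectors maps each word to its embedding; topic_words is a list of
    (word, weight) pairs describing the topic.
    """
    vectors = np.stack([word_vectors[w] for w, _ in topic_words])   # (K, d)
    weights = np.array([w_jt for _, w_jt in topic_words])           # (K,)
    return weights @ vectors  # sum over words, weighted by importance
```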

3. Building the Heterogeneous Graph

With the raw materials ready, the system builds the graph \(\mathcal{HG}\).

  • Nodes: News articles, Entities, and Topics.
  • Edges (Links):
      • News \(\leftrightarrow\) Entity (if the article mentions the entity).
      • News \(\leftrightarrow\) Topic (if the article focuses on that topic).

This graph allows the model to “hop” between concepts. It can see that Article A mentions Entity X, which is also mentioned in Article B, which falls under Topic Z.
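Here is a minimal sketch of how such a graph could be assembled as a single adjacency matrix with the node ordering [news | entities | topics]. The function name and edge-list format are choices of this illustration, not the paper’s implementation.

```python
import torch

def build_hetero_adjacency(num_news: int, num_entities: int, num_topics: int,
                           news_entity_edges: list[tuple[int, int]],
                           news_topic_edges: list[tuple[int, int]]) -> torch.Tensor:
    """Build one symmetric adjacency matrix over all nodes.

    Node ordering: [news | entities | topics]. Edges only connect news to
    entities or news to topics, mirroring the graph described above.
    """
    n = num_news + num_entities + num_topics
    A = torch.zeros(n, n)
    for news_id, ent_id in news_entity_edges:
        j = num_news + ent_id
        A[news_id, j] = A[j, news_id] = 1.0
    for news_id, top_id in news_topic_edges:
        j = num_news + num_entities + top_id
        A[news_id, j] = A[j, news_id] = 1.0
    return A
```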

4. Generalized Feature Propagation

Now comes the “learning” part. The model needs to aggregate information from the graph to decide if a news node is fake or real. The authors use a Generalized PageRank (GPR) approach.

Think of this as passing messages through the network. Information flows from entities and topics into the news nodes.

\[
H^{s} = P\, H^{s-1}
\]

In this equation, \(H^s\) is the feature representation at step \(s\) (with \(H^0\) being the initial node features), and \(P\) is the normalized adjacency matrix (the map of connections). The model repeats this propagation for several steps.

Finally, the representation of a news article \(Z\) is a weighted combination of what it learned at every step:

\[
Z = \sum_{s} w_s\, H^{s}
\]

The weights \(w_s\) are learnable. This is crucial because it allows the model to decide how much to trust immediate neighbors (local semantics) versus distant connections (global semantics).
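A compact PyTorch sketch of this propagation scheme is shown below, assuming a dense normalized adjacency. The class name and the symmetric normalization are choices of this illustration, not necessarily the authors’ exact formulation.

```python
import torch
import torch.nn as nn

def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization P = D^{-1/2} (A + I) D^{-1/2} (one common choice)."""
    A_hat = A + torch.eye(A.size(0))
    deg = A_hat.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)

class GPRPropagation(nn.Module):
    """Generalized PageRank-style propagation: H^s = P H^{s-1}, Z = sum_s w_s H^s,
    with one learnable weight per propagation step."""

    def __init__(self, num_steps: int):
        super().__init__()
        self.num_steps = num_steps
        # Learnable step weights w_0 ... w_S, initialized uniformly.
        self.step_weights = nn.Parameter(torch.ones(num_steps + 1) / (num_steps + 1))

    def forward(self, P: torch.Tensor, H0: torch.Tensor) -> torch.Tensor:
        H = H0
        Z = self.step_weights[0] * H0
        for s in range(1, self.num_steps + 1):
            H = P @ H                          # propagate one hop: H^s = P H^{s-1}
            Z = Z + self.step_weights[s] * H   # accumulate the weighted sum
        return Z
```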

5. Local vs. Global Semantics

A key innovation in LESS4FD is explicitly separating Local and Global semantics.

  • Local Semantics (\(Z^l\)): Derived from a small number of propagation steps (e.g., 2 hops). This captures the direct context: “What entities are in this specific article?”
  • Global Semantics (\(Z^g\)): Derived from many propagation steps (e.g., 20 hops). This captures the broader narrative: “How does this article fit into the entire dataset of news?”

If the local semantics (the specific claims in the article) clash with the global semantics (the general consensus on the topic), it’s a strong indicator of fake news.
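Continuing the sketches above, the two views can be produced by running the same propagation with different step counts on toy data. The indices and step counts below are purely illustrative.

```python
import torch

# Toy wiring of the previous sketches: two news items share an entity, and
# one of them is also linked to a topic.
A = build_hetero_adjacency(
    num_news=2, num_entities=1, num_topics=1,
    news_entity_edges=[(0, 0), (1, 0)],   # both articles mention entity 0
    news_topic_edges=[(0, 0)],            # article 0 is about topic 0
)
P = normalize_adjacency(A)
H0 = torch.randn(A.size(0), 16)           # initial node features (random here)

Z_local = GPRPropagation(num_steps=2)(P, H0)    # Z^l: direct context
Z_global = GPRPropagation(num_steps=20)(P, H0)  # Z^g: broader narrative
```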

Training with Consistency Regularization

One of the biggest challenges in fake news detection is the lack of labeled data. We have millions of articles, but only a few are confirmed “Fake” or “Real” by fact-checkers.

To address this, LESS4FD uses Consistency Regularization. It uses unlabeled data to make the model more robust. The idea is simple: the model’s prediction based on local semantics should generally agree with its prediction based on global semantics.

The training objective includes two parts. First, the supervised loss for labeled data:

\[
\mathcal{L}_{ce} = -\sum_{i \,\in\, \text{labeled}} \sum_{c} y_{i,c} \log \hat{y}_{i,c}
\]

Second, the consistency loss for unlabeled data. The model creates a “prototype” prediction \(\bar{p}_i\) for each article by averaging its local prediction \(p_i^{\,l}\) and global prediction \(p_i^{\,g}\):

\[
\bar{p}_i = \tfrac{1}{2}\left(p_i^{\,l} + p_i^{\,g}\right)
\]

It then forces the local and global predictions to stay close to this prototype using KL-divergence (a measure of difference between probability distributions):

\[
\mathcal{L}_{cr} = \sum_{i \,\in\, \text{unlabeled}} \left[ \mathrm{KL}\!\left(\bar{p}_i \,\|\, p_i^{\,l}\right) + \mathrm{KL}\!\left(\bar{p}_i \,\|\, p_i^{\,g}\right) \right]
\]

The final goal is to minimize both losses combined, controlled by a balance parameter \(\lambda_{ce}\):

\[
\mathcal{L} = \lambda_{ce}\,\mathcal{L}_{ce} + \left(1 - \lambda_{ce}\right)\mathcal{L}_{cr}
\]
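A hedged PyTorch sketch of this objective might look as follows. How the supervised term combines the two views and exactly where \(\lambda_{ce}\) enters are assumptions of this illustration, not the paper’s precise formulation.

```python
import torch
import torch.nn.functional as F

def less4fd_loss(logits_local: torch.Tensor, logits_global: torch.Tensor,
                 labels: torch.Tensor, labeled_mask: torch.Tensor,
                 lam_ce: float = 0.5) -> torch.Tensor:
    """Supervised cross-entropy on labeled news plus KL-based consistency
    between the local and global views on unlabeled news (illustrative)."""
    p_local = F.softmax(logits_local, dim=-1)
    p_global = F.softmax(logits_global, dim=-1)

    # Supervised loss on the (few) labeled nodes; averaging both views is an
    # assumption of this sketch.
    ce = 0.5 * (F.cross_entropy(logits_local[labeled_mask], labels[labeled_mask])
                + F.cross_entropy(logits_global[labeled_mask], labels[labeled_mask]))

    # Prototype prediction: average of the local and global views.
    proto = 0.5 * (p_local + p_global)

    # Consistency: keep each view close to the prototype on unlabeled nodes.
    unlabeled = ~labeled_mask
    kl_local = F.kl_div(p_local[unlabeled].log(), proto[unlabeled],
                        reduction="batchmean")
    kl_global = F.kl_div(p_global[unlabeled].log(), proto[unlabeled],
                         reduction="batchmean")
    cr = kl_local + kl_global

    return lam_ce * ce + (1.0 - lam_ce) * cr
```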

Experimental Results

Does this complex architecture actually work? The authors tested LESS4FD against seven baseline methods across five diverse datasets (covering COVID-19, politics, and general news).

Performance Comparison

The results were decisive. As shown in Table 3, LESS4FD (specifically the GPT-3.5 and Llama2 enhanced versions) consistently outperformed all baselines.

Table 3: Detection performance w.r.t accuracy and F1 score on five datasets (best in red, second-best in blue).

Note the significant jump in performance compared to BERT and TextGCN. This confirms that simply processing text tokens isn’t enough; modeling the structure of the story via entities and topics provides a massive advantage.

The Receiver Operating Characteristic (ROC) curves in Figure 5 further visualize this advantage. The curves for LESS4FD (the red lines) are consistently closer to the top-left corner, indicating a better trade-off between true positives and false positives.

Figure 5: ROC curves on five datasets.

Why does it work? (Ablation Study)

The authors didn’t just stop at “it works.” They performed an ablation study to see which parts were doing the heavy lifting. They tried removing the Entity nodes (\(E\)), the Topic nodes (\(T\)), and the Consistency Regularization (\(CR\)).

Table 5: Ablation results of LESS4FD* on five datasets.

Table 5 shows a clear trend:

  1. \(\oslash \mathcal{HG}\): Removing the graph entirely (using just LLM embeddings) causes a massive drop in accuracy (e.g., from 97% to 63% on MM COVID). This proves that the graph structure is the most critical component.
  2. Removing Entities or Topics individually also hurts performance, showing that both are necessary for a complete picture.

Tuning the “View”

Finally, the researchers explored how the “Local” and “Global” steps affect performance.

Figure 7: Sensitivity to \(s_l\) and \(s_g\) on MM COVID w.r.t. accuracy and F1 score.

Figure 7 displays 3D plots of accuracy based on the number of local steps (\(s_l\)) and global steps (\(s_g\)). The results suggest that a small number of local steps (around 5) combined with a larger number of global steps creates the optimal balance. This confirms the theory: you need to look closely at the article and broadly at the network to catch a lie.

Conclusion

The battle against fake news is evolving. As this paper demonstrates, we cannot rely solely on the linguistic fluency of LLMs to detect falsehoods, because fake news writers are becoming just as fluent as real journalists.

LESS4FD offers a robust solution by shifting the focus from style to semantics. By using LLMs to extract entities and topics, and then structuring that data into a heterogeneous graph, we can detect the irregular patterns that characterize misinformation.

Key Takeaways for Students:

  • Embeddings aren’t magic: High-quality language embeddings capture style, not necessarily factual consistency.
  • Structure matters: Representing data as a graph (News-Entity-Topic) exposes relationships that linear text processing misses.
  • Context is key: Detecting anomalies requires comparing the local instance (the article) against the global consensus (the dataset).
  • Data Scarcity: Techniques like consistency regularization allow us to learn from the vast ocean of unlabeled news, not just the few fact-checked articles.

This research paves the way for hybrid systems where LLMs and Graph Neural Networks work in tandem—LLMs providing the understanding, and GNNs providing the structural reasoning.