The Hidden Backbone: Why Community Notes Need Professional Fact-Checkers

In the ever-evolving landscape of social media, the battle against misinformation has taken a fascinating turn. For years, platforms like Facebook and Twitter (now X) relied on partnerships with professional fact-checking organizations—groups like Snopes, PolitiFact, and Reuters—to flag false claims. Recently, however, the tide has shifted toward “community moderation.”

The logic seems democratic and scalable: instead of paying a small group of experts to check a massive volume of content, why not empower the users themselves to police the platform? This is the philosophy behind Community Notes on X (formerly Twitter). The idea is that the “wisdom of the crowd” can identify falsehoods faster and more effectively than a newsroom.

But a critical question remains unanswered: Is the crowd actually doing the work from scratch, or are they standing on the shoulders of giants?

A recent paper from the University of Copenhagen, titled “Can Community Notes Replace Professional Fact-Checkers?”, dives deep into this ecosystem. The researchers analyzed over 1.5 million notes to understand the relationship between amateur sleuths and professional journalists. Their findings challenge the assumption that platforms can simply swap one for the other.

The Problem: The “Crowd” vs. The “Pros”

To understand why this research matters, we have to look at the current industry trends. Major platforms are slowly backing away from paid partnerships with fact-checkers. Meta (Facebook/Instagram) has signaled shifts toward community-driven models, and X has bet its entire trust and safety strategy on Community Notes.

The assumption driving these decisions is that community notes and professional fact-checking are independent, competing strategies. If the community can do it for free, why pay the professionals?

However, if community notes heavily rely on the investigative work done by professionals to substantiate their claims, then defunding the professionals could inadvertently collapse the quality of the community notes. This paper seeks to answer two main research questions:

  1. To what extent do community notes rely on professional fact-checkers?
  2. What types of misinformation require a professional citation?

What is a Community Note?

Before diving into the data, it is helpful to visualize what we are analyzing. A Community Note is a user-written annotation attached to a post. If a user sees a misleading tweet, they can write a note. Other users then rate that note. If enough people from different viewpoints rate it as “Helpful,” the note gets published.

Figure 1: An example of a community note. Notice the fact-checking link and rating.

As shown in Figure 1 above, a successful note typically does two things: it explains why the post is misleading (e.g., clarifying that aluminum cookware is not linked to Alzheimer’s) and it provides sources. The nature of those sources—specifically, how often they link to professional fact-checks—is the core mystery of this study.
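
One mechanical detail from the paragraph above matters for everything that follows: a note only becomes public when raters who usually disagree with each other both mark it “Helpful.” The real scorer is a bridging-based matrix-factorization algorithm that X has open-sourced; the toy function below is only a sketch of that cross-viewpoint idea, and every field name and threshold in it is an illustrative assumption rather than the real parameters.

```python
# Toy sketch of the "agreement across viewpoints" idea behind note publication.
# The real Community Notes scorer is a bridging-based matrix-factorization
# model; the field names and thresholds below are illustrative assumptions.

def note_is_published(ratings: list[dict]) -> bool:
    """Publish only if 'Helpful' ratings come from at least two distinct
    viewpoint clusters (a hypothetical label derived from rating history)."""
    helpful_clusters = {r["viewpoint"] for r in ratings if r["helpful"]}
    helpful_count = sum(r["helpful"] for r in ratings)
    return len(helpful_clusters) >= 2 and helpful_count >= 5  # illustrative thresholds

ratings = [
    {"viewpoint": "left-leaning", "helpful": True},
    {"viewpoint": "left-leaning", "helpful": True},
    {"viewpoint": "right-leaning", "helpful": True},
    {"viewpoint": "right-leaning", "helpful": True},
    {"viewpoint": "right-leaning", "helpful": True},
]
print(note_is_published(ratings))  # True: helpful ratings bridge both clusters
```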

The Methodology: Analyzing 1.5 Million Notes

The researchers took a massive, data-driven approach to this problem. They downloaded the entire public dataset of Twitter/X Community Notes from January 2021 to January 2025. This amounted to approximately 1.5 million notes.

Step 1: Filtering the Data

Not every note is relevant to misinformation. Many are spam, non-English, or labeled “not misleading.” The team filtered the dataset down to 664,000 English-language notes that were specifically written to address misleading content.
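
As a concrete illustration of this filtering step, the sketch below shows how one might reproduce it on the public notes export. The file name, the `classification` and `summary` columns, and the `MISINFORMED_OR_POTENTIALLY_MISLEADING` label follow the public Community Notes data format as commonly documented, but they should be treated as assumptions; the paper’s exact filtering rules are more involved.

```python
# Minimal sketch of the filtering step (assumed column names / labels from the
# public Community Notes export; the paper's exact rules may differ).
import pandas as pd
from langdetect import detect  # pip install langdetect

notes = pd.read_csv("notes-00000.tsv", sep="\t")  # public notes dump

# Keep only notes written because the post was considered misleading.
misleading = notes[notes["classification"] == "MISINFORMED_OR_POTENTIALLY_MISLEADING"]

def is_english(text: str) -> bool:
    try:
        return detect(str(text)) == "en"
    except Exception:
        return False  # empty or undetectable text

english_misleading = misleading[misleading["summary"].apply(is_english)]
print(len(notes), "->", len(english_misleading), "English notes on misleading posts")
```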

To give you a sense of the scale and growth of this feature, look at the timeline of note creation below.

Figure 6: A histogram of the number of community notes written every month and their rating.

Figure 6 illustrates the explosion in volume after Community Notes went global in late 2022 (marked by the grey line). The yellow bars represent “Helpful” notes—the ones that actually appear to the public. The vast majority (the red bars) remain in “Needs More Ratings” purgatory, highlighting how difficult it is to get a note published.

Step 2: Classifying the Sources

The researchers needed to know what websites the note-writers were linking to. They extracted every URL from the notes and built a classification pipeline. This wasn’t just a simple keyword search; they used a multi-step process (a rough code sketch follows the list):

  1. Direct Matching: They checked domains against a curated list of known professional fact-checkers (e.g., snopes.com, politifact.com).
  2. Semantic Search: They looked for “fact-check” terminology in the URL paths of major news sites (e.g., cnn.com/fact-check/…).
  3. LLM Classification: For ambiguous URLs, they used GPT-4 to categorize the website into buckets like “News,” “Government,” “Academic,” or “Social Media.”
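
A rough sketch of that three-step pipeline might look like the following. The domain lists are tiny placeholders for the paper’s curated lists (see Table 3 below), the example URL is a made-up placeholder, and the LLM fallback is left as a stub; this is not the authors’ actual code.

```python
import re
from urllib.parse import urlparse

# Tiny illustrative lists; the paper uses much larger curated ones (Table 3).
FACT_CHECKERS = {"snopes.com", "politifact.com", "factcheck.org", "fullfact.org"}
NEWS_SITES = {"cnn.com", "bbc.com", "reuters.com", "apnews.com"}

URL_PATTERN = re.compile(r"https?://\S+")

def classify_url(url: str) -> str:
    parsed = urlparse(url)
    domain = parsed.netloc.lower().removeprefix("www.")
    path = parsed.path.lower()

    # Step 1: direct match against known professional fact-checkers.
    if domain in FACT_CHECKERS:
        return "fact-checking"
    # Step 2: "fact-check" terminology in the URL path of a major news site.
    if domain in NEWS_SITES and "fact-check" in path:
        return "fact-checking"
    if domain in NEWS_SITES:
        return "news"
    # Step 3: everything else would be sent to an LLM classifier (omitted here).
    return "needs-llm-classification"

# Placeholder note text with a made-up URL, just to exercise the pipeline.
note_text = "False. See https://www.snopes.com/fact-check/some-claim/"
for url in URL_PATTERN.findall(note_text):
    print(url, "->", classify_url(url))
```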

You can see the diversity of organizations they tracked in the table below. This includes global heavyweights and niche, topic-specific fact-checkers.

Table 3: List of professional fact-checking organisations and their URLs.

Step 3: Determining the Topic

Knowing who was cited is one half of the puzzle. Knowing what the post was about is the other. The researchers used a zero-shot text classification model (ModernBERT) to categorize the topic of the tweets (e.g., Politics, Health, Sports, Art).

Furthermore, they used GPT-4 to determine if a tweet was related to a “Broader Narrative” or conspiracy theory (such as election fraud or anti-vaccine narratives) versus a simple factual error (like a mislabeled photo).
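
To make the topic-classification step concrete, here is a short sketch using the Hugging Face zero-shot-classification pipeline. The checkpoint name below is an assumption (the paper specifies ModernBERT but not an exact model); any NLI-style zero-shot checkpoint works with the same API, and the label set is just the examples mentioned above. The GPT-4 “broader narrative” step is omitted here.

```python
# Sketch of zero-shot topic classification; the checkpoint is an assumption
# (swap in any NLI-based zero-shot model), not necessarily the one the
# authors used.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",  # assumed public checkpoint
)

tweet = "Breaking: the new vaccine alters your DNA, and doctors are hiding it!"
topics = ["Politics", "Health", "Science", "Sports", "Art", "Entertainment"]

result = classifier(tweet, candidate_labels=topics)
print(result["labels"][0], round(result["scores"][0], 3))  # most likely topic
```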

To ensure their automated systems were accurate, the authors performed manual annotations on a subset of data. Figure 8 outlines the rigorous criteria they used for this human review.

Figure 8: The authors’ annotation setup.

The Core Results

The analysis yielded several striking results that reshape how we should view crowdsourced moderation.

1. The Crowd Uses the Pros More Than We Thought

Previous studies suggested that fact-checking URLs made up only about 1% of citations in Community Notes. This paper found the number to be significantly higher.

As detailed in Figure 2, when looking specifically at Helpful notes (chart ‘b’), the reliance on Fact-Checking sources jumps to 7%. While this might still sound low compared to “News” (23%), it represents a massive absolute number of notes. Furthermore, “News” citations often reference articles that are, themselves, based on fact-checking investigations.

Figure 2: The categories of links used by Community notes’ authors as a source.

Crucially, compare chart ‘b’ (Helpful notes) with chart ‘c’ (Unhelpful notes). Helpful notes are more than twice as likely to cite a fact-checker as unhelpful notes. This suggests that the community values professional verification. When a note writer cites a pro, their note is more likely to “win” the algorithm and be displayed to the public.

2. Fact-Checks Boost “Helpfulness” Scores

The researchers didn’t just look at whether a note was published; they looked at the raw rating scores given by users.

Figure 7 breaks down how users rated notes based on specific attributes. The teal bars represent notes that included a fact-checking source, while the red bars represent those that did not.

Figure 7: Community ratings of notes with and without fact-checking source.

Notes with fact-checking sources consistently scored higher on “Helpful: Clear”, “Helpful: Good Sources”, and “Helpful: Important Context.” This confirms that professional journalism adds a layer of credibility that the “crowd” alone struggles to replicate.

3. High-Stakes Topics Require Professional Intervention

Not all misinformation is created equal. A tweet claiming a celebrity is 6-foot-2 when they are actually 5-foot-10 is technically misinformation, but it doesn’t carry the same societal risk as a tweet claiming a vaccine causes DNA damage.

The researchers found that the reliance on professional fact-checkers varies wildly depending on the topic.

Figure 5: Distribution of notes’ topics, with and without a fact-checking source.

Figure 5 reveals a clear trend. Topics like Politics, Health, Science, and Scams have a much higher proportion of fact-checking citations (the teal bars). Conversely, categories like “Sports,” “Art,” or “Entertainment” rarely rely on professional fact-checkers.

This makes intuitive sense: refuting a sports rumor might only require a link to a box score or a team press release. Refuting a complex political lie or a medical conspiracy theory requires deep investigative work—the exact kind of work professional fact-checkers do.

4. The “Conspiracy” Connection

Perhaps the most significant finding of the paper is the link between Broader Narratives and fact-checking.

The researchers hypothesized that complex conspiracies—those woven into larger cultural narratives like “The Great Replacement” or “The 2020 Stolen Election”—are too difficult for an average user to debunk from scratch. They require the heavy lifting of a professional investigation.

The data supported this overwhelmingly.

Table 1: Percentage of samples related to a broader narrative or conspiracy vs. those with a fact-checking source.

As shown in Table 1 (excerpted above), claims related to a broader narrative or conspiracy are twice as likely (22% vs 11% in specific samples) to cite a fact-checking source compared to isolated claims.

When a tweet feeds into a complex lie, the community note writer doesn’t usually do original research. They go find a Snopes or PolitiFact article that has already done the legwork.

We can see this dynamic even more clearly in Figure 4.

Figure 4: (a) Strategies for debunking claims related to broader narratives. (b) The different ways in which fact-checking sources are used to debunk claims.

Look at chart (a). When a claim is part of a “Broader narrative” (the teal bars), note writers are far more likely to “Link official source” or “Link scientific source.”

Chart (b) shows how the fact-checking source is used. It is rarely used just to add “missing context.” Instead, it is used to “Discredit the source of the claim” or provide hard “Scientific evidence.”

Conclusion: A Symbiotic Relationship

The implications of this paper are profound for the future of online safety. The narrative that “Community Notes” can replace “Professional Fact-Checkers” appears to be flawed.

The data suggests the relationship is not substitution; it is symbiosis.

  1. The Professionals act as the primary researchers. They have the time, funding, and expertise to investigate complex health claims, call government officials to verify data, and dig through archives to debunk conspiracies.
  2. The Community acts as the distribution network. They spot the misinformation in the wild and use the “ammunition” provided by the professionals to tag and refute it locally.

If social media platforms continue to defund professional fact-checking organizations under the guise of relying on the community, they may be cutting off the very supply line that makes the community notes effective.

Figure 3 provides a final piece of evidence for this.

Figure 3: Mean scores of community annotations of misleading posts.

When a note cites a professional fact-checker (teal bars), it is significantly more effective at addressing “Factual Errors” and “Unverified Claims Presented as Fact.” Without the professionals, the community is left trying to fight sophisticated misinformation with one hand tied behind its back.

The Takeaway

For students of computer science, media studies, and political science, this paper serves as a warning against technological solutionism. We cannot simply “code away” misinformation by building a voting algorithm. The algorithms rely on human input, and high-quality human input relies on professional expertise.

Successful community moderation isn’t about replacing experts; it’s about giving the community the tools to amplify expert knowledge. Breaking the partnership between platforms and fact-checkers doesn’t empower the user—it disarms them.