Introduction
If you have spent any time on Twitter (now X) during an election season, you know that the discourse can get ugly. But “ugly” is a vague term. Is a tweet containing a swear word directed at a senator the same as a tweet calmly accusing a specific group of people of being “traitors to the country”?
For years, content moderation tools and researchers have treated online toxicity as a binary problem: a post is either “safe” or “toxic.” However, a recent research paper titled “A Closer Look at Multidimensional Online Political Incivility” argues that this binary view is insufficient for understanding political communication.
The researchers posit that we need to distinguish between how we speak (style) and what we are actually saying (substance). By treating political incivility as a multidimensional concept, they uncover that while AI is good at spotting rude words, it struggles significantly with dangerous, exclusionary ideas that don’t use profanity.
In this post, we will break down their novel dataset (MUPID), explore how state-of-the-art Natural Language Processing (NLP) models handle this nuance, and look at the results of a massive study involving over 200,000 users to see who is actually being uncivil online.
The Two Dimensions of Incivility
To solve the problem of vague definitions, the authors drew upon political communication theory to separate incivility into two distinct categories:
- Personal-Level Incivility (Impoliteness): This refers to style. It involves foul language, name-calling, vulgarity, and harsh tones. It violates interpersonal norms (e.g., “You are an idiot”).
- Public-Level Incivility (Intolerance): This refers to substance. It involves exclusionary speech, denying the rights of a social or political group, or casting rivals as enemies of the state (e.g., “This party is trying to destroy America”).
Why does this distinction matter? Research suggests that while impoliteness is unpleasant, intolerance is far more damaging to democracy because it dehumanizes opponents and fosters polarization.
Take a look at the examples below from the researchers’ dataset.

As shown in Table 1, the “Impolite” example uses words like “dumb” and “son of a b****es.” It’s aggressive, but it’s an attack on competence and character using standard insults. The “Intolerant” example, however, labels a political party as “enemies, foreign AND domestic.” This is a different beast entirely—it is an accusation of treason and existential threat, even if it uses fewer swear words.
Building MUPID: A Multidimensional Dataset
One of the primary contributions of this paper is the creation of MUPID (Multidimensional Political Incivility Dataset). Existing datasets often rely on keyword searches (like specific slurs) to find toxic tweets. The problem with that approach is that it biases the data toward impoliteness while missing the subtler, yet more dangerous, intolerance.
Smart Sampling Strategy
To capture a true snapshot of political discourse, the authors didn’t just search for “bad words.” Instead, they used a network-based approach:
- They identified users who followed multiple “disputable” accounts (e.g., known fake-news spreaders, hyper-partisan outlets, or ideologically extreme members of Congress).
- They collected tweets from these users and trained a classifier to filter out non-political content, like tweets about sports or food (a minimal sketch of such a filter follows this list).
- They also included tweets from politicians and random users to ensure balance.
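To make the filtering step concrete, here is a minimal sketch of a political-content filter built with scikit-learn. The seed tweets and labels are invented for illustration, and the paper’s actual filtering classifier may use a different architecture and far more training data.

```python
# Minimal sketch of a political-content filter (hypothetical seed data);
# the paper's actual classifier may differ in architecture and training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical seed labels: 1 = political, 0 = non-political
seed_texts = [
    "The senate vote on the budget bill is a disgrace",
    "Congress needs to act on immigration now",
    "Just had the best tacos of my life",
    "Can't believe my team lost again last night",
]
seed_labels = [1, 1, 0, 0]

political_filter = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
political_filter.fit(seed_texts, seed_labels)

# Keep only tweets the filter scores as political before sending them to annotation
candidate_tweets = ["Lower taxes would fix everything", "New recipe dropped!"]
political_tweets = [
    t for t in candidate_tweets if political_filter.predict([t])[0] == 1
]
print(political_tweets)
```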
Rigorous Annotation
Identifying intolerance requires deep semantic understanding. A simple Mechanical Turk task asking “Is this bad?” wouldn’t suffice. The researchers hired U.S. residents (familiar with the political context) and put them through a rigorous training process.

As seen in Figure 4, annotators had to pass a qualification test where they received specific feedback on their mistakes. This ensured that the humans labeling the data understood the specific difference between being rude (impoliteness) and being anti-democratic (intolerance).
The final dataset contained 13,000 labeled tweets, offering a rich resource for training AI models.
The Linguistic Gap: What Words Matter?
Before feeding this data into neural networks, the authors analyzed the language itself. Do impoliteness and intolerance look different at the word level?
They used Shapley-value analysis (a method that attributes a model’s predictions to individual input features) to find which words contributed most to each label; a short sketch of this kind of attribution appears at the end of this section.

Table 4 reveals a striking difference:
- Impolite words are universally negative: stupid, crap, idiot, dumb, hell. These words are offensive regardless of context.
- Intolerant words are politically specific: liberals, democrats, republicans, socialist, communist, fascist.
This highlights the core challenge. The word “Republican” or “Democrat” is not inherently toxic. It only becomes toxic based on the context (e.g., “Democrats are destroying the country”). This makes detecting intolerance much harder for AI than detecting impoliteness.
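To illustrate the attribution step, here is a rough sketch of word-level explanation using the `shap` library over a fine-tuned text classifier (such as the one sketched in the next section). The model path is a placeholder, not a checkpoint released by the authors, and the exact tooling the paper used may differ.

```python
# Sketch of word-level attribution with SHAP over a fine-tuned classifier;
# the checkpoint path below is a hypothetical placeholder.
import shap
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="path/to/fine-tuned-intolerance-classifier",  # hypothetical checkpoint
    top_k=None,  # return scores for all labels
)

explainer = shap.Explainer(clf)
shap_values = explainer([
    "Democrats are destroying the country",
    "You are an idiot",
])

# Tokens with the largest positive SHAP values push the prediction toward the
# uncivil label; aggregating them across a corpus yields word lists like Table 4.
shap.plots.text(shap_values)
```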
Can AI Detect the Difference?
The researchers fine-tuned several state-of-the-art pre-trained language models, including BERT, RoBERTa, and DeBERTa, on their new dataset. They also tested generic toxicity detectors (like Jigsaw’s Perspective API) and few-shot prompting of the large language models GPT-3.5 and GPT-4.
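As a rough picture of what this fine-tuning looks like, here is a minimal sketch for a single binary head (e.g., intolerant vs. not) using Hugging Face Transformers. The toy examples and hyperparameters are illustrative assumptions, not the paper’s exact setup.

```python
# Minimal fine-tuning sketch for one binary label (intolerant vs. not), assuming
# the labeled tweets are available as (text, label) pairs; hyperparameters and
# preprocessing here are illustrative, not the paper's exact configuration.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical toy data standing in for the labeled MUPID tweets
train_ds = Dataset.from_dict({
    "text": ["This party is trying to destroy America", "Nice weather for a rally"],
    "label": [1, 0],
})
train_ds = train_ds.map(
    lambda batch: tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    ),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mupid-intolerance",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    ),
    train_dataset=train_ds,
)
trainer.train()
```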
The Results

Table 3 provides the performance metrics. Here is the breakdown of what the numbers mean:
- Impoliteness is easier to catch: The best models achieved an F1 score (a balance of precision and recall; illustrated in the short snippet after this list) of around 0.70 for impoliteness.
- Intolerance is elusive: The F1 score for intolerance dropped to around 0.59.
- Generic tools fail: Look at the “Perspective” row. It had high precision for impoliteness but terrible performance for intolerance (F1 of 0.189). This confirms that tools built to find “toxicity” generally just look for rude words and completely miss intolerant rhetoric.
- GPT-4 is promising but not perfect: Although GPT-4 performed well in the few-shot setting, the fine-tuned RoBERTa and DeBERTa models still held the edge on this dataset.
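For readers less familiar with the metric, here is a tiny, self-contained illustration of how precision, recall, and F1 are computed with scikit-learn on made-up predictions (not the paper’s data).

```python
# Precision, recall, and F1 on a handful of hypothetical predictions.
from sklearn.metrics import precision_recall_fscore_support

# 1 = intolerant, 0 = not; imagine these come from a validation split
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# A model that misses subtle intolerant tweets loses recall, which drags F1 down
# even when its precision looks respectable.
```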
Why Does AI Struggle?
The error analysis in the paper illuminates why intolerance is so hard to automate.

In Table 5, look at example (d): “You Republicans don’t even know how to keep the electricity on!”
- Human Label: Intolerant (it generalizes a whole group as incompetent/dangerous to governance).
- Prediction: Neutral.
Because the sentence doesn’t contain a slur, the model misses the hostility. Conversely, models sometimes over-predict intolerance just because political keywords are present, even if the sentiment is benign.
Data Efficiency
The researchers also checked how much data is needed to train these models.

Figure 1 shows the learning curves. The blue line (Impoliteness) shoots up quickly—the model learns “bad words” fast. The orange line (Intolerance) rises much more slowly and plateaus lower. This suggests that simply throwing more data at the problem might yield diminishing returns for intolerance detection; the models likely need better contextual understanding (like knowing who the speaker is or what event they are reacting to).
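A learning curve like Figure 1 can be produced by training the same model on growing slices of the labeled data and tracking validation F1. The sketch below uses a TF-IDF plus logistic-regression stand-in for brevity; the paper fine-tunes transformer models instead.

```python
# Sketch of a learning-curve loop: train on growing slices, record validation F1.
# A TF-IDF + logistic regression model stands in for a fine-tuned transformer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline


def learning_curve(train_texts, train_labels, val_texts, val_labels,
                   fractions=(0.25, 0.5, 0.75, 1.0)):
    # Assumes the training data is shuffled and each slice contains both classes.
    points = []
    for frac in fractions:
        n = max(2, int(frac * len(train_texts)))
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(train_texts[:n], train_labels[:n])
        points.append((n, f1_score(val_labels, model.predict(val_texts))))
    return points  # list of (training-set size, validation F1) pairs
```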
Large-Scale Real-World Analysis
Having built and validated their classifiers, the authors applied them to a massive unlabeled dataset: 16 million tweets from 230,000 U.S. users. This allowed them to move from computer science into social science, asking: Who is actually being uncivil online?
Who are the uncivil users?
The study found that incivility is not evenly distributed. In fact, 20% of the users authored 80% of the uncivil tweets.
The researchers looked for correlations between user behavior and their toxicity levels.

Table 6 reveals several key insights:
- Political Engagement = More Incivility: The strongest predictor of uncivil behavior (both styles) was the “% political tweets.” Users who talk about politics frequently are more likely to be rude and intolerant. This suggests a normalization of toxicity among highly engaged partisans.
- Network Homophily: There is a strong correlation between a user’s incivility and the incivility of the accounts they follow. “Birds of a feather flock together”—if you follow intolerant people, you are more likely to tweet intolerant things.
- Popularity Paradox: Interestingly, users with more followers tended to be slightly less uncivil (negative correlation). Perhaps users trying to maintain a large audience self-censor, or perhaps niche radical accounts struggle to gain mass appeal.
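As a rough sketch of the kind of user-level correlation behind Table 6, the snippet below computes Pearson correlations over a small, invented per-user table; the column names and values are assumptions, not the authors’ schema or results.

```python
# Hypothetical per-user table: behavioral features plus an intolerance rate.
import pandas as pd

users = pd.DataFrame({
    "pct_political_tweets": [0.05, 0.40, 0.75, 0.20, 0.90],
    "followers":            [120, 5400, 300, 25000, 80],
    "pct_intolerant":       [0.00, 0.06, 0.15, 0.01, 0.22],
})

# Pearson correlation of each feature with the intolerance rate
print(users.corr(numeric_only=True)["pct_intolerant"])
```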
Geopolitical Heat Maps
Finally, the researchers mapped the intolerance scores to the users’ locations (U.S. States).

Figure 2 visualizes the average intolerance levels across the US. The researchers found a statistically significant correlation between incivility and partisan competition.
In “battleground states”—states where the vote margins between Democrats and Republicans are very close—incivility was higher. In states that are solidly Red or Blue (safe states), the discourse was slightly calmer. This supports the theory that higher political stakes drive higher hostility in discourse.
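A state-level check like this can be approximated by correlating average intolerance per state with electoral competitiveness, for example the absolute two-party vote margin. The sketch below uses made-up numbers purely for illustration, not the paper’s measurements.

```python
# Correlate per-state intolerance with vote margin (all values hypothetical).
from scipy.stats import pearsonr

# Hypothetical per-state averages (e.g., AZ, GA, CA, WY)
avg_intolerance = [0.09, 0.08, 0.05, 0.04]
vote_margin = [0.003, 0.002, 0.29, 0.43]  # |Dem share - Rep share|, made up

r, p = pearsonr(avg_intolerance, vote_margin)
print(f"r={r:.2f}, p={p:.3f}")  # a negative r means tighter races ~ more intolerance
```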
Conclusion & Implications
This paper makes a compelling case that we need to upgrade how we think about online toxicity. By separating Impoliteness (bad words) from Intolerance (bad ideas), the MUPID dataset exposes a blind spot in current content moderation systems.
The key takeaways for students and researchers are:
- Context is King: Current AI models are great at spotting insults but struggle with implicit attacks on democratic groups.
- The “Engaged” are the “Enraged”: The people most active in online political discussions are often the ones driving the incivility.
- Echo Chambers are Real: Your online behavior mirrors the behavior of those you follow.
As we move toward more advanced AI, the challenge will be teaching models to understand the social and political context of a sentence, rather than just scanning it for a blacklist of banned words. Until then, intolerant substance—which threatens democratic norms—may continue to slip through the cracks while we focus on policing impolite style.