How does a society hold itself together? In sociology, the answer is often solidarity—the cohesive bond that unites individuals. But solidarity is not a static concept; it shifts with wars, economic crises, and cultural revolutions. Understanding these shifts requires analyzing millions of words spoken over decades, a task that has historically been impossible for human researchers to perform at scale.

In a recent paper, researchers from Bielefeld University and partnering institutions undertook an ambitious project: analyzing 155 years of German parliamentary debates (from 1867 to 2022) to track solidarity towards women and migrants. By combining deep sociological theory with state-of-the-art Large Language Models (LLMs) like GPT-4, they didn’t just automate a reading task—they uncovered profound changes in how political leaders frame empathy, exclusion, and belonging.

The Challenge of Historical Scale

Political discourse is a mirror of societal values. In Germany, the parliamentary records span the North German Confederation, the German Empire, the Weimar Republic, the Nazi dictatorship (though data is sparse here), and the modern Federal Republic.

To study this, the researchers curated a massive dataset called DeuParl, consisting of nearly 10 million tokens related to migrants and over 32 million tokens related to women. The volume of data is staggering. As shown in the graph below, the sheer number of sentences mentioning these groups has skyrocketed in the modern era, particularly during the migration debates of the 2010s.

Number of instances in the Woman and Migrant dataset in each year.

Manually reading and classifying these millions of sentences is impossible, yet simple keyword searching isn’t enough either. “Migrants are a burden” and “migrants need our help” both contain the keyword “migrant,” but they express opposite sociological stances. To solve this, the researchers turned to Computational Social Science (CSS) and advanced Natural Language Processing.
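To see why, consider a minimal sketch (the keyword pattern and sentences are invented for illustration; this is not the paper’s retrieval pipeline):

```python
import re

# Two sentences that mention the same group but take opposite stances.
sentences = [
    "Migrants are a burden on our welfare system.",  # anti-solidarity stance
    "Migrants need our help and protection.",        # solidarity stance
]

# A naive keyword filter treats both as equally "about migrants"...
keyword = re.compile(r"\bmigrants?\b", re.IGNORECASE)
print([s for s in sentences if keyword.search(s)])  # matches both

# ...but it says nothing about which stance each sentence expresses.
# Telling them apart requires a model of meaning, not surface words.
```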

Defining Solidarity: A Fine-Grained Approach

Before putting any model to work, the researchers had to define exactly what they were looking for. Rather than simply labeling positive or negative sentiment, they used a framework by the sociologist Thijssen (2012) that breaks solidarity down into specific “frames.”

This framework is critical because it distinguishes why someone expresses solidarity. Are they supporting migrants because “they are part of us” (Group-based) or because “they are suffering” (Compassionate)?

Figure 1: Annotation scheme based on Thijssen (2012). The scheme categorizes statements into solidarity, anti-solidarity, mixed, and none.

As illustrated above, the classification scheme is hierarchical:

  1. High-Level Category: Is the text expressing Solidarity, Anti-solidarity, a Mixed stance, or None?
  2. Fine-Grained Subtypes:
  • Group-based: Focuses on shared identity, common goals, or integration. (e.g., “We must support our fellow workers.”)
  • Compassionate: Focuses on vulnerability and the need for protection. (e.g., “We must help these desperate families.”)
  • Exchange-based: Focuses on economic contribution or utility. (e.g., “They are vital for our labor market.”)
  • Empathic: Focuses on respecting differences and diversity. (e.g., “We value their unique cultural contribution.”)

The researchers also defined Anti-solidarity counterparts for each of these, such as “Exchange-based anti-solidarity” (e.g., “They are a burden on our welfare system”). A compact way to picture the full scheme is as a nested taxonomy, sketched below.
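Here is one possible encoding of that taxonomy (our own sketch for readability, not the authors’ code):

```python
# Thijssen-based annotation scheme: four high-level categories,
# with fine-grained subtypes mirrored across the two stance-bearing branches.
SUBTYPES = ("group-based", "compassionate", "exchange-based", "empathic")

ANNOTATION_SCHEME = {
    "solidarity": SUBTYPES,
    "anti-solidarity": SUBTYPES,  # e.g., exchange-based anti-solidarity
    "mixed": (),                  # both stances present in one statement
    "none": (),                   # group mentioned, but no stance expressed
}

# A single annotation can then be represented as a (category, subtype) pair:
label = ("anti-solidarity", "exchange-based")
assert label[1] in ANNOTATION_SCHEME[label[0]]
```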

To make this concrete, look at the examples below. Notice how subtle the distinctions can be. Text (1) expresses compassionate solidarity by highlighting the struggles of mothers, while Text (2) expresses exchange-based anti-solidarity by arguing that migration fails when economic qualifications are low.

Table 1: Example sentences from our dataset showing (anti-)solidarity towards women/migrants.

The Core Method: Can AI Replace Human Sociologists?

The heart of this study was determining whether an AI could reliably replicate this complex sociological annotation. The team first created a “Gold Standard” dataset. They employed human annotators to meticulously label 2,864 text snippets, a process that cost over €18,000 and took months.

They then tested several models against this human baseline:

  1. BERT: A smaller, older transformer model fine-tuned specifically for this task (see the fine-tuning sketch after this list).
  2. GPT-3.5: Tested in both zero-shot (no examples) and fine-tuned modes.
  3. Llama-3-70B: Meta’s open-weight large language model.
  4. GPT-4: OpenAI’s powerful proprietary model.
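For context, fine-tuning a BERT baseline for this kind of task typically follows the Hugging Face recipe sketched below. The checkpoint name, toy training data, and hyperparameters are illustrative assumptions, not the paper’s exact setup:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# A German BERT checkpoint; four coarse labels:
# solidarity, anti-solidarity, mixed, none.
model_name = "bert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=4)

# Toy stand-in for the gold-standard annotations (not the real data).
train_ds = Dataset.from_dict({
    "text": ["Wir müssen diesen Familien helfen.",
             "Sie sind eine Last für unser Sozialsystem."],
    "label": [0, 1],  # 0 = solidarity, 1 = anti-solidarity
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-solidarity", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
)
trainer.train()
```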

Prompting for Complexity

The researchers didn’t just ask the LLMs to “classify this.” They used Chain-of-Thought (CoT) prompting. They gave the models the detailed sociological definitions found in the annotation scheme and asked the models to “think step by step” before assigning a label. This forces the model to reason through the text’s logic—mirroring the cognitive process of a human annotator.
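A zero-shot chain-of-thought request of this kind might look like the sketch below, using the OpenAI Python client. The prompt text is our paraphrase of the approach; the paper’s actual prompt and model configuration may differ:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "You are an annotator for German parliamentary debates. "
    "Classify the stance of a text towards migrants as one of: "
    "solidarity, anti-solidarity, mixed, none. "
    "Definitions: solidarity = support based on shared identity, compassion, "
    "economic exchange, or respect for difference; anti-solidarity = the "
    "corresponding forms of opposition; mixed = both; none = no stance. "
    "Think step by step, then give the final label on its own line."
)

text = "Sie sind eine Last für unser Sozialsystem."  # "They are a burden on our welfare system."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Text: {text}"},
    ],
)
print(response.choices[0].message.content)
```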

Experiments & Results

The results were a significant win for large generative models over smaller, specialized ones.

GPT-4 outperformed all other models, achieving F1 scores (the harmonic mean of precision and recall; see below) that approached human quality. Interestingly, GPT-4 performed almost as well in a “zero-shot” setting (where it was given only the definitions) as it did in “few-shot” settings (where it was also given labeled examples). This suggests that the model can apply a complex annotation scheme from clear instructions alone.
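For reference, per-class F1 is the harmonic mean of precision and recall, and the macro F1 used in Table 2 averages it over all classes, so rare fine-grained subtypes weigh as much as frequent ones:

```latex
F1_c = \frac{2\,P_c\,R_c}{P_c + R_c},
\qquad
\text{macro-}F1 = \frac{1}{|C|}\sum_{c \in C} F1_c
```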

In contrast, the BERT model—a standard tool in computational social science for years—performed poorly, particularly on fine-grained sub-categories. This signals a shift in the field: large, general-purpose models are becoming more effective than smaller, specialized models for complex semantic tasks.

Table 2: Comparative performance (macro F1) of models vs. human upper bound.

As shown in the table above, while humans still hold the “upper bound” (the highest reliability), GPT-4 (Migrant F1: 0.73) comes much closer to human consensus than BERT (Migrant F1: 0.46) or even Llama-3.

With the validity of GPT-4 established, the researchers could then unleash it on the full, unannotated dataset for a fraction of the cost: roughly €500, versus the €18,000 the small manually labeled sample had required (about a 36-fold saving).

Analyzing 155 Years of Solidarity

The automated analysis revealed fascinating, and somewhat troubling, historical trends regarding migration discourse in Germany.

1. The Dominance of Solidarity

Contrary to what one might expect from heated political headlines, solidarity (support) has consistently outweighed anti-solidarity (opposition) in parliamentary debates over the last century. However, anti-solidarity spikes are clearly visible and align with historical events: the nationalism of the late 19th century, the post-WWII era, and the recent refugee crisis starting around 2015.

2. The Shift from “Partners” to “Victims”

The most profound finding is the shift in how solidarity is expressed.

  • 19th & Early 20th Century: The dominant frame was Group-based solidarity. Debates focused on integrating workers and shared national or class identities.
  • Modern Era: There has been a massive decline in group-based framing. It has been replaced by Compassionate solidarity.

Figure 5: Solidarity vs. Anti-solidarity trends and subtypes over time.

The graph above (Figure 5) visualizes this dramatic crossover. The blue line in the left chart (Group-based) crashes in the modern era, while the green line (Compassionate) surges.

Why does this matter? Sociologically, this represents a change in the perceived relationship between the state and migrants. “Group-based” implies equality and shared destiny. “Compassionate” implies a hierarchy: a benevolent state helping vulnerable, passive victims. While compassionate solidarity is positive, it can depoliticize migrants, framing them solely as objects of charity rather than active participants in society.

3. The Rise of Economic Anti-Solidarity

On the anti-solidarity side (the right chart in Figure 5), we see a different shift. Group-based anti-solidarity (e.g., “They are not German”) used to be the primary argument. Post-WWII, this overt nationalism declined. It was replaced by Exchange-based anti-solidarity (the red dashed line).

This reflects a rhetorical shift where exclusion is justified not by race or nationality, but by economics: arguments that migrants “cost too much” or “abuse the welfare state.”

Political Polarization

Finally, the study broke down these frames by political party. The results confirm a deep polarization in German politics.

Figure 9: Distribution of (Anti-)solidarity subtypes across selected political parties.

The chart above orders parties from Left to Right.

  • The Left (Linke, Grüne, SPD): These parties rely heavily on Compassionate and Empathic solidarity, advocating for migrants on the grounds of human rights and diversity.
  • The Center/Right (CDU/CSU, FDP): These parties show more Exchange-based logic, focusing on utility and economics.
  • The Far-Right (AfD): The Alternative for Germany stands alone with massive bars for Anti-solidarity. Its rhetoric is dominated by Group-based anti-solidarity (exclusion based on identity) and Exchange-based anti-solidarity (economic burden).

Conclusion

This research demonstrates that Large Language Models are not just tools for generating text; they are powerful instruments for analyzing history. By automating the detection of complex concepts like solidarity, researchers can now quantify societal shifts that were previously visible only through anecdotal reading.

The findings paint a picture of a changing Germany. While parliament has become more compassionate towards migrants, it has arguably become less inclusive in terms of viewing them as equal partners in a shared group. Simultaneously, opposition to migration has morphed from nationalist exclusion to economic skepticism.

As AI models like GPT-4 continue to improve, they will allow social scientists to decode the “DNA” of our political discourse, helping us understand not just what politicians say, but the underlying values that shape our world.