The United States Constitution is one of the most scrutinized documents in history. For centuries, judges, lawyers, and historians have debated the precise meaning of its words. In recent decades, a legal theory known as originalism—the idea that the Constitution should be interpreted according to its original public meaning at the time of enactment—has gained significant traction in the U.S. Supreme Court.

But how do we know exactly what a word meant in 1787?

Traditionally, scholars have relied on dictionaries from the era or anecdotal evidence from letters and pamphlets. However, a new frontier has opened up: Corpus Linguistics. By analyzing massive databases of historical texts, researchers hope to empirically determine “ordinary meaning.” The primary dataset for this work is the Corpus of Founding Era American English (COFEA).

In the paper Meaning Variation and Data Quality in the Corpus of Founding Era American English, researcher Dallas Card takes a critical, computational look at this practice. Rather than simply counting words, this study leverages modern Natural Language Processing (NLP)—specifically Masked Language Models (MLMs) like BERT—to quantify how meanings have shifted over time and how they varied between “legal” and “popular” language during the founding era. Perhaps just as importantly, the paper conducts a rigorous audit of the data itself, revealing that the digital history we rely on is messier than we might think.

In this blog post, we will walk through the background of this research, dissect the advanced NLP methods used to track semantic change, and analyze the results that shed light on the language of the Constitution.


Part 1: The Data Problem

What is COFEA?

To understand the analysis, we first need to understand the dataset. The Corpus of Founding Era American English (COFEA) is a massive collection of documents from the mid-to-late 18th century. It was created specifically to help legal scholars assess historical meaning.

COFEA isn’t a single monolith; it is composed of six distinct sub-collections:

  1. EVANS: Books, pamphlets, and broadsides (representing “popular” print).
  2. FOUNDERS: Letters and papers of the Founding Fathers (informal, elite language).
  3. HEIN: Legal statutes and documents (formal legal language).
  4. ELLIOTS: Debates from state conventions regarding the Constitution.
  5. STATUTES: Laws enacted by Congress.
  6. FARRANDS: Records of the Constitutional Convention.

As shown in the figure below, the volume of text available in these collections varies wildly over time.

Figure 1: Number of tokens per year in the six collections that comprise COFEA. Grey bands indicate the period studied in this paper, and the dotted line shows the year in which the U.S. Constitution was written.

Figure 1 highlights a challenge: while some collections like EVANS (popular print) have steady coverage, others like STATUTES or ELLIOTS are highly concentrated around specific dates. The researchers focused their analysis on the period from 1760 to 1800 (the grey bands) to capture the relevant linguistic environment of the founding era.

The “Dirty Data” Reality Check

Before training any fancy AI models, the researchers performed a “health check” on the data. In the digital humanities and NLP, Optical Character Recognition (OCR) is the process of converting scanned images of old paper documents into machine-readable text.

18th-century printing is notoriously difficult for OCR. Paper quality is often poor, ink bleeds, and the typography differs from modern print, most notably in the “long s” (ſ), which looks like an “f”.

To measure the quality of the text, the researchers checked the vocabulary of the documents against a comprehensive dictionary (Webster’s 1913, augmented with names and places). If a high percentage of words in a document are found in the dictionary, the OCR is likely good. If the dictionary coverage is low, the document is likely full of gibberish.
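
As a rough illustration of how such a check works (a minimal sketch, not the paper’s exact pipeline; the word-list file name and the added proper nouns below are placeholders), dictionary coverage amounts to tokenizing a document and counting the fraction of word tokens found in a reference word list:

```python
import re

def dictionary_coverage(text: str, dictionary: set) -> float:
    """Fraction of word tokens that appear in a reference word list."""
    # Keep lowercase alphabetic tokens only; numbers and punctuation are ignored.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in dictionary for tok in tokens) / len(tokens)

# Placeholder word list, augmented with a few names and places as the paper describes.
with open("webster_1913_wordlist.txt") as f:
    dictionary = {line.strip().lower() for line in f}
dictionary |= {"philadelphia", "madison", "hamilton"}

# OCR noise like "congrefs" and "fhall" drags coverage down.
print(dictionary_coverage("Congrefs fhall make no law", dictionary))
```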

Figure 2: OCR quality across corpora as measured by coverage in the augmented Webster’s 1913 dictionary. Each point represents one document.

The results, visualized in Figure 2, reveal a troubling discrepancy. The HEIN collection (the blue line), which contains crucial legal documents, consistently shows lower dictionary coverage than other collections like EVANS or FOUNDERS.

Why does HEIN score so poorly? A closer look at the raw text reveals the culprit: the “long s.” Modern OCR engines frequently mistake the archaic “ſ” for “f” or “t.”

Table 6: Frequent misspellings of the term “shall” in HEIN, illustrating the prevalence of OCR errors.

As Table 6 illustrates, the word “shall”—a critical modal verb in legal writing—appears as “fhall,” “thall,” or even “fliall” thousands of times.

Why this matters: Legal scholars often use COFEA by performing keyword searches (e.g., searching for every instance of “bear arms”). If the data is riddled with “fhall” instead of “shall,” or “juffice” instead of “justice,” simple keyword searches will miss a massive chunk of relevant evidence, potentially skewing legal arguments.
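
To see what a more error-tolerant search might look like, here is a small sketch (my own illustration, not from the paper) that broadens a keyword query to catch the common long-s confusions:

```python
import re

# Long-s OCR errors often turn "s" into "f" or "t", so a literal search for
# "shall" undercounts. This pattern also matches "fhall" and "thall"
# (it would need further extension for forms like "fliall").
pattern = re.compile(r"\b[sft]hall\b", re.IGNORECASE)

text = "Congrefs fhall make no law ... exceffive bail thall not be required"
print(pattern.findall(text))        # ['fhall', 'thall']
print(text.lower().count("shall"))  # 0 -- a plain keyword search misses both
```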

The researchers also verified this using a different metric called perplexity (how “surprised” a character-level language model is by the text).
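
To make the idea concrete, here is a toy character trigram model with add-one smoothing (my own minimal sketch, trained on a tiny snippet; the paper’s model is of course trained on far more text):

```python
import math
from collections import Counter

class CharTrigramLM:
    """Add-one smoothed character trigram language model."""

    def __init__(self, training_text: str):
        chars = "##" + training_text  # '#' pads the left context
        self.vocab = set(chars)
        self.tri = Counter(chars[i:i + 3] for i in range(len(chars) - 2))
        self.bi = Counter(chars[i:i + 2] for i in range(len(chars) - 1))

    def perplexity(self, text: str) -> float:
        chars = "##" + text
        log_prob, n = 0.0, 0
        for i in range(2, len(chars)):
            context, char = chars[i - 2:i], chars[i]
            # Add-one smoothing over the character vocabulary.
            p = (self.tri[context + char] + 1) / (self.bi[context] + len(self.vocab))
            log_prob += math.log(p)
            n += 1
        return math.exp(-log_prob / n)

lm = CharTrigramLM("congress shall make no law respecting an establishment of religion")
print(lm.perplexity("congress shall make no law"))    # low: familiar character patterns
print(lm.perplexity("congrefs fliall rnake no 1aw"))  # high: OCR-style noise is "surprising"
```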

Figure 4: OCR quality assessment made using a trigram character language model.

Figure 4 confirms the dictionary findings. The HEIN collection (blue line) has higher perplexity (indicating worse quality) across the board. The FOUNDERS collection also shows high perplexity, but for a different reason: those documents are letters full of abbreviations, shorthand, and lists, which look “weird” to a standard language model but are actually accurate transcriptions.


Part 2: The Methodology

Beyond Word Counts: Masked Language Models

Once the data quality was assessed (and caveats noted), the researchers moved to the core task: measuring meaning.

Traditional corpus linguistics often relies on collocations—looking at which words appear next to a target word. For example, if “bank” appears near “river,” it means one thing; if it appears near “money,” it means another.

This paper takes a more advanced approach using Masked Language Models (MLMs), specifically BERT.

How it works: BERT is trained on massive amounts of text to predict missing words. If you give BERT the sentence:

“The soldier loaded his [MASK].”

BERT might predict “musket,” “gun,” or “rifle.”

The researchers realized that the distribution of these predictions acts as a fingerprint for the meaning of the word in that specific context.

To measure how the meaning of a word like “arms” or “commerce” has changed:

  1. They find sentences containing the word in the Founding Era corpus (COFEA).
  2. They mask the word (replace it with [MASK]).
  3. They ask the model to predict the top 10 substitutes.
  4. They repeat this process for the Modern Era (using the Corpus of Contemporary American English, or COCA).

If the list of substitutes for the founding-era usage is very different from the list for the modern usage, the meaning has changed.
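
A minimal sketch of this substitution step, using the Hugging Face transformers fill-mask pipeline (the off-the-shelf bert-base-uncased checkpoint and the toy sentences below are stand-ins, not the study’s actual models or data):

```python
from collections import Counter
from transformers import pipeline

# Off-the-shelf checkpoint as a stand-in for the models used in the paper.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def top_substitutes(sentences, target, k=10):
    """Aggregate the top-k [MASK] substitutes for `target` across many sentences."""
    counts = Counter()
    for sent in sentences:
        # Replace the first occurrence of the target word with the mask token.
        masked = sent.replace(target, fill_mask.tokenizer.mask_token, 1)
        for pred in fill_mask(masked, top_k=k):
            counts[pred["token_str"].strip()] += 1
    return counts

# Toy examples; the real analysis draws thousands of sentences from COFEA and COCA.
founding_sents = ["the right of the people to keep and bear arms shall not be infringed"]
modern_sents = ["police seized a cache of illegal arms from the apartment"]

print(top_substitutes(founding_sents, "arms"))
print(top_substitutes(modern_sents, "arms"))
```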

The Metric: Jensen-Shannon Divergence

To quantify the difference between the “founding-era substitutes” and the “modern substitutes,” the researchers use a metric called Jensen-Shannon Divergence (JSD).

  • JSD = 0: The distributions are identical (the word means exactly the same thing).
  • JSD = 1: The distributions are completely different (the word has undergone a total semantic shift).
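
Concretely, the two sets of substitute counts can be turned into probability distributions over their combined vocabulary and compared with base-2 JSD (which is what bounds the score between 0 and 1). A minimal sketch with toy counts:

```python
import numpy as np
from collections import Counter

def jensen_shannon_divergence(counts_a: Counter, counts_b: Counter) -> float:
    """Base-2 JSD between two substitute-count distributions (0 = identical, 1 = disjoint)."""
    vocab = sorted(set(counts_a) | set(counts_b))
    p = np.array([counts_a[w] for w in vocab], dtype=float)
    q = np.array([counts_b[w] for w in vocab], dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0  # zero-probability terms contribute nothing
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

founding = Counter({"invasion": 6, "insurrection": 3, "violence": 1})
modern = Counter({"violence": 5, "abuse": 4, "assault": 1})
print(jensen_shannon_divergence(founding, modern))  # ~0.8: very little overlap
```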

Part 3: Diachronic Change (Then vs. Now)

The first major experiment compared the language of the Founding Era (1760–1800) to Modern English (1990–2017).

The results confirm that language is fluid. Many words found in the Constitution have undergone drastic semantic shifts.

Table 7: Constitutional terms with the largest meaning change from the founding to the modern era.

Table 7 provides some striking examples of semantic drift:

  • Domestic Violence:
      • Founding Era substitutes: invasion, insurrection, violence, invasions.
      • Modern Era substitutes: violence, abuse, rape, crime, assault.
      • Insight: In 1787, “domestic violence” referred to internal political uprisings or riots (think Shays’ Rebellion). Today, it almost exclusively refers to intimate partner violence.
  • Captures:
      • Founding Era: prizes, seizures (referring to capturing ships or goods in war).
      • Modern Era: reflects, shows, represents (referring to data or images, e.g., “The photo captures the moment”).
  • Quartered:
      • Founding Era: stationed, lodged (soldiers living in houses).
      • Modern Era: sliced, chopped (cutting something into four parts).

These examples serve as a strong warning against assuming modern definitions apply to 18th-century texts.

Does the Constitution Change More Than Normal Speech?

An interesting question posed by the researchers is whether the specific vocabulary of the Constitution is more stable or more volatile than average English words.

They compared the “Constitutional vocabulary” against a set of random background terms. They also controlled for frequency, because rare words tend to have more unstable meaning measurements.

Figure 5: Change in meaning between founding and modern eras vs. term counts in both corpora combined.

Figure 5 plots the change in meaning (JSD) on the y-axis against word frequency on the x-axis.

  • Orange dots: Terms in the Constitution.
  • Blue dots: Random background terms.

The regression analysis (Table 8 in the paper) shows a small but statistically significant result: Constitutional terms have changed slightly more in meaning than random background terms. One plausible explanation is that the Constitution focuses on governance, the military, and law, areas where society and technology have evolved massively since 1787.
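
A hedged sketch of what such a frequency-controlled comparison could look like (the data frame, numbers, and variable names are hypothetical; the paper’s exact regression specification may differ):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per term, with its founding-vs-modern JSD, its
# combined count in both corpora, and whether it appears in the Constitution.
df = pd.DataFrame({
    "jsd":            [0.62, 0.41, 0.55, 0.38, 0.70, 0.45, 0.58, 0.36],
    "count":          [1200, 54000, 3300, 89000, 800, 21000, 2500, 67000],
    "constitutional": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Regress meaning change on log frequency plus an indicator for constitutional
# terms; a small positive coefficient on `constitutional` would mirror the finding.
model = smf.ols("jsd ~ np.log(count) + constitutional", data=df).fit()
print(model.summary())
```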


Part 4: Synchronic Variation (Legal vs. Popular Language)

One of the biggest debates in originalism is whether the Constitution was written in “legalese” (terms of art understood by lawyers) or “ordinary language” (understood by the general public).

To test this, the researchers compared the meanings of words within the Founding Era (Synchronic analysis). They split COFEA into two buckets:

  1. Legal Sources: Statutes, Convention Records (Farrands), Debates (Elliots), and Hein.
  2. Popular Sources: Evans (pamphlets/books) and The Pennsylvania Gazette.

Frequency Analysis

First, they looked at word frequency. Are Constitutional words more common in legal texts?

Figure 3: Overall, constitutional terms are more common in legal than other sources.

Figure 3 projects the words onto a triangle (simplex).

  • Top corner: Legal sources.
  • Bottom right: Popular sources.
  • Bottom left: Founders’ private papers.

The heatmap shows a slight skew toward the top. While most words are used commonly across all genres, the vocabulary of the Constitution overlaps more frequently with specialized legal documents than with popular prints or private letters.
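
As a rough reconstruction of how such a projection works (not the paper’s plotting code): normalize each term’s counts across the three source groups so they sum to one, then map that three-way proportion to a point inside the triangle.

```python
def ternary_xy(legal: float, popular: float, founders: float) -> tuple:
    """Map a term's three-way source proportions to 2D coordinates in a unit triangle.
    Corners: legal at the top, popular at bottom right, founders at bottom left."""
    total = legal + popular + founders
    l, p = legal / total, popular / total
    x = p + 0.5 * l           # horizontal position between the two bottom corners
    y = l * (3 ** 0.5) / 2    # height toward the "legal" apex
    return x, y

# Hypothetical per-source counts for a single term:
print(ternary_xy(legal=120, popular=40, founders=15))  # lands well above the midline, toward "legal"
```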

Meaning Analysis

Next, they applied the BERT substitution method to see if words meant different things in legal vs. popular documents.

Overall, the variation between sources within the founding era was much lower than the variation over time (founding era vs. modern era). However, some words did show distinct “legal” vs. “popular” senses.

Table 10: Constitutional terms with the largest difference in meaning between legal and popular sources in COFEA.

Table 10 highlights these divergences:

  • Tender:
      • Legal meaning: Payment, currency, money (“Legal Tender”).
      • Popular meaning: Kind, soft, generous (“Tender heart”).
  • Dock:
      • Legal meaning: Ship, navy, naval.
      • Popular meaning: Market, street, water (the physical place).
  • Resignation:
      • Legal meaning: Removal, appointment (political office).
      • Popular meaning: Submission, patience (emotional state).

The Constitution’s Bias

Finally, the researchers asked: When these specific words appear in the Constitution, which meaning is being used? The “Legal” sense or the “Popular” sense?

Using a measure of overlap between the Constitution’s own usage of each term and its usage in the two source domains, they categorized the terms (the “Lean” column in Table 10), as sketched after the list below.

  • L (Legal Lean): The usage in the Constitution aligns with legal documents.
  • P (Popular Lean): The usage aligns with popular documents.
  • I (Indeterminate): It’s ambiguous.
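
Here is one way such a lean could be assigned, reusing the jensen_shannon_divergence helper and Counter-based substitute counts from the earlier sketches (an illustrative criterion and threshold, not necessarily the paper’s exact method):

```python
from collections import Counter

def classify_lean(const_subs: Counter, legal_subs: Counter, popular_subs: Counter,
                  margin: float = 0.05) -> str:
    """Label a term "L", "P", or "I" by whichever domain's substitute distribution
    its usage in the Constitution is closer to (lower JSD = more similar)."""
    d_legal = jensen_shannon_divergence(const_subs, legal_subs)
    d_popular = jensen_shannon_divergence(const_subs, popular_subs)
    if d_legal + margin < d_popular:
        return "L"  # clearly closer to the legal sources
    if d_popular + margin < d_legal:
        return "P"  # clearly closer to the popular sources
    return "I"      # too close to call
```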

The Result: Of the top 40 words with the greatest difference between legal and popular usage, 26 leaned toward the specialized legal meaning, while only 2 leaned popular (and those 2 were likely OCR errors).

This provides suggestive evidence that the Constitution relies heavily on specialized legal vocabulary, rather than the pure “ordinary meaning” of the common vernacular.


Part 5: Conclusion and Implications

This research paper provides a fascinating case study in how modern data science can interact with history and law. It moves the field of legal interpretation away from “cherry-picking” dictionary definitions and toward reproducible, quantitative analysis.

Key Takeaways for Students:

  1. Data Hygiene is Paramount: You cannot blindly trust a dataset. The discovery of OCR errors (like “fhall” for “shall”) in the HEIN corpus is a critical warning. If you analyze dirty data, your historical conclusions will be wrong.
  2. Context is Everything: Words do not have static definitions. “Domestic violence” meant insurrection in 1787, not family abuse. BERT-based methods allow us to capture meaning in context dynamically rather than relying on a single fixed dictionary entry.
  3. The “Ordinary Meaning” Myth: The analysis suggests that the Constitution wasn’t necessarily written in the “language of the people.” Its vocabulary aligns more closely with specialized legal texts of the era.
  4. Sensitivity vs. Scale: While these computational methods allow us to analyze millions of words at once (scale), they might miss subtle nuances that a historian reading a single letter might catch (sensitivity). The best approach likely combines both.

By using Masked Language Models, the author has provided a new toolkit for legal scholars. While AI won’t settle the debate on originalism, it offers a way to fact-check our assumptions about the past, ensuring that when we speak about “original meaning,” we are backed by evidence rather than intuition.