In the rapidly evolving world of Large Language Models (LLMs), we are playing a high-stakes game of “cat and mouse.” As models like GPT-4 become increasingly sophisticated, their ability to mimic human writing has reached a point where distinguishing between a human author and an AI is incredibly difficult.
Traditionally, we catch these models by looking at likelihood—essentially asking, “How predictable is this text?” But as models get better, they stop making the kind of statistical errors that old detection methods relied on. They are starting to “sound” just like us.
But what if we stopped looking at what words are chosen, and started looking at the rhythm of how they are chosen?
In a fascinating new paper, researchers propose FourierGPT, a method that approaches text detection not as a static statistics problem, but as a signal processing one. By applying the Fourier Transform to the likelihood of words, they reveal that human and AI languages have distinct “heartbeats” in the frequency domain—differences that remain visible even when the text looks perfect on the surface.
The Problem with Static Likelihood
To understand why this new method is necessary, we first need to understand how current detection works. Most state-of-the-art detectors (like DetectGPT or Fast-DetectGPT) rely on the concept of likelihood or surprisal.
In psycholinguistics, “surprisal” measures how unexpected a word is given its context: formally, the negative log-probability of a word given the words before it, \(-\log p(w_n \mid w_1, \dots, w_{n-1})\). Humans have a specific way of distributing information. We don’t make every word extremely predictable, nor do we make every word a shock. We balance the cognitive load, a concept known as Uniform Information Density (UID).
However, existing detection methods treat likelihood as a static property. They calculate the probability of tokens and look for thresholds or curvature in the probability space. The researchers argue that this overlooks the dynamic nature of language. Language is a process that unfolds over time. The fluctuation of surprisal—the ups and downs of predictability as a sentence progresses—contains hidden patterns.
Enter FourierGPT: The Spectrum of Likelihood
The core innovation of this paper is shifting the perspective from the Time Domain (reading words one by one) to the Frequency Domain (analyzing the spectrum of changes).
The researchers propose a three-step pipeline to extract these features, which they call FourierGPT.

As shown in Figure 1, the process works as follows:
- Estimation: The text is fed into an “estimator” language model (which can be a large model like GPT-2 or even a simple Bigram model) to generate a sequence of raw likelihood scores.
- Normalization: These raw scores are converted into Z-scores (relative likelihood). This is crucial because it decouples the detection from the specific vocabulary size or quirks of the estimator model. It focuses on the relative rise and fall of surprise.
- Fourier Transform: This is the magic step. The sequence of likelihood scores is treated as a waveform. By applying a Discrete Fourier Transform (DFT), the researchers decompose this wave into its constituent frequencies.
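The three steps above can be sketched with NumPy. The function name and toy inputs are illustrative; in practice, the log-likelihood scores in Step 1 would come from the estimator model:

```python
import numpy as np

def fourier_features(log_likelihoods):
    """Sketch of the FourierGPT pipeline on a sequence of per-token
    log-likelihood scores from an estimator model."""
    s = np.asarray(log_likelihoods, dtype=float)
    # Step 2 (Normalization): z-score the scores so the signal reflects
    # the relative rise and fall of surprise, not the estimator's scale.
    s_tilde = (s - s.mean()) / s.std()
    # Step 3 (Fourier Transform): treat the normalized series as a
    # waveform; the power spectrum is the squared DFT magnitude.
    return np.abs(np.fft.rfft(s_tilde)) ** 2

# Toy stand-in for Step 1 (Estimation): 16 fake log-likelihood scores.
rng = np.random.default_rng(0)
power = fourier_features(rng.normal(loc=-5.0, scale=2.0, size=16))
print(power.shape)  # rfft of a length-16 series yields 9 frequency bins
```

Note that because the series is z-scored, the DC (zero-frequency) bin of the power spectrum is essentially zero; the informative structure lives in the remaining bins.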
The mathematical foundation for this transformation is the Discrete Fourier Transform:

\[
X(\omega_k) = \sum_{n=0}^{N-1} \tilde{s}_n \, e^{-i \omega_k n}, \qquad \omega_k = \frac{2\pi k}{N}
\]

Here, the time-series signal of normalized likelihoods \(\tilde{s}_n\) (of length \(N\)) is converted into a frequency spectrum \(X(\omega_k)\).
Why the Spectrum Matters
Why go through this trouble? Look at the bottom half of Figure 1 above. In the “Time-Domain,” the blue (Model) and red (Human) lines look like messy, overlapping noise. It is very hard to draw a line that separates them.
However, look at the “Frequency-Domain” on the right. There is a clear separation. The spectral “power” (or intensity) at different frequencies differs significantly between humans and machines. This suggests that while AI can mimic the words humans use, it struggles to mimic the subtle, periodic fluctuations in predictability that are innate to human cognition.
Classification: Two Ways to Spot a Bot
Once the researchers obtained this spectrum, they developed two ways to classify text: a Supervised method and a Heuristic-based method.
1. The Supervised Classifier
This approach trains a standard machine-learning classifier, a Support Vector Machine (SVM), on the spectral features. To make the classifier more robust, the authors used a technique called “circularization”: rotating the data to simulate more training examples and amplify weak periodic signals.
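The rotation idea can be sketched with plain NumPy. The function name and rotation count below are illustrative, the paper’s exact circularization procedure may differ, and the SVM training step itself is omitted:

```python
import numpy as np

def circularize(features, n_rotations=4):
    """Rotation-based augmentation sketch: each spectral feature vector
    is circularly shifted to produce extra training examples."""
    augmented = []
    for x in features:
        for r in range(n_rotations):
            augmented.append(np.roll(x, r))  # shift by r bins, wrapping around
    return np.array(augmented)

# Two toy spectral feature vectors (power at 5 frequency bins each).
feats = np.array([[4.0, 2.0, 1.0, 0.5, 0.2],
                  [0.3, 0.6, 1.2, 2.5, 5.0]])
aug = circularize(feats, n_rotations=4)
print(aug.shape)  # 2 vectors x 4 rotations -> (8, 5)
```

Each rotated copy preserves the periodic content of the original vector while changing its phase, which is what lets weak periodic signals accumulate across the augmented training set.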
The results were impressive. As seen in the table below, FourierGPT achieves competitive accuracy, particularly on the PubMed dataset.

What is notable here is the performance on PubMed (0.8267). PubMed data consists of expert medical answers, which are typically shorter and more technical. Traditional zero-shot detectors often struggle with short texts because there aren’t enough data points to build a reliable statistical profile. FourierGPT, however, captures the signal characteristics effectively even in shorter sequences.
2. The Heuristic-Based Classifier (Zero-Shot)
The researchers noticed a consistent pattern: there is a salient difference between human and model spectra at the low-frequency end.

In Figure 2, you can see that for most models (like GPT-3.5 and GPT-3), the model text (blue dashed line) has higher spectral power at low frequencies compared to human text (red solid line). This observation allows for a “training-free” classification method. You don’t need to train a neural network; you simply calculate the spectrum and check if the low-frequency power exceeds a certain threshold.
The heuristic is defined by an inequality over the low-frequency end of the spectrum. Let \(\delta_k\) denote the difference in spectral power between the two spectra at frequency \(\omega_k\); summed over the first \(K\) (low) frequencies, the criterion is:

\[
\sum_{k=1}^{K} \delta_k > \epsilon
\]

Basically, if the difference in spectral power at low frequencies (\(\delta_k\)) is significant, we can distinguish the two.
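As a minimal sketch of this training-free rule, the snippet below thresholds the summed power of the first few frequency bins of a text’s spectrum. The bin count `K` and cutoff `TAU` are invented for illustration, not values from the paper:

```python
import numpy as np

# Hypothetical settings: the number of low-frequency bins K and the
# cutoff TAU are invented for illustration, not taken from the paper.
K, TAU = 3, 2.0

def looks_machine_generated(power_spectrum, k=K, tau=TAU):
    """Zero-shot heuristic sketch: flag a text as machine-generated when
    the summed power in its first k non-DC frequency bins exceeds tau."""
    return float(np.sum(power_spectrum[1:k + 1])) > tau  # skip the DC bin

# Toy spectra: one dominated by low frequencies, one roughly flat.
low_heavy = np.array([0.0, 3.0, 2.0, 0.5, 0.1, 0.1])
flat = np.array([0.0, 0.4, 0.5, 0.4, 0.5, 0.4])
print(looks_machine_generated(low_heavy), looks_machine_generated(flat))
# prints: True False
```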
Efficiency: You Don’t Need a Supercomputer
One of the major drawbacks of current detection methods is computational cost: they often require running the text through massive LLMs to calculate curvature.
FourierGPT breaks this dependency. The researchers found that they could achieve state-of-the-art results even when using a Bigram model (a very simple statistical model that just looks at pairs of words) as the estimator.
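A bigram estimator of per-token likelihood is simple enough to sketch in a few lines. The add-alpha smoothing and function name below are assumptions for illustration; the paper’s exact bigram setup may differ:

```python
import math
from collections import Counter

def bigram_log_likelihoods(tokens, corpus_tokens, alpha=1.0):
    """Per-token log-likelihoods from an add-alpha-smoothed bigram model:
    a minimal sketch of the 'cheap estimator' idea."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab = len(unigrams)
    scores = []
    for prev, cur in zip(tokens, tokens[1:]):
        # Smoothed conditional probability p(cur | prev).
        p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab)
        scores.append(math.log(p))
    return scores

corpus = "the cat sat on the mat and the cat ran".split()
scores = bigram_log_likelihoods("the cat sat".split(), corpus)
print(len(scores))  # one score per bigram transition -> 2
```

The resulting score sequence can be fed straight into the normalization and Fourier steps; no neural network is involved at any point.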

Table 4 shows something remarkable: On the “Writing” dataset (stories), the Bigram-based FourierGPT achieved 90.67% accuracy, outperforming the sophisticated Fast-DetectGPT. This implies that the distinct “rhythm” of AI generation leaves traces so strong that even primitive statistical tools can pick them up, provided you look at them in the frequency domain.
Discussion: What are we actually detecting?
The paper goes beyond just reporting accuracy numbers; it investigates why these spectral differences exist. This section reveals some fascinating psycholinguistic insights.
The “Yes/No” Pattern
In the PubMed dataset, the researchers found that AI models have a strong tendency to start answers with “Yes” or “No” when the prompt is a question. Humans, perhaps being more cautious or nuanced experts, rarely did this.

When the researchers stripped the “Yes/No” from the start of the sentences, the spectral gap between human and model narrowed (as seen in Figure 3). This suggests that high-confidence, declarative openings are a specific “frequency signature” of current LLMs.
Robustness and POS Masking
Perhaps the most profound finding concerns robustness. The researchers ran an experiment in which they masked specific parts of speech (POS), such as nouns, verbs, and adjectives, and then re-calculated the spectrum.

Figure 5 illustrates a key difference:
- Human Text (Right): The spectrum changes very little when POS tags are masked. The “signal” is stable.
- Model Text (Left): The spectrum shifts significantly.
This suggests that human language has a “stable entropy” or a structural redundancy that makes it robust. We build our sentences such that if you miss a verb or a noun, the overall “flow” of information density remains similar. AI text, generated by maximizing likelihood token-by-token, appears to be more “fragile.” Disrupted linguistic structures cause its likelihood spectrum to fluctuate wildly.
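One way to build intuition for this fragility test is to perturb a likelihood series directly and measure how far its spectrum moves. This is only a stand-in: the actual experiment masks words by POS and re-estimates likelihoods with a language model, whereas the sketch below simply neutralizes positions in a toy series:

```python
import numpy as np

def spectrum(series):
    """Power spectrum of the z-scored likelihood series."""
    s = (series - series.mean()) / series.std()
    return np.abs(np.fft.rfft(s)) ** 2

def spectral_shift(series, mask_idx):
    """How much the power spectrum moves when selected positions are
    neutralized. Illustrative stand-in for the POS-masking experiment."""
    masked = series.copy()
    masked[mask_idx] = series.mean()  # flatten the masked positions
    return float(np.abs(spectrum(series) - spectrum(masked)).sum())

likelihoods = np.arange(12.0)  # toy likelihood series
print(spectral_shift(likelihoods, [2, 5]) > spectral_shift(likelihoods, []))
```

A "robust" series in this toy sense is one whose spectral shift stays small under masking, which is the qualitative behavior the paper reports for human text.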
Conclusion
The “cat and mouse” game of AI detection will continue, but FourierGPT offers a refreshing change of tactics. Instead of chasing the mouse by trying to predict its next step (absolute likelihood), FourierGPT listens for the sound of its footsteps (the spectrum of relative likelihood).
By treating language generation as a signal processing problem, the researchers have shown that:
- Relative Likelihood (Z-scores) reveals patterns that absolute values miss.
- Frequency Analysis can distinguish AI from humans even in short texts.
- Simple Estimators (like Bigrams) are surprisingly effective, making detection computationally cheap.
This work highlights a fundamental difference that still exists between biological and artificial intelligence: humans write with a cognitive rhythm, balancing information density in a way that models—for all their brilliance—have not yet perfectly mastered.