Dyslexia is one of the most common learning disabilities, affecting an estimated 9% to 12% of the population. It is not a visual problem, nor is it related to intelligence; rather, it is a difficulty with phonological decoding, the process of mapping letters to the sounds they represent. While the condition is lifelong, early diagnosis is the single most critical factor in ensuring a child stays on track in the educational system.

The problem, however, is logistics. Standard testing batteries for dyslexia are expensive, time-consuming, and require one-on-one administration by trained specialists who are not always available in schools. This creates a bottleneck where many children slip through the cracks.

What if we could screen for dyslexia automatically, cheaply, and unobtrusively?

In a recent study titled “Automatic detection of dyslexia based on eye movements during reading in Russian,” researchers proposed a novel solution using Deep Learning. By tracking the eye movements of children while they read and feeding that sequential data into a Long Short-Term Memory (LSTM) network, they were able to detect dyslexia with impressive accuracy. This post breaks down their methodology, the shift from static to temporal analysis, and why looking at how a child reads is more important than just measuring how fast they read.

The Connection Between Eyes and the Brain

Before diving into the algorithms, we need to understand the biological premise. Research dating back to the 1980s has established that while dyslexia is not an oculomotor (eye muscle) deficit, the eye movements of a person with dyslexia differ significantly from those of a typical reader.

When we read, our eyes do not glide smoothly. They jump (saccades) and stop (fixations). For a typically developing reader, this process becomes efficient over time. For a reader with dyslexia, difficulties in processing phonology often result in longer fixations, shorter saccades, and more regressions (looking back at previous words).
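To make these terms concrete, here is a minimal, hypothetical sketch of how fixation duration, saccade length, and regressions could be computed from a simple fixation log. The field layout, values, and thresholds are illustrative only and are not taken from the paper.

```python
# Hypothetical fixation log for one sentence: (x position in pixels, duration in ms),
# ordered in time. The layout and values are illustrative, not from the paper.
fixations = [(40, 210), (120, 250), (95, 310), (180, 230), (260, 400)]

saccade_lengths = []   # horizontal distance jumped between consecutive fixations
regressions = 0        # leftward jumps, i.e. looking back at already-read text

for (x_prev, _), (x_next, _) in zip(fixations, fixations[1:]):
    dx = x_next - x_prev
    saccade_lengths.append(abs(dx))
    if dx < 0:          # the eye moved back toward earlier words
        regressions += 1

mean_fixation_ms = sum(duration for _, duration in fixations) / len(fixations)
print(mean_fixation_ms, saccade_lengths, regressions)   # 280.0 [80, 25, 85, 80] 1
```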

Because eye movements reflect the cognitive effort of decoding text, they provide a “window” into the reading brain. This makes eye tracking a prime candidate for machine learning applications.

The Data: A Diverse Group of Young Readers

One of the limitations of previous studies in this field was the use of small, homogenous groups. Models were often trained on participants of the same age to minimize variance. However, a real-world screening tool needs to work across different ages, as a 1st grader reads very differently from a 6th grader.

The researchers in this study utilized a dataset of 293 native Russian-speaking children ranging from the 1st to the 6th grade.

  • 221 children were typically developing.
  • 72 children had been diagnosed with developmental dyslexia.

The children read 30 sentences from the Child Russian Sentence Corpus. These sentences were designed to be at a 3rd-4th grade difficulty level. The dataset is unique because it includes a wide variance in reading maturity.

Table 1: Demographic and cognitive characteristics of both participant groups, organized by grade. Values before the slash represent the control group, while those after the slash correspond to participants with dyslexia.

As shown in Table 1 above, the groups were characterized by grade, gender, and reading speed. Notice the significant overlap and variance in reading speeds (wpm) across grades and conditions, which makes simple threshold-based classification difficult.

The Core Method: From Aggregation to Sequence

The primary contribution of this paper is the shift in how eye-tracking data is processed.

The Baseline Approach (SVM)

The “State-of-the-Art” (SOTA) reference method typically used in this field is a Support Vector Machine (SVM). This method relies on aggregated features. It takes a reading session and calculates summaries, such as:

  • Average fixation duration.
  • Total number of fixations.
  • Average saccade length.

While effective, this aggregation flattens the data. It treats reading as a static event, losing the rich, temporal information about when and where struggles occur in a sentence.
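As an illustration of what such an aggregated-feature baseline might look like, here is a minimal scikit-learn sketch. The feature choices and numbers are made up for the example; this is not the authors' exact pipeline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical aggregated features, one row per reading session:
# [mean fixation duration (ms), number of fixations, mean saccade length (deg)].
# The values are invented for illustration.
X = np.array([
    [245.0, 310, 2.1],
    [410.0, 520, 1.4],
    [230.0, 280, 2.3],
    [395.0, 505, 1.5],
])
y = np.array([0, 1, 0, 1])          # 0 = typical reader, 1 = dyslexia

# Standardize the aggregates, then fit an SVM on the flattened summary features.
baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
baseline.fit(X, y)
print(baseline.predict([[300.0, 400, 1.8]]))   # classify a new, hypothetical reader
```

Note how the whole reading session is reduced to a single row of summary statistics before the classifier ever sees it; that compression is exactly what the sequential approach below avoids.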

The Proposed Approach (LSTM)

To capture the temporal dynamics of reading, the researchers employed a BiLSTM (Bidirectional Long Short-Term Memory) network. LSTMs are a type of Recurrent Neural Network (RNN) specifically designed to handle sequences of data. Unlike the SVM, which sees “average fixation duration,” the LSTM sees the reading process as it unfolds: Fixation 1 -> Saccade -> Fixation 2 -> Regression -> Fixation 3…

The Input Features

The model does not rely on gaze position alone. The researchers constructed a rich input vector for every fixation, combining three types of features (a rough sketch of one such vector follows the list):

  1. Demographic Features:
  • Age and Grade: Crucial because reading skill evolves rapidly in primary school.
  • Gender: Included because boys are diagnosed with dyslexia more frequently than girls.
  2. Gaze-Specific Features:
  • Fixation duration.
  • Coordinates (X, Y) on the screen.
  • Saccade details (amplitude, angle, velocity) describing the movement to the next fixation.
  3. Linguistic Features:
  • Word Length & Morphology: Longer words or words with many morphemes take longer to process.
  • Frequency & Predictability: Rare words are harder to read. Including this helps the model understand if a long fixation is due to dyslexia or simply a difficult word.
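A hypothetical sketch of how one such per-fixation input vector might be assembled is shown below. The exact feature encoding, scaling, and ordering used in the paper may differ.

```python
# Hypothetical assembly of one per-fixation input vector. The exact feature
# encoding and scaling used in the paper may differ from this sketch.
def fixation_features(child, fix, word):
    return [
        # demographic features (repeated for every fixation in the sequence)
        child["age"], child["grade"], 1.0 if child["gender"] == "m" else 0.0,
        # gaze-specific features for this fixation and the following saccade
        fix["duration_ms"], fix["x"], fix["y"],
        fix["saccade_amplitude"], fix["saccade_angle"], fix["saccade_velocity"],
        # linguistic features of the fixated word
        word["length"], word["n_morphemes"], word["frequency"], word["predictability"],
    ]

child = {"age": 9, "grade": 3, "gender": "m"}
fix = {"duration_ms": 265.0, "x": 412.0, "y": 300.0,
       "saccade_amplitude": 2.1, "saccade_angle": 0.0, "saccade_velocity": 95.0}
word = {"length": 7, "n_morphemes": 3, "frequency": 12.4, "predictability": 0.31}

vector = fixation_features(child, fix, word)   # one timestep of the sequence
print(len(vector))                             # 13 features in this sketch
```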

The Architecture

The sequential input is fed into a BiLSTM. “Bidirectional” means the model processes the sequence of fixations both forwards and backwards, allowing it to understand the context of the reading path.

The hidden states of the LSTM (which represent the model’s “memory” of the reading path) are averaged and passed through linear layers. Finally, a sigmoid activation function outputs a probability score between 0 and 1, classifying the reader as having dyslexia or not.
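Assuming a PyTorch-style implementation (the paper does not publish code in this post's scope, and the layer sizes below are guesses), the described architecture could be sketched roughly as follows:

```python
import torch
import torch.nn as nn

class DyslexiaBiLSTM(nn.Module):
    """Minimal sketch of the described architecture: a BiLSTM over per-fixation
    feature vectors, mean-pooled hidden states, linear layers, sigmoid output.
    Hidden sizes are illustrative, not taken from the paper."""
    def __init__(self, n_features: int, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size,
                            batch_first=True, bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_size, 32),  # 2x because of the two directions
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):                    # x: (batch, n_fixations, n_features)
        states, _ = self.lstm(x)             # (batch, n_fixations, 2 * hidden_size)
        pooled = states.mean(dim=1)          # average hidden states over the sequence
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # P(dyslexia)

model = DyslexiaBiLSTM(n_features=13)
dummy = torch.randn(4, 120, 13)              # 4 readers, 120 fixations, 13 features
print(model(dummy).shape)                    # -> torch.Size([4])
```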

Experiments and Results

The researchers evaluated the models using nested cross-validation to ensure the results were robust and that the models were not simply memorizing specific participants. They tested two scenarios:

  1. Reader Prediction: Using all 30 sentences a child read to make a diagnosis.
  2. Sentence Prediction: Trying to classify the child based on reading a single sentence.

Performance was measured using AUC (Area Under the Receiver Operating Characteristic Curve). An AUC of 0.5 is random guessing; 1.0 is perfect prediction.
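The key safeguard in such an evaluation is splitting by participant, so that sentences read by the same child never appear in both training and test data. The sketch below shows only that outer, participant-grouped loop with synthetic data and random scores (real training would happen inside the loop, and nested cross-validation adds an inner loop for hyperparameter selection); it is not the authors' exact setup.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

# Synthetic stand-in data: one feature row per sentence reading, a label,
# and a participant id so that no child appears in both train and test.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = rng.integers(0, 2, size=300)
groups = np.repeat(np.arange(30), 10)      # 30 children, 10 sentences each

aucs = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    # A real run would train the BiLSTM (or SVM) on the train split here;
    # scoring random probabilities keeps the sketch self-contained, so the
    # resulting AUC should hover around chance level (0.5).
    scores = rng.random(len(test_idx))
    aucs.append(roc_auc_score(y[test_idx], scores))

print("mean AUC:", round(float(np.mean(aucs)), 3))
```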

LSTM vs. SOTA

The results showed a clear advantage for the sequential model.

Figure 1: Summary of model performance. The SOTA baseline model used grade information.

As seen in Figure 1, the LSTM (red line) consistently achieved a higher True Positive Rate for any given False Positive Rate compared to the SOTA baseline (blue line).

  • Reader Prediction: The LSTM reached an AUC of 0.93, significantly outperforming the SOTA model’s 0.86.
  • Sentence Prediction: Remarkably, the LSTM achieved an AUC of 0.90 using just single sentences.

This suggests that the specific sequence of eye movements contains diagnostic markers that aggregated statistics simply miss. Furthermore, the high performance on single sentences indicates that effective screening doesn't require long, fatiguing testing sessions.

What Features Matter? (Ablation Studies)

To understand how the model was making decisions, the researchers performed ablation studies—systematically removing features to see how performance dropped.
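Conceptually, an ablation of this kind is a loop over feature groups: drop one group, retrain, and compare AUCs. The group names below mirror the paper's variants (LSTM-Ling, LSTM-Demographic, LSTM-Saccade), but the training function is a hypothetical placeholder, not a real routine from the study.

```python
# Conceptual ablation loop: drop one feature group at a time, retrain, compare AUCs.
FEATURE_GROUPS = {
    "Ling": ["word_length", "n_morphemes", "frequency", "predictability"],
    "Demographic": ["age", "grade", "gender"],
    "Saccade": ["saccade_amplitude", "saccade_angle", "saccade_velocity"],
}

def train_and_evaluate(excluded_columns):
    """Placeholder: would retrain the BiLSTM without these columns and return AUC."""
    return float("nan")   # dummy value so the sketch runs as-is

results = {f"LSTM-{name}": train_and_evaluate(cols)
           for name, cols in FEATURE_GROUPS.items()}
print(results)
```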

Table 2: Summary of AUC ± standard error in the reader- and sentence-prediction settings.

Table 2 highlights several key findings:

  1. Linguistic features were secondary: Removing word frequency and predictability (LSTM-Ling) barely dropped the score (0.92). This suggests the eye movement pattern itself is the strongest signal, regardless of the text difficulty.
  2. Demographics were not critical: Removing Age, Grade, and Gender (LSTM-Demographic) resulted in an AUC of 0.90. This is a positive finding for fairness; it implies the model is detecting dyslexia based on reading behavior, not just biasing against younger children or boys.
  3. Fixations are key: The most significant drop occurred when all eye-movement features were removed (obviously), but interestingly, removing saccade information specifically (LSTM-Saccade) had a minor impact. The core signal seems to be in the fixations—how long the eye rests and where.

Discussion and Implications

The superior performance of the LSTM underscores the importance of temporal dynamics in reading analysis. Reading is a process, not a summary statistic. By preserving the order of fixations, the model likely picks up on subtle processing struggles—like rapid re-reading or erratic landing positions—that characterize phonological decoding issues.

Robustness to Age

A surprising result was that adding “Grade” information did not significantly improve the models. This implies that the LSTM discovered an “invariant property” of dyslexic reading. In other words, the specific eye-movement “signature” of dyslexia might look similar in a 1st grader and a 6th grader, distinct from general reading immaturity.

Ethical Considerations

The authors addressed potential bias explicitly. Since boys are diagnosed with dyslexia more often, there is a risk that a model might simply learn to flag male participants. However, the ablation study showed that the model performs almost equally well without gender or age inputs. This suggests the model is robust and suitable for fair screening.

Conclusion

This research demonstrates that automatic dyslexia detection using eye tracking is not only feasible but highly accurate. By moving from static averages to sequential Deep Learning models (LSTMs), we can capture the nuance of cognitive processing during reading.

With an AUC of 0.93, this method outperforms state-of-the-art baselines and works effectively even with short snippets of text. While it does not replace a clinical diagnosis, it offers a promising path toward a fast, affordable, and scalable screening tool that could be deployed in schools—ensuring that children with reading difficulties are identified and supported as early as possible.