The human brain is arguably the most complex network in existence. To understand it, researchers have turned to Graph Neural Networks (GNNs) and Transformers. These deep learning architectures have revolutionized how we process graph data, from social networks to molecular structures. It seems only logical to apply them to the “connectome”—the map of neural connections in the brain.
But a recent paper poses a provocative question that challenges this standard approach: “Do we really need message passing in brain network modeling?”
The researchers argue that the way we typically analyze brain networks using GNNs might be fundamentally flawed. By blindly applying techniques designed for other types of graphs, we may be introducing redundancy and inefficiency. In this deep dive, we will explore why the standard “message passing” paradigm might not fit brain data, and look at a novel, simpler, and faster solution called the Brain Quadratic Network (BQN).
The Status Quo: Graphs and Brains
Before we dismantle the current methods, let’s understand why they are used.
Constructing the Brain Network
In computational neuroscience, particularly when diagnosing disorders like Autism Spectrum Disorder (ASD) or Alzheimer’s (AD), we don’t usually feed raw fMRI scans directly into a neural network. Instead, we construct a Brain Graph.
- Parcellation: The brain is divided into specific Regions of Interest (ROIs). These are the “nodes” of our graph.
- Time-Series Extraction: For each ROI, we extract a sequence of signals over time (from fMRI data).
- Correlation: We calculate how much the signal in Region A correlates with Region B.
The standard metric for this is the Pearson Correlation Coefficient:
\[ r_{xy} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{T} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{T} (y_t - \bar{y})^2}} \]
Here, \(x\) and \(y\) are the signal sequences of two brain regions. A high correlation means these regions are functionally connected. To clean up the graph, researchers often apply a threshold, keeping only the strongest connections:
\[ A_{ij} = \begin{cases} r_{ij}, & \text{if } |r_{ij}| \geq \epsilon \\ 0, & \text{otherwise} \end{cases} \]

where \(r_{ij}\) is the Pearson correlation between regions \(i\) and \(j\), and \(\epsilon\) is the chosen threshold.
The result is an Adjacency Matrix (\(A\)), which represents the topology (structure) of the brain network.
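To make the pipeline concrete, here is a minimal NumPy sketch of this construction; the parcellation size, number of timepoints, and threshold value are illustrative assumptions, not the paper’s exact preprocessing settings.

```python
import numpy as np

def build_brain_graph(time_series: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Build a thresholded functional connectivity matrix.

    time_series: array of shape (n_rois, n_timepoints), one signal per ROI.
    threshold:   absolute correlation below which edges are dropped (illustrative).
    """
    # Pearson correlation between every pair of ROI signals -> (n_rois, n_rois)
    pearson = np.corrcoef(time_series)

    # Keep only the strongest connections; weaker ones are zeroed out.
    adjacency = np.where(np.abs(pearson) >= threshold, pearson, 0.0)

    # Remove self-loops so the diagonal does not dominate.
    np.fill_diagonal(adjacency, 0.0)
    return adjacency

# Example: 116 ROIs (e.g., an AAL-style parcellation), 200 fMRI timepoints.
A = build_brain_graph(np.random.randn(116, 200))
print(A.shape)  # (116, 116)
```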
The Dominance of Message Passing
Once we have this graph, the industry standard is to use Graph Neural Networks (GNNs). GNNs rely on a mechanism called Message Passing.
In simple terms, message passing allows a node to update its own understanding of the world by aggregating information from its neighbors.

In a standard GNN like a Graph Convolutional Network (GCN), this aggregation is mathematically performed via matrix multiplication between the adjacency matrix (the connections) and the node features.

Similarly, Transformers use an attention mechanism. This is essentially a “global” message passing scheme where every node (brain region) attends to every other node to calculate a weighted update.

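For intuition, here is a small NumPy sketch of both flavors of message passing on toy data. The row normalization, shared projections, and single attention head are simplifications of my own, not any particular library’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 116, 32                               # number of ROIs and feature width (toy values)
A = rng.random((N, N)); A = (A + A.T) / 2    # symmetric toy adjacency matrix
H = rng.standard_normal((N, d))              # node features
W = rng.standard_normal((d, d))              # "learnable" weights (random here)

# Local message passing (GCN-style): each node aggregates its neighbors' features
# via A @ H, then applies a shared linear transform and a ReLU.
A_norm = A / A.sum(axis=1, keepdims=True)    # simple row normalization (illustrative)
H_gcn = np.maximum(A_norm @ H @ W, 0.0)

# Global message passing (Transformer attention): every node attends to every other
# node; the attention matrix is a learned, dense "relationship map" over all ROIs.
Q = K = V = H @ W                            # shared projection to keep the sketch short
scores = Q @ K.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
H_attn = attn @ V
```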
These methods are powerful. But are they right for this specific data?
The Problem: The “Double Dipping” Paradox
The authors of this paper noticed a logical inconsistency in how GNNs are applied to brain networks.
In a social network, the graph structure (who follows whom) is distinct from node features (user profile data). However, in brain network analysis, we often lack distinct node features. The features are usually derived from the connectivity itself (e.g., using the connectivity matrix as the feature input).
This creates a redundancy.
- The Input: We use the correlation matrix as the “features.”
- The Model: We use the correlation matrix as the “structure” to guide message passing.
The paper argues that because the brain network is constructed using pairwise Pearson coefficients between all pairs of ROIs, the “holistic relationship” is already present in the input.

As shown in Figure 1 above, GNNs end up utilizing the topology information twice. Furthermore, Transformers try to learn a global relationship map (attention) even though the input, the Pearson correlation matrix, is already a global relationship map.
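A short sketch makes the redundancy concrete: if the node features are the correlation matrix itself (a common choice, as noted above), then a single GCN-style propagation step effectively multiplies the topology by itself. The shapes and names below are illustrative.

```python
import numpy as np

A = np.corrcoef(np.random.randn(116, 200))  # Pearson connectivity over 116 ROIs
X = A                                       # common practice: node features = connectivity
W = np.random.randn(116, 64)                # "learnable" weights (random here)

# One GCN-style propagation step: the topology A multiplies features that
# already encode the topology, i.e. the model effectively computes A @ A @ W.
H = A @ X @ W
```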
The Reality Check
To test this hypothesis, the researchers compared sophisticated GNNs and Transformers against a simple, “dumb” linear classifier that just looked at the correlation matrix directly (without any graph message passing).

The results in Figure 2 are startling. On datasets for Autism (ABIDE) and Alzheimer’s (ADNI), the simple classifier (green bar) consistently outperformed the complex GNNs and Transformers. This suggests that the message-passing mechanism isn’t just unnecessary; it might actually be hindering performance by over-smoothing or complicating the signal.
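The baseline in question is conceptually simple. Below is a minimal sketch of that idea, assuming scikit-learn and using the upper triangle of each subject’s correlation matrix as a flat feature vector; it is an illustration of the approach on toy data, not the paper’s exact evaluation protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def flatten_connectome(corr: np.ndarray) -> np.ndarray:
    """Vectorize the upper triangle of an (N, N) correlation matrix."""
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]

# Toy stand-in data: 100 subjects, 116 ROIs, 200 timepoints, binary labels.
rng = np.random.default_rng(0)
subjects = [np.corrcoef(rng.standard_normal((116, 200))) for _ in range(100)]
X = np.stack([flatten_connectome(c) for c in subjects])
y = rng.integers(0, 2, size=100)

# A plain linear classifier on the raw correlations -- no message passing at all.
# (On real ABIDE/ADNI data this is the "simple classifier" baseline; here the data are random.)
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())
```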
The Solution: Brain Quadratic Network (BQN)
If standard message passing (matrix multiplication) is the wrong tool, what is the right one? The authors propose a shift from linear operators to Quadratic Networks.
Why Quadratic?
A linear function (like standard matrix multiplication \(Wx + b\)) has limits. For example, a simple linear model cannot solve the “XOR” problem, the classic example of a task that is not linearly separable. A quadratic function, however, has much higher expressive power.
The general form of a quadratic neuron looks like this:
\[ q(\mathbf{a}) = \left(\mathbf{w}_{r}^{\top}\mathbf{a} + b_{r}\right)\left(\mathbf{w}_{g}^{\top}\mathbf{a} + b_{g}\right) + \mathbf{w}_{b}^{\top}\mathbf{a}^{2} + c \]

where \(\mathbf{a}^{2}\) denotes the element-wise square of the input \(\mathbf{a}\).
Notice the term involving \(a^2\). This allows the network to capture non-linear interactions much more efficiently.
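As a toy illustration of that extra expressive power, the following sketch implements a single quadratic neuron following the general form above and shows it reproducing XOR with hand-picked weights; the parameter names and the specific construction are mine, not the paper’s.

```python
import numpy as np

def quadratic_neuron(a, w_r, b_r, w_g, b_g, w_b, c):
    """q(a) = (w_r . a + b_r)(w_g . a + b_g) + w_b . (a * a) + c"""
    return (a @ w_r + b_r) * (a @ w_g + b_g) + (a * a) @ w_b + c

# XOR is not linearly separable, but a single quadratic neuron handles it:
# with w_r = w_g = [1, -1] the product term becomes (x1 - x2)^2,
# which equals XOR(x1, x2) on binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w = np.array([1.0, -1.0])
out = quadratic_neuron(X, w, 0.0, w, 0.0, np.zeros(2), 0.0)
print(out)  # [0. 1. 1. 0.]
```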
The Secret Sauce: Hadamard Product
The researchers propose a new architecture called the Brain Quadratic Network (BQN). Instead of the standard matrix product used in GNNs (which aggregates neighbors), BQN uses the Hadamard product (\(\odot\)).
The Hadamard product is simply element-wise multiplication. If you have two matrices, you multiply them cell-by-cell, rather than doing the row-by-column “dot product” of standard matrix multiplication.
The core update rule for BQN is:
\[ \mathbf{H}^{l} = \mathbf{H}^{l-1} \odot \left(\mathbf{A}\mathbf{W}^{l}\right) \]
Here:
- \(H\) is the representation of the brain regions.
- \(A\) is the adjacency matrix (the brain network).
- \(W\) represents learnable weights.
- \(\odot\) is the element-wise multiplication.
This operation allows the model to scale the importance of specific connections individually without “smearing” the information across neighbors like standard message passing does.
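To see the difference in operations, here is a small NumPy sketch contrasting the standard aggregation with the Hadamard-based update described above; the weight matrix is random and the shapes are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 116
A = rng.random((N, N)); A = (A + A.T) / 2   # toy brain adjacency
H = rng.standard_normal((N, N))             # region representations
W = rng.standard_normal((N, N))             # "learnable" weights (random here)

# Standard GNN aggregation: the row-by-column matrix product mixes neighbors together.
H_gnn = A @ H @ W

# BQN-style update: the element-wise (Hadamard) product rescales each individual
# connection H[i, j] by the corresponding entry of A @ W -- no neighbor mixing.
H_bqn = H * (A @ W)
```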
To prove this works, they compared a basic Quadratic Neural Network (QNN) using the Hadamard product against a GCN using the matrix product.

As Figure 3 shows, the Quadratic approach (striped bars) dominates the GCN (teal bars) across all metrics (AUC, Accuracy, Sensitivity, Specificity). Furthermore, while GCN performance degrades or stagnates as you add layers (a common issue called “over-smoothing”), the QNN actually improves with depth.
The Full BQN Architecture
The final proposed BQN model adds a “residual” term to help with training stability. The full update equation is:
\[ \mathbf{H}^{l} = \mathbf{H}^{l-1} \odot \left(\mathbf{A}\mathbf{W}_{A}^{l}\right) + \left(\mathbf{H}^{l-1} \odot \mathbf{H}^{l-1}\right)\mathbf{W}_{H}^{l} \]
- First Term: \(\mathbf{H}^{l-1} \odot (\mathbf{A}\mathbf{W}_{A}^{l})\) — This captures the quadratic interaction between the current representation and the brain topology.
- Second Term: \((\mathbf{H}^{l-1} \odot \mathbf{H}^{l-1})\mathbf{W}_{H}^{l}\) — This is a self-interaction term (residual) that stabilizes the learning and reduces variance.
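Putting the two terms together, a single layer of this update can be sketched in a few lines of PyTorch. The class name, initialization scale, and the choice to feed the connectivity matrix in as the initial representation are my assumptions for illustration, not the authors’ released code.

```python
import torch
import torch.nn as nn

class BQNLayer(nn.Module):
    """One layer of the update H^l = H^(l-1) ⊙ (A W_A) + (H^(l-1) ⊙ H^(l-1)) W_H."""

    def __init__(self, n_rois: int):
        super().__init__()
        self.W_A = nn.Parameter(torch.randn(n_rois, n_rois) * 0.01)
        self.W_H = nn.Parameter(torch.randn(n_rois, n_rois) * 0.01)

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        quadratic = H * (A @ self.W_A)   # quadratic interaction with the topology
        residual = (H * H) @ self.W_H    # self-interaction / residual term
        return quadratic + residual

# Toy usage: 116 ROIs, with the connectivity matrix itself as the initial representation.
A = torch.randn(116, 116)
A = (A + A.T) / 2
layer = BQNLayer(116)
H1 = layer(A, A)          # H^0 = A, echoing the redundancy discussion above
print(H1.shape)           # torch.Size([116, 116])
```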
Why Does It Work? The Theoretical Link
One of the most impressive parts of this paper is not just the “what,” but the “why.” The authors provide a mathematical proof linking their specific update rule to Community Detection.
Community detection is the process of finding clusters of nodes that are densely connected internally (like functional modules in the brain). A common way to find these communities is through Non-negative Matrix Factorization (NMF).
The objective function for NMF looks like this:
\[ \min_{\mathbf{H} \geq 0} \; \left\lVert \mathbf{A} - \mathbf{H}\mathbf{H}^{\top} \right\rVert_{F}^{2} \]
The goal is to find a matrix \(H\) that approximates the adjacency matrix \(A\). The researchers solved for the derivative of this function and derived the update rule for \(H\):
\[ \mathbf{H} \leftarrow \mathbf{H} \odot \frac{\mathbf{A}\mathbf{H}}{\mathbf{H}\mathbf{H}^{\top}\mathbf{H}} \]
This is the exact same formula as the first term of the BQN!
This theoretical finding is profound. It means that by training a BQN, the network is implicitly performing community detection. It is naturally learning to group brain regions into functional modules (clusters) layer by layer, which aligns perfectly with how we know the brain is organized biologically.
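For readers who want to see the connection in code, below is a sketch of symmetric NMF with a simple multiplicative update under the objective above. The exact update used in the paper’s derivation may differ, but the structural point is visible: each step rescales \(H\) element-wise by a quantity built from \(\mathbf{A}\) times a transformed \(H\), the same shape as the BQN’s first term.

```python
import numpy as np

def symmetric_nmf(A: np.ndarray, k: int, iters: int = 200, eps: float = 1e-9):
    """Approximate A ≈ H H^T with H ≥ 0 via a simple multiplicative update."""
    rng = np.random.default_rng(0)
    n = A.shape[0]
    H = rng.random((n, k))
    for _ in range(iters):
        # Element-wise rescaling of H by a ratio built from A @ H:
        # the same "H ⊙ (A · something)" shape as the first BQN term.
        H *= (A @ H) / (H @ H.T @ H + eps)
    return H

A = np.abs(np.corrcoef(np.random.randn(50, 200)))  # non-negative toy connectivity
H = symmetric_nmf(A, k=5)
print(np.linalg.norm(A - H @ H.T))                 # reconstruction error
```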
Visualizing the Communities
To visualize this, the authors compared the brain connectivity patterns learned by BQN against the raw data. They created “contrast graphs” to highlight the differences between Autistic brains and Healthy brains.

In Figure 6, the top row (a) shows the contrast based on raw data, which is messy and dense. The bottom row (b) shows the contrast learned by BQN.
The BQN graph is cleaner and sparser. More importantly, it highlights specific long-range connections (the red lines) crossing between hemispheres. These specific connections (involving the prefrontal cortex and corpus callosum) are biologically known to be disrupted in Autism Spectrum Disorder. This confirms that BQN isn’t just crunching numbers; it’s identifying biologically relevant functional modules.
Experimental Results
The researchers tested BQN against a massive list of competitors, including standard GNNs (GCN, GAT), specialized Brain GNNs (BrainGB), and Graph Transformers (Graphormer, Brain-NETTF).
They used two primary datasets:
- ABIDE: Autism Spectrum Disorder classification.
- ADNI: Alzheimer’s Disease classification.
Classification Performance

Looking at Table 1, BQN (bottom row) achieves the best performance across almost every metric.
- On ABIDE, it reached an AUC of 79.85%, beating the runner-up (ALTER) by nearly 2%.
- On ADNI, it reached an AUC of 74.18%, significantly outperforming complex Transformers.
Speed and Efficiency
In clinical settings, efficiency matters. Complex Transformers can take a long time to train and require expensive hardware. Because BQN uses the Hadamard product (complexity \(O(N^2)\)) instead of matrix multiplication (complexity \(O(N^3)\)), it is theoretically much faster.
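A quick back-of-the-envelope check illustrates the gap; the timings below are toy numbers from a single machine and depend heavily on hardware and the underlying BLAS library.

```python
import time
import numpy as np

N = 2000
A = np.random.rand(N, N)
B = np.random.rand(N, N)

t0 = time.perf_counter()
_ = A * B                      # Hadamard product: N^2 multiplications
t1 = time.perf_counter()
_ = A @ B                      # matrix product: ~N^3 multiply-adds
t2 = time.perf_counter()

print(f"Hadamard: {t1 - t0:.4f}s, matmul: {t2 - t1:.4f}s")
```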

Table 2 confirms this. BQN is incredibly fast.
- On ABIDE, BQN took 11.31 seconds to train.
- Compare that to the Graphormer, which took 973.52 seconds.
- That makes BQN roughly 85x faster than heavy Transformer models while delivering better accuracy.
Stability and Ablation
Finally, the authors checked if their design choices were sound.
Does the residual term help? Yes. As shown in Figure 4 below, adding the residual term (the blue bars) consistently improves performance over the base model (pink bars).

How deep should the network be? Deep GNNs usually suffer from performance drops. BQN, however, remains relatively stable.

Figure 5 shows that BQN performs optimally with 1 to 3 layers. While performance drops slightly at 4-5 layers (likely due to overfitting on the small datasets), it doesn’t suffer the catastrophic “over-smoothing” often seen in standard GNNs.
Conclusion: Less is More
The paper “Do We Really Need Message Passing in Brain Network Modeling?” offers a compelling course correction for the field of AI in neuroscience.
The key takeaways are:
- Question Assumptions: Just because GNNs are state-of-the-art for social networks doesn’t mean they are optimal for brain connectomes. The “message passing” paradigm introduces redundancy when the input features are correlations.
- The Power of Simplicity: A simpler quadratic operator (Hadamard product) captures the data structure better than complex matrix multiplications.
- Biological Alignment: The proposed BQN mathematically aligns with community detection, making it more interpretable and biologically plausible.
- Efficiency: BQN achieves State-of-the-Art (SOTA) results while being orders of magnitude faster than Transformers.
This research serves as a reminder that in machine learning, newer and more complex architectures are not always the answer. Sometimes, tailoring the mathematical operator to the specific nature of the data—in this case, the correlational nature of brain networks—yields the best results.
This blog post explains the research paper “Do We Really Need Message Passing in Brain Network Modeling?” by Liang Yang et al., published at ICML 2025.