Beyond the Flat Universe: How Galaxy Walker Brings Geometric Awareness to AI Astronomy

When we look at a photograph on a screen, we are looking at a flat, 2D representation of reality. For decades, computer vision models have operated on this same premise. They treat images as flat grids of pixels and process features in Euclidean (flat) vector spaces.

But the universe is not flat.

From the spherical surfaces of planets to the hyperbolic expansion of the cosmos and the warping of spacetime around black holes, the universe is defined by complex, non-Euclidean geometries. When we force astronomical data into standard, flat-space Vision-Language Models (VLMs), we lose critical structural information.

This is the problem tackled by the paper “Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding.” The researchers propose a novel framework that breaks the Euclidean constraints of traditional AI. By integrating spherical and hyperbolic geometries directly into the model’s architecture, “Galaxy Walker” achieves state-of-the-art performance in understanding the properties and morphologies of galaxies.

In this deep dive, we will explore how Galaxy Walker works, the math behind its geometry-aware components, and why “thinking in curves” is the future of AI in astronomy.

The Geometric Gap in Modern AI

Before understanding the solution, we must define the problem. Modern VLMs (like GPT-4o or Llama-Vision) are powerful, but they are geometrically “naive.” They rely on patch embeddings and convolutions constructed within Euclidean vector spaces.

However, astronomical phenomena exist on manifolds with different curvatures:

Euclidean Space (Curvature = 0): A flat universe. Good for local structures.
Spherical Space (Curvature > 0): A closed universe. Essential for understanding global topology, planetary surfaces, and projected observations.
Hyperbolic Space (Curvature < 0): An expanding universe. Crucial for modeling hierarchical structures, black holes, and the accelerating expansion of the cosmos.
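
To make the three curvature regimes concrete, here is a minimal sketch in plain Python that measures distance in each geometry. The function names are illustrative, and the Poincaré ball (curvature -1) is only one standard model of hyperbolic space, chosen because it appears later in this article; none of this is taken from the paper's code.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance in flat space (curvature 0)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def spherical_distance(x, y):
    """Great-circle distance between two unit vectors (curvature +1)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

def poincare_distance(x, y):
    """Geodesic distance in the Poincaré ball (curvature -1)."""
    def sq_norm(v):
        return sum(a * a for a in v)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq_norm(x)) * (1.0 - sq_norm(y))))

# Two points that are close in Euclidean terms but sit near the ball's boundary.
near_edge_a, near_edge_b = (0.9, 0.0), (0.99, 0.0)
flat = euclidean_distance(near_edge_a, near_edge_b)
curved = poincare_distance(near_edge_a, near_edge_b)
```

Near the boundary of the Poincaré ball, a small Euclidean step corresponds to a much larger hyperbolic distance, which is one reason hyperbolic space is a popular choice for embedding hierarchies.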

As illustrated below, traditional VLMs force all these rich geometries into a flat box. Galaxy Walker, conversely, is designed to “walk” across these different manifolds.

Figure 1. Geometries of the universe. While traditional VLMs are confined to flat Euclidean space, the actual universe exhibits rich geometric diversity including spherical and hyperbolic spaces, motivating our Galaxy Walker framework to incorporate multi-geometric representations.

When researchers tested general-purpose VLMs on astronomical tasks, the results were poor: models like GPT-4o achieved \(R^2\) scores below 0.6 in estimating galaxy properties. The authors attribute this to geometry: a flat embedding space is a poor fit for structures that are better described on curved manifolds.

Enter Galaxy Walker: A Geometry-Aware Framework

Galaxy Walker is not a completely new model from scratch; rather, it is a sophisticated enhancement of pre-trained VLMs. It injects geometric priors into the model through two key innovations:

The Geometry Prompt: A mechanism that generates tokens by performing “random walks” on physical graphs constructed in Euclidean, Spherical, and Hyperbolic spaces.
The Geometry Adapter: A Mixture-of-Experts (MoE) module that processes these features using specialized mathematical operations for each geometry type.

Here is the high-level architecture:

Let’s break down these two components in detail.

1. The Geometry Prompt

The goal of the Geometry Prompt is to tell the model where a galaxy sits in the complex fabric of the universe. To do this, the researchers construct a multi-relational graph of galaxies based on their physical positions (Right Ascension and Declination).

Standard coordinates are projected into three different geometric “universes.” To perform mathematical operations on these curved spaces (manifolds), the model uses Tangent Spaces.

Think of a tangent space as placing a flat sheet of paper on a globe. At the point where the paper touches the globe, you can do normal math. To move data from the physical curved manifold onto this flat tangent space (and vice versa), we use Exponential and Logarithmic maps.

The mapping from physical coordinates to the manifold coordinates \(\mathbf{V}_{\mathbb{M}}\) is defined as:

Equation 1: Mapping physical coordinates to manifold coordinates.

Here, \(\mathrm{proj}\) is a projection function, and \(\exp^c_o\) is the exponential map at the origin \(o\) with curvature \(c\).
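
As a rough illustration of these ingredients, the sketch below converts Right Ascension and Declination to a point on the unit sphere and implements the standard exponential and logarithmic maps at the origin of a Poincaré ball. The exact form of the paper's projection is not reproduced in this article, so the function names and the choice of the Poincaré ball (negative curvature of magnitude `c`) are assumptions made for illustration.

```python
import math

def radec_to_unit_sphere(ra_deg, dec_deg):
    """Standard conversion from sky coordinates (degrees) to a unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def expmap0(v, c=1.0):
    """Tangent vector at the origin -> point in the ball of radius 1/sqrt(c)."""
    n = norm(v)
    if n == 0.0:
        return tuple(v)
    scale = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return tuple(scale * a for a in v)

def logmap0(y, c=1.0):
    """Point in the ball -> tangent vector at the origin (inverse of expmap0)."""
    n = norm(y)
    if n == 0.0:
        return tuple(y)
    scale = math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return tuple(scale * a for a in y)

# Round trip: flat tangent vector -> curved space -> back to the tangent space.
tangent_vector = (0.3, -1.2)
on_manifold = expmap0(tangent_vector, c=2.0)
recovered = logmap0(on_manifold, c=2.0)
```

The round trip at the end is the "sheet of paper on a globe" picture in code: `expmap0` wraps the flat vector onto the curved space, and `logmap0` unwraps it again.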

Once the graph is built, the model performs “random walks” to gather information about a galaxy’s neighbors. However, because the neighbors exist in curved space, the feature aggregation (how the model learns from neighbors) must respect that geometry. The researchers use a Riemannian GraphSAGE layer.
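
The random-walk idea itself is simple to sketch. The example below performs a uniform random walk over a toy adjacency list; the node names, walk length, and uniform neighbor selection are all hypothetical, since this article does not spell out the paper's walk strategy.

```python
import random

def random_walk(adjacency, start, length, rng):
    """Return the list of nodes visited by a uniform random walk."""
    walk = [start]
    for _ in range(length):
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy galaxy graph; in the paper, edges come from physical positions.
adjacency = {
    "g0": ["g1", "g2"],
    "g1": ["g0", "g2"],
    "g2": ["g0", "g1", "g3"],
    "g3": ["g2"],
}
walk = random_walk(adjacency, "g0", 5, random.Random(0))
```

Each walk yields a sequence of neighboring galaxies whose features can then be aggregated, which is where the geometry-aware layer comes in.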

The process is two-fold:

Translation (Step 1): Translate Euclidean features (like images/spectra) into the specific manifold (Spherical or Hyperbolic).
Prompt Generation (Step 2): Learn geometry-aware features using the relational graph on that manifold.

Equation 2: The two-step process for geometry-aware feature learning.

The core mathematical operation for aggregating these features is complex because it involves moving data between the curved manifold and the flat tangent space to perform the aggregation (SAGE), and then mapping it back. This is encapsulated in the following equation:

Equation 3: Riemannian GraphSAGE layer formulation.

Essentially, this equation says: “Take the features \(X\), use a logarithmic map to flatten them onto a tangent plane, aggregate the neighbors using GraphSAGE, and then use the exponential map to project the result back onto the curved manifold.”
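
The log-aggregate-exp pattern can be sketched as follows. This is a simplified illustration, not the paper's layer: it assumes a Poincaré ball with tangent space at the origin, a mean aggregator, a ReLU applied in the tangent space, and two small weight matrices (`W_self`, `W_neigh`) whose names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def expmap0(v, c=1.0):
    n = norm(v)
    if n == 0.0:
        return tuple(v)
    scale = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return tuple(scale * a for a in v)

def logmap0(y, c=1.0):
    n = norm(y)
    if n == 0.0:
        return tuple(y)
    scale = math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return tuple(scale * a for a in y)

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def riemannian_sage_layer(features, adjacency, W_self, W_neigh, c=1.0):
    # 1) Flatten: map every node from the manifold to the tangent space.
    tangent = {node: logmap0(x, c) for node, x in features.items()}
    out = {}
    for node, x in tangent.items():
        # 2) Aggregate: GraphSAGE-style mean over neighbors, done in flat space.
        neigh = [tangent[m] for m in adjacency.get(node, [])]
        if neigh:
            mean = [sum(col) / len(neigh) for col in zip(*neigh)]
        else:
            mean = [0.0] * len(x)
        h = [a + b for a, b in zip(matvec(W_self, x), matvec(W_neigh, mean))]
        h = [max(0.0, a) for a in h]  # ReLU in the tangent space (assumption)
        # 3) Project back: return the result to the curved manifold.
        out[node] = expmap0(h, c)
    return out

features = {"a": (0.1, 0.2), "b": (0.4, -0.3), "c": (0.2, 0.1)}
adjacency = {"a": ["b"], "b": ["a"], "c": []}
identity = [[1.0, 0.0], [0.0, 1.0]]
out = riemannian_sage_layer(features, adjacency, identity, identity)
```

Doing the aggregation in the tangent space sidesteps the fact that a plain average of points on a curved manifold generally does not land back on that manifold.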

2. The Geometry Adapter (Mixture-of-Experts)

Once the Geometry Prompt has injected this spatial information, the VLM needs a way to process it. Standard Transformer blocks use Feed-Forward Networks (FFNs) that assume flat space.

Galaxy Walker inserts a Geometry Adapter into the transformer layers. This is a Mixture-of-Experts (MoE) block containing three specific “experts,” each designed to process data according to a different geometry.

Expert A: The Euclidean Expert

This preserves the conventional processing capabilities of the pre-trained VLM. It is a standard FFN:

Equation 4: The Euclidean Expert formulation.

Expert B: The Spherical Expert

This expert is designed to capture angular relationships and global topology (like planetary surfaces). It projects the output onto a unit sphere. Note the \(\kappa\) parameter, which controls curvature, and the normalization step:

Equation 5: The Spherical Expert formulation.
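
Since the formula is not reproduced here, the following is only a plausible sketch of the idea: run a small feed-forward network, then normalize the output onto a sphere. It assumes that a sphere of curvature \(\kappa\) has radius \(1/\sqrt{\kappa}\) (which is standard) and that normalization is applied to the FFN output; the parameter names and toy weights are hypothetical.

```python
import math

def ffn(x, W1, b1, W2, b2):
    """Two-layer feed-forward network with a ReLU in between."""
    hidden = [max(0.0, sum(w * a for w, a in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def spherical_expert(x, params, kappa=1.0, eps=1e-8):
    """FFN output rescaled onto the sphere of radius 1/sqrt(kappa)."""
    y = ffn(x, *params)
    length = math.sqrt(sum(a * a for a in y))
    radius = 1.0 / math.sqrt(kappa)
    return [radius * a / (length + eps) for a in y]  # eps avoids division by zero

# Toy weights: 2 inputs -> 3 hidden units -> 2 outputs.
params = (
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0],
    [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.1, 0.2],
)
on_unit_sphere = spherical_expert([0.5, 2.0], params, kappa=1.0)
on_smaller_sphere = spherical_expert([0.5, 2.0], params, kappa=4.0)
```

After normalization only the direction of the output carries information, which is what makes this kind of expert sensitive to angular relationships.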

Expert C: The Hyperbolic Expert

This is the most exotic expert, crucial for modeling the expansion of the universe and hierarchical structures. It operates within a Poincaré ball model. It uses exponential and logarithmic maps (\(\exp_0, \log_0\)) to process features in hyperbolic space:

Equation 6: The Hyperbolic Expert formulation.
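
One common arrangement, assumed here because the equation is not reproduced, is to map a point from the ball to the tangent space with \(\log_0\), apply an ordinary feed-forward network there, and map the result back with \(\exp_0\). The sketch below follows that pattern with toy weights; it assumes the input already lies inside the unit Poincaré ball.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def expmap0(v):
    n = norm(v)
    return list(v) if n == 0.0 else [math.tanh(n) * a / n for a in v]

def logmap0(y):
    n = norm(y)
    return list(y) if n == 0.0 else [math.atanh(n) * a / n for a in y]

def ffn(x, W1, b1, W2, b2):
    """Two-layer feed-forward network with a ReLU in between."""
    hidden = [max(0.0, sum(w * a for w, a in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def hyperbolic_expert(x, params):
    """log_0 -> Euclidean FFN in the tangent space -> exp_0 (assumed ordering)."""
    tangent = logmap0(x)          # flatten the hyperbolic point
    transformed = ffn(tangent, *params)
    return expmap0(transformed)   # wrap the result back into the ball

# Toy weights: 2 inputs -> 3 hidden units -> 2 outputs.
params = (
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0],
    [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.1, 0.2],
)
result = hyperbolic_expert([0.3, 0.4], params)
```

In floating point, `tanh` saturates to exactly 1.0 for large inputs, so practical implementations usually clip the output to stay slightly inside the ball.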

The Gating Network

How does the model know which geometry is relevant for a specific galaxy? It uses a learnable Gating Network (\(G\)). For every token, the network calculates a probability distribution—essentially deciding, “This feature looks 70% hyperbolic and 30% Euclidean.”

Equation 7: The Gating Network aggregation formula.
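
A minimal sketch of this kind of soft gating is shown below: a linear layer produces one logit per expert, a softmax turns the logits into weights, and the expert outputs are combined as a weighted sum. The three expert functions are deliberately trivial placeholders, and the sketch assumes every expert returns a vector in a shared space so that a weighted sum is meaningful; neither detail comes from the paper.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate(x, W_gate):
    """One logit per expert, turned into a probability distribution."""
    return softmax([sum(w * a for w, a in zip(row, x)) for row in W_gate])

def mixture(x, experts, W_gate):
    weights = gate(x, W_gate)
    outputs = [expert(x) for expert in experts]
    mixed = [sum(w * out[i] for w, out in zip(weights, outputs))
             for i in range(len(outputs[0]))]
    return mixed, weights

# Placeholder experts standing in for the Euclidean, spherical, and hyperbolic ones.
def euclidean_stub(x):
    return list(x)

def spherical_stub(x):
    n = math.sqrt(sum(a * a for a in x))
    return [a / n for a in x]

def hyperbolic_stub(x):
    return [0.5 * a for a in x]

experts = [euclidean_stub, spherical_stub, hyperbolic_stub]
W_gate = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # zero gate: all experts weighted equally
mixed, weights = mixture([3.0, 4.0], experts, W_gate)
```

With trained gate weights, the distribution would differ per token, which is what lets the model lean on one geometry for some features and another geometry for others.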

Training Strategy

The model is trained in two stages to ensure it learns the geometry first, then integrates it with the language model.

Stage I: Geometric Prompt Learning. The prompt module is trained independently on galaxy property estimation tasks to learn the representations of the three spaces.
Stage II: Geometry Adapter Learning. The VLM backbone is frozen (for efficiency), and only the Geometry Adapter and projection layers are trained.

The loss function combines Language Modeling loss (\(\mathcal{L}_{LM}\)) with a regression loss (\(\mathcal{L}_{reg}\)) for the numerical predictions (like predicting the mass of a galaxy).

Equation 8: The combined loss function.
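
In code, a combined objective of this kind might look like the sketch below. The article does not state how the two terms are weighted or which regression loss is used, so the `reg_weight` parameter and the choice of mean squared error are assumptions.

```python
import math

def language_modeling_loss(correct_token_probs):
    """Mean negative log-likelihood the model assigns to the correct tokens."""
    return -sum(math.log(p) for p in correct_token_probs) / len(correct_token_probs)

def regression_loss(predictions, targets):
    """Mean squared error on numerical outputs (an assumed choice)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def combined_loss(correct_token_probs, predictions, targets, reg_weight=1.0):
    """L = L_LM + reg_weight * L_reg, with reg_weight a hypothetical knob."""
    return (language_modeling_loss(correct_token_probs)
            + reg_weight * regression_loss(predictions, targets))

# Made-up values: token probabilities and a predicted vs. true galaxy property.
loss = combined_loss([0.9, 0.6, 0.8], predictions=[10.4], targets=[10.1])
```

The regression term gives the model a direct numerical signal for quantities such as mass, rather than relying only on how well it predicts the digits as text tokens.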

Experimental Setup and Results

The researchers utilized a massive dataset comprising over 270,000 samples, combining multi-band images from DESI-LS and spectra from DESI EDR.

Table 1: Dataset statistics for property estimation and morphology classification.

To support the Geometry Prompt, they constructed huge multi-relational graphs. As shown below, these graphs contain over 100,000 nodes with varying edge connections across the three geometric spaces.

Table 2: Statistics of the multi-relational graphs in Euclidean, Hyperbolic, and Spherical spaces.

Comparing Performance

The results, presented in Table 3 below, are striking. The authors compared Galaxy Walker against domain-specific models (like AstroCLIP) and general VLMs (GPT-4o, Claude 3.5).

Key Takeaways:

General VLMs struggle: Look at the \(R^2\) scores for GPT-4o and Claude. They are often near zero or even negative; a negative \(R^2\) means the predictions fit the data worse than always predicting the mean value. General-purpose models do not appear to capture the physical structure of the data.
State-of-the-Art: Galaxy Walker achieves the highest scores across almost every metric.
Specific Gains: Note the sSFR (Specific Star-Formation Rate) column. Galaxy Walker achieves an \(R^2\) of 0.84, compared to AstroCLIP’s 0.69, a gain of 0.15 on a complex physical property.
Morphology: In classifying shapes, particularly complex ones like BAR (Bar structures) and SAC (Spiral Arm Count), Galaxy Walker shows significant improvement (+0.17 F1 score).

Table 3: Comprehensive evaluation results comparing Galaxy Walker against baselines.
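
For readers unfamiliar with the metric, the coefficient of determination can be computed in a few lines. This is the standard textbook definition rather than code from the paper, and the sample values are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up "true" values of some galaxy property.
y_true = [9.5, 10.0, 10.5, 11.0]

perfect = r2_score(y_true, y_true)               # predictions match exactly
baseline = r2_score(y_true, [10.25] * 4)         # always predict the mean
poor = r2_score(y_true, [11.0, 9.0, 12.0, 9.5])  # worse than the mean baseline
```

The three cases cover the range discussed above: a perfect model scores 1, the predict-the-mean baseline scores 0, and predictions that miss by more than that baseline go negative.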

Why Does It Work? A Look Inside the Experts

One of the most fascinating parts of this paper is the analysis of which expert the model chooses for different tasks. Does the model actually “use” the hyperbolic space?

The answer is yes. Figure 3 below visualizes the activation strength of the different experts.

Property Estimation (Bottom right of chart a): Properties like Mass (\(M_*\)) and Metallicity (\(Z_{MW}\)) rely heavily on the Euclidean expert. This makes sense; these are properties often derived from direct photometric measurements (brightness/color).
Morphology (Left of chart a): Look at the Hyperbolic expert (green bar). It activates strongly for BAR, SPR (Spiral), and SAC (Spiral Arm Count). These structures are governed by gravitational fields and rotation curves—phenomena that follow logarithmic patterns naturally represented in hyperbolic geometry.
Case Studies (b): The triangle plots show specific galaxies.
Case 1 (Edge-on disk): The model leans heavily on the Spherical expert (Red corner), likely to model the radial emission patterns.
Case 2 (Multi-component): The Hyperbolic expert (Blue corner) dominates, capturing the hierarchical relationship between the interacting galaxies.

Figure 3: Visualization of expert contributions and case studies.

The Importance of Modality and Prompting

The researchers also analyzed how different input modalities (Image vs. Spectrum vs. Geometry) contributed to success.

In Figure 4(b), the correlation matrix reveals interesting physics. There is a very high correlation (0.82) between Spectrum data and the Hyperbolic graph. This suggests that spectral features (which detail the chemical composition and movement of stars) naturally align with hyperbolic geometric representations.

Figure 4: Modality analysis showing performance impact and cross-modal correlations.

Furthermore, the team explored how to best ask the model questions. They found that providing “Knowledge Background” in the prompts—explaining what the Euclidean, Spherical, and Hyperbolic tokens represent—significantly boosted performance compared to simple concatenation.

Figure 20: Performance comparison of different prompt settings. Knowledge-rich prompts outperform simpler methods.

Training Dynamics: How Dense Should the Adapter Be?

Finally, a practical question for AI engineers: how often should we insert these Geometry Adapters? Every layer? Every 4 layers?

The training dynamics (Figure 5) show that Dense Integration (every layer, green triangles) learns very fast initially. However, Sparse Integration (every 4 layers, red circles) eventually catches up and offers a better balance of computational efficiency and performance. The results in the main table were achieved using the sparse method, suggesting that you don’t need to modify every single layer to get the benefits of geometric awareness.

Figure 5: Training dynamics comparing sparse, medium, and dense adapter integration strategies.
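
The placement choice is easy to express in code. The sketch below lists which layers would receive an adapter for a given spacing; the 32-layer backbone and the convention of placing the adapter on the last layer of each group are hypothetical, chosen only to show the difference in adapter count.

```python
def adapter_layers(num_layers, every):
    """Indices (0-based) of transformer layers that receive a Geometry Adapter."""
    return [i for i in range(num_layers) if (i + 1) % every == 0]

num_layers = 32                          # hypothetical backbone depth
dense = adapter_layers(num_layers, 1)    # an adapter in every layer
sparse = adapter_layers(num_layers, 4)   # an adapter in every 4th layer
```

Under these assumptions the sparse setting uses a quarter as many adapters as the dense one, which is where the efficiency gain comes from.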

Conclusion and Future Implications

“Galaxy Walker” represents a paradigm shift in how we apply AI to scientific domains. It moves away from the “one architecture fits all” approach and acknowledges that the physical world—and the universe at large—does not always conform to the flat vector spaces of standard deep learning.

By integrating Geometry Prompts (random walks on manifolds) and Geometry Adapters (Mixture-of-Experts with Riemannian math), the model achieves a more “physical” understanding of galaxies.

Key Takeaways:

Geometry Matters: Incorporating Spherical and Hyperbolic spaces drastically improves performance on tasks involving gravitational structures and global topology.
Specialization is Key: The MoE architecture allows the model to dynamically switch between “thinking flat” (Euclidean) and “thinking curved” (Non-Euclidean) depending on the specific galaxy feature being analyzed.
Efficiency: The method works by enhancing pre-trained models rather than training from scratch, and sparse integration keeps inference costs low (adding only milliseconds to processing time).

As we continue to map the universe with projects like the James Webb Space Telescope and Euclid, tools like Galaxy Walker will be essential. They allow AI to not just look at the universe as a picture, but to understand it as a complex, multi-dimensional geometric structure.
