The intersection of Artificial Intelligence and mathematics is currently one of the most exciting frontiers in science. When we think of “AI for Math,” we often imagine Large Language Models (LLMs) writing formal proofs or solving high school calculus word problems. However, the workflow of a professional mathematician involves much more than just writing down a proof.
Before a theorem is proven, it must be conjectured. And before it is conjectured, a mathematician usually spends weeks or months generating “raw data”—calculating examples, drawing diagrams, and searching for patterns in discrete structures. This phase—the intuition-building and conjecturing phase—is where a new paper argues Machine Learning (ML) can shine.
In the research paper “Machine Learning meets Algebraic Combinatorics: A Suite of Datasets Capturing Research-level Conjecturing Ability in Pure Mathematics,” Herman Chau and colleagues introduce the Algebraic Combinatorics Dataset Repository (ACD Repo). This is a novel collection of nine datasets designed not to test if an AI can prove a known theorem, but to test if it can “see” the high-dimensional patterns that lead to new mathematics.
In this post, we will explore why algebraic combinatorics is the perfect playground for ML, dive deep into the specific mathematical objects in these datasets, and analyze whether current AI models are up to the task of research-level discovery.
Why Algebraic Combinatorics?
If you want to train an AI to find mathematical patterns, where do you start? Calculus is continuous and messy. Number theory can be incredibly sparse. The authors argue that Algebraic Combinatorics is the sweet spot.
Algebraic combinatorics studies discrete structures (things you can count, like grids, graphs, and permutations) that arise from abstract algebra (the study of symmetries and the spaces they act on). Here is why this field is uniquely suited for machine learning:
- Low Barrier to Entry: Unlike algebraic geometry, which requires years of theory to understand the basic objects, combinatorial objects are often visual and intuitive (e.g., stacking boxes or drawing paths).
- Computability: Because the objects are discrete, we can represent them easily on a computer.
- Data Volume: We can generate millions of examples, creating the massive datasets that deep learning models crave.
The ACD Repo focuses on the conjecturing process. Each dataset pairs a mathematical question (often an open problem) with a machine learning task. The hypothesis is simple: if an ML model can solve the task (e.g., predict a number based on a shape), it has likely learned a mathematical rule that could give a human researcher a clue toward a new theorem.
The Cast of Characters
Before we look at the specific datasets, we need to understand the “vocabulary” of this field. Don’t worry if you aren’t a math major; these objects are surprisingly visual.
1. Partitions and Young Diagrams
An integer partition is a way of writing a number as a sum of positive integers, where the order of the parts does not matter. For example, the number 7 can be partitioned as \(3 + 2 + 2\).
Mathematicians visualize these partitions using Young diagrams (or Ferrers diagrams), which are essentially stacks of boxes. If you take a Young diagram and fill the boxes with numbers according to specific rules, you get a Young Tableau. These diagrams are ubiquitous in representation theory—the study of how abstract algebraic groups act on vector spaces.

As shown in Figure 1, a Standard Young Tableau (center) creates a specific ordering of numbers, while a Semistandard version (right) allows repetitions with certain constraints. These simple grids encode deep information about symmetry.
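To make these objects concrete, here is a tiny illustrative sketch (not from the paper) that enumerates the partitions of an integer and prints a Young diagram:

```python
def partitions(n, max_part=None):
    """Yield all integer partitions of n as weakly decreasing tuples."""
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def young_diagram(partition):
    """Draw the Young diagram of a partition as left-justified rows of boxes."""
    return "\n".join("■ " * part for part in partition)

print(list(partitions(4)))        # [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
print(young_diagram((3, 2, 2)))   # the diagram of 7 = 3 + 2 + 2
```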
2. Permutations
A permutation is a shuffling of numbers. In machine learning, we often see permutations as symmetries (order invariance). In combinatorics, we study the structure of the shuffle itself. We can write a permutation of \(\{1, 2, 3, 4\}\) in one-line notation as \(2 \ 1 \ 4 \ 3\), meaning the permutation sends 1 to 2, 2 to 1, 3 to 4, and 4 to 3.
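In code, one-line notation is just a sequence of integers. A small illustrative sketch (the function names here are ours, not the paper's):

```python
def apply(w, i):
    """Image of i under the permutation w, written in one-line notation (1-indexed)."""
    return w[i - 1]

def inverse(w):
    """Inverse permutation: if w sends i to w[i-1], the inverse sends w[i-1] back to i."""
    inv = [0] * len(w)
    for i, wi in enumerate(w, start=1):
        inv[wi - 1] = i
    return tuple(inv)

w = (2, 1, 4, 3)
print([apply(w, i) for i in (1, 2, 3, 4)])   # [2, 1, 4, 3]
print(inverse(w))                            # (2, 1, 4, 3) -- this permutation is its own inverse
```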
3. Posets (Partially Ordered Sets)
Not everything can be ranked in a straight line like integers (\(1 < 2 < 3\)). In a poset, some elements are comparable, and others are not. Think of it like a family tree: you are a descendant of your grandmother, but you aren’t a “descendant” of your cousin. They are related, but not ordered vertically relative to each other.
The Datasets: A Tour of Research-Level Math
The ACD Repo contains nine datasets. We will categorize them into two groups: Foundational Results (classic problems we know the answer to, used to benchmark models) and Open Problems (where high performance could lead to new discoveries).
Group 1: The Foundational Challenges
These datasets represent problems where algorithms exist, but they are often complex or computationally expensive. Can a neural network “learn” the algorithm just by looking at examples?
Dataset A: Symmetric Group Characters
This is arguably the hardest task in the suite. The goal is to compute the “character” (a specific trace value from linear algebra) of a symmetric group representation. These characters are indexed by two partitions, \(\lambda\) and \(\mu\).
The ML Task is: Input two partitions \(\to\) Output the Character (an integer).
While algorithms like the Murnaghan-Nakayama rule exist to calculate these characters, they become combinatorially explosive for large \(n\). The values of the characters also cluster heavily around zero while admitting massive outliers (long tails).
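For a feel of what the model is being asked to approximate, here is a minimal recursive sketch of the Murnaghan-Nakayama rule using the standard beta-number (“abacus”) formulation. It is exponential-time and only practical for tiny \(n\), which is exactly why learning a fast approximation for \(S_{18}\) through \(S_{22}\) would be interesting:

```python
def character(lam, mu):
    """chi^lam(mu) for partitions lam and mu of the same n, via the
    Murnaghan-Nakayama rule: repeatedly strip a border strip of size mu[0],
    implemented as a beta-number (abacus) move b -> b - k."""
    lam = tuple(x for x in lam if x > 0)
    if not mu:
        return 1 if not lam else 0
    k, rest = mu[0], tuple(mu[1:])
    L = max(len(lam), 1)
    betas = [(lam[i] if i < len(lam) else 0) + (L - 1 - i) for i in range(L)]
    total = 0
    for b in betas:
        if b - k >= 0 and (b - k) not in betas:
            # leg length of the stripped border strip = betas strictly between b-k and b
            height = sum(1 for c in betas if b - k < c < b)
            new_betas = sorted((set(betas) - {b}) | {b - k}, reverse=True)
            new_lam = tuple(nb - (L - 1 - i) for i, nb in enumerate(new_betas))
            total += (-1) ** height * character(new_lam, rest)
    return total

print(character((2, 1), (1, 1, 1)))   # 2  (the dimension of the standard representation of S_3)
print(character((2, 1), (3,)))        # -1
```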
![Figure 3. Histogram of S_18 characters within the interval [-500, 500]](/en/paper/2503.06366/images/013.jpg#center)
As seen in the histogram for \(S_{18}\) above (and similarly for \(S_{20}\) and \(S_{22}\) in Figures 4 and 5 below), the data is heavily imbalanced, which makes it very hard for standard regression models to learn anything beyond collapsing to the mean.
![Figure 4. Histogram of S_20 characters within the interval [-500, 500]](/en/paper/2503.06366/images/014.jpg#center)
![Figure 5. Histogram of S_22 characters within the interval [-500, 500]](/en/paper/2503.06366/images/015.jpg#center)
Dataset B: The Robinson-Schensted-Knuth (RSK) Correspondence
The RSK algorithm is a legendary result that translates a permutation into a pair of Standard Young Tableaux. It acts as a bridge between combinatorics and representation theory.
The ML Task: Input a pair of Young Tableaux \(\to\) Predict the original Permutation. This effectively asks the model to reverse-engineer the RSK algorithm. The authors found this task surprisingly difficult for standard MLPs (Multi-Layer Perceptrons), suggesting that the “logic” of RSK is highly non-trivial for neural networks to approximate.
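For intuition, here is a minimal sketch of the forward direction of RSK (row insertion); the dataset's task is the harder reverse direction, recovering the permutation from the tableaux:

```python
def rsk(perm):
    """Row-insert a permutation (one-line notation, values 1..n) to build the
    insertion tableau P and the recording tableau Q."""
    P, Q = [], []
    for step, value in enumerate(perm, start=1):
        row = 0
        while True:
            if row == len(P):                       # fell off the bottom: start a new row
                P.append([value])
                Q.append([step])
                break
            bump = next((j for j, y in enumerate(P[row]) if y > value), None)
            if bump is None:                        # value fits at the end of this row
                P[row].append(value)
                Q[row].append(step)
                break
            P[row][bump], value = value, P[row][bump]   # bump the larger entry to the next row
            row += 1
    return P, Q

print(rsk([2, 1, 4, 3]))
# ([[1, 3], [2, 4]], [[1, 3], [2, 4]])  -- an involution, so P equals Q
```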
Group 2: The Open Problems
These datasets are the most exciting for scientific discovery. They represent areas where mathematicians are currently stuck or searching for better formulas.
Dataset C: The mHeight Function
This dataset is connected to a recently solved conjecture regarding Kazhdan-Lusztig polynomials. The “mHeight” is a statistic calculated from a permutation based on specific patterns (called 3412-patterns) inside it.

The ML Task is to classify the mHeight of a permutation. Interestingly, simple models perform quite well here. If a small neural network can predict this value with 99% accuracy, it implies there might be a simpler “rule” or formula for mHeight than the current complex definition involving pattern searching.
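As a rough illustration of the pattern searching the definition involves, here is a brute-force enumeration of 3412-pattern occurrences (the mHeight statistic itself is a further computation over these occurrences, defined in the paper):

```python
from itertools import combinations

def occurrences_3412(w):
    """All 3412-pattern occurrences in a permutation w (one-line notation):
    positions i < j < k < l whose values satisfy w[k] < w[l] < w[i] < w[j]."""
    return [
        (i, j, k, l)
        for i, j, k, l in combinations(range(len(w)), 4)
        if w[k] < w[l] < w[i] < w[j]
    ]

print(occurrences_3412([3, 4, 1, 2]))   # [(0, 1, 2, 3)]
print(occurrences_3412([1, 2, 3, 4]))   # [] -- the identity avoids the pattern
```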
Dataset D: Grassmannian Cluster Algebras
This involves checking if a Semistandard Young Tableau (SSYT) corresponds to a “cluster variable.” This connects geometry (the Grassmannian manifold) with algebra.

The ML Task is Binary Classification: Look at the tableau (like the ones in Figure 7) and predict if it is “valid” (indexes a cluster variable). The visual nature of this task—looking for local patterns in the grid numbers—makes it suitable for Convolutional Neural Networks (CNNs) or similar architectures.
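As a sketch of what such a model could look like (the architecture and the padded-grid encoding below are our illustrative choices, not the paper's setup):

```python
import torch
import torch.nn as nn

class TableauCNN(nn.Module):
    """A toy CNN over a zero-padded SSYT grid. Entries are small positive
    integers; 0 marks padding."""

    def __init__(self, max_entry: int, rows: int, cols: int):
        super().__init__()
        self.embed = nn.Embedding(max_entry + 1, 8)            # entry value -> 8-dim channel vector
        self.conv = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=2), nn.ReLU(),        # 2x2 windows see local entry patterns
            nn.Conv2d(16, 32, kernel_size=2), nn.ReLU(),
        )
        self.head = nn.Linear(32 * (rows - 2) * (cols - 2), 2)  # logits: cluster variable or not

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (batch, rows, cols) of integer entries, zero-padded to a fixed shape
        h = self.embed(grid).permute(0, 3, 1, 2)               # -> (batch, channels, rows, cols)
        return self.head(self.conv(h).flatten(1))

# Example with hypothetical sizes: one 4x5 padded tableau, entries up to 9.
model = TableauCNN(max_entry=9, rows=4, cols=5)
dummy = torch.randint(0, 10, (1, 4, 5))
print(model(dummy).shape)                                       # torch.Size([1, 2])
```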
Dataset E: Kazhdan-Lusztig (KL) Polynomials
These polynomials are fundamental to geometric representation theory, yet they remain mysterious. There is no simple closed formula for their coefficients. Here is an example of what one looks like:
\[ P_{x,w}(q) = 1 + 16q + 103q^2 + 337q^3 + 566q^4 + 529q^5 + 275q^6 + 66q^7 + 3q^8 \]

The ML Task involves predicting specific coefficients of these polynomials given two permutations. Because the coefficients are integers (and often small), this is treated as a classification problem.
Dataset F: Lattice Path Partial Orders
This dataset deals with paths on a grid that cannot cross a certain diagonal. There are two different ways to “order” these paths: the Matching order and the Lagrange order. Mathematicians want to understand the relationship between these two orders.

The ML Task is to look at two paths (like the ones in Figure 9) and predict if one “covers” the other in either order. This is a geometric reasoning task.
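As a sketch of how such paths can be encoded for a model, represent each path as a string of unit steps and check the diagonal constraint; the exact convention (which side of the diagonal, strict or weak) is an assumption here:

```python
def is_valid_path(steps: str) -> bool:
    """Check that a lattice path, given as a string of 'E' (east) and 'N' (north)
    unit steps, never rises strictly above the diagonal y = x.
    (An assumed convention for illustration; the dataset may use a different one.)"""
    x = y = 0
    for s in steps:
        if s == "E":
            x += 1
        elif s == "N":
            y += 1
        else:
            raise ValueError(f"unknown step {s!r}")
        if y > x:
            return False
    return True

print(is_valid_path("EENENN"))   # True: stays on or below the diagonal
print(is_valid_path("ENNEEN"))   # False: the second 'N' crosses above y = x
```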
Dataset G: Quiver Mutation Equivalence
Quivers are directed graphs. “Mutation” is a specific operation that changes the arrows in the graph. An open problem is determining when two quivers are “mutation equivalent” (can you get from quiver A to quiver B via a sequence of mutations?). The ML task is to classify the mutation-equivalence class of a quiver, given its adjacency matrix.
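To make the mutation operation concrete, here is a minimal sketch using the standard Fomin-Zelevinsky matrix-mutation formula on the quiver's skew-symmetric exchange matrix (the dataset's exact quiver encoding may differ):

```python
import numpy as np

def mutate(B, k):
    """Mutate a skew-symmetric exchange matrix B at vertex k (0-indexed):
    b'_ij = -b_ij if i == k or j == k, else b_ij + (|b_ik| b_kj + b_ik |b_kj|) / 2."""
    B = np.asarray(B, dtype=int)
    M = B.copy()
    n = B.shape[0]
    for i in range(n):
        for j in range(n):
            if i == k or j == k:
                M[i, j] = -B[i, j]
            else:
                M[i, j] = B[i, j] + (abs(B[i, k]) * B[k, j] + B[i, k] * abs(B[k, j])) // 2
    return M

# A linear A_3 quiver: 0 -> 1 -> 2. Mutating twice at the same vertex returns the original.
B = np.array([[0, 1, 0], [-1, 0, 1], [0, -1, 0]])
print(mutate(B, 1))
print(np.array_equal(mutate(mutate(B, 1), 1), B))   # True
```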
Experiments and Results
So, can AI do math? The authors benchmarked these datasets using standard “narrow” models (Logistic Regression, MLPs, Transformers) and, in some cases, Large Language Models (LLMs).
The “Easy” Wins vs. The Hard Truths
The results were varied, highlighting that “mathematical difficulty” does not always align with “machine learning difficulty.”
Success Stories:
- Quiver Classification: Models achieved high accuracy.
- Grassmannian Cluster Algebras: Simple MLPs could distinguish valid tableaux with >99% accuracy.
- mHeight: Also highly learnable.
Failures:
- \(S_n\) Characters: As shown in Table 3 below, simple regression models failed spectacularly. The error margins are massive. This suggests the mapping from Partition \(\to\) Character is highly non-linear and volatile.

The Scaling Hypothesis
An interesting observation from the experiments is how performance changes as the mathematical parameter \(n\) (the size of the problem) increases. Usually, larger \(n\) means a harder math problem (more permutations, larger grids). However, for ML, larger \(n\) also means more training data.

In Figure 2 (Left and Right), we see that accuracy often increases with \(n\). The sheer volume of data available for higher \(n\) helps the model overcome the increased complexity of the mathematical object.
Case Study: LLMs and “Cheating”
One of the most fascinating sections of the paper involves using GPT-4 and other models to perform Program Synthesis on the Schubert polynomial structure constants dataset (one of the nine datasets, not covered in the tour above). Instead of asking the AI to predict the answer, they asked it to write a Python program to solve the problem.
The LLMs achieved 100% accuracy. Amazing, right?
Not quite. Upon inspection, the researchers realized the LLMs hadn’t discovered deep mathematics. They had “reverse-engineered” the sampling method used to create the dataset. They noticed a parity artifact (related to the length of the permutations) that perfectly predicted whether a structure constant was zero or non-zero.
While this didn’t solve the open math problem, it demonstrated a powerful capability: AI can act as a forensic tool, detecting subtle biases or patterns in how we generate mathematical data.
Conclusion and Future Implications
The Algebraic Combinatorics Dataset Repository represents a shift in how we think about AI in mathematics. It moves away from the “AI as a Calculator” or “AI as a Proof Writer” paradigm and towards “AI as an Intuition Pump.”
By training models on these datasets, researchers can:
- Identify solvable problems: If an MLP gets 99% accuracy, a simple formula likely exists.
- Generate counterexamples: If a model fails on specific examples, those might be the mathematical “corner cases” worth studying.
- Extract interpretable rules: By analyzing how a neural network solves the Quiver task (e.g., looking at which sub-graphs it focuses on), mathematicians can find hints for rigorous theorems.
For students and researchers in machine learning, these datasets offer a unique challenge. They are clean, structured, and free of the noise found in real-world data, yet they capture logic that is deep enough to stump the brightest human minds. The next great theorem in algebraic combinatorics might not come from a blackboard, but from the weights of a neural network.