Introduction

In the world of drug discovery and material science, the search for a new molecule is often compared to finding a needle in a haystack. However, the “haystack” here is the chemical space, which contains an estimated \(10^{60}\) theoretically possible drug-like molecules. Searching this space is a discrete, combinatorial nightmare.

To tackle this, machine learning researchers developed Latent Space Bayesian Optimization (LSBO). The idea was elegant: instead of searching through discrete chemical structures, we can use a Variational Autoencoder (VAE) to map these structures into a continuous, smooth numerical space (the “latent space”). We can then use standard optimization techniques in this smooth space to find the best candidates.

For years, this has been the standard operating procedure. But a new paper titled “Return of the Latent Space COWBOYS” argues that we might have been going about it the wrong way. The authors suggest that coupling the generative model (the VAE) too tightly with the optimization model leads to significant inefficiencies.

Their proposed solution, COWBOYS (Categorical Optimisation With Belief Of underlYing Structure), offers a decoupled approach that separates generation from prediction. By letting the VAE focus on structure and a Gaussian Process focus on prediction, they achieve state-of-the-art results in identifying high-potential molecular candidates under strict budget constraints.

In this post, we will deconstruct why standard LSBO struggles, examine the mathematical insights behind COWBOYS, and see how this new framework changes the way we approach structured optimization.

Background: The Components of Discovery

To understand why COWBOYS is an innovation, we first need to understand the machinery of standard Bayesian Optimization over structured spaces.

Bayesian Optimization (BO)

Bayesian Optimization is a strategy for finding the maximum value of an expensive black-box function \(f(x)\). In drug discovery, \(f(x)\) might be a wet-lab experiment measuring a molecule’s efficacy. Since we cannot test every molecule, BO builds a surrogate model (usually a Gaussian Process) to predict performance based on previous tests. It then uses an acquisition function to decide which molecule to test next, balancing exploration (trying new things) and exploitation (refining what we know works).
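To make this select-evaluate-update cycle concrete, here is a minimal sketch on a toy continuous problem, using scikit-learn's Gaussian Process and an Expected Improvement acquisition function. The library and acquisition choice are illustrative assumptions, not details taken from the paper.

```python
# Minimal Bayesian Optimization sketch (illustrative; not the paper's code).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(gp, X_cand, f_best):
    """Expected Improvement: how much each candidate is expected to beat f_best."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)          # avoid division by zero
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

# Toy black-box objective standing in for an expensive wet-lab experiment.
f = lambda X: -np.sum((X - 0.3) ** 2, axis=1)

rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 1, size=(5, 2))       # initial design
y_obs = f(X_obs)

for _ in range(20):                          # BO loop under a small budget
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)                     # surrogate model of f
    X_cand = rng.uniform(0, 1, size=(1000, 2))
    ei = expected_improvement(gp, X_cand, y_obs.max())
    x_next = X_cand[np.argmax(ei)]           # balance exploration and exploitation
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, f(x_next[None, :]))

print("best value found:", y_obs.max())
```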

The Variational Autoencoder (VAE)

The VAE is a deep generative model. It consists of two parts:

  1. Encoder: Compresses a complex input (like a molecular graph) into a lower-dimensional vector \(z\) in a latent space.
  2. Decoder: Reconstructs the original input from \(z\).

The VAE allows us to treat discrete molecules as points in a continuous vector space \(\mathbb{R}^d\).
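As a rough sketch of this machinery (not the specific architecture used in the paper), a molecule encoded as a fixed-length token sequence can be pushed through an encoder to a latent vector \(z\) and decoded back. The sequence length, vocabulary size, and layer widths below are made-up placeholders.

```python
# Minimal VAE sketch (PyTorch, illustrative): maps a fixed-length one-hot
# encoding of a molecule (e.g. padded SMILES tokens) to a latent vector z
# and back. Vocabulary size, sequence length and layer widths are assumptions.
import torch
import torch.nn as nn

SEQ_LEN, VOCAB, LATENT_DIM = 60, 40, 32

class MoleculeVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(SEQ_LEN * VOCAB, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, LATENT_DIM)       # mean of q(z|x)
        self.to_logvar = nn.Linear(256, LATENT_DIM)   # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, SEQ_LEN * VOCAB))          # logits over tokens

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        logits = self.decoder(z).view(-1, SEQ_LEN, VOCAB)
        return logits, mu, logvar

def vae_loss(logits, x_tokens, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior p(z).
    recon = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), x_tokens.reshape(-1), reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```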

The Standard Approach: LSBO

In Latent Space BO (LSBO), we train a VAE on a large dataset of molecules. Then, we perform Bayesian Optimization inside the latent space.

Algorithm 1 Latent Space Bayesian Optimisation

As shown in Algorithm 1, the optimizer selects a point \(z\) in the latent space, decodes it into a molecule \(x\), evaluates it, and updates the model. The surrogate model \(\tilde{g}(z)\) tries to learn the mapping from latent coordinates to molecular properties.
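A schematic rendering of this loop might look as follows; every helper here (train_vae, initial_design, fit_gp, maximize_acquisition, best_of) is a hypothetical placeholder rather than an API from the paper or any specific library.

```python
# Schematic of Algorithm 1 (LSBO); helper functions are hypothetical placeholders.
def latent_space_bo(molecules, objective, budget, delta=3.0):
    vae = train_vae(molecules)                      # learn the latent space once
    z_obs, y_obs = [], []
    for x in initial_design(molecules):             # seed the surrogate
        z_obs.append(vae.encode(x))
        y_obs.append(objective(x))
    for _ in range(budget):
        gp = fit_gp(z_obs, y_obs)                   # surrogate g~(z) in latent space
        z_next = maximize_acquisition(gp, bounds=[-delta, delta])  # box-constrained
        x_next = vae.decode(z_next)                 # stochastic decode to a molecule
        z_obs.append(z_next)
        y_obs.append(objective(x_next))
    return best_of(z_obs, y_obs)
```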

The Pitfalls of Latent Space BO

While LSBO transforms a hard discrete problem into a manageable continuous one, the authors identify three major pathologies that hinder its performance.

1. The Alignment Problem

The VAE is trained to reconstruct molecules, not to predict their properties. The “distance” between two points in latent space corresponds to structural similarity, not necessarily functional similarity. A small step in latent space might result in a massive jump in the chemical property we are trying to optimize. This makes the surrogate model’s job incredibly difficult, as the objective function appears jagged and unpredictable in latent space.

2. The Stochasticity Problem

VAEs are probabilistic. The decoder does not map a point \(z\) to a single molecule \(x\), but to a distribution of molecules.

Figure 2. (a) In LSBO, the same latent input (blue dot) will, via the stochastic decoder (grey box), map to different values in structure space (black dots) and so corresponds to multiple objective function values (red dots), a discrepancy that hinders the learning of accurate surrogate models. (b) In higher-dimensional problems, the area of the latent space supported by the prior of the VAE (blue) concentrates in a thin circular shell.

As Figure 2(a) illustrates, a single point in latent space (blue dot) can decode into multiple different molecular structures (black dots), each with a different property value (red dots). A standard Gaussian Process (GP) operating in latent space struggles to handle this. It interprets this variance as noise, leading to poor predictions and overfitting.

3. The Geometry Problem (The “Doughnut” Effect)

Standard LSBO restricts the search to a box, typically \([-\delta, \delta]^d\). The intuition is that valid molecules live near the center of the latent space (usually a standard normal distribution).

However, high-dimensional geometry is counterintuitive. According to the Gaussian Annulus Theorem, nearly all the probability mass of a high-dimensional Gaussian distribution is located in a thin shell (an annulus) at distance \(\sqrt{d}\) from the origin, not at the center.

Figure 2(b) shows this visually. If you restrict your search to a box, you are searching a volume that is mostly empty of valid prior probability, while potentially cutting off the high-probability “shell” where the valid molecules actually live. This mismatch creates “dead regions” where the decoder produces invalid or garbage molecules.
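A few lines of NumPy are enough to see the annulus effect numerically; nothing below comes from the paper beyond the \(\sqrt{d}\) claim itself.

```python
# Quick numerical check of the Gaussian annulus effect: in high dimensions,
# samples from N(0, I_d) concentrate at norm ~ sqrt(d), far from the origin.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 16, 128, 1024):
    norms = np.linalg.norm(rng.standard_normal((10_000, d)), axis=1)
    print(f"d={d:5d}  mean norm={norms.mean():7.2f}  sqrt(d)={np.sqrt(d):7.2f}  "
          f"std={norms.std():.2f}")
# The spread stays O(1) while the typical radius grows like sqrt(d): the mass
# lives in a thin shell, so a small box around the origin misses almost all of it.
```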

The Core Method: COWBOYS

The authors propose COWBOYS to address these issues by fundamentally changing how the VAE and the Surrogate Model interact.

The Philosophy of Decoupling

In LSBO, the surrogate model lives in the latent space. In COWBOYS, the surrogate model lives in the structure space (the actual molecular space).

Figure 1. Unlike LSBO, where GPs are fit in a VAE's latent space, COWBOYS's GP is fit in structure space, decoupled from its VAE. Candidate points selected by the optimiser are then decoded back into the original structured domain to yield new query points.

As Figure 1 illustrates:

  • LSBO (Left): The GP predicts \(y\) from \(z\).
  • COWBOYS (Right): The GP predicts \(y\) from \(x\) directly. The VAE is used only as a generator (a prior), not as a coordinate system for optimization.

Step 1: The Structured Surrogate

By moving the surrogate model back to structure space, COWBOYS can leverage domain knowledge. Instead of a generic Euclidean kernel on latent vectors, the authors use the Tanimoto kernel, a similarity measure specifically designed for molecular fingerprints.

\[
K_T(\pmb{m}, \pmb{m}') \;=\; \sigma^2 \, \frac{\langle \pmb{m}, \pmb{m}' \rangle}{\lVert \pmb{m} \rVert^2 + \lVert \pmb{m}' \rVert^2 - \langle \pmb{m}, \pmb{m}' \rangle}
\]

This kernel (\(K_T\)) calculates similarity based on the presence or absence of specific substructures (\(\pmb{m}\)) in the molecules. This allows the GP to learn chemical properties much more effectively than it could from arbitrary latent coordinates.
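As a sketch (not the authors' implementation), the kernel can be evaluated on binary fingerprint vectors with a few lines of NumPy:

```python
# Tanimoto kernel sketch on binary fingerprint vectors (illustrative NumPy
# version, not the authors' implementation).
import numpy as np

def tanimoto_kernel(M1, M2, sigma2=1.0):
    """M1: (n, p), M2: (m, p) binary fingerprints; returns the (n, m) Gram matrix."""
    inner = M1 @ M2.T                          # shared 'on' bits <m, m'>
    norms1 = (M1 * M1).sum(axis=1)[:, None]    # ||m||^2
    norms2 = (M2 * M2).sum(axis=1)[None, :]    # ||m'||^2
    return sigma2 * inner / (norms1 + norms2 - inner)

# Two toy 8-bit fingerprints: similarity is 1 only when the bit patterns match.
fp = np.array([[1, 0, 1, 1, 0, 0, 1, 0],
               [1, 0, 1, 0, 0, 0, 1, 0]], dtype=float)
print(tanimoto_kernel(fp, fp))   # off-diagonal entry is 3 / (4 + 3 - 3) = 0.75
```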

Step 2: The Search Strategy

If we aren’t optimizing coordinates in a box, how do we find new molecules? COWBOYS formulates this as a sampling problem.

We want to generate molecules that are chemically valid (high probability under the VAE prior \(p_\theta(x)\)) and have high predicted performance (high probability under the GP surrogate).

Algorithmically, instead of maximizing an acquisition function, COWBOYS samples from a conditional distribution:

\[
x_{\text{new}} \;\sim\; p\!\left(x \,\middle|\, f_x > f^*, \, D_n^{\pmb{\chi}}\right)
\]

Here, we are asking the model to generate a molecule \(x\) such that its predicted value \(f_x\) is greater than the current best observed value \(f^*\), given the data observed so far (\(D_n^{\pmb{\chi}}\)).

Step 3: Practical Implementation via MCMC

Sampling directly from this conditioned distribution is difficult. However, the authors use a clever trick. They map the search back to the latent space solely for the purpose of sampling, but they evaluate the condition using the structured GP.

They approximate the stochastic decoder with a deterministic mapping \(h_\theta(z)\) (taking the most likely decoding). The sampling problem then becomes finding a latent code \(z\) that satisfies:

\[
z_{\text{new}} \;\sim\; p\!\left(z \,\middle|\, g_{\theta, z} > f^*, \, D_n^{\pmb{\chi}}\right),
\]

where \(g_{\theta, z}\) denotes the objective value of the decoded molecule \(h_\theta(z)\).

Using Bayes’ rule, this posterior distribution is proportional to the prior times the likelihood of improvement:

\[
p\!\left(z \,\middle|\, g_{\theta, z} > f^*, \, D_n^{\pmb{\chi}}\right) \;\propto\; p(z)\; p\!\left(g_{\theta, z} > f^* \,\middle|\, z, \, D_n^{\pmb{\chi}}\right) \tag{5}
\]

This decomposition is elegant because both factors are tractable:

  • \(p(z)\) is the standard Gaussian prior of the VAE (we know how to sample this).
  • \(p(g_{\theta, z} > f^* | D)\) resembles the Probability of Improvement (PI) acquisition function, which the GP can calculate easily.
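Concretely, if the structured GP's posterior at the decoded candidate \(h_\theta(z)\) has mean \(\mu_n\) and standard deviation \(\sigma_n\), this probability has the standard closed form (a textbook identity restated here, not an equation quoted from the paper):

\[
p\!\left(g_{\theta, z} > f^* \,\middle|\, z, \, D_n^{\pmb{\chi}}\right) \;=\; \Phi\!\left(\frac{\mu_n\!\left(h_\theta(z)\right) - f^*}{\sigma_n\!\left(h_\theta(z)\right)}\right),
\]

where \(\Phi\) is the standard normal CDF.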

To sample from this distribution efficiently, the authors use Preconditioned Crank-Nicolson (PCN) MCMC. Unlike standard random walks, PCN is designed for high-dimensional Gaussian priors. It naturally stays on the “annulus” (the high-probability shell discussed earlier), avoiding the dead zones that plague box-constrained LSBO.
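A minimal sketch of such a sampler is below. The proposal \(z' = \sqrt{1-\beta^2}\, z + \beta\, \xi\) leaves the standard Gaussian prior invariant, so only the improvement-probability term enters the acceptance ratio; decode_deterministic and gp_improvement_prob are hypothetical helpers standing in for \(h_\theta\) and the structured GP.

```python
# Minimal preconditioned Crank-Nicolson (pCN) sketch targeting
# p(z | improvement) ∝ p(z) * P(g_{θ,z} > f*).  The prior term cancels from
# the acceptance ratio because the pCN proposal leaves N(0, I) invariant.
# `decode_deterministic` and `gp_improvement_prob` are hypothetical helpers.
import numpy as np

def pcn_sample(z0, n_steps, beta, decode_deterministic, gp_improvement_prob, rng):
    z = z0
    lik = gp_improvement_prob(decode_deterministic(z))      # P(g > f*) at z
    for _ in range(n_steps):
        xi = rng.standard_normal(z.shape)
        z_prop = np.sqrt(1.0 - beta**2) * z + beta * xi      # pCN proposal
        lik_prop = gp_improvement_prob(decode_deterministic(z_prop))
        if rng.uniform() < min(1.0, lik_prop / max(lik, 1e-12)):
            z, lik = z_prop, lik_prop                        # accept the move
    return z
```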

Algorithm Summary

The full COWBOYS process (Algorithm 2) differs significantly from LSBO:

Algorithm 2 COWBOYS

  1. Initial Design: Sample molecules purely from the VAE prior.
  2. Loop:
  • Update the Structured GP with all molecule-value pairs \((x, y)\).
  • Use MCMC to sample a new latent \(z\) that is likely to improve upon the best result \(f^*\).
  • Decode \(z\) to \(x\), evaluate \(f(x)\), and repeat.
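Putting the pieces together, a schematic of this loop (reusing the pcn_sample sketch above, with all other helpers being hypothetical placeholders rather than the authors' code) might look like this:

```python
# Schematic of Algorithm 2 (COWBOYS). load_pretrained_vae, fingerprint,
# fit_tanimoto_gp, gp.prob_greater_than and vae.decode_map are hypothetical
# placeholders for the components described above.
def cowboys(objective, budget, n_init, rng):
    vae = load_pretrained_vae()                  # generator only, never fine-tuned
    X, y = [], []
    for _ in range(n_init):                      # 1. initial design: sample the prior
        x = vae.decode(rng.standard_normal(vae.latent_dim))
        X.append(x)
        y.append(objective(x))
    for _ in range(budget - n_init):             # 2. main loop
        gp = fit_tanimoto_gp([fingerprint(m) for m in X], y)  # structured surrogate
        f_best = max(y)
        z = pcn_sample(                          # sample a z likely to beat f_best
            z0=rng.standard_normal(vae.latent_dim),
            n_steps=500, beta=0.1,
            decode_deterministic=vae.decode_map,               # h_theta(z)
            gp_improvement_prob=lambda m: gp.prob_greater_than(fingerprint(m), f_best),
            rng=rng)
        x_next = vae.decode_map(z)
        X.append(x_next)
        y.append(objective(x_next))
    best = max(range(len(y)), key=y.__getitem__)
    return X[best], y[best]
```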

Experiments & Results

The authors benchmarked COWBOYS against a wide range of baselines, including state-of-the-art LSBO methods that use complex heuristics to fine-tune the VAE during optimization.

Low-Data Efficiency

One of the most critical requirements in drug discovery is sample efficiency. Wet-lab experiments are expensive; you can’t evaluate millions of candidates.

Figure 3 shows the performance on six distinct molecular optimization tasks. The x-axis represents the number of oracle calls (experiments), and the y-axis represents the reward (molecular quality).

Figure 3. Average performance (\(\pm\) standard error) of COWBOYS over 10 runs on problems considered by Chu et al. (2024). COWBOYS achieves a substantial improvement in sample efficiency over all existing LSBO methods…

Key Takeaway: COWBOYS (the orange stars) achieves high rewards much faster than competing methods. In tasks like Amlodipine MPO (bottom left of the figure) and Osimertinib MPO (top left), it reaches near-optimal performance in under 500 evaluations, while other methods lag behind. Notably, it outperforms methods that fine-tune the VAE (like LOL-BO and InvBO) without needing the computational overhead or instability risk of retraining the neural network.

High-Dimensional Performance

The authors also tested COWBOYS on high-dimensional discrete sequence optimization tasks (Table 1).

Table 1. Results on high-dimensional discrete sequence optimisation tasks

In this comprehensive benchmark, COWBOYS consistently achieves the highest average scores (bolded). It beats both evolutionary algorithms (GA, HillClimbing) and other BO methods, indicating that the approach generalizes beyond small molecular graphs.

The Benefit of Decoupling

To isolate the decoupling strategy as the source of the improvement, the authors compared COWBOYS against LSBO methods that, like COWBOYS, do not fine-tune the VAE.

Figure 4. Average performance (\(\pm\) standard error) over 20 repetitions with a log-scaled x-axis, demonstrating that, among LSBO methods that cannot fine-tune their latent space, COWBOYS provides a significant improvement in efficiency.

Figure 4 utilizes a log-scaled x-axis to show performance over a longer horizon. Even when standard LSBO methods (like TURBO-L) are given orders of magnitude more evaluations, they struggle to catch up to the efficiency of COWBOYS. This highlights that fitting a GP in structure space (where chemical rules apply) is fundamentally superior to fitting it in a latent space that wasn’t designed for regression.

Ablation Study

Finally, an ablation study (Table 2) checked the robustness of the method.

Table 2. Average performance (\(\pm\) s.d.) over 5 repetitions of COWBOYS…

The column on the far right is particularly telling: “COWBOYS with Latent GP”. In this experiment, they kept the COWBOYS sampling strategy but forced the GP to use the latent space (reverting to the LSBO surrogate). Performance dropped significantly across almost all tasks. This confirms that the structured kernel (Tanimoto) is a crucial component of the success.

Conclusion & Implications

“Return of the Latent Space COWBOYS” presents a compelling argument for simplifying how we combine generative models and optimization. Rather than forcing a surrogate model to learn the complex, often non-smooth geometry of a VAE’s latent space, the authors show we should let each model do what it does best:

  1. The VAE defines the search space and ensures we generate valid structures.
  2. The Gaussian Process models the objective function directly in the structure space, using kernels that capture domain knowledge.

By connecting these two via a principled Bayesian update rule and efficient MCMC sampling, COWBOYS eliminates the need for arbitrary search bounds and expensive VAE fine-tuning.

This work has broader implications beyond molecule design. It suggests a general blueprint for optimization over structured data: Decouple your generator from your predictor. Whether designing proteins, computer code, or mechanical parts, this framework allows practitioners to inject domain expertise (via kernels) directly into the optimization loop, turning the “black box” of latent space optimization into a more transparent and efficient process.