Introduction: The Problem of Missing Time
Imagine you’re a doctor monitoring a patient’s heart with an ECG, but the sensor glitches and you lose a few critical seconds of data. Or perhaps you’re a financial analyst tracking stock prices and your data feed suddenly has gaps. Missing data is not just inconvenient—it’s a pervasive issue in real-world applications. It can derail machine learning models, introduce bias, and lead to flawed conclusions.
For time series data—where temporal continuity and ordering are crucial—these gaps are particularly damaging. Most machine learning algorithms can’t tolerate missing values, so the usual remedy is imputation: filling in missing entries with plausible estimates. But what makes a good estimate? A simple average might wash out important peaks, while naive interpolation could miss underlying trends entirely. Poor imputations can corrupt downstream analysis.
The paper “Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models” introduces a new model, SSSD (Structured State Space Diffusion), designed to meet this challenge. It unites two powerful modern deep learning technologies:
- Diffusion Models — State-of-the-art generative models that excel at creating realistic data by reversing a gradual noise-adding process.
- Structured State Space Models (S4) — An efficient architecture for capturing long-range dependencies in sequences, often outperforming RNNs and Transformers.
By fusing these, the authors produce a model that achieves state-of-the-art results across diverse benchmarks, and that can excel even in the hardest scenarios—such as imputing large contiguous blocks of missing data—where prior methods have failed completely.
This article explains how SSSD works, from its foundations through its architecture and training strategy, and showcases experimental results demonstrating why it represents a major step forward in time series modeling.
Background: The Building Blocks of SSSD
Before diving into the architecture, it’s essential to understand the concepts underpinning SSSD: the types of missingness, the principles behind diffusion models, and the strengths of state space models.
Scenarios of Missingness
Not all missing data patterns pose the same difficulty. The paper focuses on four scenarios, illustrated below.
Figure 1: Example missingness scenarios. Blue regions are known data; grey regions denote missing points to be imputed. Light/dark green bands represent prediction intervals from multiple imputations; orange is a sample imputation.
- Random Missing (RM): Individual points are missing at random across the series—typically the easiest case, as nearby values can guide estimates.
- Random Block Missing (RBM): Contiguous blocks of missing data, varying by channel.
- Blackout Missing (BM): One contiguous block missing across all channels—a severe challenge, with no parallel channel data to leverage.
- Time Series Forecasting (TF): A special case of BM where the missing block lies at the end of the sequence—the task is to predict future points.
SSSD targets all of these scenarios, with particular strength in the BM and TF cases.
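To make the four scenarios concrete, here is a minimal sketch of how such masks might be generated for a (channels × length) series. The function and parameter names are illustrative, not drawn from the paper's code:

```python
import numpy as np

def make_mask(scenario, n_channels, length, missing_ratio=0.2, rng=None):
    """Return a binary mask (1 = observed, 0 = missing) for one series.

    Illustrative sketch of the four missingness scenarios; the names
    and block placement are assumptions, not the paper's code.
    """
    rng = rng or np.random.default_rng()
    mask = np.ones((n_channels, length), dtype=np.float32)
    n_missing = int(missing_ratio * length)

    if scenario == "RM":       # random points, independently per channel
        for c in range(n_channels):
            idx = rng.choice(length, n_missing, replace=False)
            mask[c, idx] = 0.0
    elif scenario == "RBM":    # one contiguous block per channel, position varies
        for c in range(n_channels):
            start = rng.integers(0, length - n_missing + 1)
            mask[c, start:start + n_missing] = 0.0
    elif scenario == "BM":     # the same contiguous block across all channels
        start = rng.integers(0, length - n_missing + 1)
        mask[:, start:start + n_missing] = 0.0
    elif scenario == "TF":     # blackout at the end = forecasting
        mask[:, length - n_missing:] = 0.0
    return mask
```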
Diffusion Models: Generating Data by Denoising
Diffusion models generate data by learning to gradually remove noise from a signal. Conceptually:
- Start with a clean signal \(x_0\).
- Forward process: Incrementally add Gaussian noise over \(T\) steps until the signal becomes pure noise \(x_T\). Formally, each step is \( q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I) \), where \(\beta_t\) is a fixed noise schedule.
- Backward process: Train a model to reverse this, step by step, transforming \(x_T\) back to \(x_0\): \( p_{\theta}(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1}; \mu_{\theta}(x_t, t), \sigma_t^2 I) \).
A key simplification is to have the network \(\epsilon_{\theta}(x_t, t)\) predict the noise \(\epsilon\) that was mixed into the noisy input, trained with a mean squared error loss:
\[ L = \min_{\theta} \mathbb{E}_{x_0 \sim \mathcal{D},\, \epsilon \sim \mathcal{N}(0, I),\, t \sim \mathcal{U}(1,T)} \|\epsilon - \epsilon_{\theta} (\sqrt{\alpha_t}\, x_0 + \sqrt{1-\alpha_t}\, \epsilon, t)\|_2^2, \]
where \(\alpha_t = \prod_{i=1}^{t} (1-\beta_i)\) is the cumulative product of the noise schedule.

For imputation, conditional diffusion is used: the network additionally receives the known data and a mask marking the missing points as conditioning signals, guiding the denoising so that the gaps are filled consistently with the observed context.
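The objective above translates almost directly into a training step. Below is a minimal PyTorch sketch; `model` stands for any noise-prediction network \(\epsilon_{\theta}\) and `alpha_bar` for a precomputed tensor of the cumulative products \(\alpha_t\); both are assumptions for illustration.

```python
import torch

def diffusion_training_step(model, x0, alpha_bar):
    """One DDPM-style training step: predict the added noise (sketch).

    x0:        clean batch, shape (B, C, L)
    alpha_bar: cumulative products alpha_t, shape (T,)
    """
    B = x0.shape[0]
    T = alpha_bar.shape[0]
    t = torch.randint(0, T, (B,), device=x0.device)   # sample a step uniformly (0-indexed)
    eps = torch.randn_like(x0)                        # eps ~ N(0, I)
    a = alpha_bar[t].view(B, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps        # noised input
    loss = ((eps - model(x_t, t)) ** 2).mean()        # MSE on the noise
    return loss
```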
Structured State Space Models (S4): Mastering Long Sequences
SSMs map an input sequence \(u(t)\) to an output \(y(t)\) via a latent state vector \(x(t)\):
\[ x'(t) = A x(t) + B u(t), \quad y(t) = C x(t) + D u(t). \]

With a HiPPO-based initialization of \(A\), S4 achieves strong memory over long contexts. Once discretized, these equations can be unrolled into a convolution that is efficiently parallelizable with FFTs, combining RNN-like temporal modeling with CNN-like efficiency. This makes S4 well suited to long-sequence tasks like time series imputation within a diffusion framework.
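The convolutional view fits in a few lines of code. The toy sketch below discretizes \((A, B, C)\) with a bilinear transform, materializes the impulse-response kernel naively, and convolves via FFT. Real S4 never forms the powers of \(A\) explicitly; it exploits structure in \(A\) to compute the kernel efficiently, and all names here are illustrative.

```python
import numpy as np

def ssm_conv(A, B, C, u, dt=1.0):
    """Toy state space model applied as a convolution (sketch).

    A: (N, N), B: (N, 1), C: (1, N), u: (L,).
    The skip term D u(t) is omitted for brevity.
    """
    N = A.shape[0]
    L = len(u)
    I = np.eye(N)
    # Bilinear discretization of (A, B)
    Ad = np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)
    Bd = np.linalg.solve(I - dt / 2 * A, dt * B)
    # Impulse-response kernel: K[k] = C @ Ad^k @ Bd
    K = np.zeros(L)
    x = Bd
    for k in range(L):
        K[k] = (C @ x).item()
        x = Ad @ x
    # Causal convolution via FFT (zero-padded to avoid wraparound)
    n = 2 * L
    y = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)[:L]
    return y
```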
The Core Method: Inside the SSSD Model
SSSD embeds S4 layers within a conditional diffusion framework, enhancing the denoising network’s ability to capture long-term dependencies.
The SSSDS4 Architecture
Based on the DiffWave audio model, SSSDS4 swaps DiffWave’s dilated convolutional layers for S4 layers.
Figure 2: SSSDS4 architecture. Pink blocks are S4 layers integrated into residual diffusion blocks.
Flow of information (a code sketch follows this list):
- Inputs:
- Noisy sample \(\bar{x}\) at timestep \(t\).
- Conditioning \(C\): known values (with zeros in gaps) concatenated with the binary mask.
- Timestep embedding (via fully connected layers) indicating the noise level.
- Residual Blocks with S4:
- A first S4 layer processes the input after the diffusion-step embedding is added, modeling long-range temporal structure in the noisy signal.
- A second S4 layer is applied after the conditioning is merged in, aligning the generated signal with the known values and the mask.
- Output: Noise prediction \(\epsilon_{\theta}\), used for loss computation or sampling.
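A hedged sketch of one such residual block in PyTorch is shown below. It follows the flow just described: an S4 layer after the timestep embedding is added, a second S4 layer after the conditioning is merged, and DiffWave-style gated activations with residual and skip outputs. The real `S4Layer` comes from the authors' repository; the placeholder here only keeps the sketch self-contained, and all other names are illustrative.

```python
import torch
import torch.nn as nn

class S4Layer(nn.Module):
    """Stand-in for a real S4 layer (see the authors' repo);
    a placeholder so the sketch is self-contained."""
    def __init__(self, features, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(features)
    def forward(self, x):                                # x: (B, C, L)
        return self.norm(x.transpose(1, 2)).transpose(1, 2)

class SSSDResidualBlock(nn.Module):
    """Sketch of one SSSD-style residual block (names illustrative)."""
    def __init__(self, res_channels, cond_channels, seq_len, emb_dim=512):
        super().__init__()
        self.t_proj = nn.Linear(emb_dim, res_channels)   # timestep embedding -> channels
        self.mid = nn.Conv1d(res_channels, 2 * res_channels, 1)
        self.s4_1 = S4Layer(2 * res_channels, seq_len)   # first S4: temporal structure
        self.cond = nn.Conv1d(cond_channels, 2 * res_channels, 1)
        self.s4_2 = S4Layer(2 * res_channels, seq_len)   # second S4: after conditioning
        self.out = nn.Conv1d(res_channels, 2 * res_channels, 1)

    def forward(self, x, t_emb, c):
        # x: (B, res, L)   t_emb: (B, emb_dim)   c: (B, cond, L)
        h = x + self.t_proj(t_emb).unsqueeze(-1)         # broadcast embedding over time
        h = self.s4_1(self.mid(h))                       # S4 over the noisy signal
        h = self.s4_2(h + self.cond(c))                  # S4 after merging conditioning
        gate, filt = h.chunk(2, dim=1)
        h = torch.sigmoid(gate) * torch.tanh(filt)       # gated activation (DiffWave-style)
        res, skip = self.out(h).chunk(2, dim=1)
        return (x + res) / 2 ** 0.5, skip                # residual and skip outputs
```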
Training Strategy: Focus on the Unknown
Two noise application strategies were tested:
- \(D_0\): Noise the entire signal; the loss combines reconstruction of the known parts and imputation of the unknown ones.
- \(D_1\): Noise only the unknown regions; the known parts remain clean and enter the model only as conditioning.
\(D_1\) lets the model focus solely on imputing the missing data, and experiments show it consistently outperforms \(D_0\). The sketch below contrasts the two strategies.
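The difference comes down to a single masking step (mask convention: 1 = observed, 0 = missing, as in the earlier snippet; everything else is illustrative):

```python
import torch

def noise_input(x0, eps, alpha_bar_t, mask, strategy="D1"):
    """Apply forward-process noise under the D0 or D1 strategy (sketch).

    alpha_bar_t: scalar tensor alpha_t for the sampled diffusion step
    mask:        1 where the value is observed, 0 where it must be imputed
    """
    x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * eps
    if strategy == "D1":
        # D1: noise only the unknown regions; known values stay clean
        x_t = mask * x0 + (1 - mask) * x_t
    return x_t  # D0: the whole signal stays noised

# Under D1 the loss is computed only on the unknown positions, e.g.:
# loss = (((eps - model(x_t, t, cond)) * (1 - mask)) ** 2).sum() / (1 - mask).sum()
```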
Experiments and Results: Putting SSSD to the Test
SSSD was benchmarked for imputation and forecasting against strong baselines across diverse datasets.
ECG Imputation: Qualitative Leap
On the PTB-XL ECG dataset, SSSDS4 beat baselines decisively—especially in BM scenarios.
Table 1: PTB-XL imputation (MAE/RMSE). SSSDS4 yields markedly better scores in RBM and BM.
Visual comparisons reveal the magnitude of improvement.
Figure 3: BM imputation for a healthy ECG lead. CSDI output is unrealistic; SSSDS4 closely matches ground truth.
CSDI fails entirely in the BM setting, producing nonsensical output, whereas SSSDS4 recreates realistic waveforms, including the timing and amplitude of the vital QRS complex.
Pushing the Limits: High Sparsity & High Dimensionality
On the MuJoCo benchmark (with up to 90% of values missing), SSSDS4 excelled under extreme sparsity, cutting MSE by more than 50% versus the best baseline at 90% missingness.
Table 2: MuJoCo imputation MSE. SSSDS4 dominates the hardest setting.
On the high-dimensional Electricity dataset (370 channels), using a channel-splitting strategy, SSSDS4 reduced errors by more than 50% relative to top baselines such as SAITS.
Table 3: Electricity RM imputation. Outstanding gains at 10% and 30% missingness.
Forecasting Performance
Since forecasting is simply the BM scenario with the missing block at the end of the sequence, SSSDS4 applies to it directly, as the sketch below shows.
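Concretely, producing a forecast amounts to building a TF-style mask and calling the same imputation sampler. In the hypothetical helper below, `impute` stands in for the reverse-diffusion sampling routine:

```python
import numpy as np

def forecast(model, history, horizon, impute):
    """Forecast by treating the future as a blackout block (sketch).

    history: observed series, shape (channels, L_obs)
    horizon: number of future steps to predict
    impute:  reverse-diffusion sampler, assumed signature
             impute(model, x, mask) -> completed series
    """
    C, L_obs = history.shape
    L = L_obs + horizon
    x = np.zeros((C, L), dtype=history.dtype)
    x[:, :L_obs] = history                      # known past
    mask = np.zeros((C, L), dtype=np.float32)
    mask[:, :L_obs] = 1.0                       # the future block is "missing"
    return impute(model, x, mask)[:, L_obs:]    # return only the forecast
```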
On PTB-XL ECG TF:
Figure 4: Forecasting on ECG. SSSD variants produce tighter uncertainty and match signal trends better.
On Solar, SSSDS4 cut MSE by 27% compared to the strongest baseline.
Table 4: Solar forecasting MSE. SSSDS4 outperforms specialized baselines.
On long-horizon ETTm1, SSSDS4 rivaled or beat specialized models like Informer and Autoformer across multiple forecast lengths.
Table 5: ETTm1 forecasting (MAE/MSE). SSSDS4 shows robust long-horizon ability.
Conclusion and Implications
By merging the generative strengths of diffusion models with the long-sequence mastery of S4 layers, SSSDS4 delivers:
- State-of-the-art performance: Consistently exceeds top models across datasets and missingness types.
- Mastery of blackout scenarios: Produces realistic outputs where prior methods fail.
- Architectural advantage: S4 layers capture essential long-term dependencies.
- Training efficiency: The \(D_1\) noise-focus strategy significantly boosts results.
SSSD offers a robust, general-purpose framework for sequential data modeling, ready for deployment in healthcare, finance, climate science, and other domains where data integrity is critical. It doesn't just fill gaps; it reconstructs the full picture with fidelity.
The authors have released the code at https://github.com/AI4HealthUOL/SSSD, inviting further exploration and application of this promising approach.