![Cover image](https://deep-paper.org/en/paper/2110.13985/images/cover.png)
The Swiss Army Knife of Sequence Models: A Deep Dive into Linear State-Space Layers
Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformers have revolutionized the way we process sequential data such as text, audio, and time series. Each paradigm is powerful, but each comes with its own limitations: RNNs are efficient at inference but train slowly on long sequences and suffer from vanishing gradients. CNNs train in parallel and are fast, but they struggle beyond their fixed receptive field and have costly inference. Transformers can capture global context but scale quadratically in memory and computation with sequence length.

What if we could unify the strengths of these approaches? Imagine a model with: ...