Introduction: Modeling the Unknowable

Imagine trying to create a perfect digital twin of a complex chemical reactor or a power grid. These systems are governed by countless interacting physical processes—many of which are too intricate or poorly understood to be captured by neat mathematical equations. When building a model from first principles is impossible, engineers turn to a powerful alternative: system identification.

The idea is straightforward. Instead of explaining how the system works internally, we build a “black box” model that simply learns to behave like the real thing. By feeding the same inputs into our model and comparing its outputs to those of the actual system, we can train it to replicate the system’s response. This empirical approach is indispensable for tasks such as monitoring system health, predicting failures, and designing adaptive controllers.

Traditional system identification techniques have excelled when dealing with linear systems, where relationships between inputs and outputs are proportional and easy to model. Yet, real-world systems—chemical reactions, fluid flows, mechanical vibrations—are almost always nonlinear, making them far harder to predict.

In a pioneering 1991 paper, a team of researchers proposed a novel solution to this problem using the power of artificial neural networks. They introduced a hybrid architecture called the Recurrent Multilayer Perceptron (RMLP), specifically designed to learn the time-dependent, nonlinear behavior of dynamic systems. This post unpacks their approach, showing how the RMLP’s unique structure allows neural networks not just to approximate complex functions, but to remember and model the element of time.


The Challenge of Modeling Dynamics

Before we explore how the RMLP works, let’s define what we mean by a nonlinear dynamic system. Such systems can be expressed in the following general form:

\[
\mathbf{x}(k+1) = \mathbf{f}\big(\mathbf{x}(k),\, \mathbf{u}(k)\big), \qquad
\mathbf{y}(k) = \mathbf{g}\big(\mathbf{x}(k)\big)
\]

At any time step \(k\):

  • \( \mathbf{u}(k) \) — the system input (for example, opening a valve).
  • \( \mathbf{x}(k) \) — the internal state (such as temperature or pressure inside a reactor).
  • \( \mathbf{y}(k) \) — the observable output (e.g., flow rate or voltage).

The unknown functions \( \mathbf{f} \) and \( \mathbf{g} \) capture the system’s underlying physics. The goal of identification is to learn these functions purely from observed data — to infer how the system evolves over time.
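To make the state-space picture concrete, here is a minimal sketch of rolling such a model forward in time. The particular `f` and `g` below are hypothetical toy dynamics chosen for illustration, not anything from the paper:

```python
import numpy as np

def simulate(f, g, x0, inputs):
    """Roll a discrete-time state-space model forward:
    x(k+1) = f(x(k), u(k)),  y(k) = g(x(k))."""
    x, outputs = x0, []
    for u in inputs:
        outputs.append(g(x))   # observe the current state
        x = f(x, u)            # advance the state
    return outputs

# Toy example (hypothetical dynamics, not from the paper):
f = lambda x, u: 0.9 * x + np.tanh(u)   # state transition
g = lambda x: x ** 2                    # observation
ys = simulate(f, g, x0=0.0, inputs=[1.0, 0.0, -1.0])
```

System identification, in this framing, amounts to recovering `f` and `g` from recorded `inputs`/`ys` pairs alone.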

Standard feedforward neural networks, like the Multilayer Perceptron (MLP), are excellent at learning static relationships between inputs and outputs. However, they lack one crucial feature: memory. A classic MLP’s output at time \(k\) depends only on the input at that same instant; it cannot recall previous states. Capturing dynamic behavior demands a model that can remember its own past.

This is exactly where the Recurrent Multilayer Perceptron comes into play.


Recurrent Multilayer Perceptron: A Network with Memory

The RMLP is a hybrid neural architecture that blends the curve-fitting strength of feedforward networks with the temporal awareness of recurrent systems.

Figure 1: The RMLP architecture: an input layer, two hidden layers, and an output layer joined by feedforward, recurrent, and cross-layer (crosstalk) connections.

At a glance, the RMLP looks similar to a standard MLP—with input, hidden, and output layers—but it supports additional feedback mechanisms that imbue it with a sense of time. Here’s how each type of connection contributes:

  1. Feedforward Links: The standard connections passing information from one layer to the next (input → hidden → output). These enable strong function approximation abilities.
  2. Recurrent Links: Loops within a single layer that feed a neuron’s past output back into the same layer at the next time step. This allows the model to retain an internal memory of previous states.
  3. Cross-Layer Links (Crosstalk): Feedback connections between layers that enable more complex inter-layer temporal interactions.

Together, these mechanisms allow the RMLP to model how inputs, outputs, and internal states evolve over time—making it a compelling structure for dynamic system identification.


Understanding the Mathematics

The behavior of each neuron in an RMLP combines information from both the current and prior time steps. The output of neuron i in layer l at time \(k\), denoted \(x_{[l,i]}(k)\), is computed as:

\[
x_{[l,i]}(k) = F_l\!\left( \sum_{j} w_{[l,j][l,i]}\, x_{[l,j]}(k-1) \;+\; \sum_{j} w_{[l-1,j][l,i]}\, x_{[l-1,j]}(k) \;+\; b_{[l,i]} \right)
\]

Let’s break down the components:

  • \(F_l\) — activation function (e.g., sigmoid or tanh).
  • \( \sum_{j} w_{[l,j][l,i]} x_{[l,j]}(k-1) \) — recurrent term, incorporating past outputs from the same layer.
  • \( \sum_{j} w_{[l-1,j][l,i]} x_{[l-1,j]}(k) \) — feedforward term, using current outputs from the preceding layer.
  • \( b_{[l,i]} \) — bias term.

This structure gives every neuron a dynamic character—it learns relationships spanning both space (across layers) and time (across sequential steps).
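The per-neuron equation vectorizes naturally over a whole layer. The sketch below assumes a `tanh` activation and hypothetical layer sizes; `W_rec` holds the recurrent (same-layer) weights and `W_ff` the feedforward ones:

```python
import numpy as np

def rmlp_layer_step(x_prev_layer, x_same_layer_prev, W_ff, W_rec, b):
    """One time step of an RMLP layer:
    x_l(k) = F_l( W_rec @ x_l(k-1) + W_ff @ x_{l-1}(k) + b )."""
    return np.tanh(W_rec @ x_same_layer_prev + W_ff @ x_prev_layer + b)

# Hypothetical sizes: 3 neurons fed by a 2-neuron previous layer.
rng = np.random.default_rng(0)
W_ff = rng.normal(size=(3, 2))
W_rec = rng.normal(size=(3, 3))
b = np.zeros(3)

x_l = np.zeros(3)                        # layer state starts at rest
for u in ([1.0, 0.0], [0.5, -0.5]):      # feed two time steps
    x_l = rmlp_layer_step(np.array(u), x_l, W_ff, W_rec, b)
```

Setting `W_rec` to zero recovers an ordinary MLP layer, which makes the role of the recurrent term easy to see.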


Training the RMLP: Dynamic Learning

The next challenge is training such a network. The objective is to find the set of weights \(w\) that minimize the error between the network’s predictions and the true system outputs. This error is typically measured as a squared difference:

\[
E(k) = \sum_{i} \big( x_i(k) - \hat{x}_i(k) \big)^2
\]

where \(x_i(k)\) is the network's \(i\)-th output and \(\hat{x}_i(k)\) the corresponding target, summed over all output neurons.

For the RMLP, the authors use a steepest-descent update rule:

\[
\Delta w = -\eta \sum_{k} \frac{\partial E(k)}{\partial w}
\]

Here, \( \eta \) is the learning rate controlling step size, and the gradient \( \frac{\partial E(k)}{\partial w} \) expresses how changing a particular weight alters the total error.

Training recurrent networks is notoriously challenging because an individual weight’s influence ripples through time, affecting future states. Traditional algorithms like Backpropagation Through Time (BPTT) handle this via an expensive backward pass through all past time steps.

The authors propose a cleaner alternative: a dynamic learning algorithm using two forward passes instead of a backward one.

  1. First Forward Sweep: Compute the network outputs for a sequence of inputs.
  2. Second Forward Sweep: Calculate necessary output gradients for weight adjustments.

This “all-forward” approach streamlines computation, eliminates the backward time recursion, and maps well onto parallel hardware—making training faster and more scalable.
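The paper's exact two-sweep algorithm is not reproduced here, but the core idea, accumulating gradients while moving *forward* in time rather than backward, can be illustrated with a forward-mode (RTRL-style) sensitivity recursion for a single scalar recurrent neuron. The data and hyperparameters below are hypothetical:

```python
import numpy as np

def forward_grad_step(w, x_prev, s_prev, u):
    """Scalar recurrent neuron x(k) = tanh(w * x(k-1) + u(k)).
    Propagate the state x and its sensitivity s = dx/dw forward:
    s(k) = (1 - x(k)^2) * (x(k-1) + w * s(k-1))."""
    x = np.tanh(w * x_prev + u)
    s = (1.0 - x * x) * (x_prev + w * s_prev)
    return x, s

w, eta = 0.5, 0.1
x, s, grad = 0.0, 0.0, 0.0
inputs, targets = [1.0, 1.0, 1.0], [0.2, 0.4, 0.6]  # hypothetical data
for u, t in zip(inputs, targets):
    x, s = forward_grad_step(w, x, s, u)
    grad += 2.0 * (x - t) * s        # dE/dw for E = sum (x - t)^2
w -= eta * grad                      # steepest-descent update
```

No backward pass through time is needed: the sensitivity `s` is carried along with the state, step by step, which is what makes such schemes attractive for parallel or online operation.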


Example: Identifying a Nonlinear System

To demonstrate the RMLP’s capabilities, the authors applied it to identify a second-order nonlinear system defined by:

\[
y(k) = y(k-1) - 0.6\, y(k-1)\, y(k-2) + 0.7\, u^3(k)
\]

In this equation, the system’s output \(y(k)\) depends on two past outputs \(y(k-1)\) and \(y(k-2)\), as well as the cube of the current input \(u(k)\). The equation introduces both memory effects and nonlinear mixing—making it an ideal test case for the RMLP.
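The test system is simple enough to simulate directly, which is how training and evaluation data for the network would be generated. A minimal sketch, assuming zero initial conditions:

```python
def plant(u_seq, y_km1=0.0, y_km2=0.0):
    """Simulate y(k) = y(k-1) - 0.6*y(k-1)*y(k-2) + 0.7*u(k)^3,
    starting from the given initial conditions."""
    out = []
    for u in u_seq:
        y = y_km1 - 0.6 * y_km1 * y_km2 + 0.7 * u ** 3
        out.append(y)
        y_km2, y_km1 = y_km1, y
    return out

# Unit-step input from rest: the first output is simply 0.7 * 1^3 = 0.7.
ys = plant([1.0] * 5)
```

Responses like this, collected for steps, ramps, and their inverses, form the input patterns the authors used for training.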

The team trained a network with two hidden layers (10 neurons and 8 neurons, respectively) using a variety of input patterns: steps, inverse steps, ramps, and inverse ramps. After 1000 training cycles, the network converged successfully.

To test its generalization ability, they introduced a noisy step input the network had never seen before. The results speak volumes:

Figure 2: Time-series comparison of the true system output (solid) and the RMLP's prediction (dashed) under a noisy step input; the two curves track each other closely.

Despite the added noise, the RMLP tracks the true system's dynamics closely. This indicates that the network didn't merely memorize the training data—it internalized the system's underlying rules of behavior.


Conclusion: Why It Still Matters

The Recurrent Multilayer Perceptron concept introduced in 1991 was well ahead of its time. Its key insights remain foundational to how we design and train modern recurrent architectures today.

Key takeaways:

  1. Hybrid Architecture: Combining feedforward and recurrent connections allows a network to simultaneously approximate complex functions and model time-dependent behavior.
  2. Efficient Learning: The dynamic, two-forward-sweep algorithm is a clever way to achieve temporal learning without the computational burden of traditional backward passes.
  3. Effective Performance: Even on a challenging nonlinear system with noisy inputs, the RMLP demonstrated robust predictive capabilities.

Although the researchers noted that testing on real-world physical systems was still underway, their work laid the groundwork for today’s advanced recurrent models — from LSTMs and GRUs to cutting-edge transformer-based architectures. The RMLP represents an early bridge between classic neural computation and dynamic memory, teaching machines not only to think but also to remember.