Introduction

Imagine training a robot to carry a glass of water. In a simulation, your Reinforcement Learning (RL) agent performs perfectly, walking briskly without spilling a drop. But when you deploy that same policy onto a physical robot, the motors twitch, the arms shake, and the water goes everywhere.

This phenomenon is known as action fluctuation. It is one of the primary barriers preventing Deep Reinforcement Learning from being widely adopted in real-world engineering tasks like autonomous driving and robotics. Jittery control signals don’t just look bad; they wear out actuators, consume excess energy, and create genuine safety risks.

For years, researchers have treated this as a tuning problem, trying to penalize erratic movements or smooth out data in ad-hoc ways. However, a new paper titled “LipsNet++: Unifying Filter and Controller into a Policy Network” takes a more principled approach. The authors identify the root causes of this fluctuation and propose a novel neural network architecture that solves them by borrowing a concept from classical control theory: separating the “filter” from the “controller.”

In this post, we will tear down the LipsNet++ architecture to understand how it achieves state-of-the-art smoothness and robustness without sacrificing performance.

Comparison of fluctuating action vs smooth action.

The Problem: Why Do Policies Jitter?

To solve action fluctuation, we first need to understand where it comes from. In standard RL, an agent (the “Actor”) observes a state \(s_t\) and outputs an action \(a_t\). Ideally, if the state changes smoothly, the action should also change smoothly.

However, the researchers derived a bound on the rate of action change over time, an inequality that reveals the two fundamental culprits:

\[
\left\| \frac{da_t}{dt} \right\| \;\le\; \left\| \frac{\partial \pi(o_t)}{\partial o_t} \right\| \cdot \left\| \frac{do_t}{dt} \right\|
\]

This inequality tells us that the fluctuation of an action (\(\frac{da_t}{dt}\)) is bounded by the product of the policy’s sensitivity (how much the action changes when the observation changes) and the input’s rate of change (how fast the observation changes).

This leads to the identification of two distinct root causes:

  1. Observation Noise: Real-world sensors are noisy. Even if the robot is standing still, sensor readings (\(o_t\)) flicker. If the policy reacts to every tiny flicker, the motors will twitch.
  2. Policy Non-Smoothness: Neural networks are universal function approximators, which means they can learn very “jagged” functions. A tiny change in the input state might trigger a massive jump in the output action if the network hasn’t been constrained.
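
The inequality above can be made concrete with a toy example. Below, a smooth 1-D policy with a small Lipschitz constant barely amplifies sensor flicker, while a jagged one magnifies it dramatically (a minimal numpy sketch; the two policies are hypothetical stand-ins, not the paper's networks):

```python
import numpy as np

# Two hypothetical 1-D policies: a smooth one and a jagged one.
smooth_policy = lambda o: np.tanh(o)          # local slope <= 1
jagged_policy = lambda o: np.sin(50.0 * o)    # local slope up to 50

rng = np.random.default_rng(0)
o = 0.3                                        # the "true" observation
noise = 1e-3 * rng.standard_normal(1000)       # tiny sensor flicker

# Action fluctuation caused purely by observation noise:
smooth_jitter = np.max(np.abs(smooth_policy(o + noise) - smooth_policy(o)))
jagged_jitter = np.max(np.abs(jagged_policy(o + noise) - jagged_policy(o)))

# The bound |da| <= K * |do| in action: the smooth policy's jitter
# stays within its Lipschitz bound (K = 1); the jagged policy's
# jitter is tens of times larger for the exact same noise.
print(smooth_jitter, jagged_jitter)
```

The same millivolt-level sensor flicker produces an action wiggle roughly 40 times larger under the jagged policy, which is exactly the twitching seen on real hardware.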

Most previous solutions tried to fix only one of these or mixed them up inefficiently. LipsNet++ proposes a decoupled solution: a Fourier Filter Layer to handle the noise, and a Lipschitz Controller Layer to handle the policy smoothness.

Visualizing smooth vs non-smooth policy landscapes.

As shown in the figure above, a non-smooth policy (d) results in a jagged trajectory, whereas a smooth policy (b) yields continuous, stable control.

The Solution: LipsNet++ Architecture

The genius of LipsNet++ lies in its structure, which mirrors a classical control loop. In classical engineering, you would never feed raw, noisy sensor data directly into a controller. You would pass it through a filter first.

LipsNet++ integrates this workflow into a single, end-to-end differentiable policy network.

Overall architecture of LipsNet++.

Module 1: The Fourier Filter Layer

The first line of defense against jitter is the Fourier Filter Layer. Its job is to look at a history of observations and strip away the high-frequency noise while keeping the important signal.

Standard neural networks usually process the current state or a stack of frames. LipsNet++ takes the last \(N\) observations and processes them using the Fast Fourier Transform (FFT).

How It Works

  1. Input Stacking: The layer takes a sequence of historical observations \(o_t, o_{t-1}, \dots, o_{t-N+1}\).
  2. FFT: It converts this time-domain sequence into the frequency domain. This reveals which frequencies are present in the signal. Noise typically lives in the high frequencies, while the physical movement of a robot lives in lower frequencies.
  3. Learnable Filtering: This is the clever part. Instead of a human engineer manually designing a low-pass filter (deciding a cutoff frequency), the network learns a filter matrix \(H\). This matrix multiplies the frequency features, effectively learning to “turn down the volume” on frequencies that represent noise and “turn up” the frequencies that carry useful state information.
  4. IFFT: Finally, an Inverse Fast Fourier Transform converts the clean frequency data back into a time-domain signal (\(\tilde{o}_t\)) to be used by the controller.

Workflow of the Fourier Filter Layer.

To ensure this layer actually filters noise (rather than just passing everything through), the authors add a penalty term to the loss function that encourages the filter matrix \(H\) to be sparse (small values). This forces the network to only keep the frequencies that are absolutely necessary for maximizing reward.
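
The four steps above, plus the sparsity penalty, can be sketched in a few lines of numpy. Here the filter matrix \(H\) is reduced to a per-frequency gain vector set by hand as a stand-in; in LipsNet++ it would be a trainable parameter shaped by the RL objective and the sparsity term:

```python
import numpy as np

def fourier_filter(obs_history, H):
    """Filter a window of N observations in the frequency domain.

    obs_history: shape (N,), most recent observation last.
    H: shape (N//2 + 1,), per-frequency gains (learnable in LipsNet++).
    Returns the filtered current observation o_tilde_t.
    """
    spectrum = np.fft.rfft(obs_history)                  # 1.+2. stack, FFT
    filtered = H * spectrum                              # 3. filtering
    signal = np.fft.irfft(filtered, n=len(obs_history))  # 4. IFFT
    return signal[-1]                                    # filtered current obs

def sparsity_penalty(H, lam=1e-3):
    # L1 penalty pushing unneeded frequency gains toward zero.
    return lam * np.abs(H).sum()

# Demo: a slow sine (the "physical signal") plus fast sensor noise.
N = 64
t = np.arange(N)
clean = np.sin(2 * np.pi * t / N)                        # low-frequency signal
noisy = clean + 0.3 * np.sin(2 * np.pi * 20 * t / N)     # high-freq noise

# A hand-set low-pass H: keep the first few bins, suppress the rest.
H = np.zeros(N // 2 + 1)
H[:4] = 1.0

o_tilde = fourier_filter(noisy, H)
print(abs(o_tilde - clean[-1]), abs(noisy[-1] - clean[-1]))
```

With the noise confined to bin 20 and the signal in bin 1, the filtered observation recovers the clean value almost exactly, while the raw observation is off by the full noise amplitude.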

Module 2: The Lipschitz Controller Layer

Now that we have a clean signal \(\tilde{o}_t\), we need to map it to an action. However, even with clean input, a standard Multi-Layer Perceptron (MLP) can still learn a chaotic, jagged function.

To prevent this, the policy needs to be Lipschitz continuous. In simple terms, a function \(f\) is Lipschitz continuous if its rate of change is bounded: \(\| f(x) - f(y) \| \le K \| x - y \|\) for all inputs \(x\) and \(y\). If the input changes by a small amount, the output cannot change by more than \(K\) times that amount, where \(K\) is the Lipschitz constant.

Previous attempts such as the original LipsNet or MLP-SN (an MLP with Spectral Normalization) had significant drawbacks: they were either computationally expensive during inference (slow) or overly restrictive, which hurt the robot’s performance.

Jacobian Regularization

LipsNet++ introduces a more flexible approach called Jacobian Regularization.

The local Lipschitz constant of a function is closely related to the norm of its gradient (the Jacobian). If the gradient is huge, the function is steep and unstable. If the gradient is small, the function is smooth.

Instead of forcing the network architecture to be smooth (which limits what it can learn), LipsNet++ adds a “soft” constraint to the training loss:

\[
L \;=\; L_{\text{actor}} \;+\; \lambda \, \| \nabla f \|
\]

Here, \(\| \nabla f \|\) is the norm of the Jacobian. By minimizing this term during training, the network is encouraged to learn a smooth mapping naturally. This allows the controller to be any differentiable network structure (not limited to specific activation functions) and removes the need for complex calculations during inference (deployment), making it very fast.
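
As a sketch of the idea (using a toy linear "policy" and a finite-difference Jacobian estimate, both assumptions made here to keep the example dependency-free; LipsNet++ would compute the Jacobian by automatic differentiation), the regularizer just adds the Jacobian norm to whatever actor loss is being minimized:

```python
import numpy as np

def jacobian_norm(f, o, eps=1e-5):
    """Estimate the Frobenius norm of df/do by finite differences."""
    o = np.asarray(o, dtype=float)
    f0 = np.atleast_1d(f(o))
    J = np.zeros((f0.size, o.size))
    for i in range(o.size):
        d = np.zeros_like(o)
        d[i] = eps
        J[:, i] = (np.atleast_1d(f(o + d)) - f0) / eps
    return np.linalg.norm(J)

def regularized_loss(actor_loss, f, o, lam=0.1):
    # L = L_actor + lambda * ||grad f||  (soft smoothness penalty)
    return actor_loss + lam * jacobian_norm(f, o)

# Toy policies: a gentle mapping vs. a steep one.
gentle = lambda o: 0.5 * o
steep = lambda o: 100.0 * o

o = np.array([0.2, -0.1])
print(regularized_loss(1.0, gentle, o))   # small penalty
print(regularized_loss(1.0, steep, o))    # large penalty
```

Because the penalty enters only the training loss, gradient descent itself steers the network toward flat, smooth mappings, and nothing extra needs to be computed at deployment time.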

Experiments and Results

The researchers put LipsNet++ to the test against standard MLPs and previous state-of-the-art smoothing methods (like LipsNet v1 and MLP-SN) in both simulation and real-world scenarios.

Simulation: The Double Integrator

In a controlled physics environment called the “Double Integrator,” different levels of noise were injected into the observations to stress-test the policies.

Comparison of waveforms in double integrator.

The results in Figure 6 are striking.

  • Top (a): The amplitude of action fluctuation for LipsNet++ (green) is significantly lower than the standard MLP (blue).
  • Middle (b): The trajectory of LipsNet++ is smooth and continuous, while the MLP jitters aggressively.
  • Bottom (c): The frequency spectrum confirms that LipsNet++ operates mostly in the low-frequency range, successfully suppressing high-frequency noise.

When summarized in a radar chart, LipsNet++ demonstrates superior performance across the board, balancing control accuracy, smoothness, and computational speed better than any competitor.

Radar chart comparing performance metrics.

Visualization of the Learned Filter

One of the most fascinating results is visualizing what the Fourier Filter Layer learned. Did it actually discover how to filter noise?

Heatmaps of observation frequency and filter matrix.

The heatmaps confirm the hypothesis.

  • (a) Noise-free input: The real signal is concentrated in specific low frequencies.
  • (b) Noisy input: Noise pollutes the entire frequency spectrum.
  • (c) Filter Matrix: The learned matrix \(H\) automatically focuses on the frequencies where the real signal exists and suppresses the rest. It essentially learned to be a band-pass filter purely through reinforcement learning, without human intervention.

Real-World Validation: Mini-Vehicle Driving

Simulations are forgiving; the real world is not. The team deployed LipsNet++ on physical mini-robots tasked with tracking trajectories and avoiding obstacles. They injected artificial noise into the sensors to simulate difficult perception conditions.

Trajectories of real-world vehicles.

In Figure 8, we see the robot attempting to avoid an obstacle (the red ‘x’).

  • MLP (Blue): The trajectory is shaky. Looking at subplots (d) and (e), the acceleration commands oscillate wildly. This is the “twitching” that destroys motors.
  • LipsNet++ (Orange): The trajectory is clean, and the control inputs are smooth.

The difference becomes even more apparent when looking at the aggregate data as noise increases.

Graphs showing performance trends as noise increases.

As the plots show, as noise levels (x-axis) rise:

  1. TAR (Total Average Return): LipsNet++ (Orange) maintains high performance, while MLP (Blue) performance collapses.
  2. AFR (Action Fluctuation Ratio): The MLP becomes incredibly unstable (high AFR), while LipsNet++ remains composed.

In one specific high-noise scenario, the MLP-driven robot actually crashed because its erratic movements caused it to lose control, while the LipsNet++ robot completed the task successfully.

Conclusion

LipsNet++ represents a significant step forward for practical Reinforcement Learning. It bridges the gap between the chaotic, noisy reality of physical hardware and the mathematical precision of neural networks.

By explicitly decoupling the problem into filtering (handling noise) and controlling (handling policy smoothness), the authors created a system that is:

  1. Robust: It handles sensor noise effectively via the Fourier Filter Layer.
  2. Smooth: It generates fluid motions via the Jacobian-regularized Lipschitz Controller.
  3. Fast: It avoids the computational bottlenecks of previous methods, making it suitable for real-time control.

For students and practitioners in robotics and AI, this paper serves as a perfect example of how combining domain knowledge (control theory) with modern deep learning can solve fundamental problems that brute-force training cannot. As we move toward more embodied AI, architectures like LipsNet++ will be essential for creating robots that move with the grace and stability of biological systems.