Unlocking 6-DoF Motion: How Event Cameras Can See Rotation and Translation Without an IMU
Imagine trying to navigate a drone through a dense forest at high speed. A standard camera takes snapshots—click, click, click. If you move too fast between clicks, the world blurs, or you miss obstacles entirely.
Enter the Event Camera. Instead of taking snapshots, it mimics the biological eye. It has pixels that work independently, firing a signal (an “event”) the instant they detect a change in brightness. This results in a continuous stream of data with microsecond latency, zero motion blur, and high dynamic range.
However, using this data to figure out how the camera itself is moving (egomotion estimation) is mathematically difficult. Until recently, researchers often “cheated” by pairing the camera with an Inertial Measurement Unit (IMU) to handle the rotational part of the motion, leaving the camera to only solve for translation (linear velocity).
In the paper “Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers,” researchers propose a breakthrough. They introduce a geometric framework that can recover the Full Degrees of Freedom (Full-DoF)—both rotation and translation—purely from event data, without needing an IMU.
In this post, we will tear down the complex geometry of “event manifolds,” explore how lines in the real world translate to mathematical constraints, and see how we can optimize these constraints to pinpoint a camera’s velocity.
The Core Problem: Separating the Spin from the Slide
When a camera moves through a static scene, the motion of objects on the image plane depends on two things:
- Linear Velocity (\(\mathbf{v}\)): How fast the camera is translating (sliding) in \(x, y, z\).
- Angular Velocity (\(\boldsymbol{\omega}\)): How fast the camera is rotating (spinning) around \(x, y, z\).
For standard cameras, we solve this by matching feature points across frames. For event cameras, we don’t have frames; we have a stream of asynchronous points \((x, y, t, p)\).
Previous “sparse geometric solvers” for event cameras made a simplifying assumption: Assume we know the rotation (thanks to an IMU). If you remove rotation, the math becomes a linear problem, which is easy to solve. But if you try to solve for both \(\mathbf{v}\) and \(\boldsymbol{\omega}\) simultaneously, the problem becomes non-linear and much harder. The equations get messy, and “rotation-translation ambiguity” (confusing a rotation for a translation) becomes a risk.
The authors of this paper tackle this head-on by utilizing the geometry of lines. Straight lines in the 3D world are abundant (edges of tables, buildings, windows) and create very specific patterns in the event stream as the camera moves.
Method 1: The Incidence Relation (Geometry of Raw Events)
The first approach proposed uses the Incidence Relation. This is based on the idea of the “eventail” (a manifold of events).
Let’s assume a short time window where the camera’s velocity is constant. If you look at a single 3D line in space, every event generated by that line must geometrically “intersect” that 3D line in space and time.
The Setup
Look at Figure 1 below. The light blue line \(\mathbf{L}\) is the actual static line in the 3D world.
- The camera center is moving (represented by \(\mathbf{C}_j\) at time \(t_j\)).
- An event \(e_j\) happens on the image plane.
- This creates a “bearing vector” \(\mathbf{f}'_j\) (the direction from the camera to the event).

The geometric constraint is simple: The ray coming from the camera through the event (\(\mathbf{L}_j^e\) in orange) must touch the real 3D line \(\mathbf{L}\).
The Mathematical Constraint
To turn this geometric intuition into math, we need to account for the camera’s motion. The camera rotates by \(\mathbf{R}_j\) and translates by \(t_j \mathbf{v}\).
The paper derives a specific constraint equation. If we define a local coordinate frame attached to the line, we can express the relationship between the event bearing vector, the camera’s translation, and the line’s parameters:

Here, \(\mathbf{f}_j'\) is the bearing vector, and the remaining terms combine the linear-velocity components (\(u\)) with the line’s orientation basis vectors (\(\mathbf{e}\)).
The beauty of this equation is that if we stack up enough events (at least 8), we can form a system of equations. If we arrange these into a matrix \(\mathbf{A}(\boldsymbol{\omega})\) that depends on our unknown rotation, and a vector \(\mathbf{x}\) containing the translation and structure unknowns, we get:
\[
\mathbf{A}(\boldsymbol{\omega})\,\mathbf{x} = \mathbf{0}
\]
Solving for Rotation
The equation \(\mathbf{A}(\boldsymbol{\omega})\mathbf{x} = \mathbf{0}\) implies that the matrix \(\mathbf{A}(\boldsymbol{\omega})\) must be rank-deficient. To find the correct angular velocity \(\boldsymbol{\omega}\), we want to make this matrix as “close” to singular as possible.
Mathematically, we define a cost matrix \(\mathbf{M}(\boldsymbol{\omega})\):
\[
\mathbf{M}(\boldsymbol{\omega}) = \mathbf{A}(\boldsymbol{\omega})^{\top}\mathbf{A}(\boldsymbol{\omega})
\]
We then search for the angular velocity \(\boldsymbol{\omega}^*\) that minimizes the smallest eigenvalue (\(\lambda_{\min}\)) of this matrix:
\[
\boldsymbol{\omega}^{*} = \arg\min_{\boldsymbol{\omega}} \; \lambda_{\min}\!\big(\mathbf{M}(\boldsymbol{\omega})\big)
\]
In simpler terms: We tune the rotation parameters until the geometric constraint (incidence) is satisfied as perfectly as possible.
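To make this concrete, here is a minimal NumPy sketch of the eigenvalue objective. The `build_A` callback is a hypothetical stand-in for the paper’s incidence constraint (it should stack one row per event into \(\mathbf{A}(\boldsymbol{\omega})\)); only the \(\mathbf{M} = \mathbf{A}^\top\mathbf{A}\) and \(\lambda_{\min}\) steps are spelled out here.

```python
import numpy as np

def incidence_cost(omega, events, build_A):
    """Smallest eigenvalue of M(omega) = A(omega)^T A(omega).

    `build_A` is a placeholder for the incidence constraint: it should
    return one row per event, with 6 columns for the translation and
    structure unknowns collected in x.
    """
    A = build_A(omega, events)        # shape: (num_events, 6)
    M = A.T @ A                       # 6x6 symmetric cost matrix
    return np.linalg.eigvalsh(M)[0]   # eigenvalues are returned in ascending order
```

Any optimizer over \(\boldsymbol{\omega}\) can then drive this scalar cost; as described below, the authors use Adam.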
Method 2: The Coplanarity Relation (Geometry of Normal Flow)
The authors propose a second, alternative method. Instead of using raw event locations, this method uses Normal Flow.
In event vision, “normal flow” describes the motion of an edge perpendicular to itself. If you imagine a straight line moving across a screen, you can easily tell how fast it’s moving sideways, but not how fast it’s sliding along its own length. This perpendicular motion is the normal flow.
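As a quick illustration (not taken from the paper), the sketch below projects a full 2D flow vector onto an edge’s unit normal, which is exactly the component a straight edge lets you observe:

```python
import numpy as np

def normal_flow(full_flow, edge_direction):
    """Keep only the flow component perpendicular to the edge.

    The component along the edge is unobservable for a straight edge
    (the aperture problem), so only this perpendicular part survives.
    """
    d = edge_direction / np.linalg.norm(edge_direction)  # unit vector along the edge
    n = np.array([-d[1], d[0]])                          # unit normal to the edge
    return np.dot(full_flow, n) * n

# A vertical edge translating diagonally only reveals its horizontal motion:
print(normal_flow(np.array([1.0, 1.0]), np.array([0.0, 1.0])))  # ~[1., 0.]
```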
The Geometry
Consider Figure 2. It shows how events generate a plane.
- For an event \(e_j\), we can calculate a plane normal vector \(\mathbf{n}_j\).
- This vector is derived from the event’s position and its normal flow \(\mathbf{g}_j\).

Here is the key insight: All plane normals generated by the same moving 3D line must be perpendicular to the direction of that 3D line.
If \(\mathbf{d}\) is the direction of the 3D line, and \(\mathbf{n}'_j\) are the normal vectors (corrected for camera rotation), then their dot product must be zero. This creates a Coplanarity Relation.
The Mathematical Constraint
We stack all the normal vectors into a matrix \(\mathbf{B}\). The constraint states that the line direction \(\mathbf{d}\) is orthogonal to all rows of \(\mathbf{B}\):
\[
\mathbf{B}\,\mathbf{d} = \mathbf{0}
\]
Similar to the incidence method, this allows us to decouple the rotation from the translation. We construct a matrix \(\mathbf{N}\) based on the rotation-corrected normals:
\[
\mathbf{N}(\boldsymbol{\omega}) = \mathbf{B}(\boldsymbol{\omega})^{\top}\mathbf{B}(\boldsymbol{\omega})
\]
And again, we optimize the angular velocity \(\boldsymbol{\omega}\) to minimize the smallest eigenvalue of this matrix.
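Here is a minimal sketch of that smaller 3×3 problem, assuming the rotation-corrected normals \(\mathbf{n}'_j\) have already been computed and stacked row-wise (how they are corrected for a given \(\boldsymbol{\omega}\) is left out):

```python
import numpy as np

def coplanarity_cost(corrected_normals):
    """Smallest eigenvalue of N = B^T B for rotation-corrected normals.

    `corrected_normals` is a (num_events, 3) array whose rows are the
    n'_j vectors; a near-zero smallest eigenvalue means they share a
    common perpendicular direction, i.e. a consistent line direction d.
    """
    B = np.asarray(corrected_normals)
    N = B.T @ B                            # 3x3 symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(N)   # ascending eigenvalues
    cost = eigvals[0]
    line_direction = eigvecs[:, 0]         # least-squares estimate of d
    return cost, line_direction
```

Reading \(\mathbf{d}\) off as the eigenvector of the smallest eigenvalue is the standard least-squares estimate of the null space of \(\mathbf{B}\).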
Why Two Methods?
- Incidence (Method 1) uses raw events. It is fundamental but leads to a larger optimization problem (6x6 matrix).
- Coplanarity (Method 2) uses normal flow. It leads to a smaller, more efficient problem (3x3 matrix) but relies on the quality of the flow estimation.
The “Chicken and Egg” Optimization
We have a problem. To check if a rotation is correct, we need to build these matrices (\(\mathbf{A}\) or \(\mathbf{B}\)). But building these matrices requires rotating the event vectors using… the rotation we are trying to find!
To solve this non-linear optimization loop, the authors use the Adam optimizer, a popular algorithm in deep learning, but applied here to a geometric problem.
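A hedged PyTorch sketch of that loop is below. `build_M` is a hypothetical callable that assembles \(\mathbf{M}(\boldsymbol{\omega})\) (or \(\mathbf{N}(\boldsymbol{\omega})\)) from the events using differentiable tensor operations; `torch.linalg.eigvalsh` then lets the smallest eigenvalue be differentiated with respect to \(\boldsymbol{\omega}\), so Adam can update it directly.

```python
import torch

def estimate_omega(events, build_M, steps=200, lr=1e-2):
    """Minimize the smallest eigenvalue of M(omega) with Adam.

    `build_M(omega, events)` must return a symmetric cost matrix built
    with differentiable torch operations (a placeholder for the
    incidence or coplanarity construction).
    """
    omega = torch.zeros(3, requires_grad=True)   # start from zero rotation
    optimizer = torch.optim.Adam([omega], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        M = build_M(omega, events)
        cost = torch.linalg.eigvalsh(M)[0]       # smallest eigenvalue
        cost.backward()                          # gradient flows back to omega
        optimizer.step()
    return omega.detach()
```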
The “Cascade” Strategy
Calculating the exact rotation matrix \(\mathbf{R} = \exp([t\boldsymbol{\omega}]_\times)\) is computationally expensive because it involves trigonometric functions and matrix exponentials. To speed this up, the authors use a First-Order Approximation.
For small time intervals (which event cameras naturally have), the angle of rotation is tiny. We can approximate the rotation matrix as:
\[
\mathbf{R} \approx \mathbf{I} + [t\boldsymbol{\omega}]_{\times}
\]
This linear approximation makes the math much faster. The researchers propose a Cascade approach (a code sketch follows the list):
- Initialize: Start with zero rotation.
- Rough Pass: Run the optimizer using the First-Order Approximation. This is fast and gets us close to the solution.
- Fine Tuning: Use the result from step 2 as the starting point for the Exact Rotation solver to get high precision.
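To illustrate the two rotation models and the rough-then-fine refinement, here is a small sketch. The cost function and the generic SciPy optimizer are stand-ins (the paper itself optimizes with Adam); only the cascade structure and the rotation approximations follow the description above.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def skew(w):
    """Skew-symmetric matrix [w]_x, so that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rotation_first_order(omega, t):
    """Cheap linearized model: R ≈ I + [t*omega]_x."""
    return np.eye(3) + skew(t * np.asarray(omega))

def rotation_exact(omega, t):
    """Exact model: R = exp([t*omega]_x)."""
    return expm(skew(t * np.asarray(omega)))

def cascade(cost, omega0=np.zeros(3)):
    """Rough pass with the linearized rotation, then fine-tune with the exact one.

    `cost(omega, rotation_model)` is a hypothetical callable returning the
    eigenvalue objective for a given rotation model.
    """
    rough = minimize(lambda w: cost(w, rotation_first_order), omega0, method="Nelder-Mead")
    fine = minimize(lambda w: cost(w, rotation_exact), rough.x, method="Nelder-Mead")
    return fine.x
```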
Experiments and Results
Does it actually work? The researchers tested the solvers on both synthetic simulations and real-world datasets.
Robustness to Noise
Event cameras are noisy. Timestamps jitter, and pixels sometimes fire randomly. Figure 3 below shows how the error in angular (\(\varepsilon_{ang}\)) and linear (\(\varepsilon_{lin}\)) velocity changes as we add noise or change the amount of data.

- Graphs (a) & (b): As the number of events increases (x-axis), the error (y-axis) drops rapidly.
- Graphs (e) - (h): As noise increases, error increases, but the method remains stable.
- Insight: The Incidence method (Blue) generally handles low-noise situations slightly better, while Coplanarity (Red) is competitive.
The Importance of Multiple Lines
You cannot solve this problem with just one line in the scene. A single line creates a “rotation-translation ambiguity”—you can’t tell if you are moving parallel to the line or rotating around it.
Figure 4 visualizes the “landscape” of the cost function (the value we are trying to minimize).

Look at column (a) “IncMin with 1 line.” The dark blue region (the minimum) is a long valley. There is no single distinct point; many different velocities look “correct.” Now look at column (c) and (e) where 2 or 3 lines are used. The valley becomes a bowl. There is a clear, single global minimum. This proves that observing at least 2 non-parallel lines is crucial for Full-DoF estimation.
Real-World Performance
The authors applied their method to the VECtor dataset. First, they extracted lines from the event stream.

Using these clusters, they estimated the velocity.

The results in Table 2 show that the method works on real data, with angular velocity errors around 0.2 and linear direction errors around 20 degrees. While this might seem high compared to systems with IMUs, remember: this is doing it the hard way—using only vision. It proves the concept is viable for scenarios where IMUs might fail or aren’t available.
Conclusion
This paper marks a significant step forward for event-based vision. By mathematically modeling the eventail (the space-time surface of a moving line), the authors created the first sparse geometric solvers capable of recovering Full-DoF egomotion without inertial sensors.
Key takeaways:
- Geometry beats assumption: We don’t need to assume rotation is known; we can solve for it using incidence or coplanarity constraints.
- Lines are powerful: Simple straight lines provide enough geometric constraints to decouple rotation from translation.
- Optimization matters: Using a cascade of approximate-then-exact optimization makes the problem solvable in real time.
This work lays the foundation for fully autonomous event-based navigation systems that are lighter, simpler, and closer to how biological vision actually works.