Unlocking 6-DoF Motion: How Event Cameras Can See Rotation and Translation Without an IMU

Imagine trying to navigate a drone through a dense forest at high speed. A standard camera takes snapshots—click, click, click. If you move too fast between clicks, the world blurs, or you miss obstacles entirely.

Enter the Event Camera. Instead of taking snapshots, it mimics the biological eye. It has pixels that work independently, firing a signal (an “event”) the instant they detect a change in brightness. This results in a continuous stream of data with microsecond latency, zero motion blur, and high dynamic range.

However, using this data to figure out how the camera itself is moving (egomotion estimation) is mathematically difficult. Until recently, researchers often “cheated” by pairing the camera with an Inertial Measurement Unit (IMU) to handle the rotational part of the motion, leaving the camera to only solve for translation (linear velocity).

In the paper “Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers,” researchers propose a breakthrough. They introduce a geometric framework that can recover the Full Degrees of Freedom (Full-DoF)—both rotation and translation—purely from event data, without needing an IMU.

In this post, we will break down the geometry of “event manifolds,” explore how lines in the real world translate into mathematical constraints, and see how optimizing these constraints pins down a camera’s velocity.


The Core Problem: Separating the Spin from the Slide

When a camera moves through a static scene, the motion of objects on the image plane depends on two things:

  1. Linear Velocity (\(\mathbf{v}\)): How fast the camera is translating (sliding) in \(x, y, z\).
  2. Angular Velocity (\(\boldsymbol{\omega}\)): How fast the camera is rotating (spinning) around \(x, y, z\).

For standard cameras, we solve this by matching feature points across frames. For event cameras, we don’t have frames; we have a stream of asynchronous points \((x, y, t, p)\): pixel location, timestamp, and polarity.

Previous “sparse geometric solvers” for event cameras made a simplifying assumption: the rotation is already known (thanks to an IMU). Remove the rotation, and the math becomes a linear problem, which is easy to solve. But if you try to solve for both \(\mathbf{v}\) and \(\boldsymbol{\omega}\) simultaneously, the problem becomes non-linear and much harder. The equations get messy, and “rotation-translation ambiguity” (confusing a rotation for a translation) becomes a risk.
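To get a feel for why the spin and the slide are hard to separate, here is a small illustrative Python snippet. It uses the textbook pinhole motion-field equations rather than anything from the paper, and the depth and velocity values are made up: at a single pixel, a pure rotation and a pure translation can produce almost identical apparent motion.

```python
import numpy as np

def image_motion_field(x, y, depth, v, omega):
    """Instantaneous image motion of a normalized pixel (x, y) at a given depth,
    for camera linear velocity v and angular velocity omega (classic calibrated
    pinhole motion-field equations; illustrative, not code from the paper)."""
    vx, vy, vz = v
    wx, wy, wz = omega
    # The translational part scales with inverse depth; the rotational part does not.
    u_trans = np.array([(x * vz - vx) / depth,
                        (y * vz - vy) / depth])
    u_rot = np.array([x * y * wx - (1 + x**2) * wy + y * wz,
                      (1 + y**2) * wx - x * y * wy - x * wz])
    return u_trans + u_rot

# A pure rotation about the y-axis vs. a pure sideways translation at 2 m depth:
print(image_motion_field(0.1, 0.0, 2.0, v=(0.0, 0.0, 0.0), omega=(0.0, 0.3, 0.0)))  # ~[-0.303, 0]
print(image_motion_field(0.1, 0.0, 2.0, v=(0.6, 0.0, 0.0), omega=(0.0, 0.0, 0.0)))  # ~[-0.3, 0]
```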

The authors of this paper tackle this head-on by utilizing the geometry of lines. Straight lines in the 3D world are abundant (edges of tables, buildings, windows) and create very specific patterns in the event stream as the camera moves.


Method 1: The Incidence Relation (Geometry of Raw Events)

The first approach proposed uses the Incidence Relation. This is based on the idea of the “eventail” (a manifold of events).

Let’s assume a short time window over which the camera’s velocity is constant. Consider a single 3D line in space: the viewing ray of every event generated by that line must geometrically intersect that 3D line in space and time.

The Setup

Look at Figure 1 below. The light blue line \(\mathbf{L}\) is the actual static line in the 3D world.

  • The camera center is moving (represented by \(\mathbf{C}_j\) at time \(t_j\)).
  • An event \(e_j\) happens on the image plane.
  • This creates a “bearing vector” \(\mathbf{f}'_j\) (the direction from the camera to the event).

Figure 1. Incidence relation between the observed line \(\mathbf{L}\) and the line \(\mathbf{L}_j^e\) of the \(j\)-th event. The line \(\mathbf{L}_j^e\) is consistent with the bearing vector \(\mathbf{f}'_j\). The vector \(\mathbf{u}^l\) represents the projection of the translation \(\mathbf{v}\) onto the plane spanned by the vectors \(\mathbf{e}_2^l\) and \(\mathbf{e}_3^l\) (shaded with a dot pattern). Due to the aperture problem, only the \(u_y^l\) and \(u_z^l\) components are observable.

The geometric constraint is simple: The ray coming from the camera through the event (\(\mathbf{L}_j^e\) in orange) must touch the real 3D line \(\mathbf{L}\).
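This “ray must meet the line” condition has a tidy algebraic test in Plücker coordinates: two 3D lines intersect (or are parallel) exactly when their reciprocal product vanishes. The snippet below is a generic illustration of that test with made-up points, not the paper’s exact parameterization.

```python
import numpy as np

def plucker(p, q):
    """Plücker coordinates (direction, moment) of the 3D line through points p and q."""
    d = q - p
    return d, np.cross(p, d)

def lines_meet(line1, line2, tol=1e-9):
    """Two 3D lines are coplanar (they intersect or are parallel) iff their
    reciprocal product d1.m2 + d2.m1 is zero."""
    d1, m1 = line1
    d2, m2 = line2
    return abs(np.dot(d1, m2) + np.dot(d2, m1)) < tol

# A static 3D line L, and an event ray from a (moving) camera center through a point on L.
L = plucker(np.array([1.0, 0.0, 3.0]), np.array([1.0, 1.0, 3.0]))
camera_center = np.array([0.2, 0.1, 0.0])                # camera position at time t_j (made up)
ray = plucker(camera_center, np.array([1.0, 0.5, 3.0]))  # the observed point lies on L
print(lines_meet(L, ray))  # True: the event ray intersects the line
```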

The Mathematical Constraint

To turn this geometric intuition into math, we need to account for the camera’s motion: by time \(t_j\), the camera has rotated by \(\mathbf{R}_j\) and translated by \(t_j \mathbf{v}\) (thanks to the constant-velocity assumption).

The paper derives a specific constraint equation (its Equation 5), written in a local coordinate frame attached to the line. It ties together the event’s bearing vector \(\mathbf{f}_j'\), the observable components of the linear velocity (\(u\)), and the line’s orientation basis vectors (\(\mathbf{e}\)).

The beauty of this equation is that if we stack up enough events (at least 8), we can form a system of equations. If we arrange these into a matrix \(\mathbf{A}(\boldsymbol{\omega})\) that depends on our unknown rotation, and a vector \(\mathbf{x}\) containing the translation and structure unknowns, we get:

\[
\mathbf{A}(\boldsymbol{\omega})\,\mathbf{x} = \mathbf{0}
\]
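Before worrying about the rotation, note that if \(\boldsymbol{\omega}\) were known, recovering \(\mathbf{x}\) from this homogeneous system would be routine: take the right singular vector of \(\mathbf{A}\) associated with its smallest singular value. The sketch below assumes a hypothetical `build_incidence_rows` helper, since the exact entries of \(\mathbf{A}\) follow the paper’s parameterization.

```python
import numpy as np

def solve_homogeneous(A):
    """Minimizer of ||A x|| subject to ||x|| = 1: the right singular vector
    associated with the smallest singular value of A."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Hypothetical helper: stacks one incidence-constraint row per event for a
# candidate angular velocity.
# A = build_incidence_rows(events, omega_candidate)
A = np.random.randn(20, 6)        # stand-in: 20 events, 6 unknowns (translation + line structure)
x = solve_homogeneous(A)
residual = np.linalg.norm(A @ x)  # equals the smallest singular value of A; it only approaches
                                  # zero when the candidate omega is geometrically consistent
                                  # (not for this random stand-in)
```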

Solving for Rotation

The equation \(\mathbf{A}(\boldsymbol{\omega})\mathbf{x} = \mathbf{0}\) implies that the matrix \(\mathbf{A}(\boldsymbol{\omega})\) must be rank-deficient. To find the correct angular velocity \(\boldsymbol{\omega}\), we want to make this matrix as “close” to singular as possible.

Mathematically, we define a cost matrix \(\mathbf{M}(\boldsymbol{\omega})\) built from the stacked constraints, the Gram matrix of \(\mathbf{A}(\boldsymbol{\omega})\):

\[
\mathbf{M}(\boldsymbol{\omega}) = \mathbf{A}(\boldsymbol{\omega})^{\top} \mathbf{A}(\boldsymbol{\omega})
\]

We then search for the angular velocity \(\boldsymbol{\omega}^*\) that minimizes the smallest eigenvalue (\(\lambda_{\min}\)) of this matrix:

\[
\boldsymbol{\omega}^{*} = \arg\min_{\boldsymbol{\omega}} \; \lambda_{\min}\!\left(\mathbf{M}(\boldsymbol{\omega})\right)
\]

In simpler terms: We tune the rotation parameters until the geometric constraint (incidence) is satisfied as perfectly as possible.
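In code, the objective is only a few lines. The sketch below is schematic (it reuses the hypothetical `build_incidence_rows` helper from above): a candidate \(\boldsymbol{\omega}\) is scored by the smallest eigenvalue of the Gram matrix \(\mathbf{M}(\boldsymbol{\omega})\).

```python
import numpy as np

def lambda_min_objective(omega, events, build_rows):
    """Score a candidate angular velocity: the smallest eigenvalue of
    M(omega) = A(omega)^T A(omega). Zero means the incidence constraints
    can be satisfied exactly by some x."""
    A = build_rows(events, omega)      # hypothetical constraint builder
    M = A.T @ A                        # symmetric positive semi-definite Gram matrix
    return np.linalg.eigvalsh(M)[0]    # eigvalsh returns eigenvalues in ascending order

# A coarse search over candidate rotations (later refined by a local optimizer):
# best_omega = min(candidates, key=lambda w: lambda_min_objective(w, events, build_rows))
```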


Method 2: The Coplanarity Relation (Geometry of Normal Flow)

The authors propose a second, alternative method. Instead of using raw event locations, this method uses Normal Flow.

In event vision, “normal flow” describes the motion of an edge perpendicular to itself. If you imagine a straight line moving across a screen, you can easily tell how fast it’s moving sideways, but not how fast it’s sliding along its own length. This perpendicular motion is the normal flow.
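In code, the aperture problem is just a projection: whatever the edge’s true image motion is, only the component along the edge normal can be measured. A minimal illustration (not from the paper):

```python
import numpy as np

def normal_flow(full_flow, edge_direction):
    """Project the (unobservable) full image flow onto the edge normal.
    Only this component can be measured from a straight moving edge."""
    h = edge_direction / np.linalg.norm(edge_direction)   # unit vector along the edge
    g_hat = np.array([-h[1], h[0]])                       # in-plane normal to the edge
    return np.dot(full_flow, g_hat) * g_hat

# A vertical edge translating diagonally: only the horizontal motion is observable.
print(normal_flow(np.array([1.0, 1.0]), edge_direction=np.array([0.0, 1.0])))  # -> [1. 0.]
```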

The Geometry

Consider Figure 2. It shows how each event, together with its normal flow, defines a plane (described by its normal vector).

  • For an event \(e_j\), we can calculate a plane normal vector \(\mathbf{n}_j\).
  • This vector is derived from the event’s position and its normal flow \(\mathbf{g}_j\).

Figure 2. Coplanarity relation between plane normal vectors. The plane normal \(\mathbf{n}_j\) can be computed from the event \(e_j\) and its normal flow \(\mathbf{g}_j\). The line direction vector \(\mathbf{h}_j\) is perpendicular to \(\mathbf{g}_j\) within the image plane. The line \(\mathbf{L}\) is orthogonal to the set of plane normals \(\{\mathbf{n}'_j\}\).

Here is the key insight: All the plane normals generated by observations of the same static 3D line must be perpendicular to that line’s direction.

If \(\mathbf{d}\) is the direction of the 3D line, and \(\mathbf{n}'_j\) are the normal vectors (corrected for camera rotation), then their dot product must be zero. This creates a Coplanarity Relation.

The Mathematical Constraint

We stack all the normal vectors into a matrix \(\mathbf{B}\). The constraint states that the line direction \(\mathbf{d}\) is orthogonal to all rows of \(\mathbf{B}\):

\[
\mathbf{B}\,\mathbf{d} = \mathbf{0}
\]

Similar to the incidence method, this allows us to decouple the rotation from the translation. From the rotation-corrected normals we construct a \(3 \times 3\) matrix \(\mathbf{N}(\boldsymbol{\omega})\), essentially the sum of their outer products:

\[
\mathbf{N}(\boldsymbol{\omega}) = \sum_j \mathbf{n}'_j\,\mathbf{n}'^{\top}_j
\]

And again, we optimize the angular velocity \(\boldsymbol{\omega}\) to minimize the smallest eigenvalue of this matrix.
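Putting the coplanarity pieces together, a schematic NumPy version could look like the following. It assumes per-event plane normals and timestamps are already available, uses the small-angle rotation model discussed later in this post, and is a sketch of the idea rather than the paper’s implementation.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix [w]_x such that skew(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def corrected_normals(omega, normals, times):
    """Rotation-correct each per-event plane normal n_j using the small-angle
    rotation accumulated by its timestamp, R_j ~ I + t_j [omega]_x."""
    return np.array([(np.eye(3) + t * skew(omega)).T @ n
                     for n, t in zip(normals, times)])

def coplanarity_score(omega, normals, times):
    """lambda_min of N(omega): close to zero when all corrected normals share a
    common orthogonal direction d (the direction of the 3D line)."""
    B = corrected_normals(omega, normals, times)
    return np.linalg.eigvalsh(B.T @ B)[0]

def line_direction(omega, normals, times):
    """With omega fixed, d is the eigenvector of N(omega) for its smallest eigenvalue."""
    B = corrected_normals(omega, normals, times)
    _, eigvecs = np.linalg.eigh(B.T @ B)
    return eigvecs[:, 0]
```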

Why Two Methods?

  • Incidence (Method 1) uses raw events. It is fundamental but leads to a larger optimization problem (a 6×6 matrix).
  • Coplanarity (Method 2) uses normal flow. It leads to a smaller, more efficient problem (a 3×3 matrix) but relies on the quality of the flow estimation.

The “Chicken and Egg” Optimization

We have a problem. To check if a rotation is correct, we need to build these matrices (\(\mathbf{A}\) or \(\mathbf{B}\)). But building these matrices requires rotating the event vectors using… the rotation we are trying to find!

To solve this non-linear optimization loop, the authors use the Adam optimizer, a popular algorithm in deep learning, but applied here to a geometric problem.

The “Cascade” Strategy

Calculating the exact rotation matrix \(\mathbf{R} = \exp([t\boldsymbol{\omega}]_\times)\) is computationally expensive because it involves trigonometric functions and matrix exponentials. To speed this up, the authors use a First-Order Approximation.

For small time intervals (which event cameras naturally have), the angle of rotation is tiny. We can approximate the rotation matrix as:

\[
\mathbf{R}(t) \approx \mathbf{I} + t\,[\boldsymbol{\omega}]_{\times}
\]
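A quick numerical sanity check (generic, not from the paper; the angular velocity and window length are made up) shows how little accuracy the approximation costs over event-camera time scales.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

omega = np.array([0.5, -1.0, 0.8])   # rad/s (made-up angular velocity)
t = 0.005                            # a 5 ms window: the rotation angle is tiny

R_exact = Rotation.from_rotvec(t * omega).as_matrix()  # R = exp([t*omega]_x)
R_approx = np.eye(3) + t * skew(omega)                 # first-order approximation

print(np.abs(R_exact - R_approx).max())  # on the order of 1e-5: negligible here
```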

This linear approximation makes the math much faster. The researchers propose a Cascade approach:

  1. Initialize: Start with zero rotation.
  2. Rough Pass: Run the optimizer using the First-Order Approximation. This is fast and gets us close to the solution.
  3. Fine Tuning: Use the result from step 2 as the starting point for the Exact Rotation solver to get high precision.
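Here is a rough PyTorch sketch of that loop under the assumptions above. It plugs in the coplanarity objective because its \(3 \times 3\) matrix keeps the example short; the paper’s actual solver details may differ.

```python
import torch

def skew(w):
    """Differentiable skew-symmetric matrix [w]_x built from a 3-vector."""
    z = torch.zeros((), dtype=w.dtype)
    return torch.stack([torch.stack([z, -w[2], w[1]]),
                        torch.stack([w[2], z, -w[0]]),
                        torch.stack([-w[1], w[0], z])])

def objective(omega, normals, times, exact):
    """lambda_min of N(omega) computed from rotation-corrected plane normals."""
    rows = []
    for n, t in zip(normals, times):            # normals: list of torch 3-vectors
        W = t * skew(omega)
        R = torch.matrix_exp(W) if exact else torch.eye(3, dtype=omega.dtype) + W
        rows.append(R.T @ n)                    # rotation-corrected normal n'_j
    B = torch.stack(rows)
    return torch.linalg.eigvalsh(B.T @ B)[0]    # smallest eigenvalue (differentiable)

def refine(omega0, normals, times, exact, steps=200, lr=1e-2):
    """Run Adam on the eigenvalue objective, starting from omega0."""
    omega = omega0.clone().requires_grad_(True)
    opt = torch.optim.Adam([omega], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = objective(omega, normals, times, exact)
        loss.backward()
        opt.step()
    return omega.detach()

# Cascade: zero initialization -> fast first-order pass -> exact-rotation fine tuning.
# (normals and times would come from the line-cluster / normal-flow pipeline.)
# omega = refine(torch.zeros(3, dtype=torch.float64), normals, times, exact=False)
# omega = refine(omega, normals, times, exact=True)
```

Because \(\lambda_{\min}\) comes out of `eigvalsh`, it is differentiable with respect to \(\boldsymbol{\omega}\), which is exactly what lets a first-order optimizer like Adam drive it toward zero.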

Experiments and Results

Does it actually work? The researchers tested the solvers on both synthetic simulations and real-world datasets.

Robustness to Noise

Event cameras are noisy. Timestamps jitter, and pixels sometimes fire randomly. Figure 3 below shows how the error in angular (\(\varepsilon_{ang}\)) and linear (\(\varepsilon_{lin}\)) velocity changes as we add noise or change the amount of data.

Figure 3. The results on synthetic data demonstrate the relationship between errors and various factors, such as the number of events, number of lines, and noise levels.

  • Graphs (a) & (b): As the number of events increases (x-axis), the error (y-axis) drops rapidly.
  • Graphs (e) - (h): As noise increases, error increases, but the method remains stable.
  • Insight: The Incidence method (Blue) generally handles low-noise situations slightly better, while Coplanarity (Red) is competitive.

The Importance of Multiple Lines

You cannot solve this problem with just one line in the scene. A single line creates a “rotation-translation ambiguity”—you can’t tell if you are moving parallel to the line or rotating around it.

Figure 4 visualizes the “landscape” of the cost function (the value we are trying to minimize).

Figure 4. Landscape of the objective functions \(\lambda_{\min}\). The number of events for each line is set to \(N = 100\). For better visualization, the pseudocolor maps of the objectives use a logarithmic scale.

Look at column (a) “IncMin with 1 line.” The dark blue region (the minimum) is a long valley. There is no single distinct point; many different velocities look “correct.” Now look at columns (c) and (e), where 2 or 3 lines are used. The valley becomes a bowl, with a clear, single global minimum. This shows that observing at least 2 non-parallel lines is crucial for Full-DoF estimation.

Real-World Performance

The authors applied their method to the VECtor dataset. First, they extracted lines from the event stream.

Figure 5. Line cluster extraction from the desk-normal sequence in the VECtor dataset. (a) An event frame generated by accumulating events, where red and blue dots represent events with opposite polarities. (b) The corresponding image. (c) Results of line segment detection. (d) Line cluster extraction by associating events near the line segments.

Using these clusters, they estimated the velocity.

Table 2. Real-world experiment results. We report the median errors for \(\varepsilon_{ang}\) and \(\varepsilon_{lin}\).

The results in Table 2 show that the method works on real data, with angular velocity errors around 0.2 and linear direction errors around 20 degrees. While this might seem high compared to systems with IMUs, remember: this is doing it the hard way—using only vision. It proves the concept is viable for scenarios where IMUs might fail or aren’t available.


Conclusion

This paper marks a significant step forward for event-based vision. By mathematically modeling the eventail (the space-time surface of a moving line), the authors created the first sparse geometric solvers capable of recovering Full-DoF egomotion without inertial sensors.

Key takeaways:

  1. Geometry beats assumption: We don’t need to assume rotation is known; we can solve for it using incidence or coplanarity constraints.
  2. Lines are powerful: Simple straight lines provide enough geometric constraints to decouple rotation from translation.
  3. Optimization matters: Using a cascade of approximate-then-exact optimization makes the problem solvable in real-time.

This work lays the foundation for fully autonomous event-based navigation systems that are lighter, simpler, and closer to how biological vision actually works.