Can Robots Feel? Teaching Dexterity Through Touch with KineDex

Robotic manipulation has come a long way. We have robots that can lift heavy payloads, weld cars with sub-millimeter precision, and even dance. But when it comes to the subtle art of the human hand—buttoning a shirt, cracking an egg without crushing it, or squeezing just the right amount of toothpaste—robots often fall short.

The missing link is tactile sensing. While computer vision gives robots “sight,” it doesn’t tell them how hard they are squeezing or if an object is slipping. To bridge this gap, a team of researchers has introduced KineDex, a new framework that teaches robots not just to move, but to feel.

Figure 1: We present KineDex, a framework for collecting tactile-enriched demonstrations via kinesthetic teaching and training tactile-informed visuomotor policies for dexterous manipulation.

In this post, we will dive deep into the KineDex paper. We will explore how researchers solved the “chicken-and-egg” problem of collecting tactile data, how they used visual inpainting to “erase” human teachers from the data, and how a force-aware policy allows robots to perform delicate tasks that were previously out of reach.

The Challenge: Why is Dexterity So Hard?

To understand the contribution of KineDex, we first need to look at the current state of robot learning. The most common way to teach a robot a skill is through Imitation Learning. You show the robot how to do a task (a demonstration), and it learns to copy you.

For simple grippers (claws), this is relatively easy. You can use a joystick or a VR controller to move the robot arm. However, for a dexterous hand (a robot hand with several multi-jointed fingers), the complexity explodes.

The Limits of Teleoperation

Most researchers use teleoperation to collect data. An operator wears a data glove or uses a vision-based hand tracker (like a VR headset) to control the robot hand remotely. While this works for geometry, it fails at physics.

  1. Kinematic Mismatch: Your hand and the robot hand rarely have the exact same dimensions or joint limits. Mapping your thumb movement to a robot thumb is mathematically messy and often inaccurate.
  2. The “Numb” Operator: This is the biggest issue. When you teleoperate a robot, you cannot feel what the robot touches. You might crush a paper cup because you didn’t realize how hard you were squeezing, or drop a pen because your grip was too loose.

Without high-fidelity tactile feedback during the teaching phase, the data collected is often flawed. If the teacher is clumsy (due to lack of feedback), the student (the robot) will be clumsy too.

Enter Kinesthetic Teaching

The alternative is Kinesthetic Teaching. This is a fancy term for “grabbing the robot and moving it yourself.” Instead of using a remote controller, you physically guide the robot’s limbs through the task.

This solves the feedback loop. If you are holding the robot hand while it holds an object, you can feel the resistance. You know exactly how much force is needed. However, traditionally, this has been impossible for dexterous hands. Robot fingers are small, crowded with motors, and hard to manipulate manually without blocking the robot’s sensors or cameras.

The KineDex Framework

KineDex (Kinesthetic Dexterity) proposes a novel hardware and software pipeline to solve these issues. It enables a “hand-over-hand” teaching paradigm where the human operator’s motion is directly transferred to the robot, ensuring that the demonstrations are physically grounded and rich in tactile data.

Figure 2: Overview of the KineDex framework. KineDex collects tactile-enriched demonstrations via kinesthetic teaching, where visual occlusions from the operator’s hand are removed through inpainting before policy training. The learned policy takes visual and tactile inputs to predict joint positions and contact forces, which are executed with force control for robust manipulation.

The framework operates in three distinct stages:

  1. Data Collection: Using a novel physical interface for kinesthetic teaching.
  2. Data Preprocessing: Cleaning up the visual data using AI inpainting.
  3. Policy Learning & Deployment: Training a neural network that understands both vision and touch, and executing it with force control.

Let’s break these down.

1. The “Hand-Over-Hand” Interface

The researchers equipped a dexterous robot hand with tactile sensors on the fingertips (120 sensing points per finger). To allow a human to drive this hand, they attached ring-shaped straps to the robot’s fingers.

The operator puts their fingers through these straps, effectively “wearing” the robot hand like a puppet.

  • For the fingers: The operator’s right hand guides the robot’s four fingers.
  • For the thumb: Due to the difference in placement between human and robot thumbs, the operator uses their left hand to guide the robot’s thumb.

This might sound cumbersome, but it offers a massive advantage: Direct Force Feedback. When the robot finger presses against a table, the operator feels that pressure immediately through the strap. This allows the operator to perform tasks with subtle force requirements, like twisting a cap or inserting a peg, naturally and efficiently.

2. The Invisibility Cloak: Solving Visual Occlusion

There is a major catch to kinesthetic teaching. If your hands are all over the robot, the camera recording the demonstration sees your hands, not just the robot and the object.

If you train a robot on this data, the robot will learn to expect a giant human hand to be present whenever it performs a task. During deployment, when the human steps away, the robot sees a different scene (no human hand) and the policy fails due to this “distribution shift.”

KineDex solves this using Video Inpainting. The team treats the human hand as visual noise that needs to be erased.

Figure 7: Data preprocessing pipeline for Peg Insertion.

As shown in the preprocessing pipeline above, the process works as follows:

  1. Raw Video: Captures the operator’s hands guiding the robot.
  2. Mask Generation: They use a model called Grounded-SAM to automatically detect and create a silhouette mask of the operator’s arms and hands.
  3. Inpainting: A video inpainting model (ProPainter) takes the video and the mask, and “fills in” the background behind the human hand. It hallucinates the pixels of the table or object that the human was blocking.

The result is a clean video that looks like the robot is moving autonomously. This allows the AI policy to learn from visuals that match what it will see when it acts alone.
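To make the preprocessing loop concrete, here is a minimal sketch of the two-step erase operation. The `segment_operator` and `inpaint` functions are placeholders named for illustration; they stand in for Grounded-SAM and ProPainter rather than calling those models’ real APIs, and here they just do trivial work so the sketch runs end to end.

```python
import numpy as np

def segment_operator(frames):
    """Stand-in for Grounded-SAM: return one boolean mask per frame marking the
    operator's hands and arms. Here we return empty masks as a placeholder;
    the real step is text-prompted detection + segmentation."""
    return np.zeros(frames.shape[:3], dtype=bool)

def inpaint(frames, masks):
    """Stand-in for ProPainter: fill masked pixels with plausible background,
    borrowing content from neighboring frames for temporal consistency.
    Here we simply zero out the masked pixels as a placeholder."""
    out = frames.copy()
    out[masks] = 0
    return out

def preprocess_demo(frames):
    """Erase the operator from a raw kinesthetic-teaching video."""
    masks = segment_operator(frames)   # step 2: mask generation
    return inpaint(frames, masks)      # step 3: inpainting

# Toy usage: 10 frames of 64x64 RGB video.
demo = np.random.randint(0, 255, size=(10, 64, 64, 3), dtype=np.uint8)
clean = preprocess_demo(demo)          # visuals now match what the robot sees alone
```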

3. Learning to Feel: The Policy Architecture

With clean video and rich tactile data collected, the researchers train a Visuomotor Policy. They utilize Diffusion Policy, a state-of-the-art method in robot learning that generates robot actions by refining random noise, similar to how image generators like Midjourney work.
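As a rough illustration of “refining random noise into actions,” here is a generic DDPM-style sampling loop over an action chunk. The noise schedule, horizon, action dimension, and the stubbed noise-prediction network are all assumptions for the sketch, not the paper’s exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic DDPM noise schedule with T denoising steps (illustrative values).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(actions_t, t, obs):
    """Stand-in for the learned noise-prediction network, which would be
    conditioned on the visual, proprioceptive, and tactile observations."""
    return rng.normal(size=actions_t.shape)

def sample_action_chunk(obs, horizon=16, action_dim=22):
    """Start from pure Gaussian noise and iteratively denoise it into actions."""
    a = rng.normal(size=(horizon, action_dim))   # pure noise
    for t in reversed(range(T)):
        eps = eps_model(a, t, obs)
        # Standard DDPM ancestral sampling step.
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            a += np.sqrt(betas[t]) * rng.normal(size=a.shape)
    return a  # each row: one action step (in KineDex, positions plus force targets)

actions = sample_action_chunk(obs=None)
```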

The policy takes three inputs:

  1. Vision: The inpainted RGB images.
  2. Proprioception: The position of the robot’s joints.
  3. Tactile Sensing: A 3D force vector from the sensors on the fingertips.
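Concretely, one observation step might be packed roughly as in the sketch below. The names, shapes, and the way per-fingertip taxels are reduced to a single 3D force are illustrative assumptions, not the paper’s exact interface.

```python
import numpy as np

def build_observation(rgb_image, joint_positions, fingertip_taxels):
    """Pack one timestep of policy input.

    rgb_image:        (H, W, 3) inpainted camera frame
    joint_positions:  (num_joints,) proprioception
    fingertip_taxels: (num_fingers, num_taxels, 3) per-taxel force readings
    """
    # Illustrative reduction: sum taxel forces into one 3D force per fingertip.
    fingertip_forces = fingertip_taxels.sum(axis=1)      # (num_fingers, 3)
    return {
        "vision": rgb_image.astype(np.float32) / 255.0,  # normalized pixels
        "proprio": joint_positions.astype(np.float32),
        "tactile": fingertip_forces.astype(np.float32),
    }

# Toy usage with made-up sizes (not the real sensor layout).
obs = build_observation(
    rgb_image=np.zeros((240, 320, 3), dtype=np.uint8),
    joint_positions=np.zeros(22),
    fingertip_taxels=np.zeros((5, 120, 3)),
)
```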

However, the innovation isn’t just in the inputs, but in the outputs.

Force-Informed Actions

In standard robotics, a policy predicts Joint Positions (\(x_d\)). It tells the motors: “Move to angle 45 degrees.”

But for contact-rich tasks, position isn’t enough. If you tell a robot to move its finger to the surface of an egg, and there is a tiny calibration error, it might stop 1mm short (dropping the egg) or push 1mm too far (crushing the egg).

KineDex’s policy predicts Force Targets (\(f_d\)) alongside positions. It tells the motors: “Move to angle 45 degrees, AND exert 2 Newtons of force.”
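A minimal sketch of how such an action vector might be split into the two target types before being handed to the controller (the joint and fingertip counts are assumptions for illustration):

```python
import numpy as np

NUM_JOINTS = 22      # assumed hand + wrist joint count, illustrative only
NUM_FINGERTIPS = 5

def split_action(action):
    """Split one policy output step into position and force targets.

    action: (NUM_JOINTS + 3 * NUM_FINGERTIPS,) vector predicted by the policy.
    """
    x_d = action[:NUM_JOINTS]                             # desired joint positions
    f_d = action[NUM_JOINTS:].reshape(NUM_FINGERTIPS, 3)  # desired fingertip forces (N)
    return x_d, f_d

x_d, f_d = split_action(np.zeros(NUM_JOINTS + 3 * NUM_FINGERTIPS))
```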

4. Closing the Loop: The Force Controller

Predicting force is useless if you don’t have a control system to execute it. This is where the physics gets interesting.

Standard robots use a PD Controller (Proportional-Derivative). The control signal (\(u\)) is based on the error between where the robot is (\(x\)) and where it should be (\(x_d\)).

Equation 1: Standard PD Control Law
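For reference, a standard PD law with proportional gain \(K_p\) and derivative gain \(K_d\) has the form

\[ u = K_p\,(x_d - x) + K_d\,(\dot{x}_d - \dot{x}) \]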

If the robot reaches the target position (\(x = x_d\)), the error is zero, and the motor stops applying torque. This is bad for holding things; you want the motor to keep squeezing.

To fix this, the researchers use the predicted force (\(f_d\)) to calculate a Virtual Target Position. They trick the controller. If the policy wants to apply force to an object, it sets the target position inside the object.

The modified target positions for the fingertip (\(x_d^{tip}\)) and the base (\(x_d^{base}\)) are calculated as:

Equation 2: Force-Informed Target Position Calculation
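One plausible form, consistent with the description below but not necessarily the paper’s exact expression, offsets each commanded position target by the desired force scaled by the inverse stiffness:

\[ x_d \;\leftarrow\; x_d + K^{-1} f_d \]

with the offset applied to both the fingertip and base targets.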

Here, \(K\) represents stiffness. By adding a term proportional to the desired force (\(f_d\)), the robot aims for a point beyond the contact surface. The PD controller sees a constant “error” (because the finger can’t physically penetrate the object) and generates a continuous, stable force against the surface.
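Putting the pieces together, a simplified Cartesian fingertip version of this force-informed control might look like the sketch below. The gains, stiffness value, and shapes are assumptions for illustration; this is the virtual-target idea in code, not the paper’s actual controller.

```python
import numpy as np

K = 200.0    # assumed controller stiffness (proportional gain), illustrative
K_D = 20.0   # assumed damping gain, illustrative

def virtual_target(x_d, f_d, stiffness=K):
    """Shift the commanded fingertip target along the desired force direction.

    The surface prevents the finger from ever reaching this point, so the PD
    controller keeps seeing an error of roughly f_d / K and keeps pressing.
    """
    return x_d + f_d / stiffness

def force_informed_pd(x, x_dot, x_d, f_d):
    """PD command toward the force-informed virtual target.

    At x == x_d with zero velocity, the output is f_d instead of zero,
    which is what lets the finger hold a steady contact force.
    """
    x_virtual = virtual_target(x_d, f_d)
    return K * (x_virtual - x) - K_D * x_dot

# Toy usage: fingertip sits at its nominal target on the surface,
# and the policy asks for 2 N of pressing along -z.
x = np.array([0.0, 0.0, 0.10])      # current fingertip position (m)
x_dot = np.zeros(3)
x_d = np.array([0.0, 0.0, 0.10])    # nominal target from the policy
f_d = np.array([0.0, 0.0, -2.0])    # desired contact force (N)
u = force_informed_pd(x, x_dot, x_d, f_d)   # -> [0, 0, -2.0]
```

Because the object stops the finger short of the virtual target, the steady-state command settles near \(f_d\) rather than decaying to zero.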

Experimental Results

The researchers evaluated KineDex on a suite of 9 challenging tasks, ranging from picking up fragile eggs to plugging in chargers and squeezing toothpaste.

Figure 9: Executions of trained policies on nine contact-rich manipulation tasks.

Success Rates

The results were impressive. The system achieved an average success rate of 74.4% across all tasks. But the real insight comes from the comparisons (ablations).

The researchers tested a version of the system without Force Control (standard position control). The performance collapsed.

Table 1: Number of successful trials (out of 20) during inference for different methods.

As Table 1 shows, for tasks like “Cap Twisting,” the success rate dropped from 15/20 with KineDex to 2/20 without force control. Without the virtual displacement strategy, the fingers simply touched the cap and slipped, failing to generate enough friction to twist it.

They also tested a version without Tactile Input. For simple tasks like picking up a bottle, vision was enough. But for “Toothpaste Squeezing,” performance dropped significantly. This shows that for tasks involving occlusion (where the hand blocks the camera’s view of the object), touch is essential.

Efficiency: KineDex vs. Teleoperation

Is this actually better than just using VR controllers? The researchers set up a standard teleoperation rig to compare.

Figure 6: Overview of the teleoperation system setup.

They measured how long it took to collect successful demonstrations. The difference was stark.

Figure 4: Comparison of demonstration collection time between KineDex and teleoperation on the Bottle Picking and Syringe Pressing tasks.

As shown in Figure 4, KineDex (blue bar) was significantly faster than Teleoperation (orange bar).

  • Syringe Pressing: Teleoperation took roughly twice as long per demo.
  • Bottle Picking: Teleoperation took three times as long.

Why? Because with teleoperation, the operator is constantly adjusting, squinting at screens, and trying not to crush the object. With KineDex, the operator just grabs the object and does the task naturally.

User Study

Finally, the team asked 5 participants to try both systems. The feedback was overwhelmingly in favor of KineDex.

Figure 5: Summary of user study results. Five participants used both the teleoperation system and KineDex to collect demonstrations. Pie charts summarize their feedback on key evaluation criteria.

Participants found KineDex easier to use (Chart d) and unanimously agreed that it helped collect more accurate tactile data (Chart b).

Conclusion and Implications

The KineDex paper highlights a fundamental truth in robotics: Hardware and data collection methods are just as important as the learning algorithms.

By designing a system that allows humans to transfer their innate dexterity directly to the robot—forces and all—the researchers bypassed the limitations of traditional teleoperation. They combined this with clever computer vision techniques (inpainting) to clean the data, and a force-aware control scheme to execute it.

The implications for the future are significant:

  1. Tactile-First Robotics: This work proves that tactile sensors aren’t just a “nice to have” add-on; they are essential for contact-rich tasks.
  2. Scalable Data Collection: If teaching a robot is as fast as doing the task yourself, we can collect the massive datasets needed to train general-purpose robot helpers.
  3. Complex Manipulation: Tasks like screwing caps, using tools, and handling soft objects are finally becoming reliable.

KineDex brings us one step closer to robots that don’t just see the world, but physically understand it.