DexJoCo — Benchmark & Toolkit for Dexterous Manipulation on MuJoCo

Achieving human-level manipulation requires dexterous hands and standardized evaluation. Existing dexterous benchmarks often lack realistic manipulator-hand setups, tasks that reveal the unique capabilities of dexterous hands over parallel grippers, reliable demonstration acquisition tools, and unified pipelines for evaluating modern vision-language-action (VLA) models.

DexJoCo addresses these gaps with 11 functionally grounded tasks built around the Franka Panda and Allegro Hand. It provides a low-cost motion-capture data collection system, 1.1K human demonstration trajectories, replay-based domain randomization, and evaluation support for modern imitation learning and VLA policies.

Benchmark for task-oriented dexterous manipulation

DexJoCo tasks are designed around functional interactions rather than isolated object relocation. Each task defines interactive objects and success constraints over execution order, object poses, articulated joint states, and contact, so completion requires meaningful progress toward an everyday objective.
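The Python sketch below shows one way to implement a stage-based success check with ordered success constraints. The `Stage` type, the state keys, and the thresholds are hypothetical and are not DexJoCo's API. The stages loosely follow the Bimanual Microwave task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical flattened simulator state: named scalars such as joint angles,
# object-in-region flags, and contact flags.
State = Dict[str, float]


@dataclass
class Stage:
    """One success constraint that must become true, in order."""
    name: str
    predicate: Callable[[State], bool]


def completed_stages(trajectory: List[State], stages: List[Stage]) -> List[str]:
    """Return the names of stages satisfied in order along a trajectory.

    A stage is only checked after every earlier stage has been satisfied at
    the same or an earlier timestep, which encodes execution-order constraints.
    """
    done: List[str] = []
    idx = 0
    for state in trajectory:
        # Several consecutive stages may become true at the same timestep.
        while idx < len(stages) and stages[idx].predicate(state):
            done.append(stages[idx].name)
            idx += 1
        if idx == len(stages):
            break
    return done


def is_success(trajectory: List[State], stages: List[Stage]) -> bool:
    return len(completed_stages(trajectory, stages)) == len(stages)


# Hypothetical stages loosely modeled on the Bimanual Microwave task;
# thresholds and state keys are illustrative only.
MICROWAVE_STAGES = [
    Stage("door_open", lambda s: s["door_angle"] > 1.2),          # articulated joint state
    Stage("food_inside", lambda s: s["food_in_cavity"] > 0.5),    # object pose region
    Stage("door_closed", lambda s: s["door_angle"] < 0.05),       # articulated joint state
    Stage("start_pressed", lambda s: s["button_contact"] > 0.5),  # contact
]
```

A trajectory succeeds only if all four stages fire in order. For example, pressing the start button while the door is still open does not satisfy the task.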

The benchmark covers tool use, reasoning, bimanual coordination, and long-horizon execution. Its assets provide explicit visual interaction feedback, such as unlocking an iPad after entering a password, spraying water from a watering can, and waking a display by clicking a mouse.

Single-arm demos

Water Plant

Grasp the watering can and apply water to the plant.

Hammer Nail

Use the hammer to drive the nail into the wooden board.

Click Mouse

Move the mouse to the purple mouse pad and click the left mouse button.

Pick Bucket

Place the boxed food into the bucket and then lift the bucket.

Pinch Tongs

Grasp the tongs and perform three consecutive open-close motions.

Fold Glasses

Fold the glasses and place them into the case.

Bimanual demos

Bimanual Microwave

Open the microwave door, place the food inside, close the door, and press the start button.

Bimanual Unlock iPad

Grasp the iPad and enter the password 123 to unlock the device.

Bimanual Hanoi

Execute the final two moves of the three-level Tower of Hanoi.

Bimanual Assembly

Grasp the tray with the left hand and the peg with the right hand, then insert the peg into the hole.

Bimanual Photograph

Grasp the camera with the left hand, align it with the logo, and press the shutter button with the right hand.

Data collection system, low-cost and user-friendly

DexJoCo provides a low-cost teleoperation system for efficient human demonstration collection. Rokoko Smartgloves capture hand motion without the occlusion problems of camera-based hand tracking. HTC Vive Trackers and Base Stations track wrist motion for Franka end-effector control. The whole unified setup costs about $2,300 USD.
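The wrist-tracking half of this setup can be pictured as a relative pose mapping. The sketch below assumes a clutch-style scheme: tracker motion since a reference instant is applied to the end-effector pose recorded at that instant. It also assumes a fixed calibration rotation `R_calib` between the tracking frame and the Franka base frame. The function names, the calibration, and the optional `scale` factor are assumptions, not DexJoCo's code.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """Rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def wrist_to_ee_target(p_trk, R_trk, p_trk0, R_trk0, p_ee0, R_ee0, R_calib, scale=1.0):
    """Map the current wrist-tracker pose to a target end-effector pose.

    Relative mapping (assumed): the tracker's motion since a reference instant
    (p_trk0, R_trk0) is applied to the end-effector pose recorded at that
    same instant (p_ee0, R_ee0). R_calib rotates vectors from the tracking
    (base-station) world frame into the robot base frame.
    """
    # Translation delta, expressed in the robot base frame and optionally scaled.
    p_ee = p_ee0 + scale * (R_calib @ (p_trk - p_trk0))
    # Rotation delta in the tracking frame, conjugated into the robot base frame.
    dR = R_calib @ (R_trk @ R_trk0.T) @ R_calib.T
    R_ee = dR @ R_ee0
    return p_ee, R_ee
```

Conjugating the rotation delta by `R_calib` makes the operator's wrist rotation act about axes of the robot base frame rather than the tracking frame.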

The software combines wrist tracking with retarget MLP, a lightweight self-supervised retargeting method that maps human fingertip poses to Allegro Hand joint configurations without paired human-robot annotations.

DexJoCo is designed as a low-cost system for human demonstration data collection.

Hardware Design

3D-printed wrist mount connector

Teleoperation Algorithm

The teleoperation system combines hand motion retargeting and wrist motion tracking. Because human and robotic hands have different structures, direct linear mapping is infeasible. DexJoCo uses retarget MLP to preserve fingertip motion directions, workspace coverage, pinch behavior, and collision avoidance for real-time Allegro Hand control.
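The objectives listed above can be written as loss terms on robot fingertip positions, obtained by forward kinematics of the MLP's predicted joint angles. The NumPy sketch below shows one plausible form of the direction, workspace, pinch, and collision terms. The term definitions, weights, thresholds, and fingertip ordering are assumptions, not the published retarget MLP objective. Actual training would compute these terms in an autodiff framework through a differentiable Allegro forward-kinematics model.

```python
import numpy as np

# Assumed fingertip order shared by the human and Allegro hands:
# thumb, index, middle, ring (the human little finger is dropped).
# Each input is a (4, 3) array of positions in a wrist-aligned frame.


def direction_loss(human_tips, robot_tips, eps=1e-8):
    """Mean (1 - cosine similarity) between wrist-to-fingertip directions."""
    h = human_tips / (np.linalg.norm(human_tips, axis=1, keepdims=True) + eps)
    r = robot_tips / (np.linalg.norm(robot_tips, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(h * r, axis=1)))


def workspace_loss(human_tips, robot_tips, scale):
    """Mean squared distance after scaling human fingertips to the robot hand size."""
    return float(np.mean(np.sum((robot_tips - scale * human_tips) ** 2, axis=1)))


def pinch_loss(human_tips, robot_tips, pinch_threshold=0.02):
    """Pull robot thumb and finger together whenever the human pair is pinching."""
    loss = 0.0
    for i in range(1, len(human_tips)):
        if np.linalg.norm(human_tips[0] - human_tips[i]) < pinch_threshold:
            loss += float(np.sum((robot_tips[0] - robot_tips[i]) ** 2))
    return loss


def collision_loss(robot_tips, margin=0.015):
    """Hinge penalty on non-thumb fingertip pairs closer than `margin`.

    Thumb pairs are excluded so this term does not fight the pinch term.
    """
    loss = 0.0
    for i in range(1, len(robot_tips)):
        for j in range(i + 1, len(robot_tips)):
            d = np.linalg.norm(robot_tips[i] - robot_tips[j])
            loss += max(0.0, margin - d) ** 2
    return float(loss)


def retarget_loss(human_tips, robot_tips, scale, w=(1.0, 1.0, 10.0, 10.0)):
    """Weighted sum of the terms above; the weights are illustrative."""
    return (w[0] * direction_loss(human_tips, robot_tips)
            + w[1] * workspace_loss(human_tips, robot_tips, scale)
            + w[2] * pinch_loss(human_tips, robot_tips)
            + w[3] * collision_loss(robot_tips))
```

No human-robot pairs appear anywhere. Every term compares the robot hand either to the human input or to itself, which is what makes this style of training self-supervised.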

System summary: low-cost hardware capture + self-supervised teleoperation retargeting

Hardware Design

Rokoko Smartgloves capture hand motion without the occlusion problems of camera-based tracking.
HTC Vive Trackers and Base Stations track wrist motion and end-effector pose.
The full setup stays comfortable, unified, and low-cost at about $2,300 USD.

Teleoperation Algorithm

The system combines hand motion retargeting with wrist motion tracking.
Retarget MLP bridges structural differences between human and robot hands.
Self-supervised learning removes the need for manual human-robot pair annotation.

Datasets and policy learning

DexJoCo collects 1.1K human demonstration trajectories across the 11 benchmark tasks. Each trajectory records rich observations, including third-person and wrist-mounted visual streams, object and robot states, TCP pose, and hand joint angles, while actions are represented as target absolute end-effector poses and hand joint angles.
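The observation and action layout can be sketched as a per-step record. The field shapes, the [x, y, z, qw, qx, qy, qz] pose layout, and the left-then-right bimanual ordering below are assumptions for illustration. Only the 16 joints of the Allegro Hand are fixed by the hardware.

```python
from dataclasses import dataclass

import numpy as np

ALLEGRO_DOF = 16  # the Allegro Hand has 16 actuated joints (4 fingers x 4)


@dataclass
class Step:
    """One timestep of a demonstration (field layouts are assumptions)."""
    third_person_rgb: np.ndarray  # (H, W, 3) uint8
    wrist_rgb: np.ndarray         # (H, W, 3) uint8
    object_state: np.ndarray      # e.g. concatenated object poses / joint states
    robot_state: np.ndarray       # e.g. Franka joint positions
    tcp_pose: np.ndarray          # (7,) position + unit quaternion
    hand_qpos: np.ndarray         # (16,) Allegro joint angles
    action: np.ndarray            # see make_arm_action


def make_arm_action(target_pos, target_quat, hand_targets):
    """Pack an absolute EE target pose and hand joint targets into one vector.

    Layout (assumed): [x, y, z, qw, qx, qy, qz, 16 hand joint targets] -> (23,)
    """
    target_pos = np.asarray(target_pos, dtype=np.float64)
    target_quat = np.asarray(target_quat, dtype=np.float64)
    hand_targets = np.asarray(hand_targets, dtype=np.float64)
    if target_pos.shape != (3,) or target_quat.shape != (4,):
        raise ValueError("expected a 3D position and a 4D quaternion")
    if hand_targets.shape != (ALLEGRO_DOF,):
        raise ValueError(f"expected {ALLEGRO_DOF} hand joint targets")
    quat = target_quat / np.linalg.norm(target_quat)  # keep the quaternion unit-norm
    return np.concatenate([target_pos, quat, hand_targets])


def make_bimanual_action(left, right):
    """Concatenate per-arm actions (left first, by assumption) -> (46,)."""
    return np.concatenate([left, right])
```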

The dataset can be converted into common formats such as LeRobot Dataset v3.0 and Diffusion Policy Zarr. DexJoCo then evaluates policies through constructed task environments and an asynchronous server-client deployment pipeline.
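For the Diffusion Policy Zarr target, our understanding is that Diffusion Policy's replay buffer concatenates every per-step array along time under a `data` group. It stores cumulative episode end indices under `meta/episode_ends`. The sketch below builds that layout in memory with NumPy only. The key names are hypothetical, and writing the arrays to an actual Zarr store (or converting to LeRobot Dataset v3.0) is not shown.

```python
import numpy as np


def episodes_to_replay_buffer(episodes):
    """Flatten episodes into concatenated per-key arrays plus episode end indices.

    `episodes` is a list of dicts mapping a key to an array of shape (T_i, ...);
    all keys within an episode must share the same length T_i.
    """
    keys = list(episodes[0].keys())
    lengths = []
    for ep in episodes:
        ep_lengths = {len(ep[k]) for k in keys}
        if len(ep_lengths) != 1:
            raise ValueError("all keys in an episode must have the same length")
        lengths.append(ep_lengths.pop())
    data = {k: np.concatenate([ep[k] for ep in episodes], axis=0) for k in keys}
    episode_ends = np.cumsum(lengths, dtype=np.int64)  # exclusive end index of each episode
    return {"data": data, "meta": {"episode_ends": episode_ends}}


def get_episode(buffer, i):
    """Slice episode i back out of the flattened buffer."""
    ends = buffer["meta"]["episode_ends"]
    start = 0 if i == 0 else int(ends[i - 1])
    end = int(ends[i])
    return {k: v[start:end] for k, v in buffer["data"].items()}
```

Storing only cumulative end indices keeps each array contiguous. This lets a training loader sample fixed-length windows with simple slicing.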


Domain Randomizations

DexJoCo supports domain randomization across all task scenarios. Object placement and table height are randomized for trajectory diversity, while third-person camera poses, lighting direction and color, and tabletop textures are randomized to evaluate visual robustness.
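The split between trajectory-diversity axes and visual-robustness axes maps naturally onto a small configuration object. The hypothetical sketch below encodes the two evaluation settings used for the baselines: object-only and full visual randomization. Which axes the object-only setting enables (here, object placement and table height) is an assumption.

```python
from dataclasses import dataclass, replace

VISUAL_AXES = ("camera_pose", "lighting", "table_texture")


@dataclass(frozen=True)
class RandomizationConfig:
    # Axes that diversify replayed trajectories.
    object_placement: bool = True
    table_height: bool = True
    # Axes that probe visual robustness.
    camera_pose: bool = False
    lighting: bool = False
    table_texture: bool = False

    def enabled_visual_axes(self):
        """Names of the visual-robustness axes switched on in this config."""
        return [name for name in VISUAL_AXES if getattr(self, name)]


# Hypothetical presets for the two evaluation settings.
OBJECT_ONLY = RandomizationConfig()
FULL_VISUAL = replace(OBJECT_ONLY, camera_pose=True, lighting=True, table_texture=True)
```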

Third-person camera pose

Camera poses are sampled on a spherical surface, then filtered to select viewpoints with minimal occlusion.
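Below is a minimal sketch of this sampling step. It assumes a spherical band above the table (z up) with elevation limits. It also assumes an OpenGL-style camera frame that looks along its local -z axis, the same convention MuJoCo cameras use. A caller-supplied `occlusion_fn` scores how much of the workspace a viewpoint hides. The radius, elevation limits, threshold, and occlusion scoring are all assumptions.

```python
import numpy as np


def sample_camera_positions(center, radius, n, rng,
                            min_elev=np.deg2rad(20.0), max_elev=np.deg2rad(70.0)):
    """Sample n positions on a spherical band of `radius` around `center` (z up).

    Drawing sin(elevation) uniformly makes the samples uniform in surface area.
    """
    azimuth = rng.uniform(0.0, 2.0 * np.pi, size=n)
    sin_el = rng.uniform(np.sin(min_elev), np.sin(max_elev), size=n)
    cos_el = np.sqrt(1.0 - sin_el ** 2)
    offsets = radius * np.stack(
        [cos_el * np.cos(azimuth), cos_el * np.sin(azimuth), sin_el], axis=1)
    return np.asarray(center, dtype=float) + offsets


def look_at(cam_pos, target, world_up=(0.0, 0.0, 1.0)):
    """Camera-to-world rotation; the camera looks along its local -z axis."""
    forward = np.asarray(target, dtype=float) - np.asarray(cam_pos, dtype=float)
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(world_up, dtype=float))
    right /= np.linalg.norm(right)  # undefined if forward is parallel to world_up
    up = np.cross(right, forward)
    return np.stack([right, up, -forward], axis=1)  # columns: camera x, y, z axes


def filter_low_occlusion(positions, target, occlusion_fn, max_occlusion=0.1):
    """Keep viewpoints whose (hypothetical) occlusion score is small enough."""
    return [p for p in positions if occlusion_fn(p, target) <= max_occlusion]
```

Capping the elevation below 90 degrees keeps the viewing direction away from the world up axis, so `look_at` never hits its degenerate case.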

Lighting direction and color

For lighting randomization, the position, direction, and diffuse color of each light in the scene are randomized to introduce diverse illumination conditions.
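One way to implement this per-light randomization is sketched below. It assumes NumPy arrays of shape (nlight, 3) and uses illustrative jitter and color ranges, not DexJoCo's actual values.

```python
import numpy as np


def randomize_lights(light_pos, light_dir, light_diffuse, rng,
                     pos_jitter=0.3, dir_jitter=0.3, color_range=(0.4, 1.0)):
    """Return randomized copies of per-light arrays, each of shape (nlight, 3).

    Positions and directions are jittered around their nominal values;
    diffuse colors are resampled per RGB channel. `light_diffuse` only
    supplies the shape here, since colors are drawn fresh.
    """
    n = light_pos.shape[0]
    new_pos = light_pos + rng.uniform(-pos_jitter, pos_jitter, size=(n, 3))
    # With unit input directions and dir_jitter < 1/sqrt(3), the jittered
    # vector can never collapse to zero, so renormalizing is safe.
    new_dir = light_dir + rng.uniform(-dir_jitter, dir_jitter, size=(n, 3))
    new_dir /= np.linalg.norm(new_dir, axis=1, keepdims=True)
    new_diffuse = rng.uniform(color_range[0], color_range[1], size=light_diffuse.shape)
    return new_pos, new_dir, new_diffuse
```

With MuJoCo's Python bindings, the results could be written back before rendering via `model.light_pos[:] = new_pos`. The same applies to `model.light_dir` and `model.light_diffuse`.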

Table height

Table height is randomized together with object placement to broaden the state distribution of replayed trajectories.
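One simple way to keep a replayed trajectory valid under a table-height change is to shift three things by the same vertical offset: the table, the objects resting on it, and the recorded end-effector targets. The sketch below assumes absolute end-effector target positions in the robot base frame and an illustrative ±5 cm range. It does not check arm reachability. Randomizing horizontal object placement would also need task-specific trajectory adaptation, which is not shown.

```python
import numpy as np


def shift_for_table_height(table_top_z, object_pos, ee_target_pos, rng,
                           height_range=(-0.05, 0.05)):
    """Raise or lower the table, its objects, and the replayed EE targets together.

    object_pos: (K, 3) initial object positions resting on the table.
    ee_target_pos: (T, 3) absolute end-effector target positions of a demo.
    A common vertical offset preserves every object-relative quantity the
    original demonstration relied on.
    """
    dz = rng.uniform(*height_range)
    offset = np.array([0.0, 0.0, dz])
    return table_top_z + dz, object_pos + offset, ee_target_pos + offset, dz
```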

Table texture

For tabletop texture randomization, we sample textures from a pre-constructed texture library.

Baseline model performance

DexJoCo benchmarks ACT, Diffusion Policy, π0.5, and GR00T N1.5 under object-only and full visual randomization. The benchmark is challenging: visual randomization sharply reduces success rates, difficult bimanual tasks expose persistent failures, and precise interaction remains a key bottleneck.

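Model            Success rate
DP-Transformer   50.4%
DP-CNN           47.6%
ACT              35.5%
π0.5             52.5%
GR00T N1.5       40.2%

DP-Transformer and DP-CNN denote the transformer-based and CNN-based variants of Diffusion Policy.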

Citation

BibTeX

@misc{wang2026dexjocobenchmarktoolkittaskoriented,
      title={DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo}, 
      author={Hanwen Wang and Weizhi Zhao and Xiangyu Wang and Siyuan Huang and He Lin and Boyuan Zheng and Rongtao Xu and Gang Wang and Yao Mu and He Wang and Lue Fan and Hongsheng Li and Zhaoxiang Zhang and Tieniu Tan},
      year={2026},
      eprint={2605.16257},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.16257}, 
}