UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots

Shanghai Jiao Tong University¹, Shanghai Artificial Intelligence Laboratory², Shanghai Innovation Institute³,
Peking University⁴, Zhejiang University⁵, Fudan University⁶,
The Hong Kong University of Science and Technology (Guangzhou)⁷, ShanghaiTech University⁸

Abstract

Achieving generalizable whole-body motion control is essential for deploying humanoid robots in real-world environments. However, existing MLP-based policies trained under partial observations often suffer from limited expressiveness and struggle to maintain global consistency. These shortcomings manifest as less expressive motion, orientation drift, and poor generalization across diverse behaviors. To address these limitations, we propose UniTracker, a three-stage framework for scalable and adaptive motion tracking. The first stage learns a privileged teacher policy that produces high-fidelity reference actions. Building on this, the second stage trains a CVAE-based universal policy that captures a global latent representation of motion, enabling robust performance under partial observations. Crucially, we align the partial-observation prior with a full-observation encoder, injecting global intent into the latent space. In the final stage, a lightweight adaptation module fine-tunes the student policy on challenging sequences, supporting both per-instance and batch adaptation. We validate UniTracker in simulation and on a Unitree G1 humanoid robot, demonstrating superior tracking accuracy, motion diversity, and deployment robustness compared to current baselines.
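The second stage described above pairs a full-observation encoder with a partial-observation prior and trains the student to reproduce the teacher's actions. The sketch below shows one common way to write such an objective. It is a minimal illustration under stated assumptions, not the UniTracker implementation. It assumes diagonal-Gaussian latents given as (mean, log-variance), a KL(posterior ‖ prior) term that aligns the prior with the full-observation encoder, and a mean-squared-error imitation loss on the teacher's actions. The function names and the `kl_weight` value are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a diagonal Gaussian as (mean, log-variance) lists.
Gaussian = Tuple[List[float], List[float]]


def kl_diag_gaussians(q: Gaussian, p: Gaussian) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions."""
    mu_q, logvar_q = q
    mu_p, logvar_p = p
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        var_q, var_p = math.exp(lq), math.exp(lp)
        # Closed form per dimension: 0.5 * (log(var_p/var_q) + (var_q + (mq-mp)^2)/var_p - 1)
        kl += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return kl


def student_loss(
    student_action: List[float],
    teacher_action: List[float],
    posterior: Gaussian,  # assumed: output of the full-observation encoder
    prior: Gaussian,      # assumed: output of the partial-observation prior
    kl_weight: float = 0.01,  # hypothetical weighting, not from the paper
) -> float:
    """Imitation (MSE to teacher actions) plus latent prior-alignment (KL) loss."""
    mse = sum((a - b) ** 2 for a, b in zip(student_action, teacher_action)) / len(teacher_action)
    return mse + kl_weight * kl_diag_gaussians(posterior, prior)


if __name__ == "__main__":
    post = ([1.0], [0.0])
    prior = ([0.0], [0.0])
    print(kl_diag_gaussians(post, prior))  # → 0.5
```

Because the KL term pulls the prior toward an encoder that sees the full observation, at deployment time the student can sample its latent from the prior alone and still carry some of the global intent that the encoder captured.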


Long Sequences Driven by a Single Universal Policy

Robot Behaviors Driven by a Single Universal Policy

high kick

kick the ball

air kick

MMA side kick

punch-boxing kick

hip-hop dance

dance in the rain and kick

dance and bend down

club dance

crouch walk and kick

deep squat

lateral slide step

crouch walk and kick

golf

crab walk

circle walk

run

stretch

Challenging Motions by Fast Adaptation

martial arts

roundhouse kick

side kick

back kick

Downstream Applications


1. Text-to-Motion Generation


"A person is punching forward."

"A person is squatting down."

"A person is dancing the waltz."

2. Video-based Estimation

Approach Overview


BibTeX

@misc{yin2025unitrackerlearninguniversalwholebody,
      title={UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots}, 
      author={Kangning Yin and Weishuai Zeng and Ke Fan and Minyue Dai and Zirui Wang and Qiang Zhang and Zheng Tian and Jingbo Wang and Jiangmiao Pang and Weinan Zhang},
      year={2025},
      eprint={2507.07356},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2507.07356}, 
}