Achieving generalizable whole-body motion control is essential for deploying humanoid robots in real-world environments. However, existing MLP-based policies trained under partial observations suffer from limited expressiveness and struggle to maintain global consistency: in practice, they produce less expressive motion, drift in orientation, and generalize poorly across diverse behaviors.
To address these limitations, we propose UniTracker, a three-stage framework for scalable and adaptive motion tracking. The first stage learns a privileged teacher policy that produces high-fidelity reference actions. Building on this, the second stage trains a universal student policy based on a conditional variational autoencoder (CVAE), which captures a global latent representation of motion and enables robust performance under partial observations. Crucially, we align the partial-observation prior with a full-observation encoder, injecting global intent into the latent space. In the final stage, a lightweight adaptation module fine-tunes the student policy on challenging sequences, supporting both per-instance and batch adaptation.
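To make the prior-encoder alignment in the second stage concrete, the following PyTorch snippet is a minimal sketch, not the authors' implementation: the module names (GaussianHead, CVAEStudentPolicy), network sizes, and the specific KL direction are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Small MLP that outputs the mean and log-variance of a diagonal Gaussian."""
    def __init__(self, in_dim, latent_dim, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu(h), self.logvar(h)

class CVAEStudentPolicy(nn.Module):
    """Partial-observation prior, full-observation encoder, and a decoder that
    maps (partial observation, latent) to an action."""
    def __init__(self, partial_dim, full_dim, latent_dim, action_dim):
        super().__init__()
        self.prior = GaussianHead(partial_dim, latent_dim)   # deployable prior
        self.encoder = GaussianHead(full_dim, latent_dim)    # privileged encoder (training only)
        self.decoder = nn.Sequential(
            nn.Linear(partial_dim + latent_dim, 256), nn.ELU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, partial_obs, full_obs=None):
        mu_p, logvar_p = self.prior(partial_obs)
        if full_obs is not None:
            # Training: sample the latent from the full-observation encoder and
            # pull the partial-observation prior toward it with a KL term,
            # injecting global motion intent into the latent space.
            mu_q, logvar_q = self.encoder(full_obs)
            z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
            kl = 0.5 * (logvar_p - logvar_q
                        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                        - 1.0).sum(-1).mean()
        else:
            # Deployment: only partial observations are available.
            z = mu_p + torch.randn_like(mu_p) * (0.5 * logvar_p).exp()
            kl = torch.zeros((), device=partial_obs.device)
        action = self.decoder(torch.cat([partial_obs, z], dim=-1))
        return action, kl
```

In such a setup, the `kl` term would be added to the imitation loss against the teacher's reference actions (e.g., a weighted sum like `mse(action, teacher_action) + beta * kl`, with `beta` a hypothetical weight), so that at deployment the partial-observation prior alone can drive the decoder.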
We validate UniTracker in simulation and on a Unitree G1 humanoid robot, demonstrating superior tracking accuracy, motion diversity, and deployment robustness compared to current baselines.