Achieving human-level competitive intelligence and physical agility in
humanoid robots remains a major challenge, particularly in contact-rich and
highly dynamic tasks such as boxing. While Multi-Agent Reinforcement Learning
(MARL) offers a principled framework for strategic interaction, its direct
application to humanoid control is hindered by high-dimensional contact dynamics and the absence of strong physical motion priors. We propose RoboStriker,
a hierarchical three-stage framework that enables fully autonomous humanoid
boxing by decoupling high-level strategic reasoning from low-level physical
execution. The framework first learns a comprehensive repertoire of boxing
skills by training a single-agent motion tracker on human motion capture data.
These skills are subsequently distilled into a structured latent manifold,
regularized by projecting the Gaussian-parameterized distribution onto a unit
hypersphere. This topological constraint effectively confines exploration to
the subspace of physically plausible motions. In the final stage, we introduce
Latent-Space Neural Fictitious Self-Play (LS-NFSP), where competing agents
learn competitive tactics by interacting within the latent action space
rather than the raw motor space, significantly stabilizing multi-agent
training. Experimental results demonstrate that RoboStriker achieves superior
competitive performance in simulation and exhibits sim-to-real transfer.
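
As an illustration of the hyperspherical regularization described above, the following is a minimal sketch, assuming the latent code is sampled from a diagonal Gaussian and then normalized to unit length. The function name, the latent dimension, and the choice of plain L2 normalization are assumptions made for illustration and may differ from the actual RoboStriker implementation.

```python
import numpy as np

def sample_hypersphere_latent(mean, log_std, rng):
    """Sample from a diagonal Gaussian, then project onto the unit hypersphere.

    Hypothetical helper: `mean` and `log_std` stand in for the outputs of a
    skill encoder, each with shape (batch, latent_dim).
    """
    std = np.exp(log_std)
    # Reparameterized Gaussian sample: z = mean + std * eps
    z = mean + std * rng.standard_normal(mean.shape)
    # L2-normalize each row so every latent lies on the unit hypersphere;
    # the small floor guards against division by zero.
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    return z / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)
mean = rng.standard_normal((4, 32))   # assumed batch of 4, latent_dim of 32
log_std = np.full((4, 32), -1.0)
latents = sample_hypersphere_latent(mean, log_std, rng)
```

Because every latent has unit norm, a high-level policy acting in this space can only select a direction, which bounds the action set regardless of how large the underlying Gaussian parameters become.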