Abstract
Stabilizing unsecured payloads against the inherent oscillations of dynamic bipedal locomotion
remains a critical engineering bottleneck for humanoids in unstructured environments. To solve
this, we introduce ReST-RL, a hierarchical reinforcement learning architecture that explicitly
decouples locomotion from payload stabilization, evaluated via the SteadyTray benchmark.
Rather than relying on monolithic end-to-end learning, our framework integrates a robust
base locomotion policy with a dynamic residual module engineered to actively cancel gait-induced
perturbations at the end-effector. This architectural separation ensures steady tray transport without degrading
the underlying bipedal stability. In simulation, the residual design significantly outperforms end-to-end baselines
in gait smoothness and orientation accuracy, achieving a 96.9% success rate in variable-velocity
tracking and 74.5% robustness against external force disturbances. Successfully deployed on the
Unitree G1 humanoid, this modular approach demonstrates reliable zero-shot
sim-to-real generalization across various objects and external force disturbances.
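
To make the decoupling concrete, the following is a minimal sketch of how a residual module might be combined with a base locomotion policy. It assumes the residual is added element-wise to the base action, which the abstract does not state. The names `compose_residual_action`, `base_policy`, `residual_policy`, and `residual_scale` are hypothetical and chosen only for illustration. They are not taken from ReST-RL.

```python
from typing import Callable, List, Sequence

# A policy maps an observation vector to an action vector (hypothetical type).
Policy = Callable[[Sequence[float]], List[float]]


def compose_residual_action(
    base_policy: Policy,
    residual_policy: Policy,
    observation: Sequence[float],
    residual_scale: float = 1.0,
) -> List[float]:
    """Return the base action plus a scaled residual correction.

    Assumption: the residual is additive in action space. The base policy
    is only queried here, never modified, so its behavior is preserved
    whenever the residual output is zero.
    """
    base_action = base_policy(observation)
    residual_action = residual_policy(observation)
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    return [b + residual_scale * r for b, r in zip(base_action, residual_action)]


# Toy stand-ins for trained networks (illustrative values only).
def toy_base_policy(observation: Sequence[float]) -> List[float]:
    return [1.0, 2.0]


def toy_residual_policy(observation: Sequence[float]) -> List[float]:
    return [0.5, -0.5]


combined = compose_residual_action(toy_base_policy, toy_residual_policy, [0.0])
```

Under this assumption, setting `residual_scale` to zero recovers the base locomotion policy unchanged, which is one way a design of this kind can add stabilization without altering the underlying gait controller.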