SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning

UC San Diego ArcLab
*†Equal Contribution

ReST-RL enables a Unitree G1 humanoid to perform the SteadyTray task (transporting unsecured objects on a tray while walking bipedally) in the real world.

Abstract

Stabilizing unsecured payloads against the inherent oscillations of dynamic bipedal locomotion remains a critical engineering bottleneck for humanoids in unstructured environments. To address this, we introduce ReST-RL, a hierarchical reinforcement learning architecture that explicitly decouples locomotion from payload stabilization, and evaluate it on the SteadyTray benchmark. Rather than relying on monolithic end-to-end learning, our framework combines a robust base locomotion policy with a residual module designed to actively cancel gait-induced perturbations at the end-effector. This architectural separation ensures steady tray transport without degrading the underlying bipedal stability. In simulation, the residual design significantly outperforms end-to-end baselines in gait smoothness and orientation accuracy, achieving a 96.9% success rate in variable velocity tracking and 74.5% robustness against external force disturbances. Deployed on Unitree G1 humanoid hardware, the modular approach transfers zero-shot from simulation to the real world and generalizes reliably across various objects and external force disturbances.

Method Overview

Overview of ReST-RL, which augments a pre-trained locomotion base policy with a residual module comprising (i) an encoder over privileged robot- and payload-related observations and (ii) an adapter that outputs corrective residual actions. Two residual designs are considered: the Residual Action Adapter and the Residual FiLM Adapter.
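The page does not spell out the adapter equations, so the following is a minimal pure-Python sketch of the two residual designs under stated assumptions, not the authors' implementation. It assumes (i) the Residual Action Adapter adds a correction computed from the observation and the encoder latent z to the frozen base action, and (ii) the Residual FiLM Adapter applies feature-wise linear modulation (FiLM), i.e. a scale and shift predicted from z, to a hidden layer of the frozen base policy. Layer sizes, tanh activations, the `1 + gamma` parameterization, and the zero initialization of the adapter outputs (so training starts from the unmodified base gait) are illustrative assumptions; all class and function names are hypothetical.

```python
import math
import random

def linear(W, b, x):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def random_layer(n_out, n_in, rng, scale=0.3):
    return ([[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

def zero_layer(n_out, n_in):
    # Zero-initialised output layer: the adapter starts as a no-op (assumption).
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out

class BasePolicy:
    """Stand-in for the frozen, pre-trained locomotion policy: obs -> hidden -> action."""
    def __init__(self, obs_dim, hid_dim, act_dim, rng):
        self.l1 = random_layer(hid_dim, obs_dim, rng)
        self.l2 = random_layer(act_dim, hid_dim, rng)

    def hidden(self, obs):
        return tanh_vec(linear(*self.l1, obs))

    def head(self, h):
        return linear(*self.l2, h)

    def act(self, obs):
        return self.head(self.hidden(obs))

class Encoder:
    """Maps privileged robot/payload observations to a latent z (hypothetical)."""
    def __init__(self, priv_dim, latent_dim, rng):
        self.l = random_layer(latent_dim, priv_dim, rng)

    def __call__(self, priv):
        return tanh_vec(linear(*self.l, priv))

class ResidualActionAdapter:
    """Variant (i): a = a_base(o) + delta(o, z)."""
    def __init__(self, obs_dim, latent_dim, act_dim):
        self.out = zero_layer(act_dim, obs_dim + latent_dim)

    def act(self, base, obs, z):
        delta = linear(*self.out, obs + z)  # obs + z concatenates the two lists
        return [a + d for a, d in zip(base.act(obs), delta)]

class ResidualFiLMAdapter:
    """Variant (ii): h' = (1 + gamma(z)) * h + beta(z) on the base policy's hidden features."""
    def __init__(self, latent_dim, hid_dim):
        self.gamma = zero_layer(hid_dim, latent_dim)
        self.beta = zero_layer(hid_dim, latent_dim)

    def act(self, base, obs, z):
        h = base.hidden(obs)
        g = linear(*self.gamma, z)
        b = linear(*self.beta, z)
        return base.head([(1.0 + gi) * hi + bi for hi, gi, bi in zip(h, g, b)])

# Usage with hypothetical dimensions (6-D observation, 4-D privileged state, 3 joints).
rng = random.Random(0)
base = BasePolicy(obs_dim=6, hid_dim=8, act_dim=3, rng=rng)
enc = Encoder(priv_dim=4, latent_dim=2, rng=rng)
obs = [0.1, -0.2, 0.05, 0.0, 0.3, -0.1]
z = enc([0.5, 0.0, -0.3, 0.2])
# Zero-initialised adapters leave the base gait untouched until they are trained.
print(ResidualActionAdapter(6, 2, 3).act(base, obs, z) == base.act(obs))  # → True
print(ResidualFiLMAdapter(2, 8).act(base, obs, z) == base.act(obs))       # → True
```

The two variants differ in where the correction enters: the action adapter adds to the policy output, while the FiLM adapter rescales and shifts internal features and leaves the base policy's own output head to produce the final action.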
Distillation
Distillation pipeline in which only the encoder is distilled while the adapter remains frozen.
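The caption does not state the distillation objective. The sketch below assumes a common teacher-student setup: a student encoder that sees only deployable observations is regressed onto the latent produced by the privileged (teacher) encoder, and the adapter weights are never updated, so at deployment the frozen adapter simply receives the student's latent. The linear student, the latent-matching squared loss, the synthetic data (student input x = A p as a hypothetical stand-in for an observation history that carries information about the privileged parameters p), and all names and weights are illustrative assumptions.

```python
import random

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

# Frozen pieces from the teacher stage (hypothetical fixed weights).
TEACHER_ENC = [[0.8, -0.4, 0.2], [0.1, 0.6, -0.5]]  # privileged p -> latent z
FROZEN_ADAPTER = [[1.0, -0.5], [0.3, 0.7]]          # latent z -> residual action

def make_dataset(n, seed=0):
    """Synthetic pairs: the student sees x = A p, the teacher sees p directly."""
    rng = random.Random(seed)
    A = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]  # invertible mixing
    inputs, targets = [], []
    for _ in range(n):
        p = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        inputs.append(matvec(A, p))
        targets.append(matvec(TEACHER_ENC, p))
    return inputs, targets

def distill_encoder(inputs, targets, steps=3000, lr=0.05, seed=1):
    """SGD on 0.5 * ||S x - z_teacher||^2; only the student matrix S is updated."""
    rng = random.Random(seed)
    latent_dim, in_dim = len(targets[0]), len(inputs[0])
    S = [[0.0] * in_dim for _ in range(latent_dim)]
    for _ in range(steps):
        i = rng.randrange(len(inputs))
        x, z_t = inputs[i], targets[i]
        z_s = matvec(S, x)
        for r in range(latent_dim):
            err = z_s[r] - z_t[r]
            for c in range(in_dim):
                S[r][c] -= lr * err * x[c]  # gradient is (z_s - z_t) x^T
    return S

def latent_loss(S, inputs, targets):
    return sum(mse(matvec(S, x), z) for x, z in zip(inputs, targets)) / len(inputs)

# Usage
inputs, targets = make_dataset(200)
S = distill_encoder(inputs, targets)
zero = [[0.0] * 3 for _ in range(2)]
print("latent MSE before:", latent_loss(zero, inputs, targets))
print("latent MSE after: ", latent_loss(S, inputs, targets))
# The unchanged adapter now receives a student latent close to the teacher's.
print(matvec(FROZEN_ADAPTER, matvec(S, inputs[0])), matvec(FROZEN_ADAPTER, targets[0]))
```

Because only `S` appears in the update, the adapter's mapping is preserved exactly; its output changes only insofar as the student latent deviates from the teacher's.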

Stable Payload Transport

ReST-RL maintains a level tray during transport, preventing fluid sloshing, glass tipping, and payload drop.

Baseline

Ours

Balancing Various Objects

ReST-RL generalizes to various real-world objects with different mass distributions, geometries, and physical properties without retraining or fine-tuning.

Coffee Cup

Wine Glass

Food Container

Medical and Surgical Tools

Balancing Objects under Disturbance

ReST-RL exhibits timely whole-body recovery behaviors that re-stabilize the tray and prevent the payload from tipping under external disturbances.

Kick Robot

Push Object

Simulation (Re-stabilizing Object)

BibTeX