Table of Links
3.2 Learning Residual Policies from Online Correction
3.3 An Integrated Deployment Framework and 3.4 Implementation Details
4.2 Quantitative Comparison on Four Assembly Tasks
4.3 Effectiveness in Addressing Different Sim-to-Real Gaps (Q4)
4.4 Scalability with Human Effort (Q5) and 4.5 Intriguing Properties and Emergent Behaviors (Q6)
6 Conclusion and Limitations, Acknowledgments, and References
A. Simulation Training Details
B. Real-World Learning Details
C. Experiment Settings and Evaluation Details
D. Additional Experiment Results
4 Experiments
We seek to answer the following research questions with our experiments:
Q1: Does TRANSIC lead to better transfer performance compared to traditional sim-to-real methods (Sec. 4.2)?
Q2: Can TRANSIC better integrate human correction into the policy learned in simulation than existing interactive imitation learning (IL) approaches (Sec. 4.2)?
Q3: Does TRANSIC require less real-world data to achieve good performance compared to algorithms that only learn from real-robot trajectories (Sec. 4.2)?
Q4: How effectively can TRANSIC address different types of sim-to-real gaps (Sec. 4.3)?
Q5: How does TRANSIC scale with human effort (Sec. 4.4)?
Q6: Does TRANSIC exhibit intriguing properties, such as generalization to unseen objects, effective gating, policy robustness, consistency in learned visual features, the ability to solve long-horizon manipulation tasks, and other emergent behaviors (Sec. 4.5)?
4.1 Experiment Settings
Tasks We consider complex, contact-rich manipulation tasks from FurnitureBench [90] that require high precision. These tasks are challenging and ideal for testing sim-to-real transfer, since the perception, embodiment, controller, and dynamics gaps must all be addressed to complete them successfully. Specifically, we divide the assembly of a square table into four independent tasks (Fig. 3): Stabilize, Reach and Grasp, Insert, and Screw. For these tasks we collect 20, 100, 90, and 17 real-robot trajectories with human correction, respectively. To further test generalization to unseen objects from a new category, we also experiment with a lamp (Fig. 6b). All experiments are conducted in a tabletop setting with a mounted Franka Emika 3 robot. See Appendix Sec. B.1 for the detailed system setup.
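As a concrete summary of the task decomposition and correction-data budget above, a minimal sketch is given below; the `TaskConfig` dataclass and its field names are our own illustrative naming and are not part of any released codebase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskConfig:
    """One sub-task of the square-table assembly (Sec. 4.1)."""
    name: str                   # sub-task name, matching Fig. 3
    num_correction_trajs: int   # real-robot trajectories collected with human correction

# Four independent sub-tasks and their per-task correction-data budgets.
SQUARE_TABLE_TASKS = (
    TaskConfig("Stabilize", 20),
    TaskConfig("Reach and Grasp", 100),
    TaskConfig("Insert", 90),
    TaskConfig("Screw", 17),
)

# Total human-corrected real-robot trajectories across all four tasks: 227.
TOTAL_CORRECTION_TRAJS = sum(t.num_correction_trajs for t in SQUARE_TABLE_TASKS)
```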
Baselines and Evaluation Protocol We compare against three groups of baselines. 1) Traditional sim-to-real methods: This group includes direct deployment of a simulation policy trained with domain randomization and data augmentation [53], denoted "DR. & Data Aug.". It also covers the real-world fine-tuning paradigm, where simulation policies are further fine-tuned on real-robot data with BC (denoted "BC Fine-Tune") or with a state-of-the-art offline RL method, Implicit Q-Learning [69] (denoted "IQL Fine-Tune"). To estimate a performance lower bound, we also include a baseline without any data augmentation or real-world fine-tuning, denoted "Direct Transfer". 2) Interactive IL: This group covers state-of-the-art interactive imitation learning methods, including HG-DAgger [66] and IWR [67]. 3) Learning from real-robot data only: This group includes BC [72], BC-RNN [68], and IQL [69], all trained only on real-robot demonstrations. We follow Liu et al. [70] to label rewards for IQL. Every evaluation consists of 20 trials starting from different object and robot poses. We make our best effort to keep the initial settings identical when evaluating different methods. See Appendix Sec. C for the detailed evaluation protocol.
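To make the evaluation protocol explicit, the sketch below shows how one round of 20 trials per method could be organized so that every method starts from (approximately) the same initial object and robot poses. This is an illustrative sketch only: `rollout_fn` is a hypothetical callable we assume for the example, and the method names simply mirror the baselines listed above.

```python
import random
from typing import Callable

NUM_TRIALS = 20
METHODS = [
    "TRANSIC", "Direct Transfer", "DR. & Data Aug.", "BC Fine-Tune", "IQL Fine-Tune",
    "HG-DAgger", "IWR", "BC", "BC-RNN", "IQL",
]

def evaluate_task(task: str, rollout_fn: Callable[[str, str, int], bool]) -> dict:
    """Return per-method success rates over NUM_TRIALS matched trials.

    `rollout_fn(method, task, seed)` is assumed to reset the scene from `seed`
    (initial object and robot poses), roll out the given method's policy, and
    return True on task success. On real hardware, matching initial conditions
    across methods is only approximate (best effort), as noted above.
    """
    rng = random.Random(task)  # one per-task seed stream, shared by all methods
    trial_seeds = [rng.randrange(2**31) for _ in range(NUM_TRIALS)]
    return {
        method: sum(rollout_fn(method, task, seed) for seed in trial_seeds) / NUM_TRIALS
        for method in METHODS
    }
```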
Authors:
(1) Yunfan Jiang, Department of Computer Science;
(2) Chen Wang, Department of Computer Science;
(3) Ruohan Zhang, Department of Computer Science and Institute for Human-Centered AI (HAI);
(4) Jiajun Wu, Department of Computer Science and Institute for Human-Centered AI (HAI);
(5) Li Fei-Fei, Department of Computer Science and Institute for Human-Centered AI (HAI).