by Mohi Khansari

Mar 05, 2019

Learning Latent Plans from Play

Acquiring a diverse repertoire of skills on a single robot remains an open challenge in robotic learning. In this work, we proposed learning many skills at once from unstructured human "play" data as a way to scale up multitask learning. Here, a human teleoperates the robot and executes as many behaviors in the environment as they can think of, without any upfront task definition. Unlike conventional demonstration data, play requires no scene resets, segmentation, or task labeling, which means large quantities of it can be collected rapidly.

After collecting the data, we proposed a scalable, self-supervised, relabeled imitation learning algorithm for learning goal-directed control on top of it: given an arbitrary start image and a future "goal" image sampled from play, train a policy to output the actions that took the scene from start to goal as witnessed during play. After self-supervised training in this manner on 7 hours of play, a single robot was able to execute 18 different goal-directed manipulation behaviors in a simulated tabletop setting, like opening and closing doors and drawers, pressing various buttons, and picking and placing blocks. We additionally found that play-supervised models were more robust to perturbations at test time than their conventional demonstration-trained counterparts, and exhibited retry-until-success behaviors.
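To make the relabeling idea concrete, here is a minimal sketch (not the paper's code, and leaving out the latent plan model): sample a window from an unsegmented play log, treat its final observation as the "goal", and train a goal-conditioned policy with simple behavioral cloning. The dimensions, the MLP policy, and the random stand-in data are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, WINDOW = 32, 8, 16  # assumed sizes for illustration

# Stand-in for a long, unsegmented play stream of (observation, action) pairs.
play_obs = np.random.randn(10_000, OBS_DIM).astype(np.float32)
play_act = np.random.randn(10_000, ACT_DIM).astype(np.float32)

# pi(a | s, g): current observation and goal observation in, action out.
policy = nn.Sequential(
    nn.Linear(2 * OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def sample_relabeled_batch(batch_size=64):
    """Hindsight relabeling: a future frame of any play window is a valid goal."""
    starts = np.random.randint(0, len(play_obs) - WINDOW, size=batch_size)
    obs  = play_obs[starts]            # current observation s_t
    goal = play_obs[starts + WINDOW]   # future observation reused as the goal g
    act  = play_act[starts]            # action actually taken at s_t during play
    return map(torch.from_numpy, (obs, goal, act))

for step in range(1000):
    obs, goal, act = sample_relabeled_batch()
    pred = policy(torch.cat([obs, goal], dim=-1))
    loss = ((pred - act) ** 2).mean()  # clone the action observed during play
    opt.zero_grad(); loss.backward(); opt.step()
```

Because every window of play yields a valid (start, goal, actions) training example for free, this self-supervision scales with the amount of play collected rather than with the number of hand-defined tasks.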

Read the full paper and visit the website.