Robots can now learn household chores by watching videos

Researchers at Carnegie Mellon University have made groundbreaking progress in enabling robots to learn household chores by simply watching videos of people performing those tasks in their own homes.

This research has significant implications for enhancing the capabilities of robots in domestic settings, allowing them to effectively assist with everyday tasks such as cooking and cleaning. In the team's experiments, two robots successfully mastered 12 different tasks, including opening drawers, oven doors, and lids, as well as taking pots off the stove and picking up objects such as telephones, vegetables, and cans of soup.

Deepak Pathak, an assistant professor in the Robotics Institute at CMU’s School of Computer Science, explained the significance of the research, stating, “The robot can learn where and how humans interact with different objects through watching videos. From this knowledge, we can train a model that enables two robots to complete similar tasks in varied environments.”

Traditionally, training robots involved either humans manually demonstrating tasks or extensive training in simulated environments. Both approaches are time-consuming and prone to failure. Previously, Pathak and his students had introduced a method called WHIRL (In-the-Wild Human Imitating Robot Learning), which allowed robots to learn by observing humans perform tasks. However, WHIRL required a human to complete each task in the same environment as the robot.

With this new breakthrough, robots can learn and adapt to different environments by watching videos, providing a more efficient and flexible approach to training. This research brings us closer to a future where robots can seamlessly assist humans in their daily lives.

This video shows how VRB learns a task. Credit: Carnegie Mellon University

Carnegie Mellon University’s latest research, led by Deepak Pathak and his team, introduces an advanced model called Vision-Robotics Bridge (VRB), building upon their previous work with WHIRL. VRB represents a significant improvement as it eliminates the need for human demonstrations and removes the requirement for the robot to operate in an identical environment. However, similar to WHIRL, the robot still requires practice to master a task, with the team demonstrating that it can learn a new task in as little as 25 minutes.

Shikhar Bahl, a Ph.D. student in robotics, highlights the capabilities of VRB, stating, “We were able to take robots around campus and do all sorts of tasks. Robots can use this model to curiously explore the world around them. Instead of just flailing its arms, a robot can be more direct with how it interacts.”

To teach the robot how to interact with objects, the research team incorporates the concept of affordances. Originating from psychology, affordances refer to the opportunities an environment presents to an individual. In the context of VRB, affordances define the potential actions and interactions a robot can perceive based on human behavior. For example, when a robot observes a human opening a drawer, it identifies the contact points, such as the handle, and the direction in which the drawer moves—straight out from its starting location. By analyzing multiple videos of humans opening drawers, the robot can generalize the process and understand how to open any drawer.
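The drawer example above can be sketched in code. The snippet below is a minimal, illustrative model of an affordance as the article describes it: a contact point (the handle) plus a post-contact motion direction, aggregated across several observed drawer openings. All names and the aggregation scheme are assumptions for illustration, not the actual VRB implementation.

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical, simplified affordance: where to make contact with an
# object, and which way it moves after contact. Illustrative only.
@dataclass
class Affordance:
    contact_point: np.ndarray  # 2D image location, e.g. a drawer handle
    motion_dir: np.ndarray     # unit vector for the post-contact motion

def generalize(observations: list) -> Affordance:
    """Aggregate affordances extracted from many videos of the same
    task (e.g. people opening different drawers) into one estimate."""
    # Average the observed contact points.
    point = np.mean([o.contact_point for o in observations], axis=0)
    # Sum the observed motion directions and re-normalize to a unit vector.
    direction = np.sum([o.motion_dir for o in observations], axis=0)
    return Affordance(point, direction / np.linalg.norm(direction))

# Three observed drawer openings: grasp near the handle, pull outward.
obs = [
    Affordance(np.array([320.0, 240.0]), np.array([0.0, 1.0])),
    Affordance(np.array([310.0, 250.0]), np.array([0.1, 0.995])),
    Affordance(np.array([330.0, 235.0]), np.array([-0.1, 0.995])),
]
g = generalize(obs)
print(g.contact_point)  # averaged handle location
print(g.motion_dir)     # averaged pull direction (unit vector)
```

In this toy version, pooling observations across many drawers is what lets the estimate generalize: individual handles and pull angles vary, but the aggregate converges on "grasp near the handle, pull straight out."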

The VRB model represents a significant step forward in enabling robots to learn and adapt to their surroundings. By leveraging affordances and learning from human behavior through video observation, robots can become more autonomous and proficient in a wide range of tasks.

This video shows how VRB works. Credit: Carnegie Mellon University

The research team utilized extensive video datasets, including Ego4D and Epic Kitchens, to train their model. Ego4D consists of nearly 4,000 hours of egocentric videos showcasing daily activities from various locations worldwide. Some of these videos were collected by researchers at Carnegie Mellon University. Epic Kitchens, on the other hand, focuses on videos capturing kitchen tasks such as cooking and cleaning. These datasets were originally designed to train computer vision models.

Shikhar Bahl highlights the innovative approach taken by the team, stating, “We are using these datasets in a new and different way. This work could enable robots to learn from the vast amount of internet and YouTube videos available.”

Further information about the project can be found on its dedicated website, and a detailed paper on the research was presented in June at the Conference on Computer Vision and Pattern Recognition (CVPR).

Source: Carnegie Mellon University
