Researchers from MIT and the Technion (Israel Institute of Technology) have developed an algorithm that addresses the challenge of knowing when a machine learning model should mimic a teacher and when it should explore on its own. The algorithm combines imitation learning, where the model imitates the teacher’s actions, with reinforcement learning, where the model learns through trial and error.
Similar to a tennis student learning from a skilled teacher, the algorithm allows the machine learning model to diverge from mimicking the teacher when the teacher’s performance is either too advanced or not effective enough. The model can then return to imitating the teacher at a later stage of the training process if it is likely to yield better and faster results.
In simulations, the researchers found that their dynamic approach, combining trial-and-error learning with imitation learning, resulted in more effective learning compared to methods that used only one type of learning. This method has the potential to enhance the training process for machines deployed in uncertain real-world scenarios, such as training a robot to navigate unfamiliar environments.
Idan Shenfeld, an electrical engineering and computer science graduate student and the lead author of the paper, highlights the power of combining trial-and-error learning with following a teacher. This approach enables the algorithm to solve challenging tasks that cannot be tackled by either technique alone.
The research, conducted by Shenfeld and co-authors Zhang-Wei Hong, Aviv Tamar, and Pulkit Agrawal, will be presented at the International Conference on Machine Learning. Pulkit Agrawal is the director of Improbable AI Lab and an assistant professor in the Computer Science and Artificial Intelligence Laboratory at MIT.
Striking a balance
Existing methods for balancing imitation learning and reinforcement learning often rely on brute force trial-and-error approaches, which are computationally expensive and inefficient. In contrast, the researchers took a different approach to address this challenge.
They trained two students: one using a combination of reinforcement learning and imitation learning, and another using only reinforcement learning. The key idea was to dynamically adjust the weighting of the two learning objectives for the first student. To achieve this, the algorithm continuously compared the performance of the two students.
If the student using imitation learning was performing better, the algorithm increased the emphasis on imitation learning for training. However, if the student relying solely on trial and error began performing better, the algorithm shifted emphasis toward reinforcement learning. This dynamic adjustment allowed the algorithm to adapt and choose the most effective technique during the training process.
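The weighting scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the step size, and the way the two students' rewards are compared are all assumptions made for the example.

```python
def update_imitation_weight(alpha, reward_combined, reward_rl_only, step_size=0.05):
    """Nudge the imitation weight toward whichever student is currently ahead.

    alpha           -- current weight on the imitation objective, in [0, 1]
    reward_combined -- recent reward of the student trained with both objectives
    reward_rl_only  -- recent reward of the student trained with RL alone
    """
    if reward_combined > reward_rl_only:
        # The student following the teacher is ahead: imitate more.
        alpha = min(1.0, alpha + step_size)
    else:
        # The pure trial-and-error student is ahead: explore more.
        alpha = max(0.0, alpha - step_size)
    return alpha

# The combined student's training loss would then blend the two objectives:
#   loss = alpha * imitation_loss + (1 - alpha) * rl_loss
```

In this sketch, `alpha` drifts toward 1 when imitating the teacher pays off and toward 0 when independent exploration does, mirroring the comparison the article describes between the two students.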
Connecting the two students so they could share information was a significant challenge in developing the algorithm. The researchers found that training the students independently was ineffective; the difficulty lay in finding a technically sound way to realize the intuition of information sharing.
According to Shenfeld, this adaptive algorithm outperforms non-adaptive methods and teaches more effectively. The team's goal of building algorithms that are principled, require minimal hyperparameter tuning, and achieve high performance drove the design.
Solving tough problems
To evaluate their approach, the researchers conducted multiple teacher-student training experiments in simulated environments. One such experiment involved navigating a maze with lava obstacles, where the teacher had access to the entire map while the student could only see a limited view. The algorithm achieved an almost perfect success rate across various testing environments and demonstrated significantly faster learning compared to other methods.
To further challenge their algorithm, the researchers designed a simulation involving a robotic hand with touch sensors but no vision. The task was to reorient a pen to the correct pose, with the teacher having access to the pen’s actual orientation while the student relied solely on touch sensors. In this scenario, their method outperformed approaches using only imitation learning or reinforcement learning.
The Improbable AI Lab, where the research was conducted, envisions future home robots capable of complex object manipulation and locomotion. The teacher-student learning approach has shown promise in training robots in simulation and transferring the learned skills to the real world. However, current methods often overlook the student’s inability to precisely mimic the teacher, limiting performance. The new method developed by the researchers opens up possibilities for building more advanced robots.
In addition to robotics, the algorithm has potential applications in various domains where imitation or reinforcement learning is employed. For instance, it could be used to train smaller models to excel in specific tasks by leveraging a larger model as the teacher. Exploring the similarities and differences between machines and humans learning from their respective teachers is another interesting avenue for further research.
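Using a larger model as a teacher for a smaller one, as suggested above, is commonly done via knowledge distillation: the student is trained to match the teacher's softened output distribution. The sketch below is a generic illustration of that idea, not the paper's method; the function names and the temperature value are assumptions for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution, softened by temperature."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened outputs and the student's.

    The loss is minimized when the student's distribution matches the teacher's,
    so gradient descent on it pulls the small model toward the large one.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
```

A higher temperature spreads the teacher's probability mass across more classes, exposing the "dark knowledge" in its near-miss predictions; the student mimicking that full distribution learns more than it would from hard labels alone.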
Experts not involved in the study have praised the robustness of the method across different parameter choices and its promising results in various domains. The potential applications include memory and reasoning problems involving different sensory modalities like tactile sensing. The researchers’ work is seen as a step toward leveraging prior computational work in reinforcement learning and simplifying the process of incorporating learned policies into reinforcement learning frameworks.