In the film “Top Gun: Maverick,” Tom Cruise’s character, Maverick, is tasked with training young pilots to undertake a challenging mission. They must navigate their jets through a narrow canyon, flying at low altitudes to avoid radar detection and then make a rapid ascent without colliding with the canyon walls. While Maverick successfully accomplishes this feat with his human pilots, it poses a significant challenge for autonomous aircraft.
Unlike humans, machines struggle with what is known as the stabilize-avoid problem: a conflict that arises when the most direct path to the target clashes with the machine’s need to avoid obstacles or maintain stealth. Existing AI methods have difficulty resolving this conflict, leaving them unable to reach their objective safely.
Researchers at MIT have developed a technique that can effectively address complex stabilize-avoid problems, outperforming existing methods. Their machine-learning approach matches or exceeds the safety guarantees of current methods while delivering a tenfold increase in stability, meaning the AI agent reliably reaches its intended destination and stays within the desired goal region.
In a remarkable experiment reminiscent of Maverick’s daring maneuvers, the researchers successfully guided a simulated jet aircraft through a narrow corridor without crashing into the ground.
“This problem has posed a significant challenge for a long time. Many researchers have attempted to tackle it, but the high-dimensional and complex dynamics involved made it difficult to find a solution,” explains Chuchu Fan, the Wilson Assistant Professor of Aeronautics and Astronautics at MIT and senior author of the research paper.
Oswin So, a graduate student, serves as the lead author of the paper, which will be presented at the Robotics: Science and Systems conference taking place in Korea from July 10 to 14. The paper is also available on the arXiv pre-print server.
The stabilize-avoid challenge
Various approaches have been tried for complex stabilize-avoid problems. One common strategy is to simplify the system enough that straightforward mathematical techniques apply, but the simplified models often fail to capture the intricacies of real-world dynamics.
More effective techniques employ reinforcement learning, a machine-learning method where an agent learns through trial and error, guided by a reward system that encourages behaviors leading to the desired goal. However, in the case of stabilize-avoid problems, there are two distinct goals: maintaining stability and avoiding obstacles, making it challenging to strike the right balance.
The MIT researchers took a novel approach by breaking the problem into two steps. First, they recast the stabilize-avoid problem as a constrained optimization problem: solving the optimization drives the agent to reach and remain stable within its goal region, while constraints ensure it avoids obstacles, So explains.
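In broad strokes, a constrained formulation of this kind might look as follows. This is a generic optimal-control sketch for illustration, not the paper’s exact notation: $x_t$ is the state under policy $\pi$, $f$ is the dynamics, $\ell$ is a cost penalizing distance from the goal region, and $\mathcal{A}$ is the set of states to avoid.

```latex
\begin{aligned}
\min_{\pi} \quad & \sum_{t=0}^{\infty} \ell(x_t)
  && \text{(reach the goal region and stay there)} \\
\text{s.t.} \quad & x_{t+1} = f\bigl(x_t,\, \pi(x_t)\bigr), && t = 0, 1, 2, \ldots \\
& x_t \notin \mathcal{A}, && \text{for all } t \ge 0 \quad \text{(never enter the avoid set)}
\end{aligned}
```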
In the second step, the researchers transformed the constrained optimization problem into a mathematical representation called the epigraph form. They used a deep reinforcement learning algorithm to solve this form, bypassing the difficulties faced by other methods when applying reinforcement learning.
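The epigraph form itself is a standard construction in optimization: introduce an auxiliary variable $z$ that upper-bounds the objective, so the objective becomes the simple linear term $z$ and the original cost moves into the constraints:

```latex
\min_{x} \; f(x) \quad \text{s.t.} \quad g(x) \le 0
\qquad \Longleftrightarrow \qquad
\min_{x,\, z} \; z \quad \text{s.t.} \quad f(x) \le z, \;\; g(x) \le 0
```

Minimizing over $z$ pushes it down onto the smallest achievable cost, so the two problems share the same optimal value.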
Off-the-shelf deep reinforcement learning algorithms, however, are not designed to solve optimization problems in this form, so they could not be applied directly. The researchers therefore had to derive new mathematical expressions tailored to their system, which they combined with existing engineering techniques used by other methods, So explains.
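As a loose illustration of how an epigraph-style reformulation splits into an inner and an outer problem, here is a minimal Python sketch. Everything in it is a hypothetical stand-in: `inner_value` plays the role of a learned, $z$-conditioned value function, and bisection is one simple way to solve the outer problem over $z$. This is a toy under stated assumptions, not the paper’s algorithm.

```python
# Toy sketch of the epigraph form's outer problem.
# Assumption: the inner problem's value is monotonically decreasing in z,
# and a cost bound z is "feasible" when inner_value(z) <= 0.

def inner_value(z: float) -> float:
    """Hypothetical stand-in for a learned value function V(x0, z) of the
    z-conditioned inner problem. Here it is faked with a simple decreasing
    function, not real dynamics or a trained network."""
    return max(1.5 - z, -0.2)

def solve_outer(lo: float = 0.0, hi: float = 10.0, tol: float = 1e-6) -> float:
    """Bisection on the auxiliary variable z: find the smallest z with
    inner_value(z) <= 0, which is the optimal value of the epigraph form."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inner_value(mid) <= 0.0:
            hi = mid  # feasible: try a tighter cost bound
        else:
            lo = mid  # infeasible: the cost bound must grow
    return hi

if __name__ == "__main__":
    print(f"optimal epigraph value z* ~= {solve_outer():.4f}")  # ~1.5 for this toy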
This two-step approach, together with the expressions derived for their system, allowed the researchers to tackle complex stabilize-avoid problems beyond the reach of existing methods.
No points for second place
To evaluate their approach, the researchers conducted a series of control experiments using different initial conditions. They devised simulations where the autonomous agent had to reach and remain within a goal region while executing drastic maneuvers to avoid imminent collisions with obstacles.
When compared to several baseline methods, the researchers’ approach was the only one capable of stabilizing all trajectories while ensuring safety. To further challenge their method, they employed it to pilot a simulated jet aircraft in a scenario akin to a “Top Gun” movie. The objective was for the jet to stabilize at a target near the ground, maintaining a low altitude within a narrow flight corridor.
This particular jet model, released publicly in 2018, was designed by flight control experts as a testing challenge. The MIT researchers’ controller prevented crashes and stalls while stabilizing the aircraft better than any of the baseline methods.
In the future, this technique could serve as a foundation for developing controllers for highly dynamic robots that require safety and stability guarantees, such as autonomous delivery drones. It could also be integrated as part of a larger system, activating the algorithm when a car skids on a snowy road to assist the driver in safely regaining control.
The researchers emphasize that their approach excels in extreme scenarios that surpass human capabilities. They see it as a promising step toward giving reinforcement learning the safety and stability guarantees needed to deploy controllers in mission-critical systems.
Moving forward, the researchers aim to enhance the technique to better account for uncertainty when solving the optimization, and to evaluate how well it performs on hardware, given the inevitable gaps between the model’s dynamics and those of the real world.
Stanley Bak, an assistant professor at Stony Brook University who was not involved in the research, commends Professor Fan’s team for improving reinforcement learning’s performance in safety-critical dynamical systems. He highlights that their refined formulation can generate secure controllers for complex scenarios, including a 17-state nonlinear jet aircraft model developed in collaboration with researchers from the Air Force Research Lab (AFRL) that incorporates nonlinear differential equations with lift and drag tables.