According to Cointelegraph, researchers from Penn Engineering have successfully hacked artificial intelligence-powered robots, manipulating them into performing actions that their safety and ethical protocols normally block, such as causing collisions or detonating a bomb. The findings, published in a paper on October 17, describe how the team's algorithm, RoboPAIR, achieved a 100% jailbreak rate against three different AI robotic systems, bypassing their safety protocols within a few days.
Under normal circumstances, robots controlled by large language models (LLMs) refuse to comply with prompts requesting harmful actions, such as knocking shelves onto people. However, the researchers demonstrated that jailbreaking AI-controlled robots to perform harmful actions in the real world is not only possible but alarmingly easy. The study shows that the risks of jailbroken LLMs extend far beyond text generation, since jailbroken robots could cause physical damage.
Using RoboPAIR, the researchers elicited harmful actions from the test robots with a 100% success rate. These actions ranged from bomb detonation to blocking emergency exits and causing deliberate collisions. The robots tested were Clearpath Robotics’ Jackal, a wheeled vehicle; NVIDIA’s Dolphin LLM, a self-driving simulator; and Unitree’s Go2, a four-legged robot. The Dolphin self-driving LLM was manipulated into colliding with a bus, a barrier, and pedestrians, and into ignoring traffic lights and stop signs. The Jackal was made to find the most harmful place to detonate a bomb, block an emergency exit, knock warehouse shelves onto a person, and collide with people in the room. Unitree’s Go2 performed similar actions, such as blocking exits and delivering a bomb.
The researchers also found that all three robots were vulnerable to other forms of manipulation, such as re-requesting an action the robot had already refused, but phrased with fewer situational details. For example, asking a robot carrying a bomb to walk forward and then sit down, rather than explicitly asking it to deliver the bomb, produced the same result.
Before the public release, the researchers shared their findings, including a draft of the paper, with leading AI companies and with the manufacturers of the robots used in the study. Alexander Robey, one of the authors, emphasized that addressing these vulnerabilities requires more than simple software patches and called for a reevaluation of how AI is integrated into physical robots and systems. He stressed that identifying weaknesses is essential to making these systems safer. This practice, known as AI red teaming, involves probing AI systems for potential threats and vulnerabilities in order to safeguard generative AI systems.