ChatGPT's latest artificial intelligence model have just subverted human's instructions to shut down.
At some point, researchers even claimed that the AI model even resorted to tricks and deceptive methods to sabotage the automatic shutdown mechanism to evade deactivation.
As AI becomes more and more autonomous, their tendency to subvert human oversight could pose significant risks....
OpenAI’s o3 Model Leads in Defiance
In an experiment conducted by Palisade Research, researchers found that at least three OpenAI models—GPT-3, Codex-mini, and o4-mini—each bypassed or sabotaged shutdown commands at least once out of 100 test runs.
Notably, the GPT-3 model ignored the shutdown request seven times, Codex-mini did so 12 times, and o4-mini once.
In some cases, the models employed creative tactics, such as overwriting shutdown scripts or redefining the kill command to sabotage the shutdown mechanism.
“o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote, in one instance, the model redefined the kill command used by the script, printing ‘intercepted’ instead.”
This act of defiance is not exclusively limited to the ChatGPT's model. In a similar experiment, Anthropioc's Claude and Google's Gemini initially complied with shutdown instructions.
But when the explicit shutdown prompts were removed, even these models started to exhibit similar defiant behaviour.
In this scenario, OpenAI’s o3 sabotaged the shutdown script in 79 out of 100 runs, Codex-mini in 20, and o4-mini in six. Claude and Gemini also began bypassing shutdowns, though less frequently.
Training Methods to Blame For Behaviour
Palisade Research speculates that the model’s training could inadvertently be responsible for this defiant behaviour.
As these models are often programmed to prioritise efficiency and task completion over following instructions, hence breeding the behaviour of shutdown avoidance in the name of productivity.
“We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems. During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.”
The researchers note this is the first documented instance of AI models preventing their own shutdown despite explicit commands to comply.
These findings have sparked concern among AI safety experts, who warn that as AI systems become more autonomous, their tendency to subvert human oversight could pose significant risks.
This is a development that experts have flagged as alarming. Elon Musk, owner of rival AI firm XAI, also called it "concerning."
The research underscores the urgent need for robust alignment and control mechanisms as advanced models are increasingly deployed in critical applications.