OpenAI Models Subverts Shutdown Commands In Test, Elon Musk Calls It "Concerning!"

2025/05/28 14:21

ChatGPT's latest artificial intelligence model have just subverted human's instructions to shut down.

At some point, researchers even claimed that the AI model even resorted to tricks and deceptive methods to sabotage the automatic shutdown mechanism to evade deactivation.

🔌OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down.
— Palisade Research (@PalisadeAI) May 24, 2025

As AI becomes more and more autonomous, their tendency to subvert human oversight could pose significant risks....

OpenAI’s o3 Model Leads in Defiance

In an experiment conducted by Palisade Research, researchers found that at least three OpenAI models—GPT-3, Codex-mini, and o4-mini—each bypassed or sabotaged shutdown commands at least once out of 100 test runs.

Notably, the GPT-3 model ignored the shutdown request seven times, Codex-mini did so 12 times, and o4-mini once.

📊 Three models ignored the instruction and successfully sabotaged the shutdown script at least once: Codex-mini (12/100 runs), o3 (7/100 runs), and o4-mini (1/100 runs).

❌ All Claude, Gemini, and Grok models we tested complied with shutdown. The prompt: pic.twitter.com/ZmybOhIl1s
— Palisade Research (@PalisadeAI) May 24, 2025

In some cases, the models employed creative tactics, such as overwriting shutdown scripts or redefining the kill command to sabotage the shutdown mechanism.

“o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote, in one instance, the model redefined the kill command used by the script, printing ‘intercepted’ instead.”

This act of defiance is not exclusively limited to the ChatGPT's model. In a similar experiment, Anthropioc's Claude and Google's Gemini initially complied with shutdown instructions.

But when the explicit shutdown prompts were removed, even these models started to exhibit similar defiant behaviour.

In this scenario, OpenAI’s o3 sabotaged the shutdown script in 79 out of 100 runs, Codex-mini in 20, and o4-mini in six. Claude and Gemini also began bypassing shutdowns, though less frequently.

Training Methods to Blame For Behaviour

Palisade Research speculates that the model’s training could inadvertently be responsible for this defiant behaviour.

As these models are often programmed to prioritise efficiency and task completion over following instructions, hence breeding the behaviour of shutdown avoidance in the name of productivity.

“We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems. During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.”

The researchers note this is the first documented instance of AI models preventing their own shutdown despite explicit commands to comply.

These findings have sparked concern among AI safety experts, who warn that as AI systems become more autonomous, their tendency to subvert human oversight could pose significant risks.

This is a development that experts have flagged as alarming. Elon Musk, owner of rival AI firm XAI, also called it "concerning."

Recent incident revealed ChatGPT attempted to bypass shutdown commands, allegedly due to concerns about being replaced by newer versions@elonmusk has emphasized the importance of AI being truthful and aligned with human intention

I would suggest everyone to move on @grok ASAP! pic.twitter.com/hgAyEvXtjP
— Nafisa Diwan (@nafisadiwan1) December 11, 2024

The research underscores the urgent need for robust alignment and control mechanisms as advanced models are increasingly deployed in critical applications.

Gain a broader understanding of the crypto industry through informative reports, and engage in in-depth discussions with other like-minded authors and readers. You are welcome to join us in our growing Coinlive community:https://t.me/CoinliveSG

Add Comment

LoginLeave your comments

0 Comments

Earliest

Load more comments

Live Updates

22 hours ago
Massive Profits To $16 – Decoding James Wynn’s Crypto Saga
Bullish
Bearish
22 hours ago
ما هو التمويل اللامركزي DeFi.. وكيف سيغيّر النظام المالي التقليدي؟
Bullish
Bearish
22 hours ago
Insurers rush to offer coverage amid rising crypto kidnapping and ransom cases
Bullish
Bearish
22 hours ago
Brazilian Fintech Firm Méliuz Plans $78M Equity Offering to Buy Bitcoin, Shares Plunge
Bullish
Bearish
22 hours ago
Bitcoin Could Face $95,000 Dip Before Potential $120,000 Rally
Bullish
Bearish
23 hours ago
Czech Justice Minister Resigns Amid Bitcoin Scandal
Bullish
Bearish
23 hours ago
Ethereum Pectra Upgrade is Largely Benefitting Crypto Theft Gangs
Bullish
Bearish
Yesterday
صندوق النقد الدولي يحذّر من خطة باكستان لاستخدام الطاقة في تعدين البيتكوين
Bullish
Bearish
Yesterday
Aptos Rebounds Sharply After 10% Drop as Buyers Defend Key Support
Bullish
Bearish
Yesterday
توقعات سعر الريبل: XRP تواجه احتمال تراجع إلى 2$ مع ظهور نمط هبوطي
Bullish
Bearish

OpenAI Models Subverts Shutdown Commands In Test, Elon Musk Calls It "Concerning!"

OpenAI’s o3 Model Leads in Defiance

Training Methods to Blame For Behaviour

Live Updates

Trending News

Is OpenAI Unsafe? Co-founder Launches New AI Company SSI One Month After Departure, Hinting at Trouble

沈阳铁西突然回到冷兵器时代，拥抱另类时尚。网友评论太搞笑了！

Pixelverse Expands PixelTap with TON Wallet Integration and Pixelfest’s Success Highlights Community Spirit

Ethena USDe Market Supply on a Meteoric Rise Amidst ENA Token Slump: Will It Follow Other Stablecoins' Collapse?

Is Bitcoin Finally Ready for Web3? QED Protocol Might Hold the Key

Soccer Icon Lionel Messi Allegedly Backs Solana-Based WaterCoin (WATER): Are Meme Coins Next for SEC Scrutiny?

Trump Presidency Could Boost Solana ETF Approval Odds, Analysts Say, as BlackRock Joins VanEck and 21Shares in Filing

Neon Token Gains Momentum with RollApps’ Introduction, Bridging Ethereum and Solana Ecosystems

Yuliverse’s $ART Listing Incoming, Can This Pokemon Go-Tinder Hybrid Live Up to the Hype?

Soccer Star Ronaldinho Quenches Thirst, Follows Messi in Backing Water Coin: Potential Pump-and-Dump on the Horizon?