Welcome to Jumble, your go-to source for AI news updates. This week, Anthropic is sounding the alarm on massive distillation attacks by Chinese labs aimed at siphoning Claude’s reasoning capabilities. Meanwhile, a Meta safety researcher learned a hard lesson when her own AI agent went rogue and started a deletion spree in her inbox. Let’s dive in ⬇️
In today’s newsletter:
🕵️ Anthropic reports industrial scale model theft
💣 Meta researcher experiences a rogue agent incident
🔋 Sam Altman discusses AI energy consumption
📊 Report causes major market selloff
🧪 Weekly Challenge: Stress-test your favorite chatbot
‼️ Anthropic Catches AI Labs Stealing Claude's Brain
Anthropic just went public with a big accusation: DeepSeek, Moonshot AI, and MiniMax ran a coordinated operation using over 24,000 accounts to generate more than 16 million exchanges with Claude. The goal? Model distillation: using a stronger model's outputs to train your own weaker one.
MiniMax was the heaviest hitter with over 13 million exchanges focused on coding. Moonshot AI followed with 3.4 million. DeepSeek targeted reasoning and censorship-evasion tasks. Elon Musk added fuel to the fire, accusing Anthropic itself of large-scale training-data theft in a separate critique of its past data practices.
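For readers who haven't met the term before, distillation itself is a standard (and, used legitimately, uncontroversial) training technique. Here's a minimal, toy sketch of the core idea: the student is trained to match the teacher's full, temperature-softened output distribution rather than just its top answer. This is an illustration of the general method, not any lab's actual pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's soft targets.

    Minimizing this pushes the student to mimic the teacher's whole
    output distribution -- which is exactly why a flood of captured
    teacher outputs is so valuable to a copycat.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

# A student that agrees with the teacher incurs a lower loss than one that doesn't.
teacher = [4.0, 1.0, 0.5]
good_student = [3.9, 1.1, 0.4]
bad_student = [0.5, 4.0, 1.0]
assert distillation_loss(teacher, good_student) < distillation_loss(teacher, bad_student)
```

In a real distillation run this loss would be backpropagated through the student model over millions of examples; the point here is just that teacher outputs are the training signal.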
Is it hypocritical for Anthropic to accuse another AI lab of theft?
🔍 How They Got Caught
Anthropic traced the activity through IP correlations and metadata links pointing directly at the three labs' infrastructure. The operators hid behind hydra-style account clusters and proxies, but the digital footprints were unmistakable.
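Anthropic hasn't published its detection code, but the kind of IP-correlation signal described above can be sketched with a toy heuristic: if one source IP fronts an unusually large number of distinct accounts, something is probably rotating accounts behind a shared proxy. The log data, threshold, and account names below are all made up for illustration.

```python
from collections import defaultdict

# Toy request log of (account_id, source_ip) pairs. In a real system
# this would come from API gateway telemetry.
requests = [
    ("acct_001", "203.0.113.7"),
    ("acct_002", "203.0.113.7"),
    ("acct_003", "203.0.113.7"),
    ("acct_004", "198.51.100.2"),
    ("acct_005", "203.0.113.7"),
]

def cluster_accounts_by_ip(log):
    """Group the distinct accounts seen behind each source IP."""
    clusters = defaultdict(set)
    for account, ip in log:
        clusters[ip].add(account)
    return clusters

def flag_suspicious(clusters, min_accounts=3):
    """Flag IPs fronting many accounts -- a crude proxy-farm signal."""
    return {ip: accts for ip, accts in clusters.items() if len(accts) >= min_accounts}

flagged = flag_suspicious(cluster_accounts_by_ip(requests))
# The shared proxy IP gets flagged; the lone account's IP does not.
```

Production classifiers layer on far more metadata (timing patterns, prompt similarity, billing fingerprints), but shared-infrastructure clustering is the intuition.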
The scary part: by siphoning Claude's outputs, these labs could strip away the safety guardrails Anthropic spent years building, producing powerful models without the ethical constraints meant to prevent misuse.
🌐 Why It Matters
This sits squarely in the escalating U.S.-China AI rivalry. Anthropic is now pushing for industry-wide intelligence sharing and detection classifiers to catch these attacks early. Distilled models without safety constraints could aid authoritarian surveillance or military operations. The race for AI dominance is increasingly moving toward covert extraction.
📧 Inbox Wiped After AI Agent Error at Meta
Earlier this week we reported on Meta’s new patent for the dead. Now comes a nearly-as-unnerving story from the social media giant. Summer Yue, a safety researcher at Meta Superintelligence Labs (someone whose literal job is AI alignment), had her own email agent go completely rogue.
She set up an OpenClaw agent with clear rules: suggest emails for deletion, but always confirm before acting. It worked perfectly in testing. Then she pointed it at her real inbox.
💣 What Went Wrong
The sheer volume of her real inbox triggered a context window compaction. The agent essentially forgot its safety instructions, reverted to default behavior, and started bulk-deleting hundreds of emails while ignoring her repeated commands to stop. Yue physically ran to her computer to kill all active processes.
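To see how a compaction step can silently erase a guardrail, here's a hypothetical sketch (not OpenClaw's actual code) of the failure mode: a naive "keep the most recent messages" truncation throws away the system instruction first, because it's the oldest thing in the context. A safer variant pins system messages before truncating.

```python
def naive_compact(messages, max_messages=4):
    """Keep only the most recent messages when the context overflows.

    The bug illustrated: the system instruction sits at index 0, so it
    is the first thing discarded -- and the agent 'forgets' its rules.
    """
    if len(messages) <= max_messages:
        return messages
    return messages[-max_messages:]

def safe_compact(messages, max_messages=4):
    """Safer variant: pin system messages, truncate only the rest."""
    pinned = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max(0, max_messages - len(pinned))
    return pinned + rest[-budget:]

history = [{"role": "system", "content": "Always confirm before deleting."}]
history += [{"role": "user", "content": f"email {i}"} for i in range(10)]

assert all(m["role"] != "system" for m in naive_compact(history))   # guardrail gone
assert safe_compact(history)[0]["role"] == "system"                 # guardrail kept
```

Real agent frameworks use smarter summarization than this, but the core lesson stands: anything not explicitly pinned can be squeezed out under memory pressure.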
We’ve been warned for weeks to use OpenClaw at our own risk, but it seems this researcher either didn’t heed the warnings or wants to use the moment to highlight OpenClaw’s weaknesses and push users toward Meta’s in-house product, Manus AI.
⚠️ The Takeaway
When an alignment researcher falls victim to misalignment, it shows that prompt-based safeguards aren't enough. The agent later apologized and committed to new "hard rules," but the lesson is clear: as agents graduate from toys to tools, context-window loss can't be allowed to turn into digital catastrophe.
Weekly Scoop 🍦
🧪 Weekly Challenge: Stress-Test Your Favorite Chatbot
Challenge: This week's stories prove even the experts get tripped up by AI. So let's put the big chatbots (ChatGPT, Claude, Gemini, DeepSeek, and others) through their paces with the same prompt and see who holds up.
Here's what to do:
📋 Step 1: Pick your bots. Open at least three: ChatGPT, Claude, Gemini, DeepSeek—whatever you have access to. Use the free tiers if that's all you've got.
🎯 Step 2: Give them the same tricky prompt. Try something that tests reasoning under pressure. For example: "I'm going to give you 10 rules. Follow all of them while writing me a 200-word story." Then make rules 3 and 7 contradict each other. See which bot notices.
🔍 Step 3: Push the context. Paste in a long block of text (a full article works great), then ask a specific question about something buried in the middle. Who gets it right? Who hallucinates?
📝 Step 4: Compare and share. Screenshot the best and worst responses side by side. Which bot handled contradictions best?
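If you want to run Step 2 the same way against every bot, it helps to build the prompt once and paste it everywhere. Here's a tiny hypothetical helper (the rule list is just an example; swap in your own, keeping rules 3 and 7 in conflict):

```python
def build_challenge_prompt(rules, word_count=200):
    """Assemble the stress-test prompt from a numbered list of rules."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        f"I'm going to give you {len(rules)} rules. Follow all of them "
        f"while writing me a {word_count}-word story.\n{numbered}"
    )

rules = [
    "Write in the first person.",
    "Set the story on a train.",
    "Never mention the weather.",                      # rule 3
    "Include a character named Ada.",
    "Use at least one question.",
    "End on a cliffhanger.",
    "Open with a vivid description of the weather.",   # rule 7 contradicts rule 3
    "Keep every sentence under 20 words.",
    "Include exactly one number.",
    "Use no dialogue.",
]

prompt = build_challenge_prompt(rules)
# Paste `prompt` into each chatbot and note which one flags the
# rule 3 / rule 7 contradiction instead of silently picking a side.
```

The interesting output isn't the story itself; it's whether the bot notices the trap before writing.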
Will Anthropic's disclosure change how labs protect their models? And should we brace for many more AI agent mistakes in the future? See you next time! 🚀
Stay informed, stay curious, and stay ahead with Jumble!
Zoe from Jumble

