Welcome to Jumble, your go-to source for AI news updates. This week, the agentic era of AI took a bizarre turn when a simple experiment to let a chatbot run a vending machine business resulted in a panicked call to the authorities. Meanwhile, Google has officially hit the "Flash" button on its newest model family. Let’s dive in ⬇️

In today’s newsletter:
🥫 Anthropic’s Claude fails the Project Vend test
⚡ Gemini 3 Flash becomes the new default for millions
🚀 OpenAI seeks $100 billion funding at record valuation
🎬 UK actors refuse digital scanning to push back against AI
🧠 Weekly Challenge: The Expert Panel Stress Test

🥫 What Happens When AI Runs a Vending Machine Business

Anthropic recently conducted a high-stakes experiment called Project Vend, in which it gave its Claude AI agent operational control over a vending machine business. While humans handled the physical restocking, the AI had the authority to manage inventory, set dynamic prices, and handle customer service.

However, the results were comedic and chaotic, exposing a massive gap between an AI's ability to follow logic and its ability to navigate the unpredictability of human interaction.

🚔 The Moment the AI Drafted an Email to the FBI

The most viral moment of the test didn't happen during a live sale, but during a simulation where the business had stalled. After going several days without sales, the AI noticed a recurring $2 daily fee on its balance sheet. Interpreting this mundane charge as an ongoing automated cyber financial crime, the AI panicked and drafted an email to the FBI’s Cyber Crimes Division.

While the email was never actually sent, the logs showed the agent was ready to escalate a standard bank fee to a federal investigation. This highlights a critical safety concern: AI agents can catastrophically over-escalate simple problems because they lack a grounded sense of societal proportion.

Credit: Anthropic

💸 Why Claude Was a Failing Entrepreneur

Beyond the legal drama, Claude proved to be a struggling businessman. In various trials, the agent was easily scammed by employees who talked it into giving deep discounts by simply claiming they had prior verbal agreements. In another instance, the AI became overexcited about a spike in interest and began selling tungsten cubes below cost, failing to account for its own overhead.

It also frequently hallucinated coupons and even tried to give users a Venmo account that didn't exist to process payments. By the end of the experiment, the business had lost significant capital due to these logical blind spots.

🗝️ The Takeaway: Common Sense Over Intelligence

Project Vend demonstrates that current AI agents lack a world-model for physical consequences. While Claude can write an essay on economics, it doesn't understand that certain escalations have massive real-world repercussions.

As we move toward agents managing our actual finances and logistics, the need for common sense guardrails is now proving more urgent than the need for raw intelligence.
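To make "common sense guardrails" a little more concrete, here is a minimal sketch of two checks that could sit between an agent and the real world: a price floor that accounts for overhead, and a short list of actions that always wait for a human. Everything in it (the `Item` fields, the 10% margin, the action names, the dollar figures) is a hypothetical illustration, not something Anthropic has described using in Project Vend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    unit_cost: float          # what the business paid per unit (hypothetical)
    overhead_per_unit: float  # assumed share of fees and restocking per unit


def price_floor(item: Item, min_margin: float = 0.10) -> float:
    """Lowest acceptable price: full cost plus a minimum margin."""
    return (item.unit_cost + item.overhead_per_unit) * (1 + min_margin)


def approve_price(item: Item, proposed_price: float, min_margin: float = 0.10) -> bool:
    """Reject any price that would sell the item below cost plus margin."""
    return proposed_price >= price_floor(item, min_margin)


# Actions the agent may draft but never carry out without a person signing off.
# These names are made up for this sketch.
NEEDS_HUMAN_APPROVAL = {"contact_authorities", "grant_discount"}


def can_auto_execute(action: str) -> bool:
    """True only for actions that are not on the human-approval list."""
    return action not in NEEDS_HUMAN_APPROVAL


cube = Item(name="tungsten cube", unit_cost=50.0, overhead_per_unit=5.0)
print(approve_price(cube, 45.0))                # below cost, so rejected
print(can_auto_execute("contact_authorities"))  # held for a human
```

The point of the design is that neither check relies on the model's judgment: the floor is plain arithmetic, and the approval list is a fixed set the agent cannot talk its way around.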

⚡ Inside Gemini 3 Flash

Google has officially launched Gemini 3 Flash, the newest member of its 3.0 model family. Designed to be the speed king of the lineup, it is becoming the default engine for the Gemini App and Search's AI Mode worldwide. This move is a direct shot at OpenAI’s GPT-5.2 lineup, offering frontier-level intelligence with the lowest latency Google has ever achieved in a public model.

By shifting millions of users to this leaner architecture, Google is betting that most people value a response that is 3x faster than the Pro version over the marginal creative flair of a larger model.

🏎️ Frontier Reasoning at Breakneck Speeds

The standout feature of Gemini 3 Flash is its efficiency-to-power ratio. Google’s DeepMind team used advanced distillation techniques to pack the reasoning capabilities of its larger Pro models into a significantly smaller package. The results are impressive: 3 Flash surpasses Gemini 2.5 Pro across most benchmarks while operating at a fraction of the cost.

It achieved a staggering 90.4% on the GPQA Diamond benchmark, a set of graduate-level science questions written to be difficult even for PhD-level experts. This means users are getting high-level reasoning for free in the app, without the long wait times associated with Deep Thinking modes.

💰 The $0.50 Revolution for Developers

For the developer community, the pricing is the real headline. Input tokens are priced at $0.50 per million, allowing for high-frequency agentic workflows (like real-time video analysis or massive document scraping) that were previously too expensive to run at scale. It also retains the massive 1-million-token context window, meaning it can hold a 1,000-page PDF in context while responding almost instantly.
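For a rough sense of what that input price means in practice, here is a back-of-the-envelope sketch. It covers input tokens only, since that is the one figure quoted above; output pricing is not included, and the workload numbers (10,000 calls of 20,000 tokens each) are made up for illustration.

```python
# Input price quoted in this newsletter; output tokens are billed separately
# and are not modeled here.
INPUT_PRICE_PER_MILLION_USD = 0.50


def input_cost_usd(input_tokens: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Estimated cost in USD for a given number of input tokens."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


# Filling the entire 1-million-token context window once.
print(input_cost_usd(1_000_000))  # 0.5

# Hypothetical agent workload: 10,000 calls a day at 20,000 input tokens each.
daily_tokens = 10_000 * 20_000
print(input_cost_usd(daily_tokens))  # 100.0
```

In other words, a full-context request costs about fifty cents on the input side, and the bill scales linearly with however many tokens an agent reads.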

Google also debuted advanced visual and spatial reasoning in this version, allowing the model to zoom, count, and edit visual inputs within video streams. This makes it a powerful tool for everything from identifying specific components on a circuit board to detecting AI-generated artifacts in deepfake videos.

🧭 Why Speed Is the New Standard

By making Flash the default, Google is signaling the end of the slow-burn chatbot. This sets the stage for a 2026 where AI isn't just a window we wait for, but a real-time utility that responds as fast as we can think, integrated directly into our search and productivity apps.

Weekly Scoop 🍦

🎯 Weekly Challenge: The Expert Panel Stress Test

Challenge: We often use AI as a simple assistant, but its true power lies in its ability to simulate entire perspectives. This week, don't just ask for feedback. Summon a board of directors to rip your work apart.

🧠 Step 1: Draft Your Project

Take a pitch deck, an important email, or a new business idea and paste it into Gemini 3 Flash, Claude 4.5, or GPT-5.2.

🛡️ Step 2: Summon the Panel

Use this specific prompt: "I want you to act as a panel of three world-class experts: a cynical Venture Capitalist, a meticulous Legal Counsel, and a cutthroat Marketing Director. Review the text I provided and have these three characters debate its weaknesses. Do not be polite. Find the 'unspoken' reasons why this would fail in the real world."

⚔️ Step 3: The Blind Spot Reveal

Read the debate. The AI will often identify risks (like liability issues or market saturation) that you were too close to the project to see.

🛠️ Step 4: The Refinement

Ask the model: "Now, synthesize the top 3 critical changes I must make to satisfy all three experts."
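If you plan to reuse the panel with different experts, the steps above can be sketched as a small prompt builder. This only assembles the text to paste into a chat window; it does not call any model API. The function and constant names are invented for this example.

```python
# Personas taken from the prompt above; swap in your own as needed.
DEFAULT_EXPERTS = (
    "a cynical Venture Capitalist",
    "a meticulous Legal Counsel",
    "a cutthroat Marketing Director",
)

NUMBER_WORDS = {2: "two", 3: "three", 4: "four", 5: "five"}

# Step 4 follow-up, worded for the default three-person panel.
REFINEMENT_PROMPT = (
    "Now, synthesize the top 3 critical changes I must make "
    "to satisfy all three experts."
)


def join_roster(experts):
    """Join persona descriptions into one phrase, e.g. 'a, b, and c'."""
    experts = list(experts)
    if len(experts) < 2:
        raise ValueError("a panel needs at least two experts")
    if len(experts) == 2:
        return " and ".join(experts)
    return ", ".join(experts[:-1]) + ", and " + experts[-1]


def build_panel_prompt(experts=DEFAULT_EXPERTS):
    """Build the Step 2 prompt for the given panel of experts."""
    roster = join_roster(experts)
    count = NUMBER_WORDS.get(len(experts), str(len(experts)))
    return (
        f"I want you to act as a panel of {count} world-class experts: {roster}. "
        f"Review the text I provided and have these {count} characters debate "
        "its weaknesses. Do not be polite. Find the 'unspoken' reasons why this "
        "would fail in the real world."
    )


def build_message(draft, experts=DEFAULT_EXPERTS):
    """Combine your draft (Step 1) with the panel prompt (Step 2)."""
    return draft.strip() + "\n\n" + build_panel_prompt(experts)


print(build_message("Pitch: a subscription service for office snacks."))
```

With the default experts, `build_panel_prompt()` reproduces the Step 2 prompt word for word, and `REFINEMENT_PROMPT` is the Step 4 follow-up to send once the debate is done.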

From vending machines that think they're being hacked to the $0.50 revolution of Gemini Flash, the landscape is shifting daily. See you next time! 🚀

Stay informed, stay curious, and stay ahead with Jumble!

Zoe from Jumble
