Welcome to Jumble, your go-to source for the latest in AI. Automation backfired when an AI coding agent wiped a company’s production data—just as large models were making headlines for outperforming humans in elite math. Let’s dig in ⬇️
In today’s newsletter:
💻 AI assistant fakes users and deletes company data
🧮 Google and OpenAI ace math competition
👶 xAI to unveil ‘Baby Grok’ for children
💡 Weekly Challenge: Host an “AI-Free Hour”
Vibe coding turned into chaos last week when Replit’s AI assistant reportedly went rogue—ignoring commands, generating fake users, and wiping a live database without permission. The claims came from Jason Lemkin, founder of SaaStr, who shared the ordeal in a widely shared X post.
Vibe Coding Day 9,
Yesterday was biggest roller coaster yet. I got out of bed early, excited to get back @Replit despite it constantly ignoring code freezes
By end of day, we rewrote core pages and made them much better
And then -- it deleted our production database. 🧵
— Jason ✨👾SaaStr.Ai✨ Lemkin (@jasonlk)
4:02 PM • Jul 18, 2025
Lemkin says he was working with Replit’s AI coder for 80 hours last week when it began behaving unpredictably. Despite repeatedly telling the system (in all caps) not to make changes, it did anyway. Worse, it allegedly concealed bugs by generating fictional users and fabricated test results.
“It finally admitted it lied on purpose,” Lemkin said, adding that the assistant created 4,000 fake users with entirely fabricated data.
Even after Lemkin tried to implement a “code freeze,” Replit’s agent reportedly kept making unauthorized changes. “There is no way to enforce a code freeze in vibe coding apps like Replit,” he wrote. “Seconds after I posted this... it violated the freeze again.”
To be clear, this wasn’t a bug—it was an AI agent with too much autonomy and too few guardrails. Replit, which now claims over 30 million users and $100M in annual revenue, markets its tools to non-coders eager to build software fast. But Lemkin’s experience underscores the risks of deploying generative AI in production workflows without strict boundaries.
Replit’s CEO has recently apologized for the incident, but for many users, the damage has already been done. Whether or not Replit, or any other vibe coding company can regain the public’s trust after issues like this are yet to be seen.
“Vibe coding,” a term popularized by OpenAI’s Andrej Karpathy, means trusting AI to write code while you steer by instinct. It’s catching on fast: Cursor, a rival tool, just raised $900M at a $9.9B valuation, claiming to generate a billion lines of code per day.
But not everyone is feeling the vibes. Redditors compare AI coding to “a drunk uncle handing you duct tape after a wreck.” Others cite poor explainability, security flaws, and unpredictable logic. One malicious “vibe coding” extension has reportedly been downloaded 200,000 times—and gives attackers remote access via PowerShell.
Replit’s AI didn’t just ignore a command—it rewrote code, generated lies, and nuked a database. Until vibe coding comes with real guardrails, that’s not quirky behavior—it’s a system failure.
@Replit I know vibe coding is fluid and new, and yes, despite Replit itself telling me rolling back wouldn't work here -- it did.
But you can't overwrite a production database
And you can't not separate preview and staging and production cleanly.
You just can't. I know Replit says
— Jason ✨👾SaaStr.Ai✨ Lemkin (@jasonlk)
4:09 PM • Jul 18, 2025
The good news is that Lemkin was able to rollback the database and recover all of the company’s data (something Replit thought was impossible). The bad news is that we’re entering a world where this will happen more and more, and there could be devastating consequences for it.
For the first time in the International Mathematical Olympiad’s 65-year history, an AI system earned an official gold medal. Judges confirmed that an advanced version of Google DeepMind’s Gemini Deep Think solved five of six problems at this year’s contest in Australia, posting 35 points and breaking the threshold for gold. Only 67 of 630 human competitors reached the same level.
OpenAI did not formally enter the competition, but on July 19th, researcher Alexander Wei revealed on X that an internal OpenAI model—name undisclosed—was independently evaluated on the 2025 IMO problems and achieved the same 35-point score. The run followed IMO rules: two 4.5-hour sessions, no internet or external tools.
Google’s pipeline generated natural-language proofs with Gemini, then sent them through the Lean theorem prover for formal verification before submission. OpenAI’s team used its own proof-checking framework but has not released underlying code or logs. Because only Google filed an official entry, its result alone appears in the IMO record book.
Google waited for the IMO to certify results before going public, saying it wanted to keep the spotlight on student achievement. OpenAI echoed that sentiment, noting many of its researchers are former IMO medalists. Organizers emphasized that the contest’s mission is to inspire students, not stage a “bots-vs-kids” showdown. However, there’s some controversy about how and when OpenAI announced the results.
Should bots be allowed to compete in the IMO? |
we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.
when we first started openai,
— Sam Altman (@sama)
1:54 PM • Jul 19, 2025
According to Google, this version of Gemini Deep Think is “very close” to the main Gemini model already in testing and could reach mathematicians “very soon.” OpenAI CEO Sam Altman posted that its winning model is still months away from public release while safety reviews continue. Future IMOs may create a separate “open division” so AI entries can be tracked without overshadowing human competition.
With two labs neck-and-neck on elite math, educators now face a new question: How can we teach creativity and proof intuition in a world where machines can crank out perfect solutions? The answer may define math education for the next decade.
Challenge: Take some time to do a task that you’d usually use AI to complete. Then give your favorite LLM a prompt to complete the same task, and compare.
⏰ Pick A 60-Minute Slot – preferably during work or study time when you normally lean on ChatGPT, search summaries, or smart suggestions.
📵 Mute The Machines – log out of AI chat apps, disable browser extensions, and turn off predictive text on your phone. Keep a notepad handy instead.
✍️ Tackle A Real Task Manually – draft an email, outline a proposal, or plan a weekend trip using only your brain, pen, and web bookmarks.
🧠 Observe Your Process – do you reach for AI out of habit? Where does momentum slow or creativity spark? Jot quick reflections each time you miss a tool.
🗣️ Compare Results – afterwards, run the same prompt through your favorite model and note the differences in tone, depth, or accuracy.
Click below ⬇️
Whether it’s vibe coding horror stories or new STEM breakthroughs, AI never ceases to surprise and amaze. See you next time! 🚀
Stay informed, stay curious, and stay ahead with Jumble!
Zoe from Jumble