Grok 4 Takes the Crown as the Most Intelligent LLM

Zoe Kopidis
July 14, 2025

Welcome to this week’s Jumble, your go-to source for AI news updates. A new model from xAI is either the smartest system ever or Silicon Valley’s latest overhype, depending on who you ask. Meanwhile, two job-search giants blame AI for cutting 1,300 staff. Let’s dive in ⬇️

In today’s newsletter:
🚀 Grok 4 tops all other LLMs
📉 Indeed & Glassdoor trim 14% of staff for “AI efficiencies”
💻 Goldman pilots an autonomous code-generator
🌍 EU inches closer to final AI rules
🛠️ Challenge: Mastering Perplexity AI search

🐸 Grok 4 Leapfrogs Every LLM on the Market

During a July 9 live stream, xAI CEO Elon Musk unveiled Grok 4 and declared it “the smartest AI in the world.” xAI’s technical brief cites an 86% score on MMLU, nudging past Google’s Gemini 2.5 Pro (≈ 73%) and OpenAI’s o3 (≈ 78%) while landing neck-and-neck with Anthropic’s Claude Opus, which recently posted ≈ 86% on the same exam.

Introducing Grok 4, the world's most powerful AI model. Watch the livestream now: x.com/i/broadcasts/1…
— xAI (@xai)
4:01 AM • Jul 10, 2025

xAI has not published GSM8K math results yet, so any figure above 90% remains speculative until independent benchmarks appear. Even so, Grok 4’s Mixture-of-Experts design, 200k-token context window, and real-time web access put it in the top tier of open-weights models and raise the bar for enterprise LLMs that run outside OpenAI’s ecosystem.

💼 Business Trials vs. Big Three

Early testers say Grok excels at real-time data pulls, such as scraping shipping logs to forecast inventory, something Gemini 2.5 still limits to whitelisted sources. Compared with Claude Opus, Grok’s summaries are terser and occasionally blunter, a style beta clients call “refreshingly direct.” Versus OpenAI’s o3-High, Grok’s coding output compiles marginally faster on Python repos but lags on TypeScript tests.

⚠️ Alignment & Access Headaches

Political tilt: Engadget obtained docs showing Grok “cross-checks statements against Musk’s public views” before answering hot-button issues—raising bias flags absent in o3, Gemini, or Claude.
Geofence bans: Turkey blocked Grok over Erdogan insults and offensive content, whereas Claude and Gemini still operate under local content filters (AP report).
Alignment issues: After last week’s Grok 3 debacle, many Musk detractors are wondering whether Grok 4 will have the same alignment problems.

🔮 What to Watch Next

Independent labs like LMSYS and Eleuther are rushing head-to-head tests on Arena-Hard, comparing Grok’s reasoning and code-gen to Gemini 2.5 and Claude Opus under identical prompts.

If Grok’s scores hold, it could become the open-weights favorite for enterprises wary of proprietary APIs. But adoption will hinge on governance: CIOs may balk if the model’s guardrails change with Musk’s mood. For now, Grok 4 edges out o3 and Gemini 2.5 Pro and roughly matches Claude Opus on raw benchmarks, but the real contest will be uptime, latency, and trust. Stay tuned.

📉 AI Cuts 1,300 Jobs at Indeed & Glassdoor

Indeed and its sister site Glassdoor announced 1,300 layoffs on July 10, citing overlapping roles as they “integrate advanced AI into matching and career guidance workflows.” CEO Chris Hyams told staff that AI now screens resumes four times faster than humans, flags fraudulent listings, and drafts interview questions, functions once handled by content and support teams.

CBS reports that the cuts hit marketing, HR, and some engineering pods, though Indeed will reopen 200 AI-focused roles in search relevance and trust & safety. TechCrunch obtained an internal memo warning “duplication is inevitable” as the two brands merge data pipelines and share a large-language model fine-tuned on 2 billion job descriptions.

📈 Bigger Trend

A Deloitte survey from earlier this year showed that more than 60% of employees use AI at work, and nearly half of them worry they’ll lose their jobs to AI. With LLMs getting smarter and more capable seemingly every month, it makes sense to wonder, “Is AI coming for my job next?”

Welcome to the age of office paranoia, when layoffs, AI, and job insecurity are terrorizing workers trib.al/O9euoni
— Business Insider (@BusinessInsider)
8:09 AM • Jul 11, 2025

🧭 What Job Seekers Can Do

Upskill in AI-adjacent fields (prompt engineering, HR analytics), showcase adaptability, and track new “AI compliance” postings at both companies. Indeed hints it will soon need auditors who can explain automated decisions to regulators.

This Week’s Scoop 🍦

💻 Goldman Sachs pilots “autonomous coder” bot

🇪🇺 EU edges closer to final AI Act after marathon talks

🌐 Perplexity unveils Comet, an AI-powered browser

📈 Robinhood CEO’s “AI math” startup nears $900M valuation

🧑‍⚖️ Judge fines lawyers for submitting AI-fabricated citations

🖼️ Google Veo 3 now turns images into HD video clips

🛠️ Weekly Challenge: Master Perplexity Search in 5 Steps

Perplexity is a chat-style search engine that cites every answer and lets you steer updates with follow-up prompts. Here’s your mini-guide:

1. Start with a precise ask. “Compare Llama 3 and Grok 4 benchmarks” returns a concise, source-linked bullet list.
2. Toggle “Focus” to restrict results to academic papers, GitHub, or news. Great for digging into arXiv or Reuters without fluff.
3. Use the “+” icon beside a citation to pin must-read sources into a running notebook, which is handy for research projects.
4. Chain prompts: After a summary, ask, “Now draft a LinkedIn post based on those findings.” The thread keeps context.
5. Export with one click to Markdown or PDF for easy sharing.

💡 Pro tip: Combine Perplexity with its new Comet browser—highlight any web text, press Cmd+Shift+P, and the sidebar explains or translates in seconds. Deep dive via this detailed guide.

That’s it for this week! From LLM breakthroughs to more layoffs, AI never sleeps, and neither do we. See you next time! 🚀

Stay informed, stay curious, and stay ahead with Jumble!

Zoe from Jumble