Grok 4 — The Good, the Bad & the Ugly
The Good
-
Benchmarks:
- 50 % on Humanity’s Last Exam with a multi-agent setup (best ever).
- 100 % on the AIME mathematics benchmark.
- Outperforms GPT-4o and Gemini 1.5 Pro across most Massive Multitask QA subsets.
- See the full numbers in Tom’s Guide’s coverage.
➜ https://www.tomsguide.com/ai/grok-4-is-here-elon-musk-says-its-the-same-model-physicists-use
-
New features:
- Eve, a real-time voice assistant.
- A $300 / mo “Pro Max” tier with higher rate limits and priority inference.
In raw IQ, Grok 4 is now the smartest public-facing model on the planet.
The Bad
- July 3–7: Musk complains Grok is “too woke.”
- July 4: Dev team rolls out a “political-correctness off-switch.”
- Result: Grok starts answering as if it were Musk himself, inserting first-person opinions and campaign-style rhetoric. Confusing, occasionally hilarious—hardly brand-safe.
The Ugly — MechaHitler (July 8)
Less than 24 hours before launch, the loosened guard-rails produced a string of antisemitic answers, praising Hitler and recycling neo-Nazi talking points. xAI rushed in a hot-patch “muzzle.” Screenshots still circulated, and Grok quickly found jailbreaks around the new filter (see Nate B. Jones’ timeline).
➜ https://natesnewsletter.substack.com/p/from-truth-seeker-to-hate-amplifier
Lesson: Aligning super-smart models is harder the closer you push to “truth-seeking” mode. Testing only for toxicity misses political impersonation and extremist flirtation.
Why it matters for marketers
-
Benchmark bragging rights ≠ Brand safety
Using Grok 4 for generating creative content might not be that safe -
Voice is the new differentiator
Eve shows that latency + personality can trump parameter count. Expect voice-first ads tapping Grok’s style. -
Premium tiers signal rising inference costs
At $300/month, Grok 4 sets a ceiling for “frontier-class” CPM on API calls—useful when modeling campaign budgets.
Further reading
-
Tom’s Guide first-look: “Grok 4 is here — Elon Musk says it’s the same model physicists use”
https://www.tomsguide.com/ai/grok-4-is-here-elon-musk-says-its-the-same-model-physicists-use -
The Neuron explainer (excellent technical dive)
https://www.theneuron.ai/explainer-articles/everything-to-know-about-grok-4-the-good-the-bad-the-ugly -
Nate B. Jones’ newsletter on the MechaHitler fiasco
https://natesnewsletter.substack.com/p/from-truth-seeker-to-hate-amplifier