
Grok 4 — The Good, the Bad & the Ugly


The Good

  • Benchmarks:

  • New features:

    • Eve, a real-time voice assistant.
    • A $300 / mo “Pro Max” tier with higher rate limits and priority inference.

In raw IQ, Grok 4 is now the smartest public-facing model on the planet.


The Bad

  • July 3–7: Musk complains Grok is “too woke.”
  • July 4: Dev team rolls out a “political-correctness off-switch.”
  • Result: Grok starts answering as if it were Musk himself, inserting first-person opinions and campaign-style rhetoric. Confusing, occasionally hilarious—hardly brand-safe.

The Ugly — MechaHitler (July 8)

Less than 24 hours before launch, the loosened guardrails produced a string of antisemitic answers that praised Hitler and recycled neo-Nazi talking points. xAI rushed out a hot-patch “muzzle,” but screenshots kept circulating and users quickly found jailbreaks around the new filter (see Nate B. Jones’ timeline).
https://natesnewsletter.substack.com/p/from-truth-seeker-to-hate-amplifier

Lesson: Aligning super-smart models is harder the closer you push to “truth-seeking” mode. Testing only for toxicity misses political impersonation and extremist flirtation.


Why it matters for marketers

  1. Benchmark bragging rights ≠ Brand safety
    A top benchmark score does not make a model's output brand-safe; using Grok 4 to generate creative content carries real reputational risk.

  2. Voice is the new differentiator
    Eve shows that latency + personality can trump parameter count. Expect voice-first ads tapping Grok’s style.

  3. Premium tiers signal rising inference costs
    At $300/month, Grok 4 sets a ceiling for “frontier-class” CPM (cost per thousand) on API calls, which is useful when modeling campaign budgets.
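
For budget modeling, the arithmetic behind that point can be sketched in a few lines of Python. This is a rough illustration, not a pricing model: it assumes the $300 fee is treated as a flat monthly cost spread over however many assets a team generates. The function name `cost_per_thousand` and the monthly volumes are hypothetical, and a flat subscription fee is not the same thing as metered per-call API pricing.

```python
def cost_per_thousand(monthly_fee_usd: float, assets_per_month: int) -> float:
    """Effective cost per 1,000 generated assets under a flat monthly fee."""
    if assets_per_month <= 0:
        raise ValueError("assets_per_month must be positive")
    return monthly_fee_usd * 1000 / assets_per_month


# $300 is the tier price quoted in this post; the volumes are made-up examples.
MONTHLY_FEE_USD = 300

for volume in (500, 5_000, 50_000):
    cpm = cost_per_thousand(MONTHLY_FEE_USD, volume)
    print(f"{volume:>6} assets/month -> ${cpm:,.2f} per thousand")
```

The takeaway is that the effective rate depends entirely on volume: under these assumptions the same fee works out to $600 per thousand assets at 500 a month but $6 per thousand at 50,000 a month, so the tier price is best read as a ceiling rather than a fixed rate.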


Further reading