Proof that Opus 4.6 Is Getting Worse, Ramp AI Coworker, MiniMax M2.7 & More (This Week In AI)
Evidence is mounting that Claude Opus 4.6 has been degraded: BridgeBench shows a 15-point accuracy drop on its hallucination benchmark, and AMD's Senior AI Director found that median thinking length collapsed from ~2,200 to ~600 characters between January and March. The hosts share their own experiences, which line up with both findings.

Meanwhile, a claim surfaced that Cursor Agent is a rebranded version of Claude Code, running behind a local proxy with a find-and-replace engine that swaps "Claude" for "Cursor" in system prompts. Cursor's Michael Truell responded, saying it was a sub-1% A/B test. The hosts break down both sides.

On the shipping front, Anthropic launched Claude Managed Agents in public beta, released Claude for Word, shared details on Claude Mythos Preview (including speculation that it's a looped language model based on a ByteDance paper), and expanded its Google/Broadcom partnership for multiple gigawatts of compute. Anthropic's run rate reportedly jumped from ~$9B to $30B in four months. Sam Altman published a personal blog post revealing that someone threw a Molotov cocktail at his house.

Plus: why senior executives are voluntarily accepting lower titles to join AI companies, Ramp's internal AI productivity suite Glass, Ramp Labs' Latent Briefing paper showing 31% token savings for multi-agent systems, Scale AI's Muse Spark model now powering Meta AI, GLM-5.1 breaking into Code Arena's top 3, MiniMax shipping MMX CLI and open-sourcing M2.7, and widespread benchmark cheating exposed across nine agent benchmarks.
More episodes
- June 3, 2026: Opus 4.8, Anthropic's S-1, MiniMax M3 & NVIDIA Pays You to Host a Data Center | This Week In AI
- May 29, 2026: Karpathy Joins Anthropic, China Ships Another Price Cut, Anthropic's SpaceX Bill - This Week In AI
- May 20, 2026: Anthropic Bought Stainless and Repriced the Agent SDK, while Notion Went Dev - This Week In AI
- May 19, 2026: Code Review That Actually Runs Your Code — Evan Marshall (ito.ai)