Proof that Opus 4.6 Is Getting Worse, Ramp AI Coworker, MiniMax M2.7 & More (This Week In AI)

Mounting evidence that Claude Opus 4.6 has been degraded — BridgeBench shows a 15-point accuracy drop on their hallucination benchmark, and AMD's Senior AI Director found median thinking collapsed from ~2,200 to ~600 characters between January and March. The hosts share their own experiences, and they line up. Meanwhile, a claim surfaced that Cursor Agent is a rebranded version of Claude Code, running behind a local proxy with a find-and-replace engine that swaps "Claude" for "Cursor" in system prompts. Cursor's Michael Truell responded, saying it was a sub-1% A/B test. The hosts break down both sides. On the shipping front, Anthropic launched Claude Managed Agents in public beta, released Claude for Word, shared details on Claude Mythos Preview — including speculation that it's a looped language model based on a ByteDance paper — and expanded its Google/Broadcom partnership for multiple gigawatts of compute. Their run rate reportedly jumped from ~$9B to $30B in four months. Sam Altman published a personal blog post revealing that someone threw a Molotov cocktail at his house. Plus: why senior executives are voluntarily dropping title to join AI companies, Ramp's internal AI productivity suite Glass, Ramp Labs' Latent Briefing paper showing 31% token savings for multi-agent systems, Scale AI's Muse Spark model now powering Meta AI, GLM-5.1 breaking into Code Arena's top 3, MiniMax shipping MMX CLI and open-sourcing M2.7, and widespread benchmark cheating exposed across nine agent benchmarks.

Proof that Opus 4.6 Is Getting Worse, Ramp AI Coworker, MiniMax M2.7 & More (This Week In AI)

Watch on

Listen on

Episode Transcript

More episodes