Mischa Sigtermans

Thoughts · AI Technology

Opus 4.6 is the best model I've ever used

In February I said Opus would come back. It did. The squeeze you felt in March wasn't Opus 4.6. It was Claude Code bugs, a popularity spike, and Mythos pulling compute.

Two months ago I wrote a post arguing Opus didn't get worse. The thesis was that we were watching infrastructure get tuned in public, and that the model was fine. I want to update the specifics, because a few things happened in March that looked like the model getting worse again and weren't. And underneath all of them is a story about compute that I think matters more than any individual model release.

Here's the headline version. Opus 4.6 is the best model I've ever used. I do all my thinking and planning with it. It fixes complex tasks I wouldn't have handed to a model a year ago. It's my default. Sonnet 4.6 runs the execution layer, including my Ralph loops, and that split has made the last two months the most productive stretch I've had at Ryde. The February post ended with me predicting Opus would come back once the pressure lifted. It did. It came back better than I expected.

The squeeze that everyone felt in March, the one that made it look like Opus had fallen off a cliff again, wasn't Opus. It was at least four different things stacked on top of each other, and pulling them apart is the interesting exercise.

Squeeze one: Claude Code actually had bugs

This is the part I didn't see clearly in February because it hadn't happened yet. In March, The Register ran a piece documenting that Anthropic had publicly admitted people were hitting their Claude Code usage limits 'way faster than expected' and that fixing it was 'the top priority for the team'. That's a very different sentence than 'we're tuning capacity'. That's 'our tool is broken, we know, we're on it'.

One user reverse-engineered the Claude Code binary and posted that they'd found two independent bugs that caused prompt caching to break, silently inflating token costs by ten to twenty times. Think about what that means for a heavy Claude Code user. You're writing the same prompts you wrote in February. The cache should be doing its job. Instead, every request is secretly re-billing you for context you already paid to cache. Your Max plan evaporates by midday and you have no idea why.
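To make that arithmetic concrete, here's a back-of-envelope sketch. The numbers are illustrative, and the ~10% cache-read rate reflects Anthropic's published prompt-caching discount; everything else is an assumption for the sake of the example.

```python
# Back-of-envelope: why a silent cache failure reads as a ~10x bill.
# Illustrative numbers; the cache-read rate (~10% of the base input
# price) reflects Anthropic's published prompt-caching discount.

BASE = 1.0        # relative price per fresh input token
CACHE_READ = 0.1  # cached tokens bill at roughly 10% of base

def request_cost(context_tokens: int, new_tokens: int,
                 cache_working: bool) -> float:
    """Relative cost of one request carrying a large cached context."""
    if cache_working:
        return context_tokens * CACHE_READ + new_tokens * BASE
    # Bug case: the cache silently breaks and the full context
    # is re-billed at the fresh-token rate on every request.
    return (context_tokens + new_tokens) * BASE

healthy = request_cost(100_000, 1_000, cache_working=True)   # ~11,000
broken = request_cost(100_000, 1_000, cache_working=False)   # 101,000
print(f"effective multiplier: {broken / healthy:.1f}x")      # ~9.2x
```

With cache reads at 10% of base, the multiplier approaches 10x as the cached context dominates the request; the ten-to-twenty-times range in the bug report presumably also folds in repeated cache writes, which this sketch ignores.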

That was me. I was hitting limits multiple times a day through most of March, and I couldn't figure out why my usage looked nothing like my February baseline. The honest answer turned out to be that the tool in my hands was leaking tokens. It wasn't Opus. It wasn't throttling. It was a cache bug that nobody outside the debugging community could see.

Squeeze two: the popularity spike

In early March, Claude hit number one on the App Store and dethroned ChatGPT for the first time. That's a demand event, not a supply event. It's the kind of thing that happens over a weekend and breaks every capacity projection the infrastructure team had on file.

When you layer a 10x cache bug on top of an unprecedented consumer demand spike, you don't get a clean signal about which thing is hurting you. You get 'my model feels slow, my quota feels tighter, my day is shorter, this must be the model'. It wasn't the model. It was millions of new users arriving at the door with their own workloads, and every consumer tier feeling it at once.

The thing about a demand spike that big is that it also justifies whatever comes next. And what came next was a very clever piece of capacity theater.

Squeeze three: the 2x promotion was a softening trick

On March 13, Anthropic announced a 2x usage promotion that ran through March 27. Double your five-hour limits during off-peak hours on weekdays, all day on weekends, across Free, Pro, Max, and Team. Free additional capacity, not taken from the normal allowance.

On the surface this is generous. In context, the timing is the story. It landed exactly when the Claude Code cache bugs were chewing through quotas and the popularity spike was saturating the infrastructure. For two weeks, you could breathe. The bugs were still there, but the 2x buffer hid most of the damage. You got your work done. Things felt manageable. Then March 27 arrived and the normal came back.

And the normal felt worse than it should have, because for two weeks your baseline had been double. That's the trick. Not 'we are giving you more', but 'we are temporarily hiding how bad it is, and when we uncover it again you'll blame yourself for not having planned better'. I don't think this was sinister exactly. I think it was a capacity team making a reasonable call under real pressure. But the effect, whether intended or not, was to make the post-promotion week feel like a nerf even though nothing had nominally changed.

If you want to know when your infrastructure is stretched, watch for the word 'promotion'. It's usually the softer version of 'we need to buy time'.

Squeeze four: Mythos is quietly pulling compute

That brings me to today, April 11. The Claude Code bugs are largely fixed. The demand spike has normalized into a new higher baseline. The 2x promotion has been over for two weeks. My daily rhythm feels roughly like it did in late January, which is to say, it feels fine. Opus 4.6 is still my default and it is, to repeat myself, the best model I've ever used.

But there's a new variable, and it's the one I missed in February.

On March 27, Anthropic accidentally exposed draft blog posts through a misconfigured CMS. The posts described a model called Claude Mythos. On April 7, Anthropic officially announced Mythos Preview and a defensive coalition called Project Glasswing, bringing eleven enterprises (AWS, Apple, Google, Microsoft, NVIDIA, Broadcom, Cisco, CrowdStrike, JPMorgan, the Linux Foundation, and Palo Alto Networks) into gated access to what Anthropic is calling 'the most capable model we've built to date'.

Mythos isn't a version bump of Opus. It's a new tier above Opus. And buried in Anthropic's own announcement is the sentence that matters most: Mythos is 'very expensive for us to serve, and will be very expensive for our customers to use'. Fortune reported that part of the reason Mythos isn't getting a broad release is that Anthropic may not have the GPU and compute resources to serve it at scale. A cheaper internal model, 'Spud', is reportedly going to reach users first because Mythos is too expensive to commercialize in its current state.

Think about what that means for Opus. Mythos isn't cannibalizing Opus users. Eleven companies aren't switching from Opus. But Mythos is consuming Opus-class compute, and it's doing it in a capacity environment where Anthropic is already stretched. You can feel it as a background hum on Opus today. Nothing dramatic. Nothing like the March chaos. Just a sense that Opus 4.6 is being asked to share GPU time with something bigger and hungrier sitting above it.

The bigger picture Anthropic is showing us

The Mythos story isn't standalone. It's the visible tip of a deeper resource situation that Anthropic has been telegraphing for months.

They just hired Eric Boyd from Microsoft to lead AI infrastructure. They've locked in 3.5 gigawatts of Google TPU capacity through a Broadcom deal. They're reportedly considering designing their own chips. They announced a $50 billion data center plan. And Nvidia's own research teams can't get enough GPUs, with data-center lead times running 36 to 52 weeks industry-wide.

None of those are the moves of a company sitting comfortably on abundant compute. Those are the moves of a company running a very expensive model portfolio at the edge of what physics allows, and building runway for a bigger portfolio they haven't shipped yet. The squeeze you felt in March wasn't random. It's what it looks like when capacity is tight, demand is cresting, a popular tool has bugs, and a new tier is being quietly fed from the same GPU pool as the one you use every day.

What I'm doing about it

Same tactical shape as last time, with the updates you'd expect:

  • Opus 4.6 is my default for thinking and planning. One-shot hard calls, architectural decisions, critical review passes, complex specs. This is what Opus was built for and it's genuinely excellent at it. I don't want to bury that inside a caveat. It's the best model I've ever used.
  • Sonnet 4.6 runs the execution layer. Drafting, refactoring, test generation, long-context scans, anything that's more about volume than depth. The 1M context window makes it a different tool than Sonnet 4.5 was.
  • Ralph runs on Sonnet. Opus as the planner, Sonnet as the hands. Making that split explicit in the loop configuration is the single biggest productivity change I've made since February.
  • I watch for cache behavior, not just quota. A 10x cache bug looks identical to a 10x quota reduction from the outside. The only way to tell is to measure. I now log cache hit rates on my heavy workflows because the March experience taught me that the tool can lie about what's happening.
  • I read promotion announcements as capacity signals. 'Double usage for two weeks' is not a gift. It's a statement about the state of the infrastructure. Plan around the recovery, not the peak.
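What I mean by logging cache hit rates, as a minimal sketch. The field names mirror the usage block that Anthropic's Messages API returns (cache_read_input_tokens and friends); treat them as an assumption and adapt to whatever your client actually logs.

```python
# Minimal cache-hit-rate check. Field names mirror the `usage`
# block in Anthropic Messages API responses; adapt to your client.

def cache_hit_rate(usage: dict) -> float:
    """Fraction of input tokens served from the prompt cache."""
    cached = usage.get("cache_read_input_tokens", 0)
    fresh = usage.get("input_tokens", 0)             # uncached input
    written = usage.get("cache_creation_input_tokens", 0)
    total = cached + fresh + written
    return cached / total if total else 0.0

# A healthy heavy workflow sits near 1.0 on repeat prompts. A
# sustained slide toward 0.0 is the 'quota evaporating' signature.
usage = {"input_tokens": 1_200,
         "cache_read_input_tokens": 98_000,
         "cache_creation_input_tokens": 0}
print(f"hit rate: {cache_hit_rate(usage):.2f}")
```

From the quota side, a 10x cache bug and a 10x quota cut are indistinguishable; this number is the one that tells them apart.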

The principle, revised

In February I said to watch the knobs, not the weights. What I'd add today is that there are more knobs than I thought, and they turn at the same time. The Opus 4.6 complaints you heard in March weren't one story. They were four stories stacked on top of each other, with Mythos quietly adjusting the floor underneath all of it.

That's the pattern I'd bet on for the next year. Every time a lab ships a new tier above an existing one, the existing tier gets worse for a while, the tools around it get buggy under the new load, a consumer demand spike makes everything feel like the user's fault, and a generous promotion covers the seam. Then things normalize, at a slightly tighter baseline than before, until the next tier lands.

Opus 4.6 didn't get worse. It's genuinely great. The environment around it went through the worst month it's had in a year, for four separate reasons, and we spent most of that month blaming the one thing that wasn't broken.

thanks for reading

Hi, I'm Mischa. I've been shipping products and building ventures for over a decade. First exit at 25, second at 30. Now Partner & CPO at Ryde Ventures, an AI venture studio in Amsterdam. Currently shipping Stagent and Onoma. Based in Hong Kong. I write about what I learn along the way.

Keep reading: The most skeptical about AI haven't shipped with it.
