We need to talk about the Claude Code rate limits

Theo - t3.gg | 00:32:56 | Apr 3, 2026
Overview of Anthropic's new peak-hour restrictions on Claude Code's 5-hour session limits and the surrounding user frustration.

Anthropic’s Claude Code rate limits just shifted: peak-hour access is being tightened to ease compute strain while off-peak usage remains doubled, sparking widespread frustration and questions about communication and governance inside the company.

Summary

Theo breaks down Anthropic’s surprise rate-limit changes for Claude Code, explaining that the company is trying to rebalance compute across research, product, and enterprise users. The new policy targets weekdays from 5:00 a.m. to 11:00 a.m. Pacific time, during which the 5-hour session limits deplete faster, while off-peak hours keep the roughly doubled usage allowance from the earlier promotion. Theo reminds viewers that Claude Code has historically been heavily subsidized, with some analyses suggesting up to $5,000 of compute per month on the $200/month plan, creating a tension between customer value and cost. He notes that the change was announced late and primarily via a single Twitter post by an Anthropic employee, which amplified the optics problems. Theo also compares this to OpenAI’s recent usage resets and pricing tactics, highlighting the broader industry compute crunch and the strategic difficulty of distributing GPUs among researchers, product teams, and paying users. He argues the root cause is compute scarcity and internal friction as Anthropic shifts from a research-centric culture to a product-driven, revenue-focused one. The video blends critique with empathy for Thoric, the employee who publicly communicates these changes, while calling out the lack of formal, in-product communication. Theo closes with reflections on broader industry trends, such as batch pricing, priority access, and the ongoing race to build more efficient models. Overall, the takeaway is that the compute bottleneck is real, but the way changes are communicated and implemented matters just as much as the changes themselves.
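For viewers who want to plan heavy work around the new window, here is a minimal sketch, assuming the window is exactly weekdays 5:00–11:00 a.m. Pacific as described in the announcement. The helper name and the example usage are hypothetical, not part of any Anthropic tooling:

```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo  # stdlib, Python 3.9+

# Assumed window, per the announcement: weekdays, 5:00-11:00 a.m. Pacific time.
PEAK_START_HOUR = 5
PEAK_END_HOUR = 11
PACIFIC = ZoneInfo("America/Los_Angeles")

def in_claude_peak_window(when: Optional[datetime] = None) -> bool:
    """Return True if `when` (default: now) falls in the weekday peak window."""
    now = (when or datetime.now(tz=PACIFIC)).astimezone(PACIFIC)
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and PEAK_START_HOUR <= now.hour < PEAK_END_HOUR

if __name__ == "__main__":
    if in_claude_peak_window():
        print("Peak window: 5-hour session limits deplete faster; consider deferring heavy runs.")
    else:
        print("Off-peak: usage allowances are more generous right now.")
```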

Key Takeaways

  • Weekday peak-hour (5:00 a.m.–11:00 a.m. PT) limits for Claude Code are being tightened, so users burn through their 5-hour session limits faster during that window.
  • Outside the peak window, users keep the doubled usage allowance, but Anthropic’s official messaging did not quantify how much faster limits deplete inside the window.
  • Analyses suggest Claude Code subscription plans have been heavily subsidized (potentially up to $5,000/month in compute for $200/month), prompting a cost-containment pivot from Anthropic; see the cost sketch after this list.
  • The change was announced late and primarily via a Twitter post by Thoric (an Anthropic employee), leading to optics issues and uneven awareness among paying users who don’t follow social channels, since nothing appeared in the product or dashboard.
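To make the economics concrete, here is a back-of-envelope sketch using only figures quoted in the video: the roughly $5,000-of-compute-for-$200 subsidy claim, and the standard/batch/priority per-million-token prices Theo cites on the OpenAI side. The token volumes in the example are hypothetical and the prices are illustrative, not current official pricing:

```python
# Back-of-envelope cost comparison, using the per-million-token prices quoted in the
# video for standard, batch (about half price), and priority (about double price) tiers.
# These figures are illustrative only and may not match any provider's current pricing.

PRICES_PER_MILLION = {
    # tier: (input $/1M tokens, output $/1M tokens)
    "standard": (2.50, 15.00),
    "batch":    (1.25,  7.50),   # roughly half of standard, per the video
    "priority": (5.00, 30.00),   # roughly double standard, per the video
}

def job_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a job at the given tier."""
    in_price, out_price = PRICES_PER_MILLION[tier]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

# Hypothetical month of heavy agentic coding: 500M input tokens, 50M output tokens.
for tier in PRICES_PER_MILLION:
    print(f"{tier:>8}: ${job_cost(tier, 500_000_000, 50_000_000):,.2f}")

# The subsidy claim from the video: up to ~$5,000 of compute on a $200/month plan.
print(f"Implied subsidy ratio: {5_000 / 200:.0f}x")
```

At those illustrative rates the same workload costs $2,000 standard, $1,000 batch, or $4,000 priority, which is the gap that makes peak-hour GPU contention so expensive to subsidize.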

Who Is This For?

Essential viewing for developers and product teams who rely on Claude Code for heavy compute workloads, and for investors or analysts tracking Anthropic’s pricing and capacity strategy. It’s also a wake-up call for users who want clearer, in-product communications around rate-limit changes.

Notable Quotes

""Anthropic just made a huge change to how rate limits work for Claude Code subscribers and people are pissed.""
Opening line sets the video’s critical tone about the policy change.
""During weekdays between 5:00 a.m. and 11:00 a.m. Pacific time, you'll move through your 5-hour session limits faster than before.""
The core policy change description from Anthropic’s post.
""This 5:00 a.m. to 11:00 a.m. window in particular seems to be the centerpiece of their struggles...""
Theo’s observation on why the window was targeted.
""7% of users will hit session limits that they wouldn't have hit before, particularly for pro tiers.""
Noting the scale of impact claimed by the company or implied by the transcript.
""The optics here were awful... you should be able to see that in the product as you use it.""
Critique of how the change was communicated.

Questions This Video Answers

  • How does Anthropic’s Claude Code rate limiting work during peak hours?
  • Why did Anthropic subsidize Claude Code compute so heavily, and what changed?
  • How do GPU-constrained labs decide how to split compute between research, product, and paying users? (See the allocation sketch below.)
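On that last question, here is a toy illustration of the allocation problem as the video frames it: a fixed GPU pool split across research, subscription users, and enterprise/API demand, served in order of immediate revenue impact. The priorities, demand numbers, and function are hypothetical, not Anthropic’s actual policy:

```python
# Toy model of a fixed GPU pool split across three groups, in the spirit of the
# video's research/product/users framing. All numbers here are hypothetical.

TOTAL_GPUS = 100

# GPUs each group would consume if unconstrained (hypothetical demand).
demand = {"enterprise_api": 60, "subscriptions": 50, "research": 40}

# Serve groups by immediate revenue impact, as the video describes the tension:
# enterprise pays per use, subscriptions pay a flat rate, research pays off later.
priority_order = ["enterprise_api", "subscriptions", "research"]

def allocate(total: int, demand: dict, order: list) -> dict:
    """Greedily allocate a fixed pool in priority order."""
    remaining = total
    allocation = {}
    for group in order:
        allocation[group] = min(demand[group], remaining)
        remaining -= allocation[group]
    return allocation

print(allocate(TOTAL_GPUS, demand, priority_order))
# {'enterprise_api': 60, 'subscriptions': 40, 'research': 0}
# Total demand (150) exceeds supply (100), so someone always loses. Peak-hour rate
# limits are one way to reclaim subscription GPUs when enterprise demand spikes.
```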
Tags: Claude Code, Anthropic, rate limits, GPU compute shortage, compute economics, OpenAI comparison, peak vs off-peak pricing, batch pricing, priority access, Thoric (Anthropic)
Full Transcript
Anthropic just made a huge change to how rate limits work for Claude Code subscribers and people are pissed. It's spring break for Cloud Code. Enjoy 2x usage during off- peak app. Wait, what? They're pissed about this. Oh, that was March 14th. It's not March 14th anymore. It's the week after. And now there's a new post. To manage growing demand for Claude, we're adjusting our 5-hour session limits for free, pro, and max subs during peak hours. Your weekly limits remain unchanged, but during weekdays between 5:00 a.m. and 11 a.m. Pacific time, you'll move through your 5hour session limits faster than before. I have a lot of thoughts here. For those who don't know, the Claude Code subscriptions have historically been very, very generous. So much so that according to some of the competitors doing analysis, you're able to consume up to $5,000 of compute per month when paying for the $200 a month plan. That is a 25x subsidization. That is crazy. And that's fine if they want to do it until they can't manage to do it anymore. And it seems like they are struggling to manage. This 5:00 a.m. to 11:00 a.m. window in particular seems to be the centerpiece of their struggles as it's the exact same window that they cited over here with the Clawude March discount thing they did. Yeah, it's the 5:00 a.m. to 11:00 a.m. Pacific time window. Outside of that, they doubled your usage. But if you were working within that window, your usage stayed the same. But it seems like they've gone even further and now limited it heavily. Citing a number that is seemingly small, but is actually very, very scary, saying that around 7% of users will hit session limits that they wouldn't have hit before, particularly for pro tiers. This is not great. As you can see from all of their replies going from, I think you should reconsider to this is [ __ ] [ __ ] it. People aren't happy. And my take here might surprise you guys. I think they're getting too much crap for this. I'm going to do my best to explain why, how we got here, and why we'll be seeing more and more of these things in the future. But if I have to have two subs right now, someone's got to cover the bill. So, we're going to take a quick break for today's sponsor. Nowadays, it's hard to imagine life without Docker. It has fundamentally changed how we build and manage our pipelines. But, it's also a thing that we see a lot as we sit there waiting for our Docker images to build. Unless you're using today's sponsor, Depot, because they'll make your Docker images build hilariously fast. Postto made the move for their Docker builds, and they're now 14 times faster. Mastedon made the same change, and it was 18 times faster. I haven't seen any example of Depot being slower for any real project. It's actually hilarious how fast it can get. So, it's got to be more expensive then, right? Nope. I find it's often cheaper than GitHub's actions. So much so that GitHub tried to charge people for using Depot, which blew up in their face and they ended up canceling that change. I understand why they'd be a mad though, cuz Docker images aren't the only thing they do. They also will cut your GitHub action workflows in like half or less, which is crazy. So, what's the catch? It's got to be really hard to set up, right? Clearly, you haven't been around very long. I wouldn't be plugging these guys if it was super complex. You install their CLI and you just use the depot command instead of the Docker command and everything will be way faster. 
And once you've authenticated with Depot, it'll actually cache the layers so that not just you and your CI benefit from the speeds, your whole team does, too. So if one person's already built this image, you shouldn't have to wait the entirety of that build time. The result here is magical, especially if you're spinning up lots of agents to run things in parallel. Having those agents spin up Docker in seconds instead of minutes is a huge shift in how much stuff you actually reach for them to do. And it's worth noting they'll integrate with pretty much any CI you are running. And if you're using CI, that's not in this list. Let me know in the comments. I'm actually curious. I can't imagine anyone not using at least one of these platforms as the majority of how they run their stuff. This ad was longer than how much time it takes to set up Depot, and your build times are longer than they should be. Fix that now at soyv.link/depo. Hi, Theo from the future here. Quick things. One, this video has nothing to do with the Cloud Code source leak. We're just covering the rate limit changes. Two, separate from the rate limit changes that were discussed in mostly the topic of this video, there were also some bugs with how the rate limits were being applied. The details have not been discussed in a meaningful public way, just that they are aware that people were hitting usage limits and they shipped some changes in order to make this less likely. There were a lot of theories as to what was causing this. Everything from broken code preventing cash from being hit to intentionally bloating the context to cause usage to go up. None of that appears to be true, and even if it was, it appears to be a bug. That all said, they still haven't done a rate limit reset, so it is what it is. But that is not what we're talking about. This is all still being investigated and is borderline conspiracy, some of the things I've been seeing. So don't expect me to talk about the reverse engineering that was done in order to try and figure out what was broken in the caching because we also learned from the source code leak that that probably wasn't true. So let's go back to the actual topic, which is why the rate limits were changed and the economic reality that Anthropic has put themselves in, which is quite rough. The main thing we need to cover here is that compute is expensive. More specifically, we need to talk about how Anthropic made terrible choices around compute and they're now eating the cost of that long term. There are other layers I want to talk about here from how this was communicated to how it was tested with this previous change to how Pthoric has to deal with all of this himself. And it doesn't seem like he's getting any help internally at all. And we'll talk about all of that, don't worry. But I really want to focus in on the compute side here because the story there is interesting. Now, firstly to start with something that y'all might not actually know. It takes GPUs to run models like Claude. I know crazy revolutionary thinking here but when you combine this with another fact I think things will start to make a little more sense. Anthropic researchers also need GPUs. This might seem obvious as well like obviously running your models needs a GPU and training your model needs a GPU but there is a big issue here. If you have a fixed number of GPUs, how do you split them? 
How do you make sure there are enough GPUs for both your users that are paying to use your stuff and for the product teams to make the stuff for those users as well as having enough GPUs for the researchers and the people building the models in the first place? It is my understanding that there is a very adversarial relationship internally between the three groups here that matter. You have research, you have product, and you have users. Hopefully you guys know what research does. They're the ones who actually make the models and create them using all of this compute power. Then there's product. This is stuff like cloud.ai I the website cloud code the subscription as well as the CLI and all the things we integrate with that cloud co-work all the products that they offer even to an extent the APIs and SDKs that they sell to enterprises as well those are the people at the company who are building the product for it and then you have the users the people at the end who are paying for these things and then consuming them as users of said product but something's changed over time got a lot more customers in 2024 they made 100 million of revenue in 2025 5, they made a bill and right now it's looking like their revenue for 2026, assuming it stays where it is, not even growing more. It's going to hit 14 bill in revenue. The number of customers spending over $100,000 annually on Claude has grown 7x in the past year. And also, there's the enterprise side. Two years ago, a dozen customers spent over a mill with us on an annual basis. Now, that number exceeds 500. Eight of the Fortune 10 companies are now Claude customers. Do you understand? The growth has been absurd. And that means that the product and user side of this breakdown has been encroaching more and more on the GPU allocation they have. This is obviously going to be very inaccurate, but let's say they have 100 GPUs. They have to decide how these get split across the different groups internally. Maybe it's split 50/50. Maybe half goes to research and half goes to the users. Maybe they split it more aggressively. Maybe they split it 7525. Maybe they go the other way. Maybe they give research way more. The issue here is that one of these sides is elastic in the sense that they can change how much internally people are using the GPUs. Like if research is using too many and product needs more so they can land some deal, they can take GPUs away from research and reallocate them to users. And they allocate on an even more fine grained method than this. Like when the I believe it was when Sonnet 4 dropped, we couldn't get a high enough rate limit in T3 chat for even basic usage. And I had a long back and forth with our salesperson there and they said they could bump our rate limit but they would have to knock down our rate limit for other models because what they were effectively doing was changing how many GPUs were reserved for given models. And they're not the only ones who have to think about things in this way. Even OpenAI just had to shut down a product. It was their Sora app as well as the API for it which is their video model that I roasted hard not long ago. The Sora app used an ungodly amount of compute because videogen is expensive and they also weren't making real money on it. they didn't see real growth or a direction with it. So they decided the best solution would be to sunset it so they could have those GPUs available for other things. 
Other labs do have these problems too, but I want to focus on the problems that are specific to anthropic. The first one is how horribly I would say they've adjusted going from a research company to a product company. It was not in their DNA. It was not in their blood. It was not what they were meant to do. Even the original cloudi site was contracted out. They went other places to other devs externally to build it because they just wanted to focus on making the models good. As such, research has tended to lead anthropic as a company. What the researchers want tends to be what ends up happening. Which means as product and the users of the product started having more and more demand, research was upset. And when they realized they can't really reduce how much compute users are using without losing money and losing customers, they decided that research would have to give up a little bit and take a slight hit on the amount of GPUs that are being shared internally. This very much upset research and it seems like it's still upsetting research and it seems like right now research really needs the GPUs because they're using the GPUs for training what is allegedly some crazy new god model that keeps leaking all over the internet. I don't know anything about it. I don't have early access. I just know this is a thing that's happening. People keep asking about it. I don't know about mythos beyond the one blog post that kind of leaked. It is clear anthropic is using the [ __ ] out of their GPUs right now. At the same time, they are facing unprecedented growth. So research needs more GPUs because they're making a bigger smarter model. Product needs more GPUs because they're trying to build crazier products. Things like background agents code review that uses a ton of tokens. All those types of things where given product uses more tokens per request than before. And they have user growth. So researchers need more GPUs for more research. Product needs more GPU time per request for the products. And they're making products that require more compute on average because they want to go further and further with the product. And the users are demanding more usage because they are paying customers. Some are on subscriptions, some are paying API prices. They all need more compute. And the result is that these groups are just constantly fighting at each other for a fixed set of GPUs. There's a pretty obvious solution here, right? Buy more GPUs. Like just just get more. If the problem is that, get more GPUs. They just raised $30 billion. That's a lot of GPU money. You can get a lot of graphics cards for that if they existed. And here's the problem. Buying GPUs is hard. Buying a lot of them requires foresight. It requires choosing to buy them really, really early because they can take 18 months to three years to actually end up in your server farms. And Anthropic did not choose to invest heavily in buying more compute. We've been trying to get good data here and since everybody keeps this tight to their chest, it's hard to know for sure, but it does actually seem like as of recently, Anthropic has significantly increased their amount of compute and their number of H100 to slightly overtake where OpenAI is with their data centers. That said, we're still in this window of like these things being made. Allegedly, the new XAI server farm is going to be massive, massive, like absurdly so. like $44 billion was the cost for it, which is more than Anthropic just raised. 
But the Anthropic Amazon collab for more compute seems like it's going quite well too and will push them above and beyond again as well. That all said, this is not owned by Anthropic. This is owned by Amazon. This is a collaboration between them all. It is currently largely understood by the industry that both Anthropic and Google were relatively slow to jump on the buying compute and buying GPUs thing. And as a result, they are very behind overall. And a lot of this capacity is being shared with Amazon. And since both of these parties are behind, it looks like they're partnering together. Google is now financing a data center project that is leased to Anthropic so that Anthropic can once again have more GPUs because they are extraordinarily behind on GPUs right now. Meanwhile, from what I have heard, it seemed like OpenAI just chose to buy literally all of the compute available at any given time. not like doing the math for how much do they need, how much might they need longterm, just like literally buy everything in front of them whenever they can at all times. And as a result, they are slightly better positioned for these types of surges. But they also have to make sure that there is enough for research and that often involves things like the shuttering of Sora in order to get more compute. But in the end, almost everything happening here is due to the compute crisis. When there's only a certain amount of GPUs available and everyone is fighting for them, the result is that to an extent, nobody is happy and nobody is getting enough for what they're trying to do. This is also why so many people are investing heavily in finding ways to make things more efficient. Like Google's new turboquant thing, their new rethinking of compression for making more efficient AI that uses less memory. This is really cool and it's clearly an effort they're investing in because they want to find ways to reduce how much compute is being used by their models. This is also why our friends over at Anthropic had the problems with the model degradation way back late last year because they were trying to make changes to the way that their GPUs were managed, the way that their models were deployed on them in order to make them more efficient per request and per token. And some of the things they changed just didn't work the way they were supposed to and that resulted in degraded performance. It seems like they're experimenting a lot because they're down to 98% uptime according to them. And according to people who monitor it more actively, it's probably closer to 95. Half of days almost have some type of outage. And some of them are crazy. Like on 26th of February, they had a 6-h hour outage on all of their usage reporting. They had a 4-hour outage where you couldn't log into most stuff on the 27th. Like it's it's kind of crazy how often anthropics infrastructure has problems. And yes, they are actually down to 1 nine of uptime and reliability. Cloud Code's still claiming to and it's probably close to it, but like yeah, not great. Still better than GitHub, sure, but I don't think anybody's defending GitHub's current state. Yeah, GitHub's at 90. So they no longer are at 09, but yeah, not great. So everyone's fighting over GPUs. There aren't enough GPUs. They're trying to allocate things to the best of their ability. And they're also very clearly trying to do experiments. They're trying to figure out what it would take to shift when the compute happens because compute load isn't even throughout the day. 
If you have 20 customers that often work in parallel in those hours between like 7 and 11 a.m. or so, you'll need 20 GPUs for those customers. But at 4 p.m. when half of them are signed off, you now have twice as many GPUs as you need. So if they're buying GPUs to handle the capacity needs at peak, they end up sitting on a bunch of GPUs that aren't being used when they're off those peak hours. This is a thing a lot of the labs are trying to price into their whole model because the off peak hours are effectively cheaper for them because there isn't contention. When you have one GPU and three people who want it, two of them are users, one's a enterprise user, one's a subscription user, and one's one of your researchers. Picking who gets it is not trivial. Another way of thinking about this that is entirely different, we got to think about how much value the different groups here bring. So instead of research, product, and users, I'm going to change this to research, subscription, and enterprise/appi usage. You break things up this way. Things are very different. Their researchers might not make any money immediately. Like any time that researchers are spending using the GPUs is at cost because that's GPU time that could go to a customer. But when researchers use it, they get zero, not a zilch, until the model's good enough to put out and then they can make money on it. But researchers are at the time of usage making it zero dollars in hopes that in the future they can potentially make billions of dollars. Subscriptions are interesting because it's a basically guaranteed rate. If you're paying $200 a month, whether or not you use the sub, you're still paying 200 bucks. If you use it to do $50 of inference, $0 of inference, or $5,000 of inference, it doesn't really matter. Enthropic makes the same amount of money. And then there are the enterprise cases which effectively scale based on the amount of usage available. There are certain companies that will gladly use literally all of the compute that is available to them through anthropic, but there just isn't enough. So they end up bottlenecked on how much they can do and they're paying the full maxed out rates. So a simpler way of putting it is if you give researchers more GPU, you might get a better model faster. If you give subscription users more GPUs, you might retain them a bit better, but you're not going to make more money directly because they're paying the same amount regardless. If you give enterprise and API users more access, they will spend more money and you'll make more money. So enterprise is immediate financial gain. Subscriptions are a financial gain, but you're just trying to keep them from churning. And researchers are a long-term financial investment. How do you split across all of this is non-trivial. And I think what's happening here to an extent is they're just not. They are leaving some system in place that was already in place and now it is breaking at the seams. Remember earlier when I said that they can just kind of like take GPUs from research, move this line down and let product have more. They can cuz researchers are their employees. They can do as they please. But it kind of goes the other way too. As I mentioned before, this is a business that is led and run and thought about as a research thing. the exec board and the people who started Anthropic are all researchers. The point of the company is research and research first. So if the researchers can have GPUs taken from them, why can't the users? 
And I think that's what's happening internally is they see this as the exact same as happens to them. Every employee at Enthropic has had a moment at some point where they had their GPU allocation reduced at some meaningful level where they had an amount of compute and then suddenly they have less compute. That is a real thing that's happened to most of these employees. So when the thought of passing that off to the users comes up, I don't think they thought about it too deeply. They thought of it as like, well, product loses GPUs all the time. Research loses GPUs all the time. I guess it's the user's turn. And thankfully, we're only going to take it from that teeny tiny bit. It's just during this particular window where we have a lot of those enterprise customers that we're scared are going to churn if they can't get the usage they expect. So, we need to make sure they have enough GPUs. So, during these hours, we will affect users a little bit. and they probably didn't think too much about it, which is why, and here's where we get into the things they [ __ ] up, they did it before they announced it. And that's where things get really sketchy. A lot of users were reporting issues with the Quad Code subs the day before and the day of this change before it was announced. As Thoric said here, this affects users during weekdays between 5:00 a.m. and 11:00 a.m. Pacific time, which he tweeted at 12:45 p.m. Pacific time. That's not great. There is one other admittedly small but very very annoying thing here. Not everybody's on Twitter. If I could not be on Twitter, I would have made that choice long ago. Sadly, I and many others are trapped there. This post, not only was it two-ish hours late from when the window ended. It's also only on Twitter and it's not even from an official Anthropic account. It's from Thoric, who's an individual employee at Anthropic. He doesn't even run a team. He's just a guy that does dero stuff for them. This is the only public confirmation of this. So, if you're not a Twitter user and you're not following him, but you're paying for Cloud Code and you're paying for the subscription, there is no place that you get this information. It's not in the dashboard. It's not in the CLI. It's not in cloud.ai. This is information that you only get if you're on Twitter. That's a huge optics L for them to take. Thankfully, a bunch of y'all are getting it here. Actually, leave a comment if you didn't know about this change before this video, despite being a heavy Cloud Code user. And definitely let me know if you hit one of these limits and were confused about it because they didn't announce this as a company. Which is funny because on the other hand, when they did the spring break thing where they doubled the limits outside of those hours, there was absolutely a link on the official site as well as a tweet from the official account. And that sucks. That is absurd. And it's not Thoric's fault. If anything, he is the solution here, not the problem. He's the one who made this post. But anthropic not jumping on this, not officially communicating this is relatively shameful. You shouldn't have to be on Twitter and Reddit and YouTube to hear about your subscription suddenly being less valuable. You should be able to see that in the product as you use it. Anyways, the other thing that's not great is that they're not specifying how much faster you'll go through that 5 hour window. They are just saying you will. 
So, while this messaging is reasonable, and I appreciate Thoric deeply for the transparency in his efforts to deal with this, it is important to recognize the flubs here. The lack of real numbers combined with the very untimely announcement here had horrible optics. These things together just looked and felt awful to most onlookers, especially the people who hit one of these limits, which apparently is 7% of users are going to hit them. That's a significant number of users hitting this before the announcement even went live. Here's a user who mentioned that they had been on the max plan for over six months and they've never had a problem. But on the day that this change went live, Claude Code used 100% of their 5-hour limit in a workflow that usually only would take 10%. It's hard to know for sure how big the hit is here. I have not been able to test it because it hasn't been a weekday since they announced this, so I haven't had the ability to actually thoroughly test it. But if what this user says is true and it is a 10x decrease during that window, that's really really bad. That's like really, really bad. It's hard to not contrast that with what's going on over at OpenAI. Tibo just shared on the same day, funny enough, that they reset codeex usage limits across all plans so that everybody can experiment with the new plug-in system that they had just launched. And because it's been a while since they did a reset, Tibo's notorious for resetting the codec usage limits all of the time. If there's a bug, it gets reset. There's a new feature, it gets reset. There's a new model. It often gets reset and they're currently in a double usage window which I believe ends on, funny enough, April 1st. Nobody's gonna believe it. That's I don't think that was strategic, but came out in a pretty funny way. This got so common people started making memes about it like the codeex rate limit reset. Where they tell you if it's been reset today or not. There are so many of these memes. St. TBO, giver of tokens, resetter of limits. Oh man, it's become a meme. They are notorious for this and they take advantage of that. What's really funny here, and I hate to read into these numbers, they don't mean too too much. This post has 9K likes, 888K views. So, under a million views, 9K likes. Thorics has 7 million views. So, 7x the viewership, 7x the eyes on it, and less likes at 7K. That is the sentiment shift happening in real time. And this is OpenAI getting a lot of support for the way they're doing things, the transparency, and the user-friendly behavior they are doing. Meanwhile, Anthropic is burning their trust with the community at a record pace despite this change, in my opinion, largely making sense. This is a real issue they have. They're not changing this because they are greedy or they want to maximize their money or they want to screw over people who have these subscriptions. They are clearly in a rock in a hard place situation where they subsidized meaningfully too hard. Getting 5K in compute for 200 bucks in sub is just insane. And they could have just stopped doing that. They could have just went back to giving you $200 when you paid $200. But the problem wasn't that at all hours usage was too high. The problem was that in this window enterprise customers weren't getting access because there weren't enough GPUs available. They could have just changed the limits entirely. But I like the fact that they went this direction. 
It's also pretty clear that their previous spring break thing was their attempt to experiment with the windows and see what usage looked like when they had more usage outside of the 5:00 a.m. into 11:00 a.m. window instead of less usage inside of it. So, they tried this strategy. They got their data. They learned a lot, I'm sure. And the conclusion was it wasn't enough to pull people out of this window. So, they have to force reduce the amount of usage in this window. And to be clear, they are not the only lab doing things like this. Maybe they're doing it way less transparently and in very cringe aggressive ways. But there are other labs like, you know, OpenAI that do this the opposite direction. The default price for 5.4 during normal hours is $2.50 per million tokens in and $15 per million tokens out. But if you go to the batch pricing, it gets chopped massively where it gets cut literally in half. When you do batch processing, it's a $125 per mill in and 750 per mill out. What is batch pricing? for large numbers of API requests that are not time-sensitive. This means that instead of reserving compute and letting you execute ASAP, you're just putting it on a queue and waiting for them to notify you when it's done. That's part of how they are handling the increase in demand, but it's also how they're letting people who are very cost-sensitive maximize the value they're getting per dollar. That said, I would never use the batch processing in a product like Codeex or T3 code. I would use this when I have a bunch of data I want to analyze in the background and I don't care about when it's done. And flex is largely the same, but it's just higher latency. Batch is more when you're grouping a bunch of stuff and want to get pinged when it's done later. Flex is similar, but then there's the opposite, which is priority. It's the making sure you have enough GPUs allocated for the entirety of your request so that you don't get limited between tokens ever. And as a result, it's 2x more expensive. It's five bucks per mill in and 30 per mill out. Now, we're getting close to sonnet prices, but sonnet fast is even like hilariously more expensive. Regardless, you get the idea here. Hopefully, a lot of the difference here is in the availability. And a GPU during a peak hour that is being used by enterprise customers paying a lot of money is much more valuable than a GPU at 1:00 a.m. on a weekend when nobody is using it but degenerates like ourselves. I know you and I are alike here. We love coding at off hours and watching YouTube videos at even more off hours. Leave a comment if you're watching this past midnight. You get the idea. One last thing worth noting is this last message from Thoric. I am taking him at his word here. I cannot fathom him having any reason to not be truthful in this statement. Overall, weekly limits are staying the same. It's just how they're distributed across the week that's changing. Again, that was probably why they did the 2x thing. They wanted to see if they could give users 2x the compute and off hours, would that destroy them or not? And I'm guessing that longterm that's kind of their plan. They're going to give you way higher limits during off hours and slightly to meaningfully lower limits during these peak hours. That said, people kind of sensed this when it was first announced they were doing the discount pricing. Pranet said that Anthropic pulled the oldest trick in SAS pricing. 
200 bucks for cloud max limits have been notably worse for the past week, but suddenly they announced that 2x off peak usage for 2 weeks. Sounds generous, but he suspects that they quietly dropped the limits, added the temporary 2x to make the reduced limit feel normal. Then the promo ends and you updated a baseline that was lower than it was before. There are a lot of people who do a lot of analysis of how much value you could get out of a cloud code sub. Like I'm pretty sure cursor just has something running in the background at all hours pushing the limit so they can document it. But yeah, it's I see why people are being so conspiratorial here, especially because previously Thor replied, "No, it's just a bonus 2x. It's not that deep." I'm pretty sure he believed this at the time and that the team was given the goal of looking into what different ways of spreading the compute usage across the week would look like and then faster than expected because again they're growing faster than expected. This had to be changed quickly because they were hoping that they probably thought they would have a month or two to figure this out. It ended up being a week but then they had to make the change. They quietly made the change. Everything [ __ ] exploded and now for reasons that I think are reasonable people are very upset. So to summarize, the compute problem is real. Anthropic isn't doing this for the sake of it. They're not doing this because they hate their users. They're doing this because they're out of compute and they thought the users could spare some in order to keep research and other products and enterprise customers above the line during specific hours. I honestly like the way they approached this, the idea that this window is where there's a problem. So instead of reducing how much you get overall, we're going to just limit it during this window. I am also biased here because most of my coding time is well outside of that window. So, this doesn't really affect me and the work I do directly, but that doesn't mean it doesn't suck for a lot of users. Apparently, 7% of all users will be hitting limits they didn't before, which is still just insane when you remember how many users they have. So, I do like the idea of how they wanted to do this change and how they wanted to roll this out. But, as per usual, Anthropic has some anthropic special problems. The first one is that they didn't buy enough compute and now we as users are paying the cost of that. They also historically haven't made models as efficiently as some of the other labs do, just the amount of tokens being used and the amount of compute it takes. While we can't know what the difference is for how many watts of power are used by the different models, it is generally agreed upon that Opus is a larger model than any of the recent OpenAI releases. Although 5.4 is clearly bigger than 5.2 too was hence the slow increase in the price on the OpenAI side. It is still at least as far as we know not as big as the Opus models and certainly not as big as this Mythos model that they are allegedly dropping in the near future. And then the biggest problem by far anthropic doesn't understand humans. They do not know how to communicate with users because there is no leadership or culture around it. There are so little comms about these types of things that a random devril who's trying his goddamn hardest, Thoric, who's only been there for about a year, maybe not even if I recall, who isn't even big enough to have a goddamn badge on Twitter. 
He is now the person in charge of doing all of this communication largely because nobody else stepped up to do it. There are like 20 different people at OpenAI that come out and participate whenever anything changes, good, bad, or other. Whenever anything's going on at Anthropic, it's poor Thoric's problem. And I have so much empathy for him for being the one person who got hired into the role of dealing with this and still being in the trenches doing his best to deal with it. Especially because I know the lawyers are policing half the things he wants to say. As Joel just said, this type of [ __ ] is well above Thor's pay grade. And I couldn't agree more. His job isn't to lead all of communications and product and issues at Anthropic. He has no say over this decision. If you think harassing him about this rate limit is going to in any way, shape, or form change the likelihood that Anthropic does these things, you couldn't be more wrong. To an extent, it's Thor's job to advocate for us internally. Going after him is biting the hand that's raised for us. He doesn't have employees. He doesn't run a team. He's not an exec. He's not some manager. He's a Devril guy who's stuck trying to defend them on Twitter and try to make things clearer. And I am positive for every one of his posts that goes out, there are at least five that couldn't because some lawyer internally wouldn't let him do it. So do not give this man any [ __ ] He is not the problem. If anything, he is the closest thing we have to a solution. But the problem is much bigger and much better paid than him. It exists at a cultural level. It exists deep in this split between research, product, and users. And going after the user advocate, the person who communicates to us on behalf of anthropic because they don't have enough GPUs for their researchers. a fundamental misunderstanding of the hierarchy internally. And I get that people are mad. You should be. Things just changed in a way that was poorly communicated, blatantly communicated, and hard to understand with not enough transparency. All of that is true. None of that is his fault. And if there's any other labs who want a really good communication person that they want to, you know, unshackle a little bit and let just be real about the thing, you might want to poach him from there. I don't know how willing he is to leave, but it sucks seeing somebody trying as hard as I know he is at a company that it's so actively fighting against his ability to do the thing. Yeah, just don't go after the individuals who don't have control of the situation. I got nothing else on this. I can't believe I just spent this much time defending Anthropic and came to the same conclusion I always do. They don't understand people. They don't understand developers. They don't know how to communicate and they are not transparent enough. This has always been the case with this company. Those are the problems I perceive here. Not the change itself because they are low on compute. It sucks that they put themselves there, but they are there. I get why they did this. I do actually think they're trying to be prouser with this change. But as always, Anthropic's inability to understand the world outside of their office is burning them actively. And maybe, just maybe, they'll start listening to people who aren't on the payroll of Anthropic. And maybe as a result, they'll stop doing this type of cringe [ __ ] Fingers crossed that is the case. But I I have no confidence. 
This company doesn't seem to care about how people feel about them. They just do the wrong thing confidently and get away with it. So, we'll see where things go. Until next time, I'm sticking with Codex.
