Prime is (mostly) right about AI
Host sets up a nuanced take on Primeagen's AI economy argument and promises to unpack the real economics beyond slogans.
Prime’s take on AI economics is nuanced: compute limits, not just prices, are reshaping how big players subsidize and deploy AI today.
Summary
Theo from t3.gg dives into Prime's video on the AI economy, agreeing that the economics are shifting but arguing that the lens matters. He emphasizes that subsidized compute and capacity constraints drive behavior more than sticker prices alone. Theo walks through Anthropic's pricing experiments, Cursor's subsidy challenges, and Microsoft's and Google's strategies, anchoring the discussion in real-world numbers and timelines. He clarifies why Claude Code pricing changed, how peak/off-peak dynamics affect compute, and why enterprise demand drives policy more than consumer plans do. The breakdown covers pre-training vs. post-training costs, RLHF and RLVR fine-tuning, and why the Opus 4.5/4.6/4.7 family dynamics matter for pricing and performance. Theo highlights that compute scarcity, not just revenue, is the real bottleneck, and that Google's huge compute subsidies created a different trajectory than Microsoft's and Anthropic's. He concludes that while subsidies aren't dead, the era of effectively unlimited free compute is ending and pragmatic costs will shape adoption, with Uber and other firms illustrating the scale of enterprise compute use. As always, he encourages watching Prime's original breakdown to see where the nuance was missed and emphasizes the importance of understanding where compute is actually limited.
Key Takeaways
- Subsidized compute is the core driver of AI price dynamics; when labs subsidize heavily, end users receive more "free" compute, but capacity constraints soon curb this upside.
- Model training economics separate pre-training from post-training costs; Opus 4.5 vs. 4.6 illustrates how a new pre-training run is expensive while post-training iterations are much cheaper.
- Enterprise demand and compute capacity, not consumer pricing, mostly determine who gets access to GPUs; GitHub Copilot's signup pause and pricing changes reflect capacity triage, not pure profit seeking.
- Google's strategy relies on massive compute subsidies and free AI features, but the rapid restrictions that followed over-subscription show compute limits more than revenue motives.
- The real story is RAM/compute cost growth, not subscription changes; the economics are shifting toward compute availability as the bottleneck.
Who Is This For?
Essential viewing for developers and tech executives trying to understand why AI tools become more restricted or expensive over time, and how compute capacity drives pricing and availability in big AI platforms.
Notable Quotes
"The cracks are starting to show. The foundations looking a little bit shaky. I'm talking about the AI economy, the token economy."
—Prime frames the broader concern about AI pricing and subsidies as the starting point for Theo’s breakdown.
"Pricing the number of messages makes no sense at all."
—Theo highlights Prime’s criticism of how Copilot and similar services price usage, focusing on compute cost rather than just message counts.
"The real story isn't that consumers are getting priced out; it's that capacity is the bottleneck for enterprise compute."
—Core takeaway about compute limits driving access and pricing decisions across big labs.
"Google is subsidizing harder than anybody else and then had to pull back faster because they ran into capacity."
—Explains Google’s aggressive subsidization and subsequent tightening of access.
"RAM prices went up. The cost problem here isn’t about subscriptions—it’s about how expensive the compute has gotten."
—Theo stresses that hardware and RAM/compute costs shape pricing more than consumer plan SKUs.
Questions This Video Answers
- Why are AI compute capacities becoming the bottleneck for enterprise AI usage?
- How do pre-training and post-training costs affect the pricing of new AI models like Opus 4.5/4.6 and GPT-5.5?
- What caused GitHub Copilot to pause signups and adjust pricing recently?
- How does Google’s subsidization compare with Microsoft and Anthropic in the AI compute race?
- What is RLHF/RLVR and why do they matter for AI model performance and cost?
AI economics, compute subsidies, Anthropic, Claude Code, Microsoft Copilot, Google AI compute, pre-training vs post-training, RLHF/RLVR, Opus/Codex model lines, enterprise AI pricing, GitHub Copilot pricing
Full Transcript
So, I'm sure many of you have noticed I kind of like AI for development work. I think it's pretty solid overall. This is a big change from how I used to feel about it. And over time, I find myself reaching for AI in my development work more and more. But not every developer feels the same way. One in particular is an old friend of mine, Primeagen, and he just put up this video about how the AI economy is about to change. This isn't going to be a usual reaction video cuz I actually already watched it.
But I want to break down some of his takes cuz there's so much truth to them that I feel like the nuance of it is being missed. But on the other hand, there are some things that I feel like aren't well understood around the economics of all of this. In particular, once Microsoft and Google get mentioned, I feel like the details get a bit lost. I want to try to find a middle ground that makes sense here, cuz Prime already is kind of in the middle between the haters and the full AI psychosis folks.
I also like to think I'm somewhere in the middle there. I know people think I'm much more on the psychosis side, and I can see why. It's a great video, and I want to do my best to find the nuance within it. So, we're going to dive right into this video and talk all about Prime's takes right after a quick word from today's sponsor. You've probably already heard me talk about today's sponsor. It's Blacksmith. Wait, they ship Mac runners. Okay, I am acting here, but I was that excited, if not more so, when I heard about this, because one of the slowest parts of the T3 Code build process right now is the Mac runners.
In order to build a Mac app, you need to use a Mac. And since we were building our app for Mac, we kind of got stuck on GitHub's Mac runners, which are really, really bad. Our whole build process used to take 16 minutes. And a lot of this was time spent on the x64 Intel builds for the Mac version, which would take eight-plus minutes. They now take 3 minutes and 34 seconds. And the ARM64 builds, you know, the Apple silicon ones, would take 6 minutes and 11 seconds on GitHub. And now they take under 4 minutes, at 3:32.
This made us go from annoyed about releases to kind of excited to do them. And if you're curious why we now ship a nightly with T3 Code, it's because of how much cheaper and faster our build process has become. I'm going to be so real with you guys. If you're building a Mac app on GitHub and you're not using Blacksmith yet, you're wasting so much time. And if you're not building Mac apps, but you're still using GitHub Actions, you're still wasting time, because Blacksmith is 2x faster at less than half the price. Stop wasting time and money and build faster at soyv.link/blacksmith.
Let's see what Prime has to say. The cracks are starting to show. The foundation's looking a little bit shaky. I'm talking about the AI economy, the token economy. Uh, the first one comes courtesy of course of Anthropic, everybody's favorite, the good guys. Am I right? I love their lawyers. What ended up happening just a couple days ago is Anthropic did something called a painted door test. At least that's the term I've always heard. I think I've seen other people use fake door test, where you actually show alternative pricing on your page to see, hey, how much more money could we make if we charged people more?
How many more people would leave the pricing page? So, let me give you a hypothetical. What he's talking about here is a change that happened a few weeks ago where Anthropic wanted to figure out if they could move Claude Code out of the $20 tier, so that you had to be on the $100 or $200 tier to get Claude Code usage. The reason they wanted to make this change is because Claude Code is expensive to run and they wanted to try and save compute. There are pieces he's missing here, though. The first thing he misses is that this is not the first time the economy started to break down here.
I would argue the first time actually goes all the way back to July of last year. Cursor got hit hard in this case. I'm bringing up Cursor because previously they billed by number of messages. So you would pay an amount of money and then get a certain number of messages a month. That sucks because some messages cost a few cents to run and other ones cost many dollars to run. Fun fact, GitHub at this point in time still prices based on number of messages. They are also coming around and changing this at the start of next month.
I've actually had a terminal running trying to solve a cryptography challenge and it has been going for almost an hour and a half on one message. You get 1,500 messages a month on the $40 plan. Entirely unrealistic. And we'll come back to this, don't worry. But this was, I would argue, where the start of the end was. When Cursor, who doesn't have the ability to subsidize because they have to pay the labs for the inference, like they have to pay OpenAI and Anthropic cash for each request, they couldn't eat the subsidization because some users would do $10 of inference with their 200 messages.
Others would do thousands of dollars of inference with their 200 messages. And Cursor couldn't eat that. So they made changes where your pricing is now more reflective of what it cost to run your requests, not how many requests you did. So this was the first start of the end here. The next really big one was when Anthropic started changing when you could use the inference. This might not seem like it fits cuz it seems like more inference, but hear me out, because this was a really big change when Anthropic made it in March of this year.
A small thank you to everyone using Claude: we're doubling usage outside of peak hours for the next 2 weeks. This was the first experiment that Anthropic ran in order to figure out how to price things based on the time of day. They have a fixed number of GPUs available, a fixed amount of compute that can run requests. So when there's a bunch of users during the workday that are trying to get a ton of usage out, that means that for people who are running things 24/7, it's effectively more expensive during that window. Because if you have 100 GPUs and during the workday 95 are taken up, those last five are going to be seen very differently than during off-peak hours when you have 50 available.
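To make that capacity math concrete, here's a minimal sketch of the scarcity argument, using the hypothetical 100-GPU figures from above (fleet size and utilization are the video's illustrative numbers, not Anthropic's real ones):

```python
# Why the same request "costs" more at peak: spare capacity, not electricity.
# Fleet size and utilization are the illustrative numbers from the video.
TOTAL_GPUS = 100

def spare(busy: int) -> int:
    """GPUs left over to absorb new (e.g., subsidized) requests."""
    return TOTAL_GPUS - busy

peak_spare = spare(95)      # workday: 5 GPUs free
offpeak_spare = spare(50)   # overnight: 50 GPUs free

# One crude model: the opportunity cost of serving a request scales
# inversely with spare capacity.
print(f"peak scarcity multiplier: {offpeak_spare / peak_spare:.0f}x")  # -> 10x
```

Same hardware, same power draw, but a peak request eats into a pool that is ten times scarcer, which is exactly why a lab would nudge heavy users toward off-hours.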
So, Anthropic was trying to steer the power users towards off hours because they want that compute available during the workday. This did not go as well as they hoped. Just a week and a half later, Anthropic announced publicly that in order to manage the growing demand for Claude, they would adjust their 5-hour session limits for Free, Pro, and Max plans during peak hours. Your weekly limits remain unchanged. So, between 5:00 a.m. and 11:00 a.m. PT, or 1 to 7 p.m. GMT, you'll move through your 5-hour session limits faster than before. This is the key piece I wanted to highlight.
The previous thing was them seeing if they could push us to using off-peak hours more. It wasn't enough. So they had to do the opposite and reduce the amount of usage we get during specific hours. To go back to Cursor for a second, because again Cursor can't subsidize the same way the other labs can. They don't own this nearly infinite amount of compute and GPUs. Cursor is the customer of Anthropic and OpenAI the same way we are. So they're much more sensitive to these things. That's why they had to stop subsidizing early. And this is also why they do a lot of internal auditing, figuring out how much compute you get per dollar on these other platforms.
So internally they measure how much you can get out of a $200 Claude Code sub. Previously you could do up to $2,000 in compute. Now they're seeing even crazier numbers where you can get up to $5,000 of inference in that $200 a month plan because Anthropic is subsidizing so aggressively. So I just wanted to give a bit more history, because Prime started with the recent drama, which was them trying to move Claude Code out of the $20 plan. I see that less as trying to push people up to the $100 plan and more as Anthropic trying to get all of these low-tier users to stop using Claude Code.
Their goal is to claw back compute, not just to increase profit here. And I really want to highlight that, because Anthropic's growth is crazy in enterprise, which is where they make most of their revenue. They don't care that much about people going from the $20 plan to the $100 plan. What they care about, much more specifically, is that their compute is available so that they can use it for all the other enterprise customers. So I just want to make sure I jump on that, because the enterprise revenue is what Anthropic cares about.
The subscription revenue is mostly just a marketing thing, and the reason they do this crazy subsidization isn't because they think they can make the money back or they're just trying to crush and win. It's mostly a marketing play. The issue is that they're effectively trading compute for marketing: they own the GPUs, and giving compute away at these prices helps them be seen better and be seen more often. But now that they're low on compute, it's harder for them to get away with that. So this is much more about the compute than just spiking revenue. Skipping a bit because I want to get to the next core point, which is how the costs actually work out, as a user and as a business, for a company that is doing inference.
Well, they're making money off of inference. Like every time they do like a request call, of course they make money. And it's like, yeah, I guess if you measured how much a shoe cost by only paying the employee to sell the shoe, then you're like, yeah, you're making money off of every shoe. They're making money off of every request. Yeah, that makes sense, but you're not actually considering the real cost. Remember, people that used Opus 4.5 are now using Opus 4.6. Nobody's using Opus 4.7; no one really likes that one.
But that means all the cost that went into 4.5, if they didn't recoup that in inference revenue, 4.5 just cost them money. And this is obviously what's happening. OpenAI got a hundred-something billion; I'll talk about the OpenAI stuff in a sec. I want to talk about the examples here with Anthropic, because for the most part he is correct. If you're looking at the cost in simple numbers, like you spent $15 in API cost and that cost Anthropic $1.50 in electricity, they did make money on that. But there's a couple layers where that breaks down. The first is the subscription model, where some users are using $5,000 of inference for 200 bucks.
That's a hard enough subsidization that they're probably losing money on it even just from the electricity costs. Not counting the cost of the GPUs, not counting the cost to train or the employees or any of the other things, just counting electricity. There are users who are costing Anthropic money purely on electricity due to the sheer level of subsidization they are doing. Estimates for how much the compute costs relative to the API requests are as low as 15 to 20%. So the raw cost that Anthropic or OpenAI eats is likely 15 to 20% of what you spend on the API requests.
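A quick back-of-the-envelope on that claim, using the figures quoted in this video (a $200/month sub, roughly $5,000 of API-equivalent usage, and the 15-20% raw-cost estimate; none of these are audited numbers):

```python
# Sketch of the subsidy math quoted above. All inputs are the video's
# estimates, not official Anthropic figures.
subscription = 200              # $/month collected from the user
api_equivalent = 5000           # $ of inference at API list price
raw_cost_fraction = 0.15        # low end of the 15-20% raw-cost estimate

raw_serving_cost = api_equivalent * raw_cost_fraction
subsidy_multiple = api_equivalent / subscription

print(f"raw serving cost: ${raw_serving_cost:,.0f} vs ${subscription} collected")
print(f"subsidy multiple: {subsidy_multiple:.0f}x")
# -> ~$750 in raw cost against $200 in revenue: underwater on the heaviest
#    users even at the cheapest estimate, before GPUs, training, or salaries.
```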
But again, when you're subsidizing 20x plus, that washes out, and these companies are effectively spending money to give you that level of subsidization. And if those numbers for the cost in electricity sound crazy, I have a 5090 doing some fun training tasks on the side, and it alone raised my electric bill by about $1,000. Like, my electric bill is around $1,800 a month across my AC, my computers, and everything here in San Francisco. It's insane. Electricity is expensive, and it's cheaper other places, but it's like half the price, maybe. Running a high-end GPU at a full 1,000 watts, 24/7, you're talking $500-plus of electricity a month.
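The arithmetic behind that, as a rough sketch (the wattage and $/kWh rate here are assumed examples; real rates vary widely by region, and a loaded rig draws more than just the GPU):

```python
# Electricity cost of a rig running a high-end GPU flat out, 24/7.
# Wattage and rate are assumptions for illustration, not measured values.
rig_watts = 1400                # GPU under load plus CPU, fans, PSU losses
rate_per_kwh = 0.45             # roughly San Francisco residential territory

kwh_per_month = rig_watts / 1000 * 24 * 30      # ~1,008 kWh
print(f"~${kwh_per_month * rate_per_kwh:,.0f}/month")   # -> ~$450/month
```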
It's crazy how expensive that gets. So there are other parts here I want to break down, specifically the amortization of the price over time, where you have to make back the amount you spent on the model. There are two pieces here I want to talk about. I'm going to break this down into pre-training and post-training, because what Prime is referring to here is largely the case in the pre-training part. There's a really good Dario interview where he talks about this. There's kind of like two different ways you could describe what's happening in the model business right now.
So let's say in 2023 you train a model that costs $100 million, and then you deploy it in 2024 and it makes $200 million of revenue. Meanwhile, because of the scaling laws, in 2024 you also train a model that costs a billion, and then in 2025 you get $2 billion of revenue from that $1 billion model, and meanwhile you spend $10 billion to train the next model. So if you look in a conventional way at the profit and loss of the company, you've lost $100 million the first year, you've lost $800 million the second year, and you've lost $8 billion in the third year.
So it looks like it's getting worse and worse. But if you consider each model to be a company, the model that was trained in 2023 was profitable. You paid a hundred million and then it made 200 million of revenue. There's some cost to inference with the model, but let's just assume in this cartoonish example that even if you add those up, you're still in a good state. So if every model was a company, the model in this example is actually profitable.
So this is roughly what Prime was talking about there. If you look at each model and how much it costs, and then look at how much money it makes, it comes out as profitable. If, hypothetically speaking, Sonnet 3.5 cost 100 mil and they made back 200 mil in revenue, that model was profitable. Even if they spent 400 mil to train the next one, the company might have lost money, but the model itself, the money in versus the money out, worked out profitably. But they have to keep scaling up the cost of making the new model, which only works if their revenue scales alongside it.
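Running Dario's cartoon numbers both ways makes the distinction obvious. These are the hypothetical figures from the interview, not real financials:

```python
# Dario's 'each model is a company' framing, using the interview's
# hypothetical numbers (not real Anthropic financials).
# Each year: revenue from last year's model, spend on training the next one.
years = {
    2023: {"revenue": 0,     "train_spend": 100e6},
    2024: {"revenue": 200e6, "train_spend": 1e9},
    2025: {"revenue": 2e9,   "train_spend": 10e9},
}

for year, y in years.items():
    pnl = y["revenue"] - y["train_spend"]
    print(f"{year}: company P&L ${pnl / 1e6:+,.0f}M")
# -> 2023: -$100M, 2024: -$800M, 2025: -$8,000M. 'Worse and worse.'

# Per-model view: the 2023 model cost $100M and earned $200M in 2024; the
# 2024 model cost $1B and earned $2B in 2025. Each model, treated as its
# own company, is profitable; the losses come from scaling the next bet.
```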
The thing Prime said that I don't agree with, because it doesn't fit this model correctly, is that Opus 4.7, and Opus 4.6 even to an extent, lost money because they weren't as big of a jump as Opus 4.5 was, and 4.5, and to an extent 4.6, are the ones people are using. It's possible that they're losing money on these model drops that get replaced quicker and they move on from faster, or that don't get as much adoption. That is not the case. The reason why that's not the case is the different types of training that are done.
I'm sure you guys have noticed by now that sometimes a new model drop just feels like the last one but slightly smarter, slightly faster, slightly more capable in specific domains. And then other times it feels entirely different. Like going from GPT-5.2 to 5.3 Codex was like, oh, it talks to me more, but it's roughly as smart. And then 5.3 Codex to 5.4 was like, oh yeah, it overreaches a bit, but they felt similar. 5.5 feels entirely different, to the point where it probably should have been given a different name. Either something separate from the GPT line, or GPT-6 honestly, because the model is so fundamentally different.
The reason it's so different is because of the pre-training. There are two main types of training with models. You have the pre-training and the post-training. The pre-training is how you take all of the information, all the data you've collected, and bake it into this pile of parameters that is a model. Then you want it to behave certain ways. So the pre-training is how you get the knowledge in, effectively, and the post-training is how you get the behavior out of it. This is a gross, gross simplification of all of the stuff here. If you're into it, I highly recommend looking into it more.
It's really cool stuff. I'm just trying to get the point across here that pre-training is the big expensive thing that costs hundreds of millions if not billions of dollars of compute, to try and take these terabytes of data and compress them into this particular shape that allows you to generate text programmatically off it as an LLM. The post-training is how you refine the way it behaves. And post-training has been way more rewarding than people expected, especially RLVR and RLHF, which are reinforcement learning strategies. Turns out those are really good at making the model good at things like long tool-call runs, long agent runs, doing code, and all those types of tasks.
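If RLVR is unfamiliar, the core loop is simple: sample outputs, score them with an automatic, verifiable checker (does the code pass the test?), and push the policy toward outputs that pass. Here's a toy sketch of that shape, with a stub "policy" and made-up candidates standing in for an LLM (real post-training does PPO/GRPO-style updates on actual model weights):

```python
# Toy illustration of RL from verifiable rewards (RLVR). The 'policy' is
# just sampling weights over a tiny candidate pool; everything here is a
# made-up stand-in for the real LLM + reward machinery.
import random

def verifiable_reward(expr: str) -> float:
    """Deterministic checker: reward 1.0 if the candidate evaluates to 4."""
    try:
        return 1.0 if eval(expr) == 4 else 0.0
    except Exception:
        return 0.0

candidates = ["2 + 2", "2 * 3", "8 / 2", "1 + 1"]   # the 'model outputs'
weights = [1.0] * len(candidates)                    # the 'policy'

for _ in range(500):
    i = random.choices(range(len(candidates)), weights=weights)[0]
    reward = verifiable_reward(candidates[i])
    weights[i] = max(0.01, weights[i] + 0.1 * (reward - 0.5))  # crude update

best = max(range(len(candidates)), key=lambda i: weights[i])
print("policy now favors:", candidates[best])  # converges to a verified answer
```

The point is that the reward is checkable by a program, so this loop can be ground out cheaply at scale, which is why it works so well for code and agent tasks.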
You can RL models into getting way better at that. We know this because models like Composer 2 by Cursor aren't actually a new model. They are fine-tuned versions of older models, like Kimi K2.5. Kimi K2.5, from the Moonshot guys over in China, is an open-weight model. And since the weights are out there, you can RL it and you can fine-tune it externally yourself. This is the kind of thing anyone can do if the model weights are open. It is also, generally speaking, often much less expensive than the pre-training is. So Opus 4.5 was almost certainly new pre-training.
It was a fundamental change in how Opus works and behaves. That's why it got three times cheaper than previous Opus models: they probably made it smaller than it was before, and ended up making it way smarter than they expected to. And it is likely that from that point on, Opus 4.6 and 4.7 were post-training, which was less expensive. You can usually see this just by the cadence of the release cycle. But there are exceptions to this too. Like, some of my research friends in chat are pointing out that Grok 4 spent more compute on RL than it did on pre-training.
Even something like Composer 2 from Cursor allegedly spent 4x more compute on the post-training than Kimi did on the pre-training in the first place. So, there are exceptions here, but generally speaking, especially with the big labs, once they do a new pre-training, the future iterations aren't as expensive. They'll often have five or six of them internally, and they take the one they like the most and put that one out. The individual updates to the model, once they have that pre-training, are not these billion-dollar investments like the new models were before. So, GPT-5.5 probably cost hundreds of millions, if not billions of dollars.
5.6 probably won't be that big of a cost in comparison. Opus 4.5 probably cost near a billion dollars; Opus 4.6, probably a lot less. Just wanted to make that point because it's important to understand. Let's see what Prime has to say about OpenAI here. OpenAI got a $120 billion or so investment, and that's enough money to run for 18 to 24 months. That is like $5 to $7 billion every single month in the hole for the next 18 to 24 months. They had to do this test because they have to know how much money they can make, because if they don't make some sort of change, they're going to continue to lose billions of dollars.
Now, if this just happened by itself, I'd say, "Hey, Claude just needs to make more money. They need to start becoming more competitive cuz they're competing against OpenAI." And then I'd be like, "Okay, that's just that." But this isn't the only case of this happening, because just a couple days ago, guess who else decided to make a bit of a price change? Microsoft. Oh, beautiful Microsoft. I love Satya. I love Copilot. Isn't Copilot the greatest? You're probably confused when I say Copilot, because you're probably thinking of like one of their 50 services.
I'm talking about the GitHub one, GitHub Copilot, in this case. See, with GitHub Copilot, you used to pay some amount of money and then you had some number of actions you could execute. Well, what's the problem with that? Not every model costs the same. That's not the problem with this. Yes, different models cost different amounts, but you can just charge two or three messages per message for certain models. The problem, and this is actually the point that triggered me enough to make my response video, is not that some models are expensive and some are cheap.
It's that pricing by the number of messages makes no sense at all. Imagine if Walmart priced by the number of items you had in your cart. Or if they priced by how many people walked in: they charge you when you walk in, you go take whatever you want and leave. Some people go in and take two pieces of candy. Some people go in and grab five TVs. If you're pricing by the number of items, that doesn't work. And the worst part is that Copilot already is doing this. If you go into Copilot and you pick a different model, it'll tell you what the multiplier rate is for different models.
For example, here 5.4 is 1x, but 5.5 is 7.5x. That's the number of messages it uses. So on the $40 plan, which is what I'm on with Copilot (I believe I'm being comped because of open source, so thank you to GitHub for that), they give you 1,500 messages a month. The problem is that some messages cost pennies to respond to. Whether it is Opus or 5.4 or whatever doesn't matter. What matters is: what is it doing and how long does it take? And remember earlier I said I was running this request. This request is nearing 2 hours.
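Before getting back to that runaway request, it's worth seeing what those multipliers do to the allowance. A quick sketch using the multipliers quoted in this video (5.4 at 1x, 5.5 at 7.5x, and Opus at 15x, as mentioned later):

```python
# Effective monthly messages under Copilot-style multipliers.
# The multipliers are the ones quoted in the video; treat them as examples.
allowance = 1500
multipliers = {"GPT-5.4": 1.0, "GPT-5.5": 7.5, "Opus": 15.0}

for model, mult in multipliers.items():
    print(f"{model}: {allowance / mult:,.0f} effective messages/month")
# -> 1,500 vs 200 vs 100. None of these numbers track what an individual
#    message actually costs to serve, which is the core objection here.
```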
I sent one message and it has been going for 2 hours. This one message is going to use upwards of $100 of compute by itself. This makes no sense. You cannot charge based on the number of items in somebody's cart. You charge based on how expensive the things are. The problem with Copilot and the reason they're making this change now isn't because the subsidization is over. It's because Microsoft is just a slower moving company and it took them longer to fix this. That is all that matters here. And the 7.5x part actually emphasizes the point I want to make, which is that these numbers and these costs are not based on what the API prices are or what the labs are charging.
These prices are based on how much compute they have available to run this model, and how expensive it is for them to give that compute up to the subsidized Copilot users. Historically, Copilot users were light enough users that they could just eat the cost. If some of those $40 a month users cost them 800 bucks, whatever, it's Microsoft. They make money. They can afford that. The problem is there's enough of them now. And it's not just money that they are spending; it's compute they are losing. And they are compute constrained. They are fighting to reserve as much compute as they can on Azure for all of the other deals, for all the other companies they're working with.
They need that compute. It's not just that you're costing them too much money when you use Copilot; it's that they need the GPUs that Copilot is using to be available. The bigger deal about their announcement isn't that they changed the pricing. The bigger deal is that they paused signups entirely. You don't pause signups because you want to make more money. You pause signups because you don't have capacity. And this is the really big point I want to drive home. The problem isn't that there isn't enough money to spend. The problem isn't that they are spending too much on these users of these $20 plans and they want to get you to spend more.
This is not the traditional up-and-to-the-right problem that it is being made out to be. This isn't these companies trying to squeeze more pennies out of you as the user. They don't care about you. All they care about is their enterprise customers that they make actual money off of. And they don't have enough Nvidia graphics cards in their [ __ ] server farms to serve those customers and to sell them what makes them actual money. Do you understand how crazy it is that GitHub and Microsoft had to pause signups? That is not revenue clawing. That is not a desperate attempt to make more money.
That is trying to reserve compute so you stop losing enterprise customers. That is the real story here, and I want to make sure that is emphasized. When you're thinking about Composer 2 from Cursor, that thing costs like nothing. When you're thinking about Opus 4.7, that costs a lot more. We're talking like 20 times the cost. So obviously Copilot had to make some sort of reduction, and that's what they did: you no longer just get some amount of executions. I hate to skip over this part, but it's just entirely wrong. Like, again, the core points Prime's trying to make here are right.
He's just not in the weeds enough to know these details. And the usage-based change isn't them trying to claw money. It's them trying to make sure that users of a $200 a month thing, or a $40 a month thing, or a $20 a month thing, aren't using so much compute that they don't have any available for their customers doing enterprise [ __ ] As my research friends in chat are pointing out, capacity is going to be a huge problem. I am very scared. Chips can't be made fast enough. That is the problem here. The next section is where Prime starts comparing Microsoft to the other companies in the space, and he brings up one that's very interesting and also very different.
See, the difference between Microsoft and Anthropic is Microsoft makes money. I know, crazy concept for Anthropic, but Microsoft makes a lot of money. Whether you like it or not, they're one of the biggest companies in the world. And therefore, they can actually kind of take a bit of a nosedive for a while accumulating users. But even at this point, they're going, "Hey, we can't do this. We got to make more. This is silly." This is the same mistake as earlier. I also would argue this is a misunderstanding of the Anthropic change from before, too.
They didn't change the potential availability of Claude Code on the $20 a month plan to push people up. What they would do is heavily restrict how much Claude Code you get on the $20 a month plan so you're incentivized to upgrade. They don't care about those subscriptions. All they want is more enterprise customers. That is what the change should be seen as. The removal of Claude Code on the $20 a month Claude plans isn't them pricing you up. It is them trying to slow down the fire hose of compute being wasted as more and more users are signing up every day.
It's the exact same thing as the Copilot pause. Think of those the same way. The $20 a month change for Claude Code and the pausing of signups for Copilot are the same. This isn't because Microsoft was burning too much money and they're clawing it back and trying to make more now. This is because they both have the same problem. They're both out of compute, because neither Microsoft nor Anthropic are Nvidia. They don't have access to unlimited compute. Even Nvidia doesn't really; they can't just manufacture it, and they're limited by the companies they rely on to actually fab the silicon and [ __ ] There is not enough compute in the world right now.
That is the problem. We are working with a limited resource and the limited resource isn't money. And if we keep trying to make it about money and not the amount of compute available, you're going to start going down conspiracy theory paths that just aren't true. These behaviors do represent the end of something, but this isn't the end of the subsidy economy. This is the start of compute restrictions really affecting the way companies think. That is the key difference here. Kic just linked me this tweet from two years ago from Sam. It's actually almost three years now.
We are pausing new ChatGPT Plus signups for a bit (frowny face). The surge in usage post-DevDay has exceeded our capacity and we want to make sure everyone has a great experience. You can still sign up to be notified within the app when subs reopen. Again, OpenAI was compute limited, so they bought literally all of the compute they possibly could. It started coming in last year and this year, and now they don't have these problems. Not because they have more money than the other companies or they're more willing to subsidize than the other companies.
They just have more compute. This all comes down to compute in the end. All of this, even Opus 4.7 being dumber, is arguably because of compute. What's going on here? This doesn't even make sense according to Microsoft. Now, the real winner honestly from all of this is Google. Classic Google. They are pouring like a hundred-plus billion dollars a year into AI, and they can just do that. And guess what? After they pour a hundred billion, $200 billion into AI, they still make money. That's wild. Like, they can do that year after year, and they don't have to worry about whether investors will still find them attractive.
That's probably why you're not getting the same level of hype coming from Google that you get from these companies. I just think this whole part is just entirely wrong. That's not how it's working for Google. Google is providing more free compute than anyone right now, by far. By [ __ ] far. I'll give an example. I'm going to Google search for "the Primeagen" and it just shows me his links. But watch what happens if I ask "who is the Primeagen." Now we get an AI overview. This is free compute. Watch what happens when I do it on a signed-out account.
Oh, it still did it. It still did it. These AI overviews are free compute. "How did he get popular?" Now I am doing literal free compute on google.com. Signed out, incognito browser, doing this for free. To the people saying, "Oh, isn't it cached?": why am I able to do follow-ups? This is a full built-in chat interface as part of Google. They do a shitload of free compute. And I have been told by so many people that they're using Antigravity because of how aggressive the subsidization was. In fact, the Google models are so bad that they included Opus 4.5 in your Google subscription so that you could use it in Antigravity.
And I knew a number of people that were using Antigravity just for the subsidized Opus usage within it. And that ended up exploding so hard that Google was the first company to start doing restrictions on where you can use the inference, where people who were even just building plugins to track their Antigravity usage started getting banned. People who were doing plugins to link up Antigravity with opencode and [ __ ] all started getting banned as well. They've been aggressive with that because they were too generous initially and have had to walk it back since.
The reason you don't think of Google as subsidizing is because they did it so aggressively initially that they had to claw it back first. And Google models are so [ __ ] trash that nobody talks about it. So this is just an entire blind spot for many people in the developer world. Google is doing more free compute than anybody else. They were subsidizing harder than anybody else. They had to claw back faster than anybody else and more aggressively than anybody else. Google's arguably the most extreme example of all of this, and they make their own compute, because they make their TPUs, and they are really behind on that, to the point where there are rumors that they use CPUs both for training and for inference because that's what they have around.
Google is as compute-constrained, if not more so, than the other companies, and has acted accordingly. I don't blame anyone for not being in the weeds on this [ __ ] because you got to pay a lot of attention to notice these things happening, but this is just a big miss. Google is as good of an example as you can get of the things I'm talking about here. The subsidized compute era isn't over, but it is closing, because the amount of compute available is less and less. The real story isn't that your subscriptions are being limited or the $20 a month tiers are getting more restricted.
The real story is that RAM prices went up. That is what this really should be focused on. The cost problem here isn't one related to what you're getting for 20 bucks. It's one related to how expensive the compute has gotten. Think about that for a second. Google's also competing on the frontier. Google's also attempting to win the market. Google's also trying to convince everybody that their AI is the best AI, that they're going to be able to shepherd it and take it into the future, despite the fact that they invented the T in GPT and somehow fumbled the bag by not being the first one to market.
Nobody knows the answer to that one. And AI is really expensive. Uber just got done claiming that within 4 months they spent their entire year's budget on AI. Gee, I can't believe this is happening. How could you tell every employee to maximally use AI? By the way, we're judging you on AI usage. Oh my gosh, you're using too much AI. How'd that happen? What the hell? You're not supposed to be using a year's budget in 4 months. What are you even doing? What are people doing with all those tokens? And here is where the point I've been making the whole time comes in.
Uber is not using the $200 a month subscriptions for Claude Code. They're certainly not using the $20 a month ones. Uber is using inference through the API. So they are paying directly to Anthropic and OpenAI. The API costs are what are being talked about here. They are paying full rate. My usage of these tools is through the personal subscriptions, which are way more generous. I am probably doing similar amounts of inference to the average employee at Uber. The difference is I pay 200 bucks a month for it. They pay 2,000 bucks for it. That's the key part here.
I have heard crazy numbers from other companies that I'm hopeful I can share in the near future, from inside sources in many different places. I know companies where there are engineers spending more on compute and more on inference than they are paid in salary, pretty often, because the companies have to be under different terms, because a lot of these subscriptions explicitly say you can't use them when you should be negotiating an enterprise contract. By the way, that one message I sent earlier is still going. We're now over 2 hours of this message going.
And if you're wondering why you can't use the sub for your enterprise job, I just asked outright. Here you go. The consumer terms exclude commercial offerings by design. They open with a notice clarifying that the commercial terms govern API keys, the console, and other offerings that reference them; that explicitly does not include Claude.ai or Claude Pro for individuals. Claude Code's own legal compliance page confirms the split: commercial terms apply to Team plans, consumer terms apply to Free. Then there's the ordinary individual usage assumption. The Claude Code legal page states that advertised usage limits for Pro and Max plans assume ordinary individual usage of Claude Code.
This is a soft cap on what counts as legitimate use. There's also a whole section about business domains and non-commercial evaluation: sign up with an email owned by your employer and your account may be linked to your org's enterprise, yada yada yada. Where Anthropic permits limited evaluation, the evaluation use is restricted to personal, non-commercial use only. They also prevent you from using it to make competing products, which basically everyone is doing at this point. And the biggest deal is that when you're on the subscription tiers, your data can be used for training even after opt-out.
Flagged content and feedback can still be used. There is no ZDR (zero data retention). There is no proper isolation of your data. And most businesses will never let you get away with that. BAAs only extend to Claude Code if the customer has executed a BAA and has zero data retention enabled, both of which require an enterprise contract. Max plans cannot be HIPAA-covered. They want you on enterprise plans. If you're doing enterprise work, just be realistic here. So again, if you're at an enterprise, you're not paying the 200 bucks a month. You're paying for your tokens, and your tokens are much more expensive.
Just want to make sure you understand that cuz a lot of people don't. I don't even know. Like I don't think we're going to be like, "Well, back it up everybody. We're not going to use AI anymore. You're going to have to go back to hand coding everything because AI is just not economically viable." No, no, no, no, no, no, no, no, no, no. They will find a way. Sure, it may be in a couple years, but you know what? They're going to find a way to make this thing viable, but for now, we're starting to see the cracks.
Things just can't be as free as they once were, and the amount of usage you're going to be getting is clearly and obviously going down. I don't want to be the person that's like, "Oh, Mr. Anti-AI for all reasons." That's why I make these videos: to show you how ridiculous their marketing is. This is why they do such hyped-out marketing. This is why Dario is constantly telling you, "Hey, you're out of a job here very, very soon. Yeah, we're going to take your job." I feel super bad about it.
Oh, I want to talk a bit about the point he just made before, about the amount of compute we get going down. I really don't see that. If you measure it purely by dollars, sure. If you measure it by tokens to the best frontier models, maybe. But the cost of intelligence is consistently going down. A good example of this is the Artificial Analysis intelligence index. GPT-5.5 is a more expensive model than the prior one; they 2x'd the token costs, but they also made it more efficient. And if you look at the highest-end x-high versions of the highest-end models, 5.5 is smarter than 5.4.
5.5 got 60 points and 5.4 got 57. It is smarter. Where this gets much more interesting is when you turn on 5.5 medium. When you put 5.5 medium in the graphs, you'll notice 5.5 medium performed roughly as well as 5.4 x-high did. Same score there. But what's much more interesting is the actual cost to run. This isn't the cost per token. People get too crazy about those numbers. To do the grocery store example again: if somebody was buying grapes and you wanted to charge them, you could charge them by the number of grapes or by the weight of the grapes.
And we are looking at the cost per grape, effectively, when we look at the token counts. But the actual cost is not just how expensive it is per token; it's how many tokens were needed to solve your problem. And that shows very interesting information here with GPT-5.4 and 5.5 x-high. Remember, despite 5.5 being 2x more expensive per token, it used enough fewer tokens that it only cost like 20% or so more, from 2,850 bucks to 3,350 bucks, for the exact same benchmark run. And Sonnet, which is cheaper than GPT-5.5 per token, ended up being even more expensive because it used way more tokens.
And obviously, Opus is basically double the price at $5,300. But here's where things get cool, where we don't have to worry about cost going up a ton. If you were happy with how smart 5.4 was, you're in for a treat, because 5.5 medium is just as smart and costs less than half as much. 5.5 medium cost $1,200 where 5.4 x-high costs $2,800. That is what's exciting here. While the frontier continues to get more expensive as we're pouring more compute into it, at any given level of intelligence, prices are dropping fast. 5.4 happened a few months ago, like two months ago, not even.
And 5.5, a month and a half or so later, took that level of intelligence, cut the price in half, and then offered more intelligence for 20% more money. And if you're willing to go a little bit lower and you want to try out 5.5 low, which is honestly quickly becoming my default, it only cost $500 to run this. That is a sixth the price of 5.4. And the intelligence level isn't that far off. Like, yeah, it's dumber, but it's still 51 points. That still puts it up there with Claude Sonnet 4.6, which was way more expensive.
And again, if we compare the number of tokens, things are very entertaining, because Sonnet 4.6 max did 200 million tokens to run this bench. Opus was more efficient at 110 million, but those tokens are very expensive. 5.4 x-high did 120 mil tokens, so slightly more than Opus. When we look down here, 5.5 was down to 75 million. Massive token reduction. 5.5 medium was only 22 million, and 5.5 low used the lowest amount of tokens of anything we have selected, at 7 million tokens, to score as high as DeepSeek V4. The models on the furthest left and right in this chart got the same score.
The one that used the most tokens, DeepSeek V4, and the one that used the least tokens, 5.5 low, got the same score. That's what's so cool. The labs that are putting their effort in the right places are making things more efficient. There is effort to make these things cheaper. You might not see this because you look at how expensive the subscriptions are, how many messages you get, or how expensive the tokens are, but when you look at how expensive real work is to complete, the cost is going down rapidly, as long as you're not using Anthropic models.
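You can redo the chart's arithmetic yourself: what you actually pay for a task is tokens used times price per token, not the per-token sticker. A sketch using the benchmark-run numbers quoted above (read off the Artificial Analysis chart in the video; treat them as illustrative):

```python
# Cost per task vs cost per token, using the run costs and token counts
# quoted in the video (illustrative figures, not official pricing).
runs = [
    # (model, total $ to run the benchmark, tokens used in millions)
    ("Sonnet 4.6 max", None, 200),   # total cost not quoted, tokens were
    ("GPT-5.4 x-high", 2850, 120),
    ("Opus",           5300, 110),
    ("GPT-5.5 x-high", 3350,  75),
    ("GPT-5.5 medium", 1200,  22),
    ("GPT-5.5 low",     500,   7),
]

for model, cost, mtok in runs:
    cost_str = f"${cost:,}" if cost is not None else "   n/a"
    print(f"{model:>15}: {cost_str:>7} total, {mtok:>3}M tokens")
# 5.5 is ~2x pricier per token than 5.4, yet its x-high run cost only ~20%
# more, and its medium/low modes finish the same work for a fraction,
# because the real lever is how many tokens the model needs.
```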
So again, just to emphasize how silly and [ __ ] all the numbers here are, and that the way things are priced to you is not based on the amount of electricity that it costs, but on the availability of compute. We've now established that 5.5, despite being 2x more expensive by tokens, ends up being a lot cheaper per run, especially if you use it on medium or low. It ends up being way cheaper than 5.4. To go back to our favorite example from earlier, Microsoft: why is it that 5.5, which worst case is 2x more expensive, but generally speaking is actually quite a bit cheaper than 5.4, is 7.5x messages per request when 5.4 is 1x?
If this was about the cost to run, these numbers make literally no sense at all. The reason these numbers are set this way, and that Opus is 15x, is because this actually comes down to how much provisioning is available. How many GPUs are running 5.4? How many are running 5.5? How much demand is there for those different clusters? How much availability is there in those clusters? 5.5 costs Microsoft more in opportunity to serve, because they're trying to sell the 5.5 inference to companies and enterprises right now. These numbers are not meant to actually make them more or less money.
They are meant to get you, as an individual paying 40 bucks a month, out of the way of random Fortune 500 companies that are paying way more per token. That's all this is. And if you think of this any differently than that, you are falling for conspiracy theory [ __ ] That isn't actually how any of it works. And by the way, we're now way over 2 hours and this one request is still running, and my Copilot usage is still at 0.4%. So if you think that this change that GitHub Copilot is making is them being evil and greedy, you just don't understand how these costs work out.
You just don't. And that's fine, but don't complain about things you don't understand. Go understand it. So I wanted to jump on that because it's just a very, very common misconception that I've seen around a lot. I just feel so bad about me taking all of your money. I'm so sorry. It's so dangerous. But this is the reason why they're doing it. They need to raise money. This is why Google doesn't do it. They don't need to raise the same kind of capital that Dario and Sam need to raise. Yeah. Again, this is just wrong.
Google was subsidizing really hard. Google and Microsoft's problem isn't that they have more money so they could subsidize longer, and Google chose not to while Microsoft chose to. It's just that at enterprises these things take a longer time to do and do right. Microsoft doesn't want to take hits to their reputation, so they extended Copilot's generosity for longer than they should have. Google doesn't really know what reputation means, so they made changes way quicker that made their already bad product basically unusable. That's the other piece here: Copilot being part of VS Code makes it solid. Generally speaking, the Copilot developer ecosystem is okay.
It's not my favorite thing. There are other harnesses I like much more, but the Copilot CLI is totally fine. This is not the case for Google's solutions. Google's CLI is not great, and Google's IDE is absolute garbage. So, Google didn't have much to lose there. But I feel like all of these nuances were missed because there's so much nuance here. There are so many things you have to be tapped in on to understand what's going on. And again, I do not fault Prime for not understanding these things and not communicating these things, because his core point still stands.
Things are changing, and we're getting used to things working one way. It is getting different. But I also think they're getting better in a lot of ways, too. The fact that you can pay a hundred bucks a month and have enough compute to build like three full startups as an end user is unbelievable. The fact that these models are getting so much more efficient and smarter is incredible. My previous video where I said that we were hitting the ceiling for what models were capable of is so stupid in hindsight that it's actually kind of funny.
So, I have made much dumber flubs and much less informed videos and opinions than anything I'm talking about here, but I ended up going way deeper in the weeds and learning all of these things. So, that's why I wanted to jump on this. I still think you should go watch the original video. It's linked in the description. While I overall agree with Prime's analysis here that the economics of all of this are shifting and changing, I think it's a lot less about how much money they can squeeze out of you as an end user and it's much more about the amount of compute they have available.
That's all I have to say on this one. I hope this breakdown was helpful and until next time, save your GPUs.