The AI bubble is bursting

Syntax| 00:22:47|Jun 12, 2026
Explains Copilot's move from usage-based to token-based pricing and why it matters for costs.

Copilot’s switch to token-based pricing stings for heavy users, pushing many toward cheaper local models and DIY workflows.

Summary

Syntax hosts CJ and Wes dig into the growing pains of AI pricing and the policy shifts that followed Copilot’s move to token-based pricing on June 1st. They recount real-world bills, such as CJ’s $440 usage-based tab for March versus a projected $1,800 under token pricing, and Uber’s reported $1,500-a-month cap on employee AI spend. The discussion expands beyond Copilot to the seven in-house models Microsoft announced at Build, Google’s Gemini 3.5 Flash and the expected Gemini 3.5 Pro, and the wider market push toward cheaper, more controllable options such as Haiku-class models, Cursor Composer, and local alternatives. Wes argues this marks a healing phase where cheaper, local or self-hosted models become more viable, while CJ notes the industry’s race to cheaper tokens, better hardware, and more diverse providers. The duo also considers the impact on workflows, agentic automation, and the balance between convenience and cost, sharing tips on prompt-cache expiry and on model selection (Opus 4.6 versus Opus 4.7 and its higher request multiplier, and MAI Code One Flash). In the end, they’re cautiously optimistic that competition and cheaper models will reshape how developers actually use AI day-to-day.

Key Takeaways

  • Copilot’s pricing moved from usage-based to token-based, with a notable bite on larger chats and long-running agentic workflows.
  • March usage on Copilot Pro: a $440 bill; token-based pricing would have pushed it to almost $1,800.
  • Opus model shifts (e.g., Opus 4.6 to Opus 4.7, whose request multiplier rose from 3 to 7.5) can dramatically alter cost per request, changing what users actually pay.
  • Microsoft Build introduced seven of Microsoft’s own models, including MAI Code Flash, which Microsoft can host on its own infrastructure and offer more cheaply.
  • Local/open models (Qwen 3.6, Gemma 4, MiniMax, Kimi K2) emerge as cheaper, viable alternatives for those willing to run workloads locally or on cheaper cloud inference.
  • Cursor Composer is highlighted as a low-cost, high-performance tool, often beating newer, pricier cloud options for many tasks.
  • The market is tilting toward cheaper tokens and cheaper, smaller models, with a longer-term shift toward local hardware and open-source options.
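
The multiplier effect in the takeaways above can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions: the multipliers (3 and 7.5) come from the episode, but the function name, the 100 raw requests, and the idea that billed premium requests equal raw requests times the multiplier are illustrative readings of the hosts' description, not a documented billing formula.

```python
# Hypothetical helper: billed premium requests = raw requests x model multiplier.
def premium_requests(raw_requests: int, multiplier: float) -> float:
    return raw_requests * multiplier


OPUS_46_MULTIPLIER = 3.0   # multiplier quoted in the episode
OPUS_47_MULTIPLIER = 7.5   # multiplier quoted in the episode
RAW_REQUESTS = 100         # illustrative size of one overnight run (assumption)

before = premium_requests(RAW_REQUESTS, OPUS_46_MULTIPLIER)
after = premium_requests(RAW_REQUESTS, OPUS_47_MULTIPLIER)
ratio = after / before

print(f"Opus 4.6: {before:.0f} premium requests")
print(f"Opus 4.7: {after:.0f} premium requests")
print(f"Increase: {ratio:.1f}x for the same work")
```

Under these assumptions the same workload consumes 2.5 times as many premium requests after the forced model switch, which is why a smaller out-of-pocket bill can still hide a much larger token-priced one.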

Who Is This For?

This is essential viewing for developers and managers who rely on AI copilots and token-based pricing. It explains how to navigate rising costs, choose between big-model bets and local alternatives, and plan workflows that don’t blow budgets.

Notable Quotes

"The way usage based worked was every request you made counted as one usage and that was regardless of number of input tokens or output tokens."
Explains the core difference between usage-based pricing and token-based pricing.
"With token pricing, every new request would actually get slightly more expensive… but basically a message gets longer the more tokens you accumulate."
Describes why token pricing can explode costs on long threads.
"My bill for March would have been almost $1,800 under token pricing."
Concrete example of how token pricing could have drastically increased costs.
"Microsoft Build announced seven of their own models… including MAI Code Flash and MAI Thinking One."
Notes major move toward in-house models as a cost-control strategy.
"Cursor Composer is better and cheaper than the Microsoft model for a lot of tasks."
Highlights a practical alternative that saves money.

Questions This Video Answers

  • How does token-based pricing differ from usage-based pricing in AI copilots like Copilot?
  • Why did Copilot switch to token-based pricing and what does that mean for long-running AI workflows?
  • What are the cheapest viable AI models for developers right now and how do you run them locally?
  • Will Microsoft, Google, and other giants eventually replace external models with their own, cheaper options?
  • What is Cursor Composer and why might it outperform newer cloud models on cost and speed?
AI pricing, Copilot, token-based pricing, usage-based pricing, Opus models, Microsoft Build, Gemini 3 Pro, Claude/Haiku, Cursor Composer, local AI models
Full Transcript
Welcome to Syntax. It's happening. The AI bubble is bursting. People are writing code by hand again. There's a lot of recent news about the price of AI increasing. Companies are limiting the amount of AI that their employees can use. And we're going to talk all about it. Today I am joined by Wes. How's it going, Wes? I don't know how to feel about this. Part of me is just like, I told you so. This stuff is expensive and they weren't making any money on it, and now they need to make money. There's a whole bunch of new models, but that's how I'm kind of feeling right now. You know, I'm excited to talk about this. Nice. The biggest news is Copilot announced that they're switching from usage-based pricing to token-based pricing. And I have been a big user of Copilot because I was combining it with OpenCode, and it gives you access to the Claude models and the GPT models and Gemini models. Basically, with one subscription you get access to a bunch of different models. You can use it in VS Code. You can use it with OpenCode or whatever else. I loved it. I was on the premium Copilot tier and also paying a little bit more for usage, and I had a bunch of agentic flows that were running overnight, and it was only costing me a few hundred bucks a month. But they announced that instead of usage-based they're going to token-based, and I'll show my bills later in the episode. You'll see how much they would have increased by. But it's not great. It's not great, and I'm going to have to figure out a new way to run AI. So explain the difference between usage and token. Like before, was it simply just the amount of times that you used it? Well, honestly, I also saw it being bound to happen because it didn't make sense economically. So the way usage-based worked was every request you made counted as one request, one usage, and that was regardless of number of input tokens or output tokens.
And so if you are in, let's say, a long chat with lots of messages, lots of back-and-forth code, the number of tokens increases with each new message, with each new response. And if you were using a service that was token-based, every new request would actually get slightly more expensive, not taking into account tokens that were cached before. But basically a message thread gets longer and longer, and that means more and more tokens. And so if you're getting billed based on token usage, you're going to pay a little bit more for each new request. But the way that Copilot was working, it was just an individual request. So regardless of how many tokens, you just got charged one premium request. And this is the issue. This is honestly why Copilot was kind of GOAT for several months, because I had massive chats and massive histories and massive agentic workflows where I would only have to pay like six or seven usage requests, even though it's several hundred thousand tokens that are building up over time. So it was honestly the best-kept secret. I knew that it was going to happen at some point, but I only got maybe three months of usage out of it before they announced that they're going to token-based. And with that, they also increased the pricing of a lot of these models. So if you were using some of the newer ones like Opus 4.7, Opus 4.8, then it was, I don't know, something like 10 times more expensive than it previously was, which is absolutely nuts.
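
The difference described here, a flat charge per request versus a charge that grows with the thread, can be sketched in a few lines. Everything numeric below is a placeholder chosen for illustration (the per-token price, the tokens added per turn, and the flat per-request price are assumptions, not Copilot's actual rates), and the model ignores output tokens and caching.

```python
# Sketch: cost of each request when the whole thread is re-sent as input.
def request_costs(turns: int, tokens_per_turn: int,
                  usd_per_million_tokens: float) -> list[float]:
    costs = []
    for turn in range(1, turns + 1):
        # By turn N the request carries N turns' worth of history.
        input_tokens = turn * tokens_per_turn
        costs.append(input_tokens * usd_per_million_tokens / 1_000_000)
    return costs


TURNS = 10
FLAT_USD_PER_REQUEST = 0.04  # hypothetical flat price per request

token_costs = request_costs(TURNS, tokens_per_turn=20_000,
                            usd_per_million_tokens=5.0)
token_total = sum(token_costs)
flat_total = TURNS * FLAT_USD_PER_REQUEST

print(f"first request: ${token_costs[0]:.2f}, last request: ${token_costs[-1]:.2f}")
print(f"token-based total: ${token_total:.2f}")
print(f"request-based total: ${flat_total:.2f}")
```

Because each request re-sends the accumulated history, the token-based total grows roughly with the square of the number of turns, while the request-based total grows linearly.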
And you're seeing these tweets from people being like, it's done, you know: one message, five messages, I've used 27% of my monthly thing. People are finding out that you just send a couple messages, or if you aren't smart about it with the caching, if you come back a day later and just send one little message like "thanks", that entire message is now uncached. They only cache them for a short period of time. Each model's slightly different, but you could accidentally... I've had chats that were like $90 back and forth, and then you come back a day later like, "Oh, I better not even send hello, because that could be a $90 hello." Yeah. Now, this went into effect on June 1st, and immediately you saw people just kind of freaking out, being like, I'm blowing through my budget really quickly. I was on the $10 or $20 a month plan and it's seemingly useless to me right now. And then around the same time, we saw that Uber is reportedly capping employees at $1,500 a month after blowing through its AI budget. So again, you're seeing people just kind of going nuts with it. And this stuff is really expensive. $1,500 a month on this is not nothing, but very, very doable. There's many days where if I look at my usage, I could spend what, 200 bucks, 300 bucks in a relatively productive day. And then you hear other people that are saying, I'm using $8,000 a month or $8,000 a day on these tokens, which is absolutely nuts. Yeah. I guess I'll show my usage bill now. You all have waited long enough. Yes, you can see what I got. And if you were using Copilot, they published this blog post in the middle of May that mentioned that the usage reports are now available. And so basically you go into your GitHub settings, go into AI usage, and then you can download a report for any given month.
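
The "expensive hello" effect described above comes from cache expiry: while a thread's history is still cached it is billed at a discount, and once the cache lapses the whole history is billed at the full input rate again. The sketch below is a simplified model with hypothetical numbers throughout (price, discount, time-to-live, and thread size are all assumptions, and cache-write surcharges are ignored), so it illustrates the shape of the problem rather than any provider's real pricing.

```python
# Sketch: cost of one short follow-up message on a long thread,
# depending on whether the provider's prompt cache is still warm.
def followup_cost(history_tokens: int, new_tokens: int,
                  seconds_since_last: int, cache_ttl_seconds: int,
                  usd_per_million: float, cached_discount: float) -> float:
    cache_alive = seconds_since_last <= cache_ttl_seconds
    # Cached history is billed at a fraction of the normal input price.
    history_rate = usd_per_million * cached_discount if cache_alive else usd_per_million
    return (history_tokens * history_rate + new_tokens * usd_per_million) / 1_000_000


# All values below are illustrative placeholders.
common = dict(history_tokens=400_000, new_tokens=10, cache_ttl_seconds=300,
              usd_per_million=5.0, cached_discount=0.1)

warm = followup_cost(seconds_since_last=60, **common)       # reply within a minute
cold = followup_cost(seconds_since_last=86_400, **common)   # reply a day later

print(f"warm cache: ${warm:.4f}")
print(f"cold cache: ${cold:.4f}")
```

With these placeholder values, the same ten-token reply costs about ten times more a day later than it does within the cache window, because the entire history is re-billed at the full rate.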
And so, for me, for the months of March, April, and May, I was all on usage-based pricing. And then June 1st swapped over to token-based pricing. But they'll send you a CSV, and then they have this website where you can upload the CSV to preview it. So this was my usage for the month of March this year. My bill that I paid was $440. I'm on the $39 a month Copilot Pro plan plus the extra usage. So I actually paid for overage, but my bill for the month of March was $440, and that's everything for GitHub. If they would have been using token-based pricing, my bill would have been almost $1,800. You would not get a job at Uber with this bill, CJ. I was using too much AI. And again, March is when I really got heavy into agentic workflows. And I think honestly a lot of people did. And that's why companies are having a hard time reacting to all of this. Because I would have workflows that would run overnight, like six, seven hours overnight. And it would use a ton of requests. And actually not a ton. I would consistently be watching the number of requests, because with the Pro plan you get, I think it's like 1,500 included requests, and a run overnight would use maybe a hundred of them. Which is not bad. So yeah, this was the month of March. This was my heaviest AI usage, where I paid out of pocket 440 bucks. And then for the month of April I spent a lot less out of pocket. So $120 out of pocket. But this is what I wanted to mention earlier. This was when they released Opus 4.7. So what happened was I was using Opus 4.6 for everything. But then when they released Opus 4.7, they disabled access to Opus 4.6 in Copilot. So you couldn't use Opus 4.6 at all on the Pro plan. You had to use Opus 4.7. And Opus 4.7 had a multiplier of 7.5 instead of three.
So all of a sudden I have the model that I want to use at basically almost double the amount of requests every time I'm trying to use it. So even though I only paid $120 out of pocket, that would have been a $1,400 bill. And this is not just a GitHub Copilot problem. We're seeing this with absolutely everybody who offers tokens up, right? Claude has been clamping down on what you're allowed to do with it. Most of the heavy unmetered usage has to happen on their $200 a month plan inside of Claude Code. You can't use it with other applications. There's a whole bunch of gotchas as to how you can actually use that. And now Codex, or OpenAI, seems to be in a pretty good spot right now, because they're in this spot being like, "Sure, use it with anything you want, and we love open source, and we love X, Y, and Z," and everybody is just like, well, they're doing it. And it's like, yeah, they're saying, we will bleed a little bit more to get you onto our side. But if you think for a second that they're not going to do the same thing that Anthropic and Copilot are doing right now, you're dreaming, because this stuff is costing them a crazy amount of money to get people on. But they want you addicted. They're trying to figure it all out. They want that data so they can train their next model. Definitely. Now, Microsoft Build, as of the time we're recording, this was yesterday: they announced a whole seven of their own models. And this is what I thought was happening. They were clamping it down, and they weren't simply just going to ruin their product. They were going to come out with their own model, which they control. They can host it themselves.
They have the infrastructure and it'll be somewhat cheaper to do this. So they rolled out seven different models. The two interesting ones for developers are MAI Code Flash, or Code One Flash, and MAI Thinking One. The other ones are image, text-to-speech, speech-to-text, those types of things. And in Copilot, they've only rolled out, as of the time of recording, the Flash one. And the Flash one is just equivalent to Anthropic's Haiku. So it's a cheap and fast model. And often these models are used for sort of the dirty work, you know, like parse an output, list a whole bunch of files and find the ones that you're thinking about, the grunt work. This is relatively low-stakes type stuff. You don't need a very expensive model. And people are always like, "Oh, I'm not using Haiku or whatever." But if you're using Claude Code, they're using Haiku to do a lot of their sub-agent type stuff, and they're just using the expensive one as sort of the main orchestrator. So that's what their Code One Flash is for. And then they also have this Thinking One, which is not, at the time of recording, in anything, and they've not revealed any pricing for it either, but it's the equivalent of a Sonnet 4.6. Sonnet is a fantastic model. I think a lot of people don't realize you can get most of your coding done with a Sonnet-level model. But to my surprise, they didn't... I thought for sure they were going to be replacing a big boy, you know, you're going to get a big GPT or an Opus-level model for everyone to work with, but it's not something that they had announced. Yeah. And I think you made a video about this, but there was an announcement earlier in the month that Microsoft stopped their devs from using Claude internally. Yes.
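
The sub-agent pattern mentioned in this stretch of the conversation (a cheap, fast model for grunt work, an expensive model as the main orchestrator) can be sketched as a simple routing function. This is an illustration only: the model identifiers, the task kinds, and the `Task` type are hypothetical placeholders, and nothing here reflects how Claude Code or Copilot actually route work internally.

```python
from dataclasses import dataclass

# Hypothetical placeholder identifiers, not real model names.
CHEAP_MODEL = "cheap-fast-model"
EXPENSIVE_MODEL = "expensive-orchestrator-model"

# Task kinds treated as low-stakes grunt work (assumed categories).
GRUNT_KINDS = {"parse_output", "list_files", "search_files"}


@dataclass(frozen=True)
class Task:
    description: str
    kind: str


def pick_model(task: Task) -> str:
    """Send grunt work to the cheap model, everything else to the expensive one."""
    return CHEAP_MODEL if task.kind in GRUNT_KINDS else EXPENSIVE_MODEL


tasks = [
    Task("Parse the test runner output", "parse_output"),
    Task("Find files that mention the billing module", "search_files"),
    Task("Plan the refactor across services", "planning"),
]

for task in tasks:
    print(f"{task.description} -> {pick_model(task)}")
```

The design choice is that only the planning step pays the expensive per-token rate; the high-volume, low-difficulty calls go to the cheap tier.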
And that kind of led to this like, oh, well, maybe they're making their own. I mean, it's twofold, right? Claude's getting more expensive, so maybe they don't want to pay for their devs to use it, but maybe they also want their devs to dogfood these models that they're working on internally. And so maybe we'll see an Opus-level model come out. But this is all they announced. And I've been doing a lot of local AI stuff, and this isn't any better than the models that I can run on my local AI machine. So, hopefully. Surprising. The same thing happened with Google. We had Google I/O a couple weeks ago. They announced Gemini 3.5 Flash, which again is Flash, their cheap, fast model, and they said in June 2026, so maybe by the time you're watching this, Gemini 3.5 Pro is going to come out. And we're all kind of sitting here hoping that that is going to be an Opus-level model for us to use. And you've got to think Microsoft has been training their own version of this. Maybe it's just not ready yet, you know? Yeah, I hope so. One more thing on your point of Microsoft not being able to use Claude. I think what they did was they said no more Claude Code. We have to use our own harness. We want to be able to dogfood it. But I think, by looking at the dashboards of everything they announced at Microsoft Build, they have Claude-isms all over the place. You know, there's the text spacing, there's the little dots, all of the things that we're used to seeing in the design. So yeah, for sure. I don't think that they're going to take that away from some of their top engineers. The people that are working on this stuff are cranking to catch up right now. I don't think they're going to be like, well, use our kind of crappy model to do it. Now, can I also say, so we're on this Microsoft.ai website.
This looks like an Anthropic website. The background color, the font choice, this is very much Anthropic vibes on the Microsoft website. I like it. It's beautiful. It's good. Yeah. But it is very much an Anthropic look. And so, like you're saying, I do wonder if Claude had any involvement in building up the styles for this. But that's actually kind of interesting, because I've never had Claude kick out a Claude-looking website. And I'm always curious about what that sauce is internally. It's probably talented people. But we'll see. I do like this "accessibility mode off". What does that even do? I noticed at one point it just stopped the animation, but why did they have to put a button for that? That's a very Claude thing: "Oh, I need to account for X, Y, or Z. Button." Put a button on it. If statement or a button. So what's the solution to all of this stuff? If we eventually do... like, Anthropic's going public, they're going to have to start making money. We're going to start getting squeezed out. It's great that we will soon be in a spot where we possibly have five very good providers, maybe even more, right? We obviously have Anthropic and OpenAI, but then Microsoft's going to have their own, hopefully the big boy. xAI is probably going to have their own coding. We're starting to see it launch with their Grok build tool. And then Google, obviously, putting out their Gemini 3 Pro. And then there's more, right? Like Meta, you just mentioned DeepSeek. Maybe let's talk about some of those. What's the solution here? It's more competition, but probably the other solution is cheaper models. Yeah. And I almost feel like there might be just a shift in how we're using AI, because a lot of people were vibe coding or were running those agentic flows where AI is writing everything.
But if the models get cheaper, that means maybe they'll be a little bit dumber, and that means maybe we have to write a little bit more code ourselves. So it kind of feels like the world is healing a bit. But in terms of the cheaper models, there's Kimi K2, there's MiniMax, Qwen 3.6 has come out. I've been using Qwen 3.6 on my local AI, and also Gemma 4. So Gemma 4 is an open-weight model that was released by Google, but it's comparable to Qwen 3.5. And I guess those are Haiku- or Sonnet-level models that you can run locally. They're, I think, between 27 and 36 billion parameters. So they can run on local systems, or you could actually run them in the cloud but just pay for the cost. So you could use something like OpenRouter and basically use Kimi or MiniMax running on someone else's GPU in the cloud, and then you're just paying pennies. And these models are also only getting better. Qwen 3.6 is really good, and you can run it locally or you could run it in the cloud. I know you have experience using OpenRouter. So I see a world where, again, there's more competition like you mentioned, but also using local models, paying pennies for cloud inference. I think that's the move: we're eventually going to go to these cheaper models. For now I'm still running on the plans, because if you can get a plan with a certain amount included, even if you have to buy three or four plans and sort of mush them all together to get it, for people like us they're usually losing money on it, and eventually that will sort of stop. But at that point, you can move over to paying based on token. But I'm not paying based on token unless it's something very specific or I just need this model to do some stuff. Like, I need to use Nano Banana for a whole bunch of icon generation, just once, and then I'm moving along. Otherwise I'm not paying usage-based.
One thing I will say is Cursor Composer. I think people are sleeping on it. Because a lot of people moved off of Cursor, they moved into Claude Code, and I don't think they're realizing how good Cursor Composer is and how cheap it is, because Cursor Composer is better than the Microsoft model that was just released. But it is significantly cheaper and it's fast as hell as well. So I still will flip back to Opus when I'm having a difficult time with Composer, but I would say probably 80% of my work I could get done with Composer. Yeah, I would just say personally I'm using AI a lot less because Opus is so expensive. I was using Opus because it was really good. I didn't have to babysit it as much. I didn't have to do a lot of rework with it. But now that it's so expensive, I have to use the dumber models, and then it's basically going back to four or five months ago, when it was very cumbersome to use AI. So I'm just back in the boat of writing stuff by hand and, yeah, less agentic flows, for me at least. Are we just going to rewind our lives six months, to the point where we were all happy just sitting there typing in the box? I think the other thing that we didn't really talk about here is that hardware has to get better, right? These hardware companies right now are just racing to figure out how to build better, more efficient chips. So that is the other option: these companies are losing money right now knowing that there's something on the horizon that will allow them to run them much cheaper. That has always been true with technology, but here I am on a $5,000 three-year-old laptop and it's struggling, and I'm mad that I'm always out of space, you know? If you told Wes 20 years ago, who was slapping 1 TB drives into his tower computer.
If you told me that 20 years from now you're still going to be stressed out about not enough space on your computer, I probably would have left the profession. So, what the hell? It obviously has gotten better, and storage is faster, and blah blah blah, all of these things. But just give me more. I just want more of it. Yeah, I'm dealing with the same. I was literally backing up drives last night. I have a NAS with 40 TB in it, but I don't keep it plugged in all the time because they're spinning drives. So I kind of use it as cold storage. I'll plug it in, back up all my little drives every now and then, but I was just backing up stuff last night. But I think, so we get more storage, but then video becomes 4K, video becomes 8K, so our files get bigger, and that's also what we're dealing with. But I've been thinking of the personal computer revolution. That happened in the 80s, but before that big companies still had access to big computers, and researchers had access to compute, but back then we didn't have the VC funding to put that compute in the hands of people. But now we're in a world where this stuff came out from research and was very expensive, but they used VC funding to be able to give it away to people for free to get them hooked on it, or at least give access to the people. But I feel like, just like there was the personal computer revolution and then the smartphone revolution, we're going to have something similar with AI, where the hardware gets good enough that we won't have to depend as much on companies to sell us the compute and we can run them locally. Basically, yeah, hardware gets better, models get smaller. I think we're headed in that direction, and it feels good.
I sure hope so, because you think about video editors, you know, with the rigs they have, especially like 10 years ago, people were juicing up these Mac Pros. Will that be developers of the future? You know, you spend 15 grand on your developers' computers so that you don't spend $1,500. Even if you were to spend $20,000 on a developer's local computer, that's not a lot, looking at the cost of how much we're spending on all these people right now. So yeah, if you compare that with, I guess, Uber even limiting to 1,500 a month, that's $18,000 a year per developer. How many developers do they have? They're spending a ton on inference here. So yeah, you're right. One-time costs versus having to do that for multiple years. Yeah. I'm just curious, why would it be cheaper to put it in my office versus renting what I use from a rack somewhere in a very highly optimized server farm? They're not making money on it yet. So maybe that's the move, that it will eventually be like that. Yeah. All right. That's all we got for you today. But I'm feeling good about all this. I mean, obviously I'm the resident AI hater, but I told you so. Yeah, this kind of news feels good. And the industry is healing. So let us know what you think down in the comments. Do you have high AI bills? Is your company spending a ton? Are you seeing people roll back? But yeah, let us know down in the comments, and we'll catch you in the next one. Peace. Peace.
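
The hardware-versus-subscription arithmetic from the closing discussion can be checked with a short calculation. The dollar figures ($1,500 a month, $15,000 and $20,000 machines) are the ones quoted in the episode; the function names are illustrative, and the sketch deliberately ignores electricity, maintenance, and any quality gap between local and hosted models.

```python
import math


def annual_spend(monthly_cap_usd: int) -> int:
    """Yearly cost of a fixed monthly AI budget."""
    return monthly_cap_usd * 12


def breakeven_months(hardware_cost_usd: int, monthly_cap_usd: int) -> int:
    """Whole months of capped spend needed to equal a one-time hardware purchase."""
    return math.ceil(hardware_cost_usd / monthly_cap_usd)


MONTHLY_CAP = 1_500  # reported per-employee cap mentioned in the episode

print(f"annual spend at the cap: ${annual_spend(MONTHLY_CAP):,}")
for hardware in (15_000, 20_000):
    months = breakeven_months(hardware, MONTHLY_CAP)
    print(f"${hardware:,} machine breaks even after {months} months")
```

On these numbers a one-time purchase matches the capped spend within 10 to 14 months, which is the one-time-versus-recurring trade-off the hosts are weighing.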
