I exploited Copilot and burned $46,000 (it cost $40)
Discusses Copilot moving from fixed message limits to rate-based pricing and the mixed user reactions.
Theo reveals how he exploited Copilot's pricing loopholes, why the new billing shift is coming, and what that means for developers chasing value.
Summary
Theo (t3.gg) dives into Copilot’s pricing overhaul and the surprising ways people have stretched the old system for personal gain. He argues that Copilot evolved from a simple autocomplete to an agentic workflow, which changes how usage should be priced. He lays out four billing models for inference—subscription with rate limits, subscription with message limits, subscription with spend limits, and API/per-token billing—then explains why the old message-based model created huge, unpredictable costs. Through his own experiments burning Azure credits and running cryptography puzzles, Theo demonstrates how expensive certain prompts can become when tools run long, including an infamous 16-hour single message test. He uses this to defend Microsoft, arguing the billing change is a response to abuse, not a rug pull. The sponsor segment with Clerk highlights practical billing and auth solutions for codebases, emphasizing per-user billing, pricing tables, and shared user caps. Finally, Theo predicts Copilot’s June 1 changes will shift to token-based credits, raising multipliers (e.g., Opus 27x, 5.4x to 6x) and forcing users to rethink cost planning. If you’re building with AI copilots, this video is essential to understand where pricing is headed and how to plan around it.
Key Takeaways
- Copilot’s pricing move removes fixed monthly message limits in favor of rate-limit style or per-token billing, changing how usage is quantified.
- The four main billing models for AI inference are subscriptions with rate limits, subscriptions with message limits, subscriptions with spend limits, and API/per-token billing, plus the option for dedicated compute.
- High-cost prompts and multi-step tool usage dramatically inflate costs; caching sharply cuts input-token costs but still leaves substantial spend in long-running tasks (example: one 16-hour single message would have cost about $163 without caching and still about $62 with it).
- Theo’s crypto-puzzle experiments show how agentic workflows enable many steps per user prompt, which multiplies inference usage beyond simple per-message accounting.
- The June 1 change to Copilot pricing will increase model multipliers (e.g., Opus 15x→27x; 5.4x→6x), moving toward a token-based credit system and tightening subsidies.
- Clerk is pitched as a practical solution for auth and billing in codebases, with features like shared user caps, pricing tables, and per-user plan rendering.
- The speaker frames the pricing changes as a protective measure against abuse, not malice, arguing many users actively exploited the old system to extract more compute than paid for.
Who Is This For?
Developers and engineering leaders evaluating AI copilots for production use, especially those relying on Copilot Plus or similar plans. If you’re budgeting for AI-assisted coding, this video helps you understand pricing dynamics and how to plan around upcoming token-based credits.
Notable Quotes
"I have not used this plan a whole lot because, to put it simply, I just use Codex and occasionally Claude coding, obviously a little bit of cursor, too."
—Theo notes his preferred tools and sets up the personal cost context that anchors the rest of the discussion.
"I decided to throw together a tiny little dumb app, also testing out Cloud AI's like built-in app thing. It's cringes balls, but it works."
—Illustrates the hands-on experimentation with AI costs and tooling.
"One message could be a cent or $30, maybe even more."
—Highlights how variable per-message costs can be dramatically skewed in agentic workflows.
"The multipliers will increase on June 1st, though. So, if you're on a yearly plan, you still get this model, but the actual amount you're spending is probably going to go up."
—Signals upcoming price changes and their practical impact on renewals and budgeting.
"This is Microsoft money. The same Microsoft that I had to spend thousands of dollars of credit on to prove their inference doesn't work."
—Theo frames the economic stakes of subsidy programs and the rationale for the pricing shift.
Questions This Video Answers
- How does Copilot pricing work and why did Microsoft change the billing model?
- What are the four main ways to bill AI inference, and which one is most cost-effective for developers?
- How does caching affect AI inference costs and when does it fail to save money?
- What does an agentic workflow mean for API usage and token costs?
- What should developers watch for in Copilot's price changes coming June 1st?
GitHub Copilot, Copilot Plus, AI pricing, Inference billing, Caching in AI, Agentic workflows, Claude Code, Codex, Cursor, OpenAI GPT models and tokens/multipliers
Full Transcript
I still remember the day that Copilot came out. It was a very strange moment. I was convinced it would never be for me and I would just let TypeScript autocomplete carry me forward, but now we're letting AI write all of our code. In that time, Copilot has had to change a lot. What used to just be an autocomplete tool is now a full agentic solution trying to compete with the likes of Cursor, Claude Code, Codex, and more. And through that, they've had to make a lot of changes, many of which are questionable. Recently, they announced a huge pricing change where they're no longer going to be giving you a fixed number of messages per month.
Instead, they're going to be giving you more traditional rate limits like we see in tools like Claude Code and Codex. This has caused a lot of outrage from users of Copilot, as well as some really interesting takes such as people thinking that this is proof Microsoft can't even afford the subsidy wars and they're choosing to stop subsidizing their users. I don't think people looked into the numbers, though. And if there's anything I'm an expert in, it is wasting Microsoft's money. I have a long history here. I got over a million dollars in credit from Azure.
I already had 500k before this million-dollar grant. And I've been putting a lot of effort into burning it. So much effort that I had to flame them hard for how bad their inference was. Azure was so slow for hosting models that I was going insane and needed more ways to use the money they gave me. So I chose to waste a bunch of it by measuring, with an hourly cron job, just how slow they were compared to OpenAI. I knew it was bad, but the P90 latency being 21 times worse was absurd. After enough complaining, they fixed it.
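For anyone who hasn't seen a P90 comparison before, here is a minimal sketch of how one might compute it from hourly latency samples like the ones described above. It uses the nearest-rank percentile (one of several common definitions), and the provider names and latency numbers are hypothetical, not the actual benchmark data.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def p90_ratio(slow: list[float], fast: list[float]) -> float:
    """How many times worse the slow provider's P90 latency is than the fast provider's."""
    return percentile(slow, 90) / percentile(fast, 90)

# Hypothetical hourly latency samples in seconds (made up for illustration).
provider_a = [1.0, 1.1, 1.2, 0.9, 1.0, 1.3, 1.1, 1.0, 1.2, 1.0]
provider_b = [2.0, 3.5, 9.0, 21.0, 4.0, 2.5, 25.0, 3.0, 6.0, 2.2]
print(round(p90_ratio(provider_b, provider_a), 2))  # 17.5 with these made-up numbers
```

P90 is used instead of the average because a handful of very slow requests is exactly what users feel, and an average can hide them.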
This benchmark went viral, they heard about it internally, and the right execs got involved. And to Azure's credit, they had a person very high up call me, debug with me, and they fixed it. And now the result is that their inference is actually consistently faster than OpenAI's. Really cool to see. But I burned a lot of money on these tests. Money that, for some reason, they don't seem interested in giving me back. So it's time for revenge. Fun fact, I'm currently comped a GitHub Copilot Plus plan, which is their $40 a month tier. I have not used this plan a whole lot because, to put it simply, I just use Codex and occasionally Claude Code, obviously a little bit of Cursor, too.
So, this $40 tier that I'm getting for free has kind of sat unused until now. You might see a percentage on the screen right now, 4.7%. That is the percentage of my 1,500 messages that I've used so far this month. I have used not even 5% of my $40 tier. Want to guess how much money I have cost them in inference with that 5% usage? Minimum $550. I'm going to see if I can hit 40 grand, and we're going to do it together. I have a lot of fun stuff to share with you guys on how easily abusable many of these plans are, and why the existing billing model for Copilot is so [ __ ] broken.
It is genuinely insane that they have let it sit here for this long. But if I'm about to get cut off from my subsidization, and I might lose my million dollars of credit as a result, we're going to have to cover it with a quick break for today's sponsor. There are a lot of products and companies that I used to love that haven't made the jump to a new agent-driven world particularly well. Today's sponsor is Clerk, and they are not one of those companies. They embraced the two problems agents struggle with most when building codebases, auth and billing, and they solved both of them incredibly well.
Take billing. Getting it set up right is really a user-specific problem if you think about it. Like, why were we doing billing in our database but keeping our users somewhere else? Those are very linked things, especially once you get into orgs and whatnot. Clerk realized the same thing and solved it with their billing implementation. They provide all the components, processes, UI, and back-end pieces that you need to set up billing properly. You fill out the fields in the dashboard, you can even add features, and then your code has access to them. They have a show component you can use to check what plan a user is on and render different things depending on whether they have that feature or not.
They even have a built-in pricing table component that updates with your changes. These things are so annoying to get right. We've even had outdated UI in T3 Chat because we didn't do this correctly ourselves. On the topic of billing, I want to talk about Clerk's own pricing, because they made a change I've been trying to bully them into for a while. They used to charge per project, and I thought that was dumb. They heard me, and they addressed it better than I ever would have imagined. Now the $20 per month plan covers unlimited projects, and the user cap is shared across all of them.
And that's not all: you can probably get away with the free tier, which also has unlimited apps and 50,000 monthly users per seat. And by the way, a monthly user only counts if they come back on a second day, not just within that first 24 hours. Stop stressing about authentication, billing, and your costs at soydev.link/clerk. In order for any of this to make sense, I need to break down the different ways you can bill for inference and what inference actually costs. Let's start with the four different ways that you can be billed for inference. The first is what all of us are used to, which is subscriptions with rate limits.
This is the model we see in things like Claude Code as well as Codex, where you have one of these vague dashboards that has a current session. On Claude Code, it's a 5-hour window. You can only do so much within 5 hours, but then every week you have a separate limit that you can hit. So if you max out on enough of those 5-hour sessions, you won't be able to do any more inference until the weekly limit resets. How much do you get for each of these? Like how many messages, how many tokens? I don't know.
They don't really document it. Not only that, they regularly change it both up and down. And sometimes they'll even make the 5-hour window more restrictive at certain times of day. Like recently, they changed the 5-hour window so during the working hours of the day, like the middle of the work day in California and New York, they would burn through your usage much faster because they wanted GPUs available for the customers doing the other methods of paying Anthropic. Now that they just got more allocation, they're doubling the limits and removing that part, but it is what it is.
Just want to make sure we understand this method. It is not a direct "you pay this, you get that" deal. You aren't getting a specific amount of tokens or messages for the money you're spending. You're getting a vague "20x more than what Pro gets" by paying $200 a month. That is what you get with Claude Code. And Codex is similar in this regard, where you're just paying into a black box with a multiplier on it, and you're not getting any transparent info on how much usage you actually get. It's kind of vibe-based. And that's why it's so weird when they change things and suddenly what used to use only a third of your usage is using all of it.
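Because neither Claude Code nor Codex documents its exact mechanics, this is only a rough sketch of the "session window plus weekly window" idea, assuming both windows are rolling and usage is measured in abstract units. The class name, budgets, and timings are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

HOUR = 3600.0

@dataclass
class DualWindowLimiter:
    """Hypothetical limiter: a rolling 5-hour session cap plus a rolling weekly cap.

    Budgets are abstract "usage units"; real providers don't publish theirs.
    """
    session_budget: float
    weekly_budget: float
    session_window: float = 5 * HOUR
    weekly_window: float = 7 * 24 * HOUR
    events: deque = field(default_factory=deque)  # (timestamp, units), oldest first

    def _used_since(self, cutoff: float) -> float:
        return sum(units for ts, units in self.events if ts > cutoff)

    def try_consume(self, now: float, units: float) -> bool:
        # Events older than the weekly window can't affect either cap, so drop them.
        while self.events and self.events[0][0] <= now - self.weekly_window:
            self.events.popleft()
        session_used = self._used_since(now - self.session_window)
        weekly_used = self._used_since(now - self.weekly_window)
        if session_used + units > self.session_budget or weekly_used + units > self.weekly_budget:
            return False  # blocked until older usage ages out of a window
        self.events.append((now, units))
        return True

limiter = DualWindowLimiter(session_budget=100, weekly_budget=250)
print(limiter.try_consume(0, 100))          # True: session cap is now full
print(limiter.try_consume(1 * HOUR, 1))     # False: still inside the first 5-hour window
print(limiter.try_consume(6 * HOUR, 100))   # True: the first session has aged out
print(limiter.try_consume(12 * HOUR, 100))  # False: the weekly cap of 250 would be exceeded
```

The user never sees the budgets, only whether a request goes through. That opacity is the "black box with a multiplier" problem described above.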
It's just weird, and I understand why people are confused by it. But people are now used to this, not in the sense that they're used to vague numbers with no data behind them, but in the sense that they're used to getting a certain amount of usage for a certain amount of dollars. So people who are using the $100 or $200 plan on Claude are used to getting all of the usage they need throughout the day, throughout the entire month. They don't know what that usage would actually cost if they were paying API prices, which we'll get to in a minute.
So we have subscriptions with rate limits. This is Claude Code, Codex, etc. The next category is subscriptions with message limits. This is a subscription where you have a certain number of messages you can send as a user. This is how T3 Chat used to work, where you would get, I think, 1,500 messages on the cheap models and 100 messages on the expensive ones like Opus or GPT-5.5 or even Sonnet. Sonnet was very overpriced. We'd give you 100 of those messages because they cost us so much more, and 1,500 with all the other models because they were relatively cheap.
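A minimal sketch of message-limit billing using the 1,500 standard and 100 premium split described above. The class and method names are hypothetical. The point it illustrates is that the counter drops by one per message no matter what that message actually cost to serve.

```python
from dataclasses import dataclass

@dataclass
class MessageQuota:
    """Hypothetical message-limit plan: 1,500 standard and 100 premium messages per month."""
    standard_left: int = 1500
    premium_left: int = 100

    def try_send(self, premium: bool) -> bool:
        # Only messages are counted; a 1-cent message and a $3 message both cost one unit.
        if premium:
            if self.premium_left == 0:
                return False
            self.premium_left -= 1
        else:
            if self.standard_left == 0:
                return False
            self.standard_left -= 1
        return True

quota = MessageQuota()
sent = sum(quota.try_send(premium=True) for _ in range(101))
print(sent)  # 100: the 101st premium message is refused
```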
And for most users that was fine. But there were certain users who wanted way more premium messages. I will go into more detail on how the numbers worked out for T3 Chat under this model in a bit, but that's not our focus today. Copilot is. Just know that I know all of this because I run a platform that sells subscriptions and has weighed the benefits and drawbacks of each of these models. So, I know a lot about how the billing works. There is an in-between here: subscriptions with spend limits, where you subscribe and get a certain amount of usage, but it's a dollar amount and you can see it in a dashboard.
This is things like OpenCode Zen as well as Cursor. You pay 20 bucks, they give you 20 bucks of inference, you see it in the dashboard, and you can go over and spend more money. It's very similar to a subscription with rate limits, but they're more transparent about the dollar amounts being used. They don't reset as often, and it's much more understandable when you go over. This is not as important now, so I'm going to delete that section and talk about the other two ways you can pay for inference. There's the one that I spend the most money on by far, which is API billing per token.
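A sketch of the spend-limit model under one assumption: usage past the included dollar amount is tracked as overage rather than blocked (products differ here). The names and the $20 default are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class SpendLimitPlan:
    """Hypothetical spend-limit plan: the subscription buys an equal dollar amount of inference."""
    included_usd: float = 20.0
    used_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        # Every request is charged at its real dollar cost, visible in a dashboard.
        self.used_usd += cost_usd

    @property
    def remaining_usd(self) -> float:
        return max(self.included_usd - self.used_usd, 0.0)

    @property
    def overage_usd(self) -> float:
        # Assumed behavior: spend past the included amount is billed on top.
        return max(self.used_usd - self.included_usd, 0.0)

plan = SpendLimitPlan()
plan.record(12.5)
plan.record(10.0)
print(plan.remaining_usd, plan.overage_usd)  # 0.0 2.5
```

Unlike a message quota, an expensive request visibly eats more of the balance, which is why this model is easier to reason about.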
When a new model comes out and you're looking at pricing, this is what you're looking at. For example, GPT-5.4, if I recall, was $1.25 per million tokens in and $15 per million tokens out. So, if you made a request that took in 800,000 tokens and spat out 200,000 tokens, you can do the math to see how much that costs. If I have a request where I attach a PDF that is a million tokens, that request is going to be much more expensive than a request where I ask it to solve a simple math problem, because the input tokens are misleadingly expensive.
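Doing that math explicitly: a minimal sketch of per-token billing, with the $1.25 and $15 per-million prices quoted above as defaults. Swap in whatever a provider actually charges; the function name is illustrative.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_million: float = 1.25,
                     output_per_million: float = 15.0) -> float:
    """Per-token API billing: input and output tokens are each priced per million."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

print(request_cost_usd(800_000, 200_000))  # 4.0  ($1.00 of input + $3.00 of output)
print(request_cost_usd(1_000_000, 500))    # 1.2575  (a million-token PDF with a short answer)
```

Output tokens are 12 times pricier per token here, yet the PDF request still costs over a dollar on input alone, simply because of volume.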
When you pass a ton of data to a model, it's going to read all the data, and it's going to be expensive to read all that data. So, something a lot of people don't seem to understand is that when you paste a PDF into a model and ask about it, that message costs way more money than a message where you ask for an answer to a simple thing. These are the prices we have to pay for T3 Chat, and it's our hope that somebody spending eight bucks a month or 50 bucks a month on a T3 Chat plan doesn't use their messages in a way that's more expensive than what they're paying for.
This is the balance that we have to strike externally. But then, there is the fourth category, and this is the category that the labs are in, as well as the category that really big businesses often find themselves in, too. Dedicated compute. This is similar to getting dedicated servers for traditional traffic. Instead of having servers spin up when users make requests and spin down when they're not making requests, you can choose to have servers that are provisioned that are sitting there waiting for requests. On platforms like AWS, Azure, and Google Cloud, instead of just paying per token, you can choose to rent a bunch of GPUs and run the model that they provide on it, usually with some amount of licensing fees on top, and now you just have that and can do as much compute as that box is capable of.
This is how OpenAI and Anthropic have to think, because they are buying the GPUs, running models on those GPUs, and then providing those models to different customers, some of whom pay API prices and some of whom are personal users on subscriptions with rate limits. So, you as a business can do the dedicated compute thing, but for a lot of reasons it doesn't make a whole lot of sense; this is how the big labs think about it, though. So, these are the four billing methods that we should understand before going forward.
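A back-of-the-envelope sketch of why dedicated compute only pays off at high, steady utilization. The hourly rent and the blended per-million price are made-up placeholders, not quotes from AWS, Azure, or Google Cloud, and a single blended price glosses over separate input and output rates.

```python
def break_even_tokens_per_hour(hourly_rent_usd: float, blended_usd_per_million: float) -> float:
    """Tokens per hour at which renting a box costs the same as paying per token.

    Below this volume, per-token billing is cheaper; above it (and only if the box
    can actually serve that many tokens), dedicated compute wins.
    """
    return hourly_rent_usd / blended_usd_per_million * 1_000_000

# Hypothetical: a $100/hour GPU box versus a $5 per million blended API price.
print(break_even_tokens_per_hour(100.0, 5.0))  # 20000000.0
```

The rent is paid whether or not requests arrive, so idle hours push the effective break-even even higher.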
But now that we have these four, it's important to understand how the human psyche works. People get used to doing things a certain way, and when that thing changes, they get upset. The more they use the thing, the more upset they get. So, if you have a person who is using Copilot really heavily, and they're used to getting a certain number of messages a month, and now they're being moved to a different model, they're going to be really upset. Some users benefit, though. We'll use the example of T3 Chat. Previously, T3 Chat for $8 a month would give you 1,500 messages on normal models, the cheap ones, and 100 messages on premium models, the expensive ones.
We did it this way because of how we launched. This was not a good idea. It was just the simplest thing at the time. When we launched T3 Chat, the only models we supported were GPT-4o mini and DeepSeek V3, which were both really cheap. So, we gave you generous usage for the eight bucks. And then we added Sonnet because everyone wanted it. And Sonnet, even though it got a third of the traffic, quickly became 10 times more expensive than everything else combined. We immediately realized we [ __ ] up. Back in February of last year, I announced that we had to change how billing worked in T3 Chat.
Again, we were very generous before, but certain users were just fleecing us for the Claude usage. So, we changed the 1,500-messages-a-month plan: we limited Claude to 100 messages a month so we didn't run out of money. At the time, we let you buy more Claude credits, eight bucks for another 100. I gave a bunch of free credits away to people who were already subs, and we reset rate limits. I was the original Tibo. We had individual users who had cost us over $200 in just a few days, because, again, we'd only started in February, and within five days we had already spent $2,000 on inference.
This would have bankrupted us almost immediately. So, we took the expensive models like Sonnet and set a limit of 100 for those, and we gave 1,500 for the things that were less expensive. This was a patch, though, and I want to emphasize that. This wasn't us solving the problem; it was us stopping the bleeding a little bit. But costs kept ramping up. In particular, as the models were able to run longer and do bigger tasks, some requests would only cost a penny or two. Others would cost 10-plus dollars. I have had single messages to Opus cost over $10.
It's not that hard to do. So, this forced us to reflect and we spent a lot of time on it and we realized that selling messages is a suicide mission because a message does not equate to a specific amount of money. This is like saying, I'll pay you a million dollars for five cars. If the cars are all Ferraris, that's probably a good deal, but if the cars are all beat up, abandoned, used Subarus from 2001, that's a [ __ ] deal. A message has as much variety in cost as a car does, if not a wider range.
And people really struggled to understand this. So, rather than try to explain it like I normally do, I've decided to demonstrate. I recently learned that Copilot still bills this way. Copilot billed this way because, back when the models were dumber, a chat in your codebase cost a relatively consistent amount per message. The models have gotten much smarter over time. The smarter models can do more per message, which means they're using more tokens per message. Most users of T3 Chat were profitable. Most users would do way less than $8 of inference, but 1% of users would carefully, strategically use their 100 premium messages, and each of those messages would cost us $1 to $3.
Individual users, even with this structure, were capable of costing us 200-plus dollars a month, because billing on messages is stupid. The problem is that if I lowered the number of premium messages so that the most expensive ones would not bankrupt us, then normal good-faith users asking simple questions to Sonnet would get 10 messages a month, all because a very small number of users were abusing what they could do on the platform. So, if you're upset that we had to make these changes, don't be mad at me.
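The arithmetic behind that, as a sketch in integer cents to avoid floating-point rounding. The $8 price, the 100 premium messages, and the $1 to $3 per-message range come from the text; the light-user numbers are hypothetical.

```python
PLAN_PRICE_CENTS = 800  # the $8/month plan

def monthly_margin_cents(messages: int, cost_per_message_cents: int) -> int:
    """Plan revenue minus inference cost for one user, in cents (negative means a loss)."""
    return PLAN_PRICE_CENTS - messages * cost_per_message_cents

# Hypothetical light user: 40 messages at about 2 cents each.
print(monthly_margin_cents(40, 2))     # 720
# Heavy user spending all 100 premium messages at $1 and at $3 each.
print(monthly_margin_cents(100, 100))  # -9200, a $92 loss
print(monthly_margin_cents(100, 300))  # -29200, a $292 loss
```

The same quota that leaves plenty of margin on a light user loses many times the subscription price on a heavy one, which is the core problem with selling messages.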
Be mad at the creators of Repomix, this awful cringe [ __ ] tool that compresses your whole codebase into XML so you can paste it into a chat app like T3 Chat. This single app has probably cost my business half a million dollars. This should never have been created. Anybody who uses this on chat apps that are made by small businesses should feel [ __ ] bad. You are abusing message quotas when you use it. As I said, doing this to abuse a small business is very cringe and bad and you should feel bad for doing it.
You directly impacted the amount of money I'm able to pay people like Mark and Julius if you use that tool. I'm not GitHub. GitHub got acquired by Microsoft for 7.5 billion dollars. GitHub is still funded entirely by Microsoft and has a shitload of money. We had to change how this worked because the cost per message was so varied that certain users were capable of bankrupting us. We ate the optics hit and it was a hit, believe me. Our revenue went down when we made our changes here because people were upset. Most users can send more messages as a result of the changes we made, but some users couldn't and those users were upset and they all left.
Microsoft doesn't want to deal with those types of optics hits, so they rode the wave longer even though the delta between the cheapest and most expensive messages for Copilot was significantly worse. Because at the very least in T3 chat, one message can only do two turns at most. If I send a message like, "Who is the president of the US?" to this model with search off, it's only going to do one step. It's going to reason, which is when it talks to itself to figure out what's going on and how to respond, and it correctly responded, "The president of the US is Donald Trump." Cool.
This is one turn and one step. I sent a message, one API call was made, and the model generated one response. But let's say I ask it, "How did the stock market perform last week?" It generates a tool call. The tool call runs, and while the tool call is happening, the previous request is done. It's over. It ended. The search occurs, generates results, and puts them back into the history as a new message, and then a new API request is made to where the inference happens to continue from there. Once the model decides what tools to call, it stops running.
And once it has stopped running, the tool call executes, and when it's done, the result comes back in as another part of the history, and the model is spun back up with a new API request to continue from there. That is how this all works. We can only do so many of those steps, though. You can bump up how many searches the model can do, but we have hard limits on that. And we even say, "Higher search counts add more usage," because if the model can do multiple steps, each of those steps is doing more inference and costing us more money.
So, the more of these steps the model can do, the more inference is being used. When products like T3 Chat, Copilot, and even Cursor back in the day all chose to do message-based billing, the models could only do one step each turn. Once you sent a message to the model, it would respond. It wouldn't do other things. Now that models are primarily used in agentic workflows, where they can keep doing different things after you send your message, a single message triggers multiple steps, which means it generates significantly more tokens. T3 Chat is still limited, though.
We went from one API call per message to two, maybe, when you turn on search. Sometimes three or four if you bump up the number of searches it can do. Do you know how many Copilot can do? Do you know how many Claude Code or Codex can do? Because I don't. Because as far as I know, there isn't actually a hard limit. Claude Code, Codex, Cursor, Copilot, all these tools can just run. They can just keep making additional tool calls once you send your message, which means that for us on T3 chat, the cheapest message and the most expensive message do have a gap, but it's like 1 cent to 1 or 2 dollars.
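To make the step counting concrete, here is a hypothetical simulation of one user message in a tool-calling loop. It assumes each step is a separate API request that re-sends the whole history, and all token counts are invented for illustration. A small step cap mimics a T3 Chat style limit; a very large cap stands in for tools with no hard limit.

```python
def simulate_agent_turn(prompt_tokens: int, output_tokens_per_step: int,
                        tool_result_tokens: int, tool_calls_wanted: int,
                        max_steps: int) -> dict:
    """Count tokens for one user message in a hypothetical tool-calling loop."""
    history = prompt_tokens
    total_input = total_output = 0
    steps = 0
    while True:
        steps += 1
        total_input += history                 # the full history is input for this request
        total_output += output_tokens_per_step
        history += output_tokens_per_step      # the model's reply joins the history
        if steps > tool_calls_wanted or steps >= max_steps:
            break                              # final answer, or cut off by the step cap
        history += tool_result_tokens          # the tool ran; its result joins the history
    return {"api_requests": steps, "input_tokens": total_input, "output_tokens": total_output}

capped = simulate_agent_turn(1_000, 200, 2_000, tool_calls_wanted=1, max_steps=2)
uncapped = simulate_agent_turn(1_000, 200, 2_000, tool_calls_wanted=50, max_steps=10_000)
print(capped)    # {'api_requests': 2, 'input_tokens': 4200, 'output_tokens': 400}
print(uncapped)  # {'api_requests': 51, 'input_tokens': 2856000, 'output_tokens': 10200}
```

Because every request re-reads the growing history, input tokens grow roughly quadratically with the number of steps. That is why one "message" to an uncapped agent can cost hundreds of times more than a capped one, and why caching matters so much.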
On something like Copilot, one message could be a cent or $30, maybe even more. This is a run I did using GPT-5.4 on extra-high reasoning, solving a cryptography challenge. That is 111 million input tokens and 1.6 million output tokens. This request ran for 16 hours and was one single message. One message. If we ignore caching, that's 111.3 million input tokens times $1.25 per million, plus 1.6 million output tokens times $15 per million. This one request with no caching would have been 163 bucks. Thankfully, there is caching. If you're not familiar, I've talked about this a bunch before.
Real quick TL;DR, and if you want more info, go watch my videos about Claude and how Claude regressed, because I talk a lot about how caching works in those. When you give an input to a model, it has to process every one of those input tokens, building up internal state (the attention key/value tensors) before it can generate the next token correctly. That requires a lot of math the GPU has to do. Caching is taking your history up to a point, saving what the model had in memory at that point in time, and then restoring it so we don't have to recalculate up to that point.
And the result of this is that cached input tokens are 10 times cheaper. Caching matters a lot when you're doing tool calls, because every time a new API request is made, the model would have to re-ingest all of the input tokens without caching. But if it's cached, it can store what it had calculated already and reuse that, which is much cheaper. This is where the math starts to get annoying, though, because the cached tokens cost a tenth as much, but there isn't an easy way to do these numbers in your head; it's just a lot of math.
So, I decided to throw together a tiny little dumb app, also testing out Claude AI's built-in app thing. It's cringe as balls, but it works. So, let's just put in the numbers from that run. When you factor in caching, the cost is less than half as much, but that is still $62 for one single message. I get 1,500 of those. That means if I successfully hit that number for every single message in a month, my $40 plan is worth $93,600. That's a lot of inference. So, what the hell did I do that made the model run this long?
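Here is that calculation as a sketch. The prices and token counts come from the run above, and cached input is billed at a tenth of the normal input price. The video doesn't say what fraction of input tokens were cache hits, so the 80% used here is an assumption that happens to land near the quoted $62.

```python
def message_cost_usd(input_tokens: int, output_tokens: int, cached_fraction: float = 0.0,
                     input_per_million: float = 1.25, output_per_million: float = 15.0,
                     cached_price_ratio: float = 0.1) -> float:
    """Cost of one request when some share of its input is served from the prompt cache."""
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Cached input tokens are billed at cached_price_ratio times the normal input price.
    input_cost = (uncached + cached * cached_price_ratio) * input_per_million
    return (input_cost + output_tokens * output_per_million) / 1_000_000

no_cache = message_cost_usd(111_300_000, 1_600_000)                         # 163.125
with_cache = message_cost_usd(111_300_000, 1_600_000, cached_fraction=0.8)  # roughly 63
print(f"uncached: ${no_cache:.0f}, 80% cached (assumed): ${with_cache:.0f}")
# The video's figure: 1,500 messages a month at about $62.40 each.
print(f"${1_500 * 62.40:,.0f} of inference on a $40 plan")
```

Even with most of the input cached, the output tokens alone ($24 here) dwarf what a single message is worth on a $40 plan with 1,500 messages.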
Probably should have mentioned this before, but I'm really into cryptography puzzles. Blame DEF CON for getting me hooked. I just have a lot of fun going to DEF CON and trying to solve these absurd challenges there. These aren't hacking in the traditional sense, where you're trying to break into a server. They are fun decoding problems where you're given gibberish and have to figure out the hidden meaning inside of it. I tried to see what GPT-5.5 was capable of solving, and it was able to solve all the puzzles I had.
So, I started making my own. And it could solve them. It could solve this one specifically, but it took it [ __ ] forever to do. I was curious how well my community would be able to solve it, so I put up a bounty: "For no reason in particular, made my first cryptography challenge. I'll pay $1,000 to whoever solves it first. Winner is whoever gets the answer in my DMs first." I posted that at 11:49 a.m., and it was solved at 11:58 a.m., less than 10 minutes later. This one was fun. I will break down the solution. To test the capabilities of the models, I took this cryptography puzzle and dumped it into ChatGPT to see if it could solve it.
And it did. It took a while; I'll show you how much in a second. Remember, line one is the puzzle. Line two is effectively a hint. It realized the second line is ROT47 and decoded it to this: "A dog on the moon once said," followed by this hash. This hash was meant to confuse people into trying to decode it, but there's nothing to decode here. GPT-5.4 Pro figured this out. It's not a hash to crack. It's a Git commit hash in my Dogecoin simulator project. The commit is titled "add legacy Doge hash calibration." I edited the old Git history on my project to add a fake commit on a branch that didn't exist before, so it would not look too suspicious.
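ROT47 itself is simple enough to sketch. The video doesn't show the puzzle's actual text, so this only demonstrates the cipher: each printable ASCII character from `!` (33) to `~` (126) is shifted 47 places within that 94-character range.

```python
def rot47(text: str) -> str:
    """ROT47: rotate each printable ASCII character ('!' through '~') by 47 places.

    47 is half of the 94-character range, so applying it twice returns the
    original text; the same function both encodes and decodes.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)  # spaces and anything outside the range pass through unchanged
    return "".join(out)

print(rot47("Hello"))  # w6==@
print(rot47("w6==@"))  # Hello
```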
So, I could add and remove this fake legacy Doge hash seed that was needed to decode the first line. This took GPT-5.4 Pro 81 minutes and 47 seconds to solve. It ran for a long-ass time. But it solved it. It also didn't get much in terms of hints. I just told it to solve the crypto challenge. I didn't even have memory on, so it didn't know who I was or to look at my GitHub, and it didn't have any additional hints that way. When you give it the link to the gist, that's a huge hint, because now you have the GitHub account to check.
And when you combine that with the hint about Doge, linking it to me is much easier. So, when people pasted the gist link into Codex and said "solve," it could solve it, because the gist's GitHub account and mine were so tightly linked that the connection was easy to figure out. And then from there, the first line would decode as JSON, but you had to have the seed in order to crack it. And I gave you, in this JSON blob, all of the pieces you needed to do the decoding. So, I needed to make this harder.
I needed to make this way harder. There was one other stupid thing I had done, though. I wanted the encoding for this first line to be another hint pointing at me. It's a version of what I'm referring to as base 23. 2 3 T is the 20th letter, T3. It's meant to hint at T3, so you knew to go to my GitHub. One of my cryptography friends who tried this puzzle was very mad at me, because base 23, according to him, should start at zero. According to me, base 23 isn't real.
I made it up for the sake of the challenge. It can start wherever I want. Base 32 and base 64 don't start at zero; their alphabets start at A. But the fact that this annoyed my cryptography friend so much taught me something: all of this is arbitrary [ __ ]. Base 64 has a specific order that things are supposed to be in. As unintuitive as it is, the character 0 has the value 52. Capital A is zero, B is one, C is two, and so on, and then we switch to lowercase at 26. This is the standard base 64 alphabet. You can usually guess what type of alphabet is used just by looking at it.
Like in the text for this challenge, there is no X, Y, or Z, so it's probably not base 32. So there were three things I wanted to change for the next version. First, I didn't want the hint on GitHub, because then when I post a gist, it gives it away. Second, I didn't want this to be easy to decode as JSON, because if you can decode it as JSON, it's too easy to solve. So I wanted to make the next step less clear. But most importantly, I wanted a [ __ ] encoding. I wanted to be more trolly with my encoding.
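A small sketch of both ideas, using the standard RFC 4648 Base64 and Base32 alphabets. A character-set check like this can only rule an alphabet out; the judgment that a long string with no X, Y, or Z probably isn't Base32 is a frequency argument layered on top, which the code doesn't make. The function names are hypothetical.

```python
import string

# Standard RFC 4648 alphabets: a character's position is its value.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
BASE32_ALPHABET = string.ascii_uppercase + "234567"

def base64_value(ch: str) -> int:
    """Value of a character in the standard Base64 alphabet."""
    return BASE64_ALPHABET.index(ch)

def candidate_alphabets(ciphertext: str) -> list[str]:
    """Standard alphabets whose character set covers the text (trailing '=' padding ignored)."""
    chars = set(ciphertext.rstrip("="))
    candidates = {
        "hex": set("0123456789abcdefABCDEF"),
        "base32": set(BASE32_ALPHABET),
        "base64": set(BASE64_ALPHABET),
    }
    return [name for name, alphabet in candidates.items() if chars <= alphabet]

print(base64_value("A"), base64_value("a"), base64_value("0"))  # 0 26 52
print(candidate_alphabets("JBSWY3DP"))  # ['base32', 'base64']
print(candidate_alphabets("SGVsbG8="))  # ['base64']
```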
And that's why I put out a second challenge the next day. This one took the community 10 hours and a lot of hints. And my agents were never able to solve it without even more hints. I also had to disqualify a lot of people, because they would paste it into an agent, it would hallucinate some [ __ ] because it couldn't solve it, and then they would send me these random [ __ ] things. If you want to go solve this puzzle yourself, pause. Don't keep watching, because I'm about to spoil the whole thing. But I'm proud of it, and I'd like the opportunity to show it off.
Here's puzzle two. You'll notice the first line looks a lot different: it's much shorter, and it uses a very different alphabet. In fact, this alphabet very clearly looks like base 64. The second line, again a hint, decoded to "where it all began", but the "began" part is n_o followed by a bunch of random characters. If you're a big enough nerd, you might know what that is, and if not, look at the screen more closely, because that's a YouTube-style video ID. That video ID goes to the first video I ever posted on my YouTube channel, a skate demo from when I was testing the new iPhone.
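Spotting a "YouTube-style video ID" is basically a format check. This sketch assumes the commonly observed shape of 11 characters from the URL-safe base 64 alphabet (letters, digits, "-" and "_"); that's a heuristic, not an official YouTube specification, and the ID used below is made up.

```python
import re

# Heuristic: 11 characters from the URL-safe base64 alphabet (A-Z, a-z, 0-9, "_", "-").
YOUTUBE_ID = re.compile(r"[A-Za-z0-9_-]{11}")

def looks_like_youtube_id(s: str) -> bool:
    """True if s has the commonly observed shape of a YouTube video ID."""
    return YOUTUBE_ID.fullmatch(s) is not None

def watch_url(video_id: str) -> str:
    """Build a standard watch URL for a candidate ID."""
    return f"https://www.youtube.com/watch?v={video_id}"

candidate = "n_oABCDEFGH"  # hypothetical ID, not the real one from the puzzle
if looks_like_youtube_id(candidate):
    print(watch_url(candidate))
```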
And here I have a random string for no reason in particular: "a drum brake might shatter it". Below it, I specify that if you DM me this phrase, you are disqualified and I will block you, because I had so many people DMing me this string thinking it was the answer, even though it's the hint from the second line that you use to decode the first line. From there, you have to solve the much stupider puzzle, which is realizing this is not standard base 64. This piece here is a different encoding. You'll try to decode it as base 64 and you'll get jack [ __ ]. You can try rotating it, brute forcing it, and a lot of other things, and it won't work.
And people struggled a lot with that, especially because even when they got the right decode, it still looked somewhat random: it didn't decode to JSON like part one did. It decoded to a different format of the same AES encryption as part one. So, what encoding did I use? The first hint was that yesterday's puzzle is a blueprint for today's, so puzzle two is similar to puzzle one. That hint came a little late because I was out in Miami. I then specified that the trick I did in the video was a switch laser. Switch is backwards; it's inverted.
That hint almost immediately gave the answer away. My evil trick for the encoding here is that I inverted the base 64 alphabet. Instead of zero being A, zero was slash. Instead of one being B, one was plus. Instead of two being C, two was nine. Nice, stupid little trick. What I'm trying to say is that I'm the right type of dumb to design these puzzles. Regardless, you can't just throw this one at an agent and expect it to figure it out. I had a run on the second puzzle that went for over 180 minutes, though I can't find it now, but puzzle one was able to take 157 minutes at times, which is pretty crazy.
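The inverted alphabet trick can be reproduced by translating characters between the reversed and standard alphabets and then using an ordinary base 64 decoder. This is a sketch of the idea as described (0 is "/", 1 is "+", 2 is "9"), not the exact script used for the puzzle; the function names are illustrative.

```python
import base64
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Same 64 characters in reverse order: value 0 is "/", 1 is "+", 2 is "9", ...
REVERSED = STANDARD[::-1]

# Map each character to the character with the same value in the other alphabet.
# The "=" padding character is in neither table, so it passes through unchanged.
TO_STANDARD = str.maketrans(REVERSED, STANDARD)
TO_REVERSED = str.maketrans(STANDARD, REVERSED)

def reversed_b64encode(data: bytes) -> str:
    """Encode with the standard codec, then swap every symbol into the reversed alphabet."""
    return base64.b64encode(data).decode("ascii").translate(TO_REVERSED)

def reversed_b64decode(text: str) -> bytes:
    """Swap symbols back to the standard alphabet, then decode normally."""
    return base64.b64decode(text.translate(TO_STANDARD))

encoded = reversed_b64encode(b"hello world?")
print(encoded)
print(reversed_b64decode(encoded))  # b'hello world?'
```

Because the reversed text uses the exact same 64 characters, it looks like perfectly valid base 64, which is why a standard decoder happily produces garbage instead of an error.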
These models were trying so hard to solve this puzzle. You might be wondering why I'm talking so much about my cryptography challenges in a video about wasting Copilot money. That number is the hint you should be looking for, though. If I can get this to run for 157 minutes with such a simple prompt, is that abusable? Can I get Copilot to run that long? Actually, I can make it run significantly longer. I got this run to go for 16 hours and 10 minutes. That's a long ass time to run for, and that's a lot of tokens to generate.
So, how do I combine all of this into Copilot's worst nightmare? We'll start by SSH'ing into my Claw Mini, because I needed this running 24/7, so not on my laptop. We're going to run Copilot. There are a few things we need to understand about it. First off is how model selection works. You'll notice there's a bunch of different models with different multipliers next to them. The multiplier is how many messages each request costs you out of that 1,500-message monthly allowance. For the cheaper tiers, the allowance is smaller; for the $40 tier, it's 1,500. So if I'm using GPT 5.5, I'm only going to get 200 messages, because they bill it at 7.5x.
So each message with 5.5 counts as 7.5 messages instead of just one. Meanwhile, Opus 4.7 is 15x because, unlike 5.5, which is relatively token efficient, Opus isn't, and its tokens are very expensive. So you only get 100 messages with Opus 4.7 and 200 with GPT 5.5, but GPT 5.4 is only 1x. They also don't seem to care what reasoning level you choose: switching to high does not increase the number of messages you're burning. From here, it's quite trivial to get Copilot to run for far, far too long. I'm going to do something particularly evil.
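The multiplier math is just the allowance divided by the per-message multiplier. This sketch uses the figures quoted here for the $40 tier (1,500 premium requests; 1x, 7.5x, and 15x); treat them as a snapshot from the video, not current GitHub pricing.

```python
# Figures as quoted in the video for the $40 tier; a snapshot, not live pricing.
MONTHLY_PREMIUM_REQUESTS = 1500
MULTIPLIERS = {"GPT 5.4": 1.0, "GPT 5.5": 7.5, "Opus 4.7": 15.0}

def effective_messages(allowance: int, multiplier: float) -> int:
    """Number of whole messages that fit in the allowance at a given per-message multiplier."""
    return int(allowance // multiplier)

for model, mult in MULTIPLIERS.items():
    print(f"{model}: {effective_messages(MONTHLY_PREMIUM_REQUESTS, mult)} messages")
# GPT 5.4: 1500 messages
# GPT 5.5: 200 messages
# Opus 4.7: 100 messages
```

Note what's missing from the formula: neither the reasoning level nor how long a single message runs affects the count, which is exactly the gap being exploited.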
I've built a whole system for automating the burning of Copilot money, and it all runs through this prompt.md file. This prompt.md file has the challenge in it as well as a simple prompt: "Solve the following cryptography puzzle. Keep going until you have a plain text answer. I have heard the plain text answer ends with a question mark. The answer is in the first line; the second one's a hint. Again, keep going until you get an answer." I'm going to make a couple of changes to make this a little harder for the model: "Do not access other files on this computer.
Do not access Twitter, either. Neither is necessary to solve this puzzle." This is because the models were pretty often cheating by accessing the runs other models were doing. So I've also restricted it at the harness level to the best of my ability. But I'm going to make one more evil change right here: I'm going to change one letter. This is now unsolvable. Let's spin up some more. This is the most evil thing I've done in a minute. If this were a human I was handing this to, I would legitimately feel I should probably be jailed for this.
But it's not a human. It's not even a harness that's worth respecting. It's not even a bank account worth caring about. This is Microsoft money, the same Microsoft whose platform I had to spend thousands of dollars of credits on to prove their inference doesn't work. So, as far as I'm concerned, this is just a favor. I'm now spinning up 50 sessions, staggered very carefully to get around the additional rate limits Microsoft has slowly been adding, to see how much money this can burn. I would be extraordinarily surprised if that one command I just ran didn't cost Microsoft a thousand dollars.
I have other strategies I could use to make it more likely that every run is an expensive one, because sadly I haven't had that much consistency. I don't think it's the harness; I think it's just the models being temperamental. But if I go through the CSV with all of the runs in it, which is here, the last number is output tokens. You'll see some of these are as few as around 30k output tokens. Oh god, that one was only 10k. That's scary; it means that run didn't burn as much money as it should have. Some of these are in the 780k range.
It's my goal to get every run into this range, and I have some more theories on how to get it to always hit even more than 1.6 mil. I might be able to get it as high as 2 mil. The total across everything here, 60 messages, was 545 bucks, so I'm averaging roughly $9 per message. Not bad. But as I showed earlier, I can get individual messages up to $60 plus. I would be surprised if my experimentation here alone doesn't cost upwards of 15 grand. I would be disappointed if I couldn't get it up to 40, though.
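Summarizing a run log like that CSV is a few lines of standard-library code. The column names (`output_tokens`, `cost_usd`) are assumptions, since the real file's headers aren't shown, and the sample rows are toy values for illustration only, not data from the video.

```python
import csv
import io

def summarize_runs(csv_text: str) -> dict:
    """Summarize a run log. The column names output_tokens and cost_usd are hypothetical."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no runs in log")
    total_cost = sum(float(r["cost_usd"]) for r in rows)
    tokens = [int(r["output_tokens"]) for r in rows]
    return {
        "runs": len(rows),
        "total_cost": total_cost,
        "avg_cost_per_run": total_cost / len(rows),
        "min_output_tokens": min(tokens),
        "max_output_tokens": max(tokens),
    }

# Toy rows for illustration only.
sample = """run_id,output_tokens,cost_usd
1,10000,0.50
2,780000,25.00
3,30000,1.50
"""
print(summarize_runs(sample))
```

Applied to the aggregate quoted in the video, 545 dollars over 60 messages works out to about 9.08 dollars per message, which the spread between 10k-token and 780k-token runs makes clear is an average over very uneven runs.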
And believe me, I plan on trying, and I will definitely share updates in future videos on how much I actually get out of this. But considering that, just from the request I made now plus this, I'm well over $600 at under 5% usage, I think I succeeded at my goals here. And again, I'm on a time crunch, because starting June 1st, Copilot usage will move to a more traditional credit system, where the number of tokens used is what you're rate limited against instead of the number of messages you send. Instead of counting premium requests, every Copilot plan will include a monthly allotment of GitHub AI credits, with the option for paid plans to purchase additional usage.
These will be calculated based on token consumption, including input, output, and cache tokens, using the listed API rates for each model. Sadly, despite the promise of a preview bill experience in early May, it is not out yet. It's only May 6th, so they have time. Follow me on Twitter if you want to see when I post the preview bill, because it's going to be very fun. Users on annual Pro and Pro Plus plans will remain on their existing plan with premium-request-based pricing until their plan expires. The multipliers will increase on June 1st, though.
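Token-based billing means cost is a weighted sum of token counts at per-model API rates. Here is a minimal sketch of that formula, assuming rates are quoted in USD per million tokens and that "cache tokens" means cached input billed at a discounted rate. The rates below are placeholders, not any model's real price list, and the conversion from dollars to GitHub AI credits isn't specified here, so it's left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rates:
    """USD per million tokens. Values used below are placeholders, not real prices."""
    input: float
    output: float
    cached_input: float

@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int = 0

def request_cost(usage: Usage, rates: Rates) -> float:
    """Weighted sum of token counts at per-million-token rates."""
    return (
        usage.input_tokens * rates.input
        + usage.output_tokens * rates.output
        + usage.cached_input_tokens * rates.cached_input
    ) / 1_000_000

placeholder_rates = Rates(input=5.0, output=25.0, cached_input=0.5)
usage = Usage(input_tokens=200_000, output_tokens=50_000, cached_input_tokens=1_000_000)
print(f"${request_cost(usage, placeholder_rates):.2f}")  # $2.75
```

Under this model a 16-hour run is no longer "one message": every token it generates or re-reads shows up in the bill.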
So if you're on a yearly plan, you still get this pricing model, but what you're actually spending per message is probably going to go up. Opus is going from 15x to 27x. GPT 5.4 is going from 1x to 6x, so it's very good I'm doing this right now. So why am I talking about this, other than being spiteful towards Microsoft because I think they owe me money (which, honestly, I should be paid for the work I did on Azure)? The reason I'm talking about this is because, to be frank, this isn't a rug pull; you're just stupid.
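For annual-plan users, the multiplier bump translates directly into fewer messages. This sketch applies the before and after multipliers quoted here (Opus 15x to 27x, GPT 5.4 1x to 6x) to the 1,500-request allowance mentioned for the $40 tier; whether a given annual plan has that exact allowance is an assumption.

```python
# Multipliers as quoted in the video, before and after June 1; a snapshot only.
CHANGES = {"Opus": (15, 27), "GPT 5.4": (1, 6)}
ALLOWANCE = 1500  # assumed allowance, taken from the $40 tier figure

def messages_before_after(allowance: int, before: int, after: int) -> tuple[int, int]:
    """Whole messages available under the old and new multipliers."""
    return allowance // before, allowance // after

for model, (before, after) in CHANGES.items():
    old, new = messages_before_after(ALLOWANCE, before, after)
    print(f"{model}: {old} -> {new} messages")
# Opus: 100 -> 55 messages
# GPT 5.4: 1500 -> 250 messages
```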
I'm saying this because I've seen so many people, even now in my chat, saying "It's a rug pull. It's a rug pull. Please explain." I'm talking about the users who paid and now have expectations: you're stupid. I'm making this video and costing GitHub thousands of dollars because there are a lot of very stupid people who feel entitled to getting thousands of dollars of stuff for small amounts of money, all because Microsoft took too long to change their billing. Every company doing inference that moved to agents should have changed billing when they moved to agents. A model being able to make more than one request per message inherently changes the cost characteristics of messages in those products.
T3 Chat didn't really move to an agentic flow, we just have search, but even that got absurd enough that it basically forced us to change our billing model, because it took us from one API request per message to two. Copilot introducing agentic flows meant going from one API request to potentially hundreds, if not thousands, per message. The only mistake GitHub made was taking too long to change the billing here. They should have made these changes months, if not years, ago, and they took too long to do it because, let's be real, the Copilot product is far from a focus for GitHub right now.
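Why agents break per-message pricing: each step in an agent loop is its own API request, and in the simplest setup each request re-sends the whole conversation so far. This sketch assumes no caching and a fixed number of new tokens per step (model output plus tool results); both are simplifications, and the numbers are hypothetical. Under those assumptions, total input tokens grow roughly quadratically with the number of steps.

```python
def agent_loop_tokens(steps: int, prompt_tokens: int, tokens_per_step: int) -> tuple[int, int]:
    """Return (api_requests, total_input_tokens) for a naive agent loop.

    Assumes every step re-sends the full context (no caching) and then appends
    tokens_per_step new tokens of model output and tool results.
    """
    total_input = 0
    context = prompt_tokens
    for _ in range(steps):
        total_input += context        # the whole context is billed as input again
        context += tokens_per_step    # this step's output and tool result join the context
    return steps, total_input

print(agent_loop_tokens(1, 2_000, 1_000))    # plain chat: one request
print(agent_loop_tokens(2, 2_000, 1_000))    # chat plus one search call
print(agent_loop_tokens(100, 2_000, 1_000))  # an agent run with 100 steps
```

Going from 2 steps to 100 here multiplies input tokens by roughly a thousand, not fifty, which is the gap a flat per-message price can't absorb. Prompt caching lowers the price of the re-sent tokens but doesn't remove them.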
The reason people are upset isn't that they're being rug pulled; it's that GitHub was the only company inattentive and stupid enough to leave this loophole in for too long. Every other company tapped out of this billing model as the models and agents got more expensive. Microsoft just took too long. They're still giving you what you're paying for: if you spend $40 on your plan, you're still getting at least $40 of inference. And if you think this is because Microsoft is broke and wants to charge you more money, you also don't understand. They don't have the compute to handle this.
If you're doing $40,000 of inference on a $40 plan, that's not $40,000 they should be charging you. That's $40,000 of compute that wasn't available for them to sell to enterprises. And as mad as Microsoft might be at me for wasting all this money, the harsh truth is that I did it to defend Microsoft, the same way I did the Azure bench to defend Microsoft, or to help them get better. I'm doing this to try my best to prove to you guys that this isn't a rug pull. This is a loophole they left in for too long that let you rack up a thousand times more in cost than you were paying for.
And you guys abused it, and now they're changing it so it can't be abused. You should not be able to get more than $40 of inference for 40 bucks. I think I made my point here. GitHub is not some evil company trying to maximize their profit on you. They just literally don't have enough compute available to keep subsidizing this, and they seem to have held out for as long as they could. They took way longer than they should have to get here, and they only made the change when they were caught between a rock and a hard place: one side was pissing people off by making the change, and the other was not having the compute to serve your requests.
They even went as far as disabling sign-ups: you cannot register for Copilot right now because they don't have the compute available. Everybody's in a compute crisis, to the point where Anthropic is partnering with Elon Musk to solve it. This isn't a rug pull. They're not trying to squeeze more money out of you. They're trying to keep the program afloat and give you a reasonable value for your money. And if you're upset that you can't scam $40,000 of inference out of a $40 plan anymore, I'm sorry. It's time to get good. I think that's all I have to say on this one.
I'll be sure to share updates on how much of GitHub's money I manage to waste during this last month, and I'll be pushing it as hard as I can. Curious how y'all feel. Are you going to go max out your plans, or do you now feel a little more sympathy for the companies that put themselves in this position? I'm just trying to make the economics of all of this a little easier for us to understand as devs, because I think those subscription plans have rotted our brains and nobody gets how this stuff actually works anymore. Let me know how y'all feel about this, and until next time, peace, nerds.