Everyone is Wrong about Tokens
Chapters (7)
Discusses the startling OpenAI token spend figures and the speaker's initial skepticism, setting up a discussion about the trajectory of token usage.
Bold forecast: token efficiency will reshape AI work, as giant token-spend experiments reveal that “infinity” usage isn’t sustainable or desirable for most teams.
Summary
The PrimeTime riffs on OpenAI’s token spend and what it means for the future of AI tooling. The host starts from a jaw-dropping stat in a post by Pete: $1.3 million spent on OpenAI tokens in 30 days, 603 billion tokens in total. He uses it to challenge the crowd’s obsession with endlessly scalable token usage, arguing that the real lesson isn’t who spends the most, but what token efficiency looks like in practice. He notes that some tech influencers promote massive cloud-agent setups, but he doubts that approach scales or stays affordable for ordinary teams, especially since Pete paid nothing for those tokens while everyone else pays full price. He draws a parallel to crypto-era hype, suggesting that “fluencer” culture has migrated to AI and that many people react with hype rather than practicality. The core prediction: we’ll move from token maxing to token efficiency, with consultants, not just engineers, helping organizations optimize usage. He also frames “vibing” as a third option alongside buy versus build, one that costs both time and money. Throughout, he weaves in anecdotes about past corporate spending controls and contrasts them with today’s eagerness to deploy AI widely. The piece culminates in a warning against token overuse and a call to rethink how we measure value: features delivered and efficiency over sheer token volume. He ends with a playful critique of the coming wave of token-efficiency coaching, imagining prompt trainers who resemble Pokémon trainers more than traditional consultants.
Key Takeaways
- Infinity-token usage is unsustainable for most organizations; real-world value comes from token efficiency, not maxing spend.
- Token spend at OpenAI’s scale could quickly exceed ordinary budgets, prompting a pivot from “spend more tokens” to “spend tokens smarter.”
- Tech teams will increasingly seek consultants who optimize token usage and feature delivery rather than simply expanding cloud agent fleets.
- The trend rhymes with earlier tech cycles (microservices, Kubernetes) in which architectural complexity outpaced customer value; the host expects a similar snap back toward efficiency.
- A future market may emerge for token-efficiency coaches, including prompt trainers, as companies balance speed, cost, and practical outcomes.
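The two headline numbers lend themselves to a quick back-of-the-envelope check. This Python sketch divides the quoted $1.3 million by the 603 billion tokens to get a blended price per million tokens, and by the host’s rough $50,000-per-month cost of a big-tech engineer to get an engineer-equivalent headcount. It assumes the spend reflects list prices (the host points out that Pete himself paid nothing) and lumps input, output, and cached tokens into a single rate, so treat the results as illustrative rather than as actual OpenAI pricing; the constant and function names are hypothetical.

```python
# Back-of-the-envelope check on the figures quoted in the video.
# Assumptions (illustrative only, not official pricing):
#   - MONTHLY_SPEND_USD and TOKENS_PER_MONTH are the numbers shown in Pete's post.
#   - ENGINEER_COST_PER_MONTH_USD is the host's rough big-tech cost per engineer.
#   - All token types are lumped into one blended rate.

MONTHLY_SPEND_USD = 1_300_000
TOKENS_PER_MONTH = 603_000_000_000
ENGINEER_COST_PER_MONTH_USD = 50_000


def cost_per_million_tokens(spend_usd: float, tokens: int) -> float:
    """Blended dollars per one million tokens."""
    return spend_usd / (tokens / 1_000_000)


def engineer_equivalents(spend_usd: float, engineer_cost_usd: float) -> float:
    """How many engineers the same monthly budget would pay for."""
    return spend_usd / engineer_cost_usd


if __name__ == "__main__":
    per_million = cost_per_million_tokens(MONTHLY_SPEND_USD, TOKENS_PER_MONTH)
    engineers = engineer_equivalents(MONTHLY_SPEND_USD, ENGINEER_COST_PER_MONTH_USD)
    print(f"~${per_million:.2f} per million tokens")
    print(f"~{engineers:.0f} engineers' worth of monthly budget")
```

Run as a script, it prints about $2.16 per million tokens and 26 engineer-equivalents, which the host rounds to “30 engineers” in the transcript.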
Who Is This For?
This is essential viewing for AI engineers, product leads, and tech execs who are wrestling with whether to scale token usage or focus on sustainable efficiency. It’s also timely for consultants and managers who will shape token strategy in large organizations.
Notable Quotes
“You see this right here? Yeah. That's $1.3 million spent in OpenAI tokens in the last 30 days. 603 billion tokens spent.”
The host opens with the exact numbers from Pete's post to set the stage for his critique of token-maxing hype.
“Just because this works for Pete... does not mean it works for you.”
A caution against blindly copying high-spend setups, especially since Pete paid nothing for his tokens.
“I think a lot of people look at this and they're like, ‘Oh, well, you know, OpenAI is being evil.’ No, I think they actually just believe this.”
He reframes the token experiment as a sincere belief in a token-heavy future.
“I do think in the near future we are going to see token efficiency as an entire argument as opposed to simply token maxing.”
Core prediction: a shift from maximizing tokens to optimizing how they’re used.
“The consulting class... is going to be the most annoying people in the universe.”
A humorous aside on the coming wave of token-efficiency experts.
Questions This Video Answers
- How will token efficiency change AI project budgeting in the coming year and beyond?
- What is token maxing and why might it be unsustainable for most companies?
- What are prompt trainers and how could they impact AI workflows at scale?
- Can companies realistically replace large numbers of engineers with AI agents while staying cost-efficient?
- What should leaders consider when measuring value: token volume or feature delivery?
Topics: The PrimeTime, Token efficiency, OpenAI tokens, AI agents in the cloud, Token maxing vs. token efficiency, Fluencer culture in AI, Prompt trainers, Agile coaches for AI
Full Transcript
You see this right here? Yeah. That's $1.3 million spent in OpenAI tokens in the last 30 days. 603 billion tokens spent. Now, even if I were to try my hardest, I am not actually sure it's possible for me to spend this amount of money or that amount of tokens. I have no idea how you accomplish such things. And when I saw this, I thought this is just the most ridiculous thing, and I thought, this is so stupid. But then I started thinking about it more and more, and I realized that there's a future developing in which I think a lot of people are wrong, and this post right here really helps crystallize in my mind where things are going.
So, I got a lot of yapping to do, so I hope you're going to, you know, strap in, because I think you probably aren't going to see this one coming. I don't think you understand what's going to happen here in the next year. And I think I might be right on this one. In fact, I'm going to do something I normally don't do: I'm going to make a tech prediction. I know. Kind of dangerous. I just got to yap about this for a second, okay?
The reason I have to yap about this is that whenever a post like this happens, the exact same thing always happens. There's an entire fluencer market when it comes to AI, and I largely think they're simply holdovers from the crypto days. The crypto NFT bros moved over to AI. When they see someone make a post like this, Papa Pete, of course, they go, "Oh, hey bros. Hey bros. Everybody. Uh, I don't know if you know this. If you aren't spending like $100,000, if you're not even hitting 10 billion, if you're not even in the B's when it comes to token usage a month, you're not going to make it.
You know how I know that? Look at Papa Pete, okay? OpenClaw guy, he knows what he's doing. Do you know what you're doing? Not going to make it. Permanent underclass. Hey, buy my course and I'm going to teach you how to do AI properly." And it's such a bad takeaway. Let me explain it in simpler terms. You know, the funny thing about history and about tech is that they don't repeat, but they do rhyme. I heard that once, and it makes me feel really smart saying it. You know what I mean?
So, what I mean by "they rhyme" is that in 2016, 2018, 2020, if you looked at any startup, if you went and talked to any of your friends in Silicon Valley, there was an entire culture that had more microservices and Kubernetes usage than literal customers. I actually had a friend lament to me that he was managing 10 different microservices and he had three customers. Unironically, I'm not making that up or exaggerating. He had triple the services for three customers, and he was just like, "What the hell have I done with my life?" And it's just like, "Brother, you have to quit listening to Google for how to run a company.
Just because it works for them does not mean it works for you." And this is kind of that same vibe. Just because this works for Pete, which, by the way, guess how many dollars he paid for those tokens? Yeah, zero. You know how much money you're going to pay for those tokens? Yeah, full price, okay, buddy. It's not going to be cheap, okay? You're not spending 603 billion tokens per month. And if you are, I mean, well, hey, nice to meet you, sir. I was not aware of your game.
And so, I just wanted to kind of get that out of the way, okay? Now to the future, the thing that I think all of you have wrong, okay? But first, the bag. You see these people walking around with their laptops cracked just so their agents don't stop running? Mine never stop running. When making changes with Cloud Agents, you can see the diffs inline just like with any other agent. It will create a PR and you can actually see your CI running live within the Cloud Agent. You can see the status of the CI when it completes and you can even go back and fix the failing CI.
Not only that, but you can also just run live commands in the terminal. That is my project right there. This is not on my computer. This is in the cloud running where I can ask it to do things. I can ask for changes. I can ask for changes on my phone and see the game played via MP4. What's even crazier is I can just take over the desktop and I can place towers and I can just play the game. Start round. And I can watch the bats happen. This is my game. Try Cloud Agents today.
Go to cursor.com/agents and never have to worry about your laptop being open again. Okay, welcome back. Let's talk about the future here for a second. Something you need to keep in mind when you see these things is that Pete's entire goal here is a research project: how far can OpenAI take token usage? 'Cuz remember, they believe the future is going to be this token utopia where everybody just sits back and relaxes like we're in WALL-E, and we can just make anything, and you have billions upon billions of tokens for free, because everything gets 10x cheaper every single year. Which, by the way, that promise is 2 years old, and I feel like things have never been more expensive.
I don't know. It feels that way to me. Maybe I'm wrong, but things feel a little costly. Nonetheless, 10x cheaper. Remember that. 10x cheaper every single year. And so, at some distant point in the future, you're spending 603 billion tokens and every last person on Earth is doing that, which, by the way, we don't even have enough compute for. I don't even think there's enough power on Earth to do that currently. We might have to 10x all power on Earth and use it only to power GPUs to make this happen. But again, I digress.
If this were to come about, this is how projects could look. I think a lot of people look at this and they're like, "Oh, well, you know, OpenAI is being evil." No, I think they actually just believe this, right? I think they actually believe that every last person will be using infinity tokens at all times. And yeah, sure, they are the beneficiaries of it, and it's a good future for them, but I also think they actually believe this is just how the world should work. This is how projects should be run.
And so, this is a research project, which got me thinking about something for a second. It's kind of this funny conundrum that you see. Right now, if you go to any of the big companies, what are they all about? "Hey, what's your token spend?" I mean, there are literally people getting fired because they're not using AI enough. You've seen this, you've seen the articles, you've potentially seen the rage posts on Reddit. I can never tell if what I'm reading on Reddit is real or if it's just there to rage bait me into frothing at the mouth so I go off and tweet about a story that doesn't even exist.
But let's pretend they do exist. People are getting fired for not using enough AI. I've read stories about people interviewing: if they use too much AI, people don't like it; if they don't use enough AI, people don't like it either. Interviewing sounds like hell. Working at companies right now sounds pretty awful, because it's constantly being shoved down your throat: you must use this. People at Amazon, you better use Kiro. Hey, if you're over at Google, better use that Gemini, buddy. And it just keeps on going and going and going, right? Well, there's kind of a problem there.
I don't think people realize what the problem is. Because right now it's like, spend all the money you want, right? Okay. Well, let's just rewind like 18 months, okay? Not even that long ago. You wanted a new computer. Oh, you want 32 GB of RAM? Well, we're going to have to get a vice president to sign off on those $400. Oh, and a chair? Yeah, that chair is going to be a used Herman Miller. I'm sorry, but those buns of yours do not get the luxury of sitting on a brand-new Herman Miller, okay?
You know what? We're getting you a lifetime chair. That's what you get. Yeah, you. You get a lifetime chair and I'm going to go grab some patio furniture padding and duct tape that right onto your chair. That's what you get. That's what you deserve, okay? Because let's just face it, we can't be bothering our VPs for these $50 upcharges. We can't do that, okay? Us as a multi-billion dollar company, we are very concerned if you spend $25. And now, all of a sudden, you can spend infinity on tokens. In fact, you're even encouraged to do so.
Going back to this for a second, if you really think about it, that means it takes $1.3 million a month to run OpenClaw. So, how many engineers is that? Let's just pretend we're a big tech company like Google. An engineer costs $50,000 a month, and you're spending $1.3 million a month on just AI agents. To replace those with engineers... well, that kind of math you can't just do off the top of your head. So, let's just say 30 engineers.
That's like 30 engineers' worth of people working on something. You can't just do this for every single project. At some point, your company's going to go, "Okay, timeout. We've made a mistake. We let you use all the tokens you want. That's bad. We're going to go back to the old days. Who's the most token efficient? Oh, you're not token efficient. You're spending 603 billion tokens on maintaining a simple project? No, we're not going to do that. You're gone." There's going to come a world where an entire consultant class goes through these companies teaching people how to be efficient with tokens.
No longer will we see this world of infinity token usage. Instead, it's going to be, "Okay, who are the top performers by features and things delivered, not just by how much they spend?" Because in the old world, we used to do buy versus build. Do you build the thing or do you buy the thing? Depending on the cost and the trade-off, sometimes it's better to, you know, trade the time for the money or the money for the time. But now we kind of have a new world. It's buy versus build versus vibe. Do you vibe it?
Well, vibing takes both time and money. So, which is the proper trade-off? I think companies are going to quickly snap back to the way they've always done things. It's going to be, "Okay, who's the most efficient? Who knows how to use these things the best?" It's not going to be the people spending infinity. It's not going to be the fluencers telling you that you need to run 500 agents in the cloud at all times or you're not going to make it. It's going to be the people who are just being engineers.
They're the people who are learning, people who actually want to just do good work and use these tools to speed them up in certain areas. And that's my prediction. Yes, I'm doing a prediction. I'm doing an actual prediction. I know you're not supposed to do predictions; tech predictions are almost always wrong. But I do think in the near future we are going to see token efficiency as an entire argument as opposed to simply token maxing. Token maxing exists because we're just trying to figure out: is this even viable? And by we, I don't mean me.
I'm out here still hand-coding stuff for my video game, okay? It's a different world. But nonetheless, this was very interesting to see. I was very happy I got to read about this and see the live reaction from everybody, because people were just instantly suspicious, like, "Oh, this is just OpenAI trying to make money." Yeah, they are trying to make money, I'll tell you that much. But this is also just what they think the future looks like: you and 100 agents non-stop doing stuff. And maybe at some point in the future, maybe, hey, you know what?
Maybe in 10 years, some large amount of time, when we have, you know, 100x more energy and 1,000x more GPUs. Yeah, maybe that future does exist in some faraway place. But right now, to me at least, the big takeaway here is that you've got to start thinking about token efficiency. You've got to start thinking about how you're actually using it. Maybe having a kajillion agents does work for one person, but I'm not sure this is really a sustainable approach for anybody, even if the promised 10x happens. Okay, sorry.
I made a future prediction and I'm probably going to be wrong, but, you know, honestly, I think I'm right. Also, the consulting class: can we all just agree that's going to be the most annoying people in the universe? Honestly, I'd almost rather take the crypto bros who are going to be like, "Oh, you gotta token max," than the new class of agile coaches that are going to be coming out. These agile coaches for token efficiency are just going to be the worst. Oh my gosh. There are actually going to be prompt trainers. It's going to be like Pokémon trainers, but they're going to be prompt trainers, and you're going to have to go in there and they're going to, like, one-v-one you on prompts.
It's going to be so ridiculous. It's all horoscopes, baby. The name is the Primeagen.