No way this actually works

The PrimeTime | 00:06:58 | Apr 10, 2026

Caveman trims AI output fluff to slash Claude Code token usage, delivering real savings and faster responses.

Summary

The PrimeTime dives into a surprising hack for Claude Code: Caveman. Despite initial skepticism, the host shows that trimming output tokens can save real money, sometimes dramatically. He presents Caveman's philosophy as a blunt alternative to long-winded responses and explains how it reduces token usage without sacrificing code accuracy. References to The Grug Brained Developer (grugbrain.dev) give a playful backdrop for counterintuitive ideas that push back against token bloat. The video highlights concrete rules, such as dropping pleasantries and articles and preferring short synonyms, plus adjustable levels (light, full, and ultra) that control how aggressively text is trimmed. A practical takeaway is a 69-token example dropping to 19 tokens, and a table contrasting usage across Caveman modes. The host also notes why the approach feels like magic in a space where output tokens drive cost, and humorously rails against bloated agent skill directories while praising the potential savings. All of this sits beside a candid jab at the reliability of studies, a March 2026 brevity-boost claim, and a cheeky coffee-subscription plug to close out the segment.

Key Takeaways

  • Dropping filler tokens can dramatically cut costs: a sample shows 69 tokens down to 19 tokens using Caveman’s technique.
  • Caveman offers modes (light, full, ultra) to control the degree of token trimming beyond basic removal of pleasantries.
  • A concrete example notes a table where token usage drops from 1,180 to 159 tokens, yielding ~87% savings.
  • The approach emphasizes concise, non-hedged responses while keeping technical terms (e.g., polymorphism), code blocks, and exact error messages intact.
  • The video cites a March 2026 study claiming brevity constraints improve accuracy by 26 percentage points, framing brevity as performance gain.
  • There’s a critical take on long-winded agent ecosystems and a humorous nod to overabundant “skill” directories common in AI projects.
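To make the trimming rules concrete, here is a minimal, hypothetical Python sketch of a Caveman-style "light" pass. Note that the real Caveman works as a prompt instruction telling the model to answer tersely in the first place; this post-hoc filter, with its made-up word lists and crude four-characters-per-token estimate, only illustrates where the savings come from.

```python
import re

# Hypothetical word lists modeled on the rules described in the video;
# the actual Caveman prompt file may differ.
ARTICLES = {"a", "an", "the"}
FILLERS = {"just", "really", "basically", "actually", "simply"}
PLEASANTRIES = [
    r"\bsure,?\s*",
    r"\bcertainly,?\s*",
    r"\bof course,?\s*",
    r"\bi'd be happy to help( you)?( with that)?\.?\s*",
]

def caveman_light(text: str) -> str:
    """Drop pleasantries, articles, and filler words from prose."""
    for pattern in PLEASANTRIES:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    kept = [w for w in text.split()
            if w.lower().strip(".,!?") not in ARTICLES | FILLERS]
    return " ".join(kept)

def rough_tokens(text: str) -> int:
    """Crude estimate (~4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)

verbose = ("Sure, I'd be happy to help you with that. The issue you are "
           "experiencing is likely caused by a bug in the auth middleware "
           "token expiry check.")
trimmed = caveman_light(verbose)
print(rough_tokens(verbose), "->", rough_tokens(trimmed))
```

Running this on the video's auth-middleware example roughly halves the estimated count; the 69-to-19 figure in the video comes from a real tokenizer applied with the full rule set, so exact numbers will differ.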

Who Is This For?

Essential viewing for developers using Claude Code or Claude-based tooling who want to cut costs and speed up responses without losing accuracy. Great for anyone curious about practical token-optimization tricks.

Notable Quotes

"Now, I know a lot of you have recently been hitting some limits when it comes to using Claude Code."
Sets up the problem of token costs and limits with Claude code.
"Drop any articles. So just don't use 'a,' 'an,' and 'the.' Drop all fillers: just, really, basically, actually, simply. Drop all the pleasantries."
Illustrates the core Caveman rule: cut filler phrases to save tokens.
"This is just the free hack. They even have a basic table breaking down the various usages, explaining a React render bug."
Highlights the practical, shareable output of Caveman and its impact.
"Brevity constraints reverse performance hierarchies in language models. All of those words just simply mean that making the response brief improves accuracy by 26 percentage points."
The study The PrimeTime cites to bolster the method's value.

Questions This Video Answers

  • How much can Caveman actually reduce Claude Code token usage in practice?
  • What are the exact Caveman modes and when should you use light vs ultra?
  • Does trimming language affect accuracy for code-generation tasks?
  • What are common pitfalls when applying token trimming to API prompts?
  • Is Caveman compatible with Codex or other coding agents?
Tags: Caveman, Claude Code, token optimization, AI tooling, GrugBrain Dev, GitHub markdown, brevity study, auth middleware bug, Codex
Full Transcript
Now, I know a lot of you have recently been hitting some limits when it comes to using Claude Code. The conventional wisdom, of course, is that you're holding it wrong, but actually, it turns out there's a better way to save on tokens. The solution, I honestly didn't believe it, but it actually works. It actually works quite well. And here's the thing: you will save actual real money using this method. And no, I'm not exaggerating. I'm talking about Caveman. Now, you may not know what Caveman is. And hey, if you don't know what it is, there's a couple of pop references you might be familiar with. First off, GrugBrain Dev. If you haven't heard of grugbrain.dev, I highly recommend the essays. They go about as counterintuitive counter... I think I just made up a word. They're like countercultural, but I also used the word intuitive. I kind of just made a baby with them. Countercultural to what is going on in today's space-age AI, 37,000 lines of code, just only let AI review AI's kind of nature. This is for the simpler man. Okay. So: me think, why waste time say lot word when few word do trick. That's actually what Caveman is. Instead of allowing Claude Code or Codex or whatever to go off and say a bunch of expressive statements, "hey man, you're absolutely right," you could spend money getting glazed like wild. Instead, it goes straight to the heart of the issue, which is to actually just stop saying so many things. And with the cost of output tokens, this actually can save some serious money. Okay, so what does the Caveman scale actually look like? Well, I can't show it to you because apparently GitHub can't show 200 lines of markdown. And when I try to go look at it raw right now, they broke raw. It just downloads it. It doesn't even take me to the web page. Anyways, I downloaded it, and this is all it says to do right here. Watch this. Drop any articles. So just don't use "a," "an," and "the." Drop all fillers: just, really, basically, actually, simply.
Drop all the pleasantries: sure, certainly, of course, happy to. Use short synonyms: "big," not "extensive"; "fix," not "implement a solution for." All of these are actually real token-dropping phrases that you can actually save actual money with, which is kind of insane. No hedging. Skip the "it might be worth considering." Fragments fine. No need full sentence. Technical terms remain the same, so polymorphism is still polymorphism. We don't shorten up those terms. Code blocks unchanged: Caveman speak around code, not in code. Error messages quoted exact: Caveman only for explanation. You can get the same results. The only difference is Claude just doesn't sit there and glaze you and say a bunch of stupid words at you, with "well, actually, the fix was quite simple, and your insight into the problem space was actually the right direction, all I had to do..." and it's just like, no, no, no, shut up. Stop saying that. Here's a good example. "Sure, I'd be happy to help you with that. The issue you are experiencing is likely caused by..." No, don't do that. Yes: "Bug in auth middleware token expiry check. Use this, not that. Fix." So often you can actually drop a lot of tokens. Even just this alone, you can see right here it goes from 69 tokens to 19 tokens. It even allows you to do various levels of Caveman. You can do light, where you're just trimming the fat. You can do kind of the full one. You can also do the ultra maximum one: all full rules plus abbreviate common terms (DB, auth, config, req, res, fn, impl), strip conjunctions where possible, one-word answers when one word enough, arrow notation for causality. And this just actually works. Like, this is just the free hack. They even have a basic table breaking down the various usages, explaining a React render bug. It goes from 1,180 to 159 tokens, 87% saved, which also just shows how fluffy the language is. Like, think about it, it's like bloviating.
It's just saying a bunch of nonsense with these big extravagant words without actually saying anything at all. I don't want to be much of a conspiracy theorist, but, you know, I'm just saying: Claude, they do make their money by output tokens. So instead of just being like "auth is broken," it needs to go on a rampage soliloquy to let you know every last possible thing that could possibly be said about a topic that could be said in three words. It's truly an impressive piece of technology. Honestly, the trade for affordable computers in Rainforest for one of these little black-magic, you know, sandboxes is pretty fantastic. I would make that trade any day of the week. Also, if you don't know anything about me, I typically don't cite studies, because largely I think studies have been gamed. The facts you're getting hit with, I'm not too sure you can really trust those, 'cause, you know, there's lies, there's damned lies, and then there's statistics. But hey, since this one's going in my favor: in March 2026, so just a couple days ago, "Brevity constraints reverse performance hierarchies in language models." All of those words just simply mean that making the response brief improves accuracy by 26 percentage points. Now, what is 26% more accurate? Some would say that sounds like a lot. What does 26% even mean? It doesn't really matter. You know why? 'Cause it's more accurate. Okay? Hey, green mean good. Okay? We got that graph that's going up and to the right, and that's all you need in life. Okay? When things get better, it's good. Things bad, not good. So, go ahead, give it a try. Go check out this Julius Brussy's Caveman. Which also, can we just take a quick step to the side? We've got to chat about this for a second. Why, oh why, does every single agent program you can possibly download have its own skill directory that you put skills into?
This has to be the greatest XKCD outcome that could ever be. Any project I seem to walk into has like 20 separate folders for the same text, and they're all committed. [laughter] It's just like, why? Why did we get here? [gasps] I thought we got PhD-level intelligence. Instead, we just have absolutely junior-level execution. It hurts me. It hurts me deep down. Anyways, if you're struggling out there using Claude Code and the "you're holding it wrong" message did not, in fact, help you, why don't you give this a try right here? Okay, go check it out. Don't say I never told you anything. Okay, 'cause this is good. This is good information right here. Okay. This good thing. You download, use now. Name Primeagen. Hey, is that HTTP? Get that out of here. That's not how we order coffee. We order coffee via SSH: terminal.shop. Yeah, you want a real experience. You want real coffee. You want awesome subscriptions so you never have to remember again. Oh, you want exclusive blends with exclusive coffee and exclusive content? Then check out Cron. You don't know what SSH is? Well, maybe the coffee is not for you. [singing] [music] Live the dream.
