The Claude Code Limits Problem Is Finally Solved
How Claude's plans and the 5-hour window work, including message caps per plan and how usage is counted across devices and tasks.
AI Labs breaks down Claude code limits, reveals how to stretch your 5-hour window with planning, structure, and token-smart commands.
Summary
AI Labs’ video unpacks why Claude code’s 1M token context window hasn’t lived up to expectations and how to make it last longer in practical workflows. The host explains Claude’s two paid plans—Pro and Max—and how their message limits (roughly 45 and 225 messages per 5-hour window, respectively) can still run out quickly depending on the model (Opus vs Sonnet) and task type. The discussion then shifts to concrete optimization techniques: using the clear and compact commands to control context, employing the by-the-way trick for side questions, and prioritizing upfront planning to avoid costly mid-project course corrections. They also cover project structuring for token efficiency, such as trimming an overlong claude.md, splitting rules into area-specific documents, and leveraging skills, scripts, and an appendable system prompt. The host notes that official limits can be exacerbated by recent session-limit reductions during peak hours and by leaked Claude code issues like partial responses that bloat context. Practical tips extend to disabling automemory, background tasks, and unnecessary thinking to save tokens, plus setting a hard cap on max output tokens. The video also lightly promotes Twin, a no-code AI agent, as a safer, token-conscious automation option. AI Labs closes with guidance on implementing these strategies in real-world products and hints at downloadable claude.md templates for subscribers.
Key Takeaways
- Pro plan provides around 45 messages per 5-hour window; Max plan provides around 225 messages, with higher costs if used by multiple team members.
- Opus models burn ~3x more tokens per request than Sonnet, so frequent Opus use dramatically shortens the 5-hour window.
- Plan ahead: spend tokens on planning to reduce costly corrections later in development.
- Use clear, compact, and by-the-way commands to keep context lean and separate side questions into separate sessions.
- Keep claude.md under 300 lines and move project-specific rules into separate documents linked from claude.md.
- Disable automemory, background tasks, and thinking where appropriate to save tokens; cap max output tokens to control generation size.
- Structure projects with hooks and area-specific rule files to load only what Claude needs in a session.
Who Is This For?
Software teams and AI product developers using Claude code who want to maximize their token budget, especially those running multi-user organizations or heavy compute tasks. Ideal for engineers adopting project-structure strategies to keep Claude efficient all day.
Notable Quotes
"There are two paid plans, Pro and Max, with different message allowances, and they all follow the same rule: a limited number of messages per 5-hour window."
—Explains the basic constraint shared across Claude plans.
"Opus models consume around three times more tokens for the same request than Sonnet because they are far more powerful and compute intensive."
—Highlights model choice impact on token usage.
"If you want to save tokens, use the clear command when you’re done with a task and don’t need previous context anymore."
—Introduces practical commands for token efficiency.
"The shorter your claude.md, the better it will perform; keep it under 300 lines and make it a guiding file, not a full manual."
—Gives a concrete guideline for documentation to optimize context usage.
"You can disable automemory and background tasks to prevent token waste from processes running in the background."
—Offers configuration tips to cut unnecessary token usage.
Questions This Video Answers
- How many Claude code messages do I get per 5-hour window on Pro vs Max plans?
- Why does Opus use more tokens than Sonnet and how does that affect my workflow?
- What are the best practices for structuring claude.md to minimize token usage?
- How can I use planning to reduce token waste when building AI-powered apps?
- What are effective hooks and per-path rules to keep Claude focused on relevant context?
Claude code, Claude limits, token optimization, context window, planning, claude.md, Opus, Sonnet, Twin (sponsor), AI tooling
Full Transcript
Claude code has not been great recently. Our team uses it every day, and over the past few weeks, we've been running out of limits way faster than we should be. The 1 million token context window was supposed to make things better, but it's actually made it worse. This is why we went and researched optimizations we could find to make Claude code last longer. Before we move forward to how we can actually make the most out of the limits, let us first discuss how Claude's plans and limits system actually works. This section is just an explainer for those who aren't familiar with how the limits work.
Claude has two paid plans: Pro and Max. Max is the more expensive one, and Pro is the cheaper plan at just $20 monthly. Both plans have access to features that are not available in the free plan, including Claude code, co-work, and others. But they all follow the same rule: no matter which plan it is, each gives you a limited number of messages you can send within a 5-hour window, and once that window ends, your message count resets. The number of messages you get differs by plan. The 5-hour window starts when you send your first message, whether it's on Claude desktop, web, or any Claude interface. After the window starts, each message you send counts against your plan's limit. Now, you might expect that the window only counts when you're actively using it, but even if you go idle in between and then use it heavily in the fifth hour, the window is still running, and you have to wait until the full 5 hours pass before your limit resets. The 5-hour window is also not tied to your device, so if you're using more than one device with the same account, all usage counts against the same limit.
Now, for the Pro plan, you get around 45 messages per 5-hour window. The Max plan gives you 225, and the Max 20x plan, which is more expensive than the $100 plan, gives you 900 messages in the same window. These numbers can vary depending on the model you use, as you get more messages with Sonnet and fewer with Opus. Now, you might think this number of messages sounds like more than enough for your use case, but it's just a rough count, and there are other factors that affect it. The first one is the model you're using.
Opus models consume around three times more tokens for the same request than Sonnet because they are far more powerful and compute intensive. So if you're using Opus all the time, you won't get 45 messages in your 5-hour window, and your limit will run out much faster. The Pro plan has a lower limit overall. As for the Max plan, while a single person might manage on it, Max is usually purchased by organizations and distributed across team members, so it won't hold up with multiple people on board. We do the same at AI Labs: we've purchased a Max plan and distributed it across our team.
Even with that, we still run out of the limit frequently, which led us to research ways to make it last longer. The second factor is the type of task you're performing. Compute-intensive tasks, or tasks that require multiple tools, consume a lot of tokens, so the window runs out much faster than usual, and you might not even make it to 45 messages on the Pro plan. And on top of all that, Anthropic has recently tightened session limits during peak working hours, when many people are using the service heavily at once. So your Claude plan will run out even faster before you can get any actual work done.
This is why now is the right time to learn how to make the most out of your window and use Claude effectively all day. But before we move forward, let's have a word from our sponsor, Twin. If you've tried automating with tools like Zapier or N8N, you know the deal: rigid workflows, constant breakdowns, and hours wasted connecting apps. And local agents like Claudebot are security nightmares and way too expensive. Twin changes that. It's a no-code AI agent that actually does the work for you while you sleep. It connects to tools via APIs when they exist, and when they don't, it builds integrations on the fly, giving you an infinite integration library. And if there's no API, Twin can just browse and interact like a human. On top of that, you get built-in access to tools like Perplexity, Gamma, VO3, and Nanobanana. They've just launched the Twin API, so you can trigger agents from anywhere and plug them into your existing workflows. And the best part: these agents learn. They fix themselves when something breaks, improve over time, and run 24/7. Stop babysitting broken automations. Click the link in the pinned comment and check out Twin. Now, you might already know that the Claude code source code was leaked, and a lot of people identified many issues inside it that can make limits run out faster than intended.
One of these is truncated responses staying in the context. If you hit an error like a rate limit, Claude code can create a partial response, but instead of discarding it, it retries while keeping the previous context along with the partial, error-filled message. This bloats the context with unnecessary information and wastes tokens. Skill listings are also injected, mainly for faster access, even though they don't provide much value because fast handling through the skill tool already exists. There are some other similar issues as well. Because of all this, a lot of people are complaining about Claude limits being hit faster than expected.
So to counteract both the official limits and these hidden token drains, you have to take certain measures to make Claude code last longer when you're building your products. We share everything we find on building products with AI on this channel. So, if you want more videos on that, subscribe and keep an eye out for future videos. We'll start with the tips you might have already heard from us if you've watched our previous videos. The first one is the clear command. Use this whenever you've completed a task and don't need the previous context anymore. For example, when you are done implementing the app and want to move to the testing phase, you don't need the earlier context.
So, it's better to reset it and start the next task with a fresh context window. But sometimes you do want to retain some of that context. In that case, you can run the compact command instead. It summarizes the whole interaction and frees up space with a summary in the context. The reason we want you to use these is because every time Claude sends a message, it includes the entire conversation so far along with system prompts, your tools, and all previous conversation history. With each new message, this keeps growing, resulting in a bloated context window and higher token usage per message.
Now, even with compacting, if you ask side questions in the main window, you're still bloating it with unrelated content. So, you can use the by the way command to ask a quick side question. It responds in a separate session context window. This side question wouldn't go with the next message you send, leading to fewer tokens per request. Now, even though planning might sound like a token-intensive task, you need to start your projects with it. This is because if you don't spend time planning, you will have to course correct Claude later when its implementation is not aligned with what you need.
Spending tokens up front on planning saves you from wasting far more tokens on corrections down the line. Sometimes Claude doesn't follow your instructions as you want it to. In those cases, we often prompt it again with the correct implementation approach. But instead of reprompting, you can run the rewind command to restore the conversation and code to a point before the message where Claude went off track, and make the changes directly in the prompt. You can also double-press the escape key to do the same thing. This removes the incorrect implementation from the context window, so the wrong outputs don't get sent to the model.
Now, all of these commands help you save tokens during a session, but the bigger impact comes from how your project is structured in the first place. You might have already structured your projects using frameworks like BMAD, SpecKit, or others. But the majority of these frameworks are actually token intensive, so if you use them in your own app, expect your token limit to be reached faster. While these frameworks might sustain on Max plans, they definitely won't on Pro. Now, even if you're not using frameworks, you might have set up your own. For creating the claude.md file, you've probably used the init command, which goes through your codebase and creates a claude.md file for you.
It does create one, but it contains a lot of issues. This file is supposed to provide guidance to the AI agent, but it lists things the AI already knows on its own. For example, the commands it lists are the standard ones for running a dev server, and Claude already knows how to do that. Unless you use a non-default flag for running the server, there's no need to add those. Same with the architecture: Claude can read file names and deduce what each file is about, because it understands file systems and uses them for navigation.
So there's no real need for these kinds of instructions unless there are specific cases where additional guidance is required. If you're going to write your own claude.md, it should ideally be less than 300 lines. The shorter the file, the better it will perform and the more focused Claude will be on what actually matters. It should act as a guiding file, not a detailed manual explaining how to do everything. Whatever you include should be generically applicable across the project, not specific details of each part all packed into one file. Include in claude.md only what Claude doesn't know by default: what it shouldn't do, your development practices, and similar instructions.
You need to configure this file properly because it gets loaded into the context once every session and stays there. Unnecessary information in the context window means you're wasting tokens on every turn for content that isn't even needed up front. For specific aspects of the project, like the database schema or other areas where different rules apply, split them into separate documents and link them in the claude.md file. This allows Claude to progressively pull in only the docs it actually needs. We also mentioned this in our previous video: creating project rules that are specific to certain paths helps Claude stay focused.
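As a sketch of what this splitting looks like in practice, here is a trimmed claude.md that sets only project-wide rules and links out to area-specific documents. All file names, paths, and rules below are hypothetical placeholders, not taken from the video:

```shell
# hypothetical example: a minimal claude.md that links area-specific docs
# instead of inlining them (all paths and rules are illustrative)
cat > claude.md <<'EOF'
# Project Rules
- Do not edit files under vendor/ or generated/.
- Start the dev server with `make dev` (non-default flag setup).
- Pull in these docs only when working in that area:
  - Database and schema rules: docs/database.md
  - API endpoint conventions: docs/api.md
  - Frontend component rules: docs/frontend.md
EOF
wc -l claude.md   # well under the 300-line guideline; detail lives in the linked docs
```

Because the area docs are only referenced, Claude loads them on demand rather than carrying every rule in context on every turn.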
This way, Claude only has relevant information in context and avoids unnecessary token usage. So you should also separate rules files for area-specific logic so that Claude can load only what's required. You should also make use of skills for repetitive workflows and add scripts and references so it can perform tasks more accurately. Skills help by progressively loading only the required parts, which keeps Claude focused on the relevant aspect of the task. Bundling scripts helps by not wasting tokens on deterministic tasks that can be handled programmatically. The reason for separating files is simple.
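A minimal sketch of such a skill, following Claude code's convention of placing a SKILL.md under `.claude/skills/<name>/`. The skill name, steps, and the referenced script are invented for illustration:

```shell
# hypothetical skill: a repetitive workflow packaged so Claude loads it only
# when the task calls for it (progressive loading); content is illustrative
mkdir -p .claude/skills/db-migration
cat > .claude/skills/db-migration/SKILL.md <<'EOF'
---
name: db-migration
description: Create and apply a database migration following project rules
---
1. Run scripts/new_migration.sh to scaffold the file (deterministic, handled by script).
2. Edit only the generated migration; never touch already-applied migrations.
3. Apply with `make migrate` and report only failures back into the conversation.
EOF
```

The deterministic scaffolding step lives in a script, so no tokens are spent having the model reason through boilerplate it can simply execute.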
If Claude is working on one part, it doesn't need information about unrelated areas. But if everything is placed in the same claude.md file, all of it gets loaded every time, leading to unnecessary token usage. You can also use the append system prompt flag to add specific instructions directly to the system prompt, so the session starts with those instructions instead of everything living in the claude.md file. These instructions are temporary and are removed once the session ends. Now, this might sound like it's adding to the context, but it's actually more efficient than putting a one-time instruction in claude.md.
If you add it there, Claude keeps it in the context permanently, wasting tokens unnecessarily. With appending, you provide the instructions exactly when you need them. Also, if you're enjoying our content, consider pressing the hype button, because it helps us create more content like this and reach more people. You also need to set the effort level of the model you're using. If you're not working on a task that requires much thinking, set it to low, since the low setting saves tokens. By default, it's set to auto, which means the model decides how much effort to use, but you can change it manually.
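As a usage sketch of the append flag discussed above: `--append-system-prompt` is a Claude code CLI flag, and the instruction text here is purely an example of a one-off, session-only rule:

```shell
# session-only instruction appended to the system prompt instead of being
# persisted in claude.md (the instruction text is illustrative)
claude --append-system-prompt "For this session only: write all commit messages in imperative mood."
```

When the session ends, the instruction is gone, so it never becomes a permanent per-turn token cost the way a claude.md entry would.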
If your task isn't very complex, there's no need to use a high effort setting. Now, as we mentioned earlier, Opus is the most token-consuming model. So, if you're working on straightforward tasks, switch to Haiku. If your task requires a reasonable level of thinking, use Sonnet; it might not be as powerful as Opus, but it is still efficient and saves more tokens. If you've configured multiple MCPs for a project and don't need a particular one, just disable it so it doesn't waste tokens by injecting unnecessary information into the context window. Another important step is creating hooks that filter out content that shouldn't belong in Claude's context window.
For example, we've configured test cases for our project. When we run them, they report both passed and failed tests, and all of that gets loaded into the context. But Claude's main concern is the failed tests, since those are what need fixing. So you can create a hook that uses a script to keep the passed test cases out of the context window, so only the failed ones get included. This saves a significant amount of tokens compared to injecting all test reports. You can configure hooks for many other tasks the same way to optimize token usage. Now, aside from all of that, there are certain configurations you need to make in your .claude folder to improve performance.
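A minimal sketch of the filtering script such a hook could call. The report format (PASS/FAIL lines) and the function name are invented; a real hook would be wired up in Claude code's hook configuration and would need to match your test runner's actual output:

```shell
# hypothetical hook filter: pass through only failing test lines so the
# passing ones never enter the context window
filter_failures() {
  grep -E '^(FAIL|ERROR)' || true   # `|| true`: an all-green report is not an error
}

# example report as a test runner might emit it (illustrative format)
report='PASS test_login
FAIL test_checkout: expected 200, got 500
PASS test_signup'

printf '%s\n' "$report" | filter_failures
```

Only the FAIL line survives the filter, so the model sees exactly what needs fixing rather than the full report.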
The first one is setting disable prompt caching to false. This makes Claude cache your most commonly used prefixes, which reduces token usage: Anthropic doesn't charge you for parts that are sent repeatedly, only for the new content. You can also disable automemory to prevent it from adding content to your context and increasing token usage. Automemory is a background process that analyzes your conversations and consolidates useful information into memory files for your specific project. Disabling it means it won't track your habits, but it will save tokens by not running in the background. There's another flag called disable background tasks, which stops background processes from consuming tokens continuously.
These include dream, memory refactoring and cleaning, and background indexing. Turning this off helps save tokens because even if you're not actively chatting, these processes would still be working on your conversation. You should also disable thinking when it's not needed, because thinking consumes a lot of context and wastes tokens extensively on tasks that don't even need it. Now, this is different from the effort setting we discussed earlier. The effort setting controls how much reasoning Claude does within a response, so lower effort means less thinking, but it still thinks. Disabling thinking completely turns off the internal reasoning step, and Claude just generates the response directly.
So if your task doesn't need deep reasoning, disable thinking entirely. If it needs some reasoning, but not a lot, lower the effort level instead. Finally, configure max output tokens to a set number. There's no default, but limiting this controls how much the model generates: set it lower if you want to save tokens aggressively, or raise it if your task requires longer outputs. Now, the claude.md template and other resources are available in AIAS Pro for this video and for all our previous videos, from where you can download and use them in your own projects. If you found value in what we do and want to support the channel, this is the best way to do it.
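Pulling these configuration tips together, here is a sketch of a project-level `.claude/settings.json` with an `env` block. The three variable names below exist in recent Claude code versions, but verify them against the current settings documentation before relying on them; the values are illustrative, and flags for automemory and background tasks vary by version, so they are omitted here:

```shell
# sketch: project-level .claude/settings.json carrying the env flags
# discussed in the video; names and values should be verified against
# the official Claude code settings docs for your installed version
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "env": {
    "DISABLE_PROMPT_CACHING": "false",
    "MAX_THINKING_TOKENS": "0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "8192"
  }
}
EOF
```

Keeping caching on means repeated prefixes aren't re-billed, a thinking budget of 0 skips the internal reasoning step, and the output cap puts a hard ceiling on how much each response can generate.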
The link's in the description. That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the super thanks button below. As always, thank you for watching, and I'll see you in the next one.