Big Projects Always Fail... Anthropic Is Fixing That

AI LABS | 00:14:07 | May 21, 2026

Chapters (12)

This chapter discusses the challenges of scaling codebases with agents, especially when dependencies grow, and introduces the practical steps for handling large projects as highlighted by Anthropic.

Anthropic’s approach shows how to scale codebases with robust harnesses, proper navigation, and modular agent components to avoid large-project failures.

Summary

AI LABS explores Anthropic’s roadmap for scaling coding projects with agents. The video emphasizes that projects which are easy to ship while small collapse once dependencies grow and languages become less conventional, and it highlights practical steps to prevent that failure. Rodrigo and team note that agent navigation has shifted from RAG-based semantic loading to file-system-based methods, mirroring how developers actually explore code. Claude Code, Codex, and Gemini CLI are discussed as examples of strong inherent harnesses, but the main message is that a tailored harness, together with sub-agents, plugins, and LSP integration, drives real success at scale. Claude’s agent harness components are broken into five parts, starting with claude.md, which should stay concise (around 300 lines) and evolve with the project. Hooks, skills, and plugins are presented as essential tools to stabilize large-scale workflows and ensure consistent context without bloating memory. The video also stresses the importance of LSP for unconventional languages and the role of MCPs in connecting internal tools and data sources. Finally, the host plugs a sponsor and points viewers to AI LABS Pro for deeper resources and templates to implement these practices.

Key Takeaways

RAG-based navigation is falling out of favor for large projects because semantic matching across a growing codebase can hallucinate or fail when the central database becomes unwieldy.
File-system-based navigation (bash tooling such as ls and grep) is the preferred mode now, loading exact code snippets into context without bloating the window.
A strong harness (Claude Code, Codex, Gemini CLI) is crucial, but model power alone isn’t enough: design a project-tailored harness to unlock real coding performance.
claude.md should be concise (about 300 lines) and optionally split by subdirectories to keep context focused and scalable as the codebase grows; maintain it as the model evolves.
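
As a rough illustration of the last takeaway, here is a minimal sketch of a check that flags any claude.md file over the line budget. The function name `oversized_claude_files` and the default of 300 lines are assumptions for illustration; this is not a Claude Code feature.

```python
from pathlib import Path


def oversized_claude_files(root: str, max_lines: int = 300) -> list[tuple[str, int]]:
    """Return (relative path, line count) for every claude.md over the budget."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        # Match the file name case-insensitively (claude.md, CLAUDE.md, ...).
        if path.is_file() and path.name.lower() == "claude.md":
            count = len(path.read_text(encoding="utf-8").splitlines())
            if count > max_lines:
                offenders.append((path.relative_to(root).as_posix(), count))
    return offenders
```

Running something like this in CI or before a review is one way to notice when a root or subdirectory file has started absorbing instructions that belong elsewhere.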

Who Is This For?

Essential viewing for software teams building large-scale AI-assisted coding workflows, especially those using Claude Code, Codex, or Gemini CLI who want robust harnesses and stable context handling.

Notable Quotes

"Nowadays, shipping small projects has become really easy, but agents start failing the moment the codebase grows large and gets multiple dependencies."

—Sets up the central problem: scaling issues for agents amid larger codebases.

"The thing here is that no matter how models are improving on their own, the model alone does not determine how good the code you are able to produce will be."

—Emphasizes harness and integration over model power alone.

"There are also open- source harnesses like superpowers and you can use any of those when you are building something."

—Highlights the practical option of using external harnesses.

"Hooks force Claude to act. They help when working with large-scale code bases."

—Underlines the practical role of hooks in managing behavior.

"Sub aents contain isolated context windows of their own and do whichever task is delegated to them by the main orchestrator agent."

—Explains the purpose and benefit of sub-agents in the architecture.

Questions This Video Answers

How do you implement a scalable agent harness for a large codebase with Claude Code or Codex?
What is the difference between RAG-based and file-system-based navigation in AI agents, and why is the latter preferred for big projects?
What role does LSP play in guiding agents through unconventional programming languages?
How can MCPs and plugins streamline collaboration on large AI-assisted development projects?
Why should claude.md be limited to ~300 lines and how should you structure it across subdirectories?

Anthropic, Claude Code, Codex, Gemini CLI, RAG (retrieval-augmented generation), LSP (Language Server Protocol), MCP (Model Context Protocol), hooks, skills, plugins

Full Transcript

Nowadays, shipping small projects has become really easy, but agents start failing the moment the codebase grows large and gets multiple dependencies. The issue gets even worse if you are working with unconventional languages, where errors and issues become even harder to trace. What people miss is that you need to take proper steps before making the agents work on large codebases. And this is exactly what Anthropic talks about here. They cover how to actually handle projects when they scale. It was really insightful because these are things we ourselves have been using in our own projects and have found pretty helpful.
Before we go into detail on how to set up a project at a large scale, let us first understand how the agents navigate around the code. In general, there are two ways they do this. The first is RAG-based. This works by embedding the entire codebase and retrieving the relevant chunks at query time. Based on your query, it runs a semantic search which matches your query with the code in its database. From the similarity matches, it loads that specific context for the model to analyze and work from. This might work for small-scale apps, but it does not hold up on large-scale ones. This is because there is a central database that maintains the data, and if there are a lot of files in the database, the semantic matching might be problematic. This is exactly why coding agents hallucinate modules that no longer exist. The RAG-based approach has been completely replaced.
The other type is file-system-based navigation, which is what Claude Code and most other agents now use. This is similar to how software developers actually navigate. The agent uses bash tools: it finds files with the ls command, then greps and narrows down to the exact code snippet it needs and loads that into context. Bash tools work because they do not pollute the context window with unnecessary snippets.
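
The file-system-based navigation described above can be sketched in a few lines of Python. This is only an illustration of the idea (list files, match lines, keep a small window around each hit), not how Claude Code or any other agent is implemented; the function name `find_snippets` and its parameters are hypothetical.

```python
import re
from pathlib import Path


def find_snippets(root: str, pattern: str, glob: str = "*.py", context: int = 1) -> list[dict]:
    """Walk the tree (like ls), match lines (like grep), return only small snippets."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                # Keep only a few lines around the match instead of the whole file.
                start, end = max(0, i - context), min(len(lines), i + context + 1)
                hits.append({
                    "file": path.relative_to(root).as_posix(),
                    "line": i + 1,
                    "snippet": "\n".join(lines[start:end]),
                })
    return hits
```

The point of the sketch is the return value: only the matching file, line number, and a short window of text would reach the model's context, rather than whole files or loosely similar chunks.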
So this mode handles all the ways RAG-based systems were failing, and almost all coding agents now navigate this way.
The thing here is that no matter how models are improving on their own, the model alone does not determine how good the code you are able to produce will be. An even more important thing when it comes to working systems is what harness you use for coding. So whichever tool you use, whether it is Claude Code, Codex, or Gemini CLI, the output you get is not solely defined by their powerful models. It also depends on the harness you combine with the model's capabilities. If the harness is weak and the model is strong, there is no point in the model being strong on its own. Now, we know agents like Claude Code and Codex have strong inherent harnesses, but this does not mean you have to rely on those entirely. You need to set up a harness tailored to your project so it fits your project better. There are also open-source harnesses like Superpowers, and you can use any of those when you are building something. But when you are developing a large-scale project, these harnesses might not hold up, and you would need to set up your own anyway.
Every agent harness you build on your own or pull from shared chats contains five pieces centered on how Claude's jobs and agentic loops are configured environmentally. We will go through each. The first piece in the agent harness is the claude.md file, which is loaded at the start of the session and remains in memory for the entire session. This file is really important because it gives Claude the knowledge base for the codebase. We have already done a separate video on how to write and structure a proper claude.md, which you can check out on the channel. When your codebase grows large, claude.md becomes critical. If you do not spend time on it, your project is bound to fail at scale.
This file is for project conventions, codebase knowledge, and the dos and don'ts that apply across the entire codebase, not just a single aspect. Keeping everything in one file might be fine if your codebase is small, but it becomes a problem the moment you scale into multiple architectures. Stuffing every aspect of the code into one file is highly inefficient. It distracts the agent with information it does not need at the moment. That's why the claude.md should stay short, ideally around 300 lines. And if you are running a monorepo with multiple areas, each subdirectory should have its own claude.md following the same rules. The agent progressively loads it when working in that directory. So instead of pulling everything from the root file, it gets more focused instructions from the subdirectory files.
This file is not something you write once and rely on forever. We need to maintain it actively, not only as the project evolves, but also as model intelligence evolves. The principles applicable for Sonnet 4.5 will definitely not apply for Opus. Newer models are trained to overcome patterns that were failing in earlier instructions. So giving the same instructions to every model just wastes tokens.
But before we move forward, let's have a word from our sponsor, CleanMyMac. If you work with AI tools like we do, your Mac quietly piles up junk, old builds, caches, broken downloads, and you don't notice until it starts lagging. I run CleanMyMac every week, and it frees up over 15 gigs in a single scan. That's it. One click and my Mac was brand new again. CleanMyMac is built by MacPaw, Apple notarized, and trusted by over 29 million people for 17 years. The cleanup feature removes over 20 types of junk so your system stays fast without babysitting it. Space Lens maps your drive visually so you know what's eating up space. It even scans your iCloud, Google Drive, and Dropbox locally for unsynced files wasting cloud storage.
And it catches 99% of known malware through Moonlock so your Mac stays clean and secure. Your Mac should keep up with you, not the other way around. Use code AI Labs for 20% off and try CleanMyMac free for 7 days.
Now, hooks are another important thing that helps when working with these large codebases. They are basically scripts that let the agent take specific actions based on certain conditions. There are many types of hooks you can configure, usually written as shell scripts that control the agent's behavior. For example, you can configure a session start hook, which loads the information you want at the start of each session, like which files Claude should load for context. You can also use a hook with exit code 2 and feed the error message back to Claude so it can iterate on that. Pre-tool-use hooks are another type. Whenever the agent uses whichever tool you have configured the hook for, it runs your commands. You can use it to prevent Claude from editing files you do not want it to touch.
But one of the most important hooks is the stop hook, which runs after a session ends. This pushes Claude to reflect on what has been done so far. From that, it can update the claude.md with the learnings from the session so the same issues do not happen again. You can also configure hooks for linting, running tests, and many other purposes. All of these strung together help a lot with large-scale codebases. Hooks force the agent to do things it should be careful about, where instructions in claude.md alone may not suffice. Instructions in claude.md can get blurred in the agent's attention span because there are too many things to focus on, but hooks actually force Claude to act.
The third piece in the workflow is skills. It is a set of SKILL.md files and other grouped files that load on demand instead of being present in every session and bloating it unnecessarily.
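
To make the "load on demand" idea concrete, here is a minimal sketch. It assumes each skill lives in its own folder with a SKILL.md whose header sits between two `---` lines and holds `name:` and `description:` fields; the helper names `build_index` and `load_skill` are hypothetical and only simulate the behavior, they are not Claude Code's loader.

```python
from pathlib import Path


def read_front_matter(skill_file: Path) -> dict:
    """Parse only the 'key: value' header between the first two '---' lines."""
    meta = {}
    lines = skill_file.read_text(encoding="utf-8").splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def build_index(skills_root: str) -> dict:
    """Cheap index that stays in context: skill name -> (description, path)."""
    index = {}
    for skill_file in sorted(Path(skills_root).glob("*/SKILL.md")):
        meta = read_front_matter(skill_file)
        name = meta.get("name", skill_file.parent.name)
        index[name] = (meta.get("description", ""), skill_file)
    return index


def load_skill(index: dict, name: str) -> str:
    """Read the full instructions only when the skill is actually needed."""
    _, path = index[name]
    return path.read_text(encoding="utf-8")
```

In this sketch only the short descriptions are held up front, and the full body of a skill is read when a task calls for it, which is the token saving the transcript is describing.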
Skills are important because they use progressive disclosure and are tailored to perform a specific specialized task needed for the workflow. They expand the agent's knowledge of something it is already capable of doing. If you put these instructions in claude.md, they just consume unnecessary tokens. Project-specific instructions should go into skills because they load only when the agent actually needs them. You can also scope skills to specific paths so they only activate in the relevant part of the code and do not bloat the context outside of that. For example, if you are working in the deployment area, you can specify the path of that directory in the skill description, so the skill is never loaded when you are working elsewhere.
To configure skills, you just invoke the skill creator that now comes built into Claude Code. Previously, you had to get it from the open-source repository on GitHub. Then you answer the questions it asks during the discussion session. You will have a skill tailored to your exact needs, which you can access once you restart the session.
Aside from skills, you can also use plugins. Plugins are a bundle of skills, hooks, and MCPs available as a single downloadable and distributable package. So whoever installs the plugin will have the exact same context and configurations available right away. If you are working in a team, creating your own plugins to distribute to teammates becomes really important. If you set up all your configs in one place, that information can be distributed across the organization so your team members have the same context as you. You can do this by creating your own plugins and managing them by either manually uploading them or syncing with a GitHub repository. You can install any plugin using the plugin command, and you can browse the marketplace and install whichever one you want. You can also add other marketplaces using the add plugin marketplace command.
Claude Code also comes bundled with multiple plugins like front-end design, code review, code simplifier, Playwright, and others, all from the Claude official marketplace. You can use them directly in your workflow and you can create your own as well. Plugins matter especially for large-scale projects because a lot of people work on the same project and distributing context among them is important. So instead of making each person download skills and other components separately, they can install the plugin directly.
Also, if you are enjoying our content, consider pressing the hype button because it helps us create more content like this and reach more people.
Another thing that matters in agent harnesses but is not talked about enough is LSP. Language Server Protocol, or LSP, is basically an integration that gives the agent the same kind of navigation a developer has in an IDE. There is an LSP for almost any programming language, and it might be unnecessary with popular ones, but it becomes critical with unconventional ones. It gives the agent intelligence about the programming language so it can navigate the codebase the way a human does. For example, when a human wants to find a function, they check where that function is imported from, go to that file, and check that file for the function's definition. That is how they actually find the exact source they need. Without LSP, the agent pattern matches based on text and is likely to land on the wrong symbol. As we mentioned, Claude Code uses the file-system-based approach with bash commands. So without LSP, it is just pattern matching on file names and text, not navigating with deeper intelligence. Now, do not assume LSP is not needed just because your agent has not run into errors yet. Set up LSP even before you start working on the project. Configure it for all the languages you will use even before writing any code, so the agent already has information on how to work with them.
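
The difference between text matching and symbol-aware lookup can be shown with a small sketch. This is not an LSP client: Python's standard `ast` module stands in here for the kind of "go to definition" answer a language server provides, and the sample source and function names are made up for illustration.

```python
import ast

SOURCE = '''\
def parse(text):
    return text.split()

def parse_config(path):
    # parse the file, then call parse() on its contents
    return parse(open(path).read())
'''


def text_matches(source: str, name: str) -> list[int]:
    """Naive text search: every line that merely mentions the name."""
    return [i for i, line in enumerate(source.splitlines(), 1) if name in line]


def definition_lines(source: str, name: str) -> list[int]:
    """Symbol-aware lookup: only lines where a function with that exact name is defined."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
    ]


print(text_matches(SOURCE, "parse"))      # [1, 4, 5, 6]
print(definition_lines(SOURCE, "parse"))  # [1]
```

The text search returns four candidate lines, including a different function and a comment, while the symbol-aware lookup returns the single definition. A language server gives the agent the second kind of answer across files and languages.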
Instead of letting the agent guess patterns, installing LSP lets it read and edit code the way a developer thinks about it, not just as text.
Now, as you already know, MCP is used to connect the agent to external tools. But you can also connect your MCPs to your project's internal tools, data sources, APIs, or other systems the agent otherwise cannot reach. For that, you need to create your own MCPs and make them available so people on your team can use them easily. MCPs are basically an extension to the existing setup, loaded whenever they are needed, and the tools they provide are then available for the agent to use. If you are working on a large codebase, you can build MCPs that serve many purposes, like acting as a documentation guide, retrieving analytics, or even letting you make changes through them. These are helpful because if you have your own codebase, you can let the agent naturally interact with internal information, call tools, and make changes there instead of fumbling through huge documentation. This gives the agent more direct access to the information and systems it needs. But to configure an MCP, the basic setup of the app needs to already be working. If you configure your MCP before that, things can go wrong and the MCP implementation may fail. So first make sure your app is working properly. Then create the MCP and let the agent interact with your project with more intelligence and better information.
Another thing you need to create is sub-agents. Sub-agents contain isolated context windows of their own and do whichever task is delegated to them by the main orchestrator agent, then return only the final output to the parent. This is a key part of an agent harness because using sub-agents properly does not bloat the context window and makes context utilization much better, since they do not fill the main agent's context with information it does not need. Sub-agents only run when invoked and then return their findings.
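
The delegation pattern can be simulated in plain Python. This is a toy model under stated assumptions, not Claude Code's sub-agent mechanism: `run_subagent` and `orchestrate` are hypothetical names, a substring search stands in for real agent work, and a thread pool stands in for running delegated tasks side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str, files: dict[str, str]) -> str:
    """Hypothetical sub-agent: reads everything in its own context, returns one line."""
    # Isolated context: the full file contents live only inside this call.
    context = [f"{name}: {text}" for name, text in files.items()]
    matches = sorted(entry.split(":", 1)[0] for entry in context if task in entry.split(":", 1)[1])
    return f"{task}: found in {', '.join(matches) or 'no files'}"


def orchestrate(tasks: list[str], files: dict[str, str]) -> list[str]:
    """Main agent: delegates each task and keeps only the final outputs."""
    main_context = []
    with ThreadPoolExecutor() as pool:
        # pool.map runs the tasks concurrently and yields results in task order.
        for summary in pool.map(lambda t: run_subagent(t, files), tasks):
            main_context.append(summary)
    return main_context
```

What matters in the sketch is what ends up in `main_context`: one short summary per task, with none of the file contents the sub-agents had to read to produce it.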
Claude spins off sub-agents on its own, but you can configure sub-agents yourself as well. You can configure whichever tools and models you want for them and provide instructions on how they should operate, creating specific agents for your own workflows. You can also override Claude's existing agents. For example, you can create your own agent whose instructions override existing ones like explore, and provide a description of how it should navigate around your directory. Claude's own explore agent is generalized for all kinds of codebases. But if you configure your own, the custom one overrides the default. This gives the agent more context on how the files in your project are structured, so it does not waste tokens navigating files relying only on the information in claude.md. So you can make the main agent control the whole project execution and rely on sub-agents for the actual work. Sub-agents also help because you can parallelize their work through agent delegation, which makes the workflow much smoother and faster than doing everything sequentially.
There are a few more practices you need to follow when navigating around a large codebase. This is important because Claude's ability to navigate a large codebase is determined by whether it is able to find the right context. So ensuring Claude gets the right context is important, so the agent does not get too little or too much and stays focused. Aside from separating the claude.md file, you need to separate tests for each subdirectory instead of having them all in one place. This way they stay segmented, avoid timeout issues when a lot of tests run at once, and can be scoped more effectively.
You can also create a separate codebase map file that maps your project structure. If you are working with conventional apps like React or Next.js, you can skip this because the agents have been trained extensively on those. But with unconventional languages like C++, you need a codebase map.
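
One possible way to produce such a map is to generate it from the directory tree. The sketch below is an assumption about what a useful map might look like (each directory listed once with its source files); the function name `build_codebase_map` and the default C++ suffixes are illustrative choices, not a format the video or Anthropic prescribes.

```python
from pathlib import Path


def build_codebase_map(root: str, suffixes: tuple[str, ...] = (".cpp", ".h")) -> str:
    """List each directory once, followed by the source files it contains."""
    root_path = Path(root)
    by_dir: dict[str, list[str]] = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel_dir = path.parent.relative_to(root_path).as_posix()
            by_dir.setdefault(rel_dir, []).append(path.name)
    lines = []
    for rel_dir in sorted(by_dir):
        lines.append(f"{rel_dir}/")
        lines.extend(f"  {name}" for name in by_dir[rel_dir])
    return "\n".join(lines)
```

A generated listing is only a starting point; a hand-written line per directory saying what lives there is likely to help the agent more than file names alone.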
The codebase map acts as a table of contents for the agent, letting it know where each file lives instead of running a lot of bash commands to narrow down to the right one.
Lastly, but most importantly, review your setup every few months as the model evolves. Remove the instructions, hooks, or anything else that the newer model no longer needs. Use ignore files such as .gitignore and an agent ignore file, so the files you do not want the agent or version control to touch are left alone. This way your setup will be able to hold up on large-scale apps.
Now, the resources for this video and for all our previous videos can be found in AI LABS Pro, where you can download and use them for your own projects. If you found value in what we do and want to support the channel, this is the best way to do it. The link is in the description. That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the Super Thanks button below. As always, thank you for watching and I'll see you in the next one.