Vercel's New Claude Code Setup Fixes AI Coding's Only Problem

AI LABS | 00:10:30 | May 8, 2026
Discusses how rapid AI-assisted coding has led to more security breaches and the need for better tooling.

Vercel’s DeepSee harness uses Claude-based agents to securely review AI-generated apps at scale, balancing speed with structured, batched analysis despite high token costs.

Summary

AI Labs walks through Vercel's new DeepSee security harness, designed to catch breaches in AI-generated applications. It uses Claude Code and Codex in parallel, targeting large codebases with a scalable batch workflow. The system starts with a regex-based scan to flag security-sensitive areas before handing batches to agents, which lowers token waste. After processing, an optional revalidation step cross-checks for false positives, and findings are exported as markdown or JSON for ticketing. The workflow also uses Git metadata to attribute issues to the responsible developers. The video's hands-on test with a web app containing 10 built-in vulnerabilities reveals DeepSee's strengths and gaps, highlighting how information in an info.md file can steer results. Claude's own reports sometimes surface more issues than DeepSee, prompting an iterative approach to focus scope. Finally, AI Labs shows off a DeepSee-based skill in AIAS Pro that automates the entire process, including multi-model runs and comprehensive output ready for real projects.

Key Takeaways

  • DeepSee runs Claude Code and Codex in parallel, enabling faster reviews on large repositories at the cost of higher token consumption.
  • The workflow begins with a regex scan to filter files before expensive agent reasoning, which is crucial for thousands of files.
  • Files are analyzed in batches of roughly five, with a fresh prompt per batch tailored to project context.
  • An optional revalidation step helps weed out false positives, and results are exported in JSON and Markdown for ticketing.
  • The info.md file’s content can steer what DeepSee looks for, sometimes limiting findings to pre-known vulnerabilities.
  • Claude’s internal reports can identify more issues than DeepSee in some runs, highlighting complementary strengths and gaps between models.

Who Is This For?

Essential viewing for security-focused AI developers using Vercel’s DeepSee or Claude-based tooling. It’s especially relevant for teams evaluating how to scale AI-assisted code reviews without blowing through tokens.

Notable Quotes

"DeepSee is a structured tool that handles reviews far more systematically."
Describes the core design philosophy of the DeepSee harness.
"The tool is designed for scanning large repositories because it supports a parallel design that speeds up the workflow and batches code into multiple groups."
Explains why batching and parallelism matter for big codebases.
"It splits the project into batches and calls multiple tools on each one."
Outlines the batching and multi-tool orchestration.
"Claude's report was much more detailed and highlighted 39 issues."
Shows how different models can yield different depths of findings.
"The info.md file contains a general overview of what the codebase does and what the authentication flow looks like as well as the threat models, project specific patterns and all the known false positives inside the code."
Illustrates how prep work inside the project can steer the review results.

Questions This Video Answers

  • How does Vercel DeepSee use regex scanning to handle security reviews in huge codebases?
  • What are the token implications of running Claude Code and Codex in parallel for security reviews?
  • Can DeepSee's optional revalidation step significantly reduce false positives in real-world projects?
  • How does Claude's reporting compare to DeepSee's when focusing on scope versus broader findings?
  • What steps are needed to run DeepSee end-to-end on a new codebase using the AIAS Pro resources?
Tags: Vercel DeepSee, Claude Code, Codex, AI security harness, regex scanning, batched processing, token economy, AIAS Pro, security reviews
Full Transcript
AI made coding accessible to everyone, and people have started shipping code at a much faster pace. But at an even faster pace, security issues inside those apps started piling up. And in the past few months, things have actually gotten worse. There have been many instances where an agent deleted someone's entire project. Another agent deleted an entire production database while the developer was working on something completely unrelated. And there have been many similar incidents, like Apple's internal claude.md being leaked. So tooling that can actually catch these issues matters more now than it ever did. Seeing this rise in issues, Vercel just released a security harness to detect breaches in AI-generated applications, called DeepSee. Now, you might think Claude Code can already do security reviews on its own with its agents, so why would you need DeepSee in the first place? It's because DeepSee is a structured tool that handles reviews far more systematically. Under the hood, it's using coding agents like Claude Code and Codex. The tool is designed for scanning large repositories because it supports a parallel design that speeds up the workflow and batches code into multiple groups, which makes it perfect for reviewing large codebases. Now, this is not built with cost effectiveness in mind. It uses the most powerful models available to Claude Code and Codex, Opus 4.7 at max effort and GPT-5.5 at xhigh reasoning effort, both of which consume a lot of tokens, and with them running in parallel the token usage piles up quickly, increasing cost. Multiple well-known apps have already run this harness on their codebases and reported good results. In the tests they ran, the false positive rate of this tool was roughly 10 to 20%. That number is significant given how LLM accuracies usually are. Conversely, it means the agent is correct most of the time and its true positive rate is high. The architecture behind this is what makes it different.
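The parallel multi-model design described above can be sketched in a few lines of Python. The two reviewer functions here are hypothetical stand-ins for the Claude Code and Codex agents, not DeepSee's actual implementation; the point is simply that both run concurrently over the same files and their findings are merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the Claude Code and Codex reviewers; each
# takes a list of file paths and returns a list of finding dicts.
def claude_review(files):
    return [{"file": f, "issue": "example issue", "model": "claude"} for f in files]

def codex_review(files):
    return [{"file": f, "issue": "example issue", "model": "codex"} for f in files]

def parallel_review(files):
    """Run both reviewers at the same time and merge their findings.

    Parallelism trades higher total token usage for lower wall-clock time,
    which is the cost/speed trade-off the harness makes.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(reviewer, files) for reviewer in (claude_review, codex_review)]
        findings = []
        for future in futures:
            findings.extend(future.result())
    return findings

# One file reviewed by two models yields two (possibly overlapping) findings.
print(len(parallel_review(["app/api/login.ts"])))
```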
If you ask Claude Code or any agent for a security review, it will start by directly scanning the codebase and then produce a full review report. That not only takes a lot of time but also consumes a lot of tokens, and the review might still miss things. So the first part of this workflow is scanning: performing a regex-only scan of all files for security-sensitive areas that the subsequent steps will focus on. Regex detection matters here because the tool is designed for large codebases, where there can easily be thousands of files. The regex matcher is a set of code patterns that match known areas likely to contain security vulnerabilities, filtering those files out from the large pool. Once the large pool of files has been filtered, the next step is investigation using the agent. The agent is the expensive part, consuming a lot of tokens and typically taking a long time depending on how big your codebase actually is. So this tool splits all the files into batches and parallelizes them so they can be processed at the same time. Once that process is done, there's another step of revalidation, where the investigation is checked again so that false positives are cross-checked. In case something was missed, it catches that and ensures the classification was done correctly. This validation step is actually optional. After that, the agent uses Git metadata and other sources to identify which people are responsible for which issues. Once all of that is done, the findings are stored as markdown or JSON so that they can be turned into tickets for humans as well as coding agents. Now, as mentioned earlier, the files are grouped into batches, with around five files processed together per batch. For each batch, a fresh prompt is assembled based on the identified framework along with other project information.
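The cheap regex pre-filter step can be approximated like this. The patterns below are illustrative examples of security-sensitive code (dynamic eval, string-built SQL, hardcoded credentials), not DeepSee's actual rule set:

```python
import re
from pathlib import Path

# Illustrative security-sensitive patterns; DeepSee's real rules differ.
PATTERNS = [
    re.compile(r"\beval\s*\("),                              # dynamic code execution
    re.compile(r"SELECT .* \+ "),                            # string-concatenated SQL
    re.compile(r"(api[_-]?key|secret)\s*=\s*['\"]", re.I),   # hardcoded credentials
]

def flag_security_sensitive(root: str) -> list[str]:
    """Cheap regex pass that narrows thousands of files down to the few
    candidates worth spending expensive agent tokens on."""
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file, skip rather than abort the scan
        if any(pattern.search(text) for pattern in PATTERNS):
            flagged.append(str(path))
    return flagged
```

Because this step is plain pattern matching with no model calls, it runs in seconds even on repositories with thousands of files, which is why it comes first.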
These are then analyzed by the Claude Agent SDK or the Codex agent SDK, whichever you have configured, and they're given tools with read-only access to understand what the codebase contains. Once they have the findings, everything is merged into a single file that is deduplicated and normalized. At the end, there's a follow-up step to make sure the analysis has actually covered everything. This architecture makes it effective because of its systematic process and structured analysis method, and it helps identify issues far better than the agents could without the harness. So to test this out, we used an open-source project that is a web application containing built-in security risks just for practice. We wanted to see if this tool was able to detect all of the issues in this repo on its own. This project contains 10 security issues, with all the details available directly in the code itself, including how to remove them. To run DeepSee, you first run the deepsee init command, which installs the dependencies and creates a deepsee folder, and then you install the dependencies inside that folder. It also gives you a prompt that you need to paste into whichever coding agent you use. Since we were using Claude Code, we ran that prompt in Claude, which contains the instructions for creating a small info.md file that includes all the project information and is built around a specific template. You do not have to run this command in the project folder itself. You run it in the deepsee folder because it instructs the agent to look in the parent directory and read all the information from it. The info.md file contains a general overview of what the codebase does and what the authentication flow looks like, as well as the threat models, project-specific patterns, and all the known false positives inside the code. Once this file has been created, the next task is to run the deepsee scan command.
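The batching and merge steps described above can be sketched as follows: files are grouped roughly five at a time, each batch gets its own agent run, and the merged results are deduplicated. The (file, line, rule) deduplication key is an assumed normalization scheme, not DeepSee's documented one:

```python
def batch(files: list, size: int = 5) -> list[list]:
    """Group files into batches of ~5; each batch gets a fresh prompt."""
    return [files[i:i + size] for i in range(0, len(files), size)]

def merge_findings(per_batch_findings: list[list[dict]]) -> list[dict]:
    """Merge all batch results into one list, dropping duplicates.

    Keying on (file, line, rule) is an assumption about how normalization
    might work; agents reviewing overlapping context can report the same
    issue twice, so some dedupe key is needed.
    """
    seen, merged = set(), []
    for findings in per_batch_findings:
        for finding in findings:
            key = (finding["file"], finding["line"], finding["rule"])
            if key not in seen:
                seen.add(key)
                merged.append(finding)
    return merged

# 12 flagged files split into batches of five -> 3 batches.
print(len(batch(list(range(12)))))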
This command is the regex matcher we previously talked about, and it finds all the matching endpoints and lists all the filtered files containing potential security issues. This part happens fast because it's just plain code running. The next step is to run the deepsee process command. You can specify an API key for whichever model you want to use, whether that is the Vercel AI Gateway, Codex, or Claude, inside the env.local file. But if you do not do so, like we didn't, it automatically defaults to the Claude Code subscription and uses your authentication instead of requiring any API key. It splits the project into batches and calls multiple tools on each one. After each batch, it gives a summary of how many tokens were used and what the estimated cost was. Now, if you are using a subscription, it will not charge anything beyond your subscription, but it still provides an estimate of API costs. Since this is designed for large codebase reviews, it keeps reliability in mind. So, in case there are any errors during the review, it does not restart everything from scratch and instead continues from the point where the error occurred. Once the scan has been completed, you run the deepsee report command, and it generates a report in both JSON and markdown format containing a general overview of all the findings categorized by severity level. Now, once this report has been generated, you can run the revalidation step. This step is entirely optional. You can run it if you want or skip it completely. Once you run it, it validates the findings to check whether the reported issues are false positives or not. After that has been done, you can export everything using the export command, and it will write the findings into the findings folder. This findings folder contains the issues ordered by priority as folder names and creates one file per identified issue.
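The resume-on-error behavior mentioned above can be sketched with a simple checkpoint file: completed batch indices are persisted after each batch, so a rerun skips them instead of restarting from scratch. The checkpoint file name and format here are assumptions for illustration, not DeepSee's actual mechanism:

```python
import json
from pathlib import Path

CHECKPOINT = Path("scan_checkpoint.json")  # hypothetical checkpoint file

def load_done() -> set[int]:
    """Return the set of batch indices finished in previous runs."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def process_batches(batches, process_one):
    """Process each batch, persisting progress so a crashed run resumes
    from the failing batch rather than from the beginning."""
    done = load_done()
    for i, b in enumerate(batches):
        if i in done:
            continue  # already processed in a previous run, skip it
        process_one(b)  # the expensive agent call happens here
        done.add(i)
        CHECKPOINT.write_text(json.dumps(sorted(done)))
```

If `process_one` raises partway through, the checkpoint already records every completed batch, so the next invocation only redoes the batch that failed and those after it.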
It first lists the source of the issue, meaning the exact file and the lines causing it, how severe the issue is, and how confident the model was in identifying it. It also mentions which commit introduced the issue and assigns the user who committed it. It then explains the recommended fix, lists the revalidation results, and mentions all the issues that were explicitly addressed. It also includes the steps to reproduce the bugs inside the findings. But this report still did not identify all of the issues, even though the tutorial was actually inside the code itself and it should have been able to identify them. So we iterated with Claude on why the original vulnerability lessons that were bundled into the app by design were not identified. Upon iterating with Claude, we found that the reason this tool only reported three findings was an explicit mention in the info.md file. DeepSee expected an app where the 10 vulnerabilities are already known, and it only focused on issues besides them because they were already documented. Meaning it was actually trying to go beyond what was already known and focus only on other patterns, so that the scan becomes much more effective and does not waste time and tokens on issues that are already documented. We then tested another app to see if it did better this time. We ran the same steps, starting from the scan through to the processing stage. We did not run the revalidation part. We just created the report and exported it directly. And this time, Claude's info.md file only contained details about the app and did not include statements like the previous one. Side by side, we also asked Claude to review the code and write a report.md file with a complete security review so we could compare which one actually performed better. The report created by DeepSee found multiple bugs with different severity levels. It found nine issues and created a detailed report along with recommended steps on how to fix them.
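A single exported finding, built from the fields the video lists (source file and lines, severity, model confidence, introducing commit and author, recommended fix, revalidation result, and reproduction steps), might look like the following. The exact schema and every field name here are assumptions, not DeepSee's actual export format:

```python
import json

# Hypothetical shape of one exported finding; all field names are
# assumptions inferred from the fields described in the video.
finding = {
    "source": {"file": "app/api/login.ts", "lines": [42, 57]},
    "severity": "high",
    "model_confidence": 0.9,
    "introduced_by": {"commit": "abc1234", "author": "dev@example.com"},
    "recommended_fix": "Parameterize the SQL query instead of concatenating user input.",
    "revalidation": "confirmed",
    "steps_to_reproduce": [
        "POST /api/login with the username field set to \"' OR 1=1 --\"",
    ],
}

# The export writes one such file per issue, grouped into
# severity-ordered folders for ticketing.
print(json.dumps(finding, indent=2))
```

Including the recommended fix and reproduction steps directly in the finding is what makes these files usable as tickets for both humans and coding agents.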
And these recommended steps are what most other reports miss, because this is what helps the agent understand how to fix the issue, which makes debugging much easier. But we noticed that Claude's report was much more detailed and highlighted 39 issues. So we asked it to create a diff first. The diff showed that Claude's number was larger. But we had already seen this during our testing with Codex. Claude tends to identify other issues in addition to the scope along the way. It does not solely focus on the scoped issues that DeepSee was specifically designed for. So once we asked it to focus only on the scope, it narrowed the findings down to 13 issues. But there were still a few issues that DeepSee missed which were identified in Claude's report. The reason DeepSee missed a few findings is that it focuses only on issues that the code directly contains and that can be resolved directly from the functions themselves. It does not identify issues that might arise when the app actually runs, like CORS-related problems. It does not really focus on logical patterns and architectural decisions either. As we mentioned previously, it uses regex to filter out files first. So it mainly focuses on what is explicitly present in the code, not on issues that may occur dynamically when the application is running. Also, if you are enjoying our content, consider pressing the hype button, because it helps us create more content like this and reach more people. Now, instead of running these steps one by one on our own, we've created this DeepSee skill, which contains all the instructions on how to use Vercel's security scanner end to end and how it should identify from the user's prompt what is being asked. It then follows the entire step-by-step process and manages the whole harness on its own.
It is also bundled with multiple assets, evals, and references for all the issues, along with multiple scripts that might actually help with the working solution and the overall functioning of this repository. So with this in place, you can just run this security scan and specify which model you want to use, and it will directly handle everything for you. It will run through all the steps we saw earlier, along with addressing the issues it missed previously, and will be able to perform a much better security review by combining DeepSee's abilities while also covering the gaps in its findings. Now, this skill, along with all the resources for this video and all our previous videos, can be found in AIAS Pro, from where you can download and use it for your own projects. If you found value in what we do and want to support the channel, this is the best way to do it. The link is in the description. That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the Super Thanks button below. As always, thank you for watching, and I'll see you in the next one.
