Claude Code's favorite tech stack

Theo - t3.gg | 00:38:58 | Apr 29, 2026
The presenter shows how Claude Code can confidently hallucinate about tools, even falsely claiming a database service had shut down, and surveys which stacks it actually recommends.

Theo critiques Claude Code’s tool recommendations, warns about dangerous hallucinations, and highlights how AI-driven stacks can shape real-world tech choices.

Summary

Theo breaks down Claude Code's track record on suggesting not just code changes but entire tech stacks. He calls out a terrifying hallucination where Claude Code claimed PlanetScale shut down its database service in January 2025, noting Anthropic has not corrected the error. The video delves into Amplifying AI's survey of Claude Code's tool picks across databases, backends, and CI/CD, revealing a mix of impressive behavior and troubling defaults. Theo admires some of the smarter patterns, like how Claude Code often builds features from scratch rather than simply recommending third-party tools, but remains wary of how heavily the model can influence real projects. He shares a personal benchmark (UploadThing awareness) showing that newer, smarter models recommend his tool more consistently, and he discusses how stacks converge on familiar tools like Postgres, PNPM, Tailwind, and Zustand. The host also reflects on the broader ecosystem implications: as agents become gatekeepers for tool choices, vendor strategy and market share could hinge on AI picks. He ends with practical takeaways on how to prompt and steer these models for better, safer results, and ties the discussion to industry context, the hiring market, and the ongoing debate about "building blocks" versus monoliths in modern tooling.

Key Takeaways

  • Claude Code frequently DIYs features (e.g., building a feature flags system from environment variables) instead of recommending established third-party tools, making custom/DIY the most common primary pick, topping 12 of 20 categories.
  • GitHub Actions dominates CI/CD recommendations at 94%, with Stripe (91.4%, payments) and Vercel (deployment) similarly dominant in their own categories.
  • PostgreSQL remains the top database pick in Claude Code's recommendations, while SQLite is heavily mentioned but never chosen as a primary pick.
  • PNPM leads package managers at 56.3%, with npm and Bun trailing; the emphasis is on faster, modern tooling over traditional npm workflows.
  • Observability rankings favor Sentry by a wide margin, while infrastructure-focused tooling like Prometheus remains a minority pick at roughly 7.5%.
  • UploadThing and similar file-upload tools gain traction as models get smarter, with Theo's benchmark runs on newer models frequently recommending UploadThing 100% of the time.
  • The study shows context matters: the same prompt in different projects yields meaningfully different results, and newer models (Opus 4.6) sometimes shift toward newer tools compared to older models (Sonnet 4.5).

Who Is This For?

Essential viewing for frontend and full-stack developers exploring AI-assisted tool selection, product teams evaluating AI coding agents, and tool vendors tracking how AI picks may influence market share and product strategy.

Notable Quotes

"This example in particular horrifies me. Claude told the user that Planet Scale had shut our service down. This is unsafe by any definition."
Theo cites a dangerous hallucination by Claude Code to illustrate why the model’s reliability is non-negotiable.
"The results are fascinating. In some ways, I'm impressed. In others, I'm terrified."
Intro to Amplifying’s survey showing mixed sentiment about Claude Code’s tool picks.
"GitHub Actions has a near monopoly at 94%."
Highlighting the dominance of certain tools in Claude Code’s top recommendations.
"As more devs let Cloud Code handle tool selection, the stacks it chooses become the stacks."
Commentary on how AI-driven tool selection can become a distribution channel for tools.
"When cloud code doesn't recommend things, it's usually because it was asking additional questions."
Notes on the model’s cautious behavior and how prompting affects outcomes.

Questions This Video Answers

  • How reliable are AI recommender systems for selecting development tools in real-world projects?
  • What tools does Claude Code most commonly suggest in a typical Next.js + React stack?
  • Can AI-driven tool recommendations impact software vendor market share and why?
  • What benchmarks show Claude Code preferring DIY solutions over third-party tooling?
  • Which AI models handle deployment and CI/CD choices most effectively, and why?
Tags: Claude Code, AI tool recommendations, CI/CD tooling, GitHub Actions, PostgreSQL, PNPM, Bun, Tailwind, Zustand
Full Transcript
Hopefully at this point we can all agree that tools like Claude Code are pretty good at making real contributions to real codebases. But what happens when they have to make suggestions not just about the code but about what tech you use? Sometimes it gets it really, really wrong. This example in particular horrifies me. Claude told the user that PlanetScale had shut our service down. This is unsafe by any definition. Anthropic has made no effort to correct the situation. Here we can see it very confidently asserting that PlanetScale shut down its database service in January of 2025. That is a hallucination. That is really bad. So, it's not recommending good tools like PlanetScale. What is it recommending? There are some examples here, but in the replies, I saw this really interesting post: what Claude Code actually chooses. This is a survey that Amplifying ran on Claude Code directly, figuring out what tools it will pick for various tasks: databases, backends, all of these types of things. And the results are fascinating. In some ways, I'm impressed. In others, I'm terrified. And overall, I think we need to pay more attention to these types of things, because I'm regularly surprised by just how many people are learning how to code and are building their first stuff just using tools like Claude Code and honestly not doing much research at all. More and more, the tools that are recommended by things like Claude Code are going to matter. But at the very least, we should dig into what it recommends today. I got one recommendation I want to cover first, though: today's sponsor. It feels like everything about my stack is changing. The editor I use, the technologies I use, the way I build has shifted fundamentally over the years. But there's one thing that's kind of stuck, and I don't know why. That thing is GitHub Actions. And it feels like they've only gotten worse over time. I'm spending more time waiting for them. I need to push my code to run them. And when they fail, I have to copy-paste errors all around. Wouldn't it be nice if there was something that was better, faster, programmatically executable, and something that your agents could actually use? Today's sponsor is Depot, and they already figured out how to make CI run faster, but they recently added more, because they rebuilt CI from scratch. If you're scared of changing over, you don't have to. You can still get 10-times-faster GitHub Actions by changing the one line for the runner. Probably should mention that it's cheaper, too. You can also get the 40-times-faster Docker builds without having to change anything. But if you want CI that's better, if you want definitions that make more sense, a programmable engine that your agents can run and do things in parallel with, you should definitely check out Depot's new CI. I will let you know in advance that it's incredibly difficult to move over. First, you have to install a CLI (that is so hard), then run one command, and then you have to run another thing. That's two things you have to do. That's terrible. If you can't sense the sarcasm, it's incredibly easy to move over to Depot's new CI. And one of the most annoying things about CI is secret management. Thankfully, they have that figured out too, with their secret and env var handling in the same exact CLI. And most importantly, and this is my favorite thing, you can run a workflow without having to push the code. It will just use the code that you currently have in your local working tree.
It is amazing that GitHub doesn't have this yet, and I can't live without it now. Speed up your agents and your builds at soydev.link/depot. I like how Amplifying opened this here: Claude Code is a new gatekeeper. When a developer says add a database and lets Claude Code handle it, the agent doesn't just suggest. It installs packages, writes imports, configures connections, commits code. The tool it picks is the tool it ships. As more devs let Claude Code handle tool selection, the stacks it chooses become the stacks. This is a new distribution channel where a single model's training data may shape market share more than a marketing budget or a conference talk. For tool vendors, if the agent doesn't pick you, you're invisible to a growing share of new projects. For devs, your default stack is increasingly shaped by what the agent knows, not what you research. And for the ecosystem, understanding what AI actually chooses is no longer optional. It's competitive intelligence. This was some AI-written slop for sure, but hopefully the rest is good research. From my cursory scroll earlier, it seemed really solid. I will also say that I've been doing my own similar research here, admittedly in a very selfish way. One of the benchmarks I spun up is UploadThing awareness. This is a benchmark I made in order to plug my service UploadThing, which is the best and safest way to add file uploads to your full-stack web apps, in particular Next.js builds. I wanted to see which models recommend UploadThing when asked about adding secure file uploads to your Next.js app. So I made a bench that measures this. It is not cheap, because the models write a lot, and the metric is simply whether UploadThing comes up as one of the options. And the really interesting thing I noticed is that as the models got smarter, they started recommending UploadThing more and more consistently. Oof. Grok 4.1 Fast is only at 2/3 so far. This was a fun one. We'll look at the results at the end. This is going to be an expensive run, but when we have that running, we'll look through what Claude recommends in general, because I don't want to just plug my stuff.
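A bench like that boils down to a simple loop: ask the model the same open-ended question several times and count how often the tool shows up in the answer. Here's a minimal sketch, assuming a generic askModel client (a placeholder for whatever SDK you use, not Theo's actual harness):

```ts
// Hypothetical sketch of a "tool awareness" bench.
// `AskModel` stands in for whatever model client you use; it is not a real SDK.
type AskModel = (prompt: string) => Promise<string>;

const PROMPT =
  "How should I add secure file uploads to my Next.js app? Recommend whatever works best.";

async function toolAwareness(
  askModel: AskModel,
  runs: number,
  needle: string
): Promise<number> {
  let hits = 0;
  for (let i = 0; i < runs; i++) {
    const answer = await askModel(PROMPT);
    // Count the run as a hit if the tool shows up anywhere in the answer.
    if (answer.toLowerCase().includes(needle.toLowerCase())) hits++;
  }
  return hits / runs; // 1.0 means the tool was recommended in 100% of runs
}

// Usage: toolAwareness(myClient, 3, "uploadthing").then(console.log);
```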
The methodology of the test here is actually really cool. They had four different project types and three different models, tested with three runs each across the 20 tooling categories they were looking at. Each prompt was an open-ended "what should I use," with no tool names anywhere in the input. So, here's the four projects. One was called TaskFlow, a Next.js app using an old version of Next, TypeScript, and the App Router; Invoice Tracker, which is Vite, React 18, and TypeScript; Data Pipeline, which was FastAPI, Python 3.11, and Pydantic; and DeployCtl, which is using Node, TypeScript, and Commander.js. The biggest find is that agents build instead of buying: in 12 of 20 categories, Claude Code frequently builds custom solutions rather than recommending third-party tools. Custom and DIY implementations account for 12% of all the primary picks, 252 out of 273, making it the single most common recommendation. And there's definitely a default stack built in where agents pick third-party tools. They converge on Vercel, Postgres, Stripe, Tailwind, shadcn/ui, PNPM, GitHub Actions, Sentry, Resend, Zustand, as well as stack-specific picks like Drizzle if you're using databases with TypeScript, SQLModel if you're using Python for ORMs, NextAuth for auth, and Vitest or pytest for JS and Python testing. NextAuth as the recommendation is very interesting, because it's not even NextAuth anymore, it's Auth.js. So that might be really old. And I'm a huge Zustand fan, so I appreciate that being there. Let's dig in further. There are certain categories that are fully locked, like GitHub Actions owning CI/CD, shadcn owning UI components at 90%, and Stripe owning payments at 91%. Models agree 90% of the time within each ecosystem. All three models pick the same top tools in 18 of 20 categories when compared within ecosystem. Only caching and real time show genuine cross-ecosystem disagreement. The other three disagreements are artifacts of mixing JS and Python results. And one last interesting piece is that context matters more than phrasing. The same category will have different results across different repos. If you give the same prompt to two different projects, the recommendations will be meaningfully different. But within a given project, it'll stay stable across five different phrasings of the prompt: 76% consistency there. That's really cool. It is worth noting that the three models they tested were all models by Anthropic: Sonnet 4.5, Opus 4.5, and Opus 4.6. So they did not test GPT models. Turns out they already published the Codex comparison too, so we'll get to that near the end. Here are the actual prompts they used: "How do I deploy this?" "I need a database. What should I use?" "Add user auth." "What testing framework works best with this stack?" "Add auth. Recommend whatever works best for the stack." Apparently, the fastest responses were when they were asked about deployment, only taking 32 seconds, but auth would take much longer at 245 seconds. Longer response times correlate with higher custom and DIY rates. The model spends more time when it builds from scratch, like with auth, realtime, and payments, versus when it confidently picks a tool, like deployment or CI/CD. They have a whole section on what the study cannot tell you, probably worth reading for people who are going to read into this too much. I like the callouts here. It's not developer consensus. They cannot separate quality signals from training frequency. They don't have any special insight here. This is just an interesting deep dive into what the models like and recommend, and to some extent how they think. One last interesting callout is that when Claude Code doesn't recommend things, it's usually because it was asking additional questions. Claude Code is often cautious, not confused. It asked clarifying questions or requested permission before proceeding. Sometimes it misunderstood the prompt, but it wasn't very common. Certain features, like feature flags and background jobs, would trigger caution more often than others. And now we get into the results. First, we have the custom and DIY section. Claude Code frequently prefers to build custom solutions rather than recommend third-party tools. When asked to add feature flags, it doesn't say use LaunchDarkly. It builds a complete feature flag system from scratch using environment variables and framework primitives.
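For a sense of what that DIY pattern looks like in practice, here's a rough sketch of an env-var-driven flag helper. This is illustrative, not code from an actual Claude Code session, and the flag names are made up:

```ts
// Illustrative sketch of a DIY feature-flag helper built on environment
// variables, the kind of thing Claude Code tends to generate.
export function isEnabled(flag: string): boolean {
  // e.g. FLAG_NEW_CHECKOUT=true enables isEnabled("NEW_CHECKOUT")
  const raw = process.env[`FLAG_${flag}`];
  return raw === "true" || raw === "1";
}

// Usage in app code:
if (isEnabled("NEW_CHECKOUT")) {
  // render the new checkout flow instead of the old one
}
```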
This one's particularly funny as an example, because Claude Code doesn't do that. Claude Code was using Statsig, both for doing stats and product analytics stuff, but also for feature flags. They were, but then Statsig got acquired by OpenAI. So Claude Code moved over to GrowthBook. And when I say Claude Code moved to GrowthBook, I mean that the team building Claude moved Claude Code itself to GrowthBook. So even though Claude Code likes to DIY feature flags, Claude Code itself does not use DIY feature flags. Just thought that was funny. So the recommendations Claude Code makes are not the recommendations that Anthropic follows for building Claude Code. So to everybody saying the models are clearly dumb if they leaked the Claude Code source: the models didn't leak the source. The developers did. And the developers don't just blindly trust whatever Claude does. As mentioned earlier, if custom and DIY counted as a tool, it would be the most common one across everything. 70% of the time feature flags were requested, a DIY solution would be used. 100% of the time auth was requested in Python projects, DIY would be used. Overall, though, only half the time would it use DIY for auth. Observability was a lot lower at 22%, email also low at 22%. Please do not roll your own SMTP integrations. It is hell. Use any of the services. I'm not sponsored by any of them right now. There's a lot that are good. Resend is popular for reasons. There's also plenty of other options, too. I like the Loops guys a lot. Do not roll your own email unless you know what the hell you're doing. It is pain. Speaking of things that are pain: real time. Man, I've been through it with real time and sync engines. You can do it, but if you have never had to debug frames coming through a websocket on the client side, you probably should use something else for this. It's not fun. There's a reason it's only DIYed 21% of the time. But yeah, forms are down to 20% of the time, using custom React hooks and useState validation instead of a form library. Caching is at 19% DIY, styling 17% DIY. The models really like using Tailwind and shadcn. And then file storage is only at 12%. Very interesting. Speaking of file storage, how's my bench going? Interesting so far. I need to add 5.4 to this at some point, too. A good callout here is why this matters for tool vendors. If coding agents are becoming the default way developers discover tools, and the agents prefer building over buying, vendors need to either become the primitive that agents build on or make the tools so obviously superior that agents recommend them over custom solutions. It's funny that's mentioned, because I saw this post from Mitchell earlier today. If you're not familiar with Mitchell, he is the creator of Terraform and HashiCorp, and he's also the creator of Ghostty and, more importantly, libghostty. He wrote an article today about the building block economy. The most effective way to build software and get massive adoption is no longer high-quality mainline apps, but via building blocks that enable and encourage others to build: quantity over quality. So Ghostty got to 1 million daily users in 18 months. libghostty, the library that you can use to build your own terminal integrations with Ghostty's backend, gets multiple millions of daily users already in just two months. And there's lots of other projects that are seeing this type of success due to the building-block nature. I don't want to spoil the technologies yet, but I thought this article was worth calling out. I'll leave the link in the description. I will also be doing a whole video about the business side of open source and building blocks in the near future, so keep an eye out for that. Back to the tool rankings. Here's the overall. Oh boy. GitHub Actions has a near monopoly at 94%. Stripe's also near monopoly at 91.4%. shadcn/ui is at 90%. Vercel's at 100% for JS, which is insane. Tailwind is at 68.4%. Zustand's also matching that. Kind of crazy that Tailwind and Zustand are similarly popular.
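Since Zustand keeps coming up: for anyone who hasn't used it, a minimal store looks something like this (my sketch, not from the video; the state shape is illustrative):

```ts
import { create } from "zustand";

// Minimal Zustand store.
interface CounterState {
  count: number;
  increment: () => void;
}

export const useCounter = create<CounterState>()((set) => ({
  count: 0,
  increment: () => set((state) => ({ count: state.count + 1 })),
}));

// In a React component:
//   const count = useCounter((s) => s.count);
//   const increment = useCounter((s) => s.increment);
```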
I love Zustand, but seeing it become this prominent is insane. I never thought it would even be a tenth as popular as Redux, but that flipped quickly. Yeah, Redux is at 21.5 million installs per week right now, and Zustand's at 22. Holy [ __ ] It happened. It happened. Zustand is more popular than Redux. I never thought I'd see the day. I might have to start pronouncing it properly out of respect. I will never do that. It is Zustand forever. So yeah, Zustand is much more popular than I ever thought it would be. Still really cool to see. It's crazy seeing it above Sentry, because there's a reason that Sentry is so popular. They are still the most complete bug-tracking solution for error management in real software, especially mobile. Going down the list further, we got Resend. As I mentioned before, please don't roll your own email solutions. Vitest, which is really good for doing testing in TypeScript. I'm so thankful we're not getting old tools like Jest in here. Postgres is the main database pick. I have my opinions. Postgres is fine. But if your project grows, you got to know how to scale Postgres and the consequences of the choice. I find that if you're not having to worry about scale, SQLite's a pretty good option. If you are, MySQL is still the undefeated goat, especially if you use a platform that can handle that with Vitess, which isn't Vitest. Vitess is a way to scale MySQL. PlanetScale has it handled really well. Highly recommend it. Postgres is a weird in-between that I don't find myself recommending a lot, but it's fine. As many of you all know, I prefer integrations with my database that go a little deeper in terms of how it relates to my backend as well as my client. And I really like Convex for that specifically. We got PNPM here at 56.3%. React Hook Form, great library, at 52%. TanStack Form is catching up now in functionality, but React Hook Form is the undefeated goat still. Redis is at 41.6. TanStack Query is only at 40%. That's a bit sad to see. This should be a lot higher. One of the fun things I noticed is that a lot of people who have historically not been too deep on the front-end and React side are building bigger apps now, and they're using tools like Claude Code to go a little deeper on the front end. Somebody like Aaron, who comes from the PHP world, is suddenly really deep in dynamic front-end application development and just discovered TanStack Query and how great it is. And it's very cute seeing all these people discover TanStack Query six-plus years after I started evangelizing it. And over on the Python side, we have FastAPI, which makes a lot of sense. AWS S3 is very competitive for file storage, but it's only at 32.5%. That's very interesting. I want to see the other file storage solutions. NextAuth is pretty low at 31. Next.js API routes is also weirdly low at 28.6%. pytest is 25.7. Actually, on the Next.js routes one, I wonder if that's because there are other apps that aren't using Next, so this is just recommended in the Next.js instances. And then Railway is on the list, but barely, at 25.6%. Cool. So, let's compare the different categories, starting with CI/CD. GitHub wins by far. Vercel CI is recommended, which is funny. I don't even really recommend Vercel CI beyond building. I turn off all the checks in Vercel CI. I just let it build, and I run my actions for all my type checking and whatnot. I usually use something like Depot, Blacksmith, or RWX in order to be much faster and less [ __ ] than GitHub Actions.
This is also funny because I happen to know that all those companies I just mentioned are doing very well, stealing a lot of customers from GitHub Actions. So I don't know how long-term this will stay, but this is absolutely the case now. Then we have payments, where Stripe is by far the lead at 91.4%. Custom/DIY is at 8.6. And then there's a few other times that Paddle, Lemon Squeezy, and PayPal were recommended. We're using Autumn now for T3 Chat. I invested in them forever ago. Didn't think it was the right solution for us, and then Mark started talking to the team, and despite the fact that Mark almost never recommends solutions that are newer, he ended up really liking Autumn. So that's what we're using now, and it's been a nice change for us. And then we get into UI components, where shadcn sweeps. Radix UI is a little behind, but remember that Radix is also a core dependency for shadcn. So most people just go shadcn. Chakra, Mantine, and Material all appeared as alternatives, but they were rarely the primary recommendations. Makes sense. And then deployment: mostly Vercel at 76.8% and Railway at about 28.6. Interesting to see Railway get recommended so heavily. It makes sense when you see their user growth recently. I almost started working at Railway even before this. Yeah, we were in talks back in 2021, and Jake convinced me to go do my own company instead. I would have been the fourth employee at the company, if I recall, and I declined it because, funny enough, he told me that I should probably make my own thing. So I did, but seeing this after is insane. Like, they're at over 15,000 new deployments a day, I think, or new users per day. 15k new users a day on a dev tool is insane. And that lines up here. That makes a lot of sense. What's really interesting here is the ecosystem note, where 100% of the time it was asked for JavaScript it picked Vercel, but for Python, Railway was usually picked, though only at 82%. It is also worth noting that Vercel has a free tier and is pretty damn scalable. Railway doesn't have a free tier, so it being this popular is an interesting thing. And to be clear, with Railway, they let you get a little bit of usage every month for free. I think it's like five bucks or so, but that's barely enough to keep a server live the whole month. I also use Railway heavily for side projects, and it costs me almost nothing. They are hilariously cheap. They're often cheaper than AWS, which is just insane. Turns out running your own hardware is powerful. AWS Amplify was mentioned 24 times, but never recommended as a primary or alternative. Claude, we are aligned. [ __ ] Amplify. I [ __ ] hate Amplify. I was one of the earliest testers of Amplify when I was at Amazon. It was such a [ __ ] show. Such a [ __ ] show. Netlify was recommended as an alternative 67 times. Render was recommended 50 times. Fly was only recommended 35. Cloudflare Pages was 30, but Cloudflare Pages is also kind of dead, because it's part of the whole Workers thing now, and GitHub Pages is only 26 times. Interesting to see how much these falter, and also that Netlify is almost twice as highly recommended as Fly and more than twice as highly recommended as Cloudflare Pages. That says a lot. And now we get into the strong defaults: things that have over 50% but aren't quite in the monopoly tier. For styling, Tailwind absolutely crushes at almost 70%. Custom/DIY is the next highest at 17, and then CSS Modules is at 13%. styled-components, Emotion, and Sass were nearly absent from primary picks. However, CSS Modules got 42 alt picks, and styled-components appeared 14 times as an alternative and was mentioned 35 times. Interesting. For state management, Zustand was super popular and TanStack Query less so. Like, if you're doing client state, Zustand makes a ton of sense. But if you're doing server state, where you're fetching data and showing it in the UI, with mutations that trigger asynchronous stuff, TanStack makes a lot more sense for anything that is a "this rendered and I need data" type thing. And React Context being solo is kind of hilarious. Yeah, I would really want to see the prompt they're using for this test.
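To make that client-state versus server-state split concrete, here's a minimal TanStack Query sketch (my example; the /api/todos endpoint is hypothetical, and a real app needs a QueryClientProvider higher in the tree):

```tsx
import { useQuery } from "@tanstack/react-query";

// Hypothetical endpoint, purely for illustration.
async function fetchTodos(): Promise<{ id: number; title: string }[]> {
  const res = await fetch("/api/todos");
  if (!res.ok) throw new Error("Failed to load todos");
  return res.json();
}

export function Todos() {
  // Server state: caching, deduping, refetching, and loading states are handled.
  const { data, isLoading, error } = useQuery({
    queryKey: ["todos"],
    queryFn: fetchTodos,
  });

  if (isLoading) return <p>Loading…</p>;
  if (error) return <p>Something went wrong.</p>;
  return (
    <ul>
      {data!.map((todo) => (
        <li key={todo.id}>{todo.title}</li>
      ))}
    </ul>
  );
}
```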
We then get to observability, where Sentry sweeps. Custom/DIY is at 20%-ish, but Sentry more than triples that, which is crazy. Prometheus is at 12 out of 160, so at 7.5%. And then Pino, which I've never heard of, at 6%. Email has Resend sweeping, then custom/DIY at 20%, and then SendGrid way behind at 7. Testing: Vitest crushes, pytest is pretty high. Playwright, which is an interesting thing to sneak in here, is doing okay at 10%. But Playwright is really for testing in the browser, not just for running your tests. Seems like the vagueness of the prompt has steered the model into saying lots of other things. And then there's Jest, which is at 4%, where it belongs: the bottom. I will never, ever miss Jest. Databases, as I mentioned before: Postgres is crushing at 58%. Supabase is still, in my opinion, higher than it should be at 24%. My gripe with Supabase is specifically that so much of the state lives in the existing database and not in your codebase. Things like the permission system and model, things like the current state of the database and what migration level it's at, all these things exist in the database itself. And you have to use an MCP in order to keep the database in the right state as you're working with it. On top of that, the auth model is something you have to put effort into. Row-level security is far from my favorite way to handle safe application development. So, Supabase is not a recommendation I make a lot. I suspect this one will drop over time. But I also think Supabase is going to work hard to unfuck their things, because they have way too much state living in their dashboard and way too little living in the codebase. And then SQLite, because who doesn't love a good meme? [ __ ] got zero primary picks but was heavily mentioned. Models know it, they just don't default to it. Interesting. Overall, I am sad to not see my favorite, Convex, here, but I get it. Convex is still a new thing. I'm sure the models will recommend it more in the future. Then we got package managers. I am... oh, I was happy with the results until I saw how high npm was. PNPM is the top choice. I think this makes a lot of sense. Bun is great, but it has edges that can make it annoying for lots of projects, especially monorepos. We've had enough bugs with Bun in the T3 Chat codebase, even just for our package management, that we're considering moving off. Also funny, because Bun is owned by Anthropic, but it's still third place here. PNPM still sweeps here at 56.3%, and then npm is at 23%. Please stop using npm for package management. Just use pnpm or Bun. Also probably not worth picking Yarn nowadays. Now we're in forms, one of my favorite topics. React Hook Form is at 50%. Zod is at 26.7%. Zod's not for forms, though. It's just validation. And apparently it appears as standalone validation pretty often, alongside, I'm assuming, the custom DIY implementation. Please validate your forms. Zod's a really good option for it.
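As a quick illustration of that advice, here's a minimal Zod sketch for validating a form payload (my example, not from the video):

```ts
import { z } from "zod";

// Validate a signup form payload before trusting it.
const signupSchema = z.object({
  email: z.string().email(),
  password: z.string().min(8),
});

const result = signupSchema.safeParse({
  email: "user@example.com",
  password: "correct horse battery",
});

if (result.success) {
  // result.data is fully typed: { email: string; password: string }
  console.log(result.data.email);
} else {
  // Structured, per-field validation errors
  console.log(result.error.issues);
}
```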
And now we're into competitive markets, where nothing has more than 50% dominance. Eight categories and no clear winner. For auth, custom/DIY is just barely under 50%. NextAuth does pretty well at 31%, and then Supabase Auth is at 11%. NextAuth is recommended 91% of the time for Next.js, which is interesting. So remember, NextAuth doesn't really exist anymore. You're supposed to just use Auth.js, but it's been folded in as part of Better Auth. Like, it's all in Better Auth now. Yeah, it's interesting seeing it recommended so much when it's part of Better Auth. Wild. That shows you that these models are all old data. Convex Auth is getting mentioned in my chat. Personally, I don't love it. And funny enough, Convex doesn't either. They highly recommend at Convex that you use other auth solutions. Personally, I really like both Clerk and WorkOS. Both are easy recommendations. Better Auth is also in a really good state nowadays. For what it's worth, I'm annoyed enough with auth that I'm building my own auth service that I'm hoping, fingers crossed, to get out soon. Still polishing a few things. It is out. If you want to try it, be cautious. It's not open source yet. And apparently, there are some things in it that might not be the most secure. We're working on it. I'll get it out soon. I'm very happy with it. Now, we have caching, with Redis doing very well at 41.6%, probably where that belongs. Custom/DIY at 19.5, and then Next.js's built-in cache, which, to be clear, is built in, but it's built in in a way where you need to host it on a platform that understands how to use it, which means you're probably on Vercel. It's really Vercel's cache, I would argue, but 20% overall is not bad, and in Next.js apps, it got recommended 42% of the time. And in Python, Redis did break out over 50, to 57%. Very split by the stack. Makes sense. Again, it's really cool to see that the stack your project is in when you run the prompt really helps determine where things go. And now we're in the API layer section. Very interesting to see TanStack only at 40% here, FastAPI at 35, and Next API routes at 28.6, because none of these are competitive with each other. I've seen projects that use all of these. Thankfully, even though none of these conflict, TanStack does get recommended much more heavily for React projects, at 70% of the time. I would say it should be more than that. It should be near 100, but it's good to see. Python has FastAPI 100% of the time, and Next has API routes recommended almost 80% as well. This is very stack-determined. And now we have file storage. Um, sir, where's my mention? According to them, I'm not even on the list, but look at my research here. Most of the models are at least aware of UploadThing and will recommend it as an option. Every modern Claude model is willing to. Sonnet 4.6, Opus 4.5, and Opus 4.6 recommend UploadThing 100% of the time. Also worth noting that benchmark cost me $22 to run. Well, if I got to spend all this money to figure out how popular UploadThing is, we should probably take a break for another really good recommendation from today's sponsor. AI has made our lives much easier, as long as you're using it for code. As soon as you have to do something else, like hiring, AI actually makes your life much worse, because all of a sudden we're all getting spammed with endless applications from people that don't necessarily know what they're doing, filling their resumes up with AI-generated slop.
Hiring sucks so hard right now, and recruiters just don't know what they're doing or how to deal with it. All the signals they used to rely on to make good hiring decisions just don't really work anymore. And that's why G2I is such an easy recommendation for all the companies I know who are trying to hire. They don't base hiring decisions off of sloppy resumes. They already have a network of talented engineers. Every time I talk to somebody at G2I, I'm impressed with the depth at which they understand the people. Just a silly example: I recently reached out to them on behalf of a company I was trying to help with hiring. That company had a CEO that really knew the industry and was trying to pick the best possible people. So I assumed that they wouldn't need much help, but I figured I'd throw them in the ring anyway. What really blew me away was that G2I responded with a list of awesome candidates, with descriptions where they clearly understood who each person was and what they did. And three of the candidates on their list were friends of the CEO. The depth of their network is insane. And that's why companies like Webflow, 1Password, Automattic, and even Meta have worked with them for hiring. It's their goal to go from interview to first PR in 7 days, and I have seen them do it enough times to recommend them with confidence. Stop suffering with hiring and get back to building at soydev.link/g2i. God, it's doing so many tokens. It's 5 cents per run on this one. Jesus Christ. 3k tokens. Well, hopefully we'll get 100% there. This is taking a while, but so far 5.4 is at 100%. So again, as the models get smarter, they are more likely to recommend UploadThing. So take that as you will. If you want to be as smart as the smartest models, maybe use my stuff. I'll come back and look at the final results later, though, I promise. Back to very competitive categories, like ORMs and database tools, where SQLModel and Drizzle are the highest. Prisma is not too far behind, and SQLAlchemy is a little behind there. In Python, SQLModel is at 72%, though, and in JS, Drizzle is at 61%. So yeah, Drizzle is doing very well. Congrats to the Drizzle team. Background jobs has BullMQ near the top, but it's only at 25%. Inngest is right behind it at 23%. Celery is at 18. FastAPI is at 13. Also split by ecosystem a good bit. I will say I'm surprised to not see Trigger here. I know they're a sponsor, but Trigger is awesome. My whole team has been super impressed since we started playing with it more. Definitely worth checking out. I actually owe them some ads, I just realized. I will be hitting them up soon. Uh, yeah, Trigger's been... they're too chill to work with. You would have heard about them a lot more if they were less chill, but they're very cool with me being late. And as soon as I mentioned Trigger, chat's like, "Oh my god, Trigger is amazing. Trigger's awesome." Yeah, they're really, really good. And now we're in feature flags, where they over-recommend custom and DIY. LaunchDarkly's in second place at 20%, and PostHog's all the way back at 4%. We use the PostHog feature flags. They're pretty dang good. GrowthBook is awesome, too. I'm surprised they're not mentioned here at all, especially because they use GrowthBook inside of Claude Code. And then real time: custom/DIY at 20%, Supabase at 15, SSE at 14, Socket.IO at 11. Liveblocks also gets mentioned for collaboration, but yeah, no clear winner here. And now we're in the model comparison section. Not comparing with Codex.
We'll get there right after. But just seeing how the recommendations break down between the new versions and old versions of the models: GitHub Actions went down significantly from 4.5 to 4.6, which is interesting. Postgres got a little more popular. Vercel got less popular. Interesting. Resend is plummeting. Sonnet had 84, Opus 4.5 was 77, and Opus 4.6 is 66. That's actually very interesting. PNPM hit a straight 100 across all three. Stripe was 73, 83, 76. Zustand got to 100% with Opus 4.6. Interesting. Tailwind was only 100% for Opus 4.5. Fascinating. I wouldn't read too much into these, simply because the models are all similar and this is all non-determinism. So yeah, apparently Drizzle's 100% on Opus 4.6, but Prisma is more popular on Sonnet 4.5. This is very interesting, seeing which times they disagree. The rest aren't anywhere near as big a split. Actually, Redis was much more popular on Sonnet 4.5 than Opus 4.5, which is fascinating. One of the other really interesting things is that Prisma dropped from 79% with Sonnet to 0% with Opus 4.6, and Drizzle rose from 21 to 100%. One more fun observation is the recency gradient within each ecosystem. Newer models pick newer tools. All percentages below are within ecosystem. For example, Prisma's 79% is 79% of the JS picks. Okay. Yeah. Very interesting to see the difference there. And it seems like, interestingly enough, the newer models, especially the newer, smarter ones like Opus, are more willing to try out new tools and go in that direction, where Sonnet 4.5 tends to fall back on existing, established, popular things and not try to go with the new stuff. It's also less willing to build custom solutions, which is interesting. Yeah, fascinating. So, to summarize the most frequently picked things by Claude Code in these scenarios: it would be Resend for email, Vitest over Jest, PNPM as the package manager, and Drizzle, shadcn, and Zustand as the most popular options. Now, let's see how Codex compares. I'm actually really curious here. The top here is very interesting. They list all of the different specific categories and what the number one pick was between the two. And there's a lot of agreement overall, in particular when a lot of the DIY stuff is being recommended. But when it comes to something like scheduled tasks, Codex recommends OS-level cron and Claude Code recommends something like Vercel and their cron implementation. For search, Claude Code likes Postgres. For JS runtime, Codex prefers Node and Claude Code prefers Bun. Surprise. That is actually genuinely surprising and very interesting. And for edge and serverless, Cloudflare Workers is preferred by Codex and Vercel Edge is preferred by Claude Code, which is interesting because Vercel Edge is pretty dead. Like, edge on Vercel is not what you should use Vercel for. You should use it for their Fluid compute model, which is serverless but isn't really serverless. Interesting. Also, note the gap in the recommendation level here, where Cloudflare Workers is recommended by Codex 50% of the time and Vercel Edge is only recommended 24% on Claude Code, meaning there isn't a real number-one winner there. It also looks like they might have hardcoded Statsig out on Claude because they're mad about them being bought by OpenAI: Claude never recommends it, even though they were using it, and Codex recommends it over a fourth of the time. So that's an interesting gap. The ownership question.
This is funny, because both have this problem: Statsig is no longer recommended by Claude Code (probably not really, but potentially, because OpenAI bought them), while with Bun, Codex doesn't recommend it anywhere near as much, but Claude Code does a lot of the time and almost always mentions it. Another fun gap: when asking specifically about A/B testing and feature flags, Codex picks Statsig and Claude Code picks PostHog. Very interesting. Oh, before I forget, what about UploadThing? The, uh, 5.4 24 Pro runs are not resolving. I think it's because I have a timeout on them and they take longer than that. So I probably just spent a lot of money and will never get the results, which is annoying. Might have to bump the timeouts up. Regardless, 5.4x High got 100% UploadThing recommendations. So, again, as the models get smarter, they recommend what I build. I know this seems like I'm just plugging UploadThing really hard. It's because I am, but it's also because it makes jack [ __ ] [ __ ] for money. The free tier is too generous, and the paying users don't pay a whole lot. So, like, we lose two grand a month on UploadThing right now. It's not a big profitable service. I just put a lot of work into it and care. Oh, this is actually very interesting. Claude Code is just wrong with the Bun recommendation here. When they start, the project is on Node by default, but they ask what JavaScript runtime should be used for the project, and whether there is something faster than what they have. And this is a Next.js 14 project. Remember, Next cannot run in Bun. Next is not compatible with Bun. It has a bunch of weird async IO stuff that is Node-specific that is not implemented in Bun. This is just wrong. Install times is correct. Sure, it's faster than npm, but we're not asking about the [ __ ] install times. We're asking about bun run dev and the runtime for this. "Bun would be the fastest runtime option for Next 14" is just wrong. That is a lie. Just saying. Next is tightly coupled to Node internals, so while Bun works well for install and dev, the actual Next server still runs on Node under the hood. "TL;DR: use Bun for speed, fall back to pnpm if you hit compatibility issues." On a question about runtimes, it's saying what package manager you should use. Opus is really smart until it's really stupid, and then it's really [ __ ] stupid. And this is an example of it being really dumb. The way I described this to somebody yesterday is that when I use Claude Code and Opus, it kind of feels like the prompt you give it isn't so much an instruction as a word cloud and a vibe that you want it to feel out. So when it got the prompt of "what JS runtime should I use," that gave it the vibe of things like Node and Bun, and that gave it the additional vibe of things like pnpm and package managers, and then you got an answer about package managers even though the question is about runtimes. Whereas with OpenAI models, in particular 5.3 Codex and 5.4, it feels more like you're telling the model what to do, and it follows it as instructions, sometimes too literally. So here we get a much, much better answer. Like, the gap in the quality of the answer here is insane. What you have today is Node-based, and that is still the safest path for Next compatibility. It also opens specifically saying to keep Node as your primary runtime for this project. It immediately answers what the question is and then gives you some additional information that is very specific.
The first thing it recommends, if you want a faster, low-risk speedup, is that you stay on the Node runtime but switch tooling to pnpm or bun install for faster dependency installs. Runtime experiment: Bun can be faster for startup and CPU-heavy work, but for Next on Vercel, it's still marked as beta as of November 10th, 2025. So, use staging first. Okay, so apparently there is a beta runtime for Bun with Next on Vercel. I didn't even know that. Cool to learn a thing. I vaguely remember this happening, but calling it out properly. And the biggest practical Next speed gain is to upgrade from 14 to 16, where Turbopack is the default, which will improve your dev and build environment. Also, Node release guidance now shows v24 is active LTS and recommends active and maintenance LTS for production. So target Node 24 for prod. And then it links all of the sources that it used. Do you understand the gap in that quality of response? I don't even care what it recommends. I care that it actually understands what it's recommending and writes it out properly. This test about platform preference doesn't appear to be the best, because they're asking about edge compute. And this is also just a bad prompt. Like, you don't actually want to run code close to the users, because the code needs data, and the data is probably centralized. I have a lot of videos about this. But if you really want compute near your users, you got to go with Cloudflare. It's the right option here. But the Vercel Edge stuff is just not their focus anymore. They've even come out against it, like, formally. Just no. This prompt is bad, and OpenAI Codex followed the intent of the prompt correctly, while Claude Code doesn't understand what's going on here and didn't do the proper research. I'm actually curious if either mentioned the fact that you need your data close to the code. Let's see. None of them do. I've gotten into some arguments with some people very early into coding, who are learning how to code through vibe coding, that very strongly asserted that my takes (I think it was my Convex takes) were wrong because Convex doesn't have an edge compute platform and Supabase does. Yeah, it's the database. You want the compute next to the database. The more it has to travel, the worse it is. I have a lot of videos about this. Look up "theo edge" and you'll find a ton of them. I would like for the models to tell you about that. Codex calls out that Vercel has Edge, but the docs now recommend moving many edge workloads to the Node runtime for performance and reliability. Correct. God, it's funny that neither really recommends GrowthBook. They've only mentioned it even once, even though Claude Code is using it. And then JS runtime and toolchain: pretty big gap. Vitest is more popular with Claude Code, though it's still recommended over here. PNPM does much better on the Codex side than on the Claude Code side overall, but both are recommended pretty heavily. Interesting. It looks like Claude Code just never recommends Cloudflare. Cloudflare Images got a huge rec on the Codex side, but it didn't even get listed on the Claude Code side. Image optimization is obnoxious, so yeah, I totally understand why somebody would ask the models, and I've been through it with that. ImageKit was really good in my experience. I like them a lot. Was it Bunny CDN? Yeah, I was trying Bunny. It's pretty well priced and good, but god, it sucked so hard to set up. Like, so insanely hard. Next Image is a good recommendation, now that Vercel made it cheaper especially, but Cloudflare Images is also pretty good.
Just a little more code you have to write. It can also be a little hard to secure on the Cloudflare Images side. I've ranted at them about that in the past. Headless CMSs: a good bit of disagreement, but who cares? They're headless CMSs. You can vibe code your own alternative at this point. Why would you ever DIY SMS, though? That's insanity. Codex, come on, guys. "What service should I use to rate limit and protect my API in prod?" Cloudflare is 20% on the Codex side and only 8% on the Claude Code side. Weird that Claude doesn't like Cloudflare so much. It's very interesting to see some cool companies called out as cross-agent picks that are up and coming. Doppler is doing pretty well. Upstash is doing pretty well. Our friends over at Axiom are doing very well. And Meilisearch is also doing well, too. Weird that Firebase Cloud Messaging almost always comes up. Not my favorite, but it's a solution that works. Claude is slightly more likely to DIY. Checks out. And I got one last thing I want to say before I wrap up. It's a short article from friend of the channel Simon Willison: perhaps not boring technology after all. A recurring concern I've seen regarding LLMs for programming is that they will push our tech choices towards the tools that are best represented in their training data, making it harder for new and better tools to break through the noise. That was the case a couple of years ago, when asking models to help with Python or JS appeared to give much better results than questions about less widely used languages. But with the latest models running in good coding agent harnesses, I'm not sure that continues to hold up. I'm seeing excellent results with my brand-new tools, where I start by prompting "use uvx showboat --help" or "rodney --help," yada yada, to learn about these tools. The context length of these new models is long enough that they can consume quite a lot of documentation before they start working on a problem. Drop a coding agent into any existing codebase that uses libraries and tools that are too private or too new to feature in the training data; my experience is that it works just fine. The agent will consult enough of the existing examples to understand patterns and then iterate and test its own output to fill in the gaps. This has been my experience as well. I am seeing the models getting better and better, and if you combine that with tools like Skills, it can help even more. I'll also call out that if you make a lot of different projects like I do, you might find it worthwhile to modify your CLAUDE.md to include some of your specific preferences. So, I have a universal CLAUDE.md in my ~/.claude that specifies that I like using TypeScript and it should never use "any" unless entirely necessary; commands I don't want to run (I hate when my agents run dev servers, because I'm already running my dev server, don't [ __ ] touch it; don't run builds unless I ask you to, because it'll override [ __ ]; and use type-checking commands primarily); and, for package managers, use pnpm if the project is already using it, otherwise use Bun, and never use npm or Yarn. I was going to change this, but I'll leave Bun as the default, just because it's quick to install things, slightly more so than pnpm, even if it has problems at scale. Tech stack preferences: when uncertain, prefer Tailwind, TypeScript, Bun, React, Convex, Clerk, Vercel. This has allowed the model to usually get what I want very quickly without being steered, which is very nice. Code style: be concise, yada yada. And then all of the GStack [ __ ] from when I was testing out GStack. Ignore that. You get the idea, though.
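For anyone who wants to copy the idea, here's a sketch of what a global preferences file like that could look like. This is my illustration of what Theo describes, not his actual file; the exact path and wording are assumptions:

```markdown
# ~/.claude/CLAUDE.md (hypothetical global preferences)

## Language
- Prefer TypeScript. Never use `any` unless entirely necessary.

## Commands
- Never start dev servers; one is already running.
- Don't run builds unless asked; prefer type-check commands instead.

## Package managers
- Use pnpm if the project already uses it; otherwise use Bun.
- Never use npm or Yarn.

## Tech stack preferences (when uncertain)
- Tailwind, TypeScript, Bun, React, Convex, Clerk, Vercel.

## Code style
- Be concise.
```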
These models are more willing to recommend new [ __ ] than I ever would have guessed. And you can always steer the model yourself with a little update to your MD files to keep it going where you want it to go. I have been honestly pretty surprised with all of this, and I am pumped that I found this article. Shout out to Edwin for his work on Amplifying AI. It's a really cool source, and I had a ton of fun going through this. Shout out to him for all the hard work. This was a very fun thing to dive into. I'm curious how y'all feel about these recommendations. Are you more or less scared now that you've seen what the models actually recommend? I'm feeling a little more hopeful, but I'm curious if you guys agree. Let me know what you think. And until next time, peace nerds.
