AGI is Here. Anthropic Just Proved It.

Nate Herk | AI Automation | 00:12:37 | Jun 5, 2026
The speaker highlights Anthropic’s claim that over 80% of their code is now AI-written and argues that AGI is already here in practical terms, setting up the key takeaways from the report.

Anthropic’s internal data suggests AGI is here in practical form, with AI systems already solving open-ended problems and improving rapidly, while alignment and governance remain the big unknowns.

Summary

Nate Herk dives into Anthropic’s report, arguing that AGI isn’t a distant fantasy but a present capability by practical measures. He highlights that Claude now writes over 80% of the code Anthropic ships, and that the company sorts coding sessions into four buckets from trivial to open-ended, with Claude hitting 76% success on open-ended problems, up from 26% six months earlier. The video emphasizes exponential progress: the length of task the AI can handle on its own has grown from about 4 minutes two years ago to 12 hours this year, and the typical engineer now ships eight times as much code per day as in 2024. Nate also discusses how AI is increasingly proposing its own next steps in research, making a smarter call than human researchers 64% of the time in a recent internal test (up from 51% in November). He stresses that this isn’t about sci-fi robotics but practical capability, including a model making a block of training code run 52x faster and AI agents recovering 97% of the gap on a stubborn problem where humans given about a week recovered only 23%. The open-ended risk is the alignment problem: if an AI builds its own successor, misalignment could compound and become harder to detect. Nate maps three futures: stall, steady compounding with humans still setting direction and judging results, or self-improving AI that outruns human control. He notes Anthropic’s view that a slowdown would likely be a good thing, weighed against the difficulty of verifying that every lab has paused, and argues the real decision is whether we cultivate the big-picture judgment humans excel at to guide the AI revolution. The takeaway: the real value shift is from doing to deciding, and the race to advance AI will continue unless governance and alignment mature in tandem.

Key Takeaways

  • Claude now writes over 80% of the code Anthropic ships, signaling rapid internal automation of core development tasks.
  • Open-ended task success for Claude is 76% (up from 26% six months earlier), marking a major leap in problem solving without a predefined finish line.
  • A newer model made a block of training code 52x faster (a year earlier the gain would have been about 3x), and on a problem researchers had been stuck on, AI agents recovered 97% of the gap while humans given about a week recovered only 23%.
  • Engineers are shipping eight times as much code per day compared to 2024, a sign of how fast output is accelerating, although more code is not automatically better code.
  • Anthropic’s three scenarios show a real risk spectrum: stall, continued human-guided gains, or AI building its own successor, which raises alignment and governance concerns.
  • The core value shift is judgment and problem selection—humans must guide where AI applies its open-ended reasoning to avoid misalignment.
  • Anthropic says a slowdown would likely be a good thing, but, as Nate puts it, it would pause only if every other major lab also pauses and that pause can be verified, which is currently hard to achieve.

Who Is This For?

This video is essential viewing for AI practitioners, product founders, and policy-minded engineers who need to understand how near-term AGI feels in real-world labs and what it implies for governance, competition, and strategic product decisions.

Notable Quotes

"I think that by the definition that actually matters, AGI is already here."
Nate states his core thesis early, framing practical AGI as present, not distant fantasy.
"On open-ended problems, Claude's success rate just hit 76%."
Shows the substantive progress on unsolved, real-world tasks.
"It jumped 50 points in half of a year."
Illustrates rapid improvement in capability over a short period.
"The AI becomes fully capable of building its own successor."
Describes a pivotal, risk-heavy scenario highlighted by Anthropic.
"Training runs are far easier to conceal than missile silos."
Captures the governance challenge and verification problem in AI development.

Questions This Video Answers

  • Is AGI already here according to Anthropic's internal data?
  • How fast is Claude improving at open-ended tasks compared to six months ago?
  • What are the three potential futures Anthropic outlines for AI development and what do they mean for safety?
  • Why does Anthropic argue slowing down AI progress could be beneficial, and what are the verification challenges?
  • How do value and skills shift from doing to deciding as AI takes over more real-world work?
Anthropic, Claude, AGI, Open-ended tasks, Alignment problem, AI governance, Code generation, AI efficiency, Bias and misalignment, AI race dynamics
Full Transcript
So, as of last month, more than 80% of the code that Anthropic ships is now written by their own AI, Claude. That stat comes from a report that they just dropped called "When AI Builds Itself," where they basically pull back the curtain on what's actually going on inside of their own business. So, I read this whole thing multiple times and I walked away pretty convinced that AGI is not this far off future thing that we're all just kind of waiting for. I think that by the definition that actually matters, AGI is already here. So today I want to tell you guys the important stuff that I took away from this article and like what I think it means for society. So real quick, just to start off, let's get on the same page about what AGI actually means. So AGI stands for artificial general intelligence. And I think that everyone gets stuck arguing about that middle word general. Like can it just literally do anything that a human can do? Can it feel things? Is it conscious? All that kind of stuff. And just so we're clear, I'm not talking like sci-fi robots taking over the world sort of AGI. I'm talking more practically here. I understand that some of you guys will disagree with this take, but I think what matters is can I walk up to a model with a problem that has no clear answer and just say, "Hey, go figure this out for me." And then it actually goes off. It runs experiments. It does research. It tries a bunch of different approaches and it comes back with something that actually works. And I think that that distinction is very very important because you've got narrow AI, which is something that's really really good at one specific thing that you actually built it for. So, a model that plays chess or a recommendation algorithm or a model that sorts your inbox, even if it nails one job 99% of the time, that's not AGI because it's still stuck in one lane.
In my mind, AGI is when you hand a general model almost anything, a problem that it's never seen before that has no clear answer and it does the work on its own to figure out the approach. It researches it. It experiments. It finds the best way. So, a model scoring 80% on simple narrow tasks is cool. It's helpful, but that's not AGI to me. The thing that I actually care about is the open-ended stuff, the problem solving. And this whole report is Anthropic showing you with their own internal data that we are already there. All right, let me show you the actual proof. So to be clear, Anthropic in this article never says the words AGI is here. That's me saying that. But once you see their own numbers, I think that they make the strongest case that I've seen that the practical version has already showed up. Because most of the time that AI has existed, it's been great at little things, right? Like your quick questions, your Google search replacement, write me an email, summarize this article, but the second that you tried to hand it maybe like a real big open-ended project, something that doesn't have a clear known answer, it kind of falls apart. So, Anthropic splits these kind of like coding sessions into these four buckets based on how hard they are. You've got trivial tasks, which are the easiest, and then you get more difficult going down from routine tasks, substantial tasks, and then of course the hardest ones are the open-ended problems. And open-ended in their exact words means tasks with no clear specification where the engineer isn't sure what the answer should look like. So nobody even knows what the finished thing should look like and you just hand that mess to an AI model and say go figure it out. So imagine you just told someone, hey go make our app faster, but not which part, not how, not even what faster should look like when it's done. They basically have to figure out the entire thing on their own. And that is an open-ended problem.
So on these open-ended problems, Claude's success rate just hit 76%. And 6 months ago, that number was 26%. So it jumped 50 points in half of a year. And this is that top rung, the messy no map, nobody knows the answer rung that just got cracked. And that rung is this whole ball game. And it's not just answering harder questions. It's doing longer and longer work all on its own with nobody babysitting it. Anthropic actually tracks how long a task takes for their AI to handle it from start to finish. Two years ago, their best AI model could handle about a 4-minute task. A year ago, an hour and a half long task. This year, 12-hour tasks. And one of their newer internal models worked for 16 hours straight. I'm assuming that was Claude Mythos. That length has been roughly doubling every 4 months. They said if this trend holds, tasks that take a skilled person days could come into range this year. And in 2027, AI systems could be capable of tasks that take a person weeks. Their typical engineer is now shipping eight times as much code per day as they were back in 2024. Now, of course, more lines of code doesn't automatically mean that it's better code, and I completely get that, but it still tells you how fast everything is actually accelerating on this exponential curve. And they've clearly been shipping features insanely fast. It kept me really busy this March. But it's not just doing the work you hand it anymore. It's starting to decide what comes next. So, Anthropic ran obviously another test. They'd take a real research project, freeze it, right at a decision point, and basically ask the AI, "Okay, what do you want to do from here?" Then they'd line up the AI's answer against what their own human researchers actually picked. They did this across 129 of these moments. And back in November, the AI made the smarter call 51% of the time, smarter than the human, and by April, it was up to 64%.
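The task-length figures quoted above can be sanity-checked with a little arithmetic. The Python sketch below is an illustration and not something taken from the report: it assumes the three quoted data points (4 minutes, an hour and a half, 12 hours) are spaced exactly 12 months apart and that growth follows a clean exponential. The function names are made up for this example.

```python
import math

# Task lengths quoted in the video, in minutes.
# Assumption: the three figures are spaced exactly 12 months apart.
TWO_YEARS_AGO = 4        # "about a 4-minute task"
ONE_YEAR_AGO = 90        # "an hour and a half long task"
THIS_YEAR = 12 * 60      # "12-hour tasks"


def implied_doubling_months(start_minutes, end_minutes, months_elapsed):
    """Doubling time implied by growing from start to end over a period."""
    doublings = math.log2(end_minutes / start_minutes)
    return months_elapsed / doublings


def project_minutes(current_minutes, months_ahead, doubling_months=4.0):
    """Extrapolate task length, assuming a constant doubling time."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    recent = implied_doubling_months(ONE_YEAR_AGO, THIS_YEAR, 12)
    overall = implied_doubling_months(TWO_YEARS_AGO, THIS_YEAR, 24)
    print(f"Doubling time over the last 12 months: {recent:.1f} months")
    print(f"Doubling time over the last 24 months: {overall:.1f} months")
    for months in (4, 8, 12):
        hours = project_minutes(THIS_YEAR, months) / 60
        print(f"+{months} months: about {hours:.0f} hours of work")
```

Under these assumptions, the most recent year matches the quoted 4-month doubling (90 minutes to 12 hours is three doublings), and one more year at that pace gives roughly 96 hours, or about twelve 8-hour working days. That is consistent with the claim that multi-day tasks come into range first and multi-week tasks follow, but only if the trend actually holds.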
Now, to be fair, they pick spots where the human's first move had maybe some room to improve. But still, more than half the time, the machine is choosing a better next step than the people who literally do this thing for a living. And the actual work it's doing is getting pretty absurd as well. They handed a chunk of training code to get sped up. A year ago, their model would have made it about three times faster. But this past April, this newer model made that same code block 52 times faster. And on one problem their researchers had been stuck on, they just let the AI agents grind on it around the clock and it clawed back 97% of the gap. The humans, given about a week on the same thing, only hit 23%. And once again, keep in mind a lot of these numbers are probably coming from Mythos, which is Anthropic's newest and even smarter model that is not yet publicly released. But just imagine what this will look like when that's in everybody's hands. So the machine isn't catching up to the people who build it. It kind of already did. All right. So in the report, Anthropic lays out three ways this could go from here. And I think this is the clearest way to understand where we actually are right now. So scenario one, the trend just kind of stalls. All these crazy lines on the graph that are going exponentially just kind of start to flatten out and plateau and AI ends up being a super incredibly powerful tool still, but it just kind of plateaus out like that. Scenario two is that these gains that we're seeing keep compounding. The AI does more and more of the work, but their exact words are "humans continue to set research directions and judge results." And then scenario three is the big one. The AI becomes fully capable of building its own successor. And at that point, the speed of progress isn't really limited by humans at all anymore. It's only limited by how much computing power you actually give it. Now, think back to that second scenario for a sec. The AI does the work.
Humans just point it at the problem. And that's not really even the future. That's literally where we are right now. That's where we are today. And that to me is already AGI. The only real question left is whether we slide into scenario three. And that is the scary part. Anthropic basically admits nobody can tell which one we're actually on. Because the exact thing that makes all of this so impressive is the exact same thing that should make you honestly a little bit nervous. Cuz if you think about it, if the AI is the one building the next better AI, then any little flaw, any weird behavior in today's model gets baked into the next one that it builds. And then that one builds the next one. And Anthropic says it pretty straight, which is the rare occurrences of misalignment present in today's model could compound as the models grow their successors, growing more frequent but less understood until we lose control of them. So what that actually means is the mistakes don't just add up, they actually multiply and they get harder to even see at the same time. So you end up with a problem that's growing and it's going invisible all at the same time. And that's the thing that keeps people up at night because it's got nothing to do with killer robots. It's that quietly we stop or not we the actual professionals that know how this stuff works like under the hood. They quietly stop understanding what's being built. All right. So let's just zoom back out for a sec. Zoom out from the labs and let's look at the rest of us. You know, right now society's reaction to AI is basically split into two groups. I mean obviously it's a spectrum, but we've kind of got two ends of that spectrum, right? You've got people on one side who open up an AI tool at work. They type in one lazy prompt. They get a result that's very meh. And then they say, "This AI stuff is so overhyped. You know, I don't get it. Copilot."
And then you've got this whole other group over here quietly running what used to take a 15-person team, but they're doing that by themselves because they have little AI agents that go off and build agents and automate things. Anthropic even says in this report, 100-person companies could start doing the work of 10,000 or even 100,000 person organizations. And that gap between those two groups of people is just getting wider every single month. And society has not yet come even close to catching up. And the thing that ties this whole video together is that the gap between people is the exact same problem as the race between the companies just at a different scale. No solo person who figured this out is going to sit around and wait for everybody else to catch up. And no AI lab is going to hit pause while their competitors keep sprinting because whoever stops ultimately loses. Anthropic admits that part pretty flat out. The incentive to keep going in their words is enormous. So everybody from one person at their laptop all the way up to the billion-dollar labs and entire countries have the same exact incentive which is to not stop to not slow down this AI progress because everyone wants to win the AI race. So that brings up the obvious question why is Anthropic the company building the most aggressive version of this whole thing. Why are they the one who's standing up and waving the flag? And when you actually read what they say it kind of makes sense. They say basically the one thing that they admit they're least sure about is alignment, which basically just means keeping these systems pointing in a direction that's actually good for humanity, good for society. And that's their words. The alignment problem is the thing that they are least certain about. So slowing down would buy everybody more time to figure that out before it's too late. And they straight up say that a slowdown would likely be a good thing.
But then they hit you with the catch, which is obviously Anthropic's only going to pause if every other major lab also pauses and only if everyone can actually verify that the other labs have paused. And that is like almost impossible because how do you verify that? In their words, training runs are far easier to conceal than missile silos. So what that means is you can see a country building a giant missile, but you cannot see a company quietly training a model in some random data center 100 ft underground. There's nothing to point a satellite at, right? And we've actually solved a version of this before, just not with AI. Back in the Cold War, the US and the Soviet Union signed nuclear treaties where they literally let each other's inspectors walk into bunkers and confirm missiles were gone. That's the famous trust but verify. Anthropic even name checks one of those exact treaties. But their gut punch is the fact that those treaties took decades to build up and the trust and the systems that you know it took to pull off. Like I said, it was just it took a long time. And what they said is we don't have that long. So, you know, getting a little existential here, but if you ask me, their promise to slow down is sincere and convenient. That's the way I feel at least. I mean, obviously they want good PR, but I think they genuinely mean it because if you think about how the super small group of people have so much power over the future right now, it's pretty scary. But they also know the one thing that would actually force them to do it doesn't yet exist and might not for a while. So after all that, I actually want to bring it back down to earth because the report does too, and it's kind of my favorite part.
Basically like if the doing part, if the actual building, the typing, the grunt work is basically becoming free, then the thing that's actually valuable just shifts, you know, it becomes your judgment, your taste, you know, your expertise, knowing which problem is even worth pointing the AI at in the first place. So, Anthropic says this themselves once again that the one thing humans are still better at is seeing the bigger picture and thinking beyond the confines of the immediate task at hand. So that's the muscle that you want to be building right now because the real danger here isn't the doomsday stuff. At least I really don't think so. I think that it's using this like a search box while the person next to you figures out how to use it like an entire team. So is AGI here? I think by the definition that actually matters to me. The one where you hand a general AI model a hard problem and it just goes and solves it. Yeah, I think that that's AGI and I think it's here. It showed up pretty quietly. None of us really got to vote on the definition of AGI and the company that built it is the one telling us to be careful with it. So, the worst thing we could do right now is look away and pretend it's still science fiction because it's not. So, anyways, that's what I wanted to talk about today. If you guys are really interested in this kind of stuff and you want to keep up with all of this type of discussion, then definitely check out my free Skool community. The link for that is down in the description. You can come ask questions, hang out. We've got about 400,000 people who are building with AI every single day, building businesses, doing all kinds of cool stuff. But anyways, that's going to do it for this one. So, if you guys enjoyed the video, you learned something new, please give it a like. It helps me out a ton.
