Claude Fable 5 Made This Entire Video By Itself.
Chapters
The creator reveals that the video is AI generated, with a cloned voice, AI avatar, and all script and edits produced by Claude.
Claude Fable 5 can generate an entire YouTube video—from script to avatar to editing—without a single human touch.
Summary
Nate Herk showcases Claude Fable 5, a Mythos class model from Anthropic, and demonstrates how it can autonomously produce a YouTube video. The clip explains that the avatar, voice, and script were all created by AI, with Nate supplying only a single goal prompt. Fable 5 sits in the tier above Opus, a tier previously restricted to vetted security partners. The video cites striking results from the announcement: Stripe’s claim that Fable 5 compressed months of engineering into days, and a full migration of a 50-million-line Ruby codebase completed in a single day. Vision improvements let the model rebuild a web app’s source code from screenshots and beat Pokémon Fire Red using raw screenshots alone. A core feature highlighted is long horizon focus across millions of tokens, demonstrated with a file-based memory that Claude used while playing Slay the Spire. In the production pipeline, Claude drafts the script, ElevenLabs generates audio from a clone of Nate’s voice, HeyGen renders the avatar with its Avatar 5 model, and Claude edits with ffmpeg, word-level transcription, and HTML motion graphics animated with GSAP inside Hyperframes. The result is a fully edited YouTube video produced from one prompt, with the model visually reviewing rendered frames and re-rendering anything that looked off. Finally, Nate notes the cost (about 40% of a $200-a-month plan in one hour), caveats about reproducing the result from the prompt alone, and his view that the style could be replicated with a smaller model such as Sonnet.
Key Takeaways
- Claude Fable 5 is a Mythos class model available on paid plans, expanding access beyond vetted security partners.
- Stripe claimed Fable 5 compressed months of engineering into days, and the announcement describes a full migration of a 50-million-line Ruby codebase completed in one day.
- Vision improvements allow rebuilding a web app from screenshots and even completing a Pokémon Fire Red run from images alone.
- Claude’s long horizon focus uses file-based memory, enabling consistent performance across millions of tokens.
- The production pipeline can run from a single goal prompt to a finished video using script drafting, voice cloning (ElevenLabs), avatar rendering (HeyGen’s Avatar 5 model), and automated editing with ffmpeg and Hyperframes.
- Fable 5 is priced at $10 per million input tokens and $50 per million output tokens ($0.01 and $0.05 per 1,000), and the one-hour session used about 40% of a $200-a-month plan, so long runs need a budget.
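To make the pricing concrete, here is a minimal sketch of the arithmetic using the per-million-token prices quoted in the transcript ($10 input, $50 output). The function name and the input/output split are illustrative assumptions: the video reports only a total of roughly 380,000 to 400,000 tokens, the run was on a subscription plan rather than pay-as-you-go billing, and the subagents were not Fable, so this is not the session's actual cost.

```python
# Per-million-token prices quoted in the video (USD).
INPUT_PRICE_PER_MILLION = 10.0
OUTPUT_PRICE_PER_MILLION = 50.0


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the pay-as-you-go cost in USD for the given token counts."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )


# Hypothetical split of a ~380,000-token session; the video does not say
# how many of those tokens were input versus output.
example_cost = api_cost_usd(input_tokens=300_000, output_tokens=80_000)
print(f"${example_cost:.2f}")
```

Because output tokens cost five times as much as input tokens, the split matters more than the total: the same 380,000 tokens would cost more if a larger share were output.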
Who Is This For?
Essential viewing for AI enthusiasts and content creators curious about autonomous video production. It shows what next-gen models can do for scriptwriting, voice cloning, and automated editing all in one workflow.
Notable Quotes
"What you're watching right now was not filmed. This avatar is AI. The voice you're hearing is a clone of mine."
—Opening claim that the video is entirely AI-generated, establishing the premise.
"Every single word of this script was written by Claude."
—Emphasizes Claude Fable 5’s role in writing the content.
"One prompt went in and a finished, fully edited YouTube video came out the other side."
—Highlights the end-to-end automation capability showcased.
"This thing stays locked in across millions of tokens."
—Describes the model’s long horizon focus and memory strategy.
"I could definitely replicate this style now that I've already built it out once."
—Hints at replicability and potential for broader usage beyond Fable.
Questions This Video Answers
- How does Claude Fable 5 keep context across long sequences of tasks?
- What is a Mythos class model and how is it different from Opus?
- Can you actually produce a video from start to finish with AI alone, and what are the costs involved?
Claude Fable 5, Anthropic, Mythos class model, Vision (AI), Avatar 5, ElevenLabs, ffmpeg, Hyperframes, Playwright, GSAP scripting
Full Transcript
So, I literally just opened up Claude Fable, gave it this /goal, went down to the gym, and came back to this. What you're watching right now was not filmed. This avatar is AI. The voice you're hearing is a clone of mine. And every single word of this script was written by Claude. I didn't write this. I didn't film it. I didn't edit it. And while it was being made, I never saw a single frame of it. I just typed one prompt into Claude Code and walked away. And everything else, the research, the script, the voice, the avatar, the motion graphics, all of it happened on its own.
So, this week, Anthropic released Claude Fable 5, and that's basically the only reason this video can exist. It's the first time a Mythos class model, that's the tier above Opus, has been available to anyone on a paid plan. Until now, that tier was locked to vetted security partners, and it's state-of-the-art on nearly every benchmark they tested. So, let me show you guys what this thing is actually good at and then exactly how it made this video. So, the coding numbers first because they're kind of nuts. Stripe said Fable 5 compressed months of engineering into days.
And in the announcement, there's a 50 million line Ruby codebase where it ran a full migration in a single day. A job that would have taken a whole team over 2 months by hand. And Vision took a big jump, too. It can rebuild a web app's source code just from screenshots. And it actually beat Pokémon Fire Red start to finish on raw screenshots alone. No maps, no navigation aids, where older Claude models needed a whole helper harness just to play. But the one that matters most for this video is long horizon focus. This thing stays locked in across millions of tokens.
Anthropic gave it a file-based memory, like literally just files it could write notes to, and had it play Slay the Spire. And it reached the final act three times more often than Opus 4.8. Now, it's not cheap. 10 bucks per million input tokens, 50 on the output. But you guys are about to see what that buys you. Okay, so real quick, how did this video actually get made? First, the script. Claude read Anthropic's full announcement, fact-checked every claim you just heard, and wrote this entire thing in my voice using a voice playbook built off my actual transcripts.
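The file-based memory described above ("literally just files it could write notes to") can be pictured with a minimal sketch. The class name, file name, and note contents here are all hypothetical. The video does not show how Anthropic implemented it, so this only illustrates the general idea of notes that outlive the model's context window.

```python
from pathlib import Path
import tempfile


class NotesMemory:
    """Append-only notes file an agent can reread after its context resets."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def write_note(self, note: str) -> None:
        # Appending keeps earlier notes intact across sessions.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(note.strip() + "\n")

    def read_notes(self) -> list[str]:
        if not self.path.exists():
            return []
        return self.path.read_text(encoding="utf-8").splitlines()


# Hypothetical notes; a temporary directory stands in for the agent's workspace.
notes_dir = Path(tempfile.mkdtemp())
memory = NotesMemory(notes_dir / "notes.md")
memory.write_note("Act 1 boss: save block cards for its charged attack.")
memory.write_note("Skip elite fights when below half health.")

# A fresh object pointed at the same file sees everything written so far.
restored = NotesMemory(notes_dir / "notes.md").read_notes()
print(restored)
```

The point of the design is that the notes live on disk rather than in the prompt, so they remain available however long the run gets.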
Then the voice. It sent that script over to ElevenLabs, where I've got a voice clone trained on my real videos. And the trick is, you can't just generate like four straight minutes of audio, because the longer a generation runs, the more the voice starts to drift. So Claude split the script into chunks, just under a minute each, and generated them separately. Then every chunk went to HeyGen to render on my avatar on the Avatar 5 model, their newest motion engine. And for a while, you couldn't even select Avatar 5 through the API. So the workaround was Claude literally driving a browser with Playwright and flipping every video by hand.
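The chunking step described above (splitting the script into pieces of just under a minute so the cloned voice does not drift) could look roughly like the sketch below. The speaking rate, the 55-second limit, and the function name are assumptions for illustration. The video does not show the actual splitting logic, and no ElevenLabs call is made here.

```python
import re

WORDS_PER_MINUTE = 150  # assumed speaking rate, not stated in the video
MAX_CHUNK_SECONDS = 55  # "just under a minute"; the exact limit is an assumption


def split_script(script: str,
                 max_seconds: float = MAX_CHUNK_SECONDS,
                 wpm: float = WORDS_PER_MINUTE) -> list[str]:
    """Group whole sentences into chunks whose estimated duration fits the limit."""
    max_words = int(max_seconds * wpm / 60)
    # Split after sentence-ending punctuation so no sentence is cut in half.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_words + n > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
    if current:
        chunks.append(" ".join(current))
    return chunks


demo_script = "One two three. " * 100  # 300 words of filler text
chunks = split_script(demo_script)
print(len(chunks), [len(c.split()) for c in chunks])
```

A single sentence longer than the word budget still becomes its own oversized chunk. Splitting on sentence boundaries keeps each generated clip starting and ending at a natural pause.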
Their new API finally exposes it. So this one went straight through. But at that point, it's just a pile of raw avatar clips and nothing's been edited yet. And then the editing, which is usually the part that takes a human days. Claude stitched the avatar clips together with ffmpeg, ran a word-level transcription, and built every motion graphic in this video as actual code, HTML animated with GSAP inside Hyperframes, timed to the exact words I'm saying. Then it checked its own work. It rendered out frames from every scene and visually reviewed them. And anything that looked off got fixed and re-rendered until it all passed.
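Two of the editing steps described above can be sketched in a few lines: joining the avatar clips with ffmpeg's concat demuxer, and using word-level timestamps to find when a graphic should appear. The file names, the transcript format (dicts with `word`, `start`, `end`), and the timestamps are assumptions for illustration. The video does not show the actual commands or data Claude used, and this sketch only builds the ffmpeg arguments without running them.

```python
import re
from typing import Optional


def concat_list(clip_paths: list[str]) -> str:
    """Build the list file read by ffmpeg's concat demuxer (simple paths only)."""
    return "".join(f"file '{p}'\n" for p in clip_paths)


def concat_command(list_file: str, output: str) -> list[str]:
    """ffmpeg arguments that join the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


def _norm(token: str) -> str:
    # Lowercase and drop punctuation so "in." matches "in".
    return re.sub(r"[^\w']", "", token.lower())


def find_cue(words: list[dict], phrase: str) -> Optional[float]:
    """Return the start time (seconds) of the first occurrence of phrase."""
    target = [_norm(t) for t in phrase.split()]
    if not target:
        return None
    spoken = [_norm(w["word"]) for w in words]
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            return words[i]["start"]
    return None


# Hypothetical word-level transcript with made-up timestamps.
words = [
    {"word": "One", "start": 0.00, "end": 0.21},
    {"word": "prompt", "start": 0.21, "end": 0.60},
    {"word": "went", "start": 0.60, "end": 0.82},
    {"word": "in.", "start": 0.82, "end": 1.05},
]
print(concat_list(["chunk_01.mp4", "chunk_02.mp4"]))
print(find_cue(words, "went in"))
```

Stream copy (`-c copy`) only works when every clip shares the same codec settings, which is plausible for clips rendered by the same avatar model but is not confirmed in the video. The cue time returned by `find_cue` is what an animation's start offset would be set to.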
So, one prompt went in and a finished, fully edited YouTube video came out the other side. That's what a Mythos class model does the same week it comes out. But anyways, that's going to do it for this one. So, if you guys enjoyed the video or learned something new, please give it a like. It definitely helps me out a ton. And as always, I appreciate you guys making it to the end of the video. I'll see you on the next one. Thanks, everyone. I mean, isn't that amazing? Even all those sound effects, everything in there, one shot by Claude Fable 5.
Now, two quick things to keep in mind. First of all, if you copied that exact same prompt, I'm not convinced you would get the exact same results because I've got a few different, like, Hyperframes skills that are already in there. And then number two, I don't think you actually need Fable to do all this. I could definitely replicate this style now that I've already built it out once. I could build a skill around it, but I think that I could replicate that style with probably even Sonnet. This is the actual session that I ran. This only took an hour, as you can see.
And I used /goal. So the goal was achieved in an hour. It took about 400,000 tokens, 380,000 tokens. But keep in mind, I did have it spin up a dynamic workflow at the end to verify everything. So, it had a bunch of agents taking screenshots and verifying everything. Even those sound effects, all of the sound effects in that final render were built right in here with Claude Fable and Hyperframes. You'll also notice that I was on max. So, obviously, there was a lot of energy being put into here. But when it spun up the subagents, all of those subagents in the workflow were not Fable.
But, you do seriously have to be careful. This was obviously me doing an experiment, and I just wanted to see what it could do. This ate up about 40% of my $200 a month plan. So, in 1 hour it ate up almost half of the plan. So, obviously be careful. You can see here it says done. The video is ready to upload. This is where it lives. This is how long it is. Here's what I built. Here's how it was verified. And it has this weird thing where I'm trying to scroll up to show you guys the prompt, but it like cut off.
So, let's see if I can recover that. Okay. So, here is the exact goal prompt that I set. I'm not going to read this entire thing, but you guys can pause the video and read it if you want. You'll notice here at the end what I did is I gave it context. I said, "You should only stop when you are 100% confident that this is a high quality video. This will be going out to my YouTube channel. So, if it doesn't look good, you know, it's high risk. It will damage my reputation." Now, obviously, like with /goal, you want to do things that are pretty objective.
But I have found that when I give it context so it understands why we're doing something, it tends to understand a little bit better. I also said here, now after you build it, verify it. Use a dynamic workflow to visually verify and validate that the entire video is perfect. The motion graphics come in on time. There's nothing out of bounds. Everything is aesthetic, and everything fits within the goal of a completely finished and fully vetted and reviewed YouTube video. So anyways, this was my Glaido word vomit into Claude /goal and then that's what we got.
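The verify step in the prompt above, combined with the earlier description of frames being reviewed and anything off being fixed and re-rendered until it passed, amounts to a review-and-fix loop. A minimal sketch follows. The function name, the `review` and `fix` callables, and the scene data are placeholders: in the video the reviewing was done by agents looking at rendered screenshots, which this sketch does not attempt to reproduce.

```python
from typing import Callable


def verify_until_pass(scenes: list[str],
                      review: Callable[[str], list[str]],
                      fix: Callable[[str, list[str]], None],
                      max_rounds: int = 5) -> bool:
    """Review every scene, fix the ones with issues, and repeat until clean.

    Returns False if issues are still found on the final review round.
    """
    for _ in range(max_rounds):
        failing: dict[str, list[str]] = {}
        for scene in scenes:
            issues = review(scene)
            if issues:
                failing[scene] = issues
        if not failing:
            return True
        for scene, issues in failing.items():
            fix(scene, issues)
    return False


# Hypothetical stand-ins: a dict of known problems plays the role of the
# visual review, and "fixing" a scene simply clears its problem list.
problems = {"intro": ["caption out of bounds"], "outro": []}


def review(scene: str) -> list[str]:
    return list(problems[scene])


def fix(scene: str, issues: list[str]) -> None:
    problems[scene].clear()


passed = verify_until_pass(["intro", "outro"], review, fix)
print(passed)
```

The `max_rounds` cap is a design choice added here so that a scene that never passes cannot loop forever, which matters when every round spends tokens.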
So that is going to do it for today. You'll even notice that my HeyGen ended the video the exact same way I always do, which is if you enjoyed the video or you learned something new, please give it a like. It definitely helps me out a ton. And as always, I appreciate you guys making it to the end of the video. I'll see you on the next one. Thanks guys.