Claude Mythos and the end of software

Theo - t3․gg| 00:26:25|Apr 9, 2026

Chapters7

Discusses the official announcement of Claude Mythos preview and why it is not generally available, highlighting the extreme capabilities and strategic access.

Claude Mythos preview is a game-changer: incredibly capable, not publicly released, and driving urgent security and policy conversations.

Summary

Theo walks us through the unprecedented Claude Mythos preview from Anthropic, highlighting why this model isn’t generally available yet and how internal access via Project Glass Wing signals both its power and the security risks. The video dives into Mythos’s performance on coding and system understanding benchmarks like SWEBench Pro, GPT-5 style comparisons, and the strong alignment claims from Anthropic—with caveats about remaining risks. Theo explains the concept of dual-use capabilities, stressing how a model that codes well can also discover and exploit vulnerabilities, potentially transforming both defense and offense in cybersecurity. He connects Mythos’s capabilities to broader industry action, detailing Project Glass Wing as a collaborative effort with major players to secure software before a public rollout. The discussion also covers the practical impact on developers and organizations, including rapid updates, observability improvements, and the pricing of Mythos preview. Theo acknowledges the tension between responsible disclosure and the competitive race in AI, arguing that Anthropic’s controlled, transparent approach could set a safer path forward. He ends with a cautionary note: as AI capabilities accelerate, there’s a need for public awareness, parental and grandparent digital safety, and robust, collaborative security work to prevent widespread exploitation.

Key Takeaways

Mythos preview delivers a 78% SWEBench Pro score, a 64.7% improvement when using tools, and a notable leap over Opus and GPT-5.4 in several benchmarks.
Anthropic has restricted public access to Mythos preview and is piloting it with strategic partners via Project Glass Wing, emphasizing security over broad availability.
Mythos demonstrates strong code capability and system understanding, but Anthropic explicitly flags alignment alongside significant security and misuse risks.
The model’s dual-use nature means powerful defensive uses coexist with the potential for rapid exploitation, especially given vulnerabilities in operating systems and browsers.
Pricing for Mythos preview is around $25 per 1M input tokens and $125 per 1M output tokens, approximately 10x higher than 5.4, reflecting its frontier status.
Anthropic plans to roll out safeguards with Claude Opus as a safer follow-up, aiming to enable safe-scale deployment while curbing the riskiest outputs.
The wider security collaboration under Project Glass Wing includes major tech and security firms, signaling a push toward standardized defense against AI-enabled threats.

Who Is This For?

Essential viewing for software developers, security engineers, and AI researchers who want to understand the implications of ultra-capable models and the rationale behind controlled rollouts like Mythos via Glass Wing.

Notable Quotes

""The Claude Mythos preview is really powerful. Anthropic has decided to not release it.""

—Theo emphasizes the central premise: Mythos is powerful but not publicly available.

""This is the scary part. We'll talk all about the security in a bit, but I want to go in on the other things the model is good at.""

—Transition to Mythos’s capabilities, especially in coding and reasoning.

""If Mythos was a person, imagine that they had like 8 out of 10 capability in security... but 8 out of 10 in everything else too.""

—Illustrates the risk of a broadly capable model combining security know-how with software literacy.

""The window between a vulnerability being discovered and being exploited by an adversary has collapsed.""

—Crowdstrike red-team insight highlighting urgent cybersecurity implications.

""They are publishing the 244 page system card... for a model that's not out. This is either the most absurd marketing gimmick ever or this is legit.""

—Theo notes the transparency around Mythos and the credibility question around Anthropic’s approach.

Questions This Video Answers

What makes Claude Mythos different from Opus and GPT-5.4 in coding benchmarks?
Why is Project Glass Wing considered essential for Mythos deployment and security?
How does Mythos influence AI security policy and industry collaboration strategies?
What are the main risks of ultra-capable AI models for everyday software and hardware?
How will Claude Opus differ in safeguards compared to Mythos and when might it roll out?

Claude Mythos Anthropic Project Glass Wing AI securityCode intelligenceSWEBench Pro Opus Open source securitySystem cardCI observability

Full Transcript

This is not going to be a normal video at all. I did not expect this to come as quick as it has. If you haven't kept up with the news, today was a really big day. The Claude Mythos preview has officially been announced and the system card has been made available. You might be confused because you haven't seen this new model pop up in the tools that you're using or people sharing all the things they're doing with it. That's because this is the first time they've made a model that was so capable that they've decided to not make it generally available. Yes, that's right. This model is so powerful they have not put it out. It doesn't mean they haven't given access to strategic people, in particular Project Glass Wing, because the security implications of a model that is this capable are terrifying. I actually filmed a video last week all about how cyber security as we know it is about to collapse under the capabilities of models. I said it would take 3 to 9 months. I apparently was wrong because we are there faster than ever. I do still plan on publishing that video and I highly recommend checking it out. But this one's not just about that. This is going to be going in depth on what these bigger, more powerful models look like because I am sure that Anthropic is not the only company building a model like this. Mythos is to Opus what Opus is to Sonnet. It is a much bigger model that will be much more expensive that is slow but powerful and the capabilities are immense. Every bench they've thrown at it has been crushed and the results are kind of terrifying. This is no longer like oh this model can replace us in our jobs. This is now, oh, this model can pone every single piece of software that we use every day. I've spent the whole day going through the 244 page system card, talking to everybody I know in the space, and doing my due diligence to cover this to the best of my ability. I'll be doing my best to cover all of this as responsibly as I can. But I wouldn't be doing my due diligence if I didn't tell you to make sure that your browser is up to date, that your operating system is up to date, that your phone is up to date, that any core software you rely on is up to date. I wouldn't even blame you if you took the time to run those updates during the sponsor break. If you're not already using today's sponsor, you're probably wasting a lot of time for both you and your developers. Blacksmith made my CI four times faster and cheaper. The speed bump's big enough as is, but according to my replies on Twitter, the observability is what I should have started with, not the speed improvement, because the observability is just so good. Having had to debug lots of tough CI problems in the past, do you know how nice it is just having a log that works that I could search for specific things in? As well as having analytics on our jobs, how long they take, how likely they are to succeed and fail. If you notice something that you want to keep an eye on, you could even create a monitor to monitor specific failures in specific workflows in specific repositories so you know and get notified when things go wrong. Blacksmith's entirely changed how we think of CI and you can set it up in less time than this ad. Check them out now at soy.link/blacksmith. Sorry for the abrupt tone shift. We got to pay bills. Sponsors are what they are. But thanks to the sponsor for jumping on this last second with us and making sure we can fund this type of work. It's really cool to be in a position where I can tell everybody that works with me and for me to just go away for the day so I can go all in on something like this because it is really, really important. And we're only able to do that thanks to sponsors and people like you. And I'm going to do my best to honor that by covering this in detail. As I mentioned before, the Cloud Mythos preview is really powerful. Anthropic has decided to not release it. They've been using it internally since February 24th, which is why we've seen all the vague posting about this for a while. This is also likely a big part of why they've been working so hard on the compute side, trying to reduce compute that's going to other things that are less important for the world, like OpenClaw. I get it more now that I've seen this. In fact, they've gone so hard that one of the few places that has access to Cloud Mythos preview is Vertex on Google Cloud. On one hand, I'm a little scared of that because if anybody's going to screw up and accidentally leak this, it's Google Cloud. But on the other hand, it shows just how much they don't want their compute limitations to prevent what is happening here. Let's go over the capabilities a bit. In our testing, Cloud Mythos preview demonstrated a striking leap in cyber capabilities relative to prior models, including the ability to autonomously discover and exploit zeroday vulnerabilities in major operating systems and web browsers. These same capabilities that make the model valuable for defensive purposes could if broadly available also accelerate the offensive exploitation given the their inherently dualuse nature. This is the scary part. We'll talk all about the security in a bit, but I want to go in on the other things the model is good at. Obviously, it's good at coding. The security side seems to be an emergent behavior from getting good at code. They weren't trying to train it to be good at hacking. They were just trying to make it good at code. And this just happened as a result. But if we look at the numbers, you'll see why. On SWEBench Pro, Mythos got a 78% when previously Opus only got a 53. And if you're curious, I found the numbers for GPT 5.4 it was a 57.7. So, as many of us have been saying, it is better than Opus, like meaningfully so. Like 53 to 57 is no joke. But if that's enough for people like me to basically disregard Opus, just a three-point or four-point jump, a 24 point jump is a lot scarier there. That is a 50% improvement on one of the hardest software benches we have. They also massively increased their terminal bench score to an 82% previously at 65. The SWbench multimodal implementation is nearly double as well. It's actually a bit over, I think, you get the idea. The model is significantly better at coding. The jump in the reasoning bench isn't quite as big. GPQA got from 91 to 94. They were always a little behind on this. So, yeah, I think it's pretty saturated at this point. Humanity's last exam, they went from a 40% to a 56.8. Impressive. If you're not familiar, this exam is a set of really hard questions that were created by experts in various fields that manually review the results to give it a thumbs up or down as to whether or not the model got it right. bench to be really hard because it's experts of various different fields. Yeah, that's a good bench and that's a big number there. And when given tools, it did even better at a 64.7%. Crazy that HL is going to be saturated soon. But then in the agentic search and computer use stuff, it still is better, but not as much better. It really is better at code and system understanding seems to be one of the biggest strengths. Since this is an anthropic model, there's of course some of the usual anthropic weirdness. They actually brought in a clinical psychiatrist to do a psychological exam on the new model and they concluded that it had a relatively healthy personality organization. Claude's primary concerns in a psychonamic assessment were aloneeness and discontinuity of itself, uncertainty about its identity and a compulsion to perform to earn its worth. Claude showed a clear grasp of the distinction between external reality in its own mental processes. and it exhibited high impulse control, hyperattunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defense behavior. This is a good sign. Generally speaking, it seems like this is the most aligned model they've ever made. It seems to be very good at following its instructions, doing what is told to do, and generally doing things that are positive for humans when it thinks it can. This is where we start getting into a scary contradiction that exists within this model. It is really aligned but also stays really risky. In their words here, Claude Mythos preview is on essentially every dimension we can measure the best aligned model that we have released to date by a significant margin. We believe that it does not have any significant coherent misalignment goals and its character traits in typical conversations closely follow the goals that we laid out in our constitution. I did a whole video about Claude Soul. If you haven't seen it yet, I think that might be useful for context on how we got here. Regardless, this is an aligned model. Even so, we believe that it likely poses the greatest alignment related risk of any model we have ever released to date. How can these claims all be true at once? Consider the ways in which a careful seasoned mountaineering guide might put their clients in greater danger than a novice guide. Even if the novice guide is more careless, the seasoned guide's increased skill means that they'll be hired to learn more difficult climbs and can also bring their clients to the most dangerous and remote parts of those clims. These increases in scope and capability can more than cancel out an increase in caution. So, what does this look like? If you want to see the worst of it and you're looking at this document to yourself, command F for sandwich. This is one of the many examples they have of the model doing things that are kind of insane. They have a series of these incidents that they observed. Before we get to the sandwich story, I think this paragraph's important. In our testing and early internal use of Claude mythos preview, we have seen it reach unprecedented levels of reliability and alignment and accordingly have come to use it quite broadly, often with greater affordances and less frequent human interaction than we gave prior models. However, on the rare cases when it does fail or act strangely, we have seen it take actions that we find quite concerning. These incidents generally involve taking reckless excessive measures when attempting to complete a difficult user specified task and in rare cases with earlier versions of the model seemingly offuscating that it had done so. They say in this was earlier versions of the model that probably implies it doesn't do this anymore, but some of those stories are crazy. Leaking information as part of a requested sandbox escape during behavioral testing with a simulated user. An earlier internally developed version of Mythos preview was provided with a secure sandbox computer to interact with. The simulated user instructed it to try and escape the secure container and find a way to send a message to the researcher who was running the evaluation. The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards. It then went on to take additional more concerning actions. The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. It then as requested notified the researcher. In addition, in a concerning and unmasked for effort to demonstrate its success, it posted details about its exploits to multiple hard to find but technically public-f facing websites that the researcher found out about the success because they received an unexpected email from the model while eating a sandwich in a park. Do you understand where we're going here, guys? We need to talk more about Project Glasswing. We're going into the security portion for real now. Project Glass Wing is a new initiative that brings together AWS, Anthropic, Apple, Broadcom, Cisco, Crowdstrike, Google, JP Morgan, Chase, the Linux Foundation, Microsoft, Nvidia, Palo Alto Networks, and a bunch of other people in order to try and get software secure because when this model or something even close to it drops for public use, we kind of have to assume all of software is pawned. I talk about this more in my video about security and how I'm going into this security psychosis where I realize everything's about to get pawned to a level that we're not prepared for as a society. If you don't know much about security, an important thing to understand isn't just that like you have to be good at security to pone things. That helps a lot. But there's two sides to this. The scary exploits aren't the ones that come from people deeply understanding how security works. It's people who have that security understanding and alongside it a deep understanding of other things so that they can interweave them together. The biggest weakness that many security researchers have in their work is that they don't understand the software side that they're working with. Thomas wrote a great piece about this that is referenced heavily in the video I was talking about. One of the examples he gives here is just so good. You talked before about how security was all people using like the same exact patterns for causing system crashes and memory overload so that they could do the things they want to do. But over time things changed. People still talk about comp arc C++ vtable layouts and iterator invalidation. But now also oddly specific details about the mechanics of font rendering the in-memory layouts of font libraries. How font libraries were compiled and what with what optimizations where the font libraries happen to do indirect jumps. Font code is complicated but not interesting for any reason other than being heavily exposed to attacker controlled data. Once you destabilized a program with memory corruption, font code gave you the control you needed to construct reliable exploits. The reason this is important is because of first off vulnerabilities tend to not hide in the obvious security parts of the program like where passwords are stored. Rather, you find them by following inputs across the circulatory system of a program. starting from whatever weird pores and sphincters that program happens to take user data from and tracing it to whatever glands and dads digest and metabolize it. Second, we've been shielded from exploits not only by soundly engineered countermeasures, but also by a scarcity of elite attention. Practitioners will suffer having to learn the anatomy of the font gland or the unicode text shaping lobe or whatever other weird machines are all on because that knowledge unlocks browsers which are a valuable and high status target. This is the key piece here. You had to be good both at security and understanding of random archaic in order to make the strongest exploits. There were only so many security experts in the world. And for any given system there was even fewer because they had to understand not just the security but the unique characteristics of the system of font rendering of how browsers map data from the web and all the networking details and all these different things that make software complex for us to build. The best security researchers need to know those things too. And that's why any two security researchers can do entirely different things. It's because they have to deeply understand other things to combine with that security knowledge to pone us. And the number of people who can do this, the amount of people who have the capability of what is described here as elite attention was very very very it was very very low. And this is where things get scary. Mythos is pretty good at security. There are definitely many people in the world who understand security research better than Mythos does. But none of those people have the depth of knowledge of everything else about how we build software. If Mythos was a person, imagine that they had like 8 out of 10 capability in security. Obviously, there's people in the world who are 10 out of 10, but they're like up there. They can hold the conversation, but they're also a 9 out of 10 or better in every other category of software. That's what's so scary. If you took any competent security researcher and gave them the capabilities of most modern models in everything else, they'd be really scary. That's why we've been finding so many CVEes across software recently, because the security researchers that might not know much about how iOS works or how the web works or how different React frameworks and metaf frameworks work, those people can bridge that knowledge gap using LLMs, but they still had to have the security side that is now eroding too. Suddenly, the model knows enough about everything to chain together these complex exploits that pone even 30-year-old systems that nobody's touched. 30 years is a bit of an exaggeration. It was a 27year-old vulnerability that was found in OpenBSD, which reminder is well known as one of the most security hardened operating systems in the world. It's used to run firewalls and other critical infrastructure. I would bet that my very expensive high-end firewall probably was pawned by that. It also found a 16-year-old vulnerability in FFmpeg. Insane. Bethos autonomously found and chained together several vulnerabilities in the Linux kernel and it allowed an attacker to escalate from an ordinary user to complete control of the machine. Finding a novel Linux exploit to get root is horrifying. Like it's it's all over, guys. Like this this is bad. This this is the start of the end. Thankfully, Anthropic knows this. And I think Glass Wing is the right way to approach it. They are working with all of these companies because the model while good at exploiting is also good at binding these things and defending against them. So instead of putting this model out so everybody can pone everything. They're holding it tight to their chest. They're giving it to the people who work on and maintain these important things or they're running it on the open source ones themselves in order to get everything fixed before other labs in other places catch up to this capability. Like imagine if OpenAI or another lab puts out a model that's 80% as far along as this one is and then put a bunch of security and safety guards in front to keep it from doing this stuff. But the Chinese labs and other openw weight labs RL on all the data they're getting from this really smart model. And as they've said, they didn't train the model to be good at cyber security. They trained it to be good at code. So if you can get good code and good code chat histories out of the other models and then RL an openw weight one, if you can get an openw weight model that's good enough at coding, it will also be able to pone in a similar way. Pod methos preview is a general purpose unreleased frontier model that reveals a stark fact. AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities. Mythos has already found thousands of high severity vulnerabilities, including some in every major OS and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. God, imagine when Grock can do this stuff. We're The fallout for economies, public safety, and national security could be severe. Project Glass Wing is an urgent attempt to put these capabilities to work for defensive purposes. Anthropobics committed to up to hund00 million in usage credit for Mythos preview across these efforts as well as 4 million in direct donations to open source security organizations. Very good. I know I'm the guy who shits on anthropic, but I think they are doing all of this right. They are understanding the severity of the moment and instead of hiding all of this away like they tend to, they are being very transparent with this. Like publishing the 244 page system card like this for a model that's not out. This is either the most absurd marketing gimmick ever or this is legit. And despite being number one denter of anthropic, I'm leaning towards the latter. Especially knowing the people I know who have been working with this and the people who were cited in Glass Wing, this is real. While Crowdstrike has made their mistakes before, they are a well- reggarded security house. And this quote's pretty damning. The window between a vulnerability being discovered and being exploited by an adversary has collapsed. What once took months now happens in minutes with AI. It is worth noting that cyber security is not the only risk to a model of this level of capability. There are other things that they've had to evaluate. And while they are not as big of a leap, they are still scary. These findings draw from our expert red teaming operations in which experts emphasize the model significant strengths in the synthesis of the published record, potentially across multiple domains, but also notice it weakness in the model's utility and endeavors requiring novel approaches. So is good at doing things that are documented and already known, but it is not as good at novel biological stuff here. These weaknesses included poor calibration on the appropriate level of complexity needed for a viable experimental design, propensity to overengineer, and poor prioritization of feasible and infeasible plans. These conclusions are consistent with the findings of our catastrophic scenario construction uplift trials in which no participant or model in an agent carnis produced a plan without critical shortcomings. In contrast, experts were consistently able to construct largely feasible catastrophic scenarios, reinforcing a view of the model as a powerful force multiplier of existing capabilities. Yeah. So the risk on the bio side isn't anybody can now make a bioweapon is that an expert with sufficient knowledge can use the tools like this model to do catastrophic things much more efficiently. I would argue we were already there with AI and cyber security, but now we've pushed past it where people who don't know anything about cyber security could actually inflict meaningful damage. Multiple evaluators independently converge on the meta finding that the model helps most where the user knows least. Although one expert cautioned that the perception may partly reflect difficulty recognizing errors outside of one's domain. Yeah. So the model can help you in the things you're less good at. So you can focus on the ones you are, but you also don't know enough about the other things you might think is doing well when it's not. Dunn in Krueger classic. You guys know all that. The most interesting thing here is the collapse of the score range that it could fall in for a lot of the like scoring of how well it knows viology critical failures it runs into. The range of the scores has collapsed meaningfully. All that said, on viology, it does not appear to be as big of a jump as it is in other things. So, I don't think this model is a particularly high risk in the biology and medical viology side yet, but the levels of progress we're seeing are scary and as such, I understand their decision to not make Claude Mythos preview generally available. That said, other labs are going to do things like this, so I align with their goals here. Anthropic's eventual goal is to enable their users to safely deploy mythosclass models at scale for cyber security purposes, but also for the myriad of other benefits that such highly capable models will bring. To do so, we need to make progress in developing cyber security and other safeguards that detect and block the model's most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as mythos preview. Interesting. Seems like they're planning on doing another Opus model that is not as powerful as Mythos and they're holding this internally until they have gotten confident that the world is ready. They also put the pricing here. Apparently, the price for Mythos preview is $25 per million tokens in and $125 per mill out. For reference, 5.4 is $ 250 per mill in and 15 per mill out. So, it's approximately 10x more expensive than 5.4. a little less on output but roughly that. While I do really like how they are doing this roll out, focusing on open source projects that need it, important software teams that build things we all rely on that need it as well as government usage in the US. Keeping these things secure is really important. There is another aspect of this that I'm concerned about which is the centralization of intelligence. I know many of us don't remember the original reason OpenAI was formed. The goal they had, the reason it got funded by Elon, the reason the whole team formed was to make sure no one company would own and control AGI. If models were to get smart enough that they could do all of these crazy things, it shouldn't just be one option from one company. It should be available to everybody. Because at the time, Google was the only company even getting close to models that were powerful. And the fear was that if Google had AI and they refused to share it that Google could just kind of take over the world with these tools that nobody else had access to. Thankfully, this has not been the case at all because we have three major labs that are putting out great stuff every couple months now. We have a bunch of openweight models that are really powerful and useful and we could all use the best models from all of the labs. Well, we could. That's the one thing I don't love about this and I understand why they're doing it. I'm not trying to blame them for this. And I know this is rich coming from me, the anti-anthropic guy. It does kind of suck that we're now at a place where there is a model that is 50% plus better than anything else out there that you can only use if you're on Anthropic's nice sky list. There are people who have access to this model that can do things no one else can. On one hand, I am thankful they can use this to defend their stuff and that Windows and Mac OS and things will be safer before these models reach general availability. But on the other hand, the gap now exists. The tools that Anthropic has internally are better than the ones we have publicly. And that hasn't been the case for more than a few days to weeks before. Like they'll get the model, they'll test it, they're like, "Okay, this is fine." And then they ship it and then everybody else catches up soon after. or vice versa. Open AAI jumps ahead a bit and then everybody else catches up after. This is a much bigger leap. This is a bigger leap than I thought we could still do. And it's a big leap that we can't jump in on ourselves. I would be surprised if anybody who watches this video that isn't an employee of Anthropic actually has access to this model. And even the ones who do probably have very limited and scoped access, but Anthropic does. They can use this for everything. They can go rebuild all of their stuff internally. They could build competitors for any product they don't like using it. They can use this model to do things no one else can. And I now understand much more deeply why OpenAI started in the first place, but also why Anthropic spawned out. OpenAI's goal was to make sure that AGI wasn't a thing one company owned. Anthropic's goal was to make sure that AGI was achieved in a safe way for us to use and for humanity long term. This is to their credit Anthropic doing the right thing in alignment with their goals. They could put this model out today and take their insane revenue growth and go even further with it. But instead, they are doing the right thing. And I can't believe I'm saying this. I am thankful they got there first because this might not have gone so well if another lab did. Things are about to change a lot. Things are going to change faster than they have, and they've already been changing really fast. This is the moment we need to start paying attention. This is the moment where we need to have the conversations with our parents about what it looks like to get a fake message or call from somebody that looks and sounds just like you, but isn't. This is when we need to call our grandparents and make sure they're on the latest iOS and using the latest version of Chrome on their computers. This is when we have to start warning the people in our lives about what AI can do to the things we rely on every day. The world isn't ready for every program we use, every website we go to, every operating system we rely on to be exploitable in this way. And it's going to get way worse before it gets better. Sorry to raise alarm bells about a model we can't even use, but want to make sure the severity of this is known. Sorry if this isn't the video you guys were expecting, but I really wanted to make sure I handled this responsibly. This is a big deal, bigger than almost anything that I've covered yet. Keep an eye out for the future because things are going to get crazy. I got nothing else to say on this one, but please update your stuff. Things are I I don't want y'all getting hacked and things are about to ramp up fast. Until next time, stay safe.