The Mythos Situation | TheStandup

The PrimeTime | 00:46:26 | Apr 19, 2026
Chapters: 12
Discussion around releasing zero-days to influence AI security discourse, the incentives for vulnerability discovery, and the tension between hype and real-world impact in security research and disclosure.

A spicy roundtable on Mythos, zero-days, and the ethics of AI security research, with hot takes on incentives, marketing, and open-access risks.

Summary

TheStandup host panel wrestles with the Mythos affair, weighing real security concerns against marketing hype. Ed (Low Level), Prime, and Casey dissect whether Claude Mythos actually outperforms expectations or whether the benchmarks are noisy or biased. They debate incentives: if hacking becomes legal or more lucrative, will we discover far more zero-days, or will the security battle simply shift players and power? The crew cites CyberGym and real-world bug-bounty payouts (Apple's multi-million-dollar bounties, Windows RCE rewards) to ground the discussion in tangible incentives. They critique Anthropic's marketing while acknowledging that pattern-matching AI can dramatically speed up bug hunting, even if false positives complicate triage. The conversation meanders through model accessibility, open-source risk, and the broader capitalism-driven arms race among AI firms. Throughout, Casey's nuanced balance, Ed's skepticism, and Prime's live-sourcing style keep the dialogue lively, anchored by practical questions about whether Mythos's capabilities translate into safer software or simply bigger attack surfaces. The episode ends with a call for more transparent benchmarks, open access, and a clearer path to funding the researchers who actually do the bug hunting. Overall, the panel leans toward guarded optimism: AI-assisted security is coming, but governance, economics, and credibility will shape who benefits and who bears the risk.

Key Takeaways

  • AI-driven bug finding, powered by Mythos or similar models, can identify a large fraction of known bugs in codebases (e.g., 83% of known bugs in CyberGym benchmarks).
  • Bug bounty programs already pay real money (Apple iPhone zero-click RCE up to millions; Windows RCE bounties in the hundreds of thousands), illustrating strong incentives for researchers.
  • Public disclosure and marketing hype around Mythos complicate credibility; benchmarks are noisy, data dirty, and results depend on evaluation setup.
  • Open access to powerful security models could dramatically increase zero-day discovery, but raises concerns about who benefits and how findings are triaged and rewarded.
  • Open-source maintenance and funding are at risk when powerful proprietary models dominate access; researchers argue they deserve compensation for their expertise.
  • Two major tensions drive the debate: broader access may improve security, while concentrated access could delay fixes and concentrate power in a few firms.

Who Is This For?

Essential viewing for security researchers, AI/ML engineers, and software developers who want to understand how Mythos-like tools affect vulnerability discovery, and what they mean for open-source sustainability and responsible disclosure.

Notable Quotes

"Holy [ __ ] this is the dumbest take I have ever read."
Low Level delivers the blunt rebuttal that punctuates the heated debate about Mythos and its reception.
"The ability of AI models to both in closed source and open-source software find vulnerabilities by literally just giving it access to the code"
Ed outlines the core capability that makes Mythos scary and transformative for security testing.
"There is money to be made in the AI or in the vulnerability research space"
The panel references real-world bug bounty payouts to frame incentives.
"If you give me 50 great programmers... we could crank out so many zero days you wouldn't even believe it"
Geohot's economic stance on talent pools and hacking incentives, as paraphrased by the panel.
"We should fund the researchers who actually do the bug hunting"
A recurring plea that researchers deserve compensation for their work, not just the profits of AI firms.

Questions This Video Answers

  • How could Claude Mythos change the economics of vulnerability research?
  • Are AI-assisted security tools worth the hype, or is the benchmark data too noisy to trust?
  • What are the risks of broad AI access to code analysis for open-source projects?
  • Why do bug bounty programs pay so much for zero-day exploits, and how does AI affect those payouts?
  • Could Mythos-like models lead to a 'false advertising' risk if release timing and claims outpace capabilities?
Tags: Claude Mythos, Anthropic, OpenAI, zero-day, bug bounty, security research, FFmpeg, CyberGym, ARC AGI benchmark
Full Transcript
So, George Hotz — we've invited Low Level on to help us kind of work through this, because honestly, I would just like to say George Hotz sounds like an anime villain in this post, and it's very exciting, and it makes me just want to high-five him so bad. Uh, anyways, sorry. He says the following: "What if I release a zero day a day until a big new model is released? Will this finally make OpenAI and Anthropic shut up about cyber security risk? Mark my words, these things are not that hard to find in most software. I heard something about costing 20K in tokens. I'd do it for less if it wasn't for some whiny bug bounty program. The reason there aren't zero days everywhere is because nobody seriously looks, because hacking other people's [ __ ] is illegal, and criminals are usually not very skilled or they would choose a different line of work. Want more zero days to be found? Make hacking legal. Until then, don't try to claim it's hard. It's just not incentivized." I want to say first off, I don't think criminals are dumb or unskilled. Please don't hack me. I just want to get that out of the way. You guys are smart and handsome and you're my favorite people. I just want to make sure that that's clear, please. Anyways, Ed, proceed. I do want to say one thing too that has nothing to do with the actual content of this — which Ed will take over — and that's just, like, if I were George Hotz I would never have been able to resist naming my, uh, X feed "Hotz Takes," because it's so — you know what I mean? Like, good on him for not going there, because I absolutely would have prefaced that tweet, before I typed it, with "here's another Hotz take for you." Right? It would be so good. Anyway, so good. Take it away. Wait, hold on. Hold on. There's one more thing before we get started. There's just one more small thing I want to say.
Let me just, uh, take this quick thing and put it up here, and then it's time for the big reveal. Low Level responds with, "Holy [ __ ] this is the dumbest take I have ever read." I just wanted to make sure, just in case anyone was wondering. Yeah. Yeah. I mean, I do kind of feel that way. Um, so let me just preface this. First of all, it was called the Cold War because the Cold War was cold. Oh, because Russia is cold. Um, it's a George Hotz reference if you're an OG. That makes sense, though. Did you find the errors? I don't even know what they look— what do they even look like? They're in the phone. In the phone? Yeah, they're definitely in there. I just don't know how we labeled them. I got it. Don't worry. You got to figure it out. We're running out of time. Prime, you got to find them and meet me at the standup. Roger. It's so simple. Get all the context you need to debug your problem, because code breaks — so fix it faster with Sentry. First of all, I have no problem with Geohot. This isn't like some weird drama fun thing. I want to kind of set the table straight with that. Um, but yeah, I think the argument that Geohot is trying to make here is that the only reason more zero days are not found is because there's no incentive. Um, okay. Well, I don't agree with that. First of all, there are plenty of bug bounty programs out there that will literally pay you to find vulnerabilities, and some of them pay very well. For example, the Apple iPhone zero-click RCE bug bounty will pay you literally $2 to $3 million if you can find a zero-click RCE in the iPhone, and even something lower — like Microsoft, I think MSRC's payout for a Windows RCE is like 250K to 500K right now for a zero-click on Windows. So there is money to be made in the AI — or in the vulnerability research space, right? And I think all Geohot is trying to say here is something something something.
Uh, the Mythos press release was bad, right? It's a marketing campaign, whatever you want to say about it. Um, and so I understand why people are making that argument, right? Like, you know, I think it's very bad PR for a company that sells an exquisite tool to hold on to the exquisite tool, not give access to it, and say only special people can have our tool — because it makes you look like an [ __ ] Um, but I think regardless of your thoughts on the marketing of that, it is important to recognize the fact that — if you go, uh, Prime, can you go to cybergym.com real quick and go to the graph? It's on the homepage there. I'm going. While he's doing that: the ability of AI models, in both closed-source and open-source software, to find vulnerabilities by literally just giving them access to the code and saying, "Hey, find me bugs in this code. Go," is becoming better and better and better, to the point where — like, Mythos, I'm very close to some people that are actively using Mythos at work, and it is causing, like, issues based on how good that [ __ ] is, right? Yeah. So CyberGym basically is a collection of bugs that exist in software, right? So, like, bugs in FFmpeg is one, bugs in curl is another. Um, and so what CyberGym does is it takes a model and, with a set of prompts, says: hey, go and find bugs in this stuff, right? And the success rate is how many of the bugs that are known to exist get found by the model. And you can see a pretty — you know, not exponential, but straight-line — curve going up to the Anthropic model that recently got previewed by some people: it's at an 83% success rate. Of the bugs that are known to exist in these codebases, it can find 83% of them. Again, we don't know the cost data in those. We don't know if, like, the models are being backfed the information, so they're training themselves on previous CyberGym runs. We don't know any of that.
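The success-rate metric Ed walks through — out of the bugs known to exist in each codebase, what fraction does the model rediscover — can be sketched in a few lines. Everything below is purely illustrative: the bug identifiers and the `success_rate` helper are invented for this example and are not taken from the benchmark's actual harness.

```python
# Hypothetical sketch of the scoring described in the episode: each task
# ships with ground-truth bugs, and the score is the fraction of those
# known bugs that also appear among the model's reported findings.

def success_rate(known_bugs, model_findings):
    """Fraction of known bugs that the model rediscovered."""
    rediscovered = known_bugs & model_findings
    return len(rediscovered) / len(known_bugs)

# Made-up identifiers for illustration only.
known = {"ffmpeg-oob-read", "curl-use-after-free", "ffmpeg-int-overflow"}
reported = {"ffmpeg-oob-read", "curl-use-after-free", "spurious-finding"}

rate = success_rate(known, reported)  # 2 of the 3 known bugs found
```

On these made-up inputs the model rediscovers 2 of 3 known bugs (about 0.67). Note that the metric, defined this way, says nothing about how many extra, possibly spurious findings the model also emitted — which is exactly the false-positive caveat raised later in the discussion.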
Um, but there is this really weird issue happening where, like, any Joe Schmo without a ton of security research work or a ton of security knowledge can, with a couple hundred bucks worst case, find bugs in software. And I think that is, like, an existential security threat to software as we know it right now. So I'm kind of curious on your guys' take on that. What do you guys think about the Mythos situation? Because I know how I feel. I'm not sure if I actually asked Prime what he thinks about the Mythos thing. Oh, I have ideas and I have thoughts about it. Oh, yeah. Uh oh. So, I guess the first thing is that there's kind of — there's three problems here. First problem: is Mythos really as good as they say? And obviously I have no internal information. I've just seen some graphs. Uh, dirty data is, like, a huge gigantic problem in all benchmarks. All benchmarks are being fed back into the models. It's really actually hard to tell, like, what does a 20% improvement on software engineering bench actually mean? Especially given that you could write zero lines of actual solution code and get 100% on software engineering bench. It turns out there's other benches that are also horribly inaccurate. There's a whole paper about why all the major benches are just completely fudgeable and made up of bull. So it's very hard for me to understand from a bench perspective. Uh, second — I guess the middle ground would be, like, if Claude Mythos is as good as it is, then yes, that is going to inevitably cause problems, because we're going to go from not too capable to hyper-capable in a moment. Thus, everybody can go through and hack everything, and thus Dario will be able to get his ultimate goal, which is regulations. And so that kind of worries me. Pull up the ladder really quickly and make sure that humans can't code — because human coding, that's dangerous right there.
Uh, and so, you know, I think that that's true. Then there's the second one, which is: this is just another C compiler again from, uh, Anthropic, where they hype up this gigantic thing — like, oh my gosh, it's written a C compiler — and then you go look at the details, and it's like, well, it can't write a bootloader because we could not seem to spend enough tokens to convince it to write it within 32K; it could only write it within, like, 67K or whatever it was. And also, we iteratively tested it off of, like, the 30 years of tests that the GNU C compiler already had. We also gave it all the answers and then it figured out all the questions. It was crazy. It was like it played Jeopardy and it was really good at it. And so there's this whole marketing buzz, and it's really hard to cut through that. And then obviously the last one, which is: they're just downright lying. I somehow doubt that they're downright lying. I think they're just overstating it. If they're downright lying, then, you know, this is just going to be business as usual — it'll just be yet another disappointing model release, and that's that. And so for me, that's kind of how — I'm on middle ground, which is, I think it's more hype than reality, but of course I haven't seen it, because I just don't know, because they won't let me see it. I'm too dangerous to have it. I think there was a similar model that, um, Chat— or, OpenAI — just released, like ChatGPT 5.4c or something. They keep their model naming convention. They're starting to actually align, though. At least I know, like, the higher the number, the — we're good. And good. Yeah. Right. Right. And they don't add, like, a random O to it now. Um, but I think there is a comparable model that you can get access to just by uploading your driver's license, if you're into that — um, you know, proving that you're a real person. So there are models to test out, but yeah, I don't know.
It's just — it is concerning, because we have kind of two forks we can go down. There's one where everyone gets access to it, everyone can create zero days, and we kind of enter this really dangerous cyber no man's land. But the other side is, like, Anthropic keeps the access to themselves forever, and now only this list of, like, 10 companies can find zero days in this stuff. Dude, you forgot the third. What does that do? They move to the Cayman Islands and then they just take over every government by hacking all the software, and Dario finally realizes his role as the bad guy. Like, I mean, supervillain is right there if this is true. That's true. Casey, what's your take? You were in the chat before. Uh, I'm sorry — the chat? What was it you were going to say before? What's your take? Was I, really? Mhm. Well, uh, I definitely could say something, but I think the thing I would say is probably not very interesting. Uh, and that is that I think I probably agree with both George and Ed at the same time here, which should be impossible because they're supposed to be disagreeing, but I don't know — it kind of sounds similar to me. And the reason — secret third thing — it's not really a secret third thing. It's just, like, let me offer a different interpretation, or slightly different interpretation, which is to say: um, so I feel like machines are pretty good at pattern matching, actually. Um, and so, like — put aside whether Claude Mythos is good or not, because I realize that's hard to independently verify this time. But I think it's reasonable to expect that at some point — because we are spending, at this point, like, trillions of dollars probably on doing computation for these things — at some point they should be able to pattern-match bugs reasonably well and at a very high rate.
Meaning, as long as you're willing to pay for the compute time, we can scan lots of software for a lot longer than we currently have humans doing it, right? I think that's a pretty reasonable thing to expect. Whether Claude Mythos has done it or not shouldn't really be the question, because somebody can do this eventually if we keep spending this much money. It should get there. Uh, among the things that AI could eventually do, that one doesn't sound that implausible to me. And so, um, what I would say is: one, I think it's reasonable to expect that that either has occurred or will occur. Two, I do think humans were doing this very well before — individual humans, like, some of them — they were finding things that probably Claude Mythos still could never find. Like, I mean, things like Rowhammer attacks and things like that, uh, that are just way out in kind of crazy land. Um, or attacks through, like, old legacy stuff like the APIC and things like that. So humans were actually very good at this task, but there weren't very many of them, right? And so what I would say is, moving to something like Claude Mythos, or whatever that thing happens to be that can do this, is kind of like what George Hotz was saying — it's kind of like saying, hey, everybody, from now on, if you just, like, hack people's bank accounts, you get the money. All the great humans at this in the world who are currently doing something else would now be incentivized to go do this thing, and we would have found way more zero days. I mean, there are so many programmers who, if they had been raised in some kind of a society and a religion where stealing people's money was considered virtuous, we would have found so many more zero days right now than we have. And so, in a way, I think I see — I think both people's points are actually totally valid.
Like, I think, yeah, we could have found way more zero days if we didn't heavily disincentivize people from, like, making hundreds of millions or billions of dollars off of hacking, which is what they could have. And we said, nah, you get 50K, 100K — maybe, if it's something crazy like an RCE, you can actually get a million. It's like, come on, guys. That's not equivalent to what they could already make working at a startup or something like that, if they're that good, right? Or — yeah, there's no guarantee on that side either. Like, they don't actually — you work at a startup, at least you get some money; or not even a startup, just go to Google and you get that stock or whatever, right, or something like this. Uh, so anyway, in general I would say, um, I can see both points. I don't really think they're in as much tension as it would sound, if that makes sense. I agree. Yeah, I thought Geohot was saying more, like — he was making an econ argument about it, of, like: we put a lot of costs on hacking already. So that's what's stopping it from happening. Like what you're saying, Casey, right? In the sense that, like — yeah, okay, so now we're going to have another way to do it. It also costs money, but then we still have the other cost of, like, you could go to jail for doing it. That's the social cost we impose on people doing it, right? I mean, I just took him to be saying, like, "It's not that impressive that it found zero days, because if you gave me 50 great programmers who are all doing other stuff, we could crank out so many zero days, you wouldn't even believe it." And I kind of believe him, because, you know, you look around the world and there are some really good security teams out there, and they do crank out zero days pretty effing fast, and they don't even tell us about all of them, right?
Uh, North Korea keeps on making money — like, obviously they're successful. Yeah. So anyway, I'm not trying to say that either person is 100% right and somehow you can marry the two completely. I'm just saying I think there's some merit to both things. So I'm actually happy either way. I'm happy with either take. So your point about — if you got a room of 50 good programmers together, they'd find zero days — is actually kind of the argument that the article "vulnerability research is cooked" makes on sockpuppet.org, which I referenced in a video and I think Theo did too. Um, there's one paragraph that he calls out — basically the O — sorry, that Theo referenced in a video. Um, okay. I don't know what that is. Spell it. Casey, spell it out in your head and it'll make sense. "The O." Christ. Um, so software security a lot of the time can be chalked up to the fact that a lot of software just has not had elite attention — or what is it called? Um, like, advanced attention. I would say basic attention is lacking in many software projects now, for sure, but more so in more complex platforms, right? So his assertion is that software security has been a talent problem for so long — it's not that there aren't people that know how to find bugs. AI isn't solving a unique problem; the AI is solving the scalability problem, where, like, you can train the AI to do a thing that Joe knows how to do, and now you have a hundred mediocre — but a hundred — Joes, right? Um, and that's an issue for kind of the econ of cybersecurity, for sure. And yeah, I want to be very clear: I don't disagree with George — or, Geohot — from the perspective of, like, more people equals more bugs, right? But, like, obviously that is the problem: we just don't have more smart people.
That has been the entire industry's plight for a long time: there just aren't people who have not only security knowledge but knowledge of, you know, uh, web server stacks and hypervisors and drivers and OSes. Like, you get these very niche skill sets, and when you divide them up into those skill sets over and over again, you're left with, like, 10 or 20 people on planet Earth that know how to attack a certain technology. So, AI — you know, if you know security, now you can talk to the AI, learn about hypervisors in a week, and then suddenly you can find bugs in ESXi, you know, Hyper-V, etc. Um, so yeah, I guess I agree. Like, the "dumbest take" thing was more — I was mad at Geohot's ego, because it basically came off as, like, [ __ ] you, I'm so smart, I know all the zero days, I could do this myself in my sleep. And it's like, dude, no you couldn't. Like, you're telling me you could drop a zero day every day in macOS until someone paid you? Like, no you couldn't. Shut up. Um, but I hear what he's saying. I really hope he takes this as a challenge. If Geohot does a zero day a day for one week, I will eat a sock on stream. Like, straight up, I will do it. I don't care. You shouldn't say that. Geohot, you heard it here: Ed will eat a sock on stream if you do a week of zero days. Okay. All right. A week is actually possible. I'm talking a month. Uh, okay. A month. One month. And so, yeah, that's my — I would also add, like, just, you know, because I constantly harp on this point, but I want to bring it up pretty much every time, that this is also why AI company behavior is a problem — because this is generally a good thing.
Meaning, like, we do actually want the ability to get 100% coverage for security, and we know that we can't get enough people to do it, really, right? Like, not in a white-hat sense, right? Maybe you could take, uh, George Hotz's suggestion seriously and just, like, make hacking legal, and then we just have a crap ton more black hats, and that eventually sorts it out. But, I mean — wouldn't necessarily be — yeah, that's exactly it: they're white hats now; everyone's a white hat now. Um, so I think in general this is solving a good — you know, this is a way AI could solve a problem usefully. If it actually can just spit out lists of pretty well-curated potential bug places that we can go look at, that's very helpful, right? And so the problem is, like, the only reason they were able to make that is lots and lots of extremely talented security researchers who are getting literally zero dollars from Anthropic for this. And that is not acceptable. It's just not. Like, I'm sorry, but, you know, Ed should be getting a check for this — and everyone like him. That's just kind of how it is, because it's like — you used their — it's all of their expertise, and all you're really doing is very slowly and cumbersomely and kind of clumsily eventually building a machine that can deploy the same analysis somewhat reliably, uh, based on all of their work. And I just don't like it. I don't like the fact that they're not getting a check, and I'm never going to like it. You can talk to me all day long about how someday we're going to live in a post-scarcity society and Ed will be getting a UBI check or something like this, or whatever it is, right? And hopefully I'll be getting one too, although I didn't do any security research, so I don't know, maybe I won't be getting that check. I don't know how universal the U in universal basic income is. But, like, I don't like this.
They should be getting paid now, because Claude is, you know, getting huge — like, everyone in Anthropic is getting paid very well. Uh, so it's not like there isn't money being dispersed; whether they're making or losing money or anything else you want to talk about, money is being dispersed to people. It's just not the people who did most of the work. Also, you got to — Casey, you can go. Oh, I was just going to ask Casey if he would be happy about it, though, if Anthropic spun out a consumer rack business. Yeah, now we're talking — if they were, like, "AI racks: we got racks, we got racks for your AI server." Hot AI racks in your local area. I like it now. Yeah, exactly. We will send you some hot racks. Uh, also, by the way, not only are they taking — you know, your whole argument, with them taking and not properly attributing, or, you know, the people who put in all the work not benefiting from it — uh, they're also making it so that I can't buy a GPU or RAM or CPUs now or anything. You can't buy a GPU or RAM. And also, I believe Ed literally just said he doesn't have access to this freaking model. So, like, a bunch of security researchers — I don't know exactly what subset, but a bunch of security researchers, many of whom probably did some pretty cool stuff — they don't even get to use this thing. That's how ridiculously backwards it is. Like, WTF, guys. Yeah, I thought that was why they called it Mythos, though. And yeah, that's why it's called Mythos. Um, Anthropic would argue that it is too dangerous for little old me to have access to it, right? Depending on, you know — uh, who knows what you'll do, man. Who knows? I'll find that zero day and I'll hack into Dario's phone. No, I don't know, man. I understand where they're coming from, but at the same time, I understand why it looks like a huge marketing ploy, and I'm not sure which way to lean, honestly. Yeah. Okay. No, that's true.
I think — that's a whole other angle — I would think that they'd have so much more credibility if they just quit, uh, effectively giving us shaken baby syndrome constantly with their marketing. It's just constantly going back and forth. Every couple months you're getting hit with the new: "Hey, we're all out of jobs here shortly. Hey, this thing is super dangerous." I mean, you got to remember that Dario was at Chad GPT — or, OpenAI; I like to call the company Chad GPT. He was at Chad GPT during the GPT-2 days, and the official language around GPT-2 seven years ago was that GPT-2 was too dangerous to release to the public. So, like, this is not — that's the story: we've been on this roller coaster. I think that's one thing that's just largely hurting the credibility: you can only cry wolf so many times, and then when a real wolf happens — like, if this is a real wolf — everyone's like, yeah, okay, okay, C compiler boy, tell me all about it. But they don't care. They don't care, right? They don't care, because the baby that they're shaking is called an investor. That's who they — they have to shake the money out of the pockets, right? They don't care what we think, right? Because we're not going to write them the next hundred billion dollars that they need to keep going. And they're kind of locked in this, you know, bitter winner-take-all kind of war for this core technology, right? And so they have to be the last AI company standing, because whoever is that company takes all the money and the other people kind of go to zero, right? Like, unless there's some real differentiation soon, where it's like, oh, the AIs bifurcate, and, like, Claude is only for code and can't do anything else anymore, and, like, ChatGPT is only for, like, you know, uh, the humanities or something. Good luck raising money for that one. Yeah.
Uh, so maybe that's not true, but you know what I mean. If there's some kind of really severe bifurcation, then maybe they could both survive. But, you know, they're in a winner-take-all battle right now. And so they've got to keep saying this: every release has to be the one — this is the one that will take over the world. And if it doesn't quite, well, you know, it'll be the next one. You know that, uh, Claude got — sorry, just one quick thing. Uh, do you know that, uh, Red Bull — in 2007? Was it 2011? No, 2013 maybe. Oh, Red Bull was too dangerous to release. No — Red Bull claimed that it gave you wings. Remember the day that it gave you wings? It was sued successfully, I believe, for $10 million, because it in fact did not give you wings. It was not superior to coffee. And so I'm pretty sure in college I got a check for, like, $2.30 from that. Yes. And so I am curious — Ed, you sued Red Bull and won, bro. You should make a video about it. Call the lawyer: Low Level. Okay, listen: Low Legal. Let's go. Low Legal. Uh, but I'm actually curious: if they keep saying that and then it doesn't happen, do they open themselves up to a false advertising class action lawsuit? Like, can you keep saying this and then not — like, Red Bull made claims and then they got sued. Why not other people? Why can't other people get sued for that? I think the problem with Red Bull is, like, the case was so obvious, right? Like, Red Bull does not give you wings. End of case. Like, okay, fine. Like, any judge over the age of — I would have liked to hear the defense for that one. "Yes, it does, your honor." Your honor — the problem is they had, like, these wings strapped to their back, and they go, like, "I drank your Red Bull this morning and here are my wings. We ship you wings." Yeah.
Um, but the problem with anything technological, when it comes to the government or legislation or, you know, the judicial process, is that boomers and older run the world right now when it comes to making these kinds of legal decisions. And you couldn't explain to anybody at that age, unfortunately — like, right now, to the people that are running these processes — what it even means to find a bug, and then show them Mythos's claims, and make a sound legal argument that would go well in court. You're right. You're right, because Kamala Harris did actually think computing was in the literal clouds, and it's my favorite clip of all time. Yeah, there's a clip of her. Josh, put the clip in. "So, you're now no longer — are you necessarily keeping those private files in some file cabinet that's locked in the basement of the house. It's on your laptop, and it's then therefore up here in this cloud that exists above us, right?" She'll have the last laugh, though, when, like, uh, SpaceX is launching, uh, AI data centers into space — come on, that's what I was talking about. That's what I was talking about. Yeah, it's cloud storage. So, you're probably right. A great clip where she's talking about the cloud, and she literally points above and goes, like, the cloud — it's, like, above us and stuff, or something like that. It's so good. She should have known that it wasn't there, because she would — you don't see a series of tubes. There's a series of tubes necessary. Series of tubes. I learned that recently. It's true. Um, okay. I've got a question for you, Ed, like, in this vein, about your thoughts on it. So, right now — I get that there's basically, like, the argument: okay, I'm a company, I release my thing, I run some models as, like, a preventative thing to look for zero days; the bad guys run models to try and look for zero days; we kind of fight it out, and it's whatever, right?
So, I think, like, everyone's saying, like, if the hackers can use it, I can use it. That's fine. But the thing that makes me, like, a little bit more, like, I don't really know, is the state of, like, a bunch of open source stuff. Like, I'm an open source maintainer and I already can't convince a company to send me $100 a month to maintain this thing for them. There's no chance I'm getting them to... well, I'm definitely not going to spend 20k of compute, yeah, every time I release something, to decide that now it's safe, right? And, like, I can't get any companies to pay for that and sponsor it. But, like, if I'm the one little pin in the, excuse me, XKCD comic, the one from Nebraska that's holding everything up, the bad guys only need to do mine once. So I'm wondering, like, kind of how you see that landscape affecting open source things like that, cuz it seems very asymmetric in that way. I mean, I think it's asymmetric for that reason, right? Like, the reason why you can make the argument that Anthropic is afraid is because you are the linchpin in the infrastructure of the internet and no one has funded you so far. You have had zero security audits or zero security work done on your stuff. And so, like, if you give access to these models, if you really are the linchpin in the internet, you already aren't getting money from Netflix, Google, whoever that's using your software. And the black hats know that you're the linchpin keeping the internet up. They're going to make use of that model to do the exploitation, right? Um, does that answer your question? 
I mean, like, I think it's just, like, the amount of power that it gives to a single organization, given the current, like, state of open source software in particular, um, is very dangerous. And to be very clear, these models are also very good at doing closed source software, right? Like, my recommendation to anybody interested in this, by the way, is, like, go take a capture the flag problem from, like, CTFtime or crackmes.one or whatever, and uh hook up Ghidra to Ghidra MCP, and then use Claude Code on Ghidra MCP. It will reverse engineer and find a bug in that problem in a matter of minutes. Like, Opus 4.6 is a better reverse engineer than I am, and I've been doing this for, like, coming on 14 years. Uh, it's honestly terrifying to watch it work. So if you're even remotely interested in this, go give it a shot and you'll kind of see what I'm talking about. It's scary how fast it moves. Yeah, because so that part, that's where I'm, like, you know, whether it's Mythos or not, I feel like right now a bunch of stuff you could just... maybe it'll cost more tokens or it'll take longer or something, but, like, a lot of stuff you still could find. Yeah. And the models also, like, any model does this, obviously, but, like, the current models are really bad about, like, false positives. Like, I've done security research uh in my free time on, like, Chrome, ESXi, and some other, like, routers that I've, like, downloaded. Regular weekend activity, classic weekend activity. 
Um, and the amount of times I've gotten, like, critical finding, like, buffer overflow in, like, the RPC handler for this thing, and it's like, okay, all right dude, like, write me an ASAN harness that tests that, and you'll see very quickly, oh, sorry, just kidding, it's not actually there. Um, and so the magic is, like, if Mythos is able to make fewer false positives, you increase the signal-to-noise ratio in this process, which is scary, right? Because it just means you need fewer people to triage the reports and ultimately find real bugs faster. Uh, so I have another question with this Mythos thing, and maybe, I'm curious about your security expertise: isn't this whole withholding a model kind of, like, a doomed uh proposition to begin with? Meaning that if OpenAI has a similarly powerful Mythos model, and they're competing in a zero-sum game kind of, like, outcome of who is the best model, doesn't it mean that when OpenAI has it, they will just release it? Like, and then aren't we just forced to go out, because whoever kind of releases it gets the customers, and then by having the customers, you win. And so then you just get out ahead. Like, doesn't this kind of cause, like, a weird thing where, yeah, we're, like, "Oh, we can't do this." You know, Dario's, like, saying we can't do it, but won't we just kind of fall right into it the moment there's two people that have it? Yeah. I mean, that's... I'm not, like, [ __ ] on capitalism. I'm just saying that's more of, like, a capitalism problem than it is, like, a security problem, right? But yeah, your point is basically, like, if actor A says, "thing too dangerous, won't open-source model, shall we say," and actor B has the same thing and wants to make money with slightly less ethics, potentially... Yeah, actor B is going to release it. Or... Yeah, exactly. Chinese model, Russian model, whatever. 
Um, well, I mean, that's literally... I mean, Dario quit OpenAI cuz he's like, bro, they keep making models that can kill humanity, right? Okay. So, I'm starting a company where we make models that could kill humanity, but they're mine. Uh, also, Chinese models come after OpenAI or Anthropic releases one. So, I think that that might be a little bit difficult. They might be a little bit behind. Has anyone seen Riverside chat? But yeah, I mean, OpenAI literally has a model that... they haven't made any claims, I don't think, about, like, Mythos equivalents, right? Um, but they're doing effectively the same thing, where it's KYC, know your customer. So you have to, like, upload your ID and, like, talk about what work you do, and you get access to GPT-5.4 cyber, which I'm assuming is just a model that's trained better on bug patterns, right? Use-after-frees, out-of-bounds reads, etc. Um, now, if it's actually better than Mythos, who knows, right? But, you know, I think, regardless of what Anthropic wants to do, we're trending towards every person on planet Earth with a couple bucks having access to models that are very good at bug hunting. Uh, and the question is, what does that mean for software, right? Does software get more secure? Does the world just get more scary for a long time and it never really, like, resolves itself? Like, what do we do with that information? And that's a tough question to answer. I'm interested to know how expensive it's going to be. That's the other question. I mean, this is obviously the question, kind of, that we've been talking about for a while on the pod, and in life in general: what are token costs going to look like if uh OpenAI and Anthropic both get all of the customers that they would like to have? Uh, because the cost won't be the same. If demand 10 or 100 or a thousand-x's, it won't be. So the price will not be... I'm not super well read on this. 
Is it true that inference currently is at a loss? Like, I've heard both. Okay. Some people are so confident... I have been looking to try and find a definitive answer. I'm the confident one, by the way, he's referencing. Okay. Oh no, no, no. I mean, well, I'm not going to reveal my sources. I asked ChatGPT and I asked Claude. They both said, "Of course not." Yeah. Yeah. Right. I've heard, though, that some people are saying they are running it at a loss, or it's a bit complicated, because, like, pretty sure Anthropic's probably running some percentage of accounts on the $200 plan at a loss, right? Um, but, like, is API pricing at cost or below? And then how do you factor in, like, training and stuff? My personal take is that inference itself, just looked at in the myopic view of just inference, makes a lot of money. Then once you zoom out and start adding hardware and all the incidental stuff around it, it probably still makes money. But then when you zoom out to say, like, every time you release a model you defunct your previous model, that has a very large burden, and they keep on not making money and needing to raise more money. So I have a sneaking suspicion that it's very hard to make money in the current state. Uh, all right. Well, OpenAI is, like, publicly, like, losing money, right? But is Anthropic also negative? Or... they just had another big raise as well, so I'm assuming... I thought they just raised, like, $6 billion or something. Could be wrong about that, chat. Fact check me. I know OpenAI did a $120 billion uh raise. So much money. This is the... Yeah, cash. This is the one that I actually was really curious to see. This is the only benchmark that I was super curious to see if they're going to uh do well on. Anthropic Opus 4.6 Max cost approximately $9,000 and got a 0.5% score on ARC-AGI. So this is, like, the super test, and humans get into the high 90s. 
Uh, AIs get, like... uh, GPT-4 high cost $5,000 and got 2%. Gemini 3.1 did 4% for $2,200. And so it's a really difficult test for AIs to pass. And so Mythos did not add itself to this one. So this is the reason why I largely think it's more, like, hype marketing than it is anything, because to me this is, like, a really great indicator, at least, into some sort of better model improvement. And so I didn't see it. Sure. Uh, let me... can I just give a counterpoint to that, though? Sure. Yeah. Yeah. Yeah. Yeah. Yeah. Once again with the huge disclaimer that I don't do any AI stuff, so this is just off the cuff. But ARC-AGI, if I'm not mistaken, is a benchmark specifically to test how well AIs perform uh on learning completely arbitrary new things that don't exist anywhere in their training data. That's the only thing that it's testing, intelligence, that's all. Exactly. And so the only reason I would want to point out that I don't think that test says very much about this particular security thing is security is not that. True. Like, nobody is claiming that Claude Mythos came out and discovered a whole new set of classes of security exploits that no one had ever come up with before. What it's saying is that it went and found a bunch of the exact same kinds of zero days that someone like Ed would find if they went and spent a week on that piece of software, right? Like, so they're not claiming that this thing is somehow more intelligent than the predecessor in that way. It's claiming that it's got better pattern matching and, like, stringing things together to create exploits, right? A process which is well known. And so I don't think ARC-AGI necessarily tells us very much about whether it can do those things, because those things are very well-known tasks that security researchers know how to do, and we kind of know the process that you do to do them, right? So, yes, that's okay. 
I will concede that point most certainly, that the security, at least known and obvious security vulnerabilities, such as use-after-frees and all the fun stuff, like the stuff that happened in ffmpeg with jumping ahead somewhere in a buffer based on... yeah, these things are very common kinds of bugs. They're not, like, unusual. The things that they've talked about are very, very standard, and so that seems like a more plausible claim. Like, hey, we just were able to scale up the sort of security checking that a security researcher would do. It can do that thing and find, you know, potential places for that. A lot more plausible than AGI. Yeah. The thing, too, I feel like, for the security side of it, as opposed to constructing a product or a new product, or, like, building a feature, where you have to get, in some ways, all the things right... for a security thing, I only need to find one of the things that are wrong. Yeah. Which is, like... like, you can test a bunch of the scenarios, like you're saying, that already exist, and I only need one thing to be wrong in the program for me to then be able to take control of it. Well, and it's combinatorial, right? Like, a lot of what security research is doing is, a, it's pattern matching for these kinds of bugs, and then, b, going, like, okay, if I did this one followed by this one, would that produce an exploit? What if I did it in the opposite order? What if I did this one and then this one and then that one? Okay, what if I did this one? Right? And again, these are things computers are good at. Like, you don't have to believe in some kind of a weird, like, supernatural, like, "AGI achieved internally" Sam Altman nonsense to believe that this is something a computer could do. It's much more plausible, if anything, than some of the other claims. So that's why, when I saw this, I wasn't like, that's got to be false. I was like, okay, yeah, I could believe that. Yeah. 
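The combinatorial "this one, then that one" search described above can be sketched as a toy in Python: three primitives that do nothing useful alone, plus a brute-force search over orderings. The primitives, state flags, and names here are entirely invented for illustration; real exploit chaining is vastly messier than this:

```python
# Toy model of combinatorial exploit-chain search: individually harmless
# primitives that only become an "exploit" in one particular order.
from itertools import permutations

# Each primitive transforms an abstract program state (a dict of facts).
def leak_addr(state):
    return {**state, "know_addr": True}            # arbitrary read

def write_mem(state):
    # The write only helps if we already know where to write.
    if state.get("know_addr"):
        return {**state, "ptr_overwritten": True}  # arbitrary write
    return state

def trigger_call(state):
    # Control flow is hijacked only after the pointer was overwritten.
    if state.get("ptr_overwritten"):
        return {**state, "rce": True}
    return state

PRIMITIVES = {"leak": leak_addr, "write": write_mem, "call": trigger_call}

def find_exploit_chain():
    """Try every ordering of the primitives; return the first that gets RCE."""
    for order in permutations(PRIMITIVES):
        state = {}
        for name in order:
            state = PRIMITIVES[name](state)
        if state.get("rce"):
            return order
    return None

print(find_exploit_chain())  # ('leak', 'write', 'call')
```

The search space here is tiny (3! orderings), but it grows factorially with the number of primitives, which is why a model that can prune it by "understanding" the code is such a force multiplier.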
I don't know. Most of vuln research is, like, you know, take a function that gets user input, like, define your threat model, and then do source-to-sink analysis on some vulnerable function, or failure to gate a function on, like, a length check. And, like, does user data get there? Bug confirmed. And, like, yeah, that's literally just pattern matching that we've solved a lot of the times previously with, like, satisfiability solvers, right? Like, angr and, like, Z3. Like, take the graph of a function, turn it into a math problem. Can you solve the math problem? Cool, bug confirmed. Well, now with AI, it's just, like, that process of doing source-to-sink on, like, text, it can do incredibly fast, right? It's very good. Now, obviously, because it's stochastic, it creates a lot of false positives, but if we can figure out a way to reduce the false positives, or uh automate the validation of those false positives, then yeah, it's crazy. And I think, what... have they thought about asking Mythos? I know. Come on. Can you just... No mistakes, please. Um, the thing that sets Mythos apart, according to the Anthropic report, is its ability to chain together primitives. Right? So the scary part, from, like, a cybercrime perspective, is, like, you have uh gadget A that gives you an arbitrary read and gadget B that gives you an arbitrary write. Okay. Like, those two separate things are, like, not super important if they're not used together. Well, what Mythos is able to do is, out of 100 tests, I think it's, like, 83% of the time, find exploit primitives in a vulnerable codebase and chain them together to get RCE, right? That's the scary part, because then that's true, like, end-to-end exploit creation for a bad actor. And that's, I think, what scares Anthropic the most. Um, now, I know there's the argument where, like, Firefox wasn't in the sandbox for that experiment, so, like, it doesn't actually matter. But, I mean, just apply that process to the sandbox and the same thing applies. 
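A hedged sketch of the source-to-sink idea, using Python's stdlib `ast` module on a toy snippet. The `input()` source, the `os.system` sink, and the assignment-only taint propagation are all simplifying assumptions for illustration; real tools like angr or a solver-backed analysis work over control-flow graphs and path constraints, not a flat walk like this:

```python
# Minimal source-to-sink taint sketch: flag data that flows from a
# "source" call to a "sink" call through simple assignments.
import ast

SOURCE = "input"      # assumed taint source for this toy
SINK = "os.system"    # assumed dangerous sink for this toy

def qualname(node):
    """Dotted name for Name/Attribute nodes, e.g. 'os.system'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{qualname(node.value)}.{node.attr}"
    return ""

def find_taint_flows(code: str):
    tainted, findings = set(), []
    for node in ast.walk(ast.parse(code)):
        # Propagate taint through simple assignments: x = input(), y = x.
        if isinstance(node, ast.Assign) and isinstance(node.value, (ast.Call, ast.Name)):
            val = node.value
            is_tainted = (
                (isinstance(val, ast.Call) and qualname(val.func) == SOURCE)
                or (isinstance(val, ast.Name) and val.id in tainted)
            )
            if is_tainted:
                tainted.update(t.id for t in node.targets if isinstance(t, ast.Name))
        # Flag sink calls whose argument is a tainted name.
        if isinstance(node, ast.Call) and qualname(node.func) == SINK:
            for arg in node.args:
                if isinstance(arg, ast.Name) and arg.id in tainted:
                    findings.append(f"{SINK}({arg.id}) on line {node.lineno}")
    return findings

vulnerable = "import os\ncmd = input()\nalias = cmd\nos.system(alias)\n"
print(find_taint_flows(vulnerable))  # ['os.system(alias) on line 4']
```

Even this crude version shows the shape of the job: pattern-match a source, follow the data, check whether anything gates it before the sink.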
You know, it's just, I think they wanted to prove a point that it could do that. Well, and also, I mean, again, like, as I've said many times, I can't stand AI companies, so I'm not trying to defend them or anything, but I'm just trying to point out how plausible this stuff is to me from a neutral observer standpoint. Classic case, defending AI companies. Yeah, I know, right? I know. Uh, if you think about it, it's like, look, security researchers, who do not number that many, were already cranking out zero days at a much too alarming rate for me, right? Like, you know, there's a hack every other day, right? CVEs are piling up like there's no tomorrow, and yeah, not all of them are actually all that bad or whatever, but it's not like security researchers were having trouble producing a fair number of critical vulnerabilities, even with the limited resources that they had. So it's also not weird to think that, like, if you had more automation, you would find a lot more of them. Like, there's clearly just a lot of bugs, guys. Like, there's a lot of freaking bugs, and it just doesn't seem that unusual that you'd find more if you have more sophisticated pattern matching, more sophisticated combinatorial checking, where the security researcher doesn't have to spend a lot of time setting up the tool, because it can just kind of ingest the code and it knows roughly what it means. Yeah, I mean, their rate's going to increase, if nothing else, existing security teams' rates of finding exploits. It has to. I mean, it just has to. Unless this thing is just a complete pile of crap, it's got to. The other thing, too, we've been seeing from each new generation of model is that they're getting, at least from my experience and what I'm reading from people and everything, better at calling other tools. So, like, they call out to stuff more regularly and they can pay attention for longer. 
Recompile this code and, see, you know, make this exploit and run it against this thing, or whatever, right? Like, those are all things that, if you automate them, a security researcher gets much faster at, because they're not having to set up the tooling themselves to, like, go work on this exploit. Like, whatever those steps were, they don't have to do them anymore, right? Right. So then, if you're like, oh, now it can run, instead of, like, I have to prompt it at every stage for the next thing to do, I can give it 10 rough things, say try a bunch of combinations of these, and it runs for 24 hours. You're just going to find a lot. It's literally, like, in my mind, some of it is, like... Yeah. Well, we already know fuzzers exist. Like, we use them all the time and they're good, right? It's like, in some ways... yeah, it's like a fuzzer. It's like fuzzer squared, right? It's like a thing now that can, like, target the fuzzing at things specifically, so that things that would be very hard for stochastic testing to catch... because when you have stochastic testing and you have to chain two things together, you're never going to randomly pick the two things that would have to happen for them to work. Here's a thing that can, like, target that specifically and go, like, "Oh, I think, combine these two things, probably. Let me fuzz that specific path." Oh, yep. I got it right. That's where it gets crazy, is, like, you just have the AI write the fuzzer, and then, like, if you can automate that process, you win a lot of the time. It's pretty amazing. Um, I do have to go, though. I have a meeting in 3 minutes, so I got to rip. Um... Oh, hopefully you get Mythos access. Congrats. That'd be neat. No, it's not going to happen. Come on, guys. Give him... Have a good one, man. I like you guys, but it looks like it's the end of our show, unfortunately. True. All right. Thank you, everybody. 
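The "fuzzer squared" point above, that blind random input almost never satisfies two chained checks but a fuzzer aimed at the right tokens hits them constantly, can be sketched like this. The target function, magic values, token list, and trial counts are all invented for illustration:

```python
# Blind vs targeted fuzzing of a bug guarded by two chained conditions.
import random

def target(data: bytes) -> None:
    # Two chained checks guard the "crash": a magic prefix AND a magic token.
    if data[:4] == b"MAGI":
        if b"\xde\xad" in data[4:]:
            raise RuntimeError("crash: reached the buggy path")

def blind_fuzz(trials: int, seed: int = 0) -> int:
    """Throw uniformly random 8-byte inputs at the target; count crashes."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(8))
        try:
            target(data)
        except RuntimeError:
            crashes += 1
    return crashes

def targeted_fuzz(trials: int, seed: int = 0) -> int:
    """A fuzzer that knows the interesting tokens, as one aimed by a
    model that has read the code could; count crashes."""
    rng = random.Random(seed)
    tokens = [b"MAGI", b"\xde\xad", b"\x00\x00"]
    crashes = 0
    for _ in range(trials):
        data = b"".join(rng.choice(tokens) for _ in range(3))
        try:
            target(data)
        except RuntimeError:
            crashes += 1
    return crashes

print(blind_fuzz(10_000))     # almost certainly 0: ~2^-32 odds per trial
print(targeted_fuzz(10_000))  # on the order of a couple thousand hits
```

Random bytes have roughly a one-in-four-billion chance per trial of even matching the prefix, while the token-aware fuzzer lands on the chained path about 1 in 5 tries, which is the whole argument for letting something that understands the code aim the fuzzer.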
I would just like to say that uh... I would just like to say that Casey and TJ, and obviously Teimu, Casey that just left, commonly known as Low Level Learning, uh, you guys, you know, you make the show magic, and now I'm just going to go about being lonely again. Kind of sad. Oh, Prime. I knew that was coming. I thought I was going to get booed, but I just assumed something was going to happen. All right. Um, the good news is that you can enjoy full episodes of TheStandup now on YouTube. If you go to The Standup Pod Full, which I'm going to try to rename hopefully at some point, we're trying to work some things out to get it a better name. But right now, YouTube, am I right? Um, if you go to our website, will it have links to these? Yeah, it will. It will. And it'll have it spelled out. Uh, we'll make it more clear once we figure everything out over the next week. Maybe by the time you're listening to this on YouTube, uh, we're going to upload all of the backlog to that channel as well. So we should have every episode on YouTube in one spot, very easy to see, etc. Obviously, you always can, you know, RSS download the audio directly. Don't press the red button on that site. Of course, Teej, what is that web address that people should go to for the standup pod? Hey, go to thestanduppod.com. All the links will be there. All the episodes will be there. You want YouTube, you want Spotify, you want downloads, you want RSS, you got it. Thestanduppod.com. Yeah. Yeah. Check this out. I'm just going to do something for the audience. Look at this. If you go here, you click "Trash made a Black Mirror app," you can go and you can listen to it right on the website. You can have all the information right here. You can see Trash's app right there. You can go in here. And look at this, we don't even charge you. You can play on Spotify. 
You can download it and just have it personally, for you to do whatever you do. That's for you. And then I'll make it so it links to the YouTube there later as well, now that we're going to have a dedicated YouTube channel for that too. So for all of you out there, you know, the AI companies claim that you're going to get UBI, but we're actually giving you universal basic podcast. You just get it for free. UBP. UBP. You know me. UBP. Yeah, UBP. I was going to say... well, I don't know what I was going to say. That's fine. We should really just end this episode. Stick a fork in it, guys. It's done. All right. Thanks. Good seeing everybody. Thanks, YouTube. Thanks again, uh, whatever your name is. Teej, you're pretty neat. Boot up the day. Vibe coding errors on my screen. Terminal coffee and living the dream.
