When Code Meaning Breaks: The Gap That's Destroying Security

The video argues that AI code could become the new gold standard, shifting trust away from human-authored code and changing how engineers design and validate software, with Mozilla's Mythos illustrating a potential trust flip.

AI-driven code review may become the new trust anchor, elevating meaning and intent above handwritten code as the basis for secure software.

Summary

Nate B. Jones uses Mozilla's Mythos experiment as the focal point to argue that AI-driven code review could redefine what we trust in software. He explains that Mythos surfaced hundreds of vulnerabilities in Firefox in a single release cycle, suggesting AI-assisted analysis might soon outpace human review as the primary security gatekeeper. The talk distinguishes between the meaning of code (intent and design) and its implementation, arguing that attackers read code for what it actually permits while humans interpret intent. Jones points out that tools like Mythos can operate in an end-to-end security loop (understand, hypothesize, test, reproduce, and explain), sometimes better than humans can. He frames a shift from trusting human authors to trusting an evidence-backed, agentic pipeline that uses AI to verify meaning and enforce boundaries. The piece also cautions that AI is not a silver bullet: not every patch is trustworthy, and human oversight remains essential for product intent, maintenance costs, and real-world constraints. Looking ahead, he envisions a world where senior engineers move up a level to design systems and specifications while AI handles exhaustive verification, paving the way for modular, auditable pipelines. The overarching message is that the future of software security hinges on building interpretable, verifiable meaning into software, with Mythos-like tools acting as guarantors of quality rather than sole authors of code.

Key Takeaways

  • Mythos found 271 vulnerabilities in Firefox version 150 in one release cycle, far beyond the 22 security-sensitive bugs found in Firefox 148 by Anthropic's Opus 4.6.
  • The trusted anchor in software is shifting from human-written code to AI-assisted verification of meaning and intent.
  • Code review efficacy will depend on rigorous specifications, modular architectures, and tight human-AI collaboration rather than pure human effort.
  • Future pipelines should be agentic, where Mythos-style tools perform adversarial interpretation and testers validate results in sandboxed environments before human sign-off.
  • Senior engineers will focus on defining product intent, abstractions, and APIs while AI handles exhaustive security testing and patch suggestion.
  • Security debt today becomes more about maintaining interpretable and well-specified systems than just fixing bugs after deployment.
  • Not all AI patches are trustworthy; organizations must ensure the right models (like Mythos) and evidence-driven evals are in place before automating reviews.

Who Is This For?

Software engineers, security researchers, CTOs, and engineering leaders who are planning for AI-augmented development pipelines and want to understand how to structure agentic review processes and meaning-focused specs for future-proof security.

Notable Quotes

""A good human engineer wrote this feel like a much weaker security claim than it used to.""
Mythos reveals that AI-assisted code review can redefine how we assess security claims, weakening the trust in human-written code as the sole anchor.
""Mythos points toward a world where that stops being obvious.""
The talk emphasizes a paradigm shift where human authorship is no longer the default trust anchor for software safety.
""The best software is software that never generated the bug in the first place.""
A core takeaway arguing for proactive correctness and design to prevent vulnerabilities, not just patch them after detection.
""We are moving from making very expensive, very scary zero-day vulnerabilities in code hard to find to making them impossible to interpret.""
Jones paints a future where AI makes vulnerabilities harder to both find and exploit by constraining interpretation through formalized meaning.
""The human abstraction needs to move up yet another level.""
Suggests that humans will shift from drafting low-level code to governing the meaning, intent, and verification of software.

Questions This Video Answers

  • How will AI like Mythos change the role of software engineers in 2026 and beyond?
  • What is agentic engineering and how does it impact software security pipelines?
  • Can AI-driven code reviews replace human code review in production environments?
  • What does it mean to separate meaning from implementation in coding and why does it matter for security?
  • Which AI tools currently offer Mythos-like vulnerability discovery and how reliable are they?

Tags: Mythos AI, Mozilla Firefox security, AI in software security, Agentic pipelines, Meaning vs implementation, Adversarial interpretation, Code hygiene and security, Spec-driven development, Security debt, OpenAI Claude / Anthropic Mythos comparison

Full Transcript
So the heart of this story is about Mythos and how it changes how we view human code. We've always thought of human code as the best code there is; producing it is our job. Any kind of AI code isn't great. In fact, there are even jokes in Silicon Valley, that amazing 2010s-era TV show, about how terrible AI code is. And that's exactly what we assumed in 2023 and 2024, even in 2025. But now, in 2026, things are different, because we're getting to a point where AI code may end up becoming the gold standard, may end up becoming more trusted than human code. This is actually being suggested by some of the most respected engineers on the internet, at one of the most respected institutions on the internet. And it's time we paid attention, because if we start to think of this as even a possibility, it's going to change how we think about what engineers do. It's going to change how we think about architecting our systems. So let's get into it.

The most important thing about Mozilla's Mythos experiment is not that AI found bugs in Firefox. It's that it makes the sentence "a good human engineer wrote this" feel like a much weaker security claim than it used to. That sounds strange, because for basically the entire history of software, human-written code has been the default trust anchor, right? Humans write the code; machines maybe help check it. But if models get good enough at attacking, testing, repairing, and verifying code, the trust model is going to flip. And there are signs that it is flipping already. This video is about that flip: why AI-written code is becoming the de facto standard, why confidence in human code is starting to erode, and why the future of programming may look less like personally examining and even personally reviewing code, which we do a lot now, and more like defining what software is allowed to mean and trusting agents to review it. We see hints of this already in good agentic pipelines. We're going farther here.

Mozilla recently published a post called "The Zero Days Are Numbered." The short version is this: Mozilla got early access to Anthropic's Claude Mythos preview, pointed it at Firefox, and Firefox version 150 shipped with fixes for 271 vulnerabilities identified during the Mythos evaluation. Firefox is obviously not a random weekend project with no tests, right? It's one of the most security-hardened open-source codebases in the entire world. Browsers are brutal targets because they constantly process untrusted content from the internet. They already have fuzzing and sandboxing and memory-safety work and internal security teams and bug bounty programs and years of hard-won paranoia built into the engineering culture, and they need to. And yet, according to Mozilla, the AI system Mythos surfaced 271 vulnerabilities in just one release cycle. The previous collaboration with Anthropic's Opus 4.6 found just 22 security-sensitive bugs in Firefox version 148, 14 of them high severity. So this is not just "AI helped with a code review," which I've talked about before. This is starting to look like a new industrial process for vulnerability discovery in code, and it's challenging the way we think about the quality of human code.

But I do want to be careful here, because this is where the hype can get out of hand really fast. This does not mean every AI writes safe code today. It does not mean you should replace your senior engineers with a model; quite the opposite. It does not mean every AI-generated patch is trustworthy. Absolutely not.
If you've used AI coding tools seriously, you know they can hallucinate APIs. They can miss edge cases. They can create insecure defaults. And they can produce code that looks plausible while quietly misunderstanding the point of your system. A good human engineer is still vastly better than a model at understanding product intent, organizational context, user promises, maintenance costs, and all of the weird unstated constraints that make real software work in the real world. So the point is not "AI is now better than engineers." The point is much more specific, and much more uncomfortable, to be honest with you. The point is that the reason we trusted human-written code was never that humans were perfect. We trusted it, I think, because human judgment was the only thing capable of producing and understanding software at the correct level of abstraction. The engineer wrote the implementation. The engineer imagined the edge cases. The engineer reviewed the diff. The engineer carried the system in their head, to some extent. Tools helped, but the core act of implementation was a human craft.

Mythos points toward a world where that stops being obvious. Because if machines become better than humans at exhaustively searching the consequences of code, then human authorship stops being the trust anchor for us. It just becomes one more source of unverified risk, which is kind of weird to think about.

Look, I think the cleanest way to understand this is to separate out two things that usually get mixed together: meaning and implementation. Code has always been both of those things. It's a machine-executable artifact, but it's also a human language for intent. When you write a function name or a type signature or a module boundary, when you write a test or a comment or an error message or an API contract, you're not only telling the machine what to do; you're telling other humans what the system is supposed to be. That meaning layer is why code review works at all. A reviewer can look at the code and say, "Okay, I understand what this is trying to do. I understand the shape of the system. I understand why this boundary exists here. That part makes sense, but..." and then they'll name whatever they're critiquing, right?

Security failures often live in the gap between what the code means to the person and what the code actually permits. That is a very deep statement, so replay it two or three times, because there's a lot in there. In other words, the author meant "this parser accepts one format." The implementation allows more. Two parsers can disagree, and the attack can live in between what those parsers agree or disagree on. This difference between what humans mean and what technical solutions actually allow is core to the security problem. Humans see intended meaning. Attackers search for actual behavior. Vulnerability research is basically adversarial interpretation of code. It asks: what does this code allow, regardless of what the author thought they wrote? And by the way, even if you're not a programmer: if you've never written something that was interpreted in a way you didn't intend, then you're not much of a writer. All of us who write have had the experience of writing something we were sure was crystal clear, only for our reader to take it differently. That's the difference we're talking about here.
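To make that parser gap concrete, here is a minimal sketch in Python. This is hypothetical code written for this recap, not anything from Mythos or Firefox; the two functions are invented stand-ins for two components that each parse the same header.

```python
# Hypothetical example of the "two parsers disagree" gap described above.
# Each function is individually defensible; the attack lives in the inputs
# on which their answers differ.

def gateway_allows(header: str) -> bool:
    # The gateway author meant: reject any request claiming the admin role.
    roles = header.split(", ")            # assumes ", " separates roles
    return "admin" not in roles

def backend_grants_admin(header: str) -> bool:
    # The backend author meant: grant admin when the role is present.
    roles = [r.strip() for r in header.split(",")]   # splits on "," alone
    return "admin" in roles

crafted = "user,admin"                    # no space after the comma
assert gateway_allows(crafted)            # gateway sees one role: "user,admin"
assert backend_grants_admin(crafted)      # backend sees two roles
print("passed the gateway check, yet granted admin at the backend")
```

Neither function is wrong in isolation; the vulnerability only appears when you read the system the way an attacker does, searching for inputs on which the two meanings diverge.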
Adversarial code interpretation is basically reading the code as if it's an essay and saying, "Oh, I see what you allow here. You didn't mean to allow that, but when you read it their way, you can interpret it that way." If it's an essay, you can argue about it on the internet; it's just he said, she said. But code controls computers, and there are big consequences when there's a misread or a difference in how you read code. And that's why this all matters, and that's why Mythos is interesting. It's not just looking for a known bad pattern here. It appears to participate in the research loop, because it's so intelligent. It reads the code. It forms a hypothesis. It uses tools. It generates test cases. It reproduces the issue. It refines the finding. And then it explains the problem. Google's Project Naptime and Big Sleep have been moving in the same direction. I cannot believe those are real names, but they are. OpenAI's Codex security work is explicitly built around a similar loop: understand the codebase, build a threat model, validate issues in a sandbox, and propose patches for human review. DARPA's AI Cyber Challenge tested autonomous systems that find and patch vulnerabilities across big codebases. The details differ, but the shape of what's going on with autonomous systems is very consistent, and we need to pay attention to it. The model is not just writing code for us. The model is learning to interrogate the code for us and make sure the meaning can only be read one way.

And once models can interrogate code better than people, and there are signs we hit that tipping point in the last couple of weeks, the question changes. It becomes less "did a good engineer write this?" and more "has this implementation survived adversarial, machine-scale scrutiny?" That shift by itself is bigger than the whole cybersecurity industry, because it changes all of our assumptions.
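Here is a minimal sketch of that research loop (read, hypothesize, test, reproduce, explain) as an agentic skeleton. The model_* functions and the Finding shape are stand-ins invented for this recap; Anthropic has not published Mythos's actual architecture.

```python
# Skeleton of an agentic vulnerability-research loop (illustrative only).
from dataclasses import dataclass

@dataclass
class Finding:
    hypothesis: str
    test_case: str
    explanation: str

def model_hypothesize(code: str) -> list[str]:
    # Stand-in for an LLM call: propose ways the code's actual behavior
    # could diverge from its intended meaning.
    return ["the role parser accepts a second, undocumented separator"]

def model_write_test(code: str, hypothesis: str) -> str:
    # Stand-in for an LLM call: craft a concrete input for the hypothesis.
    return "user,admin"

def reproduce_in_sandbox(code: str, test_case: str) -> bool:
    # Stand-in: run the test case in isolation and observe the result.
    return True

def research_loop(code: str) -> list[Finding]:
    findings = []
    for hypothesis in model_hypothesize(code):
        test = model_write_test(code, hypothesis)
        if reproduce_in_sandbox(code, test):   # only report reproduced issues
            findings.append(Finding(
                hypothesis=hypothesis,
                test_case=test,
                explanation=f"Input {test!r} reproduces: {hypothesis}",
            ))
    return findings

for finding in research_loop("...source under review..."):
    print(finding)
```

The property that matters is that every reported finding is evidence-backed: a hypothesis only becomes a finding once it has been reproduced, which is what makes the output trustworthy downstream.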
Software has gone through some version of this before, so we've seen it happen, but this is easily the biggest shift in 20 years. There was a time when being a programmer meant writing much closer to the machine. Then we got assemblers and compilers and garbage collectors and managed runtimes and type systems and package managers and, eventually, cloud platforms and deployment systems and observability tools. All of those moved pieces of execution away from human hands, because humans were not trusted at scale to do the same process over and over again. We had a saying at Amazon: good intent doesn't scale; you have to have mechanisms. We did not conclude that humans were no longer involved in computing when all of those things happened. We just concluded that the human role had moved upward, to a higher level of abstraction. Most programmers no longer hand-place values in memory, because that work is slow and fragile and very difficult to validate at scale. And by "most" I mean I don't know of any who do. The responsible human role moved into algorithms and architecture and data models and interfaces and system behavior and intent, because we could trust those lower-level primitives. Security is pushing that transition even harder right now. That's what the Mythos story is really about.

And we've stopped trusting developers to do things before. We stopped trusting developers to casually write cryptography; that's not allowed anymore. We stopped trusting manual memory management in large classes of software once safer alternatives became practical. We stopped trusting hand-run production deploys without automation and rollback and observability and policy controls. In every case, human skill didn't disappear, but human execution lost the presumption of safety. Code itself may be the next thing to lose the presumption of human safety. Not all code, and not tomorrow, and not in the theatrical sense where programmers vanish. But even today, even after the February, March, and April of 2026 in which we came to believe in agentic coding and set up our agentic pipelines, we still talk about the importance of humans reviewing the code to make sure it's safe. What Mythos may be teaching us is that even those days are numbered; that Mythos itself may be better at finding vulnerabilities and security risks in code than humans ever could be. And if that's the case, then maybe we are getting to a point where the human abstraction needs to move up yet another level, and it's up to the human to review what something like Mythos did and make sure the overall meaning of the software matches the product intent.

That is the real inversion. Humans didn't disappear, and they're not going to, but they are going to move farther up into the meaning layer. And this is where I think a lot of the "will AI code for us?" arguments become very unserious. Isn't it ironic that I said that right there? You'd think that's exactly where we're going. But most people imagine that coding is equivalent to typing, so they imagine the future as fewer people typing. No. We have to broaden our imagination. Coding was never the largest part of a developer's day to begin with. Typing was not the hard part in the first place. The hard part was knowing what should exist, what should not exist, and how to preserve that distinction as the system changes. Holding that meaning in your head is going to remain very, very important for senior engineers to do.

If AI makes code cheap to produce and safe to produce, which is the big conversation with Mythos now, that doesn't make software effortless. It just changes what the scarce resource is. And I wish we talked about that a lot more with AI, because implementation, of course, will become abundant, and the ability to understand the software is going to become more scarce unless we invest in it. Now, we can invest in primitives that are clean abstractions over the software layer, so that we understand what's being built and can translate that meaning. And let me tell you, I do this all the time. If someone sends me a piece of code, and they do, I look at it first through a tool like Codex or Claude Code and ask myself: okay, what's in the box? What does the architecture look like? What are the levels of abstraction here? I'm very rarely going line by line anymore. Almost never. And I don't feel the need to, because the tools I have allow me to operate at the correct level of abstraction.

We are talking about that kind of a move for the security sphere, where you now say: I'm going to trust what Mythos says about my code. Mythos finds 200-some vulnerabilities; we focus on fixing them; and then we move forward into a world where we do not have these high-severity vulnerabilities, because we've gone from making very expensive, very scary zero-day vulnerabilities in code hard to find to making them impossible to interpret. In other words, they disappear. We are making them extinct in the wild, which ironically takes the security conversation for 2026 in a whole different direction.
Instead of saying AI makes things scarier, which I hear from a lot of security researchers and frankly read in a lot of media, we're saying we have almost a perfect weapon to fight back. Now it is possible to get to perfect code that does not have security vulnerabilities. We just have to let the right AI system read it and reliably fix the things it calls out. And then we have to integrate that into our agentic build pipeline, so that when we build stuff, we can actually review it and ensure we launch without breaking trust with the user. And what I'm saying is: if that is true, then the way we think about how we build software has to change. Because it's not just the old model ("old" as in February and March) where we said, well, humans should review it at the end. It becomes a world where a model like Mythos reviews it at the end, and then we humans look at the overall meaning of what's been created and ask: is this in line with the direction we're going with this piece of software? And we are freed from having to worry about whether we're launching insecure code. Ironically, we may start looking at AI-generated code as a sign of quality. That is the software supply chain I think we are moving toward. We're not just talking about changing source code by hand anymore, and soon we won't even be talking about agentic pipelines where we review by hand.

Although, not everybody has Mythos, and I'm not saying every AI system is equivalent. So if you don't have Mythos, don't just swap it in and say, "Nate told me to use AI to review the code." Only certain systems, guys. It has to actually be proven to work if we're going to put AI in to review the code, right? There's a big difference between the AI getting it wrong and the AI getting it right, and there is an intelligence barrier. We appear to have just tipped over it, so I think we'll have a lot more systems that can do this in a few months.

The only way to get to a system that is trustworthy is if you are in the habit of forcing trust through evidence. And what I'm really arguing for is that we architect our pipelines so that one of three things is true. Either very smart humans at the end sign off and say: this is in line with our hygiene standards and our code standards, this is in line with the product intent, and this is clean code. That is the best practice today in most cases. Or (and maybe 2 or 3% of software pipelines do this today) we believe our evals are so good that we do not put a human at the end, because we trust the eval. That's very rare today. It does happen, and it's not the end of the world, but you have to be really, really good. Or we use a cutting-edge AI system like Mythos as our reviewer. Only if you have Mythos today, right? You've got to have the right model.

Now, I do believe we are going to see more than just Mythos get here. Don't misinterpret this as "only Claude can do this." There is evidence, for example, that ChatGPT 5.5 has some of the same security-sniffing attributes as Mythos, although we've seen much less of a side-by-side case study on security, so it's hard to know for sure. And certainly I expect future ChatGPT models to publicly catch up. I expect versions of Claude with Mythos-like capabilities to get into the wild as Claude catches up on compute. And eventually, as Dario says, maybe by Christmastime, I expect open-source models to get to this point. We will all have Mythos-like capability by the end of the year. I feel pretty good about that.
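One way to read those three options is as a single pipeline with a swappable final gate. Here is a minimal sketch under that assumption; the class and field names are invented for illustration, not any real product's API.

```python
# Illustrative sketch: the final review gate as a swappable module, so a
# principal engineer's sign-off today can be replaced by a Mythos-like
# reviewer later without rearchitecting the pipeline.
from typing import Protocol

class FinalReviewer(Protocol):
    def review(self, diff: str, evidence: dict) -> bool: ...

class HumanSignOff:
    def review(self, diff: str, evidence: dict) -> bool:
        # Today: a senior engineer certifies hygiene, intent, and cleanliness.
        answer = input("Principal engineer, approve this diff? [y/N] ")
        return answer.strip().lower() == "y"

class AdversarialModelReviewer:
    def review(self, diff: str, evidence: dict) -> bool:
        # Later: a Mythos-like reviewer must attach reproduced findings,
        # not opinions; any reproduced finding blocks the release.
        return not evidence.get("reproduced_findings")

def ship_gate(diff: str, evidence: dict, reviewer: FinalReviewer) -> bool:
    if not evidence.get("evals_passed"):   # evals run before the final gate
        return False
    return reviewer.review(diff, evidence)

# Swapping the reviewer is then a one-line change:
#   ship_gate(diff, evidence, HumanSignOff())
#   ship_gate(diff, evidence, AdversarialModelReviewer())
```

The design choice is that the rest of the pipeline never needs to know which reviewer sits behind the gate; that is what makes the swap in four or five months cheap.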
And that's why this is important: the end of the year is not that far away. We are already talking now about how we build software, and we want to build our pipelines expecting these kinds of changes. So if your pipeline is modular for agentic building and you have a principal engineer reviewing your code today, that's great. Think about that role and think about how modular it is, because you may want to swap it out and put Mythos, or a Mythos equivalent, in four or five months. Just be aware and be ready. And then start to think about how you force value and quality through in the meantime. How do you get a certificate of quality from your principal engineer, basically saying: yep, this is good enough, this is the real thing, this is good code? And then how do you define the standards of "good" so clearly that when your Mythos equivalent comes along, you can automate them? At that point it's a very, very clean eval. And Mythos itself is good enough at being creative, writing test cases, and adversarially interpreting the code that you can be confident what it marks as safe actually is safe. Because so much of the art of Mythos appears to be not just following an eval. You can write excellent evals and still get insecure code, because insecure code is partly an act of creativity. It's not just checking that it does X, Y, and Z and is clean code in this way and that way. It's also adversarially reading the code, trying to put the worst possible interpretation on it, and trying to break it. And Mythos seems to be really good at that. For the moment, other than Mythos, the best in the world at that are excellent security engineers. That's why excellent security engineers insist on good code hygiene, so they can read the code carefully, and it's why they're so important when you write evals.

Sidebar, by the way: so many people write their evals for agentic pipelines as 80% functional requirements, with 20%, maybe less, being non-functional requirements around good hygiene. No, no, no. At least half of your eval should be about the quality of the code you write. You should be insisting on a certain number of lines per function so that you're not confusing whoever reviews the code. You should have your own standards and hygiene around how you interpret particular expressions, how you handle dependencies, what is allowed or not allowed in your programming language of choice: expressions you will tolerate, and expressions you will not, because you've found them undependable. Every language has its own version; you can get into that if you're interested. In fact, you can literally take this transcript and ask ChatGPT or Claude: tell me about some expressions in various coding languages that are notoriously undependable to security researchers, because every language has its own variants. You can write all of that into evals. That whole process of moving from 20% code hygiene and architecture in your eval to 50% still just gives an excellent security researcher an easy pair of glasses for reading your code, which is great. And if the code passes that eval, it is much more likely to be something you can review and certify as safe. But Mythos can do all of that for you in a few months.
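As a sketch of what writing hygiene rules into a machine-checkable eval can look like, here is a toy checker. The threshold and the banned-call list are illustrative choices, not a standard; a real eval would also cover dependencies, allowed expressions, and the language-specific rules mentioned above.

```python
# Toy hygiene eval: enforce a maximum function length and flag calls your
# team has decided are undependable.
import ast

MAX_FUNCTION_LINES = 40            # pick your own standard and enforce it
BANNED_CALLS = {"eval", "exec"}    # illustrative; every language has its list

def hygiene_violations(source: str) -> list[str]:
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            length = (node.end_lineno or node.lineno) - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                violations.append(
                    f"{node.name}: {length} lines (max {MAX_FUNCTION_LINES})")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                violations.append(
                    f"line {node.lineno}: banned call {node.func.id}()")
    return violations

print(hygiene_violations("def f(x):\n    return eval(x)\n"))
# -> ['line 2: banned call eval()']
```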
Mythos can do the part where the code passes the evals, and when you want one last look from a paranoid, creative security researcher, Mythos, or a Mythos-like model, maybe from ChatGPT, maybe from Google, maybe from somewhere else, will do that too. Ultimately, this is about building strong engineering cultures. So much of what I'm talking about is building strong engineering cultures. So if you are thinking now about how to plan for the future in a fast-moving world of AI, bookmark this video. This video is part of how you see ahead, because if Mythos at Mozilla is doing this today, all of us will be living in this world in a few months. Our engineering cultures have to evolve. We as builders have to move to a level of abstraction where we are ready to modularize, stick a tool like Mythos in, and say we trust it more than ourselves, because we have actually run it on existing code we had certified as safe and it found vulnerabilities we missed. It's that creative. It has the ability to adversarially interpret that we need, because human review alone has never guaranteed safety; it was just the best we had.

A zero-day vulnerability, the scariest kind you can have, stops being a zero-day only when the vendor knows about it, understands it, fixes it, ships the fix, and users deploy it. The world is full of systems that remain vulnerable long after fixes exist: enterprise appliances, edge devices, abandoned dependencies, internal corporate software, industrial systems, old Android forks. A model finding the bug does not magically heal the system. So when you're thinking about your engineering culture for the 21st century, realize you're going to be shipping more software than ever, because AI makes it so easy to ship. The Mythos story is the story of stopping those zero days before they ever get out. The best software is software that never generated the bug in the first place. Think about the world you live in as having shipped only the tip of the iceberg of all the software you will ever ship. Ninety percent of the software you will ever ship as a company is ahead of you; that is statistically likely in the age of AI, because AI code is so cheap to make. Get that 90% right. Make sure you are ready to put a system like Mythos in place. And the best thing you can do is what I've described in this video: set up an agentic pipeline, write it to an eval, have a human security researcher review at the end today, and be ready to swap that human out for a model when the right quality of model comes along. Then get to work patching what you already have out there. You've got to apply Mythos-like capability to what you have and aggressively patch and deploy.

The reason, by the way, that Mythos is only being released to some organizations is that those organizations happen to control some of the most powerful systems on the internet. We would not want Mozilla to be adversarially attacked in three or four months by one of these systems; we want it hardened up. That is exactly what Anthropic thought, and it's why they released it the way they did. I'm going to save some of the deeper institutional implications for the Substack, because there's a whole separate essay about open-source maintainers, disclosure norms, funding, and what happens when small teams get flooded with real vulnerability reports they don't have the capacity to process.
That deserves a deeper read; for this video, the practical point is simpler. A good codebase is not just readable because humans like readable code. That's a side benefit. A good codebase is readable because it can be attacked by friendly machines, and the best way to defend ourselves is to make it readable, so that its clean architecture is something an AI researcher can cleanly reason over. Narrow modules are easier to constrain. Explicit auth boundaries are easier to test. Small interfaces are easier to verify. Good tests give the model feedback. Clear specifications give the model something it can satisfy. Technical debt becomes security debt in a much, much more direct way these days, because we're building so fast. Messy code is not merely annoying. Messy code is extremely dangerous. Messy code may be structurally resistant to the AI tools that could make it safer. In other words, we may have a golden refactor window here.

And this is why I think comprehensibility is about to become a security property. You may have a four- or five-month window in which it makes sense to refactor your code so it is interpretable by AI researchers, in line with security best practices that humans understand, because a system humans cannot understand is going to be harder for an AI researcher to understand as well. It's also, incidentally, harder for humans to govern. Fundamentally, if the organization does not understand what promises the system is making, then the meaning layer collapses. I am making a plea here that you make your code interpretable. The goal is not to replace readable code with inscrutable machine output. I know some people think that's where we're going with AI; I'm pleading for the opposite, and I'm telling you the trend with AI is toward readable code, because the goal is to make implementation more mechanically reliable while preserving the semantic structure humans need to reason about the system. That is the standard. That is what we're aiming for. We're not just saying natural language in, app out, we're all good, it works. It's more like: natural language in; then your traces, your proofs, your type systems, your tests, and your adversarial review all run as part of the agentic pipeline; then see if it passes, and see if a human agrees it has meaning. At the end, humans describe intent at multiple levels of precision. Models can propose implementations. Other models can attack those implementations. The tooling can produce the evidence of what happened. And then humans at the end, maybe very senior researchers in the post-Mythos age, can inspect the evidence of a Mythos review cycle, revise the meaning of the overall piece of software, and decide whether the system is acceptable to ship.
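To ground the earlier point that explicit auth boundaries are easier to test and small interfaces are easier to verify, here is a minimal sketch with invented names. The idea is that every privileged action funnels through one narrow check that a reviewer, human or model, can test exhaustively.

```python
# Illustrative auth boundary: authority is granted or denied in exactly one
# small interface, instead of permission checks scattered through the module.
from dataclasses import dataclass

@dataclass(frozen=True)
class Principal:
    user_id: str
    roles: frozenset

class AuthBoundary:
    def require(self, who: Principal, permission: str) -> None:
        # The single place this module decides authority questions.
        if permission not in who.roles:
            raise PermissionError(f"{who.user_id} lacks {permission!r}")

def delete_report(auth: AuthBoundary, who: Principal, report_id: str) -> None:
    auth.require(who, "report.delete")   # the only check a verifier must trace
    print(f"deleted {report_id}")

alice = Principal("alice", frozenset({"report.delete"}))
delete_report(AuthBoundary(), alice, "q3-findings")   # allowed
```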
So really, we're moving from a world where the codebase itself is the gold standard to one where the codebase sits underneath a bundle of intent, implementation, and verification produced by these agentic pipelines, a bundle we're going to need to review at scale. And this changes what a valuable developer looks like. The valuable engineer is not just the person who can produce a clever prompt. You see what I did there? I didn't just say "write code"; I said "produce a clever prompt." The valuable engineer is the person who can define a system that can be safely implemented. They know how to turn product intent into very crisp standards and specifications within the overall code-hygiene practices of the business. They know how to decompose a system into verifiable boundaries. They know how to design APIs that minimize authority leakage. You get the idea, right? They know how to build all of this.

So human judgment does not become less important here. It becomes concentrated at the places where meaning enters the system. This is actually closer to what senior engineering was always supposed to be, though it hasn't always been that way at most places. The more experienced an engineer becomes, the less their value comes from typing every line themselves. They define the abstractions. They notice hidden couplings. They understand why a tiny product choice creates a security problem. They know when a system is becoming illegible. The reason we've depended on senior engineers to hold up this whole security architecture in the first place is that they're good at that stuff. And AI doesn't make that ability to comprehend meaning disappear; it just changes where you apply it.

So hear me when I say that even though we're hitting the end of the age of trusted human code, and I think that's likely, I don't mean that humans are going to stop writing code tomorrow. I mean that our default assumptions as builders of engineering culture need to start shifting right now. Today, people worry that AI-generated code is unsafe because nobody really wrote it. But in high-assurance settings, we may eventually worry that human-written code is unsafe because nobody exhaustively, adversarially searched and reasoned over it. In this world, generated code will not be trusted because it came from a model. It will be trusted because it came from a verified process. In other words, as you think about the engineering culture you're trying to build, you are building agentic pipelines that will be the signers, certifiers, and guarantors of the quality of your code, and your engineers will be there to maintain the intent of the software. And by "eventually" I don't mean, you know, May 10 or May 15 or May 20, but probably by December. That's not that far away. If you're an individual contributor, you barely have time to start thinking about your skill sets. If you're a leader for a team, you have to start thinking right now about how you architect your system and set up your agentic pipeline to get ready. And if you're a CTO, right now is when you have to start planning and budgeting for all of it.

And yes, humans will still be responsible for what software means, maybe more responsible. Machines can't decide what promises a system should make, right? They can't decide what failures are morally acceptable. They can't decide what kind of authority a user should have. But the execution on those promises may increasingly belong to a loop that humans supervise rather than personally author. And in that world, engineers are less like scribes and more like constitutional designers for machines. We define their powers, their limits, their rights, their obligations, the tests of legitimacy. Sounds super abstract, right? But the practical takeaway skill you can build today: write better specs. I talk a lot about specification and clear intent. I feel like I'm banging a drum.
And the reason it's important is that if you can't get the specs written down now, you are going to really struggle to produce the good, clear software that a tool like Mythos can use in six months. And this goes for juniors. If you're listening to this and thinking, "I'm not a senior engineer, I'm not trusted as an architect, so what happens to me when all of this starts?" Write better specs. Get excited about clarity of intent. Demand specificity. Specificity is the enemy of technical and security debt. A good code file has a verb that goes with it: it does a thing. Make sure you can be that clear with your code. Don't just ask whether your code works; ask whether it's legible enough to be defended. Because Mozilla's experiment is early, right? The tools are imperfect. This has to be repeated a few more times. The institutions around us, including many of our development teams, aren't ready for this world. I'm telling you this because the direction is visible. There are companies already moving this way. If Mozilla is writing about this today, there are ten other companies just doing it and not talking about it. I guarantee it.

So AI beginning to read code is fine, but you need to start thinking about AI as a guarantor of code, and about how you set up pipelines that enable that world. Because the future of software will not be built on the belief that humans write safe code; I'm convinced of that now. It will be built on the ability of humans to define meaningful systems and the ability of machines to prove that the implementation has not betrayed them. That is the shift we have to get into our heads together, because implementation is going to get so cheap. It's going to zero; the cost of developing software is going to nothing. The cost of confidence and trust in software is something different. That may be expensive, because it has to be guaranteed, and that's something you need a whole pipeline, and probably a team watching the software, to guarantee. So we are going to move more and more of how we develop toward that assurance.

Now, if you want the deeper written version of this, I've got way more detail for implementers, for engineers, for engineering-culture builders. I'll put it on the Substack, because the institutional side of this is a whole separate conversation. And if this framing was useful, subscribe here too. This is the kind of AI shift I think matters most. It's not about the demo. It's not about the leaderboard for models. At the end of the day, I'm less and less interested in that. I'm interested in how we change our human engineering cultures so that we find ways to thrive as these machines start to change the very roles, and the very pieces of the roles, that we thought were essentially human. It's just mind-blowing to me that we are now starting to imagine a world, six months out, where AI-written code will be the gold standard. But I think we're headed that way. So we had better start getting ready now. Best of luck. Cheers.
