Ticking Timebomb in Mac OS

The PrimeTime | 00:10:24 | May 7, 2026

Chapters (9)

Introduces a class of bugs that can be recognized by their timing and observation effects, such as Heisenbugs.

A fascinating look at a real Mac OS TCP bug that bricks machines after 49 days due to a uint32 time counter rollover.

Summary

The PrimeTime’s video opens with a class of bugs you can recognize from their symptoms alone, starting with the Heisenbug, where observing a bug (for example, by adding a log statement) changes its timing enough to make it vanish. The host then pivots to a timing-related issue reported by Photon, a company that runs a fleet of Macs to monitor iMessage: after 49 days, 17 hours, 2 minutes, and 47 seconds of uptime, the machines stop being able to open new TCP connections. He walks through how TCP ports work, what the TIME_WAIT state from RFC 793 (1981) is for, and why reusing a port too soon is dangerous. The root cause is a uint32 millisecond counter in Apple’s xnu TCP code: it is computed from seconds of uptime multiplied by 1,000, so it wraps to zero after about 4.2 billion milliseconds. Because the shared TCP clock is only updated when the new value is larger than the stored one, the clock stops advancing after the wrap, TIME_WAIT never expires, and closed ports are never freed. In Photon’s experiment, a burst of connections started a few minutes before the threshold shows the connection count holding steady and then climbing once the counter rolls over. The host compares this to the year 2038 problem (which he calls Y2K38), where a signed 32-bit count of seconds since 1970 overflows, and says any Mac that stays up long enough will hit the bug. It’s a reminder that a small design choice, like a 32-bit time value, can have outsized consequences in long-running production systems. The segment closes with a recommendation to read the article and a sponsor plug for kernel.sh, a cloud browser service for AI agents.

Key Takeaways

A Heisenbug changes behavior when observed: adding a console.log statement can slow execution enough to hide a timing bug, which returns when the log is removed.
The 49-day countdown (49 days, 17 hours, 2 minutes, 47 seconds) is how long it takes a uint32 millisecond counter to overflow and wrap back to zero.
TIME_WAIT holds a TCP port for about 30 seconds after close so that delayed packets from the old connection are not mistaken for part of a new one.
After the wrap, ports stuck in TIME_WAIT are never freed, so the pool of available ports runs out and new connections fail once roughly 32,000 have been made.
The root cause mirrors the year 2038 problem (Y2K38): a 32-bit time value wraps around. Here the wrap stops the TCP clock from advancing, so TCP timers stall.
The bug is tied to Apple’s xnu TCP clock code, which derives milliseconds from uptime seconds and casts the result to uint32, illustrating how low-level timing decisions affect modern systems.
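
The two rollover points in the takeaways above can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the video or the article; the function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

UINT32_VALUES = 2**32  # number of distinct values in an unsigned 32-bit integer


def uptime_until_ms_wrap():
    """Split 2**32 milliseconds into (days, hours, minutes, seconds)."""
    total_seconds = UINT32_VALUES // 1000  # drop the fractional second
    days, rem = divmod(total_seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


def last_int32_unix_second():
    """Last moment a signed 32-bit count of seconds since 1970 can represent."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=2**31 - 1)


print(uptime_until_ms_wrap())    # (49, 17, 2, 47)
print(last_int32_unix_second())  # 2038-01-19 03:14:07+00:00
```

The first result matches the countdown quoted in the video; the second is the date the year 2038 comparison refers to.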

Who Is This For?

Essential viewing for system engineers and developers who dig into low-level networking bugs, TCP lifecycle, and time-based failures that can silently escalate in long-running Macs or infra agents.

Notable Quotes

"There's an entire classification of bugs that you can just understand what they are simply by the conditions."

—Opening setup explaining timing-based bugs and the Heisenbug concept.

"This is the most famous one. It's actually called a Heisenbug."

—Definition of the Heisenbug and its behavior when you observe it.

"49 days, 17 hours, 2 minutes, and 47 seconds."

—The infamous countdown that triggers the TCP bug.

"If you just let a Mac stay on longer than 49 days... you will start experiencing the same problem."

—Empirical claim linking the observed issue to all Macs beyond the threshold.

"This is actually the exact same problem of Y2K38."

—Connecting the bug to the classic year-2038 problem.

Questions This Video Answers

What is a Heisenbug and why does it vanish when you try to study it?
How does the 49-day Mac OS TCP bug work and why is it tied to a 32-bit timer?
What keeps TCP ports stuck in TIME_WAIT, and how can that exhaust the available ports?
Why is Y2K38 brought up in connection with a modern Mac OS networking bug?
How does the 32-bit timer rollover affect real-world uptime and connectivity on Macs?

Heisenbug, TCP time_wait, uint32 overflow, Y2K38 analogy, Mac OS TCP networking, RFC 793, xnu TCP clock, Photon iMessage monitoring, kernel.sh sponsor

Full Transcript

There's an entire classification of bugs that you can just understand what they are simply by the conditions. Let me give you a perfect example. If you have a bug, then you put in a console.log statement, and then the bug disappears, you should immediately know this is a timing-based bug. The cost of actually printing slowed things down enough that now the bug no longer happens. And then when you delete the print statement, what happens? The bug comes right back. This is the most famous one. It's actually called a Heisenbug. A Heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. So, simply observing this bug will cause it to go away. And typically, you just know that this involves time. Now, there's another set of bugs that if I even just say a number, there will be some of you out there that will just break out in sweat. So, okay, brace yourselves. 49 days, 17 hours, 2 minutes, and 47 seconds. Now, I know most of you are like, I don't even know what you're saying to me. Well, hopefully by the end of this video, you will also get sweaty, too, because it is a horrible, horrible number, and you should know exactly what the problem is immediately. But before we begin, hey, got to make that bag, baby. I know a lot of you have agents, and you're letting them run around on the internet on your computer. Stop it. That's the easiest way to shoot yourself in the foot. This is why you need today's video sponsor, kernel.sh, the crazy fast and open-source infra for your AI agents to access the internet. It takes under 30 milliseconds to spin up one or 1,000 cloud browsers for your agents. And authentication is automatically handled. Right now, over 3,000 teams already use this in production, including Framer and Cash App. So, quit nerfing your agents and give them a real browser. Head on over to kernel.sh and let them use the internet. All right, welcome back. 
Well, I would love to show you this really sweet document that we have right here, but unfortunately, they just recently changed their website, which caused all the text to turn white. And even when I go in there and manually adjust the CSS, it just refuses not to be white. So, I have the reader open right here. The article is called "We found a ticking time bomb in Mac OS TCP networking. It detonates after exactly 49 days." And this involves a company called Photon, which has a fleet of Macs they use to monitor the iMessage service. And for whatever reason, the Macs just crash after 49 days, 17 hours, 2 minutes, and 47 seconds every single time. They also just start running out of control with memory. And then eventually, they can no longer make any new TCP connections. That's kind of weird, right? And it turns out this is a bug that's actually in all Macs. If you just let a Mac stay on longer than 49 days, 17 hours, 2 minutes, and 47 seconds, you will start experiencing the same problem. You make enough TCP connections, and all of a sudden, you will be able to make no more, and your computer will effectively be bricked.
All right, to understand this, you first need to understand the basics of making a TCP connection. When you make a TCP connection from your computer to some server, by the way, look at that beauty of a line, hand-drawn artisanal straightness. Never thought I could even tell that to my mom. So, with this in mind, when you go and you make a connection, you actually use a port on your machine. You have about 65,000 ports on your machine, but you only use approximately half of them for making these connections. When you close the connection, that port isn't immediately usable again. It needs to wait a small amount of time, because remember, the internet's big. Stuff happens. Packets that were sent out from the server could go bounce around in the cloud for a while, and then come shooting to your computer.
So, if you just immediately reuse that port, you could get some other connection's packet coming in, completely confuse the whole thing, completely corrupt it, and then that actual connection would be completely useless. I just said the word completely like three times in a row. So, there's this thing called time wait. It's usually specified in big capital letters, TIME_WAIT. It's actually from the old RFC from 1981, 793. Now, if you've kept up with any of my content, this is kind of a B-tier RFC, because I did rank all of the RFCs. And this one, you know, it's pretty good. Of course, I had to take down the video later. Everybody knows why, you know, the controversy. I don't really want to talk about it. Anyhow, so, whenever you use a connection, that port will be held for about an additional 30 seconds. And then after those 30 seconds, hey, you're free to be used again, bud.
The second important thing you need to know is integer overflows. I'm sure most of you know, but in case some of you don't, let me do a quick explanation. A byte contains eight bits. Let's pretend all eight bits are ones. If I were to add one to this, what would end up happening? Well, this would go to zero, you'd carry the one. This would go to zero, you'd carry the one. Dot dot dot dot dot all the way to the end, you'd carry the one, this would be a zero, you'd carry the one, and then the overflow bit would be set to one. By the way, that was a vertical straight line. Super rare. Okay, guys, this is a good video. You better say something about how good this line looks, okay? I expect it. And that means your value went from 255, the maximum value an unsigned byte can hold, all the way down to zero. Very sad day. And this is called an integer overflow. You go all the way up to the maximum value just to be put all the way back at the beginning. And if I could really try very hard, I think I could make this drawing even better. Keep that in.
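
The carry chain described above can be reproduced in a few lines. This is only an illustrative sketch: Python integers do not overflow on their own, so the 8-bit register is modeled with a mask, and `add_u8` is a made-up helper name.

```python
def add_u8(value, increment):
    """Add like an unsigned 8-bit register: keep the low 8 bits, report the carry."""
    total = value + increment
    carry_out = total > 0xFF  # the "overflow bit" from the drawing
    return total & 0xFF, carry_out


print(add_u8(0b1111_1111, 1))  # (0, True): 255 + 1 wraps back to 0
print(add_u8(254, 1))          # (255, False): still fits in a byte
```

The same masking idea with `0xFFFF_FFFF` instead of `0xFF` models a uint32, which is the width that matters for the rest of the video.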
Okay, so now that you know those two very important pieces of information, what Photon ended up doing is setting up an experiment. The experiment was, okay, we're going to take a Mac that's just about to run out of life, right? It's just about to cross this line where they all keep crashing. And what we're going to do is about 5 minutes before that time, we're just going to start sending massive amounts of client connections. And what we should see internally is that we reach an equilibrium of ports taken, where every time we make a new request, one of the old ones after 30 seconds should be dropped. And that's exactly what they have right here. Again, sorry for the text just not showing up. But right here, you can see they get up to about 200 active connections. And so, the connections just keep being made over and over again, and new connections start opening up after 30 seconds. But as they approach that number, if you look right here, you'll see that the connections start gaining and gaining and gaining. What's effectively happening is that no new connections are being made. And the reason why is actually kind of surprising.
So, if you jump over here to apple-oss-distributions/xnu, you will find the actual Apple xnu distribution. This is what's inside of your Mac. You can go and you can check out stuff. You can go check out what's going on with security, man. Or maybe you can look at the Apple license. I don't know. Oh gosh, don't show this. Oh man, Apple, that's a lot of license. This doesn't look like MIT to me. What the heck's happening here? Anyways, if you go to tcp_subr.c, you will find this beautiful little function right here called calculate_tcp_clock. Now, this is used to kind of keep the entire TCP packets and everything in sync, because whenever you send out a packet, they have some timing information associated with it. Of course, this is also used to know when, hey, has that passed? Therefore, can we free up this port?
They use this one singular clock. If you look right here, this is how they get the current time in milliseconds. Now, if you're used to something like Date.now, or you use Odin, you use like tick_now, you'd be like, what the heck is going on here? Well, this is the old-fashioned way of being able to get timing information. It actually looks a lot like the Chrome performance APIs, doesn't it? Yes, it does. And if you look at this line right here, this is where the bug starts. Now, you'll notice that it takes the tv_sec. I don't know what tv stands for, but just go with it. It takes the amount of seconds that has passed since your machine has started up, and multiplies it by 1,000, which effectively makes it into milliseconds. Then it casts it into a uint32. Now, what's the problem with uint32? Well, there's about 4.2 billion values that can be represented by a uint32. Seconds multiplied by 1,000 turn into milliseconds, and about 4.2 billion milliseconds is approximately 49 days, 17 hours, 2 minutes, and 47 seconds. Therefore, once you've reached this point, this thing rolls over, and where does it go? It goes all the way back down to zero.
Now that it's back down to zero, we get the current TCP now, which is just going to be some low value once you cross that threshold. Then what it does is it loads the shared timestamp among all the TCP things going on. This temp, it goes and checks and says, hey, is the temporary time currently less than the now time? Oh, it is. We're going to update that time now. Well, what's the problem right here? What happens right here? What's going on right here? Well, current now just wrapped all the way around. It's some small value, like 5,000, 1,000, 500, who knows what it is. Either way, it's a small value, therefore this statement will never execute, therefore the time internally in all the TCP stack will never move forward, therefore I've said that word too many times, but just deal with it.
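
The stall described above can be modeled in a few lines. This is a simplified sketch of the logic as the video narrates it, not the actual xnu source: `SharedTcpClock`, `clock_ms_u32`, and `WRAP_SECONDS` are hypothetical names, and the uint32 cast is imitated with a mask.

```python
UINT32_MASK = 0xFFFF_FFFF


def clock_ms_u32(uptime_seconds):
    """Uptime in milliseconds, truncated to 32 bits like a cast to uint32."""
    return (uptime_seconds * 1000) & UINT32_MASK


class SharedTcpClock:
    """Hypothetical stand-in for the shared timestamp described in the video."""

    def __init__(self):
        self.tcp_now = 0

    def update(self, uptime_seconds):
        current = clock_ms_u32(uptime_seconds)
        if self.tcp_now < current:  # the guard: only ever move forward
            self.tcp_now = current
        return self.tcp_now


WRAP_SECONDS = 4_294_968  # first whole second of uptime past 2**32 ms

clock = SharedTcpClock()
before = clock.update(WRAP_SECONDS - 1)    # 4294967000, just under the limit
after = clock.update(WRAP_SECONDS)         # raw clock wrapped to 704, guard rejects it
later = clock.update(WRAP_SECONDS + 3600)  # an hour on, the stored value is unchanged
print(before, after, later)  # 4294967000 4294967000 4294967000
```

In this model the guard that is meant to keep the clock monotonic is exactly what freezes it: every post-wrap reading is smaller than the stored value, so the update never runs.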
Again, therefore TIME_WAIT can never exceed that 30 seconds. That means that none of the old ports can be freed, which means that once you've made too many connections, after like what, 32,000 connections, you can no longer make any new connections. Any of your currently open connections, of course, they work, but any new TCP connections do not work at all anymore. Oh my gosh, look at this. And it's all because of this uint32. You wanted to save four bytes of data instead of going to a 64-bit number. Causing problems for everybody. In fact, this is effectively the exact same problem as Y2K38. It's the exact same thing. Except for Y2K38, it's the amount of seconds from 1970 up until the number will start rolling over and going back into the past, because they use an i32, so they only get 2.1 billion seconds, and that is about, you know, 2038.
Anyhow, I just thought this was super interesting. This is like such a super cool bug. You should definitely go check it out. The actual article is fantastic. If you can read it, it really goes into some good depth, and I just find this stuff absolutely fascinating. So, if you've ever kept a Mac on for too long, and then everything breaks, well, this is exactly the reason why. The name is the prime machine.
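
To see why a frozen clock exhausts ports, here is a toy simulation of the experiment described earlier. It is a sketch under stated assumptions, not a model of the real stack: every connection is assumed to close immediately and enter TIME_WAIT, the rate of 10 connections per second is invented, and the 32,000-port pool is the rough figure from the video.

```python
TIME_WAIT_MS = 30_000  # hold time quoted in the video
PORT_POOL = 32_000     # rough figure from the video, not a measured limit


def simulate(clock_frozen, connections_per_second=10, seconds=120, pool=PORT_POOL):
    """Return how many ports sit in TIME_WAIT after each simulated second."""
    tcp_now = 0
    waiting = []  # expiry times (ms) of ports currently in TIME_WAIT
    history = []
    for second in range(1, seconds + 1):
        if not clock_frozen:
            tcp_now = second * 1000  # a healthy clock keeps advancing
        # Free every port whose wait has elapsed according to the TCP clock.
        waiting = [expiry for expiry in waiting if expiry > tcp_now]
        for _ in range(connections_per_second):
            if len(waiting) < pool:  # a new connection needs a free port
                waiting.append(tcp_now + TIME_WAIT_MS)
        history.append(len(waiting))
    return history


print(simulate(clock_frozen=False)[-1])           # 300: equilibrium at rate x 30 s
print(simulate(clock_frozen=True)[-1])            # 1200: nothing is ever freed
print(simulate(clock_frozen=True, pool=500)[-1])  # 500: pool exhausted, no new connections
```

With a working clock the count settles at the connection rate times the 30-second hold, which is the equilibrium the experiment expected. With the clock frozen the count only grows until the pool is gone.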
