The end of the GPU era
Discusses how Nvidia became the world’s most valuable company by supplying chips essential for AI workloads and the broader AI ecosystem.
Theo argues the AI hardware race is moving beyond Nvidia GPUs to specialized accelerators like Groq and Cerebras, driven by TSMC’s dominance and bottlenecks in manufacturing.
Summary
Theo breaks down why Nvidia, despite its immense value, could lose hardware dominance as AI workloads shift from generic GPUs to purpose-built accelerators. He explains how TSMC enables Nvidia’s success yet remains the true linchpin of chip performance across the industry. Companies like Groq, Cerebras, and SambaNova are racing to optimize inference, not just training, which could unlock vastly higher tokens-per-second throughput than traditional Nvidia-driven pipelines. OpenRouter’s model-routing approach demonstrates the practical benefits of heterogeneous deployments for large language models. Theo also highlights the challenges: secrecy around chip designs, NDA-heavy fabrication facilities, and the long timelines (5–10 years) for new manufacturing processes. He notes that even Nvidia is funding partnerships (e.g., a roughly $20B deal with Groq) to hedge against a future where GPUs aren’t the obvious answer. The video contrasts Nvidia’s software and CUDA-centric strengths with the growing viability of silicon architectures designed specifically for inference workloads. Finally, Theo expresses cautious optimism for a faster Codex and more usable models, while acknowledging the messy, competitive path ahead for AI hardware.
Key Takeaways
- Nvidia’s market leadership rests on GPU compute, but a wave of inference-specific accelerators (Groq LPUs, Cerebras chips) could redefine where speed gains come from.
- TSMC is the real kingmaker in chip performance; without its manufacturing capability, Nvidia, AMD, and Apple can’t deliver the latest silicon.
- Inference accelerators like Groq deliver up to 360 tokens per second (TPS), versus around 60 TPS on traditional Nvidia GPU hosts, with Cerebras reaching 702 TPS in optimized setups.
- OpenRouter enables flexible routing of LLM traffic, showcasing how model hosting and hardware can be swapped to improve throughput.
- Groq and Cerebras build chips around inference workloads, often requiring new SDKs and model-architecture tuning to match the hardware, unlike GPU-native CUDA workflows.
- Manufacturing timelines are long (5–10 years for process nodes), which explains the slow pace of disruption despite growing demand for AI hardware.
- Nvidia is actively hedging by investing in alternative architectures, signaling that the era of GPUs dominating AI hardware could be challenged over time.
Who Is This For?
Essential viewing for hardware enthusiasts and AI practitioners who want to understand why dedicated inference accelerators are gaining ground and how manufacturing, partnerships, and secrecy influence the pace of hardware disruption.
Notable Quotes
""It's crazy to think that the literal most valuable company in the world might be selling a type of chip that stops being relevant in the next few years, if not even few months.""
—Theo previews the central thesis: GPUs may not stay the default forever as inference accelerators mature.
""The thing that you're actually shipping to a user... those specs, those plans, those tolerances, the SDKs... TSMC isn't shipping a chip to a user.""
—Clarifies the divide between chip design and manufacturing, emphasizing TSMC as the fabricator, not the product maker.
""Grock LPUs... every cycle accounted for. They also integrate the memory on chip... 702 TPS with Cerebras, insane.""
—Highlights the performance gains of specialized accelerators in inference workloads.
""When TSMC wants to spin up new manufacturing, that doesn't happen in months... 5 to 10 years.""
—Underscores the long lead times that shape how quickly disruption can unfold.
""Open Router lets you route your LLM traffic across different places to make it easier to change out what model you're using.""
—Illustrates practical deployment strategies that complement new hardware architectures.
Questions This Video Answers
- How can Groq LPUs beat Nvidia GPUs for AI inference in real-world workloads?
- What role does TSMC play in AI hardware competitiveness and chip shortages?
- Why are specialized accelerators like Cerebras and Groq potentially faster than traditional GPUs for inference?
- What is OpenRouter and how does it help switch between models and hardware backends?
- Will Nvidia maintain dominance or is a multi-vendor inference stack the future?
Full Transcript
Did you know that Nvidia is the most valuable company in the world? I mean, it makes sense, right? All of these huge AI companies are relying on them fully in order to do the AI stuff that we all expect them to do. That's why Nvidia's value is comparable to that of silver. Yes, all of the silver in the world is comparable to the value of Nvidia, just one company that makes chips for gamers. And now those same chips turn out to be really good for AI stuff, which is why everybody is using them. So does Anthropic... oh, Anthropic's moving to Google's TPUs?
Well, that's fine. There's plenty of other companies that aren't doing that, like OpenAI. Oh, OpenAI is partnering with Cerebras, the chip company. Well, at least there's Meta, right? Oh, Google's working to erode Nvidia's software advantage with Meta's help. Google's going to be giving TPUs to Meta. Huh. Very interesting. In fact, things have gotten so crazy that even Nvidia is investing in alternatives to Nvidia, like Groq, who they just paid some crazy number, currently estimated around $20 billion, to license Groq's technology and pull its founder and some other people over to bring what Groq does to Nvidia, because it turns out GPUs might not actually be the best solution for AI going forward.
It's crazy to think that the literal most valuable company in the world might be selling a type of chip that stops being relevant in the next few years, if not even a few months. And I have a lot to say about it. But you know what's likely going to stay valuable? Today's sponsor. One of the biggest hiccups I've seen for big companies adopting AI tools is the change in mindset and flow. A lot of their stuff just isn't built for moving this fast. In particular, their CI. How great is it to file 10 PRs a day if those PRs take 40 minutes each to build?
Be a lot better if they took under a minute, right? That sounds impossible though. Unless you're using today's sponsor, Depot. These guys will make your builds absurdly fast. PostHog got 30 times faster. Their builds went from 138 minutes down to 4 and a half. That's hilariously faster. Even the worst case with Zed still saw a 1.4x speedup. Most saw closer to 3 to 20x though, with Mastodon hitting 19x, from 46 minutes to 2 1/2. I like how PostHog put this best. Around here, we say PostHog ships weirdly fast. And you can't say PostHog ships weirdly fast if you're waiting an hour and 45 minutes for it to ship.
I personally experienced this when we were trying to ship changes on T3 Chat for our wrapped feature and PostHog was shipping stuff for us. They got things shipped during a holiday week in literally 10 minutes, because their builds finished almost instantly, because they moved to Depot. Depot unblocked features for me. That's something I can't say about very many sponsors. It's time to stop wasting time. Fix your builds at soyv.link/depo. So, let's talk a bit about why Nvidia's kind of doomed. In order to understand this, we need to first understand how we got here in the first place.
Why is it that the company that made graphics cards for our gaming PCs suddenly became the most valuable company in the world? I could go way too deep on the weird history of Nvidia as a hardware partner, from how they screwed over Apple to so many other things. I just... I'm a nerd about this. I know way too much. I have been an Nvidia hater for the better part of 15 or so years now because they made my life as a PC builder way harder. They made Apple's life way harder. Nvidia's just been a really bad player in the market for a long, long time.
But Nvidia did have one thing: the best GPUs. It turns out making GPUs is hard. That's why so many companies have been struggling to get into it themselves. Intel started their own GPU division a few years ago to try and catch up with Nvidia and AMD, and they have struggled immensely. AMD tried and failed as well, and instead chose to buy out ATI, which was a company competing pretty closely with Nvidia at the time (the time being like 20 years ago), and which has since been folded fully into AMD as their graphics division.
And those graphics chips are the ones being used in most home consoles today. Things like the Xbox and the PlayStation because both Sony and Microsoft have moved as far away from Nvidia as possible because they've had so many problems with them. And funny enough, the only one still betting on Nvidia for consoles is Nintendo because they inked some crazy private deal in order to use all of Nvidia's failed ARM chips that they were trying to sell for tablets, trying to move those to the Switch in hopes of finding a market for it. And that went better than anyone would have expected.
So Nvidia's only console sales, only gaming sales right now, are to Nintendo of all companies. But for the most part, Nvidia has pissed off all of their partners over the last 20 years, to the point where very few of them still choose to rely heavily on Nvidia in the hardware sales and distribution space. And the reason they get away with it is, again, making GPUs is really, really hard. It's so hard that Nvidia doesn't even do it themselves. So what the hell makes Nvidia so valuable then, if they're not making the GPUs? The company that's actually making the chips and the dies and the silicon that go into Nvidia's GPUs is a company named TSMC, the Taiwan Semiconductor Manufacturing Company.
These guys manufacture the silicon for all of the best chips in the world. There are other companies that make chips, but none of them make chips as small, refined, and powerful as TSMC. That's why Apple relies on them heavily for all of their chips. The reason the M-series Macs and the iPhones and iPads in general are so performant is because Apple made a bet on TSMC early as their silicon manufacturing partner. And more and more companies have had to move over as well, including, crazily enough, companies like Intel that historically have owned their own manufacturing.
TSMC is in my opinion the company that is actually most valuable by far across all of this because without TSMC the performance that we expect from Nvidia, from AMD, from Intel, from Apple, from all these things is no longer possible. This company is what the next world war will probably start around. As crazy as that sounds, whoever controls TSMC kind of controls how powerful chips are for the rest of the world. But TSMC is just taking a blueprint and constructing it. Those blueprints have to come from companies. Those specs, those plans, those tolerances, those expectations, the SDKs and platforms around them, the thing that you're actually shipping to a user.
TSMC isn't shipping a chip to a user. TSMC is taking in a blueprint, printing it onto an impossibly small die, and then sending that to a company like Apple or Nvidia to do what they want to with. And Nvidia has really, really good architecture for doing GPUs through processes being developed by TSMC. Nvidia comes up with crazy ways to do compute with silicon that they then forward to TSMC as a plan that gets manufactured and then given back to Nvidia. And part of the agreement is that TSMC can't keep those blueprints, can't reuse or sell those blueprints or any techniques that Nvidia makes up or discovers, which means a lot of the things that make Nvidia's chips so good at generic crazy math processing is stuff that TSMC knows about but can't reuse or resell.
And all of the things that have leaked are so heavily patented that Nvidia will sue your life out of your soul if you try to copy them. And the reason for this is relatively simple. Nvidia's GPUs are really good at generic compute BS, especially all the fancy vector math stuff and matrix transformations and the things you'll hear about on other YouTube channels that go way deeper into the math. Nvidia's chips were built to handle those types of things because they were built to handle lots of pixels on a screen. If your processor has four cores and you have to process 1920x1080 pixels, that's over 2 million pixels you have to process every frame.
And at 60 frames per second, that's 16 milliseconds of time per frame. Imagine processing over 2 million pixels on 4 to 8 cores in 16 milliseconds. Good luck. This is why GPUs are so powerful. They have thousands of much smaller, dumber cores, effectively, that are built to do this large math transformation stuff, and they can handle these types of workloads much more easily, similar to how they handled things like mining cryptocurrencies. I'm probably gonna date myself a bunch here, but when I was in high school, my favorite thing about having a nice GPU is that it was a free source of income, because I could use it to mine Bitcoin. Bitcoin mining was a complex enough math problem that a powerful GPU could solve it, and you would make some money as you powered the network.
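To put numbers on that pixel example from a moment ago, here's a back-of-the-envelope sketch. The per-pixel cost is a made-up illustrative figure, not a measurement; the point is how quickly a handful of cores blows the 16 ms frame budget while thousands of small cores don't.

```ts
// Back-of-the-envelope: why a handful of CPU cores can't keep up with
// per-pixel work, and why thousands of small GPU-style cores can.
const width = 1920;
const height = 1080;
const pixels = width * height;    // 2,073,600 pixels per frame
const frameBudgetMs = 1000 / 60;  // ~16.67 ms per frame at 60 fps

// Hypothetical cost of one pixel's worth of shading math (illustrative only).
const nsPerPixel = 50;

// Idealized: the work splits perfectly across cores with zero overhead.
function frameTimeMs(cores: number): number {
  return (pixels * nsPerPixel) / cores / 1_000_000;
}

for (const cores of [4, 8, 4096]) {
  const ms = frameTimeMs(cores);
  const verdict = ms <= frameBudgetMs ? "fits the frame budget" : "blows the frame budget";
  console.log(`${cores} cores: ${ms.toFixed(3)} ms (${verdict})`);
}
// Prints roughly:
// 4 cores: 25.920 ms (blows the frame budget)
// 8 cores: 12.960 ms (fits the frame budget)
// 4096 cores: 0.025 ms (fits the frame budget)
```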
GPU mining was so powerful and so useful that people started looking for ways to optimize the math, because the GPU is really good at these types of generic problems. But if you're doing the exact same thing over and over again, the GPU being so generic stops being as beneficial. And this is why ASICs started to become popular. An ASIC, as Google summarizes for us here (a summary that probably ran on an ASIC, funny enough), is an application-specific integrated circuit. They were commonly used for Bitcoin specifically because the Bitcoin math could be optimized further if you made a purpose-built chip to do just that one piece of math.
And very quickly, ASICs took over the Bitcoin mining world, to the point where if you were using a GPU, you were losing money, because your power bill was greater than what you would make back versus an ASIC miner, which was so much more optimized that it could mine more blocks faster while using less power. This is the thing we are here to talk about. Not Bitcoin ASICs, but the idea of application-specific integration in general on chips. Cerebras is the most prominent company in this space. The space we are talking about here is accelerator hardware: chips that can be given these workloads, actually perform them, and generate results.
A chip that can traverse this gigantic pile of parameters, often hundreds of gigs large, which is the model itself; that can pull in the pile of text the user generated, or your chat history, or whatever; and that can combine those two things to figure out which token is most likely to come next. It uses all of the parameters in its giant map, plus the math it calculates from the text you put in, to point toward which token is most likely to come next. It's this giant web of vectors pointing to and from each other that is hard math to calculate, but it turns out you can optimize chips to be way better at it.
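As a rough illustration of the math being described, here's a toy version of that next-token step. The vocabulary, embeddings, and context vector are all invented toy values; a real model computes the context vector through many layers and scores a vocabulary of 100k+ tokens against billions of parameters, which is exactly the workload these accelerators target.

```ts
// Minimal sketch of the "giant web of vectors" step: score each candidate
// token against the current context vector, then pick the most likely one.

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract max for numerical stability
  const exps = logits.map((l) => Math.exp(l - max));
  const total = exps.reduce((s, e) => s + e, 0);
  return exps.map((e) => e / total);
}

// Toy vocabulary: each token has a learned embedding (part of "the model").
const vocab = ["cat", "dog", "pizza"];
const tokenEmbeddings = [
  [0.9, 0.1, -0.3],
  [0.8, 0.2, -0.1],
  [-0.5, 0.7, 0.9],
];

// Context vector computed from the user's text (in reality: many layers of math).
const contextVector = [1.0, 0.0, -0.2];

const logits = tokenEmbeddings.map((emb) => dot(contextVector, emb));
const probs = softmax(logits);
const next = vocab[probs.indexOf(Math.max(...probs))];
console.log(next, probs.map((p) => p.toFixed(3))); // "cat" wins in this toy setup
```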
GPUs are better at this than almost anything else that exists in the traditional world. And since a lot of workloads for training tend to need more flexibility and general capability, GPUs are still the chip of choice for training. But when you actually want to run the model once it's done being baked, those same chips are nowhere near as efficient as tailor-made solutions. The other interesting thing here is that the companies you think of most immediately when you think of AI, you know, the OpenAIs and the Anthropics and the Metas of the world, don't make their own accelerator hardware.
They're all just using chips they buy from other companies, mostly from Nvidia. But there are some investments happening in accelerator hardware. Most notably Google, which is the only company that covers everything from the apps you use with AI, to the models, to the place that hosts the models, to the hardware the models run on. Google's one of the only companies that's competing in all of those spaces and finding any success with it at all. Even Nvidia doesn't really let you rent GPUs from the cloud. They bought a company that does it. They barely maintain it.
If you want to use a bunch of Nvidia GPUs, you better go buy them from Nvidia or find somebody else who already has. So Nvidia has historically been the default option that all these other companies rely on when they're trying to do inference, training, and everything else. But that is a wedge that other companies noticed existed. Companies like Groq, Cerebras, and SambaNova. All three of these companies know that this can be optimized better, especially the actual inference side, based on what they've seen in the past with things like, you know, Bitcoin ASICs. There's obviously opportunity here to make things that are faster for doing inference.
If you want proof of this, look no further than OpenRouter. If you're not familiar, OpenRouter lets you route your LLM traffic across different places to make it easier to change out what model you're using, what platform is hosting the model, things like that. So it's really easy to take an open-weight model like gpt-oss-120b and change where you're actually running it, because you just change the string, or let them do it for you. Companies like DeepInfra, which are using traditional Nvidia GPUs, can run this model at around 60 tokens per second. On my laptop, on my MacBook, I could run the same model at 80 tokens per second.
But if you scroll down to a company like Groq, which again makes their own chips for this purpose, you go from 60 to 80 TPS up to 360 tokens per second, or with Cerebras all the way up to 702 TPS. That's roughly a 10x difference. That means it's running the model about 10 times faster. Insane. There are models where Cerebras can pull 3,000 TPS if the optimizations are right and the model is built to work well with the architecture of their chips. One of the most common complaints with OpenAI models right now, especially models like GPT-5.2 Codex, is that they are really smart, but they're really slow and just not as pleasant to use as a result.
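For a sense of what that string swap looks like in practice, here's a hedged sketch against OpenRouter's OpenAI-compatible chat completions endpoint. The model id and the provider slugs ("deepinfra", "groq", "cerebras") follow OpenRouter's naming conventions as best I can tell, and the `provider.order` routing field should be verified against their current docs before relying on it.

```ts
// Sketch: pinning the same open-weight model to different hardware backends
// via OpenRouter. Provider slugs and the provider.order field are assumptions
// based on OpenRouter's provider-routing feature; verify before use.

async function chat(providerOrder: string[]): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-oss-120b",        // same open-weight model...
      provider: { order: providerOrder },  // ...different silicon underneath
      messages: [{ role: "user", content: "Explain LPUs in one sentence." }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// The swap described above: GPU-backed hosting vs. purpose-built inference chips.
console.log(await chat(["deepinfra"]));        // traditional Nvidia GPUs, ~60 TPS
console.log(await chat(["groq", "cerebras"])); // inference accelerators, 360-700+ TPS
```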
That slowness is why the OpenAI and Cerebras partnership is really exciting, and why the first thing Sam Altman had to say about it was "very fast Codex coming." Yeah, this is a bet to let them make the models way faster and also, theoretically speaking, free up some of their Nvidia GPUs that were being used for this inference, so they could use those GPUs for training instead. Every GPU you're using for training is a GPU you can't use for inference. So if they move inference off GPUs, they have more of them available for doing training work. And when you look into a company like Cerebras, you see just how hard they're working to make all of this happen.
They make gigantic chips, on one hand because they want to put more things on them, but also because they don't have access to the same manufacturing from TSMC that Nvidia does, so their actual die sizes have to be larger as well. That said, their wafers can do crazy inference. It's wild to look at this and see how gigantic the chips are. The problem with manufacturing chips this big is that having all of the pieces you put into them work is likely not going to happen. The bigger the chip is and the more dies you put on it, the higher the chance that some of those are failing, which is one of the big areas where they've had to innovate a lot.
How do you get failure rates low enough that you can actually make a chip this big without a bunch of the dies being dead on arrival? Also, notice how little information they're putting on the page here. That is a choice. These companies are all hiding absolutely everything that they are doing. Cerebras, Groq, SambaNova, Nvidia, AMD, Intel, none of these companies are sharing any details about how they make these things. This is the type of information where you need to sign 15 NDAs, promise your firstborn child, and then wear a top-to-bottom gown that is static-proof and go through a room to be sanitized before you're even allowed inside of the facility.
The amount of secrecy and privacy around these things makes Apple look like an open-source company. It's kind of nuts, but there's a reason for it. There's a very expensive reason for it. Nvidia is worth $4.5 trillion. If the secrets that make their chips so good become public information, Nvidia is no longer worth $4.5 trillion. And Nvidia knows this and is scared of this, which is why they spent 20 billion of those 4.5 trillion to do a partnership with one of these chip companies, Groq. Not Grok with a K. Groq with a Q. Groq with a Q is one of the companies building these chips.
They refer to their chips as LPUs. These chips are built for doing inference really, really performantly, just like the other ones we're talking about. No wasted operations, no unpredictable delays, every cycle accounted for. They also integrate the memory on chip, because then things end up being way faster and you can squeeze way more RAM on to fit bigger models. They're power efficient. They're on these gigantic racks that they put wherever they can in the world. They also often have to build their own SDKs, because remember, you don't get access to cool things like CUDA, the standard for building GPU-optimized work, when you have a chip that's not a GPU.
So not only are these companies rethinking the actual chips that we're running everything on, they also have to rethink the SDKs and the software that we're building on top of them. And often this is a back and forth, where a company like Groq has to deeply analyze how a model like Llama works, what parts of the GPU it's hitting, and then take those things and optimize them to hell and back to squeeze into a custom chip. And that back and forth is crazy. Sometimes a new model drops and it doesn't fit well on a Groq chip.
And now they have to go back to the drawing board and make modifications both to the model and to the chip to try and make them mesh better. And since everything's still trained on Nvidia, everything's still largely built to be run on Nvidia. So yeah, it turns out when you make a chip for a specific purpose, you can outperform chips that are made for more generic work. And the only way Nvidia's current architecture would keep them as number one is if it turns out by some weird divine intervention that GPU architecture is just magically the exact architecture that makes the most sense for doing AI work.
In order to beat Nvidia, you have to replace all of the things that make Nvidia great. You have to have all of the things that they do in a generic way accessible in other places. You have to handle the fact that everything's built around CUDA and crazy tools around it just aren't going to work on these other things yet. You have to be prepared to fight on all of those levels. But there's a lot of companies that are prepared for that fight and are pushing hard. But it's important to know how long that takes as well.
When TSMC wants to spin up new manufacturing, that doesn't happen in months. That doesn't happen in a year. It takes five plus years for TSMC to say, "Okay, we have this process. We want to implement it. We're going to build a new factory for it." 5 to 10 years. That's also why the chip shortages are happening now because the demand went up. And it takes 5 to 10 years for the manufacturing to catch up. That's also why we're starting to see these companies like Grock and Cerebras finally really getting competitive because it's been about 5 to 10 years since they started.
But in the end, margins always win. And the margins Nvidia has right now are far too high when their chips aren't as optimized as they could be. Right now, we're putting a lot of money into training and inference, but over time, if the AI bubble continues to grow, we'll have way, way more money and time going into the inference side, and buying Nvidia chips for inference will stop making sense very quickly with basic economies of scale. But until then, I suspect, and again, this is not financial advice, I would expect Nvidia to stay pretty close to the top for a while as the market slowly realizes that TSMC is the company actually producing the value.
And more importantly, the novelty that put Nvidia so high up here is something they aren't necessarily the best solution for. I think I've said all I have to here. It's always fun to rage at Nvidia as a lifetime gamer with a lot of opinions about the company, but in the end, I'm just excited to see competition happening and for inference to get faster. I want these models to run as fast as we can possibly have them, because that makes them easier to use and more powerful in our day-to-day work. I'm excited to run things like Codex at 3,000 tokens per second instead of the measly 30 to 40 that we get today.
Let me know what you guys think. And until next time, peace nerds.