What Happens After A 1,000,000x AI Compute Leap? | Jeff Dean

Two Minute Papers | 00:28:35 | Jun 1, 2026

The host introduces Jeff Dean's career highlights at Google and the lore around him as the "Chuck Norris of computer science," and sets out the goal of the interview: to probe questions that even Dean may not have solved and to share the insights with fellow scholars.

Two Minute Papers chats with Jeff Dean to explore AI compute scales, hardware shifts, and the frontier of continual learning and model distillation.

Summary

Two Minute Papers sits down with Jeff Dean, Google's chief scientist, who led Google Brain and co-built TensorFlow, to unpack what a 1,000,000x leap in AI compute could actually enable. Dean discusses the shifting balance between training and inference in data centers: inference is a growing share of machine-learning compute, so hardware is increasingly specialized for low precision and energy efficiency. He cites the TPU 8i and 8T chips as examples of that specialization, and notes that generated data only helps when it is filtered (does the code compile, does it pass the unit tests) and enriched through augmentation. The conversation covers data availability, synthetic data, and the idea that multiple passes over existing data, along with better algorithms, can yield more capable models without unlimited new data. They also discuss model efficiency through distillation, cheaper attention, and cascaded retrieval that gives the illusion of a far larger context window. Dean shares practical insights into reliability at Google scale, including how ECC memory and software checksums mitigate the inevitable hardware failures in vast data centers. The dialogue closes with reflections on the next decade, where continual learning, safety testing before each release, and ambitious engineering tasks (such as autonomously written operating systems or new hardware designs) could redefine what "AI capability" means. Throughout, the host ties these ideas to real-world implications, from health care to operating systems, with curiosity and humility about what we still don't know.
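The software-checksum idea mentioned in the summary can be illustrated with a small sketch. This is not Google's system; it only assumes a record format (a 4-byte CRC32 prefix) and helper names (`encode_record`, `decode_record`, `flip_bit`, `load_valid`) invented here for illustration. It shows the behavior described in the interview: a corrupted record is detected and simply skipped.

```python
import zlib


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte big-endian CRC32 of its contents."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def decode_record(blob: bytes):
    """Return the payload if its stored checksum matches, otherwise None."""
    if len(blob) < 4:
        return None
    stored = int.from_bytes(blob[:4], "big")
    payload = blob[4:]
    return payload if zlib.crc32(payload) == stored else None


def flip_bit(blob: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit memory error at the given bit position."""
    data = bytearray(blob)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


def load_valid(blobs):
    """Keep records that verify; count and skip the ones that do not."""
    good, skipped = [], 0
    for blob in blobs:
        payload = decode_record(blob)
        if payload is None:
            skipped += 1
        else:
            good.append(payload)
    return good, skipped
```

Skipping a bad record is only acceptable when losing one record is harmless, as in the web-crawl example from the interview; other workloads would need to re-fetch or repair the data instead.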

Key Takeaways

Inference is a growing share of machine-learning compute in data centers, driving hardware specialization for low-precision, high-throughput workloads.
TPU 8i/8T exemplify recent strides in energy-efficient, inference-optimized AI hardware.
FP4 and even lower-precision formats can yield high-quality results when every block of weights (for example 64, 128, or 256) shares a higher-precision scaling factor.
Data quality and augmentation remain critical; synthetic data and multi-pass training can substantially boost model capability.
Continual learning remains an unsolved problem; interleaving passive learning from data with learning from actions could unlock new capabilities.
Distillation remains a cornerstone for making frontier models usable at smaller scales (Flash vs. Pro models).
Retrieval and cascading context strategies could dramatically expand effective context windows without quadratic attention costs.
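The cascading idea in the last takeaway can be sketched as a small pipeline: a very cheap filter over the whole corpus, a somewhat more careful reranker over the survivors, and a final step that packs the best documents into a limited context budget. Everything here is an assumption for illustration: the scoring functions are simple bag-of-words stand-ins for real retrieval and a "lighter-weight model", and the parameters `k1`, `k2`, and `token_budget` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer; a stand-in for a real tokenizer."""
    return text.lower().split()


def cheap_score(query_terms, doc):
    """Stage 1: number of distinct query terms that appear in the document."""
    return len(query_terms & set(tokenize(doc)))


def rerank_score(query, doc):
    """Stage 2: cosine similarity of word-count vectors (stand-in for a small model)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def cascade(query, corpus, k1=3, k2=2, token_budget=50):
    """Narrow the corpus in two stages, then fill the context up to the budget."""
    query_terms = set(tokenize(query))
    stage1 = sorted(corpus, key=lambda d: cheap_score(query_terms, d), reverse=True)[:k1]
    stage2 = sorted(stage1, key=lambda d: rerank_score(query, d), reverse=True)[:k2]
    context, used = [], 0
    for doc in stage2:
        n = len(tokenize(doc))
        if used + n > token_budget:
            break  # the expensive context window is full
        context.append(doc)
        used += n
    return context
```

The design point is that each stage is more expensive per document than the one before it, so each stage must see far fewer documents; only the final handful reaches the large model's context window.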

Who Is This For?

Engineers and researchers building or integrating large-scale AI systems who want concrete takeaways on hardware design, data strategy, and training vs. inference trade-offs.

Notable Quotes

"What did you mean by there being still plenty of data out there? There’s lots of interesting video data that we’re not really training on yet."

—Dean explains that data diversity, including video, can fuel progress beyond text data caps.

"More compute will generate you more interesting solutions and then those can then be put into the training data... augmentation can be like translating Python code to other languages."

—Highlights how compute enables richer data augmentation and cross-language transfer for training.
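The filtering step that makes this work (generate many candidate solutions, discard those that do not compile, discard those that fail the unit tests, then rank the rest) can be sketched as follows. This is a minimal illustration, not the pipeline used at Google: the function names are invented here, "shorter source" is an arbitrary stand-in for a real reward, and a real system would run candidates in a sandbox rather than with `exec`.

```python
def compiles(source):
    """First filter: does the candidate even parse as Python?"""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def passes_tests(source, func_name, cases):
    """Second filter: run the candidate and check it against unit-test cases.

    NOTE: exec on generated code is unsafe; a real system would sandbox this.
    """
    namespace = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        return False


def select_rollouts(candidates, func_name, cases):
    """Apply both filters, then rank survivors by a stand-in reward (shorter first)."""
    survivors = [c for c in candidates if compiles(c)]
    survivors = [c for c in survivors if passes_tests(c, func_name, cases)]
    return sorted(survivors, key=len)
```

The surviving solutions are the ones that could be added back to the training data, or translated into another language as a further augmentation step.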

"You’ll see a lot more specialization in hardware for inference workloads... lower precision, large volume of requests."

—Describes why inference-focused hardware like TPUs is essential.
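One way very low precision is made workable, raised later in the interview, is to share a higher-precision scaling factor across each block of low-bit weights. The sketch below uses signed 4-bit integers rather than FP4, purely to keep the arithmetic simple; the function names, the symmetric rounding scheme, and the 16-bit scale size are assumptions for illustration, not a description of any specific chip or format.

```python
def quantize_blocks(weights, block_size=64, bits=4):
    """Quantize weights to signed integers with one float scale per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed codes
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        codes = [max(-qmax, min(qmax, round(w / scale))) for w in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blocks(blocks):
    """Reconstruct approximate weights from (scale, codes) pairs."""
    return [scale * c for scale, codes in blocks for c in codes]


def bits_per_weight(bits, block_size, scale_bits=16):
    """Effective storage cost once the shared scale is amortized over a block."""
    return bits + scale_bits / block_size
```

Smaller blocks track the local range of the weights more closely but spend more bits on scales: under these assumptions a block of 64 costs 4.25 bits per weight, which is the trade-off behind the question of whether to store a scale every 64, 128, or 256 weights.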

"Two big things: can we build continual learning in an interleaved way, and how do we test and safety-check live models before release?"

—Touches on the practical challenges of continual learning and safety for deployed models.

"Distillation has always been an amazing way to get really capable models into a smaller footprint..."

—Points to distillation as a key technique for scalable model deployment.
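For readers unfamiliar with distillation, here is a minimal sketch of the classic soft-target formulation: the student is trained to match the teacher's temperature-softened output distribution. The interview does not say how Google distills its models, so this is only the textbook version, and the function names and default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

A higher temperature exposes more of the teacher's relative preferences among wrong answers, which is part of why a small student can learn more from a teacher than from hard labels alone.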

Questions This Video Answers

How will TPU 8i/8T change the economics of running large AI models in production?
What are the practical challenges of continual learning for production AI systems?
Can ultra-low-precision formats like FP4 really maintain model quality in real-world tasks?
How does retrieval-augmented generation alter the limits of current context windows?
What is the role of data augmentation and synthetic data in scaling AI beyond existing training data?
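As a rough sense of scale for the context-window question above, the arithmetic below counts query-key score entries for full self-attention. The 100 billion token figure is Dean's estimate for the Google codebase; the 1 million token window is an assumed comparison point, not a number from the interview, and the count is per layer and per attention head.

```python
def attention_pairs(n_tokens: int) -> int:
    """Query-key score entries for full self-attention over n tokens."""
    return n_tokens * n_tokens


codebase_tokens = 100_000_000_000  # "100 billion tokens" from the conversation
window_tokens = 1_000_000          # assumed window size, for illustration only

full = attention_pairs(codebase_tokens)   # 10**22 entries
windowed = attention_pairs(window_tokens)  # 10**12 entries
ratio = full // windowed                   # 10**10 times more work
```

Growing the window by a factor of 100,000 multiplies the quadratic term by ten billion, which is why retrieval cascades and sub-quadratic attention variants are attractive.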

Jeff Dean, Two Minute Papers, Google Brain, TensorFlow, TPU 8i, TPU 8T, FP4, inference hardware, data center reliability, ECC memory, continual learning, distillation, retrieval augmented generation, context window optimization

Full Transcript

There used to be a chat group internally called Data Centers on Fire that would have, like, exciting events happening.
A distant supernova goes off, a cosmic ray hits a memory cell, and a zero flips to a one. Does that really happen? Oh yeah.
So my question is, do you enjoy these Chuck Norris style jokes about you? It could be true.
One problem that you tried to solve many times but have never been able to crack?
I cannot believe that this is happening, but I got to talk to a legendary engineer, the chief scientist of Google, Jeff Dean. He led Google Brain, one of the most legendary AI labs in history. He co-created MapReduce, which taught thousands of computers to work together as one. He co-built TensorFlow, the engine behind a huge chunk of AI research. And for all this, they call him the Chuck Norris of computer science. Yes, I will tell him a joke about that too. Now, when I see interviews with these executives, everyone is asking about China and taxes and all that. Look, I know nothing about that. I am just a student who loves to talk about research. So my goal was to try to go a bit deeper and ask him questions that maybe only he knows the answer to, which is incredible. I'll also ask him about problems that even he couldn't solve yet. And I will ask him about some of the secret sauce at Google and see if we get something, and more. And I am so happy to share it with you, fellow scholars, so we can learn together. I am not sure if I saw Jeff smile and laugh this much before, so I hope he enjoyed it too. And once again, this is an incredible honor. I cannot believe that I was sitting there. There were some production issues with the video part. I apologize for those. Also, I was super nervous. I could barely hold on to my papers. Now, fellow scholars, let's learn together with Jeff Dean.
Thank you so much for doing this, Jeff. We talked a bit last year and I learned so much from you. It was incredible.
And then I got a message that we get to do this, and I was so happy. So thank you so much for this, and we get to share your knowledge, a small part of your knowledge, with the fellow scholars.
Absolutely. It was great chatting with you last year. I'm looking forward to this.
Thank you. So everyone says that we are running out of training data for LLMs, but you said that there is still plenty of data out there. What did you mean?
Yeah, I mean, I think everyone has this view that we're running out of training data, and it's true we've used quite a lot of the public text data in the world. But I think there's lots of interesting video data that we're not really training on yet. There's lots of interesting ways to generate synthetic data and then use that for training. And then I also think we can start doing things like making more passes over the data that we do have to make more and more capable models, and also come up with algorithmic techniques that enable us to get a lot more information from every piece of data that we do have. So I'm not too worried about that as an impediment to making progress. It seems like there's lots and lots of things we can do.
People also say that with so much simulation data, as you mentioned, sooner or later most of the data will be AI generated, which is then used to train a different AI, and then suddenly everyone starts to learn on the same thing. But you said, wait, it still helps. I think the argument was that if you have enough compute, you can crunch through a lot of data, and if there is just a little needle in the haystack that's useful, the system is able to learn from it. Is that true? Because in my previous crappy little experiment, it was not true at all. So you had to be very careful with the data.
Yeah, I mean, I think it is true in general. There's a lot of details to get right to make this a reality.
Think about, for example, doing RL training and rollouts to figure out how to solve some fairly high-level phrased coding question, right? So you might explore a hundred or a thousand different ways of generating solutions to these problems, and you might have some filters that you apply to these things. Like, does the code even compile? Well, you can throw out 800 of them right off the bat. Does it pass the unit tests? Does it perform well? And so you can really start to hone in on which of these potentially many solutions to the problem is the one that actually generates the highest characteristics that you're looking for, the reward in some sense. And that, I think, is definitely true: more compute will generate you more interesting solutions, and then those can be put into the training data. They can be enriched with data augmentation techniques. You know, I generated the solution in Python; now I could generate a solution in Go and have more Go programming language training data.
That's an incredible kind of augmentation. Augmentation before, with convolutional neural networks, was just shift the image by a couple of pixels and whatnot, and here the augmentation can be a completely different programming language.
Yeah, I mean, I think a lot of times we think about coding-based problems as you go from natural language, which is often very underspecified. It's like, you know, make me a cool Space Invaders game or something. But actually, if you have a program that already works, that does what you want, and you want to translate it, that's awesome, because in effect your prompt is the fully specified behavior of the system you want, and you just want it in a different language for whatever reason. Maybe better performance or better safety characteristics or whatever.
So that we've seen internally with some tools that have been written in Python, and people have been able to just say, please use all the tests for this code and the actual Python codebase and make different versions of it, and found much faster solutions.
So you can suddenly get so much more out of the same amount of data, basically.
Yeah.
So that's why you're not worried about the data. Okay. Nice. Now, Bill Dally has said that something like 90% of what happens in modern data centers is not training anymore, which I found really surprising. It's inference. Like, there's less training and more using, relatively speaking. How does that shift the way you design hardware at Google?
Yeah, I mean, first, there's a lot of other things that are neither inference nor training that happen in data centers, like all the applications we run, and search and Gmail and so on. But of the machine learning workloads, it is the case that training is becoming a smaller proportion of the overall compute that we want to do, because there's so much inference workload you want to do. And the inference workload includes both offline inference, sort of RL rollouts during RL training, and then also online inference for handling user requests or agent-based behavior. Because of that shift and the different characteristics of those two kinds of computations, it makes a ton more sense to now specialize much more for inference workloads in hardware, for example, because the characteristics are quite different. You need lower precision. You are handling a very large volume of requests on this particular model. The model weights don't necessarily change at inference time. All these things lead to very different solutions for hardware, and much more energy efficiency can be gained by specializing. And so I think you'll see a lot more in this area now and in the future.
We've already done this with our TPU 8i and 8T chips that we announced maybe a month ago. But you'll see even more specialization, I think.
And that's pretty crazy that you said that even FP4 kind of works. When I first heard it, I was like, it cannot possibly work. It cannot do anything useful. And it does.
Yeah. If you told that to a computer scientist from 15 years ago, they'd be like, that's not enough numbers.
Yeah, exactly. And I look every now and then at these papers, and you have these different transforms, the distance-preserving transforms, rotations between the points, and all kinds of compression. But still, FP4, that's unbelievable. It's not many bits for exponent or mantissa or sign, and it's high-quality intelligence that comes out of it. So it's a good sign that it works.
Yeah.
But I don't know if we can get lower. What do you think? Is even lower possible?
I mean, I think people are seeing and experimenting with things where you have some even lower precision, and then every so many weights of that lower precision you have a scaling factor, and that seems like you get a little bit of a higher-precision thing that's shared across all the other lower-bit-precision formats, whatever they might be: two-bit integer, one-bit integer. I haven't heard anyone say two-bit float, because I'm not sure what that would mean. But that plus a scaling factor seems to be able to get you pretty far. And the question is, how often do you need the scaling factor? Is it every 64 or 128 or 256 weights?
Pre- and post-training are typically separate steps today. Do you see that split holding, or do you expect the two to merge as capabilities increase?
Yeah, I mean, I feel like it's a little intellectually dissatisfying that they are these distinct phases, and you do one and then you do the other.
Like, conceptually, the right thing to do is to have interleaved periods where you're sort of observing data, and then periods where you're trying to use that new knowledge you've gotten from the data.
Like with DQN, this experience replay kind of thing.
Yeah. And then you want to now take actions in some environment. Maybe it's a simulated environment, maybe it's the world with a robot, or whatever it is. And then learn from those actions, because I think you get a lot more benefit from actually taking actions and observing the consequences, or trying to write code and seeing whether the code works, than you do from just passively sitting there and seeing tokens stream by you, which is really what most of pre-training is these days.
It's really interesting that you say that, in an interleaved manner, because when I hear merging the two, what comes to my mind is continuous learning. But at the same time, people have to test models. You cannot just chuck it out there. You finish training, you finish the post-training, and then maybe the red teaming steps and safety and everything, and then you package it up and you say, okay, this is good to go. But if there's continuous learning, then there are new challenges, because how do you know that this intermediate state is actually safe? Maybe some more research there too.
Yeah, I mean, I think, first, a bunch of discrete steps, where maybe you do this a hundred times or a thousand times, starts to look more like an integral than a summation. And so I do think interleaving in that way will make sense. But you're right, you have a bunch of things you need to do for a live model that is serving user requests. You need to make sure that it's safe.
So it may be that the continual learning happens, and then there's some application of safety protocols and red teaming, as you say, and then you release a new version of that. But then that model still continues to learn behind the scenes, and then before the newest version of it is provided to users, you redo the final safety testing and red teaming.
Jensen likes to say that compute capabilities advanced 1,000,000x over the last 10 years. So if in the next 10 years we get another 1,000,000x, what would we be able to do that we cannot do now?
Yeah, I mean, imagining the future is always a hard thing because this field is moving quickly. If you think back 10 years, we were kind of just starting to have language models. The sequence-to-sequence paper had appeared. It was just before the Transformer. LSTMs were popular. And now those models look fairly ancient and not nearly as capable as the models we have today. So I think if you project forward that level of advancement, you're going to see huge investments in both new kinds of hardware and new kinds of research techniques. There's just a lot more attention being paid to the field. So I see that rate of progress not slowing down over the next 10 years. And so that's going to be incredible. Like the multi-agent workflows we're now able to start to get to work on very complicated tasks, like you saw in the I/O keynote, being able to write an operating system autonomously with a relatively simple prompt.
Crazy.
You know, obviously there's a lot of operating-system-like things in the training data, so it's not completely out of distribution. But the fact that it's able to build an OS that can run Doom successfully is pretty amazing.
I couldn't believe it. I mean, last year I heard a talk from Stephen Balaban, the Lambda CEO, and he had this neural OS, like, hey, it does more and more, forget the UI, forget maybe the drivers, I don't know, but let's just have a neural OS. And I was like, yeah, that sounds like an amazing science fiction idea. I would love to see it, but I don't know, it sounds far off. A year later, and we got, you know, not exactly that, I know, but if you look at the derivatives over time...
I mean, I would say one thing I'm particularly excited about is, can we with these tools accomplish so much more in science, as Demis was mentioning in the keynote, or in complicated engineering tasks that often would take lots and lots of people multiple years to accomplish? Could you actually have a system that, with the correct access to the right kinds of simulation environments and a learning set of agents that are trying to accomplish the task and break it down into smaller tasks, could design an airplane in five days instead of many, many years? That would be amazing.
1,000,000x, and we can try again.
Yeah, I mean, we're not there yet, but that would be a pretty amazing capability. Or designing new computer chips or computer systems, new hardware. I'm pretty excited about that.
Yeah, incredible times. Are open models standing on the shoulders of giants? And by that I mean, if frontier models suddenly stopped being released, would open models improve as quickly as they do now, or is their progress mostly driven by distillation?
Yeah, I mean, I think certainly a bunch of the progress is driven by distillation. For example, our own Gemma models are definitely distilled from higher-quality, larger-scale models. And I think a lot of other open source models are getting benefit from distillation data. Distillation has always been an amazing way to get really capable models into a smaller footprint, and that's how our Flash models are quite capable for their size relative to the Pro models: we're able to use the Pro model to teach the Flash models. So I think really the question is not so much one of closed versus open. It's, if we want small, incredibly capable models, we have to keep building larger-scale models that are maybe less inference efficient but are more capable, and then use distillation to transfer the knowledge into the smaller models, whether they are open or closed.
Now I'm wondering, you might be the only one who can answer that, so I really want to ask this. Everyone has their flagship models and, yes, the distilled models. Pretty much every company does this tiered thing. The quicker, faster models always were well below the frontier models, and at some point, I think 3.1, there was one version where the quick one was suddenly so close to the frontier one. There was like a 3% difference in tough benchmarks. And I just heard someone saying, I don't even know who that was, that it's not just distillation, there is some magic sauce in there that's been in the works for years. So can I hear a bit about that?
Sure. Well, not too much. I mean, there is always some magic sauce that we don't reveal, but distillation is definitely one of the key things that makes those much smaller, much cheaper, much faster, much more affordable models be nearly as good as those frontier models.
And then we push ahead and build an even better frontier model. And then we have to do the process again, where we now transfer the knowledge in the really capable frontier model back into a lighter-weight one. And I think this is really important, because the Flash models are really the workhorse of what people generally want to use, because they're almost as capable.
We saw it. Yeah.
And they're quite good.
Yeah. It's unbelievable how close they can get. It didn't used to be like that at all. All right. What trends in machine learning are you most excited about right now? You have a separate talk about exciting trends in machine learning or something like that. What's the newer version of that?
Yeah, the newer version, I guess. I mean, there's a few different trends that I think are really exciting. So first, I think continual learning is still a little bit nascent, but looking at ways to make models that are more interleaved in their use of seeing data passively and taking action and learning from that seems like a really important thing. Agents and multi-agent use of these systems is really, really important. As one trend of that, though, I think we're going to need a lot more inference hardware and capability, because those systems that are working autonomously in the background actually consume lots of tokens in order to do the important work they've been asked to do. I think being able to build really efficient inference hardware will enable a lot of things.
So looking at co-design of model architectures and hardware architectures to make the best use of things and have really good properties in terms of very low latency, much higher performance per watt, performance per dollar, are things we really care about. The context window of these models is an important characteristic, but I think there's a lot we could do if we come up with mechanisms that are a sort of cascaded series of things that give you the illusion that you have all information in the context window. Like, you'd like to have the whole internet at your model's fingertips, or on a personal level, if you've opted in, all of your email and your photos and the videos you've watched and things like that. But you can't really do it with the quadratic attention mechanism. But I think you can build a series of retrieval and lighter-weight mechanisms, and then ways of cascading: here are the 30,000 documents out of 10 billion that seem most relevant, and then have a lighter-weight model that looks at those and decides these 117 things seem really relevant to what you're trying to do, and puts those in the more expensive context window of a bigger model, perhaps. That's going to be kind of exciting. And how do you orchestrate and interleave all that stuff so it gives you the illusion without you having to even think about it?
Interesting. So there are very advanced games to be played with the context window, because obviously it's very expensive. With the attention mechanism you get big O of n squared. Are we still there, or do we have some... I mean, I've heard of some n log n things. Can we go lower? There's like a whole series. Obviously we can go lower, but the question is what the trade-offs are, right? Like, what do you have to pay for that? Yep. Where are we in that?
Yeah, I mean, I think there's actually quite a large body of work there, probably a hundred papers on more efficient context algorithms than the n squared one. I mean, the n squared one works really well, so it has a pretty high bar. But I do think there is traction in finding things that are much lower cost, whether it's reducing algorithmic factors or very large constant factors on the base n squared algorithm. I think all of these are pretty exciting. You can actually combine many of these approaches and get much cheaper attention over many more tokens.
Yeah, I think that's one of the most important things, because if it was cheaper in some sense and you could still find the needles in the haystack over very long contexts, then you could have some sort of lifetime AI thing.
Yeah, totally. Like, I'd like my whole life of all the digital things I've seen in there. As, say, an internal Google developer, I'd love for the entire Google codebase to be in there, which is probably 10 billion lines of code, probably 100 billion tokens. I just want my wish list. I just want 100 billion. All I want is 100 billion tokens of attention. It's all I need.
Amazing. I think we got to do this one. So, Google's data centers run an enormous number of machines, and at that scale, anything that can go wrong will go wrong. Like, I hear that wires wear down, hard drives fall apart, motherboards overheat. Is that something that actually happens day by day? And do you have any good stories?
Absolutely. I mean, I don't have that many personal stories, but there used to be a chat group internally called Data Centers on Fire that would have, like, exciting events happening, and sometimes exciting videos.
Yeah, I mean, I think at scale lots of things that are very, very unexpected happen, and usually those are the combination of one thing failing and something else failing simultaneously or in cascade. You have a cascaded failure of some sort. Sometimes that means some software system stops working. Sometimes it means the bus bar overheats and you get too much power to the rack and it catches on fire. I mean, that's a much rarer thing. But you have to be prepared for this, and I think one of the things, even from the very earliest days of Google, is we have really focused on how do you build reliable systems out of unreliable parts. Right? Like in the earliest Google days, we were buying consumer machines without ECC memory. Not only not ECC, not even parity. We were buying consumer motherboards that didn't have redundant power supplies. And you can do that if you can handle things at a higher level, and that's generally what we try to do in all cases.
I actually wanted to ask you about that, the ECC thing, because here's one of my favorite failure modes, if that's true, but you tell me. A distant supernova goes off, a cosmic ray hits a memory cell, and a zero flips to a one. Does that really happen?
Oh yeah. Yeah, absolutely. I mean, alpha particles definitely can flip DRAM state. We've actually observed this because we have monitoring data of how many ECC errors, like single-bit errors that are corrected and two-bit errors that are not corrected, are happening in all of our machines. And you can actually see this, where some clusters that are pointing in a particular direction on the earth have a much higher rate for a brief period, like a 10-minute period or something, and then the other ones on the other side of the earth do not have that. So it's definitely something that happens.
How worried should I be?
Because MacBook Pros don't have ECC memory, as far as I know. Like, for one machine, is it so vanishingly unlikely that you shouldn't care, but for a data center...
I mean, for one machine it's generally not too bad. I think they have parity, so at least they detect it, typically, if it's a single-bit error.
So detection but not fixing, right?
But ECC usually gives you single-bit error correction and double-bit error detection. So with that, you don't have to worry about it too much at a single machine level, but even at tens of thousands of machines you'd have to start thinking about that. So one of the things we did when we were using machines without even parity is we built an entire software-based checksumming system for large amounts of our data.
So doing it by hand.
Doing it by hand, essentially. And, you know, for crawling web pages and putting them in the index, if you detect that this particular record is corrupted, it's generally okay to just ignore that record.
Now I have something interesting for you. I call it lightning round. So please try to answer in one sentence. One word is okay. One sentence.
Can I make run-on sentences?
We'll see. So, I read that Jeff Dean's PIN code is the last four digits of pi. I give this one an eight out of 10. So my question is, do you enjoy these Chuck Norris style jokes about you?
It could be true. I do enjoy them. I mean, it's an April Fool's joke gone awry by my colleagues in 2009, but it's both very flattering and kind of embarrassing.
I think he felt the same way about them, too. But he enjoyed them, too. Legend. All right. One big thing that you were wrong about and came around.
I think AI is going to influence health care quite dramatically, but I think it is harder, not necessarily for technical reasons, but for, you know, how do you actually get things into regulated industries that are super important and have all kinds of privacy constraints and safety concerns. But I think ultimately that will happen. It's just taking longer than I hoped, because I think there's tremendous world benefit to doing it. But we need to do it carefully and safely.
Vim or Emacs or something else? Hint, there's only one good answer.
Emacs. Was that it?
Oh, no. Look, I'm a Vim person, but maybe I'm an embarrassment of a Vim person, because I looked at Emacs too, and I was like, that's pretty cool too, but I don't want to learn both. It's just so much time.
So, yeah, it's true. One can spend a lot of time customizing Emacs.
The vimrc I wrote up, and then it never ends. Yeah. One problem that you tried to solve many times but have never been able to crack.
I mean, I think in some sense we still don't have an answer to how do you do continual learning appropriately. That's something I've thought about a little. I've dabbled a little bit with some techniques along with colleagues. But I think if we're able to crack that, it's going to be amazing. But it's not there yet.
Last one. Favorite Two Minute Papers episode.
Oh, yeah. I mean, I assume the Transformer one was a good one.
All right. Well, that's a good one. Okay, Jeff, I learned a lot today. Thank you so much.
It was great chatting with you again.
Thank you so much. Thank you.
Here you see me running the full DeepSeek AI model through Lambda GPU cloud. 671 billion parameters running super fast and super reliably. This is insane. I love it and I use it on a regular basis. Lambda provides you with powerful NVIDIA GPUs to run your own chatbots and experiments.
Seriously, try it out now at lambda.ai/papers or click the link in the description.
