NVIDIA’s New AI Just Cracked The Hardest Part Of Self Driving

Two Minute Papers | 00:09:00 | Mar 24, 2026
Announces the emergence of the first completely open reasoning system for self driving and the release of model weights and code.

Open, explainable self-driving AI with open weights and training data could revolutionize safety, transparency, and experimentation at home, despite high training costs.

Summary

Two Minute Papers’ latest breakdown spotlights a groundbreaking open self-driving system from a major research push. Dr. Károly Zsolnai-Fehér highlights how the approach shifts from opaque, purely reactive driving to an auditable chain of reasoning that explains why the car makes each move. The model weights, inference code, and a subset of training data are released, enabling a student anywhere to download and test a state-of-the-art driving brain. Key techniques include reinforcement learning with a consistency reward to align words with actions, and a conditional flow matching loss to smooth control. The team uses 700,000 video clips annotated with diary-style explanations, along with a hyper-realistic simulator called AlphaSim built with 3D Gaussian splatting to safely practice rare scenarios. Even with impressive improvements like a 25% reduction in close-encounter rate, the method still faces high training costs and the challenge of preventing the AI from “hallucinating.” The video also notes a practical path forward from related work at DeepSeek, which trims supervision by letting multiple planned trajectories compete against each other. The host mentions attending GTC to dive deeper into AI breakthroughs and plugs running the 671-billion-parameter DeepSeek model on Lambda GPU Cloud as a tool for researchers and hobbyists alike.

Key Takeaways

  • Open model release enables broad experimentation by letting researchers download state-of-the-art driving brain and test it locally.
  • Reinforcement learning with a consistency reward aligns the AI’s stated intentions with its driving actions, reducing misalignment between words and wheels.
  • Conditional flow matching loss helps convert shaky policy outputs into smooth, continuous driving movements.
  • 700,000 annotated video clips are used to generate diary-style explanations of driving decisions, enabling deeper causal reasoning.
  • AlphaSim provides a hyper-realistic, crash-tolerant training environment using 3D Gaussian splatting to reproduce real-world scenes.
  • The approach faces cost and scalability challenges due to expensive teacher-based reinforcement learning, with ideas from DeepSeek offering potential workarounds via self-competition among plans.
  • The release of model weights, inference code, and subset of data marks a transition toward open, auditable self-driving development.
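The consistency reward in the takeaways above can be sketched in a few lines. This is a toy illustration, not the released code: the intent-to-action mapping, the reward values, and all names are assumptions made up for the example. The idea is simply that the policy's stated intent earns a bonus when it matches the executed action and a penalty when it does not.

```python
# Toy sketch of a "consistency reward": the policy states an intent in
# words, then acts; the reward adds a bonus when words and wheels agree
# and a penalty when they do not. All names/numbers are illustrative.

def consistency_reward(stated_intent: str, action: str,
                       base_reward: float,
                       bonus: float = 1.0, penalty: float = -1.0) -> float:
    """Combine the base driving reward with a words-match-wheels term."""
    # Hypothetical mapping from verbal intents to expected actions.
    expected = {
        "stop at red light": "brake",
        "nudge left around stopped car": "steer_left",
        "keep distance": "maintain_gap",
    }
    matches = expected.get(stated_intent) == action
    return base_reward + (bonus if matches else penalty)

# The teenager says "stop at red light" but keeps driving: penalized.
r_bad = consistency_reward("stop at red light", "accelerate", base_reward=0.5)   # → -0.5
# Words and wheels agree: rewarded.
r_good = consistency_reward("stop at red light", "brake", base_reward=0.5)       # → 1.5
```

In the real system the "lie detector" is a learned reward model rather than a lookup table, but the shape of the signal is the same: consistency between explanation and action is part of what the policy is optimized for.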

Who Is This For?

Researchers and developers in autonomous driving and AI who value open science, reproducibility, and explainable AI, plus tech enthusiasts who want a peek at the bleeding edge of self-driving research.

Notable Quotes

""We are getting what I think is the first completely open reasoning system to do self-driving that we can all use right now.""
The speaker emphasizes the openness and immediate applicability of the new system.
""The keys to the kingdom are being handed to us. They released the model weights and the inference code and a small subset of the training data.""
Highlights the open-release milestone enabling external evaluation and experimentation.
""This is excellent because if it reasons it actually drives better. Its close encounter rate is reduced by 25% just by thinking out loud.""
Cites tangible safety improvement from the reasoning approach.
""The AI says stuff and then the AI does completely different stuff. So it just basically makes things up.""
Describes a core problem—hallucination—that the proposed solution aims to fix.
""We the long tail refers to those rare bizarre situations...""
Explains why handling rare events is critical for robust self-driving.

Questions This Video Answers

  • How does an open self-driving model differ from proprietary systems in terms of transparency and safety?
  • What is a consistency reward in reinforcement learning, and how does it reduce misalignment between intent and action?
  • What is conditional flow matching loss and why does it smooth driving trajectories?
  • What is AlphaSim and how does 3D Gaussian splatting create realistic training environments for self-driving?
  • Why are model weights and training data important for reproducibility in autonomous driving research?
Self-Driving Cars, OpenAI-style Open Models, Reinforcement Learning, Consistency Reward, Conditional Flow Matching Loss, AlphaSim, 3D Gaussian Splatting, DeepSeek, GTC Conference, Lambda GPU Cloud
Full Transcript
Self-driving cars are on the rise. As of today, Waymo is providing hundreds of thousands of paid trips per week across cities like San Francisco and LA. Crazy. They are already designing what the cars of the future would look like if they were like little study rooms for us while we travel. So, how does it work? I don't know. [laughter] Almost nobody knows why. Because all of these solutions are proprietary technology. It's a secret. And now finally something incredible happened. We are getting what I think is the first completely open reasoning system to do self-driving that we can all use right now. There are other open systems, I think not as good as this, and most not reasoning. This is the key. Okay, let me try to explain. We have a 42-page research paper on how it works and I will now try to explain it in the simplest words possible, and surprisingly it teaches us not only about self-driving cars but it teaches us about life itself. You'll see. Now imagine sitting in the passenger seat in a car next to a teenager. So you are sitting there minding your own papers and then suddenly they smash that gas pedal. Whoa. You ask, why did you do that? And they say, I don't know, hormones. [laughter] And you see, that's the problem with current self-driving systems. They are a bit like a teenager. They just do stuff. You ask why. And they just shrug. They look through the cameras. They output steering commands. But we have absolutely no clue why they do what they do. Now this one is not like that. Look. Oh my. That is incredible. It says exactly what it is about to do. And why. It says we are nudging to the left because there is a car stopped on the right. Now we keep left to follow the temporary corridor. And now we keep a bit of distance. This is excellent. I love it. Now there is a reason why this is excellent, and it is not that when we die we also get a nice little message why. Hooray.
No, this is excellent because, hold on to your papers fellow scholars, because if it reasons it actually drives better. Its close encounter rate is reduced by 25% just by thinking out loud, which is kind of insane. Also, if it made a mistake, we now know exactly why and can improve the system accordingly. But it gets better. You see, it focuses on the heart of the self-driving problem. The long tail. Oh, yes. The long tail refers to those rare bizarre situations like a crazy unicycle person on the highway or a confusing hand signal. These are trouble. Why? Because the AI never sees enough of these to learn properly. And here preparing for the long tail means that it even understands that there is a construction worker ahead and it should listen to his instructions. Absolutely incredible. I'll tell you how it works in a moment. At least I'll try my best. So here is the best part. The keys to the kingdom are being handed to us. They released the model weights and the inference code and a small subset of the training data. This is not everything, but this is incredible. This means a student in a dorm room can now download a state-of-the-art self-driving brain to run and evaluate. Just think about how insane that is. We don't have to be at the mercy of closed proprietary systems. So, huge thank you for that. What a time to be alive. Now there is a small problem. And by small I mean a big problem. The AI says stuff and then the AI does completely different stuff. So it just basically makes things up. How do we fix that? How is that even possible? Dear fellow scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Well, they didn't just teach the teenager to drive. No, they hired a strict driving instructor. This is where things get crazy. They use a technique called reinforcement learning with a consistency reward. This is a lie detector. Love it. Imagine the teenager says, "When I see a red light, I will stop." Got it? Then the red light appears and it just keeps driving.
So, what do we do now? Well, the instructor, the reward model, smacks the dashboard and says, "Zero points. Next time, make sure your actions match your words." Something many humans could learn from, I shall add. So, the AI cannot just make stuff up. It actually has to live by it. But it doesn't end there. This just keeps getting better. They also added something they call the conditional flow matching loss. Now, our teenager knows what to do, but oh my, his hands are quite shaky. This piece of math helps smooth the shakiness out into nice continuous motions. Okay, let's continue to understand how it works. To teach this teenager, they didn't just show them hours of driving footage. No, they had this AI look at 700,000 video clips and write a diary entry for each one. Why? To explain exactly what caused the car to move. Something you can only do well with reasoning systems. Genius. Now, training. Did they just let out this teenager onto the highway? No, of course not. So they built a hyper-realistic video game called AlphaSim. It uses 3D Gaussian splatting to reconstruct the real world inside a computer. So it looks almost like real life. In this video game, you can crash all you want as long as you are learning. Here it can practice those rare and dangerous scenarios, and only if it proves that it is really good is it allowed on the streets. Now wait, there is so much to learn from this paper, and not just about self-driving cars but about ourselves too. You see, we said the AI performs better when it explains the cause before the action. That is excellent life advice. Don't just react to anger or stress. Force yourself to say the cause out loud. I am angry because I am hungry. And only act after this. It works. Remember, the AI had a much higher success rate doing this. Also, the AI is penalized if its words don't match its wheel movements. Say out loud what is important to you. Now, look at your calendar. Does it really reflect what is important?
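The conditional flow matching loss mentioned above has a very compact mathematical core. Here is a minimal NumPy sketch under the standard assumptions for this family of losses: a straight-line interpolation path x_t = (1 - t)·x0 + t·x1 whose target velocity is (x1 - x0), with `model` standing in for a network that predicts velocity conditioned on some context. The paper's actual parameterization may differ; this only shows the shape of the objective.

```python
# Minimal sketch of a conditional flow matching (CFM) loss in NumPy.
# Assumption: a linear path x_t = (1 - t) * x0 + t * x1, whose target
# velocity is (x1 - x0). "model" is a stand-in for a neural network
# predicting velocity from (x_t, t, conditioning).
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(model, x0, x1, cond):
    """Mean squared error between predicted and target velocity."""
    t = rng.uniform(size=(x0.shape[0], 1))   # random time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1            # point on the straight path
    target_v = x1 - x0                       # velocity along that path
    pred_v = model(x_t, t, cond)             # network's guess
    return float(np.mean((pred_v - target_v) ** 2))

x0 = rng.normal(size=(8, 2))   # e.g. shaky raw control outputs
x1 = rng.normal(size=(8, 2))   # e.g. smooth target trajectory points
# A "perfect" toy model that already outputs the straight-line velocity.
perfect = lambda x_t, t, cond: x1 - x0
loss = cfm_loss(perfect, x0, x1, cond=None)   # → 0.0
```

Minimizing this loss teaches the network a smooth velocity field, which is why the resulting controls come out as continuous motions rather than jittery jumps.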
Are you really doing what you just said matters the most? Now, of course, not even this technique is perfect. The limitation here is that our driving instructor, the reinforcement learning process, is expensive. Just imagine how tiring it is to grade every single decision the teenager makes. Oh my, [laughter] it's like paying for a private tutor 24/7. Effective, but costly. Now, scientists at DeepSeek, in a different paper, got around this limitation by eliminating the teacher and having the teenager create 16 different plans and grade those against each other. Maybe something that could be done here too in the future. By the way, I will be at the GTC conference this year in San Jose talking to amazing scientists about recent AI breakthroughs, including this one. It takes place on March 17th, and I am already getting pretty nervous about it. Anyway, if you like this, subscribe, hit the bell, leave a really kind comment, and of course, look for the man in the lab coat at GTC, and I'll give you a gift. I probably won't have the guitar on me, though. Here you see me running the full DeepSeek AI model through Lambda GPU Cloud. 671 billion parameters running super fast and super reliably. This is insane. I love it and I use it on a regular basis. Lambda provides you with powerful NVIDIA GPUs to run your own chatbots and experiments. Seriously, try it out now at lambda.ai/papers or click the link in the description.
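The DeepSeek-style "self-competition" workaround mentioned in the transcript can be sketched as well. This is a hedged illustration of the group-relative idea, not DeepSeek's implementation: sample several candidate plans, score them with a cheap metric, and use each plan's advantage over the group mean as the learning signal, so no expensive learned teacher has to grade every single decision. The scoring values and names here are made up for the example.

```python
# Sketch of self-competition among plans: each candidate trajectory is
# scored (e.g. progress made minus collisions), and its learning signal
# is its advantage over the group average. Scores below are illustrative.

def group_relative_advantages(scores):
    """Advantage of each plan relative to the group mean score."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]

# 16 candidate plans, as in the video's description.
scores = [0.2, 0.9, 0.5, 0.4] * 4
advantages = group_relative_advantages(scores)
# Plans scoring above the group mean get a positive advantage and are
# reinforced; the rest are suppressed. The advantages sum to zero.
```

The appeal is that the group itself acts as the grader: the policy only needs a cheap scalar score per plan, not a per-decision verdict from a costly teacher model.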
