NVIDIA's New Free AI - A Gift To All Of Us
Fast performance on the surface but coding tasks reveal limitations and glitches.
Nvidia’s Nemotron 3 Ultra debuts as a blazing-fast, fully open AI model under the Apache-2.0-style Open MDW license, with real limits on challenging coding tasks.
Summary
Two Minute Papers’ coverage of Nemotron 3 Ultra frames it as a groundbreaking open AI model: incredibly fast, yet unreliable on challenging coding tasks. The host tests real code, gets black screens on a light-simulation program and a real-time strategy game, and then shifts focus to its strengths: rapid prototyping, file organization, and installation fixes. Crucially, Nvidia’s model is fully open: the weights and the research paper are public, training data and recipes are being released for the redistributable parts, and the model ships under the Open MDW license, a significant win for open science. The video walks through the architecture (mixture of experts with 550B total parameters and about 10% active per token, Mamba layers, NVFP4 low-precision math, and multi-head token prediction), explaining why it feels “blazing fast” in practice. While vision capabilities are missing, the host argues for a “roster of models” approach, pairing Nemotron 3 Ultra with other specialized models like Gemma 4 for multimodal tasks. The discussion also covers practical constraints: running the full model locally is impractical for most, but services like Lambda GPU Cloud make experiments feasible. Overall, the host celebrates the open model ecosystem and rates Nvidia’s licensing a near-10/10 achievement for open AI.
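The "only about 10% active per token" idea can be illustrated with a toy top-k router. This is a minimal sketch under stated assumptions, not Nemotron 3 Ultra's actual routing: the expert count (20), the number of active experts (2), the scalar "experts", and the random router scores are all hypothetical values chosen so that 10% of the layer runs per token.

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def moe_layer(x, experts, router_scores, top_k):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = route_token(router_scores, top_k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical setup: 20 toy experts, 2 active per token (10% of the layer).
NUM_EXPERTS, TOP_K = 20, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)  # stands in for a learned router's scores for one token
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
active = route_token(scores, TOP_K)
output = moe_layer(1.5, experts, scores, TOP_K)
print(f"active experts: {sorted(active)} ({len(active) / NUM_EXPERTS:.0%} of the layer)")
```

The other 18 experts are never called for this token, which is where the compute saving comes from; all of them still have to be stored, which is why total parameter count drives the memory requirement.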
Key Takeaways
- Nemotron 3 Ultra has 550 billion parameters in total, but only about 10% are active per token: a mixture of experts in which a few specialist "mini-brains" are switched on at a time.
- The Open MDW license allows basically everything, including derivative works and commercial use; by the host's understanding, you lose the license if you sue claiming the model infringes your rights.
- The model is extremely fast for non-coding tasks (e.g., fixing installations, organizing files) but struggles with challenging coding prompts (e.g., full light simulations or real-time strategy game rendering).
- Running the full model locally is impractical for most people because it needs hundreds of gigabytes of GPU memory; the host expects to run it on Lambda GPU Cloud instead. The 1-million-token context window is large enough to take in a sizable code base at once.
- The architecture combines NVFP4 low-precision math, mixture of experts, and multi-head token generation to deliver speed without sacrificing too much accuracy for many tasks.
- A pragmatic, roster-based approach to AI: no single model does everything, so plan to bolt complementary models (e.g., Gemma 4 for vision) onto Nemotron 3 Ultra when needed.
- Nvidia’s licensing and open-sourcing choices are framed as a major milestone, with the Open MDW license earning the speaker a near-perfect score for openness.
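The "hundreds of gigabytes" figure in the takeaways follows from simple arithmetic on the parameter count. The sketch below counts weight storage only; it ignores the context cache, activations, and any per-block scale factors, so real requirements would be higher. The bit widths are illustrative assumptions, not a statement of how the released checkpoint is stored.

```python
TOTAL_PARAMS = 550e9    # total parameter count quoted in the video
ACTIVE_FRACTION = 0.10  # "about 10%" active per token, per the video

def weight_memory_gb(num_params, bits_per_param):
    """Gigabytes (10**9 bytes) needed to store the weights alone."""
    return num_params * bits_per_param / 8 / 1e9

# Assumed storage precisions, for comparison only.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")

# Parameters that actually do work for a single token.
active_params = TOTAL_PARAMS * ACTIVE_FRACTION
print(f"active per token: {active_params / 1e9:.0f}B parameters")
```

Even at 4 bits per weight the total comes to about 275 GB, well beyond a single consumer GPU. Sparse activation reduces compute per token, not the amount of memory needed to hold every expert.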
Who Is This For?
Researchers, developers, and AI practitioners who want a genuinely open AI model to experiment with, plus a clear view of licensing, hardware needs, and how to build a multi-model toolchain.
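One way to picture such a multi-model toolchain is a thin dispatcher that sends images through a vision model first and hands the resulting description to the text-only model. This is a hypothetical sketch: `Request`, `stub_vision_model`, `stub_text_model`, and `answer` are invented names, and the two stubs stand in for real calls to a vision model such as Gemma 4 and to Nemotron 3 Ultra.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Request:
    prompt: str
    image_path: Optional[str] = None  # None means a text-only request

def stub_vision_model(image_path: str) -> str:
    """Placeholder for a vision model that turns an image into text."""
    return f"[description of {image_path}]"

def stub_text_model(prompt: str) -> str:
    """Placeholder for the fast text-only model."""
    return f"[answer to: {prompt}]"

def answer(
    request: Request,
    describe_image: Callable[[str], str] = stub_vision_model,
    generate: Callable[[str], str] = stub_text_model,
) -> str:
    """Route a request: describe the image first if there is one, then generate."""
    prompt = request.prompt
    if request.image_path is not None:
        description = describe_image(request.image_path)
        prompt = f"Image description: {description}\n\nTask: {prompt}"
    return generate(prompt)

print(answer(Request("Organize these files by date")))
print(answer(Request("What trend does this chart show?", image_path="chart.png")))
```

Passing the models in as plain callables keeps the roster swappable: replacing one member means changing a single argument.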
Notable Quotes
"This AI is not Nemotron 3 Super. No, this is Nemotron 3 Ultra, Nvidia's newest free and open AI model."
—Introduction to the model and its open nature.
"The weights are open. The research paper on how it was made is open. Training data and recipes are being released at least for the redistributable parts."
—Highlighting the open-access aspects and licensing philosophy.
"Download it. It is yours forever. No limits, no funny business."
—Emphasizing the open and persistent availability of the model.
"You don't need one model to do everything. You need a roster of models that cover your use cases."
—Advocating a multi-model approach to AI tasks.
"This is basically Apache 2.0 tailored for machine learning weights. This is absolutely fantastic news."
—Describing the Open MDW licensing nuance.
Questions This Video Answers
- How open is Nvidia's Nemotron 3 Ultra really, and what does the Open MDW license permit?
- Why is a roster of models recommended instead of a single all-in-one AI like Nemotron 3 Ultra?
- What are mixture of experts and NVFP4, and how do they contribute to Nemotron 3 Ultra's speed?
- Can Nemotron 3 Ultra run locally, and what are the hardware requirements?
- How does Lambda GPU Cloud help researchers run massive AI models like Nemotron 3 Ultra?
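On the NVFP4 question, the general idea of 4-bit block quantization can be sketched as follows. This is an illustration under assumptions rather than NVIDIA's implementation: it uses the eight magnitudes of a 4-bit E2M1 float and one scale per 16-value block, which matches public descriptions of NVFP4 as far as they go, but it keeps the scale as an ordinary Python float instead of a low-precision one and omits any tensor-level scaling.

```python
import math
import random

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed number of values sharing one scale

def quantize_block(block):
    """Return (scale, codes): codes are signed grid values, scale maps them back."""
    peak = max(abs(v) for v in block)
    scale = peak / E2M1_GRID[-1] if peak > 0 else 1.0
    codes = []
    for value in block:
        target = abs(value) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(math.copysign(nearest, value))
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate values from the 4-bit codes and the shared scale."""
    return [scale * code for code in codes]

rng = random.Random(0)  # toy weights, not taken from any real model
weights = [rng.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.6f}, worst absolute error={worst_error:.6f}")
```

Each value now needs 4 bits plus a share of one scale per block, instead of 16 bits, which cuts memory traffic and lets the hardware do less number crunching per token, at the cost of a bounded rounding error.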
NVIDIA Nemotron 3 Ultra, Open MDW license, Mixture of experts, NVFP4, Multimodal AI, Lambda GPU Cloud, Gemma 4, Open-source AI, Open science, Open AI model licensing
Full Transcript
This AI is not Nemotron 3 Super. No, this is Nemotron 3 Ultra, Nvidia's newest free and open AI model, and I've been delighted, disappointed, and confused by it. But I think I got it now. You see, you can look at the benchmarks all you want, but we are fellow scholars here. We don't just believe stuff. We test it for ourselves. That is the way of the scholar. So, I had an early look at it and ran some of my experiments day and night. First impression is that it is incredibly fast. Blazing fast. Love that. But then my coding experiments did not go that well.
When I ask it to write a light simulation program, this is my original area of research and I get a black screen. Nothing. When I ask it to fix it, it does a bunch of things and same. And then I said, "Okay, let's debug this by hand." It had some mistakes. After fixing that, well, we get something. But maybe it's a scene that does not work at all. Other even smaller systems can do this task with relative ease. And the other thing is, goodness, it wrote up more than a thousand lines of code. You don't need that much.
My handwritten solution from my research is about 250 lines and renders this scene. Fully open source, free for everyone, forever. Now, let's write a real-time strategy game. Yes. Oh, no. Black screen again. Almost. We got a square. But if you ask DeepSeek 4 Flash with the same prompt, you get something really cool. But not here. So, what is going on here? Well, I went back and forth with Nvidia and reported some of the issues and later there were some improvements. But still, this kind of coding is not something I would personally use this for. So I said, you know, maybe let's not use this AI.
But then I thought, wait, it is super fast and probably good at other things. So I gave it agentic things. Fixing broken installations on my machine from the terminal, excellent. Whipping up quick experiments, organizing files, excellent, super fast. And over time, I found myself reaching out to it more and more. And I found it to be useful basically for everything other than challenging coding tasks. Now that is excellent because this might be the openest AI model ever. Weights are open. The research paper on how it was made is open. Training data and recipes are being released at least for the redistributable parts.
Now that is pretty crazy. Now hold on to your papers fellow scholars because it gets even better. Licensing. Super important question, very overlooked. We are always hoping for Apache 2.0. This is the do whatever you want license. For me, this is 10 out of 10. Now, Nvidia started publishing their models under their own proprietary license, which I would rate 7 out of 10. Derivative works and commercial use is fine. On the other hand, it needs a bit of attribution and a little stricter on patent grants. Now, this has the Open MDW license. This is basically Apache 2.0 tailored for machine learning weights.
This is absolutely fantastic news. Glorious. I think this might be a 9 out of 10, maybe as close to 10 out of 10 as you can get from a big company like Nvidia. Allows basically everything, but less battle tested. And my understanding is that if you sue claiming this model infringes your rights, you lose the license. Huge improvement. Double thumbs up. Thank you. Now, can you run it yourself? Hm. Um, yes and no. Yes, because completely open. Download it. It is yours forever. No limits, no funny business. However, no, because I would love to run it locally, too.
But it's huge. 550 billion parameters. You need hundreds of gigabytes of GPU memory for that. This is why I will probably use it on Lambda. Also, 1 million token long context window. Great. Have a larger code base with a bug hiding somewhere. No worries. Massive box. Easy. Okay. How about images and videos? Well, it does not have vision capabilities. Not multimodal, text only. Oh man, how much I would love a multimodal version of this. Goodness, please. Okay, and I also had a realization. You don't need one model to do everything. You need a roster of models that cover your use cases.
For instance, I can't add vision capabilities to Nemotron 3 Ultra, but I can bolt Gemma 4 to it with a screwdriver. It's like a seeing eye dog guiding a smarter blind man along. It is hilarious and it kind of works. Kind of. So, we finally have more competition in the open AI model space and that is glorious. So, how does it work? Well, one trick is that it is huge, but not all of it runs at once. 550 billion parameters total, but only about 10% of that is active per token. These are specialist mini brains that are being activated at a time.
We call that mixture of experts. But you wise fellow scholars know that already. So what else? Now they also use Mamba layers. Why Mamba? Is this like a snake or like the fruity chew? I don't know. I don't even know why I brought this up. So what do these do? Well, traditional AI systems have a bit of a memory problem. They work like a student who constantly rereads the textbook over and over again when they are given a question. But memory is precious. So instead read the book only once and take highly compressed notes. So this kind of memory remembers important details about the conversation.
However, it is also smart enough to throw away the filler words. Thus, this system can process massive amounts of data efficiently. It also uses low precision numbers, so you have to do less number crunching when running this. They call it NVFP4. And this doesn't rely on predicting tokens one by one. No, it has multiple heads that draft multiple future tokens at the same time. Once again, many things that make it blazing fast. And we get all of this for free forever. What a time to be alive. Thank you to everyone who worked on this and absolutely everyone everywhere who is working on open-source projects and open models.
You are all heroes. And look, this system is great, but it could be tiny. It could be bad, ugly. I don't care. As long as it is open science and open models, it pushes humanity forward. Thank you. What a time to be alive. Here you see me running the full DeepSeek AI model through Lambda GPU cloud. 671 billion parameters running super fast and super reliably. This is insane. I love it and I use it on a regular basis. Lambda provides you with powerful Nvidia GPUs to run your own chatbots and experiments. Seriously, try it out now at lambda.ai/papers or click the link in the description.