DeepMind’s New AI: A Gift To Humanity

Two Minute Papers | 00:11:55 | Apr 16, 2026
Gemma 4 is presented as a free, open, and hardware-friendly AI family that can run on devices with minimal memory, even offline on phones.

Gemma 4 from Google DeepMind is a free, open-source AI family with 2B and 31B variants that runs locally on everything from phones to a first-generation Nintendo Switch, reshaping open AI access for everyone.

Summary

Two Minute Papers dives into Google DeepMind’s Gemma 4, a family of open-source AI models that breaks traditional barriers around access and hardware. Dr. Károly Zsolnai-Fehér outlines how Gemma 4 comes in a 2B-parameter and a larger 31B-parameter version, both designed to run locally, not just in the cloud. He highlights four surprising innovations: strictly curated training data; a hybrid attention mechanism that blends local and global focus; improved image understanding plus a shared KV-cache for memory reuse; and an Apache 2.0 license that invites derivative models and commercial use. The video emphasizes that Gemma 4 can operate offline on devices as modest as an old first-generation Nintendo Switch, enabling offline translation, summarization, and even real-time browser-based image classification. The presenter also praises its agentic capabilities when paired with OpenClaw, and notes that community experimentation built a small ecosystem around the model within days. Yet he remains candid about the limits: no live web browsing without an agent harness, and some open-ended tasks and high-frequency image details remain challenging. Overall, Gemma 4 is framed as a rare, frontier-level gift to humanity: free, and one that users truly own. The video closes with gratitude to the scientists behind the project and a reminder to assess real-world usage beyond the hype.

Key Takeaways

  • Gemma 4 includes a compact 2B model and a larger 31B dense model that beats some models 10x its size and stays competitive with some 20x larger on certain benchmarks.
  • Gemma 4 uses a hybrid attention mechanism that combines sliding window local attention with global attention for better long-range understanding.
  • The model achieves improved image understanding by processing images at their native aspect ratio, unlike Gemma 3, which squashed landscape images into squares before processing.
  • A shared KV-cache allows later layers to reuse memory, reducing redundant computation and boosting efficiency.
  • Gemma 4 ships with an Apache 2.0 license, enabling modification, commercial deployment, and derivative works with minimal friction.
  • The ecosystem around Gemma 4 grew rapidly, with practical use cases like offline translation, summarization, and browser-based image classification appearing within days.
  • Despite its strengths, Gemma 4 cannot browse the live internet without an agent harness and may struggle with highly complex, open-ended tasks and fine high-frequency image detail.
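To make the hybrid-attention takeaway above concrete, here is a toy NumPy sketch of the two mask types the video contrasts: a causal sliding window that only sees nearby tokens, and causal global attention that sees everything before it. The window size is arbitrary, and how a real model like Gemma interleaves the two across layers is not modeled here.

```python
import numpy as np

def sliding_window_mask(n, window):
    # Token i may attend only to tokens in [i-window+1 .. i]: causal and local.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def global_mask(n):
    # Token i may attend to every token up to and including itself: causal and global.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return j <= i

local = sliding_window_mask(8, window=3)
glob = global_mask(8)
print(local[5])  # token 5 sees only tokens 3, 4, 5
print(glob[5])   # token 5 sees tokens 0 through 5
```

The local mask keeps per-token cost constant as the sequence grows, which is why mixing it with occasional global layers is a common way to stretch context windows cheaply.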

Who Is This For?

Essential viewing for developers and researchers curious about open AI models, edge deployment, and open-source licenses. It’s especially valuable for those wanting powerful, offline-capable AI that doesn’t lock users into cloud subscriptions.

Notable Quotes

"Gemma 4 runs on your phone without an internet connection."
Highlighting the offline capability and device compatibility that set Gemma 4 apart.
"This license is true to the open source spirit. You can modify it, sell it, deploy it commercially with almost zero friction."
Emphasizing the Apache 2.0 license and downstream freedom for derivative work.
"Gemma 4 understands the image as-is, and the difference really shows on any benchmark that has to do with images."
Noting the improvement in visual understanding over Gemma 3.
"Four, the license. Oh my, the license. This one gets overlooked so much."
Leading into the significance of licensing in open models.
"It is fantastic at agentic workflows. This is where we don’t just have an AI assistant that spits out a bunch of text."
Highlighting practical, action-oriented capabilities when paired with tools like OpenClaw.

Questions This Video Answers

  • How does Gemma 4 compare to earlier Gemma versions in terms of parameter count and performance?
  • What makes Gemma 4’s hybrid attention mechanism different from traditional attention?
  • Can Gemma 4 run offline on a device like a Nintendo Switch, and what are the trade-offs?
  • What does the Apache 2.0 license mean for commercial use of Gemma 4 and derivatives?
  • What are the practical limitations of Gemma 4 without an agent harness for browsing the web?
Google DeepMind, Gemma 4, open-source AI, Apache 2.0 license, hybrid attention, shared KV-cache, MoE vs dense models, OpenClaw, offline AI, image understanding
Full Transcript
Google DeepMind gave an amazing gift to humanity. And it is full of surprises. Here’s why. Today, we are living in the age of AI where these smart assistants and agents can do things we could only dream of 10 years ago. But. Many of these solutions are proprietary, require a subscription, and run in the cloud.

And then this happens. Yup, some OpenClaw users reported losing access to their Claude AI subscription citing “heavy workloads”.

Now, maybe they did something unsavory, I don’t know. I also understand you pay a fixed rate, you can’t eat all you want. I respect that. However, this is the point. We have to rely on the goodwill of these companies for our workflows.

So this is why I keep saying over and over that we should always look for options where we own these AIs and run them on our own systems for free, forever. No one can take them away.

NVIDIA came out with their Nemotron 3 Super, which has super capabilities…but its hardware requirements are also super. Not so much with Google DeepMind’s new AI, Gemma 4. This is a free and open family of models, and yes, finally, the smallest ones require only a few gigabytes of memory.

No need for an expensive GPU. So much so that I wanted to wait a bit before publishing this video to see how you Fellow Scholars use it in practice. And…look at that! It runs on your phone without an internet connection. And folks are already using it in practice to create offline translation and summarization apps. Also, real-time image classification running in your browser while talking like a bard? No problem. You can already fine-tune it with Matt’s work. It is so good, it has a little ecosystem around it already in just a few days. Because of the brilliance of you Fellow Scholars. Nice work.

But it gets better. You see, the smallest Gemma is so small, it runs on…oh my. Look…I love that. It runs even on an old beat-up Nintendo Switch, first generation. Not exactly something with a lot of memory or processing power. Still runs the 2 billion parameter Gemma 4 model.

Now that is a gift to humanity. But it gets really strange from here on out. Here are 4 things that I found really surprising. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

One, they also have a bigger, 31B model which was the #3 best open model, and now hold on to your papers Fellow Scholars, because it beat some models that are 10 times larger. And still competitive with some that are 20 times larger. On some measurements. And it is a dense model. What? What is going on here?

You see, many of the modern AI systems you encounter are what they call mixture of experts models. MoE. These are huge AI models with many parameters, and to make sure we don’t burn down all of our hardware using them, they split up this big brain into many small ones. If you have a biology question, it chops it up into small parts, and routes them to the parts of the brain that it thinks are the best experts at processing it. Typically, to the top 2 to 8 experts. Only ask them. Yes, with that, we only activate one small part of the brain at a time. It makes sense, right? It’s not a simple process, but it is possible. This enables us to create huge intelligent models that are still efficient.

Dense models, however, just light up every parameter of the system. These are not new, and in some ways, these are very inefficient. You light up all the 31 billion parameters in the brain all the time, no matter how simple or complex the question is. But this one…this one is somehow magically good. How?

They did four amazing things: one, Google didn’t just dump half the internet into it to learn about us. They applied super strict filters to give it only highly curated training data. That is actually good advice for our thinking too. Don’t let everything in, curate your information diet.
There is lots of noise out there - ignore it. That is excellent.

Two, they use an interesting attention mechanism that has a sliding window and also global attention at the same time. What does that mean? Well, when you read a book, you read it line by line to finish a page. That is a local sliding window. With that, you get all the details. But sometimes you want to zoom out and ask, okay, what book are we reading? Which chapter is this? That is global attention. Here, they use both, and call the mechanism hybrid attention.

Three, it is better at understanding images. You know, Gemma 3 had weird glasses on, and its image understanding was kind of a lie. If you gave it a landscape image, it squished it back to a square image before processing it, losing some information. It squishes everything into its own preconceived box. Not good. Gemma 4 understands the image as-is, and the difference really shows on any benchmark that has to do with images.

Four, it has a shared KV-cache. A KV-cache is short-term memory for what you are currently talking about with it: documents, questions. Now, the layers of this neural network like to recompute their fresh memory from scratch. This one doesn’t; it essentially borrows the memory already computed by earlier layers. Less work, nearly the same result. This is one of those ideas where we wonder why we didn’t always do it like this.

Okay, and all this was just part of my first surprise. Second surprise. It is fantastic at agentic workflows. This is where we don’t just have an AI assistant that spits out a bunch of text, this is when we give it arms and legs and ask it to do stuff. Tool use, local coding, and a ton more. Plug it into OpenClaw and it can book a plane ticket. Look for news and summarize it in a more unbiased way. Or write silly emails to Károly from Two Minute Papers. That sort of thing. It is really good at that.
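The shared KV-cache idea described above, where later layers borrow keys and values already computed by earlier layers instead of projecting their own, can be caricatured in a few lines. This is a toy sketch, not Gemma’s actual design: the alternating sharing pattern and the projection stand-ins here are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                          # head dimension
n_layers = 6
x = rng.normal(size=(10, d))    # 10 cached tokens of context

kv_cache = {}                   # layer index -> (K, V)
projections_run = 0

def project_kv(layer, x):
    # Stand-in for a layer's key/value projections (the expensive part).
    global projections_run
    projections_run += 1
    return x * (layer + 1), x * 0.5

for layer in range(n_layers):
    if layer % 2 == 1:
        # Sharing layers borrow the K/V computed one layer below:
        # no new projection work, and no extra cache memory.
        kv_cache[layer] = kv_cache[layer - 1]
    else:
        kv_cache[layer] = project_kv(layer, x)

print(projections_run)  # 3 projections instead of 6
```

The payoff is exactly what the video describes: less compute and less cache memory per token of context, at the cost of later layers attending over slightly “stale” representations.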
So when any company decides that you can’t use their system anymore, that’s alright. Just plug in Gemma 4, and you are good to go. For free. People find that if you give it custom instructions, sometimes you don’t even notice the difference. That is huge.

Surprise number three, the context window was improved to 256k, twice as big as Gemma 3 had. This is pretty expensive to compute, so don’t take it for granted. Here, you are not going to chuck gigabytes of movies into it, but for a few long documents, it is perfectly fine.

Four, the license. Oh my, the license. This one gets overlooked so much. Gemma 3 came with a Gemma license. In other words, it came with strings attached. The model comes with handcuffs, if you will. If you use it to create training data for a derivative model, yup, that one inherits the handcuffs too. But with Gemma 4, not anymore. Look, Apache 2.0 license. Now we’re talking, yes! This license is true to the open source spirit. You can modify it, sell it, deploy it commercially with almost zero friction. Make derivative models, do a ton of stuff with far fewer restrictions. This is huge. Thank you so much!

Now, not even this technique is perfect. For instance, the model does not have a live database. Without an agent harness, it cannot browse or look up stuff. Meaning? Well, meaning that it can be confidently incorrect. The internet special. Also, for highly complex, open-ended tasks, it’s not great at that. Or when you have images with lots of high-frequency visual details: thin structures, blades of grass, or a fence from far away. Not great at that; it’s going to need even better glasses.

But adding this all up, this is an amazing gift to humanity, one that cannot be taken from us. This is not for Mr. Moneybags, this is for the little man, and it is free, for all of us, forever. Hugely appreciated. Absolutely loving it. What a time to be alive!
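As a toy illustration of the mixture-of-experts versus dense distinction from earlier: a router scores the experts for each input and only the top-k experts run, while a dense model runs every parameter every time. Nothing below reflects Gemma’s or any real model’s architecture; the expert count, router, and weights are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n_experts, d = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # 8 small "brains"
router = rng.normal(size=(d, n_experts))

def moe_forward(x, top_k=2):
    # Route the input: score all experts, but run only the top_k of them.
    scores = x @ router
    top = np.argsort(scores)[-top_k:]
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

def dense_forward(x):
    # A dense model "lights up" every expert matrix for every input.
    return sum(x @ e for e in experts) / n_experts

x = rng.normal(size=d)
print(moe_forward(x).shape, dense_forward(x).shape)
```

With top_k=2 of 8 experts, the MoE path touches only a quarter of the expert parameters per input, which is the efficiency trick the video describes, and why a well-trained dense 31B model punching above its weight is surprising.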
Also, I waited with this video because I did not just want to take the marketing messaging and copy-paste it to you. I wanted to see how you Fellow Scholars are actually using it in practice. Read through your experiences with it. Does it really work in practice? Super important. That’s what we are here for, not the copy-pasted media headlines.

That needs time. Trying to explain all this in simple words also takes time. I don’t have a team here; I do everything from the writing to recording and video editing. I am trying my best here. But it gives you more accurate information, and that is the most important thing for me.

So, now, after 10 million downloads in the first week and more thorough testing, my opinion is that yes, this thing rocks. I would like to send a big thank you to every single scientist who worked on this! And hold on to this one for dear life, because a frontier model just got locked down for a few select clients. You know, it’s a big club. And we ain’t in it.

If you enjoyed this, consider subscribing and hitting that bell.
