DeepSeek Just Fixed One Of The Biggest Problems With AI
Introduces the idea that current AI systems are inefficient, often re-deriving facts from scratch instead of looking up stored information.
DeepSeek’s Engram lets AI pull from a pantry of pre-stored facts, slashing compute with smarter lookups and even improving performance overall.
Summary
Two Minute Papers’ Dr. Károly Zsolnai-Fehér breaks down a breakthrough from DeepSeek AI: Engram, a mechanism that turns neural networks into smarter hybrid systems by letting a fast lookup “pantry” replace much of the old scratch-work. Rather than rebuilding every fact from scratch, Engram stores premade ingredients and retrieves them on demand, dramatically cutting compute. Surprisingly, removing part of the model’s mixture of experts and relying on this pantry actually improves performance across benchmarks. A context-aware gating system further filters retrieved memory to discard “rotten ingredients,” keeping the final output coherent. Dr. Zsolnai-Fehér walks through how n-gram embeddings with multi-head hashing enable fast lookups, likening the process to a chef grabbing a premade sauce from a specific pantry shelf. The take-home: automate the easy parts, focus on the hard tasks, and you end up with cheaper, smarter AI systems that could run in consumer environments without heavy subscriptions. He also notes a limitation (placing the Engram module too deep in the network hurts accuracy) and emphasizes this could be a cornerstone for future AI without hidden, pricey ecosystems. The video closes with practical takeaways and a tease of running DeepSeek privately with Lambda.
Key Takeaways
- Engram enables a fast lookup-based memory layer that lets AI pull premade information from a pantry instead of re-deriving it from scratch.
- Replacing part of the model’s mixture of experts (MoE) with the Engram pantry improves loss across benchmarks, not just speed.
- Context-aware gating compares retrieved memory with the current task to drop incompatible ingredients, preventing low-quality or out-of-context data from being used.
- The technique relies on n-gram embeddings and multi-head hashing to map short phrases to pre-stored ingredients, enabling efficient lookups.
- Removing 20–25% of the “smart” compute and substituting a lookup layer can yield better overall performance, according to DeepSeek’s results.
- In an ablation test, disabling Engram’s memory dropped trivia accuracy sharply while reading comprehension stayed high, suggesting the model splits fact storage from reasoning.
- Long-term implication: this approach could underpin many future AI systems, enabling cheaper, private, on-device intelligence without subscription models.
Who Is This For?
Essential viewing for researchers and engineers curious about scalable AI architectures and memory-augmented models; especially relevant for teams exploring efficient, on-device AI and reducing reliance on opaque, expensive cloud systems.
Notable Quotes
“Instead of growing that peanut butter sandwich from scratch, it now just grabs the ingredients from the pantry.”
—Illustrates Engram’s core idea of using a memory/pantry instead of full recomputation.
“The new engram technique makes the neural network better…everywhere.”
—Highlights the broad performance gains across benchmarks.
“If you put the engram module too deep in the network, it gets less accurate.”
—Notes a limitation and practical design consideration for deployment.
“This is basically a look up table. It is as simple as it gets, and it makes everything more efficient.”
—Summarizes the simplicity and power of the lookup-based approach.
“No subscriptions, these run in our pockets super fast, mostly for free.”
—Drives home the practical and economic appeal of the approach.
Questions This Video Answers
- How does Engram differ from traditional retrieval-augmented generation methods?
- What are n-gram embeddings and how do they enable fast lookups in AI models?
- Can on-device AI achieve performance parity with cloud-based systems using memory-augmented architectures?
- What are the practical limits of replacing MoE with a pantry-like memory layer?
- Why might a context-aware gate improve the quality of retrieved memory in AI systems?
DeepSeek Engram, MoE (mixture of experts), n-gram embeddings, multi-head hashing, AI memory architectures, context-aware gating, on-device AI, privacy-preserving AI
Full Transcript
Few people know, but modern AI systems are really silly. How? Well, imagine having a Michelin-star chef being asked for a simple peanut butter sandwich. That’s weird, but okay. Now, the chef says, you’ll need to wait just a bit, because I am going to start planting peanuts, wait six months, harvest, churn some peanut butter, and then get to work on your bread. That sounds really silly, and that is exactly what modern AI systems like ChatGPT and Gemini do. When they need to recall a simple fact, like who Alexander the Great was, something crazy happens. They go through complex reasoning layers and reconstruct everything from scratch every single time.
That is crazy. Now I have an amazing research paper for you here from folks at DeepSeek AI, and this is a piece of technology that might underpin most if not all of the amazing AI systems of the future. Now every now and then we are going to look at a figure, but for the rest, I am going to bring my physics simulations and all the goodness we talk about around here. Apologies for that. Okay, so this is a massive waste of compute. But why does this happen? Well, standard transformers are a kind of neural network that is inside nearly all modern AI assistants.
And here is the problem: they lack a simple and cheap way to just look things up. Whatever the question is, the answer is a huge bunch of dense mathematical calculations. From scratch. Yes, it is literally planting that peanut when you ask for a sandwich. Now in this work, DeepSeek introduces Engram. With this, they are giving our tired little chef a pantry. Cutting edge technology brother! Instead of growing that peanut butter sandwich from scratch, it now just grabs the ingredients from the pantry. I’ll explain to you how exactly they did it in a bit. Now, this makes the AI way more efficient, okay, I expected that.
But, what? Are you seeing what I am seeing? Now this I did not expect at all. So here comes the surprise. Now hold on to your papers Fellow Scholars, because when they take away some of the AI's complex reasoning parts, known as mixture of experts (MoE), and replace them with the pantry, the AI actually gets smarter. Lower is better here on the loss curves. And not just a little, this is significantly smarter. Those dots dipping way down show that this hybrid chef makes far fewer mistakes than previous techniques. It achieves a perfect balance of active cooking and just grabbing from the pantry.
Genius. But that is not the only surprising thing in this paper. They also added a way for the AI to check the ingredients before using them. You don’t want rotting fish in your strawberry jam. To ensure this, they created a context-aware gating mechanism. The current context is the dish being cooked. Now here, this is compared against the retrieved memory, the jar from the pantry. If the jar's contents don’t agree with the dish, the gate drops to zero, throwing the ingredient away. Bye bye rotting fish! This mechanism lives right here, inside this jolly little dot product.
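The gating step described above, a dot product between the current context and the retrieved memory, can be sketched roughly like this. This is a minimal illustrative sketch in plain Python, not DeepSeek's implementation; the vector sizes, function names, and the sigmoid squashing are assumptions made for the example.

```python
import math

def context_gate(hidden, retrieved):
    """Scalar gate in (0, 1): a sigmoid over the dot product between
    the current context and the retrieved memory. Strong agreement
    pushes the gate toward 1; disagreement pushes it toward 0, so the
    "rotting fish" gets thrown away."""
    score = sum(h * r for h, r in zip(hidden, retrieved))
    return 1.0 / (1.0 + math.exp(-score))

def mix_memory(hidden, retrieved):
    # Blend the retrieved "pantry" vector into the context, scaled by
    # how well it agrees with the dish currently being cooked.
    g = context_gate(hidden, retrieved)
    return [h + g * r for h, r in zip(hidden, retrieved)]

context = [1.0] * 8            # stand-in hidden state
compatible = [0.5] * 8         # memory aligned with the context
incompatible = [-0.5] * 8      # memory opposed to the context

print(round(context_gate(context, compatible), 3))    # 0.982
print(round(context_gate(context, incompatible), 3))  # 0.018
```

The aligned memory passes through nearly untouched while the opposed one is suppressed to near zero, which is exactly the "check the ingredients before using them" behavior the narration describes.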
Now let’s see how it actually performs against the current systems. I’ll tell you exactly what is going to happen now. What happens in nearly all research papers with something new. It does something, it is compared to previous methods, and it’s better at some things, worse at others. And then you sit down and you do your analysis. Okay, let’s see…wait what? What just happened here? The new engram technique makes the neural network better…everywhere. Absolutely everything is measurably better. This is an absolute miracle work. The engram model is actually better on every single benchmark compared to the previous techniques.
It is better everywhere! Now this is an amazing life lesson too. How? Well, essentially what DeepSeek does is automates the easy part, and focuses on the more difficult tasks. No wonder it works so well! What a time to be alive! We can learn so much from these research papers, and not just about AI, but about life itself. Okay, now I’ll tell you how this works, and it turns out, there are more surprises ahead. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Okay, so how does it do this magic? Well, it uses what they call n-gram embeddings combined with multi-head hashing. Okay, what the heck does that mean?
Well, in the kitchen, the chef looks at the order ticket, sees a 3-word phrase, and instantly knows exactly which shelf in the pantry has the premade sauce, and grabs it quickly. And I think this also shows us that there are simple and basic ideas in AI that we haven’t found yet. I mean, this thing is basically a look up table. It is as simple as it gets, and it makes everything more efficient and better across the board. Just think about it: we removed 20 or 25% of the smart experts in this little virtual brain, put a spreadsheet there, and it got better!
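The shelf lookup in the chef metaphor can be sketched as hashing the same n-gram with several independent hash functions ("heads") into per-head embedding tables and averaging the fetched rows. Everything below (table size, head count, SHA-256 as the hash, the tiny embedding width, random initial values) is illustrative and not DeepSeek's actual configuration.

```python
import hashlib
import random

TABLE_ROWS = 4096   # rows per shelf (one hash table per head)
HEADS = 4           # independent hash functions ("multi-head")
DIM = 8             # embedding width, tiny for illustration

random.seed(0)
# One embedding table per head: pantry shelves of premade vectors.
shelves = [[[random.gauss(0, 0.02) for _ in range(DIM)]
            for _ in range(TABLE_ROWS)] for _ in range(HEADS)]

def shelf_index(ngram, head):
    """Deterministic hash of an n-gram, salted per head, folded into
    the table. Different heads disagree on collisions, so averaging
    across heads washes most collisions out."""
    key = f"{head}|{' '.join(ngram)}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % TABLE_ROWS

def lookup(ngram):
    # Grab the same n-gram from every shelf and average the rows:
    # a cheap O(1) read instead of recomputing the fact from scratch.
    rows = [shelves[h][shelf_index(ngram, h)] for h in range(HEADS)]
    return [sum(vals) / HEADS for vals in zip(*rows)]

vec = lookup(("alexander", "the", "great"))
print(len(vec))                                       # 8
print(vec == lookup(("alexander", "the", "great")))   # True: pure lookup
```

The point of the sketch is the cost profile: retrieving a stored fact is a couple of hashes and table reads, versus running the phrase through every dense layer of the network.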
I mean what? Crazy. And I love how we have a little better understanding of the AI system itself. Usually, no one knows what is going on inside, but here. Look. When they switched off the engram memory during testing, the AI’s ability to answer trivia went down 70%. But its reading comprehension remained at 93%. Why? Well, I think this shows that the AI split its brain, and it’s using the new part just to store facts. Just think about it. When they locked the pantry door during testing, the chef’s ability to understand a recipe stayed at a massive 93%!
What does that mean? It shows the chef split the work perfectly. He used the pantry strictly as a storage shelf for memorized ingredients, but he can still cook an amazing meal. I think this is going to lead to even cheaper and even smarter AI systems, and this will be an important part of why we will all get more systems that we can actually own, no subscriptions, these run in our pockets super fast, mostly for free. Okay, now not even this technique is perfect. One limitation is that if you put the engram module too deep in the network, it gets less accurate because the model has already wasted time processing what is being asked.
Of course, there is no need to look up what you already computed. I think this is common sense at this point. Our chef has to check the pantry at the start of the shift. If he checks it after the food was served, the pantry is completely useless. A really advanced research paper explained in simple words. We are Fellow Scholars, and that’s what we do here. And we have a growing club. I’ll continue in a moment, but you know who is also watching us? The one and only Larry Wheels. Yes. He is one of our OG Fellow Scholars, doing some Scholarly work between two hard sets of bicep curls in the gym.
You think I am kidding? I am not. Link is in the description. Reading his comment made me instantly more muscular. So much value. Huge respect to Mr. Wheels! Honored to have you here. And here comes the best part. I think this will be a part of every major AI system, and it is knowledge out there for free for all of us, and now you know exactly how it works! No nonsense where everything is hidden in a proprietary system that costs 300 dollars per month to run. Nope. All free for all of us. Glorious. An epic paper.
Now, as our chef does, I took a bit longer to cook this video. But I promise that I did not put together my computer from scratch before starting. So I took some more time to make sure you get a better video. If you feel this is the right way of doing that, subscribe, hit the bell and leave a really kind comment. And you can also check out Lambda with our link in the description because it is an excellent way of running DeepSeek privately, I do it too.