DeepSeek Just Fixed One Of The Biggest Problems With AI
Introduces the idea that current AI systems are inefficient, often re-deriving facts from scratch instead of looking up stored information.
DeepSeek’s Engram lets AI pull from a pantry of pre-stored facts, slashing compute with smarter lookups and even improving performance overall.
Summary
Two Minute Papers’ Dr. Károly Zsolnai-Fehér breaks down a breakthrough from DeepSeek AI: Engram, a mechanism that turns neural networks into smarter hybrid systems by letting a fast lookup “pantry” replace much of the old scratch-work. Rather than rebuilding every fact from scratch, Engram stores premade ingredients and retrieves them on demand, dramatically cutting compute. Surprisingly, removing part of the model’s mixture of experts and relying on this pantry actually improves performance across benchmarks. A context-aware gating system further filters retrieved memory to discard “rotten ingredients,” keeping the final output coherent. Dr. Zsolnai-Fehér walks through how n-gram embeddings with multi-head hashing enable fast lookups, likening the process to a chef grabbing a premade sauce from a specific pantry shelf. The take-home: automate the easy parts, focus on the hard tasks, and you end up with cheaper, smarter AI systems that could run in consumer environments without heavy subscriptions. He also notes a limitation (placing the Engram module too deep in the network hurts accuracy) and emphasizes this could be a cornerstone for future AI without hidden, pricey ecosystems. The video closes with practical takeaways and a tease of running DeepSeek privately with Lambda.
Key Takeaways
- Engram enables a fast lookup-based memory layer that lets AI pull premade information from a pantry instead of re-deriving it from scratch.
- Replacing part of the model’s mixture of experts (MoE) with the Engram pantry improves loss across benchmarks, not just speed.
- Context-aware gating compares retrieved memory with the current task to drop incompatible ingredients, preventing low-quality or out-of-context data from being used.
- The technique relies on n-gram embeddings and multi-head hashing to map short phrases to pre-stored ingredients, enabling efficient lookups.
- Removing 20–25% of the “smart” compute and substituting a lookup layer can yield better overall performance, according to DeepSeek’s results.
- In an ablation test, disabling Engram’s memory dropped trivia accuracy sharply while reading comprehension stayed high, suggesting the model splits fact storage from reasoning.
- Long-term implication: this approach could underpin many future AI systems, enabling cheaper, private, on-device intelligence without subscription models.
Who Is This For?
Essential viewing for researchers and engineers curious about scalable AI architectures and memory-augmented models; especially relevant for teams exploring efficient, on-device AI and reducing reliance on opaque, expensive cloud systems.
Notable Quotes
“Instead of growing that peanut butter sandwich from scratch, it now just grabs the ingredients from the pantry.”
—Illustrates Engram’s core idea of using a memory/pantry instead of full recomputation.
“The new engram technique makes the neural network better…everywhere.”
—Highlights the broad performance gains across benchmarks.
“If you put the engram module too deep in the network, it gets less accurate.”
—Notes a limitation and practical design consideration for deployment.
“This is basically a look up table. It is as simple as it gets, and it makes everything more efficient.”
—Summarizes the simplicity and power of the lookup-based approach.
“No subscriptions, these run in our pockets super fast, mostly for free.”
—Drives home the practical and economic appeal of the approach.
Questions This Video Answers
- How does Engram differ from traditional retrieval-augmented generation methods?
- What are n-gram embeddings and how do they enable fast lookups in AI models?
- Can on-device AI achieve performance parity with cloud-based systems using memory-augmented architectures?
- What are the practical limits of replacing MoE with a pantry-like memory layer?
- Why might a context-aware gate improve the quality of retrieved memory in AI systems?
DeepSeek Engram, MoE (mixture of experts), n-gram embeddings, multi-head hashing, AI memory architectures, context-aware gating, on-device AI, privacy-preserving AI
Full Transcript
Few people know, but modern AI systems are really silly. How? Well, imagine having a Michelin-star chef being asked for a simple peanut butter sandwich. That’s weird, but okay. Now, the chef says, you’ll need to wait just a bit, because I am going to start planting peanuts, wait six months, harvest, churn some peanut butter, and then get to work on your bread. That sounds really silly, and that is exactly what modern AI systems like ChatGPT and Gemini do. When they need to recall a simple fact, like who Alexander the Great was, something crazy happens. They go through complex reasoning layers and reconstruct everything from scratch every single time.
That is crazy. Now I have an amazing research paper for you here from folks at DeepSeek AI, and this is a piece of technology that might underpin most if not all of the amazing AI systems of the future. Now every now and then we are going to look at a figure, but for the rest, I am going to bring my physics simulations and all the goodness we talk about around here. Apologies for that. Okay, so this is a massive waste of compute. But why does this happen? Well, standard transformers are a kind of neural network that is inside nearly all modern AI assistants.
And here is the problem: they lack a simple and cheap way to just look things up. Whatever the question is, the answer is a huge bunch of dense mathematical calculations. From scratch. Yes, it is literally planting that peanut when you ask for a sandwich. Now in this work, DeepSeek introduces Engram. With this, they are giving our tired little chef a pantry. Cutting edge technology brother! Instead of growing that peanut butter sandwich from scratch, it now just grabs the ingredients from the pantry. I’ll explain to you how exactly they did it in a bit. Now, this makes the AI way more efficient, okay, I expected that.
But, what? Are you seeing what I am seeing? Now this I did not expect at all. So here comes the surprise. Now hold on to your papers Fellow Scholars, because when they take away some of the AI's complex reasoning parts, known as mixture of experts (MoE), and replace them with the pantry, the AI actually gets smarter. Lower is better here on the loss curves. And not just a little, this is significantly smarter. Those dots dipping way down show that this hybrid chef makes far fewer mistakes than previous techniques. It achieves a perfect balance of active cooking and just grabbing from the pantry.
Genius. But that is not the only surprising thing in this paper. They also added a way for the AI to check the ingredients before using them. You don’t want rotting fish in your strawberry jam. To ensure this, they created a context-aware gating mechanism. The current context is the dish being cooked. Now here, this is compared against the retrieved memory, the jar from the pantry. If the jar's contents don’t agree with the dish, the gate drops to zero, throwing the ingredient away. Bye bye rotting fish! This mechanism lives right here, inside this jolly little dot product.
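The gating step described above, a dot product between the current context and the retrieved memory, can be sketched roughly like this. This is a minimal illustrative sketch in plain Python, not DeepSeek's implementation; the vector sizes, function names, and the sigmoid squashing are assumptions made for the example.

```python
import math

def context_gate(hidden, retrieved):
    """Scalar gate in (0, 1): a sigmoid over the dot product between
    the current context and the retrieved memory. Strong agreement
    pushes the gate toward 1; disagreement pushes it toward 0, so the
    "rotting fish" gets thrown away."""
    score = sum(h * r for h, r in zip(hidden, retrieved))
    return 1.0 / (1.0 + math.exp(-score))

def mix_memory(hidden, retrieved):
    # Blend the retrieved "pantry" vector into the context, scaled by
    # how well it agrees with the dish currently being cooked.
    g = context_gate(hidden, retrieved)
    return [h + g * r for h, r in zip(hidden, retrieved)]

context = [1.0] * 8            # stand-in hidden state
compatible = [0.5] * 8         # memory aligned with the context
incompatible = [-0.5] * 8      # memory opposed to the context

print(round(context_gate(context, compatible), 3))    # 0.982
print(round(context_gate(context, incompatible), 3))  # 0.018
```

The aligned memory passes through nearly untouched while the opposed one is suppressed to near zero, which is exactly the "check the ingredients before using them" behavior the narration describes.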
Now let’s see how it actually performs against the current systems. I’ll tell you exactly what is going to happen now. What happens in nearly all research papers with something new. It does something, it is compared to previous methods, and it’s better at some things, worse at others. And then you sit down and you do your analysis. Okay, let’s see…wait what? What just happened here? The new engram technique makes the neural network better…everywhere. Absolutely everything is measurably better. This is an absolute miracle work. The engram model is actually better on every single benchmark compared to the previous techniques.
It is better everywhere! Now this is an amazing life lesson too. How? Well, essentially what DeepSeek does is automates the easy part, and focuses on the more difficult tasks. No wonder it works so well! What a time to be alive! We can learn so much from these research papers, and not just about AI, but about life itself. Okay, now I’ll tell you how this works, and it turns out, there are more surprises ahead. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Okay, so how does it do this magic? Well, it uses what they call n-gram embeddings combined with multi-head hashing. Okay, what the heck does that mean?
Well, in the kitchen, the chef looks at the order ticket, sees a 3-word phrase, and instantly knows exactly which shelf in the pantry has the premade sauce, and grabs it quickly. And I think this also shows us that there are simple and basic ideas in AI that we haven’t found yet. I mean, this thing is basically a look up table. It is as simple as it gets, and it makes everything more efficient and better across the board. Just think about it: we removed 20 or 25% of the smart experts in this little virtual brain, put a spreadsheet there, and it got better!
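The shelf lookup in the chef metaphor can be sketched as hashing the same n-gram with several independent hash functions ("heads") into per-head embedding tables and averaging the fetched rows. Everything below (table size, head count, SHA-256 as the hash, the tiny embedding width, random initial values) is illustrative and not DeepSeek's actual configuration.

```python
import hashlib
import random

TABLE_ROWS = 4096   # rows per shelf (one hash table per head)
HEADS = 4           # independent hash functions ("multi-head")
DIM = 8             # embedding width, tiny for illustration

random.seed(0)
# One embedding table per head: pantry shelves of premade vectors.
shelves = [[[random.gauss(0, 0.02) for _ in range(DIM)]
            for _ in range(TABLE_ROWS)] for _ in range(HEADS)]

def shelf_index(ngram, head):
    """Deterministic hash of an n-gram, salted per head, folded into
    the table. Different heads disagree on collisions, so averaging
    across heads washes most collisions out."""
    key = f"{head}|{' '.join(ngram)}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % TABLE_ROWS

def lookup(ngram):
    # Grab the same n-gram from every shelf and average the rows:
    # a cheap O(1) read instead of recomputing the fact from scratch.
    rows = [shelves[h][shelf_index(ngram, h)] for h in range(HEADS)]
    return [sum(vals) / HEADS for vals in zip(*rows)]

vec = lookup(("alexander", "the", "great"))
print(len(vec))                                       # 8
print(vec == lookup(("alexander", "the", "great")))   # True: pure lookup
```

The point of the sketch is the cost profile: retrieving a stored fact is a couple of hashes and table reads, versus running the phrase through every dense layer of the network.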
I mean what? Crazy. And I love how we have a little better understanding of the AI system itself. Usually, no one knows what is going on inside, but here. Look. When they switched off the engram memory during testing, the AI’s ability to answer trivia went down 70%. But its reading comprehension remained at 93%. Why? Well, I think this shows that the AI split its brain, and it’s using the new part just to store facts. Just think about it. When they locked the pantry door during testing, the chef’s ability to understand a recipe stayed at a massive 93%!
What does that mean? It shows the chef split the work perfectly. He used the pantry strictly as a storage shelf for memorized ingredients, but he can still cook an amazing meal. I think this is going to lead to even cheaper and even smarter AI systems, and this will be an important part of why we will all get more systems that we can actually own, no subscriptions, these run in our pockets super fast, mostly for free. Okay, now not even this technique is perfect. One limitation is that if you put the engram module too deep in the network, it gets less accurate because the model has already wasted time processing what is being asked.
Of course, there is no need to look up what you already computed. I think this is common sense at this point. Our chef has to check the pantry at the start of the shift. If he checks it after the food was served, the pantry is completely useless. A really advanced research paper explained in simple words. We are Fellow Scholars, and that’s what we do here. And we have a growing club. I’ll continue in a moment, but you know who is also watching us? The one and only Larry Wheels. Yes. He is one of our OG Fellow Scholars, doing some Scholarly work between two hard sets of bicep curls in the gym.
You think I am kidding? I am not. Link is in the description. Reading his comment made me instantly more muscular. So much value. Huge respect to Mr. Wheels! Honored to have you here. And here comes the best part. I think this will be a part of every major AI system, and it is knowledge out there for free for all of us, and now you know exactly how it works! No nonsense where everything is hidden in a proprietary system that costs 300 dollars per month to run. Nope. All free for all of us. Glorious. An epic paper.
Now, as our chef does, I took a bit longer to cook this video. But I promise that I did not put together my computer from scratch before starting. So I took some more time to make sure you get a better video. If you feel this is the right way of doing that, subscribe, hit the bell and leave a really kind comment. And you can also check out Lambda with our link in the description because it is an excellent way of running DeepSeek privately, I do it too.