DeepMind’s New AI Just Changed Science Forever
Introduces an AI system that can perform research tasks, generate candidate solutions, and potentially write core parts of papers.
DeepMind’s Aletheia can autonomously propose, verify, and even co-author new research, pushing the boundaries of AI-assisted science.
Summary
Two Minute Papers’ host explains that DeepMind’s latest AI agent, Aletheia, is more than a clever solver: it can generate candidate solutions to open problems, produce the core content of research papers, and work with human scientists on novel findings. Dr. Károly Zsolnai-Fehér highlights three key steps: checking proofs in natural language rather than rigid formal math language, a much stronger base model that reasons more efficiently (matching the model from six months earlier with 100x less compute), and a search capability trained to read and combine techniques from dozens of cutting-edge papers. The system autonomously solved four open Erdős problems and has contributed to several papers now submitted for peer review, including one on calculating constants in arithmetic geometry and another on new limits for interacting particles. Crucially, the verifier acts as a filter that rejects junk candidates, and the generator’s messy train of thought is hidden from it so it cannot simply agree with itself. Hallucinations remain a challenge for frontier work; heavy training on using search tools and existing research is what the host credits with finally stopping the made-up output. Zsolnai-Fehér emphasizes that this marks a transition from somewhat novel work to publishable-level research, produced with human help and, in some cases, autonomously. The discussion wraps with a reflection on the pace of progress and a note that it deserves more public discussion. Finally, the host invites viewers to share whether they want more videos like this.
Key Takeaways
- Aletheia checks its proofs in natural English rather than rigid formal math language, and hides the generator's messy train of thought from the verifier so that only the answer is judged.
- The system runs on a much stronger base model that reasons more efficiently, matching the model from six months earlier with 100x less compute.
- Search lets the AI read and combine techniques from dozens of cutting-edge papers; heavy training on using these tools is what finally stopped it from making up junk.
- It autonomously solved four open Erdős problems and produced core contents for new papers, including work on constants in arithmetic geometry and new limits for interacting particles.
- The verifier acts as a filter to prevent the AI from simply agreeing with its own outputs, a crucial guardrail against hallucinations.
- Researchers demonstrated a jump from Level 1 novelty to publishable-level research, with potential toward autonomous scientific contribution.
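The generate, verify, and revise loop described in these takeaways can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepMind's implementation: the names `Candidate`, `Verdict`, `solve`, `toy_generate`, and `toy_verify` are hypothetical, and the toy functions stand in for real model calls. The one detail taken from the video is that the verifier receives only the written answer, never the generator's train of thought.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    reasoning: str  # messy train of thought, never shown to the verifier
    answer: str     # the written-up solution the verifier actually reads


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""  # empty feedback means "junk, start again"


def solve(
    problem: str,
    generate: Callable[[str, Optional[str], str], Candidate],
    verify: Callable[[str, str], Verdict],
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate, verify, and revise until a candidate passes or rounds run out."""
    previous_answer: Optional[str] = None
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(problem, previous_answer, feedback)
        # Only the answer is passed on; the reasoning stays hidden.
        verdict = verify(problem, candidate.answer)
        if verdict.accepted:
            return candidate.answer
        if verdict.feedback:
            # Nearly right: polish this answer in the next round.
            previous_answer, feedback = candidate.answer, verdict.feedback
        else:
            # Junk: discard it and start from scratch.
            previous_answer, feedback = None, ""
    return None


# Hypothetical stand-ins for the generator and verifier models.
def toy_generate(problem: str, previous_answer: Optional[str], feedback: str) -> Candidate:
    if previous_answer is None:
        return Candidate(reasoning="first guess", answer="x = 3")
    return Candidate(reasoning="revised after feedback", answer="x = 4")


def toy_verify(problem: str, answer: str) -> Verdict:
    if answer == "x = 4":
        return Verdict(accepted=True)
    return Verdict(accepted=False, feedback="check the arithmetic")


print(solve("Solve 2x = 8", toy_generate, toy_verify))  # prints: x = 4
```

Keeping `reasoning` and `answer` as separate fields makes the separation explicit: `verify` has no way to read the reasoning, so it cannot be talked into agreeing by it.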
Who Is This For?
Researchers and developers in AI and scientific computing who want to understand how AI-assisted research could complement or accelerate real-world discovery.
Notable Quotes
"This AI is even better than that. They call it Aletheia."
—Introducing the AI and its name, signaling a leap beyond prior tools like Deep Think.
"The generator starts working on it, creates a candidate solution, and now here is one of the important parts of the paper. The verifier."
—Describes the core two-stage process: generate a solution, then verify it.
"Same smarts, but it uses a 100 times less compute."
—Highlights efficiency gains from model improvements.
"It solved a few of these Erdős problems. It autonomously found the answer to 4 open math puzzles left behind by a legendary Hungarian mathematician."
—Shows concrete success on open problems.
"For the first time ever, an AI created core parts of a research work that is new, it has impact, it is useful."
—Summary statement of impact and novelty.
Questions This Video Answers
- How does DeepMind's Aletheia differ from earlier AI research assistants in math?
- Can AI realistically author publishable scientific papers without human co-authors?
- What are the main safeguards to prevent AI hallucinations in frontier research?
- What steps are involved in turning AI-generated ideas into peer-reviewed publications?
DeepMind, Aletheia, Two Minute Papers, Quoc Le, AI research, AIs that write papers, natural language reasoning, verifier, frontier research, mathematical olympiad problems
Full Transcript
I appeared on camera for an interview not so long ago. And I was really surprised by how many of you Fellow Scholars said that you would like to see more. So first of all, thank you so much to all of you for the kind words. Second, I thought let's try this and hope that you will enjoy it. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Look, it only took 1,000 episodes. Now, I have an amazing paper for you because scientists at DeepMind did something pretty insane. Our question today is can an AI invent something that is fundamentally new and pushes humanity forward? Well, they said that their new AI agent can actually do research and even write research papers.
Most of the core content anyway. Is that insane? Well…it’s not. A lot of other people have tried it and the only insane thing about it was how many poor papers they wrote. But it turns out… there is levels to this game. You see, I visited the research group that is behind this work last year. I flew to Mountain View into this crazy lab, and a grumpy guard didn’t even want to let me in first. Crazy town. So I was very surprised that they are guarding these secrets and they take them very seriously. What is even more surprising is that now they give some of those secrets away to all of us for free.
Now that is insane! More on that in a moment. So I talked to these scientists, this was the research group of Quoc Le. They are brilliant. They wrote an AI that was able to do a gold medal worthy performance on the mathematical olympiad. This is serious business. Then they released this technique, anyone who is made out of money bags and pays for the Gemini Advanced can use it, it is called Deep Think. And now, this AI is even better than that. They call it Aletheia. Now that, once again is insane. Okay, so what does it do?
Well, it promises that it does research. It solves novel problems. This is something that could push humanity forward. Now that is so much harder than the mathematical olympiad. Why is that? Well, in these contests, you have a not that huge piece of core knowledge you are supposed to have, and every problem can be guaranteed to be solved by those small set of tools. Every problem is nice, shiny, and polished. Tough, but polished. You know what is not polished at all? Real life problems. With these open problems, we don’t even know if they are solvable at all.
Maybe they are impossible, or maybe possible, but not with our current tools. That’s the point: no one knows. When this technique is given a problem, the generator starts working on it, creates a candidate solution, and now here is one of the important parts of the paper. The verifier. This takes a look, and says, okay bro this is junk. Start again. This is essentially a filter. You know, that’s actually good life advice. Sometimes it’s good to have a filter, so you don’t just shoot those hot takes out there into the ether. Now every now and then, the solution looks pretty good, and could maybe pass with a few modifications.
Then, it gets polished for another round of reviews, and so it goes. Sounds simple…maybe even trivial right? So what is so scientific about this? Why doesn’t every system do that? Well, that’s easier said than done. In fact, it is almost impossible to pull off. Why? One, when the AI is doing something fundamentally new, unfortunately, hallucinations still happen. Yup. It just makes stuff up. Fake papers, fictitious authors, you name it. All kinds of junk comes out. Two, when you want to compute 1+1 or other simple things, you have tons of training data about it out there.
You can verify that easily. But if you want to do frontier research? There is no training data on what we don't even know yet. Of course there isn’t! You are trying to invent things no one understands yet. These two factors make it extremely difficult to get an AI to do something fundamentally new and useful. So how did they pull it off? With three key steps. First, Aletheia does not use this formal rigid math language to check its own proofs. It uses natural English language. That is notoriously hard, because when the AI checks its own writing, it just blindly agrees with it. We humans do that too!
Now here, the researchers found a way to separate the thinking part from the answer part. So the messy train of thought is hidden from the verifier, it cannot trick itself into just blindly agreeing with itself. Brilliant. Our brains would need something like that too. Then, two, they let the computer think longer. That’s not new. However, they added some optimizations to this, so much so that the model they have now is just as smart as the one from 6 months ago. But hold on to your papers Fellow Scholars, because yes, same smarts, but it uses a 100 times less compute.
What! Crazy. They trained a much stronger base model which made it more efficient at reasoning. So this one, even without internet access, beats the mathematical olympiad gold AI easily. About 65% was improved to 95%. Wow. It went from a bit better than a coin flip to destroying the tasks made for some of the best human minds. All this in just a few months. I am out of words. Now, three, they gave the AI the ability to search for stuff. We are talking about Google after all. Once again, that is easy. However, getting the AI to read and combine techniques from dozens and dozens of cutting-edge research papers without losing its mind.
Now that is hard. You saw it earlier, this really happens! They heavily trained this AI to be able to use these tools and research works that are out there. That was what finally stopped it from making up junk. Okay, so how good is it? First I saw that it solved a few of these Erdős problems. It autonomously found the answer to 4 open math puzzles left behind by a legendary Hungarian mathematician. Is that insane? I asked a mathematician friend. He told me yeah, that’s pretty good, but there are so many of these problems out there, and not a ton of people work on them. In other words, they are fairly easy, they were just ignored by experts for years.
So not nearly as good as I thought. But then, it stepped up its game and wrote the core contents of a research paper. On something new. Note that the final paper is written up by a human scientist. They had one paper on calculating constants in arithmetic geometry. And then it helped human scientists write 4 other papers, like finding new limits for interacting particles. So how good are these research works? Well, they are submitted for peer review and that’s going to take quite a while. So, in the meantime, they had a bunch of math experts look at it, many of them independent scientists.
They checked it for correctness and novelty, and it checks out man. I think for the first time ever, an AI created core parts of a research work that is new, it has impact, it is useful. That is…wow. What a time to be alive! So I told you there is levels to this game. So where are we now? Level 0 is negligible novelty work, it can do that. Level 1 is somewhat novel work, it can do that too. But now, it can help a person create publishable-level research. That is incredible. But wait, it can also do that autonomously.
An absolute game changer. Levels 3 and 4, those are groundbreaking works, these are out of reach, but I ask you Fellow Scholars, given the pace of progress, for how long? For 6 more months? And I think that is something that needs to be talked about more. Research helping the people live a better life. Love it. And thank you so much to all of you Fellow Scholars for watching us over the years. We can only exist because of you Fellow Scholars. I really hope that you enjoyed this. It allows me to talk about papers where there is not a lot of visual content, and I really wanted to share this with you.
Let me know in the comments if we should do more.