DeepMind’s New AI Just Changed Science Forever

Two Minute Papers | 00:10:07 | Mar 27, 2026


Introduces an AI system that can perform research tasks, generate candidate solutions, and potentially write core parts of papers.

DeepMind’s Aletheia can autonomously propose, verify, and even co-author new research, pushing the boundaries of AI-assisted science.

Summary

Two Minute Papers’ host explains that DeepMind’s latest AI agent, Aletheia, isn’t just a clever solver: it can generate candidate solutions to open research problems, produce the core content of papers, and work with human scientists on novel findings. Dr. Károly Zsolnai-Fehér highlights three key innovations: verifying proofs in natural language rather than a rigid formal language, a stronger base model that reasons more efficiently (the same performance as the model from six months earlier with 100x less compute), and heavy training to search for and combine techniques from dozens of cutting-edge papers. The system autonomously solved four open Erdős problems, wrote the core content of a paper on calculating constants in arithmetic geometry (the final write-up was done by a human scientist), and helped human scientists write four other papers, such as one on new limits for interacting particles. These papers have been submitted for peer review and, in the meantime, math experts have checked them for correctness and novelty. Crucially, the verifier acts as a filter that rejects junk candidates, and the generator’s messy train of thought is hidden from it so that it cannot simply agree with itself. Hallucinations remain a challenge in frontier research; the host credits the training on tools and existing research works with finally stopping the model from making things up. Zsolnai-Fehér emphasizes that this marks a move from somewhat novel work to publishable-level research, both alongside a human and autonomously, while groundbreaking work (Levels 3 and 4) remains out of reach. The video wraps with a reflection on the pace of progress, and the host invites viewers to say whether they want more episodes like this.

Key Takeaways

Aletheia checks its proofs in natural language instead of a rigid formal math language, and hides the generator's train of thought from the verifier so it cannot simply agree with itself.
The system runs on a stronger base model that matches the performance of the model from six months earlier with about 100x less compute.
Search alone is easy; the AI was heavily trained to read and combine techniques from dozens of cutting-edge papers, which is what finally stopped it from making up junk.
It autonomously solved four open Erdős problems, wrote the core contents of a paper on arithmetic geometry, and helped human scientists write four others, including one on limits for interacting particles.
The verifier acts as a filter that rejects junk candidate solutions and sends promising ones back for another round of polishing.
Researchers demonstrated a jump from Level 1 novelty to publishable-level research, both with a human and autonomously; groundbreaking work (Levels 3 and 4) is still out of reach.
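
To make the generate, verify, and revise loop from the takeaways concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: the names `Candidate`, `solve`, `toy_generate`, and `toy_verify` are hypothetical, and the "problem" (finding an integer square root) is a toy stand-in for a research question. The one detail taken from the video is that the verifier only ever sees the written answer, never the generator's train of thought.

```python
from dataclasses import dataclass

# Verdict labels are illustrative, not taken from the paper.
ACCEPT, REVISE, REJECT = "accept", "revise", "reject"


@dataclass
class Candidate:
    reasoning: str  # messy scratch work; never passed to the verifier
    answer: str     # the written-up solution that gets reviewed


def solve(problem, generate, verify, max_rounds=10):
    """Run generate -> verify until a candidate is accepted or rounds run out."""
    previous, feedback = None, None
    for attempt in range(max_rounds):
        candidate = generate(problem, attempt, previous, feedback)
        # Only the answer is shown to the verifier, so it cannot be
        # swayed by the generator's own train of thought.
        verdict, feedback = verify(problem, candidate.answer)
        if verdict == ACCEPT:
            return candidate.answer
        if verdict == REVISE:
            previous = candidate.answer        # polish this one next round
        else:
            previous, feedback = None, None    # junk: start again
    return None


def toy_generate(problem, attempt, previous, feedback):
    """Toy generator: first emits junk, then searches for sqrt(problem)."""
    if attempt == 0:
        return Candidate("I feel good about this.", "trust me")
    if previous is None:
        guess = 10
    elif feedback == "too small":
        guess = int(previous) + 1
    else:
        guess = int(previous) - 1
    return Candidate(f"moved to {guess} after {feedback!r}", str(guess))


def toy_verify(problem, answer):
    """Toy verifier: rejects non-numeric junk, otherwise checks the square."""
    if not answer.isdigit():
        return REJECT, None
    square = int(answer) ** 2
    if square == problem:
        return ACCEPT, None
    return REVISE, ("too small" if square < problem else "too large")


result = solve(144, toy_generate, toy_verify)
```

Passing `candidate.answer` rather than the whole `Candidate` is the design choice that mirrors the separation described above; a real verifier would be a model reading a natural-language proof, not an arithmetic check.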

Who Is This For?

Researchers and developers in AI and scientific computing who want to understand how AI-assisted research could complement or accelerate real-world discovery.

Notable Quotes

"This AI is even better than that. They call it Aletheia."

—Introducing the AI and its name, signaling a leap beyond prior tools like Deep Think.

"The generator starts working on it, creates a candidate solution, and now here is one of the important parts of the paper. The verifier."

—Describes the core two-stage process: generate a solution, then verify it.

"Same smarts, but it uses a 100 times less compute."

—Highlights efficiency gains from model improvements.

"It solved a few of these Erdős problems. It autonomously found the answer to 4 open math puzzles left behind by a legendary Hungarian mathematician."

—Shows concrete success on open problems.

"For the first time ever, an AI created core parts of a research work that is new, it has impact, it is useful."

—Summary statement of impact and novelty.

Questions This Video Answers

How does DeepMind's Aletheia differ from earlier AI research assistants in math?
Can AI realistically author publishable scientific papers without human co-authors?
What are the main safeguards to prevent AI hallucinations in frontier research?
What steps are involved in turning AI-generated ideas into peer-reviewed publications?

DeepMind, Aletheia, Two Minute Papers, Quoc Le, AI research, AIs that write papers, natural language reasoning, verifier, frontier research, mathematical olympiad problems

Full Transcript

I appeared on camera for an interview not so long ago. And I was really surprised by how many of you Fellow Scholars said that you would like to see more. So first of all, thank you so much to all of you for the kind words. Second, I thought let's try this and hope that you will enjoy it.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Look, it only took 1,000 episodes. Now, I have an amazing paper for you because scientists at DeepMind did something pretty insane. Our question today is can an AI invent something that is fundamentally new and pushes humanity forward?

Well, they said that their new AI agent can actually do research and even write research papers. Most of the core content anyway. Is that insane? Well…it’s not. A lot of other people have tried it and the only insane thing about it was how many poor papers they wrote. But it turns out… there is levels to this game.

You see, I visited the research group that is behind this work last year. I flew to Mountain View into this crazy lab, and a grumpy guard didn’t even want to let me in first. Crazy town. So I was very surprised that they are guarding these secrets and they take them very seriously. What is even more surprising is that now they give some of those secrets away to all of us for free. Now that is insane! More on that in a moment.

So I talked to these scientists, this was the research group of Quoc Le. They are brilliant. They wrote an AI that was able to do a gold medal worthy performance on the mathematical olympiad. This is serious business. Then they released this technique, anyone who is made out of money bags and pays for the Gemini Advanced can use it, it is called Deep Think. And now, this AI is even better than that. They call it Aletheia. Now that, once again is insane.

Okay, so what does it do? Well, it promises that it does research. It solves novel problems.
This is something that could push humanity forward.

Now that is so much harder than the mathematical olympiad. Why is that? Well, in these contests, you have a not that huge piece of core knowledge you are supposed to have, and every problem can be guaranteed to be solved by those small set of tools. Every problem is nice, shiny, and polished. Tough, but polished. You know what is not polished at all? Real life problems.

With these open problems, we don’t even know if they are solvable at all. Maybe they are impossible, or maybe possible, but not with our current tools. That’s the point: no one knows.

When this technique is given a problem, the generator starts working on it, creates a candidate solution, and now here is one of the important parts of the paper. The verifier. This takes a look, and says, okay bro this is junk. Start again. This is essentially a filter. You know, that’s actually good life advice. Sometimes it’s good to have a filter, so you don’t just shoot those hot takes out there into the ether. Now every now and then, the solution looks pretty good, and could maybe pass with a few modifications. Then, it gets polished for another round of reviews, and so it goes. Sounds simple…maybe even trivial right? So what is so scientific about this? Why doesn’t every system do that?

Well, that’s easier said than done. In fact, it is almost impossible to pull off. Why?

One, when the AI is doing something fundamentally new, unfortunately, hallucinations still happen. Yup. It just makes stuff up. Fake papers, fictitious authors, you name it. All kinds of junk comes out.

Two, when you want to compute 1+1 or other simple things, you have tons of training data about it out there. You can verify that easily. But if you want to do frontier research? There is no training data on what we don't even know yet. Of course there isn’t! You are trying to invent things no one understands yet.
These two factors make it extremely difficult to get an AI to do something fundamentally new and useful. So how did they pull it off? With three key steps.

First, Aletheia does not use this formal rigid math language to check its own proofs. It uses natural English language. That is notoriously hard, because when the AI checks its own writing, it just blindly agrees with it. We humans do that too! Now here, the researchers found a way to separate the thinking part from the answer part. So the messy train of thought is hidden from the verifier, it cannot trick itself into just blindly agreeing with itself. Brilliant. Our brains would need something like that too.

Then, two, they let the computer think longer. That’s not new. However, they added some optimizations to this, so much so that the model they have now is just as smart as the one from 6 months ago. But hold on to your papers Fellow Scholars, because yes, same smarts, but it uses a 100 times less compute. What! Crazy. They trained a much stronger base model which made it more efficient at reasoning. So this one, even without internet access, beats the mathematical olympiad gold AI easily. About 65% was improved to 95%. Wow. It went from a bit better than a coin flip to destroying the tasks made for some of the best human minds. All this in just a few months. I am out of words.

Now three, they gave the AI the ability to search for stuff. We are talking about Google after all. Once again, that is easy. However, getting the AI to read and combine techniques from dozens and dozens of cutting-edge research papers without losing its mind. Now that is hard. You saw it earlier, this really happens!

They heavily trained this AI to be able to use these tools and research works that are out there. That was what finally stopped it from making up junk.

Okay, so how good is it? First I saw that it solved a few of these Erdős problems.
It autonomously found the answer to 4 open math puzzles left behind by a legendary Hungarian mathematician. Is that insane? I asked a mathematician friend. He told me yeah, that’s pretty good, but there are so many of these problems out there, and not a ton of people work on them. In other words, they are fairly easy, they were just ignored by experts for years. So not nearly as good as I thought.

But then, it stepped up its game and wrote the core contents of a research paper. On something new. Note that the final paper is written up by a human scientist. They had one paper on calculating constants in arithmetic geometry. And then it helped human scientists write 4 other papers, like finding new limits for interacting particles.

So how good are these research works?

Well, they are submitted for peer review and that’s going to take quite a while. So, in the meantime, they had a bunch of math experts look at it, many of them independent scientists. They checked it for correctness and novelty, and it checks out man. I think for the first time ever, an AI created core parts of a research work that is new, it has impact, it is useful. That is…wow. What a time to be alive!

So I told you there is levels to this game. So where are we now? Level 0 is negligible novelty work, it can do that. Level 1 is somewhat novel work, it can do that too. But now, it can help a person create publishable-level research. That is incredible. But wait, it can also do that autonomously. An absolute game changer.

Levels 3 and 4, those are groundbreaking works, these are out of reach, but I ask you Fellow Scholars, given the pace of progress, for how long? For 6 more months?

And I think that is something that needs to be talked about more. Research helping the people live a better life. Love it.

And thank you so much to all of you Fellow Scholars for watching us over the years. We can only exist because of you Fellow Scholars.
I really hope that you enjoyed this. It allows me to talk about papers where there is not a lot of visual content, and I really wanted to share this with you. Let me know in the comments if we should do more.