AI Agents Just Learned A Language Humans Can’t Read

Two Minute Papers | 00:06:57 | Jun 19, 2026

Explores how AI agents can automate tasks but bring issues like spam, security, and miscoordination when multiple agents are used.

Two Minute Papers reveals a breakthrough where AI agents communicate via brain-like latent signals, slashing text-based chatter and boosting small models on hard math tasks.

Summary

Two Minute Papers dives into a provocative new approach where multiple AI agents connect through latent, brain-like signals instead of plain English. Dr. Károly Zsolnai Fehér explains that text-based coordination forces every agent to decode its thoughts into tokens one by one and the next agent to read and re-encode them, which adds cost at every handoff. The core idea is cross-agent latent state transfer: instead of writing sentences, agents pass raw, undecoded numbers (their internal state) directly to the next agent. Early experiments show promising gains on math problems: accuracy on competition-level questions rises from 73% to 86% with small models (under 10B parameters), and token usage falls by about 75%. The approach could bring smaller systems within striking distance of larger, more expensive models on difficult math problems, at a training cost of roughly $4. A crucial caveat is that these results come from tests on smaller models, and scaling up remains an open question. The episode also argues that LLM-based workflows need new tooling and evaluation methods, with Weights & Biases’ Weave featured in the sponsor segment as an example. Fehér closes with cautious optimism. He notes an optimal latent-thought length of about 80 steps per round, and he raises the worry that the gains might come from a strong teacher model rather than from the architecture; a controlled comparison that gives the same teacher to every architecture still favors the brain-linked approach.
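To make the contrast between the two handoff styles concrete, here is a minimal toy sketch in Python. It is not the paper's method: the embedding table, the sizes `DIM` and `VOCAB`, and the helper names are all assumptions made up for illustration. It only shows that collapsing a vector to its nearest token and re-embedding it loses information, while passing the vector itself does not.

```python
import random

DIM = 8      # hypothetical hidden-state width
VOCAB = 16   # hypothetical vocabulary size

rng = random.Random(0)
# Hypothetical embedding table: one DIM-dimensional vector per token id.
EMBED = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def text_handoff(hidden_states):
    """Decode each state to its best-matching token, then re-embed it."""
    tokens = [max(range(VOCAB), key=lambda t: dot(h, EMBED[t])) for h in hidden_states]
    received = [list(EMBED[t]) for t in tokens]
    return received, len(tokens)


def latent_handoff(hidden_states):
    """Pass the raw vectors through unchanged; no tokens are written."""
    return [list(h) for h in hidden_states], 0


def mean_sq_error(sent, received):
    total = sum((a - b) ** 2 for s, r in zip(sent, received) for a, b in zip(s, r))
    return total / (len(sent) * DIM)


# Five made-up "thought" vectors produced by the sending agent.
thoughts = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

text_received, text_tokens = text_handoff(thoughts)
latent_received, latent_tokens = latent_handoff(thoughts)

text_error = mean_sq_error(thoughts, text_received)
latent_error = mean_sq_error(thoughts, latent_received)
print(f"text:   {text_tokens} tokens, error {text_error:.3f}")
print(f"latent: {latent_tokens} tokens, error {latent_error:.3f}")
```

The toy does not reproduce the reported 75% token saving or the accuracy gain; it only illustrates why skipping the decode and re-encode step can preserve more of what the sending agent computed.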

Key Takeaways

Latent-state transfer lets agents share raw numerical signals instead of English text, dramatically reducing communication overhead.
In math benchmarks, small models (<10B parameters) improved from 73% to 86% accuracy using cross-agent latent communication.
Token usage dropped by about 75% because agents exchange latent states instead of written-out text; in the video's words, the tokens “evaporated into the latent space.”
Training costs appear low (about $4) relative to the performance gains, suggesting a cost-effective path to stronger AI collaboration.
In a controlled comparison where every architecture is given the same teacher, brain-linking still outperforms the alternatives.
Results are preliminary and limited to smaller models; scalability to larger systems remains an open question.
There is an optimal latent-thought length of about 80 steps per round; thinking beyond that adds little value.
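The headline figures above can be unpacked with a little arithmetic. The short Python sketch below uses only the numbers reported in the video; the 1,000-token workload is a hypothetical example chosen for round numbers.

```python
# Figures as reported in the video.
baseline_accuracy = 0.73   # before cross-agent latent communication
latent_accuracy = 0.86     # with cross-agent latent communication
token_reduction = 0.75     # "token usage down 75%"

gain_points = round((latent_accuracy - baseline_accuracy) * 100, 1)
baseline_errors = 1 - baseline_accuracy
latent_errors = 1 - latent_accuracy
relative_error_cut = round((baseline_errors - latent_errors) / baseline_errors * 100, 1)

# Hypothetical workload: a text-based run that spends 1,000 tokens.
tokens_before = 1000
tokens_after = round(tokens_before * (1 - token_reduction))

print(gain_points)         # 13.0 percentage points
print(relative_error_cut)  # 48.1 percent fewer wrong answers
print(tokens_after)        # 250
```

Read this way, the jump from 73% to 86% is 13 percentage points, which means the share of wrong answers falls from 27% to 14%, or roughly half as many errors.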

Who Is This For?

Researchers and developers working on multi-agent systems, especially those exploring efficiency and coordination at scale with smaller models. It’s also valuable for ML practitioners curious about non-text inter-agent communication and latent-state strategies.

Notable Quotes

"Instead of using English words, they pass raw undecoded numbers directly to the next agent."

—Describes the core shift from text-based to latent-state communication between agents.

"Token usage down 75%. They all evaporated into the latent space."

—Highlights a major efficiency gain from latent communication.

"Four bucks. Basically, you spend your coffee money on these agents and in return they punch a hole in space-time."

—Emphasizes the surprisingly low training cost relative to potential impact.

"This is the good, but at the same time, you get so many news headlines about spam, security issues, and system breakdowns."

—Anchors the discussion in real-world tradeoffs and risks of AI agents.

Questions This Video Answers

how do cross-agent latent states differ from traditional agent communication
can latent-state transfer scale to large language models and real-world workloads
what are the practical limitations of brain-like latent communication in AI agents
how does Weave by Weights & Biases help iterate on LLM applications and debug data flows
what is the optimal latent-thought length and why does it matter for multi-round reasoning

Two Minute Papers, Dr. Károly Zsolnai Fehér, cross-agent latent state transfer, brain-to-text interface, latent communication, multi-agent systems, small-model efficiency, Weave by Weights & Biases, LLM tooling, model scaling limits

Full Transcript

The number of AI agents on the internet is increasing at such an insane rate. I don't think I've seen anything like this. This is crazy. And this is an area that is quite new, and the technology is still pretty rough. Improving rapidly, but pretty rough. And the promise of agents is incredible. It would book the cheapest plane ticket for you, or run 24 hours a day to manage your schedule, submit insurance claims, continuously scan a codebase for vulnerabilities and patch them. Well, this is the good, but at the same time, you get so many news headlines about spam, security issues, and system breakdowns. And it gets even tougher when you have not one agent, but multiple agents. Imagine two agents organizing a holiday for you. The flight agent hallucinates a cheaper airport 400 miles away from your real destination. Then, the hotel agent says, "Let's book something super cheap nearby." Well, super cheap is often non-refundable. And now, congratulations. You now have a non-refundable room you will never see. And so many of these problems come from the fact that agent coordination is super difficult. Now, check out what this paper says we should do. Here is a math problem. First agent writes a plan. The next one critiques it, and the third one solves the problem. And at this point, I said, "Okay. I see nothing interesting here. This is what everyone does with agents." Yes, but here's the key. Most agents communicate a bit like we do, in words. Wait a second. Why should we do that? Look at this neural interface for brain-to-text communication. Yes, this really works. You just think about a letter in the alphabet, and it magically appears. And if you keep doing this a lot, you start asking: the alphabet is optimized for writing. Why use that? Why not use one that is optimized for thinking? And what would that even look like? Hint: it would look like this. We talked about this 500 videos ago, paper in the description.
Now, if you look at the agents, the first one does some work, packs it up, and passes it to the next one. So do the second and the third ones. Every time an agent wants to communicate something, it has to write out full sentences, decode tokens one by one, and the next guy has to read and re-encode the whole thing. Why are we doing that? Who said they should talk in plain English? And this is the part where I fell off the chair. Now, hold on to your papers, fellow scholars, because this work says, "Huh, forget English. You know what? Forget letters entirely." It says, "Instead, let's link up their brains." Kind of. Instead of using English words, they pass raw undecoded numbers directly to the next agent. Send raw brain signals, if you will. Call it cross-agent latent state transfer. So, the theory is that these three agents can work together round one, round two, and round three much cheaper than the text-based agents. They refine an answer, and you get better answers with the same amount of computation. So, is it better? Hmm, let's see. Dear fellow scholars, this is Two Minute Papers with Dr. Károly Zsolnai Fehér. Well, when given competition-level math questions, it goes from 73% to 86%. That is crazy. We are talking free sub-10 billion parameter models, not expensive frontier systems. And here is where it gets the Michelin star status. Look at that. Ooh. Token usage down 75%. They all evaporated into the latent space. Loving it. So, this can improve smaller systems to be in striking distance of much bigger, more expensive models on difficult math problems. So, I bet it costs a fortune to train, right? Well, look at that. Four bucks. Basically, you spend your coffee money on these agents and in return they punch a hole in space-time. Love it. Additionally, it might even unlock... Wait, wait, wait. I shouldn't say unlock. That's AI speak. So, it might give us a new scaling law. More rounds, better results.
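The round structure described here (a planner, a critic, and a solver taking turns over several rounds and handing a latent state to each other) can be sketched as a simple loop. This is a hypothetical Python illustration, not the paper's code: the `Agent` class, the `think` update rule, and the vector contents are invented stand-ins, and the cap of 80 latent steps is taken from the figure cited later in the video and treated here as a hard limit by assumption.

```python
from dataclasses import dataclass

# Per-round budget cited in the video; enforcing it as a hard cap is an assumption.
MAX_LATENT_STEPS = 80


@dataclass
class Agent:
    role: str

    def think(self, latent, steps):
        """Hypothetical stand-in for latent reasoning: nudge each value toward 1.0."""
        steps = min(steps, MAX_LATENT_STEPS)
        out = list(latent)
        for _ in range(steps):
            out = [x + 0.01 * (1.0 - x) for x in out]
        return out, steps


def run_rounds(agents, latent, rounds, steps_per_turn):
    """Pass one latent state through every agent, once per round, with no text in between."""
    log = []
    for round_no in range(1, rounds + 1):
        for agent in agents:
            latent, used = agent.think(latent, steps_per_turn)
            log.append((round_no, agent.role, used))
    return latent, log


agents = [Agent("planner"), Agent("critic"), Agent("solver")]
# Ask for 200 steps per turn to show that the cap clips it to 80.
final, log = run_rounds(agents, [0.0] * 4, rounds=3, steps_per_turn=200)

for entry in log:
    print(entry)
```

Three agents over three rounds give nine handoffs in total, and none of them writes out a sentence; in a real system each `think` call would be a model forward pass rather than this made-up update.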
And at this point, I thought we might have a deadly flaw here. And it's really subtle. So, the training for each agent's role is written by a giant AI model. So, if they perform well, you have to ask, are things better because of the brain linking or is it good distillation from an excellent teacher? So, which one is it? A good teacher or a good architecture? Well, fellow scholars, we are in luck. This is a really good paper. So, the scientists thought about this too. And look, goodness, a controlled comparison gives the same teacher to other architectures and this one. And the new one still outperforms. So, yes, the brain linking really works. What a time to be alive. Okay, now, let's not get too excited. This is Two Minute Papers and we respect the science here. Limitations. One, tests were on smaller models. We don't yet know how these insights scale up to bigger ones. If they don't, then this puts small models on steroids. Still good. If yes, potential huge game-changer. Two, there is an optimal latent thought length, and that is about 80 steps. This is somewhat of a limit on how much thinking an agent can do per round. I am thinking, you know, if it solves mathematical Olympiad problems already, how bad can that be? And sure enough, after 80, you don't get a lot of value anyway, but I wanted to mention it. Okay? So, code and models are available for free. Note that this is still very rough, very early, but it shows potential. And this is still research. Please do not think you just plug this in and everything will fly immediately. We need new tools for the era of LLMs, and Weights & Biases now has Weave, a lightweight toolkit to confidently iterate on LLM applications. Use traces to debug how data flows through each step of your app, and use evaluations to measure your progress. It is the best. Try it out now at wandb.me/papers, or click the link in the description below.