Memory and dreaming for self-learning agents
Chapters
The speaker introduces the breakout stage, the presenter, and the focus on memory and dreaming as new foundation pieces for evolving agent capabilities.
Memory and dreaming unlock long-horizon, multi-agent self-learning with enterprise controls and a decoupled dreaming loop to continuously improve memory quality.
Summary
Anthropic’s Ravi introduces memory and dreaming as the building blocks that let agents learn across tasks and environments. Memory for Claude Managed Agents is a frontier memory system that supports multi-agent collaboration with enterprise-grade controls, versioning, and scoped memory stores. The talk cites real-world gains: Rakuten saw a 97% drop in first-pass errors in production, and Wisedocs reduced common issues in its document verification pipeline using cross-session memory. The architecture stacks storage, structure, and Claude-driven processing, with an optimistic concurrency model to prevent clobbering and a standalone API for memory management. Dreaming operates as a decoupled feedback loop that analyzes session transcripts, optimizes memory, and emits a refined memory snapshot across agents and sessions. The demo shows on-call triage for SREs using read-only and read-write memory stores, with dreaming identifying a recurring pattern of an alert firing 60 seconds after a CPU spike, which points to a possible retry issue. Ravi emphasizes that memory and dreaming together scale knowledge sharing across organizations, environments, and multi-agent systems, all without adding latency to the hot path. He frames this as the beginning of a frontier memory system designed to raise the floor for every agent and push memory quality higher over time.
Key Takeaways
- Memory for Claude Managed Agents enables learning across tasks and environments, with versioning, audit trails, and clear attribution.
- Dreaming analyzes session transcripts across agents, then proposes optimizations and outputs a better-organized memory snapshot.
- A scoped memory store model pairs read-only, organization-wide memory with more granular read/write stores so memory can scale across agents and environments.
- Optimistic concurrency control prevents agents from overwriting each other’s memory updates, preserving data integrity across sessions and agents in production workflows.
Who Is This For?
Essential viewing for teams building enterprise-grade agents who need cross-session learning, versioned memory, and scalable collaboration across multiple agents and environments.
Notable Quotes
"Memory lets agents learn. It lets agents carry forward learnings from their previous tasks."
—Ravi defines the core purpose of memory as enabling learning across tasks.
"Dreaming can be kicked off ad hoc, nightly, hourly, or it can even be triggered by events like the end of a session."
—Description of how the dreaming process is scheduled and triggered.
"Dreaming truly enables continuous self-learning."
—Direct statement about the impact of dreaming on learning and memory quality.
Questions This Video Answers
- How does memory improve agent performance across tasks and environments in Anthropic Claude?
- What is the role of dreaming in organizing and optimizing memory across multiple agents?
- What enterprise controls are available for memory in Claude Managed Agents (versioning, auditing, and concurrency)?
Memory (cloud memory store), Dreaming (memory optimization), Claude Managed Agents, Opus 4.7, Claude memory architecture, Optimistic Concurrency Control, Audit logs and versioning, Cross-session memory, On-call SRE incident triage, Agent SDK and Claude Code
Full Transcript
Hello. Thank you for joining us today. I'm excited to kick things off on the breakout stage. My name is Ravi and I lead the API knowledge team within platform at Anthropic. And since joining Anthropic last year, my focus has been creating the building blocks for agents to interact with many forms of knowledge, ranging from the context window itself to skills, files, and even content on the web. And we recently released two features that I'm most excited about: memory and dreaming. We now have the building blocks for agents to learn over time and improve from one task to the next.
And I'll talk about why we think memory is important, how we designed it, and we'll close out with dreaming, our new frontier memory feature. There we go. But first, a quick timeline of milestones that got us here. And the important thing is models have been improving and agents are capable of completing tasks that take many, many hours and are increasingly complex. So in 2024 we released Model Context Protocol (MCP), and this gave models access to external tools and data in a principled way. In 2025, we released Claude Code and the Agent SDK, which lowered the barrier to using and building agents, which, as an aside, blows my mind that that was in 2025.
It honestly feels like a lifetime ago. Later that year, we launched skills, which gave models a generic abstraction for unlocking and effectively bolting on new capabilities to complete specific tasks. Last month we released Claude Managed Agents, a platform for reliably running agents that takes care of the hard parts. Now the important through line here is that agents can do more and they can operate over longer and longer time horizons. So in 2025, METR released a study saying the length of tasks that agents can complete is doubling every seven months. And we're seeing this happen.
But managing context over long horizon tasks is still a work in progress. And that's where memory comes in. Memory lets agents learn. It lets agents carry forward learnings from their previous tasks. And in the simplest sense, imagine a set of tasks. Task one, task two, task three, and so on. The goal is for performance to improve from one task to the next. In the base case without something like memory, performance on each task might be similar because every agent is just starting from the same slate. In the optimal case, performance improves from task one to two, task two to three, and so on.
That's the goal. Learning from task to task, but also from environment to environment and agent to agent. So with memory, agents can learn from common strategies and previous mistakes. They can learn from the tools they have access to or codebases and files. And finally they can transfer these learnings to and from other agents. Think swarms of agents contributing to and maintaining a shared understanding of the organization they work in. This is the dream. So we recently launched memory for Claude Managed Agents, and this is a major step towards this vision. It gives developers a frontier memory system that is built to maximize intelligence out of the box, and it supports multi-agent systems, all with enterprise control and observability.
And we built memory in partnership with several teams that are using managed agents. And the results speak for themselves. Rakuten saw a 97% decrease in first-pass errors in agents deployed in production. Wisedocs reduced common issues using cross-session memory in their document verification pipeline. And the through line here, and the common feedback we get, is that our memory primitive allows teams to focus on building the product, not the infra, all while reaping the benefits of increased intelligence that comes along with better memory. You might be thinking, is memory really new? Rightfully so.
Memory is a concept that's not entirely new, but our approach to it with agents has greatly evolved. Previously we built memory focusing on capabilities in the harness. So you might be familiar with CLAUDE.md for Claude Code or dedicated memory tools in the SDKs. But one pattern we're seeing is that as models improve, we really just want to get out of Claude's way, similar to what we did with skills. Skills was a very basic format that was highly flexible, and it created endless possibilities, and the model understood how to operate with it. And so with memory, we've leaned into that same direction with files.
So let's talk about some of the capabilities that we designed memory around. So right now with the current set of models, we know a few things. Models like Claude are great at navigating virtual environments and a file system. And Claude is also very capable at using familiar tools like bash and grep to read, update, and organize files. Opus 4.7, which we launched last month, is a state-of-the-art model at file-system-based memory, and it's increasingly capable of discerning which context is most important to save for its future self and how it should be structured and represented.
And so with memory, we've modeled it as a file system to Claude. Again, the key principle is getting out of Claude's way and letting it use the capabilities it already has that are very strong. Or as we like to say, let it cook. This is the dream. But we've talked about Claude's memory capabilities within the context of a single agent, and we want it to work across multiple agents that are operating in the same environment at the same time, or maybe across environments. And this introduces new requirements, like, for example, letting multiple sessions share the same memory store at the same time.
And maybe they want different scopes. So we offer read-only scopes and read-write scopes. So for example, you could have organization-wide memory that's read-only, updated fairly infrequently, and accessible by all agents, and the same set of agents can have access to more granular memory stores that they can read and write freely. This creates a hierarchy and allows the memory system to really scale. Now, to combat write conflicts, to make sure that one agent isn't clobbering another's writes, we employed an optimistic concurrency control model to avoid agents overwriting each other's changes.
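The scoping and conflict-handling ideas described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Managed Agents API: `MemoryStore`, `ScopedHandle`, `append_note`, and the integer version counter are all hypothetical names invented for this example. The sketch shows a read-only handle rejecting writes, and a write succeeding only when the caller's expected version matches the stored one.

```python
from dataclasses import dataclass, field


class ReadOnlyError(Exception):
    """Raised when a write is attempted through a read-only scope."""


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


@dataclass
class MemoryStore:
    # Hypothetical in-memory store: path -> (version, content).
    files: dict = field(default_factory=dict)

    def read(self, path):
        # A missing file reads as version 0 with empty content.
        return self.files.get(path, (0, ""))

    def write(self, path, content, expected_version):
        current_version, _ = self.read(path)
        # Optimistic check: refuse the write if someone else wrote first.
        if current_version != expected_version:
            raise VersionConflict(
                f"{path}: expected v{expected_version}, found v{current_version}"
            )
        self.files[path] = (current_version + 1, content)
        return current_version + 1


@dataclass
class ScopedHandle:
    # Hypothetical per-session view of a store with a fixed scope.
    store: MemoryStore
    writable: bool

    def read(self, path):
        return self.store.read(path)

    def write(self, path, content, expected_version):
        if not self.writable:
            raise ReadOnlyError(path)
        return self.store.write(path, content, expected_version)


def append_note(handle, path, note, max_retries=3):
    """Read, modify, write; on a conflict, re-read the latest state and retry."""
    for _ in range(max_retries):
        version, content = handle.read(path)
        try:
            return handle.write(path, content + note + "\n", version)
        except VersionConflict:
            continue
    raise VersionConflict(f"{path}: gave up after {max_retries} attempts")
```

In this sketch no locks are held while an agent works. A stale writer simply fails, re-reads, and reapplies its change on top of the other agent's update, which is the usual trade-off of optimistic concurrency when conflicts are expected to be rare.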
And last but not least, memory needs to work for real production agents. This means enterprise-grade controls. So version control creates an audit trail as agents make changes, and developers can see how memory evolves over time. They can even diff between versions, and there's attribution to see which agent wrote which part of the memory. And I think one of the most important pieces is that memory has a standalone API. It enables developers to manage their memory from anywhere. And the reality is teams are building their systems in many different environments. So they can use memory via these APIs, which provide standard CRUD operations but also more enterprise-focused operations like exports and redactions.
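To make the version history, diffing, and attribution ideas concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than the product's API: `Revision`, `VersionedFile`, `commit`, `diff`, and `blame` are hypothetical names, and attribution is simplified to "the first author whose revision introduced a line that has survived since". Only the standard-library `difflib.unified_diff` call is a real API.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    version: int
    author: str   # hypothetical session or agent identifier
    content: str


class VersionedFile:
    """Append-only history for one memory file (illustrative only)."""

    def __init__(self):
        self.history = []

    def commit(self, author, content):
        rev = Revision(len(self.history) + 1, author, content)
        self.history.append(rev)
        return rev

    def get(self, version):
        return self.history[version - 1]

    def diff(self, old, new):
        # Line-level unified diff between two stored versions.
        a, b = self.get(old), self.get(new)
        return list(difflib.unified_diff(
            a.content.splitlines(),
            b.content.splitlines(),
            fromfile=f"v{old} ({a.author})",
            tofile=f"v{new} ({b.author})",
            lineterm="",
        ))

    def blame(self):
        # Simplified attribution: a line keeps its owner while it survives
        # from one revision to the next; new lines go to that revision's author.
        # Identical duplicate lines share one owner in this sketch.
        owners = {}
        for rev in self.history:
            lines = rev.content.splitlines()
            owners = {line: owners.get(line, rev.author) for line in lines}
        if not self.history:
            return []
        return [(line, owners[line]) for line in self.history[-1].content.splitlines()]
```

A usage example: committing one note from `session-a` and a second revision from `session-b` that adds a line, then calling `diff(1, 2)`, yields a unified diff whose added line is prefixed with `+`, while `blame()` pairs each surviving line with the session that first wrote it.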
Okay. So we've covered three key components of a memory architecture. One, we started with the storage layer, which is how the data itself is managed and how changes are tracked. Next, the structure of memory, optimizing it in a format that allows Claude to get the most out of it. And finally, Claude-driven processing for updating the memory. Now, let's stop at that processing point. Agents writing memory as they work is very key to the processing layer. Think of it as taking notes while you're doing something. But as we scaled up this pattern to more complex multi-agent use cases, we started seeing some limits across different sessions and we started seeing some common patterns.
For example, agents were prone to making many of the same mistakes, and they learned from their mistakes independently. Agents also displayed some of the same patterns of inefficiency. And the general theme was memory was being updated in a locally optimal way, but it wasn't globally optimal. In some cases, there was duplication or fragmentation. And so we started thinking really deeply about this problem, and in the last couple of months we built a feedback loop in the processing layer that combats some of these problems. Now, I've said it a couple times, but this time I mean it.
This really is available in research preview right now, and it can be used with managed agents. It's a process that looks for patterns and mistakes across agents and sessions, and it automatically curates their memory. Customers like Harvey saw a 6x increase in completion rates for their legal benchmark with dreaming, and we're actively seeing other usage of dreaming and we're really excited to see how people are benefiting from it. A quick overview of how it works: it's a separate process from sessions. It's completely decoupled. Think of it like a feedback loop. Agents write memories, dreaming refines them, and this process repeats.
And dreaming can be kicked off ad hoc, nightly, hourly, or it can even be triggered by events like the end of a session. It's all controlled via API, so it's very flexible. Each dreaming run analyzes session transcripts. It inspects the existing state of memory and it proposes optimizations to the memory in scenarios where sessions were inefficient, made mistakes, or needed improved guidance. And the output is a verified, better-organized snapshot of memories that agents can choose to adopt. And dreaming truly enables continuous self-learning. It closes the loop. I mentioned out-of-band: the out-of-band component of dreaming is really, really critical. Creating a process that's decoupled from the underlying agent loop has benefits. For one, the architecture makes it useful for multi-agent systems: looking at cross-session, cross-agent transcripts discerns patterns that a single agent in isolation might struggle to identify. There are also benefits to having a dedicated dreaming harness.
It allows for clearer objectives. Since dreaming is an independent process, there's no risk of agents needing to trade off between improving their memory quality and actually just completing their task objective. It's a clean separation. And lastly, it doesn't add any latency to the agent. It's completely removed from the hot path. So zooming out, we now have a robust memory layer that can be shared across agents and environments instead of only within specific tasks or usage. We also have dreaming, a process that globally optimizes and reconciles memory. And the result is a memory system for organizational memory that is capable of scaling up both the size and the quality of memory.
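The shape of a dreaming run as described (read transcripts, inspect current memory, emit a new snapshot that agents may adopt) can be sketched as a pure function. This is a deterministic stand-in written for illustration: in the talk the processing is Claude-driven, and the function name `dream`, the transcript event format, and the `min_sessions` threshold are all assumptions made here. The point is the structure: the pass runs outside the agent loop and never mutates live memory.

```python
from collections import Counter


def dream(memory_notes, session_transcripts, min_sessions=2):
    """Return a proposed memory snapshot; the inputs are left untouched.

    memory_notes: list of note strings (hypothetical memory format).
    session_transcripts: list of sessions, each a list of
        {"type": ..., "detail": ...} events (hypothetical format).
    """
    # 1. Consolidate: drop duplicate notes, ignoring case and spacing,
    #    while keeping first-seen order.
    seen, snapshot = set(), []
    for note in memory_notes:
        key = " ".join(note.lower().split())
        if key not in seen:
            seen.add(key)
            snapshot.append(note.strip())

    # 2. Count each mistake at most once per session, so a pattern must
    #    recur across sessions rather than repeat inside one session.
    counts = Counter()
    for transcript in session_transcripts:
        counts.update({e["detail"] for e in transcript if e["type"] == "mistake"})

    # 3. Propose guidance for mistakes seen in enough separate sessions.
    for detail, n in sorted(counts.items()):
        if n >= min_sessions:
            snapshot.append(f"Avoid: {detail} (seen in {n} sessions)")

    return snapshot
```

Because the function only returns a new list, it could be scheduled nightly, hourly, or at session end without touching the hot path, and a caller decides separately whether to adopt the proposed snapshot.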
And the way I think about it is, sharing memory that's constantly improving across agents raises the floor for every agent, and dreaming raises it even further. And if you really explode the size of this capability and you pull it all together, memory becomes a huge source of knowledge. It's similar to test-time compute in models, where letting models spend some tokens to explore a problem on average produces better outcomes. With dreaming, agents are doing the same thing. They're spending some work up front to curate and produce higher-quality memory, and that pays dividends for all downstream agent performance.
We believe that dreaming and memory form the basis of a frontier memory system. Memory, on the left, helps agents learn and remember from task to task, and dreaming, on the right, verifies, organizes, and enriches the memory. The way I think about it is dreaming is the bridge between memory as we know it today and organization-scale memory and knowledge. Now I'm going to flip over to a demo. So this uses both dreaming and memory in practice. It's an agent platform for SREs, and everyone loves being on call, right? So here we have a system that looks at incoming alerts and pages, and for some of them it actually spins up agents that decide how to triage and fix the issues as they come up, and it has access to a couple of memory stores.
One is a read-only, org-wide knowledge memory store. And so this contains things like the SLO policy or runbooks and on-call mappings, information that doesn't change very often but is important for every agent. And it also has access to read-write memory stores that are specific to the task at hand. We can dig into an interesting example here where an agent investigated and found the root cause of an alert and it put up a fix and it noted in memory. You can see the writes. It noted in memory that a fix was in flight and it was incoming.
And then the shared memory store can be read by subsequent sessions. And so here we can see that when a similar issue arises, the downstream session already knows that a fix is in flight and it's able to act based on that information. And I really think this is just such a cool pattern because, you know, I was once an SRE in my career, and this really helps coordinate across all agents, and it's really cross-session memory at work. Now for running in enterprises, an important piece here is audit logs and history.
So with memory you can see the full version history. You can switch between different versions and you can also attribute the writes to specific sessions. And there's also a precondition here, and that's the optimistic concurrency model to make sure that agents aren't clobbering each other's writes. We'll flip over to the Claude Console. One moment. So, here we see the list of underlying memory stores that we were using in that application. And so, we'll go over to our team SRE memory store. And you can see exactly the underlying files that were populated there. And we're going to head over to the dreams tab.
And we're going to kick off a dream. And so this can also be done via the API, but also in the UI. And we're going to select the team SRE memory store and we're going to select a batch of sessions from the last seven days. So that's about five. And we're going to start dreaming. As it begins, you can see it making progress. You can look at the dream and see that there are five input sessions. And then you'll see there's actually an output memory store that's being compiled. And you can actually open the dreaming session.
This is an important piece. Dreaming itself is built on Claude Managed Agents. So it's a feature for Claude Managed Agents built on Claude Managed Agents itself. You can see that it spins off a series of subagents to analyze transcripts in parallel. And it has all the same UX as the rest of managed agents. And we'll fast forward to a completed dream session. And you can see the diffs on the memory store updates. And in this example here, we see that across sessions and across agents, there's a common pattern of an alert triggering 60 seconds after a CPU spike.
And this is a recurring pattern. And so it starts to discern that there might be some issue with the retry behavior. And so it makes a note. So this dreaming process makes a note and updates memory so that the next agent that sees this pattern can similarly update the triage log in a more holistic way, rather than it just being a rote log of all the events that happened. And that's memory and dreaming at work. So we'll flip back over to the slides and we'll close out. So with that demo, we saw how we can build a production agent that uses memory and dreaming to self-improve.
This year, I think, is going to be a really big one. We're going to see agents run for longer and longer time scales, days for example, and continuously building upon and improving their understanding and view of the world around them is very critical to unlocking that capability. And I think memory systems are going to be a big part of what makes this behavior possible. So give it a try. I'm excited to see what everyone builds with it. And I'll be outside if you have more questions. Thank you.
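The cross-session pattern from the demo, an alert firing roughly 60 seconds after a CPU spike, is the kind of signal that is hard to see from inside one session but easy to spot across several. The sketch below is a hand-written heuristic for illustration only; the talk describes this analysis as done by subagents reading transcripts, not by code like this. The event tuples, the function names, and the `tolerance` and `min_sessions` parameters are assumptions.

```python
def alert_lags(events):
    """Seconds from each CPU spike to the next alert within one session.

    events: list of (timestamp_seconds, kind) tuples sorted by time, where
    kind is "cpu_spike" or "alert" (hypothetical event format).
    """
    lags, spike_at = [], None
    for ts, kind in events:
        if kind == "cpu_spike":
            spike_at = ts
        elif kind == "alert" and spike_at is not None:
            lags.append(ts - spike_at)
            spike_at = None  # pair each spike with at most one alert
    return lags


def recurring_lag(sessions, target=60, tolerance=5, min_sessions=3):
    """True if enough separate sessions show an alert near `target` seconds
    after a spike. A single session cannot satisfy this on its own."""
    hits = sum(
        1
        for events in sessions
        if any(abs(lag - target) <= tolerance for lag in alert_lags(events))
    )
    return hits >= min_sessions
```

When the check passes, a dreaming-style process could record a single consolidated note (for example, that the retry behavior deserves investigation) instead of leaving each session's triage log to describe the same symptom separately.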