AI Memory: Stop Building Stateless Agents

Jack Herrington | 00:08:13 | May 18, 2026

Chapters: 6

The chapter demonstrates agentic memory in TanStack AI, contrasting episodic memory with long-term retention and showing how memory improves responses by remembering preferences (like sushi) across conversations, plus an overview of using different memory vendors or a DIY local memory.

Jack Herrington demos agentic memory for TanStack AI, showing how memory can persist across chats using Hindsight, Mem0, Honcho, or a DIY local store, and explains how recall/retain works with per-session context and long-term memory.

Summary

Jack Herrington unveils a prototype for TanStack AI memory, illustrating how agentic memory can persist facts across conversations. He contrasts episodic memory (per chat) with long-term, user-specific memory that survives refreshes. The demo shows how saying “I like sushi” is retained when memory is on, enabling smarter follow-ups like suggesting dinner options based on prior preferences. Herrington then explains the Memory Bench monorepo structure, highlighting apps/web as the UI and packages/ai-memory as the core memory package with a generic MemoryDriver interface. He details three built-in memory engines (Hindsight, Mem0, and Honcho) plus a DIY local implementation using SQLite for storage, OpenAI for vectorization, and Anthropic for the LLM. The talk covers the flow: recall facts before a turn, inject them into context, and retain facts after a turn by running an LLM over the chat transcript. He notes that Anthropic and OpenAI API keys are required and shows how to plug memory middleware into TanStack AI, including role variants like recall+retain versus retain-only. Herrington notes applicability to other agents (Claude Code, Codex) and hints at tool-assisted memory (MCP tools) for ad-hoc storage. The video closes with guidance to check the README, try the Memory Bench, and engage with the community via comments, likes, and subscriptions.

Key Takeaways

Memory on: a user input like “I like sushi” is stored as a durable fact in TanStack AI memory, remembered across conversations.
Three out-of-the-box memory engines are demonstrated: Hindsight (tool-supported), Mem0, and Honcho, plus a DIY local memory option using SQLite, OpenAI embeddings, and Anthropic for LLMs.
Memory flow explained: recall relevant facts before a turn, inject them into the chat context, and after the turn run an LLM over the full transcript to extract and store new facts.
Memory types clarified: episodic/context-window memory, model memory, working memory, and long-term user/application memory with durable preferences and summaries.
Integration steps shown: add createMemoryMiddleware and engine scope, configure recall+retain vs retain-only modes, and optionally enable tools for ad-hoc memory persistence.
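
The retain step in the takeaways above can be sketched roughly as follows. This is a minimal illustration, not the ai-memory package's code: `Scope`, `Message`, `FactExtractor`, `FactStore`, `retain`, and `toyExtractor` are all hypothetical names, the store is a plain in-memory map, and the extractor is a regex stand-in for the LLM call that would normally read the transcript.

```typescript
// Hypothetical types for illustration only; not the ai-memory package's API.
type Scope = { userId: string };
type Message = { role: "user" | "assistant"; content: string };

// Stand-in for the LLM call that turns a transcript into durable facts.
type FactExtractor = (transcript: Message[]) => Promise<string[]>;

class FactStore {
  private factsByUser = new Map<string, Set<string>>();

  // Returns true only when the fact is new for this user.
  add(scope: Scope, fact: string): boolean {
    const normalized = fact.trim().toLowerCase();
    if (normalized === "") return false;
    let facts = this.factsByUser.get(scope.userId);
    if (!facts) {
      facts = new Set<string>();
      this.factsByUser.set(scope.userId, facts);
    }
    if (facts.has(normalized)) return false;
    facts.add(normalized);
    return true;
  }

  list(scope: Scope): string[] {
    const facts = this.factsByUser.get(scope.userId);
    return facts ? Array.from(facts) : [];
  }
}

// Runs after a turn: extract candidate facts, store the new ones,
// and report how many were actually added.
async function retain(
  store: FactStore,
  extract: FactExtractor,
  scope: Scope,
  transcript: Message[],
): Promise<number> {
  const candidates = await extract(transcript);
  let added = 0;
  for (const fact of candidates) {
    if (store.add(scope, fact)) added++;
  }
  return added;
}

// Toy extractor: only understands user messages of the form "I like X".
// A real extractor would prompt an LLM with the whole transcript.
const toyExtractor: FactExtractor = async (transcript) => {
  const facts: string[] = [];
  for (const message of transcript) {
    if (message.role !== "user") continue;
    const match = /^I like (.+?)\.?$/i.exec(message.content.trim());
    if (match) facts.push(`User likes ${match[1]}`);
  }
  return facts;
};
```

Deduplicating on a normalized string keeps the same preference from piling up when a transcript is retained more than once; a production engine would likely need something smarter, such as merging near-duplicate facts.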

Who Is This For?

Essential viewing for developers integrating persistent memory into chat agents, especially those using TanStack AI who want to move beyond stateless interactions and experiment with Hindsight, Mem0, Honcho, or DIY memory backends.

Notable Quotes

"What I really want to do is actually retain information between chats."

—Herrington contrasts episodic memory with long-term retention across conversations.

"The memory system has three different things: recall, inject into context, and retain."

—Summary of the memory loop fundamentals.

"We can reset and then observe how each memory engine stores and recalls facts like 'I like sushi'."

—Demonstrates Memory Bench comparison across engines.

"If you really want to get tricky, you could inject MCP tools for ad-hoc memory storage."

—Hints at advanced tool-assisted memory workflows.

"This is basically pre-work thinking about different ways to bring memory into TanStack AI."

—High-level rationale for the prototype.

Questions This Video Answers

How does TanStack AI implement memory across chat sessions?
What are the differences between Hindsight, Mem0, and Honcho for agent memory?
How do recall and retain work together in an AI memory system?
Can you use a DIY SQLite memory store with OpenAI embeddings and Anthropic LLMs in TanStack AI?
What is Memory Bench and how does it compare memory engines side-by-side?

TanStack AI, AI memory, agentic memory, episodic memory, context window memory, long-term user memory, Hindsight, Mem0, Honcho, SQLite memory store, OpenAI embeddings, Anthropic LLM, MemoryBench, MemoryDriver interface

Full Transcript

Recently, I've been working on agentic memory for TanStack AI. It is really fascinating. Let me give you a quick demo of what I've been working on. I'm going to turn memory off, and I'm gonna say into my chat box, "I like sushi." And it comes back saying, you know, "Great for you. What do you like?" And then I'm gonna ask, "What should I have for dinner?" Now, because it knows from our conversation already that I like sushi, well, it's gonna say, "How about sushi for dinner," right? That makes sense.
But here's the problem. If I were to refresh and create a whole new conversation, and I were to type, "What should I have for dinner?" again, well, it doesn't know anything. It doesn't remember our previous conversation. So it just gives me some generic answers about something for dinner, which makes a lot of sense because all it has is what's called episodic memory. An episode being the chat, and the information within that chat is the contents of the episodic memory that I have with the agent at this point.
And that's okay, but what I really want to do is actually retain information between chats. So I'm gonna turn memory on, and then again say, "I like sushi." And now we can see over here in the local system that we have retained one fact, and that's that I like sushi. Now, cool, I can go back over here and refresh, create another conversation and say, "What should I have for dinner?" And now we recall from memory any facts that might be pertinent to dinner. And in this case, that is the fact about me liking sushi, and therefore it gives me back a much better response, given the fact that I like sushi, about having sushi for dinner. Really nice.
So what I've done in this TanStack AI memory monorepo that I've created kind of bespoke and as a POC for introduction into TanStack AI, is I've built a system where you can wrap one of three different vendors, Hindsight, Mem0, or Honcho for memory, or you can create your own local memory and do the work yourself.
Let's get right into it. Of course, all this code is available to you for free in a link in the description right down below. And if you're really into agentic memory, I would love for you to have a look at this because this is basically some kind of pre-work thinking about different ways to bring memory into TanStack AI.
Let's go take a look at the README, because I think it's actually really good, and it talks about, one, the fact that there's two elements of this monorepo. The first is apps/web, that's this Memory Bench application, and then there's the really important packages/ai-memory, and that's what gives us our types as well as connectors to various memory systems, as well as a way to go and do a DIY thing and create our own memory.
So what is a memory system? Well, a memory system has three different things. It has the ability to recall. That is, before the turn of the conversation, I'm gonna take your input as the user, and then I'm going to recall facts from my system that would be relevant to that conversation. And then I'm going to inject those into the system context. Then at the end of the turn, you do a retain, where you take the whole chat transcript and then run an LLM over it to extract any facts and store them in the database. And some providers provide tools, and the tools allow an LLM to say, ad hoc during the conversation, "Well, that's an interesting thing to remember," and then store or retain facts, and then also recall facts during the conversation. That's all wrapped up in this generic MemoryDriver interface.
There's also a really cool section in here called types of LLM memory. As I mentioned, episodic memory is just one of those. We're calling it context window memory here. There's also model memory. That would be what we were trained with. There is the working memory. That's within a single response, the working memory.
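
The optional tool support mentioned above can be pictured with a small sketch. Everything here is an assumption made for illustration: `DriverSketch`, `MemoryTool`, `InMemoryDriver`, and the tool names `remember` and `search_memory` are hypothetical and are not the real MemoryDriver interface, and the recall is a naive keyword match rather than anything a real engine would do.

```typescript
// Hypothetical shapes; the real MemoryDriver interface may look different.
type Scope = { userId: string; sessionId: string };

interface MemoryTool {
  name: string;
  description: string;
  execute(args: { text: string }): Promise<string>;
}

interface DriverSketch {
  recall(scope: Scope, query: string): Promise<string[]>;
  retain(scope: Scope, facts: string[]): Promise<void>;
  // Optional: only some engines expose tools the model can call mid-conversation.
  tools?(scope: Scope): MemoryTool[];
}

class InMemoryDriver implements DriverSketch {
  private factsByUser = new Map<string, string[]>();

  // Naive keyword recall: keep facts that contain any query word.
  async recall(scope: Scope, query: string): Promise<string[]> {
    const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 2);
    const facts = this.factsByUser.get(scope.userId) ?? [];
    return facts.filter((fact) =>
      words.some((word) => fact.toLowerCase().includes(word)),
    );
  }

  async retain(scope: Scope, facts: string[]): Promise<void> {
    const existing = this.factsByUser.get(scope.userId) ?? [];
    this.factsByUser.set(scope.userId, existing.concat(facts));
  }

  // The scope is captured once, so the model never has to pass a user id.
  tools(scope: Scope): MemoryTool[] {
    return [
      {
        name: "remember",
        description: "Store a fact about the user for later conversations.",
        execute: async ({ text }) => {
          await this.retain(scope, [text]);
          return "stored";
        },
      },
      {
        name: "search_memory",
        description: "Look up stored facts related to some text.",
        execute: async ({ text }) => (await this.recall(scope, text)).join("\n"),
      },
    ];
  }
}
```

Binding the scope when the tools are created, rather than accepting it as a tool argument, means a model cannot read or write another user's memory by supplying a different id.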
And what we're doing here is in particular called long-term user or application memory, where we have durable facts, preferences, observations, summaries that we glean from the conversation.
And then in terms of the different memory engines, there's three out of the box. There is Hindsight. Hindsight's unique in that it has tool support, which is great. There's Mem0 and Honcho. They each kind of have their own advantages. And then there's a DIY local implementation that I created over in the Memory Bench app that just uses SQLite for storage. It uses OpenAI to do vectorization, and then Anthropic for LLM. So what are all those used for? Well, you need an LLM to go and extract facts from the conversation. Then you need an embedding model to go and create vectors based on those facts. And then you can store the vectors and the facts in a database or wherever you want. Then when we're asked to do the recall, we do a vector-based search, and that gives us more of a concept search as opposed to a keyword search, which is much better for an LLM.
Now, comparing these four is actually what this Memory Bench is all about. So I go over here to the Memory Bench route. We can see that we still have retained that user likes sushi. Well, I'm gonna reset that so that we're at a blank slate. Then I can start putting in things like, "I like sushi." And then we fan out that retain to all four of those different systems. And so you can start to see, well, how does each one of those manage getting the facts and then storing the facts and what facts does it store? And it's actually really cool.
To get this all set up, there's a couple things you're gonna need, but it's really not that onerous. So if I go over here to env example, we can see that we need an Anthropic API key and an OpenAI API key, but we don't need subscriptions to Hindsight, Mem0, or Honcho. Now, those all have cloud instances, but in this case, we're just using Docker to run all of them locally.
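
The vector-based recall described for the DIY engine can be illustrated with cosine similarity over stored embeddings. This is a sketch under stated assumptions, not the Memory Bench code: `StoredFact`, `cosineSimilarity`, and `recallTopK` are hypothetical names, the three-number vectors are hand-written stand-ins for real OpenAI embeddings, and a plain array stands in for the SQLite table.

```typescript
// Hypothetical record shape; a real store would keep these rows in SQLite.
type StoredFact = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every fact against the query, drop weak matches, return the best k.
function recallTopK(
  facts: StoredFact[],
  queryEmbedding: number[],
  k: number,
  minScore = 0,
): StoredFact[] {
  return facts
    .map((fact) => ({ fact, score: cosineSimilarity(fact.embedding, queryEmbedding) }))
    .filter((entry) => entry.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.fact);
}

// Toy 3-dimensional "embeddings" with made-up axes: [food, work, travel].
const storedFacts: StoredFact[] = [
  { text: "User likes sushi", embedding: [0.9, 0.1, 0.0] },
  { text: "User works as a developer", embedding: [0.0, 1.0, 0.1] },
  { text: "User is planning a trip to Japan", embedding: [0.3, 0.0, 0.9] },
];

// Pretend embedding of "What should I have for dinner?"
const dinnerQuery = [1.0, 0.0, 0.1];
const bestMatch = recallTopK(storedFacts, dinnerQuery, 1);
```

The dinner question shares no keyword with "User likes sushi", yet its vector points in nearly the same direction, which is the concept-search behavior the transcript contrasts with keyword search. Scanning every row is fine for a handful of facts; larger stores typically use a vector index instead.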
Some of them end up calling out to LLMs or OpenAI or whatever, so you're gonna need those keys even if you don't actually use the local version that I created. The local version also uses Anthropic for the LLM and OpenAI for the embeddings.
Let's take a look at the code and how you'd actually put this into your chat. Over in packages/ai-memory in the README, we've got an example using Hindsight. The two important things that we're bringing in that are unique here are the createMemoryMiddleware that we're bringing in from that unpublished ai-memory NPM library, and then the hindsightEngine. Then down here, we're doing createMemoryMiddleware, and we're giving it the engine as well as the scope. The scope is who are you and what's this session? And that allows for the back-end system to hold those facts, but based on who you are and what you're talking about.
You also get to decide the role of the memory in this conversation. The two different roles are recall+retain, meaning full memory, or just retain-only, which is like an incognito mode thing here, where you're saying, "I don't really want you to remember anything about this conversation, but I do want you to know about me."
Once you get the middleware back, in this case memory, all you need to do is just add it to the middleware array, and memory takes care of putting itself in the right point to recall all the facts before the turn starts and retain all the facts once the turn ends. In the case of a memory engine that supports tools like, say, Hindsight, you can go and request the tools and give them the scope and then get back scope-based tools that you then add to the tools array. That's all it takes to add agentic memory to a TanStack AI chat.
For those of you who are creating systems that are agnostically running coding agents like Claude Code or Codex, you actually can use memory.
And the way that you do that is before you invoke those agents, you do the recall based on the user prompt, and then at the end with all the chat transcript that you have, you then use the retain phase at that point, and that gives you the same memory as you're getting here. And you can use any of these backends for that. Hindsight, Mem0, Honcho, they'll all work. And if you really want to get tricky about it, you could probably even inject some MCP tools and get tool support for that ad-hoc memory storage.
I hope you enjoyed this quick look at my prototype for TanStack AI memory. If you have any questions or comments, be sure to put them in the comment section right down below. And in the meantime, if you like this video, hit that like button. If you really like the video, hit the subscribe button and click on that bell, and you'll be notified the next time a new Blue Collar Coder comes out.
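
The coding-agent approach described above, recall before invoking the agent and retain afterwards, might be wrapped like this. It is a sketch with hypothetical names (`AgentMemory`, `CodingAgent`, `runWithMemory`); the agent is modeled as an opaque async function, so how Claude Code or Codex is actually launched is deliberately left out, and the prompt preamble format is an arbitrary choice.

```typescript
// Hypothetical interfaces; any backend could sit behind AgentMemory.
interface AgentMemory {
  recall(query: string): Promise<string[]>;
  retain(transcript: string): Promise<void>;
}

// Stand-in for however the coding agent is invoked (CLI, SDK, subprocess).
type CodingAgent = (prompt: string) => Promise<string>;

async function runWithMemory(
  agent: CodingAgent,
  memory: AgentMemory,
  userPrompt: string,
): Promise<string> {
  // 1. Recall facts relevant to the prompt before the agent starts.
  const facts = await memory.recall(userPrompt);

  // 2. Inject them ahead of the user's request, if there are any.
  const preamble =
    facts.length > 0
      ? "Known facts about the user:\n" +
        facts.map((fact) => `- ${fact}`).join("\n") +
        "\n\n"
      : "";
  const output = await agent(preamble + userPrompt);

  // 3. Retain: hand the finished exchange to the memory backend.
  await memory.retain(`user: ${userPrompt}\nagent: ${output}`);
  return output;
}
```

Note that only the original user prompt goes into the retained transcript, not the injected preamble, so recalled facts are not fed back into the extractor as if the user had just said them.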