Open Source Friday with Mastra
Chapters
The hosts welcome viewers to Open Source Friday, highlighting the show, GitHub sponsorship, and the community around open source projects and Mastra.
Mastra (open source) shows how to build scalable AI agents with memory, observability, and flexible storage, all inside a TypeScript-first framework.
Summary
GitHub's Open Source Friday spotlights Mastra, a bold open-source AI framework born from Gatsby veterans turned problem-solvers. Abhi Aiyer explains how Mastra evolved from an AI-powered CRM pivot into a full set of primitives for AI engineering, emphasizing memory, tools, and agent orchestration. The studio, a central UI, lets you spin up weather agents or coding agents, test prompts, and view traces in real time. Mastra supports multiple model providers through a dynamic model router, so you’re not locked to a single vendor. A standout feature is observational memory, which compresses long histories into event-based signals to save tokens while preserving essential context. The talk also covers how Mastra exposes base interfaces for storage, memory, and observability, enabling developers to plug in their own databases, dashboards, and evals. Finally, Abhi points to a thriving community—Discord, weekly AI agents hours, and a roadmap focused on memory, observability, and community—with real-world demos of how agents can learn to code and even generate skills automatically.
Key Takeaways
- Mastra is open source, TypeScript-first, and designed to be extended via base classes and interfaces for storage, memory, and observability.
- The platform includes a studio for building and testing agents, plus a model router that lets you pick model providers (e.g., OpenAI, Anthropic, OpenRouter) without lock-in.
- Observational memory compresses long conversations into event-based signals (three-agent memory system: actor, observer, and reflector) to improve recall and reduce prompt costs.
- Weather and coding agent demos show how Mastra can read/write files, generate skills, and evolve into a true memory-driven assistant rather than a one-off bot.
- Mastra emphasizes measurable evals and observability (traces, scores, data sets) to support the scientific method of improving agents over time.
- The team advocates meeting users where they are (exporters to Langfuse, Braintrust, etc.) and designing for production needs from day one.
- Community-led growth is a core pillar, with frequent live streams, workshops, and a San Francisco conference on the roadmap.
Who Is This For?
Developers and AI engineers who want a production-ready, open-source framework to build, test, and scale AI agents with robust memory and observability. Ideal for teams exploring agent-to-agent workflows or looking to customize their own cognitive primitives.
Notable Quotes
"And since then we have you know our 22K GitHub stars we have over a million downloads a month like people are really resonating with what we're building."
—Abhi highlights Mastra’s traction and community interest.
"Observational memory is a new memory system we just released... three agents—the actor, the observer, and the reflector."
—Explains the core memory architecture and why it matters for recall and cost.
"Memory is the ability to you know may seem kind of simple but a conversation history is one way to do long-term memory."
—Defines the concept of memory in Mastra and why it’s essential.
"Three agents tackle the same memory problem: the actor, the observer and the reflector."
—Describes the multi-agent memory system underpinning observational memory.
"We ship with this very canonical example of a weather agent… a weather tool is a free API."
—Demonstrates how Mastra handles tools and real-world APIs.
Questions This Video Answers
- What is Mastra and how does it differ from other AI agent frameworks?
- How does observational memory work and why is it cost-effective?
- How can I start building agents in Mastra using the Studio?
- Can Mastra integrate with my existing databases and observability tools?
- What is the role of MCP in Mastra and how does it stay relevant in production AI?
Mastra, Open Source Fridays, AI agents, memory, observability, observational memory, MCP, base interfaces, model router, studio UI
Full Transcript
Heat. Heat. Heat. Hello everyone. Good morning, good afternoon, good evening, wherever in the world you're joining us from. Welcome to Open Source Friday. Thank you for being here. We made it. We made it team. We made it to Friday. I don't know about you all. I had a nice and spicy week. Very eventful, but very good. Lots of good work done. Welcome Brandon. Thank you for being here. Welcome Zumi. Welcome Krepa. Welcome Mr. Landini. I appreciate you all so much for joining Open Source Friday. And for those of you who are just tuning in for the very first time to the stream, welcome.
This is Open Source Friday. This is a show sponsored by GitHub, produced by GitHub, and hosted by an amazing team from the DevRel and open source teams at GitHub. And we're here just to bring you amazing projects. Not just the projects themselves, but we get to talk to the contributors, the creators, the maintainers, and all the folks who are making open source keep on going. And boy, do we know we depend on open source. Welcome, Mr. Isan. Welcome, welcome, welcome. I love to know where people are joining from. So, this is my favorite. Thank you for doing that.
I'm prompted. That is my favorite thing to read when I see where folks are joining from. From Jamaica, Miss Roxan, I'm so jealous. From Cameroon, from India. Thank you everybody for joining. What an amazing... we got a great team watching today. So I am joining you live from the west coast of Florida where it's very warm. So uh salute and warm hugs for those of you in parts of the world where it's still very cold. But as I mentioned, Open Source Friday is a show where we get to highlight amazing projects. This project in particular that we have today has had an incredible story, a masterful pivot really.
So, it's like really interesting to me. Also, when like teams get started, they get to work on things and then they they make adjustments. Um, but they're doing something that is extremely timely. Um, and I'm delighted to have the CTO of the company joining us today. For those of you who are not familiar with Mastra, you're about to learn everything that there is to learn about it. Uh it's a great framework and welcome to Abhi Aiyer, who is the CTO. Hi, thanks for having me. Thank you for taking the time for being here. I I appreciate that the team has been trying we've been trying to put this together for quite some time.
Um I think I met someone from your team at one at a conference last year. Maybe it was Render or it's been it's been a while. It's been a long conversation. Um I'm in your Discord. I've been lurking. I've been using it lately. But thank you so much, Abhi, for taking the time to be here. I I cannot wait for you to share what Mastra is about. So I know that it's sort of it came from the Gatsby team or maybe some of y'all were from Gatsby at one point. Maybe I'm Yeah. Okay. So you could tell us Yeah.
a little bit of the origin story like what problem were you trying to solve and how did this all get put together? Yeah. So me and my co-founders Sam Bhagwat, Shane Thomas and myself, we all worked at Gatsby for quite a while. Uh Sam being a co-founder of Gatsby and you know uh Gatsby was a for for those who don't know Gatsby was an open-source uh static site generator uh with with React and one of the first of its kind back then. Um, static sites were all the rage during the Jamstack era of our development careers and you know we kind of expanded from static to doing other things and eventually got acquired by Netlify another prominent uh company in the open source world and Jamstack world.
Um once we got to Netlify it was a little you know it takes some getting used to joining a new team um being part of a new product and slowly uh we all started to fade out and tried to think of what was next. Surprisingly I was the one who left last from Netlify. I actually enjoyed my time there. Um and uh Sam had pretty much stolen us out of Netlify with this idea of uh I want to build a better Salesforce. Um which is surprisingly how we're going to get to Mastra is a very surprising uh journey there.
But he wanted to build a better Salesforce. And I guess we had the arrogance to think that we were the guys to to do a better Salesforce. Um we tried um we tried building a linear like experience for doing a CRM and we spent a lot of time building a CRM when this CRM was supposed to be an AI powered CRM. The reason we kind of didn't we didn't want to touch the AI parts because there wasn't any movement in Typescript in AI at the time. Um, and we were kind of too busy focusing on things that we like, user experience and product and things like that.
So, when it came for us to go raise a seed round, um, pretty much all investors were like, hey, this is a AI CRM, but I don't see the AI part. What is the AI? We were like, oh yeah, if you squint a little, you might believe that it's there. Um so you know taking rejection on the uh we started trying to integrate those AI things that they expected and we found it to be very difficult to get far fast when you're trying to iterate and like you know experiment. So um funny enough I was visiting a friend in New York City.
I went to his office. He works in the financial district. I walk in, I was supposed to have beers with the guy, but I walk into a hackathon, an AI hackathon that was happening in the WeWork lunchroom. And so I, you know, turns out he can't hang out. He's participating in the hackathon. So I figure I'll participate, too. Um, in doing so, it took us eight hours, the whole day to build a chatbot that can read PDFs. Now, that might seem so uh really contrived today, but it was really hard to do mainly because you're educating yourself for the first time how to do something as well as putting yourself under a time pressure to do it.
Um, so we did it. We got it working. Pretty much most people were submitting Google Colabs, Jupyter Notebooks. Yep. If you've ever been to a hackathon, that's not what was going to win a hackathon. You need to have something cool. So, we built like a Next.js app and thankfully having the web experience from before, you know, our our hackathon project looked very good and we were able to win. And that actually made me really think that what if other people could get far fast and something that took me eight hours should take the next person one hour or even less.
So, uh, Sam, Shane, and I, we looked at each other. We didn't want to do open source again. No offense to open source. we just had done it for so long that it was kind of tiring but then after going through this experience we were like we have to do this again because I think there is a just a missing library or framework in the market uh to help people build AI. So that next day Mastra was born and the CRM was dead. Um and since then uh we started you know from that moment we started building what we like to call the primitives of AI engineering.
Um and when we started the primitives were not as much as they are today. It started with you know LLMs with tool calls and a loop and then it has expanded to workflows and memory and MCP and skills and you name it. We are building the primitives for doing that. Um I'm sorry if this is a long story but we got into Y Combinator shortly after you know our first release. Met so many agent builders in San Francisco. So it helped really focus what the like what Mastra as an open source framework should be and you know since then we have you know our 22K GitHub stars we have over a million downloads a month like people are really resonating with what we're building.
Yes, you have been on fire. I'm going to share the repository here and for my friends that are joining me on LinkedIn hang with me. I'm coming with the link that way. So everyone go ahead and go to the link and star the project please. Let's get the project trending. I love Gatsby. I I think one of the I mean multiple of the very first things that I ever did like my first web pages were on Gatsby and then funny enough I became like a Netlify fan after too like obviously like that that that that transition.
Um that art of the pivot is so interesting to me. That's a whole conversation. We need to sit down and talk about founders and making those decisions because it's hard to shoot the puppy when it's like you're like but no I love this thing. Um and then making the decision that open source is the way to go. Um I salute that. I know it's exhausting and having had a project like Gatsby that you know such a long time and so impactful in the community. I know that comes with a lot of baggage and work and and and time and effort and tears and sometimes even blood.
So I can totally get the hesitation there. So you got into Y Combinator and then based on this hackathon experience you build basically you're like okay there's an opportunity here to build this framework we're going to do it and then you pick your stack and you build it I guess you build it using um is it Vercel or no definitely in the Next.js framework right uh yeah for the AI pieces we uh we started with using AI SDK okay and that was a really good way to jump start everything but AI SDK is a great library.
Um, but there's a lot of other components needed to build an agent for production. And so then we started layering other open-source libraries. Um, and then started building some of our own uh to to kind of complement the LLM piece of the the puzzle. Amazing. So now you're Okay. So you you actually have a lot more going on now. Um, and so if I'm a developer and I'm watching this and I'm thinking, okay, I want to build an agent. I want to get started doing this. I need a framework that's going to support my scale, right?
Like maybe this is going to grow into be a business and they want to start with Mastra. Um can you show us a little bit about the framework itself in action? Like I think as as we kind of see a demo, we'll be able to, you know, dig into some more questions. And friends, if you're watching this and you have questions, please don't hesitate to post them on the on the chat. I'll be sure to read them for you. Um go check out the Mastra website, too. It's so beautiful. I was just complimenting your designer.
Yeah. So, let me I'll do the canonical weather agent example. I've I've prepared some other ones as well as the conversation gets a little deeper. Perfect. But uh let me share my screen and I'm going to show just how people get started with Mastra. And I might have to like unshare to put an API key in or two, but uh we can get started anyway. That's totally fine. While you're doing that, I'm going to share actually and this is when I met and my gosh, I'm I'm spacing out the name, but because you're you're either your co-founder or maybe it was it was all of y'all wrote a book, right?
Or Sam wrote a book. Yes, I'm going to share it because friends, I have it. It's it's it's a fantastic little like it's a good comprehensive look of how this works. Uh and it is free. Thank you for that. I love that. Uh I shared it on the link uh on the chat so that your friends can go ahead and grab it. And Mina, thank you for being here and to answer your question. Correct. This is open source is an open source library. Here we go. So I got your screen now up. Okay, great. So to get started with Mastra, you can run npx create-mastra@latest.
I'll zoom in for everybody. Thank you J. I don't know what this npm thing is about right now, but uh um opensource Friday and I'll do in source. I can pick my provider. Let's use OpenAI. That's fine. And we'll skip the API thing. Um and this is kind of my first uh discussion point. When we first started Mastra um LLMs did not know about us. So we had to figure out other ways to help users use their own tools to write it. And at the time Windsurf was very popular, Cursor was very popular and MCP was a very new thing.
And so what we did was we built an MCP docs server. So you know once installed Cursor and Claude Code and all these coding agents would know about Mastra via your questions to them. Um and then now we have skills of course. So we can add skills to this and we have different Mastra skills that allow you to write Mastra or let your agent write Mastra because when we started a year ago people were still writing code by hand but as we know things have changed right so now our agents are writing code and so you need to be able to do a lot of good context engineering for that and skills are a great way to do that.
So let's just go with that and I'll initialize a new repository. Now we'll install the Mastra CLI. One very interesting feature of Mastra is we have what we call the Mastra Studio. It is an all-in-one place to build and test your agents. Um it's a nice companion to the code and you know you can go and chat with your agents there. You can go experiment and you know have a lot of fun. So cool. Now we have this. So it's Friday. Now, let me not leak my key real quick. So, I'll go put this OpenAI key in and then we'll be off to the races.
I love this. So, you have your own interactive UI then studio actually allows you to like visualize this and do all the testing. That is pretty cool. Correct. And maybe I'll take two of these keys and try all of them. Um, yeah, that is pretty cool. And something fun about and thank you for having the MCP also and like um MCP is not dead friends. Yeah, I'm put I'm going to put it on a t-shirt, but you're one of the few um open source or rather TypeScript frameworks that you ship both sides of MCP like you have it like both as a client and a server uh in the framework.
Yeah. So, yeah, MCP is not dead. All right. You tell me when you're ready for me to share. All right. And then Maybe. Oops. I am I'm ready again. There we go. We're in. So So we're back in our project. I'll run npm run dev, which starts the Mastra development environment. And you can then go to localhost:4111. And now you'll see that you are in the Mastra Studio. And we ship with this very canonical example of a weather agent which I guess you know the reason why people do weather agents for example is the tool call to get weather is a free, open API right so it's a very much easy thing to get started with um I set up OpenAI but in Mastra we have this thing called the model router and it allows you to pick other models so while I was offscreen I added API keys for Anthropic and OpenRouter And so you can see here that we support many models.
You just got to bring your own key. So it's not necessarily tied to a specific model provider. Um let's ask what is the weather in Miami, Florida. Let's see. So call the weather tool. You have this diagnostics here. You know, Miami, it's partly cloudy. Um, cool. And it should return soon or I'm having some internet problems. But, uh, listen, it's live. That's like we we know we know this happens. But while it's coming up, so you picked two different providers, right? Yes. Um, so you can add I mean you can add any of your model providers that you want.
Um, but then are you able to pick what provider you use for what task? Yeah. So everything in Mastra is dynamic. So I'm going to open up the code now. So I can do I'll open up Cursor and oops let's go into an agent here. This is our weather agent. Let me move this here and I'll talk about the primitives a bit um while building an agent. So our agent class is the main agent primitive and what it does under the hood is it runs agentic loops and how you configure it is one you give it instructions or the system prompt then you can give it the model here I I we have magic strings in our model router but you could also pass an AI SDK model or what you could do is you could pass a function and then return this.
So we have this thing called request context and you can imagine that, based on, you know, who the user is, let's say if it's Andrea, I'm going to then return Anthropic. And this request context is where you can put this uh kind of uh contextual information to then make those calls. A lot of these parameters are uh configurable um dynamically and I think that's very much uh a necessary thing in today's uh game, people are not using a single model for everything much. Yeah so I'll go put this back for now and if we go back to our demo it did respond and that's cool. The next thing that Mastra does, because what we learned early on is if you're just chatting with an agent you really need to understand what's going on behind the scenes.
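The per-request model selection described here can be sketched as a small, self-contained example. This is illustrative TypeScript only; the `RequestContext` shape and `pickModel` function are invented stand-ins for the idea, not Mastra's actual API:

```typescript
// Simplified illustration of per-request model routing: a hypothetical
// RequestContext carries caller metadata, and a router function returns
// a model identifier string based on it.
interface RequestContext {
  userId: string;
  task?: "chat" | "coding" | "summarize";
}

function pickModel(ctx: RequestContext): string {
  // Route heavier coding tasks to one provider, cheap summaries to a
  // small model, and everything else to a default.
  if (ctx.task === "coding") return "anthropic/claude-sonnet";
  if (ctx.task === "summarize") return "openai/gpt-4o-mini";
  return "openai/gpt-4o";
}

// The same agent configuration can resolve to different providers per request.
console.log(pickModel({ userId: "andrea", task: "coding" })); // "anthropic/claude-sonnet"
console.log(pickModel({ userId: "andrea" })); // "openai/gpt-4o"
```

Passing a function instead of a fixed model string is what makes the router "dynamic": the decision is deferred until the request arrives.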
So you can see in the top left here we have traces. So this is what I just looked at. What is the weather in that? I can look this up and now I have a full tracing view of all the things that happened. So this was the input. This was the output, other metadata that occurred and different attributes on the agent. I can look at you know the LLM call itself. You can see how many tokens it uh input and output. You can see the raw input and then the raw output and I'm using GPT-5.1 mini so it has reasoning in it and you know that's cool you can see all those things and then finally you have these different steps of the agentic loop so this is you know just returning the raw data from that. The next thing that we learned is with observability and traces you need to be able to constantly iterate on your agent.
And so that's when we built scorers. And scorers are a way to run evals. And evals are very important. Not a lot of people do them because I'll be honest, they're very hard to do, but you should be doing them. And if you are doing them, you can run scores based on the inputs and outputs that are going to your agent. So for example, this is a tool-call accuracy scorer. Um it's written so you know if I'm asking for the weather does it call the weather tool? It should, and that's what this scorer is trying to do. Then we also have other scorers like completeness: is the response complete? You know, these are all what we call code-based scorers. They don't use LLMs to to evaluate things. To score anything you can just write a function, you can decide what that means to you, you can use NLP libraries etc.
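A code-based scorer, a plain function that checks whether the agent did the right thing, can be illustrated like this. This is a simplified sketch of the concept, not Mastra's scorer API; `AgentRun` and `toolCallAccuracy` are names invented for the example:

```typescript
// Simplified sketch of a code-based scorer: given the user input and the
// list of tools the agent actually called, return 1 (pass) or 0 (fail).
interface AgentRun {
  input: string;
  toolCalls: string[];
}

function toolCallAccuracy(run: AgentRun): number {
  // If the user asked about the weather, the agent should have called
  // the (hypothetical) "weatherTool".
  const asksWeather = /weather/i.test(run.input);
  if (!asksWeather) return 1; // no expectation, trivially passes
  return run.toolCalls.includes("weatherTool") ? 1 : 0;
}

console.log(toolCallAccuracy({ input: "What is the weather in Miami?", toolCalls: ["weatherTool"] })); // 1
console.log(toolCallAccuracy({ input: "What is the weather in Miami?", toolCalls: [] })); // 0
```

Because it is just a function, no LLM is involved: the score is deterministic and cheap to run on every turn.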
But there are some types of eval that are done by LLMs and we call this in the industry LLM as a judge. So this translation quality uses an LLM to score the response from the agent using a model itself. So this one scored one because usually in evals you want to score between you know either pass or fail, zero or one, and you want to leave less ambiguity to it. Um to take that Oh yeah we can stop for a question. No no yeah for sure and let me take a look at the chat but I'm wondering because obviously you just mentioned like evals are hard to do right and like scorers like this looks really simple and letting the LLM sort of be like the signal was this good or not?
uh doing like its own evaluation in a way. Uh but are you seeing like people is this something that people are using actively for their own like what if they're making mistakes while they're setting up scores because I would imagine like there's a lot of noise too that comes with it that looked beautiful though like that was like great but I don't know if that's something that's implemented in the way that scores work would it actually help you keep more clean information label like show you what matters. Yeah. So it really depends on the type of company that is really going deep into evals.
We have many big logos and customers using Mastra now. And the range of people doing evals is a spectrum. If you're in an industry that requires compliance like law or med uh healthcare, evals are being written uh religiously. Mhm. Mainly because you have people to answer to when things go wrong. But on the other end of the spectrum, you have many people building coding agents and personal assistants where the eval is the user's taste, right? If I built an email agent for myself and it did something wrong, I know that it's wrong because it's my email, you know, I have a relationship with it.
Um, so there's like a big spectrum. Scoring is not an exact science. It definitely takes many iterations to get right. And uh so to do to kind of build what we we like to call it the scientific method of you know having a hypothesis on how to change your agent for the better modifying your score to actually collect that that score and then over time seeing are you increasing score to from zero to one. Um and if you are consistently getting ones then you're doing pretty good for yourself. Beautiful. I love that. Okay. So, you register your scores.
They're in your instance and then you also have like the historical data that you can continue to run it against. Perfect. Thank you. Yeah. So, scorers here, you set them up on the agent. Very similar. You can put all these different scorers. You can say, you know, at what rate do I want to do these? We consider these live evals. Live being they happen asynchronously after a turn with an agent. So all of these will run 100% of the time because we set the ratio that way. Um and but you can also do offline in the sense that I can come here and I can create a data set.
I'll just make a fake one. Oh, let's do it. And within this I can just add items that are the inputs and outputs. Like what is the weather in Paris? It's hot AF. Let's just say uh this should be message I believe and this also should be I don't know result let's see and it might validate, these things must be valid JSON. You know, it's always a live demo when it's a live demo. This is how we know: it failed. Um, in any case, if that worked for whatever reason, um, you could have a bunch of inputs and outputs that, uh, you want to add or you want to test over time.
So, actually, I can go to our trace view and I can add this to our data set here. And I will add this to the fake data set. And and you know what? Let's let's go add some more things to our data set. So, what is the weather in Paris? Let's use a different model. Let's just use 4o. Why not? RIP. It's thinking. Someone in the chat said they didn't like AF. That was it. Okay, that's not too hot. And then what's the weather in San Francisco? So, you know, as you're, you know, you're communicating with your agents, you're shipping them into production, you're going to gather a lot of different traces.
And some traces will be signaled good and signaled bad. Aka signal good means, wow, the agent really responded well. I should use that for a test item um going forward. So I know that given the same scenario, it's always going to respond well in that scenario. But then there's also some example bads where it did not do well and you want to make sure you have those. So here now we have items in our data set. We can run experiments on them and I can just run this agent. I can select my weather agent and I can select the score.
Let's just use the translation quality and I can run that. And so now it's going to take those input items and then run the agent on them. And then maybe I I change something in my agent. I can iterate on it. And then I can just start running these over time. So it scored one. It'll probably score one on all of them. You know, all of them were scored. So that's kind of cool. And this is what people are doing now um with us. Like we just released these experimentation features and so people are getting really excited about building their data sets up, getting non software engineers involved, because at a lot of companies, you know, there are software engineering teams inside a subject matter expert company.
So for example if you're a vet clinic your engineers are not veterinarians. So maybe the veterinarians should be the ones deciding what is good and bad in uh the data sets and what is worth testing and what is worth noting. Wow, that is very fair. I was going to ask you and you actually just answered that question because there is a lot of like third party tools that do evals but you made it part of the framework like as in the framework. Yeah. The reason we wanted to, and you know no hate on any other tool, but when you're using those tools you're not actually as close to your code as possible um like let's say you wrote a Mastra agent but then you're doing evals in a separate product. Well we thought you know these things should be connected especially if you know you're running the agent loop in Mastra well the traces and the observability and the evals and the data sets should all be part of that loop as well and so that was the decision we made.
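The offline experiment loop described above, running a data set of items through an agent and scoring each result, can be sketched in a few lines. This is illustrative only; `runExperiment` and the exact-match scorer are invented for this example and are not Mastra's experiments API:

```typescript
// Simplified sketch of an offline experiment: run every item in a data
// set through an agent, score each output, and report the mean score.
interface DataSetItem {
  input: string;
  expected: string;
}

function runExperiment(
  items: DataSetItem[],
  agent: (input: string) => string,
  score: (output: string, expected: string) => number,
): number {
  const scores = items.map((it) => score(agent(it.input), it.expected));
  return scores.reduce((a, b) => a + b, 0) / scores.length; // mean score
}

// A trivial "agent" and an exact-match scorer, just for demonstration.
const items: DataSetItem[] = [
  { input: "weather in Paris", expected: "sunny" },
  { input: "weather in SF", expected: "foggy" },
];
const fakeAgent = (q: string) => (q.includes("Paris") ? "sunny" : "foggy");
const exactMatch = (out: string, exp: string) => (out === exp ? 1 : 0);
console.log(runExperiment(items, fakeAgent, exactMatch)); // 1
```

Re-running the same experiment after each change to the agent is what gives you the "scientific method" loop: the mean score over a fixed data set tells you whether the change helped.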
But you can use products like Langfuse, LangSmith, Braintrust. We have exporters to everything. And this might be a good little thing we learned about open source. We may want to build our own things in open source, but we have to meet users where they are. And so, for example, when we first started Mastra, we only supported LibSQL as a database, mainly because I thought LibSQL was cool, and I still do. But when someone asks you for Postgres, you can't say no. A lot of people are using Postgres. You want to meet users where they are.
Not everyone's going to use Mastra observability. If they want to use Langfuse or Braintrust, we should be 100% in support of that and make sure that they can. So the way we designed Mastra, I can go just do a little quick view here. This is the Mastra monorepo. And I'll just give you an example of how we do our code in a way that it is available for anyone to fork or extend. And the best way I think to show that off is storage. Mastra supports many storage adapters because you need to store data when you're running these applications whether that's messages or traces or what have you.
And so in Mastra we have let me go to the base. We've designed everything as base classes. So we have these base storage domains. And so workflows has a storage domain, scores have a storage domain, even observability, etc. And if I go to one of these, you can see that it's all class interfaces. They're abstract classes. So if you wanted to bring your own database, you just implement the workflow storage with your own flavor of whatever the heck you want to do and then you're off to the races. So we've designed kind of all of Mastra to be extendable via these class interfaces.
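The base-class pattern described here can be sketched like this. It is a simplified illustration of the approach; `WorkflowStorage` and its methods are invented stand-ins, not Mastra's actual abstract classes:

```typescript
// Simplified sketch of the extensibility pattern: an abstract storage
// domain that concrete adapters (Postgres, LibSQL, your own DB) implement.
abstract class WorkflowStorage {
  abstract save(id: string, data: unknown): Promise<void>;
  abstract load(id: string): Promise<unknown | undefined>;
}

// "Bringing your own database" means implementing the abstract methods.
// Here, a trivial in-memory adapter stands in for a real database.
class InMemoryWorkflowStorage extends WorkflowStorage {
  private store = new Map<string, unknown>();

  async save(id: string, data: unknown): Promise<void> {
    this.store.set(id, data);
  }

  async load(id: string): Promise<unknown | undefined> {
    return this.store.get(id);
  }
}

(async () => {
  const storage = new InMemoryWorkflowStorage();
  await storage.save("run-1", { status: "done" });
  console.log(await storage.load("run-1")); // { status: "done" }
})();
```

The framework only ever talks to the abstract class, so swapping the adapter never touches the rest of the code.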
I appreciate that. I think that that's where a lot of platforms go wrong, trying to lock you into your choices, but you've got to let developers choose what they want to do. I that's fantastic. I love that. And you're doing that with observability, with storage. Correct. Fantastic. Yeah. Um some other things we can note is our agents have memory by default. So memory is the ability to you know may seem kind of simple but a conversation history is one way to do long-term memory. So as you can see this conversation has all my messages.
We have like a memory section here where you can add different types of memory. Semantic recall which is the ability to use vector embeddings and search through the message history. We have working memory which is a you know a memory type where an agent can use uh either a string or an object to write little notes for itself. And then we just shipped a new memory uh which is called observational memory. And I'll go and add that right now. observational memory. True. And what does that do? What is observational memory? This is new to me.
I have not I wasn't familiar with that. So observational memory is a new memory system we just released over a little bit over a month ago. And so couple things we learned. Actually was prepared for this, but uh if you want we have this like this obsession right now with building agents that never forget. Yes. And I just think it's such an interesting topic. I'm no neuroscientist or anything, but I really love how the human brain works. And right now, before we were working on observational memory, we were all using claude code. And every time we hit compaction in claude code, we felt like claude got labbotomized.
It just did not know what was going on. And we have this million context window. And at the time it was like 200k context window and you just feel like after compaction the journey you went on with claude it's always all for loss you know because you're kind of starting over and many people had the I'm just going to start a new session every time compaction happens the old session doesn't matter and we're just like okay this is a weird this is a weird problem let's like think about it so we know that LLMs can process so much information but like the memory over the long session is actually worse than a humans because I can still remember what happened this week even though it was a spicy week like you know I probably remember it more than my coding agent does and so we think that these things are kind of related you know like there's so much stuff that goes into agent context and there's so much noise but what you really want is the agent to make sure it knows and keeps the signal right I may have a bunch of pool calls to book flights or search for flights.
But the main observation here is I'm trying to go to New York City, and that's really what matters. All this other stuff is just polluting the context window. The common solution to this, which many people use, is to extract the signal from the user message. Right? I say I'm trying to go to New York City, and it goes and searches the conversation history to find the right messages to build the context window dynamically. And that's cool, but it invalidates the prompt cache. And if you want to save some money, you want to make sure that your prompts are cacheable.
So you're not paying the same price; there's a different price for cached prompt tokens versus input and output tokens. So we figured LLMs never build rich, continuous pictures, especially if you keep starting new sessions, and if you're always dynamically injecting this information, it's always single-turn, situational for that one moment. But you kind of want a holistic view of everything. And then, if you have too much signal, how do you decide what doesn't matter anymore? For humans, the biggest thing that allows us to compress is time, right? Things that happened five years ago maybe don't matter anymore. Time is one way you can decide what doesn't matter; emphasis, your emotion, etc., is another.
So this is a typical retrieval pattern that people use. You get a user message, you embed it, you put it into a vector database, you rerank the results, you inject them, and you hit an LLM. And this is kind of how it works: you're just chatting with your agent, and every time you're retrieving context from your database and then injecting it. This may be efficient, but it's not cost-effective, right? Especially if you're paying input tokens for everything you're doing.
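The retrieval pattern just described (embed the query, score stored messages by similarity, inject the top matches) can be sketched with toy embeddings. A real system would call an embedding model and a vector database; here the vectors and the `retrieve` helper are invented for illustration:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

interface StoredMessage {
  text: string;
  embedding: number[];
}

// Rank the history by similarity to the query embedding and return
// the top-K message texts to inject into the prompt.
function retrieve(query: number[], history: StoredMessage[], topK: number): string[] {
  return [...history]
    .sort((m1, m2) => cosine(query, m2.embedding) - cosine(query, m1.embedding))
    .slice(0, topK)
    .map((m) => m.text);
}
```

The cost problem described in the transcript is visible in this shape: the retrieved snippets change on every turn, so the prompt prefix changes and cannot be cached.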
So that's one example. Tyler Barnes on our team really dove into this problem, and honestly, shout out to him; I'm just a figurehead for talking about it right now. He had this insight: humans are really good at filtering out noise, and we thrive as humans with lossy memory. We don't necessarily need all of the data to figure something out. We just need high-level pointers or clear events that have happened. And agents right now need a lot of handholding there.
So we thought, you know, forgetting is a feature. If you look at your typical day, you do all these things, but the things that stay are the events that matter, and the things that you forget are the things that don't. Like, I don't really remember tying my shoes today. I probably did, but it's not important. I do remember that I had to come on this podcast, and I'm going to remember later that it was a great podcast, or live stream. So those are the kinds of things that we wanted to model.
So we wanted to model agents this way. So we built it, and it works pretty well. How it works is we have three agents. Your main agent, the one that I showed in our studio, we call the actor. The other two agents we call the observer and the reflector, and we think of them as the subconscious mind of your agent. These two agents are always watching the conversation history, making observations, and then compressing things into events, so you don't necessarily need full context.
So this is how it works; I'll just show this version of it. And this is three separate agents, right? Three different agents tackling the same memory problem. Correct. And it's all running in the background. So, as messages come in, these observations get buffered up. As soon as you hit a token threshold, usually we set it around 30k tokens because we're trying to do more with less, it will take the message history, compress it into these event-based observations, and now this observation list is in your system prompt, and you only need to take the messages that you have not observed yet.
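The buffering behavior described here can be sketched as a small state machine: observations accumulate until a token threshold is hit, then get folded into a stable event list that lives at the front of the system prompt, so only not-yet-observed messages ride along with each request. The shapes and the 10-token threshold below are illustrative, not Mastra's actual implementation:

```typescript
interface Observation {
  event: string;  // compressed, event-based summary
  tokens: number; // rough token count for the threshold check
}

class ObservationBuffer {
  private observed: Observation[] = [];
  private pending: Observation[] = [];

  constructor(private thresholdTokens: number) {}

  add(obs: Observation): void {
    this.pending.push(obs);
    const pendingTokens = this.pending.reduce((s, o) => s + o.tokens, 0);
    // Once buffered observations cross the threshold, fold them in.
    if (pendingTokens >= this.thresholdTokens) this.compress();
  }

  private compress(): void {
    this.observed.push(...this.pending);
    this.pending = [];
  }

  // The stable event list prepended to the system prompt; because it
  // only grows at compression points, the prompt prefix stays cacheable.
  systemPromptEvents(): string[] {
    return this.observed.map((o) => o.event);
  }

  unobservedCount(): number {
    return this.pending.length;
  }
}
```

The cacheability claim in the transcript follows from this structure: between compressions the prompt prefix is byte-identical, so providers can serve it from the prompt cache.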
And what this allows is something that's very cacheable, and it actually works very well in terms of recall. How we figured that out is we ran a benchmark called LongMemEval. It's 500 questions over tons of data, like 57 million tokens, and essentially it's question answering with an agent, where the model has to find the right answers. So we benchmarked against other memory products that are on the market, and we did really well. This benchmark is quite old.
It uses GPT-4o as the base model, and I did say RIP GPT-4o even though we just used it. It's a nice baseline, though, because if you can score well on a quote-unquote bad, old model, then what would you be able to do on a new one? So in our results we got around 84.2%, which is pretty solid across the board. But then we thought, okay, what if you start adding really good models? So we ran GPT-5 mini, which scored like 94.9%, and Gemini 3 Pro, and everyone on this list is good.
It's just that our new strategy is helping out in different areas. And once again, it makes sense that these are the areas, because this feels more human to me. Temporal reasoning is thinking about things in terms of time. That's such a human thing to do, because we're always looking at the clock, right? So I think it was really interesting how we scored high there. Knowledge updates: when we learn new things, we should un-bias the old things we know and start biasing toward the new things. And then multiple sessions: as a human, we are multi-session in our lives. Talking to you right now, talking to a friend, I didn't change as a person; I have to maintain context between those sessions. So it was really cool to go through all of this, and Gemini did pretty well there as well.
That's impressive. I'll stop for questions. That was a lot. No, that's super, because honestly this is so timely. The problem of persistent memory, especially now that a lot more people are tinkering with agents for their own things, they want to be able to offload the things that don't matter. Like the exercise you just showed us: I'm not going to remember which door I came through, but I know I came into the office. Is it important how I walked in? That's super, super interesting.
And so you have three agents on this job now. You mentioned something that is super relevant to me right now. I've been reading a lot about this and doing my own experimentation with memory because of cost. And you mentioned the different price of cached versus input tokens. And I guess that is the reason why we should be thinking about this problem, especially when we're building agents for scale. You want your agents to feel human, especially something like a support bot. I'd much rather not have to repeat to you that I'm trying to find a flight to New York; we had this conversation already.
So cost is obviously one massive application, but the human angle of it, approaching it from the point of view of how a person does it, is super fascinating to me. What are you seeing people experiment with? How old is observational memory? You say it's about a month old. Yeah, we released it in February, and this is our research post; you all can take a look. Yes, please. It's a deep dive on how it works and everything, and people have built their own versions of observational memory in Python.
We're going to be building out a more generic library so anyone in TypeScript can use it; you don't have to be a Mastra user. It's completely open source. Beautiful. And it's super fun. I don't want to get too far ahead of myself, but with this, I've been in the same coding session for a month. No way. Yeah, it's pretty cool. Okay. So we're moving beyond compaction, like what you mentioned earlier, and I'm going to drop a link in the chat about how Copilot handles memory, because it's a new thing for us as well, for the same reason: you don't want to lose the context of sessions, being able to resume sessions and compact them, for expense or whatever reason.
It makes a lot of sense, but you've been coding in the same session for a month using observational memory, and it's working out. Obviously, you're not feeling any kind of degradation; you're still coming out fresh. I love that. Yeah. I used to be so into git worktrees. And I still am. I just wanted to test myself and see how far this can go. So I don't even use git worktrees anymore. I just try to accomplish one task at a time as fast as possible, then switch contexts within the same session, fix that, and over time you're overlapping in code areas, right?
And I have not missed a beat. So it's been pretty amazing to use. I appreciate that. If you had said that to me six months ago, I would have hopped on a soapbox about git worktrees myself. Yeah. But I find myself not using them as much anymore, because now we have that possibility, and the linear path might be the best path. This is awesome. Let me take a look at the chat and see if there are any questions from folks. There were questions about whether it's open source.
Of course it is. When you were doing the demo, the studio is part of the project. Yes, that is open source. So when you're testing out Mastra, if you want to go in and create your agents, yes, this is the framework. What you're seeing here is what you're working with, and it is free to use. There's a comment about being TypeScript-first: definitely, because frameworks are for sure shaping the way that we're doing things now. I don't think there are any questions that are super relevant right now.
Friends, if you want to ask Abby a question, please drop it in the chat. Otherwise, we can carry on with the demo. But this has been fantastic. Awesome. So, I want to switch gears now. I mean, it's the same gear, but we're going to go into a different perspective. I feel like we're in a coding agent era right now. When we got into Y Combinator, many agents people were building were for the workforce, for example replacing the back office of a small business, or trying to replace accountants, and as I mentioned earlier, writing evals is very hard. To truly replace a human you have to come correct; you cannot have any mistakes, so it's a very hard task. And what I think has happened is that coding agents do not require the same rigor, because we are the tastemakers of the thing that we're using.
Plus, we can decide not to push the code or commit the code. And so we've seen this huge expansion, a Cambrian explosion of coding agents and coding products: Copilot obviously, Claude Code, OpenCode, Droid, all these different ways to write more code. So it almost feels like they're trying to replace us first. Whether that's the case or not, in Mastra we want people to be able to do whatever they want to do. So I'm going to turn this weather agent into a coding agent. In a couple lines of code.
Yeah, just a couple lines of code and I'll turn this into a coding agent. So we have some primitives to turn your agent into a coding agent. And we can do that with a local file system and a local sandbox. And I can also get a workspace. Workspaces in Mastra are a collection of a file system and a sandbox put together to allow your agent to read and write files, execute untrusted code, and leverage skills, right? Claude skills, or agent skills. So let's make this workspace happen.
Once again, workspace is an interface; you can extend it. And let's do a sandbox, and let's do a file system. Oops, I haven't tabbed in a long time. What is this? Base path, workspace. Simple as that. I can also have a skills path. I think this is an array. Let's just say it's .agent-skills or something. Or is it a skills dir? I forget all the time. That's why TypeScript's great, because I can just see what's in there. There you go: agent skills. Cool. So I take this workspace and I'm going to pass it to my agent.
Okay. And because I have a studio... oh, go ahead. No, no, I just wanted to clarify: the primitives were there. You just gave it, within the workspace, the tools that it needed, and then you gave it the execution, right? I missed that bit. Let me see: file system, workspace. Okay. And then of course the sandbox. Yeah, I could get a sandbox from E2B or Daytona or, you name it, Modal, etc. And the file system, I could mount from Google Cloud Storage or S3, but for the sake of the demo we'll just use my local machine and let it YOLO.
So I passed the workspace to an agent. Cool. And now, if you see on the right here, we have all these workspace tools, and these just magically get added. So now your agent is able to read and write files. So let's actually say: can you write the Tokyo weather to an MD file for me, please? It'll call the weather tool. Fun. Now it's going to write the file. So we can see here, and it saved it. Okay, that's cool. Well, I want to go look at this. So let me go to the workspaces tab.
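The workspace idea, a file system plus a sandbox behind one object, from which the agent's tools are derived, can be sketched like this. The interfaces and tool names below are hypothetical stand-ins, not Mastra's real API; a real backend could be the local disk, S3, or a remote sandbox provider like E2B:

```typescript
// Illustrative interfaces: a workspace composes a file system and an
// optional sandbox, and derives the agent's tool list from its parts.
interface FileSystem {
  write(path: string, contents: string): void;
  read(path: string): string | undefined;
}

interface Sandbox {
  exec(code: string): string; // run untrusted code, return its output
}

class LocalFileSystem implements FileSystem {
  private files = new Map<string, string>();
  write(path: string, contents: string) {
    this.files.set(path, contents);
  }
  read(path: string) {
    return this.files.get(path);
  }
}

class Workspace {
  constructor(public fs: FileSystem, public sandbox?: Sandbox) {}

  // The tools the agent "magically" receives once a workspace exists.
  tools(): string[] {
    const t = ["read_file", "write_file"];
    if (this.sandbox) t.push("execute_code");
    return t;
  }
}
```

Because the parts are interfaces, swapping the local disk for cloud storage, or the local sandbox for a remote one, would not change the agent-facing tool surface.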
And here we are: Tokyo weather. That's awesome. Why don't we add some skills? So, in Mastra, we support skills. Once you have a workspace, you can go and find skills that you like. For example, these are all the skills on a website called skills.sh. And I believe there's something called a skill creator. We can try it out. I don't know if it'll work, but here's a skill creator. You can see what it does: it helps create a skill. Let's install it. Cool. Now we have that skill, and let's go back to our weather coding agent thing.
You can see now we have skills here on the right. So let's say: help me write a skill for telling the weather. Let's keep on brand here today. Okay. So what you're doing now is using the coding agent that you just created to create a skill to do the thing that it just did. Which is a bit meta, I think, but I love it. Listen, if you show me that you publish that skill to skills.sh... well, actually it's picked up automatically, right? It's an automatic thing. Yeah. But you should be able to, right? If you have MCP in there, you should be able to create a repo, publish it, and send it on its way.
Look at that. And this thing is making a script to fetch the weather. So this is just next level. It's like, whoa, how crazy is that? That's just amazing. Let's go back here. Let's see if we have our skills. Here's our skill there. And that's all cool. I should say you need to write it to the skills path, agent-skills. Oops, I forgot an API token. But as you can see, this observational memory was running, and you can see it was trying to do stuff.
I forgot the API key, so that's my bad. But at least you can see that it is running in the background. Cool. It's writing a file. It's all good. I think if we go back to our workspace, it should be somewhere. Oh, it's right here: weather skill. Pretty cool, huh? That is very cool. So, let's take it a step further. Okay. As I said, if you want to build a coding agent, you should. We have a coding agent, and you can get it from npm: mastra-code.
This might have a problem, because I don't really use npm that often; I use pnpm. So it'll probably do something stupid. We'll try it out. Knew it. I just have to delete this path. Yeah, this is the reason I don't use npm. Cool. And now let's just run mastra-code. So now you have a coding agent that's more focused than the fake weather one I just made, with real system prompts on how to be a coding agent, powered by our observational memory.
So when I was saying that I've been in a session for a month, I'm in one of these sessions. So I can say: what's up? Actually, first let's set up some observational memory things here. You can set up your observer and reflector models. I'm using Haiku; that's chill. And then what are those thresholds? Those thresholds are up to you. The first one means: I'm only going to start observing once the conversation history reaches that. Okay, got it. And then, to compress even more, it has to hit 40k of observations already made.
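The two thresholds described here (one budget before observation kicks in at all, and a second at which the observation list itself gets re-compressed) amount to a small policy function. The numbers and names below are illustrative, not Mastra's defaults:

```typescript
// Hypothetical sketch of the two-threshold policy described above.
interface MemoryThresholds {
  observeAfterTokens: number;  // e.g. ~30k: start observing the history
  condenseAfterTokens: number; // e.g. ~40k: re-compress the observations
}

function nextAction(
  historyTokens: number,
  observationTokens: number,
  t: MemoryThresholds,
): "wait" | "observe" | "condense" {
  // Re-compressing the observation list takes priority, since that is
  // what keeps the long-lived system prompt from rotting.
  if (observationTokens >= t.condenseAfterTokens) return "condense";
  if (historyTokens >= t.observeAfterTokens) return "observe";
  return "wait";
}
```

The effect is the constant downward pressure mentioned next in the conversation: raw history becomes observations, and observations themselves get condensed once they grow too large.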
So you're constantly trying to compress down and keep the context window without rot. Then what I usually like to do is say: please deep-research this codebase, understand what we did, and finally, make no mistakes. All right. With a thread. So you can see that when I installed this, I installed a Mastra skill. Imagine you were working on a Mastra project yourself: there's a Mastra skill here that teaches the agent and steers it so it knows what the framework does.
And so this loaded the framework guide, and now it's reading stuff. It's reading the package.json, it's going to go and understand everything that's in this project, reading a bunch of files, and once it has a complete picture it's going to give me this information. And I think I'm going to have it build me another agent for fun. Okay. Listen, this is very interesting, though, because you created a space where you can build agents, and it's becoming an agent itself. My mind is blown. No, this is very, very interesting.
So, surprise me. I just gave it: okay, surprise me. And as you can see, this mastra-code is open source as well. Nice. It uses a base class, and that's the theme of the talk today: base classes and interfaces. It uses a base class interface of what we call a harness, and harness is a very popular word in AI engineering right now. So if you want to build your own coding agent or Claude Code, you could extend that harness, use Mastra agents, and you'll get a lot of this out-of-the-box stuff.
For example, the harness gives you a task list, and the agent can ask you questions. This whole Claude Code-style UX gets built in by default when you're building with this harness abstraction. And, yeah, it's just been really cool to take open source to its fullest, because a lot of these other tools are closed source, and we just kind of wanted to pull back the curtain. Thank you. You can build your own. And expensive? Yeah, for sure. But this has a really nice feel to it.
I love it. So, completely open source, and basically I can create this and create as many agents as I want. Yep, in Mastra. Amazing. Yeah. And you can deploy it to your favorite hosting provider, because Mastra can run as a server, or you could use it in, you know, Next.js or whatever server you're trying to run. But Mastra itself, the thing that powers the studio behind it, is a Mastra server. So if you wanted to have, let's call it, an AI microservice, you could deploy just the Mastra stuff by itself, put that on AWS or Next.js or whatever, and now you're in the game.
All these agents are on the internet, and you're off to the races. Amazing. What did it surprise you with, then? What's it making? A travel agent, I guess; we'll have to wait and see. Yeah, I guess it'll help plan trips. Okay, I like it. And that was a deduction from having the context of the conversation you just had; that's why it chose that. Yeah, based on the weather agent that already existed. Yeah, that's very, very interesting. We're coming up on time.
I want to make sure we tell people how they can get started, and I want to share the Discord. Where is your community at? I know there is a Discord server which is very active; I think all of you are in there chatting with folks. I'm sharing that right now in the comments, so folks, go ahead and join. How can people get started contributing to the project if they want to? What kind of contributions would you like to see? This is a lot, though. I had no idea there was so much in Mastra.
It's going to be a fun weekend for me. Yeah. I mean, we've been around for a year, so imagine what we can do in another year. It's wild. In terms of finding us: Discord, very active, we're all there chatting. On Mondays we do a live stream ourselves called the AI Agents Hour, at 12:00 p.m. PST; you can find it on YouTube if you look for Mastra AI. On Thursdays, every Thursday, we do a workshop where we teach these concepts; you can find that on Luma for Mastra AI. You know, we talked about harnesses.
This past week we talked about mastra-code and memory. Next week we're talking about multi-agents. So if you want to learn something every week, come see us there. And then finally, you can follow Mastra on Twitter/X. You can follow me, my co-founder Shane, and Sam; just follow all of us. We retweet and post a bunch of stuff, and we don't troll that often. I like the disclaimer: not that often. Beautiful. Well, we've got folks already joining the Discord. I'm going to check out your streams; it sounds super interesting.
Definitely. Listen, friends, if you're working with this, understanding what a harness is is going to take you a long way. And we haven't talked about this a lot, honestly; I know I definitely haven't. But there are real ways you can make this effectively work for you. And you brought up an excellent point about how, when we first started working, even three or four months ago, it was an agent to help Abby; I was building an agent so that Abby could work with it. But now we're moving to a spectrum where we need to build agents to help other agents, and it's a completely different scenario.
So that's super interesting. Very timely question: what does the future of Mastra look like? What's on the roadmap? What are you working on that you can share? So we have three pillars that we're going to continue working on. One is memory. There's just so much to explore, and we're so interested in it, so we're going to keep going down the memory path; I'm sure observational memory is going to get some power-ups. Two, observability. While people think it's just traces and evals and things, I really want the scientific method to be automated for users.
Maybe you don't have to care about writing evals and improving your agent; maybe another agent is responsible for that. And then lastly, community. I am so paranoid that the community will go away, so we will continue investing in it and making it the best. We have a conference coming up on April 9th in San Francisco. It's called TSAI Demo Days. You're going to see a lot of companies who have taken agents into production, not just with Mastra; it could be any agent. That'll be live streamed as well. So more community events like that; expect those to come.
I appreciate that support, and that's really what makes open source open source. So thank you for keeping the emphasis on community, putting on all these educational things, and taking the time to be here. Friends, we had the CTO of the company demoing it for us. Come on now. Thank you. This has been delightful. Thank you so much for taking the time to come by. I'll be sure to add all of the links that we shared in the comments to the description. For the folks who joined a little bit later and maybe missed the beginning of the transmission, I'm going to post a YouTube link where you'll be able to see it; I actually made a short link for it.
So it's gh.io, and that's going to take you to the YouTube stream, and you'll be able to see it there from the beginning. You definitely do not want to miss everything that we talked about. Thank you so much, Abby. This was great. Of course. Yeah, thanks for having me. I had so much fun. Come back! I would love to, anytime. Amazing. Thank you. Friends, let's give some sparkles to Abby. Be sure to follow Abby on Twitter. He promised not to troll us too much. Not all the time, anyway.
But you all are putting out such amazing content. This book is such a gem of a little book, honestly. So thank you for putting out all those resources, and for doing it free and open, for all of us to have access to. Thank you, Abby. Have a great day. Yeah, thank you. How good was that? I, for one, look forward to experimenting with observational memory this weekend. I've been going round and round in circles trying to do things like this myself, when there is a brilliant open-source project that's doing it already.
Love that. Love the possibility of creating your own agents and examining that new frontier of agent-to-agent work. What he mentioned at the very last bit of our conversation: maybe it will reach a point where humans don't need to be concerned with the evals, and there will be agents optimizing that for us. I 100% believe in that. This has been fantastic. Thank you all so much for joining Open Source Friday, and thank you for supporting open source. Let's see, let me make sure that I'm not missing any links. I'm giving you the links to the repository. Please don't forget to go by the repository and drop some stars.
Drop some love to Mastra via stars; it is a free way that you can support this project. Go drop a star if you haven't yet. Join their Discord server. I've been in it, kind of lurking. They are very active; they are all in there answering questions and interacting with the community. And I really appreciate that this company is keeping the focus on community and giving us all these resources. Please don't forget to subscribe to our YouTube channel so you never miss a notification for when Open Source Friday is on. It's been my pleasure to be here with you today.
I'm going to share my handle so you can make my little slice of hell, I mean, my interactions on Twitter, better if I get to talk to all of you. So please follow me there. And yes, the link to the book, let me find it quickly. It's a phenomenal little book. I think I remember now where it was: it was at Render ATL, and I think it was their CPO, or someone from their company, who was there passing them around. I think they've actually done a newer edition; the version online is an update from what I have in hard copy. But it's very easy to digest, and if you are coming into this new paradigm, figuring out how you're going to incorporate it, an earlier question asked: can I add an agent to my app? Of course you can; it's unlimited what you can do now. Should you? That's a conversation for another Open Source Friday. Maybe we should dig into that: when is a good time for something to be an agent versus a skill versus documentation?
Anyways, thank you all so much for being here. I posted the link to the book; check it out, it's really a great book. Go give Abby a follow on X, and I'll see you next week for another Open Source Friday. Thank you for being here. Let me leave you with one more thing; I've got to look through my scroll of things. Oh, I'll leave you with the very timely topic of developer choice. I am proud to say that GitHub is very focused on that, and this video that follows is proof of it.
Thank you for being here. Catch you next time. With GitHub Copilot, you can shift your development tasks from sequential to simultaneous with the agent of your choice in Agent HQ. The Agents tab on GitHub is where you can manage all your cloud agent sessions. From here, you can assign ad hoc tasks to Copilot, to Claude by Anthropic, to Codex, or to a custom agent. We can also use this tool to monitor our tasks, all from one centralized location on GitHub. Now, most change requests come via issues, and you can assign issues directly to an agent like Copilot, Codex, or Claude.
Choose the branch you wish to use as the base, and again the agent you want, and let the agent do the rest. Inspiration can strike from anywhere, including away from your desktop: we can manage agents and even assign tasks right from GitHub Mobile. Given that developers spend the bulk of their time in an IDE, it's natural to expect the same functionality there. In VS Code, you can orchestrate all your agent sessions, local and cloud, from your IDE. Just like before, we can assign to Claude or Codex, or to Copilot using a custom agent or a specified model.
These tools allow you to assign and manage parallel coding agent sessions, all within your existing workflow.