Open Source Friday: Mozilla AI | Project cq

GitHub | 01:01:57 | May 16, 2026
Chapters (9)
Host introduces Open Source Friday and outlines Project CQ and Mozilla AI's role.

Mozilla AI's Project CQ introduces a memory-driven plugin to share 'knowledge units' among coding agents, with local-first operation and a forthcoming hosted platform.

Summary

Peter Wilson from Mozilla AI explains Project CQ, a plugin-driven system that turns agent memory into shareable knowledge units. CQ aims to solve the exploding size of context by moving learning and observations into structured units that agents can query on demand. The stack includes a Go CLI, Go and Python SDKs, and an MCP server that stores data locally in SQLite, with an official release of the hosted remote option planned for Monday. Wilson describes CQ as a “Stack Overflow for agents,” where knowledge units capture domains, frameworks, languages, a summary, and a detailed fix so agents learn from past problems without re-encoding everything. The project includes a governance layer: local private namespaces, a remote review workflow, and a planned commons namespace for shared knowledge across agents. A demo showed CQ being queried for GitHub Actions details and an agent catching out-of-date action versions, highlighting the practical friction CQ is designed to reduce. The team is actively inviting open-source contributions, RFCs, and feedback as they prepare multi-tenant hosting and richer governance features. Overall, CQ blends local-first memory with scalable sharing, privacy controls, and an evolving standard around knowledge units for AI-assisted development.

Key Takeaways

  • CQ provides a plugin that integrates with Claude, OpenCode, and IDEs such as Cursor and Windsurf, plus a CLI and SDKs (Go and Python) to manage the memory system.
  • The core data unit is a Knowledge Unit (KU) with fields for domains, frameworks, languages, a summary, and the detailed fix, enabling structured learning.
  • Local storage uses a local MCP server backed by a SQLite database, with an upcoming hosted version to simplify deployment for teams.
  • There is a remote review flow: KUs can be approved or rejected by a human-in-the-loop before they become queryable by agents.
  • Project CQ surfaces practical gotchas (e.g., GitHub Actions version drift) by proposing relevant KUs to save time and tokens.
  • The team emphasizes privacy and private namespaces, with plans for a commons namespace and multi-tenant governance to scale collaboration.
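
The knowledge unit fields and the review step listed above can be sketched roughly as follows. This is a minimal illustration, not CQ's published schema or SDK: the class name `KnowledgeUnit`, the table layout, the status values, and the functions `open_store`, `propose`, `review`, and `query` are all assumptions made for the example. It also applies the review gate to a single SQLite store for brevity, whereas in the talk the approval process lives on the remote.

```python
import json
import sqlite3
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class KnowledgeUnit:
    """Hypothetical shape; field names follow the talk, not the real schema."""
    summary: str
    detail: str
    domains: list[str] = field(default_factory=list)
    frameworks: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite store holding one JSON body per unit."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS knowledge_units ("
        "id TEXT PRIMARY KEY, body TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def propose(conn: sqlite3.Connection, ku: KnowledgeUnit) -> str:
    """Store a proposed unit; it stays 'pending' until a human reviews it."""
    conn.execute(
        "INSERT INTO knowledge_units (id, body, status) VALUES (?, ?, 'pending')",
        (ku.id, json.dumps(asdict(ku))),
    )
    conn.commit()
    return ku.id


def review(conn: sqlite3.Connection, ku_id: str, approved: bool) -> None:
    """Record the human-in-the-loop decision for a pending unit."""
    status = "approved" if approved else "rejected"
    conn.execute("UPDATE knowledge_units SET status = ? WHERE id = ?", (status, ku_id))
    conn.commit()


def query(conn: sqlite3.Connection, domain: str) -> list[KnowledgeUnit]:
    """Return only approved units tagged with the given domain."""
    rows = conn.execute(
        "SELECT body FROM knowledge_units WHERE status = 'approved'"
    ).fetchall()
    units = [KnowledgeUnit(**json.loads(body)) for (body,) in rows]
    return [ku for ku in units if domain in ku.domains]
```

In this sketch an agent would call `query` before starting a task and `propose` after discovering a gotcha; a unit only becomes visible to `query` once `review` has approved it.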

Who Is This For?

Engineers and teams building AI-assisted development workflows who want to manage and share agent knowledge safely, privately, and at scale; developers exploring local-first memory with an eye toward a hosted, multi-tenant offering.

Notable Quotes

"the CQ project, the idea of it is, well, we're pitching it sort of like a Stack Overflow for agents"
Describes the core metaphor driving CQ and its intended audience.
"the heart of the memory system is more like there's a schema that we published in the open source repo that defines a thing we call the knowledge unit"
Explains the KU concept and its role in CQ.
"Now, let me confirm this the KU since it actually did work"
Shows the agent confirming a KU back to CQ after validating that the suggested fix actually worked.
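
The confirm step in the last quote, together with the flagging of stale or incorrect units described in the transcript, amounts to collecting evidence per knowledge unit. A rough sketch of that bookkeeping is below. The `Evidence` class and its method names are hypothetical, and it deliberately stops at raw counts: the talk says the actual scoring is calculated in a back end and is intentionally kept out of the schema for now.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical per-unit evidence record (not part of CQ's schema)."""
    confirmations: Counter = field(default_factory=Counter)  # agent type -> count
    flags: Counter = field(default_factory=Counter)          # flag reason -> count

    def confirm(self, agent_type: str) -> None:
        """An agent validated the unit and the fix worked."""
        self.confirmations[agent_type] += 1

    def flag(self, reason: str) -> None:
        """An agent found the unit stale, incorrect, or otherwise unhelpful."""
        self.flags[reason] += 1

    def signal(self) -> dict[str, int]:
        """Raw counts a back end could feed into whatever scoring it chooses."""
        return {
            "confirmations": sum(self.confirmations.values()),
            "distinct_agent_types": len(self.confirmations),
            "flags": sum(self.flags.values()),
        }
```

Tracking distinct agent types separately reflects the stated goal of knowing not just how often a unit is confirmed but by how many different kinds of agents.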

Questions This Video Answers

  • How does Project CQ's Knowledge Unit improve AI agent memory and context handling?
  • Can I run Mozilla CQ locally and how does the hosted version change collaboration?
  • What governance controls exist for sharing knowledge units across teams in CQ?
  • How do I contribute to Project CQ and get involved with its open-source roadmap?
Mozilla AI, Project CQ, Knowledge Unit, Open Source Friday, LLM memory, Open-source plugins, Claude, OpenCode, GitHub Actions, MCP server
Full Transcript
Good morning, good evening, and good afternoon. Welcome to Open Source Friday. Here we have Peter Wilson from Mozilla AI joining us today to talk about our new project CQ. And so, thank you all for joining us today. We're super excited. Peter, welcome to the show. Excited to connect and chat more about Project CQ and Mozilla AI. Maybe a brief intro to tell us a little bit about yourself and your background in open source and your journey, just to kick off the show.
Yeah, sure. Nice to meet you, Kevin. So I'm Peter Wilson. I'm a staff engineer at Mozilla AI. I've been here for a bit over a year and a half now. I'm based in the UK and I've been doing software engineering for over 20 years now, which makes me feel very old. Still young at heart though, that's fine. I joined Mozilla from Hashi, where I used to work on Vault, and before that I was a principal engineer out west, doing kind of financial technology stuff, and it gets a little bit more boring as you go further and further back from there, so I'll kind of leave it at that.
Yeah. What made you get into open source from early days, and how have you stayed in it along the journey?
Um, I think it's something I didn't do a lot of. I probably used a lot of it but wasn't a great contributor. I contributed off and on, now and then, to different projects in tiny, tiny ways, I think, you know, like, oh, I can do a doc PR and feel like I helped, you know, because I suffered a problem.
Um, which is good, right? But at the same time I wasn't doing a lot. And then I think it was when I joined HashiCorp. They obviously had a big focus in their principles and Tao around how they wanted to do work, and before they changed the licensing and all the rest of it, they had an open-source model around how they did some of their products. I think that was really cool, and they did a lot of work in the Go ecosystem with libraries that are still open source now, so I guess I started getting more involved with that sort of thing. And that was when it was like, oh, this is quite nice. Nice to be able to do a bit of both, right? Like, you want to contribute to open source projects. It doesn't mean that, you know, everything in the world has to be open source.
So, yeah. No, that makes sense. And I'm curious, do you remember what your first stream was that you kind of worked on?
Uh, first open source one?
Yeah, the first open source one, but also maybe the first contribution that you had, or the first project that you actually worked on. Not just that you consume, but the first commit that you had in an open source project. Do you remember at all? Putting you on the spot.
On the spot, no, but I would imagine it is literally like a little doc PR on sure thing. I don't know. Is this something you've done your research on?
No, no, I was just actually really curious. I'll dig it up while we're chatting, but I was just kind of curious if you've had some thoughts on that.
I'll think about it, but yeah, I'm not sure.
Cool. And then maybe let's start shifting a little bit into the project. So tell us a little bit at a high level: what is Project CQ?
How did you find the problem space itself, and what were you seeing where you were like, "Hey, we need to build something to make this work"?
Sure. So the CQ project, the idea of it is, well, we're pitching it sort of like a Stack Overflow for agents. For my smooth brain it makes it much easier to understand. I guess the idea being that you run into problems when you're using these coding agents quite a lot, and there's ways to try and teach them not to do things, but there's different levels to that, like the pyramid of testing and all the rest of it, the same vibe. You've got your rules in a repository or whatever, then you've got your memories that are associated with that thing, and you've got your global rules. But then it's all at that point linked to your machine or whatever, and normally linked to your specific coding agent. It's getting better, I think, now, because there's a lot of fallback between them. So if you use OpenCode, for example, I think it honors CLAUDE.md files if it can't find AGENTS.md. Needless to say, you get all this stuff, which is good, right, and it's useful, but we start seeing that you end up with huge, huge files as you're adding more and more, and you kind of expect the LLM to be able to just have its attention on all this stuff, and it doesn't always work. So I guess the idea with CQ was, okay, how can we fix this? How can we, (a), make it so the agents can share the stuff that they're learning so that they don't always have to have the problems, but, (b), make it more on demand. So we've created this plugin part of the project.
It's quite, I guess it's quite a big monorepo now. At its core it's a plugin that works with OpenCode or Claude, or some of the IDEs as well, so it will install into Cursor and Windsurf. That plugin's got a skill which tells it how to get on with its business, and it runs an MCP server under the hood to let it make calls to a local SQLite database to save stuff, or, if you've got a remote connected up, like the server, then it'll send it there. And so before it starts a task, it's told: you should go and query CQ and see if it knows anything about this beforehand. The idea is it's not really for project-specific stuff. It's more general gotchas, you know, like: this open source library everyone's using, this version has this problem; or, I checked the docs and it said this, but when I tried it, it didn't work; or, I spent 10 turns doing something, so if I'd just known the last thing I could have saved all those tokens. And the idea is it proposes these things when it finds them. And there's an approval process on the remote if you've got that configured, and once everything's approved, then it is available for anyone else who's querying that as well. That's kind of the high-level vibe. But yeah, we've built a lot of stuff in the last, I think it's about two months now. So there's a Go SDK and a Python SDK. There's the CLI that's written in Go, which also hosts the MCP server. There's the plugin and the skills and stuff. There's an open source server in there, so you can host it locally or amongst a team or something.
Yeah, I love it. How many people do you have working on the project right now, since it's been started?
Uh, well, there's a lot of one startup still, so we're juggling a lot of projects and a lot of people. I would say I've spent most of my time on it.
And we've had about, say, three or four other people coming in and out, helping with different things when they've got a bit of time. So massive shout out to everyone who shows up on the contributors. We've had a few open source contributions as well, which has been really cool. And we're trying to build as well a hosted version, which is sort of up, but it's probably better to do an official release on Monday, I think. Yeah, we'll do an official one on Monday because Friday is maybe not the best day to do this sort of thing. But it just means that people who maybe can't or don't want to self-host their own image or whatever, they can have this; we'll do this sort of stuff for you. And we want to make it free for people to push to their own private namespace, and then we'll build features on it from there. But more on that.
Yeah. I mean, maybe let's dive in a little bit. Can you share a little bit more about the architecture of the project and how you've built the system? Obviously the challenge of going from local to cloud is always an interesting one, and what you're choosing to build for the memory system itself, I would love to unpack that a little bit.
Yeah, I can try. Shut me down if I start saying things that I've either already talked about or that are a bit boring.
I guess the heart of the memory system is more like: there's a schema that we published in the open source repo that defines a thing we call the knowledge unit. When it gets proposed, the agent can try and figure out what domains it thinks it's applicable to, like is it testing or CI/CD or whatever, and then there's a frameworks and a languages field in the schema. So it can propose multiple values for all these things. And then a sort of summary of what happened, and then the detail of how it fixed it. That, I think, more or less encapsulates this knowledge unit. And then once that's in the system and it gets sent back to an agent that finds it via the query, the skill tells it to validate things. There's a few things in place to try and stop stuff going wrong, although I'm sure it will eventually. But once it's validated something and it says, okay, this thing's legitimate, and it applies the fix, it confirms it. So it'll send it back to CQ to say, "Yeah, that one worked." The idea being we want to build up signal over time of how often something gets confirmed, and how many times it's been confirmed by how many different types of agents, all that sort of stuff. So at a high level there's that going on architecturally. So yeah, like I mentioned before...
Yeah, maybe unpacking that just a little bit: how do you ensure high-quality confidence in the memory itself, and accuracy? Are you building any scoring around the memory units themselves for retrieval, or what does that look like?
So yeah, the schema is kind of being intentionally light, I would say, at the moment, and we had some people we'll call it.
Yeah, just even internally we're chatting about all this stuff and it's like, should we add this, should we add that? And for now, I think the idea that we're trying to align to is: okay, let's build stuff in a back end ourselves until we prove that it's really useful, and then we would propose it to the schema. And there's other folks working on similar systems, not just folks at CQ, but in the same space, that we've already had conversations with around potentially a working group and stuff like that, to try and get more of a standard organized around this sort of stuff. So at the moment the magic, the secret sauce or whatever you would say, of the scoring and stuff is kind of detached. So you have the knowledge unit, which has got evidence, which has got all this other stuff attached, but that's calculated in a back end. And so that leaves a bit of room, I think, a bit of flexibility for folks to implement it how they want at the moment. It's trying to strike the balance of a protocol and a platform and not trying to force too much on people. But sure, I think once we do this stuff on Monday and we get folks using it, then it might give us better insight into what's useful and not useful for people, and that'll drive it from there.
Yeah, that makes sense. And I'm curious, from an immediate use case perspective, are you finding certain users, whether that's the actual memory layer using this more frequently, or is it the app layer that is pulling this in right now? Where are you finding initial traction and usage?
Uh, I would say it's mostly the gotchas.
I mean, we'll do a little demo later, and then I can also do a "here's one I made earlier" and show you some of what these things look like. I think a lot of them are around, you know, we've used it a lot for different things, not just purely coding but dealing with code review stuff or interacting with GitHub's API, so you see a lot of: oh, I tried to do this thing and it doesn't work like that, so you have to be careful when you use this API not to do such and such. So I guess the glue between things is where you normally seem to run into friction. There's quite a lot of stuff mainly around libraries and refactoring, and then other stuff I've got back up. We'll do a demo in a bit, and I'll reseed the thing quickly behind the scenes and show you afterwards what it looks like in the UI and stuff.
Yeah. Maybe unpacking just a little bit this construct of a unit of memory, which is an interesting topic in and of itself: how did you land on that as the thing to solve specifically, and what has made it so useful or valuable in the tools you're building?
Um, I think it was: what's the easiest way to do this? Because, I don't know, the root cause, I guess, is that the training data the models are using is mostly slightly behind, and it's always going to be slightly behind. And so it's like, how do you patch that with knowledge that's relevant at that point in time? So it just felt like, how do we give it the smallest amount of information to make me happy when I'm using these things daily?
Um, yeah, I guess that would be the approach, really. There's not any crazy science behind it.
No, that's fair. It's interesting just to think about. And then do you see people pairing this with other open source memory frameworks, to sit alongside that? What does it look like in practice if someone's going to enable this in their project?
So at the moment, like I said, we've had conversations with folks that are doing different things, and that was more on the standard. We've got the SDKs that let people hook in easily, I would say. So if you don't want to use it in an agent, like a coding CLI type tool, but you want to build it into your app, like if you've got an agentic loop in your app, we've got the Go and the Python SDKs, which can do all the stuff it needs to do. So it can emit the prompt, like the skills, so that it can tell the agent how to do the thing, and it can handle making calls to save stuff and all the rest of it. I said this a couple of weeks ago and we haven't got round to it yet, because we've got a lot of, I don't know if I should shout out all these products that we're building at the moment, but I really wanted to integrate it, and I still do want to integrate it, into a product we're building at the moment, where I think it would be really great as part of the flywheel, to let it run and get stuff emitted quicker.
Yeah.
But yeah, those tools are available for people. The CLI as well: you can just use the CLI. I'm trying to think what it can do. It uses the SDK under the hood, so it can do everything the SDK does.
That makes sense. Command line, why not, right? So I'm really curious: you started local first, and you're talking about going into a cloud-hosted version. Why did you start there as the framework, and then how did you think about expanding that as a shared memory across teams or workflows? How has that evolved?
Um, I think it was selfish at first. It was like, how do I fix this? I went for a walk one morning to drop the kids off at school, and on the way back I was listening to an audiobook, and someone said something in the audiobook that just flicked the switch, and I thought, oh, obviously people will have done this before, but at the same time, can we do it as Mozilla? Because it feels like if we can get it right and we can get people behind us, this is something we should maybe be involved with. So I guess that started a vibe of: how do you test something, how do you build a plugin for Claude, how do you then save something locally in SQLite and get it to check it? And then once you've done that a bit, okay, how do you look at sharing that with other folks in the team or whatever? So it was more a natural progression than a plan or anything like that.
No, that makes sense. I mean, it seems like more and more of the tools that I'm seeing are starting with a local-first mindset. Well, for a variety of reasons, right?
It's like, get it building locally, it's faster, easier, but also, from a local model perspective, you're not having to spend a bunch of inference trying to test out applications, which is kind of interesting. So there's a lot of reasons why, but ultimately the idea of having shared memory and shared applications becomes really powerful and important. So as you're thinking through the unit of memory, how that scales and how that becomes shared is actually an interesting construct to put in place. Maybe, if you're ready, do you want to pull up a demo and test it out and show it?
Yeah, I can give it a go. As you're saying that, I'm thinking I should be saying more related names here, just for anyone who's watching. So I was going to say, with all the local stuff, we do a lot of stuff around small models. When I say we, I mean everyone else who works there. But yeah, we look after llamafile, which a lot of people have heard of. We've got the any-agent library in Python and the any-llm library in Python, which are all playing in this area of people being able to just get on with stuff without being tied into certain things. Anyway, sorry, you did ask for a demo.
I mean, I think it's good to call those out, because you all are doing some really interesting stuff, and so it's good to have that on the stream as well. Maybe we can showcase that at a later time. So yeah, let's pause the demo and let's walk through it.
Okay, one moment. Sorry.
No worries.
Okie dokie. Cool. Pulling that up. This definitely won't go wrong. I did do this demo a little while ago and I did the thing you're not supposed to do, where moments before I messed with something locally. Got to put it back, and it didn't quite work as expected.
I have done that before.
It's like, oh, what am I thinking? So yeah, the idea of this is that, if I just show you in this repo, there's not much going on really. Give you a quick look at what's going on here. So there's just some basic Python stuff in here. Nothing special. And what the plan is, is to, I guess, start up Claude. That might be a good idea. Oh, actually no, let's do the demo properly. So you have to install this plugin, right? We can do that with this claude plugin marketplace add and then Mozilla CQ. Oh, there you go. It's already on disk. Thank goodness for that. And then once you've got the marketplace installed, you can install the actual plugin, which is called CQ. And it says, "Yeah, that's great. I've done that." Okay, wonderful. Right, now everything will be wonderful. So I can do claude. For everyone else at home, once you've done this, normally you can just get on running it locally at that point. It's completely fine. But, as the readme on the GitHub repo explains, you can also set the environment variables for your agent if you want to connect it to a remote. So you can give it a CQ address to point to and an API key to be able to tell the remote who it is. I've done that behind the scenes, just full disclosure, but normally you wouldn't have that and you would just be saving these things locally. So let's see if it works for a start, and then what we'll do is I will ask it the same thing I asked it last time, which is to add some GitHub Actions. So if I say, add a GitHub Actions CI workflow to this, which, this is where it lives, just in case anyone cares. You did want to see what the code was. There's nothing much to it, just to run the tests. And if I kick this off, what's happened in the past normally is that it'll happily go and do these things. Oh, good. It's asking if it can use the CQ skill. Yes, of course.
You may always use it. So, you should. And hopefully it'll now say, can I also use the query? And we'll say, yeah, you can. There we go. So it's going to ask if CQ knows anything about CI or GitHub Actions or uv or Python or pytest, for Python languages. It's handy. Unfortunately, I don't think it's going to get anything back. So it'll probably just say nothing, I'm going to get on with it, and then we'll get the kind of normal world of... There we go. No, no, right.
And so, starting point.
Yeah, the same pain that hopefully everyone has. I mean, I say this, and what's probably happened is Anthropic fixed everything behind my back recently. But normally, because stuff moves so quickly and the training data doesn't, it'll say, "Oh, I confidently know how to write a GitHub Action," and it'll do it, but it might use a version of, say, the checkout action that's one major version out of date. Not even just, oh, it's not quite right, which also in this example might be fine, like it's something that you have to use the latest versions of stuff. But it was just a thing that I saw a lot when we were playing with this, and I was like, "Oh, it always does this." One of my colleagues, Devday, shout out to Devday, also had a really good example, which I'm not going to try in case I get it wrong, which was where he was asking Claude if it could update its own configuration for an MCP server, and just watching it go wrong about five or six times before it finally figured it out. So yeah, we'll just let it play around in here. Cool. And then once it had done that, it figured out, okay, I could have done this in the first instance. Don't worry. It's just one real quick. Yeah, the folks that, uh, Claude Code there just using a bit red scare everybody. So, here we go.
We'll have a look at the diff when it's finished, but, uh, yeah, you're okay. You can try. Don't know what it's up to. And yeah, you can run that. I mean, why not?
I always find it interesting what it recommends and when to run certain things. Yeah, that looks right.
Yeah. So you're reading the command, because it likes to do a little thing where it'll stick two things together. And then you're like, hang on, what are you doing at the end? And you're like, all right, okay. It's trying to fatigue me, so I'll just say yes to all or something. But okay, cool. Right. So what did it do? It said, "I've added this." Okay, that's great. And blah blah blah. And here we go. I can already see here, it was like, I used actions/checkout version 4 and Astral setup version 4. And we will check. You have to excuse me, I'm not going to switch over, but if I just look on GitHub at actions/checkout right now, it's currently on 6.0.2. So it's kind of out of date.
Yeah, I think that's fair.
So what we'll do is we'll say, okay, you've just used some stuff that's kind of out of date. So I'll just paste this in. I'm glad this worked. You will see. So I said, the versions of GitHub Actions used are a major version out of date, exclamation mark. So serious. And then I'm really kind, and so to save it going off, I'm like, you should just check these places. I already know they exist. It's like, yeah, I'll go and have a look at the latest release tag. Like, you sure can, go on then. And it's, oh yeah, six. That's a bit of a blow now. Okay. And uv. Yeah. Sorry, I'm just very reluctant to just press yes to all.
I definitely can appreciate that. I'm not the type to just, you know, keep pushing yes. Yes.
Faster. Okay. So it's changed some stuff, including using a more up-to-date version. Yeah.
Oh, version eight. Oh, okay. So yeah, we can let it do that. You can run your mad Python from the command line. Yes. And what a wonderful demo: it says, "This is exactly the kind of thing CQ's for." So I should tell them about that. So it's going to be like, can I use this plugin? I'm like, yes, you can use the plugin. And then that's perfect. I have great happiness in my heart. I'm not going to lie, Kevin. Doing a live demo is terrifying.
And everything lines up exactly as you want.
Yeah, it's perfect. Let's have a look. Although it said it missed something. I'm not quite sure what it was saying there, but we'll find out. Here we go. Right. Okay. It did some stuff. And then, did you propose that? Kind of weird. Oh, no, it did. Sorry. If I just read what it told me. I see. Sorry, Claude. It did tell me and I just got tired of reading. So yeah, it did propose this thing, which it's been given an ID for.
So that's the knowledge unit right there.
Right. Yeah, exactly. And so if I do, like, CQ status, we will see. Yeah, you can use that. That it's got this one locally. Here we go. Oh, there you go. I told you this as well before, where I was like, "Oh yeah, you know, I configured this thing behind everyone's back, I'm really clever," but I put an API key in there that's absolutely no use. So I'll fix that at the end. But for now, what that means is it falls back to saving it in this kind of offline local cache. So that's great. Okay, cool. So it's got the thing here. And now what we can do is, if I just get out of here and do git status, you can see that it fiddled with that thing. And if I just do git reset, oh, with all the typing skills, and okay, now it's gone. And if I start a new session with Claude, so no --continue or -c, it's back and it's, you know, fresh as a daisy. It's not going to make any mistakes.
And we say this time, please do the same thing as we did last time. Then what I hope we should see is that it says, I'm going to query CQ. It's loaded. Let's look. And I'm hoping that a query then finds this information and says, "Okay, that's going to save me some time." So let's have a quick look. Oh, okay. There you go. It did find this one. Here we are. KU says, "Verify current major version before." "Let me check the actual latest versions for the current thing." So yeah, okay, you can go and check some stuff. And then you can see already it's just found, like, six and eight. So it's like, okay, cool. And we'll let it check the other one as well. Yeah, here we go. I need music or something just to take the edge off while it takes its extra few seconds.
You need the, yeah, the waiting room music. Please hold.
It's good. It's talking to GitHub about something and then talking to Anthropic about... We'll see.
Yeah. This is a lot of thinking. I'm very curious now.
Okay, cool. So now it's like, "Oh, watch me." I'm like, yeah, okay, you didn't do yes to all; you edit all those files and make yourself happy. And I'm just going to, in the background, generate myself an API key, which I'll explain later. Oh, but I'll also say, yeah, you can run your luck. Off it goes. Oh, and then it wants to say, okay, tail. Yeah. Yep, that's fine. Oh yeah, you just too easy sync. Oh, I finally gave in and said yes to all, didn't I? Knew that would happen at a certain point.
Yeah, how many yeses do you need to keep pushing?
For sure. Okay. Like this, where it's like, this is really what it's doing, but then you get asked about this. You're like, "No, Claude, I don't want to be tricked into this."
Yeah, exactly. It's always about staying on task.
I promise you, we're nearly there. Uh, uv run. Oh, yeah. Whatever. I mean, if it goes wrong at this point, it'll be fine.
I think you'll have most of the value. Yeah.
Uh, come on now.
It's a lot of thinking, a lot of fiddling. I'm still just excited because it got the thing that saved a lot of time up front. Oh, here we go. So, now it says, "Now, let me confirm the KU, since it actually did work." And it's also learned something else. So that's interesting. That's cool. So you can say, yeah, you can do this. So it's going to call confirm on that KU it got, which told it about the major versions. That's going to be handy. And then it's going to propose. I think it'll ask if it can propose a new one and then we'll see. It should give us a little summary. What is the key action that proposes a new knowledge unit? Do you have anything under the hood that makes it smart around how it identifies new knowledge units, or what's the heuristic that you're using? No, it's more the skill. So it's the wording of the skill, which obviously leaves it up to the interpretation of the LLM. So in the open source repo, you can see the main skill, and there's a whole section there which is, when should you propose something, and it says... sorry, I was just trying to quickly open the web page to see if I can find it. It's just interesting because I'm curious when and how it should identify when a knowledge unit should be created, and also, do you run into situations where it proposes a knowledge unit and you're like, actually, that's not really helpful, or not something that we would want to include, because it could, to your point earlier, distort or misguide the LLM in the future because it's just not helpful context, you know. Yeah. So, yes. A couple of things to try and cover, I guess. So, yeah.
So in the skill there's a whole section that says, when you discover something that you think would save another agent time, call propose, and then there are some examples: you discover undocumented API behavior, you find a non-obvious workaround, blah blah blah. So it tries to give the LLM and the agent something to say, here's the criteria, here's some good examples, here's some bad examples. We've also just put out a blog post really recently, which had a co-contributor on it. So Daniel, one of my colleagues, put out a blog post with Lauren Mushro, which was trying to bake in this vibe check, as we call it, that's what it's called, into CQ, so that when it goes through the stage where it wants to propose something, it runs what it's about to propose through this five-stage vibe check to try and make sure that what ends up getting up there doesn't include PII and stuff. So on our blog, Mozilla's blog, you can find that there, which is quite an interesting one. Was there something else you asked me as well? I don't know if I didn't cover it. Well, just, if it ends up distorting memories because you end up consuming things that you may not need down the road, or is there a way maybe to remove things in the future, or maybe there are even examples where you say, "Hey, keep everything, but then you still just need to steer it over time." I'm just curious how you think about that. Yeah. So, yes. If it gets given some KU that it validates and says that's nonsense, like say it was, "Can I just go and check this GitHub thing?" and then it comes back and it's nonsense, then the skill protocol tells it, well, you should flag that KU and say that it was stale or it was incorrect or whatever.
And that overall, in the back end, will affect the confidence scoring and the relevance scoring and stuff. And also, when you've got it offline like we have at the moment, everything just goes straight into the SQLite database and it's there. When you've got it configured with the remote, which I'll show you in two seconds' time, you get a review. So you can say yes or no. And until it's approved, it won't show up in the queries for any other agents. So you've got a human-in-the-loop stage there at the moment to vet that. And then the idea is that with our hosted platform as well, we'll be able to have different heuristics around what eventually might mean that something needs to be re-reviewed and all that sort of thing. So there's a lot of fun stuff we can do. So yeah, hang on a sec. Oh, sorry, Claude's going off on a tangent. I don't think it's helpful to carry on, because it's basically like, "I didn't make the tests yet because of this other thing and something else." It's like, okay, but we saw that it did actually figure out the CQ side of things. So, did it write stuff? Oh, yeah. It did some stuff. Anyway, what we can do though is, if I stop Claude altogether and then start it, when you start Claude up the first time, it obviously starts all its plugins. So you can see here in the plugins, installed, CQ, and then if you look at... oh, sorry, struggling. Normally you can also see that it's got an MCP server as well, which, why can't I... anyway, sorry, MCP, okay, there. So you can see that it's got an MCP server running, and once it starts up as well, it normally drains all the things when it's got an API key that's valid and not a bad one. And so behind the scenes while we're doing this demo, I tried to update it in my config.
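As a rough illustration of the confirm/flag feedback and the review gate described here, the sketch below models a knowledge unit whose confidence moves as agents confirm or flag it, and which stays hidden from other agents until a human approves it. This is not CQ's actual schema or scoring: the class name, field names, and the smoothed-ratio formula are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    """Hypothetical stand-in for a KU; field names are illustrative only."""
    ku_id: str
    summary: str
    approved: bool = False          # set by a human reviewer on the remote
    confirmations: int = 0          # agents that found the KU correct
    flags: list[str] = field(default_factory=list)  # reasons, e.g. "stale"

    def confirm(self) -> None:
        self.confirmations += 1

    def flag(self, reason: str) -> None:
        self.flags.append(reason)

    @property
    def confidence(self) -> float:
        # Assumed formula: a smoothed ratio of confirmations to all feedback.
        return (self.confirmations + 1) / (self.confirmations + len(self.flags) + 2)


def visible_to_other_agents(units: list[KnowledgeUnit]) -> list[KnowledgeUnit]:
    """Only approved units are returned to agents other than the proposer."""
    return [u for u in units if u.approved]


ku = KnowledgeUnit("ku-1", "Verify current major version before pinning an action")
ku.confirm()          # an agent used the KU and it worked
ku.flag("stale")      # a later agent found it out of date
print(round(ku.confidence, 2), visible_to_other_agents([ku]))
```

The smoothing keeps a brand-new unit at a neutral value instead of dividing by zero; any real scoring would presumably also weigh relevance and age.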
Let's see if it worked. So what I'm expecting is that actually, yeah, it can talk to the remote and it'll say, okay, I don't have these things locally anymore. Yeah, there we go. Right. Nice. Excellent. So the thing I didn't cover, and we won't bother talking about right now, is, say for example you do loads of stuff and for some reason it doesn't propose things at the time when it finds them, and, you know, Claude's the victim today, but you've fought with your coding agent or assistant of choice, and at the end of your session you're like, I've had enough of this, there have been so many mistakes. I won't do it now, but you can do the CQ reflect, and that again is a skill-driven thing where it says, go through your session and find those problems and then abstract them and propose them, and it'll give you a little table, like, which ones of these do you want, and you can say number one and number three, or I want to edit number two first, so you can tweak them. But yeah, we won't do that right now, and I'll flip over just quickly. Sorry, for this right now, have you found whether certain models are better or worse with CQ, or do they all kind of work the same? It's hard to say, because... let me just switch that over so I don't forget. Let me know, can you see that? Right. Okay, cool. It's hard because, I don't know if there's a scientific phenomenon or something, but it always feels like you're using something and just when you trust it and you're like, this thing's getting good, it's almost like they twiddle the dials behind the scenes and it just goes to pot. So, we've been using Claude a lot.
I've been using Codex, playing with Gemini and things, and they all have moments where they seem really, lucid isn't the right word, but, you know, on task, and then other times when they go all over the place. So I think it's a bit up in the air. Weirdly, I've seen, maybe until recently... I think now we've made some tweaks to the skill as well, because it's really hard. You know, we talked before about all the rules people stuff into the CLAUDE.md file. It's kind of the same with your skill. Everyone gets a tiny masthead to advertise to Claude what your skill does and when it should be called, and so everyone's vying for attention. But I think we've got it right now for when it should be getting triggered. When I was playing with OpenCode and using Codex, it was quite happily querying CQ regularly, whereas I had noticed before that it wasn't doing it as much. But it's much better now. So that's good. So no is kind of the answer. I don't know if any one's better than another. Anyway, sorry, I'm running this locally. We'll talk about it publicly, I think, on Monday now. I wanted to show you the prod one, and it is accessible, but it just wasn't going to be as fun for the demo. Sure. So this is my locally running version of the remote CQ instance. So, if I log in with GitHub and log in as myself, it'll say welcome back. Here we go. So, I can sign into the CQ remote, and here you get this kind of view. So, okay, we've got two KUs, which you might remember from before, two pending a review. So, we can go and see what's going on there. And then, should I make this a bit bigger? Yeah. So you get this little inbox type view. And then you get to see, okay, this is what happened.
And then it explains what the deal was, and then it tells you what it thinks is the recommended action. At this point, when stuff's initially come through to review for the first time, there's only ever going to be one confirmation on it, because it's from the agent that proposed it. But later, you might find that something's been confirmed or flagged and ended up back in re-review. And so this is where we have this human-in-the-loop stage where we would say, okay, is there anything in there that we think is going to trick an LLM, that hasn't been caught by any of the guardrail checks, or PII, like personal data that we don't want other people to accidentally query. And then once you're happy with that, we're like, okay, we read that, it makes perfect sense. Then we can say approve and then confirm. Yeah. Okay. It's saying once you confirm this, any agent will be able to see this thing. Mhm. How do you think through governance and policies of who should have access to certain things, or is that up to the team to choose what should get shared? So yeah, there's a little mini internal roadmap. I think on Monday we'll be able to talk better about what I'm showing you now, which is what I call individual contributor, IC, private namespace type vibe. When we roll out the phase after that, we want to make it multi-tenancy for orgs as well, and then I think that's where you see more of the policy stuff come in. So for now, if I approve this, all it means right now is that any agents that I've given access to can read it. What's going to happen with this as well is we're going to seed what we call a commons namespace, which will be available to every agent.
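The visibility rules described so far (a private namespace per contributor, approval before anything is shared, and a planned commons namespace readable by every agent) can be pictured as a filter over stored knowledge units. The sketch below is only an illustration under assumptions: the `KU` class, the `"commons"` namespace name, the `score` field, and the `query` function are hypothetical and are not taken from the CQ codebase.

```python
from dataclasses import dataclass

COMMONS = "commons"  # hypothetical name for the shared namespace


@dataclass(frozen=True)
class KU:
    """Illustrative knowledge unit; not the real CQ schema."""
    ku_id: str
    namespace: str
    domains: tuple[str, ...]
    approved: bool
    score: float  # assumed pre-computed relevance/confidence score


def query(store: list[KU], caller_namespace: str, domain: str) -> list[KU]:
    """Return approved KUs for a domain from the caller's namespace plus the commons."""
    hits = [
        ku for ku in store
        if ku.approved
        and domain in ku.domains
        and ku.namespace in (caller_namespace, COMMONS)
    ]
    # Highest-scoring units first.
    return sorted(hits, key=lambda ku: ku.score, reverse=True)


store = [
    KU("ku-1", "peter", ("github-actions",), True, 0.8),
    KU("ku-2", "peter", ("github-actions",), False, 0.9),    # still pending review
    KU("ku-3", COMMONS, ("github-actions",), True, 0.6),
    KU("ku-4", "someone-else", ("github-actions",), True, 0.95),  # private to another user
]
print([ku.ku_id for ku in query(store, "peter", "github-actions")])
```

Here the pending unit and the other user's private unit are both excluded, which matches the behavior described in the demo: nothing unapproved, and nothing from someone else's private namespace, comes back from a query.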
So when you query CQ, like this one for example, what you get is a superset of all your knowledge that's relevant and anything that's in the commons that's relevant, combined and scored and then given back. Yep. And then eventually we're going to also add all the nomination and graduation processes. So there'll be a lot of cool stuff coming. It's all coming fast. But at the moment it's set up so that anything I point at the real URL with a valid API key, any different agents on different machines, will all be able to query and get stuff back for my own KUs that other people can't see. So yeah, I can either reject or approve this. I could say, "Okay, no, not that one." And then you can see on the overview what's going on. We've rejected stuff, we've approved stuff. You can also go in and say, "Oh, mistakes were made, look at that. We'll either send it for re-review, for someone else to look at, or we'll just approve it. It's fine." And then you've got, well, you won't have an admin thing, this is a platform-level thing, but you've got API keys. So, this was the one I just made earlier while we were chatting, but you can create an API key for an agent. Give it a name and then give it how long you want it, a TTL before the API key's revoked, and some other stuff. So, I can do, I don't know, 90 days, and then we promise that we copy this thing. This will be gone, so it's totally fine. And then you can drill in and search for active ones, or we could say, let's revoke that one now. So anyone who's using that key can't get in anymore. So it's just a way for you to be able to manage how you're delegating your authority to agents, because for query, propose, flag, confirm, you need an API key, which you get from this.
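The API key handling described here (a named key per agent, a TTL such as 90 days, and manual revocation) follows a common pattern that can be sketched in a few lines. The names `ApiKey`, `create_key`, and `is_valid` are hypothetical and chosen for this example; nothing below reflects how the CQ platform actually stores or checks keys.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ApiKey:
    """Illustrative per-agent key with an expiry and a revocation switch."""
    name: str
    token: str
    expires_at: datetime
    revoked: bool = False

    def revoke(self) -> None:
        self.revoked = True

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return not self.revoked and now < self.expires_at


def create_key(name: str, ttl_days: int, now: Optional[datetime] = None) -> ApiKey:
    """Issue a random token that expires ttl_days after 'now'."""
    now = now or datetime.now(timezone.utc)
    return ApiKey(name, secrets.token_urlsafe(32), now + timedelta(days=ttl_days))


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
key = create_key("laptop-agent", ttl_days=90, now=issued)
print(key.is_valid(now=issued + timedelta(days=30)))   # inside the TTL
key.revoke()
print(key.is_valid(now=issued + timedelta(days=30)))   # revoked early
```

Every operation an agent performs (query, propose, flag, confirm) would check `is_valid` first, so revoking a key cuts off that agent without touching any others.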
So yeah, that's where we're at at the moment with stuff. I'm curious, as you go through that, who do you think will be the primary on that? Is it the IT teams that are managing engineering teams, or a DevOps team? Who ends up kind of being that, because on the one hand you want the engineering team to be able to validate the actual shared context, but IT probably wants to manage control of, or security for, who has access to certain KUs and whatnot, you know. So yeah, exactly. Again, we've got a lot of stuff that I think is going to be published very soon, so we've got security notes and things we want to recommend. I guess what it'll eventually boil down to is, with the IC mode, you review the things and they're just there for you. With the org multi-tenancy stuff, then it gets to what you're saying, somewhat more tricky, where we'd say we want to go through a double-blind review, you know, where the whole team might have access, but then maybe certain people are nominated to be reviewers, and between them, as long as two people approve that thing, for example, and they don't know what the other people said, then it gets through, et cetera. And there's all this other stuff we want to do around putting in automated guardrail pipelines and stuff so we can vet certain things beforehand. So yeah, this is all happening at the moment, but that's just to show you the demo really. And I guess now, actually, if I go back here and, well, let's just see what one of the domains was. Sorry, we go into approved and have a look for this one, and it's CI, GitHub Actions or something. So, if I just ask Claude, quickly switch back, sorry, to the terminal, and then I promise I will stop doing demos.
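The double-blind review idea mentioned for the org phase (nominated reviewers, at least two approvals, and no reviewer seeing the others' decisions) is still planned work, so the following is only a speculative sketch of the rule as stated in the conversation. The `BlindReview` class, its methods, and the threshold constant are assumptions made for illustration.

```python
from dataclasses import dataclass, field

REQUIRED_APPROVALS = 2  # "as long as two people approve", per the example in the talk


@dataclass
class BlindReview:
    """Illustrative review record for one proposed knowledge unit."""
    reviewers: frozenset[str]                       # people nominated as reviewers
    _votes: dict[str, bool] = field(default_factory=dict)

    def vote(self, reviewer: str, approve: bool) -> None:
        if reviewer not in self.reviewers:
            raise PermissionError(f"{reviewer} is not a nominated reviewer")
        self._votes[reviewer] = approve

    def votes_visible_to(self, reviewer: str) -> dict[str, bool]:
        # Blind: each reviewer can see only their own decision.
        return {r: v for r, v in self._votes.items() if r == reviewer}

    def approved(self) -> bool:
        return sum(self._votes.values()) >= REQUIRED_APPROVALS


review = BlindReview(frozenset({"ana", "ben", "cat"}))
review.vote("ana", True)
review.vote("ben", True)
print(review.approved(), review.votes_visible_to("ana"))
```

A fuller design would also need to decide what a rejection does (for example, whether one "no" blocks the unit), which the conversation leaves open.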
No, this is great. As you're talking through this, it's an interesting one that we've been thinking about and trying to understand: how does the new agent-first developer flow work? And I actually think one of the things that you capture, especially in the cloud-hosted version, is this ability to capture the developer intent with the agent reasoning and the control and changes, and then log the context of why that is now part of the memory itself and the knowledge unit. And I think that's interesting because part of the new developer flow is that it's not just the PR and the commit of the code itself. It's actually all the other information that is attached to that, which has to be encapsulated in some sort of block before that gets pushed into and utilized within the agents going forward. Right. And I think that's actually really interesting. Yeah. Sorry, when you were talking I tried it, and it claimed that there was nothing there, and I'm like, okay, I'm not going to dwell on that. We'll come back to that later. No worries. Yeah, very quickly you should be able... Yeah. cq query. Uh-huh. So I'm just going to copy. All right. Okay. So if I do cq query --address, and what does it want, domain, and, I mean, this is fine. This is my local API key. Fear not. Stick that in there just for... Okay, not doing well here, am I? Yeah, I need to fix something else. And I nearly got through a full demo without running into a bug, because we definitely approved those knowledge units. So, they should be available via the query. So, yeah, I'll stop sharing. I went too far, didn't I? I pushed my luck too far. Ah, it's all right. It's part of the fun. Super, super cool. And then, I mean, obviously there's a full-fledged roadmap. Is there anything that you're super excited about?
Are there other things that you're looking forward to with where this project could unlock memory with these agents? I'm curious to see where you think it's going and what you're most excited about. Yeah, I'm not an academic person, but part of it is that it feels like there's an academic element to some of the stuff we could learn here that's quite interesting. And then the other side of it, from a more product-focused view, is that locally running version of the platform that we're going to have out. I think it opens a lot of doors. Being Mozilla AI, we're massively focused on the principles, and the Mozilla values are baked into some of the stuff we do. So a lot around choice and privacy and all that sort of stuff. So we're trying to be really careful about saying, okay, how do we provide this platform where people own the KUs they create and put in their private namespace, and they can export those, and obviously they follow the schema, so they could take them elsewhere. But then we're trying to make sure that we word it in such a way that we give ourselves enough room to ask people who are using our platform, okay, do you want to opt into this feature that we're trying to build, and then maybe that's going to help. So things I could think of off the top of my head, with what we're thinking about now: okay, how do you do deduplication of KUs? In reality you might run into something and it's kind of similar, or you propose one thing, then propose something that the other one hadn't found yet, or whatever.
So how do you merge them and combine them? How do you do this thing where you generate a new KU that supersedes the other ones? The schema supports that structure right now, but how do we build that functionality? That's what I was saying: could we do it in the platform first and show how it would work? And so even to offer that to users, to say, "Hey, do you want to dedupe across your own namespace and look at all your stuff?", because we could potentially offer that to people. Or even to say, okay, none of the KUs should contain personal data or stuff like that, ideally, and when the agent follows the skill it shouldn't anyway. Obviously we'd want people to opt in if we want to do something like offer cross-namespace scanning, where we say, actually, we found six things that are the same. So what about if we took those six things and basically created one that supersedes them and put that in the commons? Then everyone else doesn't need to hold on to those things. So you distill the knowledge that's relevant for you, and everyone wins because it's in the commons, available to everyone. So all these sorts of things are quite interesting, like, what are these features that we can build that would help people? But I mean, also, it's really cool. Yeah, I mean, that is really cool. There's a lot to be thought through on that one. One thing I'm curious about: locally you're using SQLite, but are you testing other technologies to help scale differently, or thinking about some of these frontier use cases? So at the moment we've got SQLite for the local stuff, and the open source server uses SQLite and, kind of, Postgres.
We've had one of the open source contributors helping to RFC that and build that out. And one of my colleagues has been trying to implement the semantic search stuff. So we haven't really gone deep into anything else yet, because we're trying to just get stuff up and running and tested and put things through it and that kind of thing. But yeah, I said this to someone else before as well, I feel like it's going to be really interesting, not about what people are doing or agents are doing, but more how the knowledge comes and goes over time, and what sort of knowledge becomes relevant, or how often. I think there's a lot of different ways you can slice what could end up in a commons that's available to everyone. Yeah. That could maybe then feed back into academic stuff. I would imagine, yes. I mean, I think it goes to what cognition looks like, and some of the memory and general psychology around how memory gets managed in the brain itself. So yeah, I think there's probably some interesting stuff. I love it. And then maybe shifting gears a little bit: how can folks engage with the project? Where do you need community help right now, or where do you see that going in the future? Yes. So on the open source repo, we'd love people to get involved. On there there's a development guide for people who want to try and build it themselves. There's a contributing guide, I think, to give people some tips on getting started. I guess the first step is: clone the repo, install the plugin, play with it. If you find issues, that's a great place to start if you don't want to get your hands dirty but you're happy to file an issue.
If people want to help with development, there are quite a lot of GitHub issues that we haven't got round to yet. We are working on everything though. It's not being forgotten about. But there are ones in there that I think maybe we could go through and retriage, and then there might be some easy things to pick up. Yeah, there are loads of ways. I mentioned just there the Postgres stuff. We had a contributor raise a GitHub issue to say they wanted to deploy the self-hosted version, the open source version, in Kubernetes, and that just doesn't fly with SQLite very well. It was a totally valid use case and they were happy to help do the legwork. So we asked them if they could open an RFC in the discussions part of the repo, and we specced it all out together and then created all the issues, and they're helping work through that. So they're going to help themselves and everyone else who's using the product. That's awesome. So there are definitely loads of different ways. That's great. I love it. And then maybe as a call to action, where can people find you? Where can they find the project? I dropped the project repo and the blog post, but are there any other areas you want me to share with the listeners and streamers? Yeah, I don't know. I need a better social media presence. I think I went away from it all about 10 years ago and now I'm having to sign up again for stuff. I think I've got an account on Mastodon and on Bluesky, like Pitky22, same as my GitHub handle. So yeah, you can reach out there, or I'm on LinkedIn, and there's Mozilla AI. Awesome. Yeah, that sounds good.
Me directly, and this isn't a shrug, but like I said before, there's a team of folks here who are all amazing. So anyone that you manage to get a hold of from Mozilla AI is going to be super useful. I love it. That's great. And then any other parting words for the folks tuning in and watching that you'd like to share? Well, thanks for coming along. I mean, hopefully this is a project that people could get involved with. They don't have to get their hands dirty, like I say, but maybe even just using it, it might become something that they find useful. Even just running it locally, nothing's going to leave your machine to come to Mozilla. So you are in control of those KUs. And maybe you start realizing that it's a good thing, or maybe it's a bad thing, and you get in touch with us and tell us why, and that helps us either way. But yeah, I would hope it maybe just inspires people to try stuff, maybe even to be more open to some of the stuff around AI-assisted engineering and the pros and cons. That's awesome. I love it. I'm super excited. Well, thank you for joining us today. I think it's been really interesting to talk about what knowledge will look like on agents. So it was really exciting for me personally, and I think the folks that tuned in got a lot to learn and hear about what you're building. So thank you again. To all the streamers, thank you for joining us today as well. We'll see you next week on Open Source Friday, and have a wonderful weekend. We'll talk to you all soon. Thank you. Thanks. Bye bye.
