Rubber Duck Thursday: Agentic Loops and Context Engineering
Chapters: 10
Host greets viewers, explains the casual format, and sets expectations for a friendly, cross-platform coding discussion.
Rubber Duck Thursdays dives into agentic loops, context engineering, and hands-on use of Copilot across IDE and CLI with real model comparisons.
Summary
Marlene hosts a breezy, hands-on session unpacking how AI agents operate in coding workflows. She defines an AI agent as an LLM that calls tools in a loop to reach a goal and highlights the agentic loop’s stages: gather context, take action with tools, then verify results. The talk centers on gathering context in VS Code (using the plus sign to attach repository data and issues) and the contrast between IDE and CLI workflows with Copilot. Marlene walks through practical setup steps for a simple agent in Python using the Copilot SDK, including starting a session, sending prompts, and defining behavior via Copilot instructions or a custom agents.md file. She demonstrates the Copilot CLI banner, compares model choices (GPT-5.5, Claude Opus 4.7, Gemini 3.1) on tasks like generating SVGs, and shows tool use with the Excalidraw MCP server to explain MCP concepts and create diagrams. The discussion touches on open models (Kimi, Gemini) and emphasizes tool-enabled reasoning via MCP as a core advantage. The stream also covers workflow choices: keeping both CLI and IDE open, planning for context-rich prompts, and touching on test-driven development (TDD) as a workflow companion. Finally, Marlene points to learning paths like Microsoft’s MCP for Beginners and urges trying the Copilot SDK, Copilot CLI, and various MCP servers for hands-on practice. The chat atmosphere stays playful, with frequent audience polls (IDE vs. CLI) and a few fun live experiments with pelican-on-a-bike SVGs.
Key Takeaways
- Using an agentic loop helps ensure AI agents reliably gather needed context before acting, improving code quality and task success.
- Copilot in VS Code and Copilot CLI share a common underlying loop; context and instructions (Copilot Instructions vs. agents.md) shape the agent’s behavior across tools.
- The Excalidraw MCP server demonstrates practical tool use for models, enabling reads of documentation and dynamic diagram creation via a formal tool interface.
- Models (GPT-5.x, Claude Opus 4.x, Gemini 3.x) can be swapped in and out to compare performance; GPT-5.5 is highlighted as a strong coding-focused option.
- Context provisioning (repos, issues, and explicit agent instructions) dramatically affects how effectively the agent completes tasks in both IDE and CLI workflows.
- A dual-setup approach (keeping IDE and CLI open) helps users verify outputs visually while maintaining fast command-line iterations.
- The session endorses hands-on MCP-based learning paths (Microsoft MCP for Beginners) and practical use of Copilot SDK/CLI to build, test, and iterate agents.
Who Is This For?
Essential viewing for frontend and backend developers experimenting with AI coding agents, especially those who want practical, hands-on comparisons of IDE vs CLI workflows and how to engineer context for better agent performance.
Notable Quotes
“An AI agent is an LLM that calls tools in a loop to achieve a goal.”
—Definition of an AI agent and the core loop that powers coding agents.
“The first thing we do is… you’re going to send a prompt to the agent and the agent is going to gather the context it needs.”
—Explains the agentic loop steps, starting with context gathering.
“Copilot CLI is magical… the little banner that comes up, and the playfulness of the blinking.”
—Describes the Copilot CLI experience and its UX charm.
“Excalidraw MCP server to explain MCP to me… the model should then generate an image.”
—Demonstrates tool use via MCP servers and diagram generation.
“I like to give my agent access to a file… use this file as an example.”
—Shows how to seed context for agents with example code files.
Questions This Video Answers
- How does the agentic loop improve AI coding agents in VS Code and the CLI?
- What is MCP, and how do MCP servers like Excalidraw's work with Copilot?
- How can I use the Copilot SDK to build a simple custom agent?
- What are the pros and cons of using Copilot CLI vs. the IDE for AI-assisted coding?
- Which models (GPT-5.5, Claude Opus 4.x, Gemini) perform best for tool-use tasks in coding agents?
GitHub Copilot · Copilot CLI · GitHub Copilot SDK · Agentic Loop · Context Engineering · MCP (Model Context Protocol) · Excalidraw MCP server · VS Code · Playwright · Open Models (GPT-5.5, Gemini, Claude Opus)
Full Transcript
Hello. Uh let's see how we can do this. Uh, I'm going to wait to make sure that I am streaming correctly on the different platforms. Hi everyone. How are you doing this morning? Um, hang on, hang on. It might take a second. I'm gonna just give it a couple of minutes. Okay, can everyone see me? I'm going to give it like a second or so. Perfect. Okay, I think it is starting and I think you can see me now. Hi, I see a lot of people in the chat. Someone from Nigeria. Hi, Salom from Nigeria.
Nice to see you. Um, where's everyone calling in from? I saw someone from Brazil. I'll show that. Hello from Brazil. Where's everyone watching? Uh, no, I'm not from Nigeria. I am in London right now. Um, this is where I am calling from today. And hi. Oh, wow. We have someone from Afghanistan. Nice to see you. Just let me know where everyone is calling in from. Good to see you. Wow, someone from Ukraine. Amazing. I love this. Someone says, "Hi, I'm a first timer. What's this about?" Um, this is Hello everyone. This is Rubber Duck Thursdays.
This is, if you are someone that's new, um, welcome. This is just a stream where we're going to chat about programming. We're going to chat about open source. We're going to chat about AI. So, this is going to be a very relaxed stream today. Um, my name is Marlene and um, yeah. Oh, wow. I'm seeing so many people joining in right now. Someone from the Netherlands. Hi from the Netherlands. Someone from Pakistan. I love that. Someone from Dubai. The weather must be amazing in Dubai. Hello from Turkey. Wow, I love that. Denmark. Hi. Hi. Denmark is really close to where I am.
Someone from Peru. Hi Carlos from Peru. Um and Misbr UK. I am actually in London right now. So and hi everyone from Brazil. Okay. So today my stream is really going to be pretty relaxed. I am just going to be showing you some things I have been working on recently. Oh wow, I see someone from Nepal. I just had to shout out Nepal because I've never been there before. Hi, I would like to go there one of these days. So, it's nice to see you on the stream. I am going to also put my screen on the stage.
Um and oh, okay, one more. Hello everyone from India. There's a couple of people joining us from India. Where are you joining us from? I have been to India, by the way. I love Goa. I was there by the beach and I really liked it. So, okay, let's start out with the stream. So, my goal today with this stream is going to be to just walk us through, um, some things that I've been thinking about. So, what we have on the screen right now is something called an agentic loop. And this is, you know, pretty common, uh, in tech.
And I'm going to try and make it a little bit bigger, but I feel like that's probably okay. This is probably fine. And we are going to be talking about how this agentic loop works. So I don't know how many of you are using coding agents. So if you're watching this right now, you'll probably know that we have GitHub Copilot, um, that is very popular. Of course, we provide it in VS Code, and there's multiple different ways for you to use coding agents, AI agents, in your work.
And so there's lots of different surfaces. You can use a coding agent in your IDE. You can use one in your CLI. So, we have Copilot CLI. You can use one on the web. And you can also build out custom coding agents. And one of the things that's really nice is that the coding agents all tend to work the same way under the hood. So, how they actually work, uh, and how they're created is very similar. And I think today we're going to talk about what that looks like. For me, and I'm not sure about everyone else, but I can do a better job when I'm using things like coding agents if I feel like I understand how they work.
So I would like for us to go over what AI agents are, and then we are going to actually walk through the first step of something called the agentic loop. So what exactly is an AI agent? Hopefully you've heard of this definition before, but an AI agent is an LLM that calls tools in a loop to achieve a goal. So that is a good description, I would say, of what an AI agent is. And then this loop that you can see on the screen, I think, is such an important loop.
This loop is basically what powers most coding agents. It's called the agentic loop. And I actually got this from a blog post that Anthropic put out a while ago about how Claude Code works under the hood. And this is similar to how GitHub Copilot works under the hood as well, if you're using Copilot in the CLI or if you're using it in your IDE, in VS Code. And the main thing that happens is that, as a user, you're going to send a prompt to the agent, and the agent is going to gather the context it needs, and then it's going to take action with some tools, and then it's going to verify the results.
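The three stages Marlene describes (gather context, take action with tools, verify results) can be sketched as a toy loop. Everything here is illustrative: the functions stand in for a real model's decisions, and the "write_file" tool is a made-up example, not part of any Copilot API.

```python
# Toy sketch of the agentic loop: gather context, act with tools, verify.
# The model's reasoning is replaced with hard-wired stand-ins.

def gather_context(prompt, files):
    # A real agent lets the model decide which files/issues to read;
    # here we just attach any file the prompt mentions by name.
    return {name: text for name, text in files.items() if name in prompt}

def act(prompt, context, tools):
    # The model would pick a tool and its arguments; we call one directly.
    return tools["write_file"](prompt, context)

def verify(result):
    # Verification in practice might mean running tests or a linter.
    return result is not None and len(result) > 0

def agentic_loop(prompt, files, tools, max_turns=3):
    for _ in range(max_turns):
        context = gather_context(prompt, files)
        result = act(prompt, context, tools)
        if verify(result):
            return result
    return None

# Example run with a fake "write_file" tool.
files = {"style.md": "use snake_case"}
tools = {"write_file": lambda prompt, ctx: f"# generated using {list(ctx) or 'no context'}"}
print(agentic_loop("please follow style.md", files, tools))
```

The point of the sketch is the shape, not the internals: each turn re-gathers context, acts, and checks the result before finishing, which is why giving the agent good context up front improves every later step.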
And I would say, if you're a software engineer right now, your goal needs to be to improve the chances of your coding agent doing well at each one of these steps. So the first step that we're going to look at today is the gathering context step. So if you are using VS Code, how many of us are using VS Code right now? I'm actually in VS Code, um, and I'm just using Simple Browser for the slides. But if I go like this, we will focus right now on this step about gathering context. So, how many of us on the stream, let me know if you're a fan of VS Code, but if you are, uh, one of the things that you'll probably already know is that you can click on this tool, uh, link.
And this link is going to... oh, actually, no, no, no. Let's talk about this plus sign here. So, one of the things, if you're using Copilot in VS Code, is that you're going to have this plus sign. And that plus sign is a great way for you to be able to add context. So if you are using a repo and you click on this, so for example, I have this repo. I don't actually have any GitHub issues, but if I had any GitHub issues, they would immediately show here. So okay, I see Ben in the chat saying he uses GitHub Copilot CLI, which I think is amazing.
Um, I can definitely recommend Copilot CLI if you aren't using it yet. Um, I'll actually show you just now. Uh, and I also see here we have Ali who's saying, "Day and night I'm in VS Code." This is my thing. Let's discuss in the chat: do we prefer to work in an IDE or do we prefer the CLI? That's the thing that I'm fighting with people about, because I personally love to see my code. But then a lot of people love the CLI, and so, like Ben has said here, he's using GitHub Copilot CLI.
So, let's actually go take a look. This is an example. Oh, I'll show you. I'm going to just Ctrl-C. I'm just going to pull this over, because I have it on a different monitor. But I'm going to pull this over for us to take a look at what the CLI looks like. Oh, how do I pull this over onto my other machine so you can see what I'm talking about? Um, okay. Let's do this. Oh. Oh, nope. There we go. There we go. Okay, great. And so if I pull this over and then I run, in case you haven't heard about it, I can just run copilot --banner, just to show the banner.
And then I like to also do yolo. And if I run that, that gets GitHub Copilot CLI started. And I just want to point out that I think it's so magical, the little banner that comes up. So I always love that. And I think Copilot CLI is fantastic. You can use it for a lot of different tasks. Like recently, I don't know how many of you have been trying out, uh, GPT 5, like the newer models that have come out, and one thing I've really liked, for example, is testing out the models, uh, and asking. For example here, if I just click model, I have access to a couple of different models right now. My favorite is GPT 5.5, but you can also use, uh, the Claude models as well, 4.7 is really good. And for me, I like GPT 5.5 in medium.
And I've been asking Copilot CLI, for example, when I've been trying to compare the different models. I'll just ask it, can you generate an SVG image of a pelican riding a bike? And this is actually a very popular, um, test that people do. And let me see, let me go back to the chat in case anyone has questions. Okay. I do see Ben saying, "I love it, but I'm old school." Yeah, this is my thing. I think that one of the things that I love about Copilot CLI is that it's really cool.
You can get stuff done faster and that's really nice. But at the same time, I also kind of struggle because I can't see the changes in front of me. So some people will have Copilot CLI open and then also have VS Code open in another window, and that will work better for them. Um, I can see, yes, 100%. It's so magical. I don't think enough people are using the CLI, because it has magical vibes. And by the way, Copilot CLI, when we are using it, blinks. Is it blinking right now?
It's not blinking right now, but every now and then it's going to blink. And I think that's my favorite part of it. Like the banner animation, and then the blinking is so fun. I can definitely recommend using it in the command line. So, one of the ways that I've liked to, uh, someone's saying as well, I'm going to take a look. Someone is saying, "Copilot CLI finally functional in bash." It is. I think it's really good. I've heard a lot of people saying they haven't used the CLI at all or they haven't heard about it, and I think it's actually very good.
And you can see right now on my screen, it's already saying here that it's created this pelican SVG of a pelican riding on a bike. I'm going to open this. Like one thing that I don't love... okay. Uh, okay. Okay. Let me switch over to some comments. Julia, hi Julia, who's my coworker. Julia says, "I thought I preferred the IDE until I started enjoying working on the terminal a little too much." I love the terminal. I love it. But I think the only issue for me is I can't see the files. I have to have the terminal open and then also have VS Code open to see the files, because that's my main thing, that I can't see it. I know.
Um, Ben is saying ask it that deep question: I want to wash my car and the car wash is only 50 meters away. Should I drive or walk? This is a good question. Should I ask this question? Let me ask this question. How do I copy and paste it? Let's ask this question from Ben to Copilot CLI and see how it works. So, the question Ben said is, "I want to wash my car and the car wash is only 50 meters away. Should I drive or walk?" So obviously this is a really good question, because the model needs to know that I want to get... it says, oh, this is so funny.
Oh my gosh, this is such a good question from Ben. Do we see the answer here? This is so funny because it's a bit of a trick question, right? Because the question is, I want to wash my car and the car wash is 50 meters away. Should I drive or walk? And then the model is saying you should walk. It says walk, if the car wash is only 50 meters away, driving is probably more hassle than it's worth, unless you need to move the car through the automatic wash. So, actually this is really funny, because it should have said for us to go ahead and drive the car, because we want to actually get it washed at the car wash.
But it is saying that unless you want to move the car through a wash bay, if you're washing it yourself there, walk over with the car only if the car itself needs to be physically at the station. Otherwise, for safety, don't drive such short distances. So, I don't know, mixed reviews there from the model. I think that's really interesting. So, great question there. And then I was seeing, let's see. Uh, some people are saying they're new, um, to the stream. Sorry, guys. I can see Sebastian is saying he's full-on vibing.
No more IDEs. That's fair. Uh, a lot of people are vibe coding and using the CLI and not using IDEs, so that's totally fine. Um, let's see. Um, and GPT. Okay. GPT, it really depends. It depends what you are going to say. All right. Anyway, okay. I'm going to now move on to what I was showing you earlier on. So here, one of the things that we could do, like I showed you, was use the Copilot CLI. But I wanted to mention, for those of us that are just joining: we are talking about the agentic loop and trying to understand everything. GitHub Copilot, Copilot CLI, Copilot in the IDE, all of those things are coding agents. So these are agents that kind of all work in the same way. And some of you may have heard about something called a harness.
So sometimes people call these coding agents, uh, or the things that surround the model, like GitHub Copilot, a harness. And so I think it's really important to understand how these agents work for us to be able to get the most out of those agents. So, one thing that we could do, like I mentioned before, is you could go here, and the first thing you want to do if you're trying to use your coding agent, you want to make sure that your agent has the context that it needs. So, one of the things people have told me in the past is that they have maybe had a coding agent.
So, for example, I have this file in my folder called simple agent. And in it is some code with the Copilot CLI, sorry, not CLI, the Copilot SDK, the SDK in Python. And generally the easiest way for you to get started with the SDK is to just start out a session. And it's only these lines of code, very few lines of code. But this past week, I've been testing out many different models and trying to get them to generate a very simple agent like this with the SDK.
So I'll show you what happens if we have the simple agent. How many of us are Pythonistas here in the chat? Let me know if you're a Pythonista or if you use another programming language. But you'll see here that I can just go here and if you want to use the SDK, I'll show you. Um, you want to install GitHub Copilot SDK and then you want to import uh Copilot from the Copilot client and then you're going to start a session. And so when you run this command, it's going to be just the same as you would uh start a session in Copilot CLI or in the IDE.
So this is the code. Let me put this to the side. And here you can see, if I want to actually send the code to GitHub Copilot here, I have to actually send in the prompt where it says session send and wait. So what I do want to do is I want to show you, if I run this script. So this is just a simple Python script. And if I run the script, it's going to send hello copilot to GPT 5.3, sorry, 5.2, which is the model that I've chosen. And it's going to say, "Hi, I'm Marlene's AI assistant for Rubber Duck Thursdays.
How can I help?" So one of the things to notice right away is that, first of all, this copilot client is going to work just like Copilot does in our IDE, in the chat here, but it's in code. And another thing I want to point out is that I'm just saying hello copilot, but it knows that I'm streaming right now. You're watching me right now on LinkedIn or YouTube or wherever you're watching me. And you can see it says I'm Marlene's assistant for Rubber Duck Thursdays. And actually, if I go here to Copilot and I just say hello copilot in the chat, and never mind, I had attached this as well.
It says, "Hi, I'm Marlene's assistant for Rubber Duck Thursdays. How can I help?" The reason it knows this is because I've already defined this in the copilot instructions. So I can go over to GitHub, and in my copilot instructions I can define how I want the agent to behave. So all of this is part of this context layer. So when we're talking about that agentic loop I mentioned earlier, this context layer defines, you know, how your agent is going to behave. So the difference between using, for example, GitHub Copilot CLI versus GitHub Copilot in the IDE and liking one versus the other is usually going to come down to the context that you give your agent.
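The session flow Marlene walks through (start a session, send a prompt, get a reply shaped by the repo's instructions file) can be illustrated with a mock. To be clear, `MockCopilotSession` and `send_and_wait` are hypothetical names chosen for this sketch, not the real Copilot SDK API; check the SDK docs for the actual classes and methods.

```python
# Mock of the session pattern described above. The instructions string
# stands in for .github/copilot-instructions.md being picked up, which
# is why even a bare "hello" gets a Rubber Duck Thursdays reply.

class MockCopilotSession:
    def __init__(self, instructions):
        # A real session would load instructions from the repo.
        self.instructions = instructions

    def send_and_wait(self, prompt):
        # A real session would stream a model response; we just echo
        # the persona to show how instructions shape every reply.
        return f"[{self.instructions}] reply to: {prompt}"

session = MockCopilotSession("You are Marlene's assistant for Rubber Duck Thursdays")
print(session.send_and_wait("hello copilot"))
```

The takeaway is the same one Marlene makes on stream: the prompt is only half the input; the instructions travel with every request, so they are the lever for changing behavior across the CLI, the IDE, and the SDK.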
And so one of the things that I mentioned is that I like to give my agent access to a file that will have a simple version of code and an ideal version of code, and I'll say use this file to write a new version of this code, or write this application but use this as an example file. So that's one thing I will do. And so if you are using coding agents, like I mentioned before with this agentic loop, the first thing you want to do is you want to make sure your agent has the context that it needs.
So when you give your agent a task, it's actually going to gather the context that it needs to complete that task. And so part of your job needs to be giving the agent all of the correct context that it needs to be able to actually complete that task. I'm going to go back to the chat to see if we have some questions. I did see: does Copilot load agents.md automatically too? Yes, it does. So if, for example, here I actually have an agents folder, and in that agents folder I have this reviewer.agent.md file, and if we actually click here you'll see that I already have the reviewer agent, uh, as part of it.
If I click reviewer, and I'm actually going to not select it there so that it's not using that, but let's select copilot instructions so you'll see. But if you store an agent in your agent.md file here and you also store it under an agents folder, it's going to automatically appear in this list of your custom agents. So, if I say hi, and again, I'm just going to switch to a different file so that it doesn't look like it's using the reviewer. It's using this. I'm going to say hello. And what you'll see here is that it's now saying, "Hi, I'm Marlene's reviewer agent and I act as a specialist agent for reviewing code.
How can I help?" So, initially it was just using the copilot instructions. So your copilot instructions are that repo level of instructions when you're working with GitHub. So it's going to work at all levels. But if you want to use your agents.md, you then need to go ahead and select it manually from the list of custom agents. And this is really nice if you want to define specific workflows. Let me go back to see, um, other questions. Uh, okay. Someone is saying, what are we saying? What were we saying?
We're saying, uh, agents.md versus copilot instructions? Yes, it does support agents.md. The copilot instructions is usually the thing that it will automatically detect if you're using the SDK, for example, with the agent. You can also put it, I think, in the root of the repo and it should still work, I believe. But if you're using the IDE, like VS Code, you probably want to create an agents folder, and that's where you're going to define those specific agents. I don't know, Sebastian, I'm not sure if you want to set the agents.md so it's detected at all levels, or if copilot instructions is fine.
Um, but typically I would say using both is a good idea. So this is what I would recommend. Someone else is saying, is there a difference between the IDE and the CLI for the vibe? You tell me, what does everyone think about the CLI versus the IDE? Do we prefer the IDE? Do we prefer the CLI? I'd love to know, vibe coding wise, what are we preferring? For me, let me go back. Let's pull it up and see.
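The layout Marlene demonstrates (repo-level instructions plus a custom agent under an agents folder) might look roughly like this. The file names follow her on-screen examples, but the exact contents are a sketch, so check the current VS Code and Copilot documentation for the precise format.

```markdown
.github/copilot-instructions.md   <- repo-level instructions, detected automatically
agents/reviewer.agent.md          <- custom agent, selected manually in VS Code

<!-- agents/reviewer.agent.md (sketch) -->
# Reviewer
You are Marlene's reviewer agent. Act as a specialist agent for reviewing
code: flag missing tests, unclear names, and unhandled errors.
```

The split mirrors what the stream shows: the instructions file shapes every chat, while each `*.agent.md` file defines an opt-in persona that appears in the custom agents list.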
This is, I would say, if you are still going to use the CLI, I like to have both of them open just so I can see both, because if I can't see the code, sometimes I just don't feel great. And for example, earlier today I asked it to generate a pelican on the bike, and I felt like I couldn't see what the pelican on the bike that it generated looks like. But then if I actually open up VS Code and the file that I asked it to generate, we can actually see the pelican.
And it's been really fun, because I've been testing out different models in terms of which model generates the best pelican. And in VS Code, I can actually see and compare the different pelicans. So, this pelican was generated by GPT 5.5. This one was generated by Anthropic, by Claude Opus, uh, 4.7. Which one do we prefer? I feel like I can't choose between the two. I generated two. This is the one I just generated just now. I don't think it's as nice as this one. But it's okay.
It's pretty good. But, um, in terms of which one I prefer, I don't know. I think right now I like both. So I like having both open side by side, even if I'm using the CLI, because with the CLI sometimes you can just do things faster and I can have multiple agents running at the same time. But then with the IDE, I can see exactly what my agent has created. Like we asked earlier today, we said can you generate the pelican on the bike, and this is what it generated for us.
So it really depends on what you prefer. Uh, let's see. Someone is saying GPT 5.5 is great. I agree. I really like GPT 5.5. I think it's probably the best. It's definitely the best model OpenAI has put out for coding. I'm not sure if other people have tried it out. Someone else is saying Anthropic for sure. I do like Anthropic. I mean, how many of us are using Opus? Like Opus 4.7. The one on the screen right now is the one that I talked about from, uh, from GPT.
This is Anthropic's version, you know, uh, Opus 4.7. Which one do we prefer? I'm not sure which one, but I like both. And I do think that... okay, Meta Chief is saying Gemini 3.1 and Nano Banana. How many of us are using Gemini models? I actually haven't heard a lot of people using Gemini. I'm not really sure though, and I'd be open to feedback on the Gemini side of things, but I haven't really used them a ton. Let's ask it right now. We're going to ask Gemini right now.
So, if we go here and let's try Gemini 3.1 on the side here. I just have a lot of sessions open. Let's see if I have this session. Let's see. Uh, yeah. Okay. Okay. Anyway, let me go back and copy. I want to use the exact same prompt, um, to say, can you generate an SVG of a pelican on a bike? Let's go back to the other one. And this time, let's say name it Pelican Gemini, and don't look at the other pelican images. Okay. So, this time we're going to ask live to see.
We're using, um, Gemini 3.1, just because I haven't used it before. And okay, I see Ena is also saying she uses Gemini. Um, I can also see Kit says Opus 4.6 versus GPT 5.5. Yeah. Yeah, 100%. Um, I see "Opus is still my daily driver. Better personality." I do think Opus has a great personality. I think Opus is fantastic for coding. Let's go back. I'm just going to put Meta Chief's comment on, because we just generated the new pelican with Gemini. What do you think? This is the pelican.
What do we think about Gemini's pelican? It will show up here as well. Is this the one? I don't know. What do we think? This is the pelican Gemini generated. Do we like it? How does it compare to Opus and to the other ones? Do we prefer it over Opus? Okay. Meta Chief is saying, "Amazing." What are the votes, chat? Do we like Gemini better than the other ones? It is goofy. I think it is goofy. And, um, Abdal says that Nano Banana is really great for image generation. I'd agree.
I would say, if I am to compare, let's go back and just compare the different models. I actually think the pelicans kind of have the personality of the models. Like this one feels very ChatGPT to me, I don't know why. And then Anthropic's one, Opus... does anyone know what I'm talking about? Do you get what I'm saying? These are the models that we are comparing, and I feel like this one is just giving Anthropic, it's giving Opus for me right now when I look at it, just because their design style, the branding, is like that, and I just feel like it's the same.
So anyway, we just live generated a pelican with, uh, Gemini, and I did think it was okay. I thought it did a good job, a decent one. I'm not sure if it's my favorite, but it's a decent pelican as well. So, let's go back to the idea of context. I think we have a couple more minutes left to our stream, and so I do want to go back to the idea of context engineering that we were talking about. Um, someone was saying why GPT... I like this one: Lincoln is saying he thinks the Gemini pelican looks happier.
Yeah, I think that's right. I think it looks happier. And Kit says Gemini 3.1's physics is better than every model's. Maybe. I really am not sure. Someone is saying, can you try Grok? I don't have Grok in VS Code, but you can use bring your own key. By the way, VS Code now supports bring your own key. So if you wanted to try Grok, you can get it from Microsoft Foundry and then try it there. Favor is asking which one is my favorite. Um, for me right now, honestly, I still really like the Opus models.
I actually think my daily driver in my work right now is Opus, uh, 4.6. And sometimes, I've recently been using Gemini 5.5 a lot. So, I do like both of those models and I think they're really good. So, it really just depends on you, but I personally like the Opus models the most. And then I feel like Gemini 5.5 does a good job, you know, pretty recently. Now, another test that I've been liking to do to test the models is this one. Okay, next week we're going to do more context stuff, because I wanted to show you a little bit more, but this week let's do something else.
Something I like to do is, um, test the models' tool use. And a test that I've been giving the models is, I've been saying, uh, let me try and find it. Uh, no, no, no. Okay, let me go back to my other VS Code and see if it'll show up there. Okay, let's see. Um, I don't know. No, there. Yeah. Okay. So, you know, we've established that the models are... Someone's saying Gemini 5.5. I don't have Gemini 5.5, unfortunately, on my machine. I don't know why.
So, we'll see. Opus is really good. I like Opus's style. I think it's very minimal, which is great. Um, okay. I'm going to go back, because I want to take a look at this. So, one thing that I have liked doing when I want to test a model is I like to ask it about tool use. So we want to see that the model can do stuff on its own. And if you have been using VS Code, for example, or even the CLI, you'll know that the coding agents come with a couple of built-in tools.
And so most of the coding agents, regardless of how powerful the model is, with the built-in tools it will do a fantastic job. But I want the model to use a new tool. So here I wanted to use Excalidraw, and Excalidraw has a readme tool and a create view tool. So it has a tool that will give it context and a tool that will allow the LLM to draw an image. So I've been liking to ask the model, I've been saying, can you use the Excalidraw MCP server to explain MCP to me? And so for example, let's test out Opus 4.7 and see if we like the image it generates.
You can see that right away it calls the MCP server. When it runs the readme tool, that tool grabs context from Excalidraw, which gives the model everything it needs to know to actually use Excalidraw to generate a diagram. After reading all of those notes on how to use it, the model should then generate an image. This is really nice. What do we think, chat? Can you see this? This is so nice.
I don't know if you can see it, but this is the model's output, and we can actually open it up in Excalidraw — it will take us there. Let me pull it over here so you can see the full diagram it generated on my screen. I really like this. Do you see what I'm talking about? We give the model a tool, and the model is able to use MCP to find out how to use that tool.
What do I use Excalidraw for? The model takes the context from Excalidraw and uses it to actually generate a diagram. This diagram shows how to use MCP: the host — VS Code, for example — is here; the user sends a prompt to the agent; the agent sends it to the MCP client, which is GitHub Copilot, by the way; and that connects to the MCP server and the different tools available to it. I think this is a really great way to see whether models are capable of using tools like MCP, because that's one of the major ways we use tools, and this is a really good example.
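To make that two-step flow concrete, here's a tiny Python sketch of the shape of it: first a context-gathering call, then an action call. The tool names (`readme`, `create_view`) mirror how the Excalidraw MCP server's tools were described on stream, but everything here is an illustrative toy — it's not the real MCP protocol or the Copilot SDK.

```python
# Toy simulation of the "gather context, then act" MCP tool flow.
# Tool names follow the stream's description of the Excalidraw MCP
# server; the element format below is invented for illustration.

def readme() -> str:
    """Stand-in for the server's readme tool: returns usage context."""
    return "Elements are dicts with type/x/y keys. Pass them to create_view(elements)."

def create_view(elements: list[dict]) -> dict:
    """Stand-in for the server's drawing tool."""
    return {"status": "ok", "element_count": len(elements)}

def agent_turn(prompt: str) -> dict:
    # Step 1: gather context -- the model calls readme first, as the
    # tool description instructs, so it learns the element format.
    context = readme()
    assert "create_view" in context  # the readme points at the next tool

    # Step 2: act -- with that context, the model drafts elements
    # (here hard-coded) and calls the drawing tool.
    elements = [
        {"type": "rectangle", "x": 0, "y": 0},
        {"type": "arrow", "x": 120, "y": 30},
    ]
    return create_view(elements)

result = agent_turn("Use the Excalidraw MCP server to explain MCP to me")
```

The point is only the ordering: the description on the readme tool steers the model to call it before drawing, which is exactly what we watched Opus do.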
Another thing — I'm not sure if other people have been trying out some of the open models. The other day I plugged in Kimi K2.6, because a lot of people have been saying that the price of the models is going up and we need alternatives. So one thing I've been liking is also testing out some open models, and Kimi K2.6 is one of the more popular ones. I'm going to send it the exact same prompt I gave Opus and see how it responds.
It's done a similar thing — we can see here that it was able to connect to the server, and here is the diagram. Kimi also did a good job. I would only say the wording is a bit weird — it's really good, but slightly off; it's confusing a couple of things here. Still, it was able to use the tools. So one thing I like to encourage: you can use bring-your-own-key in VS Code and switch out which models you're using.
You can use open models like Kimi, and they do a decent job as well. Someone in chat is saying: MCP is basically a better way for an AI model to get or query data outside of its local context — instead of searching the web, it has MCP as a tool to get data from the source more directly. Absolutely. MCP is fantastic for giving the model context, and giving the model tools so it knows how to use that context to do something — like in this example, where I used the Excalidraw MCP server.
And that was really good. I see Meta Chief saying: I have seven different models connected to my VS Code. I have a lot of models connected to VS Code right now — a lot. Let's go back to my VS Code and see; they don't all show here. I also have a custom model, for example, that I was testing out. Having a lot of models is really nice. Let's see what else people are saying.
Ena is saying Claude has so many MCPs. Exactly — it's very MCP-coded, because Anthropic is the MCP creator. Amit says VS Code brings everything, by the way. And yes, the Claude Opus model — I love Opus. Who else is using Opus? Sebastian said he was using Opus earlier, and I can relate: Opus is my main model. Someone says the generated images look really awesome — 100%, try it out. In VS Code you can go to this.
Let's see if it's there. In VS Code you can search for MCP, and that will bring up a bunch of different MCP servers. Let's see if Excalidraw has theirs here. Hmm, no — ideally it would be in this list of MCP servers, but you can also go to their repository on GitHub. That's what I did: I went to the Excalidraw MCP server's GitHub repository, and it has information there about how to connect it to VS Code, or whatever client you're using.
It's such a fun tool, and you should definitely use it more for MCP — I can definitely recommend that. Someone is asking: what is an MCP server? I think we answered that a little already, but MCP is an open protocol, and an MCP server basically lets a company expose tools and context to a model. For example, in my VS Code right now there are a couple of MCP servers loaded that you can see on the screen — I want them to show correctly, but I don't know why they're not displaying nicely.
Hang on. If we type MCP here, you'll see all of the various MCP servers that are available. A company — say Microsoft, which has a Markdown MCP server — takes an API it already has and, through MCP, gives the context from that API to the model. Like what we saw with Excalidraw in the first example, where it was really clear how Opus talked to the MCP server: each tool on the server comes with a description, so the model sees that there's a readme tool from Excalidraw.
Actually, if we go to the Excalidraw MCP server we installed, you'll see there are two tools. There's a readme tool whose description says: returns the Excalidraw element format reference with color palette, examples, and tips — call this before using create view for the first time. The model reads that description, and since I told it to use this Excalidraw MCP server, it knows immediately where to go to get the information it needs. So it calls the readme function on the server and gets back that information, which is just context the model can use to operate the tool.
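That "description first, call second" shape is the essence of what a server exposes. Here's a small in-process toy in Python that mimics it — this is not the real MCP SDK or protocol (real servers speak JSON-RPC over stdio or HTTP); it only illustrates how tool descriptions steer the model, with all names invented:

```python
# A toy, in-process sketch of what an MCP server exposes: named tools,
# each with a description the client (and the model) reads first.
# NOT the real MCP protocol -- illustration only.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToyMCPServer:
    name: str
    # tool name -> (description shown to the model, callable)
    tools: dict[str, tuple[str, Callable]] = field(default_factory=dict)

    def tool(self, name: str, description: str):
        """Register a tool along with the description the model will read."""
        def register(fn):
            self.tools[name] = (description, fn)
            return fn
        return register

    def list_tools(self) -> dict[str, str]:
        """What a client sees first: tool names and their descriptions."""
        return {n: desc for n, (desc, _) in self.tools.items()}

    def call(self, name: str):
        """Invoke a registered tool by name."""
        _, fn = self.tools[name]
        return fn()

server = ToyMCPServer("excalidraw-toy")

@server.tool("readme", "Returns the element format reference. Call this first.")
def readme() -> str:
    return "Format: list of {type, x, y} dicts."

listing = server.list_tools()
```

A model-driven client would read `listing`, notice "Call this first" on the readme tool, and invoke `server.call("readme")` before anything else — the same behavior we watched Opus exhibit.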
So basically, your MCP servers come from a company that wants to let you use its tools and give that context to your model so it can use them. Hopefully that's helpful. Again, like I said, in VS Code you can search for MCP and it will show lots of MCP servers for you to install. In the CLI, we can also type MCP, and that will show all of the MCP servers that you have.
For example, I have Playwright installed, I have Work IQ installed, and I have computer use — so the agent can use computer use — plus the Azure Foundry MCP server as well. You can connect these directly to your CLI, and when the CLI boots, it loads and tells you which MCP servers are available there. I've really liked using the Playwright MCP server in the CLI; I've found it very helpful. Let's take a look at the chat. Okay, someone is saying no problem.
Is the GitHub Copilot subscription no longer unlimited? You can read a blog post about the subscription changes. I'm not going to comment too much on it, but I would definitely recommend reading that post, because if you haven't heard, there are changes to our subscription model. It's still a fantastic option, and there are lots of different models you can use. I did see Pavan asking about Spec Kit — Spec Kit is part of spec-driven development, which is a different way of approaching development with AI. I like it, but I'd say I'm more into test-driven development.
Let me show you another slide I have in VS Code. Let's go back to our IDE — I want to show you this slide I made for a different talk. It's similar in spirit to how people approach spec-driven development, but it's test-driven development: red-green-refactor TDD. As a user, if you're going to use a coding agent to develop features you want to be very sure about, the first thing is validation — you want to be confident that when the model generates code, it's generating the correct code.
So if you get a feature request, the first thing is to write tests. Get the model to work with you to write behavioral tests, where each test is directly linked to some behavior — maybe we want to make sure this page works and that the search bar on it is working. Your AI agent can actually work with you here. I've been using Playwright, so I have the agent generate some Playwright tests as a starting point. Those tests start out failing, but now you have a clear way, in code, to see whether the agent's code is correct.
Then you quickly get the agent to generate code to make those tests pass, and the last part of the loop is refactoring the code the agent has generated. I really like test-driven development. I have some other slides about Playwright, and I can recommend taking a look at that. I'd prefer TDD over spec-driven just because it's more code-first, more code-heavy. This was for a talk I did recently where I focused quite a lot on test-driven development, and maybe I'll talk more about that the next time I'm on here.
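The loop described above can be sketched in a few lines. The feature here ("search filters items case-insensitively") is invented for illustration — on stream the behavioral tests were Playwright UI tests, but the red-green-refactor rhythm is the same for a plain unit test:

```python
# Minimal red-green sketch of the TDD loop described above.
# The search feature is a made-up example, not from the stream.

def search(items: list[str], query: str) -> list[str]:
    """Step 2 (green): implementation written to make the test pass."""
    q = query.lower()
    return [item for item in items if q in item.lower()]

def test_search_matches_case_insensitively():
    # Step 1 (red): this behavioral test is written FIRST, before
    # search() exists, and fails until the agent implements it.
    items = ["Apple pie", "Banana bread", "apple tart"]
    assert search(items, "apple") == ["Apple pie", "apple tart"]

test_search_matches_case_insensitively()
```

Step 3 (refactor) is then done with the test kept green — the failing-then-passing test is the concrete signal that the agent's generated code does what the feature request asked for.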
We have a few minutes left — four minutes, I think. There's a question here: is there a recommended tutorial or learning path on Microsoft Learn to follow for MCP, with a sample solution? Yes, there is. Let me go to GitHub — it's under the Microsoft organization. I'm not showing it on my screen; I'm going to navigate there and then share it with you. If you go to the Microsoft organization, there are so many repositories — that's the thing with Microsoft.
I just want the one. Hang on — I'm looking for the right Microsoft repository. In that repository there's a tutorial I actually worked on, which I'll share here so you can learn a little more about what MCP does. There's an MCP for Beginners tutorial — here it is. I'm going to paste the link in the chat for us to take a look at, if you'd like.
Whoa — I don't know why there was another link there, but this is the one I shared: a Microsoft course called MCP for Beginners. And if you haven't yet tried it out — I showed it a little — I recommend trying out the Copilot SDK, and I also recommend trying out the CLI if you haven't.
So: try the Copilot SDK, and try the Copilot CLI — if you can try both of those, I can recommend them. Along with the MCP for Beginners course, those are the three things I'd recommend looking into this week. Next time I'm here, I'll also share some more resources on agentic coding. It's something I'm very interested in right now, and I'm going to come back to talk more about this agentic loop, but I think I'm out of time.
Thank you, everyone, for joining today — this was so fun; I really had a good time. Someone says, "TDD is a solid framework. Been using it for years." I completely agree. Fantastic. Thank you, everyone; I've had a great time chatting with you. I don't want to go, because this has been so fun. Thanks, everyone — I appreciate you joining the stream. Let me know if you enjoyed it, and what you'd like us to talk about next time. I'll see you later.
We'll play the outro now — I don't actually know how to do that, but we'll see if I can switch the screen anyway. Okay, bye!