Rubber Duck Thursdays | Rubber Duck Agent
Chapters
Opening remarks welcome viewers from around the world and set a flexible agenda focused on GitHub news and live Q&A.
Rubber Duck Thursdays dives into GitHub Copilot CLI’s new Rubber Duck agent, bring-your-own-key with Azure OpenAI, and live demos of multi-provider models inside VS Code.
Summary
GitHub’s Rubber Duck Thursdays host explores the week’s freshest GitHub announcements with practical demos. In this session, the host walks through the new Rubber Duck agent and how it provides a second opinion by using a model from a different family to critique the primary model’s plan. Highlights include Copilot CLI’s new bring-your-own-key and local-model support, demonstrated with Azure OpenAI models deployed on Microsoft Foundry (falling back from GPT-5.4 Pro to o3) and a locally hosted model via Ollama. The host also tests integrated browser workflows in VS Code to monitor logs and troubleshoot, and touches on Copilot usage metrics for organizational visibility and Dependabot alerts assignable to AI agents for remediation. Throughout, the talk blends live Q&A from the audience with hands-on labs, including deploying models on Foundry, wiring up provider configs, and attempting to run multi-agent reviews in real time. The episode balances optimistic coverage of experimental features with practical caveats and invites the community to share feedback.
Key Takeaways
- GitHub Copilot CLI now supports bring-your-own-key and local models, enabling Azure OpenAI, Anthropic, OpenAI-compatible endpoints, and local deployments to power Copilot sessions.
- Dependabot alerts can be assigned to AI agents for remediation, enabling automated draft PRs and test-fix cycles as part of a coordinated security workflow.
- Rubber Duck is an experimental, built-in reviewer that uses a second model from a different family to critique a primary model’s output, improving error detection and closing performance gaps.
- Copilot usage metrics for code review clarify who actively uses Copilot in code reviews within an organization, aiding admin visibility and adoption tracking.
- Integrated VS Code browser and Foundry-based model monitoring give developers end-to-end visibility on token usage, model responses, and debugging logs during the live demos.
Who Is This For?
Essential viewing for developers and platform admins who want to experiment with multi-provider AI models in GitHub Copilot CLI, understand Rubber Duck’s cross-model critique, and explore bring-your-own-key/local-model setups for coding workflows.
Notable Quotes
""Copilot CLI now lets you connect to your own model providers... OpenAI compatible endpoints""
—Introducing bring-your-own-provider capabilities for Copilot CLI.
""Dependabot alerts are now assignable to AI agents for remediation""
—A new automated remediation workflow for dependency updates.
""Rubber Duck leverages a second model from a different AI family to act as an independent reviewer""
—Explains the core idea of Rubber Duck’s cross-model critique.
""The integration lets you run Copilot CLI with local models like Olama for offline or cost-conscious scenarios""
—Demonstrates offline/local-model options via Copilot CLI.
""This is experimental mode, switch on Rubber Duck to get a second opinion""
—Notes the experimental nature of Rubber Duck usage.
Questions This Video Answers
- How does the Rubber Duck agent's review differ from self-review in Copilot CLI?
- Can I run Copilot CLI with Azure OpenAI and a local model at the same time?
- What are the best practices for using bring-your-own-key with Copilot CLI?
- How do I monitor token usage when mixing provider models in Copilot CLI?
- What are practical steps to test Dependabot alerts with AI remediation in a CI pipeline?
GitHub, Rubber Duck Agent, Copilot CLI, Bring Your Own Key, Azure OpenAI, Foundry, Ollama local models, Claude, GPT-5.4 Pro, Integrated VS Code browser, Dependabot alerts AI remediation
Full Transcript
Hello. Hello. Hello everyone. Good morning, good afternoon, good evening, depending on where you are joining us from. Thank you to everyone who's joined the stream today. Welcome to Rubber Duck Thursday. If you are a regular on the show, welcome back. This is our live stream every Thursday, where we talk about what's new in the world of GitHub. And if this is your first time joining the stream, a warm welcome. This is where, for about 60 minutes, we get to just talk about everything GitHub: what's new, what's exciting, what's working, what's not working. It's a really good time to connect with the community as we explore everything around GitHub. All right, I see we have many people already tuned in. Welcome, welcome, welcome. Hi everyone, and thank you for sending your messages on the chat.
I see we have folks from different parts of the world. Let us know where you're joining from; it's always good to interact with all of you on this live stream. Hey Schwa. Hey Arun. Hello. Hola, Majid. I recognize your name from one of our earlier streams, so good to have you back. Majid is asking what topic we cover today. I'll be sharing my screen momentarily, but I have some ideas of some of the new things that have been announced this week that we can have a look at, plus some demos. But as always in this live stream, we prioritize questions from you, the community.
So if you have any questions, or any suggestions of what you'd like us to cover, feel free to share that. This is a flexible agenda, so to say, so we can always prioritize the questions that you want to see addressed. So Majid, I'll be sharing my screen shortly and then I'll walk you through what I have in mind. All right, welcome. Hello. Good to see everyone. Great excitement on the chat. Good morning, Sally. Patrick, you got up early enough. Yes, you did. You made it. Welcome. I'm based in Nairobi, Kenya, so it's not really morning for me here.
It's around 1:40 p.m., and it's a sunny day, so good to have you on the stream. Yeah, slightly interesting comments there. Yes, I'm more regular than the regular host, it seems. We have quite some changes happening over here at GitHub, so you'll probably see a more frequent rotation in terms of who gets to join you on these live streams. But hopefully with time this will stabilize, so bear with us for now. In the meantime, you'll get to interact with multiple hosts from different teams, and you'll also get to hear different perspectives and different product focuses.
So, I think it's a good mix right there. And hello from the UK; let's connect. Melissa is also asking what the topic is. Again, I'll be sharing my screen, and as I said earlier, we're just going to cover the announcements from this week's changelog, and we'll have some demos in the process. All right. Hello from Nigeria. From Pakistan. Thank you, thank you, thank you. How do we work on a random project? Probably you could expound more on what you mean by that; I'd like to hear more about what you have to say there.
It's 6:30 in Boston. Oh, wow. Well, you're right, you did get up pretty early today. I'm hoping that this session will be exciting for you and that you'll learn something new. So, welcome again, Patrick. Okay, I'm going to pop my screen up, and I'll keep my eye on the chat so we can always see what you all have to say. As usual, we'll start by going through what we have on the changelog this week. We have quite a lot of very exciting announcements, and just from the title of this stream today: it's Rubber Duck Thursdays, and then the Rubber Duck agent.
So, in case you missed it, we have a new agent called Rubber Duck, which, I'll admit, the first time I saw that announcement I thought had something to do with this series, but it doesn't. That's something we'll hopefully get some time to demo: what the Rubber Duck agent is, how you can use it, and what it's designed for. So that's always a good thing. We'll go ahead and start with our changelog. And as you can see here, I'll try and do everything from inside VS Code.
I try to keep everything that I'm doing inside VS Code since we have this feature called the integrated browser. It's not really new, but it's an exciting feature. If you're on VS Code, you can open the integrated browser, a browser instance you can run completely inside VS Code. It's able to communicate with your AI agents, so your agents can access the logs and the page elements if you want to troubleshoot web applications. I tend to use this for a lot more than what it was probably built for.
So let's get started. All right. Hello all the way from India. Welcome, welcome, Phillip. Okay. Good, good, good. All right. So, our changelog this week. Let's go ahead and pop that open. And we'll see this has been a busy week, I will admit. You can see all of these announcements, new releases, and improvements have shipped just this week. Of course, we won't have time to go through everything. My plan is to highlight one or two and then maybe do a demo of another two.
So, that's probably what I have in mind. And let me just go back. So this is April 6th; this is probably the earliest announcement that we saw this week. We're just going to highlight a few of them and then hopefully pick out the most exciting and do some demos. That's the plan for today's session. The first one we can highlight: Copilot usage metrics now allow you to identify active and passive Copilot code review users. Let's open this up and see what it's about. So, I'm going to pop in a new tab.
And this to me looks like it's interesting for enterprise and organization admins, where with this release, I believe it's a release, you're now able to distinguish within your organization how many people are actually interacting with Copilot code review. This means you can look at the "used Copilot code review" active-user metric. Within your organization, you'll have clear metrics on who has used Copilot as a reviewer on an existing pull request, who is requesting code reviews from Copilot, and who's applying a suggestion after the code review.
So this will give you a clear picture in terms of the adoption of the code review features on GitHub. I believe this is going to be useful for enterprise and organization admins, so this is definitely a release you can check out. It's part of the API response now, so you can easily get very clear metrics to gauge the level of real and actual engagement with Copilot code review within your organization. That to me is quite an interesting release.
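For reference, a minimal sketch of pulling those metrics from the REST API. The /orgs/{org}/copilot/metrics endpoint is GitHub's documented Copilot metrics API, but the exact field names for the new active/passive code-review breakdown are an assumption here; check the changelog entry and the API docs for the authoritative schema.

```bash
# Pull org-level Copilot metrics; requires a token with the right org scopes.
ORG="your-org"   # placeholder org name
curl -s \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/orgs/$ORG/copilot/metrics" |
  # Inspect the latest day's entry to find the code-review usage block;
  # the field name for it is not confirmed here, so list keys first.
  jq '.[-1] | keys'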
All right. Something else that stands out for me is right here: on April 7th, we have an improvement that was released. Let's pop that open. We see here that Dependabot alerts are now assignable to AI agents for remediation. Now, this one is quite interesting, so let's dig a little deeper into what it means. Some dependency vulnerabilities require more than just a version bump; that's accurate. Some updates really need major code changes across your project. So this to me is truly timely. You can now assign Dependabot alerts to your AI coding agents, whether that's Copilot, Claude, or Codex. They will analyze the vulnerability, including the advisory details and your repo's dependency usage.
Then it's going to open a draft pull request with a proposed fix. And lastly, it's going to attempt to resolve any test failures introduced by the update. For me, that's where the actual value is. It's not just going to bump up versions while introducing breaking changes into my codebase; the agent, in collaboration with Dependabot, will ensure that the transition, the update, is properly managed. So yeah, this is an interesting improvement from the team. As you can see here, dependency updates aren't always simple. One major version can introduce breaking API changes, deprecated method calls, incompatible type signatures, you name it.
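As a quick aside, here is one way to see which open Dependabot alerts a repo has before assigning any of them. The list endpoint is part of GitHub's documented REST API; the assignment itself happens from the alert detail page in the UI, as shown in the demo next. OWNER/REPO are placeholders.

```bash
# List open Dependabot alerts with number, severity, and package name,
# using the gh CLI's built-in REST client and jq-style filtering.
gh api "/repos/OWNER/REPO/dependabot/alerts?state=open" \
  --jq '.[] | {number, severity: .security_advisory.severity, package: .dependency.package.name}'
```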
Updates like that can leave your codebase in a messy state. So we see now that the combination of Dependabot plus the coding agent you select is going to handle that full process. This is something we can actually try out, and I do have a repo we can experiment with. I'll switch over to my browser and open this repo. It's a simple app that helps me track my piano lessons and practice. As you can see, it's been quite a long time since I did some practice.
So, that's probably not good. But it's been a minute since I pushed any new code to this, so we should have some Dependabot alerts. All right. I have a few open PRs from the release. Let me just look at what we're supposed to do to get this to work. So, I'm supposed to open the Dependabot alert detail page and then assign the alert to an agent of my choice. Let's try that practically and see what that experience looks like. Okay. These were actually opened 4 hours ago.
Let's see. Let's just pick one. Compatibility 80%. It says we can assign this to an agent of choice, so I'm going to go ahead and assign this to the Claude agent. Let's do that. So it's assigned, and we have Claude as the assignee here, and hopefully it's going to pick up that task and we're going to see what happens. And I believe we can also assign this to more than one agent if we want to compare the different implementations. So, why don't we go ahead and try that as well. I have Claude already assigned.
Let me also assign this to Copilot and see how that changes the behavior. So this has now been assigned to Claude and Copilot. Let me see if we can scooch over to the Agents tab. Okay. Looks like the tasks are yet to be picked up. Let me see if there's something else I need to do. So, open this. Okay. I've assigned it to Claude and Copilot. Yeah, I'm not seeing an update over here. All right. Okay. Let's see if there's something else we need to do from the changelog.
Yeah, so you can assign multiple agents to the same alert, and each agent is expected to work independently and open its own draft PR. I assigned this to two agents, and they don't seem to have picked that up. Quite interesting. Maybe let's try another one. So, we have an alert here. Let's give it one more shot before we try to see what's causing it to fail. Let's go with Copilot for now. We have assigned that to Copilot. We'll come back, and I'll just leave this Agents tab open.
We'll come back and see if there are any sessions going on. But I had tested this earlier, I believe yesterday, and right here I had Copilot work on one session. It was bumping date-fns to include time zone support; that was the update. It didn't have many conflicts to resolve, so that was fairly easy. I'll probably have to check if there's something I'm missing. And if you've tried this before, please let me know if there's something I'm missing from the changelog page, or an additional step that I should take.
But yeah, I'll leave that running, and then hopefully we can come back and see if it eventually works. Okay. Right, so let's look at the next release here. Copilot. Okay. Now this here is a big one. Let me just open it in a new tab. Copilot CLI now supports bring your own key and local models. This we have to test out; we need to really see how it works. So if you're already using Copilot CLI, you're no longer restricted to using the GitHub-hosted models.
You can now bring in your own keys. So suppose you're already paying for Anthropic, OpenAI, or Azure OpenAI, where you can host your models: you can now bring in those models and use them with Copilot CLI. And this support also extends to local models. If you're running models using Foundry Local or Ollama, you can have one of those models come in and power your GitHub Copilot sessions. That is a big change. Let me bump the font size a little just so it's a bit clearer. As I've said, GitHub Copilot CLI now lets you connect to your own model providers or fully run local models, instead of just using GitHub-hosted model routing.
So, that's pretty neat. You can connect to any model provider: you can use Azure OpenAI, Anthropic, or any OpenAI-compatible endpoint. You can configure that and run it using Copilot CLI. With that, GitHub authentication becomes optional, since you won't need to authenticate into your GitHub account given that you're opting to use different model providers. We'll actually test this out and see it in action. Now, a few things you need to know before trying to use Copilot CLI with your own model providers: number one, your model must support tool calling.
So it should have tool calling capabilities as well as streaming, and for best results it's recommended to use a model with at least a 128k-token context window. Okay, with that, let's actually hop into a demo. Let's try and use the Copilot CLI. All right, let's start this. What I'm going to do right here is open a Copilot CLI session, right here in Visual Studio Code. I can run Copilot in any terminal, but I'm going to stick to VS Code because I want us to have a seamless experience.
Just navigating through the different demos. What we'll do here: I want to confirm that I'm connected to my GitHub-hosted model. This is the default experience. If I check my model list, you can see these are the different models that I have access to through my GitHub Copilot subscription. And as you can see, GitHub does an amazing job at surfacing the most powerful OpenAI models for coding. I have everything from GPT-4.1 and GPT-5 through 5.1, 5.2, 5.2 Codex, 5.3 Codex, and 5.4.
So this is really a great list of models. But what we're going to do is exit this instance. And I see a comment here: Farooq is saying, "I don't understand, but I'm still watching." So let me take a step back and explain what we're doing. An announcement that went out two days ago, on April 7th, was that Copilot CLI now supports bring your own key and local models. So instead of just using the GitHub-hosted models that come with your GitHub subscription, you can bring in your own key and then power Copilot CLI using your model provider of choice.
And that is the demo we have just started. So Farooq, I hope that gives you some context on what we are doing. I have opened GitHub Copilot CLI and confirmed the list of models I have access to as part of my GitHub Copilot subscription. And as you can see, a model that I don't have, for example, is GPT 5.4 Pro. So I'll switch over to my browser. The provider I want to use in this case is Azure OpenAI. Okay, so you've seen that; let me switch back to the changelog.
You see that we can bring in models from Azure OpenAI, Anthropic, or any OpenAI-compatible endpoint. For this demo, we're going to use Azure OpenAI, and for that I'll switch over to Microsoft Foundry. This is where we can deploy OpenAI models, but on Azure, right? So this is Microsoft Foundry; this is where you can deploy models, build agents, etc. I'll go over to the Discover tab. What I want to do is access models from OpenAI that are hosted through the Azure OpenAI service.
And a model that I can deploy here is GPT 5.4 Pro. Now, I'll confirm that currently, with my GitHub Copilot subscription, I do not have access to GPT 5.4 Pro. Again, we can see from here that GPT 5.4 Pro is not in this list. So we're going to exit the Copilot CLI instance there, and then I'm going to kick off a new terminal instance and push that here. And there's a command you can run to give you a setup example for how to start configuring your different providers.
So I'm going to run copilot help providers. That command should show me the different settings I can configure to use different providers. In this case, I need to provide the base URL, that's the endpoint, and specify the type of provider; as we said, we have three options: OpenAI, Azure, Anthropic. Then we have the API key, which you can pass in as a bearer token, plus the wire API and the Azure API version. For your model, you need to configure the model name, the model ID, the wire model, the maximum prompt tokens, and the maximum output tokens.
And we do have some examples here. You can see configurations for Ollama, for an OpenAI-compatible endpoint, and, the one I'm interested in, for Azure OpenAI. So this is the exact configuration I need, and I prepared it a bit earlier. As you can see here, I have the configuration, so I'm going to set my environment variables. Let me just copy this, start a new terminal, and paste that. You can see that I'm configuring my provider type as Azure.
I have passed in my base URL; this is the project that I have on Foundry. I have passed in my API key, which I am going to change right after this stream. And the model that I want to use is GPT 5.4 Pro. Now, before I run these commands, I'll scooch over to Foundry, and remember, from the changelog we had some requirements to get the best experience with the model. So in this case, I'm going to confirm that we have provision for at least a 128k context window.
And in terms of capabilities, I believe this is a model that has support for tool calling, so it should be an ideal candidate to use with Copilot CLI. I'm just going to go ahead and click deploy, with the default settings. In a few seconds, this model should be deployed on Microsoft Foundry; so I'm using Azure OpenAI, and this is the model that I want to power my Copilot CLI instance. All right. The model is already deployed. I'll confirm the name: it's GPT 5.4 Pro.
Now, if I switch back to VS Code, that is the model ID that I passed in, as well as the wire model. And I'm going to hit enter. So now, as you can see down here, let me just zoom in one more time, my Copilot CLI instance is running, but it's using GPT 5.4 Pro. This is not a model that I have as part of my GitHub Copilot subscription; rather, it's the model we just deployed on Azure OpenAI. We have configured our environment variables, and now we are able to use it with Copilot CLI.
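For readers following along, the environment setup pasted a moment ago might look roughly like the sketch below. The variable names are illustrative rather than authoritative, since they aren't shown on screen long enough to confirm; run copilot help providers to get the exact names for your CLI version, and treat every value as a placeholder.

```bash
# A rough sketch of a bring-your-own-key setup for Azure OpenAI.
# Variable names are assumed; `copilot help providers` prints the real ones.
export COPILOT_PROVIDER_TYPE="azure"                   # azure | openai | anthropic
export COPILOT_PROVIDER_BASE_URL="https://<your-foundry-project>.openai.azure.com"
export COPILOT_PROVIDER_API_KEY="<azure-openai-key>"   # rotate any key shown on stream!
export COPILOT_PROVIDER_API_VERSION="<azure-api-version>"
export COPILOT_MODEL="gpt-5.4-pro"                     # the Foundry deployment name
copilot                                                # start a session on that provider
```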
So I think that's pretty cool. That's really awesome, and we can test it out. Let's just send a quick hi. Hmm: "the request operation is unsupported." Let's see. Let's try reloading. Let's restart. I'll confirm again that my model is deployed. Yes, it is deployed. Let me confirm that the endpoint is okay, and let me paste the API key one more time. Let's see. All right, let's try it one more time. Not sure why that didn't work. Hello. Okay. "The request operation is unsupported." Okay. All right.
All right. So, let's see what might be wrong. Let's just exit that. We've passed in our provider type. You can look at the example: we need to pass in the provider type and our base URL. Let's double-check that we have that correctly. That is our base URL. Okay, let's confirm I'm getting the right one. All right. Interesting. The key is the right key, and GPT 5.4 Pro should be the name of the model. Let's see. So, GPT 5.4 Pro. Yeah. Looks like I have configured everything correctly.
GPT 5.4 Pro, then copilot. So I'm not sure why this isn't working. One last try. Welcome, welcome, Alan. What we're trying to do here is configure Copilot CLI to work with a model that's hosted on a different provider. So not just models from your Copilot subscription; you can bring in your own key and use your different model providers. So let me see. Okay, so I'm connected to GPT 5.4 Pro. There's something I am missing, and again, okay, "the request operation isn't supported." So why don't we try a different model, still hosted on the same service.
I'll look under my models. I have an o3 model that is already deployed, so let's just try that. Let's see if we can swap out the model and then retest it, see if it's going to work. So I'm going to copy our configuration here. This is what we are configuring. I'm going to change the model to o3. Sorry about this. o3. Okay, let's give it one last try. I could have sworn that this worked earlier when I was testing it, but let's give it one last try. And I see that I am now connected. Let's send a hi.
Okay. Yeah. So this one seems to work. It could be that the model I deployed either was not successfully deployed, or there's some other reason. Let's see. Our o3 model is accessible: we have sent in a request from Copilot CLI, and the model sitting on Foundry has given us a response. The model we're using right now is o3; you can see that at the bottom here. I'm not sure if this is visible, so let me zoom in a bit. The model is o3. So, if you check on Foundry, yeah, I'm not sure why our GPT 5.4 Pro model didn't work.
That's okay. Let's just proceed with o3. Jay was giving us a suggestion: the /v1 on the base URL is not required. Yes; for Ollama I believe the trailing /v1 is required, but when using Azure OpenAI you do not need it. However, you'll see that to bring in any OpenAI-compatible endpoint or to connect to Ollama, you do need that trailing /v1. So thank you, Jay. I believe the issue is with the model, because we've just swapped out the model.
So instead of using GPT 5.4 Pro, we'll now use o3 and see how far we're going to go with it. Okay. Any questions? Anyone lost? Okay. So, what we'll do is this: I have an application running here. Let me zoom out just a bit. Let me start this application, and then let's try to assign a task to o3, which is running on Microsoft Foundry. I'm going to run this application with npm run dev. Then I'll open it using the integrated browser.
I'll go to localhost:3000. Okay, we need 3001, right? So this is the application we have running here, and I have a feature in mind that we can assign to the o3 model to see how it's able to handle it. As you can see, we have a recently added shopping cart page. But if I go back on the homepage and try to click on the cart icon for each of the items, that feature is yet to be implemented. So that's the feature we'll try to assign to o3, our model, and then see if it's able to work on it.
Okay. So I'll switch back to my instance. This is Copilot, and I'm going to ask it: can you add the add-to-cart functionality on the homepage, in the popular products section? Okay. Let's hit enter. Again, remember, this is a model running on Foundry that we are now connecting to from the Copilot CLI. For those who have just joined, we're trying to test out the new bring-your-own-key functionality in Copilot CLI. So, we'll see that it's able to pick up the task.
It's going through our code. So, that's fine. It's able to use our codebase right here as context, so that's a good thing. Let's wait for it, and then we'll see how far it's able to go with this task. A disclaimer I'll mention here is that we probably already have the best models for coding available through a Copilot subscription. In this case, I have not used o3 for coding scenarios, so I'm not sure how good of a job it's going to do, but let's just test it and see how far we're able to get.
Okay, so I'm going to look at the chat, see if we have any questions. Okay, Raza is asking me to come a bit closer to the camera. Not sure; let me try zooming in. That's better. So, it's currently working, and we'll probably give it some time. What we can do in the meantime: I know that on Foundry, you're able to monitor the token usage. So let's just open our model, switch over to the Monitor tab, and right here we should see some metrics.
Okay, you should see some metrics. Now, for that single job we've given it, it did not consume this many tokens. This is an aggregation, because I tested this yesterday. You can see I tried to test this functionality yesterday and I used the same model; that's why you're seeing the token count being a bit high. Ideally you can get these usage metrics directly in the Copilot CLI, but if your models are hosted on Foundry, you're also able to get them through this Monitor dashboard, where you have a really nice UI with a breakdown of the metrics in terms of how your model is interacting with your application: tokens in, tokens out, etc.
So that's also something I wanted to point out in case you're bringing in models from Azure OpenAI: you have both options in terms of how you can monitor your token usage. So let's switch back. I can see it's still working. Let's allow it to edit our code files. We can see it's already writing the implementation. I would say it's quite slow, a bit slower than what I'm used to working with GitHub models, but let's see if it's able to get the job done. That's the key thing, right?
Yeah. So it's working through the errors that it's facing. As you can see, it's actually interacting with the live application right here in VS Code. That's also a good thing: instead of spinning up an external browser, it's able to interact with it using the integrated browser. So that's a pretty nice feature. Let's give it some time to work. It's updating our code files. Let me minimize that and try to test it now. So, it's not yet working. Let's give it some bit of time to keep working.
Okay, any questions? Any reactions? Has anyone tried bring your own key with Copilot CLI? I know this has just been announced this week, but probably someone has already tried it out. Let us know what your experience has been like, and if you have any questions. Okay. So: added full cart context with local storage persistence. All right. Next step, the button's on-click. Okay. Let me tell it to continue. Again, this is slightly different from what I'm used to with GitHub-hosted models, which sort of just take up the entire task. But let's just see how far we're able to push it.
But in the meantime, let's go back to the changelog. We have tested using a different model provider, Azure OpenAI. But you can also use this with models that are running locally on your computer. Think Foundry Local. So why don't we go ahead and test that too and see how it works. In this case, I have Ollama installed, so that's probably what we're going to use. I'm going to open a new terminal as we allow Copilot to continue working on that feature. Oh wait, it says it's already done.
Okay, let's just test it and see if it's working. All right, so not yet. The add-to-cart button isn't working yet. Okay. So, let's open up a new terminal instance. Now we're going to do something different. Okay. So Jay is asking a question: can we do a quick comparison of the token usage shown in Copilot CLI and on the Azure Foundry monitor page? Of course, we can do that. Let me see if I can pull usage.
So we do have some usage metrics here. It says 13.6k out and 1.5M in. So let's switch over to Foundry. And again, as I said, I've used this model before. The reason I was trying to use GPT 5.4 Pro was because I hadn't tested that model, so it would have given us a fresh token count; the metrics would have started from a base of zero. But for o3, the session we've just picked up, the base wasn't zero. The base wasn't just the new job that we've given it.
But what I can try to do is look at these metrics for today. Okay, let me see if we can get a better view. Okay. This is still consolidated, a total token usage trend for the two days, so I'm trying to see if I can split it. Okay. Yeah. Now, okay. So, we can filter. Let's do this. That doesn't seem right. All right. Let's see if this gives us the correct metrics, because yesterday, after testing this and working with o3 for the very first time, I did get the exact token count on this dashboard as well as from /usage in the CLI.
So I'm just trying to ensure that I narrow down to today's usage, and then we can compare the two. So this should be today. Yeah. All right. So we have some metrics here: 1.8 million tokens in. Let me pull up usage again while it works. You can see we have 1.8 million tokens in and 17.4K out. And if we compare that with what we are seeing here, it's the exact same thing: 1.8 million input tokens and 17.4K output tokens.
So the token metrics are the same both on the Foundry dashboard and in the /usage metrics in Copilot CLI. So Jay, that's a good question, and I hope that answers it. And again, as I said at the top here, we're not seeing the same numbers overall because this is an aggregation, a consolidation from the tests that I did previously. Let me know if that answers your question. Good question. But yeah, what I also wanted us to try: we've now tested with a different provider. Let's see if we can run a model locally using Ollama and then connect Copilot CLI to use that.
Okay, so the first thing we need to do is confirm that Ollama is running. I'm going to do that using this command. Let me push this a little bit. Okay. This command just confirms that Ollama is running, and it confirms the port, because we're going to configure that as the endpoint. So Ollama is running, and I can also list the models that I have; one of them is Gemma 4 latest. This is something we can just test out and see if we can run a Copilot CLI instance that's powered by this model.
Right? So let's go ahead and test that. Now, for the configuration, let me just use copilot help providers. This is a very useful command, actually. In case you're stuck and don't know what you need to configure, you can run copilot help providers at any point and get this quick cheat sheet of what to configure for each provider. Now we want to run this using Ollama, and we need to configure two things: the base URL, and we've confirmed that ours actually works, and the model name.
So that's exactly what we're going to do. Again, I have this already prepared, so I'm going to paste it in: our base URL is localhost:11434. And Jay, to your previous point, we do have the trailing /v1 on this endpoint; that's needed here. Then for the model, we've seen that one of the models running on my machine is Gemma 4 latest, so I'm just going to configure that. And then I'm going to open a new Copilot instance. Okay. This Copilot instance, as you can see, is now running and powered by an Ollama model.
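The local setup here might look roughly like the following, with the same caveat as before: the variable names are illustrative placeholders, and copilot help providers prints the authoritative ones for your CLI version.

```bash
ollama list                                    # confirm which local models are available
export COPILOT_PROVIDER_TYPE="openai"          # Ollama exposes an OpenAI-compatible API (assumed name)
export COPILOT_PROVIDER_BASE_URL="http://localhost:11434/v1"   # trailing /v1 required for Ollama
export COPILOT_MODEL="gemma4:latest"           # placeholder; use whatever `ollama list` shows
copilot                                        # this session now runs against the local model
```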
So if I lost my internet for some reason, I should still be able to work with Copilot CLI using this model. This is really useful for offline scenarios, or, if you just want to save a little cost versus the paid models, you can easily find a model that does a decent enough job and run Copilot with it. So again, I'll just test it. Send a quick hi. It's slower; this depends on your computer. So mine, I don't know if I'm proud of my PC right now, but yeah, it's going to be a slow experience, as you can already imagine.
But this is a model that's running locally on my computer. It's not talking to my GitHub subscription, and I don't have to be authenticated to use it. So, that's a pretty interesting feature. Okay, let me look at the chat, see if we have any questions. So now we have two sessions running. One is using a model that's hosted on Foundry; let's look at the progress so far. We had given it the task of adding the cart functionality to our homepage. It has not yet successfully handled that task, and it looks like it introduced an error. So for now, I'm going to stick with GitHub models. For simpler tasks it may be fine, but I would of course not assign very heavy tasks to these models; again, my go-to models are GitHub-hosted models. But I'm hoping from this demonstration you've seen how you can pull in models from different providers.
So you're not just restricted to the models you get as part of your subscription. If you have models on OpenAI or Anthropic, it's a similar path, and then you can power your CLI sessions using those models. Okay. For now, let's switch back to the instance that is running with an Ollama model, Gemma 4 latest. We are getting a response here. Again, this is a local model, so I will not give it a heavy task. In this case, I'm just going to ask it something like: what factors should I consider when choosing a database?
So again, the response that we get here I expect to be from a model that's running locally on my PC. Okay. Let's see, that's what I had planned to show. We've looked at how you can power Copilot CLI using your own providers. We've seen how you can use both the /usage feature in Copilot CLI and the Monitor feature on Microsoft Foundry to understand your token consumption. And we've also tested how you can configure this to run with models on your own PC.
If you're using Foundry Local, I believe it's a similar path, but I'm yet to test it out; if you have tested it, please let us know in the comments. And one last thing that I wanted us to talk about before we wrap up is this recent announcement about the Rubber Duck agent. This to me is a very interesting announcement from the GitHub research team. The whole idea is that Copilot CLI, now in experimental mode, can invoke a second opinion from a model within a different model family.
So let's just read through this announcement to see what it entails. Okay, the announcement is that we have a Rubber Duck agent in experimental mode. I will briefly show you what that looks like. The whole idea is that when you spin up a task with Copilot CLI, say you generate a plan to add a new feature or do some bit of research, and let's say you're using a Claude model, Claude Sonnet 4.5, then the Rubber Duck agent is a built-in review agent that offers a second opinion to that primary model. So let's just read what we see here.
"Rubber Duck leverages a second model from a different AI family to act as an independent reviewer, assessing the agent's plans and work at the moments where feedback matters most." So if my base model is, say, Sonnet 4.5, then the Rubber Duck agent is going to review Sonnet's output using a GPT 5.4 model, which is from a different model family: that's Anthropic versus OpenAI. And I believe vice versa should also be the same. This is still in experimental mode, so you'll have to switch over to experimental mode in the CLI to work with the Rubber Duck agent.
And why is this important? This statement here makes the most sense: to catch different kinds of errors, a different perspective matters. Their evaluations show that Claude plus Rubber Duck makes up about 75% of the performance gap between Sonnet and Opus alone. Let me just talk more about that statement. You already know that you get different charges for Sonnet and Opus: Opus is about 3x, and Sonnet is 1x. So what this means is, if you use a combination of Sonnet plus Rubber Duck, which will use an OpenAI model, you basically get an output that's very close to what you would get with Opus.
Okay, so that I think is also a very big deal. Let's also read a bit more about what this means. From the research team: they discovered that if you ask a model to review its own output, its own training biases will still kick in. So if I perform a task with Claude Sonnet 4.5 and then ask the same model to review that output, it's still going to be bounded by its own training biases. In this case, instead of just using that self-reflection technique, we're bringing in a review agent, the Rubber Duck, that works with a model from a different model family.
So I hope that makes sense. That is something that's quite exciting. Rubber Duck basically just adds a second perspective. In this case, when you have selected a Claude model from the model picker to use as your orchestrator, Rubber Duck will be using GPT 5.4. So that is what you can try today in experimental mode. And checking back on the earlier demo, we can see we got an output here from our Ollama model. So again, if you have use cases where you want to use Copilot while not connected to the internet, or you just want to use locally running models, this is something that is now supported.
So let's try a new instance, and I'm going to open Copilot. Let's run the CLI, and in this case I am using my Copilot subscription. I'm going to confirm that I have experimental mode set to on; this is a required step if you want to test out the Rubber Duck agent. So I have toggled that over to on. And for the task I'm going to do: our application right here, again. We have so many running tabs.
Let me close this. Yeah. So let's open that application again; that is localhost:3001. Okay. I killed the terminal, so let me restart the app. Let's see. Okay. For this application, we do have a chat functionality, but it's not yet fully built out. This is just the UI; it's going to give us some hardcoded responses. The agent behind this experience is yet to be built. So, what I want to do right now, in my new Copilot instance, is switch to Sonnet 4.6. Okay, so that's charged at 1x.
I'm going to use the medium reasoning level, and I'm going to ask it in research mode, so I'll use /research. I'm going to ask the agent to do deep research on agent development with LangChain, and I'm going to pass in a link to the LangChain documentation. So here I expect the CLI, working with Claude Sonnet 4.6, to do some deep research on how to build agents using LangChain. I'm going to approve this permanently. It's going to go and get as much information as it possibly can about how to build agents using LangChain and LangGraph, and then it's going to give us a research report.
It's going to create a research document that we can then hand over to the plan agent to create a plan for enhancing that application with a customer service agent. So that's the scenario we want to test, and I want to see if, on its own, Copilot is going to ask for some help from the Rubber Duck agent; if it doesn't, we can manually ask for that. So if I go back to the announcement, you see here we have this section: when does Rubber Duck activate? Number one, it doesn't activate at every single stage of using the agent, but only at key points where feedback really matters.
So which are these key points? The first one is after drafting a plan. You're working with an agent, it drafts a plan, and before implementing that plan, it can ask for a second opinion: the Rubber Duck agent comes in and reviews the plan. Now, if the plan was built using Sonnet 4.6, the review agent is going to use GPT 5.4. That's the whole idea. So let's see. It's still doing our research; let's give it a while. So that's the first place where the Rubber Duck agent kicks in: after a plan has been drafted, it's going to review that plan.
Number two is after a complex implementation. After something really big has been implemented, the primary agent might call on the Rubber Duck agent to do a quick review, to put a new and fresh set of eyes on that implementation. This will overall improve the quality of the output you end up getting. And the third place where the Rubber Duck agent activates is after writing tests, but before executing them. You see, these agents are notorious for writing tests that are designed to pass.
So in this case, after writing the tests and before executing them, the Rubber Duck agent can come in, and this is a chance for it to catch any gaps that might have been left intentionally by these models and address them. Again, the whole goal is to have that critique at the stages where it matters the most, and use it to improve the output you get overall. We see that our Sonnet model here is still deep in the research; it's just trying to get as much context as it can about this framework.
And this will make it easier when we hand this over to our plan agent; it will know exactly the approach it can take, the tools it can use, the frameworks, dependencies, etc. All right. We see that this is really extensive research, so we'll just give it some minutes to finish up. Looking at the chat, I'm not seeing any new questions. Yes, I agree with you: it's time to try new things. These are announcements that have just gone out this week, so it's your chance to see them in action, see what works, see what doesn't work, and what can be improved.
So I do agree with you. This is where we test out new features and new things. All right, any comments, any questions? We're waiting for the research to complete. The output from this step is basically a research document that has all the relevant information on how you can build agents using LangChain and LangGraph. Let's see if there's anything interesting about this feature that we can talk about. Yeah, so it's in experimental mode. If you want to test this out, assuming you have Copilot CLI installed, you just need to use the /experimental command and toggle that to on.
Right. So, you're going to do that. All right. Sharifa, interesting comment: "this is exactly what I'm interested in as part of my research." Amazing. I hope you get to try it out; share your feedback, your thoughts, and your suggestions on how this can be improved. Speaking of research: if you want a step before plan, we do have the usual plan mode that we're used to, but before you even get to planning, you can start with a research step. We have a built-in feature for that: you just use /research and put in the topic you want the agent to do thorough research on, and it's going to give you its research findings. That better positions the planning agent to work with the best possible context for whatever it is you're trying to implement.
So that has been part of my workflow: before even adding a new feature, I do some research on approaches, tools, frameworks, and services, then I pass those research findings into the plan agent, and that feeds into my normal workflow. If you haven't tried it, that's something you can look into. Okay. All right. So, it's still doing its own research, looking at tutorials and code snippets, so I expect it to come back with a pretty detailed research report. Any questions as we wait for the agent to work?
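For anyone following along at home, the in-session commands used across this segment boil down to the sketch below. /experimental, /research, and /usage are called out explicitly on the stream; treat anything beyond those, such as the exact model-picker command, as an assumption and check the CLI's built-in help.

```bash
copilot                    # start a session on your Copilot subscription
# Then, inside the interactive session:
#   /experimental                    -> toggle experimental mode on (required for Rubber Duck)
#   /research <topic and docs URL>   -> produce a research report before planning
#   /usage                           -> show token usage for the current session
```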
Yeah, I believe that's what I wanted to cover in this live stream. And let me just share this on the chat in case any of you want to try out the Rubber Duck agent. Oh, that's a very long link. Okay, it's on the chat. Yeah, pretty cool. Right up here, it has more details about how it works, when it's activated, and the results from testing it. As you can see, the result you get with Sonnet plus Rubber Duck is very similar to what you would get using Opus, which is a considerably better model.
So that should be exciting for most people. Okay. It's really busy here trying to understand how LangChain works. Just looking through the chat: does anyone have any questions? Even if it's not directly related to what we have covered today; it could be something you're just curious about. Okay. So, let's see. Novoa is asking, "What's the MCP name?" What do you mean? Which MCP are you referring to, for the workflow? If you're talking about the Rubber Duck agent, it's not an MCP server.
It's a built-in agent. How it works is, remember, with Copilot you can configure custom agents, and while Copilot is running it can invoke different sub-agents. It's the same model here. Rubber Duck is not offered as an MCP server; it's basically a review agent that sits within Copilot CLI, and right now, to use it, you have to switch on experimental mode because it is still in its experimental stage. So just clarify what you meant by "what's the MCP name," because so far we have not configured any MCP server for any of the demos we've done.
So, Nav, you could probably clarify if that's what you are asking about. Okay. We can see it has done really comprehensive research, and it's now compiling the findings and generating a report for us. That's good. I expect to have a research document here. So far I have not seen the Rubber Duck agent invoked by Copilot CLI; we'll see if it does that at the end, and if it doesn't, we'll manually ask for a review from the Rubber Duck agent. Right now it is pulling the findings together into a report.
All right. Thank you for the message. Yes, so Rubber Duck is not an MCP server; it is a built-in agent within Copilot CLI. Okay. It should be wrapping up the report any minute now. Any more questions? We are just about to wrap up the stream, but any questions, even ones not related to today's demos? A question about another recent announcement? Yep. Okay. So, has anyone tried bring your own key with Copilot CLI? I haven't seen anyone on the chat who's tried it before, but has anyone tried that?
Not sure if it's my internet or the agent is slow on its own, but hopefully within the next few seconds it's going to wrap up the report. Okay, so Sally responds and says, "I haven't tried bring your own key because I don't have any other keys." That's not a bad thing, because the good thing with a Copilot subscription is you easily get access to Codex, you get access to the Claude agent, and you also get access to some of the best coding models out there. So you don't always have to sign up with different providers.
A Copilot subscription in most cases is more than enough, so that's not entirely a bad thing. You only use Copilot from your own GitHub account; I believe that's actually more than enough. But if you ever have scenarios where you want to test out open-source local models, that option is available. It's a supported feature you can always try out. All right. There's a question here: in what types of more specific processes can we make more effective use of this agent? I will interpret "this agent" to mean the Rubber Duck agent that we are currently talking about.
So, my thinking here is that this agent comes in to provide that second opinion. In most cases, you'll find that an agent will build a plan, but it has its own limitations: there are gaps it has not considered, or approaches it completely missed. So think of it this way: you have your primary agent, let's say Claude Sonnet 4.6, and it has its own limitations. It builds out a plan, and remember how these agentic workflows work: everything that comes after uses that plan as the foundation, the base.
So if the plan was wrong, or it included an approach that wasn't the best for the feature or whatever you're building, then you can imagine that every single step that follows, the implementation, the testing, will be based on that initial plan. So a case where this agent can really help improve your process is before you even implement that plan. Remember, the Rubber Duck agent works with a model from a different family. The idea here is that if you're using Sonnet 4.6 as your primary model, it has its own limitations, but these limitations might be strengths for a model from OpenAI, say GPT 5.4.
So with this combination, you're actually canceling out those limitations: you're combining the strengths of Claude with the strengths of GPT 5.4. That's how I look at it. It's bringing in a review not from the same model, not even from a model within the same family, but from a model in a different family with a different set of features, capabilities, and strengths. This way it's going to do a better job of providing a critique of the plan, the implementation, testing approaches, etc. So you can just think of it as a performance boost as you work with agents within the CLI.
So, Nav, just let me know if that's a good enough answer, but that's how I can see this agent being really effective: providing that second opinion, that fresh perspective, bringing in strengths that my primary model might not necessarily have. So, this has taken longer than I expected. The only remaining step is to see the Rubber Duck agent in action, providing its own critique of this report that our Sonnet 4.6 model has created. So I'm just stalling long enough to get to see that in action.
But yeah, hopefully it should be wrapping up. Good questions there. Probably time for one last question. All right. You're welcome. And I hope that you find more and more use cases. Again, this is still in experimental mode, so I'm pretty sure the team is looking for feedback and initial reactions from the community: whether this is helpful, and whether you see a significant boost when you're using models that are not considered the most performant. If you see an increase or a boost in the output, then that could be thanks to this additional step.
Okay. So, it's finally writing our research report. That to me was a very comprehensive research run. My assumption here is that if we pass this to a plan agent, it's working with the most up-to-date, most accurate information about LangChain, so it will have everything it needs to know about how to integrate that agent functionality into my web application. All right. Waiting for it to finish working on that file. As output, we should get a research.md file, which will be a consolidation of the findings from this research.
Okay, Joseph: "so the same can be reached if I pass the output of model X to model Y." Absolutely. You can choose to do that manually, and frankly speaking, that has been the default workflow for most people: you have an agent produce an output and then you manually pass it on to a different model. You can easily do that. But now, with Rubber Duck, as you can see, it's built in. You might never even know when Rubber Duck is invoked, because it's part of the workflow as the agent is working.
As soon as it creates a plan, it just invokes Rubber Duck to get a critique, a review, and then it can improve based on that review. And something else that I forgot to mention: there's this statement here that the agent can also seek a critique reactively if it gets stuck in a loop or can't make progress; consulting Rubber Duck can break the logjam. If you were doing this manually, it might be difficult to spot exactly the point where the agent gets stuck. But from this statement, we see that with Rubber Duck built into Copilot CLI, an agent can be working on a task and just get stuck.
Perhaps the context got very large and the model at some point was unable to get out of a loop; then it can reactively call on the Rubber Duck agent, which comes in and helps untangle that mess, if you wish. So this is another useful case. To your point, Joseph: yes, you can do this manually, but working with the Rubber Duck agent gives you more than just the predictable stages where you could manually invoke a review. Whichever model is at work, if it feels like it's stuck or needs some critique to help unblock it, the Rubber Duck agent is always there.
So yeah, that's something I had forgotten to mention, but it's also a good way to think about how practical using the Rubber Duck agent can be. Okay, so let's see, do we have our research ready? Okay, our research is ready. Let me share the file; let's save it as research.md. With that command, I'm simply pushing that research into my codebase. If I open this, I do expect to see a very deep research report. This is really thorough; it should capture all the key details about building agents using LangChain.
What I would normally do is pass this over to the plan agent, but in the interest of time, I'm just going to ask for a review: can I get a review from Rubber Duck on this report? Okay, hit enter. Hopefully, this won't take forever. Hopefully the Rubber Duck review agent is available, and our model should be able to use it to review these research findings. Okay. I do suspect that my internet might be the issue, because we had to wait so long to get that research report created.
All right. Sharifa, you're welcome; thank you for joining the stream. This went way longer than we normally have this stream for. "Yes, this is helpful. Specificity is key when using Rubber Duck for research, to prevent bottlenecks that cause output delays." 100%. You wouldn't want to be introducing bottlenecks. That's why this particular article does mention that it's not designed to have Rubber Duck invoked at every single stage. You can see here that it's an intentional design choice where the agent invokes Rubber Duck sparingly.
I believe this speaks to your point, Sharifa, there on the chat, about not introducing bottlenecks into the process but rather making it effective. All right. So I'm going to stop this. This is painfully slow, but let's just restart it. Okay, let's give it one more try. I'm just looking for that confirmation that I have passed this research along to the Rubber Duck agent, so the critique from that agent can hopefully be factored into refining it. Let me start a new question. All right.
Yeah. So, we see it's responding. It says that the user wants a Rubber Duck review of the research report: "let me launch the Rubber Duck agent to review it." So, in this case, our primary agent is Claude Sonnet 4.6, and we expect the Rubber Duck agent to use GPT 5.4 as the model to do this review. It's going to combine the best of both worlds. Okay. Yeah. So that's it. I believe we can stop at this point. I was just waiting for that last confirmation that we've seen the agent invoke the Rubber Duck agent to review our research.
The expectation here is that the Rubber Duck agent will do its own review and compare against the findings from Sonnet 4.6. If it can improve on something, it's going to collaborate with our primary agent to make our research even better. So yeah, try that out today. I have shared the link on the chat, and you can also find it on the GitHub blog. Give it a try, and let us know if it's helpful, if it's useful, if it improves the performance you get working with Copilot CLI. And thank you all so much for joining the stream.
I hope to see you all next week. We do have, I believe, one or two more Rubber Duck Thursday sessions today in different time zones, so if you are available for those, you can always hop in and probably focus on other topics with different colleagues. Okay. With that, we will end here today, but I hope to see you all on the next one. Thank you so much, and enjoy the rest of your day. Bye.