Microsoft Build 2026 Day 2 LIVE | GitHub Copilot, VS Code, and more
Hosts Burke and Pierce introduce day two at Build, outline sessions, and preview streaming of Windows topics and breakout activities.
A fast-paced tour of GitHub Copilot, VS Code, and AI tooling at Build 2026 Day 2, with live demos, new models, and hands-on agent workflows.
Summary
GitHub’s Day 2 coverage at Microsoft Build 2026 brought Burke and Pierce guiding viewers through the latest in VS Code, Copilot, and the Copilot CLI/SDK ecosystem. Pierce explains the Copilot app as an agent-centric desktop experience that coordinates multiple tasks and agents, and Burke teases a live “vibe code” challenge and a Q&A on the new MAI coding model. Addy Osmani from Google joins to discuss agent skills, the SDLC-aligned approach to skills, and the practicalities of ship-ready prompts. Patrick Nicoletic walks us through the Copilot SDK/CLI, its multi-language support (Node, Python, Go, Rust, Java), and the shift toward a Rust-based runtime for better performance. The session also features deep dives into agent orchestration, model selection (MCP auto, Gemini 3.5/5.5 family), browser automation in the VS Code integrated browser, and real-world workflows for PMs and designers who now contribute code and PRs. The show highlights live coding on stage, hands-on demos with the new toolchains, and the evolving role of designers and product managers in an AI-augmented development era. Throughout, speakers emphasize trying newer models, giving feedback, and embracing “agentic engineering” as a spectrum between vibe coding and disciplined automation.
Key Takeaways
- GitHub Copilot evolves from a coding assistant to an integrated agent ecosystem, with the Copilot app, Copilot CLI, and Copilot SDK enabling multi-agent workflows across editors and apps.
- The SDK enables packaging the Copilot agent runtime into any app, with multi-language support (Node, Python, Go, Rust, Java, and more) and a future Rust-native runtime for smaller footprints and better performance.
- Agent skills (from Addy’s talk) are designed around the software development lifecycle (define, plan, build, verify, review, ship) and load on demand, enabling teams to reuse proven workflows across projects.
- MAI code flash is a dedicated coding model built from scratch for coding tasks, trained with real GitHub Copilot usage data, and designed for fast, context-aware code generation in VS Code.
- The Copilot ecosystem now supports model selection automation (auto) and smarter orchestration (MCP/app-wide model pickers), reducing token cost and latency while preserving developer control.
- The integrated browser in VS Code, plus the agent window and cloud/remote agent options, unlocks practical on-device and remote testing, including Playwright integration and DOM-level insights.
- Foundry Toolbox introduces tool governance and tool search to manage many tools behind a single MCP endpoint, addressing authentication, context window bloat, and policy enforcement for enterprise-scale AI.
Who Is This For?
Essential viewing for senior developers, engineering managers, and product designers who want to understand how AI agents are being embedded into the VS Code ecosystem, Copilot tooling, and enterprise workflows. Great for teams piloting Copilot SDK/CLI integration and those evaluating multi-model, multi-harness AI strategies.
Notable Quotes
"The code is intentionally backgrounded... the whole point of the app is you can stay outcome focused."
- Pierce explains the Copilot app’s focus on outcomes and the idea of keeping the code out of sight to stay outcome-driven.
"It is the first coding model we’ve built from scratch here at Microsoft."
- Pierce highlights MAI code flash as Microsoft’s ground-up coding model.
"Skills load on demand. So the moment a task arrives, it matches with a skill and loads its steps."
- Addy describes the on-demand loading and reuse of agent skills in the Google/AI agent talk.
"The harness is the system your AI is running in… the agent loop and how we call it."
- Patrick Nicoletic explains the concept of harnesses and the agent loop during the Copilot CLI SDK demo.
"There is no right or wrong way… it’s just different ways of working."
- A candid reflection from Burke/Addy on workflow diversity in agentic engineering.
Questions This Video Answers
- How does GitHub Copilot’s SDK enable custom AI workflows in VS Code?
- What is the difference between using Copilot CLI vs the Copilot app for day-to-day tasks?
- What are agent skills and how do they map to the software development lifecycle?
- What is MAI code flash and how does it compare to previous coding models like Opus or Haiku?
- How can enterprises use Foundry Toolbox to govern tools for AI agents?
Microsoft Build 2026 Day 2, GitHub Copilot app, VS Code, Copilot CLI, Copilot SDK, Agent skills, MAI code flash, Gemini models, MCP auto model selection, Playwright integration in VS Code embedded browser, Foundry Toolbox
Full Transcript
Welcome to day two of Microsoft Build. How is everyone? Let's get the energy up. That was pretty good for first thing in the morning on day two. I'm not going to do the thing where I'm like, "Do better." I was asking. Anyway, I'm Burke. This is my colleague Pierce. We were just talking about, like, what is your title? What do you do here? What is it that you say that you do here? Uh, so I work on the VS Code team, but I also work on all things GitHub Copilot: the GitHub Copilot app, the CLI, etc., etc.
So, things, things. And my name is Burke. I also work on Copilot CLI and Visual Studio Code. Yep. Uh, I do my best to make some YouTube videos from time to time. Uh, and otherwise just sort of hang out with Pierce. A couple of announcements for you today. Uh, so this right here is streaming Windows stuff after 2:15. So if you want to hear Windows stuff, this is where you definitely want to be. Um, hi to everyone watching here at Build. I guess that's y'all. Where? There's nobody. Yeah, that's just these people here. Okay. And then also the people watching online on our social handles, the VS Code social handles on YouTube and on X.
I guess we're streaming on X as well. Yep. Uh, GitHub, YouTube, Microsoft Developer, uh, Terminal Live, the live coding from the expo floor. Have y'all seen the live coding booth? It's like glass. It's like a fish tank with people inside. Can anyone go in there, or do you have to be... I think you have to be sanctioned. I'm going to be locked in there for 2 hours today. Are you really? What are you going to build? No idea. Huh. Interesting. We'll find out together. Anthony was in it last year. Didn't you build like a programming language or something?
Yeah, Anthony built a programming language in 10 minutes because that's what you can do now. Uh, I'm sure it works too. Everybody's using it. Uh, let's see. We have, from 10:15 to 11, the VS Code team's AI adoption story. That's breakout 204. Is that yours? That's my... Yeah, that's you and me and Josh from the VS Code team. So, the basic concept is like, okay, you use AI. Now what? There's a lot of things that break. I thought it was going to be about, like, our AI adoption story, like why we put AI in this.
No, no, no. It's literally like, our team uses AI now. Okay. Now what? Ah, gotcha. Interesting. Okay. And then we have the... Oh, then we have the Copilot game show that's going to be happening on the keynote stage. Yep. That's going to be a ton of fun. Uh, Kent C. Dodds is going... Do you have to say the C when you say Kent Dodds? I don't know. Where's Kent? Yeah. Anyway, he'll be there co-hosting. We're going to try to, live... We're gonna vibe code something in 45 minutes. See how far we can get. It's going to be a lot of fun.
Join us for that. Uh, and then we have monthly release live streams, y'all, on the VS Code channel if you want to keep up, because look, how many people have no ability to keep up with what's going on right now? Am I the only one? I work on the product and I don't know what's in it because they're shipping every 15 minutes. Those monthly release live streams are a really good way to keep up with everything that's going on. Um, you still won't know, but at least you'll know more than you do now, right? It's not possible to keep up with everything.
Uh, and then they want me to mention the blog. I'm not doing that. Okay. Um, opener. We just said all that. All right, Pierce, let's talk about the highlights from day one. Let's do it. So, first, let's talk about the GitHub Copilot app. It's this new, uh, thing. It's an app. It's not an editor. It's not a CLI. What is it exactly? It's an agent-native desktop experience for building things. Um, so, like, that is a very marketing answer, right? But, uh, I think we've all kind of used these session managers, right? Where you, you know, spin up some agents, you're running multiple things at once, they're isolated in Git worktrees, and so the app can do that too.
But really, like, when you think about your job, right, there's a lot of work that is around all that, right? Like deciding what issues to pick up, triaging your backlog. Um, once you actually do stuff with agents, okay, sure, you submit a PR, you have to get it across the finish line. The app helps with all that stuff as well. So, I've noticed this. I don't know. I'd just like to poll the audience, but how many of y'all are actually running multiple agents at one time? Okay, so several folks, like more than two, more than three.
Anybody out here running like 50 agents and your screen is just a bunch of terminal sessions or is that only on social media? Like no one is doing that. So I did have this question about like how many agents is feasible in your opinion inside of these apps to run? Like should we be running 50 agents at one time? What is expected of me now? What am I supposed to be able to do as a developer? I don't even know anymore. I I think there's a lot of performative productivity of like to your point people running 50 things at once, but it's just like always like when I sit down and write my performance review, it's like what impact did I deliver, right?
And if I can get there by running a couple things at once, great. I personally don't think I can go beyond three or four cuz I really start like losing track of what's actually going on and I have to batch the tasks together. I can't like constantly just be jumping in and out of things all day. So for me, I think the the max is three or four. But with the app, like even if you're only running one thing at a time, the interesting thing is like the code is intentionally backgrounded, right? We have an awesome editor, VS Code.
If you want to see the code, you go to that. But the whole point of the app is you can stay outcome focused. So there's, like, a browser, and you can click one-click run. And so the first kind of outcome check that you do after the agent is finished is you're actually checking to make sure: did it do the thing? Right. Right. And then secondarily, you know, how do I feel about this code? And so the app is more focused on that first problem than the second. Do you really ask yourself how you feel about the code? Or, it depends... you ask yourself that question, but then you're just like, "It works, I'm just not going to look." Because here's what I've noticed, y'all, with these new tools: it's like out of sight, out of mind.
You know what I mean? Like, if you don't see the code, then you don't get nerd sniped into worrying about it, and it just goes, and you're like, does it work? I don't know. This has been my experience. So, uh, the other thing we had... we got a lot of stuff on here. The SDK. Yep. Which, it's not obvious to me why I need a Copilot SDK. What is that for exactly? So in every single GitHub Copilot product there is this thing called the agent loop, and that's the prompts, tools, and context that actually power the experience you get in all these things.
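
As a rough illustration of the "agent loop" idea described here (prompts, tools, and context feeding each other), the following is a minimal sketch, not the Copilot SDK's API. Everything in it is an assumption made for illustration: `fake_model` stands in for a real model call, and `ToolCall`, `ModelReply`, and `run_agent_loop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class ModelReply:
    text: str
    tool_call: Optional[ToolCall] = None  # None means the model is finished


def fake_model(context: List[str]) -> ModelReply:
    """Hypothetical stand-in for a model: asks for a tool once, then answers."""
    if not any(line.startswith("tool:") for line in context):
        return ModelReply("I need to read the file first.",
                          ToolCall("read_file", "README.md"))
    return ModelReply("Done: the README has been summarized.")


def run_agent_loop(prompt: str,
                   tools: Dict[str, Callable[[str], str]],
                   model: Callable[[List[str]], ModelReply],
                   max_steps: int = 5) -> str:
    """Call the model, run any tool it requests, feed the result back, repeat."""
    context = [f"user: {prompt}"]
    for _ in range(max_steps):
        reply = model(context)
        context.append(f"assistant: {reply.text}")
        if reply.tool_call is None:
            return reply.text  # no tool requested, so this is the final answer
        result = tools[reply.tool_call.name](reply.tool_call.argument)
        context.append(f"tool: {result}")  # tool output becomes new context
    return "stopped: step limit reached"


# A single illustrative tool; a real harness would expose many.
TOOLS = {"read_file": lambda path: f"(contents of {path})"}
final_answer = run_agent_loop("Summarize the README", TOOLS, fake_model)
```

The step limit is a design choice in this sketch: it keeps a misbehaving model from looping forever.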
So, uh, traditionally those had all been many different things. Uh, and then we built GitHub Copilot CLI. And so basically what the SDK is, is taking that thing that we do a ton of offline evaluation on, and that we work with the labs on, to really make sure it's an awesome experience for everyone using GitHub Copilot. We take the core of that and it's an SDK, and so you can basically package that agent runtime from GitHub Copilot into any application. Yeah, it's very cool. And if you're into these, like, is anybody out here using OpenClaw or Hermes Agent, any one of these personal... only one hand?
Wow. Okay. So, yeah, that's been my experience as well, but one back there. So, if you have a Copilot sub, you can use the SDK and you can basically one-shot your own assistant just to play around with it and see how it works. I, uh, I encourage you to try that out there. All right, last question. You have 45 seconds. New coding model, MAI code one fast. Yeah. What is it? And is it true that it's better than Mythos? It is the first coding model we've built from scratch here at Microsoft.
So I'm very excited about that. Um, in the blog we compare it to Haiku. So it is kind of, you know, in that class of models that's built for fast, efficient work. Um, so a lot of the training philosophy for this model was: how do we get this to be really good in GitHub Copilot? So it's specifically trained on GitHub Copilot's tools and agent trajectories. So that should give it really, really good results when using it specifically in our harness inside of GitHub Copilot. And there's a whole bunch of other efficiency things that it's also trained on, like it's supposed to dynamically control its response depending on the prompt you give it.
So if you say "hey," it's not going to go on some diatribe, right? So it has, like, adaptive solution length training. Um, yeah, so it has all that going on. Um, so yeah, I'm super excited. I mean, it's the first model that we've shipped for coding here at Microsoft from scratch. We've had some fine-tunes we've shipped before. So is it really the first coding model? I think so. Maybe there was one before, like in the early days, but within recent memory, we had, like, Raptor Mini, but that was a GPT-5 mini fine-tune.
That's right. That's still that wasn't something that was built from scratch entirely. So, yeah, we're excited. Like, give it a try. Want to see your feedback. It's definitely like our first attempt in this space. So, like it's rolling out in Model Picker right now for VS Code developers. Uh you can also get it in auto. So, if you don't see it, like keep checking back. We're rolling it out slowly, but we'd love your feedback. Awesome. All right. Well, thank you so much, Pierce. Uh, round of applause for Pierce here. Please go see Pierce at his breakout session.
Uh, it's over there on the floor. I don't know what the number is anymore. There it is. All right. See you. Bye. See you. Thank you. Did you know you can use voice to text when prompting in VS Code? All you need is the VS Code speech extension and it's a game changer to be able to just directly speak to kick off your agents. I have this VS Code community contributor website that lists some of the top contributors for the latest VS Code release. It has a feature where you can give kudos to contributors. And I think that this could use some confetti.
To prompt this with just my voice, click the microphone icon in the chat input. "Add a confetti animation when a user gets their first kudos." We'll send that prompt off. The agent will get to work and in just a few moments, without typing a single key, we have that feature implemented. All right. All right. Am I Tom? Am I supposed to say we're back? We are back. We're back. All right, folks. As I was saying, uh, director at Google Cloud AI, bestselling author... Yes. Speaker on AI, developer experience, user experience: Addy Osmani. You're very kind.
Thank you for being here. Hi, folks. Nice to be here. So, I told you I was watching a Bloomberg piece about AI and you were in it, and I told my wife, I was like, "I know that guy." She's like, "No, you don't." I was like, "I do." She didn't believe me. She didn't believe you. No, it was probably for the best. Yeah. But Addy, I've known you for a long time. You've been writing and, uh, making videos and speaking since way back, uh, jQuery days. Oh my gosh. Doing the web days.
Yes. A lot of web performance stuff. And now a lot of AI agents and skills. Yes. Yes. A lot of agents. And today we, uh, talked about you coming on and talking about how you have a skills repo which was on the front page of Hacker News multiple times. I try not to look at Hacker News too much these days, 'cause you never know if it's going to be good or bad. Well, tell us about it. All right. Um, so today I wanted to talk to you about agent skills. And you know, a lot of people here probably know what they are.
Uh, but very quickly, I'm Addy. I work on a bunch of agent stuff at Google. So things like the Agent Development Kit, the agent platform, and we work on a number of skills-related projects at Google. So we've been thinking about skills for a while. If you're not really sure, um, what skills are, um, they're basically a standardized way to give AI agents new capabilities and expertise, um, and they've been a really hot topic over the last couple of months. Um, especially as... do you remember back in the days when people would, like, share their secret shell config, and there would be a lot of files, and they'd be like, hey, these are my magic tricks.
Yes, my dotfiles. These are my dotfiles that work really well for me. My magic tricks. This is my bag of tricks. And you're like, I want that bag of tricks. But then you think, well, my bag of tricks might be different than your bag of tricks. And so I think of skills in a very similar way. You know, sometimes you want to be inspired by other people's ones. Sometimes it's okay to just, you know, copy their bag of tricks. And so I think there's going to be some stuff in here that people can just copy and take for themselves.
And sometimes there are going to be things here that will be inspirational. Now, just in case, uh, skills are not something you have spent a lot of time on, I thought it'd be interesting to just give people a very quick primer. Um, and so I vibe coded this very quick, very silly, uh, Windows desktop and a browser. Um, and I thought that I'd drop Clippy in here because I haven't seen Clippy so much at Build. We're not... We're not allowed to mention Clippy. It's a very... Okay. Yeah, it's very... Well, there actually is a Clippy suit that you can wear, but if you wear it, you're not allowed to speak.
True story. Really? Those are the rules. Well, you didn't mention it. I did. So, so you totally broke us into jail. I broke you into jail. This is weird 'cause we have the macOS Dock down here, but then Windows above it. I like it. It's bringing the best of all... It's just... This is just terrible. Worst of all worlds. Okay, so if you're not really sure what skills are, basically they're a standardized way of packaging expertise and context that, you know, your agent may not necessarily have. You can pack them into workflows or instructions for your agent.
And you know, the structure of them is kind of this. Uh, you have a name, you have a description. There can be a lot more content in here, but basically the name and the description are the main pieces, um, that every single, uh, skill file is going to have. And these are repeatable. Typically, you're not going to have a skill for something that's one-off, because a one-off thing is going to be more like a prompt, but a skill is something that you or your team might be using on a reusable basis, and it might be useful across a few different projects.
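
To make the name-plus-description structure concrete, here is a minimal sketch of reading those two fields from a skill file. The talk only says every skill has a name and a description; the `---`-delimited `key: value` header used below, along with the names `SkillMetadata`, `parse_skill_header`, and `EXAMPLE_SKILL`, are assumptions for illustration rather than the project's documented format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillMetadata:
    name: str
    description: str


def parse_skill_header(text: str) -> SkillMetadata:
    """Read `key: value` pairs between the first two '---' lines (assumed layout)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' header block")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the header; the skill body follows
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    missing = {"name", "description"} - fields.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return SkillMetadata(fields["name"], fields["description"])


# Hypothetical skill file content, invented for this example.
EXAMPLE_SKILL = """\
---
name: test-driven-development
description: Write a failing test first, make it pass, then refactor.
---
# Overview
(the longer skill body would go here)
"""

skill = parse_skill_header(EXAMPLE_SKILL)
```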
Now, um, skills load on demand. So, you know, you have a task that arrives into your agent. It will match it with a skill. It will load its steps and then it will try to follow them. It doesn't necessarily need to know more than the name and the description. So, agent skills, as you mentioned, uh, this is a project that I created a while back. Um, and for some backstory, uh, I had been taking it a little bit slower on open source for a few years. You and I have been doing a lot of open source for a long time.
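
The on-demand loading described a moment ago (match on name and description, load the steps only when needed) can be sketched as follows. This is an illustrative assumption, not how any particular agent implements it: `LazySkill` and `match_skill` are hypothetical names, and the keyword-overlap matcher is a deliberately naive stand-in for letting the model choose.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LazySkill:
    name: str
    description: str
    load_steps: Callable[[], List[str]]  # only called once the skill is matched
    _steps: Optional[List[str]] = None

    def steps(self) -> List[str]:
        if self._steps is None:
            self._steps = self.load_steps()  # first use: load and cache
        return self._steps


def match_skill(task: str, skills: List[LazySkill]) -> Optional[LazySkill]:
    """Pick the skill whose name and description share the most words with the task."""
    task_words = set(task.lower().split())
    best, best_score = None, 0
    for skill in skills:
        text = f"{skill.name} {skill.description}".lower().replace("-", " ")
        score = len(task_words & set(text.split()))
        if score > best_score:
            best, best_score = skill, score
    return best


load_log: List[str] = []  # records which skill bodies were actually loaded


def load_tdd() -> List[str]:
    load_log.append("tdd")
    return ["Write a failing test", "Make it pass", "Refactor"]


def load_review() -> List[str]:
    load_log.append("review")
    return ["Read the diff", "Flag risks"]


skills = [
    LazySkill("test-driven-development", "write a failing test before the code", load_tdd),
    LazySkill("code-review", "review a diff for bugs and security issues", load_review),
]
chosen = match_skill("add a failing test for the streak counter", skills)
steps = chosen.steps()
```

Only the matched skill's steps are loaded; the other skill costs nothing beyond its name and description, which is the point of loading on demand.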
Long time, you know, a long time. As you know, you create an open source project, it's a lot like having a kid. You have to maintain it. And so, you have to be explicit about when you want to get an open source project out into the world. And it had been a while. So I thought, okay, I will get one of these things out there. I had seen that Garry Tan, uh, had put out GStack. That's right. Right. And I thought, well, you know, if Garry Tan is doing GStack and people are learning from it, there's nothing to stop me from putting my perspective on agent skills out into the world.
And so I started this project, um, and it got some traction, and I thought, you know, it'd be interesting to maybe walk folks through it. My perspective on agent skills is, I would say, just a little bit different to other folks. I ground my agent skills around the software development life cycle. Um, I lean very heavily into agentic engineering, AI-assisted engineering, that side of things, where you try to be very disciplined about, you know, I'm going to be explicit about what I want built. I'm going to verify what gets built.
I'm going to keep quality in mind. And so for my agent skills, I have a lot of steps that kind of map to the SDLC in here. So define, plan, build, verify, review, and ship. And for each of these different phases of the SDLC, there are a number of different skills that are inside this project um that that map in here. So for defining, we've got skills that can interview you to get clarity around an idea. If you have a very vague idea, so maybe you have, you know, you you've had an idea in the shower and you're like, well, I kind of know what the outcome is, but I actually don't know what the UI is going to look like or I don't know what the details are going to be.
You've got a skill here that can help refine it into something more concrete. You can do spec-driven development. We've got planning skills for turning things into tasks. We've got building skills that can help you sort of refine this to a place where you've got very good front-end UI, and your API and interface design is in a good place. And then sort of skills for a number of these other phases. For verification, that's really important. You know, I think you and I have done a lot in the browser over time, right? I used to work on Chrome.
I still build a lot of, you know, projects that will render in the browser. And so testing in the browser automatically with an agent these days is something that I care about. So I have a skill that will do that. Automated debugging is also part of this workflow. I've got a code review loop in here too. Um, that includes things like code simplification and security, and then I've got ship skills as well that will help you with everything from your CI to your Git workflow, your documentation, shipping, and launch. Now, there are a lot of phases in here.
I'm actually going to show you how these work, but there's basically a pattern that I use for all of my skills. There's an overview. I capture, like, when these things should be used. Um, I have rationalizations, like why should this be used and why should this not be used, so that the agent has a little bit of a guardrail that it can keep in mind. And then red flags as well, so that the agent knows if things are really going kind of off-kilter. And then verification. So how do we know that we actually did the steps right and did a good job?
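
The five-part pattern just described (overview, when to use, rationalizations, red flags, verification) lends itself to a simple lint check. The sketch below assumes the sections appear as Markdown-style `#` headings with exactly these titles, which is a guess at the layout rather than the project's documented one; `REQUIRED_SECTIONS`, `missing_sections`, and the draft text are hypothetical.

```python
from typing import List

# Section titles taken from the talk; the heading format is an assumption.
REQUIRED_SECTIONS = [
    "Overview",
    "When to use",
    "Rationalizations",
    "Red flags",
    "Verification",
]


def missing_sections(skill_text: str) -> List[str]:
    """Return the required sections that have no matching '#' heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in skill_text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


# Hypothetical draft skill that is missing two of the five sections.
DRAFT = """\
# Overview
Test the UI in a real browser.
# When to use
After any change that affects rendering.
# Red flags
The agent reports success without opening the page.
"""

gaps = missing_sections(DRAFT)
```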
Now, all that stuff just maps to commands as well. So you can explicitly just use, like, a slash command, um, in Copilot or in any of your agentic coding tools and trigger any of these phases. Now, you don't have to use, of course, my agent skills. As I was saying, they're a lot like your bag of magic tricks, your dotfiles. If there's something in here that's useful to you, copy it, steal it, you know, have it make you productive. If you find that the way that I do things is helpful, you know, you can go in and use it wholesale.
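
As a sketch of how slash commands could map onto the six phases named earlier (define, plan, build, verify, review, ship): only `/refine` is mentioned in the talk, so every other command name in the table below is hypothetical, as are `Phase`, `COMMANDS`, and `parse_command`.

```python
from enum import Enum
from typing import Tuple


class Phase(Enum):
    DEFINE = "define"
    PLAN = "plan"
    BUILD = "build"
    VERIFY = "verify"
    REVIEW = "review"
    SHIP = "ship"


# Only "/refine" comes from the talk; the other command names are invented
# for illustration.
COMMANDS = {
    "/refine": Phase.DEFINE,
    "/spec": Phase.DEFINE,
    "/plan": Phase.PLAN,
    "/build": Phase.BUILD,
    "/test": Phase.VERIFY,
    "/review": Phase.REVIEW,
    "/simplify": Phase.REVIEW,
    "/ship": Phase.SHIP,
}


def parse_command(chat_input: str) -> Tuple[Phase, str]:
    """Split '/command rest of prompt' into its phase and the free-text prompt."""
    command, _, rest = chat_input.strip().partition(" ")
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return COMMANDS[command], rest.strip()


phase, idea = parse_command(
    "/refine a habit tracker, GitHub inspired, store everything in the browser"
)
```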
But I wanted to just show you, um, a little bit of this in action. So, I've got, um, a scaffolded-out project here where I've installed the skills. We've got some agent files. We've got prompts for just mapping all of those skills. This is what happens when you just install it inside a project. And then we've got some of the skills themselves. And I'm over here inside, uh, sort of the Copilot sidebar. Um, I've selected Gemini 3.5 Flash. Um, I work at Google. I work on Gemini. So I'm going to be using Gemini 3.5 Flash here.
It's a great model for agentic coding and for agent tasks. Um, I believe that, you know, uh, VS Code has also got, like, an agents view these days as well, right? So all of these commands, you know, you can just type in a slash and you will see that there are additional commands populated here through using the agent skills, and you can use any of these. So things like refine, things like the code simplification, etc. But just so that we can see what's actually getting built out, I'm going to stick with using this view. Um, and we can start off with maybe just even doing, uh, a quick look at, uh, our skills, and I'm just going to show you test-driven development.
I was talking about, you know, maybe giving you a bit of an insight into what these files contain. Now, you'll see a lot of content in here. Um, I try to be very diligent about what ends up in context windows. So, again, if you feel that there is too much that is in here for your needs, you can always trim it back. You can always optimize it further, but I try to be very explicit. So for test-driven development, I capture the TDD cycle. I have a very explicit take on like what does it mean to have a failing test?
What does it mean to have a passing test? How should you think about refactoring? What are other patterns that I personally think work really well? Many of these patterns are ones that have been inspired by how we build software at scale at Google. And so I thought they'd be interesting to kind of, uh, encode inside these skills. So you'll see things like the test pyramid in here, decision guides. There's a number of best practices that are encoded in here as well. Lower down you'll also see that I capture things like test anti-patterns, things to explicitly avoid.
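
For readers new to the failing-test, passing-test, refactor cycle mentioned here, the following is a generic, minimal illustration. It is not taken from the skill file; the leap-year function and its test are a stand-in example chosen only because the expected answers are well known.

```python
# Step 1 (red): the test is written first. If it were run before is_leap_year
# existed, it would fail with a NameError, which is the "failing test".
def test_is_leap_year() -> None:
    assert is_leap_year(2024) is True    # divisible by 4
    assert is_leap_year(2023) is False   # not divisible by 4
    assert is_leap_year(1900) is False   # century not divisible by 400
    assert is_leap_year(2000) is True    # century divisible by 400


# Step 2 (green): the simplest implementation that makes the test pass.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Step 3 (refactor): with the test passing, the implementation can be
# restructured freely; re-running the test confirms behavior is unchanged.
test_is_leap_year()
```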
Um how to approach browser testing. So this is that connective tissue that can then take us back to things like browser testing, browser debugging. Um I tell it what to check. I tell it how to think about security boundaries. Um where testing is concerned. You can be as explicit or open-ended as you kind of want with these things. I tried to be explicit just because I think that there's value to being intentional with how you you approach these things. Now, what I thought would be interesting, um, I I've been someone that for a lot of my life has been trying to optimize my productivity.
I think I've probably failed at it a lot. Um, and so habit tracking is something that I look at now and then, right? I think a lot of people try to improve their habits and their systems. So, I thought it'd be interesting for us to try building um out a habit tracker. And maybe, you know, if you take a look at uh everybody's kind of looked at their GitHub contributions graph, maybe we have a habit tracker that uses that kind of visual to show you like how how well are you doing at following your habits over time.
So, what we're going to kind of start off with here is I'm just going to trigger the refine command. And the reason I'm going to do this is I have a raw idea. So, I want to build out a habit tracker, right? So, um, we're going to say just something like, uh, a habit tracker, uh, GitHub inspired, uh, and I don't really want a database backend. Maybe let's just have it all work in the browser. So, um, store everything in the browser. That's relatively vague. It's pretty vague. Are you going to say contribution graph, or are you going to leave that out?
Keep it. I'm just going to keep it GitHub inspired. I could say contribution graph, but that's being explicit, right? So, we're just going to go and trigger refine now. And so, it's analyzing our prompt parameters. It's analyzing the core components of the request. It's triggering the skill. You can see that it's been reading that skill in place. Now, the first thing that's popped up here is it's asking me clarifying questions. So, who is the primary target user? Great question. It doesn't know if this is just for me or I'm building this for a team or I'm building this for my family or my friends or what.
So the audience is going to be, let's say, a room of software engineers at Build, right? Or YouTube, because people are going to be watching this later. Just anyone, just anybody. What are the constraints and expectations for browser-only storage? Great question. You could be using IndexedDB, localStorage, sessionStorage, lots of different options. I'm going to keep it very simple. I'm okay with local storage, um, for this. "Beyond the standard contribution grid..." So here it was able to figure that out itself. "...what GitHub analogies do you want to explore, like committing a habit, branching variations, pull requests?"
I don't want to go too deep into these analogies. So I think let's keep it let's keep it simple with the contribution grid. Interesting that it's leaning into that. It's like it wants to make the whole thing about GitHub. And this is one of those things if you take a look at how that skill is defined, it actually has language around trying to begin converging on the direction the idea that you have uh leaning into like, okay, okay, well, I kind of get what you're going for. Let's keep going down that path. So, what does success look like for the MVP?
Because very often, if you define just a, uh, a high-level prompt, it doesn't know if you're trying to build out something for a startup or a side project at the weekend or what. So, I feel like, um, the performance requirements are good. Like, all of this stuff seems good. I don't necessarily need it to be offline-first with service workers just yet. So, um, load fast, clean UI. Uh, that's about it. Load fast and be responsive. Okay, so we've got all of these answers now. It didn't take us very long just to start getting clarity around this.
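
To picture the data side of the idea being refined here (a contribution-style grid backed by browser-only storage), the sketch below models it in Python as a stand-in for browser code. The JSON string plays the role of the value a page would keep in localStorage, which stores strings. The store layout, the Monday-first week alignment, and the names `toggle_day` and `build_grid` are all assumptions made for illustration, not the app built in the demo.

```python
import json
from datetime import date, timedelta
from typing import Dict, List


def toggle_day(store: Dict[str, List[str]], habit: str, day: date) -> None:
    """Mark a habit done for a day, or unmark it if it was already done."""
    days = store.setdefault(habit, [])
    key = day.isoformat()
    if key in days:
        days.remove(key)
    else:
        days.append(key)


def build_grid(store: Dict[str, List[str]], end: date, weeks: int = 4) -> List[List[int]]:
    """Return 7 rows (Monday to Sunday) by `weeks` columns of completion counts."""
    last_monday = end - timedelta(days=end.weekday())
    first_monday = last_monday - timedelta(weeks=weeks - 1)
    counts: Dict[str, int] = {}
    for days in store.values():
        for key in days:
            counts[key] = counts.get(key, 0) + 1
    grid = [[0] * weeks for _ in range(7)]
    for col in range(weeks):
        for row in range(7):
            day = first_monday + timedelta(weeks=col, days=row)
            if day <= end:  # days after `end` stay empty
                grid[row][col] = counts.get(day.isoformat(), 0)
    return grid


store: Dict[str, List[str]] = {}
toggle_day(store, "read", date(2026, 5, 18))  # a Monday
toggle_day(store, "walk", date(2026, 5, 18))
toggle_day(store, "read", date(2026, 5, 20))  # a Wednesday

saved = json.dumps(store)  # the string a browser would hand to localStorage
grid = build_grid(json.loads(saved), end=date(2026, 5, 20), weeks=2)
```

Each cell holds how many habits were completed that day, which a UI could map to a shade of color in the grid.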
Right? I didn't have to go writing a big spec myself or anything like that just yet. And so it's trying to read the rest of my project just now, trying to see if there's any other context in there. This is a fresh project, so there's not a whole lot that it needs. And here's the output, right? It has a problem statement. Um, it has statements around, like, how we differentiate directions, think about stress tests, and it gives us a few directions we can take. So direction A is a dev profile configurator. So a very minimalist, high-fidelity clone of a GitHub profile.
It allows us to track habits the same way that you would GitHub repositories. There's a lot more detail in here. Direction B is more of a keyboard-driven habit terminal. Um, I'm more of a... TUIs are so hot right now. Yeah, TUIs are so hot. Direction, uh, C is a Markdown SVG exporter. So a plain-text backend tool. Your habit configuration and grid data are represent... So, this is more data. This is more like you really want to enter in raw data and manage that, which is not really the vibe I was going for.
I think that the first one's probably the closest to what I had in mind. So, I'm going to go with direction A. And you can also see that it's got other assumptions in here and pushbacks. So, it's asking me which direction resonates the most. I'm just going to say direction A. Did it recommend one? I couldn't see under the counter there. So, we can scroll back. Uh, "once you pick A, B, or C, I will produce one." It didn't recommend one. You can actually change your skill files to say, well, I want you to provide your own perspective on it.
Okay. In this case, I'm trying to, you know, because I noticed models like to recommend. They almost always, when they give me choices, will be like, well, this is what I'd do. And then I'm like, well, if that's what you'd do, then that's what I want to do. Well, and sometimes I will ask the model, like, give me three choices and tell me what you think I should do just in case. Yes. But then I feel like I'm just always doing what the model wants. You know what I mean? I'm like, well, if you think it's best, then it's probably best, which is not true.
I have so much to say about cognitive debt and cognitive surrender. That's right. Your next article or paper. So what we have here, so the app is called Streak. We have a problem statement. We've got a direction here. We've got some assumptions to validate, like what is the retention rate? What is the frequency going to be around habits? What's the mobile usability going to be? So since engineers configure setups on desktops but often complete habits on mobile, is local storage enough? Well, probably yeah, local storage is well supported everywhere at this point.
So I think that pretty much all of these ideas are cool. So we have our MVP scope defined for us here, and it also says what it's not doing. So it's not doing cloud sync. It's not going to be doing advanced social features, and it's got some open questions for us if we want to provide clarity around that. And it also asks us, do we want to just save it? So I'm going to say yes, save it. So we've now taken a very high-level idea and we've refined it. Now the next phase of this is actually turning this high-level idea.
So we've gotten it into a better place. Let's turn this into a more detailed spec. Yeah. Now, for some folks you might say, well, it feels like it's a few steps. You don't have to follow my way of doing things. If you want to just have one step for creating your spec, you can do that. I just happen to like specificity. So you like to plan. You want the plan, the spec, and then the implementation. I'm a plan, implement guy myself. Just skip over. Yeah, that's totally okay. Everybody's got their own workflow.
Here you can see the file that it's generated. And this is going to be a starting point for the next step. So the next step is going to be spec. So we've got our idea. We can go and just type in /spec. And it's then going to read in the output from that last phase, and it's going to start being able to generate something that's much more concrete. So, it's taking in those assumptions, it's defining success criteria, scaffolding out the tech stack in a lot more detail, and then it's got those open questions in a little bit more detail.
That is really snappy. Yeah, it's pretty quick. That model is quick. Use Gemini 3.5 Flash or whatever model is out at the time you're watching this on YouTube. Yeah, because AI moves fast. Everything will be different. Everything will be different. So, we've got some open questions. Grid generation technique. Should we use a standard GitHub contribution grid? Do we prefer rendering grids as a modular dynamic SVG group structure or CSS grid element layouts? I'm going to be honest, I use SVG, but I'm not an expert. So for this one, I'm just going to say choose what you think is best.
And then first visit boilerplate. What default habits should we set up for new visitors? It's a good question. Oh, are we tracking bad habits? We're tracking good habits. Good habits: exercise, hydration, reading. Reading. Meditation. Do people meditate? Doom scrolling. Doom scrolling. Okay. Meditation. Meditation, probably better than... meditation after you're doom scrolling. Okay. Do you have a desired location for this file? I'm just going to let it do what you think is best. Cool. What I love about this is all of this UI, all these questions are directly inside of the Copilot sidebar.
I don't have to use some other app for it. Whatever your agentic UI surface is, you can just have a similar experience there. So, it's now created a spec for us. I'm going to keep this. And the next phase, you can go and you can take a look at the spec, by the way. It's over here. It's going to be a much more detailed version of that original file that we had, but you'll see that we have success criteria. We've got metrics in place here for performance. So, things like Largest Contentful Paint. We've got a tech stack that's defined.
This is going to be very simplistic. So, vanilla JavaScript. I didn't say I wanted something in React. I'm okay with it. Never asked you about the tech stack, did it? Yeah, it didn't ask me about the tech stack. And that's another thing that you can encode inside of your skills. So, this is an assumption that you just let go. An assumption that I'm okay with. You can see it's defined Vitest and Playwright for our end-to-end testing, which I'm okay with, our storage, and then project structure and code style. And if there's anything in here that you don't like, you can always go back and either manually tweak it or work with your agent to configure this more.
So you've got your spec. Now the next thing that you might want to do is actually start breaking this down into individual pieces. Now this is useful if you've got a large project that you're working on. If you're working on something small, you probably just want to go straight into implementation. But I'm going to show you this plan in action anyway. So we've got a plan step here. And what we can do with plan is uh I can go and I can run plan. I'm just going to show you this working. And what this can do is help break down what we say in software engineering often is if you've got a big problem, break it down into smaller pieces, right?
And plan kind of helps you with that. It breaks your big spec down into granular chunks that can then be implemented. And that's really good if you're working with TDD because you've got, you know, very individual pieces that can be tested and verified as you move on to other pieces. People have lots of different ways that they like doing TDD. So, I'm not going to assume that my way is the right way or anything like that. So, it's going and it's scaffolding out the basic parts of the project right now, because that's fine. Like, it can set up our boilerplate before we actually go and start implementing the rest.
I'm just going to hit Keep on that, and you'll see that it's saying, before beginning phase one, I should initialize our structured to-do list to provide full transparency, and it's going to do that now. So what you're going to have here is the package.json file, which we see has been created. The implementation plan is established and saved to plan. So we've got a task list here. You see phases and tasks. So we've got phase one, task one, which has its acceptance criteria, and this is like: build the pure logic. TypeScript interfaces, your types, your functions to compute the streaks, your total contributions within the last 365 days, the local storage pieces. Basically stuff that doesn't necessarily touch your UI. And we've got some acceptance criteria here as well, so we've got a list of things that have to be true, as well as verification, so the test suite has to be able to run on that piece of logic in order to move forward. It also suggests what files are probably going to be touched as a part of this, just to really hone them down.
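As a rough illustration of the "pure logic" slice described here (streaks and total contributions in the last 365 days, with no UI involved), here is a minimal sketch. It is written in Python for brevity; the demo app itself used vanilla JavaScript with TypeScript types, and the function names below are assumptions, not the names the agent generated.

```python
from datetime import date, timedelta


def current_streak(completed: set[date], today: date) -> int:
    """Count consecutive completed days ending at `today`."""
    streak = 0
    day = today
    while day in completed:
        streak += 1
        day -= timedelta(days=1)
    return streak


def longest_streak(completed: set[date]) -> int:
    """Length of the longest run of consecutive completed days."""
    best = 0
    for day in completed:
        # Only start counting from the first day of a run.
        if day - timedelta(days=1) in completed:
            continue
        length = 1
        while day + timedelta(days=length) in completed:
            length += 1
        best = max(best, length)
    return best


def total_in_window(completed: set[date], today: date, window_days: int = 365) -> int:
    """Number of completed days in the last `window_days`, including today."""
    start = today - timedelta(days=window_days - 1)
    return sum(1 for day in completed if start <= day <= today)
```

Because these functions take plain values and return plain values, they can be tested without a browser, which is what lets the plan gate later UI phases on this slice passing.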
If you've been doing any vibe coding or agentic engineering for a while, you know that one of the ways in which we try to limit our blast radius is by saying, "Well, I'm just going to target these specific files, because especially if you're working on an existing project, you don't want it to start overwriting or touching files that maybe have nothing to do with the task at hand." Right. So, we have a few different phases of tasks here. Some of the other phases will include our data storage, so that layer for local storage. It'll include the visualization of our habits as that GitHub-style contribution graph, as well as the rest of our UI.
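One way to make the "blast radius" idea checkable is to compare the files a task actually changed against the files the plan said it would touch. The sketch below is an assumption about how such a check could look; the function name and the example paths are hypothetical and not part of the skills shown in the demo.

```python
from fnmatch import fnmatch


def files_outside_scope(changed: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed paths that match none of the allowed glob patterns.

    Note: fnmatch's `*` also matches path separators, so "tests/*"
    covers nested files under tests/ as well.
    """
    return [
        path
        for path in changed
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]
```

A non-empty result is a signal to stop and look at why the agent touched something the task did not list.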
So, we've got our plan here. What you can do, again, this supports being very granular or not having to think about that in as much detail. So, if I wanted, I could do something like /build task one and just have it do the first task on that list. So, let's see what it does. It's created four to-dos as a part of that. It's evaluating it. It's reading our test-driven development skill right now. You can see that it's installing those first things. I'm also just going to do... Yeah, if you go to Allow, you can go down to the very bottom and turn on bypass approvals if you want to.
I'm learning something new. Where do I... Oh, wait, wait, wait. I see what you're talking about here. Right there. Yeah. And we'll stop asking. Are you a bypass-approvals person? Do you recommend bypassing? I bypass approvals on all my personal projects. So now you're building one phase at a time. Is this generally how you work, or are you more... I think it really depends on what you're building. Okay. So if I am building a personal side project and I just want to build something, like maybe it has one purpose, I'm probably going to skip building out one phase at a time this way and just have it build the entire thing.
If I'm building something relatively complex, I'm okay doing it a phase at a time. It might feel like it's slower, but the fact that you go through all of these steps and you know, well, this slice is tested as we build out the UI. I know nothing else is broken. I've now done our data layer. Nothing else is broken. This gives you a lot more confidence as you're working through. It's just just a different way of approaching it. It's such a trade-off, though, because then, you know, the human's back in the loop like once again, you're the problem.
You're slowing the agent down. It's tough. I struggle with this myself. Yes. Yes. Absolutely. And I would say that, you know, you have to make a judgment call at the end of the day about how much time you want to invest, and where in the software development life cycle you want to invest it. Do you want to put a lot of time into your spec? Or just have your agent build out a lot of code and then spend time reviewing it, defining your tests at that point, figuring out how do I evaluate quality like that?
That can be okay as well. Or do you want to approach it with TDD in mind? Do you want to have like an explicit strategy around how you're going to verify quality so that as you're going through it's already addressing these things? And I don't think there's a right or wrong way. It's just different ways of working. You know what I mean? So what you can see with this first phase is that the implementation is test driven. We have our first target failing red test to uh to tell us that okay we have a test it's failing.
We're going to implement the minimal clean logic in streaks. So we're red-green here. Yes. Yes. And as I said, you don't have to do this with individual tasks. We can just keep running build to work through the rest of this, and then we can get on to other phases. So you can see that, thanks to Burke's tip around bypassing approvals, we're just going to keep running through this. And people will often talk about, you know, how do you as an engineer understand what is getting generated if you're not reading all of the code?
Very often, as your agent is generating stuff, what are you doing? Are you switching over to another agentic task? Very often I found myself doing that, and so I started to slow down a little bit, and these days I try to read the trajectory a little bit more. Like, what did it say? What was it thinking? Why was it thinking the way it was thinking? And is there any nugget of information in there that I can extract that just lets me know, okay, well, that was the way I was expecting, you know? Yeah.
So it it's interesting because I I don't know what everybody else does but I I watch the agent work. Now, I'm not reading everything that it's doing. I'm mostly just staring at it when I should be doing something else. When I'm just I'm literally watching agents spin. But I'm curious like are you like are you looking at the code? Great question. Or so I try to I generally unless it's a personal project because a lot of personal projects end up being throwaway things. Oh, I have this I have this need for an hour. I'm just going to build something and throw it away.
And in those situations, I'm very okay with saying if I don't understand the code, the agent is going to go fix something for me. Um, however, if it's going to be something I'm going to maintain for at least a few weeks or a few months, I generally go back and at least understand the architecture and I try to read through some of the core files so that if I discover that the agent wasn't able to fix something or it goes off on a tangent, I am able to go back in and make fixes myself, you know.
Yeah. So I'm going to keep working through build and we can keep chatting. So it's going to keep doing its TDD phases as each slice gets implemented. It's going to go from red to green, implement the phase, and get everything done. This is so interesting. I didn't even know about red-green testing until SpecKit came out. It was the first time I'd ever seen that. That's the idea of writing failing tests and then making them work. And I was like 45 years old. It's kind of sad.
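For readers who have not seen the red-green loop before, here is a small sketch of the idea: the test cases are written first and fail against a placeholder (red), then the minimal implementation makes them pass (green). It is in Python for brevity, and the case table and helper names are illustrative assumptions rather than what the TDD skill in the demo produces.

```python
from datetime import date, timedelta

TODAY = date(2026, 5, 20)

# Test cases written BEFORE the implementation: (name, completed days, expected streak).
CASES = [
    ("empty history", set(), 0),
    ("today only", {TODAY}, 1),
    ("gap breaks streak", {TODAY, TODAY - timedelta(days=2)}, 1),
    ("three in a row", {TODAY - timedelta(days=n) for n in range(3)}, 3),
]


def stub_streak(completed, today):
    """Red step: a placeholder so the cases can run and fail."""
    raise NotImplementedError


def current_streak(completed, today):
    """Green step: the minimal logic that satisfies the cases above."""
    streak = 0
    while today - timedelta(days=streak) in completed:
        streak += 1
    return streak


def failing_cases(implementation):
    """Return the names of the cases the given implementation fails."""
    failures = []
    for name, completed, expected in CASES:
        try:
            passed = implementation(completed, TODAY) == expected
        except NotImplementedError:
            passed = False
        if not passed:
            failures.append(name)
    return failures
```

Running `failing_cases(stub_streak)` lists every case (red); running it on `current_streak` returns an empty list (green). Any later simplification of `current_streak` can be checked the same way.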
You do not look over 45 at all. Yeah. 45 yesterday. It was yesterday. Happy birthday. Thank you so much. So, we're going to keep doing this. I also wanted to share a comment on this. I think that very often in the entire agentic engineering diaspora, the industry paints very broad strokes about best practices. Like if you're on Twitter, if you're on any social network, very often the best practices for a solo founder or a small startup team will be shared. And the best practices for that kind of team, that maybe doesn't have users yet, maybe they're working on a greenfield codebase.
Those are very different for people who are working on something legacy or brownfield, with a team. And so I think that it's important just to understand that, you know, there are situations where it's okay to just give in to the vibes and have the agent figure things out, and if something goes wrong, we'll just have the agent fix it, and other situations where actually understanding what's happening behind the scenes is still quite important. Yeah, 100% agree. So what's super interesting about this is that I think myself and most people are looking for, just tell me what to install that's going to make everything work, right?
Is it agent skills? Is it superpowers? Is it Gstack? Like, what is it? And the answer is it might be some or any of those things, but your job is less about writing code now and more about building these workflows that actually work. And it's not always the same, even for different projects within the same organization. So absolutely, it's a... I don't know what to call it. It's not software engineering. It's not automation. It's like, how would you even describe this? It's like we all work at McKinsey now.
I don't know. I acknowledge that with agentic engineering, we're now working on a spectrum, right? And there are going to be times when we're vibe coding, there are going to be times when we're agentic engineering. And the more diligence you have to apply to the process, the closer you are getting to agentic engineering. And I think that at big companies, small companies, anytime that you're working on something that exists and has existing users, you owe it to them to care about quality, to care about not breaking things in the ways agents sometimes subtly do.
Yeah, 100% agree. If you're building a habit tracker, that's one thing. If you're shipping VS Code, you have to be pretty meticulous about not breaking that. So, we have about 10 minutes left here. What I'm going to do is I'm just going to skip ahead a little bit and just show you very quickly some of these other phases. So the agent skills package also includes a verify step. Now what verify does is it's able to just check the implementation as it has been done so far: whether it's complete or whether we've only partially implemented something, and whether it can actually be used just yet, can run in a browser and actually do what it needs to do.
So, it's going to go and check that it's actually able to spin something up. Let's see if enough of it has been built that it's able to do that. So, it's doing its npm run build right now. It's going to spin up its npm run dev and we'll see if this actually works. So, we have something that looks like a GitHub profile. Where did you pull that picture from? I have no idea. Who is that? The magic of LLMs. That's someone. If that person's watching this, someone in the world. Hey. Hello, whoever this is.
This could this could be like a non-developer as well. This could be a person that just likes to like bake cookies somewhere in the world. It just happens to be on GitHub. Yeah, it just happens to be on GitHub. But, you know, it's it's opened up this browser directly inside of VS Code. Um, it's able to run its checks behind the scenes to see, okay, well, is anything actually broken? If I interact with UI, does anything actually begin um to break? And I think that this is very interesting. Like if you actually go to um the prompts and you go to our verify skill, it will end up trying out the browser testing with dev tools skill.
There are many ways to do automated browser testing these days. Whether you're using Puppeteer, Playwright, the Vercel agent browser, there are a lot of different options. I'm not going to say there's a right one, but choose what makes sense to you. I just think there are a lot of really great tools these days for just making sure that your thing basically works. Whether you decide to encode all of your user journeys and test them out very holistically is kind of up to you at the end of the day.
We web devs, as usual, have it better than anybody else, right? Like, this is so much harder to do with native apps. Now, for my skills, the review phase ends up being really important, and people have a lot of opinions about review. I tend to find, and this is something that I found talking to engineers at Google as well: we still have a culture where we tend to have engineers manually review code. Even with that, even with trying to bring AI into our code review process, having, you know, an agent do a local first pass or multipass is really, really valuable.
Now there are some people that will take this all the way to adversarial code review with very deep, you know, patterns. There are people who will use other models, like, hey, maybe I'm going to use Gemini for the implementation and Opus or Codex for other parts. So you have a lot of flexibility here. There is again no right or wrong way. People will often decide these things based on vibes. But what you'll see is we have a number of different kinds of review checks that were done here. So what's working well, what's technically considered correct. So all of the unit tests that were implemented so far appear to be, you know, passing. We have no security issues, it seems. You know, we're working with vanilla JavaScript. We of course want to avoid XSS issues, cross-site scripting, anything like that. Readability and architecture seem to be in a good place. Performance seems to be in a good place. This is a simple application. And it also supplies some recommendations. So keyboard navigation for that contribution graph makes sense, right? That's not something that I had considered. And you might find that, depending on the complexity of what you're working on, you can have this be much more elaborate.
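On the XSS point: the usual risk in a small app like this is taking a user-supplied habit name and dropping it into markup unescaped. The sketch below shows the escaping idea in Python using the standard library; the function name and CSS class are assumptions for illustration. In browser JavaScript, the equivalent habit is assigning to `textContent` rather than `innerHTML`.

```python
from html import escape


def render_habit_label(name: str) -> str:
    """Build a label element with the user-supplied habit name escaped."""
    # escape() replaces &, <, > and (with quote=True) both quote characters,
    # so the name is rendered as text instead of being parsed as markup.
    return f'<span class="habit-name">{escape(name, quote=True)}</span>'
```

A habit named `<img src=x onerror=alert(1)>` then shows up as literal text in the label instead of creating an element.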
Another part of code review that I find useful these days is code simplification. Now very often what we'll do is we'll have an agent implement something. We'll verify. Maybe you'll read the code. Maybe you'll verify that, hey, it at least runs in the browser, or, you know, if you're building a native app, hey, it at least runs. But you don't necessarily go back and ask yourself, hey, could this actually be simpler? And one of the wonderful things about TDD, or having tests or quality gates around, is that you can now go and do that code simplification loop and have something that can verify the rest of your logic still works.
Yeah. Right. Very interesting. So, a question on the security one. How much do you trust the non-deterministic? Because to me, this is where the rubber meets the road, right? The rest of this stuff you can deal with. The security stuff you cannot. How much do you trust that review, that you're not pushing a key, you're not pushing some SQL injection, something that's just going to get you in a ton of trouble down the road? Oh, yeah. I think that there are a lot of good open-source and commercial offerings that go even deeper into security than these skills do.
And so there are a lot of things that we've seen. Anybody that's been doing vibe coding and agentic engineering for a while has run into these issues, right? Very often people will be using API keys in horrendous ways. Yeah. Right. They will be quickly putting together something and not realize that, oh hey, wait, there are actually tons of data leakage issues, or the way that auth is implemented is leaving you exposed to all kinds of problems, right? Or somebody can just nail your API over and over again because it's not locked down. Exactly. And if you're a senior engineer, if you're experienced, you know what to look out for. If you're intermediate, if you're not quite as technical, if you're coming at this without as much of that background, or maybe you are an experienced engineer and you're just lazy, because sometimes we are lazy.
It's useful to encode those things in skills just as a sanity check for yourself. Oh, actually list specifically. So I find that I cannot think of all the edge cases, but models are actually really, really good at this. Like, if you ever ask a model to review a codebase, it will almost always find something that you need to change or improve, right? They're really, really meticulous and good at that. So yeah, again, I don't know. I feel like we need maybe better products or whatever it is. But to me, all agentic code should go through some deterministic gate that's like, yes, security sign-off has been achieved, because I don't trust... I'm scared to death to push anything live on Twitter and be like, here's what I built.
And then have somebody be like, oh, and here's your Gemini key. And I think that's that's where there is a lot of value in understanding what does good mean? What does done mean? Right? And if you care about security, you care about quality, you care about performance, you care about accessibility, you care about any of these things, that's a good reminder to try encoding those into how you approach quality gates in your projects. Yeah, skills are just one way of of encoding that into how you think about all this stuff. But I think that just having tooling in place to make sure you're not merging in code that's going against those best practices is always is always a good idea.
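As one small example of what a deterministic gate could look like, here is a sketch of a secret scan that either passes or fails the same way every time. The patterns are illustrative assumptions and far from exhaustive; dedicated secret scanners cover many more formats, and this is not something shown in the session.

```python
import re

# Illustrative patterns only; a real scanner covers many more key formats.
SECRET_PATTERNS = {
    # Google-style API keys are commonly matched as "AIza" plus 35 characters.
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    # A quoted value of 16+ characters assigned to a suspicious-looking name.
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


def gate_passes(text: str) -> bool:
    """Deterministic gate: same input, same verdict, no model involved."""
    return not find_secrets(text)
```

A check like this can run in a pre-commit hook or CI step, with the model-based review layered on top rather than replacing it.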
Yeah, I feel like we need more of these deterministic gates. I just don't know what they are. You know, like right now markdown is the solution to everything. It's like, well, we have non-determinism. How are we going to fix that? With more non-determinism? With text files? With text files. So, we're at about 6 minutes here. I'd like to do some Q&A if we could. You can continue building. Yeah. Yeah. So, I'll wrap this up in a minute. So there are a lot of great teams and labs who've been thinking about code simplification, and very often what I love to do is, anytime that they open source their work or they write up a blog post about their work, I go and study it. Like, is there something that I can learn from what they're doing that I can then incorporate into my skills or how I work?
You can do that for code simplification as well. Even for this simple app, it's already found like three or four opportunities to simplify the code. So that's great. We've got a ship step in here as well. Just for time, I'm just going to skip ahead and show you one that I built earlier. This is called Streak. It's got a GitHub contribution graph style graph at the very bottom. And it just shows us over time, are you actually following some of these best practices? So for me, am I going screen-free for some portion of the morning?
Am I, you know, getting my workday goals planned? Am I reflecting? So, you know, this is not a complicated app, but you can pop open your browser dev tools. You can go into the Application panel, Local Storage, and we can see that all of our data is here. We've got all of our entries for each of these things. It's all local. It all works. Like, all this will work offline. It's simple, but it does the trick. Now, we can do Q&A. You want to do Q&A? So let's spend a few minutes.
Do we have a... Tom, how do we do this? Do we have a mic if we're going to do Q&A? Oh, right there. Yeah. So, if you do, we'll continue building here. If you want to ask questions, now is your chance to do so. In the meantime, we can continue building or chatting about skills if you want to. It's up to you. Awesome. If folks have questions, I'm happy to take them. Otherwise, I'm happy to keep showing you stuff. All right, we got one. It's Anthony. Hey. Yeah, question for you. How would you split up the work between like a Flash-type model and a Pro model?
Would you plan in Flash and then implement in Pro? How would you split the work? Did you hear that? Yeah. Interesting question. So, how do you split up between models? Like, do you plan in Flash and then implement in Pro? Like, are you multi-model within the Gemini family? Like, what's your personal take here? Yeah, that's exactly my workflow. And especially if you're trying to optimize for cost, that's a great way to go about it. You can use a lower cost model often for the planning phase, especially if you're trying to optimize for tokens.
Use a lower cost model for the planning phase. Use a much more capable model for your implementation. And whether that's, you know, going to be one of the Gemini Pro models or Opus or Codex, any of those options can be good for your needs. But I found that that flexibility works well. And Copilot and many other tools also let you easily switch between different models for these needs and these different phases, which is great. Interesting. I have a question for you. Where does MCP factor into this workflow? That's a great question.
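The split described here (a lower cost model for planning, a more capable one for implementation) can be written down as a small routing table so the choice is explicit per phase. This is only a sketch: the phase names follow the workflow from the demo, but the tier labels and model identifiers are placeholders, not real model names or any tool's configuration format.

```python
# Which tier each workflow phase uses. Phases follow the demo; tiers are assumptions.
PHASE_TIERS = {
    "idea": "fast",
    "spec": "fast",
    "plan": "fast",
    "build": "capable",
    "review": "capable",
}

# Placeholder identifiers; substitute whatever models your tool exposes.
MODELS = {
    "fast": "fast-model-placeholder",
    "capable": "capable-model-placeholder",
}


def model_for_phase(phase: str) -> str:
    """Pick a model for a phase, defaulting to the capable tier when unsure."""
    tier = PHASE_TIERS.get(phase, "capable")
    return MODELS[tier]
```

Defaulting unknown phases to the capable tier trades cost for safety; flipping that default is reasonable when token budget matters more.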
I have too many strong opinions on MCP these days. I think that... Let's do it. We actually have some extra time, so we can go over a little bit. Right. I personally think that things have been moving so fast, right? And earlier on we would talk about how MCP was really good for, you know, the connectivity between data and agents. And I think that people found that very often there were cases where you actually want a CLI instead of an MCP, and we've started to see workflows begin to evolve a little bit more, because maybe you don't need to have so many tools exposed.
Maybe just directly using a CLI is going to be a more efficient way to achieve your goals. So I personally feel like we are still in that evolution phase of seeing, okay, where is MCP going to be the best fit versus just having people directly use the CLIs for things. This is kind of a contentious area. I feel like, because at one point I did say your MCP server should probably be a skill and a CLI. I kind of still stand by that, just because AIs are so good at using CLIs.
Yeah, the only rejoinder I have to that: I was chatting with Ree, who I think is over at OpenCode now, about this on X, and he pointed out that authentication is a real problem, and that MCP does this really, really well, and with CLIs it's really hard to control auth and permissions for your agent. So I agree with it, but generally speaking I almost entirely use skills. The only MCPs I use are, of course, Work IQ, right? Because if you're in the Microsoft ecosystem, that's like the best thing ever.
And then Context7. I always talk about it. It's amazing. If y'all aren't using the Context7 MCP server, that's the one thing you should install today, and just tell your agent, always read the docs before you do anything, and it will use Context7 to read the documentation. Brilliant. I love that, you know, even in Google Cloud, and I know in lots of other places, people have started to release and maintain more skill packs, just because you can get so far these days with making sure that, regardless of what cutoff date your agent's data is going to have,
you've got up-to-date documentation, up-to-date instructions about where to go, where to find things, exactly what the developer workflow should be. So, I'm a huge fan of skills, if people don't know that by now. Yeah, they're amazing. And the other advantage, you talked about this early on, is that they're loaded progressively, right? So one of the things we're all struggling with right now is that the context window is a problem, and honestly I feel like we need a better solution than a context window in general at some point in the future. This is like a really rudimentary primitive for interacting with LLMs. But the problem is that the context window fills up really fast, and MCP has to specify all the tools. So, if you didn't know, skills are progressively loaded. So what the model gets is the skill name and a brief description, and then from that it will load in the skill, and then the skill may point to other files, and then it will load in those files. And so it's just really, really economical for token usage, which is becoming more important every day as we are moving from a world of "use as much AI as you want" to "this is how many tokens you have, use them wisely."
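To make the progressive loading idea concrete, here is a toy sketch: only a name and short description per skill go into the prompt up front, and the full body is read when a skill is actually selected. The `Skill` class and function names are hypothetical and do not reflect how Copilot or any other agent implements skills internally.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str
    load_body: Callable[[], str]  # deferred: called only when the skill is chosen


def index_for_prompt(skills: list[Skill]) -> str:
    """What the model sees up front: one short line per skill, no bodies."""
    return "\n".join(f"{skill.name}: {skill.description}" for skill in skills)


def activate(skills: list[Skill], name: str) -> str:
    """Load the full instructions for one skill on demand."""
    for skill in skills:
        if skill.name == name:
            return skill.load_body()
    raise KeyError(name)
```

The token cost of the index grows with the number of skills, but the cost of each body is only paid for the skills a task actually uses. The same deferral can repeat one level down when a skill body points at further files.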
Absolutely. All right. Well, Addy, thank you so very very much for being here. Uh I learned a ton. I was looking forward to it just for the free uh training on your skills repo. So, I got something out of it. Thank you for being here. Thanks for having me. Appreciate it. It's great. Uh all right. Thank you so much. So, we will be uh right back with our next guest and uh I guess maybe folks can come chat with you if they want to. Sure thing. All right. Thanks, folks. Here we go. Uh we have an SDK.
It's available in Node, TypeScript, Python, Go, and .NET, and you can use this SDK for multi-turn conversations, tool execution, and full life cycle control. Now this is in technical preview, but already we're seeing people build some really fun stuff with it. Beyond that, there are some great new features that have been added to the GitHub Copilot CLI in the last few weeks, including access to more models, built-in custom agents for things like explore, task, plan, and code review, and then you can use those custom agents with agent skills to have a really robust workflow.
You can also install the CLI now by Homebrew, WinGet, and there are install scripts. And if you install by any of these methods, then Copilot CLI will automatically update, which is great. But don't worry, we still have standalone releases if you're needing that because of your distribution or however you want to get things. There are a bunch of new flags for using the Copilot CLI in scripts. There are improvements to the diff view. There are some web controls. There's lots and lots of stuff. So, if you have not had a chance to play around with the GitHub Copilot CLI, give it a shot.
Especially now with the SDK, it's getting really good and the team is working really hard on adding stuff to it all the time. We are also actively seeking out your feedback. So if you have any thoughts, any uh wish list items, you know, feedback, let us know. You can let us know in the comments of this YouTube video. Uh let us know, you know, on social media, create an issue or or you know, post a discussion in in the repo. Hey, what's up everybody? Welcome back. And we are here at Build 2026 live in a vast open area and I'm here with Patrick Nicoletic from the uh Copilot SDK.
Yep. CLI SDK team. We're one big team, cuz we ride on the CLI server. Mic is really... I can hear you better than I can hear myself. That's amazing. Yeah. So, for those of y'all watching at home, we're in this, like, vast... Do you even know what this was before? No idea, honestly. It's a beautiful building, well set up, but no idea what it was. Yeah. Looks like it's like an airplane hangar. No, it's not wide enough, is it? Maybe a boat, you know, a boatyard. All right. So, we're going to be talking about the Copilot SDK today.
And this is something I've used extensively and I love it. This is one of these things where it's specifically for developers. Like once you see this, you'll be like, "Oh man, that's amazing. I can do anything." So, take us into it. Yeah. Wonderful. Well, we just GA'd the SDK yesterday, which is fantastic because we've been working on it since January this year. You just GA'd just yesterday, 1.0 stable. Not. And so we've been quietly working internally, mostly to drive up quality and also make sure teams across Microsoft, LinkedIn, Xbox, GitHub can all adopt the SDK and we can use a shared agent harness together as much as possible.
Nice. And so that journey internally is now going outward. We're GA, which means customers can use the SDK as stable. They should stick it in production, we can support it in a wide array of environments, and we're here to help you put our harness in whatever solution makes sense for you. Awesome. So, do we have any demos or anything we can walk through in terms of looking at how this actually works? This is one of those things where when they first told me about it, I was like, I don't get it. And then I actually used it and I was like, well, that's incredible.
Yep. I'll show probably my favorite use case cuz it's very personal, even though it's the same SDK that's rolling out in things like Office Excel right now and co-work as an example. Oh, yes. I can actually bring up a quick slide to just highlight a lot of the collaboration that's been happening across literally dozens of teams, hundreds of people, all the people on the ground building these products across these different solutions. We've engaged with them to learn how they're trying to adopt agentic solutions and modernize using the Copilot SDK. So just to clarify, when we say Copilot SDK, what do we mean there exactly?
Yeah. Yeah. So the Copilot SDK actually comes in six languages, which are right here. So there's the four we released at technical preview in January. There's the two that we've added since, which are Rust and Java. What it really is is a thin client that sits over the Copilot CLI server that we've all grown to know and love over time. And it's the same exact CLI server that's used in things like our GitHub Copilot app. We use the CLI very hands-on, and the SDK is yet another way to consume the CLI, but in a way where you can integrate it into custom applications.
Yeah. So, the way that I thought about it was it just basically allows you to add the agent that's in the Copilot CLI to any application. Yeah. And we really mean any application, too. I mean, there's a lot of things that we're doing beyond just what was shipped in GA. We are rewriting our runtime in Rust so it can actually go in more places and be more performant. All of those things are on the horizon. And wait, the SDK runtime? The SDK runtime. Yes, correct. So, we're making it more performant and we're optimizing it.
It's another piece that I showed off yesterday during the session where we just highlight... Was it Node originally? It was all Node. So when you were talking about loading up the CLI server, it's probably 100 to 120 megabytes of RAM that it's using up. We're trying to shrink it to sub 10 megabytes for the most part, and we're destined to get there for the most part. We have run some earlier tests and things like that which show the difference between Node and Rust. And even while this was a POC, it was a fully working runtime.
I can actually show you in a different demo as well, but the main core outcome of that is that we just saw benefits across every performance dimension that we cared about. And then there was also the need to actually get our SDK into more platforms that require a native runtime. Interesting. That's phenomenal. I see Rust popping up in so many places now. It is. It is. It's a very good language for... The other day I was building something in the Copilot CLI and I said, "How should I build this?" And it was like, "I would do it in Go."
And I was like, but I don't know Go. And then I remembered that is irrelevant. The agent knows Go. And so I just built the thing in Go because I could. Absolutely. And I think that's what's really amazing is it's pushing the teams inside of GitHub to think beyond the boundaries of our experience. Fundamentally, the environment has changed, and we recognized that probably six to seven months ago, and we've been really internalizing what that means for ourselves. And so when you look at the Copilot CLI and SDK team, we fully embraced very large experiments.
We fully embraced seeing what the technology can do. It's one of the luxuries that we have at GitHub, to just burn a ton of tokens to really see what the limits are of the models themselves. Best perk of the job. Honestly, it should start as a line item in the modern era because it is the biggest cheat code in general. This is going to become like health insurance, right? Like, how good is your health insurance? But more importantly, how many credits are you getting? How many tokens do I get this year?
Exactly. Exactly. Mainly because you can just shorten the gap between idea and outcome, and when you don't feel as token constrained, you can go test those outcomes much faster. I hate it, man. It's really, really bad when every decision you make is based on, like, well, should I spend these tokens in this way? Do I really want to use Opus here? And it really constrains your productivity. Oh, absolutely. Do we have any demos that we can see? I'll show you one of the... when we shipped the technical preview.
It was a technical preview. So we didn't expect it to go into like production grade applications at the time. So we saw a lot of personal assistants and I I use one of my own of course uh being that I work on the SDK. I made one that was one of the first things I did multiple flavors but but I actually have really just been loving my current form because I'm a PM. Uh and one of the things that's really essential to being a great PM is you have to write all the time. Uh, and so I think coding tools are awesome.
They're wonderful. But really, this desire for a great way to write and deploy agents to support me in that action is where I'm currently at in my trajectory and what I'm really obsessed about day-to-day. Can you show it to us? Yeah, absolutely. Did you use this? Cuz we were working on something together and you did a write-up and you sent it over to me. I did a one-pager. And I read through it and I was like, that's really good. And you used your agent for that. Exactly. Yeah. I call it whim because I want everything to be at a whim, which is why, you know, for example, I can hotkey it anywhere I am in the system.
You can leave it up, you can pin it, you can resize it, but really the core of it is that when you are thinking about being a PM and you're putting things on schedules and you're trying to really encapsulate the job that you're doing and then transform it into something that's repeatable, I only really need spaces to write, I need little workers, and I need skills. If I have those three things, you can accomplish, sadly enough, a whole range of PM responsibilities. And I find that it's more fun than ever to ideate and think about the right thing to do.
And so, it's all canvas based. What I mean by that: similar to what you see in the GitHub Copilot app, where we're exploring ways to render content and make it interactive, this is the inverted version of that, meaning you start with the writing canvas and you spawn all your agents out of it directly. And so what I love about this is... Wait, wait, wait. What? You're going to write up the file first and then... It's simpler than that. Oh, you're not going to write anything. I will. I'll actually just highlight first that there's a variety of agent personas in my whim app, which allow me to, one, run my agents in ephemeral cloud environments.
I can now spawn an infinite number of workers. Um, not worry about my local machine. I can close my laptop. I can pick up and resume whenever I want. That's one version of an agent I can use. There's my development agents where just by calling into action, it will code for me. It will develop. It will check out repos. It will modify, it will do all of those things you would expect. And then I have my editor which is my my main uh agent that I use because we're always working on the contents. And then the last piece is my sandbox.
If I want to run something local, it's a bit dangerous, I can deploy this agent and I can let it run it in a nice secure MXC sandbox that we shipped uh yesterday as well. Um, but when it comes down to the action, right, there's intent and then there's the outcome. And normally there's a whole lot of steps in between. And in reality, what I like to do is shorten that as much as possible. Uh, and so when I think of things I need to do today, I just start building a list like most humans do.
And I'm like, well, I need to triage copilot. Uh, should actually spell correct copilot SDK issues. And then I can just say editor deploy. Oh, you're tagging the agent there. I've actually put it already in motion and I don't have to think about it any more than that. And the way that it actually shows up is you'll see it present in the document. You'll see that the session has been fired open right here. I can actually drill down into any of the chats. So as I deploy the agents across my canvas, I can actually check on what they're doing.
If I care about what's happening across canvases, I can get that view as well. Um, you get similar controls. So, if I want to turn these into sessions I can remote into, I can just click a little button. It'll give me the remote link. I can dive into it from my phone so I can pick up on the go. Um, you have, of course, things like YOLO mode cuz, you know, I like to run unconstrained. Yeah, it's asking for approvals right there. And you can continue to drill down a little bit deeper. So, this idea of multi-agent orchestration, not really feeling like multi-agent orchestration is where I think in general we're all headed cuz there's a lot of cognitive load we're all feeling.
And so when I think about the way I engage with people, how do I weave agents in to feel as normal and as common as how I communicate daily? So I had the same thought, where it's like I don't want a coding agent. I just want an agent that's everywhere that I am. So if I'm in Outlook, it's the same agent. It knows I was chatting with it about code. Now I'm chatting with it about email. Right. We don't need an agent just for coding. Exactly. But this is... I've never seen a modality here that's not a chat.
And and the nice well the nice thing is you can close these things out and I I rarely go into the chats anymore in this thing to begin with because the output actually ends up back on the canvas. So I'm communicating through the canvas with the agents themselves or they're off doing work and they'll open a PR whatever it might be. And um if I open up my whim workspace you'll actually see me working on whim in whim where I'm like giving it bug fixes sending it off it's opening PRs and it's just updating my canvas.
And I think what's really beautiful about that is the record itself is being built as I'm doing the work. And so at any point in time I can reflect back on that. I can roll that up, and I've already encapsulated what I've been doing and how it's been done. Man, that's super interesting. The thing that's so irritating right now with AI is there's so many people talking about their... like, "You know, I got OpenClaw and it's running everything in my life," and you're like, "Well, what's it doing?"
And they're like, "Everything." Yeah. You're like, "Yeah, but what?" They're like, "All of it?" But then you're like, "But I don't know what all of it is." So it's interesting to see how you're actually using it. You know what I mean? Like, it does feel like snake oil sometimes to me, where people are like, you know, "Yeah, I use it to automate my job," but they can't tell you what exactly it is that they're doing. Yep. Yep. There's so much of that going around. A little semblance to a few other industries in that light, you know, in terms of making sure that we ground it in real use cases.
They're easy to communicate, people understand them. I think that's really important because the technology is very generalized. Yeah, 100% agree. But often you have to build your own in order to realize that and that's what the SDK allows you to Oh, definitely. and and to build something like this, it, you know, used to be a tremendous amount of work and now you can spend a weekend talking to your agent and get a whole lot of distance and and we continue to pack more capabilities in the SDK that make all of this just tremendously much easier.
So things like remote didn't exist until a few weeks ago. The ephemeral sandboxes in the cloud are yet another way for you to break free of some of the constraints that are there. Can everybody use those now? Yes. Yes, we shipped them. The cloud sandboxes are in preview, but they are available. They're in there and they can be used, and local sandboxes were shipped in preview today or yesterday as well. Yeah. So again, you know, for those watching online, those of you who are here, if you're interested in how can an agent act, how can I use AI to do something actually useful, I would absolutely start here and use the Copilot SDK, if you have a sub, to build your own personal assistant to do what you need to do.
Forget about what everybody else is doing. Forget about installing something that's just going to magically work out of the box. Build your own that works the way that you want to work and start just trying to automate the simple things that you're doing throughout the day. Absolutely. Like for me, that was the most transformative thing was when I built my own that works how I work. And I think it's like you're you're getting at a really interesting insight, probably one of the greatest, which is that we all have to build intuition for how the models perform and they behave.
Similar to when you engage with a person, you're trying to understand the nuances of the other person. And trust is built over time, and then that intuition scales across models. You can understand how to deploy them and leverage them. But it's really important to dive deep, and that's why I love starting with Copilot CLI. It's like the simplest way to go straight into a model and get a very pure experience. You can scale that up through the GitHub Copilot app, which will give you cross-session multi-agent orchestration. But I think diving in and really understanding the parts and the bits using the SDK will help you go an extra distance that's truly valuable.
And there's something that's so rewarding about building your own agent, even though it is the CLI harness. Cuz mine's called Max, and I have a rapport with Max. I know Max. Max knows me. That's probably some sort of psychosis, isn't it? I'm going to need to see if I need help. I mean, you have to lean in a little bit to the psychosis to embrace the capabilities that are available there. Yeah. Amazing. Okay, Patrick, one question for you. Sure. What's your go-to model?
What's going on behind the scenes here? Which one are you going to? GPT 5.5 these days has honestly been really great. And I was somebody that sat on Sonnet and Opus models for almost a couple years at this point. But I find there was a nice balance, especially between the two classes of models. And so I'm a big, you know, let one model do the implementation, let one model verify. You get a nice mixture of answers and responses to balance it out. They all have different training data and inputs that help them decide what the right thing is to do.
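The "let one model do the implementation, let one model verify" split can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the Copilot SDK's API: the two models are represented as plain Python callables, and `implement_and_verify`, `stub_implementer`, and `stub_reviewer` are hypothetical names invented for the illustration.

```python
from typing import Callable, Tuple

# Hypothetical signatures: each "model" is just a function here.
Implementer = Callable[[str, str], str]             # (task, feedback) -> draft
Reviewer = Callable[[str, str], Tuple[bool, str]]   # (task, draft) -> (approved, feedback)


def implement_and_verify(
    task: str,
    implement: Implementer,
    review: Reviewer,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Alternate between one model that writes and a different one that reviews."""
    feedback = ""
    for round_number in range(1, max_rounds + 1):
        draft = implement(task, feedback)
        approved, feedback = review(task, draft)
        if approved:
            return draft, round_number
    raise RuntimeError(f"no approved draft after {max_rounds} rounds: {feedback}")


def stub_implementer(task: str, feedback: str) -> str:
    """Stand-in for the implementing model: fixes its draft once it gets feedback."""
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def stub_reviewer(task: str, draft: str) -> Tuple[bool, str]:
    """Stand-in for the verifying model: approves only the correct draft."""
    if "a + b" in draft:
        return True, ""
    return False, "add() should return a + b"


if __name__ == "__main__":
    draft, rounds = implement_and_verify("write add()", stub_implementer, stub_reviewer)
    print(f"approved after {rounds} round(s): {draft}")
```

In a real setup the two callables would wrap calls to two different models, which is where the "different training data" benefit would come from; the stubs only make the control flow visible.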
And so I find GPT 5.5 is really great, especially for reviews. Still love Opus for generating code as well. But I've started to use GPT 5.5 pretty much exclusively the last few weeks or so. So I've had the exact same experience. I just recently posted about this, but I'm mostly on 5.5 now. Opus, to me, I still use that for design and creative thinking, but for implementation and actually executing, I'm on 5.5. Well, Patrick, thank you so much for being here. You got any other sessions? I do. It's 4 to 5, we're giving a Copilot SDK session where we'll talk a little bit more about the great work we've been doing across Microsoft and what's going on right now.
Where is that? Is that here in this should be in uh breakout one. Breakout one. All right. So if you want to know more about the SDK breakout one at uh 4:00 today. 4:00 today. All right. Thank you so much. Perfect. Thank you. Cheers. We'll be right back. I'm Aie. I'm a senior product designer at GitHub and I help to design the remote control sessions with Copilot CLI or VS Code on GitHub mobile. You get to start work on your computer and then you can remotely connect to it on your phone and work on the go.
I'm Josh. I'm an engineer at GitHub. I've been getting all of GitHub's Agentic offerings speaking the same language.…
Transcript truncated. Watch the full video for the complete content.