How Copilot auto mode selects the best AI model | GitHub Checkout

GitHub | 00:11:56 | May 19, 2026

Chapters (5)

The chapter introduces the goal of Copilot to “just work” for developers and sets up GitHub Checkout as a space to discuss making Copilot tasks seamless and trustworthy.

Copilot auto mode picks the best model in real time, simplifying tasks and boosting efficiency across IDEs, CLI, and cloud agents.

Summary

GitHub Checkout spoke with the Copilot team about auto model selection ("auto"), a feature designed to remove the guesswork of choosing models. The goal is to let developers focus on their work rather than on model tinkering, whether in VS Code, the Copilot CLI, or Copilot cloud agent. Auto dynamically re-ranks models using real-time metrics (latency, capacity, and errors) so that requests go to the best available model. In the demo, Claude Haiku 4.5 handled a simple 32°F to Celsius conversion, while a request to refactor an app codebase was routed to Claude Sonnet 4.6, showing routing based on reasoning needs. The team explains that task intent is not re-classified on every prompt: routing runs at the start of a conversation and again after context compaction, because switching models mid-session destroys the cache and raises cost. They also describe a model selection engine and a task-based router that considers reasoning, tool orchestration, and debugging needs to produce a ranked list of models for a task. Looking ahead, intelligent auto selection is coming to VS Code within the next couple of weeks, with the CLI and cloud agent to follow, along with more user controls and finer-grained routing for sub-agents. The conversation ends with an invitation to share feedback and roadmap ideas.

Key Takeaways

Auto mode uses a model selection engine that scores models based on latency, capacity, and errors to pick the best one in real time.
Claude Haiku 4.5 is used for simple tasks, whereas higher-reasoning needs trigger Claude Sonnet 4.6, illustrating task-based routing.
Task intent is classified only at the start of a conversation and after context compaction, not on every prompt; switching models mid-session would destroy the cache and add cost.
The router analyzes task nature—reasoning, tool orchestration, and debugging needs—to generate a ranked list of suitable models.
GitHub evaluates models with offline benchmarks (e.g., SWE-bench) and controlled online experiments before making them available.
Intelligent auto selection is designed to scale as new models arrive and is coming to VS Code soon, with the Copilot CLI and Copilot cloud agent to follow.
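
The real-time re-ranking in the first takeaway can be illustrated with a minimal Python sketch. The video does not describe the actual scoring formula, so the `ModelHealth` fields, the weights, the latency cap, and the model names below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelHealth:
    """Hypothetical live metrics for one model (not a GitHub data structure)."""
    name: str
    latency_ms: float     # recent average response latency
    capacity_free: float  # fraction of serving capacity available, 0.0 to 1.0
    error_rate: float     # fraction of recent requests that failed, 0.0 to 1.0


def health_score(m: ModelHealth, max_latency_ms: float = 10_000.0) -> float:
    """Higher is better. The 0.4/0.3/0.3 weights are illustrative assumptions."""
    latency_term = 1.0 - min(m.latency_ms, max_latency_ms) / max_latency_ms
    return 0.4 * latency_term + 0.3 * m.capacity_free + 0.3 * (1.0 - m.error_rate)


def rerank(models: list[ModelHealth]) -> list[ModelHealth]:
    """Return the models ordered from healthiest to least healthy."""
    return sorted(models, key=health_score, reverse=True)


# A made-up metrics snapshot: model-a is faster but nearly saturated and flaky.
snapshot = [
    ModelHealth("model-a", latency_ms=800, capacity_free=0.2, error_rate=0.10),
    ModelHealth("model-b", latency_ms=1500, capacity_free=0.9, error_rate=0.01),
]
best = rerank(snapshot)[0]
print(best.name)
```

In this snapshot the slower model ranks first because it has more free capacity and fewer errors, which is the kind of trade-off a single latency number would miss.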

Who Is This For?

Essential viewing for developers using GitHub Copilot in VS Code, the CLI, or Copilot cloud agent who want hands-off model selection that improves performance and reduces cost.

Notable Quotes

"The best part about auto is that it's extremely simple. There's no configuration, no policy toggles that you have to do."

—Intro to the zero-setup philosophy of Copilot Auto.

"We re-rank those models in real time with capacity, latency, and errors."

—Describes the core model-selection engine.

"Some tasks could be solved by both, some only by the larger model, and some only by the smaller model."

—Explains the task-based routing and model coverage.

"Intelligent auto selection is coming to VS Code within the next couple of weeks."

—Teases future IDE integration.

Questions This Video Answers

How does Copilot auto model selection decide which model to use for a given task?
Can I customize which models Auto uses for different task types in Copilot?
What benchmarks or tests does GitHub use before adding a model to Auto's pool?
Is Copilot Auto available in VS Code, the CLI, and Copilot cloud agent, and when does intelligent routing arrive in each?
How does caching interact with model switching in Copilot Auto to save costs?

GitHub Copilot, Copilot auto model selection, Claude Haiku 4.5, Claude Sonnet 4.6, model routing, latency, capacity, errors, SWE-bench, VS Code integration

Full Transcript

When I'm using Copilot, I just want it to work for me. I don't want to spend time trying to run through tasks with four different models and comparing that all. I want to be able to trust that we can do that. Welcome to GitHub Checkout, new. Yeah, thanks so much for having me. I'm so excited you're here because many of us have been experimenting with GitHub Copilot and assigning tasks, switching models using that model selector. And a lot of times I don't really know what model to add. And now, thanks to you and the work in your team, we have the auto selector mode. The model landscape is changing so quickly, and that's great for AI advancements in our industry, but what it does add is that extra layer of complexity. So, questions around what model do you use, what scenarios do you use them for, and how do you actually scale that when there might be a new model version or a new model that comes out the very next day. We want to be able to abstract that complexity away from you and actually automatically give you the best available model for your task in real time, so that developers can focus on what they want to do. And so we do that through Copilot auto model selection or auto for short. The best part about auto is that it's extremely simple. There's no configuration, no policy toggles that you have to do. All you have to do is go into your favorite IDE, the GitHub Copilot CLI, or Copilot cloud agent and go into your model picker. So, for today's demo I'm actually going to show you two scenarios. I already have auto enabled, but to show you how this works, all you're going to do is go through model and you'll notice auto. So, the first thing I see here is a 10% discount, and that's actually applied for all paid users, so it's 10% off any models used through auto. And I want to make a point that we're not actually doing this just because we think that we're adding the cheapest or the worst models in there. It's actually quite the opposite.
We're very intentional about the models that we include per plan and per scenario that you're actually using it in. So, I'll do two different scenarios. The first one is a low task. So, I'll just use something simple like convert 32° Fahrenheit to Celsius. And pretty immediately, we see that Auto is using Claude Haiku 4.5. Let's see what happens if we were to switch it over to something that requires a higher reasoning model. So, I'm going to try that task in this version of Auto that I have on VS Code. So, I already have my plant app open. I'm just going to say refactor my app code base. And as this is working through, I do want to make time to talk about why I had to clear the session on CLI and why I would have to do that for VS Code to show different models used here. And that's because we actually call the intelligent version of Auto or task intent with Auto only a few times. The first time at the beginning of a conversation, and then the second time after compaction happens when you hit a certain percentage of your context window. And so, the natural inclination is, why don't you just look at the prompt every single time I send a request? So, every user prompt, we would classify that. But in practice, when we actually did that, we noticed that it actually adds more cost to the user. So, essentially, you're just destroying the cache as you're switching models mid-session, and that compounds when you're switching models between model providers. And so, one of the key props of Auto is that we're passing over token efficiencies and higher quality experiences to you. So, we need to find a balance there, which is why we're actually routing less than what you might expect. So, for this scenario, I'm not going to let it go all the way because that might take a while. I'll just stop it here. And we notice that we actually have Claude Sonnet 4.6, and that would be what I expect for a higher reasoning example such as this one. 
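
The routing cadence described here (classify at the start of a conversation, classify again after compaction, and otherwise keep the current model) can be sketched in a few lines of Python. This is a hedged illustration only: `AutoSession`, `toy_classifier`, the `just_compacted` flag, and the model names are hypothetical and do not reflect Copilot's actual implementation.

```python
from typing import Callable, Optional


class AutoSession:
    """Hypothetical session that routes only at the start and after compaction."""

    def __init__(self, classify: Callable[[str], str]) -> None:
        self.classify = classify
        self.model: Optional[str] = None
        self.classifier_calls = 0

    def send(self, prompt: str, just_compacted: bool = False) -> str:
        # Route on the first prompt, or right after the context was compacted.
        # Every other prompt reuses the current model so the cache stays warm.
        if self.model is None or just_compacted:
            self.model = self.classify(prompt)
            self.classifier_calls += 1
        return self.model


def toy_classifier(prompt: str) -> str:
    """Stand-in for task-intent classification (an assumption, not the real router)."""
    return "larger-model" if "refactor" in prompt.lower() else "smaller-model"


session = AutoSession(toy_classifier)
first = session.send("Convert 32°F to Celsius")
second = session.send("Refactor my app code base")  # same session, no re-route
third = session.send("Refactor my app code base", just_compacted=True)
print(first, second, third, session.classifier_calls)
```

The second prompt stays on the model chosen at the start, which mirrors why the session had to be cleared in the demo before a different model appeared.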
It automatically understood the task, knew it needed a higher level of reasoning, and then swapped the model. Yeah, in that scenario, I'm not really thinking about anything but trying to get the refactoring task done. When I'm using Copilot, I just want it to work for me. I don't want to spend time trying to run through the task with four different models and comparing that all. I want to be able to trust that we can do that. And whether I'm on the CLI or VS Code, all I have to do is just select auto. That's it. Just pick auto on the menu and then Copilot's going to do the thinking for me, figure out what model I should choose. Exactly, and that's how it works for a Copilot chat agent as well. How? I'd love to understand how the team built this cuz this is amazing. I'd love to offload that bit of thinking and honestly a lot of times I think like unless you're deep into research and really doing your own testing and benchmarking, doing your own evals, we don't really know. The first problem that we had to solve was how do you actually pick the best available model at any given time. And what we did here was create a model selection engine that looks at different key utilization metrics like latency, capacity, or errors. This is a visualization of what the dynamic model selection looks like, and I find this easier to help us understand how it works given some of these improvements are pretty under the hood for auto. So, I mentioned how do we find the best available model? What we're doing is we're re-ranking those models in real time with capacity, latency, and errors. You'll always get that consistent and reliable experience with whichever model actually has the highest score based on some of those factors. That's the current version of auto, but what I'm excited to talk about is that more intelligent, that task-based routing that we have. And so, for that, I'll show you another one of these mock-ups.
We actually developed this by pitting different models together. So, we had higher reasoning models versus lower reasoning models, and so we would have a ton of different benchmarks or tests to see which ones would solve what task. We saw three different scenarios. The first being that some tasks could be solved by both, some are only solved by the larger model, and some are only solved by the smaller model. And that overlap is really that opportunity space we have to really make an impact to maintain the level of quality or make it even better with all those token efficiencies. So, the router now actually looks at the nature and complexity of your task, and it's built to look at several different dimensions like reasoning needs, tool orchestration needs, debugging needs, things like that to give us that final ranked list of what models would be best for your task. And then when you combine the dynamic model selection on top of that, you're not only getting the model that's best for your task, but also the model that is most reliably served for you. So, it's that power in the combination of all of these systems that actually makes Auto an attractive feature. It's not only to save on that usage and to keep your heavier task for the models that actually require the heavier task, but it actually does some sort of optimization where it's actually going to make it be better. Because some tasks were only solved by smaller models, one of the learnings that we had when we evaluated all those models together was that a system that consistently routes to the best model for your task doesn't only save on cost, but it actually outperforms using one single model alone. And so, of course, you could use one model like Opus over and over, and that'd be great, but you lose on some of those efficiencies or those cost optimizations that you might have from using a smaller model that could do the same thing.
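
One way to picture the combination of the two systems is a short Python sketch that walks the task-based ranking in order and takes the first model that is currently healthy enough to serve. How Copilot actually merges the two signals is not described in the video; `pick_model`, the `min_health` floor, the fallback rule, and the model names are assumptions for illustration.

```python
def pick_model(task_ranked: list[str], health: dict[str, float],
               min_health: float = 0.5) -> str:
    """Pick the best task fit that is also being served reliably.

    task_ranked: model names ordered best-first for this task (router output).
    health: live score per model in [0, 1]; missing models count as 0.0.
    min_health: illustrative floor below which a model is skipped.
    """
    if not task_ranked:
        raise ValueError("task_ranked must contain at least one model")
    for name in task_ranked:
        if health.get(name, 0.0) >= min_health:
            return name
    # Nothing clears the floor: fall back to the healthiest candidate.
    return max(task_ranked, key=lambda n: health.get(n, 0.0))


# The router prefers the larger model for this task, but it is degraded right now.
ranked = ["larger-model", "smaller-model"]
print(pick_model(ranked, {"larger-model": 0.3, "smaller-model": 0.9}))
print(pick_model(ranked, {"larger-model": 0.8, "smaller-model": 0.9}))
```

A strict floor is only one possible design; a weighted blend of task fit and health would be another, and the video does not say which approach is used.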
How is that going to move forward because there's going to be a lot of new models that are coming up, and I'm assuming when they're added, they will be already part of this intelligent model selection? The goal is definitely to make all the latest and greatest models available to you through Auto. I think there's some work that we need to do to improve kind of the control around that, especially for more of the expensive models like the Opuses of the world and things like that. But in terms of benchmarking and evaluation, our team, not only my auto team, but also the models team does a ton of rigorous evaluations before we make any model available. And that's a combination of both offline evaluation on some of the benchmarks that you might be aware of, like SWE-bench or some of the other ones that we have internally, but also online experimentation. And we're heavily investing in that area, especially for something like auto because, you know, it's a bit difficult to estimate what the impact or the model usage would be, especially in this dynamic environment. We don't know what task you have, what policies you have in place, and what the real-time metrics are. And the only way to do that is through controlled online experimentation. We understand how the intelligent routing is happening and how the model selection is happening, what models are included as part of the selection that we can use. We do choose the models based on the quality and what scenario you're actually using it in. So, in auto today, you do have access to many different models across the model providers that we have at GitHub. That's kind of our special sauce with the diversity of models in our ecosystem. And you'll notice that they do differ between scenarios. All of these models will change and evolve as we get more models, as the router improves, and as we run more evaluations to see what are you actually gaining from using these models in each of these scenarios. Wonderful.
So, if folks have seen the auto selection and they've been a little bit kind of wary of selecting it thinking that maybe it's just going to default to a model that would not be their choice, there's actually a lot that has gone on behind the scenes to make sure that this intelligence is applied as Copilot is picking the best model from cost efficiency and to get the job done the first time. So, give it a try. That's awesome. I mean, I read all the feedback. We're always on socials. We're on the GitHub discussion post. So, if folks have tried it before and are a bit hesitant, come back and have other suggestions or ideas for customizations, please feel free to post somewhere. I'm sure one of our team members will see that and we'll try to get it on the road map. Anything else coming up that you can share? Intelligent auto selection is coming to VS Code within the next couple of weeks, which will be huge. And then we'll follow up with all of the rest of our ideas in the portfolio and the CLI and cloud agent very soon. So, keep an eye out on our changelog for the most recent announcements. But other than that, some other things that we're starting to think about is around controls and customization. So, you might have a scenario where you want to use specific models for smaller tasks or specific models for larger tasks. We want to be able to give you that level of control so that you can also learn along the way, how do these models all work together for specific scenarios? And then the other thing that I'll mention is all this routing is happening on the main agent. So, if you get Haiku, you're going to get Haiku for the entirety of your cloud agent session. What we want to work on moving forward is further fine-tuning and training our models so that it actually looks at sub agents.
Because I think it'd be so cool if you had a triage agent that uses a lower reasoning model, if you had a spec agent that uses slightly higher reasoning model, and then execution agent that's actually able to switch between all of the available models to get you that response that you want and also the efficiencies that you can get. That is awesome. So, much to look forward to for this new feature and apparently there's been a ton under the wraps, under work. Uh I'm looking forward to see how it evolves. Thank you so much for showing it to us. Yeah, thank you. And that was your first look at GitHub Copilot's new auto model selection. More intelligent than ever, doing things that go beyond saving you usage. So, thanks to the team for showing it to us. And if you've tried auto before, give it a go once again. It has improved a lot. Please let us know what you think about this new feature in the comments. I reply to every single one of them. And don't forget to like and subscribe so you never miss another developer tip or feature update. Push those changes to main and I'll catch you on the next release.