Computer use in Codex
The chapter introduces Codex's ability to control a computer’s graphical user interface and operate any application. It highlights an onboarding flow that guides users through granting permissions and starting automations, and it demonstrates the easy setup and Codex performing tasks across the computer.
Codex now handles real computer use on Mac, controlling apps like UTM, Spotify, and Reminders in parallel to boost productivity.
Summary
OpenAI’s Roma and Ari introduce Codex’s new computer use capability, expanding Codex from a coding assistant into a true desktop teammate. Unlike earlier modes that only ran commands or edited code, computer use lets Codex operate the graphical user interface of any local app in the background, so it doesn't lock you out of your computer. The onboarding is designed to be intuitive: the permission prompt animates straight into the Settings window, and a two-drag setup gets you running fast. In the demonstrations, Codex starts creating a macOS VM in UTM, plays focus music in Spotify, and adds a reminder in Reminders, with all three running in parallel in the background. A key innovation is combining screenshots with accessibility data, a textual description of each app's interface that even covers elements scrolled off screen. This makes Codex more precise, and because images aren't strictly required, fast non-multimodal models such as Codex Spark can drive apps, sometimes faster than a human. Safety is addressed with app-by-app permissions: Codex can only see and interact with apps you explicitly allow. The conversation also highlights practical, time-saving moments, such as having Codex update the financial-tracking spreadsheets Ari keeps in Numbers, and argues that computer use could become indispensable in everyday life. The overall message: with computer use on Mac, Codex performs real work in the background, enabling a more seamless workflow, and Windows support is on the roadmap.
Key Takeaways
- Codex can now control any macOS application and run tasks in several apps simultaneously (e.g., creating a VM in UTM, playing music in Spotify, and adding a reminder in Reminders).
- Onboarding is frictionless: after you allow the permission prompt, the panel animates into the Settings window, and two drags complete the setup before Codex starts on the task.
- Accessibility data gives the model a textual description of each app's interface, making computer use more accurate than screenshot-only approaches; because images aren't strictly required, fast non-multimodal models like Codex Spark can operate apps faster than a person.
- Safety is a core design principle: Codex asks for permission the first time it uses each app and can only access apps you explicitly allow, keeping sensitive data in other apps out of reach.
- The impact extends beyond coding: Ari describes having Codex update the financial-tracking spreadsheets he keeps in Numbers.
Who Is This For?
This is essential viewing for developers and product designers who want to understand how AI can augment everyday desktop productivity, especially Mac users curious about hands-off automation that still respects safety and privacy.
Notable Quotes
"Codex can do all of that for you also."
—Arie explains how Codex moves beyond code-writing to interacting with graphical interfaces.
"It's going to spin up UTM. And what we can see here is once it starts the app and once it starts using the app, you'll see the cursor fly in."
—Demonstrating a real-time automated app interaction and multitasking.
"The model can see the interface and click and type by coordinate."
—Explaining the multimodal capability and accessibility-based improvements.
"Safety is so important... Codex can access your developer applications and your productivity applications without accessing, you know, anything that's more sensitive."
—Outlining the case-by-case, permission-based safety approach.
"Computer use is available for a Mac today and we cannot wait to bring it to Windows users very soon."
—Closing note about availability and roadmap.
Questions This Video Answers
- How does Codex computer use work on macOS with UTM VMs?
- Can Codex run multiple apps at the same time without interrupting my workflow?
- What safety measures protect my privacy when Codex uses local apps?
- What are the benefits of using Spark for computer use over multimodal image-based approaches?
- When will Codex computer use be available on Windows and other platforms?
Full Transcript
And we'll roll in both cameras. Great. Thank you. Yeah. Hi everyone, Roma here. Codex has quickly evolved from a coding agent into a real teammate, and not just a coding teammate anymore. You can literally use Codex for any task, and computer use is a big part of that shift. It takes Codex beyond your tools and files and into the real work you do with your local apps. Today I'm joined by Ari, who has spent a lot of time thinking about this problem. So, Ari, why computer use? Tell me more about how this works. Yeah, I'm so excited about computer use.
Codex already had the ability to do so many things on your computer because it could run commands and write code, so it can solve all kinds of problems for you. What's new is that there's all this software on your computer with a graphical user interface. It's something that you, as a human, use by looking at it, moving your mouse, clicking, and typing. And now Codex can do all of that for you also. So it can use literally any application on your computer, which is so powerful. It's really exciting to see this come together and make something that people can use for so many different things.
One of the things that I found very delightful was the onboarding. So for people watching this who want to get started with Codex and try these amazing features, the very first onboarding screen is very easy, right? Do you want to show it to us? Yeah, I'd love to. Let's say this is the first time I'm using computer use. It's going to ask for my permission first, right? And when it does that, I'll get this window that says "Enable Codex computer use." When I press Allow, it animates the panel straight into the Settings window, which helps you know where to look and what you're supposed to do next.
It tells you how to drag it into the list. And then you have to authorize, because you're making changes to your system settings. And now I was able to set up the whole thing in two drags. Now you can see it's going and clicking and doing the task for me, and it's now done. Amazing. Cool. So let's see computer use in action. Do you have a task off the top of your head that you want to show us? Yeah, absolutely. So one thing I need to do every so often is test software on older Mac operating systems.
For that I use virtual machines, and I have an app I love called UTM. But it's a pain to create a virtual machine: I have to click through a bunch of things, and I have to run the macOS Setup Assistant. Sounds like a perfect use case. Perfect use case. So now I can save a whole bunch of time by having the agent do it for me. I'm going to go into Codex and say, "Make a new Mac VM in UTM." When I type @, it shows me a list of the apps I have on my computer, and I can run the query, and then it'll actually start using the app I selected.
So in this case, it's going to spin up UTM. And what we can see here is once it starts the app and once it starts using the app, you'll see the cursor fly in. That's awesome. It's so cool. What's cool about it is that it's different from my cursor, so Codex can click around without interrupting what I'm doing on my computer. So you can keep on using your computer while Codex is working in the background. Yeah, that's exactly right. You know, a lot of computer use implementations, in fact every computer use implementation I've ever seen, take over your entire computer.
So you can't use your computer while the agent is using your apps. And now it's already done. It sounds like it's downloading macOS. It's downloading macOS. So once macOS finishes downloading, it can also complete the next step, which is actually setting up macOS for me. That saves so much time. Should we try another one of these computer use tasks in the background? Can you do multiple? Absolutely. I want to focus, so: "Play some good music for work for me in Spotify."
So now the agent's going to start using Spotify. But what's super powerful about this is that it can actually do things across multiple applications. It can run multiple cursors in multiple apps at the same time. So I'm going to say, "Add a reminder in the Reminders app for tonight to look through my tax documents." The music's coming. Music's going in. Spotify's starting. It's adding some reminders for me. So now, all of a sudden, my Mac is this multitasking environment where I can do so many things at once and have agents do all the things that I don't want to be spending my time on.
Okay. That's so cool. Now you have, like, three apps in the background that Codex has been driving, and everything you've done with the cursor has been so delightful too. Do you want to tell us more about this? Yeah, we wanted to make something that felt fun to use, that felt natural. The motion of the cursor is important when you're watching it use your apps, because you want to understand what it's doing. So we put some effort into finding curves of motion that feel natural and kind of whimsical, where the arrow turns in the direction of motion so it looks like it's swimming across your screen.
It makes it fun to use. Yeah, it's also really delightful to have a better sense and understanding of what the agent is actually doing with every one of these... Yeah, I really love it. One thing that I want to touch on is that you can use computer use with a faster model like Spark, right? Tell me more about how you approached combining multimodal and accessibility. Yeah, we've put together some really exciting things in the way that it works with the model, and it's only the beginning of the kind of work that we're able to do.
Historically, computer use has only worked with screenshots. It takes advantage of the power of multimodal models, so the model can see the interface and click and type by coordinate. That's great, but it turns out there's all this hidden information about an application's interface that you can extract through the accessibility framework. So we have spent a lot of time figuring out how to use it in a way that enhances the model's abilities. We pull a bunch of textual information describing the interface, and the model can use that to see things, even things that are scrolled off screen.
It can understand more deeply the role of each element on screen, and this makes the model super accurate at performing tasks. The other benefit, which you were alluding to, is that because it doesn't necessarily require images, we can use non-multimodal models like Codex Spark, which are super fast. So all of a sudden you have this experience where computer use, when you use one of those models, can use software even faster than you can. That's amazing. Do you want to try one of these tasks, for instance, where we would switch the model to Spark?
Yeah, absolutely. "Remind Roma in Messages to try computer use for debugging apps." What we'll see is that before, computer use was pretty performant, but now with the Spark model, it's superhuman. It uses the software literally faster than a human would. We see it here: it opens the text, types the message to me, and in a second it's sent. That's pretty incredible. Pretty sick. So it just did this in the background. I was able to do other things on my computer at the same time, and it's super... And we got it.
I have it. Very nice. Asking me to try computer use for debugging apps. Sick. Incredible. You brought so much from your knowledge of Sky into the Codex app. This is incredible. Now you're working with the research team at OpenAI. Where do you see the future of computer use? Yeah, for earlier products like Operator and ChatGPT agent, we used to train dedicated models for computer use. Since then, the research team has done amazing work to bring those capabilities into the main GPT models. So now we're actually building this in Codex on the same models that are available through the API, and everyone can build these amazing computer use capabilities.
So that's been super nice, and also great streamlining for our workflow internally. It's amazing how fast we've been able to get this to work with the mainline models and with Spark. But I think we're going to want to get to a place where computer use is superhuman. I think we can get to a place where computer use can operate a computer two, five, 10 times as fast as a person, and I think that's where it's going to become indispensable. You're going to want to use it for so many computing tasks, for really everything you do in your life.
And it's going to save you so much time and let you focus on the things that are important. So I'm really excited about what the roadmap looks like there. One thing I wanted to touch on, which people might be curious about, is the safety approach to all of this. You have these amazing capabilities for Codex to now drive some apps on your Mac. How are you all thinking about safety? Yeah, it's such a good question. I feel like this type of technology has the potential to be kind of scary, because it's actually taking over the actions that you would do on your computer, and it has access to so much stuff.
So we feel it's so important that people feel really comfortable using this technology, and we've spent a lot of time thinking about how to do that. One of the things we've done here is make computer use able to access only the applications that you allow. Every time Codex goes to use an app for the first time, it asks for your permission, right? And when you say yes, Codex can see and type into that app, but it can't see or interact with any other app on your computer. So if you have some stuff that's maybe a little bit sensitive in one of your applications, you can feel very confident knowing that Codex can access your developer applications and your productivity applications without accessing, you know, anything that's more sensitive.
And so that just builds a lot of trust, I think, for the user. Absolutely. Yeah, that's pretty amazing, because it's not like streaming your entire desktop or accessing all of your files, anything like that. It's very much case by case, app by app. As you're trying to be productive, you're giving Codex the permission to do so. Obviously this is a simple task, but now that we've seen the power of computer use, I'm curious: what have you used computer use for? What were your magical moments with it?
Yeah, I have all these spreadsheets that I use for financial tracking, and now I ask Codex to update them for me, so I don't have to do it myself anymore. Super powerful. That's incredible. I mean, it's hard to even imagine these days starting a task without Codex. Yeah, that's so true. Nowadays when I want to start something new, whether it's programming or something else on my computer, I feel like I want to turn to Codex first because it saves me so much time. And we had the file system; we had the plugins to access all of these services online.
It feels like the missing piece was computer use, to be able to access the local apps. I definitely think so. Especially for me, I use a really wide variety of applications. I use a lot of web applications. I use a lot of Apple native apps. I actually track my spreadsheets in the Numbers app. And so now this just brings all of that online, all of it into a place where Codex can access it end to end. Pretty incredible. Thanks, Ari. Computer use is one of those capabilities that is hard to fully appreciate until you try it.
All of a sudden, your computer works in a whole new way. And it's not just Codex moving around your computer. It's Codex actually doing real work for you in the background without breaking your flow. So try it on your hardest task, maybe the one that has you bouncing around five apps and eats multiple hours of your day. We genuinely can't wait to see what you think. Computer use is available for a Mac today and we cannot wait to bring it to Windows users very soon. Thank you so much, Ari. See you next time.