Watch an AI Agent Learn to Fly a DJI Tello. Will it work?
An introduction to testing whether an AI agent can fly a drone using a game controller and the drone's camera feed, with the experiment set up in the garden to see if AI can operate like a human pilot.
A playful, tech-forward experiment shows an AI agent piloting a DJI Tello using UDP control, camera feedback, and Moondream for object detection, demonstrating a practical AI-on-edge workflow from Cloudflare Developers.
Summary
Cloudflare’s Confidence guides us through a daring experiment: can an AI agent fly a tiny DJI Tello drone using only a PS5 controller as the input device? The setup streams the drone’s camera to the agent, which uses Moondream for object detection to locate a target and sends precise commands back to the drone over UDP. Confidence narrates the garden test, where the drone autonomously navigates toward an orange T-shirt while wind adds a layer of challenge. He breaks down the architecture: the controller talks to the drone over UDP, Moondream analyzes frames, and the agent (consisting of a chat agent and a drone controller agent) orchestrates actions via a WebSocket connection. The demo includes real-time telemetry, like an 83% battery readout, and a live visual feed of the drone’s view. Confidence highlights the practical learning journey, the reliance on internet-hosted models, and the role of the Agent SDK from agents.cloudflare.com, which enables multiple sub-agents and workflow control. He points viewers to the GitHub repo “T agent” for full source code and explains how the system could be extended, even suggesting a future test with a robo-vacuum. The video closes with an invitation to comment on what to try next and a reminder to check the documentation and SDK. Overall, the proof-of-concept demonstrates an end-to-end AI-to-robot loop, powered by UDP drone control, video streaming, and cloud-based AI models.
Key Takeaways
- The DJI Tello drone is controlled via UDP over its own Wi‑Fi network, not via a traditional HTTP API.
- Moondream is used to perform object detection on frames from the drone’s camera to identify the target (an orange T-shirt).
- An AI-powered controller and a dedicated drone agent communicate over a WebSocket connection to translate AI decisions into drone commands.
- The architecture combines a local controller (drone link) with internet-hosted models, showing how edge and cloud components cooperate in real time.
- The setup demonstrates practical lab work: battery at 83%, real-time video feed, wind considerations, and autonomous landing when the target is reached.
Who Is This For?
Developers curious about real-time AI control of physical devices, and teams exploring AI agents and cloud-enabled robotics workflows. This is especially relevant for those experimenting with edge-to-cloud AI pipelines and the Agent SDK.
Notable Quotes
"Would it work? Would an AI be able to fly this drone just like a human would?"
—Confidence frames the core question of the demo.
"So that is connected. So I'm going to start up the agent and also the controller."
—Shows the moment the system comes online.
"The drone agent interfaces between the human and the controller which gets things executed on the drone itself."
—Explains the role of the drone agent in the architecture.
"Moondream is a really cool vision model and it does object detection because the goal is to get the drone to fly towards a target or a destination."
—Describes the object-detection component used by the system.
"This is really cool. The agent can take the data coming from the sensors of the drone and run some detections and give the drone the right input to get to the t-shirt."
—Highlights end-to-end data flow from sensors to actions.
Questions This Video Answers
- How do you control a drone with AI over UDP and camera feedback?
- What is Moondream and how does it perform object detection in a drone setup?
- How does the Cloudflare Agent SDK enable multi-agent robotics workflows?
- Can an AI agent autonomously navigate a windy outdoor environment with a small drone?
- Where can I find the source code to replicate this Tello demo (GitHub T agent) and the Agent SDK docs?
Full Transcript
Hey guys, welcome back to the channel. My name is Confidence and I am a developer advocate at Cloudflare. And today we're doing something interesting. I have a few things on my desk here. The first thing I have here is a game controller. This is a PS5 controller. And more interestingly, I have the DJI Tello drone, which is a really tiny, fun, programmable drone. And the way this works is that you pair this up to a device like your phone and you're able to use this game controller to make the drone fly and do things. So, I was wondering: what if we were able to give this same controller to an AI or an agent, and also give it access to the camera feed coming from the drone so that it has information about what the drone is seeing in its environment?
Would it work? Would an AI be able to fly this drone just like a human would? Let's find out. I already have this all set up in the garden downstairs. So, I am going to grab my jacket because it's quite cold outside, and let's head downstairs and find out if an AI can fly the drone. I'll see you downstairs. Hey guys. So, I'm out here in the garden and I have the demo set up. I have the drone over there. Say hello to the little drone. Hello little drone. And we're going to get the drone to fly all the way over there by itself to the orange Cloudflare t-shirt.
If you want to get some Cloudflare swag, I have links in the description below, so go check that out. All right, so that's where the drone is and we're going to get it to fly all the way over here. And I have a secondary camera there recording stuff. And I have my laptop with the agent and controller running. All right, so that's what I have for my setup. Let's actually get this demo running. So I am here on the terminal and I will connect to the Wi-Fi network of the drone. But for that I need to get the drone turned on.
And as you can see the drone's light is blinking, so it's on right now, which is good. And I'm going to come to the laptop and let's connect to the drone's Wi-Fi network. All right, that is connected. So, I'm going to start up the agent and also the controller. All right, and we can head to the browser. We can do a quick refresh and let's check what the battery is. So I can ask the agent what the battery is, and the agent is telling us the battery is 83%, which is good. So we can actually start our flight.
I'm just going to tell the agent to fly the drone to the orange t-shirt and I'm going to hit that and that should start the mission. As you can see, the drone is flying and it's in the air right now trying to get to the destination we want it to get to. And while that's processing, I'm just going to show you what's happening behind the scenes inside of the brain of the agent and the controller. You can see that we have a preview of what the drone is currently seeing, and that's the orange t-shirt all the way over there.
So, it's going to use that detection to figure out how to get the drone to the t-shirt. And that's the drone over there, finding the t-shirt. So, it's doing a 360 sweep of the area around the drone to figure out where the t-shirt is. And when it's able to find it, it will start flying towards it. It just did. So, it's flying towards it now. Also, I should say it's quite windy outside, so it'll be a struggle for the agent to fly the drone under these windy conditions. But it seems to be doing a really good job right now, so I'll let it rip.
Slowly but surely, it's flying towards the detected t-shirt on the frame. And taking a look at the console, you can see that it's definitely seeing the orange t-shirt. And you can see it's flying towards it on the frame, which is cool. And when it gets close enough, it should decide to land the drone by itself. So, I think it's getting pretty close. And then it should decide to land the drone when it's gotten close enough. And it's landed the drone. And that was really, really cool. That was such a cool demo. You can see how the agent was able to take the data coming from the sensors of the drone, namely the camera, run some detections, and through that give the drone the right input to get it from its platform to where we wanted it to fly, towards the t-shirt, and then it landed it when it got close enough.
So that's the demo. This is really cool. Now let's get back upstairs and let me show you how this actually works in practice. Now, that was really cool. And, surprisingly, an AI agent can fly a drone just like a human would. So, I'm sure you're wondering, how does it work? How is it possible that it's able to fly the drone even though it was not specifically programmed to fly a drone? So, I'm going to explain how this system works, and then I'll also show you links to the repository where you can see the source code for the implementation, if you'd like to pull it down and run it yourself.
So let's start by looking at how this works. What you're seeing on my screen here is a diagram showing the high-level architecture of what this system looks like. The drone requires you to talk to it via UDP, so you have to connect to it over Wi-Fi. It doesn't have a web server that you post HTTP requests to in order to connect to it. That would actually be fun, but instead you have to connect to the Wi-Fi network created by the drone and send commands and receive data from the drone over UDP. But then at the same time, you have to connect to the internet, because the models we're using to run the agent actually live on the internet.
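That UDP conversation is just short text commands, per DJI's published Tello SDK guide: the drone listens on 192.168.10.1 port 8889 and answers on the same socket ("ok", "error", or a value such as "83" for "battery?"). Here's a minimal sketch in TypeScript for Node, assuming the laptop is already on the drone's Wi-Fi; the helper names are my own, not the repo's:

```typescript
// Minimal Tello text-command client (ports per DJI's Tello SDK guide).
import { createSocket } from "node:dgram";

const TELLO_ADDR = "192.168.10.1";
const TELLO_PORT = 8889;

// Pure helper: build a Tello SDK command string from a verb and its arguments.
export function telloCommand(verb: string, ...args: (string | number)[]): string {
  return [verb, ...args].join(" ");
}

// Send one command over UDP and resolve with the drone's reply, or time out.
export function sendCommand(cmd: string, timeoutMs = 5000): Promise<string> {
  return new Promise((resolve, reject) => {
    const sock = createSocket("udp4");
    const timer = setTimeout(() => {
      sock.close();
      reject(new Error(`timeout waiting for reply to "${cmd}"`));
    }, timeoutMs);
    sock.once("message", (msg) => {
      clearTimeout(timer);
      sock.close();
      resolve(msg.toString().trim());
    });
    sock.send(cmd, TELLO_PORT, TELLO_ADDR);
  });
}

// Usage (only works while connected to the drone's Wi-Fi):
//   await sendCommand("command");                    // enter SDK mode
//   await sendCommand("battery?");                   // e.g. "83"
//   await sendCommand(telloCommand("forward", 50));  // move 50 cm forward
```

Note there is no handshake beyond the initial "command" message; every subsequent command is fire-and-await on the same socket.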
So, how do you do that? My setup here has my phone connected via Wi-Fi to the internet and tethered to my laptop over Ethernet. That's how I'm able to talk to the remote models through my phone's connection to the internet and also connect to the drone via Wi-Fi at the same time. That's really cool. So, let's actually look at how this works in practice: how we're able to give the agent access to the controller and also to the data coming from the drone, and get the agent to fly the drone. So the first system you're seeing here is the controller, which is the game controller side, and this does some of the heavy lifting.
It connects directly to the drone over UDP to send commands to the drone, get them executed, and listen for the responses from the drone. It also receives the stream coming from the drone's camera and runs ffmpeg in a loop so that we can take a snapshot of a frame every couple of seconds and send that off to a different model called Moondream. Moondream is a really cool vision model and it does object detection, because the goal is to get the drone to fly towards a target or a destination, and we tell Moondream to look for that target in the frame and tell us where the coordinates are, and that gets returned back to the controller.
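The snapshot loop can be sketched with ffmpeg's `fps` filter and `image2pipe` output. Per the Tello SDK, the raw H.264 stream arrives on UDP port 11111 once you send "streamon". The flags and function names below are my illustration of the idea, not the repo's actual code, and real code would also need to reassemble JPEG boundaries from the pipe:

```typescript
// Sample the drone's video stream at a low frame rate and hand each chunk of
// JPEG data to a callback (which would forward a full frame to the vision model).
import { spawn, type ChildProcess } from "node:child_process";

// Pure helper: ffmpeg arguments that read a stream URL, sample it at `fps`
// frames per second, and write MJPEG frames to stdout.
export function snapshotArgs(streamUrl: string, fps: number): string[] {
  return [
    "-i", streamUrl,          // input: the drone's H.264 stream
    "-vf", `fps=${fps}`,      // e.g. 0.5 = one frame every two seconds
    "-f", "image2pipe",       // emit individual images on stdout
    "-vcodec", "mjpeg",
    "pipe:1",
  ];
}

export function startSnapshots(onData: (chunk: Buffer) => void): ChildProcess {
  // Tello streams video to UDP 11111 after a "streamon" command.
  const ff = spawn("ffmpeg", snapshotArgs("udp://0.0.0.0:11111", 0.5));
  // NOTE: "data" events are arbitrary chunks; production code must buffer
  // until a complete JPEG (FFD8 ... FFD9) is available before sending it on.
  ff.stdout?.on("data", onData);
  return ff;
}
```

Each completed JPEG would then be posted to the vision model along with a prompt like "find the orange t-shirt", getting back bounding-box coordinates.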
Now all of this would be useless if the controller didn't have a brain, which is where the agent comes in. So the controller connects to the agent via WebSocket, and we can take a look at what the agent system looks like in a bit. This is the agent. Actually, the agent is made up of multiple agents. We have two sub-agents. We have one responsible for the chat interface, where the user can come and have a conversation with the entire system, like ask what the battery level is or tell it to fly towards a particular location.
Then we also have the drone agent, which is responsible for actually interfacing with the controller. So the drone agent has a WebSocket server that the controller connects to, and the drone agent is able to relay messages from the chat agent to the controller, get them executed, and relay results back. So the way it works is that when the user comes to the web interface or the chat interface and types in a question like "what is the battery level of the drone", the chat agent receives that query and then sends it off to the drone agent, and the drone agent makes sure that it gets sent off to the controller.
The controller executes that command on the drone and returns the response, which gets sent back to the drone agent, and the drone agent forwards that response back to the chat agent, and you see it on the interface. So the drone agent interfaces between the human and the controller, which gets things executed on the drone itself. So looking at the agent system, we have the chat interface, which is being hosted by the agent. The agent also has a bunch of tools that are used by the drone agent to actually talk to the controller, and you have an LLM in the loop, because this agent was not specifically designed to fly drones.
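Stripped to its essence, that relay is routing messages by direction: queries flow down from the chat agent to the controller, and results flow back up. A tiny sketch of what such a message envelope and routing rule might look like; the field names here are my assumptions, not the repo's actual wire format:

```typescript
// A hypothetical message envelope for the chat-agent <-> drone-agent <-> controller relay.
type Envelope =
  | { kind: "query"; from: "chat"; cmd: string }                       // user request going down
  | { kind: "result"; from: "controller"; cmd: string; reply: string }; // drone reply going up

// Pure routing rule applied by the drone agent: queries are forwarded to the
// controller's WebSocket; results are forwarded back to the chat agent.
export function routeTo(msg: Envelope): "controller" | "chat" {
  return msg.kind === "query" ? "controller" : "chat";
}

// Usage:
//   routeTo({ kind: "query", from: "chat", cmd: "battery?" })
//     // -> "controller"
//   routeTo({ kind: "result", from: "controller", cmd: "battery?", reply: "83" })
//     // -> "chat"
```

Keeping this rule pure makes the drone agent itself a thin shell: it only owns the two connections and applies the routing function to every message.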
So, we need an LLM to look at the possible commands for the DJI drone and to generate the command that makes the most sense to get the drone from where it is towards the target, given the detections from the frame we are receiving from the controller. So, that's how the controller and the agent work together to get the drone to fly towards a given destination. And I think it's really cool when you go take a look at the source code. So, I'll be leaving a link to the source code, but it's going to be on my GitHub profile.
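In the video an LLM picks the next command from the detections; as a rough non-LLM illustration of the same mapping (normalized bounding box in, Tello command out), here is a heuristic version with entirely made-up thresholds:

```typescript
// Toy stand-in for the LLM's decision step: turn a detection into a Tello command.
// All thresholds and the 0..1 normalized-box convention are assumptions for
// illustration, not the system's actual logic.
interface Detection {
  x: number; // left edge of box, 0..1 of frame width
  y: number; // top edge of box, 0..1 of frame height
  w: number; // box width, 0..1
  h: number; // box height, 0..1
}

export function nextCommand(det: Detection | null): string {
  if (!det) return "cw 30";                  // no target found: keep sweeping clockwise
  const cx = det.x + det.w / 2;              // horizontal center of the detection
  if (cx < 0.4) return "ccw 15";             // target is left of frame: rotate left
  if (cx > 0.6) return "cw 15";              // target is right of frame: rotate right
  if (det.w * det.h > 0.25) return "land";   // box fills the frame: close enough, land
  return "forward 40";                       // target centered: advance toward it
}
```

The real system gains flexibility from using an LLM here, since the same agent can interpret arbitrary goals ("fly to the orange t-shirt") without this logic being hand-coded.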
Don't forget to smash this button to follow me when you get here. But let's go to my repositories. And if you go take a look at the repository called T agent, this explains in detail how the entire system works. And of course, you can see the source code for the agent, as well as the source code for the controller. So this is the implementation for it. It's going to be on this GitHub repository, and I'll have it linked in the description below. But what I think is most exciting, what makes it so easy to build systems like this, is the Agent SDK.
And you can get that from agents.cloudflare.com. And you can install it in one command by typing npm i agents. And this homepage tells you what's possible. You can have multiple sub-agents. You can have a system that waits for the user to approve an action before it actually goes on to perform the action. You can have agents that hibernate and wake up to do things at a certain point in time. It's really cool. Go check out the documentation. It's a really cool SDK and I want you to try this out. So that is it for this video.
I think this has been a really awesome demo, or experiment, and surprisingly it works. Now I'm wondering: what would you guys like to see me do next? Would you like me to give an agent access to my robo vacuum? Let's see how that works. Would you like to see that? Let me know in the comments section. And also tell me what you guys are building using agents right now. All right, I'll see you in the next video. Don't forget to like, share, and subscribe. And I'll catch you next time.
Bye.