Kimi Found 40+ Security Issues in Our Code. Open Source AI Is Here | Michelle Chen
Chapters
The chapter opens with host introductions and previews upcoming segments on dynamic workers and a Women at Cloudflare feature, then highlights recent Cloudflare blog posts including Kimi 2.5, smarter client-side security, a Kubernetes fix, and visualizing code workflows, ending with a teaser about upcoming news and events.
Cloudflare’s Michelle Chen explains how Kimi 2.5 and open-source AI enable cheaper, private model hosting, while highlighting new tools like dynamic workers and Cog 0.17 that empower developers.
Summary
Cloudflare’s Michelle Chen sits down with host João Tomé to discuss the rapid evolution of open-source AI and why Cloudflare is betting on open weights and hosted inference. They highlight Kimi 2.5 as a pivotal step in making open-source models competitive with closed systems, especially for enterprises that require privacy and control over data. Chen shares how Cloudflare’s security team has already found more than 40 confirmed security issues using Kimi across code bases, illustrating both the power and the ongoing work ahead. The conversation then dives into why open-source models reduce costs and allow fine-tuning for specific use cases, with Kimi serving as a cost-effective, scalable option for internal workloads. Cloudflare’s Workers AI platform, including AI Gateway and Cog 0.17, is presented as a unified path to bring models from Replicate and other providers to customers, with easy switching and a single API. They also touch on performance improvements, such as session affinity, prefix caching, and faster tokens per second, all aimed at lowering token costs and hardware strain. The episode closes with a preview of April announcements and a teaser about dynamic workers: sandboxed, on-demand code execution that dramatically speeds up tool calls and expands what developers can build on top of Cloudflare’s edge.
Key Takeaways
- Kimi 2.5 delivers strong performance for open-source models, with Cloudflare internally validating its capabilities for weeks before public release.
- Open-source models reduce API token costs because the host doesn't have to recoup training overhead, enabling cheaper, private deployments.
- Kimi’s real-world value is demonstrated by saving enormous token-associated expenses in codebase scanning and security tasks at Cloudflare.
- Cog 0.17 and Replicate integration enable Bring Your Own Model workflows and easier packaging of open-source models for faster deployment.
- Dynamic workers provide millisecond startup, sandboxing, and policy-controlled execution to run code snippets at the edge, expanding tool access while preserving security.
- AI Gateway and a unified API strategy make switching between model providers (Google Gemini, Kimi, Replicate models) seamless for developers.
- Open-source models offer tunability (fine-tuning, reinforcement learning) and privacy advantages critical for enterprises handling sensitive data.
Who Is This For?
Software engineers, DevOps, and security teams at enterprises evaluating AI workloads on the edge or with strict data privacy requirements. Ideal for teams weighing open-source models vs. proprietary APIs and looking to implement scalable, cost-efficient AI at Cloudflare’s edge.
Notable Quotes
"we found I think this as of this morning over 40 plus confirmed security issues that we are working to solve."
—Michelle Chen notes real-world security findings from using Kimi on codebases, underscoring both capability and ongoing risk management.
"open source models are now so good and so competitive that we were able to see real benefits from hosting it and using it internally."
—Emphasizes the shift to hosting open-source models like Kimi for cost and control advantages.
"the benefits of having open source models on us because they know that we don't do anything with their data."
—Cloudflare privacy and security stance for enterprise customers hosting models on Cloudflare infrastructure.
"we launched Cog 0.17... a pretty big rewrite actually."
—Highlights the significance of the Cog 0.17 release in model packaging and deployment.
"dynamic workers allow you to spin up and execute other workers at runtime... we get millisecond startup"
—Dina Kozlov explains the practical benefits and speed of dynamic workers for executing code in a sandboxed environment.
Questions This Video Answers
- How do open-source AI models like Kimi compare in cost to proprietary APIs?
- What are Cloudflare’s dynamic workers, and how can I use them for AI tooling at the edge?
- How does Cog 0.17 improve deploying open-source models on Cloudflare?
- What are Bring Your Own Model capabilities in Cloudflare’s Workers AI?
- Why is open-source AI considered more private and customizable for enterprise workloads?
Cloudflare · Kimi 2.5 · Workers AI · Open Source AI · Replicate · Cog 0.17 · Bring Your Own Model · Dynamic Workers · Code Mode · AI Gateway
Full Transcript
how our security team is using Kimi to go through different code bases, and they found, I think as of this morning, over 40-plus confirmed security issues that we are working to solve. But it's really cool. Like, a year ago I don't think I would have said that this might happen. And I think a year from now, things might change as well. Hello everyone, and welcome to This Week in Net. Today we're going to talk about open-source models. Models that have been changing the world, in a sense, for at least three years. But I would say this past year, and even more so these past six months, or even three months, things are really changing so much.
So, a good opportunity to talk about that with my colleague Michelle Chen. I'm your host, João Tomé, based in Lisbon, Portugal. As usual, at the end of this episode we're going to have two little segments. One is with Dina Kozlov, to talk about how you can explore dynamic workers and sandboxes, and what you can do with those. Those are really cool. It's related to code mode. You'll get a sense of what that is at the end with Dina. Of course, we've spoken about that topic with Kenton Varda in previous episodes. You can also stay tuned for those and just check the podcast feed.
Also, at the end, we'll have another Women at Cloudflare segment, in this case with Alexandra Messi Rodriguez. Before we go into the actual conversations, let's go to the Cloudflare blog. We not only had the new Kimi 2.5 model being presented, and we're going to talk about that with Michelle, but also, even this week, Cloudflare client-side security: smarter detection, now open to everyone. This is a very cool new feature, I would say. It's all about advanced client-side security tools for all users, featuring a new cascading AI detection system. It combines graph neural networks and LLMs.
We've reduced false positives by 200 times while catching sophisticated zero-day exploits. There's also a blog post about a one-line Kubernetes fix that saves 600 hours a year, also a cool one you can check on our blog. And, of course, how we use abstract syntax trees to turn Workflows code into visual diagrams, also a Workers blog post in that sense. And now, without further ado, here's my conversation with Michelle Chen. Let me just give a teaser. Tomorrow, April the 1st, is actually a very important day, usually, for Cloudflare. We have products, real products, that are launched on that day; it's not April Fools' Day for us.
It's also Apple's 50th birthday tomorrow, April the 1st. It was also when Gmail was launched, for example, and when our quad-one public resolver, 1.1.1.1, was launched, on April the 1st, 2018. So tomorrow there will be some news for you to check on our blog, and real ones, no fake ones. Stay tuned for that. Hello Michelle, how are you? I'm good. How are you? I'm good. For those who don't know, where are you based? Where are you now? I'm based in New York. I'm on the product team here for our Workers AI platform and our AI platform in general.
So I'd love to talk about open-source models, and Workers AI's announcements came out like two weeks ago, so there's lots of new news here to share. Exactly. One of the things I find very interesting in this day and age is how fast things are changing. Before going into the models, can you actually explain to us a bit when you joined Cloudflare, and how the changes of the last year or so have been incredible, in a sense? Yeah, looking back, it's so funny. I've been around Cloudflare for seven years now, which is kind of insane for me to say.
I was actually an intern in 2019. So I was working, actually, on WARP, and I was in Austin, working with Dane, our CTO, a lot. And then I moved to New York, and I left to do consulting for a little bit. But then Dane convinced me to come back. And so I ended up as Dane's chief of staff for two years. And when I was his chief of staff, we helped kind of brainstorm and ideate new products. And one of those products was these AI products that were going on.
So this was when ChatGPT was blowing up, and we were like, what should we be doing in this space? So I helped kind of start that, and then really wanted to take that and run with it and own it a little bit more. So then I switched over and did product for the AI platform. So products including Workers AI and AI Gateway and Vectorize as well. Exactly. These days it's much more around Workers AI and those products. For those who don't know, what is Workers AI? What can we explain in terms of the general perspective there?
Yeah. So Workers AI is a serverless inference platform. You pay for tokens. You don't have to do what a lot of platforms make you do, which is rent a GPU and deploy the model yourself. How Workers AI works is there's just a single API that you call, and then you're immediately getting inference back from Kimi or Llama or different kinds of models. We also have multimodal models. We have image models, audio models, OCR models, things like that as well. And so we want to make this a really good developer experience.
Any Cloudflare user, any Cloudflare developer, can use and access the latest AI models with just a really simple API. You can use it through the worker binding, which is really just one line of code, and then you're immediately accessing frontier intelligence, which is kind of insane. But there's also a REST API, so lots of people that are not even in the Cloudflare ecosystem can also consume this.
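For a concrete sense of the binding she mentions, here is a minimal sketch of a Worker calling Workers AI; the model ID is just one example from the public catalog, so swap in whichever model you need:

```ts
// Minimal Worker calling Workers AI through the AI binding.
// The `Ai` type comes from @cloudflare/workers-types; the model ID is one
// example from the public catalog, so swap in whichever model you prefer.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // One call, no GPU provisioning: the platform routes to a hosted model.
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "Explain serverless inference in one sentence.",
    });
    return Response.json(result);
  },
};
```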
Makes sense. One of the things that is also surprising is that not only are we adding models, but the models are becoming better. So can you give us a run-through of the improvements in models? Because what led to this conversation, of course, was the launch of Kimi 2.5. We can discuss that a little bit more. But even on the general side, the improvements of models, how often is it these days, almost every day? How is it there? It's kind of insane. I was thinking about this recently, and Llama 4 came out less than a year ago. It's just about to hit a year. And it feels like there have been so many advancements since then, and it's crazy. It's just exponential growth, really. Open-source models have been around since the beginning of AI. Just recently, I think, they've really accelerated, and they're really pushing the frontier and catching up with a lot of these closed-source models. That's also why Workers AI decided to go into this space: we were seeing significant improvements with these really large open-source models, where they would be competitive with the proprietary models from closed-source labs, but still be open weights, still perform, and still be a lot cheaper. So that's actually why we decided to get into this big model hosting game: I think we hit an inflection point where these open-source models are now so good and so competitive that we were able to see real benefits from hosting them and using them internally.
We actually tested Kimi internally for weeks before we launched it to the public. We wanted to see how it performed, and we were working on optimizations and things like that as well. But it's blown a lot of us out of the water here, in terms of how well it's doing. One of the examples I cited in the blog was how our security team is using Kimi to go through different code bases, and they found, I think as of this morning, over 40-plus confirmed security issues that we are working to solve.
But it's really cool. Like, a year ago I don't think I would have said that this might happen, and I think a year from now things might change as well. I think the pendulum keeps swinging, but this is where we are today, and I'm really happy that we got to bring Kimi onto our platform. One of the things: I also played a bit with Kimi, because we have had it available for a number of weeks. I think one of the things that surprised me is how capable it is in many tasks, even in terms of organizing folders for you, because, of course, if you give it access to a set of folders, it can actually make a difference there as well.
That's my use case for Kimi there, specifically. In terms of the highlights of the model, a large open-source model, you already mentioned that specific example, but what are the benefits of having this type of open-source large model around currently? Yeah, I think in terms of it being open source, one of the benefits that we see is, as Cloudflare, a lot of people trust us on privacy and security, and those are things we care about a lot.
And so if we're doing work with sensitive information or proprietary code bases and things like that, those are sometimes things we don't want to ship off anywhere else. So being able to host open-source models ourselves means that we have full control over the infrastructure stack, and where the data is, and who's seeing it, and where it's being evaluated, and things like that. And so that makes it kind of a no-brainer for a lot of these really secure organizations to be thinking about hosting models themselves.
So that's what Cloudflare is doing. But I also think, you know, Cloudflare is not in the business of training. So we have customers that are also looking to us for privacy and security reasons, wanting to use open-source models on us, because they know that we don't do anything with their data. We don't keep it, we don't train on it, anything like that. So I think some of these enterprises will see just a lot of benefits from open-source models, especially if they're concerned about security and privacy. So I think that's a lot of the reason why open source is also leading here, especially in enterprises.
It also gives you the flexibility to do fine-tuning or reinforcement learning on models. So last week in the news, a lot of people were talking about Cursor's new model, et cetera. There's been lots of news about that. The benefit of having open-source, open-weights models is that you can tune it and play with it and use it to suit your own needs. So yeah, I think the open-source foundation here is really, really important, and I think it's kind of the future, actually, because everyone will want more specific inputs and outputs from models that are tailored to their specific use case.
So yeah, I think the open-source ecosystem is really, really important in the world of AI here. Makes sense. One thing that is of course really important, and I think many companies are now dealing with this, companies that are using, for some reason, Claude Opus 4.6, or Codex with GPT-5.4, or something like that: they're seeing the costs of tokens. Right now people are mostly just experimenting and trying things out, but costs are a big thing for the future as well. And one of the things I'm surprised by is the price of these open-source models, at least Kimi 2.5. It's mind-blowing how cheap it is compared with the others.
Can you tell us a bit about that? Why is that, and how relevant is it really? Yeah. When I think about proprietary labs, when they are pricing their APIs, they're actually factoring the cost of training into the cost of the APIs, right? It's kind of like the pharmaceutical R&D model, where the cost of the drug factors in all the years of R&D that go into it. So with proprietary models, you'll see their APIs be a lot more expensive because of that. With open-source models, because the weights have been opened up, the lab kind of takes on the research costs, but inference can actually be a lot cheaper: as Cloudflare, I'm hosting the model, but I didn't do anything to train it, so the only costs I incur are the infrastructure costs of hosting the model. I will say, it still takes a lot of work to get to those prices. Basically, how I think about it is that there's a triangle of fast, cheap, and performance, like quality, and it's kind of a pick-two. The price is largely dictated by the market, because it's almost a commodity: you use an API here, you use an API there, it gives you the same amount of tokens, so people are not going to differentiate based on price that much. So the price is set by the market, and then you have to do a lot of things on the back end to make that price profitable. The things we do are things like optimization, so actually writing our own custom kernels for it. We are working on compression and things like that, to make serving the model a lot more efficient, so that we can make it more affordable for customers to run, and for us to serve as well.
You have in the blog actually a very cool example in terms of tokens: if an agent processes over 7 billion tokens per day using Kimi, and it has caught more than 15 confirmed issues within a single codebase, with rough math you can see that one would spend $2.4 million a year for this simple use case, but running Kimi that cost would be a fraction, 77% lower, which is quite incredible. It is. And it's important to note that that figure is actually for that single agent on a single codebase. Cloudflare has so many code bases, just like any other enterprise, right?
So that cost actually triples, quadruples, and then you just see this really remarkable cost reduction with Kimi. Sometimes it takes a few more tokens and a few more loops to get to the same thing. But the cool thing about it is that it's just an asynchronous process that's running for us. It's just scanning different code bases and things like that, and catching issues. So yeah, I think you'll see this pattern a lot. A lot of enterprises will be seeing these crazy costs from proprietary models and then wanting to bring that down.
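To make the rough math concrete, here is a back-of-the-envelope sketch; the per-million-token prices are assumptions implied by the figures above, not published pricing:

```ts
// Back-of-the-envelope token cost comparison. Prices are illustrative
// assumptions implied by the blog's figures, not actual quotes.
const tokensPerDay = 7e9;                 // ~7 billion tokens/day for one agent
const tokensPerYear = tokensPerDay * 365; // ~2.6 trillion tokens/year

const proprietaryPerMTok = 0.94;          // $/M tokens implied by ~$2.4M/year
const openModelPerMTok = proprietaryPerMTok * (1 - 0.77); // "77% lower"

const yearlyCost = (perMTok: number) => (tokensPerYear / 1e6) * perMTok;

console.log(`Proprietary API: ~$${(yearlyCost(proprietaryPerMTok) / 1e6).toFixed(1)}M/year`);
console.log(`Hosted open model: ~$${(yearlyCost(openModelPerMTok) / 1e6).toFixed(2)}M/year`);
// Prints roughly $2.4M/year vs. $0.55M/year for this single agent.
```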
And I think the natural alternative for that is open-source models, which are a lot cheaper. One of the things that I find really interesting in this area is also the ability to change models and use each one for what you need. In the blog you also mentioned, for example, OpenClaw, which many people are using, even just to try things out, and the difference there in terms of price is also quite big. Having the possibility of doing some tasks, hey, you want to use GPT-5.4 Codex, or 5.3 Codex, or Claude for specific tasks, but for most of the tasks you can potentially use Kimi, and the cost will be lower. Having that possibility of switching, of using specific models for what you want, I think is really interesting, and I've been doing a bit of that. You can definitely see how things play out not with just one model but with more alternatives, right? What do you think there? We definitely switch models a lot here, and compare.
We even run parallel tasks against different models and see what's best suited for the task. Some of these things are overkill; just like you mentioned, OpenClaw has a heartbeat just to check if it's still alive. And for a heartbeat kind of inference, you don't need to be paying $25 per million output tokens, right? Really any model will do there. So I think the ability to switch models is really important. And that's one of the goals we have with our AI Gateway product as well, which is that you can switch to any model provider on the internet, really hit any model.
We're actually coming out with a new unified API for that soon, so that you can do AI.run with Google Gemini, for example, and that will take you to the Gemini model, but then you can also do AI.run with Workers AI Kimi, for example, and that will take you to the Kimi model. It's also made possible because everything is mostly chat-completions standard, so switching between models is really easy, but we also have a compatibility layer to make switching models easy. So yeah, there are a lot of benefits, I think, from switching models. A lot of the trend today is people going out and using the most expensive model because it works, but I think we'll see a lot of right-sizing of different use cases: people seeing the bill after a few months, being like, all right, we've got to do something about this now, and then right-sizing their use cases and finding the right model.
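As a sketch of what that kind of switching could look like, modeled on AI Gateway's OpenAI-compatible pattern; the URL shape, auth header, and model IDs here are assumptions, not the final unified API:

```ts
// Hypothetical sketch of provider switching through one gateway endpoint.
// URL shape, auth scheme, and model IDs are assumptions, not the final API.
const GATEWAY =
  "https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/compat";

async function chat(model: string, prompt: string): Promise<unknown> {
  const res = await fetch(`${GATEWAY}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.AI_GATEWAY_TOKEN}`, // assumed auth
    },
    // Because everything speaks the chat-completions standard, only the
    // model string changes when you switch providers.
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  return res.json();
}

// Same call shape, different providers (model IDs are placeholders):
await chat("google-ai-studio/gemini-2.5-flash", "Summarize this changelog.");
await chat("workers-ai/<kimi-model-id>", "Summarize this changelog.");
```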
Then again, I think open-source models are the natural fit there. One of the things we didn't mention, but it's worth reading the blog because it gets specific and technical, is how we use improvements for inference efficiency, so fewer tokens, less cost. Can you give us, without saying everything, of course, people can read the blog for that, at least the gist of some of the methods we use to make that possible? Yeah. So we've blogged about this before, but we have a proprietary inference engine called Infire.
We're going to have an updated blog post for it in the next few weeks as well, so we'll go into more technical depth about how we're doing custom kernels and optimizations there. But we are also doing different things at the software and infrastructure layer to help this. So, for example, we have this session affinity header that we're introducing. Basically, you pass us a string and we make it route to the same instance. So it's a sticky kind of routing, and with that you take advantage of prefix caching, which is the fact that, in a lot of these agentic use cases, when you're sending a new prompt, it's not just sending that prompt; it's actually sending that new prompt plus everything else that has been said between you and the agent before it.
So context and inputs can get really, really long. But the only thing that's really new there is the new prompt that you sent, right? And so prefix caching takes advantage of that by not recomputing the tensors for the past conversation, and only running the prefill stage on the new prompt that was sent in, the delta between the messages. So it's a lot faster. You'll get a much faster time to first token, because you're not recomputing the tensors for every single message, every single turn, and faster tokens-per-second output as well, because you're not blocked on the prefill stage while the GPU sits idle.
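To make that concrete, here is a hedged sketch of a multi-turn agent loop using sticky routing; the header name and endpoint below are placeholders, not the documented ones:

```ts
// Hypothetical sketch: pin every turn of one agent session to the same
// instance so prefix caching can reuse the KV cache from earlier turns.
// The header name and endpoint URL are placeholders, not the real ones.
const sessionId = crypto.randomUUID();
const history: { role: "user" | "assistant"; content: string }[] = [];

async function turn(userPrompt: string): Promise<string> {
  history.push({ role: "user", content: userPrompt });
  const res = await fetch("https://example-workers-ai-endpoint/run", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
      "x-session-affinity": sessionId, // placeholder sticky-routing header
    },
    // The whole history is sent each turn, but with sticky routing only the
    // new prompt (the delta) needs prefill; earlier tensors stay cached.
    body: JSON.stringify({ messages: history }),
  });
  const { response } = (await res.json()) as { response: string };
  history.push({ role: "assistant", content: response });
  return response;
}
```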
So things like that, at the software and infrastructure and routing layer, are what we can do to improve the productivity and efficiency of Kimi. Makes sense. One of the things that I've found very interesting, from a few years ago actually, related to efficiency, is that it's not only helpful for your costs, be that tokens, but it's also helpful for the industry: less use of GPUs, because GPUs are constrained relative to the needs we have. So having layers of efficiency in many areas of the stack is not only good for you directly in terms of costs, but for the industry, even for the environment.
So there's a bunch of efficiency improvements there that are quite relevant as a whole. Yeah, hardware is pretty fixed, right? There are only certain GPU types. There's only so much supply, and only so much of it you can get your hands on. And so I think a lot of the value here, and one of the things that, working at Cloudflare, we're so good at, is finding efficiency in software. We have such talented engineers here who are able to find and gain these efficiencies through software, and not be limited by the hardware we're serving on.
So it's really a pleasure to be working with really cool and smart and talented engineers here who are able to get every bit of efficiency out of our hardware. And fun fact: we're going to have an episode all about hardware and Cloudflare's Gen 13 servers in a few days, with that team, the hardware team. So stay tuned for that. That's actually a double blog, two blog posts that we wrote about Gen 13 servers and the improvements there, from a few days ago. Quite interesting, and also worth reading for sure.
In this area specifically, there are some more general announcements, because Cloudflare acquired Replicate a few months ago. So there are also some areas there that we've shared, and there's news there, right? What can you say about Replicate? Yeah, without spoiling too much, because we have our innovation week coming up in a few weeks where we're going to be doing our big launches: we've been working with the Replicate team for the last three months, and in fact, it's not even really the Replicate team anymore. We're all one team here. So it's kind of the AI team here.
People are working on Workers AI, people are working on AI Gateway, and some folks are working on Replicate infrastructure as well. So we like to think of ourselves as one team. But yeah, there's so much that we're working on together. There's going to be, you know, bringing the Replicate models onto AI Gateway so that anyone else can connect to them. There are things such as Cog, which was released yesterday. Cog is a way to package up open-source models, and it makes things a lot easier: you avoid things like CUDA dependency issues, and it actually makes model loading and weight loading faster, and things like that.
So it's a really cool open-source tool. In fact, before we acquired Replicate, we were thinking about how we'd build this tool ourselves, but then Replicate came in and they had this tool already, and we were like, oh my god, of course, let's use that. And we've seen really good adoption from the market already and lots of good feedback from customers. So we decided to hone in there and work on taking that on and making it work with Cloudflare. So we announced Cog 0.17. Lots of good work went into it.
It's a pretty big rewrite, actually. So it's pretty cool to see kind of our first joint ship together, I would say. But we are using that to do bring-your-own-model for Workers AI. And we've kind of talked about bring-your-own-model for Workers AI before. We have customers that we do dedicated deployments for, and so customers will bring us a model that they've trained and we host it for them, or maybe they want just Kimi, but they want dedicated throughput for it.
So we also do a lot of that, actually, and I'm trying to figure out a way to make it more self-served, so that anyone who wants to run a machine learning model can do that on our platform without having to talk to us or things like that, because that's kind of how it is today. We're a little bit the gatekeeper here. So we want to bring that to the masses, right? And that comes with a bunch of engineering challenges, which is what we're working through now. But the idea is that anyone should be able to run the machine learning model that they want on Cloudflare's infrastructure, on Workers AI.
So that's kind of one of the milestones we're working on together, with Replicate and Cloudflare. It's really, really exciting. I think we've got some good momentum here, and the Cog 0.17 release is just one of the first pieces of the puzzle that's landing. But we are actively testing it with internal customers and some design partners, and I'm really excited to get some feedback and see how that goes. I didn't know too much about Replicate before Cloudflare acquired them, but just exploring the level of models they have, especially for video and things like that, is astonishing. What some of those models can do in terms of output, amazing videos, is really incredible. Sorry, go ahead. It's really funny, because when we first started launching the AI products, I was responsible for helping get AI Gateway off the ground, and I think the only model providers we supported at the time were OpenAI and Replicate.
And it's just kind of come full circle, because they were one of the first AI companies that were out there, one of the first ones I wanted to integrate into AI Gateway, and now they've joined us. So it's really cool to see that come full circle, and it's really a testament to how long they've been in this game and how good a brand and community they've built. So we're really excited to have them here, for sure. And you already teased it, but April will be a month with so many things that I can only say stay tuned, because the things that are coming are really mind-blowing.
And I'm saying on the Cloudflare side specifically. So stay tuned, for sure. I can't even keep up. There are just so many announcements, and it's not even just Workers AI announcements ourselves, which we have, but the fact that we're supporting so many announcements, and so many announcements are actually built on us. So I think one of the biggest wins that we've seen in Workers AI is that a lot of products are actually built on top of us. It's not just external customers hitting us and us providing inference for them; we're actually powering a lot of internal products as well.
So when you're interacting with the dashboard, for example, a lot of that is powered by Workers AI, which is really fun to see. I love going through the agents week announcements and looking for the ones that are built on us, and I didn't even know about some of them, because it just worked and it was all successful. So I'm really excited about that. It's always about building Cloudflare with Cloudflare. So the layers are there, for sure. Also, a few days ago, unrelated to Workers AI specifically, but related to the AI ecosystem for sure:
We had a great blog post about code mode, sandboxing AI agents 100% faster, introducing dynamic workers to allow executing AI-generated code in secure, lightweight isolates. That was a big announcement that got great attention at first, but I also feel it will be helpful in many situations, currently and in the future. Quite important as well. Yeah, we're dropping different pieces of the puzzle right now, and I think we're going to paint the full mural soon. But it's so fun to see all these pieces get launched to the public, and then people catching on to the vision that we have, and I'm excited to see it come to life.
Makes perfect sense. Before you go, I have two quick questions for you, more general ones. One is your favorite use case of AI. Be that Workers AI specifically or other things, but your favorite use of AI currently. Okay, this is kind of just top of mind for me. On the side, my friend and I actually run a fashion label, and I just recently introduced her to Poke as the AI assistant there. She took a photo of a sample that the atelier was making, and we were thinking about creating e-commerce-style photos from it. She just gave it to Poke, the agent assistant, and was like, can you try and make a ghost version of this, which is, you know, a transparent background, where it looks like the piece is on a mannequin but the mannequin is transparent, for example. And it just got it perfectly in one shot. And I think it was just so cool to see her, especially someone who I think is not in the AI ecosystem, be like, this is so magical, this is so unreal. And it all happened over messages; it sent the result back to her immediately.
So I think seeing that kind of application of AI in non-traditional spaces, this is fashion, right, is really cool, and I think it's really been an aha moment for her, and for me just to witness it. So I think that's a really relevant AI experience in my head. That's a great one. Another one is specifically about Workers AI. What's the one thing about Workers AI that most people don't realize but should? Oh, that's a good one. Yeah, I think it's a very little-known secret that a lot of our traffic is actually for dedicated deployments.
So for customers that have trained a model and want to host it on us, or even for internal customers, teams at Cloudflare that need a specific model for their use case, we host it for them. The overwhelming majority of our traffic is actually from dedicated deployments, and serverless is a portion of it, of course. But I think when the public thinks of Workers AI, they think of the model catalog and the models we serve there, which is really important; but the big engineering problems we're working on, the things we're dedicated to, are usually related to dedicated deployments.
So we've done a lot of really good work there, and I think people don't know that. We haven't advertised it too much, but we definitely do dedicated deployments. It takes a lot of work, it takes a lot of engineering to improve the platform for that, but it trickles down as well; it helps our serverless platform too. So I feel like our platform itself has become a lot more well-rounded and just more capable as well. So yeah, I think it's a very little-known secret that the model catalog and those models are not the only thing we focus on, and a lot of the work we do is for dedicated deployments.
I have another one, actually, and I'll make it a double: the coolest thing someone built with Workers AI, or even a use case that you wish more developers tried. Yeah. Oh, well, recently at NVIDIA GTC, two Cloudflare people demoed a self-driving car, sort of. It's virtual, not a real car, but they used Workers AI models for it, and I think that's super cool. I think there are a lot of robotics and hardware applications of AI that I haven't seen just yet.
And I think it also plays into Workers AI's edge inference as well, because, well, maybe you might, but you probably don't want to have a GPU on your drone, for example; you probably want your drone to hit the closest GPU there is, and that's where edge inference plays out. So I'm personally really excited about seeing more hardware applications and robotics applications of AI. I think that's super cool. And I also think it plays really well into the edge-inference kind of sphere.
Makes perfect sense, actually. I can see that. And the network as a computer, in a sense, being useful there as well. Totally. This was great, Michelle. Thank you for doing it. And stay tuned, geek out for what's coming. All right. Thank you so much. Hi everyone, I'm Dina Kozlov. I'm the product manager of Cloudflare Workers, and I'm so excited to be here today, because we put dynamic workers in open beta, which means that anybody can go and use them. What are dynamic workers? Dynamic workers allow you to have a worker that can spin up and execute other workers at runtime.
But what this means in more normal terms is that you can essentially give it a snippet of code, and it's going to run that code in a sandboxed, isolated environment. It's built on top of Workers, which means you get millisecond startup, much faster than a container-based solution. You also automatically get sandboxing and isolation built in. It's also super lightweight, and it's so easy to use. This is literally the API: you call the loader and you give it the snippet of code.
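Paraphrased from the announcement, the loader call looks roughly like this; the binding name, option fields, and entrypoint method are best-effort assumptions, so check the dynamic workers docs before relying on them:

```ts
// Rough sketch of a dynamic worker loader, paraphrased from the
// announcement. Binding and field names are assumptions; see the docs.
interface Env {
  LOADER: any; // the dynamic worker loader binding (assumed name)
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const worker = env.LOADER.get("snippet-v1", async () => ({
      compatibilityDate: "2026-01-01",
      mainModule: "main.js",
      modules: {
        "main.js": `export default {
          async fetch() { return new Response("hello from the sandbox"); }
        }`,
      },
      globalOutbound: null, // assumed switch to block outbound network access
    }));
    // Call into the sandboxed worker and relay its response.
    return worker.getEntrypoint().fetch(request);
  },
};
```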
The other amazing thing about it is that you can also control what this dynamic worker can do. You can have it block network access. You can also intercept any fetch requests that it makes to the internet. And when you intercept them, you can inject a secret. So, for example, if you don't want the dynamic worker to have access to certain secrets, you can inject them here and allow it to make calls to an inference provider, for example. But I always think it's better to show than to tell. So here we have our dynamic workers playground, which you can deploy yourself. We open-sourced it, and it allows you to deploy dynamic workers. It actually uses a new library that we put out called worker bundle, which takes the code that you give it, bundles it, and puts it in the exact format that the dynamic worker expects. And as you can see: 9-millisecond startup. The amazing thing is that you can actually cache the worker, so that all the subsequent requests take 0 milliseconds to load. You can also import a Hono starter example here. This one's a bit bigger, so startup time is going to be a bit more, 99 milliseconds, but all subsequent requests are going to be super fast. And as you can tell, you also have observability built in. But how did dynamic workers come about?
It actually started with code mode. We found that LLMs are not trained to make tool calls, but they are trained on writing code. And so MCP came out. MCP is absolutely amazing. But in the beginning, it was really hard for companies to expose their API surface through tool calls, because if you expose, let's say, a thousand tools, that's going to clog up the context window, the client is going to struggle, you're not going to get good results, and it's instantly going to use up a ton of tokens. But what if, instead, you could have the tool-call logic be executed in code?
It's a lot more efficient. And so what we do, for example with the Cloudflare MCP server, is expose only two tools on it: one is search, and one is execute. What those tool calls do is make a call to the dynamic worker environment, where it executes a snippet of code that actually makes calls out to the API. And so we are able to expose the whole API surface of Cloudflare, which is a thousand-plus APIs, and it only takes up about a thousand tokens, which is absolutely nothing. So I highly recommend that you use code mode.
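As a hedged sketch of that two-tool pattern using the MCP TypeScript SDK; findApis and runInDynamicWorker are hypothetical helpers standing in for API-catalog search and the loader call sketched earlier:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Hypothetical helpers: API-catalog search, and executing model-written
// code inside a dynamic worker (see the loader sketch earlier).
declare function findApis(query: string): Promise<string>;
declare function runInDynamicWorker(code: string): Promise<string>;

const server = new McpServer({ name: "codemode-sketch", version: "0.0.1" });

// Tool 1: let the model discover which of the ~1,000 APIs it needs.
server.tool("search", { query: z.string() }, async ({ query }) => ({
  content: [{ type: "text", text: await findApis(query) }],
}));

// Tool 2: run the model's code in a sandbox instead of exposing every
// endpoint as its own tool, keeping the context window small.
server.tool("execute", { code: z.string() }, async ({ code }) => ({
  content: [{ type: "text", text: await runInDynamicWorker(code) }],
}));

// A transport (stdio, HTTP, etc.) still has to be connected to serve this.
```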
And so, as part of today's announcement, we put out three new helper libraries that will help you build with dynamic workers. One of them is code mode, which will allow you to take any MCP server and upgrade it to use code mode. So if you have an MCP server built on Cloudflare, this is a no-brainer; you should definitely do this. The other one is worker bundler. This is really great if you're building applications; if those have static assets, I highly recommend bundling whenever the code changes, then storing the cached result and serving that on subsequent requests.
That is what I just showed you as part of the playground. And the last one is a shell, which essentially allows you to give your agent a virtual file system inside of a dynamic worker. So super, super cool. You can read about how our customers are using it. Zite is actually using this to give their users the ability to build custom automations across different providers. And you can get started today with two different examples that you can deploy in one click. We also have brand-new documentation for dynamic workers. So go get started there.
And I'm so excited to see what you build. Please, please share it with us. Yeah, have a great day. Hi, I'm Alexandra, but everyone calls me Alex. I'm a senior executive assistant to our amazing co-founder Michelle Zatlyn. I have been working at Cloudflare for over three years now, and I'm based in San Francisco. What is one word that describes the women at Cloudflare? Trailblazing. Why? Because the women at Cloudflare are just leading the way. They bring new ideas and they help move the industry forward. They are not just part of the change; they help create the change.
They lead with vision, they drive change, and they shape the future of the company, and not only the company: the industry as well. What is a piece of advice or a tip you can give to others just starting out in the industry or field? Be proactive and stay curious. Try to anticipate what's needed before it's asked, and don't hesitate to ask questions. Understanding the bigger picture will make you much more effective. Don't be afraid to think outside the box. With AI and all the constant changes in the industry, what's expected in this world keeps evolving and changing.
The more you are willing to learn, to adapt, and to anticipate needs, the more you will stand out and grow into a true partner. What do I like most about my role? The people and the pace. What I enjoy most is supporting my co-founder and helping make things run smoothly. No two days are the same, and I just love being a partner behind the scenes, anticipating needs, solving problems, and making life easier for the people I support, but also for my co-workers.