Building Replicate with a swarm of cloud agents with Andreas Jansson | Immerse Stockholm 2026

Cloudflare | 00:18:03 | Jun 9, 2026


The talk outlines Replicate's harness for running AI models, reflects on the evolving landscape of agent and harness engineering, and explains the context of Replicate’s acquisition by Cloudflare and how it fits with other Cloudflare AI products.

Andreas Jansson breaks down a cloud-first harness for Replicate, showing how a swarm of cloud agents can automate end-to-end AI workflows inside a Google-Chat-like UI.

Summary

Andreas Jansson, speaking at Immerse Stockholm 2026, shares how Replicate’s team moved into Cloudflare's AI stack and built a highly opinionated harness. He emphasizes moving beyond prompt engineering to harness and agent engineering, with a focus on tools, sandboxes, and durable agents. The demo centers on a Google-Chat-like interface rebuilt from scratch to run conversations, threads, and parallel agent workloads. Jansson explains the architecture: cloud agents, dynamic workers, D1 databases for memory, R2 storage, and a developer platform for automating complex workflows, along with the theory that three engineers with a strong harness can do the work of thirty. The talk highlights practical features like preview environments, webhook-driven debugging, agent-written Markdown reports, and agents that read their own logs and fix their own code. Security is acknowledged as a hard problem: Cloudflare Access and dynamically injected API keys are in place, but guardrails still need reinforcement. The narrative also touches on collaboration across time zones, the idea of agents publishing to the internet, and the future potential for autonomous features under guardrails. The session concludes with reflections on whether this model will define future software development and how agents might evolve, from prompts to fully autonomous deployments.

Key Takeaways

Rather than investing in three overlapping products (Replicate, AI Gateway, and Workers AI), the Replicate team moved onto AI Gateway and automated much of Replicate’s maintenance with an agent harness.
An agent stack can include around 60 tools, including code editing, web browsing, PDF reading, and GitHub automation, all accessible within a single thread.
Memory is implemented as a real database (D1), not markdown files, enabling richer context management and knowledge bases.
Preview environments are essential for testing feature changes end-to-end before production, despite substantial setup work.
Security is acknowledged as a hard problem: API keys are injected dynamically so the agent never reads them, a few tools require human approval, and there are not yet enforced guardrails against publishing private data.
Claude integration via Cloudflare’s sandboxes and dynamic workers reduces manual setup work, enabling autonomous tool invocation.
The future of software development could move toward fully autonomous agents with human oversight for critical actions—depending on guardrails and risk modeling.
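The database-backed memory mentioned in the takeaways can be illustrated with a minimal sketch. D1 is SQLite-based, so Python's built-in `sqlite3` module is used here as a local stand-in. The table layout and the `open_memory`, `remember`, and `recall` helpers are assumptions made for illustration, not Replicate's actual schema or code.

```python
import sqlite3


def open_memory(path=":memory:"):
    """Open a memory store. sqlite3 stands in for D1; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS memory (
               id INTEGER PRIMARY KEY,
               agent TEXT NOT NULL,
               topic TEXT NOT NULL,
               note TEXT NOT NULL,
               created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return db


def remember(db, agent, topic, note):
    """Store one note for an agent under a topic."""
    db.execute(
        "INSERT INTO memory (agent, topic, note) VALUES (?, ?, ?)",
        (agent, topic, note),
    )
    db.commit()


def recall(db, agent, topic):
    """Return an agent's notes on a topic, oldest first."""
    rows = db.execute(
        "SELECT note FROM memory WHERE agent = ? AND topic = ? ORDER BY id",
        (agent, topic),
    )
    return [note for (note,) in rows]


if __name__ == "__main__":
    db = open_memory()
    remember(db, "example-agent", "example-topic", "an illustrative note")
    print(recall(db, "example-agent", "example-topic"))
```

Compared with Markdown files, a table like this lets an agent filter by agent and topic with a query instead of rereading whole documents.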

Who Is This For?

Developers and engineering leaders exploring AI agents and harnesses; ideal for teams considering cloud-native, end-to-end automation and the shift from prompt engineering to agent/harness engineering.

Notable Quotes

"I am the I was the co-founder of Replicate that was acquired by Cloudflare in November last year."

—Origin of Replicate’s transition and the organizational backdrop for the harness discussion.

"We can probably do the same work of 30 engineers with three engineers if we have a really good harness."

—Central claim about efficiency gains from a strong harness.

"Agents are durable objects."

—Technical framing of how agents are modeled in the system.

"Security with agents is really hard."

—Acknowledgement of guardrails and risk management challenges.

"Claude can just do that automatically now."

—Announcement of Claude integration reducing manual work.

Questions This Video Answers

How does a cloud-first harness enable end-to-end AI workflows for Replicate?
What makes a 'durable object' approach better for multi-agent collaboration in production?
Can you implement autonomous feature work with guardrails and what are the current limits?

Cloudflare Replicate, Agent engineering, Harness engineering, D1 database, Dynamic workers, Claude integration, Sentry webhooks, Preview environments, Durable objects

Full Transcript

Next session: I am going to talk about building Replicate with a swarm of cloud agents. There's a lot of buzzwords in there. There's a lot of talk about harness engineering these days. I think one of the nice things about these kinds of events is that we're all kind of exploring. We're doing kind of a breadth-first search of everything you can do with AI. We're all building our own agents, our own harnesses, our own ways of working, and just meeting and seeing how other people do it and learning from each other is really interesting. So I wanted to dive into the harness that we have built for Replicate. It's kind of opinionated and kind of weird, and it probably does a lot of things that could be done a lot better. But at least it's a snapshot of what one of these harnesses looks like, one that we have in production now. So, let's see if this works. I am the, I was the co-founder of Replicate that was acquired by Cloudflare in November last year. I'm based in Uddevalla on the west coast. Replicate is a platform for running AI models. We started six and a half years ago now, before Stable Diffusion, before ChatGPT. We started with just open source models and then we added all of these proprietary models. Now we're more of a gateway product. When we started we were more giving access to GPUs. Now people are less interested in deploying their own models to GPUs. They just want to run the best possible models, and those are usually hosted by someone else. But we have all of them. And what happened when Replicate got acquired was that we still wanted to keep Replicate going, but Cloudflare already had an AI gateway called AI Gateway, and also had Workers AI. So what do we do then? Do we invest in all three of these products and duplicate the work between us? That's a bad idea.
So what we did is we sort of lifted the Replicate team into AI Gateway. We are starting to move the whole back end of Replicate onto AI Gateway. That meant that basically the whole Replicate team is now working mostly on AI Gateway. So what do we do with Replicate? We still have a product to maintain. We have lots of customers. Let's just automate it. That was the solution. And we can probably do the same work of 30 engineers with three engineers if we have a really good harness. That was the theory, and it sort of works. So, harness is a very overloaded term. A harness is basically everything that is on top of the raw LLM, right? It's the system prompt, it's the tools, it's the sandboxes, it's the agent loop, it's how you manage your secrets, authentication, all of that stuff. I like to think of it as: we've moved from prompt engineering, where you just craft the perfect prompt and do a single run of that prompt, into harness engineering, where we're now building tools and sandboxes and UIs and subagent interactions and all of that stuff. And it feels like we're moving to more like agent engineering, where we're not just building the tools and the prompts, but we're building an entire stack for the agent. Maybe even with a custom agent loop. You've seen some of these products; Pi, for example, has a different agent loop than Claude Code. There are some subtle differences in how they actually run the tools and that sort of thing. But then we also get into: what if an agent has access to databases, or what if an agent can publish files? And it's starting to look a lot more like general software architecture than specialized harness engineering. So the harness we have at Replicate looks like this. It looks exactly like Google Chat. Initially I wanted to build this on Google Chat.
I think that is the natural place for this. I know a lot of companies are using agents inside chat as co-workers. That's what I wanted to do. Turns out getting an API key to add an agent to Cloudflare's internal chat is, for very good reasons, extremely hard. So it was actually faster to slop-fork or vibe-code a complete replacement of Google Chat, pixel by pixel, than to get an API key. So that's what I did. I re-implemented Google Chat. It took a week of AI coding, maybe. And now we have a chat app where, you know, that's me asking a question about some bug, that is someone else on the team refreshing the featured models on the website, there's another bug, there's someone adding a new model to Replicate. And it's all visible in this chat interface, which means that everyone can see everything that's going on. Everyone can follow along in the conversations. So in this conversation, for example, Louise here says that the featured models are a bit stale. The agent then gets the collection, lists the models in the collection, runs a curl command against Replicate, downloads the web page, and continues on and on for a bunch of steps, and then at the end of it the website has been refreshed. So yeah, each thread is a separate conversation. Each thread has its own sandbox. You can have an infinite number of parallel threads. Spaces have different agents, so specialized agents for different spaces. Agents are durable objects. Each agent has a specialized harness. I think this is a kind of controversial point that I can talk more about later, maybe. Cloud agents versus local agents: these are agents that are running in the cloud. They're always running. You can prompt them on your phone. You can close your laptop. All work is visible, and it has this shift-work element to it. So I work from Sweden. A lot of my team work in the US.
They can start on a feature, they go to bed, I pick up the same thread and I continue. So it brings us out of this sort of silo where we're sitting in front of our own Claude Codes into more of a collaborative, pair-programming-like environment. It has a lot of tools. I think the biggest agent here has like 60 tools. So: code editing, sandbox, a bunch of internal tools, tools for browsing the internet, searching the web, using Playwright to actually click around on web pages, publishing files, reading PDFs and images, listing and reading other threads. It has memory, it has databases; I think that's an interesting point. Tools for reading and writing to GitHub, opening PRs, kicking off workflows, observability; it actually knows what's happening on the Replicate platform as well. So what I did here, for the process of adding a new model for example, is I wrote up all of the individual steps that I take when I add a new model. It turns out to be like 40 steps. Then I mapped that to tools, and I created internal API endpoints on Replicate that the agent can call, and then I documented in those tool descriptions exactly what each step does. And with that the agent can go and do the entire workflow end to end. Some people say that shell is all you need. This is like a meme. At the moment I think individual tools are powerful because these workflows are algorithms, and I think algorithms are better described in code than in Markdown prose. It's my opinion, but that's kind of worked for us. So, a bunch of random ideas I'm just throwing out that might be interesting for your agents as well. Our agents have access to databases. So our memory is a database. It's not just Markdown files. It's an actual D1 database. We have knowledge bases, a database of model providers, different workflows. That's all in D1 databases. Agents can schedule messages.
So you can say, "Hey agent, every morning give me a list of all of the new models that have come out, and if you find any models that are interesting for Replicate, add them to Replicate." This is probably my favorite feature. We have webhook support, so we pipe Sentry into the chat. So here there's an error with search on Replicate. We have an agent called Searchy. This error, "cannot read properties of null", blah blah blah. So then Searchy goes and investigates the error, fixes the error, spins up a preview environment where it can test that the error has been fixed, and then opens a PR. And all I have to do at the end is go and approve the PR and then it's shipped. So I still leave myself in the loop a little bit because I don't trust the agent to fully do the right thing. But all I have to do is just review the PR and approve it. Preview environments are really important for extending the horizon that the agent can work in. It takes a lot of work to make it possible for the agent to spin up a preview environment. There's a lot of manual work involved in that, but I think it's worth it because it means that the agent can implement a feature end to end and then test it in a production-like environment. And this reporting feature is really useful as well. The agent can write Markdown reports that get a static URL that I can share with other people on the team. And the agents are kind of self-aware. Each agent has access to its own code in the sandbox. So if there's a bug with the agent, it can query its own logs, read its own code, fix itself, and submit a pull request to its own repository. Security with agents is really hard. So what we have: everything is behind Cloudflare Access. The API keys sit outside the tools, so they're injected dynamically into the tools. So the agent itself doesn't read the API keys.
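The idea of keeping API keys outside the tools and injecting them at call time could look roughly like the sketch below. Everything named here (`SECRETS`, `Tool`, `call_tool`, the `open_pr` handler, and the placeholder token) is hypothetical and only illustrates the pattern; it is not the harness's real code.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical secret store that lives outside anything the agent can read.
SECRETS: Dict[str, str] = {"github": "example-token-not-real"}


class Tool:
    """What the agent sees: a name and a description, never a credential."""

    def __init__(self, name: str, description: str,
                 handler: Callable[..., str],
                 secret_name: Optional[str] = None):
        self.name = name
        self.description = description
        self.handler = handler
        self.secret_name = secret_name


def open_pr(repo: str, title: str, api_key: str) -> str:
    # A real handler would call an API with api_key; this one only
    # checks that a key arrived and returns a key-free summary.
    if not api_key:
        raise ValueError("missing credential")
    return f"opened PR '{title}' on {repo}"


OPEN_PR = Tool("open_pr", "Open a pull request with the given title.",
               open_pr, secret_name="github")


def call_tool(tool: Tool, agent_args: Dict[str, Any]) -> str:
    """Run a tool for the agent, adding the credential on the harness side."""
    if "api_key" in agent_args:
        raise ValueError("agents may not supply credentials")
    kwargs = dict(agent_args)
    if tool.secret_name is not None:
        kwargs["api_key"] = SECRETS[tool.secret_name]
    return tool.handler(**kwargs)


if __name__ == "__main__":
    print(call_tool(OPEN_PR, {"repo": "example/repo", "title": "Fix search"}))
```

The agent only ever produces `agent_args` and reads the returned string, so the key never enters its context unless a handler echoes it back.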
A few tools need human approval, but most of them can just kind of work. You don't have to sit and click approve on everything. It's just when a model needs to get published that I have to hit approve. But there's still lots to do here. Context pollution, I think, is the biggest issue. What if the agent downloads some private information and then it has a tool that can publish information to the internet? Currently there are no enforced guardrails in my agent that stop it publishing that to the public internet. So that's something that needs some work. But yeah, doing this made it clear to me that to build these really capable agents you do need a full developer platform. So this is like the Cloudflare pitch here. For the agents we have the Agents SDK, which has this Project Think, a high-level agent API that makes it really easy to build these types of agents. Sandboxes: we have full Docker containers, but also these dynamic workers that are JavaScript containers that are super fast to spin up. We use D1 for databases, R2 for file storage, a file server that runs as a Worker, durable object alarms for the scheduled tasks, the UI is a Worker, and then we have Access in front of it. So we're using a lot of parts of the stack to build this. And I think this is coming back to the original point that agent engineering is software engineering, and it needs a lot of parts of the stack. And yesterday we partnered with Claude. So now you can use Cloudflare's sandboxes and dynamic workers to invoke the tools with Claude's managed agents. I think this is a really interesting paradigm that takes away a lot of the manual work that I had to do to build this agent. Claude can just do that automatically now. Okay, that's it. Yeah. Thank you.
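The approval rule from the talk (most tools run freely, while publishing a model waits for a human) can be sketched as a flag on each tool plus a dispatcher that parks gated calls. The tool names, the `ToolSpec` type, and the `dispatch` function are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ToolSpec:
    description: str           # documents exactly what the step does
    run: Callable[..., str]
    needs_approval: bool = False


def list_models(collection: str) -> str:
    return f"listed models in {collection}"


def publish_model(name: str) -> str:
    return f"published {name}"


# Hypothetical registry: only the publishing step is gated.
TOOLS: Dict[str, ToolSpec] = {
    "list_models": ToolSpec("List the models in a collection.", list_models),
    "publish_model": ToolSpec("Publish a model publicly.", publish_model,
                              needs_approval=True),
}


def dispatch(tool_name: str, args: Dict[str, Any],
             approved_by: Optional[str] = None) -> Dict[str, Any]:
    """Run a tool, or return a pending request if a human must approve first."""
    spec = TOOLS[tool_name]
    if spec.needs_approval and approved_by is None:
        return {"status": "pending_approval", "tool": tool_name, "args": args}
    return {"status": "done", "result": spec.run(**args),
            "approved_by": approved_by}


if __name__ == "__main__":
    print(dispatch("list_models", {"collection": "featured"}))
    print(dispatch("publish_model", {"name": "example-model"}))
    print(dispatch("publish_model", {"name": "example-model"},
                   approved_by="reviewer"))
```

A gate like this covers explicit approval only; it does nothing about the context-pollution problem raised in the talk, where private data could flow into an ungated publishing tool.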
Any questions? Yeah. So, really cool, man. As somebody who implemented XMPP in a durable object, I give you lots of credit for it. It wasn't me. It was all agent. It's all vibe coded. But inside the durable object, you've got access to object storage. Do you use that much? Yeah, the conversation history: the Agents SDK uses that to store the entire conversation history. Yeah. And then inside that, are you doing commits on each one of those messages? Good question. Just thinking about how you were managing rollbacks, because sometimes when my agents go off, they really go off, and you can have like 30 agents having a party. And then it's like, okay, we need to roll this back, guys. So there's no rollback built into this system. You can edit messages and resubmit, and then it truncates the history and starts from that point. But the rollbacks of the code, that all happens in GitHub. So GitHub is the source of truth for rolling back to previous states. Yeah. Because what I'm trying to mitigate is token efficiency. Ah. Because once you start having conversations with them to unpick stuff you're just burning tokens, and that's not useful. It's so much better to go back to milestones. Oh, I see what you're saying. Yeah. Well, Claude's cache, it's not that long, right? Is it like 5, 10 minutes or something? Well, your context render depends on how many tokens you're pushing. Yeah. But if you enable the cache, then it's just the incremental stuff that you're paying for, but then rolling back to a previous state from half an hour ago, I think you're still paying for the entire context up to that point, right? But particularly if you've got multiple agents working, they're going to have a conversation with each other, burning tokens to figure out what you want rolled back.
So that's why, if you've got a threaded context with multiple agents as parties, it's super useful. Just seeing it as you've done it on screen, it occurs to me that you could roll back the conversation. Yeah. Ah, yeah. It's a really interesting idea. Yeah. Thank you. Thanks. Anyone else? Cool. Yeah. Let me drop you a mic, because it's for the recording as well. Super cool. Do you think this is how software development is going to be in the future, all in a Slack thread like that? You probably experienced life before that, right, when you were building Replicate and you did another type of software development, I guess. How do you compare those two, and do you see this as the model for the future? I'd say, with the future defined as like the next two months or something, I think this is the model. What's his name? Michael, the founder of Cursor, had a really good Twitter thread a month ago or something where he talked about the evolution from just prompting, to Cursor's model, you know, the autocomplete, and then going into the Claude Code kind of model, and then moving to chat ops, you know, these kinds of agents that sit in Slack, and that's how they do a lot of their development internally at Cursor. But I mean, this is not the end state, right? Something's going to come after this. I don't know what it is. Maybe it is agents that have those guardrails, that can be fully autonomous, where you can actually remove the human from the loop for some types of features.
Maybe there's like... I was talking to someone who had a company, who compared this to, I think it was Fiat, who used to have a pipeline in the car factory where some of the pipelines had a lot of humans looking at each of the parts, you know, these are the critical parts of the car, and some of them they just kind of let go, right, if it's just a little detail that doesn't really critically affect the car. How can we build that for agents? Are there pipelines where you can just allow the agent to completely autonomously push features and fix bugs in the middle of the night when you're asleep? And how do you know which features actually need human approval, and that sort of thing? It feels like at some point the agents are going to be completely autonomous. Whether that's in 10 months or 10 years, I don't know, but it feels like that is kind of the direction. Thank you. Any last one? Cool. Thank you so much. Thank you. Thank you.