I Open-Sourced My Own AFK Software Factory

Matt Pocock | 00:11:25 | Apr 30, 2026

Matt Pocock shows how to run AFK AI coding agents with Sand Castle, Docker sandboxes, and GitHub issues to choreograph parallel workflows in a fully reproducible TS setup.

Summary

Matt Pocock unveils Sand Castle, a TypeScript library that orchestrates AI coding agents inside isolated sandboxes so you can run tasks completely AFK. He walks through setting up a repo, installing Sand Castle, and choosing Docker as the sandbox provider, plus a backlog managed by GitHub issues filtered by an scastle label. The tutorial covers building a parallel planner, implementing changes via agents, and then having reviewer and merger agents auto-handle code quality and merges. You'll see how the planner analyzes open issues, how the implementer drafts code inside the sandbox, and how the reviewer flags improvements before the merger commits to main. Pocock demonstrates practical knobs like using Anthropic and GitHub tokens, and outlines how to run the whole flow through an npm script and npx commands. By the end, the setup yields a mini software factory that accelerates velocity while keeping everything sandboxed and auditable. If you're curious about scalable, engineer-friendly AI automation, this hands-on walkthrough is a solid blueprint.

Key Takeaways

  • Running AFK AI agents requires sandboxing to prevent permissions prompts and potential system risks; Sand Castle provides a TypeScript interface for running prompts inside isolated sandboxes.
  • Sand Castle can be initialized with npx sandcastle init, selecting an agent (e.g., Claude Code) and a sandbox provider (Docker in the demo).
  • A backlog is driven by GitHub issues filtered by a specific label (scastle) so agents know what to work on next.
  • The workflow uses a planner → implementer → reviewer → merger pattern, enabling parallel work and automated merges while preserving code quality.
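The planner → implementer → reviewer → merger loop from the takeaways above can be sketched with plain async functions. This is a hypothetical stand-in for illustration only — `planIssues`, `implement`, `review`, `merge`, and `runFactory` are invented names, not Sand Castle's API:

```typescript
// Hypothetical stand-ins for sandboxed agent runs (not the real Sand Castle API).
type Issue = { id: number; title: string };
type Branch = { name: string; approved: boolean };

async function planIssues(open: Issue[]): Promise<Issue[]> {
  // A real planner would ask an agent which open issues are unblocked.
  return open;
}

async function implement(issue: Issue): Promise<Branch> {
  // A real implementer would run an agent in its own sandbox on its own branch.
  return { name: `issue-${issue.id}`, approved: false };
}

async function review(branch: Branch): Promise<Branch> {
  // A real reviewer would inspect the diff and flag improvements.
  return { ...branch, approved: true };
}

async function merge(branches: Branch[]): Promise<string[]> {
  // A real merger would resolve conflicts and merge each branch to main.
  return branches.filter((b) => b.approved).map((b) => b.name);
}

async function runFactory(open: Issue[]): Promise<string[]> {
  const plan = await planIssues(open);
  // Implementers run in parallel, one sandbox per issue.
  const branches = await Promise.all(plan.map(implement));
  const reviewed = await Promise.all(branches.map(review));
  return merge(reviewed);
}
```

The key structural point is the `Promise.all` in the middle: implementation is the parallel step, while planning and merging stay sequential.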

Who Is This For?

Engineers and AI practitioners who want a hands-on blueprint to run AI coding agents AFK inside sandboxed environments, with reproducible DevOps-like workflows and GitHub-backed task management.

Notable Quotes

"Run this prompt inside this sandbox using this agent."
Shows the core Sand Castle usage pattern: running a prompt in a sandbox with a chosen agent.
"Sand Castle is a TypeScript library for orchestrating AI coding agents in isolated sandboxes."
Defines the library’s purpose and how it fits into the workflow.
"The issues will be filtered by this label."
Explains backlog management via GitHub labels to scope work for the agents.
"This pattern has been incredibly powerful because the implementer can make mistakes, but the reviewer generally picks it up."
Describes the robustness of the planner-implementer-reviewer-merger loop.
"Sand Castle runs inside this Docker container."
Gives a concrete detail about the sandbox hosting environment.

Questions This Video Answers

  • How can I run AI coding tasks AFK inside Docker sandboxes with TypeScript?
  • What is Sand Castle and how do I integrate it into an existing repo?
  • How do planner, implementer, reviewer, and merger agents coordinate in practice?
  • What are the trade-offs of using GitHub Issues as a backlog for AI agents?
  • Can Anthropic API keys be safely used in sandboxed AI workflows like this?
Tags: Sand Castle, AI agents, Docker Sandbox, GitHub Issues, TypeScript, Claude Code, Anthropic, Commander, TSX, scastle workflow
Full Transcript
One of my goals for the last 6 months has been trying to get my coding agents to run totally AFK. These AFK agents have been picking up backlog tasks, implementing features for me, doing QA, and crucially, they have been running in parallel. So, I've had lots of them running at the same time. However, in order to get them to run properly, you need to handle the permissions requests that they make. And a question that you probably have right now is: how do I get my agent to run without it constantly battering me with requests for permissions? Now, of course, you could just go into YOLO mode and have it totally bypass any permissions requests. But if you do that, Claude will do mad things on your system like delete your home directory. Or if you're in an enterprise setup, there might be concerns about, you know, it exfiltrating data or sending your code off to a random third party. So, in order to get agents to run properly AFK, you need them to be sandboxed, and there are a bunch of solutions for this. However, I was not particularly happy with any of them. The one I really tried to use and tried to make work was Docker Sandboxes. However, there were just so many problems with running it AFK that I won't bore you with them now. What I wanted was a simple TypeScript function that I could run and just say, "Run this prompt inside this sandbox using this agent." And all the tools that I found were trying to sell me some third-party service. So I realized I needed to build something here. And that thing is Sand Castle, a TypeScript library for orchestrating AI coding agents in isolated sandboxes. You can use this to build TypeScript scripts where you simply say run, passing in the agent, passing in the sandbox, and passing in the prompt. If you look in any one of my open source repos, you'll see this little Sand Castle, or .castle, directory here, which has a main.ts file.
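The shape described here — "run this prompt inside this sandbox using this agent" — boils down to one function call. The signature below is a guess for illustration, not Sand Castle's documented API; the option names and the mock return value are assumptions:

```typescript
// Hypothetical signature sketch; the real library's option names may differ.
type RunOptions = {
  agent: "claude-code" | "codex"; // which coding agent to drive
  sandbox: "docker";              // where the agent is allowed to act
  prompt: string;                 // what the agent should do
};

async function run(opts: RunOptions): Promise<string> {
  // A real implementation would start the sandbox, launch the agent inside it,
  // stream its output, and return the final result. This mock just echoes.
  return `[${opts.agent}@${opts.sandbox}] ${opts.prompt}`;
}
```

With a primitive shaped like this, everything else in the video (planners, reviewers, mergers) is just ordinary TypeScript composed around `run` calls.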
And this is full of these little sandCastle.run functions. With this simple function, you can build really, really complex systems. You can build systems that parallelize agents running side by side. You can have systems that review their own code and then merge it in. I've been really, really enjoying using this, and now I think it's time to make a video on it. Let me show you how to get this set up inside a repo. We first run npm install with the AI Hero Sand Castle package. Once that's done, we can run npx sandcastle init. And you'll first be asked to select an agent. Let's select Claude Code. Why not? You can then select between one of the first-class sandbox providers that we provide. My plan in the future is to add many, many more of these, and you can also implement your own if you like. For now, let's just choose Docker. Sand Castle also uses a backlog manager, because AFK agents need some way of picking up tickets and knowing what to do next. My preferred way of doing this is GitHub issues. We also ship with five templates currently. I mean, there may be many more by the time you run this. Let's actually max out here. Let's go for a parallel planner with a review step. And since we've chosen GitHub issues, we're going to create an scastle GitHub label. The issues will be filtered by this label. And it means that only things with the scastle label on our GitHub issue list will be picked up by the agent. We can see at this stage that a bunch of stuff has been thrown into a .castle directory just up here. The thing to know about now is this Dockerfile here, which is essentially the Docker container — or the instructions for setting up the Docker container — that we're going to be using. Sand Castle runs inside this Docker container. And it means that you can just install anything you like inside here. We're installing some important system dependencies. We're installing the GitHub CLI. We're doing a little bit of setup to rename the home directory to agent.
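Based only on what is described here (system dependencies, the GitHub CLI, an agent home directory, Claude Code), the generated Dockerfile might look roughly like this. Every line is a guess at the shape, not the actual generated file:

```dockerfile
# Sketch reconstructed from the video's description; the real file will differ.
FROM node:22-slim

# System dependencies the agents rely on (guessed set)
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl ca-certificates && rm -rf /var/lib/apt/lists/*

# GitHub CLI, so agents can read, label, and close issues
# (official apt-repo install steps elided; see GitHub CLI's own docs)

# A dedicated "agent" home directory, as described in the video
RUN useradd -m -d /home/agent agent

# The coding agent itself (Claude Code's published npm package)
RUN npm install -g @anthropic-ai/claude-code

USER agent
WORKDIR /home/agent
```

The point the video makes stands regardless of the exact contents: because the sandbox is just a Docker image, you can add whatever tooling your agents need.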
We're installing Claude Code. And then we're just ready to go. So, let's go ahead and build this default Docker image. Now, that was really fast, and it has now completed. Our next step is that we need to set the required environment variables in .castle/.env. If we have a look in .castle/.env.example, we can see that we have an Anthropic API key and a GitHub token required. If you want to use your Claude subscription instead of an API key, then you can head to this issue here that will tell you more about it. If you don't know, Anthropic is a little bit funny about people using their subscription for these kinds of things. And so, there's some up-to-date advice there. For me, I'm going to copy over some environment variables that I've had already. Once that's done, I'm going to go into my source control. I'm going to commit this code and push it up, because I'm going to show you how we can use GitHub issues to schedule some work for this agent that we've created. So, let's go to our repo and create a new issue. Let's say: scaffold me a basic TypeScript template in the repo. Give me a basic TypeScript application that uses Vitest, that uses type checking, that has a very, very simple CLI. Use Commander for the CLI. Add a CI script that does type checking and runs the tests. So now I'm going to create that issue, and we can now run our agent to see what happens. So after that, it should be ready to be picked up. First, I'm going to add this little piece of code to my package.json here, which is just going to allow me to run a script. So let's say scripts, and then add this Sand Castle script here. This is just going to run npx tsx — and tsx is just a way that you can run TypeScript as a script — and it's going to run the file .castle/main.ts. So let's actually go ahead and run this and see what happens. We can see immediately that it's kicked off a planner agent here, and we can Ctrl-click these logs to see what it's up to.
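Based on that description, the package.json addition is roughly the following. The script name and the .castle path are assumptions from the video's narration, not copied from the real template:

```json
{
  "scripts": {
    "sandcastle": "npx tsx .castle/main.ts"
  }
}
```

From there, `npm run sandcastle` (or whatever you name the script) kicks off the whole planner flow.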
We can see that it's successfully set up the sandbox. It's the planner agent running on Docker, and it's looking at the open issues here, and it sees that there's only one open issue. It then spits out this plan here, which is a set of issues which are going to be worked on. Finally, at the bottom here, it shows the amount of context window that it used. If we zoom back to our terminal here, we can see that an implementer agent was kicked off too. Let's Ctrl-click these logs and take a look at them. And we can see that it called gh issue view 1. It has a clear picture, and it asked for a basic TypeScript app, Vitest for testing, type checking, a simple CLI using Commander. Great. We can see that it's running bash commands inside here. It's doing okay — good, dependencies installed — and I've even got it prompted so it's doing a little bit of red-green-refactor here, where it's writing the test first, running Vitest, etc. We can see it all happening. It's now moved on a little bit further, and we can sit and watch this if we want to, or, you know, we can go and have a cup of tea. We can relax, and this will just do its work without us. So while this is running, why don't we go and have a look at the main.ts file here. We can see the planner that we saw earlier is just down here, where we have a sandCastle.run command that takes in a name of planner. It takes in an agent here, so we can just change this if we want to. If we want to do planning with Codex, let's say, instead of Claude Code, we totally can. And it's also using this prompt file here. So, plan prompt in here. This is scaffolded by the template, and you can totally edit this as much as you want to, to run anything inside a sandbox. This one is taking all of the open issues from the repo that have the scastle label. It's grabbing all of the labels, all the comments, grabbing all of the comments' bodies as well. And then it's working out which ones can be done right now. So it's only looking for unblocked issues here.
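The planner ultimately hands back a JSON plan wrapped in plan tags (described in a moment). Pulling that out of the agent's raw output is a few lines; this is a sketch where the `<plan>` tag name and the `{ issues: [...] }` shape are assumptions:

```typescript
// Extract a JSON payload wrapped in <plan> tags from an agent's output.
// The tag name and plan shape are guesses based on the video's description.
function extractPlan(output: string): { issues: number[] } {
  const match = output.match(/<plan>([\s\S]*?)<\/plan>/);
  if (!match) throw new Error("No <plan> block found in agent output");
  return JSON.parse(match[1]);
}
```

Wrapping the JSON in tags makes the agent's output easy to parse even when it adds commentary before or after the plan.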
And finally, we tell it to output its plan in a JSON object wrapped in plan tags. If we go back to main.ts, we can see that this then gets picked up here. We then grab the JSON out of the plan here and figure out the issues. And then for each of the issues, we run a separate sandbox here. We run an implementer. And this one has an implement prompt that's just inside here. So, implement prompt. This one takes in some prompt arguments here. So it takes in an issue title. It takes in the task ID, which is the issue ID. Then it says you're going to be working on a specific branch. Again, all of this is just a setup that I cooked up. Really, this is not Sand Castle giving you any kind of prescription on how you want to run it. This is just a really cool workflow that I tend to use in my repos. So, I figured it belonged in a template. If we zoom back to main.ts, we can see that the result here is captured in a variable. And if there is more than one commit here, we then run a reviewer. This pattern has been incredibly powerful, because the implementer can make mistakes, but the reviewer generally picks it up. And of course, if you want to do an adversarial review, where you have one agent review another agent's code, then you can just swap in Codex as the reviewer. If you want to have multiple different agents spawn at the same time, come up with an implementation, and then some other reviewer takes all of those branches, chooses the best one, or makes like a mix of them, you can. That's the power of having a setup that's totally agnostic to what agent you're running. That's the power of owning your own process. Anyway, let's take a look at the review prompt here. It's worth noting this little syntax here, because this is really nice. This is something I copied from Claude skills, where if you specify an exclamation mark before a bunch of backticks here, it will run this when it's resolving the prompt.
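That exclamation-mark syntax — shell commands embedded in a prompt file that get executed when the prompt is resolved — can be re-implemented in a few lines. This is a simplified sketch of the idea, not Sand Castle's actual resolver:

```typescript
import { execSync } from "node:child_process";

// Replace every !```...``` block in a prompt template with the output of
// running its contents as a shell command, mimicking the described syntax.
function resolvePrompt(template: string): string {
  const fence = "`".repeat(3); // three backticks
  const re = new RegExp("!" + fence + "([\\s\\S]*?)" + fence, "g");
  return template.replace(re, (_match, cmd: string) =>
    execSync(cmd.trim(), { encoding: "utf8" }).trim()
  );
}
```

So a review prompt containing an embedded `git diff` command arrives at the agent with the actual diff already inlined, rather than asking the agent to run the command itself.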
And so it will actually execute a git diff against the source branch here. This review prompt just uses a very basic process: understand the change, analyze it for improvements, check correctness, maintain balance. And crucially, it's a great step for adding your own project standards. So, for instance, I've added this coding-standards section in here that you can fill in with any project standards that you want to be applied. Let's look back at main.ts, and we can see what happens after all of these branches get created. We can see that they then get passed into a merger agent down the bottom. And this one takes all of the branches, takes all of the resulting issues — so it understands the changes that were made — and then merges them back to the main branch. The reason we use an agent for this is that there might be merge conflicts between them. And I usually like to have a really powerful agent handling those merge conflicts for me, because they can sometimes be pretty gnarly. And so at the end of this, we have had multiple agents running at the same time, all committing to their branches, and then we get, like, a senior merger developer to pull them back into main. Just this setup has massively increased my velocity, and it works super duper well. And again, Sand Castle is not opinionated here. If you wanted to make these into PR branches, you totally could. Okay, let's go and check in with our running process and see what happened. All right, we can see that we had an implementer kick off here, then a reviewer. Let's check the logs for the reviewer. We can see that it found that the code was already clean and well structured: minimal scaffold template, naming is clear. And then let's see what happened in the merge. So we can just pull up the merger here. And the merger ran the type checks, merged in the branch, and also closed the issue with a comment. Beautiful. We can see, too, that if we go and have a look at the rest of our codebase here...
Whoa, we now have a bit more code going on. We have a tsconfig.json. We have a vitest.config.ts. And we have a few files knocking about inside the CLI here. So you can start to see how Sand Castle is working here. You can build these relatively complicated flows using a simple primitive and really nice, ergonomic markdown prompts. You can get it to run on different branches and just merge that back into main. Or you can get it to do really nice PR flows as well. You know, it's just code. It is a programmatic way to run Claude Code, to run Codex, and to build these workflows that turn into these mini software factories. I've been incredibly happy with it, and I'm really excited to see what you build with it, too. If you're thinking about these hard problems too, then you should check out my newsletter for AI skills for real engineers. These follow the skills repo that went absolutely viral a few days ago. And I also post tips and tricks there for getting the most out of agents using good old software fundamentals. So, thanks for watching, folks. I'm really excited about this tool. I think it's going to be a really nice contribution to the ecosystem, and I've been loving using it. So, nice work, and I'll see you in the next one.
