AST Parsing with AI: Better UX from Complex Code | Cloudflare Engineering Meetup Lisbon

Cloudflare Developers | 00:24:09 | Apr 17, 2026
Chapters: 11
Introduction to the comeback of engineering meetups and the focus on AI agents during Agents Week at Cloudflare.

Cloudflare engineers reveal how AI-assisted AST parsing (Vitamix) unlocks UX gains by visualizing and validating complex workflows in the dashboard.

Summary

Huven kicks off the Cloudflare Lisbon meetup with a candid note about reigniting in-person engineering sessions during Agents Week. Andreas Jalau then dives into AST parsing with AI, detailing how Vitamix, the team's parser, transforms messy workflow graphs into user-friendly diagrams. The talk explains why a type-safe dashboard experience matters, contrasting it with the more guesswork-heavy REST API usage. An initial, imperfect prototype gave way to a Rust-based Vitamix worker using oxc to parse the AST (abstract syntax tree), with a path from deployment to visual output that could surface actionable UX improvements. The engineers describe a pragmatic, test-driven approach: failing tests guide fixes, and reviews (including human common sense) prevent LLM drift and code duplication. They showcase how the resulting diagrams enable autocomplete-like API parameter hints, detect missing awaits, and surface determinism issues for better developer feedback. Finally, the team emphasizes that even eye-candy UX gains, like deployment-driven diagrams and surfaced metrics, are valuable because they scale with the product and reduce support friction. Opus and LLMs are presented as powerful tools when combined with disciplined review loops, tests, and a shared desire to keep the codebase maintainable. The talk closes with a call to explore incremental UX improvements that come from automatic parsing, not just pretty visuals, and invites questions from the audience about tooling and workflow choices.

Key Takeaways

  • Vitamix enables fast, scalable parsing of complex workflow graphs from deployed workflows into visual diagrams.
  • The Rust-based Vitamix worker with oxc provides robust AST parsing and avoids the cold-start brittleness of the earlier container-based approach.
  • Failing tests and a strict review loop (including human common sense) keep LLM-generated code maintainable and correct across many node types.
  • Diagrams unlock practical UX benefits: API parameter autocompletion, detection of missing awaits, and highlighting of determinism issues and unintended retries.
  • The approach scales through CI/CD, with PR-based iteration and automation that rebuilds diagrams when deployments occur.
  • LLMs are powerful but require governance: tests, explanations of fixes, and cross-checking to avoid code duplication and drift.
  • Operational metrics surfaced from parsed diagrams allow teams to monitor and enforce workflow health at scale.

Who Is This For?

Engineers and product designers building AI-assisted developer tools or complex workflow systems will gain concrete ideas for turning parsing results into tangible UX improvements through diagrams, autocompletion, and governance practices.

Notable Quotes

"And so we had the idea of having the diagram on the dash for your workflow execution."
Intro of the UX goal: visualize workflows directly in the dashboard.
"The best way to do it is just parse every and all workflow that gets deployed from all users for a while."
Describes the continuous parsing pilot that fed the diagram generator.
"Opus is to understand what opus did. And then you get into one of those loops where opus is just doing nothing."
Illustrates the debugging loop between parsing, tooling, and LLMS.
"If you have a million instances a week... one of those steps always retries."
Shows how diagrams surface runtime patterns for UX and reliability.

Questions This Video Answers

  • How does Vitamix parse Cloudflare workflows and generate diagrams in real time?
  • What role do tests and human reviews play when using LLMs to build tooling for engineers?
  • Why did Cloudflare choose Rust and oxc for the AST parsing pipeline over a JS/TS approach?
  • How can parsed workflow diagrams improve API parameter autocompletion in Cloudflare dashboards?
  • What are practical ways to enforce determinism and detect unnecessary retries in AI-assisted workflows?
Tags: Cloudflare Developers, AST parsing with AI, Vitamix parser, oxc Rust tooling, Workflow diagrams, LLM governance in engineering, CI/CD with AI tooling, Determinism and retries in workflows
Full Transcript
All right, we've got a full house today. First and foremost, welcome everybody. My name is Huven. I'm a director of engineering here at Cloudflare. And I'm super excited today for a couple of reasons. One is because this engineering meetup marks the comeback of our engineering meetups. We haven't had one in like four months or so, I think. And the second reason why I'm super excited is because this meetup is actually happening in something called Agents Week here at Cloudflare, which is a very important week for us. It's a week that marks a lot of product launches, a lot of innovation, and something that establishes Cloudflare as the place to build, specifically, AI agents, but we have a bunch of other offerings. We're going to have two talks today. First is going to be Andreas Jalau. He's going to talk about AST parsing with AI. And we're also going to have Andre Jazush, who is going to talk about, and I know everyone is very interested in that talk, essentially agentic readiness for websites. Please help me in welcoming the first speaker, Andre Schllo. A round of applause for him. The mic is yours. Okay, so first and foremost, welcome. We are going to talk today about AST parsing and how we unlocked a better UX for our users just by doing something that no one wants to do, which is AST parsing. It's just the worst thing that you'll ever do as a software engineer. But let's first talk about workflows. I'm an engineer on Workflows. Workflows are a durable primitive so that you can write your code with built-in retries and built-in timeouts. And so it's very easy to define an app that doesn't crash, or that is very resistant to crashes.
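The retry behavior just described can be mocked in a few lines. This is an illustrative sketch only, not Cloudflare's Workflows implementation; the `stepDo` helper name and the config shape (`limit`, `delayMs`, `backoff`) are assumptions made up for the example.

```typescript
// Illustrative mock of step.do-style retry semantics -- NOT Cloudflare's
// implementation. The config field names are assumptions for this sketch.
type Backoff = "constant" | "exponential";

interface RetryConfig {
  limit: number;    // how many retries after the first attempt
  delayMs: number;  // base delay between attempts
  backoff: Backoff; // constant or exponential backoff, as in the talk
}

async function stepDo<T>(
  name: string,
  config: RetryConfig,
  callback: () => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= config.limit; attempt++) {
    try {
      return await callback(); // success: the step's result is returned
    } catch (err) {
      lastError = err; // failure: wait, then retry up to the limit
      const delay =
        config.backoff === "exponential"
          ? config.delayMs * 2 ** attempt
          : config.delayMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // retries exhausted: surface the last error
}
```

The point of the real primitive is exactly this shape: you write a callback that is allowed to fail, and the platform keeps re-running it until it succeeds or the retry budget runs out.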
It's very easy to build something where even if your fetches, your API calls, fail a lot and are rate limited a lot, you can just write your code expecting it to fail and it will eventually work. And so this is a normal workflow, a very small one, where we fetch an image and do something to it. It's not very fun. But what's very fun about workflows is that we can do stuff like wait for webhooks. Say that you are an e-commerce site and you are waiting for the payment to process, and then a webhook comes in and you carry on with the execution of your code. And if you look closely, you'll see that here we have a type-safe API when you are using our bindings. And then below we have a boring REST API, which most of you won't call, because you'll use our SDKs to consume the API or use our bindings. But sometimes we, Cloudflare, have to use our APIs on the dash, and it's not that fun when you have, say, create an instance and pass a bunch of params and you have to do it in a non-type-safe way, because you might not remember what your workflow takes as params. And so we'll get to how we fix that, or how we can fix it. First, I'm just going to introduce all of the primitives that we have on a workflow. We have step.do, which is our wrapper to which we just pass a callback. It has retries, it has built-in timeouts. You have exponential backoffs, you have constant backoffs, anything that you would want. You have sleeps. Sleep lets you sleep your workflow for, say, five hours, ten hours, seven days, up to a year. We also have sleep until. So, say that you want to run something and then wait for the next 9:00 a.m., you just do sleep until 9:00 a.m. and it will work. We also have the wait for events that I've talked about, where you wait for a webhook. And so, how can we give users a better experience when using our dashboard? If we are using our bindings, it's easy.
It's type-safe by default. If you are using the dashboard, it's a guessing game. You have to try and understand what's happening. You don't remember anymore what the params are. Maybe you have 50 workflows, you have 100 workflows. Some people on my team, myself included, have more than 20 workflows on our accounts. And so sometimes you just don't remember anymore. For a while I thought about how we could give users, and ourselves, a better experience. And well, we had no answer for a while. We could parse the AST, but no one wants to do that. I personally didn't want to do it for a while. And so we just left the idea on the back burner for a few months, maybe. And Selso will correct me later if I lie, but I think Selso first asked me for it like six months ago and we just let it stew, and I was like, "No, we can't do it." And to be honest, I lied, and I didn't want to lie, but I feel like I was forced to, because we didn't have time for it. And so when Christmas came and we had more time to ship new ideas and try new things, I started thinking about all of the ideas that we had on the back burner. And one of them was parsing the AST to build what Selso wanted at the time. And I mean, I selfishly didn't want to build it because it was very bothersome to build. And so we had the idea of having the diagram on the dash for your workflow execution. And so maybe I could try to build it really quickly while we were on Christmas break. But you'll see that you have all of these node types. No one wants to parse all of them. No one wants to look at all of them and see how they map to another node type of our own. And so how could we build this fast, just during Christmas? No one wants to parse them. I didn't want to. But there was one thing that I really wanted to try near the beginning of December: Opus. All of you know about Opus. All of you know about GPT5.4. And maybe with this we could parse them quickly.
We could parse them easily, or we could at least try to. And so I defined our nodes, and you have here all of the nodes on workflows. I'll start on the right, because you know about the steps now and all of those make sense. And then we have our control flow nodes, like ifs, switch cases, whatever else is there that I can't read anymore, loops. All of those are very helpful to show to the user: you have a loop that runs these two steps. And then I started thinking a lot, like, we probably want to see how much concurrency you have between steps, because you might end up overwhelming the memory on your workflow instance. And so it's helpful to understand how many things you are running in parallel or concurrently. And then we also have the start of your workflow, which is our entry point to the graph. And you have function definitions and function calls, because if a user wraps their steps inside of a function, they probably know the function and not the steps anymore. And so it makes a lot of sense to show the function call instead of just showing the steps. And so this was my PoC, my two-day PoC. On the first day, I had the API working, barely. And on the second day, I had a very awful-looking diagram that I cursed my colleagues' eyes with for a while. But let's say that we want to make it look better. Well, the best way to do it is to just parse every and all workflow that gets deployed from all users for a while, and I will get a bunch of emails with Sentry errors every time something fails. And so I'm going to give you an overview of our control plane, the Workflows control plane, and how we hooked the diagram generation into our process so that we could test everything that users were doing, forever. And now I have this problem that I have to fix all of the bugs that we find. But it's not that bad, as you'll see in a bit.
So when a workflow gets deployed, when the user does wrangler deploy or uses the API to deploy a workflow, that request hits our worker that resolves the API, and then the script gets sent to our account controller, which is a DO that manages an account's workflows resources. Then we have our sous-chefs, which are our control plane shards. And then we have engines, one for every instance of a workflow. Anytime you spin up an instance, we'll spin up one of these engines. They will register themselves with the sous-chef and everything works. And I just glossed over a bunch of very hard stuff, but we are not here for that today. We are here for the little line that connects to Vitamix. Vitamix is the name of our parser. For those of you who don't know, a Vitamix is a blender, like a kitchen-grade pro blender. And we have a bunch of puns on our control plane about kitchens, as you can see: the sous-chef, the Vitamix, and some other things that we are working on that I can't tell you about today. And so, for all of the requests that come to config service, our worker, we put the script ID and the account ID into a queue, and then our Vitamix worker consumes that queue and does whatever it needs to do. We didn't start with this. This is the final version, the good version that's very fast. We first had a container, and containers are very prone to cold starts, especially when you are iterating very fast and you want a very quick feedback loop. So we had the container for, I think, a week. Then I rewrote it, partially, so that we could have a Rust worker, which works great. Rust workers are fantastic and they're super fast, and we have all of the niceties of having Rust. And you might be wondering why Rust, if workers are mostly TypeScript, or most workers that you see are TypeScript. Why are we building in Rust?
The library that we wanted to use to parse the AST is called oxc, the oxidized compiler. It's a Rust-built tooling system for parsing JavaScript and TypeScript ASTs. And so we had two options: we could convert the library to WASM, or we could just write a Rust worker, which is more manageable because you have everything in one place. And so we ended up with a setup where every parsed workflow would get sent to R2 if it parsed correctly, or to Sentry if it failed. And we had this running, I think, for a month, so the whole month of December and part of January, where every workflow deploy would get sent to us and we would parse it. No one knew that we were working on this besides the engineers on the team. And after a while the code base got very gnarly, because we were just hammering in new rules. LLMs are very good at some things and very, very bad at some other things. They don't like to write simple code. They will rewrite the same code a bunch. Let's say that you have a function that parses an array: they will rewrite the same function as many times as they have to call it. And so we had a big problem here where the code was not maintainable. We couldn't debug it. I mean, we couldn't even understand it, because it was just so messed up and gnarly that we couldn't parse it visually with our eyes. We had to use Opus to understand what Opus did. And then you get into one of those loops where Opus is just doing nothing. And so I did a rewrite to keep the code base slim. And now we will talk about how we rein Opus in. I have, like, my three rules, which are lint rules, review loops, and common sense. Common sense and reviews are not the same, and we'll look at that right now. So what's my process for solving a bug, after I defined all of the nodes? My process is: first, Opus needs to write a failing test, and if it doesn't write a failing test, we won't carry on.
I create a new session and we will keep iterating there until we have a failing test. After we have a failing test, we have Opus explain to us what the bug was. Maybe it's something that we don't want to solve, because a user did something that's specific to them and that no one else will do, and they are requesting something that's just for them, and so you won't fix it. You won't even treat it as a bug. And then, if we agree that it's a bug and that it needs to get fixed, we ask it to explain what the fix is. And explaining is just writing the code in a message before actually trying to write it in the codebase, because if it writes it once, it will keep reading it and try to iterate on it instead of actually rewriting it in a different way. And then the last step, after the fix is approved, is what I call the review loop, which is: I have a sub-agent, a skill, whatever, it doesn't matter. We just have the agent think about the fix that it did and search for how it did it. So let's say that we have a step node, and the step can be created from many places. You will look for every place where the step, or the node, is created, and you'll see if you have duplicated code. If the code is the same, then we extract it. We don't want code that's repeated everywhere, where you find a bug and you have to fix the bug everywhere, and that's where the LLM will forget something. You won't see it, because you didn't write the code. And so it's a process, and then you have the human reviews, which is what I call the common sense. And as you can see, the common sense was in every one of these steps, where we forced the LLM to do what we wanted by saying: you first have to write the test. We have to agree with the test. We have to agree that it's a bug. We have to agree with the fix. And that's where we as engineers still need to be very hooked into it.
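As a rough illustration of the node-extraction problem the parser solves, here is a deliberately naive TypeScript sketch that pulls the names out of `step.do("name", …)` calls with a regex. This is not how Vitamix works: the real parser walks a full AST (built with oxc, in Rust), precisely because a regex like this misses aliased calls, computed names, template-literal names with expressions, and the 10-to-20 different places a step node can be created.

```typescript
// Deliberately naive sketch of node extraction: scan workflow source for
// step.do("name", ...) calls with a regex. The real Vitamix parser walks
// a full AST (oxc, in Rust); this toy misses aliases, computed names, etc.
function extractStepNames(source: string): string[] {
  const pattern = /\bstep\.do\(\s*["'`]([^"'`]+)["'`]/g;
  const names: string[] = [];
  for (const match of source.matchAll(pattern)) {
    names.push(match[1]);
  }
  return names;
}
```

The gap between this 10-line toy and a correct extractor is the whole argument for reaching for a real AST parser, and for letting an LLM do the tedious node-mapping work under the review loop described above.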
And so with this, for the month that we were running it, we had our friends from the design engineering team build this beautiful UI for the diagrams, for the graph that we output via the API. And every fix that we had to do was like 15 minutes' worth of work. It's just like: this account has a bug on this script, please tell me what happened. And then we just keep iterating on it for a while until we have the proper fix. You might be wondering, I talked about UX at the start, and a diagram is not necessarily UX. It's eye candy. Sure, it helps with understanding what the code flow is supposed to do, but sometimes that's not enough. That's not really UX. That's a little bit of something that you can give to a user so that they get a little bit of joy when they open their dashboard. But if you have a diagram, and if you have parsed the AST, then on Cloudflare's dashboard we can give you the types for those create-instance calls that we saw at the start: an API call that has somewhat of an autocomplete. That's not something that happens regularly, and we can do that because we parse the AST. We can also, if we parse the AST, find mistakes. Let's say that the user forgot to await a step, and that will affect how their workflow executes, because they are not assuming that they have done something wrong. We can show them on the diagram: you have missed something, are you aware? And after the user tells us that they are aware, everything is fine. Otherwise, they will know that they have a bug. Other things that might get weird, and that are small little UX gains: say that you have a million instances a week, or 5 million instances a week. We have some customers with those volumes, and their workflows complete, they don't error, but maybe one of the steps retries at least two times, every time. And that's not something that you would expect.
Like, if you write the code, the retries are there to catch your mistakes, or to catch mistakes. They are not there to be part of the solution. They are a fix for a problem. And so we can tell a user, in one of those beautiful diagrams: this step always retries. We don't need to say any more. The user will look at it and be like, oh, okay, we have missed something. I didn't expect this to fail this many times. And so users will get a better understanding of their system. We also get a better understanding of what's happening, because everything that we show to the customer we can surface as a metric to us. And then you can also do stuff like enforce rules. Workflows should be deterministic by definition, and maybe a user did a Math.random outside of a step, and that might affect how the code executes. And so by having these kinds of rules, we can show to the user: in the code, you call Math.random before this step and you are using the variable for a branch, and that will affect determinism when we have to, say, replay the workflow because something failed. And so what I want you to take away from this talk is: there are a lot of things that are not UX that can lead to way better UX, just because you did something and now you get all of these new ideas, and all of these small, incremental, unnoticeable changes keep happening and happening, and your users will be delighted with the experience that you give to them. And Opus allows us to do this. We wouldn't ever parse an AST and find the 10 to 20 different nodes where a step can be created. We wouldn't track the dependencies between steps, between promises. All of this is complete nonsense for us to do, because we have to make the platform be available and ready. But if it's easy, well, we can just do it. And if you have any questions, now is the time to ask. Yeah. Okay. So, thanks for the talk.
I'm curious, on the workflow of using LLMs, whether you're using any sort of orchestration tooling, or you're just using, like, Claude Code directly. Can you explain a bit more about how you're doing that? Yeah. So I have my own tooling. I think everyone on our team, on the Workflows team, has their own tooling, and we are trying to see what works for each and every one of us. And then over lunch, most of the time, we're like, oh, I don't like what mine is doing, I really like what yours is doing. And I feel like we are all converging towards the same thing. So, for example, my flow is: I really like long sessions, but I can only do them now that we have one-million-token context windows for LLMs. And what I tend to do is, every time I have an idea, instead of creating a Jira ticket ("oh yeah, I'll get to that when I have time for it"), I do a CLI command that creates a work tree, creates a branch, and then I do a prompt, and I get notifications every time something ends or needs my attention. You still need all of those review loops. You still have to think about the code that you are writing, or that the LLM is writing for you. But it really helps when you have a way to use them that really fits your workflow when working. But are you doing this manually, as in, you're doing, like, the build stage, then the test phase, and you're just, like, prompting the... Yeah, I still do the builds.
We still do CI/CD like you would have normally. So every time you create a PR, and you can have the LLM create a PR for you, that's fine. For example, my tool and Lisha's tool, one of my teammates', create stacked PRs so that you can build a feature incrementally. And every PR has its CI. You can have your tool, not the LLM, actually the tool, keep track of whether the PR is getting built, if the CI is passing, if everything's okay, and then it auto-rebases everything so that you don't have to do all of that work. Thank you. Yeah. Any other questions? You got that there? Uh, no, no, it's one of my own. Mine is called sh, for commander; sh is commander in Japanese. Lisha's one is gstuff, for Git stuff, because it does a lot of things on Git. And so we end up with all of our little own tools, and then we discuss what we are feeling is correct and what's not correct for us. Okay, any other questions, or are we ready for the other Andre to come in? All right, I guess that's it. Round of applause, please.
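Two of the AST-powered checks the talk describes, flagging steps that are never awaited and spotting Math.random calls that run before the first step, can be hinted at with toy heuristics. These are assumption-laden, line-based sketches for illustration only; Vitamix's real checks operate on a full AST and track promises across variables, which no regex can do reliably.

```typescript
// Toy versions of two checks from the talk. Line-based heuristics for
// illustration only; the real Vitamix checks walk a full AST (oxc, in
// Rust) and track promises across assignments and scopes.

// Flag lines that call step.do(...) without awaiting, assigning, or
// returning the promise: a likely forgotten `await`.
function findUnawaitedSteps(source: string): number[] {
  const flagged: number[] = [];
  source.split("\n").forEach((line, i) => {
    const callsStep = /\bstep\.do\(/.test(line);
    const handled =
      /\bawait\s+step\.do\(/.test(line) ||
      /=\s*step\.do\(/.test(line) ||
      /\breturn\s+step\.do\(/.test(line);
    if (callsStep && !handled) flagged.push(i + 1); // 1-based line numbers
  });
  return flagged;
}

// Warn when Math.random() appears before the first step.do call: on replay
// its value can change and branch the workflow non-deterministically.
function randomBeforeFirstStep(source: string): boolean {
  const firstRandom = source.indexOf("Math.random(");
  const firstStep = source.indexOf("step.do(");
  return firstRandom !== -1 && (firstStep === -1 || firstRandom < firstStep);
}
```

Surfacing the result of checks like these on the execution diagram is exactly the kind of "not UX that leads to better UX" the talk closes on.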
