How AirOps chases friction to build AI products with Claude
The speaker emphasizes that building accessible agents is very challenging but essential for marketers.
AirOps shows how Claude-powered agents become practical for marketers by using a document-based playbook, governance, and focused tools to cut friction and scale content creation.
Summary
Dylan from AirOps’ product team introduces AirOps and explains why making agents accessible to non-technical roles, like marketers, is a hard problem. He traces AirOps’ shift from a node-based workflow builder to a document-style playbook built on Claude, enabled by Opus 4.5’s improved tool calling and the Claude Agent SDK. The team’s goal is to lower the barrier to entry while preserving enterprise governance and output quality. AirOps Next is the core launch, with Quill acting as a data-aware agent captain and Playbooks delivering a collaborative, versioned, skill-based building experience. A key theme is balancing transparency and control: marketers want to see which tools are used at each step, while governance must prevent low-quality or off-brand content. Dylan walks through a demo of the playbook and the inbox and grid interfaces that surface human review and feedback. The talk details the two friction points they tackled: endless use cases, which force intentionality about the target workflow, and output consistency, addressed through sub-agents, brand-context artifacts, and specialized tools. Early results include an 8% reduction in token usage for one specialized tool, 10 enterprise beta customers publishing content in under two weeks, and, in a case study with Parallel, a 130% increase in citation rate and a 42% gain in share of voice, with Parallel going live in one week. Looking ahead, AirOps plans to explore self-improvement loops, memory and summary strategies for traces, and benchmarking methods that quantify improvements beyond “taste.”
Key Takeaways
- AirOps replaced a brittle node-based workflow with a document-style playbook to make agent-building intuitive for marketers while keeping governance intact.
- Opus 4.5 enabled reliable tool calling and instruction following, helping Claude operate more predictably within AirOps’ workflows.
- Quill gives marketers data access (AI search data, brand kit) and guides content creation from insights to action in the UI.
- Playbooks adds collaboration, governance, and versioning so teams can co-create and manage multiple iterations of a content workflow.
- Sub-agents (compliance, writing, brand kit) and contextual artifacts reduce context-window strain and improve output quality.
- AirOps-side results include 8% fewer tokens for one specialized context tool, 10 enterprise beta customers publishing in under two weeks, and a 130% increase in citation rate for Parallel.
Who Is This For?
Essential viewing for marketing teams and AI product builders who want to deploy Claude-powered agents at scale with governance and predictable results. It’s especially relevant for teams moving from traditional workflow builders to document-based agent playbooks.
Notable Quotes
"The main really big takeaway I want you guys to come away with is building agents and just making agents accessible is honestly a really hard problem."
—Dylan frames the core challenge AirOps is addressing: accessibility of intelligent agents for non-technical users.
"We wanted to lower this barrier to entry for content marketers to build and ship their ideas"
—AirOps’ design goal to democratize agent building for marketers.
"Quill had access to all the data that we provide to teams. Whether that be AI search data, the brand context and brand kit"
—Explains what Quill can access to drive context-aware, brand-consistent content.
"We enforce human review by adding the ability to assign different users at the end of each section"
—Shows governance integration within the playbook for quality control.
"Two friction points we focused on: endless use cases forces intentionality and governance through human review and grid tooling"
—Highlights the strategic friction points AirOps targets to improve adoption and reliability.
Questions This Video Answers
- How does AirOps leverage Claude Opus 4.5 to improve tool calling in marketing workflows?
- What is a document-based playbook in AirOps and why is it better than a traditional workflow builder for marketers?
- How do sub-agents and brand-kit artifacts improve content quality and governance in Claude-powered workflows?
- What metrics did AirOps see when deploying Playbooks with enterprise customers like Parallel?
- How can teams benchmark improvements when changing agent harnesses or adding new skills?
Full Transcript
Hey, how's it going, everyone? I'm Dylan. I work on the product team at AirOps, and I'm excited to walk you through, as the title says, how AirOps chases friction in building AI products with Claude. The main really big takeaway I want you guys to come away with is building agents and just making agents accessible is honestly a really hard problem. With developers it's a bit easier, since people are used to all these different concepts. But when you try to make agents accessible to personas like marketers, there are a lot of friction points, and I'm going to talk through some of the ones we've seen and battled with.
So, just to start off, a quick intro of who we are at AirOps. We are a growth marketing platform for AI search. AI search is kind of like SEO, but for engines like ChatGPT, Gemini, and Claude. Buyers are asking Claude questions like, "Hey, I want to buy a pair of sunglasses," so how are you making sure that you're showing up for those searches? We help brands see how they're appearing in search, identify gaps, and take action on those gaps, whether that's creating content or refreshing content, and then measure whether the actions they're taking actually worked.
Quick agenda. I'm going to walk through how we got here and our approach to agents, give a quick run-through of what we launched last week with AirOps Next, and then cover two friction points we really focused on and dialed in for our launch, when it comes to giving people the power of Claude and making it super accessible. I'll close with a couple of other friction points we're looking at for our next act.
So, how we got here. AirOps was (and still is) mainly focused on orchestrating content through a traditional workflow builder: node-based, kind of like n8n, where you drag and drop different nodes and orchestrate variables and how things flow through an elaborate workflow. With a workflow-builder style, and especially with our core customer and audience being marketers, you would hit a complexity ceiling where you're trying to teach a content marketer what Liquid templating is, what JSON is, and all these other concepts. These workflows also had a short shelf life, because new models keep coming out: Claude would release Opus 4.6, then 4.7, and it keeps on going.
Each time, you would have to update these different steps, and that changes the way you build the workflow. So customers were constantly going back to update their workflows and spending so much time building. And if they updated step one, they often didn't realize that its variables and outputs were referenced in step 20. There was a lot of complexity, it was really brittle, and scaling enterprise use cases required someone technical to guide the workflow-creation process. So our goal was to really lower this barrier to entry for content marketers to build and ship their ideas and create content the way they want to.
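To make the step-one/step-20 brittleness concrete, here is a minimal Python sketch. It assumes a hypothetical workflow format (not AirOps' actual one) where each step declares its outputs and pulls earlier outputs into a prompt via `{{ step.output }}` placeholders; the hypothetical `find_broken_references` helper walks the steps in order and flags any reference that no earlier step produces.

```python
import re
from dataclasses import dataclass

# Hypothetical workflow format: each step names the outputs it produces and
# has a prompt template that pulls in earlier outputs via {{ step.output }}.
REF_PATTERN = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


@dataclass
class Step:
    name: str
    outputs: set[str]
    template: str = ""


def find_broken_references(steps: list[Step]) -> list[tuple[str, str]]:
    """Return (step_name, reference) pairs whose reference points at an
    output that no earlier step produces."""
    available: dict[str, set[str]] = {}  # step name -> outputs it produced
    broken = []
    for step in steps:
        for source, output in REF_PATTERN.findall(step.template):
            if output not in available.get(source, set()):
                broken.append((step.name, f"{source}.{output}"))
        available[step.name] = step.outputs
    return broken


if __name__ == "__main__":
    steps = [
        # Step 1's output was renamed from "keywords" to "target_keywords"...
        Step("step_1", {"target_keywords"}),
        # ...but step 20 still references the old name.
        Step("step_20", {"draft"}, "Write about {{ step_1.keywords }}"),
    ]
    print(find_broken_references(steps))
```

Nothing in a node graph forces this check to run, which is one way to read the "really brittle" complaint: the break only surfaces when step 20 executes.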
We wanted to do that while still maintaining the quality bar, which is a very big focus for them, because of course brands don't want to just pump out AI slop; they want governance that keeps those enterprise standards. This is a quick look at what our workflow studio looked like. This use case creates a content brief with internal links. As you can see, it gets kind of gnarly in there for something that seems pretty simple. It goes to show how much thought goes into these workflows and the way teams want to create and orchestrate content.
So how can we take this kind of structure and create an agent experience, while still harnessing the way we use Claude Code and other agent tools? The first real breaking point, when we decided to invest more heavily in agents, was the release of Opus 4.5. I think that's when a lot of people started to see how smart the models were at tool calling and at following instructions without breaking the standards set for them.
At the same time, we were trying to make building easier, and one of the ways we do that (I'll dive into it a bit later) is through a document-based builder. The first thing we tried was taking this Google Doc-style document and having an LLM compile a workflow from it in the background. It was an interesting idea, but a bit brittle: there are a lot of error points when you try to turn non-deterministic natural-language instructions into an actual workflow.
After that, we tried a traditional agent orchestration framework, where you define nodes and let an LLM make the decisions at different points. We got to a point where we were getting pretty good outputs, but those traditional frameworks are honestly pretty brittle: if I ever want to change how I orchestrate different sub-agents, I basically have to make code changes, because I have to change the way I'm routing the different node steps together. That's when we decided to invest more heavily in the Claude Agent SDK. It's pretty awesome that you can orchestrate agents just through markdown files and provide skills and different context simply by shaping the environment and harness, rather than doing it programmatically through a traditional agent framework.
That led to our launch last week, AirOps Next. Briefly, what exactly did we launch? The first piece was Quill, our branding for an agent captain for content marketers. Quill had access to all the data that we provide to teams, whether that be AI search data, the brand context, or the brand kit, which houses literally everything about a brand. So Quill follows the instructions for how content should be created, and it makes it easier throughout the UI to get people from the insights they see in the dashboards to actual actions.
It takes the findings and gaps and gets you to the very next step. The next one is Playbooks. Playbooks is our new building experience, and for the developers out there, it's pretty much a skill. That's what we grounded it in: how can we make skills accessible? Marketers are used to a document-based style, and on top of that we added collaboration on these playbooks, plus governance and versioning, so people can have ten different versions of a playbook for how they want to create a given piece of content. And real quick, here are some results we've seen from customers.
We did a case study with Parallel: we helped them produce and create content, and they saw a 130% increase in citation rate and a 42% increase in share of voice, and they were able to go live in one week. For us that's a huge accomplishment, because traditionally, especially with the enterprise customers we work with, it usually takes at least a month with the workflow builder, with constant feedback going back and forth like, "Hey, this is not really how I want to be speaking in my blog," or "There are citations that are loose or aren't right."
"These are other citations I want to use for this piece of content." So it was really incredible to reach that acceptance criteria in such a short amount of time. A couple of other quick customer notes, too. It's incredible to see how agents have moved what LLMs can do, and how marketers are viewing these more objective workflows: Animalz said it felt more like a mid-level strategist for their team, and at Rippling they were able to offload a lot of the tedious tasks and focus on where their expertise comes in, adding that value and feeding unique context to the agent while it creates content.
We were able to accomplish this by focusing on two main friction points. When bringing high-quality agents to these highly professional, enterprise use cases, we really had to focus on how we fit into the actual workflow of content marketers, and how we ensure quality outputs through the way we build and orchestrate our harness. The first point, as I have it on the slide, is that endless use cases force intentionality. A lot of us have probably been there when we first start using Claude Code or another agent: this thing is really powerful, I can do a ton of different things, and it's really easy to start sprawling into a spiral of use cases. Not only on the product and engineering side but also for customers, the question is how to be very intentional about the actual problem we're trying to solve and to really understand the workflow we're solving for.
So in my head now I have this mini Steve Ballmer going "marketers, marketers, marketers," really trying to focus on who my customer is for the use case I want to tackle. Here's a quick glimpse of what this workflow looks like for a content marketer. This is one example use case for content creation: they'll discover what they want to create on, and they'll research the specific topic they want to make sure they're ranking for in AI search or traditional search.
They'll draft a brief, generate the article, and then add internal linking and apply best practices for SEO and AEO. Throughout this whole process there are different human review points. Human review, and human-in-the-loop, comes up a ton in content marketing, especially when you're being cognizant about the content you're pushing out: you really want to make sure it adheres to the way your brand talks and to the information you're putting out on the internet, and that it gives you the best shot at being surfaced in generative AI search.
That whole process led us to focus on what we saw as the two most important parts of the content marketing flow. One is having a document-based IDE, which again was our playbook view. Marketers are super familiar with documents. They're used to Google Docs, they've probably been working with docs for years, and even before this technology you had a piece of paper defining how you want to do something.
So we made building feel familiar, versus the node-based workflow builder. Transparency was also really important. One thing users actually liked about the workflow builder was that they could see which tools were being used at each step. So how do you keep a document, but still let me read through it and understand exactly which tool I'm using at this point and what context I'm feeding in at this point? Being transparent about that goes hand in hand with control as well.
How can I make sure I'm still in control of the set of instructions or workflow I'm building, especially now that it's less deterministic because it's a document? Those are the three areas we focused on. The other part was enforcing human review: how can we bring governance, configurability, and accountability into an agentic workflow? I think that's pretty unique for us, and I don't think a lot of companies tackle that problem. A lot of the time with agents, and with coding agents too, you just let it go.
It'll finish what it's doing, and human review in coding usually happens through PR reviews after the agent has literally finished the whole job. Rarely does it ask for feedback on taste in different areas while it's going through the actual coding process. So that was another area we really focused on. With that, I want to jump into a quick demo of what this looks like within AirOps. The first thing I want to cover is our playbook.
Again, it's kind of like a skill, a natural-language builder where I can go in, type a slash command, and define different inputs, different outputs, and all the different tools that content marketers can use and are super used to using. When it comes to tools, you can also add any MCP server, so if there are outside connectors you usually use, you can access those too. We also have the ability to schedule different triggers, which gives you an always-on skill, playbook, or agent that can do a certain action either on a scheduled cadence or based on webhooks.
We also have Monitor, which we've partnered with Parallel on, where you can put in a query and, as I like to say, watch the internet in a way. When certain things happen, it triggers this playbook to run. And the last one is AO insights: whenever a metric drops, say my citation rate dropped last week, it triggers one of these playbooks, which can go do the research and come back to me with, "Hey, this is the reason why this happened."
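The metric-drop trigger can be pictured as a simple threshold check. This is a minimal Python sketch, assuming a hypothetical `MetricDropTrigger` with a relative-drop threshold and a `run_playbook` callback; the talk does not describe how AirOps actually detects drops or starts runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MetricDropTrigger:
    """Hypothetical trigger that starts a playbook when a tracked metric
    falls by more than `threshold` relative to the previous period."""
    metric: str
    threshold: float                     # 0.2 means "a drop of more than 20%"
    run_playbook: Callable[[str], None]  # starts a research run with a brief

    def check(self, previous: float, current: float) -> bool:
        if previous <= 0:
            return False  # no meaningful baseline to compare against
        drop = (previous - current) / previous
        if drop <= self.threshold:
            return False
        self.run_playbook(
            f"{self.metric} fell {drop:.0%} ({previous:.2f} -> {current:.2f}) "
            "since last period; research the likely cause."
        )
        return True


if __name__ == "__main__":
    briefs: list[str] = []
    trigger = MetricDropTrigger("citation_rate", 0.2, briefs.append)
    trigger.check(previous=0.30, current=0.28)  # about a 7% drop: no run
    trigger.check(previous=0.30, current=0.21)  # about a 30% drop: starts a run
    print(briefs)
```

A relative threshold is one reasonable choice here; an absolute one, or a comparison against a rolling average, would smooth out noisy weeks.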
As I jump into this process: someone basically created an SOP for how they want to create this blog post. If I come to the outline section, we enforce human review by adding the ability to assign different users at the end of each section. As the agent goes through the whole playbook, it reaches this section step, and a tool in the background fires off; since I'm assigned, I'm the only person who can actually unblock this agent.
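The review gate described above boils down to a small permission rule: anyone can comment, but only the assignee can unblock the run. Here is a minimal Python sketch of that rule, assuming hypothetical `ReviewGate` and `run_sections` names; it is an illustration, not the AirOps implementation.

```python
from dataclasses import dataclass, field


class NotAssignedError(PermissionError):
    """Raised when someone other than the assigned reviewer tries to approve."""


@dataclass
class ReviewGate:
    """Hypothetical human-review checkpoint at the end of a playbook section.
    Anyone can comment, but only the assignee can unblock the agent."""
    section: str
    assignee: str
    comments: list[tuple[str, str]] = field(default_factory=list)
    approved: bool = False

    def comment(self, user: str, text: str) -> None:
        self.comments.append((user, text))

    def approve(self, user: str) -> None:
        if user != self.assignee:
            raise NotAssignedError(f"{user} cannot approve {self.section!r}")
        self.approved = True


def run_sections(sections: list[str], gates: dict[str, ReviewGate]) -> list[str]:
    """Run sections in order and return the ones completed so far.
    The agent pauses after any section whose gate is not yet approved."""
    completed = []
    for name in sections:
        completed.append(name)  # stand-in for the agent doing the section's work
        gate = gates.get(name)
        if gate is not None and not gate.approved:
            break  # blocked until the assignee approves
    return completed


if __name__ == "__main__":
    gate = ReviewGate("outline", assignee="dylan")
    plan = ["research", "outline", "draft"]
    print(run_sections(plan, {"outline": gate}))  # pauses after the outline
    gate.comment("augustine", "Add a section on lens materials.")
    gate.approve("dylan")
    print(run_sections(plan, {"outline": gate}))  # now runs to the end
```

For brevity the sketch simply reruns from the top after approval; a real agent run would resume from the paused section with its existing artifacts.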
Other people can still leave comments and feedback on the outputs and artifacts, but I'm now the gatekeeper who has to review this piece of content. We do that to ensure governance. As for how that's surfaced to users, one way is through our inbox. We have an inbox within AirOps where, every time human review happens or different opportunities are surfaced, users can come in and see them directly. So I can click on one of these items.
It opens up the agent run. On the right side is the agent running through its whole process, and I can see its thought traces. On the left side are all the different outputs and artifacts I defined through that playbook, and I can edit the existing document, leave comments, and then kick it off again and approve it. The other way we surface governance and human review is through our grid. The grid is our way of orchestrating content at scale.
In this specific example, I'm showing how you can collaborate on these. Augustine is also in this document with me. I'm able to edit it and also leave human review. If I close this out, I can see all the different outputs running within the grid. We're basically running skills at scale, where each row is a specific job I want to accomplish and I'm running that playbook on it. I can also see human review at scale and click into the different cells to leave my feedback and make sure all this content is speaking in the right tone of voice.
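The grid idea (one row per job, the same playbook applied to every row, with rows that need a human collected for review) maps naturally onto a fan-out. This is a minimal Python sketch using the standard library's `ThreadPoolExecutor`. The `run_grid`, `review_queue`, and `demo_playbook` names are hypothetical, and `demo_playbook` stands in for a real agent run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_grid(rows: list[dict], playbook: Callable[[dict], dict],
             max_workers: int = 4) -> list[dict]:
    """Hypothetical grid runner: every row is one job, and the same playbook
    runs on each row concurrently. Results come back in row order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(playbook, rows))
    return [{**row, **output} for row, output in zip(rows, outputs)]


def review_queue(results: list[dict]) -> list[str]:
    """Collect the rows that are waiting on a human, e.g. for an inbox view."""
    return [r["topic"] for r in results if r.get("needs_review")]


def demo_playbook(row: dict) -> dict:
    # Stand-in for a real agent run: produce a draft, and request human
    # review for rows flagged as high-stakes.
    return {
        "draft": f"Draft about {row['topic']}",
        "needs_review": row.get("high_stakes", False),
    }


if __name__ == "__main__":
    rows = [
        {"topic": "best sunglasses for running"},
        {"topic": "polarized vs. non-polarized lenses", "high_stakes": True},
    ]
    results = run_grid(rows, demo_playbook)
    print(review_queue(results))
```

`pool.map` preserves input order, so each result lines up with its row even when jobs finish out of order.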
Awesome. The second friction point I want to talk about is with agents, where the biggest worry is consistency, especially when you're coming from a workflow-based approach. How did we go about producing quality outputs, and how do we make sure customers also see that and are aligned to it? One way we visualize this is this graphic, which was created and used by our VP of Sales. Funny enough, it's one of the best explanations of harness engineering I've seen. We tried a couple of times on the product side, but the go-to-market side came up with the best way to explain it: you have a car with an engine, the model, obviously Claude Opus or Sonnet or whatever engine you pick, and everything else that goes around it and that you build on top of it is super important in creating a great agent.
The two things we focused on the most were tools and the way we orchestrate context. The Claude Agent SDK and the Claude managed agents API have been super helpful for iterating on this quickly and for programmatically setting up different sub-agents, and they've been instrumental in getting us to quality output. The first one I want to cover is tools. The background here is that you could give an agent a bunch of primitive tools.
In this specific example, what we were trying to accomplish with our agent was helping Claude understand what is wrong with a page on my website. We have different tools, like access to traffic data, citation data, and scrapers so I can find similar competitor pages. Honestly, we started off with a skill that laid out how to dissect whether a page is losing and what's wrong with it: Is the schema off? How does it compare to competitor pages? And it would go on these safari trips that were honestly a bit token-inefficient.
So one thing we wanted to focus on was creating specialized tools for jobs we know Claude is going to do over and over again, and making those a bit more deterministic, so Claude can put in a URL and immediately get back everything about that page, structured content gaps versus other similar pages in that space, and what the target keywords and target prompts should be. The second tool we made, for a workflow we saw was super common with content marketers, was our page versus tool, which benchmarks my page against the top-ranking pages in that space, finds exactly what's behind those pages, and shows how I can close those gaps.
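At its core, a page-versus comparison is a coverage diff between my page and the top-ranking pages. Here is a minimal Python sketch. It assumes that each page has already been reduced to a set of topics (for example, headings or entities) and that a "gap" is a topic covered by at least half of the competitors but missing from my page; both the hypothetical `page_versus` function and that cutoff are illustrative choices, not the actual AirOps tool.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    topics: set[str]  # e.g. headings or entities extracted from the page


def page_versus(mine: Page, competitors: list[Page], min_share: float = 0.5) -> list[str]:
    """Hypothetical 'page versus' gap finder: topics that at least
    `min_share` of the top-ranking pages cover but my page does not,
    ordered by how many competitors cover them."""
    if not competitors:
        return []
    counts: dict[str, int] = {}
    for page in competitors:
        for topic in page.topics:
            counts[topic] = counts.get(topic, 0) + 1
    gaps = [
        topic for topic, n in counts.items()
        if topic not in mine.topics and n / len(competitors) >= min_share
    ]
    return sorted(gaps, key=lambda topic: (-counts[topic], topic))


if __name__ == "__main__":
    mine = Page("https://example.com/sunglasses", {"uv protection", "frame styles"})
    top = [
        Page("https://a.example/guide", {"uv protection", "polarization", "price"}),
        Page("https://b.example/guide", {"polarization", "lens materials"}),
        Page("https://c.example/guide", {"polarization", "price", "uv protection"}),
    ]
    print(page_versus(mine, top))
```

Returning a sorted, structured list (rather than free text) is what lets the agent act on the gaps directly in the next step.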
This is just a really simple way of getting context efficiently. It's kind of like code mode in a way, which has been popular lately: being more programmatic in how we fetch context, versus looping through a series of tool calls. Can I just produce code that fetches exactly what I need in one loop? The second area is sub-agents. Sub-agents have been instrumental and crucial in getting to that quality of output.
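The "one tool call instead of many" idea can be sketched as a composite tool that runs its primitive lookups in code and returns a single structured report. This is a minimal Python sketch under stated assumptions. `analyze_page` and the three primitives are hypothetical, and the primitives are stubbed with fixed values; in a real harness they would call traffic, citation, and scraping backends.

```python
from typing import Callable, Optional


# Hypothetical primitive lookups, stubbed with fixed values for illustration.
def get_traffic(url: str) -> dict:
    return {"monthly_visits": 1200}


def get_citations(url: str) -> dict:
    return {"ai_citations": 3}


def scrape_page(url: str) -> dict:
    return {"title": "Best sunglasses", "has_schema": False}


DEFAULT_SOURCES: dict[str, Callable[[str], dict]] = {
    "traffic": get_traffic,
    "citations": get_citations,
    "page": scrape_page,
}


def analyze_page(url: str,
                 sources: Optional[dict[str, Callable[[str], dict]]] = None) -> dict:
    """Hypothetical specialized tool: run every lookup in code and return one
    structured report, so the model makes a single tool call instead of
    deciding on and chaining each primitive itself."""
    sources = sources or DEFAULT_SOURCES
    report: dict = {"url": url}
    for name, fetch in sources.items():
        report[name] = fetch(url)
    # Deterministic checks the model would otherwise re-derive on every run.
    issues = []
    if report.get("page", {}).get("has_schema") is False:
        issues.append("missing structured data (schema)")
    if report.get("citations", {}).get("ai_citations", 0) == 0:
        issues.append("not cited in AI search")
    report["issues"] = issues
    return report


if __name__ == "__main__":
    print(analyze_page("https://example.com/sunglasses"))
```

The design trade-off is flexibility for predictability: the model can no longer improvise its own investigation, but a job it performs constantly now costs one round trip and returns the same shape every time.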
In general, what we tell users with playbooks, and honestly what applies when you're first creating your agent harness, is to start off with Claude itself: let it go through its own tool calls, and don't overcomplicate things with all the context you're trying to give it. That's where we started, and we hit a couple of error spots in the quality of the outputs we were getting. So over time we added certain sub-agents.
The first one was a compliance check. The goal is to avoid polluting the main context window, because context rot is honestly still a huge problem and will probably continue to be one when it comes to which tokens and text the model attends to. So we spin off a sub-agent that has everything it needs to know about my brand and checks whether the content I produced follows those rules. It comes back with a score for whether the content adhered to them.
It also reports what was wrong, and the main agent can take that feedback and make edits. The second one we did was around writing. We first tried using the regular Claude harness to write the content, but we found it was better to spin off a sub-agent with its own very focused context window, dedicated solely to writing that piece of content, so it's not distracted by the research that was created or by old compliance checks and can focus on that one job.
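The compliance loop described above (check in an isolated context, return a score and the specific issues, revise, re-check) can be sketched as follows. This is a minimal Python illustration with hypothetical names. Brand rules are reduced to a list of banned phrases, and the "revision" step just deletes flagged phrases; in the real system both the check and the edit would be model calls.

```python
import re
from dataclasses import dataclass


@dataclass
class ComplianceResult:
    score: float        # share of brand rules the draft satisfies (0.0 to 1.0)
    issues: list[str]   # the banned phrases that were found


def compliance_check(draft: str, banned_phrases: list[str]) -> ComplianceResult:
    """Stand-in for a compliance sub-agent. It receives only the brand rules
    and the draft, not the research or earlier turns, and reports a score
    plus the specific problems it found."""
    issues = [p for p in banned_phrases if p.lower() in draft.lower()]
    score = 1.0 - len(issues) / len(banned_phrases) if banned_phrases else 1.0
    return ComplianceResult(score, issues)


def revise(draft: str, result: ComplianceResult) -> str:
    """Stand-in for the main agent acting on the feedback: here it simply
    deletes each flagged phrase (case-insensitively) and tidies spacing."""
    for phrase in result.issues:
        draft = re.sub(re.escape(phrase), "", draft, flags=re.IGNORECASE)
    return " ".join(draft.split())


def write_with_review(draft: str, banned_phrases: list[str],
                      threshold: float = 1.0, max_rounds: int = 3):
    """Check, revise, and re-check until the score clears the threshold."""
    result = compliance_check(draft, banned_phrases)
    for _ in range(max_rounds):
        if result.score >= threshold:
            break
        draft = revise(draft, result)
        result = compliance_check(draft, banned_phrases)
    return draft, result


if __name__ == "__main__":
    rules = ["game-changing", "revolutionary"]
    final, result = write_with_review(
        "Our game-changing sunglasses are truly revolutionary.", rules)
    print(final, result)
```

The key property is what the checker never sees: the research notes and prior turns stay out of its context, which is the point of spinning it off as a sub-agent.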
The next one is a brand kit sub-agent. Brand kit for us is basically a knowledge base, or context layer, of everything you need to know about a brand. We kick this off at the beginning of all our runs: a brand kit sub-agent fetches all the relevant context it needs and stores it as an internal artifact. Then, throughout the whole process, the main agent loop can reference those artifacts instead of using tools in our MCP to refetch that context, because otherwise different sub-agents might end up with different brand context fetched at different times.
So we fetch it up front, store it as an artifact, and direct the agent to always reference that same artifact. Finally, we can still add custom sub-agents. This is more of an internal tool: some of our solutions architects who work with our customers can spin off different sub-agents when needed, and it really helps with maintaining context. That's definitely been the biggest learning. Although context windows continue to grow (you have a million-token context window with Opus 4.7), a larger window doesn't mean you should use the whole thing. You should still be very efficient about what context you let the model attend to.
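The fetch-once brand kit artifact is essentially a per-run, read-only cache. Here is a minimal Python sketch, assuming a hypothetical `RunContext` and a `fetch_brand_kit` callback in place of the MCP lookup. It freezes the snapshot with the standard library's `types.MappingProxyType`, so every sub-agent reads the same version and none can edit it in place.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Callable, Mapping


@dataclass
class RunContext:
    """Hypothetical per-run artifact store. The brand kit is fetched once at
    the start of a run and every sub-agent reads that same read-only
    snapshot, instead of each one re-querying the knowledge base mid-run
    (and possibly seeing a different version)."""
    fetch_brand_kit: Callable[[], dict]
    artifacts: dict[str, Mapping] = field(default_factory=dict)

    def brand_kit(self) -> Mapping:
        if "brand_kit" not in self.artifacts:
            # Freeze the snapshot so no sub-agent can edit it in place.
            self.artifacts["brand_kit"] = MappingProxyType(dict(self.fetch_brand_kit()))
        return self.artifacts["brand_kit"]


if __name__ == "__main__":
    fetches: list[int] = []

    def fetch_from_knowledge_base() -> dict:
        # Stand-in for a brand-kit lookup whose contents can change over time.
        fetches.append(1)
        return {"version": len(fetches), "tone": "confident, plain-spoken"}

    ctx = RunContext(fetch_from_knowledge_base)
    writer_view = ctx.brand_kit()      # e.g. what the writing sub-agent sees
    compliance_view = ctx.brand_kit()  # e.g. what the compliance sub-agent sees
    print(writer_view is compliance_view, len(fetches))
```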
In terms of results, we saw an 8% decrease in token consumption, at least for that specific tool. Instead of the agent spinning off different primitives over and over again, there's one tool call that fetches a page and all the results from it. The second benefit of those specialized tools was speed: where the agent used to make 20 different tool calls to fetch the context it needed, there's now one entry point that returns everything.
On the quality side, 10 enterprise customers in our beta started publishing content in under two weeks. They were able to self-serve and reach quality output through these agents, which used to be a very hand-holdy experience with a really high ceiling. So, when it comes to building these agents, with Claude making it easier to execute on things, it's really easy to think there are no more problems to solve. But every time a problem is solved, the friction point just keeps moving.
Those are the two friction points we were really chasing and wanted to tackle, and there are a ton more we want to keep chasing. Overall, it's a great thing to keep chasing friction, because that's really how you create production agents and make them accessible to users outside of more technical spaces. To quickly close out, the next two friction points we're looking to battle, and are excited to share more learnings on, are self-improvement and feedback loops. There were awesome talks yesterday about dreaming sequences; I think the most interesting question is how you structure summaries of different traces, what the best way is to collect the most relevant memories, and the idea that forgetting is actually a feature, meaning you can forget certain types of memories. The last one is benchmarking content creation agents.
Something really interesting about our space is that it isn't law, and it isn't coding; it's not easy to say whether something is correct. A lot of taste goes into a piece of content, and there are a lot of opinions about how certain content should be formatted and created. So what are the best ways to create benchmarks, so that every time we change our harness, add a sub-agent, or add a skill, we know it's actually improving outputs and we're not just going on vibes?
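One common way to turn "is the new harness better?" into a number is a pairwise win rate: run the baseline and the candidate harness on the same prompts, have a judge pick the better output for each prompt, and count ties as half a win. This is a minimal Python sketch of that general technique, not a method described in the talk. `win_rate` and `rubric_judge` are hypothetical, and the toy rubric only stands in for human raters or an LLM judge.

```python
from typing import Callable, Literal

Verdict = Literal["baseline", "candidate", "tie"]


def win_rate(prompts: list[str],
             baseline: Callable[[str], str],
             candidate: Callable[[str], str],
             judge: Callable[[str, str, str], Verdict]) -> float:
    """Share of prompts where the candidate harness beats the baseline,
    counting ties as half a win. 0.5 means no measurable difference."""
    if not prompts:
        raise ValueError("need at least one prompt")
    score = 0.0
    for prompt in prompts:
        verdict = judge(prompt, baseline(prompt), candidate(prompt))
        if verdict == "candidate":
            score += 1.0
        elif verdict == "tie":
            score += 0.5
    return score / len(prompts)


def rubric_judge(prompt: str, baseline_out: str, candidate_out: str) -> Verdict:
    """Toy judge: count rubric items each output satisfies."""
    rubric = [prompt, "citation"]
    a = sum(item in baseline_out for item in rubric)
    b = sum(item in candidate_out for item in rubric)
    return "candidate" if b > a else "baseline" if a > b else "tie"


def old_harness(prompt: str) -> str:
    return f"Guide to {prompt}."


def new_harness(prompt: str) -> str:
    # Pretend the new harness adds a citation for one of the two prompts.
    if prompt == "sunglasses":
        return f"Guide to {prompt}, with a citation."
    return f"Guide to {prompt}."


if __name__ == "__main__":
    print(win_rate(["sunglasses", "lens care"], old_harness, new_harness, rubric_judge))
```

With an LLM as the judge, a common precaution is to score each pair in both orders to reduce position bias, and to use enough prompts that a small difference in win rate is not just noise.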
But yeah, I'm around all day and would love to chat with you and hear about what you're building. I hope this was helpful. Thank you.