Building the best agentic analytics harness: Powered by Claude, built with Claude Code

Claude | 00:26:46 | May 22, 2026

Chapters (7)

How adopting Claude Code accelerated Omni's development and culture, enabling rapid experimentation and transparent demos.

Omni's CTO describes how Claude powers Blobby, their agentic analytics assistant, and shares hard-won lessons from 18 months of iterative development.

Summary

Omni’s CTO explains how Claude powers Blobby, the agentic analytics assistant in Omni’s platform, and how Claude Code has sped up the engineering team. The talk traces Blobby’s evolution from a single-question, single-answer app to an agentic system with an outer loop, an inner loop, a semantic layer, and a growing toolset. It highlights gains in velocity, transparency, and customer impact, including one-shot SQL generation and richer data interactions. A key theme is grounding LLMs in business context: defining data models, terminology, and permissions so Claude can answer questions about a specific company. Omni also emphasizes observability through traces and evals, using them to diagnose failures and drive continuous improvement. The live demo shows dashboards, workbooks, and on-the-fly data exploration, illustrating how Blobby translates natural language into semantic queries and SQL. Throughout, Claude is positioned as a powerful partner, while Omni builds UI, validation, and governance around the model’s outputs. The talk closes with a nod to the engineering team and an offer of Blobby stickers for attendees who want to chat.

Key Takeaways

Letting Claude write SQL directly, parsed by Omni's own SQL parser, enabled one-shot query generation for questions that previously took three or four attempts or several chained queries.
A semantic layer on top of a data warehouse curates data, enforces permissions, and localizes context to improve answer accuracy.
An agentic loop (outer checkpointing plus inner toolset) dramatically improves recovery from errors and elevates answer quality in complex sessions.
Consolidating brain architecture—avoiding split-brain subagents—led to clearer, more reliable reasoning and fewer unpredictable outputs.
Switching from Haiku to Sonnet supported longer, more complex agentic conversations with Blobby; token consumption rose by design, and customer usage took off.
Introducing AI context, sample queries, and field-value disclosures helped the model ground questions in realistic business data, boosting usefulness.
Eval traces and live debugging became central to diagnosing failures and guiding continuous improvement.

Who Is This For?

Ideal for CTOs, ML engineers, and product teams building enterprise analytic assistants who want practical, battle-tested lessons on building agentic AI—especially with Claude Code. Suitable for teams seeking faster iteration cycles, better data governance, and reliable evaluation pipelines.

Notable Quotes

"I thank Claude very much for making me uh still able to do some software engineering from time to time."

—Demonstrates the personal productivity boost Claude Code provides to the CTO.

"This is one of the biggest benefits we’ve seen is the velocity—the speed at which we can build and demo new capabilities."

—Highlights cultural impact of shipping features faster.

"Claude is incredible at answering questions, but you need to tell it more about your business... and if you're asking a data question, you need to tell it how your data looks."

—Emphasizes the need for business context and data schema in prompts.

"We consolidated the brain... pulled these tools up into the outer agent harness."

—Key architectural learning that reduced output unpredictability.

"We switched over to Sonnet and saw customers say, ‘I just asked this question in 2 minutes.’"

—Demonstrates the impact of model choice on real-world performance.

Questions This Video Answers

How does Claude power one-shot SQL generation for analytics dashboards?
What is a semantic layer in data analytics and why is it important for LLMs?
What lessons did Omni learn about agentic loops and error recovery when building Blobby?
Why did Omni move from Haiku to Sonnet for Blobby?
How can eval traces improve reliability in AI data assistants?

Claude, Claude Code, Blobby, Omni, semantic layer, data warehouse, SQL parsing, agentic loop, evaluation (evals), GitHub data analytics

Full Transcript

Hey, thanks everybody. Um, great to be here today. Omni is an AI analytics platform, and today I'm going to talk a little bit about, you know, how we build with Claude and, you know, what we've built with Claude and how Claude powers that. Uh, so start just with how Claude has enabled us. I'm the CTO. I run the engineering team at Omni. Uh, we have a team of about 25 engineers. Uh, this is a slide of our, uh, commits to the main branch of our repository over time. Uh, I think it kind of speaks for itself. Um, one of the things hidden in here, kind of a very small piece of that line, is my own commits. I think as CTO of a growing company with hundreds of customers, I sort of assumed that at some point I'd have to stop writing code. Uh, and I thank Claude very much for making me, uh, still able to do some software engineering from time to time. Uh, so that's been a really, you know, a fantastic sort of unexpected benefit of this rollout. And, you know, to speak a little bit about how this went: earlier in 2025, we said to the team, "I don't know when and I don't know how, but I know our jobs are changing. So, let's just start experimenting, start using these tools, figure out what works." And we did a bunch of it with some, you know, sort of fits and starts. Uh, and when Claude Code with the Opus model released, that was when some of our senior engineers said, "Wait a minute, no, this is real. This is actually helping consistently." And, you know, it's kind of been off to the races since then. Uh, and starting around, you know, I sort of felt like everybody went away for the holidays and then came back in January and had sort of skilled up and figured it out and was ready to start hitting the ground running with Claude Code. Uh, and you can see the slope of the line from there. So that velocity is a big part of our culture at Omni. We have a core value called "ship it." Uh, we also have a core value around transparency.
Uh, so you can actually see, uh, if you go on our website, omni.co, there's a top-line, uh, navigation page to our demos. Every Friday at Omni we have an all-hands meeting. It's the most important meeting of the week. We do about 10 minutes of announcements, shout-outs to each other, and then we do 50 minutes, or increasingly more than 50 minutes, of demos. Uh, we record all this, and our CEO's favorite job is, Saturday morning he wakes up, he cuts those demos, posts them on YouTube, and shares them with the world to see. So, if you're curious about what we're up to, uh, you can go to omni.co. This has been a really cool way to show our customers, our prospects, the community around us sort of how we're thinking and how we're building. And that speed and velocity has been a huge benefit for us as a company. So let's talk a little bit about what we have built with Claude. Uh, so Omni's AI analytics, you know, we let you talk to your data, right? So how does that actually work? Uh, so a user comes in and asks a question. In Omni, we're using Claude to actually translate that question into a semantic query. I'll explain more what that means later. Uh, we have a semantic layer, and that is, think of that as sort of a translation layer that sits on top of your data warehouse, database, or maybe multiple of those, uh, to actually, you know, provide some additional benefit about how to use the data, enforce it, sort of give it a map for how to actually translate that data, or excuse me, that query, into SQL that then runs against the warehouse. We'll go into more of this later, but that's just a high-level view of kind of how the system works. And this is important because Claude is incredible at answering questions, but you need to tell it more about your business if you want it to answer questions about your business, right? It can tell you incredibly deep insights about how businesses work generally.
But if you wanted to know about how your business works, you need to tell it not only, you know, how the business works, the terminology you use, and then of course, if you're asking a data question, you need to tell it how your data looks and how that works. And so that's where we come in. That's the problem we're solving. And it's subtly difficult, right? Like, even "last quarter" means drastically different things at different companies. Even in our own company, you know, in the product and engineering organization, "last quarter" refers to the calendar year; in our sales team, it refers to our fiscal quarter. So all of that needs to actually get coded into the context and awareness and even the sort of data layer and definitions of the data so that it can be used appropriately to ultimately get you the right answer to your question. So this semantic layer, what's it doing? So, like I said, it's a translation layer that sits on top of the database. Uh, it's doing a few things. It's very easy to sort of come up with a toy demonstration of how an LLM, or frankly a human, can get correct answers on top of, you know, 10 data sets in a database. Real company data warehouses and databases are not like that. They have tens of thousands, hundreds of thousands of data sets, sometimes more. Uh, and all of them, you know, there's a hundred revenue tables, right? There's a hundred opportunity tables. It's very unclear how to actually stitch those things together and use them in the correct way. So, this is one of the benefits that our semantic layer provides: it allows you to sort of define how to use these things together and also curate it. Say, "Hey, listen, this is the one that matters. Ignore those other 10." With that, it's a way to encode the context. I think one of the things that we've learned is that context is great, but context localized to the actual definition that the context refers to makes it all the better, right?
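A minimal sketch of the translation step described above, assuming a toy semantic model: a curated topic lists only the fields that matter, keeps a description next to each definition, and compiles a semantic query (field names plus filters) into SQL. The names Topic, Field, compile, and the table and column names are hypothetical; this is not Omni's actual model format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    name: str         # name exposed to the person or model asking questions
    sql: str          # expression against the curated warehouse table
    description: str  # business context stored next to the definition


@dataclass
class Topic:
    name: str
    table: str  # the one table that matters; the other candidates stay hidden
    fields: Dict[str, Field] = field(default_factory=dict)

    def add(self, f: Field) -> None:
        self.fields[f.name] = f

    def compile(self, select: List[str], filters: Dict[str, str]) -> str:
        """Translate a semantic query (field names and filters) into SQL text."""
        unknown = [n for n in [*select, *filters] if n not in self.fields]
        if unknown:
            raise KeyError(f"not in the semantic model: {unknown}")
        columns = ", ".join(f"{self.fields[n].sql} AS {n}" for n in select)
        sql = f"SELECT {columns} FROM {self.table}"
        if filters:
            # Values are inlined for readability only; real code should bind parameters.
            conditions = " AND ".join(
                f"{self.fields[n].sql} = '{v}'" for n, v in filters.items()
            )
            sql += f" WHERE {conditions}"
        return sql


# Hypothetical curated topic for GitHub pull requests.
prs = Topic(name="pull_requests", table="analytics.github_pull_requests")
prs.add(Field("repository", "repo_name", "GitHub repository the PR belongs to"))
prs.add(Field("author", "author_login", "GitHub login of the PR author"))
prs.add(Field("state", "pr_state", "One of open, closed, merged"))

sql = prs.compile(select=["author"], filters={"repository": "omni", "state": "merged"})
print(sql)
```

Because the model only ever names fields from the topic, a request for a field outside the curated set fails early with a KeyError instead of reaching the warehouse.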
And so I think about this, we're at Code with Claude. If you use Claude Code, which I assume most of you do, you have your CLAUDE.md files, right? And sort of the more you can do to sort of localize that context next to the parts of the code that it applies to, the better results you're going to get. That's what our semantic layer does as well: it helps you provide that context next to the field definition that it applies to rather than in a separate file over somewhere else. Uh, and then finally permissions, right? It's a permissions layer. Make sure that people see the data they're supposed to see and don't see the data they're not supposed to see. And inside of our application, this feedback loop is an important part of how this stays current and accurate because, guess what? In a real organization, this stuff changes constantly. And so our application provides a feedback loop where, you know, the next question that gets asked of the data can then be fed back into the definitions, into the context, for a continuous learning loop. So this is our agent; the name is Blobby. Uh, if you look at Blobby, you could probably see this is, you know, a mature, professional, refined data analyst. Uh, and that is what Blobby is today. But Blobby hasn't always been that refined. Uh, we started building Blobby, let's say, about 18 months ago, and we've learned a lot along the way. Uh, you know, Blobby's grown up quite a bit in the past 18 months. So what I want to talk about is a little bit of, you know, how we developed the different phases and also what we learned, uh, along the way that kind of helped increase the quality and capabilities of Blobby. So just to ground this conversation, I just want to show you a really quick demo video of what Blobby does in real life. So if we can cut to the video quickly. Right. And so what we're going to see here is, like I said, ask questions of your data. Blobby's sitting there dutifully waiting to answer your question.
Uh, and as we do this, you'll start to see some of the phases that we go through to actually break down the question and then answer it. So, right, the first thing Blobby's doing is saying, "All right, you're asking about PRs." Blobby's very smart because Claude's very smart. It knows that PRs refer to GitHub pull requests. "Let me go find that in your semantic model and figure out what data you're actually referring to." Uh, then it goes and looks up the values of the data set, because we said we only wanted it from a particular repository. So it has to apply a filter. It needs to know what that filter value is. And guess what? When I type out questions to an LLM, I make typos all the time. So it needs to do a little bit of fuzzy matching to make sure that it's actually finding the right thing. And then, right, it goes through, it generates a query, runs that query against the data warehouse, gets the results, provides a nice visualization, and then does a nice little summary at the end to tell you what you're seeing. Cool. So just grounding you a little bit in what the actual experience is like. Uh, so what did we learn along the way? So the very first version of Blobby was basically single question, single answer. Uh, and we quickly realized that we needed to give a lot more metadata about, you know, how to use the data and how the data is typically used. So, you know, we always had these label and description fields in our definitions. Uh, but we needed to add some additional context. So we added this AI context concept, which is sort of specifically for an LLM: "Hey, how should you use this? You know, when you're asked a question about this, you might want to use this reference or this field," just to kind of help the data team and the administrators of this to actually steer in the right direction and ensure that you get a quality answer. Uh, and then sample queries, very self-explanatory, really helpful to kind of ground it in, "Hey, this is a typical use case."
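The fuzzy matching on filter values mentioned in the demo walkthrough could look something like the sketch below. It assumes the known values of a field have already been looked up, and it uses Python's standard difflib as a stand-in; the talk does not say which matching technique Blobby uses, and the function name and repository list are hypothetical.

```python
import difflib
from typing import List, Optional


def resolve_filter_value(user_text: str, known_values: List[str]) -> Optional[str]:
    """Return the known field value closest to what the user typed, if any."""
    lowered = {v.lower(): v for v in known_values}  # compare case-insensitively
    matches = difflib.get_close_matches(
        user_text.lower(), list(lowered), n=1, cutoff=0.6
    )
    return lowered[matches[0]] if matches else None


# Hypothetical values for a "repository" field.
repositories = ["omni", "docs", "marketing-site"]

typo_match = resolve_filter_value("onmi", repositories)      # transposed letters
case_match = resolve_filter_value("Docs", repositories)      # different casing
no_match = resolve_filter_value("billing", repositories)     # nothing close enough
print(typo_match, case_match, no_match)
```

Returning None when nothing is close enough lets the caller ask the user to clarify rather than silently filtering on the wrong value.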
This is what you would, uh, this is the query you would run to answer, uh, a question like X. Uh, and then finally, values. This one was kind of subtle. What we realized is, again, back to that example of, you know, I'm asking a question about a certain repository, or in this example, you know, asking a question about, like, a region. Uh, it's really helpful to give the LLM just, like, a taste of what the values of that field are because, right, you can see, you know, region, all values are EMEA, NAM, APAC, right? So it can infer the next 10 values because it sees, all right, these are abbreviations of regions of the world, right? Um, but it's useful for it to know, like, okay, this is an abbreviation. So if somebody asks for United States, I can just put US in there. So this did a nice job of helping to improve the quality of the question and answers that we were getting. But at this point, Blobby was still really not an agent. So that was the next big leap here, adding an agentic loop around this. Um, this is, you know, a big engineering effort. We built our own agentic harness. Uh, and, you know, it included this concept of tasks, like all good agents have, uh, included a lot of other stuff too. I think one of the biggest things we learned was that, uh, the agentic loop was really great at recovering from errors. So, like, one of the earliest, like, massive, uh, quality increases we made was to just, A, tell Blobby how to recover from errors and give it some budget to go do that. Uh, and B, then go invest in providing great error messages that were descriptive about what was happening and how you might fix it. Uh, and that alone allowed the quality score to increase dramatically. We saw our evals, like, a lot of our more difficult evals got a lot better once we did that.
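The two ingredients described above, a retry budget and error messages that say how to fix the problem, can be sketched as a small loop. This is an illustration under stated assumptions, not Omni's harness: run_with_recovery, QueryError, and the two stand-in functions for the model call and the warehouse are all hypothetical.

```python
from typing import Callable, Optional, Tuple


class QueryError(Exception):
    """Error whose message says what went wrong and how it might be fixed."""


def run_with_recovery(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], int],
    max_attempts: int = 3,
) -> Tuple[int, int]:
    """Try to answer `question`, feeding each error back into the next attempt."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        query = generate(question, feedback)
        try:
            return execute(query), attempt
        except QueryError as err:
            feedback = str(err)  # the descriptive message becomes context for the retry
    raise RuntimeError(f"gave up after {max_attempts} attempts: {feedback}")


# Hypothetical stand-in for the model call: it corrects the table name
# only once the error message has told it what the right name is.
def fake_generate(question: str, feedback: Optional[str]) -> str:
    table = "pull_requests" if feedback and "pull_requests" in feedback else "pull_request"
    return f"SELECT COUNT(*) FROM {table}"


# Hypothetical stand-in for the warehouse.
def fake_execute(query: str) -> int:
    if query.endswith("FROM pull_request"):
        raise QueryError("Unknown table 'pull_request'. Did you mean 'pull_requests'?")
    return 42


result, attempts = run_with_recovery("How many PRs were merged?", fake_generate, fake_execute)
print(result, attempts)
```

The budget (max_attempts) bounds the cost of recovery, and the quality of the error text determines whether the second attempt has anything useful to work with.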
At this time, though, because we were sort of in this mode of question and answer, we were using the Haiku model, and the Haiku model is great, but once you get into these more elaborate agentic conversations, it's just not designed for those, right? And so, uh, we switched over to Sonnet. Um, and we're showing a graph here of token consumption, uh, and the reason is twofold. One, these are longer conversations, they're more complex, they consume more tokens. That was by design. Two, this was a really big unblock. So we all of a sudden started getting our customers saying, "Wow, like, I just asked this question that, you know, either I never would have been able to answer myself, or even if I did, it would have taken me hours, and it just nailed it in 2 minutes." Uh, and so the usage of Blobby started dramatically taking off at this point. So at this point, our CEO, who is our loudest and most critical user, like all good CEOs are, was telling us, "Listen, guys, I know this thing's really good, but it screwed up this question. Go fix it." And we said, "You know, Colin, LLMs are a little unpredictable. Like, you're just going to have to accept that it's not always going to be perfect." He said, "Not good enough. Go fix it." So, okay, fine. And where this led us was, I think one of the big, uh, efforts that we undertook at this point was to say, okay, let's really invest in understanding the traces and being able to see the traces of these, you know, quote-unquote bad sessions. And this led us to a series of major surgeries that we refer to as the Blobbotomies. Uh, and the Blobbotomies really were traced back to what we were seeing in these traces. So when you look at the traces, you kind of get to see the inner workings of how the agent is sort of talking to itself and reacting and responding in these loops.
And that really clarified, you know, why some of these, you know, seemingly kind of just, you know, bad random sessions, you could actually start to see where they were rooted in real problems. And so an example of this was the original design of our agent might have been a little too clever. Uh, and we had sort of an outer agent that was responsible for, you know, producing the task list. It understood, like, all the data available to it, but it was not in charge of query generation. It had a sub-agent that was in charge of query generation, and it felt like a reasonable design. It was also handy because we could use that query generation sub-agent in a few other contexts. Uh, but what we found once we started digging into these traces was that the sub-agent, its job was to generate one query based on whatever it was asked. And the outer agent didn't know what was actually able to be answered in a single query. So it would say, "Hey, sub-agent, go answer me a question about, you know, GitHub pull requests and support data and, uh, summarize these things," and then the sub-agent would say, "I can't answer that in a single query. I would have to run multiple queries." And so it's sort of, the light bulb went off when we started seeing this, that, like, we have to be really careful about how we separate the information and the knowledge of the, you know, outer agent and inner agent. And what we ended up doing was what our engineer Joel referred to as consolidating the brain. Right? You want to be careful not to have a split brain between any sort of sub-agent system and outer agent system. And so we just pulled these tools up into the outer agent harness. Uh, and this got rid of a lot of this kind of, like, seemingly unpredictable, surprising behavior. So this is a really big learning; it dramatically improved a lot of our kind of more complicated evals.
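One way to picture "consolidating the brain" is a single agent that holds every tool directly, with each tool's limits written into the same context the planner reads. The sketch below is an assumption about the general shape, not Omni's harness; the Tool and Agent classes, the tool names, and the stub implementations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str  # includes the limits the planner must respect
    run: Callable[[str], str]


@dataclass
class Agent:
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def manifest(self) -> str:
        """Text placed in the single agent's context: every tool and its limits."""
        return "\n".join(f"- {t.name}: {t.description}" for t in self.tools.values())

    def call(self, name: str, argument: str) -> str:
        return self.tools[name].run(argument)


agent = Agent()
agent.register(Tool(
    name="generate_query",
    description="Answers one question with exactly one query against one topic.",
    run=lambda question: f"-- query for: {question}",  # stub for illustration
))
agent.register(Tool(
    name="summarize",
    description="Summarizes the results of one or more earlier queries.",
    run=lambda text: f"summary of: {text}",  # stub for illustration
))

print(agent.manifest())
```

Because the planner sees "exactly one query against one topic" in its own context, it can split a request spanning pull requests and support data into separate tasks up front, rather than handing an unanswerable request to a component whose limits it cannot see.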
So the next phase was, we started saying, okay, that's great, but guess what? When I actually go use Claude to generate SQL, it can answer some really impressively hard questions that, candidly, sometimes Blobby doesn't really do very well on. And there's a really interesting backstory here. The short version of it is, when Omni was first built, we had actually built a full SQL parsing engine into it, and we ended up discarding it because it wasn't reliable enough. People would throw random SQL at it and it just couldn't handle every possible permutation. Uh, and so that had been sitting on the shelf for years. Uh, but we sort of got thinking, and we said, listen, if Claude can generate this really powerful, expressive SQL, and we can parse this SQL as long as it sort of fits into a general form that we understand, maybe there's an opportunity here. And we also kind of said, I think it's probably a safe bet to assume that the good people at Anthropic are investing heavily in making Claude really good at SQL. So that seems like a good bet to put our chips on. So, uh, our engineer Stephen sort of dusted off that old parsing code and really just fundamentally changed the interface of how we were enabling, or how we were exposing, SQL or query generation to Blobby. Initially it was this sort of JSON-ified form of a query that was, like, highly structured, and then we switched it to this SQL parsing mode where we said, listen, you can now produce SQL and we can parse through it, and we were able to sort of give it some of the guidelines that prevented our parser from falling over. Uh, and this enabled Blobby to now, you know, take a lot of questions that it might have taken three or four attempts to actually answer, or sometimes it would have to chain together three or four queries in sort of awkward ways, uh, and actually write it in a one-shot query.
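To make "one-shot instead of chained queries" concrete, here is a small sketch in which a question with an intermediate step (per-author monthly counts, then each author's best month) is written as a single statement using common table expressions. SQLite stands in for the warehouse, and the table, columns, and data are invented for illustration; Omni's parser and generated SQL are not shown in the talk.

```python
import sqlite3

# In-memory SQLite database standing in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pull_requests (author TEXT, repository TEXT, merged_month TEXT);
INSERT INTO pull_requests VALUES
  ('ana', 'omni', '2026-01'), ('ana', 'omni', '2026-02'),
  ('ana', 'omni', '2026-02'), ('raj', 'omni', '2026-02'),
  ('raj', 'docs', '2026-02');
""")

# One statement: the first CTE does the work a separate "step one" query
# would have done, and the second CTE builds on it.
ONE_SHOT = """
WITH monthly AS (
    SELECT author, merged_month, COUNT(*) AS merged
    FROM pull_requests
    WHERE repository = 'omni'
    GROUP BY author, merged_month
),
best AS (
    SELECT author, MAX(merged) AS best_month_count
    FROM monthly
    GROUP BY author
)
SELECT author, best_month_count FROM best ORDER BY author
"""

rows = conn.execute(ONE_SHOT).fetchall()
print(rows)
```

Without the CTEs, the same answer would need one query for the monthly counts and a second pass over its results, which is the kind of awkward chaining the talk describes.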
And I think, you know, one of the things I noticed is it seems like Claude really likes to write SQL with CTEs, common table expressions, for any of the SQL nerds in the room. Uh, and our parser was really good at parsing those, actually. So this turned out to be a really nice sort of marriage where we found the efficiency of the system went way up because, A, we didn't have to teach Blobby about this proprietary JSON form of a query that we had invented. We were just telling it to write SQL, which it already knew about. Uh, and B, it was able to just produce much more efficient queries. So it didn't have to do, uh, two or three shots on it. So this is where we are today. We have our agentic system. We have this kind of outer loop that's in charge of, uh, checkpointing our executions to make sure that we can recover from any failures. And then we have this inner loop where we have a bunch of tools available. That set is growing dramatically. Uh, you know, in addition to the examples I just talked about, we have tools for generating dashboards. We have tools for generating, uh, visualizations. We have some validation tools. I'm going to hopefully do a live demo at the end, if the gods are with me, and we'll show some of those. Uh, and, uh, the surface area of those tools is constantly growing. We also have tools that enable Blobby to actually do the data modeling and improve that semantic layer. We also have an eval system. We have an internal eval system. We're also building an eval system for our customers, because it's really important, and one of the key benefits that we provide is predictability and quality. The CEO asks the question, needs to get the right answer, needs to get the same answer every time, right? Evals. I actually, I love evals. I think I love evals for a different reason than most people love evals. Like I said, my favorite thing about the eval is just having that raw trace data.
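A rough sketch of the two eval properties named above, the right answer and the same answer every time, with the raw trace kept for each run so a bad session can be inspected afterwards. The EvalResult and run_eval names, the trace format, and the stub agent are hypothetical; the talk does not describe how Omni's eval system is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class EvalResult:
    question: str
    expected: Any
    answers: List[Any]
    traces: List[List[str]]  # one list of recorded steps per run

    @property
    def passed(self) -> bool:
        """Every run produced the expected answer."""
        return all(a == self.expected for a in self.answers)

    @property
    def consistent(self) -> bool:
        """Every run produced the same answer, right or wrong."""
        return all(a == self.answers[0] for a in self.answers)


def run_eval(
    question: str,
    expected: Any,
    agent: Callable[[str, List[str]], Any],
    runs: int = 3,
) -> EvalResult:
    answers: List[Any] = []
    traces: List[List[str]] = []
    for _ in range(runs):
        trace: List[str] = []  # the agent appends each step it takes
        answers.append(agent(question, trace))
        traces.append(trace)
    return EvalResult(question, expected, answers, traces)


# Hypothetical stand-in for the real agent.
def stub_agent(question: str, trace: List[str]) -> int:
    trace.append(f"lookup topic for: {question}")
    trace.append("generate query")
    trace.append("run query")
    return 128


outcome = run_eval("How many PRs merged last month?", expected=128, agent=stub_agent)
print(outcome.passed, outcome.consistent, len(outcome.traces))
```

Separating passed from consistent distinguishes an agent that is reliably wrong (a modeling or context problem) from one that is intermittently wrong (an unpredictability problem), and the stored traces are where either diagnosis starts.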
It's really, like, the observability part of evals to me was enlightening. So, maybe this is a personality trait of, like, being a brute-force type person, but I really like just being able to go and say, like, okay, this was bad. Why? And look through that data. Uh, and then obviously sort of capturing that into a judge is a nice efficiency gain as well. And this is a big one. So I talked at the beginning about how we build with Claude. When you're building an agentic system, when you're building any system, it's really important for your engineers to have an understanding of what the users of that system actually care about, right? Like, it's really hard for somebody to build a system that they couldn't themselves picture using or can't relate to the users of. Uh, so I actually think that, you know, beyond just the massive productivity gains we got with Claude Code, being users of Claude Code helped us understand what a good harness looks like, right? And then we can take some of those lessons and bake them into our harness, right? And so it's like, hey, should we go, you know, should we go build, uh, a new way for us to go explore the semantic model? Well, let's see what Claude Code does, because guess what? A semantic model is not that different from a codebase, right? So maybe we should sort of tap into some of the ways that the Claude Code harness does this. Uh, and I feel like that actually helped our engineers really relate to the problem deeply, uh, and see kind of what the latest and greatest techniques were for solving it. All right, demo mode, and if all goes well, I'll show you just a quick glimpse at how Blobby actually works live. Uh, so first I'm going to create a dashboard. So, uh, "create a dashboard of engineering activity in the Omni repository." So creating a dashboard creates a lot of queries. Uh, it also does a lot of thinking about sort of how to lay out the dashboard. It can sometimes take a little while.
So while that's running, we'll take a look at sort of the starting point here. Um, but right, it's got a plan. It's going to go through and look at the different relevant topics. An Omni topic is like a domain of data. So it's, uh, think of it as a big wide data set that combines all the other subd... going to go start actually building this dashboard. While it's doing that, uh, I will switch over to just, like, that demo that I showed earlier. Same exact type of query. You know, I think one of the other things that we're tapping into here is our philosophy from a product perspective, which is AI to build, UI to sort of validate and troubleshoot and refine. And so that's baked into this chat experience in a really deep way. So this is pretty much the same exact session. And let's say I want to actually go sort of touch this data, understand it in a more deep way. You know, instead of having to kind of squint at a SQL block, I can actually open this in a workbook. And in Omni, a workbook is just a way to actually generate queries and manipulate your data, right? So I can see, okay, so, uh, you know, this is the GitHub pull request data set. It's correctly filtered for the Omni repository. You know, it's correctly looking at the main branch. Um, and it's looking for all of the merged pull requests. Um, and so, you know, I can go manipulate this chart if I want to. Um, and then additionally, you know, I can start actually looking at other aspects of this data. So let's say I actually just want to look at, uh, a specific user, or, you know what, we'll do a little simpler here. I can just look at a different repository, right? And just very, very quickly, like, let's look at our docs repository instead. So it just becomes a much easier way, if you want to then go sort of riff on what Blobby has done for you, you can do that. Let's get back to our dashboard here.
Uh, and so if you look, let's see how we're doing. Oh, great. Okay, so it's built a dashboard. Uh, let's see. It's given us a little summary. Engineering activity, key metrics. Cool. Okay, let's go check this out. So I'm going to preview this. It pulls open the dashboard in a split pane. Um, and nice, it's done a very nice job here. So, past three months, uh, top PR authors. Oh, man. It's going to go to the heads of those engineers. Um, PR volume over time, you can see some of this trend. Actually, if we look further back, let's look back over the past 12 months. And we should actually see some of that trend that I showed in the original... Oh, yeah. There you go. Uh, in the original slide deck about, or the original slide about, uh, our activity. Um, oh, not surprisingly, AI is a very hot topic at Omni today. Um, looks like some of the workflow data didn't come through quite correctly here. That happens. Um, so yeah, and then, you know, as I was saying, right, I can go through and actually just troubleshoot this live. So maybe I just want to go understand, okay, why is this chart blank? There we go. And I'll be honest, I don't know at a glance why it's blank. I actually don't think we have that data populating very well. Um, so anyway, we'll go back to the slides for now. Uh, thank you for indulging me in the live demo. Uh, we have specifically designed our harness to be optimized for Claude and the Claude family of models. We have some great customers, and we're fortunate to be surrounded by phenomenal engineers and other teammates. We're based in San Francisco. Uh, if anybody wants a Blobby sticker after the talk, I'd love to chat with you and give you one.