The 5 Levels of AI Explained in 10 Minutes

Maddy Zhang| 00:10:09|Jun 14, 2026

Chapters8

The video lays out the five levels of AI and argues that progress comes from shifting where your judgment is applied, not just from using more tools, while cautioning that adding AI to existing workflows can add overhead if processes aren’t changed.

Maddie Zhang distills AI mastery into five levels—from consumer chatbots to orchestrated systems—and shows practical steps to move up the ladder.

Summary

Maddie Zhang presents a clear, practical map of AI maturity, arguing that true skill comes from where you apply judgment, not just how many tools you use. She starts by debunking the idea that more AI usage means better performance, citing a study where AI-assisted developers were slower despite feeling faster. The video then outlines five levels: consumer, practitioner, builder, architect, and orchestrator, detailing what distinguishes each stage. At level two (practitioner), she highlights repeatable processes, prompt templates, and system prompts, plus techniques like chain-of-thought prompting and few-shot prompting. Level three (builder) covers the basics of retrieval-augmented generation (RAG), embeddings with vector stores (Pinecone or Chroma DB), and evaluation methods. Level four (architect) emphasizes system-wide thinking, logging, eval harnesses, and concrete cost and reliability trade-offs, including caching and tool selection. Finally, level five (orchestrator) envisions autonomous AI pipelines with multiple agents and clearly defined specifications, illustrated by Jared Sumner’s Bun port example. Maddie also plugs HubSpot’s AI Model Cheatsheet Bundle as a practical decision aid for model selection. The episode ends with guidance for engineers at different career stages and a reminder that higher levels offer greater leverage and resilience against replacement.

Key Takeaways

Using AI tools to measure speed alone is misleading; the real value comes from changing the workflow around AI and the judgment you apply.
Level 1 (consumer) involves one-off interactions with chatbots like Claude or ChatGPT, where context doesn’t carry over across sessions.
Level 2 (practitioner) introduces repeatable processes: prompt templates, project instructions, and integrated AI workflows that compound productivity.
Level 3 (builder) requires understanding RAG, embedding pipelines (e.g., Pinecone or Chroma DB), and formal evaluation of answers, not just eyeballing results.
Level 4 (architect) is systems thinking: intent classification, multi-component pipelines (RAG, SQL generators, hard refusals), detailed logging, eval harnesses, and caching for cost control.
Level 5 (orchestrator) designs autonomous, multi-agent pipelines with explicit specs and end-to-end governance, like the Bun port example with hundreds of agents.

Who Is This For?

Software engineers and tech leaders who want a practical ladder to advance AI skills—from leveraging chatbots to designing autonomous AI systems and large-scale pipelines.

Notable Quotes

"This is a cautionary tale about bolting AI onto an existing workflow and assuming something will improve."

—Introductory warning about blindly integrating AI into existing processes.

"Here, engineers start to see real compounding productivity gains, and it's also where skills diverge fast."

—Describes the leap at level two from mere usage to repeatable workflows.

"The gap between levels two and three is smaller than most people think. The main barrier is building something real, not studying more theory."

—Emphasizes practical work to reach level three.

"At level four, you know that a well-engineered context with a mid-tier model will consistently outperform a poorly engineered context with a frontier model."

—On system-level design and model choice.

"Writing a spec that's detailed and unambiguous enough for autonomous execution is pretty hard."

—Highlights the challenges of level five orchestration.

Questions This Video Answers

How do you move from using AI tools casually to building repeatable AI workflows at work?
What is retrieval augmented generation (RAG) and when should you use it in production?
What are the best practices for evaluating AI outputs in a production system?
What tools and models should I consider for scaling AI pipelines in 2026?
What does it mean to design autonomous AI pipelines with multiple agents?

Maddy ZhangAI levelsConsumer AIAI PractitionerRAGPineconeChroma DBPrompt engineeringHubSpot AI Model CheatsheetAutonomous AI systems

Full Transcript

This video is sponsored by HubSpot. Many people think they're good at AI, but they've barely scraped the surface of what AI can do for them. And in this video, I'm going to show you the five levels of AI that everyone should understand. [music] Hi friends, I'm Maddie. I'm a senior software engineer who previously worked at Google and internet other big tech companies like Amazon, IBM, and Microsoft. So, leveling up with AI isn't about using more tools. It's instead about where your judgment gets applied. By the end of this video, you'll have a clear map of where you are and exactly what to work on next. Let's get into it. So, most people frame AI skill as a question of how much you use it. Like, I'm using AI every day, so I must be pretty good. But that's like saying you're a good driver because you commute a lot. In fact, there's solid research showing that developers using AI tools can actually get measurably slower, not faster. A randomized control trial found that experienced open-source developers using AI coding tools completed tasks 19% slower than developers working without them. While those same developers estimated they were 24% faster. However, this isn't an argument against AI, but it's a cautionary tale about bolting AI onto an existing workflow and assuming something will improve. When you add a new tool without changing the process around it, [music] you get more overhead. You're evaluating AI suggestions, correcting code that's almost right but not quite, debugging errors that look fine until they aren't. So, here's how you should approach this instead. Every level of working with AI changes one specific thing: [music] where your judgment is applied. At the beginning, your judgment is going into what to type into a chatbot. By the end, at the most advanced layer, your judgment is going into the architecture of entire systems. Let's talk about level one, the consumer. At this level, you're using AI chatbots, Claude, ChatGPT, Gemini, on their web UI. You ask it to explain things, draft emails, paste in your code to debug. This makes you faster, but your judgment is almost entirely at the output end. You're evaluating whether that answer the AI gave you is correct without really guiding it. And most tellingly, every interaction is one-off. No context carries over from one session to the next. When you level up from level one, you get to level two, the practitioner. This level is one most engineers never get past. At this level, you've internalized that how you ask changes what you get. Now, your judgment has moved upstream, and most importantly, you've started building repeatable processes instead of treating AI as one-off. For example, you have prompt templates you reuse, you've set up projects or custom instructions, so you're not re-explaining yourself every single session, [music] and AI has become integrated into how you actually work, not just something you pull out when you're stuck. Here, engineers start to see real compounding productivity gains, and it's also where skills diverge fast, because the engineers who document their workflows and build onto them keep getting better, while the ones who don't just get slightly faster at the same loop. Here are some specific techniques that show up at this level. You're using chain-of-thought prompting to get the model to reason through a problem before answering. You're using a few-shot prompting, so showing your model two or three examples of the input-output pair you want, so it pattern matches rather than guesses. [music] You're writing system prompts that set role, tone, and constraints up front rather than repeating them every message. And you've started managing context and deciding which model to use. And once you're actually making those real decisions with AI, which model to use, when to use [music] a fast cheap one versus a powerful expensive one, that's when having a good reference matters a lot, which brings me to today's sponsor, HubSpot. Using the wrong model for the wrong task either costs you too much, gives you worse results, or both. And the landscape right now is genuinely confusing. There are a dozen frontier models, open weight options, cheap fast options, and expensive reasoning-heavy options, and the right answer actually differs by task. HubSpot put together a free resource called the AI Model Cheatsheet Bundle that cuts through this and helps you make an informed decision. It's a one-page cheat sheet that maps use cases to models. For example, Opus 4.7 for the best overall reasoning and review grade code, GPT 5.5 for identic tasks and fast tool loops, Gemini 3.1 Pro for massive context windows up to 2 million tokens, Perpetually Sonar Pro for real-time web research, Haiku 4.5 or GPT 5.4 Mini when you need cheap high-volume text, and so on. There's also a full decision matrix that breaks it down by category. Coding, summarization, image generation, private on-prem deployment, [music] so you're not guessing. If you're building anything that makes model API calls, this is the reference you'll want to come back to. It's completely free and the link is in the description. Thanks to HubSpot for sponsoring this one. Now, let's talk about level three, the builder. At this level, you can design the AI layer of a real feature from scratch. You understand the primitives, how to make API calls to models like Claude or GPT, how to structure assistant prompts for consistent behavior, how to handle streaming responses, how to think about what goes in the prompt versus what you retrieve from a database. Here, you understand how to build and use rag, retrieval augmented generation, and not just the buzzword, but you know the basic pipeline. [music] Chunk your document, embed them into a vector database like Pinecone or Chroma DB, and at query time, pull the most semantically similar chunks to feed the model. And you've started thinking about evals, not just eyeballing outputs and hoping they're right, but defining what a correct answer looks like and writing something systematic to check it. This is the level where hiring conversations sound different. Companies aren't just looking for engineers who use AI. They want engineers who can build features powered by it. [music] The gap between levels two and three is smaller than most people think. The main barrier is building something real, not studying more theory. Now, let's talk about level four, the architect. Now, you're thinking in systems, not features. So, if a level three engineer can build a chatbot that calls an LLM and returns an answer, A level four engineer builds a chatbot that does intent classification to route the query to the right subcomponent, maybe a RAG pipeline for knowledge questions, a SQL generating agent for analytics questions, and a hard refusal for anything out of scope. It logs [music] every LLM call with latency, token cost, and output, so you can debug failures in production. It has an eval harness, a set of [music] 50 to 100 ground truth question-answer pairs that runs automatically before you ship any prompt [music] change. And it has a fallback when the primary model is down or over rate limit. At level four, you know that a well-engineered context with a mid-tier model will consistently outperform a poorly engineered context with a frontier model, [music] which means model choice isn't the only lever and sometimes isn't even the most important one. You think about AI the way senior engineers think about distributed systems, with explicit attention to failure modes, cost at scale, and observability. You know which steps in your pipeline are deterministic versus stochastic, and you design your error handling accordingly. You know when to use streaming versus waiting for response. You know how to use caching, both semantic caching at the prompt level and standard response caching, [music] to cut costs on repeated queries. This is also when tool calling becomes a real design problem. Giving an agent tools, so like web search, code execution, database [music] access, API calls, sounds powerful, but every tool you add is another failure mode. A level four engineer thinks carefully about what tools the model actually needs versus what would just be nice to have. And finally, level five is where we're headed as an industry, and it's worth understanding even if you're not there yet. [music] At this level, you're not building individual AI features or systems. You're designing pipelines where AI does the work and you're setting the direction. You're working with multiple agents that hand off tasks to each other. You're thinking about how to write a specification clearly enough that an AI can execute it autonomously without you in the loop. Your job is increasingly to define the goal, set up the scaffolding, evaluate whether output was right, and [music] iterate on this back. This might sound abstract, so let me give you a concrete example I've seen of what this actually looks like. Jared Sumner, the creator of Bun, which if you're not familiar is a JavaScript runtime that Anthropic acquired, recently ported the entire Bun code base from Zig to Rust using Claude code's dynamic workflows, and I was lucky enough to see his workflow live at Code with Claude and Anthropic developer conference this year. We're talking 750 lines of Rust, 11 days from first commit to merge, with 99.8% of the existing test suite passing. This involved hundreds of agents working in parallel with two reviewers on each file. It also involved a fixed loop that ran continuously driving the build and test suite until both came back clean. Not until a human said it was done, but until the test said it was done. Jared didn't review individual lines of code, but he set the direction, defined what success looked like, and the system ran until it got there. The teams actually operating this way are still rare as of now, but they're real and they're getting more common. Writing a spec that's detailed and unambiguous enough for autonomous execution is pretty hard. It requires that you understand the problem deeply enough to anticipate every question that AI doesn't know how to ask. In short, level five rewards exactly the skills that have always separated great engineers from adequate ones. Clear thinking, systems reasoning, and understanding users and constraints. So, what do these levels mean for you? If you're early career, your goal is to get to level three. Not because levels one and two don't matter, they do, but because level three is where you can actually demonstrate something on a resume and in interview. Build a real project that involves LLM API calls. Ship something. Learn what breaks in production. If you're a mid-level engineer, get to level four. Can you design the AI layer of a system? Can you build an ETL pipeline? Can you [music] make cost and reliability trade-offs in production? And regardless of where you are, the engineers who who to be the most valuable over the next three to five years are the ones who can operate at level four and at least understand what level five looks like. In conclusion, the five levels of AI are consumer, practitioner, builder, architect, and orchestrator. Each level changes where your judgment is applied. And the further upstream your judgment operates, the more leverage you have and the harder you are to replace. [music] And that's all I have for you in this video. If you found this useful, hit that like button, hype the video, and subscribe. I post weekly videos on software engineering, AI tools, and career stuff. Thanks for watching and I'll see you in the next one.