System Design was HARD until I Learned these 6 Concepts

Maddy Zhang | 00:09:42 | May 31, 2026

The video argues that senior engineers thrive not by memorizing many concepts, but by understanding the underlying principles that let them reason about system design and architecture.

Six practical system-design concepts that transform how you reason about architectures, shared with Maddie Zhang’s clear, real-world examples.

Summary

Maddie Zhang, a senior software engineer who previously worked at Google, explains that the gap between juniors and seniors isn’t just years of experience but the ability to reason about systems. She emphasizes that memorizing terms like DNS or sharding isn’t enough; you need a mental model built on why decisions matter. The first concept, statelessness, shows why removing local session data lets you scale with interchangeable servers and gain fault tolerance via shared stores like Redis. Next, caching is framed as a speed-versus-freshness tradeoff, with practical guidance on TTLs, cache-aside patterns, and where to cache (CDNs, Redis, database). The third concept, the CAP theorem, reframes consistency and availability as design choices made per part of a system rather than as a blanket rule. Then she covers message queues, illustrating how asynchronous event-driven flows decouple services to improve reliability (think Kafka or SQS). The fifth concept, databases, clarifies when to use SQL with ACID guarantees versus NoSQL for scalable, flexible schemas, and how most real systems blend both. Finally, API design is presented as a contract, whether REST or GraphQL, with emphasis on explicit versioning, resource-focused endpoints, and upfront documentation. Throughout, Maddie underscores the importance of understanding the “why” behind each concept and applying one concept at a time to real systems like Spotify or Venmo. The video includes a sponsor segment for Opera Neon, highlighting its AI-powered research workflows, then returns to the remaining concepts and closes with practical advice for interview prep. If you’re studying system design, the takeaway is to build a layered mental model and test it against real-world use cases rather than memorizing lists of tools.

Key Takeaways

Stateless servers enable true horizontal scaling: remove local session data and use a shared store (e.g., Redis) to keep every server interchangeable.
Caching is a speed vs. freshness tradeoff; apply TTLs and cache-aside patterns to balance latency with data staleness across CDNs, Redis, and databases.
CAP theorem reality: partition tolerance is a given; choose where to apply strong or eventual consistency for specific operations (e.g., finances vs. feeds).
Message queues decouple services and improve reliability: publish events (e.g., order placed) and let downstream services process asynchronously (Kafka, SQS).
ACID vs. BASE: use SQL for critical transactions requiring atomicity, consistency, isolation, and durability; use NoSQL for scalable, flexible schemas where slight staleness is acceptable.
Good API design is a contract: prioritize REST for simplicity or GraphQL for flexibility, but always version early, design around resources, and document upfront.

Who Is This For?

Software engineers preparing for system-design interviews or transitioning from junior to senior roles who want a principled, practical framework instead of memorizing jargon.

Notable Quotes

"Stateless means the server doesn't remember anything about a previous request."

—Defines the core idea of statelessness and why it enables interchangeable servers and fault tolerance.

"Your load balancer's hands are tied, your servers aren't actually interchangeable, and if that one server goes down, the session is just gone."

—Illustrates the risk of sticky sessions when state is stored locally.

"The server treats every single request like it's meeting you for the very first time."

—Explains the practical mindset shift needed for stateless design.

"Partition tolerance is essentially a given constraint."

—Reframes CAP theorem by treating partition tolerance as non-optional in real systems.

"Be explicit about versioning from day one. Design endpoints around resources, not internal operations."

—Highlights API design best practices for long-term stability.

Questions This Video Answers

How do you decide when to use SQL with ACID guarantees versus NoSQL in a real app?
What are practical strategies for choosing between REST and GraphQL for an API?
Why is statelessness critical for scalable web architectures?
How can message queues improve reliability in microservice architectures?
What are common caching patterns and where do Redis, CDNs, and database caches fit in?

System design, Stateless architecture, Caching strategies, CAP theorem, Message queues, ACID vs NoSQL, API design, REST vs GraphQL, Interview prep

Full Transcript

The gap between a junior engineer and a highly paid senior developer isn't years of experience. It's how you think about systems. Senior engineers get promoted, get better offers, and get higher pay because they can reason about architecture, not just write code. And in this video, I'm going to break down the six concepts that completely change how you will understand system design. Hi friends, I'm Maddie. I'm a senior software engineer who previously worked at Google and interned at other big tech companies like Amazon, IBM, and Microsoft. When I first started studying system design, I made the classic mistake. I tried to memorize everything. DNS, sharding, CDNs, idempotency, but I was building just a glossary, not a mental model. So, I could recite definitions, but I couldn't actually reason through a new problem and would freeze during interviews. What completely changed my approach wasn't learning more concepts. It was understanding the why behind them, the principles that tie everything together. Once that clicked, I stopped blanking in interviews and actually started designing systems with confidence. Our first concept is statelessness. Everyone learns horizontal scaling means adding more servers. That is true, but it took me a while to internalize that you can really only do that if your servers are stateless. Stateless means the server doesn't remember anything about a previous request. Every request carries everything the server needs to process it, usually via a token that lives on the client side. The server treats every single request like it's meeting you for the very first time. So, why does this matter so much? If your server stores session data locally, like a user's login state or an in-progress cart, you can't just route their next request to any server anymore. That user has to go back to the specific server that remembers them. That's called a sticky session, and it's a real problem. 
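As a rough illustration of a request that "carries everything the server needs," the sketch below signs a small token with HMAC so that any server holding the same secret can verify it without local session memory. This is a simplified stand-in, not JWT or any particular library's format; `SHARED_SECRET`, `issue_token`, `verify_token`, and `handle_request` are hypothetical names, and a real token would also carry an expiry.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: every server is configured with the same secret (hypothetical value).
SHARED_SECRET = b"demo-secret-change-me"


def issue_token(claims: dict, secret: bytes = SHARED_SECRET) -> str:
    """Serialize the claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token: str, secret: bytes = SHARED_SECRET) -> Optional[dict]:
    """Return the claims if the signature checks out, otherwise None."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # no "." separator, so this is not one of our tokens
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison of the presented and expected signatures.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


def handle_request(server_name: str, token: str) -> str:
    """Any server can answer: the user comes from the token, not from local memory."""
    claims = verify_token(token)
    if claims is None:
        return f"{server_name}: 401 unauthorized"
    return f"{server_name}: hello {claims['user']}"
```

Because `handle_request` reads nothing from server-local state, the same token works on `"server-a"` and `"server-b"` alike, which is what lets a load balancer route freely.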
Your load balancer's hands are tied, your servers aren't actually interchangeable, and if that one server goes down, the session is just gone. When you offload state to a shared store, like Redis or a distributed cache, suddenly every server is identical. Your load balancer can route freely, you can spin servers up and down without losing data, and you get real fault tolerance. So, before you ask, "How do I scale this?" ask, "What state is each server holding and where should it actually live?" Before we get to the next topic, I want to thank the sponsor of this part of the video, Opera Neon. If you're studying system design, you know how much of it is just piecing together lots of research, reading docs, comparing approaches, and trying to synthesize a dozen different sources at once. Opera Neon is a browser built to make working with AI actually feel manageable day-to-day. You get access to multiple AI models directly inside the browser, and you can switch between them depending on the task without bouncing between a bunch of separate tools. The workflow I keep coming back to is deep research. For example, when I was learning about distributed caching, I wanted to compare Redis versus Memcached across latency tradeoffs, eviction strategies, and how each one holds up at scale. Normally, that's 10 tabs and a bunch of context switching. With Neon, I can just run a research session right in the browser, and it then runs a multi-agent parallel investigation to pull together information from all my sources, creating an incredibly detailed and in-depth report for me. There are also cards, saved shortcuts for tasks you do repeatedly. I have one set up for summarizing technical documentation, which comes up constantly when I'm trying to quickly understand a new framework, API, or model release without losing the important implementation details. 
And beyond just the product, Neon has a Discord community where you can share AI workflows with other people and actually shape how the browser evolves. Here, you can get early access to updates and direct conversations with the team building it. It's $20 a month and includes access to all of those models in one place. The link to check it out is in the description. All right, let's get back to the concepts. Now, let's talk about our second concept, caching. It shows up everywhere in system design: your browser, your CDN, your app server, and your database. And it can feel like a dozen separate things to learn. But every cache is essentially making the same tradeoff, speed in exchange for freshness. You're storing a copy of data somewhere faster and closer to where it's needed, and accepting that the copy might be slightly out of date. When that clicked for me, I stopped memorizing when to use Redis or a CDN versus a browser cache, and instead started asking things like where's the bottleneck and how stale can we afford the data to be. CDNs serve static content close to users geographically and are great for assets that change rarely. Application-level caches like Redis sit in front of your database and are great for read-heavy queries where you need millisecond latency. And database query caches live inside the database itself. So, make sure you get comfortable with concepts like TTLs, cache-aside patterns, and write-through versus write-back strategies. Understanding these three things will cover the vast majority of caching questions in interviews. Concept three is CAP theorem. Most people learn CAP theorem as you can only have two out of the three of consistency, availability, and partition tolerance. But, in reality, partition tolerance isn't optional. In any real distributed system running over a network, partitions will happen. Nodes will lose connectivity and packets will drop. So, partition tolerance is essentially a given constraint. It's not a choice. 
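Stepping back to the caching concept for a moment: the TTL and cache-aside ideas mentioned above can be sketched in a few lines. This is a minimal single-process illustration in which an in-memory dictionary stands in for a cache such as Redis; `TTLCache`, `get_profile`, and `load_from_db` are hypothetical names, not any library's API.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for an application-level cache (assumption: one process)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def get_profile(user_id: str, cache: TTLCache, load_from_db: Callable[[str], object]):
    """Cache-aside: try the cache, fall back to the database, then populate the cache."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # the slow path the cache is meant to avoid
    cache.set(user_id, value)
    return value
```

The TTL is the knob for the speed-versus-freshness tradeoff: a longer TTL means fewer database reads but a wider window in which callers may see stale data.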
What you're actually choosing between, then, is consistency and availability. Consistency means every read gets the most recent write. So, if I update my profile, anyone who reads it right after sees the updated version. Availability means the system always responds even if it can't guarantee it's the very latest data. But, for all practical purposes, most systems don't need to make this choice globally. Different parts of the same system can make different decisions. At Google, there were services where strong consistency was completely non-negotiable: financial transactions, access control, and permissions. And there were other features where eventual consistency was totally fine, like content feeds. So, in a system design interview, don't just say, "I'll go with eventual consistency." Instead, explain which specific operations need which guarantees and why. Concept four is message queues. Let's say a user places an order. In a synchronous world, that means this: place order, call the inventory service, call the payment service, call the notification service, all in real time, all before the user gets a response. However, in this scenario, you've created a chain of dependencies. If the notification service is slow or temporarily down, the entire order flow fails. Your system's end-to-end reliability is now the product of every dependency's uptime. Message queues break that dependency. Instead of calling services directly, you publish an event like order placed. Something like Kafka or SQS holds that event. Each downstream service, inventory, payment, notifications, picks it up and processes it independently on its own schedule. The result is that your services are resilient to each other's failures. If the notification service goes down, the message just waits in the queue and gets processed when it comes back up. The order still goes through. Concept five is databases. 
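Before getting into databases, here is a small sketch of the message queue idea from concept four. It uses an in-process `queue.Queue` per subscriber as a stand-in for a broker like Kafka or SQS, so it shows only the decoupling, not durability or delivery guarantees. `Broker`, `place_order`, `drain`, and `chain_availability` are hypothetical names. The last function works the "product of every dependency's uptime" arithmetic: three synchronous dependencies at 99.9% each give roughly 99.7% end to end.

```python
import math
import queue
from typing import Callable, Dict, Iterable


def chain_availability(uptimes: Iterable[float]) -> float:
    """Availability of a synchronous call chain: the product of each uptime."""
    return math.prod(uptimes)


class Broker:
    """Toy in-process broker: every subscriber gets its own copy of each event."""

    def __init__(self) -> None:
        self._queues: Dict[str, queue.Queue] = {}

    def subscribe(self, service_name: str) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._queues[service_name] = q
        return q

    def publish(self, event: dict) -> None:
        for q in self._queues.values():
            q.put(event)


def place_order(broker: Broker, order_id: str) -> str:
    # Publish and return right away; no downstream service is called here.
    broker.publish({"type": "order_placed", "order_id": order_id})
    return f"accepted {order_id}"


def drain(events: queue.Queue, handler: Callable[[dict], None]) -> int:
    """Process whatever is waiting for one service; return how many events ran."""
    handled = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return handled
        handler(event)
        handled += 1
```

If the notifications consumer never calls `drain` for a while (simulating an outage), `place_order` still succeeds and the event simply waits in that service's queue until the consumer returns.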
The SQL versus NoSQL debate gets framed as if it's about old versus new or simple versus scalable, but it's really not. It's about what guarantees your data layer needs to make. SQL databases are built around a concept called ACID: atomicity, consistency, isolation, and durability. These four properties are worth actually knowing instead of just recognizing the acronym. Atomicity means a transaction is all or nothing. If you're transferring money between two accounts, debiting one and crediting the other, either both operations succeed or neither does. You never want to end up in a state where the money left one account but didn't arrive in the other. Consistency means the database always moves from one valid state to another. Your schema constraints, foreign keys, and rules are always enforced, and you can't write data that violates them. Isolation means concurrent transactions don't interfere with each other. So, two users hitting the database at the same time see a clean, predictable result as if the transactions ran one after another. And durability means once a transaction is committed, it stays committed even if the server crashes right after. It's on disk and permanent. NoSQL databases, on the other hand, trade some of these guarantees for scale and flexibility. They drop the strict schema, often relax consistency, and that's specifically what lets them scale horizontally across many machines so easily. That trade-off is a feature and can be great for the right use case. So, the actual question here isn't whether we should use SQL or NoSQL. It's: does my application need ACID guarantees? Things like financial transactions, inventory systems, anything where partial writes are catastrophic, are what you would use SQL for. User activity feeds, product catalogs, and real-time analytics at massive scale, where slightly stale data is fine, are cases where you can often use NoSQL. 
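To make the atomicity and consistency properties concrete, here is a minimal sketch of the money-transfer example using Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the sample balances, and the helper names are assumptions for illustration, and the sketch does not check that both accounts exist. The credit is applied first on purpose, so that when the debit violates the `CHECK` constraint, the rollback visibly undoes the credit as well.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two sample accounts (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
    )
    conn.executemany(
        "INSERT INTO accounts (name, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 50)],
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would have made a balance negative; the credit was rolled back.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer larger than the source balance returns `False` and leaves both rows exactly as they were, which is the "all or nothing" behavior described above.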
A lot of real production systems use both, with each handling the part of the workload it's suited for. And last but not least, let's talk about API design. Your API is a contract with every client that depends on it. Once you ship it, changing it isn't a code update. It's a coordinated migration affecting everyone downstream. Now, let's talk about REST versus GraphQL. They optimize for different things. REST is simple, cacheable, and easy to reason about and is great for things like public APIs, mobile clients, and teams that need a stable, well-documented interface. GraphQL is flexible and lets clients request exactly what they need, which is great when you have many clients with different data needs like a web app and a mobile app hitting the same back-end. But regardless of which approach you choose, the principles of good API design are the same. Be explicit about versioning from day one. Design endpoints around resources, not internal operations, and document your contracts before someone else has to reverse engineer them. In conclusion, those were the six concepts that made system design click for me. If you're actively preparing for system design interviews right now, my advice is to take one of these concepts at a time and apply it to a real system you use daily. So, for example, how does Spotify use caching across different layers? When does Venmo need strong consistency versus when can it relax? And that's all I have for you in this video. If you want more content on systems, technical interviews, and navigating a career in software engineering, give this a like, hype the video, and subscribe. Thanks for watching, and I'll see you in the next one.
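As a companion to the API design advice in the transcript, the sketch below shows one possible reading of "explicit versioning" and "endpoints around resources": the version lives in the URL path, and routes name a resource (`/v1/orders/{id}`) rather than an internal operation (`/getOrder`). It is a framework-free toy dispatcher; the `ORDERS` data, the route table, and the v2 response shape are all hypothetical.

```python
import re
from typing import Tuple

# Hypothetical sample data standing in for a real data store.
ORDERS = {"42": {"id": "42", "status": "shipped"}}


def get_order_v1(order_id: str) -> Tuple[int, dict]:
    order = ORDERS.get(order_id)
    if order is None:
        return 404, {"error": "order not found"}
    return 200, order


def get_order_v2(order_id: str) -> Tuple[int, dict]:
    """v2 changes the response shape; v1 clients keep working unchanged."""
    status_code, body = get_order_v1(order_id)
    if status_code != 200:
        return status_code, body
    return 200, {"id": body["id"], "state": {"code": body["status"]}}


# Each route is (HTTP method, versioned resource path, handler).
ROUTES = [
    ("GET", re.compile(r"^/v1/orders/(?P<order_id>[^/]+)$"), get_order_v1),
    ("GET", re.compile(r"^/v2/orders/(?P<order_id>[^/]+)$"), get_order_v2),
]


def dispatch(method: str, path: str) -> Tuple[int, dict]:
    """Find the first route matching the method and path, then call its handler."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if match and method == route_method:
            return handler(**match.groupdict())
    return 404, {"error": "no such route"}
```

Keeping both versions in the route table side by side is what turns a breaking change into the "coordinated migration" the transcript describes: clients move from `/v1` to `/v2` on their own schedule.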