System Design Mastery — Study Guide (SE → Staff/Architect)

ⓘ About this guide · legend · how to use

Depth-first study system. Follow phases in order; each topic is a study card with priority, depth, learning material (or a deep-dive map), curated resources, and a mastery check. In the HTML build these become real checkboxes with per-section progress saved in your browser.

Built for Staff/Architect, not just Senior: beyond knowing the tech, the guide trains decision-making under uncertainty. Phase 0 (Architectural Thinking) up front gives you the decision lens; flagship deep-dive cards carry ⚖️ when-NOT / why-not-the-alternative, 🎤 interview probes, and ⚠️ common mistakes; Phase 8 (Operability & Judgment) adds capacity, cost, system evolution, failure studies, and design-review. Apply the Phase-0 Decision Lens to every deep-dive.

Phase order: 0 (mindset) → 1 Foundations → 2 LLD → 3 Storage/DBs → 4 Distributed Systems & Architecture → 5 Big Data → 6 AI Eng I → 7 AI Eng II → 8 (Operability & Judgment) → 9 Practice & Strategy.

Legend

Progress: [ ] not started · [~] learning · [x] mastered
Priority (for Staff/Architect + top product-company interviews): 🔴 High (must own) · 🟡 Medium (important depth) · 🟢 Low (niche / nice-to-have)
Depth: 🟢 Fundamental · 🟡 Intermediate · 🔴 Deep-dive
Resources: 📄 blog/doc · 📜 paper · 🎥 video · 🛠️ hands-on · 📖 book

Build status: ✅ Complete (v2.1, Staff-level) — Phases 0–9 fully carded. Includes the decision lens + quick decision matrix, flagship-card ⚖️/🎤/⚠️/🚩 red-flags, architecture-evolution case studies, operational close-out in the interview playbook, and core-vs-enrichment guidance. Ready for review → then interactive HTML.

Phase 0 · Architectural Thinking & Decision-Making (the Staff mindset)

Staff/Architect interviews are reasoning exams, not knowledge exams. You're rarely asked "explain Raft" — you're asked "would you use Raft here? why not Kafka? why not Postgres? what failure modes are you accepting?" This phase is the lens you apply to every technical card that follows. Study it first, then re-apply it to each deep-dive.

Architectural Principles [HIGH · DEEP-DIVE]

Learn (inline): the heuristics that recur in every design discussion —

Optimize for change — requirements will change; design for what's likely to move (the "hard to reverse" decisions get the most thought).
Minimize coordination — coordination (locks, consensus, distributed txns, cross-service sync) is the enemy of scale & availability; push work to be local/async/idempotent.
Isolate failure domains — blast-radius thinking: bulkheads, cells/shards, per-tenant isolation, so one failure ≠ total outage.
Reduce coupling, raise cohesion — clear ownership boundaries; teams and services own their data.
Observability first — you can't operate what you can't see; instrument before you scale.
Simplicity scales — the boring, well-understood solution usually wins; complexity is a cost you pay forever.
Optimize the bottleneck, not everything — measure, find the constraint (Amdahl/Little), fix that; ignore the rest.
Make it correct, then fast, then cheap — but know the cost curve early.

Resources: 📖 Fundamentals of Software Architecture (Richards & Ford); 📄 Martin Fowler — architecture; 📖 The Pragmatic Programmer.

✅ You know it when: in any design you can name which principle a choice serves and which you're trading away.

The Decision Lens (apply to every deep-dive) [HIGH · DEEP-DIVE]

Learn (inline): the reusable reasoning template that turns knowledge into staff-level judgment. For any technology/approach, be able to answer:

What problem does it solve (and what did people do before)?
When to use it — the sweet spot.
When NOT to use it — the anti-patterns (this is the senior→staff separator).
What's the better alternative here, and why ("why not X?" — have a crisp answer).
What failure modes / tradeoffs am I accepting (consistency? cost? ops burden?).
How does it fail, and how do I detect/limit it?
Apply this to every deep-dive card (Redis, Kafka, Cassandra, RAG, agents…). The flagship cards below are annotated with ⚖️ When NOT / why-not, 🎤 interview probes, and ⚠️ common mistakes to model it.

Resources: 📖 DDIA (every chapter is tradeoffs); 🎥 mock Staff interviews (watch the "why not X?" follow-ups).

✅ You know it when: for any component you place, you can immediately answer "why this and not the obvious alternative, and what am I accepting?"

Tradeoff Axes Catalog [HIGH · INTERMEDIATE]

Learn (inline): the recurring dimensions you trade along — name them explicitly in interviews:

Consistency ↔ Availability/Latency (CAP/PACELC) · Strong ↔ Eventual
Latency ↔ Throughput · Latency ↔ Cost · Read-optimized ↔ Write-optimized
Complexity ↔ Flexibility · Coupling ↔ Autonomy · Coordination ↔ Independence
Freshness ↔ Cost (caching) · Normalization ↔ Denormalization
Build ↔ Buy · Generalize ↔ Specialize · Now ↔ Later (tech debt)
Fail-open ↔ Fail-closed (availability vs safety)

✅ You know it when: you instinctively frame a choice as "A vs B along axis X; I pick A because the requirement prioritizes Y."

Technology Decision Matrix (when to use / when NOT / instead-use) [HIGH · DEEP-DIVE]

Learn (inline): carry a crisp "reach for it when / avoid when / instead use" for the common building blocks. Sample (fill/expand as you study):

Postgres/MySQL — use for relational data, transactions, ad-hoc queries; avoid when write throughput exceeds what a single node can handle, or for schemaless data at scale → then shard (Vitess/Citus) or move to wide-column.
Redis — use for cache, sessions, leaderboards (zset), rate limiting, ephemeral fast state; avoid as a system of record / for data that can't be lost → durable DB.
Cassandra/wide-column — use for massive write throughput, known partition-key access, time-series; avoid for ad-hoc queries, joins, strong multi-key transactions → RDBMS.
DynamoDB — use for predictable KV access at scale, serverless; avoid for analytics/complex queries or when access patterns are unknown → RDBMS/warehouse.
Kafka — use for high-throughput event streaming, replay, multiple consumers, log/CDC; avoid for simple task queues or per-message ack/priority/delay → SQS/RabbitMQ.
SQS/RabbitMQ — use for task queues, work distribution, RPC-ish messaging; avoid for replay/high-fanout/streaming → Kafka.
Elasticsearch — use for full-text/faceted search & log analytics; avoid as primary store (not a source of truth).
Spanner/CockroachDB — use for global, strongly-consistent SQL; avoid if single-region + cost-sensitive → Postgres.
gRPC — internal service-to-service, streaming, strict contracts; REST/GraphQL — public/flexible clients.
Microservices — use when team/scale autonomy pain is real; avoid early → modular monolith.
Vector DB — use for semantic/RAG retrieval; avoid when keyword/BM25 suffices → Elasticsearch.

Quick matrix (internalize these):

Need	Choose	Avoid	Why
Strong consistency + txns + ad-hoc queries	PostgreSQL/MySQL	Cassandra	relational + ACID; wide-column has no joins/multi-key txns
Massive write throughput, known partition access	Cassandra	single-node MySQL	LSM + leaderless scales writes; RDBMS write-caps
Predictable KV at scale, managed/serverless	DynamoDB	self-managed Cassandra	managed autoscale — but bake in access patterns
Global, strongly-consistent SQL	Spanner/CockroachDB	single-node Postgres	TrueTime/consensus; pay latency + $
Event replay · multi-consumer · high throughput	Kafka	RabbitMQ/SQS	retained ordered log vs transient queue
Task queue · per-msg ack/delay/priority	SQS/RabbitMQ	Kafka	simple managed queue; Kafka has no per-msg priority/delay
Low-latency cache / ephemeral fast state	Redis	DynamoDB/DB	in-memory + rich data structures
Full-text / faceted search	Elasticsearch	primary DB	inverted index; not a source of truth
Semantic / similarity retrieval (RAG)	Vector DB (pgvector/Pinecone)	keyword-only search	embeddings + ANN (use hybrid)
Internal service↔service, streaming, contracts	gRPC	REST	HTTP/2 + protobuf + streaming
Public / flexible client API	REST/GraphQL	gRPC	browser/partner friendliness
Analytics / aggregations over big data	Columnar warehouse (BigQuery/Snowflake)	OLTP row store	scan/aggregate-optimized
Team & scale autonomy pain is real	Microservices	premature split	else modular monolith first

✅ You know it when: you can answer "why not <popular alternative>?" for each row in one sentence.

Deep-Dive Priority Tiers (allocate your time) [HIGH · FUNDAMENTAL]

Learn (inline): there are many deep-dives — study them in this order (shift by your track):

Tier 1 — must master (do first): Caching · Replication · Sharding/Consistent hashing · CAP & Consistency models · LSM vs B-tree storage engines · Consensus (Raft) · Kafka · Redis · Load balancing · Resilience patterns · (AI track: RAG, LLM serving).
Tier 2 — know well: Cassandra · DynamoDB · Distributed transactions/Sagas · Stream processing (Flink) · API design · Rate limiting · Vector DBs · Agents · MongoDB · Spanner/TrueTime.
Tier 3 — specialized / 🎓 enrichment: ZippyDB · Graph DBs · Time-series DBs · HBase/BigTable · Fine-tuning internals (LoRA/QLoRA) · niche storage engines.

✅ You know it when: you've mastered all of Tier 1, know Tier 2 well enough to design with them, and can speak to Tier 3 at the principle level.

Learning Threads (topics are interconnected) [MEDIUM · FUNDAMENTAL]

Learn (inline): system design isn't isolated facts — trace these chains and you'll see how a single design pulls in many topics:

Rate limiter → Redis → consistency → observability
Kafka → delivery semantics → Sagas → Outbox → idempotency
Caching → replication → consistency → CDN → invalidation → hot keys
Sharding → consistent hashing → replication → quorums → CAP
Estimation → capacity planning (Little's Law) → bottlenecks → cost
Failure studies → resilience patterns → design review → observability
LLD (SOLID/patterns) → API design → HLD components
RAG → embeddings → vector DB/ANN → chunking → reranking → evals → cost/caching
Agents → tool use/MCP → guardrails → evals → observability → cost

✅ You know it when: given any one topic you can name the 3–4 topics it naturally connects to.

The Standard Deep-Dive Template [MEDIUM · FUNDAMENTAL]

Learn (inline): study every major technology through the same 9-part lens (the flagship cards — Kafka, Redis, Cassandra, DynamoDB — model it; apply it to any deep-dive):

1. Why it exists (what came before / problem solved) · 2. Internal mechanics · 3. Tradeoffs · 4. When to use · 5. When NOT to use / why-not-the-alternative (⚖️) · 6. Interview traps & 🚩 red flags · 7. Common production failures · 8. Design scenarios (where it shows up) · 9. Mastery check (✅).

✅ You know it when: you can produce all 9 sections from memory for any Tier-1/2 technology.

Phase 1 · Foundations: Threading, Concurrency & Computer Networks

The bedrock. Everything about scale, consistency, and performance builds on these. Assumes basic programming; pick one language (Java/Go/C++) to ground concurrency.

1a — Threads & Concurrency

Process vs Thread vs Coroutine [HIGH · FUNDAMENTAL]

What & why: The units of execution you scale with. Confusing them leads to wrong scaling models (e.g. thread-per-request vs async).

Learn (inline):

Process — own address space, isolated; expensive to create; IPC needed to communicate. Crash isolation.
Thread — shares the process address space (heap, file descriptors); cheap-ish; communicates via shared memory (needs synchronization). A crash/segfault can take down the process.
Coroutine / green thread / fiber — user-space "thread" scheduled by a runtime, not the OS (Go goroutines, Kotlin coroutines, Java virtual threads, Python asyncio). Millions can exist; cheap context switches; great for I/O-bound concurrency.
Context switch cost: process > OS thread > coroutine. OS threads switch via the kernel (µs + cache/TLB effects); coroutines switch in user space (ns–100s ns).
Mental model: CPU-bound work → parallelism across cores (threads/processes). I/O-bound work → concurrency (coroutines/async) beats piling up OS threads.
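
To make the I/O-bound case concrete, here is a minimal sketch using Python's asyncio. The names handle_connection, serve_many and run_demo are hypothetical, and asyncio.sleep stands in for a slow network wait. It only illustrates that many parked coroutines share one OS thread; it is not a benchmark.

```python
import asyncio
import time


async def handle_connection(conn_id: int) -> int:
    # Simulated I/O wait: the coroutine is parked and no OS thread is blocked.
    await asyncio.sleep(0.1)
    return conn_id


async def serve_many(n: int) -> list:
    # All n coroutines are in flight at once on a single OS thread.
    return await asyncio.gather(*(handle_connection(i) for i in range(n)))


def run_demo(n: int = 10_000) -> tuple:
    start = time.perf_counter()
    results = asyncio.run(serve_many(n))
    return len(results), time.perf_counter() - start
```

Handled one after another, 10,000 waits of 0.1 s would take roughly 1,000 s; run concurrently they overlap, so the total stays close to a single wait plus scheduling overhead.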

Resources:

📄 Jenkov — Java Concurrency and Multithreading: https://jenkov.com/tutorials/java-concurrency/index.html
🎥 Hussein Nasser — Process vs Thread (YouTube @hnasser)
📖 OSTEP (free) — Concurrency chapters: https://pages.cs.wisc.edu/~remzi/OSTEP/

✅ You know it when: you can explain why a Go service handles 100k concurrent connections cheaply while a thread-per-request server would exhaust memory.

Memory Model & Visibility [MEDIUM · INTERMEDIATE]

What & why: Why one thread's write may be invisible to another without synchronization — the root of subtle concurrency bugs.

Learn (inline):

CPUs/compilers reorder instructions and cache values in registers/per-core caches. Without a memory barrier, thread B may never see thread A's write, or see writes out of order.
happens-before is the ordering guarantee: a lock release happens-before the next acquire; a volatile/atomic write happens-before a subsequent read of it.
volatile (Java) / atomic guarantees visibility + ordering for that variable, but not compound atomicity (count++ is still a race).
Same idea across languages: Java Memory Model, C++ std::memory_order, Go's memory model.
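
The compound-atomicity point (count++ is a read, an add and a write) can be sketched in Python. This is an illustration only: Python has no volatile keyword and its memory model differs from Java's, so the sketch shows just the fix of putting the whole read-modify-write under one lock. Counter and hammer are hypothetical names.

```python
import threading


class Counter:
    """Counter whose increment is a read-modify-write, made atomic with a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two threads could read the same old value
        # and one of the two updates would be lost.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: Counter, threads: int = 8, per_thread: int = 10_000) -> int:
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

With the lock in place the final value always equals threads × per_thread. Visibility alone (what volatile provides in Java) would not give that guarantee.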

Resources:

📄 Jenkov — Java Memory Model: https://jenkov.com/tutorials/java-concurrency/java-memory-model.html
📄 The Go Memory Model: https://go.dev/ref/mem

✅ You know it when: you can explain why a flag written by one thread needs volatile/atomic to be reliably seen by another, and why volatile still doesn't make i++ safe.

Synchronization Primitives [HIGH · INTERMEDIATE]

What & why: The tools to coordinate threads over shared state. Interviewers probe when to use which.

Learn (inline):

Mutex / lock — mutual exclusion; only one holder. Guards a critical section.
Read-write lock — many concurrent readers or one writer; win when reads ≫ writes (watch for writer starvation).
Semaphore — a counter permitting N concurrent holders; used for resource pools and for capping concurrency (e.g. "max 10 in-flight DB calls").
Condition variable — wait/notify: sleep until a predicate holds (classic producer-consumer). Always re-check the predicate in a loop (spurious wakeups).
Monitor — lock + condition bundled (Java synchronized + wait/notify).
Gotchas: hold locks briefly; acquire locks in a consistent order (prevents deadlock); prefer higher-level concurrent collections/executors over hand-rolled locks.
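
A minimal sketch of the "max N in-flight calls" use of a semaphore, with a mutex guarding the bookkeeping. MAX_IN_FLIGHT, run_with_limit and call_database are hypothetical, and time.sleep stands in for the real downstream call.

```python
import threading
import time

MAX_IN_FLIGHT = 3  # hypothetical cap on concurrent downstream calls


def run_with_limit(num_tasks: int = 12) -> int:
    permits = threading.Semaphore(MAX_IN_FLIGHT)
    state_lock = threading.Lock()
    in_flight = 0
    peak = 0

    def call_database() -> None:
        nonlocal in_flight, peak
        with permits:                # blocks once MAX_IN_FLIGHT holders exist
            with state_lock:         # mutex guards the shared counters
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(0.01)         # stand-in for the real call
            with state_lock:
                in_flight -= 1

    threads = [threading.Thread(target=call_database) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak  # highest number of calls that ran at the same time
```

The returned peak never exceeds MAX_IN_FLIGHT, however many tasks are submitted. The semaphore bounds concurrency, while the mutex only protects the two counters.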

Resources:

📄 Jenkov — locks, read-write locks, semaphores: https://jenkov.com/tutorials/java-concurrency/index.html
📖 Java Concurrency in Practice (Goetz) — the canonical book.
📖 OSTEP — Locks, Condition Variables, Semaphores chapters.

✅ You know it when: you can pick the right primitive for "pool of 10 connections", "cache read-mostly", and "producer waits for buffer space", and explain why each.

Lock-Free Programming (CAS, atomics, lock-free queues) [MEDIUM · DEEP-DIVE]

What & why: How to build concurrent structures without locks for lower latency / no deadlock — staff-relevant for hot paths.

Learn (inline):

Compare-And-Swap (CAS) — atomic "set X to B only if it's currently A"; the hardware primitive under lock-free algorithms. Retried in a loop.
Atomic references / counters — AtomicInteger, AtomicReference; wait-free reads, CAS-based updates.
Lock-free queue (e.g. Michael-Scott) — enqueue/dequeue via CAS; no thread blocks another.
ABA problem — value goes A→B→A and CAS wrongly succeeds; fixed with versioned/tagged pointers or hazard pointers.
When: ultra-low-latency, high-contention hot paths. Otherwise prefer locks/concurrent libs — lock-free is hard to get right.
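
The CAS retry loop and the ABA problem can be sketched as follows. Python exposes no hardware CAS, so the AtomicInt class below is an assumption that emulates compare-and-set with an internal lock. Treat it as a model of the control flow, not as lock-free code.

```python
import threading


class AtomicInt:
    """Simulated atomic integer. Real CAS is one hardware instruction;
    the internal lock here only emulates that atomicity for illustration."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()

    def get(self) -> int:
        return self._value

    def compare_and_set(self, expected: int, new: int) -> bool:
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(counter: AtomicInt) -> int:
    # Classic CAS loop: read, compute, try to publish, retry on interference.
    while True:
        current = counter.get()
        if counter.compare_and_set(current, current + 1):
            return current + 1


def run(threads: int = 4, per_thread: int = 5000) -> int:
    counter = AtomicInt()
    workers = [
        threading.Thread(target=lambda: [increment(counter) for _ in range(per_thread)])
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.get()


def aba_demo() -> bool:
    cell = AtomicInt(1)                    # value "A"
    seen = cell.get()                      # a slow thread reads A
    cell.compare_and_set(1, 2)             # another thread changes A to B
    cell.compare_and_set(2, 1)             # and then back from B to A
    return cell.compare_and_set(seen, 3)   # succeeds although the cell changed in between
```

aba_demo returns True, which is exactly the problem: the value-only comparison cannot tell that the cell was modified twice. Pairing the value with a version stamp makes the stale CAS fail.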

Resources:

📄 Jenkov — Compare and Swap / Non-blocking algorithms: https://jenkov.com/tutorials/java-concurrency/compare-and-swap.html
📜 Michael & Scott — Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
🎥 Herb Sutter — "atomic<> Weapons" (C++ and Beyond 2012) and "Lock-Free Programming (or, Juggling Razor Blades)" (CppCon 2014)

✅ You know it when: you can implement an atomic counter with a CAS loop and explain the ABA problem.

Deadlocks & Livelocks [HIGH · FUNDAMENTAL]

What & why: Classic failure modes; interviewers love the four conditions + prevention.

Learn (inline):

Deadlock — threads wait forever on each other. Coffman conditions (all four needed): mutual exclusion, hold-and-wait, no preemption, circular wait.
Prevention: impose a global lock ordering; use lock timeouts / tryLock; avoid holding multiple locks; reduce lock scope.
Livelock — threads keep changing state in response to each other but make no progress (two people stepping aside in a hallway). Fix with randomized backoff.
Starvation — a thread never gets the resource (e.g. writer under a read-heavy RW lock); fairness policies help.
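
Global lock ordering, the first prevention technique above, can be sketched with the usual bank-transfer example. Account, transfer and run_opposing_transfers are hypothetical names, and the sketch assumes the two accounts are distinct.

```python
import threading


class Account:
    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    # Global order: always lock the lower id first, whatever the direction.
    # This removes the circular-wait condition. Assumes src is not dst.
    first, second = sorted((src, dst), key=lambda a: a.account_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_opposing_transfers(rounds: int = 2000) -> tuple:
    a, b = Account(1, 1000), Account(2, 1000)
    t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

If transfer locked src then dst instead, the two threads could each hold one lock and wait forever for the other. With the id ordering both threads finish and the balances end where they started.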

Resources:

📖 OSTEP — Deadlock section (Concurrency).
📄 Jenkov — Deadlock & Deadlock Prevention: https://jenkov.com/tutorials/java-concurrency/deadlock.html

✅ You know it when: you can spot a lock-ordering deadlock in code and fix it by ordering, and distinguish deadlock vs livelock vs starvation.

Blocking vs Non-Blocking I/O (and the event loop) [HIGH · INTERMEDIATE]

What & why: The core reason Node/Nginx/Netty scale connections cheaply; central to "how many concurrent users can one box hold?"

Learn (inline):

Blocking I/O — a thread parks until data is ready → thread-per-connection; simple but memory/scheduler cost caps concurrency (the C10K problem).
Non-blocking + readiness selection — epoll (Linux) / kqueue (BSD, macOS), with IOCP as the completion-based counterpart on Windows: one thread watches thousands of sockets, handling whichever is ready → event loop (Node, Nginx, Netty, Redis).
Async models: callbacks → promises/futures → async/await; or virtual threads (Java) / goroutines that make blocking-style code cheap by parking coroutines on a small thread pool.
Rule of thumb: I/O-bound + huge connection counts → event loop / async. CPU-bound → thread pool sized ~#cores.
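
Readiness selection can be sketched with Python's standard selectors module, which picks epoll, kqueue or select depending on the OS. The function name event_loop_demo is hypothetical, and socket pairs stand in for real client connections.

```python
import selectors
import socket


def event_loop_demo(num_connections: int = 5) -> dict:
    selector = selectors.DefaultSelector()  # epoll/kqueue/select, whichever the OS offers
    clients = []
    for conn_id in range(num_connections):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        selector.register(server_side, selectors.EVENT_READ, data=conn_id)
        clients.append(client_side)

    # Clients send in reverse order; the loop serves whichever socket is ready.
    for conn_id, client in reversed(list(enumerate(clients))):
        client.sendall(f"hello from {conn_id}".encode())

    received = {}
    while len(received) < num_connections:
        for key, _events in selector.select(timeout=1.0):
            # The socket was reported ready, so recv does not block the loop.
            received[key.data] = key.fileobj.recv(1024)
            selector.unregister(key.fileobj)
            key.fileobj.close()

    for client in clients:
        client.close()
    selector.close()
    return received
```

One thread serves every connection here. Any CPU-heavy work placed inside the loop body would delay all other sockets, which is the failure mode the rule of thumb warns about.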

Resources:

📄 Dan Kegel — The C10K problem: http://www.kegel.com/c10k.html
🎥 Hussein Nasser — epoll / async IO explainers (YouTube @hnasser)
📄 High Performance Browser Networking (free): https://hpbn.co/

✅ You know it when: you can explain how a single-threaded event loop serves 100k connections and when that model hurts (CPU-bound work blocking the loop).

Producer–Consumer, Task Queues & Concurrency Models [MEDIUM · INTERMEDIATE]

What & why: The in-process analog of message queues; how you decouple and bound work.

Learn (inline):

Producer-consumer — bounded blocking queue between producers and consumers; smooths bursts, applies backpressure when full.
Thread pool / executor — fixed workers pull from a task queue; bound the pool to protect downstream (don't spawn unbounded threads).
Concurrency models: shared-memory + locks; actor model (Akka/Erlang — message-passing, no shared state); CSP (Go channels); fork-join (divide & conquer).
Backpressure is the theme: bounded queues + rejection/blocking policies prevent OOM under overload.
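
A bounded producer-consumer pipeline can be sketched with the standard queue.Queue, whose put() blocks when the queue is full. run_pipeline and the doubling "work" are hypothetical, and the sketch uses one producer and one consumer to keep ordering simple.

```python
import queue
import threading


def run_pipeline(num_items: int = 100, capacity: int = 5) -> tuple:
    buffer = queue.Queue(maxsize=capacity)  # bounded: put() blocks when full
    consumed = []
    max_depth = 0
    sentinel = object()                     # marks the end of the stream

    def producer() -> None:
        nonlocal max_depth
        for item in range(num_items):
            buffer.put(item)                # backpressure: waits for free space
            max_depth = max(max_depth, buffer.qsize())
        buffer.put(sentinel)

    def consumer() -> None:
        while True:
            item = buffer.get()
            if item is sentinel:
                break
            consumed.append(item * 2)       # stand-in for real work

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, max_depth
```

Memory use is capped by capacity no matter how fast the producer runs. A burst slows the producer down instead of growing the queue without bound.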

Resources:

📄 Jenkov — Blocking Queues / Thread Pools: https://jenkov.com/tutorials/java-concurrency/index.html
📄 Go blog — Concurrency patterns / Pipelines: https://go.dev/blog/pipelines

✅ You know it when: you can design a bounded producer-consumer pipeline that degrades gracefully (backpressure) instead of OOMing under a burst.

Latency Numbers Every Engineer Should Know [HIGH · FUNDAMENTAL]

What & why: The intuition behind every capacity/estimation answer. You must feel the orders of magnitude.

Learn (inline): approximate magnitudes —

L1 cache ~1 ns · L2 ~4 ns · main memory ~100 ns · mutex lock/unlock ~25 ns.
Read 1 MB sequentially from memory ~10 µs · SSD random read ~16 µs–100 µs · read 1 MB from SSD ~100–1000 µs.
Disk seek (HDD) ~1–10 ms · read 1 MB from disk ~1–20 ms.
Same-datacenter round trip ~0.5 ms · cross-region RTT ~50–150 ms · CA↔Netherlands ~150 ms.
Takeaways: memory is orders of magnitude faster than SSD, which in turn beats spinning disk; the network dominates cross-region calls; batching and caching exist to avoid the slow tiers.
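
To build a feel for the ratios, the magnitudes above can be put in a small table and compared. The dictionary uses the figures quoted in this card. Where the card gives a range, the upper end is used, which is an assumption made only to get a single number.

```python
NS = 1
US = 1_000 * NS
MS = 1_000 * US

# Approximate magnitudes from the list above (upper end where a range is given).
LATENCY_NS = {
    "l1_cache": 1 * NS,
    "main_memory": 100 * NS,
    "ssd_random_read": 100 * US,
    "hdd_seek": 10 * MS,
    "same_dc_rtt": 500 * US,
    "cross_region_rtt": 150 * MS,
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower one tier is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]
```

For example, ratio("cross_region_rtt", "main_memory") is 1,500,000 and ratio("ssd_random_read", "main_memory") is 1,000: one cross-region call costs about as much as 1.5 million memory references.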

Resources:

📄 Jeff Dean — Latency Numbers Every Programmer Should Know (jboner gist): https://gist.github.com/jboner/2841832
🛠️ Interactive latency (by year) — https://colin-scott.github.io/personal_website/research/interactive_latency.html

✅ You know it when: you can ballpark "serve from RAM vs SSD vs cross-region call" without looking it up, and use it in estimation.

Compute & OS Essentials [MEDIUM · FUNDAMENTAL]

What & why: Enough OS to reason about performance (cache effects, syscalls, memory).

Learn (inline):

Memory hierarchy & cache lines — locality matters; false sharing hurts multi-threaded perf.
Virtual memory & paging — page cache is why sequential file reads are fast; page faults are costly.
Syscalls & context switches — kernel/user boundary cost; batching syscalls (e.g. io_uring, sendmmsg) helps.
File descriptors & ulimits — connection limits are often fd limits.
CPU: cores, hyperthreads, NUMA — pin/isolate for latency-sensitive work.

Resources:

📖 OSTEP (free): https://pages.cs.wisc.edu/~remzi/OSTEP/
🎥 MIT 6.1810/6.033 lectures (OS)

✅ You know it when: you can explain why sequential I/O + page cache beats random I/O, and why fd limits cap connections.

1b — Computer Networks

OSI & TCP/IP Models [MEDIUM · FUNDAMENTAL]

What & why: The shared vocabulary; lets you place every other networking topic (TLS at L?, LB at L4 vs L7).

Learn (inline):

OSI 7 layers (Physical→Data-link→Network→Transport→Session→Presentation→Application); practical TCP/IP 4-layer (Link→Internet(IP)→Transport(TCP/UDP)→Application(HTTP…)).
What lives where: IP addressing/routing = L3; TCP/UDP ports = L4; TLS ≈ L6/presentation; HTTP = L7. L4 load balancer routes by IP/port; L7 by URL/headers.

Resources:

📄 Cloudflare Learning — OSI model: https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/
🎥 Hussein Nasser — Fundamentals of Networking series

✅ You know it when: you can say which layer TLS, HTTP, and an L4 vs L7 load balancer operate at, and why it matters.

TCP Internals (and TCP vs UDP) [HIGH · INTERMEDIATE]

What & why: TCP underpins most services; its behavior explains latency, head-of-line blocking, and connection cost.

Learn (inline):

3-way handshake (SYN/SYN-ACK/ACK) — 1 RTT of setup before data; TLS adds more RTTs on top (2 in TLS 1.2, 1 in TLS 1.3).
Reliability & ordering — sequence numbers, ACKs, retransmission.
Flow control (receiver window) vs congestion control (sender-side: slow start, congestion avoidance, CUBIC/BBR).
Head-of-line blocking — one lost segment stalls the ordered stream (motivates QUIC).
Cost — connections are stateful/expensive → connection pooling, keep-alive.
TCP vs UDP — TCP = reliable, ordered, congestion-controlled, heavier; UDP = fire-and-forget, no ordering/guarantees, low overhead (DNS, video, gaming, QUIC).
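
The connection-cost point can be turned into rough arithmetic. The model below is a simplification under stated assumptions: it counts only round trips (1 for the TCP handshake, tls_rtts for TLS, 1 for the request itself) and ignores slow start, server time and bandwidth. The function names are hypothetical.

```python
def request_latency_ms(rtt_ms: float, new_connection: bool, tls_rtts: int = 1) -> float:
    """Latency of one request, counted in round trips only."""
    setup_rtts = (1 + tls_rtts) if new_connection else 0  # TCP handshake + TLS
    return (setup_rtts + 1) * rtt_ms                      # +1 for request/response


def total_latency_ms(num_requests: int, rtt_ms: float, reuse: bool) -> float:
    """Sequential requests, either on one kept-alive connection or a fresh one each time."""
    if num_requests == 0:
        return 0.0
    first = request_latency_ms(rtt_ms, new_connection=True)
    rest = request_latency_ms(rtt_ms, new_connection=not reuse)
    return first + (num_requests - 1) * rest
```

With a 50 ms RTT and TLS 1.3, 100 sequential requests cost 15,000 ms on fresh connections and 5,100 ms on one reused connection. Slow start would widen the gap further.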

Resources:

📄 High Performance Browser Networking — TCP chapter (free): https://hpbn.co/building-blocks-of-tcp/
🎥 Hussein Nasser — TCP/IP deep dive

✅ You know it when: you can explain why many short connections are slow (handshake + slow start) and how keep-alive/pooling fix it, plus when to choose UDP.

UDP & QUIC [MEDIUM · INTERMEDIATE]

What & why: QUIC (HTTP/3's transport) is the modern answer to TCP's limitations; increasingly asked.

Learn (inline):

UDP — connectionless datagrams, no reliability/ordering/congestion control by default; you add what you need on top.
QUIC — runs over UDP; combines transport + TLS 1.3; 0-RTT/1-RTT connection setup; per-stream delivery eliminates TCP's head-of-line blocking; connection migration (survives IP change, e.g. Wi-Fi→cellular).
HTTP/3 = HTTP over QUIC.

Resources:

📄 Cloudflare — HTTP/3 & QUIC: https://blog.cloudflare.com/http3-the-past-present-and-future/
📜 RFC 9000 (QUIC)

✅ You know it when: you can explain what QUIC fixes over TCP+TLS (HOL blocking, setup RTTs, migration) and when it helps.

DNS & TLS [HIGH · INTERMEDIATE]

What & why: Every request starts with DNS; every secure request does a TLS handshake. Both add latency and are common design levers (GeoDNS, TLS termination).

Learn (inline):

DNS — resolver → root → TLD → authoritative; caching + TTLs; record types (A/AAAA/CNAME/MX); anycast & GeoDNS route users to nearest POP; DNS as a crude load-balancer/failover.
TLS — handshake negotiates cipher + establishes keys; certificates + chain of trust (CA); TLS 1.3 = 1-RTT (0-RTT resumption); mTLS authenticates both sides (service-to-service). TLS termination at LB/edge offloads crypto.

Resources:

📄 howdns.works (illustrated): https://howdns.works/
📄 Cloudflare — What happens in a TLS handshake: https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/

✅ You know it when: you can trace a URL from DNS resolution → TCP → TLS → HTTP, and explain GeoDNS routing and TLS termination.

HTTP (1.1 / 2 / 3) + REST / gRPC / GraphQL (deep dive) [HIGH · DEEP-DIVE]

Why deep-dive: The application-layer contract for almost everything; API-style choice is a recurring design decision.

Deep-dive map:

[ ] HTTP basics — methods, status codes, headers, idempotency/safety, caching headers (ETag, Cache-Control).
[ ] HTTP/1.1 vs 2 vs 3 — persistent connections & pipelining limits (1.1); multiplexing + header compression (2) but TCP HOL blocking remains; HTTP/3 over QUIC removes HOL.
[ ] REST — resources, verbs, statelessness, versioning, pagination, HATEOAS (in theory).
[ ] gRPC — HTTP/2 + Protobuf, streaming (uni/bi-directional), strong contracts; great for internal service-to-service.
[ ] GraphQL — client-specified queries, one endpoint, over/under-fetching fix; N+1 & caching challenges.
[ ] Choosing — public API (REST/GraphQL) vs internal RPC (gRPC); latency, tooling, streaming needs.
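
As one concrete piece of the "HTTP basics" item, here is a minimal sketch of ETag-based conditional GET. It is framework-free and simplified: make_etag and conditional_get are hypothetical helpers, the ETag is just a truncated SHA-256 of the body, and a real server would also handle lists of validators and weak ETags in If-None-Match.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # A strong validator derived from the content; quoted as HTTP requires.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, request_headers: dict) -> tuple:
    """Return (status, response_headers, payload) for a GET on this resource."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""   # client copy is current: send no body
    return 200, headers, body
```

The first request returns 200 with the ETag. A later request that echoes it in If-None-Match gets a 304 with an empty body, saving bandwidth while still revalidating freshness.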

Resources:

📄 MDN — HTTP: https://developer.mozilla.org/en-US/docs/Web/HTTP
📄 High Performance Browser Networking — HTTP/2 chapter (free): https://hpbn.co/http2/
📄 gRPC docs: https://grpc.io/docs/what-is-grpc/introduction/
🎥 Hussein Nasser — HTTP/1.1 vs 2 vs 3, gRPC

✅ You know it when: you can justify REST vs gRPC vs GraphQL for a given service and explain what HTTP/2 multiplexing and HTTP/3 (QUIC) each solve.

Realtime Transports: WebSockets, SSE, Long/Short Polling [HIGH · INTERMEDIATE]

What & why: How you push server→client updates (chat, notifications, live dashboards, streaming LLM tokens). A very common design sub-decision.

Learn (inline):

Short polling — client asks every N s; simple, wasteful, laggy.
Long polling — server holds the request until data or timeout; near-realtime over plain HTTP; connection churn.
SSE (Server-Sent Events) — one-way server→client stream over HTTP; auto-reconnect, simple; perfect for feeds / streaming LLM tokens. Text only, one direction.
WebSockets — full-duplex, persistent TCP; best for chat/games/collab; needs stateful connection management, scaling via pub/sub fan-out.
Choosing: need bidirectional? WebSocket. Server→client only? SSE. Simplest/occasional? (long) polling.
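
SSE is simple partly because its wire format is plain text: optional id: and event: fields, one data: line per payload line, and a blank line to end the event. A minimal formatter sketch follows. format_sse is a hypothetical helper, and the "token" event name is made up for the LLM-streaming example.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")      # lets the client resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")        # multi-line payloads become several data lines
    return "\n".join(lines) + "\n\n"         # a blank line terminates the event
```

A server streaming LLM tokens would write format_sse(token, event="token") to the open response for each token and flush. The browser's EventSource handles reconnection.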

Resources:

📄 Ably — WebSockets vs SSE vs long polling: https://ably.com/blog/websockets-vs-sse
📄 MDN — Server-Sent Events: https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events

✅ You know it when: you can pick the right transport for a chat app vs a live price ticker vs streaming LLM responses, and explain how to scale WebSockets with a pub/sub backplane.

Back-of-the-Envelope Estimation [HIGH · FUNDAMENTAL]

What & why: The step that sizes every design (servers, storage, bandwidth, cache). Interviewers expect fluent numbers.

Learn (inline):

Method: DAU → requests/day → QPS (÷86,400; peak ≈ 2–10× average) → per-request CPU/IO → #servers. Storage = objects × size × retention × replication. Bandwidth = QPS × payload.
Handy numbers: 1 day = 86,400 s (~10⁵); 1M req/day ≈ 12 QPS; an ASCII char = 1 byte, a typical row/KV is hundreds of bytes; 2^10≈10³, 2^20≈10⁶, 2^30≈10⁹.
Read/write ratio drives caching & replica strategy; hot vs cold data drives tiering.
Always state assumptions out loud; round aggressively.
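
The method above can be written down as a small calculator. Every input in the usage note (10M DAU, 20 requests and 2 writes per user per day, 500-byte writes, 5× peak, 1 year retention, 3× replication) is a made-up assumption for illustration, not a figure for any real system.

```python
SECONDS_PER_DAY = 86_400


def estimate(
    dau: int,
    requests_per_user_per_day: float,
    peak_factor: float,
    writes_per_user_per_day: float,
    bytes_per_write: int,
    retention_days: int,
    replication: int,
) -> dict:
    """DAU -> requests/day -> QPS -> peak QPS, plus storage per day and in total."""
    avg_qps = dau * requests_per_user_per_day / SECONDS_PER_DAY
    storage_per_day = dau * writes_per_user_per_day * bytes_per_write
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * peak_factor,
        "storage_bytes_per_day": storage_per_day,
        "storage_bytes_total": storage_per_day * retention_days * replication,
    }
```

With the assumed inputs, estimate(10_000_000, 20, 5, 2, 500, 365, 3) gives about 2,300 average QPS, about 11,600 peak QPS, 10 GB of new data per day, and roughly 11 TB stored after a year with replication. In an interview you would round these to 2k, 12k, 10 GB and 11 TB.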

Resources:

📄 "Numbers Everyone Should Know" (Jeff Dean) + Alex Xu — System Design Interview Vol 1, estimation chapter.
🎥 ByteByteGo — Back-of-the-envelope estimation (YouTube @ByteByteGo)

✅ You know it when: given "design Twitter", you can quickly estimate QPS, storage/day, and cache size with stated assumptions.

Phase 2 · Low-Level Design & Design Patterns

Clean, extensible, testable code design + machine-coding rounds. Learned early (before big systems) because HLD components are built from well-designed classes. For Staff/Architect: emphasis on tradeoffs, extensibility, and knowing when NOT to apply a pattern.

2a — Principles

SOLID Principles [HIGH · FUNDAMENTAL]

What & why: The five principles that keep OO code changeable. Interviewers expect you to name the violation and refactor.

Learn (inline):

S — Single Responsibility: a class has one reason to change (e.g. split Invoice printing, persistence, and calculation into separate classes).
O — Open/Closed: open for extension, closed for modification. Add behavior via new subtypes/strategies, not by editing existing code (avoid growing if/switch on type).
L — Liskov Substitution: subtypes must be usable anywhere the base type is, without surprises (the classic Square extends Rectangle violation).
I — Interface Segregation: many small focused interfaces beat one fat interface; clients shouldn't depend on methods they don't use.
D — Dependency Inversion: depend on abstractions, not concretions; high-level modules shouldn't depend on low-level ones (enables DI/testing).
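
Dependency Inversion (and the testability it buys) can be sketched with a typing.Protocol as the abstraction. OrderService, Notifier, EmailNotifier and RecordingNotifier are hypothetical names, and EmailNotifier only prints instead of sending anything.

```python
from typing import Protocol


class Notifier(Protocol):
    """The abstraction the high-level module depends on."""

    def send(self, message: str) -> None: ...


class EmailNotifier:
    def send(self, message: str) -> None:
        print(f"emailing: {message}")   # stand-in for a real mail client


class RecordingNotifier:
    """Test double: records messages instead of sending them."""

    def __init__(self) -> None:
        self.messages = []

    def send(self, message: str) -> None:
        self.messages.append(message)


class OrderService:
    # High-level policy: knows only the Notifier abstraction, never a concrete class.
    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def place_order(self, order_id: str) -> None:
        self._notifier.send(f"order {order_id} placed")
```

Adding an SMS or push notifier means writing a new class, not editing OrderService, which is also Open/Closed in action. Tests inject RecordingNotifier and assert on its messages.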

Resources:

📄 Baeldung — SOLID Principles: https://www.baeldung.com/solid-principles
📄 Robert C. Martin (Uncle Bob) — original SOLID articles / Clean Architecture (book)
🎥 Christopher Okhravi — SOLID playlist (YouTube)

✅ You know it when: given a fat class or a type-switch, you can name which SOLID principle it breaks and refactor it.

Composition vs Inheritance [HIGH · FUNDAMENTAL]

What & why: Overusing inheritance creates rigid, fragile hierarchies; composition is the flexible default ("favor composition over inheritance").

Learn (inline):

Inheritance = "is-a"; couples subclass to base implementation; deep hierarchies are brittle (fragile base class).
Composition = "has-a"; assemble behavior from parts (Strategy/Decorator lean on this); swap at runtime; easier to test.
Rule: use inheritance for true is-a + stable base; otherwise compose. Prefer interfaces for polymorphism.
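
A minimal sketch of composing behavior instead of subclassing, using the duck example from this card's mastery check. Duck and the two behavior functions are hypothetical. Plain functions serve as the pluggable behaviors, which is one idiomatic Python choice among several.

```python
from typing import Callable


def fly_with_wings() -> str:
    return "flaps its wings"


def cannot_fly() -> str:
    return "stays on the ground"


class Duck:
    """One Duck class; the flying behavior is a part it has, not a subclass it is."""

    def __init__(self, name: str, fly_behavior: Callable[[], str]) -> None:
        self.name = name
        self._fly_behavior = fly_behavior

    def set_fly_behavior(self, fly_behavior: Callable[[], str]) -> None:
        self._fly_behavior = fly_behavior   # swappable at runtime

    def fly(self) -> str:
        return f"{self.name} {self._fly_behavior()}"
```

A mallard and a rubber duck are both Duck instances with different behaviors plugged in. A new behavior such as rocket-powered flight needs a new function, not a new branch of the class hierarchy.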

Resources:

📄 Refactoring.Guru — Strategy (composition over inheritance in practice): https://refactoring.guru/design-patterns/strategy (and the intro essays)
🎥 Christopher Okhravi — Composition over Inheritance

✅ You know it when: you can convert a brittle subclass explosion (e.g. FlyingDuck, RubberDuck…) into composed behaviors.

DRY · KISS · YAGNI · Cohesion & Coupling [MEDIUM · FUNDAMENTAL]

What & why: The everyday heuristics for maintainable design; cohesion/coupling is the structural quality metric.

Learn (inline):

DRY — one authoritative place for each piece of knowledge (but beware wrong abstraction; some duplication is cheaper than the wrong DRY).
KISS / YAGNI — simplest thing that works; don't build for imagined futures.
Cohesion — how focused a module is (high = good). Coupling — how dependent modules are (low = good). Aim for high cohesion, low coupling; it's what makes systems changeable and testable.

Resources:

📄 Martin Fowler — Reducing Coupling, BeckDesignRules: https://martinfowler.com/

✅ You know it when: you can critique a module as "low cohesion / high coupling" and propose a split that fixes it.

2b — Design Patterns

Creational Patterns [HIGH · INTERMEDIATE]

Learn (inline):

Factory Method / Abstract Factory — create objects without hard-coding concrete classes; decouple construction from use (e.g. PaymentFactory.create("stripe")).
Builder — construct complex objects step-by-step; avoids telescoping constructors (e.g. HttpRequest.builder()...build()).
Singleton — one instance, global access; make it thread-safe (enum / holder idiom / double-checked locking); use sparingly (hidden global state, hard to test).
Prototype — clone an existing object instead of building anew (expensive-to-create objects).
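
Two of the patterns above, sketched briefly: a Builder for a request object and a thread-safe lazy Singleton using double-checked locking. HttpRequest, HttpRequestBuilder and AppConfig are hypothetical classes for illustration, not the API of any real HTTP library.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpRequest:
    method: str
    url: str
    headers: dict
    timeout_s: float


class HttpRequestBuilder:
    """Step-by-step construction with defaults, instead of a telescoping constructor."""

    def __init__(self, url: str) -> None:
        self._url = url
        self._method = "GET"
        self._headers = {}
        self._timeout_s = 30.0

    def method(self, method: str) -> "HttpRequestBuilder":
        self._method = method
        return self                      # returning self enables chaining

    def header(self, name: str, value: str) -> "HttpRequestBuilder":
        self._headers[name] = value
        return self

    def timeout(self, seconds: float) -> "HttpRequestBuilder":
        self._timeout_s = seconds
        return self

    def build(self) -> HttpRequest:
        return HttpRequest(self._method, self._url, dict(self._headers), self._timeout_s)


class AppConfig:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def instance(cls) -> "AppConfig":
        if cls._instance is None:            # fast path without the lock
            with cls._lock:
                if cls._instance is None:    # re-check under the lock
                    cls._instance = cls()
        return cls._instance
```

Usage reads as HttpRequestBuilder(url).method("POST").header("Accept", "application/json").timeout(2.5).build(). In Python a module-level object is often a simpler singleton, so the locking version is shown mainly because interviews ask for it.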

Resources:

📄 Refactoring.Guru — Creational patterns: https://refactoring.guru/design-patterns/creational-patterns
📖 Head First Design Patterns (very approachable)

✅ You know it when: you can choose Factory vs Builder for a given construction problem and write a thread-safe Singleton.

Structural Patterns HIGHINTERMEDIATE

Learn (inline):

Adapter — make incompatible interfaces work together (wrap a legacy/3rd-party API).
Decorator — add responsibilities dynamically by wrapping (e.g. BufferedInputStream, add-ons on a coffee order); composition-based, avoids subclass explosion.
Facade — a simple front over a complex subsystem.
Proxy — stand-in controlling access (lazy loading, caching, access control, remote proxy/RPC stub).
Composite — treat individual objects and groups uniformly (file/folder trees, UI components).

Resources:

📄 Refactoring.Guru — Structural patterns: https://refactoring.guru/design-patterns/structural-patterns

✅ You know it when: you can explain how Decorator adds behavior without inheritance and where a Proxy adds caching/access control.
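
The coffee add-on example can be sketched as follows. The class names and prices (in cents) are illustrative assumptions. Each add-on wraps another `Coffee` and forwards to it, so combinations are composed at runtime instead of being enumerated as subclasses.

```python
class Coffee:
    """Component interface plus the plain base drink (prices in cents)."""

    def cost(self) -> int:
        return 200

    def description(self) -> str:
        return "coffee"


class AddOn(Coffee):
    """Base decorator: same interface, forwards to the wrapped drink."""

    def __init__(self, inner: Coffee) -> None:
        self._inner = inner

    def cost(self) -> int:
        return self._inner.cost()

    def description(self) -> str:
        return self._inner.description()


class Milk(AddOn):
    def cost(self) -> int:
        return self._inner.cost() + 50

    def description(self) -> str:
        return self._inner.description() + " + milk"


class Caramel(AddOn):
    def cost(self) -> int:
        return self._inner.cost() + 75

    def description(self) -> str:
        return self._inner.description() + " + caramel"


# Behaviors stack by wrapping; no MilkCaramelCoffee subclass is needed.
order = Caramel(Milk(Coffee()))
```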

Behavioral Patterns HIGHINTERMEDIATE

Learn (inline):

Strategy — swap interchangeable algorithms at runtime (payment methods, sorting, pricing rules); the go-to for Open/Closed.
Observer — publish/subscribe within a process; notify dependents on state change (event listeners, model→view).
State — object changes behavior with internal state (order lifecycle, vending machine); removes big state switch.
Command — encapsulate a request as an object (undo/redo, queues, task scheduling).
Chain of Responsibility — pass a request along handlers until one handles it (middleware, validation pipelines).
Iterator / Template Method — traverse without exposing internals / define an algorithm skeleton with overridable steps.

Resources:

📄 Refactoring.Guru — Behavioral patterns: https://refactoring.guru/design-patterns/behavioral-patterns
🎥 Christopher Okhravi — Design Patterns playlist

✅ You know it when: you reach for Strategy/State/Observer by name when a design shows the smell they cure.
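
As one possible illustration of Strategy for pricing rules, the sketch below treats a strategy as any callable from price to price. `Checkout`, `regular`, and `percent_off` are hypothetical names. New rules are added without editing `Checkout`, which is the Open/Closed point made above.

```python
from typing import Callable, Iterable

# A pricing strategy maps a price in cents to a price in cents.
PricingStrategy = Callable[[int], int]


def regular(price_cents: int) -> int:
    return price_cents


def percent_off(percent: int) -> PricingStrategy:
    """Build a discount strategy; integer math keeps cents exact."""
    def apply(price_cents: int) -> int:
        return price_cents * (100 - percent) // 100
    return apply


class Checkout:
    """Context: delegates pricing to whichever strategy it currently holds."""

    def __init__(self, strategy: PricingStrategy) -> None:
        self.strategy = strategy

    def total(self, prices: Iterable[int]) -> int:
        return sum(self.strategy(p) for p in prices)
```

Swapping behavior at runtime is a plain assignment, for example `checkout.strategy = percent_off(20)`.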

UML & Modeling for LLD LOWFUNDAMENTAL

What & why: Communicating a design fast in interviews and docs.

Learn (inline): class diagrams (association/aggregation/composition/inheritance, multiplicity), sequence diagrams (interaction over time). Translate requirements → nouns (classes) + verbs (methods) + relationships.

Resources: 📄 Refactoring.Guru UML basics; 🛠️ PlantUML (text-to-diagram): https://plantuml.com/

✅ You know it when: you can sketch a class + sequence diagram for a small system in minutes.

Concurrency in LLD MEDIUMINTERMEDIATE

What & why: Many LLD problems (rate limiter, in-memory KV, logger, parking lot) require thread-safety.

Learn (inline): identify shared mutable state; guard with the right primitive (Phase 1); prefer immutable objects and concurrent collections; thread-safe Singleton; producer-consumer for async logging. State invariants that must hold under concurrency.

Resources: 📖 Java Concurrency in Practice; 📄 Jenkov concurrency.

✅ You know it when: you can make a shared counter/logger/rate-limiter thread-safe and justify the primitive.
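
A small sketch of guarding shared mutable state, assuming a hypothetical `SafeCounter` class and `hammer` helper. The invariant is that `value` equals the number of completed `increment` calls, and a mutex is the primitive chosen because the read-modify-write must be atomic.

```python
import threading


class SafeCounter:
    """Invariant: value == number of completed increment() calls."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write below is not atomic without the lock.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: SafeCounter, n_threads: int = 8, per_thread: int = 10_000) -> int:
    """Increment from several threads at once and return the final value."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```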

API Design (LLD contract) HIGHINTERMEDIATE

What & why: The interface is the contract; good APIs are the difference between reusable and painful components/services.

Learn (inline): resource naming & verbs; versioning (URI vs header); pagination (offset vs cursor — cursor for large/changing sets); idempotency (keys on writes); consistent error contracts (codes + machine-readable body); filtering/sorting; rate-limit headers; backward compatibility (additive changes).

Resources:

📄 Google — API Design Guide: https://cloud.google.com/apis/design
📄 Microsoft REST API Guidelines: https://github.com/microsoft/api-guidelines
📄 Stripe/GitHub public API docs (exemplars)

✅ You know it when: you can design a versioned, paginated, idempotent REST resource with a clean error contract.
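
The sketch below illustrates two of the ideas above with an in-memory stand-in for a service: cursor pagination and idempotency keys on writes. All names (`list_items`, `create_payment`, the `next_cursor` field) are assumptions for illustration, not any particular API's contract.

```python
import base64
import json
from typing import Dict, Optional

ITEMS = [{"id": i, "name": f"item-{i}"} for i in range(1, 8)]


def encode_cursor(last_id: int) -> str:
    """Opaque cursor: clients pass it back without interpreting it."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def list_items(limit: int = 3, cursor: Optional[str] = None) -> dict:
    if limit < 1:
        raise ValueError("limit must be >= 1")
    after = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    rows = [item for item in ITEMS if item["id"] > after][: limit + 1]
    has_more = len(rows) > limit
    page = rows[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}


_responses: Dict[str, dict] = {}


def create_payment(idempotency_key: str, amount_cents: int) -> dict:
    """A retry with the same key returns the stored response, not a new charge."""
    if idempotency_key in _responses:
        return _responses[idempotency_key]
    response = {"payment_id": len(_responses) + 1, "amount_cents": amount_cents}
    _responses[idempotency_key] = response
    return response
```

Because the cursor encodes "after id N" rather than a row offset, inserts or deletes earlier in the set do not shift later pages, which is why cursors suit large or changing collections.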

2c — 🧩 Machine-Coding / LLD Problems

LLD problem practice HIGHDEEP-DIVE

How to approach each: clarify requirements → identify entities/classes & relationships → apply SOLID + the right pattern(s) → define interfaces/APIs → handle concurrency & edge cases → note extensibility.

Problems (build these):

Core: Parking lot · Elevator system · Vending machine · ATM · Snake & Ladder · Tic-Tac-Toe · Chess.
Product-like: URL shortener · Splitwise · BookMyShow (booking + seat locking) · Ride-hailing · Food-ordering cart · Notification service · Meeting scheduler.
Systems-flavored: LRU cache · Rate limiter · In-memory KV store · Logger library · File system · Text editor.

Resources:

📄 GitHub — awesome-low-level-design (ashishps1): https://github.com/ashishps1/awesome-low-level-design
📄 Educative — Grokking the Low Level Design / OOD Interview
🎥 Gaurav Sen / Arpit Bhayani — LLD walkthroughs

✅ You know it when: you can go from a 1-line prompt (e.g. "design a parking lot") to clean classes + interfaces + patterns + concurrency handling in ~40 min.

3Storage Infrastructure & Databases

How databases work inside, how to choose among them, and how to scale them. This is the densest phase — the per-database deep-dive cards are where a lot of staff-level signal lives. Anchor text: DDIA (Designing Data-Intensive Applications, Kleppmann) — chapters map directly onto these cards.

Core vs enrichment (prioritize principles over implementations): for top product companies (Google/Meta/Stripe/Airbnb/Uber), the principles matter most — storage engines (B-tree vs LSM), transactions/isolation/MVCC, replication & consistency, sharding & caching — plus the canonical papers (Dynamo, Spanner/Bigtable) for the why. Treat individual store deep-dives as 🎓 advanced enrichment: know one of each family deeply (Redis · Cassandra · DynamoDB) and the rest (ZippyDB, MongoDB, graph, extra storage engines) as enrichment — don't memorize every implementation detail.

3a — Storage Engine Internals & Indexing

Storage Engines: B-Tree vs LSM-Tree (deep dive) HIGHDEEP-DIVE

Why deep-dive: The highest-leverage DB internal to understand: it explains read/write performance, and why Postgres/MySQL feel different from Cassandra/RocksDB.

Deep-dive map:

[ ] B-Tree — in-place updates, balanced tree, ~O(log n) reads/writes, read-optimized; powers most RDBMS (Postgres/MySQL InnoDB). Write-ahead log (WAL) for durability.
[ ] LSM-Tree — buffer writes in an in-memory memtable → flush to immutable sorted SSTables on disk → background compaction merges them. Write-optimized (sequential writes), reads may touch many SSTables (mitigated by bloom filters + level structure).
[ ] Compaction strategies — size-tiered (STCS) vs leveled (LCS); write vs space vs read amplification tradeoffs.
[ ] Read/Write/Space amplification — the three-way tradeoff; how each engine leans.
[ ] When which — read-heavy/range-scan → B-tree; write-heavy/ingest → LSM.

Resources: 📖 DDIA ch.3; 📜 O'Neil et al. — The Log-Structured Merge-Tree; 📄 RocksDB wiki (LSM in practice): https://github.com/facebook/rocksdb/wiki; 🎥 Arpit Bhayani — LSM trees / storage engines.

✅ You know it when: you can explain why Cassandra ingests writes fast but may read-amplify, and why compaction exists.
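
The LSM write and read paths in the map above can be sketched with a toy in-memory model. `TinyLSM` is a hypothetical class: there is no WAL, no bloom filter, and no deletes, and "disk" is just a Python list. It is only meant to show why writes are cheap, why reads may touch several SSTables, and what compaction buys.

```python
import bisect
from typing import Dict, List, Optional, Tuple

SSTable = List[Tuple[str, str]]  # immutable, sorted by key


class TinyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}
        self.sstables: List[SSTable] = []   # oldest first, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory; a flush later writes one sorted run.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Read amplification: newest table first, possibly all of them.
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self) -> None:
        """Merge all runs into one; newer values overwrite older ones."""
        merged: Dict[str, str] = {}
        for table in self.sstables:         # oldest to newest
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []
```

After several flushes a key may exist in multiple SSTables with different values. The newest wins on read, and compaction discards the shadowed copies, reclaiming space and shortening future reads.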

Index Types HIGHINTERMEDIATE

Learn (inline):

Primary vs secondary — in clustered engines (e.g. MySQL InnoDB) the primary index determines row order on disk; secondary indexes are extra lookup structures that point back to the primary key or row location.
Hash index — O(1) equality lookups, no range scans (in-memory KV, some engines).
B-tree index — range + prefix + sort; the default RDBMS index.
Composite index — multi-column; left-prefix rule (order matters).
Covering index — includes all queried columns → index-only scan (no table lookup).
Geospatial — R-tree / quadtree / geohash / S2 for "near me" queries.
Bitmap index — great for low-cardinality columns in analytics (OLAP), poor for high-write OLTP.
Inverted index — term → doc list; powers full-text search (Phase 5).

Resources: 📖 DDIA ch.3; 📄 Use The Index, Luke! (SQL indexing): https://use-the-index-luke.com/

✅ You know it when: you can pick indexes for a given query set and explain the left-prefix rule + covering index.
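
One way to see the left-prefix rule and a covering index in action is to ask SQLite for its query plan. The sketch uses Python's built-in `sqlite3` module with a hypothetical `orders` table. The exact plan wording differs between SQLite versions, so the code only inspects the plan text instead of assuming a fixed output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, status TEXT, total INT)"
)
# Composite index: usable for (user_id) and (user_id, status), not status alone.
conn.execute("CREATE INDEX idx_user_status ON orders (user_id, status)")


def plan(sql: str) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filters on the leading column: the index can be used.
uses_prefix = plan("SELECT total FROM orders WHERE user_id = 7")

# Skips the leading column: expect a full scan instead.
skips_prefix = plan("SELECT total FROM orders WHERE status = 'paid'")

# Every referenced column lives in the index: an index-only (covering) read.
covered = plan("SELECT status FROM orders WHERE user_id = 7")

for label, text in [("prefix", uses_prefix), ("no prefix", skips_prefix), ("covering", covered)]:
    print(f"{label}: {text}")
```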

3b — Concurrency, Transactions & Isolation

ACID & Transactions HIGHFUNDAMENTAL

Learn (inline): Atomicity (all-or-nothing), Consistency (invariants preserved — app + DB), Isolation (concurrent txns don't corrupt each other), Durability (committed = survives crash, via WAL/fsync). Not all "NoSQL" is non-ACID (many now support transactions). "Consistency" in ACID ≠ consistency in CAP.

Resources: 📖 DDIA ch.7; 📄 PostgreSQL docs — transactions.

✅ You know it when: you can define each letter precisely and note ACID-C ≠ CAP-C.

Isolation Levels & Anomalies HIGHINTERMEDIATE

Learn (inline):

Levels (weakest→strongest): Read Uncommitted → Read Committed → Repeatable Read / Snapshot → Serializable.
Anomalies: dirty read, non-repeatable read, phantom read, lost update, write skew.
Each stronger level prevents more anomalies but costs more (locking, aborts, or retries). Serializable = as-if one-at-a-time (via 2PL, SSI, or actual serial execution).
Know your DB's default (Postgres = Read Committed; MySQL InnoDB = Repeatable Read) and that snapshot isolation still allows write skew.

Resources: 📜 Berenson et al. — A Critique of ANSI SQL Isolation Levels; 🛠️ Kleppmann — Hermitage (isolation behaviors across DBs): https://github.com/ept/hermitage; 📖 DDIA ch.7.

✅ You know it when: you can map an anomaly (e.g. write skew) to the weakest isolation level that prevents it.

MVCC (Multi-Version Concurrency Control) MEDIUMINTERMEDIATE

Learn (inline): readers see a consistent snapshot without blocking writers (and vice-versa) by keeping multiple row versions; under snapshot isolation each txn reads the versions that were committed when it started (under Read Committed in Postgres, each statement sees a fresh snapshot). Enables snapshot isolation. Cost: version bloat → vacuum/GC (Postgres autovacuum). Used by Postgres, MySQL InnoDB, Oracle.

Resources: 📄 PostgreSQL docs — MVCC; 📖 DDIA ch.7.

✅ You know it when: you can explain how a long read doesn't block writes under MVCC, and what vacuum cleans up.
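
A toy model of the idea, assuming a hypothetical `MVCCStore` with a single logical commit counter and auto-committing writes (real engines track transaction IDs and visibility rules in much more detail). It shows a reader keeping its snapshot while a writer adds a newer version, and what a vacuum pass may discard.

```python
from typing import Dict, List, Optional, Tuple


class MVCCStore:
    def __init__(self) -> None:
        # key -> versions as (commit_ts, value), in ascending commit order
        self._versions: Dict[str, List[Tuple[int, str]]] = {}
        self._clock = 0

    def begin(self) -> int:
        """Start a reader; its snapshot sees commits with ts <= this value."""
        return self._clock

    def write(self, key: str, value: str) -> int:
        """Commit a new version; older versions stay for existing snapshots."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        """Return the newest version visible to the snapshot, without locking."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def vacuum(self, oldest_active_snapshot: int) -> None:
        """Drop versions that no active snapshot can still see."""
        for key, versions in self._versions.items():
            visible = [v for v in versions if v[0] <= oldest_active_snapshot]
            # Keep the newest version the oldest snapshot sees, plus all newer.
            keep_from = max(len(visible) - 1, 0)
            self._versions[key] = versions[keep_from:]
```

A long-running reader holds `oldest_active_snapshot` back, so vacuum cannot reclaim old versions; that is the mechanism behind version bloat.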

Pessimistic vs Optimistic Locking HIGHINTERMEDIATE

Learn (inline):

Pessimistic — lock rows up front (SELECT ... FOR UPDATE); good under high contention; risks blocking/deadlock.
Optimistic — no locks; read a version, then update WHERE version = X; retry on conflict; great under low contention / high read (also how you do it over stateless HTTP with an ETag/version).
Choose by contention level and whether you can retry.

Resources: 📄 Martin Fowler — Optimistic/Pessimistic Offline Lock; 📖 DDIA ch.7.

✅ You know it when: you can pick optimistic vs pessimistic for "inventory decrement under heavy contention" vs "edit a rarely-touched profile".
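
The optimistic `UPDATE ... WHERE version = X` step can be sketched with Python's built-in `sqlite3` module. The `profile` table and function names are illustrative assumptions. The key point is that the version check and the write happen in one statement, and the affected-row count tells the caller whether it lost the race.

```python
import sqlite3
from typing import Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO profile VALUES (1, 'Ada', 1)")
conn.commit()


def read_profile(profile_id: int) -> Tuple[str, int]:
    """Return (name, version); the caller remembers the version it saw."""
    return conn.execute(
        "SELECT name, version FROM profile WHERE id = ?", (profile_id,)
    ).fetchone()


def update_name(profile_id: int, new_name: str, expected_version: int) -> bool:
    """Write only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE profile SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, profile_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1   # 0 rows means a conflict: re-read and retry
```

Over HTTP the same check typically maps to sending the version as an ETag and rejecting a stale `If-Match`.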

3c — Choosing & Deep Dives: SQL vs NoSQL

SQL vs NoSQL — choosing HIGHINTERMEDIATE

Learn (inline): model to access patterns first. SQL: strong consistency, joins, ad-hoc queries, transactions; vertical scale + read replicas; when relationships & integrity matter. NoSQL families: KV (speed, cache, sessions), wide-column (huge write throughput, time/partition keys), document (flexible schema, nested data), graph (relationship traversal). NoSQL trades joins and ad-hoc query flexibility for scale/availability; you denormalize and design per query.

Resources: 📖 DDIA ch.2; 📄 System Design Primer — databases: https://github.com/donnemartin/system-design-primer#database

✅ You know it when: given a workload you can justify a family and a specific store, and name what you're giving up.

Redis (deep dive) HIGHDEEP-DIVE

Deep-dive map: [ ] data structures (string/hash/list/set/zset/stream/bitmap/HLL/geo) & when each shines · [ ] single-threaded event loop + why it's fast · [ ] persistence: RDB snapshots vs AOF · [ ] eviction policies & TTL · [ ] replication + Sentinel (HA) · [ ] Redis Cluster (hash slots, resharding) · [ ] pub/sub & streams · [ ] distributed locks (Redlock — and its critiques) · [ ] use cases: cache, rate limiter, leaderboard (zset), session store, queue.

Resources: 📄 redis.io/docs; 📖 Redis in Action (free); 🎥 Hussein Nasser — Redis internals; 📄 Kleppmann — How to do distributed locking (Redlock critique).

✅ You know it when: you can build a leaderboard with zsets, a rate limiter, and reason about RDB vs AOF durability + Cluster resharding.

⚖️ When NOT / why-not: not a durable system of record (data can be lost between snapshots; AOF has a write-perf cost) → use a DB and cache-aside. Not for datasets ≫ RAM → too expensive vs disk-based stores. Redlock for locking is contested → prefer a real lock service (etcd/ZooKeeper) for correctness-critical locks.

🎤 Interview probes: "Why Redis and not Memcached here?" (data structures, persistence, replication vs pure LRU cache) · "How do you handle a hot key?" · "What happens on failover — can you lose writes?"

⚠️ Common mistakes: treating it as durable; ignoring eviction under memory pressure; one giant key/hot partition; assuming Redlock is safe for money-critical mutual exclusion.

🚩 Interview red flags: "Redis is durable like a database" ❌ · "Just cache everything" ⚠️ (invalidation + staleness cost) · "Redlock gives correct distributed locks" ⚠️ · "Redis is single-threaded so it can't scale" ❌ (Cluster + replicas).

🎯 Interview signal: they want to see you reason about cache patterns, invalidation, durability tradeoffs, and hot-key/failover handling — not memorize every command.

Cassandra / Wide-Column (deep dive) HIGHDEEP-DIVE

Deep-dive map: [ ] data model (partition key + clustering columns; query-first design) · [ ] ring + consistent hashing + vnodes · [ ] leaderless replication, tunable consistency (R/W/QUORUM, R+W>N) · [ ] write path (commit log → memtable → SSTable) = LSM · [ ] read path + bloom filters + compaction · [ ] hinted handoff, read repair, anti-entropy (Merkle trees) · [ ] no joins/limited secondary indexes → denormalize · [ ] HBase/BigTable contrast (master-based, strong-consistent).

Resources: 📄 cassandra.apache.org/doc + DataStax docs; 📜 Dynamo paper (the lineage); 📖 Cassandra: The Definitive Guide; 📜 Google Bigtable paper.

✅ You know it when: you can design a Cassandra table for a given query, choose consistency levels, and explain why it ingests writes fast.

⚖️ When NOT / why-not: avoid when you need ad-hoc queries, joins, or multi-partition transactions, or when write patterns don't map to a known partition key → use an RDBMS. Read-heavy with complex filters → not its sweet spot. Small data → operational overhead not worth it.

🎤 Interview probes: "Why Cassandra and not Postgres for this?" · "Your query needs a different access pattern — now what?" (denormalize / new table) · "QUORUM reads+writes — what consistency do you actually get, and the latency cost?"

⚠️ Common mistakes: modeling tables by entity instead of by query; unbounded partitions (hot/huge rows); using secondary indexes like SQL; expecting read-your-writes without R+W>N.

🚩 Interview red flags: "Cassandra is CP" ❌ (tunable, typically AP) · "Just add a secondary index" ⚠️ (anti-pattern at scale) · "Model it like SQL tables" ❌ (model by query) · "QUORUM = strong consistency always" ⚠️ (only when reads and writes both use it, so that R+W>N; a QUORUM write paired with a ONE read can still return stale data).

🎯 Interview signal: they're checking query-first data modeling, tunable-consistency reasoning (R+W>N), and understanding of write-path/partitioning — not trivia.

DynamoDB (deep dive) HIGHDEEP-DIVE

Deep-dive map: [ ] partition key/sort key; item collections · [ ] single-table design & access-pattern modeling · [ ] GSIs vs LSIs · [ ] provisioned vs on-demand; partition throughput & hot partitions · [ ] consistency (eventual vs strong reads) · [ ] streams + TTL + transactions · [ ] the original Dynamo paper concepts (quorums, vector clocks, gossip) vs managed DynamoDB.

Resources: 📜 Dynamo: Amazon's Highly Available Key-value Store (2007); 📄 AWS DynamoDB docs; 📖 Alex DeBrie — The DynamoDB Book; 🎥 AWS re:Invent — Advanced DynamoDB design (Rick Houlihan).

✅ You know it when: you can model a multi-access-pattern app in a single table and avoid hot partitions.

⚖️ When NOT / why-not: avoid when access patterns are unknown/evolving (single-table design bakes them in), for analytics/ad-hoc queries (→ export to a warehouse), or for heavy relational/joined data → RDBMS. Cost can spike with hot partitions or scan-heavy workloads.

🎤 Interview probes: "Why DynamoDB over Cassandra or Postgres here?" (managed, serverless scaling vs self-managed/relational) · "How do you avoid a hot partition?" · "Strong vs eventual reads — cost & latency?"

⚠️ Common mistakes: low-cardinality partition keys (hotspots); designing before knowing access patterns; using scans; ignoring per-partition throughput caps.

🚩 Interview red flags: "Just scan the table" ❌ (cost + throttling) · "Model normalized like SQL" ❌ (single-table, access-pattern-first) · "DynamoDB scales infinitely for free" ⚠️ (hot partitions + cost) · "Add a GSI for any query" ⚠️ (cost/limits).

🎯 Interview signal: they want access-pattern-first modeling, hot-partition avoidance, and cost/consistency awareness — not AWS-console recall.

MongoDB / Document (deep dive) MEDIUMINTERMEDIATE

Deep-dive map: [ ] document model & embedding vs referencing · [ ] indexes (single/compound/multikey/text/geo) · [ ] replica sets (elections, oplog) · [ ] sharding (shard key choice = make-or-break) · [ ] aggregation pipeline · [ ] transactions (multi-doc) · [ ] read/write concerns.

Resources: 📄 mongodb.com/docs; 🎥 MongoDB University (free).

✅ You know it when: you can decide embed-vs-reference and pick a shard key that avoids hotspots.

Graph Databases (Neo4j / Neptune) LOWINTERMEDIATE

Learn (inline): nodes + edges + properties; excel at multi-hop relationship traversal (social graph, fraud rings, recommendations) where SQL joins explode. Cypher/Gremlin query languages; index-free adjacency. Use when relationships are the data.

Resources: 📄 neo4j.com/docs; 📄 AWS Neptune docs.

✅ You know it when: you can say when a graph DB beats recursive SQL joins.

3d — Scaling Databases

Partitioning / Sharding (+ consistent hashing) HIGHINTERMEDIATE

Learn (inline): split data across nodes. Range partitioning (good scans, risk hotspots) vs hash partitioning (even spread, no range scans) vs consistent hashing (minimal reshuffling on membership change — see Phase-1 sample). Challenges: cross-shard queries (scatter-gather), cross-shard transactions, resharding/rebalancing, hot shards, choosing a good shard key. Directory/lookup vs algorithmic routing.

Resources: 📖 DDIA ch.6; 📄 Vitess/Citus docs (sharding in practice).

✅ You know it when: you can pick a shard key, explain scatter-gather cost, and describe resharding without downtime.
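
To make "minimal reshuffling on membership change" concrete, the sketch below measures how many keys change owner when a fifth node joins, under plain `hash % N` routing versus a consistent-hash ring with virtual nodes. Node names, key names, and the vnode count are arbitrary assumptions. In expectation, modulo routing moves about 1 − 1/5 of the keys, while the ring moves about 1/5.

```python
import bisect
import hashlib
from typing import Callable, List


def h(text: str) -> int:
    """Stable hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


def mod_owner(key: str, nodes: List[str]) -> str:
    return nodes[h(key) % len(nodes)]


class Ring:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        # Each node owns many points on the ring to even out its share.
        self._points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._hashes = [point[0] for point in self._points]

    def owner(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect_right(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]


def moved_fraction(before: Callable[[str], str], after: Callable[[str], str],
                   keys: List[str]) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(2000)]
old_nodes = ["n1", "n2", "n3", "n4"]
new_nodes = old_nodes + ["n5"]

mod_moved = moved_fraction(lambda k: mod_owner(k, old_nodes),
                           lambda k: mod_owner(k, new_nodes), keys)
old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)
print(f"mod-N moved {mod_moved:.0%}, ring moved {ring_moved:.0%}")
```

On the ring, every key that moves lands on the new node, so existing nodes only give data away and never exchange it with each other.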

Replication HIGHINTERMEDIATE

Learn (inline): copies for HA + read scaling. Sync vs async (durability vs latency; async risks data loss on failover). Topologies: single-leader (most RDBMS; reads scale on replicas, writes on leader), multi-leader (multi-region writes, conflict resolution), leaderless/quorum (Dynamo/Cassandra; R+W>N). Replication lag → read-your-writes issues; mitigate with read-from-leader / sticky reads. Failover & split-brain risks.

Resources: 📖 DDIA ch.5.

✅ You know it when: you can explain replication lag's user-visible effects and how quorums (R+W>N) trade consistency vs availability.
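
The R+W>N rule can be checked by brute force: a read is guaranteed to see the latest acknowledged write only if every possible read set shares at least one replica with every possible write set. The function name below is a hypothetical helper, and the sketch ignores failures, sloppy quorums, and concurrent writes.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every r-replica read set intersects every w-replica write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# N=3 with QUORUM reads and writes (2 + 2 > 3): overlap is guaranteed.
print(quorums_always_overlap(3, 2, 2))
# N=3 with R=1, W=1 (1 + 1 <= 3): a read may miss the latest write.
print(quorums_always_overlap(3, 1, 1))
```

Lowering R or W buys latency and availability (fewer replicas must respond) at the price of possibly stale reads.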

Caching (deep dive) HIGHINTERMEDIATE

Deep-dive map: [ ] layers (client, CDN, reverse-proxy, app, DB) · [ ] patterns cache-aside (lazy), read-through, write-through, write-back/behind, refresh-ahead · [ ] eviction LRU/LFU/FIFO/TTL · [ ] invalidation (the hard problem) & TTL strategy · [ ] stampede/thundering herd → request coalescing, locks, jittered TTL · [ ] hot keys → replication of the key, local cache, sharding the key · [ ] consistency (stale reads) · [ ] Redis vs Memcached.

Resources: 📜 Scaling Memcache at Facebook; 📄 redis.io/docs; 📖 DDIA (caching threads throughout).

✅ You know it when: you can choose a cache pattern + eviction + invalidation for a read-heavy service and defend against stampedes and hot keys.
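
A minimal cache-aside sketch with jittered TTLs, assuming a hypothetical `CacheAside` wrapper around a `loader` function that stands in for the database read. The clock and random source are injectable so the behavior can be tested deterministically. Request coalescing and eviction are left out.

```python
import random
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    def __init__(self, loader: Callable[[str], object], ttl_s: float = 60.0,
                 jitter: float = 0.1,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Callable[[], float] = random.random) -> None:
        self._loader = loader          # reads the source of truth on a miss
        self._ttl_s = ttl_s
        self._jitter = jitter          # 0.1 means TTL varies by +/-10%
        self._clock = clock
        self._rng = rng
        self._entries: Dict[str, Tuple[float, object]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                            # hit
        value = self._loader(key)                      # miss: lazy load
        # Jitter spreads expiries so keys cached together don't expire together.
        ttl = self._ttl_s * (1 + self._jitter * (2 * self._rng() - 1))
        self._entries[key] = (now + ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the source of truth."""
        self._entries.pop(key, None)
```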

Storage Types (block / file / object) MEDIUMFUNDAMENTAL

Learn (inline): block (raw volumes, low latency, DBs — EBS) · file (shared POSIX FS — EFS/NFS) · object (HTTP blobs, virtually unlimited scale, metadata, cheap, immutable-ish — S3/GCS). Choose object for media/backups/data-lake; block for DB storage; file for shared mounts.

Resources: 📄 AWS S3 / EBS / EFS docs.

✅ You know it when: you can pick the right storage class for a DB, a media library, and a shared workspace.

3e — Specialized Stores & Case Studies

Time-Series Databases MEDIUMINTERMEDIATE

Learn (inline): optimized for append-only, time-ordered, high-ingest data + range/rollup queries. Tricks: append-only writes, time-partitioned chunks, columnar layout + delta/Gorilla-style compression, downsampling/retention policies, TTL. Case studies: Prometheus (pull-based monitoring TSDB), InfluxDB, TimescaleDB (Postgres extension).

Resources: 📄 Prometheus docs (storage) + InfluxDB + TimescaleDB blogs; 📜 Facebook Gorilla (in-memory TSDB) paper.

✅ You know it when: you can explain why TSDBs use append-only + compression + downsampling and when to use one over a generic DB.

ZippyDB (case study) MEDIUMDEEP-DIVE

Deep-dive map: [ ] Meta's distributed KV store on RocksDB (LSM) · [ ] sharding + replication with Paxos-based consensus · [ ] tunable consistency (eventual vs strong reads) · [ ] how it differs from Spanner (TrueTime), Cassandra (leaderless), DynamoDB (managed). A great "how real KV stores compose the primitives" study.

Resources: 📄 Meta Engineering — ZippyDB: a distributed key-value store: https://engineering.fb.com/2021/08/06/core-infra/zippydb/; 📄 RocksDB wiki.

✅ You know it when: you can trace a ZippyDB read/write through shard→replica→consensus and compare its consistency model to Spanner/Cassandra.

Google Spanner + TrueTime (case study) MEDIUMDEEP-DIVE

Learn (inline): globally-distributed, externally-consistent (linearizable) SQL DB. Key trick: TrueTime — GPS/atomic-clock-backed API exposing bounded clock uncertainty [earliest, latest]; Spanner waits out the uncertainty (commit-wait) before making a commit visible, so commit timestamps respect real-time order across the globe, something logical (Lamport/vector) clocks cannot provide because they track causality, not wall-clock time. Paxos per shard; 2PC across shards.

Resources: 📜 Spanner: Google's Globally-Distributed Database (OSDI 2012); 📄 Google Cloud Spanner docs.

✅ You know it when: you can explain what TrueTime buys over logical clocks and the cost (commit-wait) it pays for it.

4Distributed Systems & Distributed Architecture

The hard core that separates Senior from Staff — theory, infrastructure, patterns, reliability, and applying it all in HLD. Anchor texts: DDIA (ch. 8–9), MIT 6.824, the Google SRE book.

4a — Distributed Systems Theory

CAP Theorem & PACELC HIGHINTERMEDIATE

Learn (inline): CAP — under a network Partition you must choose C (reject or block requests rather than risk returning stale data) or A (keep serving, possibly with stale data). No system is "CA" in the real world (partitions happen). PACELC completes it: if Partition → A-or-C; Else (normal ops) → Latency-or-Consistency. Real systems tune per-operation (e.g. DynamoDB, Cassandra lean AP/latency; Spanner leans CP/consistency).

Resources: 📜 Gilbert & Lynch (CAP proof); 📜 Abadi — PACELC; 📄 jepsen.io (consistency in practice); 📖 DDIA ch.9.

✅ You know it when: you can classify a store under CAP (CP vs AP) and PACELC (e.g. PA/EL, PC/EC) and defend the tradeoff for a given feature.

Consistency Models HIGHINTERMEDIATE

Learn (inline): spectrum from strong to weak — linearizable (looks like one copy, real-time order) → sequential → causal (preserves cause→effect) → read-your-writes / monotonic reads (session guarantees) → eventual (converges eventually). Stronger = easier to reason about, costlier/less available. Pick the weakest model that keeps your app correct.

Resources: 📄 jepsen.io/consistency (the map); 🎥 Kleppmann talks; 📖 DDIA ch.5, 9.

✅ You know it when: you can place a requirement ("users must see their own post immediately") on the model that satisfies it (read-your-writes).

Time, Clocks & Ordering HIGHINTERMEDIATE

Learn (inline): physical clocks drift/skew → can't order events across nodes. Lamport clock (scalar) gives a total order consistent with causality but can't tell concurrent from causal. Vector clock captures causality + detects concurrent updates (→ conflict detection in Dynamo). Hybrid Logical Clocks blend physical + logical. TrueTime (Spanner) uses bounded real-time uncertainty for global ordering (Phase 3).

Resources: 📜 Lamport — Time, Clocks, and the Ordering of Events (1978); 📖 DDIA ch.9; 🎥 Arpit Bhayani — clocks.

✅ You know it when: you can use a vector clock to detect two concurrent writes and explain why Lamport clocks can't.
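
A compact sketch of vector-clock comparison, representing a clock as a dict from node name to counter (a missing entry counts as 0). The function names are illustrative. Two versions are concurrent when neither clock is less than or equal to the other in every component, which is the case a store must surface as a conflict.

```python
from typing import Dict

VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy with the local node's counter advanced (a new event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise max; used when a node receives another node's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"        # a happened-before b
    if b_le_a:
        return "after"         # b happened-before a
    return "concurrent"        # neither saw the other: conflict to resolve


base = tick({}, "A")           # both writers start from the same version
write_1 = tick(base, "A")      # node A updates it
write_2 = tick(base, "B")      # node B updates it without seeing write_1
```

A scalar Lamport timestamp would give `write_1` and `write_2` comparable numbers either way, so the fact that neither saw the other would be lost.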

Consensus: Paxos, Raft, ZAB (deep dive) HIGHDEEP-DIVE

Why deep-dive: how a cluster agrees on one value/log despite failures — under every leader election, replicated log, config store.

Deep-dive map: [ ] the problem (agreement + validity + termination under crashes) · [ ] Raft — leader election (terms, votes), log replication, commit index, safety, membership changes (learn this one deeply; it's designed to be understandable) · [ ] Paxos / Multi-Paxos — the classic; harder to grok · [ ] ZAB (ZooKeeper) · [ ] quorums & why an odd number of nodes · [ ] where used (etcd, Consul, Kafka KRaft, Spanner, CockroachDB).

Resources: 📜 In Search of an Understandable Consensus Algorithm (Raft) + https://raft.github.io (visualization); 📜 Lamport — Paxos Made Simple; 🎥 MIT 6.824 lectures.

✅ You know it when: you can walk through a Raft leader election + log commit and explain why quorum needs a majority.
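
The arithmetic behind "majority quorum" and "odd number of nodes" is short enough to write down. The helper names are illustrative. Any two majorities of the same cluster must share a node, which is what stops two leaders or two conflicting commits from both succeeding.

```python
def majority(n: int) -> int:
    """Smallest group size such that any two such groups overlap."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can be down while a majority is still reachable."""
    return n - majority(n)


for n in range(1, 8):
    # Two majorities always overlap because together they exceed n.
    assert 2 * majority(n) > n
    print(n, majority(n), tolerated_failures(n))
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only one failure, so the even-sized cluster adds cost without adding fault tolerance.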

Failure Detection, Membership & Gossip MEDIUMINTERMEDIATE

Learn (inline): detect dead nodes via heartbeats / phi-accrual (adaptive); propagate membership via gossip (epidemic, scalable, eventually consistent — used by Cassandra/Dynamo/Consul). Split-brain when a partition makes two leaders — fenced with quorums/leases/fencing tokens.

Resources: 📜 SWIM (gossip membership) paper; 📖 DDIA ch.8.

✅ You know it when: you can explain how gossip spreads membership and how fencing tokens prevent split-brain damage.
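
Fencing tokens can be illustrated with a storage service that remembers the highest token it has accepted. `FencedStore` is a hypothetical class, and the tokens are assumed to be issued as increasing numbers by the lock or lease service each time leadership changes.

```python
from typing import Dict


class FencedStore:
    """Rejects writes from holders of an older lease than one already seen."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False              # stale leader: refuse the write
        self.highest_token = token
        self.data[key] = value
        return True


store = FencedStore()
store.write(33, "config", "from old leader")      # accepted
store.write(34, "config", "from new leader")      # new lease, higher token
# The old leader wakes up after a pause, still believing it holds the lease.
late = store.write(33, "config", "from zombie")   # rejected
```

The check lives in the resource being protected, so it works even when the stale leader cannot tell that its lease has expired.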

Coordination Services (ZooKeeper / etcd) MEDIUMINTERMEDIATE

Learn (inline): consensus-backed stores for leader election, distributed locks, config, service discovery, metadata. Primitives: ephemeral nodes, watches, leases. Don't build these yourself — delegate coordination to ZK/etcd/Consul.

Resources: 📄 etcd docs; 📜 ZooKeeper paper.

✅ You know it when: you can implement leader election / a lock using ephemeral nodes + watches conceptually.

4b — Networking Infrastructure at Scale

Load Balancers (L4 vs L7) HIGHINTERMEDIATE

Learn (inline): distribute traffic + health-check + failover. L4 (transport: IP/port; fast, protocol-agnostic) vs L7 (application: route by URL/host/header, TLS termination, sticky sessions, WAF). Algorithms: round-robin, weighted, least-connections, least-response-time, hash (client affinity / consistent hashing). Global (GSLB/anycast/GeoDNS) + regional layers.

Resources: 📄 Cloudflare — load balancing; 📄 NGINX/HAProxy docs; 🛠️ set up NGINX as an LB.

✅ You know it when: you can choose L4 vs L7 and an algorithm for a given service and explain sticky sessions' tradeoffs.

API Gateway HIGHINTERMEDIATE

Learn (inline): single entry for clients → routing, auth, rate limiting, request/response transform, aggregation, TLS, observability. Distinct from an LB (higher-level, app-aware). Watch: don't turn it into a monolith of business logic.

Resources: 📄 Kong / AWS API Gateway docs; 📄 microservices.io — API Gateway pattern.

✅ You know it when: you can list what belongs in a gateway vs a service.

CDN & Reverse Proxy MEDIUMFUNDAMENTAL

Learn (inline): CDN caches static/edge content near users (POPs), offloads origin, reduces latency; supports cache-control, purge, edge compute. Reverse proxy (NGINX/Envoy) fronts origin for TLS, caching, compression, routing.

Resources: 📄 Cloudflare — how CDNs work; 📄 NGINX reverse proxy docs.

✅ You know it when: you can decide what to serve from a CDN and how to invalidate it.

4c — Asynchronous Messaging & Streaming

Message Queues & Delivery Semantics HIGHINTERMEDIATE

Learn (inline): decouple producers/consumers, smooth bursts, enable async. Queue (RabbitMQ, SQS — competing consumers, message removed on ack) vs log (Kafka — retained, replayable, ordered per partition). Delivery: at-most-once / at-least-once (the usual default) / exactly-once (in practice at-least-once delivery + idempotent or deduplicating consumers). DLQ for poison messages; ordering vs throughput tradeoffs; visibility timeouts; backpressure.

Resources: 📄 RabbitMQ & AWS SQS docs; 📖 DDIA ch.11.

✅ You know it when: you can pick queue-vs-log for a use case and design at-least-once + idempotent consumers with a DLQ.
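
A sketch of an at-least-once consumer that is idempotent and parks poison messages in a dead-letter list. `Consumer`, the message shape (`{"id": ..., "body": ...}`), and the ack convention (return `True` to ack, `False` to ask for redelivery) are assumptions for illustration. A real consumer would persist the processed-ID set together with the side effect.

```python
from typing import Callable, Dict, List, Set


class Consumer:
    def __init__(self, handler: Callable[[dict], None], max_attempts: int = 3) -> None:
        self.handler = handler
        self.max_attempts = max_attempts
        self.processed_ids: Set[str] = set()     # dedup store
        self.attempts: Dict[str, int] = {}
        self.dead_letters: List[dict] = []       # stand-in for a DLQ

    def on_message(self, msg: dict) -> bool:
        """Return True to ack the message, False to request redelivery."""
        msg_id = msg["id"]
        if msg_id in self.processed_ids:
            return True                          # duplicate: ack, no side effect
        try:
            self.handler(msg)
        except Exception:
            self.attempts[msg_id] = self.attempts.get(msg_id, 0) + 1
            if self.attempts[msg_id] >= self.max_attempts:
                self.dead_letters.append(msg)    # stop blocking the queue
                return True
            return False
        self.processed_ids.add(msg_id)
        return True
```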

Apache Kafka (deep dive) HIGHDEEP-DIVE

Deep-dive map: [ ] the log abstraction, offsets · [ ] topics & partitions (parallelism + per-partition order; partition-key choice) · [ ] producers (batching, acks, idempotent producer) · [ ] consumer groups, rebalancing, offset commit · [ ] replication & ISR, min.insync.replicas, unclean leader election · [ ] exactly-once (idempotent producer + transactions) · [ ] retention & log compaction · [ ] performance (zero-copy, page cache, sequential I/O) · [ ] KRaft (Raft metadata, replacing ZooKeeper) · [ ] ecosystem (Connect/CDC, Streams, schema registry) · [ ] Kafka vs RabbitMQ/SQS/Pulsar.

Resources: 📄 kafka.apache.org/documentation; 📜 Jay Kreps — The Log; 📖 Kafka: The Definitive Guide (Confluent, free); 🎥 Hussein Nasser — Kafka; 🎥 ByteByteGo — Kafka internals; 🛠️ run a local broker.

✅ You know it when: you can design an event pipeline choosing partitions/keys for ordering and explain ISR + acks=all durability, and when a log beats a queue.

⚖️ When NOT / why-not: overkill for a simple task queue or low-volume async work → SQS/RabbitMQ (managed, per-message ack/delay/priority, no partition ops). Avoid when you need per-message priority, easy delayed delivery, or tiny scale. Operational weight is real (unless managed/MSK/Confluent).

🎤 Interview probes: "Why Kafka and not SQS here?" · "How do you guarantee ordering?" (partition key) · "A consumer is slow — what happens, and how do you not lose/duplicate?" · "Exactly-once — really?"

⚠️ Common mistakes: assuming global ordering (it's per-partition); too few/many partitions; ignoring consumer-lag & rebalance storms; treating at-least-once as exactly-once without idempotent consumers.

🚩 Interview red flags (things not to say): "Kafka is just a queue" ⚠️ (it's a replayable log) · "Exactly-once makes duplicates impossible everywhere" ❌ (it's within-Kafka; your side effects still need idempotency) · "One partition per consumer always" ❌ · "Kafka guarantees global ordering" ❌.

🎯 Interview signal: they're testing whether you understand ordering, replay, consumer-group scaling, and durability/ops tradeoffs — not whether you can recite broker internals.

4d — Distributed Transactions & Architecture Patterns

Distributed Transactions & Sagas (deep dive) HIGHDEEP-DIVE

Deep-dive map: [ ] why cross-service ACID is hard · [ ] 2PC/3PC (blocking, coordinator failure) · [ ] Saga = sequence of local txns + compensating actions · [ ] orchestration (central coordinator) vs choreography (event-driven) sagas · [ ] transactional outbox + CDC for reliable event publishing · [ ] idempotency + dedup for retries · [ ] eventual consistency & user-facing implications.

Resources: 📜 Garcia-Molina & Salem — Sagas; 📄 microservices.io — Saga / Outbox patterns (Chris Richardson); 📖 DDIA ch.9.

✅ You know it when: you can design an order flow across payment/inventory/shipping with a saga + compensations + outbox, no 2PC.
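
An orchestrated saga reduces to "run local steps in order; on failure, run the compensations of completed steps in reverse". The sketch below assumes a hypothetical `run_saga` helper and in-process callables in place of real service calls. Durable saga state, retries, and idempotent compensations are deliberately left out.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)


def run_saga(steps: List[Step]) -> bool:
    """Return True if every step committed, False if the saga was rolled back."""
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()                       # a local transaction in one service
            completed.append(step)
        except Exception:
            # Undo in reverse order; each compensation is itself a local txn.
            for _done_name, _done_action, compensate in reversed(completed):
                compensate()
            return False
    return True


log: List[str] = []


def ship() -> None:
    raise RuntimeError("no courier available")


order_saga: List[Step] = [
    ("payment", lambda: log.append("charge"), lambda: log.append("refund")),
    ("inventory", lambda: log.append("reserve"), lambda: log.append("release")),
    ("shipping", ship, lambda: log.append("cancel shipment")),
]
ok = run_saga(order_saga)
```

Between the charge and the refund other services can observe the intermediate state, which is the eventual-consistency cost the card refers to.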

Integration & Migration Patterns MEDIUMINTERMEDIATE

Learn (inline): Anti-Corruption Layer (translate between your model and a legacy/external model so their design doesn't leak in) · Strangler Fig (incrementally replace a legacy system by routing slices to the new one) · Backend-for-Frontend (BFF) (a tailored API per client type — web/mobile).

Resources: 📄 Martin Fowler — StranglerFigApplication; 📄 Sam Newman — BFF; 📄 microservices.io / Azure Architecture patterns.

✅ You know it when: you can plan a legacy migration with strangler-fig + ACL and justify a BFF.

Event-Driven Patterns: CQRS · Event Sourcing · Outbox · Idempotency HIGHINTERMEDIATE

Learn (inline): CQRS — separate write model from optimized read models (denormalized projections). Event sourcing — store the sequence of events as source of truth; rebuild state by replay (audit, temporal queries; complexity + eventual consistency cost). Outbox — atomically persist state + an event row, publish via CDC/poller. Idempotency — (Phase-1 sample) essential for safe retries.

Resources: 📄 Martin Fowler — CQRS / Event Sourcing; 📄 microservices.io.

✅ You know it when: you can decide when CQRS/event-sourcing pays for its complexity and wire an outbox for reliable publishing.
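The outbox idea can be sketched with SQLite standing in for the service database. The table names, the `order.placed` topic, and the `publish` callback are assumptions made for illustration; the polling relay shown here is the simpler alternative to log-based CDC.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id: int) -> None:
    # One local transaction: the state change and the event row
    # commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


def relay_once(publish) -> int:
    """Poller: publish unpublished rows in id order, then mark them."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # a crash after this line causes a re-publish
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
record = lambda topic, payload: sent.append((topic, json.loads(payload)))
place_order(1)
place_order(2)
first_run = relay_once(record)
second_run = relay_once(record)
```

Because the relay publishes before it marks a row, a crash between the two steps produces a duplicate rather than a lost event. That is why the outbox pairs with idempotent consumers.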

Resilience & Load-Management Patterns HIGH · INTERMEDIATE

Learn (inline): Circuit breaker (stop calling a failing dependency; half-open probes) · Retry with exponential backoff + jitter (never synchronized retries) · Timeouts & deadlines (propagate deadlines across hops) · Bulkhead (isolate resource pools so one dependency can't sink all) · Rate limiting / throttling (token/leaky bucket) · Backpressure & load shedding (drop/queue under overload) · Dead-letter queue (park messages that keep failing after retries so they can be inspected and replayed without blocking the main queue).

Resources: 📖 Release It! (Nygard); 📄 AWS Builders' Library — Timeouts, retries, and backoff with jitter; 📄 resilience4j / Hystrix docs.

✅ You know it when: you can add circuit breaker + jittered retry + timeout + bulkhead to a dependency call and explain each.
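As one piece of that toolkit, here is a minimal sketch of retry with capped exponential backoff and "full jitter" (the delay is drawn uniformly between zero and the current backoff ceiling). The function names and default values are illustrative assumptions; `sleep` and `rng` are injectable only so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # A real client would retry only errors known to be transient.
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the error
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable when max_attempts >= 1")


# Usage: a dependency that fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


slept = []
result = call_with_retry(flaky, sleep=slept.append, rng=random.Random(0))
```

The randomness is the point: without jitter, every client that failed at the same moment retries at the same moment, which re-creates the overload that caused the failure.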

4e — Scalability, Reliability & Operability

Scaling Patterns & Statelessness HIGH · FUNDAMENTAL

Learn (inline): vertical (bigger box; simple, capped) vs horizontal (more boxes; needs statelessness/sharding). Make services stateless (externalize session/state to Redis/DB) so any node serves any request → trivial horizontal scale + autoscaling. Read replicas + caching for read scale; sharding for write scale.

Resources: 📄 System Design Primer — scalability; 📖 Web Scalability for Startup Engineers.

✅ You know it when: you can turn a stateful service stateless and describe autoscaling triggers.

Availability, Reliability & SLOs HIGH · INTERMEDIATE

Learn (inline): SLI (measured, e.g. p99 latency, error rate) → SLO (target) → SLA (contract + penalty). Nines: 99.9% ≈ 8.7h/yr down, 99.99% ≈ 52m, 99.999% ≈ 5m. Availability = MTBF/(MTBF+MTTR) → reduce MTTR. Error budgets balance velocity vs reliability. Redundancy (N+1), failover, no single point of failure.

Resources: 📖 Google SRE Book (free): https://sre.google/books/; 📄 SLO chapter.

✅ You know it when: you can define SLIs/SLOs for a service and compute downtime budget from a nines target.
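The nines arithmetic and the MTBF/MTTR formula above are easy to check with a few lines. This sketch assumes a 365-day year; the MTBF and MTTR values in the usage lines are made-up inputs.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(slo_percent: float,
                            window_minutes: float = MINUTES_PER_YEAR) -> float:
    """Allowed downtime in the window for an availability SLO such as 99.9."""
    return window_minutes * (1 - slo_percent / 100)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


three_nines_hours = downtime_budget_minutes(99.9) / 60   # about 8.76 hours/year
four_nines_minutes = downtime_budget_minutes(99.99)      # about 52.6 minutes/year
five_nines_minutes = downtime_budget_minutes(99.999)     # about 5.3 minutes/year

# Same failure rate, faster recovery: halving MTTR roughly halves unavailability.
slow_recovery = availability(mtbf_hours=1000, mttr_hours=1.0)
fast_recovery = availability(mtbf_hours=1000, mttr_hours=0.5)
```

Passing a 30-day window (`window_minutes=30 * 24 * 60`) gives the monthly error budget, which is the number most SLO alerting is built on.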

Multi-Region & Disaster Recovery MEDIUM · INTERMEDIATE

Learn (inline): active-active (all regions serve; needs conflict handling / geo-routing) vs active-passive (standby failover). RPO (data-loss tolerance) & RTO (recovery-time target) drive the strategy (backup/restore → pilot-light → warm standby → hot active-active, increasing cost). Geo-replication + data residency.

Resources: 📄 AWS Well-Architected — Reliability; 📖 Google SRE.

✅ You know it when: you can pick a DR strategy from RPO/RTO + budget and explain active-active conflict handling.

Observability (Logs · Metrics · Traces) HIGH · INTERMEDIATE

Learn (inline): the three pillars — logs (structured events), metrics (aggregatable time-series; RED rate/errors/duration, USE utilization/saturation/errors), traces (request path across services; spans/context propagation). Plus alerting on SLOs (symptom-based, not cause-spam), dashboards. OpenTelemetry as the standard.

Resources: 📄 OpenTelemetry docs; 📖 Google SRE (monitoring); 📖 Cindy Sridharan — Distributed Systems Observability (free).

✅ You know it when: you can instrument a service with RED metrics + tracing and design SLO-based alerts.

Deployment Strategies MEDIUM · FUNDAMENTAL

Learn (inline): blue-green (two envs, flip traffic, instant rollback) · canary (small % first, watch metrics, ramp) · rolling (batch-by-batch) · feature flags (decouple deploy from release, kill-switch) · safe rollbacks + DB migration compatibility (backward/forward-compatible schema changes).

Resources: 📄 Martin Fowler — BlueGreen/CanaryRelease; 📄 LaunchDarkly — feature flags.

✅ You know it when: you can roll out a risky change safely (canary + flags + rollback + compatible migration).

Performance & Tail Latency MEDIUM · INTERMEDIATE

Learn (inline): optimize p99/p999, not just averages; sources of tail latency (GC, queuing, contention, cold caches, retries) and fixes (hedged/tied requests, connection pooling, batching, avoiding N+1, backpressure). Little's Law for capacity (L = λW: average in-flight requests = arrival rate × average latency).

Resources: 📜 Dean & Barroso — The Tail at Scale; 📄 AWS Builders' Library.

✅ You know it when: you can explain why averages hide problems and name three tail-latency mitigations.
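A small numeric sketch shows why the average hides the tail, how fan-out amplifies it, and how Little's Law turns latency into a concurrency estimate. The latency samples and the 2,000 requests/second arrival rate are synthetic values chosen for illustration.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies in ms: 980 fast requests plus 20 slow ones (e.g. GC pauses).
latencies = [10.0] * 980 + [2000.0] * 20

avg = mean(latencies)            # looks tolerable
p50 = percentile(latencies, 50)  # typical request is fast
p99 = percentile(latencies, 99)  # the tail is 200x the median


def p_any_slow(fanout: int, per_call_slow_probability: float) -> float:
    """Chance that at least one of `fanout` parallel calls hits the slow path."""
    return 1 - (1 - per_call_slow_probability) ** fanout


# A request that fans out to 100 backends, each slow 1% of the time.
fanout_risk = p_any_slow(100, 0.01)


def littles_law_in_flight(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """L = lambda * W: average number of requests in the system."""
    return arrival_rate_per_s * avg_latency_s


in_flight = littles_law_in_flight(2000, avg / 1000)
```

Here the mean is about 50 ms while p99 is 2 seconds, and roughly 63% of 100-way fan-out requests touch at least one slow backend. That is the argument for hedged requests in a single number.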

4f — Microservices & Modern Architecture

Monolith vs Microservices vs Modular Monolith HIGH · INTERMEDIATE

Learn (inline): microservices buy independent deploy/scale/team autonomy at the cost of distributed-systems complexity (network, data consistency, ops). Start with a modular monolith; extract services when team/scale pain justifies it. Split by business capability/bounded context, not by layer. Each service owns its data (no shared DB).

Resources: 📖 Sam Newman — Building Microservices; 📄 Martin Fowler — Microservices / MonolithFirst; 📄 microservices.io.

✅ You know it when: you can argue when to split and how to draw service boundaries by capability.

Service Discovery, Gateway & Service Mesh MEDIUM · INTERMEDIATE

Learn (inline): discovery (services find each other — DNS/Consul/etcd, client- vs server-side). API gateway (single entry point for external clients: routing, authentication, rate limiting, TLS termination). Service mesh (Istio/Linkerd + Envoy sidecars) moves cross-cutting concerns — mTLS, retries, timeouts, traffic shaping, telemetry — out of app code into the platform.

Resources: 📄 Istio docs; 📄 microservices.io — service discovery.

✅ You know it when: you can explain what a mesh sidecar handles vs the application.

Domain-Driven Design MEDIUM · INTERMEDIATE

Learn (inline): bounded contexts (explicit model boundaries → natural service boundaries), ubiquitous language, aggregates (consistency boundaries), context mapping. Strategic (boundaries) + tactical (entities/value objects/aggregates/repositories) DDD.

Resources: 📖 Eric Evans — DDD; 📖 Vaughn Vernon — Implementing DDD; 📄 Martin Fowler — BoundedContext.

✅ You know it when: you can carve a domain into bounded contexts and map them to services.

Containers & Orchestration (Docker · Kubernetes · 12-Factor · Serverless) MEDIUM · INTERMEDIATE

Learn (inline): Docker (immutable images, isolation). Kubernetes (pods, deployments, services, ingress, HPA autoscaling, config/secrets, rollouts). 12-factor app (config in env, stateless processes, disposability). Serverless (FaaS; scale-to-zero, event-driven; cold starts, statelessness) & edge compute tradeoffs.

Resources: 📄 kubernetes.io docs; 📄 12factor.net; 📖 Kubernetes Up & Running.

✅ You know it when: you can describe how K8s autoscales a stateless service and the 12-factor rules that make it possible.

4g — Security, Privacy & Multi-Tenancy (cross-cutting)

AuthN & AuthZ (OAuth2 · OIDC · JWT · Sessions) HIGH · INTERMEDIATE

Learn (inline): AuthN (who you are) vs AuthZ (what you can do). Sessions (server state, cookie) vs tokens/JWT (stateless, self-contained, revocation is harder). OAuth2 (delegated authorization — auth code w/ PKCE, client credentials) + OIDC (identity layer on OAuth2). RBAC vs ABAC. Token expiry/refresh, key rotation.

Resources: 📄 oauth.net + OAuth 2.0 Simplified (Aaron Parecki); 📄 jwt.io; 📄 Auth0/Okta docs.

✅ You know it when: you can design login + service-to-service auth choosing sessions vs JWT and the right OAuth2 flow.

Encryption & Secrets MEDIUM · INTERMEDIATE

Learn (inline): in transit (TLS/mTLS) + at rest (disk/field-level, envelope encryption with a KMS). Symmetric vs asymmetric basics; key rotation; secrets management (Vault/KMS/Secrets Manager; never hard-coded and never in env files committed to the repo). Hash+salt passwords (bcrypt/argon2), never encrypt them.

Resources: 📄 OWASP Cryptographic Storage Cheat Sheet; 📄 HashiCorp Vault / AWS KMS docs.

✅ You know it when: you can design envelope encryption with a KMS and store credentials safely.

Threats & Mitigations MEDIUM · INTERMEDIATE

Learn (inline): OWASP Top 10 (injection, broken auth, XSS, CSRF, SSRF, broken access control…). DDoS (rate limiting, WAF, CDN/anycast absorption, autoscaling). Input validation, least privilege, defense-in-depth, secure defaults, audit logging.

Resources: 📄 OWASP Top 10: https://owasp.org/www-project-top-ten/; 📄 OWASP Cheat Sheets.

✅ You know it when: you can name mitigations for injection, CSRF, and a volumetric DDoS.

Multi-Tenancy HIGH · INTERMEDIATE

Learn (inline): isolation models — silo (DB/infra per tenant; strong isolation, costly) → bridge (shared infra, separate schemas) → pool (shared tables + tenant_id row-level isolation; cheap, noisy-neighbor + blast-radius risk). Enforce tenant scoping everywhere (row-level security, per-tenant keys); rate-limit per tenant; data residency/PII/GDPR; audit logging.

Resources: 📄 AWS SaaS Lens (Well-Architected); 📄 Microsoft — multi-tenant SaaS architecture guidance.

✅ You know it when: you can pick silo/bridge/pool per requirements and prevent cross-tenant leakage.

4h — HLD: Apply It

HLD Framework (end-to-end) HIGH · INTERMEDIATE

Learn (inline): the repeatable flow — 1) Requirements (functional + non-functional: scale, latency, consistency, availability) → 2) Estimation (QPS, storage, bandwidth) → 3) API design → 4) Data model & storage choice → 5) High-level architecture (LB, services, DB, cache, queue, CDN) → 6) Deep-dive the 1–2 hard parts → 7) Scale & bottlenecks (sharding, caching, replication, hotspots) → 8) Tradeoffs & wrap-up (failure modes, monitoring). Drive it; state assumptions; justify choices.

Resources: 📖 Alex Xu — System Design Interview Vol 1 (framework); 📄 System Design Primer; 🎥 ByteByteGo.

✅ You know it when: you can run any prompt through the 8 steps without missing estimation, data model, or bottleneck analysis.
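Step 2 (estimation) is mostly unit bookkeeping, so a tiny calculator helps keep it honest. The workload numbers below (100M writes/day, 10:1 read/write ratio, 500 bytes per record, 5-year retention, 2x peak factor) are hypothetical inputs for a URL-shortener style prompt, not figures from any real system.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_writes: int, read_write_ratio: float, bytes_per_record: int,
             retention_years: int, peak_factor: float = 2.0) -> dict:
    """Back-of-envelope QPS, storage, and write bandwidth."""
    write_qps = daily_writes / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    storage_bytes = daily_writes * 365 * retention_years * bytes_per_record
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "peak_read_qps": read_qps * peak_factor,
        "storage_tb": storage_bytes / 1e12,
        "write_bandwidth_kb_s": write_qps * bytes_per_record / 1e3,
    }


numbers = estimate(
    daily_writes=100_000_000,
    read_write_ratio=10,
    bytes_per_record=500,
    retention_years=5,
)
```

With these inputs the result is roughly 1.2k writes/s, 11.6k reads/s (about 23k at peak), and about 91 TB over five years. Round aggressively out loud; the goal is the order of magnitude that decides whether you need sharding and caching.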

HLD Problem Practice (graded ladder) HIGH · DEEP-DIVE

How to approach: apply the framework; time-box ~45 min; always cover estimation, data model, one deep-dive, and bottlenecks.

Problems:

Tier 1 (foundational): URL shortener (TinyURL) · Pastebin · Distributed rate limiter · Key-value store · Unique ID generator (Snowflake).
Tier 2 (core scale): Notification/alerting · News feed (Twitter/FB) · Typeahead/autocomplete · Web crawler · Distributed cache · Distributed message queue · Slack/Google Chat.
Tier 3 (advanced): Chat/WhatsApp · Cab-matching (Uber/Lyft) + geospatial · Nearby/proximity (Tinder/Yelp) · Video streaming (YouTube/Netflix) · Google Drive/Dropbox · Instagram · Payment/wallet · Ad click aggregation · Distributed job scheduler · Leaderboard · Google Maps.
Tier 4 (staff/hard): Google Docs (collab editing, OT/CRDT) · Ticketmaster (booking + concurrency) · Stock exchange/matching engine · S3 (object store) · Distributed logging & metrics platform · Search engine.

Resources: 📖 Alex Xu Vol 1 & 2; 📄 System Design Primer; 🎥 ByteByteGo, Gaurav Sen, CodeKarle, Hussein Nasser; 📄 company engineering blogs (real designs).

✅ You know it when: you can design 2–3 problems per tier end-to-end with confident tradeoffs and failure analysis.

5 · Big Data, Data-Intensive Systems & CDC/Streams

Processing data at scale (batch + stream), the architectures that combine them, probabilistic structures for scale, and keeping systems in sync. Completes your HLD foundation before AI. Anchor: DDIA ch. 10–11.

Batch Processing (MapReduce · Spark · Hive) MEDIUM · INTERMEDIATE

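Learn (inline): MapReduce — map → shuffle → reduce over distributed data; fault-tolerant via re-execution; disk-heavy (intermediate results are written to disk between stages). Spark — in-memory DAG engine, RDD/DataFrame; often cited as 10–100× faster than MapReduce, mainly on iterative or multi-stage jobs that can reuse data held in memory; unifies SQL/ML/streaming. Hive — SQL over big data (compiles to jobs). Batch = high-throughput, high-latency (minutes–hours) over bounded data.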

Resources: 📜 Dean & Ghemawat — MapReduce; 📄 spark.apache.org + 📖 Learning Spark (free); 📖 DDIA ch.10.

✅ You know it when: you can explain map→shuffle→reduce and why Spark beats classic MapReduce for iterative jobs.

Stream Processing (Flink · Kafka Streams · Structured Streaming) HIGH · DEEP-DIVE

Deep-dive map: [ ] bounded vs unbounded data; low-latency continuous processing · [ ] event time vs processing time, watermarks, late data · [ ] windowing (tumbling/sliding/session) · [ ] state management + checkpointing + exactly-once · [ ] Apache Flink (true streaming, stateful, event-time) · [ ] Kafka Streams (library) & Spark Structured Streaming (micro-batch) · [ ] Flink as a Kafka consumer; joins/aggregations on streams.

Resources: 📄 flink.apache.org + Ververica blog; 📖 Streaming Systems (Akidau et al.); 📄 Kafka Streams / Spark Structured Streaming docs.

✅ You know it when: you can design a windowed real-time aggregation with event-time + watermarks and explain exactly-once via checkpointing.
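Event-time windowing with a watermark can be illustrated without a streaming engine. The class below is a toy, single-process sketch: the watermark heuristic (largest event time seen minus a fixed out-of-orderness bound) and all names are assumptions of this example, not the API of Flink or any other framework.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per key in fixed event-time windows; emits when the
    watermark passes a window's end."""

    def __init__(self, window_s: int, max_out_of_order_s: int):
        self.window_s = window_s
        self.max_out_of_order_s = max_out_of_order_s
        self.watermark = float("-inf")
        self.open_windows = defaultdict(lambda: defaultdict(int))  # start -> key -> n
        self.dropped_late = 0

    def on_event(self, key: str, event_time: int):
        start = event_time - (event_time % self.window_s)
        if start + self.window_s <= self.watermark:
            self.dropped_late += 1  # window already emitted: too late
            return []
        self.open_windows[start][key] += 1
        # Heuristic watermark: "no event older than this is expected any more".
        self.watermark = max(self.watermark, event_time - self.max_out_of_order_s)
        return self._close_ready()

    def _close_ready(self):
        ready = sorted(
            s for s in self.open_windows if s + self.window_s <= self.watermark
        )
        return [(s, dict(self.open_windows.pop(s))) for s in ready]


counter = TumblingWindowCounter(window_s=10, max_out_of_order_s=5)
emitted = []
# (key, event_time): time 9 arrives after time 12 but is still accepted;
# time 3 arrives after window [0, 10) has closed and is dropped.
for key, ts in [("a", 1), ("a", 7), ("b", 12), ("a", 9), ("a", 16), ("a", 3)]:
    emitted.extend(counter.on_event(key, ts))
```

The out-of-order event at time 9 still lands in window [0, 10) because the watermark had only reached 7. The event at time 3 arrives after the watermark reached 11 and is counted as late data. Real engines add checkpointed state so these counts survive a restart, which is where the exactly-once discussion begins.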

Data Architectures: Lambda vs Kappa · Warehouse vs Lake · OLTP vs OLAP MEDIUM · INTERMEDIATE

Learn (inline): OLTP (row stores, transactional, low-latency point ops) vs OLAP (columnar, scan/aggregate analytics). Lambda = batch layer (accurate, slow) + speed layer (fast, approximate) merged at query time — powerful but two codebases to keep consistent. Kappa = stream-only, replay the log for reprocessing — simpler. Data warehouse (structured, schema-on-write — Snowflake/BigQuery/Redshift) vs data lake (raw, schema-on-read — S3/HDFS) vs lakehouse (Delta/Iceberg; warehouse-style tables and transactions on top of lake storage).

Resources: 📄 Jay Kreps — Questioning the Lambda Architecture (kappa); 📖 Kimball — Data Warehouse Toolkit; 📄 Databricks — lakehouse.

✅ You know it when: you can choose Lambda vs Kappa and warehouse vs lake for an analytics requirement.

Probabilistic / Approximation Algorithms HIGH · INTERMEDIATE

Learn (inline): trade a little accuracy for huge memory savings at scale —

Bloom filter — probabilistic set membership; no false negatives, tunable false positives (used in LSM/DBs/caches to skip disk lookups).
HyperLogLog — approximate cardinality (unique counts) in KBs instead of GBs (unique visitors).
Count-Min sketch — approximate frequency of items in a stream (heavy hitters, trending).

Resources: 📜 Flajolet — HyperLogLog; 📜 Cormode & Muthukrishnan — Count-Min sketch; 📄 Redis probabilistic data structures docs.

✅ You know it when: you can pick Bloom vs HLL vs Count-Min for "is it present?", "how many unique?", "how frequent?" and state the accuracy tradeoff.
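Of the three structures, the Bloom filter is the quickest to build by hand. This sketch uses the standard sizing formulas (m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hash functions) and derives the k positions from one SHA-256 digest via double hashing; that hashing scheme is a choice made for this example, not the only option.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float):
        n, p = expected_items, false_positive_rate
        self.m = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))  # bits
        self.k = max(1, round(self.m / n * math.log(2)))                   # hashes
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for i in range(1000):
    bf.add(f"user-{i}")

memory_bytes = len(bf.bits)  # about 1.2 KB for 1,000 items at a 1% target
```

For 1,000 items at a 1% target the filter needs about 9.6 bits per item and 7 hash positions, regardless of how long the items are. A `False` answer is always trustworthy, which is why an LSM-tree can skip a disk read on it.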

Change Data Capture (CDC) MEDIUM · INTERMEDIATE

Learn (inline): stream a database's row-level changes (from the WAL/binlog) to other systems — keep caches, search indexes, warehouses, and downstream services in sync without dual-writes. Log-based CDC (Debezium reading binlog/WAL) is preferred over triggers/polling (lower overhead, ordered, complete). Pairs with the outbox pattern for reliable event publishing.

Resources: 📄 Debezium docs; 🎥/📄 Kleppmann — Turning the Database Inside Out; 📄 Confluent — CDC.

✅ You know it when: you can explain why log-based CDC beats triggers/polling and how it keeps a search index in sync.

Search Systems (Inverted Index · Elasticsearch) MEDIUM · INTERMEDIATE

Learn (inline): inverted index (term → posting list of docs) powers full-text search; analysis pipeline (tokenize, normalize, stem); ranking (TF-IDF/BM25); Elasticsearch/OpenSearch (shards + replicas over Lucene, near-real-time, aggregations). Kept in sync via CDC. (Vector/semantic search → Phase 6.)

Resources: 📄 Elasticsearch docs (the "definitive guide"); 📖 DDIA (full-text/fuzzy); 📄 Lucene internals write-ups.

✅ You know it when: you can design a search feature: index pipeline, BM25 ranking, sharding, and CDC-based freshness.

6 · AI Engineering I: LLMs, Vector DBs, RAG, Prompting, MCP & Agents

The modern differentiator. Systems increasingly must expose data as context, run LLMs, be reachable as MCP servers, and be optimized on tokens / context / cost. Start after your HLD foundation (Phases 1–5) is solid.

Track note (generalist/backend Staff): treat serving/inference, vector search, retrieval/RAG, evaluation, agents, and cost/latency as core — these show up in backend Staff interviews now. Treat fine-tuning / training internals (Phase 7) as an optional ML-infra track — go deep only if targeting ML-platform/AI-infra roles. Nothing here is removed; just prioritize by your target.

6a — LLM Fundamentals

What is an LLM (capabilities & limits) HIGH · FUNDAMENTAL

Learn (inline): a next-token predictor trained on huge corpora; produces fluent, probabilistic output. Strengths: language understanding/generation, summarization, extraction, classification, code, reasoning (with prompting). Caveats (design around these): hallucination (confident wrong answers), no built-in fresh/private knowledge (→ RAG), stateless per call (you resend context), non-deterministic, token/context limits, cost + latency, prompt-injection risk. LLM system design is largely about compensating for these.

Resources: 🎥 Karpathy — Intro to LLMs; 📄 Anthropic / OpenAI model docs.

✅ You know it when: you can list the failure modes an LLM system must engineer around and which technique addresses each.

Tokenization & Embeddings HIGH · FUNDAMENTAL

Learn (inline): Tokens — text is split into subword tokens (e.g. BPE); billing and context limits are counted in tokens, and latency grows with token count (~4 chars ≈ 1 token in English). Embeddings — text → dense vector capturing meaning; semantically similar text → nearby vectors (cosine/dot similarity). Embeddings power semantic search, RAG retrieval, clustering, dedup, recommendations.

Resources: 📄 OpenAI/Cohere embeddings guides; 📄 Hugging Face — embeddings; 🎥 Jay Alammar — The Illustrated Word2Vec.

✅ You know it when: you can explain why "count tokens, not characters" for cost/limits and how embeddings enable semantic search.
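Semantic search over embeddings reduces to "rank by vector similarity". The sketch below uses hand-made 3-dimensional vectors as stand-ins for real embeddings (which come from an embedding model and have hundreds or thousands of dimensions); the document titles and numbers are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec: Sequence[float], docs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document ids ordered from most to least similar."""
    return sorted(
        docs, key=lambda d: cosine_similarity(query_vec, docs[d]), reverse=True
    )


# Toy "embeddings": the first axis loosely means "refunds", the last "hardware".
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.2, 0.1],
    "gpu pricing": [0.0, 0.1, 0.9],
}
query = [0.9, 0.2, 0.0]  # stand-in for the embedding of "how do I get my money back"
ranked = rank(query, docs)
```

The query shares no keywords with "refund policy", yet it ranks first because the vectors point in nearly the same direction. This exhaustive scan is exact; ANN indexes exist because it stops scaling once the corpus is large.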

Transformer Architecture & Attention MEDIUM · INTERMEDIATE

Learn (inline): the architecture behind modern LLMs. Self-attention lets each token weigh all others → captures long-range context in parallel (vs sequential RNNs) — the breakthrough that made scaling work. Enough intuition: tokens → embeddings + positional info → stacked attention + feed-forward layers → next-token distribution. You don't need the full math, but know why attention matters and what "context" means mechanically.

Resources: 📄 Jay Alammar — The Illustrated Transformer; 📜 Attention Is All You Need; 🎥 3Blue1Brown — attention/transformers; 🎥 Karpathy — Let's build GPT.

✅ You know it when: you can explain attention's role and why transformers parallelize/scale better than RNNs.

Generation Controls (Temperature, Top-p/k, Sampling) MEDIUM · FUNDAMENTAL

Learn (inline): temperature (↑ = more random/creative, ↓ = more deterministic/focused; low temperature reduces variance but does not by itself make answers more factual — use low for extraction/classification, higher for ideation), top-p (nucleus) / top-k sampling, max tokens, stop sequences. Also know how input length matters: long and short inputs are both processed within the same context window, but long context costs more and can dilute attention ("lost in the middle").

Resources: 📄 Anthropic / OpenAI API parameter docs; 📄 promptingguide.ai — settings.

✅ You know it when: you can pick temperature/top-p for a factual extraction task vs a brainstorming task.
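What temperature and top-p do to the next-token distribution can be seen on a three-token toy vocabulary. This is a sketch of the general technique; hosted APIs apply these controls server-side and may differ in details, and the logits here are made up.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (treat 0 as greedy argmax)")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> List[float]:
    """Keep the smallest set of most-likely tokens whose mass reaches top_p,
    then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample(logits: List[float], temperature: float, top_p: float,
           rng: random.Random) -> int:
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]                      # hypothetical scores for 3 tokens
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked on token 0
hot = softmax_with_temperature(logits, 2.0)   # much flatter
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), 0.8)
choice = sample(logits, temperature=0.2, top_p=0.9, rng=random.Random(0))
```

At temperature 0.2 the top token holds over 99% of the mass, so sampling is effectively greedy. At 2.0 the alternatives become live options. Top-p 0.8 removes the least likely token entirely, which trims the long tail without fixing how many tokens survive.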

Context Window & Optimization HIGH · INTERMEDIATE

Learn (inline): the token budget for input+output per call. Design levers to fit/cheapen it: trim/summarize history, retrieve only relevant chunks (RAG), compress/route, cache (below), structure prompts so key info isn't "lost in the middle". Cost & latency scale with tokens → context engineering is a core skill.

Resources: 📄 Anthropic — long-context tips / prompt caching; 📜 Lost in the Middle (Liu et al.).

✅ You know it when: you can reduce a bloated prompt's tokens while preserving answer quality, and explain the cost/latency impact.

6b — Vector Databases & RAG

Vector Databases (deep dive) HIGH · DEEP-DIVE

Deep-dive map: [ ] why (store + search embeddings by similarity at scale) · [ ] similarity metrics (cosine/dot/euclidean) · [ ] ANN (approximate nearest neighbor) — exact is too slow at scale · [ ] index types: HNSW (graph, great recall/latency), IVF, PQ (compression) · [ ] recall vs latency vs memory tradeoffs · [ ] metadata filtering + hybrid (vector + keyword/BM25) · [ ] chunking strategy's effect on retrieval quality · [ ] options: pgvector, Pinecone, Milvus, Weaviate, Qdrant, FAISS · [ ] scaling (sharding, replication of the index).

Resources: 📄 Pinecone Learning Center: https://www.pinecone.io/learn/; 📜 Malkov & Yashunin — HNSW; 📄 pgvector / FAISS docs.

✅ You know it when: you can explain HNSW vs IVF, the recall/latency tradeoff, and design hybrid search with metadata filters.

RAG — Retrieval-Augmented Generation (deep dive) HIGH · DEEP-DIVE

Why deep-dive: the dominant pattern for grounding LLMs in fresh/private data — and a top LLM-design interview topic.

Deep-dive map: [ ] ingestion pipeline: loading → chunking (size/overlap/semantic) → metadata → embedding → index · [ ] query pipeline: embed query → retrieve top-k (ANN) → optional rerank (cross-encoder) → assemble context → generate · [ ] chunking & retrieval quality (the make-or-break) · [ ] hybrid retrieval (vector + keyword) + reranking · [ ] evaluation (retrieval precision/recall, faithfulness, answer relevance) · [ ] failure modes (missing/irrelevant chunks, context overflow, stale index → CDC re-embed) · [ ] cost/latency (retrieval + generation).

Resources: 📜 Lewis et al. — Retrieval-Augmented Generation (2020); 📄 LlamaIndex / LangChain RAG docs; 📄 Anthropic — Contextual Retrieval; 📄 Pinecone — RAG guides.

✅ You know it when: you can design an end-to-end RAG system (ingestion + query + rerank + eval) and diagnose why answers are wrong (retrieval vs generation).
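The chunking step of the ingestion pipeline is simple to sketch. This version splits on whitespace and counts words, which is an assumption for readability; real pipelines usually count tokens and often split on sentence or section boundaries instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-size sliding-window chunks, with `overlap` words shared between
    neighbouring chunks so a fact cut at a boundary survives intact in one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


document = " ".join(f"w{i}" for i in range(10))  # placeholder text: w0 ... w9
chunks = chunk_words(document, chunk_size=4, overlap=1)
```

With size 4 and overlap 1 the ten words become three chunks, and each chunk starts with the last word of the previous one. Size and overlap are the main tuning knobs: small chunks retrieve precisely but lose context, large chunks carry context but dilute the embedding and use up the context window.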

RAG Patterns (Agentic RAG · Graph RAG · Hybrid) MEDIUM · INTERMEDIATE

Learn (inline): Agentic RAG — an agent decides when/what to retrieve, can multi-step and use tools. Graph RAG — retrieve over a knowledge graph for multi-hop/relational questions (better global reasoning than flat chunks). Hybrid — combine dense (vector) + sparse (BM25) + rerank. Query rewriting, HyDE, multi-query.

Resources: 📄 Microsoft — GraphRAG; 📄 LangChain/LlamaIndex advanced RAG docs.

✅ You know it when: you can pick flat vs graph vs agentic RAG for a given question type.

LLM Caching (incl. Semantic Caching) HIGH · INTERMEDIATE

Learn (inline): LLM calls are slow + expensive → cache aggressively. Exact cache (same prompt → stored response). Prompt/prefix caching (reuse KV cache for a shared prompt prefix — big system-prompt savings; provider feature). Semantic cache (embed the query; if a past query is similar enough, return its answer — big cost win, risk of wrong hits → tune threshold). Essential to avoid runaway bills.

Resources: 📄 Anthropic — Prompt caching; 📄 GPTCache docs (semantic caching).

✅ You know it when: you can design a semantic cache with a similarity threshold and explain prompt-prefix caching's savings.
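A semantic cache is a nearest-neighbour lookup with a cutoff. In this sketch the caller supplies query vectors directly (in practice they come from an embedding model), the lookup is a linear scan rather than an ANN index, and the 0.95 threshold is an arbitrary illustrative value that would need tuning against real traffic.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class SemanticCache:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.entries: List[Tuple[Sequence[float], str]] = []

    def get(self, query_vec: Sequence[float]) -> Optional[str]:
        """Return the cached answer of the most similar past query, if it is
        similar enough; otherwise None (cache miss, so call the LLM)."""
        best_score, best_answer = -1.0, None
        for vec, answer in self.entries:
            score = _cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query_vec: Sequence[float], answer: str) -> None:
        self.entries.append((query_vec, answer))


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "Refunds take 5 business days.")  # hypothetical stored answer

paraphrase_hit = cache.get([0.99, 0.10])  # near-duplicate query: served from cache
unrelated_miss = cache.get([0.50, 0.80])  # different question: falls through
```

The threshold is the whole design: too low and users get confidently wrong answers to a different question, too high and the hit rate collapses to that of an exact cache. Scoping entries per tenant and per prompt version avoids cross-context hits.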

6c — Prompting & Context Engineering

Prompting & Context Engineering HIGH · INTERMEDIATE

Learn (inline): system prompt (role/rules/constraints), few-shot examples, conversation history, output-format instructions (JSON/schema), chain-of-thought / step-by-step, delimiters, grounding with retrieved context. Context engineering = assembling the right minimal context (instructions + examples + retrieved facts + history) under window/cost limits. Prompt-injection defense starts here (separate trusted instructions from untrusted input).

Resources: 📄 Anthropic — Prompt engineering docs; 📄 OpenAI prompting guide; 📄 promptingguide.ai; 📄 Lilian Weng — prompt engineering.

✅ You know it when: you can structure a robust system prompt with few-shot + format constraints and reason about injection risk.

6d — MCP (Model Context Protocol)

MCP Servers HIGH · INTERMEDIATE

What & why: MCP is an open protocol that lets LLM apps connect to tools/data sources in a standard way — so your system can be exposed as an MCP server that any MCP-capable LLM/agent can call (tools, resources, prompts). Increasingly, "make your service AI-consumable" = "expose an MCP server."

Learn (inline): MCP concepts — tools (functions the model can call), resources (data it can read), prompts; client/server/transport; how it differs from ad-hoc function calling (standardized, discoverable, reusable). 🛠️ Build a small MCP server and connect a client.

Resources: 📄 Model Context Protocol docs: https://modelcontextprotocol.io/; 📄 Anthropic — MCP intro; 🛠️ MCP SDK quickstarts.

✅ You know it when: you can stand up an MCP server exposing a tool + resource and connect an MCP client to it.

6e — Agentic AI (basics)

AI Agents & the Agentic Loop HIGH · INTERMEDIATE

Learn (inline): an agent = LLM + tools + memory + a loop that lets it decide actions, not just answer. Agentic loop: perceive/observe → plan/reason → act (call a tool/MCP) → observe result → repeat until goal/stop. Key pieces: tool/function calling, short- & long-term memory, planning, stopping criteria + budgets. Prefer the simplest thing that works (often a workflow, not a full agent).

Resources: 📄 Anthropic — Building Effective Agents (canonical): https://www.anthropic.com/research/building-effective-agents; 📄 Lilian Weng — LLM-Powered Autonomous Agents.

✅ You know it when: you can diagram the agentic loop and decide when a workflow suffices vs a true agent.
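The loop itself fits in a few lines once the model is abstracted away. In this sketch `policy` is a plain function standing in for the LLM call, the decision format (`{"type": "tool" | "final", ...}`) is invented for the example, and the single `add` tool is hypothetical; the part worth copying is the step budget and the way tool errors are fed back as observations.

```python
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[str, str]]


def run_agent(policy: Callable[[History], dict],
              tools: Dict[str, Callable[[str], str]],
              goal: str,
              max_steps: int = 5) -> Tuple[Optional[str], History]:
    history: History = [("goal", goal)]
    for _ in range(max_steps):
        decision = policy(history)                       # plan / reason
        if decision["type"] == "final":
            return decision["answer"], history           # stop: goal reached
        tool = tools.get(decision["name"])
        if tool is None:
            observation = f"error: unknown tool {decision['name']}"
        else:
            try:
                observation = tool(decision["arg"])      # act
            except Exception as exc:
                observation = f"error: {exc}"            # let the model see failures
        history.append((decision["name"], observation))  # observe
    return None, history  # budget exhausted: stop instead of looping forever


tools = {"add": lambda arg: str(sum(int(x) for x in arg.split(",")))}


def scripted_policy(history: History) -> dict:
    """Stand-in for an LLM: call the tool once, then answer with its result."""
    if len(history) == 1:
        return {"type": "tool", "name": "add", "arg": "2,3"}
    return {"type": "final", "answer": history[-1][1]}


def looping_policy(history: History) -> dict:
    """A misbehaving policy that never finishes."""
    return {"type": "tool", "name": "add", "arg": "1,1"}


answer, trace = run_agent(scripted_policy, tools, goal="add 2 and 3")
runaway_answer, runaway_trace = run_agent(looping_policy, tools, goal="?", max_steps=3)
```

The second run shows why the cap matters: a policy that never emits a final answer is cut off after three tool calls. A fixed workflow is this same code with the `policy` replaced by hard-coded steps, which is often all a task needs.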

Agent Patterns (ReAct · Reflection · Plan-and-Execute) HIGH · INTERMEDIATE

Learn (inline): ReAct — interleave reasoning + acting (think → tool → observe → think…). Reflection/Reflexion — the agent critiques its own output and retries (self-correction). Plan-and-Execute — plan the full task upfront, then execute steps (cheaper, more controllable). Also: routing, prompt-chaining, orchestrator-workers, evaluator-optimizer (Anthropic's workflow taxonomy).

Resources: 📜 ReAct (Yao et al.); 📜 Reflexion; 📄 Anthropic — Building Effective Agents; 📄 LangGraph docs.

✅ You know it when: you can choose ReAct vs Plan-and-Execute vs a fixed workflow for a task and justify it.

Multi-Agent Systems & Production Concerns MEDIUM · DEEP-DIVE

Learn (inline): multi-agent — specialized agents (planner/researcher/coder/critic) coordinated (orchestrator or handoffs); more capable but more cost/latency/failure surface — use only when a single agent can't handle the task. Production: handling tool failures/retries, timeouts & budgets (token/step caps to avoid runaway loops), determinism/observability of agent runs, hallucination mitigation (grounding/RAG, tool-use over recall, verification/critic steps, constrained outputs), guardrails (Phase 7), human-in-the-loop.

Resources: 📄 Anthropic — multi-agent research system write-up; 📄 LangGraph / OpenAI Agents SDK / CrewAI docs.

✅ You know it when: you can decide single vs multi-agent, cap runaway loops, and list concrete hallucination mitigations.

6f — Classic ML System Design (complements the LLM track)

ML System Design Framework MEDIUM · INTERMEDIATE

Learn (inline): framing → data → features → model → serving → monitoring → feedback loop. Clarify the ML objective vs business metric; offline vs online metrics; latency/throughput targets; build-vs-buy. The interview framework mirrors HLD but adds data/feature/model/serving/monitoring stages.

Resources: 📖 Chip Huyen — Designing Machine Learning Systems (canonical); 📖 Machine Learning System Design Interview (Aminian & Xu).

✅ You know it when: you can run an ML problem through framing → data → features → model → serving → monitoring.

Data/Feature Pipelines, Serving & Experimentation MEDIUM · INTERMEDIATE

Learn (inline): feature stores (offline/online parity, avoid training/serving skew); batch vs online vs streaming inference; model registry/versioning/rollback; A/B testing, shadow & canary models, offline↔online metric gaps; monitoring drift/decay + guardrail metrics.

Resources: 📖 Chip Huyen — DMLS; 📄 Eugene Yan blog (applied ML/recsys).

✅ You know it when: you can design a serving path with a feature store and an A/B + drift-monitoring plan.

Classic ML SD Problems MEDIUM · DEEP-DIVE

Problems (build these): Recommendation system (candidate generation + ranking, two-tower, embeddings) · News/feed ranking · Ad CTR prediction · Search ranking · Fraud detection · Spam/abuse detection · Image/visual search.

Resources: 📖 ML System Design Interview (Aminian & Xu); 📄 Eugene Yan / company ML blogs (Netflix, Meta, Pinterest recsys).

✅ You know it when: you can design a recommender with candidate-gen + ranking + features + serving + metrics.

7 · AI Engineering II (Advanced): Evals, Guardrails, Fine-Tuning, Serving & LLMOps

The deep end — reliability, safety, cost, and running LLMs in production. Explore after Phase 6 lands. This is where Staff/Architect-level AI system design is decided.

Evaluation Harness & Evals HIGH · INTERMEDIATE

Learn (inline): you can't improve what you can't measure — and LLM outputs are non-deterministic. Build evals: a curated dataset + graders. Grader types: exact/heuristic, LLM-as-judge (another model scores against a rubric — watch its biases), human review. For RAG: retrieval precision/recall, faithfulness/groundedness, answer relevance (e.g. Ragas). Run evals in CI to catch regressions when prompts/models change.

Resources: 📄 Hamel Husain — Your AI product needs evals (hamel.dev); 📄 OpenAI Evals; 📄 Ragas (RAG eval); 📄 Anthropic — evaluating outputs.

✅ You know it when: you can design an eval set + grader for a RAG bot and gate deploys on it.

Agent Harness MEDIUM · INTERMEDIATE

Learn (inline): the scaffolding to run agents reliably — the loop runtime, tool registry/execution, memory, step/token budgets & timeouts, tracing of each step, retries/fallbacks, sandboxing tool calls, replay/determinism for debugging. Turns a fragile prompt-loop into an operable system.

Resources: 📄 LangGraph / OpenAI Agents SDK / DSPy docs; 📄 Anthropic — Building Effective Agents.

✅ You know it when: you can describe the components that make an agent debuggable and bounded in production.

Guardrails HIGH · INTERMEDIATE

Learn (inline): validation around the model — input guards (block prompt injection, PII, unsafe requests; separate trusted instructions from untrusted content) and output guards (schema/format validation, toxicity/PII filters, groundedness checks, refusal handling). OWASP Top 10 for LLM Apps (prompt injection, insecure output handling, data leakage, excessive agency…). Deterministic checks + model-based checks; fail closed on high-risk actions.

Resources: 📄 OWASP Top 10 for LLM Applications; 📄 NVIDIA NeMo Guardrails / Guardrails AI; 📄 Lakera — prompt injection.

✅ You know it when: you can layer input/output guardrails on an LLM feature and defend against prompt injection + insecure output handling.

Fine-Tuning: When, LoRA & QLoRA MEDIUM · DEEP-DIVE

Optional for generalist/backend Staff — know the decision (prompt→RAG→finetune) at app depth; go deep on internals only for ML-platform/AI-infra roles.

Learn (inline): decision order — prompting → RAG → fine-tuning (fine-tune for behavior/format/style/domain tone, not for injecting fresh facts; use RAG for knowledge). Full fine-tuning is expensive; PEFT trains a small number of params: LoRA (inject low-rank adapter matrices, freeze base weights — cheap, swappable), QLoRA (quantize base to 4-bit + LoRA → fine-tune big models on one GPU). Also: instruction tuning, RLHF/DPO (alignment) at a high level; data quality > quantity; eval before/after.

Resources: 📜 LoRA (Hu et al.) + 📜 QLoRA (Dettmers et al.); 📄 Hugging Face PEFT docs; 📄 OpenAI/Anthropic — when to fine-tune vs RAG.

✅ You know it when: you can decide prompt-vs-RAG-vs-finetune for a use case and explain how LoRA/QLoRA cut cost.
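The cost argument for LoRA and QLoRA is plain arithmetic. The sketch below counts trainable parameters for one weight matrix and estimates weight memory at different precisions; the 4096 x 4096 layer, rank 8, and 7-billion-parameter model are illustrative sizes, and the memory figure covers weights only (no activations, optimizer state, or quantization metadata).

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Trainable parameters for one d_out x d_in weight matrix.

    Full fine-tuning updates every entry of W. LoRA freezes W and trains
    B (d_out x rank) and A (rank x d_in), so the update is B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
trainable_fraction = lora / full

fp16_gb = weight_memory_gb(7e9, 16)  # base model held in 16-bit
int4_gb = weight_memory_gb(7e9, 4)   # base model quantized to 4-bit (QLoRA-style)
```

For this layer LoRA trains about 0.4% of the parameters, and the adapter is small enough to swap per customer or task. Quantizing the frozen base to 4 bits cuts its weight memory by 4x, which is what brings larger models within reach of a single GPU.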

LLM Serving & Inference (deep dive) HIGH · DEEP-DIVE

Why deep-dive: the infra that makes LLMs fast/cheap enough to serve — core to designing an LLM product's backend.

Deep-dive map: [ ] prefill vs decode phases; why decode is memory-bandwidth-bound · [ ] KV cache (and why long context is expensive) · [ ] continuous/dynamic batching (vLLM) · [ ] PagedAttention (KV-cache paging) · [ ] quantization (INT8/4-bit: GPTQ/AWQ) for throughput/memory · [ ] speculative decoding · [ ] GPU economics (VRAM limits, batching for utilization, tokens/sec) · [ ] serving stacks (vLLM, TGI, TensorRT-LLM) · [ ] self-host vs API tradeoffs.

Resources: 📜 PagedAttention / vLLM paper + vLLM docs; 📄 NVIDIA TensorRT-LLM; 📄 Hugging Face — LLM inference optimization.

✅ You know it when: you can explain why KV cache + continuous batching matter and estimate GPU needs for a target tokens/sec.
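Two back-of-envelope formulas cover most sizing questions here: KV-cache size per sequence, and a rough ceiling on single-stream decode speed when every generated token requires reading all weights from GPU memory. The model shape (32 layers, 32 KV heads, head dimension 128, 16-bit values) and the 1 TB/s bandwidth figure are illustrative assumptions, and the decode bound ignores KV-cache reads, kernel overheads, and batching.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_value: int = 2) -> int:
    """Keys and values (hence the factor 2) for every layer, head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


def decode_tokens_per_sec_upper_bound(weight_bytes: float,
                                      mem_bandwidth_bytes_per_s: float) -> float:
    """Memory-bound ceiling for one sequence: each token streams the weights once."""
    return mem_bandwidth_bytes_per_s / weight_bytes


per_token = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=1, batch=1)
one_sequence = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                              seq_len=4096, batch=1)
batch_of_16 = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                             seq_len=4096, batch=16)

# 14 GB of 16-bit weights on a GPU with roughly 1 TB/s of memory bandwidth.
single_stream_ceiling = decode_tokens_per_sec_upper_bound(14e9, 1e12)
```

Under these assumptions each token costs 0.5 MiB of cache, a 4,096-token sequence costs 2 GiB, and sixteen such sequences need 32 GiB on top of the weights. The decode ceiling of about 71 tokens/s per stream is the reason batching matters: one pass over the weights can serve many sequences at once, so throughput grows with batch size until cache memory runs out.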

Cost & Latency Optimization · HIGH · INTERMEDIATE

Learn (inline): model routing/cascades (cheap model first, escalate hard queries to a bigger one), prompt/prefix caching + semantic caching (Phase 6), streaming responses (perceived latency), context trimming/compression, batching, smaller/distilled models where they suffice, an LLM gateway (central routing, caching, rate limiting, cost tracking, fallbacks across providers). Track cost per request/user; set budgets.
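
A minimal sketch of a gateway that combines an exact-match cache with a two-model cascade. The `Gateway` class, the stub models, the confidence signal, and the unit costs are all hypothetical; a real router would use a learned or heuristic quality signal, and a semantic cache would match on embeddings rather than normalized strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A model takes a prompt and returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class Gateway:
    small: ModelFn
    large: ModelFn
    small_cost: float
    large_cost: float
    threshold: float = 0.8
    cache: Dict[str, str] = field(default_factory=dict)
    spent: float = 0.0

    def ask(self, prompt: str) -> str:
        key = prompt.strip().lower()          # crude normalization for the cache key
        if key in self.cache:
            return self.cache[key]            # cache hit: no model call, no cost
        answer, confidence = self.small(prompt)
        self.spent += self.small_cost
        if confidence < self.threshold:       # escalate only the hard queries
            answer, _ = self.large(prompt)
            self.spent += self.large_cost
        self.cache[key] = answer
        return answer


def small_model(prompt: str) -> Tuple[str, float]:
    """Stub: pretends to be confident only on short prompts."""
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.3
    return f"small:{prompt}", confidence


def large_model(prompt: str) -> Tuple[str, float]:
    """Stub for the expensive model."""
    return f"large:{prompt}", 1.0
```

Note the tradeoff the cascade accepts: an escalated query pays for both models and adds latency, so the approach only wins when most traffic stays on the small model.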

Resources: 📄 RouteLLM; 📄 Anthropic prompt caching; 📄 LiteLLM / LLM gateway docs.

✅ You know it when: you can cut an LLM feature's cost/latency with routing + caching + streaming and justify each.

LLM Observability & LLMOps · HIGH · INTERMEDIATE

Learn (inline): production LLM ops — tracing every call/agent step (prompts, tokens, latency, cost, tool calls), prompt/version management, online quality monitoring + feedback collection (thumbs, implicit signals), drift/regression detection, dataset curation from prod traffic to grow evals, cost dashboards, alerting. The LLM analog of Phase-4 observability.
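
As a sketch of what "trace every call and close the loop" can look like, the wrapper below records prompt version, token counts, latency, and cost per call, then surfaces thumbs-down traces as eval candidates. `Tracer`, `Trace`, the per-1k-token prices, and the whitespace token count are hypothetical stand-ins; a real setup would use the provider's reported usage and a tracing tool such as those listed in the resources.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trace:
    prompt_version: str
    prompt: str
    output: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    cost_usd: float
    feedback: Optional[int] = None  # +1 thumbs up, -1 thumbs down, None if unrated


class Tracer:
    def __init__(self, usd_per_1k_in: float, usd_per_1k_out: float) -> None:
        self.usd_per_1k_in = usd_per_1k_in
        self.usd_per_1k_out = usd_per_1k_out
        self.traces: List[Trace] = []

    def call(self, model: Callable[[str], str], prompt: str, prompt_version: str) -> Trace:
        start = time.perf_counter()
        output = model(prompt)
        latency = time.perf_counter() - start
        n_in, n_out = len(prompt.split()), len(output.split())  # crude token proxy
        cost = n_in / 1000 * self.usd_per_1k_in + n_out / 1000 * self.usd_per_1k_out
        trace = Trace(prompt_version, prompt, output, n_in, n_out, latency, cost)
        self.traces.append(trace)
        return trace

    def record_feedback(self, index: int, score: int) -> None:
        self.traces[index].feedback = score

    def eval_candidates(self) -> List[Trace]:
        """Negatively rated production traces are the first things to add to the eval set."""
        return [t for t in self.traces if t.feedback == -1]

    def total_cost(self) -> float:
        return sum(t.cost_usd for t in self.traces)
```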

Resources: 📄 LangSmith / Langfuse / Arize Phoenix / Helicone docs; 📄 "LLMOps" overviews.

✅ You know it when: you can instrument an LLM app with tracing + feedback + cost tracking and close the loop into evals.

🧩 LLM / GenAI System Design Problems · HIGH · DEEP-DIVE

How to approach: HLD framework + AI-specific layers (retrieval, model serving/routing, caching, guardrails, evals, cost/latency, observability).

Problems (design these):

ChatGPT-like chat service — multi-turn, streaming (SSE), history/context management, rate limiting, scale, cost.
RAG over a company knowledge base — ingestion + freshness (CDC re-embed), hybrid retrieval + rerank, guardrails, evals, multi-tenant isolation.
AI coding assistant — context gathering, tool/function calling, latency, safety.
Semantic search — embeddings + vector DB + hybrid + reranking at scale.
LLM content moderation — classification + guardrails + human-in-the-loop.
Real-time voice AI agent — STT → LLM → TTS pipeline, streaming, latency budget, barge-in.
LLM-powered recommendations — blend embeddings/LLM with classic recsys.
Agentic workflow platform — orchestration, tool sandboxing, budgets, observability, scaling agents.

Resources: 📖 ML System Design Interview (Aminian & Xu); 📄 Chip Huyen — Building LLM applications for production (huyenchip.com); 📄 provider architecture guides; 🎥 recent AI-system-design talks.

✅ You know it when: you can design a scalable, grounded, guarded, observable RAG/chat/agent product with explicit cost/latency tradeoffs.

Phase 8: Staff-Level Operability & Judgment (capacity · cost · evolution · failures · review)

The 20% that separates strong Senior from Staff. Not more theory — the judgment layer: sizing systems, spending money wisely, evolving what already exists, learning from real outages, and critiquing designs. Study after the technical phases; apply to every HLD.

Capacity Planning & Performance · HIGH · DEEP-DIVE

Why deep-dive: Staff engineers reason in throughput/latency/utilization curves, not vibes. "Your solution does 50k QPS — does it hold at peak? where's the knee?"

Deep-dive map:

[ ] Little's Law — L = λ × W (concurrency = arrival rate × latency); the master equation for sizing queues, thread pools, and connection pools.
[ ] Utilization & saturation — the USE method (Utilization/Saturation/Errors); why latency explodes as utilization → 100% (queueing theory: the "hockey stick" past ~70–80%).
[ ] Throughput vs latency curves — find the knee; run at ~60–70% utilization to keep headroom.
[ ] Bottleneck analysis — Amdahl's law; find the single constraint (CPU/mem/IO/network/lock/downstream) and fix that.
[ ] p99 budgeting — allocate a latency budget across hops (each service/DB/cache gets a slice); fan-out multiplies tail risk.
[ ] Load & stress testing — closed vs open models; k6/Locust/wrk/JMeter; test to failure to find limits.
[ ] Graceful degradation — shed load, serve stale/partial, degrade features (not crash) under overload; brownout over blackout.
[ ] Headroom & autoscaling — scale before saturation; account for warm-up, cold starts, and downstream limits.
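
Three of the items above reduce to one-line formulas, sketched here with assumed numbers. The M/M/1 model (single server, random arrivals and service times) is a deliberate simplification used only to show the shape of the latency curve, and the fan-out estimate assumes independent backend calls; the function names are hypothetical.

```python
import math


def required_concurrency(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's Law: L = lambda x W."""
    return arrival_rate_per_s * latency_s


def pool_size(arrival_rate_per_s: float, latency_s: float,
              target_utilization: float = 0.7) -> int:
    """Size a pool so steady-state load sits at the target utilization, not 100%."""
    return math.ceil(required_concurrency(arrival_rate_per_s, latency_s) / target_utilization)


def mm1_latency(service_time_s: float, utilization: float) -> float:
    """Mean time in an M/M/1 system: W = S / (1 - rho). Blows up as rho -> 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1 - utilization)


def fanout_tail_probability(n_calls: int, percentile: float = 0.99) -> float:
    """Chance that at least one of n independent calls lands beyond the percentile."""
    return 1 - percentile ** n_calls


# Assumed example: 2,000 requests/s at 50 ms each, 10 ms service time.
print("connections needed:", pool_size(2000, 0.05))
for rho in (0.5, 0.7, 0.9, 0.95):
    print(f"utilization {rho:.0%}: mean latency {mm1_latency(0.010, rho) * 1000:.0f} ms")
print(f"fan-out of 100: {fanout_tail_probability(100):.0%} of requests hit a p99-slow call")
```

The last line is why per-hop p99 budgets need to be much tighter than the end-to-end target once a request fans out.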

Resources: 📖 Brendan Gregg — Systems Performance + USE method (brendangregg.com); 📖 The Art of Capacity Planning (Allspaw); 📜 Little's Law; 📖 Google SRE — Addressing Cascading Failures, Handling Overload.

✅ You know it when: given a QPS + latency target you can size pools/instances with Little's Law, find the bottleneck, allocate a p99 budget, and describe how the system degrades (not dies) at 2× load.

Cost-Aware Architecture · HIGH · DEEP-DIVE

Why deep-dive: modern Staff interviews ask "this costs $4M/year — cut it." Cost is a first-class design axis, not an afterthought.

Deep-dive map:

[ ] Cloud pricing intuition — compute vs storage vs network; on-demand vs reserved/savings-plans vs spot (interruptible, 60–90% cheaper — for stateless/batch).
[ ] Egress & cross-AZ/region traffic — the silent budget killer; keep chatty traffic in-AZ; colocate; watch inter-region replication + CDN egress.
[ ] Storage tiering — hot/warm/cold/archive (S3 Standard→IA→Glacier); lifecycle policies; compression/columnar for analytics.
[ ] Cache economics — a cache hit is cheaper + faster than a DB/LLM call; compute the break-even hit rate; but cache infra and staleness have costs too.
[ ] CPU vs memory vs IO tradeoffs — right-size instance families; memory-bound vs CPU-bound; avoid over-provisioning.
[ ] Autoscaling economics — scale-to-zero (serverless) vs steady fleets; cost of cold starts vs idle capacity.
[ ] Data volume levers — sampling, aggregation/rollups, retention cuts, tiering (huge for logs/metrics/observability bills).
[ ] AI/GPU cost — GPU utilization/batching, quantization, model routing/cascades (cheap→expensive), prompt/semantic caching, token reduction, self-host vs API break-even.
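
The break-even calculations mentioned above can be sketched as simple arithmetic. All prices, volumes, and discounts below are made-up inputs for illustration, the helper names are hypothetical, and the self-host figure ignores utilization, engineering time, and redundancy, so it understates the real threshold.

```python
def cache_break_even_hit_rate(requests_per_month: float, backend_cost_per_call: float,
                              cache_cost_per_month: float,
                              cache_hit_cost_per_call: float = 0.0) -> float:
    """Hit rate at which the savings from avoided backend calls equal the cache bill."""
    saving_per_hit = backend_cost_per_call - cache_hit_cost_per_call
    if saving_per_hit <= 0:
        return float("inf")  # a hit saves nothing, so the cache never pays for itself
    return cache_cost_per_month / (requests_per_month * saving_per_hit)


def blended_compute_cost(on_demand_hourly: float, hours: float,
                         spot_fraction: float, spot_discount: float) -> float:
    """Cost of a fleet where spot_fraction of capacity runs on discounted spot."""
    spot_hourly = on_demand_hourly * (1 - spot_discount)
    return hours * (spot_fraction * spot_hourly + (1 - spot_fraction) * on_demand_hourly)


def self_host_break_even_tokens(gpu_cost_per_month: float,
                                api_usd_per_1m_tokens: float) -> float:
    """Monthly token volume above which a fixed GPU bill beats per-token API pricing."""
    return gpu_cost_per_month / api_usd_per_1m_tokens * 1_000_000


# Assumed inputs.
print(f"cache pays off above {cache_break_even_hit_rate(10_000_000, 0.001, 2000):.0%} hit rate")
print(f"half-spot fleet: ${blended_compute_cost(1.0, 720, 0.5, 0.7):.0f}/month per instance")
print(f"self-host break-even: {self_host_break_even_tokens(3000, 2.0):.2e} tokens/month")
```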

Resources: 📄 AWS/GCP Well-Architected — Cost Optimization pillar; 📄 FinOps Foundation; 📄 Corey Quinn — Last Week in AWS (egress/cost intuition).

✅ You know it when: given a costly architecture you can identify the top 3 cost drivers (often egress, over-provisioned compute, unbatched GPU/LLM calls, log volume) and propose concrete cuts with the tradeoffs.

Evolving Existing Systems (a first-class theme) · HIGH · DEEP-DIVE

Why deep-dive: most real work — and increasingly interviews — is "we already have X; how do we evolve it safely?" not greenfield.

Deep-dive map:

[ ] Zero-downtime schema migration — expand → migrate → contract (parallel change): add new column/table, dual-write, backfill, switch reads, remove old. Online DDL tools (gh-ost, pt-online-schema-change, PlanetScale).
[ ] Backward/forward compatibility — additive changes only in a release; never break the wire while old + new run together; schema/versioning (Protobuf/Avro compat rules).
[ ] API evolution — versioning, deprecation policy, tolerant readers, additive fields.
[ ] Strangler Fig — incrementally route slices of a legacy system to the new one behind a facade until the old one is dead (Phase 4d).
[ ] Data migrations — dual-writes + backfill + reconciliation + shadow reads to verify before cut-over.
[ ] Rolling upgrades — N and N+1 coexist; make deploys reversible.
[ ] Feature flags, dark launches & shadow traffic — decouple deploy from release; dark-launch (run new path, discard output) and shadow (mirror prod traffic) to de-risk before switching users.
[ ] Rollback strategy — every migration needs a reverse; forward-fixes vs rollbacks.
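
The expand → migrate → contract sequence can be sketched end to end against an in-memory SQLite database. The `users` table and the `name` → `full_name` rename are hypothetical, and SQLite stands in for a production database only to keep the example runnable; the contract step (dropping the old column) is left as a comment because it should happen only after reads have switched and the reconciliation check stays at zero.

```python
import sqlite3


def make_legacy_db() -> sqlite3.Connection:
    """Hypothetical starting point: a table that only has the old column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, 6)])
    conn.commit()
    return conn


def expand(conn: sqlite3.Connection) -> None:
    """Expand: add the new column next to the old one; old readers are unaffected."""
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def write_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Dual-write: during the migration every write populates both columns."""
    conn.execute("INSERT OR REPLACE INTO users (id, name, full_name) VALUES (?, ?, ?)",
                 (user_id, name, name))
    conn.commit()


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Migrate: copy old rows in small batches to keep locks short. Returns rows copied."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET full_name = name WHERE id IN ("
            "SELECT id FROM users WHERE full_name IS NULL AND name IS NOT NULL LIMIT ?)",
            (batch_size,))
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def mismatches(conn: sqlite3.Connection) -> int:
    """Reconciliation: rows where old and new columns disagree (NULL-safe compare)."""
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE full_name IS NOT name").fetchone()[0]


# Contract (not run here): once reads use full_name and mismatches() stays 0,
# stop writing `name`, then drop it in a later release.
```

Each step is independently deployable and reversible up to the contract step, which is the point of the pattern: rollback is just "stop using the new column."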

Resources: 📄 Martin Fowler — ParallelChange (expand-contract), StranglerFigApplication, DarkLaunching; 📖 Refactoring Databases (Ambler & Sadalage); 📄 GitHub/Stripe/Shopify engineering migration write-ups.

✅ You know it when: you can lay out a zero-downtime plan to rename a hot column / split a table / replace a service, with dual-writes, backfill, shadow verification, and a rollback path.

Architecture Evolution Case Studies (why systems change) · HIGH · DEEP-DIVE

Why deep-dive: interviews increasingly ask "you have system X, it's hurting — how do you evolve it?" Walking the trigger → change → new tradeoffs is more instructive than greenfield. For each: what pain triggered it, the migration path, what you gained, what new problems you took on.

Journeys to be able to narrate:

[ ] Monolith → Modular Monolith — trigger: tangled code, slow builds; change: enforce module boundaries/ownership without distribution; gain: clarity; still one deploy. (Do this before microservices.)
[ ] Modular Monolith → Microservices — trigger: team/scale/deploy-independence pain; change: extract bounded contexts, own data per service, async where possible; new problems: network, distributed data, ops, sagas.
[ ] Single-region → Multi-region — trigger: latency for global users / DR; change: geo-routing, replication (active-passive → active-active), data residency; new problems: conflict resolution, consistency, cost (egress).
[ ] PostgreSQL → Sharded Postgres (Citus; Vitess is the MySQL equivalent) — trigger: single-node write/storage ceiling; change: pick a shard key, route, resolve cross-shard queries/txns; new problems: rebalancing, scatter-gather, hot shards.
[ ] CRUD DB → Kafka + CQRS/Event-Sourcing — trigger: many consumers, audit, read/write contention; change: events as source of truth, projections for reads, outbox; new problems: eventual consistency, replay, complexity.
[ ] Add a cache after scaling pain — trigger: DB read overload/latency; change: cache-aside + invalidation strategy; new problems: staleness, stampede, hot keys, another failure mode.
[ ] Sync → async processing — trigger: slow user requests, spiky load; change: queue + workers + idempotency; new problems: at-least-once, ordering, DLQs, observability of async flows.
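
For the sync → async journey, the piece that most often goes wrong is handling redelivery under at-least-once semantics. Below is a minimal sketch of an idempotent consumer that deduplicates on a message ID; the class, the in-memory set, and the balance-credit side effect are hypothetical. In a real system the dedup record and the side effect would need to commit in the same transaction, otherwise a crash between them reintroduces the duplicate.

```python
from typing import Dict, List, Set, Tuple

Message = Tuple[str, str, int]  # (message_id, account, amount)


class IdempotentConsumer:
    def __init__(self) -> None:
        self.processed: Set[str] = set()      # stand-in for a durable dedup table
        self.balances: Dict[str, int] = {}

    def handle(self, message: Message) -> bool:
        """Apply the message once. Returns False when it was a duplicate delivery."""
        message_id, account, amount = message
        if message_id in self.processed:
            return False                       # redelivery: acknowledge, do nothing
        self.balances[account] = self.balances.get(account, 0) + amount
        self.processed.add(message_id)
        return True


def deliver_at_least_once(consumer: IdempotentConsumer, messages: List[Message]) -> int:
    """Simulate a broker that redelivers every message once. Returns applied count."""
    applied = 0
    for message in messages + messages:        # every message arrives twice
        if consumer.handle(message):
            applied += 1
    return applied
```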

Resources: 📄 company migration write-ups (Shopify modular monolith; Uber/Airbnb service extraction; Notion/Figma sharding; Stripe/GitHub migrations); 📄 Martin Fowler — MonolithFirst / StranglerFig.

✅ You know it when: for each journey you can state the triggering pain, the safe migration path, and the new tradeoffs it introduces — and argue when NOT to make the jump (e.g. microservices too early).

Learning from Outages (Production Failure Studies) · MEDIUM · DEEP-DIVE

Why deep-dive: operational judgment comes from studying how real systems break. For each: what failed → why → the deeper cause → prevention → the tradeoff.

Studies (read the postmortems):

[ ] GitHub Oct 2018 (~24h of degraded service) — a brief network partition triggered automated MySQL failover across regions, and cross-region replication then left the clusters inconsistent; lesson: failover automation & topology assumptions.
[ ] AWS us-east-1 — S3 2017 (typo in a runbook command, capacity assumptions), Kinesis 2020 (thread/OS limits cascading), Dec 2021 (internal network); lesson: control-plane blast radius, region as a failure domain, hidden dependencies.
[ ] Cloudflare 2019 — a bad regex caused global CPU exhaustion; lesson: global config push = global blast radius; guardrails on rollouts.
[ ] Slack (Jan 2021) / Roblox (Oct 2021) / Facebook (Oct 2021, BGP) — dependency loops, config/DNS/BGP, tools-down-during-outage; lesson: circular dependencies & recovery tooling.
[ ] Kafka/Cassandra partition & replication incidents — split-brain, unclean leader election, data loss on misconfig.

Cross-cutting lessons: blast-radius control (cells/regions), cascading failure (retry storms, thundering herds), config changes as the #1 outage cause, dependency cycles, "recovery must not depend on the thing that's down."
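
Retry storms are the amplifier in several of these incidents, and the standard client-side mitigation is capped exponential backoff with jitter plus a hard attempt limit. The sketch below uses the "full jitter" variant; the function names, defaults, and the injected `sleep` and `rng` parameters are illustrative choices, and production code would retry only errors known to be transient rather than every exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base_s: float = 0.1, cap_s: float = 5.0,
                  rng: random.Random = random.Random()) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2^attempt)]."""
    return rng.uniform(0, min(cap_s, base_s * 2 ** attempt))


def call_with_retries(fn: Callable[[], T], max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: random.Random = random.Random()) -> T:
    """Call fn, retrying failures with jittered backoff; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:                      # sketch only: narrow this in real code
            if attempt == max_attempts - 1:
                raise                          # bounded retries: give up, don't pile on
            sleep(backoff_delay(attempt, rng=rng))
    raise RuntimeError("unreachable: max_attempts must be >= 1")
```

The jitter spreads clients out so they do not all retry at the same instant, and the attempt cap bounds how much extra load a failing dependency receives.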

Resources: 📄 the official postmortems (each company's blog); 📄 danluu.com — postmortem collection; 📜 Richard Cook — How Complex Systems Fail; 📖 Google SRE — postmortem culture.

✅ You know it when: for each study you can state the trigger, the amplifier, and the prevention — and spot the same risk pattern in a design you're reviewing.

Design-Review & Architecture-Critique Exercises · HIGH · DEEP-DIVE

Why deep-dive: Staff work is often reviewing designs, not producing them. Interviews increasingly hand you an architecture and ask you to critique it.

Learn (inline) — the review checklist: for any given architecture, systematically hunt for →

Bottlenecks (the single busiest resource; fan-out multipliers)
SPOFs (any component without redundancy; hidden shared dependencies)
Consistency issues (dual-writes, cache/DB divergence, read-after-write gaps)
Scalability limits (statefulness, hot keys/partitions, unbounded growth, coordination points)
Operational risks (deploy/rollback, config blast radius, observability gaps, on-call load)
Failure modes (what happens when each dependency is down/slow? retry storms? graceful degradation?)
Security/tenancy (authz gaps, cross-tenant leakage, secrets)
Cost (obvious waste: egress, over-provisioning, unbatched calls)

Exercise format: take a design (yours or a reference) and produce a written critique using the checklist. Examples: "Critique this Instagram feed design — find the bottlenecks, SPOFs, consistency issues, scale limits, and operational risks." Do this for feed, chat, payments, a RAG service.

Resources: 📖 Google SRE / design-doc review culture; 📄 architecture-review checklists; 📖 Fundamentals of Software Architecture.

✅ You know it when: handed any architecture you can produce a structured critique (bottleneck/SPOF/consistency/scale/ops/failure/security/cost) in minutes.

Phase 9: Practice, Resources & Interview Strategy

Turning knowledge into offers — and into staff-level judgment on the job.

Tradeoff Articulation · HIGH · INTERMEDIATE

Learn (inline): senior/staff signal is not naming components — it's reasoning about tradeoffs out loud. For every choice: state 2–3 options, the axes (consistency vs availability, latency vs cost, complexity vs flexibility, build vs buy), pick one, and say what you're giving up + when you'd revisit. Anchor to the requirements/SLOs. Quantify with your estimation numbers.

Resources: 📖 Alex Xu Vol 1 (framework); 🎥 mock-interview channels (observe how strong candidates justify).

✅ You know it when: you instinctively present "option A vs B, I pick A because X, tradeoff is Y" instead of a single answer.

Behavioral + System Design (Staff signals) · MEDIUM · FUNDAMENTAL

Learn (inline): staff/architect rounds probe scope, ambiguity, influence, and impact, not just tech. Use STAR; show driving cross-team decisions, handling ambiguity, mentoring, and business-aware tradeoffs. In design rounds, demonstrate ownership of failure modes, operability, and migration/rollout — not just the happy path.

Resources: 📖 Staff Engineer (Will Larson) + staffeng.com; 📄 company leveling guides.

✅ You know it when: you can tell 5–6 crisp STAR stories showing staff-level scope and tie designs to business impact.

Curated Practice Ladder · HIGH · INTERMEDIATE

Learn (inline): don't study passively — build & solve. LLD: 2–3 problems per tier from Phase 2c. HLD: 2–3 per tier from Phase 4h. ML/LLM: 2–3 from Phases 6f & 7. For each: do it timed, then compare to a reference/blog, note gaps, redo the weak part. Track with the checkboxes.

Resources: 📄 System Design Primer; 📖 Alex Xu Vol 1 & 2; 📄 awesome-low-level-design; 🎥 walkthrough channels.

✅ You know it when: you've completed the target problems per tier and can redo any from a blank page.

HLD Practice by Pattern (learn patterns, not solutions) · HIGH · INTERMEDIATE

Learn (inline): don't memorize "Design Twitter" — recognize the reusable pattern so any variant is solvable. Group your practice:

Cache-heavy / read-scaling: CDN, product catalog, news feed, timeline — patterns: fan-out-on-write vs read, multi-layer cache, denormalized read models.
Event-driven / async: notifications, email/SMS, analytics ingestion, order processing — patterns: queues/log, idempotency, outbox, DLQ, backpressure.
Search: search engine, document/typeahead search — patterns: inverted index, ranking, CDC-fed freshness, vector/hybrid.
Realtime / stateful connections: chat, presence, collaborative editor, live dashboards, gaming — patterns: WebSockets + pub/sub backplane, OT/CRDT, sharded connection state.
Geospatial / proximity: ride-matching, nearby, maps — patterns: geohash/quadtree/S2, sharding by geo.
Transactional / consistency-critical: payments, ticket booking, inventory, wallet — patterns: strong consistency, locking/optimistic, sagas, idempotency, exactly-once effects.
AI: RAG service, agent platform, inference gateway, semantic search — patterns: retrieval + serving + caching + guardrails + evals + cost/latency.

✅ You know it when: given a new prompt you first name its pattern(s), then reuse the pattern's toolkit instead of starting from scratch.

Interview Execution Playbook · HIGH · INTERMEDIATE

Learn (inline): the timeboxed flow to run every HLD round (~45 min) — narrate throughout:

Requirements (~5 min) — functional + non-functional (scale, latency, consistency, availability); who/what/how-much; explicitly de-scope.
Estimation (~3 min) — QPS (avg + peak), storage/day, bandwidth, cache size; state assumptions.
API design (~3 min) — key endpoints/contracts.
Data model (~5 min) — entities, storage choice + why (tie to access patterns).
High-level architecture (~8 min) — LB, services, DB, cache, queue, CDN; draw the happy path.
Deep-dive (~10 min) — the 1–2 hard parts the interviewer cares about; show depth.
Scale & bottlenecks (~5 min) — sharding, caching, replication, hot spots; where it breaks + fixes.
Reliability & operations (~3 min) — failure modes, degradation, deploy/rollback, and always close with observability: What are the SLIs (and SLOs)? What dashboards would you build? What alert wakes someone at 3am? Which metric reveals the bottleneck? How would you debug a p99 regression? (Most candidates finish at the architecture and stop — running this operational close-out is a strong Staff signal.)
Tradeoffs & wrap-up (~3 min) — restate key decisions, what you'd revisit, what you're accepting.
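
The estimation step is mostly arithmetic worth having at your fingertips. A small sketch follows, with every input (10M daily users, 20 requests each, a 3× peak factor, 10% writes, 1 KB per write) an assumption you would state out loud in the interview; the `estimate` helper is hypothetical.

```python
from typing import Dict

SECONDS_PER_DAY = 86_400


def estimate(daily_active_users: int, requests_per_user_per_day: float,
             peak_factor: float, write_fraction: float,
             bytes_per_write: int) -> Dict[str, float]:
    """Back-of-the-envelope traffic and storage numbers from a few stated assumptions."""
    requests_per_day = daily_active_users * requests_per_user_per_day
    avg_qps = requests_per_day / SECONDS_PER_DAY
    writes_per_day = requests_per_day * write_fraction
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * peak_factor,
        "write_qps": writes_per_day / SECONDS_PER_DAY,
        "storage_bytes_per_day": writes_per_day * bytes_per_write,
        "storage_bytes_per_year": writes_per_day * bytes_per_write * 365,
    }


numbers = estimate(daily_active_users=10_000_000, requests_per_user_per_day=20,
                   peak_factor=3, write_fraction=0.1, bytes_per_write=1000)
print(f"avg {numbers['avg_qps']:,.0f} QPS, peak {numbers['peak_qps']:,.0f} QPS")
print(f"storage {numbers['storage_bytes_per_day'] / 1e9:.0f} GB/day, "
      f"{numbers['storage_bytes_per_year'] / 1e12:.1f} TB/year")
```

In the room, round aggressively (a day is roughly 10^5 seconds) and say which assumption the result is most sensitive to.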

⚠️ Common mistakes (avoid these): jumping to components before clarifying requirements/estimation · no data model · ignoring failure modes & observability · hand-waving scale numbers · staying silent (not narrating) · over-engineering · not justifying tradeoffs · forgetting cost.

Resources: 📖 Alex Xu Vol 1 (framework chapter); 📄 System Design Primer; 🎥 recorded mock interviews.

✅ You know it when: you can run the 9-step flow on any prompt within time, always covering estimation, data model, bottlenecks, ops, and tradeoffs.

Mock Cadence & Common Failure Modes · MEDIUM · FUNDAMENTAL

Learn (inline): do timed mocks (peers / Pramp / interviewing.io). Failure modes to avoid: jumping to architecture before requirements/estimation, no data model, ignoring bottlenecks/failure modes, hand-waving scale numbers, silence (not narrating), over-engineering. Keep a personal rubric and score each mock.

Resources: 📄 interviewing.io / Pramp; 🎥 recorded mock system-design interviews.

✅ You know it when: you consistently cover requirements → estimation → API → data → HLD → deep-dive → bottlenecks → reliability/ops → tradeoffs within time.

Company-Specific Patterns · LOW · FUNDAMENTAL

Learn (inline): tailor emphasis — big-tech infra roles weight distributed-systems depth + scale; product companies weight product sense + pragmatic tradeoffs; AI-first companies weight LLM/agent design + evals/cost. Read the target company's engineering blog to mirror their stack and vocabulary.

Resources: 📄 target company engineering blogs; 📄 levels.fyi / leveling guides.

✅ You know it when: you can adjust your design emphasis to the company's domain and cite their real systems.

📚 Master Resource List (bookmark this) · HIGH · FUNDAMENTAL

Books: DDIA (Kleppmann) ⭐ · System Design Interview Vol 1 & 2 (Alex Xu) ⭐ · Grokking the System Design Interview (Educative) · Designing ML Systems (Chip Huyen) ⭐ · ML System Design Interview (Aminian & Xu) · Building Microservices (Sam Newman) · Release It! (Nygard) · Google SRE Book (free) · Head First Design Patterns · Java Concurrency in Practice (Goetz) · Clean Architecture (Martin) · Staff Engineer (Larson).

Courses / free: MIT 6.824 Distributed Systems (now numbered 6.5840) ⭐ · OSTEP (OS, free) · High Performance Browser Networking (free) · Karpathy "Neural Networks: Zero to Hero" (LLMs).

Papers (Papers We Love): GFS · MapReduce · Bigtable · Dynamo ⭐ · Spanner/TrueTime ⭐ · Kafka / The Log ⭐ · Raft ⭐ · Chubby · Cassandra · Zanzibar (authz) · The Tail at Scale · Attention Is All You Need ⭐ · RAG · LoRA / QLoRA · PagedAttention (vLLM).

Blogs/newsletters: ByteByteGo ⭐ · Martin Fowler · Martin Kleppmann · High Scalability · Arpit Bhayani (Asli Engineering) · Eugene Yan · Chip Huyen · Lilian Weng ⭐ · engineering blogs (Netflix, Uber, Meta, LinkedIn, Stripe, Cloudflare, Discord).

YouTube: Hussein Nasser ⭐ (networking/DBs) · Gaurav Sen · ByteByteGo · Arpit Bhayani · CodeKarle · System Design Interview · 3Blue1Brown (transformers) · Karpathy (LLMs).

GitHub: donnemartin/system-design-primer ⭐ · ashishps1/awesome-low-level-design · binhnguyennus/awesome-scalability · Papers We Love.

✅ You know it when: you have a personal, prioritized reading queue mapped to your weak phases.

Suggested Study Plan (adjust to your timeline) · MEDIUM · FUNDAMENTAL

Learn (inline): an example ~12-week pace (compress/expand as needed) —

Wk 1–2: Phase 1 (foundations) + start Phase 2 (LLD principles).
Wk 3: Phase 2 patterns + 3–4 LLD problems.
Wk 4–5: Phase 3 (DBs) — internals, transactions, + 2 DB deep-dives/week.
Wk 6–7: Phase 4 (distributed systems + architecture); begin HLD problems (Tier 1–2).
Wk 8: Phase 5 (big data/CDC) + HLD Tier 3.
Wk 9–10: Phase 6 (LLM/RAG/agents/MCP) + classic ML SD.
Wk 11: Phase 7 (advanced AI) + GenAI design problems.
Wk 11–12: Phase 8 (operability & judgment) + Phase 9 — mocks, HLD Tier 4, tradeoff drills, resource gaps.
Study depth-first: for each card, read the resources, then self-test against "✅ you know it when". Revisit weak cards; mark [x] only when you could teach it.

✅ You know it when: you have a dated plan and are marking cards [x] honestly.

🏁 Capstone: End-to-End Design Reviews · HIGH · DEEP-DIVE

Why: the final integration — each capstone forces you to apply every dimension at once, the way a real Staff loop (design + follow-ups) does. Do these last, timed, then critique your own solution with the Phase-8 review checklist.

For each capstone, produce all of: ① requirements (functional + non-functional) · ② estimation · ③ API + data model · ④ high-level architecture · ⑤ key tradeoffs (with alternatives) · ⑥ scaling & bottlenecks · ⑦ cost analysis · ⑧ reliability & failure modes · ⑨ observability (SLIs/dashboards/alerts) · ⑩ migration/evolution (how would v1→v2 go?).

The 10 capstones (spanning the patterns):

Chat platform (realtime, WebSockets, fan-out, presence)
Ride-sharing (geospatial matching, surge, streams)
Payments/wallet (strong consistency, idempotency, sagas, audit)
Video streaming (CDN, encoding pipeline, storage tiers, cost)
Multi-tenant SaaS (tenant isolation, noisy-neighbor, per-tenant limits/cost)
Real-time analytics (ingestion, stream processing, approximation, lambda/kappa)
News feed (fan-out-on-write vs read, caching, ranking)
AI inference gateway (model routing, batching, caching, rate limiting, cost/latency, observability)
RAG platform (ingestion + freshness/CDC, hybrid retrieval + rerank, guardrails, evals, multi-tenant)
Collaborative editor (OT/CRDT, realtime, conflict resolution, offline)

Resources: 📖 Alex Xu Vol 1 & 2; 📄 System Design Primer; 📄 company engineering blogs; 🎥 mock interviews for follow-up style.

✅ You know it when: you can take any capstone from blank page to all 10 dimensions in ~45–60 min, then critique your own design for bottlenecks/SPOFs/consistency/cost/ops.
