
AWS Solutions Architect Associate SAA-C03

Master AWS architecture for the SAA-C03 exam: EC2 purchasing options, Auto Scaling, S3 storage classes, VPC connectivity, RDS Multi-AZ, Aurora Global Database, DynamoDB, IAM, KMS, Lambda, SQS/SNS fan-out, and disaster recovery strategies with RTO/RPO tradeoffs.

Modules: 8
Duration: 40 hours
Level: intermediate
Exam code: SAA-C03
Exam questions: 65
Passing score: 720 / 1000
Exam duration: 130 min
Exam fee (USD): $150
Validity: 3 years

Why earn the AWS Solutions Architect Associate?

SAA-C03 is the single most-recognized AWS certification. It's a judgment exam — most questions have two plausible answers, and the right one depends on a constraint like "most cost-effective" or "least operational overhead".

  • Most-requested AWS certification on job postings — the de-facto AWS Associate baseline
  • Validates architecture decisions across 200+ AWS services (compute, storage, networking, databases, security)
  • Tests the Well-Architected pillars: operational excellence, security, reliability, performance, cost, sustainability
  • Opens cloud architect, solutions engineer, and senior DevOps roles — median $130–$160k for SAA-certified architects in the US
  • Foundation for the AWS Pro tier (SAP-C02 Solutions Architect Professional) and most AWS Specialty exams
  • 3-year validity with free recertification via any higher-tier AWS exam
Exam strategy: SAA-C03 questions almost always have two reasonable answers. Find the constraint keyword first ("MOST cost-effective", "LEAST operational overhead", "MINIMUM downtime", "no code changes"), then eliminate options that violate it. When two options remain, prefer fully managed services over self-managed, serverless over provisioned, and existing AWS features over custom Lambda glue.

SAA-C03 exam domains

Four domains spanning the Well-Architected pillars. Secure Architectures and Resilient Architectures together make up 56% of the exam — invest study time accordingly.

Domain 1 — Design Secure Architectures 30%
Domain 2 — Design Resilient Architectures 26%
Domain 3 — Design High-Performing Architectures 24%
Domain 4 — Design Cost-Optimized Architectures 20%

8 modules · ~40 hours

Each module maps to one or more exam domains. Work through them in order, or focus on weak areas using the practice test to guide you. Every lesson ends with a mini-quiz so you can verify recall before moving on.

01

AWS Fundamentals & the Exam Blueprint · 3 lessons

Before building architectures, lock in the mental model of AWS's global infrastructure, the exam's judgment-question grading, and the Shared Responsibility Model. These foundations decide which answer is "right" on questions where multiple options would technically work — a pattern SAA-C03 leans on heavily.

region-az-edge local-zones outposts constraint-keywords well-architected shared-responsibility rto-rpo exam-strategy
~3h
Lesson 1.1 Global infrastructure — Region, AZ, Edge, Local Zones

Every SAA-C03 architecture decision lives somewhere in the AWS global topology. The exam tests your ability to pick the right placement primitive — Region vs AZ vs Edge vs Local Zone vs Outpost — based on latency, residency, and HA constraints.

Key concepts
  • Region: geographic cluster of 3+ AZs (e.g., us-east-1, eu-west-3). Independent control plane. Pick on latency to users, data-residency law (GDPR → eu-*), and service availability (newest services land in us-east-1 first).
  • Availability Zone (AZ): one or more physically separate data centers within a Region, connected by < 2 ms low-latency fiber. AZs share NO failure domain — power, cooling, networking are independent. HA = span at least 2 AZs.
  • Edge Location: 600+ points-of-presence used by CloudFront, Route 53, AWS WAF, Global Accelerator. Used for caching, DNS, and DDoS absorption — NOT for running general compute.
  • Local Zones: AWS-managed metro extensions of a parent Region (e.g., us-west-2-lax-1a Los Angeles). Sub-10ms latency to local users. Used for media, real-time gaming, and latency-sensitive enterprise apps.
  • AWS Outposts: a rack of AWS hardware shipped to your data center, controlled by an AWS Region. Use when latency to on-prem must be sub-millisecond or data legally cannot leave premises.
  • Wavelength Zones: AWS compute embedded inside 5G carrier networks. Single-digit-ms latency for mobile users. Niche but exam-favored when "mobile + ultra-low latency" appears.
Concrete example

A French ad-tech platform serves bidders in Paris with a 50 ms latency budget and stores PII subject to GDPR. Design: deploy the workload in eu-west-3 (Paris), which keeps data in France and stays inside the 50 ms latency budget. Use a CloudFront distribution with Paris edge locations for static asset caching. If a 5G mobile-app launch needs < 10 ms inference latency, add an AWS Wavelength Zone on a local carrier's 5G network, provided one is available for that carrier and metro. No need for a Local Zone: the Paris Region already covers the user base.

Key takeaway: Region for data residency + service availability. AZ span for HA. Edge for caching + DDoS. Local Zone / Outpost / Wavelength only when latency demands single-digit ms.
⚡ Mini-quiz
Drill global-infrastructure placement scenarios → study mode (10 questions).
Lesson 1.2 Reading SAA-C03 questions — constraint keywords and judgment

SAA-C03 is a judgment exam. Almost every question has two answers that would technically work; the right one is determined by a constraint keyword. Spotting and prioritizing that keyword is the single biggest score lever.

Key concepts
  • "MOST cost-effective": points to auto-scaling, Spot Instances, Reserved Instances / Savings Plans, S3 lifecycle policies, Aurora Serverless v2, Lambda + DynamoDB On-Demand. Eliminate any answer that over-provisions.
  • "LEAST operational overhead": points to fully managed services. Fargate over EC2, Aurora over RDS, DynamoDB over self-managed Cassandra, API Gateway + Lambda over ALB + EC2, AWS Backup over custom snapshot scripts.
  • "MINIMAL downtime" or "no downtime": points to Multi-AZ, blue/green deployments, Aurora Global Database, Route 53 failover routing, EC2 Auto Scaling with health-check-based replacement.
  • "No code changes": rules out anything requiring app modification. RDS Multi-AZ (same endpoint) works; RDS read replicas don't (different endpoint). RDS Proxy fits when "lots of Lambda connections + no code changes".
  • RTO / RPO numbers: map directly to DR strategy. RPO seconds → Aurora Global Database / DynamoDB Global Tables. RPO minutes-hours → backup + restore. RTO minutes → warm standby. RTO seconds → active-active.
  • Two-pass scan: first read identifies the constraint; second read maps each answer to that constraint. Flag and skip when both passes don't surface a clear winner — come back at the end.
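
The RTO / RPO bullet above can be sketched as a small lookup. This is a minimal sketch: the function name `dr_strategy` and the cutoffs (one minute, one hour) are assumptions chosen for illustration, not AWS-defined thresholds.

```python
def dr_strategy(rto_seconds: float, rpo_seconds: float) -> str:
    """Map recovery targets to a DR strategy family (illustrative cutoffs)."""
    MINUTE, HOUR = 60, 3600  # assumed boundaries between "seconds", "minutes", "hours"

    # RTO of seconds: traffic must already be served from more than one Region.
    if rto_seconds < MINUTE:
        return "active-active (multi-Region)"

    # RPO of seconds needs continuous replication (e.g. Aurora Global Database,
    # DynamoDB Global Tables); RTO of minutes needs a scaled-down copy already running.
    if rpo_seconds < MINUTE or rto_seconds < HOUR:
        return "warm standby with cross-Region replication"

    # Everything looser than that can be met from backups.
    return "backup and restore"


if __name__ == "__main__":
    for rto, rpo in [(30, 5), (600, 5), (4 * 3600, 3600)]:
        print(f"RTO={rto}s RPO={rpo}s -> {dr_strategy(rto, rpo)}")
```

On an exam question, the same order of checks applies: read the RTO first, then the RPO, and only fall back to backup + restore when both are measured in hours.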
Concrete example

Question: "A company runs a Java app on EC2 connecting to RDS MySQL. Lambda functions also need to query the same DB. Connection counts are spiking during peaks, causing errors. What is the MOST cost-effective solution with the LEAST operational overhead?" Two constraints are stacked. Eliminate "rewrite Java app to use connection pooling" (operational overhead). Eliminate "migrate to Aurora Serverless" (cost overhead for a working system). Right answer: RDS Proxy, which gives fully managed connection pooling with no application code change (clients only point at the proxy endpoint) and is billed per vCPU-hour of the underlying DB instance.

Key takeaway: the constraint keyword decides. Find it first, eliminate violators, then pick the most-managed remaining option. Flag genuine ties and revisit.
⚡ Mini-quiz
Practise constraint-keyword scenarios → quick quiz (5 questions).
Lesson 1.3 Shared Responsibility Model and the Well-Architected pillars

SAA-C03 leans on the Well-Architected Framework as a vocabulary — Operational Excellence, Security, Reliability, Performance, Cost, Sustainability. The Shared Responsibility Model decides who owns which control. Both surface as "which AWS service / which configuration" questions.

Key concepts
  • AWS responsibility (security OF the cloud): physical data centers, hardware, hypervisor, network infrastructure, and patches for the managed-service control plane (RDS engine, Lambda runtime, S3 storage layer).
  • Customer responsibility (security IN the cloud): guest OS patches on EC2, application code, network configuration (Security Groups, NACLs), IAM permissions, data-at-rest and in-transit encryption choices, S3 bucket policies.
  • Managed-service line shifts: on EC2 you patch the OS; on RDS AWS patches the DB engine; on Aurora Serverless you only configure capacity ranges; on DynamoDB / Lambda you only own the data and code.
  • Well-Architected pillars: Operational Excellence (run + monitor), Security (defense in depth), Reliability (recover from failure), Performance Efficiency (right-size), Cost Optimization (eliminate waste), Sustainability (carbon footprint).
  • Trusted Advisor + Well-Architected Tool: AWS-provided audit tools that surface recommendations against the pillars. The tool generates a workload review you walk through with stakeholders.
  • Service Quotas: per-Region soft limits AWS enforces (e.g., default 5 VPCs per region). Increase via the Service Quotas console well before you hit them — production outages from quota exhaustion are common.
Concrete example

A SOC2 audit asks "who is responsible for patching the database?". For an RDS PostgreSQL instance: AWS patches the engine binaries during your maintenance window; YOU control which maintenance window (Reliability pillar) and YOU encrypt the data at rest with KMS (Security pillar). For a self-managed PostgreSQL on EC2: YOU patch the engine and the guest OS, kernel included. Choosing RDS over self-managed shifts the patching responsibility AWS-ward, in explicit alignment with "least operational overhead".

Key takeaway: the Shared Responsibility line shifts based on the service tier. Higher up the managed-service stack = less you patch. Well-Architected pillars are the exam's vocabulary for justifying choices.
⚡ Mini-quiz
Drill Shared Responsibility + Well-Architected scenarios → study mode (10 questions).
02

Compute — EC2, Auto Scaling & Load Balancers · 3 lessons

Compute is the largest dollar line item and one of the most-tested SAA-C03 areas. Master the five EC2 purchasing options (and which question keyword points at each), Auto Scaling policies including warm pools, and the ALB-vs-NLB-vs-GWLB decision tree. Block storage choices (gp3, io2, st1, instance store) round out the module.

on-demand reserved-instances savings-plans spot auto-scaling target-tracking alb-nlb-gwlb ebs-instance-store
~6h
Lesson 2.1 EC2 purchasing options — On-Demand, RI, SP, Spot, Dedicated

Pricing-model choice is the single biggest compute cost lever. The exam tests when to use each model and how to combine them safely — for example, RI baseline + Spot for burst + On-Demand fill.

Key concepts
  • On-Demand: pay per second (Linux/Windows) or per hour. No commitment. Highest price. Best for short-lived, unpredictable, or initial development workloads.
  • Reserved Instances (1 / 3 year): 30–72% discount. Standard RI = max discount, locked to instance family (size flex within family). Convertible RI = exchangeable for different family/OS, smaller discount. Almost always prefer Compute Savings Plans over Convertible RI today.
  • Savings Plans: commit to $/hr compute spend. Compute SP covers EC2 + Fargate + Lambda across any family / region / OS, which makes it the most flexible option (SageMaker usage is covered by its own, separate Savings Plan). EC2 Instance SP commits to one family in one region: bigger discount, less flexible.
  • Spot Instances: up to 90% off, 2-minute interruption notice. Best for fault-tolerant batch, stateless processing, CI runners, big-data, ASG fallback. Diversify across instance types + AZs via Spot Fleet / mixed-instances ASG.
  • Dedicated Hosts vs Dedicated Instances: Dedicated Host = physical server visible to you, used for per-socket / per-core licensing (Oracle, Windows DC, SQL Server BYOL). Dedicated Instances = single-tenant hardware but AWS still manages placement.
  • Capacity Reservations: reserve specific instance type/AZ capacity without a pricing commitment. Combine with Savings Plans to lock both capacity AND price.
Concrete example

A SaaS runs a 24/7 web tier (baseline 12 c5.large, peak 36) and a nightly batch ETL (200 c5.xlarge for 4 hours). Optimal pricing: 3-year Compute Savings Plan covering 12 c5.large-equivalents = ~60% off baseline. Peak burst served by On-Demand. ETL runs on Spot via an ASG with mixed-instance policy (c5, c5n, c6i across 3 AZs), saving ~85% on the batch tier. Total: roughly 60–80% lower than All-On-Demand at these discount rates, because the batch tier dominates instance-hours; the exact figure depends on how many hours a day the peak lasts.
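
A blended-savings figure like the one in this example can be sanity-checked with a few lines of arithmetic. The sketch below works in c5.large-equivalent instance-hours so no list prices are needed. It assumes a c5.xlarge costs twice a c5.large, a 730-hour month, the ETL running every night, and the fleet sitting at 36 instances for the whole peak window; `peak_hours_per_day` is a hypothetical input the example does not state.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def blended_savings(peak_hours_per_day: float,
                    sp_discount: float = 0.60,
                    spot_discount: float = 0.85) -> float:
    """Fraction saved versus an all-On-Demand bill, in c5.large-equivalent hours."""
    days = HOURS_PER_MONTH / 24

    baseline = 12 * HOURS_PER_MONTH                  # 12 c5.large running 24/7
    burst = (36 - 12) * peak_hours_per_day * days    # extra instances during peak
    etl = 200 * 2 * 4 * days                         # 200 c5.xlarge (2 units each) x 4 h nightly

    on_demand = baseline + burst + etl
    optimized = (baseline * (1 - sp_discount)        # Savings Plan on the baseline
                 + burst                             # peak stays On-Demand
                 + etl * (1 - spot_discount))        # batch tier on Spot
    return 1 - optimized / on_demand


if __name__ == "__main__":
    for hours in (0, 4, 8, 24):
        print(f"peak {hours:>2} h/day -> {blended_savings(hours):.0%} saved")
```

Trying a few peak durations shows how sensitive the headline number is to the one input the scenario leaves open.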

Key takeaway: Compute SP for steady baseline, On-Demand for unpredictable peak, Spot for fault-tolerant batch, Dedicated Hosts only for BYOL licensing. Always wire Spot interruption handling.
⚡ Mini-quiz
Drill EC2 pricing-model scenarios → study mode (10 questions).
Lesson 2.2 Auto Scaling policies, lifecycle hooks, warm pools

Auto Scaling matches capacity to demand. SAA-C03 tests which policy fits the load profile, how lifecycle hooks enable zero-downtime ops, and when warm pools rescue boot-slow workloads from spike lag.

Key concepts
  • Target tracking: declare a target metric value (e.g., CPU = 50%). AWS adjusts capacity to maintain it. Simplest and recommended default for most web tiers.
  • Step scaling: CloudWatch alarm-driven step adjustments (+1 instance at 60% CPU, +3 at 80%). Use when scale response must be non-linear or when you need finer control than target tracking.
  • Scheduled scaling: calendar-based adjustments (e.g., scale to 20 instances at 8:55 AM weekdays for market open). Predictable load only.
  • Predictive scaling: ML-driven forecast that pre-scales for recurring patterns. Best for daily / weekly cyclic traffic. Needs at least 24 hours of metric history before it can forecast and draws on up to 14 days of history.
  • Lifecycle hooks: pause instances at Pending:Wait (before joining target group) or Terminating:Wait (before tear-down). Use SSM Run Command or Lambda to warm caches, drain connections, or capture logs. Critical for stateful apps.
  • Warm pools: pre-initialized stopped instances ready for near-instant launch. Eliminates boot+warm time when responding to spikes. Pay only for EBS volume costs while stopped.
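
The difference between step scaling and target tracking is easier to see as arithmetic. The sketch below uses the step thresholds from the bullet above (+1 at 60% CPU, +3 at 80%). The proportional formula for target tracking is a simplification assumed here for illustration; AWS does not publish the exact algorithm, and both function names are hypothetical.

```python
import math


def step_adjustment(cpu_percent: float) -> int:
    """Instances to add under the example step policy (+1 at 60%, +3 at 80%)."""
    # Check the highest threshold first so the largest applicable step wins.
    steps = [(80.0, 3), (60.0, 1)]
    for threshold, instances_to_add in steps:
        if cpu_percent >= threshold:
            return instances_to_add
    return 0


def target_tracking_desired(current_capacity: int, metric: float, target: float) -> int:
    """Approximate desired capacity that would bring the metric back to target."""
    # Simplified proportional model: capacity scales with metric / target.
    return max(1, math.ceil(current_capacity * metric / target))


if __name__ == "__main__":
    print(step_adjustment(65.0))                    # one step up
    print(step_adjustment(85.0))                    # larger step
    print(target_tracking_desired(4, 75.0, 50.0))   # 4 instances at 75% CPU, target 50%
```

Step scaling needs you to pick every threshold and increment yourself; target tracking only needs the target value, which is why it is the usual default.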
Concrete example

An e-commerce ASG runs 4–20 instances on Java app servers with 5-minute boot + cache-warm time. Black Friday spikes cause 10× traffic in 60 seconds — instances can't catch up. Fix: enable a warm pool of 8 stopped instances at desired state. Add a lifecycle hook on Pending:Wait running an SSM Automation that pre-warms the local cache before the instance attaches to the ALB. Target tracking policy CPU=60% with 60s cooldown. Result: spike response drops from 6 min to ~45s.

Key takeaway: target tracking by default. Lifecycle hooks for graceful join/leave. Warm pools when boot time matters. Predictive scaling when traffic is cyclic.
⚡ Mini-quiz
Practise Auto Scaling scenarios → quick quiz (5 questions).
Lesson 2.3 Load balancers and EC2 storage choices

Picking the right ELB and the right disk for an EC2 workload is a recurring SAA-C03 scenario. ALB / NLB / GWLB each have a clear sweet spot; gp3 / io2 / st1 / instance store map directly to IOPS, throughput, and persistence requirements.

Key concepts
  • Application Load Balancer (ALB): Layer 7 HTTP/HTTPS. Path-based + host-based routing, Lambda targets, redirects, WAF integration, native gRPC. The default choice for web apps and microservices.
  • Network Load Balancer (NLB): Layer 4 TCP/UDP/TLS. Ultra-low latency (~100µs), static IPs / Elastic IPs, source-IP preservation. Used for high-throughput TCP services and as the back-end for PrivateLink providers.
  • Gateway Load Balancer (GWLB): Layer 3 for inserting third-party virtual appliances (firewalls, IDS/IPS, packet inspection). Uses GENEVE encapsulation. Niche but exam-favored when "transparent inline appliance" appears.
  • EBS gp3: general-purpose SSD with decoupled IOPS / throughput. Up to 16 000 IOPS + 1 000 MB/s, independent of volume size. The 2024+ default — cheaper than gp2 at every size.
  • EBS io2 / io2 Block Express: provisioned IOPS for tier-1 databases. Up to 256 000 IOPS, 99.999% durability. Multi-Attach allows up to 16 instances to share one volume for clustered DBs.
  • EBS st1 / sc1: throughput-optimized HDD (big data, log processing) / cold HDD (rarely accessed). Cheap per GB but low IOPS — never for boot volumes.
  • Instance Store: NVMe physically attached to the host. Microsecond latency, millions of IOPS, but ephemeral (lost on stop/terminate). Use for caches, scratch space, NoSQL replication targets.
Concrete example

A SaaS migrates a Java REST service: ALB in front of an Auto Scaling group, path-based routing splits /api/* to one target group and /admin/* to another (with WAF rate limiting only on admin). The pricing-engine microservice using gRPC stays on ALB (native support). A streaming-ingest service handling 500k TCP connections/sec uses NLB for the static IP + L4 speed. Database tier on i4i.4xlarge with Instance Store NVMe for temporary and scratch data (it is lost on stop/terminate, so nothing durable such as the WAL belongs there) and EBS io2 Block Express 64 000 IOPS for table data and the WAL.

Key takeaway: ALB for HTTP, NLB for L4 / static IP, GWLB for appliances. gp3 = new default, io2 for tier-1 DBs, st1 for big-data throughput, Instance Store for ephemeral speed.
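
The takeaway reads naturally as two small decision functions. This is a sketch only: the function names, parameters, and the order of the checks are assumptions made for illustration, and real workloads often combine options (for example an NLB in front of an ALB).

```python
def pick_load_balancer(protocol: str, needs_static_ip: bool = False,
                       inline_appliance: bool = False) -> str:
    """Return the ELB type suggested by the lesson's decision tree."""
    proto = protocol.upper()
    if inline_appliance:                       # transparent firewall / IDS insertion
        return "GWLB"
    if needs_static_ip or proto in {"TCP", "UDP", "TLS"}:
        return "NLB"                           # Layer 4, static or Elastic IPs
    if proto in {"HTTP", "HTTPS", "GRPC"}:
        return "ALB"                           # Layer 7 routing
    raise ValueError(f"unsupported protocol: {protocol}")


def pick_volume(iops: int, persistent: bool = True,
                sequential_throughput: bool = False) -> str:
    """Return the block-storage option suggested by the lesson."""
    if not persistent:
        return "instance store"                # ephemeral, fastest
    if sequential_throughput:
        return "st1"                           # throughput-optimized HDD
    if iops > 16_000:                          # beyond the gp3 ceiling quoted in the lesson
        return "io2 Block Express"
    return "gp3"


if __name__ == "__main__":
    print(pick_load_balancer("gRPC"))
    print(pick_load_balancer("TCP", needs_static_ip=True))
    print(pick_volume(64_000))
```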
⚡ Mini-quiz
Drill ELB + EBS choice scenarios → study mode (10 questions).
03

Storage — S3, EFS, FSx, DataSync & Snowball · 3 lessons

S3 is the most-tested service on SAA-C03. Master the eight storage classes and their access / retrieval / minimum-storage tradeoffs, the lifecycle policy syntax, S3 security (bucket policies, Block Public Access, presigned URLs, Object Lock, encryption tier choice), and the migration toolset (DataSync, Snowball, Storage Gateway) for moving data into AWS.

s3-storage-classes lifecycle-policy block-public-access presigned-urls object-lock sse-kms efs-fsx datasync-snowball
~5h
Lesson 3.1 S3 storage classes and lifecycle policies

S3 storage cost spans 25× between Standard and Glacier Deep Archive. Picking the right tier and automating transitions via lifecycle policies is the canonical SAA-C03 cost optimisation question.

Key concepts
  • S3 Standard: 99.99% availability, 3+ AZ durability, instant retrieval. For frequently accessed data. The default; transition out only when access pattern is proven.
  • S3 Intelligent-Tiering: auto-moves objects between Frequent / Infrequent / Archive Instant Access / Archive Access / Deep Archive tiers based on observed access. Small per-object monitoring fee. Best when access pattern is unknown.
  • S3 Standard-IA / One Zone-IA: lower storage price, retrieval fee per GB. 30-day minimum storage. Standard-IA is multi-AZ; One Zone-IA is single-AZ (cheaper but loses data if AZ fails — use for reproducible secondary copies only).
  • S3 Glacier Instant Retrieval: ms retrieval at archive prices. 90-day minimum storage. Use for "rarely accessed but must be instant when needed" (medical imaging, news archives).
  • S3 Glacier Flexible Retrieval / Deep Archive: retrieval takes minutes to hours (Flexible) or hours (Deep Archive, ~$0.001/GB-month). 90-day / 180-day minimum storage respectively. For 7+ year compliance archives.
  • Lifecycle policies: declarative rules to transition between classes and / or expire objects. Common stack: Standard → Standard-IA at 30 days → Glacier Flexible at 90 → Deep Archive at 365 → expire at 7 years. ALSO abort incomplete multipart uploads after 7 days — easy-to-miss cost trap.
Concrete example

A media SaaS uploads 2 TB of user-generated video per month. Access pattern: 80% of views happen in the first 7 days, 15% in the first 90 days, < 5% after. Compliance requires 7-year retention. Design: S3 Standard on upload, lifecycle transitions to Standard-IA at 30 days, Glacier Flexible at 90 days, Glacier Deep Archive at 365 days, expire at 7 years. Abort incomplete multipart uploads at 7 days. Result: 70% lower storage cost than All-Standard.
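
The lifecycle design in this example could be written as one rule. The sketch below builds it as a Python dict in the shape the S3 `PutBucketLifecycleConfiguration` API accepts (the same structure boto3's `put_bucket_lifecycle_configuration` takes as `LifecycleConfiguration`). The rule ID is a hypothetical name, and 7 years is approximated as 7 × 365 days.

```python
import json

RETENTION_DAYS = 7 * 365  # assumption: 7-year retention, ignoring leap days

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "video-tiering",        # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},     # empty prefix = every object in the bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},        # Glacier Flexible Retrieval
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": RETENTION_DAYS},
            # Clean up abandoned multipart uploads, which are billed but invisible in listings.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration, indent=2))
```

Keeping transitions, expiration, and multipart cleanup in a single rule makes the retention policy auditable in one place.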

Key takeaway: match storage class to retrieval pattern. Lifecycle policies automate tiering. Intelligent-Tiering when pattern unknown. Always abort incomplete multipart uploads.
⚡ Mini-quiz
Drill S3 lifecycle + storage-class scenarios → study mode (10 questions).
Lesson 3.2 S3 security — policies, Block Public Access, presigned URLs, Object Lock, encryption

S3 misconfiguration is the single most common cause of cloud data leaks. SAA-C03 expects you to know the Block Public Access defaults, the bucket-vs-IAM-policy interaction, presigned URLs for credential-free uploads, Object Lock for WORM compliance, and the encryption tier tradeoffs (SSE-S3 / SSE-KMS / SSE-C / client-side).

Key concepts
  • S3 Block Public Access (BPA): account-level + bucket-level toggle that overrides ALL ACLs and bucket policies granting public access. ON by default for new buckets since April 2023. Disable per-bucket only when public hosting is intentional.
  • Bucket policy vs IAM policy: bucket policy is resource-based, attached to the bucket — controls cross-account access, VPC-endpoint access, IP-based restrictions. IAM policy is identity-based — controls what users/roles can do across all buckets.
  • Presigned URLs: time-limited GET / PUT URLs signed with the creator's credentials. Lets browsers upload directly to S3 without backend proxy. Default 1 hour expiry; max 7 days for IAM-user signers, 12 hours for IAM-role signers.
  • S3 Object Lock: WORM (Write Once Read Many). Governance mode = permission-override deletes possible. Compliance mode = NOBODY can delete until retention expires, not even root. Requires versioning enabled.
  • Encryption tiers: SSE-S3 (AWS-managed AES-256, transparent, free), SSE-KMS (customer-managed CMK, audited, $0.03/10k requests + small overhead), SSE-C (you supply the key each request, AWS never stores it), client-side (you encrypt before upload).
  • S3 default encryption: every bucket is encrypted by default since 2023 (SSE-S3 minimum). Override per-bucket with SSE-KMS + a specific CMK when audit / key-rotation control is required.
Concrete example

A photo-sharing app needs (1) browser-direct uploads with no backend proxy, (2) public reads only via CloudFront, (3) ransomware protection. Design: S3 Block Public Access ON. Backend issues presigned PUT URLs with 15-min expiry to authenticated users. CloudFront uses an Origin Access Control (OAC) with a bucket policy allowing only the distribution to GET. SSE-KMS with a Customer-Managed Key (auto-rotated yearly). S3 Versioning ON + Object Lock Governance mode with 30-day retention — ransomware can't overwrite the originals.
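
The "only the distribution may GET" part of this design is a bucket policy. Below is a minimal sketch of that policy built as a Python dict; the bucket name, account ID, and distribution ID are hypothetical placeholders.

```python
import json

BUCKET = "example-photo-bucket"  # hypothetical bucket name
DISTRIBUTION_ARN = (             # hypothetical account and distribution IDs
    "arn:aws:cloudfront::111122223333:distribution/EXAMPLE123"
)

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudFrontReadOnly",
            "Effect": "Allow",
            # With OAC the principal is the CloudFront service itself...
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            # ...and the condition pins access to one specific distribution.
            "Condition": {"StringEquals": {"AWS:SourceArn": DISTRIBUTION_ARN}},
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(bucket_policy, indent=2))
```

Because the policy names a service principal scoped to a single distribution, Block Public Access can stay ON. With SSE-KMS, the key policy would also need to let CloudFront decrypt on behalf of that distribution; that statement is not shown here.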

Key takeaway: BPA ON by default, presigned URLs for credential-free I/O, OAC for CloudFront, Object Lock + versioning for ransomware resilience, SSE-KMS when key custody matters.
⚡ Mini-quiz
Practise S3 security scenarios → quick quiz (5 questions).
Lesson 3.3 EFS, FSx, and migrating data with DataSync / Snowball

When the workload needs a file system, not an object store, the choice is EFS (Linux NFS) vs FSx (Windows / Lustre / NetApp / OpenZFS). For getting data INTO AWS at scale, DataSync handles online transfers and Snowball handles "too big to ship over the wire" migrations.

Key concepts
  • Amazon EFS: managed NFSv4 file system. Multi-AZ by default, shared by 1000s of EC2 / Fargate / Lambda clients. Storage classes: Standard, Standard-IA, One Zone, One Zone-IA. Auto-tiering via lifecycle.
  • Amazon FSx for Windows File Server: managed SMB file shares with Active Directory integration. For Windows apps lifted-and-shifted from on-prem file servers.
  • Amazon FSx for Lustre: sub-millisecond, 100s-of-GB/s parallel file system. For HPC, ML training, financial modelling. Can be backed by S3 — Lustre lazy-loads from S3, writes back on demand.
  • Amazon FSx for NetApp ONTAP / OpenZFS: enterprise file systems with snapshots, clones, multi-protocol (NFS + SMB), data dedup. For NetApp customers migrating without rewriting.
  • AWS DataSync: online transfer of NFS / SMB / HDFS / S3 / Azure data to AWS storage. Up to 10× faster than rsync-style tools, with scheduling and integrity validation built in. Use when data size ÷ effective bandwidth comes to less than about 7 days.
  • AWS Snow Family: Snowcone (8 TB, edge processing), Snowball Edge (50–80 TB, optional EC2 + GPU), Snowmobile (100 PB truck — rare). Use when online transfer would take > 1 week. Calculate: data ÷ effective bandwidth.
Concrete example

A media studio migrates 800 TB of video editing assets from on-prem NetApp to AWS. Bandwidth: 1 Gbps line × 70% effective = ~700 Mbps. Time estimate: 800 TB × 8 bits ÷ 700 Mbps = ~106 days online → use Snowball Edge. 10× 80 TB devices shipped, encrypted in flight, ingested to S3 over weekend. Production data going forward: FSx for NetApp ONTAP for editing workflows (cross-protocol NFS + SMB, snapshot integration). Daily incremental sync via DataSync.
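
The transfer-time estimate in this example is a one-line formula worth keeping handy. The sketch below reproduces it using decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/s); the function names and the 7-day cutoff parameter are illustrative assumptions based on the lesson's rule of thumb.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Days needed to push data_tb over a link at the given effective utilization."""
    bits = data_tb * 1e12 * 8                       # TB -> bytes -> bits
    effective_bps = link_mbps * 1e6 * efficiency    # usable share of the line rate
    return bits / effective_bps / SECONDS_PER_DAY


def migration_tool(days_online: float, cutoff_days: float = 7) -> str:
    """Apply the 'longer than about a week means ship it' rule of thumb."""
    return "Snowball Edge" if days_online > cutoff_days else "DataSync"


if __name__ == "__main__":
    days = transfer_days(800, 1000)   # 800 TB over a 1 Gbps line at 70%
    print(f"{days:.1f} days online -> {migration_tool(days)}")
```

The same two functions settle the daily incremental sync: a few hundred GB over the same line finishes in well under a day, so DataSync is the fit there.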

Key takeaway: EFS for Linux shared NFS. FSx variants for Windows / Lustre / NetApp / ZFS. DataSync for online migration. Snowball when "online" would take > 1 week.
⚡ Mini-quiz
Drill EFS / FSx / Snowball scenarios → study mode (10 questions).
04

Networking — VPC, Security Groups & Connectivity · 3 lessons

VPC is the second-most-tested area on SAA-C03. Master VPC building blocks (IGW, NAT Gateway, subnets, route tables), the stateful-vs-stateless gap between Security Groups and NACLs, and the multi-VPC / hybrid connectivity options (VPC Peering, Transit Gateway, PrivateLink, Site-to-Site VPN, Direct Connect) with their cost and transitivity trade-offs.

vpc-building-blocks nat-gateway security-groups nacls vpc-endpoints transit-gateway privatelink direct-connect
~6h
Lesson 4.1 VPC building blocks — subnets, IGW, NAT, route tables

Every AWS architecture sits inside a VPC. SAA-C03 tests how traffic flows from the internet to a private EC2 instance, why one NAT Gateway per AZ matters for HA, and how route-table entries determine reachability.

Key concepts
  • VPC CIDR: private IPv4 block (typically /16 = 65k IPs) carved into subnets (/24 = 251 usable). One VPC spans one Region across multiple AZs. Secondary CIDRs allow expansion without re-architecting.
  • Public subnet: route table includes 0.0.0.0/0 → IGW. Resources need a public IP or Elastic IP to reach the internet. Hosts ALBs, NAT Gateways, bastion hosts.
  • Private subnet: no IGW route. Outbound internet via a NAT Gateway in a public subnet. Hosts app servers, databases, caches — anything that shouldn't accept inbound from the internet.
  • Internet Gateway (IGW): horizontally-scaled VPC component allowing two-way internet traffic for resources with public IPs. One IGW per VPC. Attached at VPC level, not subnet level.
  • NAT Gateway: managed, AZ-scoped, scales automatically up to 100 Gbps. Allows private-subnet outbound to internet; blocks all unsolicited inbound. For HA deploy ONE NAT Gateway per AZ with each private subnet routing 0.0.0.0/0 to its OWN AZ's NAT. Cross-AZ NAT routing adds per-GB cost and leaves the other AZ without egress if the NAT's AZ fails.
  • VPC Flow Logs: capture ENI traffic to CloudWatch Logs / S3 / Kinesis Data Firehose. ACCEPT/REJECT format reveals SG / NACL drops. The exam-canonical tool for "why can't A reach B?".
Concrete example

A 3-tier web app in 10.0.0.0/16: public subnets 10.0.1.0/24 (AZ-a) and 10.0.2.0/24 (AZ-b) host the ALB + per-AZ NAT Gateways. Private app subnets 10.0.11.0/24 + 10.0.12.0/24 host the EC2 fleet and route 0.0.0.0/0 to their own AZ's NAT. Private DB subnets 10.0.21.0/24 + 10.0.22.0/24 host RDS Multi-AZ with no NAT route and no internet egress. VPC Flow Logs filtered to REJECT records stream to CloudWatch for connectivity debugging.
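
The subnet plan in this example can be checked with Python's standard `ipaddress` module. The sketch below confirms that every subnet sits inside the VPC CIDR, that none overlap, and that a /24 yields 251 usable addresses once the 5 addresses AWS reserves per subnet are subtracted. The subnet labels are hypothetical names.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # network address, VPC router, DNS, future use, broadcast

VPC_CIDR = "10.0.0.0/16"
SUBNETS = {                  # labels are illustrative
    "public-a": "10.0.1.0/24",
    "public-b": "10.0.2.0/24",
    "app-a": "10.0.11.0/24",
    "app-b": "10.0.12.0/24",
    "db-a": "10.0.21.0/24",
    "db-b": "10.0.22.0/24",
}


def usable_ips(cidr: str) -> int:
    """Addresses left for resources after AWS's per-subnet reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def layout_problems(vpc_cidr: str, subnets: dict[str, str]) -> list[str]:
    """List subnets that fall outside the VPC or overlap each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = [f"{name} is outside {vpc}"
                for name, net in nets.items() if not net.subnet_of(vpc)]
    names = list(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    print(usable_ips("10.0.1.0/24"))            # usable addresses in one /24
    print(layout_problems(VPC_CIDR, SUBNETS))   # empty list means the plan is consistent
```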

Key takeaway: 3-tier subnet layout (public/private-app/private-DB) per AZ. One NAT Gateway per AZ for HA + cost. Flow Logs for forensics.
⚡ Mini-quiz
Drill VPC building-block scenarios → study mode (10 questions).
Lesson 4.2 Security Groups vs NACLs and VPC Endpoints

Security Groups are stateful, NACLs are stateless — and the exam tests the difference relentlessly. VPC Endpoints (Gateway for S3/DynamoDB free; Interface / PrivateLink for everything else) close the NAT-Gateway cost loophole.

Key concepts
  • Security Groups (stateful): implicit deny, explicit allow only. Return traffic auto-allowed. Reference other SGs as source/destination (e.g., "allow from sg-webtier"). Apply at the ENI level. Default limit: 60 inbound + 60 outbound rules.
  • NACLs (stateless): rule-numbered allow + deny. Apply at subnet level. Return traffic NOT automatic — must explicitly open ephemeral ports (1024-65535) outbound. Rule numbers in increments of 100 leave room for inserts.
  • SG vs NACL pick: SG always (instance-level intent). NACL only as a defensive secondary layer (e.g., subnet-level blocked-IP list). Don't push all filtering down to NACL — rule limits and statelessness bite.
  • Gateway VPC Endpoints: S3 and DynamoDB only. Add a route-table entry steering subnet traffic over the AWS backbone — no internet, no NAT cost. Free. The canonical "reduce NAT Gateway cost" answer.
  • Interface VPC Endpoints (PrivateLink): ENI in your subnet with a private IP that resolves to an AWS service (SSM, STS, ECR, Secrets Manager, KMS, etc.) or your own NLB-backed service. Pay per-hour + per-GB. Use for cross-account service exposure WITHOUT routing-table touches.
  • PrivateLink as service-provider: expose an NLB-fronted service via PrivateLink — consumers create Interface Endpoints pointing at your service name. Clean consumer/provider separation, no CIDR overlap issues.
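
The stateless behaviour of NACLs is the part that trips people up, so here is a toy model of it. It is a simplification that matches on port only (real NACL rules also match protocol and CIDR), and the class and function names are hypothetical. Rules are evaluated in ascending rule-number order, the first match wins, and anything unmatched hits the implicit deny.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int      # lower numbers are evaluated first
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules: list[Rule], port: int) -> bool:
    """Evaluate rules in ascending order; the first matching rule decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # implicit final deny


def https_round_trip(inbound: list[Rule], outbound: list[Rule],
                     client_port: int = 50_000) -> bool:
    """A request must pass inbound on 443 AND the reply must pass outbound
    to the client's ephemeral port, because NACLs keep no connection state."""
    return nacl_allows(inbound, 443) and nacl_allows(outbound, client_port)


inbound = [Rule(100, 443, 443, allow=True)]
outbound_missing: list[Rule] = []                               # forgot return traffic
outbound_fixed = [Rule(100, 1024, 65535, allow=True)]           # ephemeral port range

if __name__ == "__main__":
    print(https_round_trip(inbound, outbound_missing))  # reply is dropped
    print(https_round_trip(inbound, outbound_fixed))    # reply gets through
```

A Security Group needs only the inbound 443 rule for the same flow, since the return traffic is tracked and allowed automatically.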
Concrete example

Cost analysis shows $3 200/month going to NAT Gateway data processing because private-subnet Lambda functions are hammering S3 and DynamoDB. Fix: add a Gateway VPC Endpoint for S3 and a Gateway VPC Endpoint for DynamoDB with route-table entries on the private subnets — both FREE. NAT bill drops to $200/month overnight. SSM Session Manager access added via an Interface VPC Endpoint for ssmmessages + ec2messages + ssm (pay per ENI-hour but unlocks bastion-free access).

Key takeaway: SG stateful + intent-based, NACL stateless + defense-in-depth. Gateway endpoints free for S3/DynamoDB — the #1 NAT cost fix. Interface endpoints / PrivateLink for cross-account service exposure.
⚡ Mini-quiz
Practise SG / NACL / Endpoint scenarios → quick quiz (5 questions).
Lesson 4.3 Multi-VPC and hybrid — Peering, Transit Gateway, VPN, Direct Connect

Beyond a single VPC, the connectivity choices each have distinct scaling, cost, and transitivity trade-offs. SAA-C03 expects you to pick correctly between Peering, Transit Gateway, Site-to-Site VPN, and Direct Connect (often with VPN as backup).

Key concepts
  • VPC Peering: 1:1 connection between two VPCs (same or different account / region). Non-transitive — A↔B + B↔C does NOT yield A↔C. Manual route table entries on BOTH sides required (#1 SAA "why doesn't this work" trap).
  • Transit Gateway (TGW): hub-and-spoke router for VPCs + VPN + Direct Connect Gateway. Replaces N×(N-1)/2 peerings with one TGW per region. TGW route tables enable selective transitivity (e.g., dev VPCs talk to shared services only, not to each other).
  • Site-to-Site VPN: two IPsec tunnels over the public internet between your on-prem customer gateway and an AWS Virtual Private Gateway (or TGW). Fast to provision (minutes), variable latency. Always use BOTH tunnels for HA.
  • AWS Direct Connect: dedicated private 1/10/100 Gbps link between your data center and an AWS Direct Connect location. Consistent latency, no internet exposure. 4–12 weeks to provision. Pair with a VPN as encrypted fallback.
  • Direct Connect Gateway: connect ONE DX link to MULTIPLE Regions / VPCs. Combine with Transit Gateway for hub-and-spoke hybrid topology.
  • Decision tree: 2 VPCs simple → Peering. 3+ VPCs / hybrid / multi-region mesh → Transit Gateway. Internet-OK hybrid → Site-to-Site VPN. Dedicated bandwidth + low latency hybrid → Direct Connect + VPN backup.
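The N×(N-1)/2 scaling and the non-transitivity rule above can be checked with a few lines of arithmetic. This is a minimal sketch, not an AWS API call; the VPC names A, B, C and all function names are hypothetical.

```python
def peerings_for_full_mesh(n_vpcs: int) -> int:
    """Peering connections needed so every VPC has a direct link to every other."""
    return n_vpcs * (n_vpcs - 1) // 2


def tgw_attachments(n_vpcs: int) -> int:
    """With a Transit Gateway hub, each VPC needs one attachment."""
    return n_vpcs


def reachable_via_peering(peerings: set, src: str, dst: str) -> bool:
    """Peering is non-transitive: only a direct peering carries traffic."""
    return frozenset((src, dst)) in peerings


# Hypothetical VPC names: A<->B and B<->C are peered, A<->C is not.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(peerings_for_full_mesh(25), "peerings vs", tgw_attachments(25), "TGW attachments")
print("A->B:", reachable_via_peering(peerings, "A", "B"))
print("A->C:", reachable_via_peering(peerings, "A", "C"))
```

For 25 VPCs the mesh needs 300 peerings against 25 TGW attachments, which is why the decision tree flips to Transit Gateway so early.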
Concrete example

A bank with 25 VPCs across 3 accounts needs full-mesh connectivity + secure on-prem reach to their core mainframe. Choice: Transit Gateway in each Region with all 25 VPCs attached + 2 TGW route tables (workload + shared-services). Hybrid: 2× 10 Gbps Direct Connect from two locations (provider diversity) to a Direct Connect Gateway attached to TGW. Site-to-Site VPN as IPsec-encrypted fallback over internet if both DX go down. Result: full transit, predictable latency, encrypted fallback.

Key takeaway: TGW for 3+ VPCs and hybrid hub-and-spoke. Peering only for 2 VPCs. VPN for fast/cheap hybrid. Direct Connect for predictable bandwidth — always pair with VPN backup.
⚡ Mini-quiz
Drill multi-VPC + hybrid scenarios → study mode (10 questions).

Test your knowledge on Modules 1–4 before moving to databases and security.

05

Databases: RDS, Aurora, DynamoDB & ElastiCache (3 lessons)

The database decision tree is one of the richest exam areas. Master RDS Multi-AZ vs Read Replicas (HA vs scale), RDS Proxy for serverless connection storms, Aurora's cluster volume architecture and Global Database, DynamoDB's partition-key design with GSI / LSI / DAX / Streams, and the Redis-vs-Memcached split for ElastiCache.

rds-multi-az read-replicas rds-proxy aurora-serverless-v2 aurora-global dynamodb-gsi-lsi dax elasticache-redis
~6h
📖 Read in-depth chapter
Lesson 5.1 RDS — Multi-AZ, Read Replicas, RDS Proxy

RDS questions hinge on three primitives: Multi-AZ for HA (same endpoint), Read Replicas for read scale (different endpoint), and RDS Proxy for connection pooling. The exam tests the difference relentlessly because Multi-AZ and Read Replicas look similar but solve different problems.

Key concepts
  • RDS Multi-AZ: synchronous standby replica in another AZ. Automatic failover via DNS flip in 60–120 seconds. SAME endpoint — no application changes needed. Standby is NOT readable; purely HA. Available for MySQL/Postgres/Oracle/SQL Server/MariaDB.
  • RDS Read Replicas: asynchronous copies (same-region or cross-region). DIFFERENT endpoint, so the app must route read traffic explicitly. Up to 15 replicas per source for MySQL / MariaDB / PostgreSQL (5 for Oracle / SQL Server). Can be promoted to standalone (planned cross-region migration trick).
  • Multi-AZ DB Cluster (Postgres/MySQL): newer mode with 1 writer + 2 readable standbys across 3 AZs. Combines HA + read scale. Failover < 35 seconds. Preferred over single-standby Multi-AZ for new MySQL/Postgres deployments.
  • RDS Proxy: fully managed connection pool sitting between clients and RDS. Crucial for Lambda + RDS — solves connection-exhaustion storms during cold starts. Supports IAM auth + Secrets Manager rotation. No app code changes.
  • Automated backups + snapshots: backups (1–35 days, point-in-time restore to any second), snapshots (manual, retained until deleted). Cross-region snapshot copy for DR.
  • RDS Storage Auto Scaling: auto-grows storage when free space drops below a threshold. Set a max limit. Prevents the 2 AM "disk full" page.
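"Different endpoint" means the application decides where each statement goes. A rough sketch of that routing decision, assuming made-up endpoint hostnames and a deliberately naive SQL check (a real app would usually rely on its driver or ORM for this):

```python
import itertools

# Hypothetical endpoint names, for illustration only.
WRITER_ENDPOINT = "orders-db.example.us-east-1.rds.amazonaws.com"
REPLICA_ENDPOINTS = [
    "orders-db-replica-1.example.us-east-1.rds.amazonaws.com",
    "orders-db-replica-2.example.us-east-1.rds.amazonaws.com",
]

_next_replica = itertools.cycle(REPLICA_ENDPOINTS)


def endpoint_for(sql: str) -> str:
    """Send plain SELECTs to a replica (round-robin); everything else to the writer."""
    statement = sql.strip().lower()
    is_plain_read = statement.startswith("select") and "for update" not in statement
    return next(_next_replica) if is_plain_read else WRITER_ENDPOINT
```

Because replicas are asynchronous, a read routed this way can be slightly stale; reads that must see the latest write should still go to the writer endpoint. With Multi-AZ alone none of this is needed, since the single endpoint follows the failover.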
Concrete example

A Java app + Lambda batch jobs hit a single-AZ RDS Postgres instance. Symptoms: weekly outages whenever the instance's AZ has problems (there is no standby to fail over to), plus Lambda runs occasionally fail with "too many connections". Fix: switch to Multi-AZ DB Cluster (1 writer + 2 readable standbys, < 35s failover), which removes the single-AZ outages. Add RDS Proxy in front and point Lambda's connection string at the Proxy; it pools connections and needs no Java app changes. Cross-region read replica in eu-west-1 for DR (RPO seconds, RTO minutes on manual promote).

Key takeaway: Multi-AZ = same endpoint HA. Read Replicas = different endpoint scale. Multi-AZ DB Cluster = both. RDS Proxy = Lambda + RDS without connection storms.
⚡ Mini-quiz
Drill RDS HA + replica + proxy scenarios → study mode (10 questions).
Lesson 5.2 Aurora — clusters, Serverless v2, Global Database

Aurora is RDS's premium tier and the SAA-C03 default answer when the question stresses "performance", "auto-scaling storage", or "cross-region DR sub-second RPO". The cluster volume + writer + reader topology is unique and exam-tested.

Key concepts
  • Aurora architecture: AWS-proprietary engine, MySQL- and PostgreSQL-compatible. Storage = a 6-copy quorum across 3 AZs, auto-scaling from 10 GB to 128 TB. Compute = 1 writer + up to 15 readers sharing the same storage volume.
  • Aurora endpoints: cluster endpoint (writer), reader endpoint (load-balanced over readers), custom endpoints (subset of readers, e.g., reporting workload), instance endpoints (direct).
  • Failover behavior: writer failure auto-promotes a reader in ~30 seconds. With no readers, AWS spins one up — 15 minutes RTO instead. Always run at least one reader in prod.
  • Aurora Serverless v2: fine-grained capacity scaling in 0.5 ACU (Aurora Capacity Unit) increments. Scales up and down in seconds — including scale-to-near-zero for dev / staging. Use for variable / unpredictable workloads.
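  • Aurora Global Database: primary cluster + up to 5 secondary regions with < 1 second typical replication lag. Two cross-region paths: managed planned failover (switchover) for drills and region moves while the primary is healthy, with no data loss; and unplanned failover, which promotes a secondary during a regional outage, typically in < 1 minute. The exam-canonical answer for "RPO < 1 second, RTO < 1 minute, multi-region".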
  • Backtrack (Aurora MySQL): rewind the cluster up to 72 hours in seconds — no restore from snapshot needed. Use to recover from "DROP TABLE in prod" before users notice.
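The "6-copy quorum across 3 AZs" can be made concrete with a small model. The sketch below assumes the quorum sizes from Aurora's published design (writes need 4 of 6 copies, reads need 3 of 6); those two numbers are not stated in this lesson, and the function names are illustrative.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
WRITE_QUORUM = 4  # assumption: 4 of 6 copies must acknowledge a write
READ_QUORUM = 3   # assumption: 3 of 6 copies are enough to read


def surviving_copies(failed_azs: int = 0, extra_failed_copies: int = 0) -> int:
    """Storage copies still available after AZ-level and single-copy failures."""
    total = COPIES_PER_AZ * AZ_COUNT
    lost = failed_azs * COPIES_PER_AZ + extra_failed_copies
    return max(0, total - lost)


def can_write(failed_azs: int = 0, extra_failed_copies: int = 0) -> bool:
    return surviving_copies(failed_azs, extra_failed_copies) >= WRITE_QUORUM


def can_read(failed_azs: int = 0, extra_failed_copies: int = 0) -> bool:
    return surviving_copies(failed_azs, extra_failed_copies) >= READ_QUORUM


print("one AZ down       -> write:", can_write(1), "read:", can_read(1))
print("one AZ + one copy -> write:", can_write(1, 1), "read:", can_read(1, 1))
```

Under these assumptions the volume keeps accepting writes with a whole AZ gone, and stays readable with an AZ plus one more copy gone.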
Concrete example

A global SaaS needs RPO < 1s and RTO < 5 min across us-east-1 and eu-west-1. Design: Aurora Global Database with the primary cluster in us-east-1 (1 writer + 2 readers) and a secondary cluster in eu-west-1 (1 reader, promotable). Reader endpoint serves EU users low-latency reads from the secondary. On regional failure: run an unplanned failover that promotes the eu-west-1 secondary, giving it a writer in ~1 minute (managed planned failover needs a healthy primary, so keep it for drills). Route 53 Failover routing flips the app DNS. Dev/staging on Aurora Serverless v2 scaled to 0.5 ACU off-hours.

Key takeaway: Aurora for performance + shared cluster storage. Aurora Serverless v2 for variable workloads / dev. Aurora Global Database for sub-second cross-region DR.
⚡ Mini-quiz
Practise Aurora scenarios → quick quiz (5 questions).
Lesson 5.3 DynamoDB, DAX, and ElastiCache

DynamoDB is the SAA-C03 default whenever the question mentions "single-digit ms", "serverless NoSQL", "billions of requests", or "key-value access at any scale". DAX adds in-memory caching; ElastiCache Redis covers everything-else caching, session state, leaderboards.

Key concepts
  • DynamoDB tables: fully managed serverless NoSQL. Partition key (and optional sort key) determine where items live. Single-digit-ms latency at any scale. Pay per request (On-Demand) or per provisioned capacity (RCU/WCU).
  • Partition-key design: high-cardinality, evenly-distributed keys for hot-partition avoidance. Anti-pattern: status='active' as PK — most items land on one partition. Use user_id, order_id, or composite keys instead.
  • Global Secondary Index (GSI): alternate partition + sort key for querying non-key attributes. Eventually consistent only. Own RCU/WCU. Create or drop anytime.
  • Local Secondary Index (LSI): alternate sort key with same partition key. Strongly consistent reads possible. MUST be created at table creation — can't add later. Shares table RCU/WCU.
  • DAX (DynamoDB Accelerator): in-memory write-through cache in front of DynamoDB. Microsecond reads for cached items. API-compatible: swap in the DAX client and point it at the cluster endpoint, with no changes to query logic. Use when read latency must drop below DynamoDB native ~5 ms.
  • DynamoDB Streams + Global Tables: Streams = time-ordered change log for Lambda triggers and audit. Global Tables = multi-active multi-region replication via Streams (sub-second cross-region RPO).
  • ElastiCache Redis vs Memcached: Redis = persistence, replication, Multi-AZ, pub/sub, sorted sets (leaderboards, session state). Memcached = simple multi-threaded shard-and-forget cache, no persistence.
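The hot-partition anti-pattern is easy to see by hashing keys into buckets. This is only a sketch: DynamoDB's internal hash function and partition count are not public, so SHA-256 and 8 partitions are stand-in assumptions, and the order data is synthetic.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int = 8) -> int:
    """Map a partition-key value to a bucket (SHA-256 stands in for DynamoDB's hash)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_by_partition(keys, partitions: int = 8) -> Counter:
    """Count how many items land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


# Synthetic data: 10,000 orders, 90% of them 'active'.
orders = [
    {"order_id": f"order-{i}", "status": "active" if i % 10 else "closed"}
    for i in range(10_000)
]

by_status = load_by_partition(o["status"] for o in orders)
by_order_id = load_by_partition(o["order_id"] for o in orders)

print("PK = status   ->", dict(by_status))
print("PK = order_id ->", dict(by_order_id))
```

With status as the key, at most two partitions receive all the traffic; with order_id the items spread across every partition.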
Concrete example

A multiplayer game stores player profiles + leaderboards + match-history at 200k requests/sec. Design: DynamoDB table with PK player_id for profiles (single-digit-ms reads). GSI on guild_id + last_active for "who's online in my guild" queries. DAX in front for sub-ms hot-profile reads. ElastiCache Redis sorted sets for global / regional leaderboards (top-N queries in O(log n)). DynamoDB Streams → Lambda updates Redis leaderboards on every match write. DynamoDB Global Tables across us-east-1 / eu-west-1 / ap-southeast-1 for global active-active.

Key takeaway: DynamoDB for serverless key-value at any scale. GSI for alt-key queries. DAX for sub-ms reads. Redis for sorted sets + sessions. Global Tables for multi-region active-active.
⚡ Mini-quiz
Drill DynamoDB + DAX + Redis scenarios → study mode (10 questions).
06

Security: IAM, KMS, WAF & Compliance Services (3 lessons)

Security is the largest SAA-C03 domain at 30%. Master IAM best practices (roles over users, permission boundaries, condition keys), KMS envelope encryption and rotation, the Secrets-Manager-vs-Parameter-Store split, and the detection / response services (WAF, Shield, GuardDuty, Inspector, Macie, Config). Every architecture answer is scored against "is this secure-by-default?".

iam-roles permission-boundaries kms-envelope secrets-manager parameter-store waf-shield guardduty-inspector macie-config
~5h
📖 Read in-depth chapter
Lesson 6.1 IAM best practices — roles, conditions, permission boundaries

IAM is the spine of AWS security. The exam consistently rewards "IAM role" over "IAM user with access keys", and tests condition keys, permission boundaries, and the cross-account assume-role pattern.

Key concepts
  • Never use the root account: enable MFA on root, delete any root access keys (and do not create new ones), use root only for the <10 actions only root can do (close account, change support plan).
  • IAM roles for compute: EC2 instance profile, Lambda execution role, ECS task role, EKS IRSA. Provides short-lived auto-rotated credentials via the metadata service. NEVER embed access keys in code or AMIs.
  • Cross-account access: sts:AssumeRole into a role in the target account whose trust policy allows the source account / principal. Replaces creating IAM users in every account.
  • Condition keys: sharpen policies with context. aws:MultiFactorAuthPresent: true (require MFA), aws:SourceIp (corporate IP range), aws:RequestedRegion (region allowlist), aws:PrincipalOrgID (only your AWS Org).
  • Permission Boundaries: cap the MAXIMUM permissions a role can have, even if attached policies grant more. Used when delegating IAM management — devs can self-serve roles within the boundary you set.
  • IAM Access Analyzer: flags resources accessible from outside your trust zone (account / Organization). Catches "S3 bucket policy accidentally allows another account" without manual policy review.
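Condition keys are easier to remember once seen in a policy document. A minimal sketch of an identity policy that combines three of the keys above; the Sid, the `ec2:*` action, the CIDR range, and the Region list are placeholder assumptions, not a recommended policy.

```python
import json

# Illustrative identity policy. Multiple condition operators are ANDed together;
# multiple values under one key (the Region list) are ORed.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowEc2WithMfaFromOfficeInApprovedRegions",
            "Effect": "Allow",
            "Action": "ec2:*",
            "Resource": "*",
            "Condition": {
                "Bool": {"aws:MultiFactorAuthPresent": "true"},
                "IpAddress": {"aws:SourceIp": "203.0.113.0/24"},
                "StringEquals": {"aws:RequestedRegion": ["us-east-1", "eu-west-1"]},
            },
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

A request is allowed only when it is MFA-authenticated, comes from the listed IP range, and targets one of the listed Regions.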
Concrete example

A dev platform team needs to let app teams self-serve IAM roles for their Lambda functions, but app teams must not be able to escalate to admin or attach customer-data S3 policies. Design: dev-platform team creates a Permission Boundary policy capping max permissions to "Lambda + CloudWatch + own-team's S3 prefix + own-team's DynamoDB tables". App teams' delegated role can create new roles ONLY if the new role has this boundary attached. IAM Access Analyzer catches any cross-account policy slip on the customer-data buckets. Root MFA-protected; humans use IAM Identity Center (SSO) with sts:AssumeRole into per-environment roles.

Key takeaway: roles for compute, AssumeRole for cross-account, condition keys to sharpen, Permission Boundaries to delegate safely, Access Analyzer to catch leaks.
⚡ Mini-quiz
Drill IAM design scenarios → study mode (10 questions).
Lesson 6.2 KMS, envelope encryption, Secrets Manager vs Parameter Store

Encryption keys (KMS) and secrets storage (Secrets Manager vs SSM Parameter Store) are perennial SAA-C03 topics. Pick the wrong tier and you ship plaintext credentials, miss rotation, or pay 10× for a use case the cheaper service covers.

Key concepts
  • KMS key types: AWS-managed (free, AWS controls), Customer-Managed CMK (you control rotation, key policy, audit), AWS-owned (in AWS-internal account, no audit). Use CMK when key custody / audit / cross-account sharing matters.
  • KMS key rotation: optional automatic rotation for symmetric customer-managed keys (yearly by default). It is transparent to applications: KMS generates new backing material while preserving old material for decrypting historical ciphertext. Asymmetric / HMAC keys: manual rotation only.
  • Envelope encryption: KMS encrypts a Data Encryption Key (DEK), the DEK encrypts your actual data. App makes one KMS call (GenerateDataKey) to get a plaintext DEK plus a KMS-encrypted copy, encrypts megabytes locally, discards the plaintext DEK, and stores the encrypted DEK + ciphertext together. Pattern used by every AWS-native encryption (SSE-KMS, EBS encryption, etc.).
  • KMS key policy + grants: key policy is REQUIRED (no IAM-only access). Grants = short-lived programmatic permissions used by AWS services (e.g., EBS at volume create time). All KMS use is CloudTrail-logged.
  • AWS Secrets Manager: stores secrets, native rotation for RDS / Redshift / DocumentDB / custom Lambda rotators. ~$0.40/secret/month + API calls. Use when you need rotation.
  • SSM Parameter Store: stores config + secrets. SecureString type uses KMS. Standard tier FREE up to 4 KB. Advanced tier ~$0.05/parameter/month. Use when rotation isn't required.
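The envelope-encryption flow can be sketched end to end with the standard library. Everything here is a stand-in: `ToyKms` is a hypothetical class imitating the GenerateDataKey / Decrypt round trip, and the XOR keystream is a toy cipher used only to show which key protects what. It is not secure and is not how KMS or AES-GCM work internally.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream. NOT secure, demo only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKms:
    """Hypothetical KMS stand-in: the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def generate_data_key(self) -> tuple:
        """Return (plaintext DEK, DEK encrypted under the master key)."""
        dek = secrets.token_bytes(32)
        return dek, _keystream_xor(self._master_key, dek)

    def decrypt(self, encrypted_dek: bytes) -> bytes:
        return _keystream_xor(self._master_key, encrypted_dek)


def envelope_encrypt(kms: ToyKms, plaintext: bytes) -> dict:
    dek, encrypted_dek = kms.generate_data_key()  # the single "KMS call"
    ciphertext = _keystream_xor(dek, plaintext)   # bulk encryption happens locally
    # Only the encrypted DEK is stored; the plaintext DEK goes out of scope here.
    return {"encrypted_dek": encrypted_dek, "ciphertext": ciphertext}


def envelope_decrypt(kms: ToyKms, envelope: dict) -> bytes:
    dek = kms.decrypt(envelope["encrypted_dek"])
    return _keystream_xor(dek, envelope["ciphertext"])
```

The design point is that the large payload never travels to the key service: only the 32-byte DEK does, and what sits next to the ciphertext is useless without access to the master key.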
Concrete example

A fintech needs (1) database credentials rotated every 30 days, (2) public API keys for third-party services stored cheaply, (3) all S3 data encrypted with key custody for audit. Design: Secrets Manager for the RDS Postgres credentials with native Lambda rotation every 30 days (~$5/month). SSM Parameter Store SecureString for third-party API keys (FREE in standard tier). SSE-KMS on every S3 bucket using a Customer-Managed KMS Key with annual rotation ON; key policy restricted to specific roles + region — CloudTrail captures every kms:Decrypt for the auditors.

Key takeaway: CMK when custody matters. Envelope encryption is how every AWS-native crypto works. Secrets Manager for rotation. Parameter Store for free secrets without rotation.
⚡ Mini-quiz
Practise KMS / Secrets scenarios → quick quiz (5 questions).
Lesson 6.3 WAF, Shield, GuardDuty, Inspector, Macie, Config

Six security services, each with a clear job. SAA-C03 expects you to map a threat or compliance need to the correct service — Macie for PII discovery, GuardDuty for runtime threats, Inspector for CVEs, Config for drift, WAF/Shield for L7/L3-4 attack defense.

Key concepts
  • AWS WAF: Layer-7 HTTP filtering on CloudFront / ALB / API Gateway / AppSync / App Runner. Managed rule groups (Core Rule Set, Known Bad Inputs, SQL DB), custom rules, rate-based rules for bot defense. Pay per rule + per million requests.
  • AWS Shield Standard / Advanced: Standard is free, on by default, covers L3/L4 DDoS (SYN floods, UDP reflection). Advanced is paid ($3 000/month/org) — adds L7 protection, 24/7 SRT response team, cost protection against attack-driven scaling.
  • Amazon GuardDuty: ML + threat-intel detection from VPC Flow Logs, DNS logs, CloudTrail, EKS audit logs, S3 data events. Findings: crypto mining, exfiltration, port scans, compromised IAM keys. Enable per region.
  • Amazon Inspector v2: CVE scanning for EC2, ECR container images, and Lambda functions. Continuous re-scan on package change. Integrates with Security Hub.
  • Amazon Macie: ML-driven PII / PHI / financial-data discovery in S3 buckets. Sample-based default, full scan on demand. Outputs to Security Hub. Pay per GB scanned.
  • AWS Config: continuous resource-state recording + rules for compliance evaluation. Managed rules (e.g., s3-bucket-server-side-encryption-enabled) or custom Lambda. Non-compliance can auto-remediate via SSM Automation.
  • AWS Security Hub: aggregation layer — collects findings from GuardDuty, Inspector, Macie, Config, IAM Access Analyzer, partners. Runs compliance standards (AWS Foundational Security Best Practices, CIS, PCI DSS) as continuous checks.
Concrete example

A SaaS hosting customer health records on AWS: design defense-in-depth. WAF on CloudFront with AWS Core Rule Set + rate-based rule (2 000 req/5min per IP) + geo-block non-US. Shield Advanced on the production zone for L7 DDoS + cost protection. GuardDuty + Inspector v2 + Macie enabled in every region. Macie scans the customer-data buckets weekly for misclassified PII. AWS Config managed rule s3-bucket-public-read-prohibited with SSM Automation auto-remediation. Security Hub as the single dashboard; PCI-DSS + AWS FSBP standards enabled; critical findings → EventBridge → PagerDuty.
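The rate-based rule in this design (2 000 requests per 5 minutes per IP) behaves roughly like a per-IP sliding-window counter. The sketch below is a simplified model under that assumption, with hypothetical class and method names; AWS WAF evaluates its rate periodically rather than on every request, so real blocking is not this exact.

```python
from collections import defaultdict, deque


class RateBasedRule:
    """Simplified per-IP sliding-window limiter (illustrative model only)."""

    def __init__(self, limit: int = 2000, window_seconds: float = 300.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop requests that have aged out of the trailing window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        hits.append(now)  # blocked requests still count toward the rate
        return len(hits) <= self.limit


rule = RateBasedRule(limit=3, window_seconds=10)
decisions = [rule.allow("198.51.100.7", t) for t in (0, 1, 2, 3)]
print(decisions)
```

Each IP is tracked independently, so one noisy client is throttled without affecting the rest of the traffic.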

Key takeaway: WAF for L7, Shield for L3/L4 DDoS, GuardDuty for runtime threats, Inspector for CVEs, Macie for data sensitivity, Config for drift, Security Hub to aggregate.
⚡ Mini-quiz
Drill detection-service mapping scenarios → study mode (10 questions).
07

Serverless & Messaging: Lambda, SQS, SNS, API Gateway, Step Functions (3 lessons)

Whenever SAA-C03 says "decoupling", "fan-out", or "event-driven", it points at SQS, SNS, EventBridge, and Lambda. Master Lambda limits and VPC behavior, SQS standard vs FIFO with visibility timeout + DLQ, SNS fan-out to multiple subscribers, API Gateway types and timeout, and Step Functions for explicit workflow orchestration.

lambda-limits provisioned-concurrency sqs-fifo-dlq visibility-timeout sns-fan-out api-gateway step-functions eventbridge
~5h
📖 Read in-depth chapter
Lesson 7.1 Lambda — limits, VPC, Lambda@Edge, Provisioned Concurrency

Lambda powers SAA-C03's serverless answers. The exam tests hard limits (15-min execution, 10 GB memory, 250 MB unzipped package), VPC attachment behavior, edge variants, and the Provisioned Concurrency cold-start fix.

Key concepts
  • Hard limits: 15-minute max execution, 10 GB max memory, 6 vCPUs (scales with memory), 250 MB unzipped deployment (or 10 GB via container image), 512 MB /tmp by default (up to 10 GB configurable).
  • Concurrency model: each function can scale up by 1,000 concurrent executions every 10 seconds (older study material cites the previous model: an initial regional burst of 500-3000, then +500/minute). Reserved Concurrency caps a function's concurrency. Account-level concurrency limit defaults to 1000 per region.
  • Lambda in VPC: Lambda creates Hyperplane ENIs in your subnets. Needed to reach RDS / ElastiCache / private resources. The ENI is provisioned when the function is created or its VPC configuration changes and is then shared across invocations, so VPC attachment no longer adds ENI setup time to each cold start.
  • Lambda@Edge: Node.js / Python at CloudFront edge locations. Triggers on viewer request / response or origin request / response. Max execution 5s for viewer triggers, 30s for origin triggers. Use for header rewrites, A/B routing, dynamic origin selection.
  • CloudFront Functions: simpler / cheaper alternative to Lambda@Edge. JavaScript only, < 1 ms execution, ~$0.10/million. Pick for header manipulation; pick Lambda@Edge for richer logic.
  • Provisioned Concurrency: pre-warmed Lambda instances kept ready — eliminates cold starts for latency-sensitive APIs. Pay per concurrency-hour even when idle. Pair with Application Auto Scaling for scheduled scaling.
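One way to drill the hard limits is to encode them as a checker. This is a hypothetical helper, not part of any AWS SDK; it covers only the limits listed above and only zip-based deployments.

```python
from dataclasses import dataclass

MAX_TIMEOUT_S = 15 * 60        # 15-minute max execution
MAX_MEMORY_MB = 10_240         # 10 GB max memory
MAX_UNZIPPED_PACKAGE_MB = 250  # zip deployments; container images allow up to 10 GB
MIN_TMP_MB, MAX_TMP_MB = 512, 10_240


@dataclass
class FunctionConfig:
    timeout_s: int
    memory_mb: int
    tmp_mb: int = 512
    unzipped_package_mb: int = 50


def limit_violations(cfg: FunctionConfig) -> list:
    """Return a human-readable message for every hard limit the config breaks."""
    problems = []
    if cfg.timeout_s > MAX_TIMEOUT_S:
        problems.append(f"timeout {cfg.timeout_s}s exceeds {MAX_TIMEOUT_S}s")
    if cfg.memory_mb > MAX_MEMORY_MB:
        problems.append(f"memory {cfg.memory_mb} MB exceeds {MAX_MEMORY_MB} MB")
    if not MIN_TMP_MB <= cfg.tmp_mb <= MAX_TMP_MB:
        problems.append(f"/tmp {cfg.tmp_mb} MB outside {MIN_TMP_MB}-{MAX_TMP_MB} MB")
    if cfg.unzipped_package_mb > MAX_UNZIPPED_PACKAGE_MB:
        problems.append(f"package {cfg.unzipped_package_mb} MB exceeds {MAX_UNZIPPED_PACKAGE_MB} MB")
    return problems


print(limit_violations(FunctionConfig(timeout_s=3600, memory_mb=1024)))
```

A one-hour job fails the timeout check, which is the usual exam cue to move the work to Fargate, Batch, or Step Functions instead.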
Concrete example

A serverless API on Lambda + API Gateway has p99 cold-start latency of 800 ms — unacceptable for the trading dashboard. Function lives in a VPC to reach an RDS Postgres. Fix: enable Provisioned Concurrency = 20 on the production alias of the function. Application Auto Scaling ramps Provisioned Concurrency to 50 between 08:00–17:00 ET on weekdays via scheduled scaling. Move the RDS interaction to RDS Proxy to eliminate connection-storm cold-paths. Result: p99 drops to < 50 ms.

Key takeaway: know the hard limits cold. VPC attachment relies on shared Hyperplane ENIs set up ahead of invocation. Provisioned Concurrency for cold-start latency. CloudFront Functions for cheap edge logic; Lambda@Edge for richer.
⚡ Mini-quiz
Drill Lambda limits + VPC scenarios → study mode (10 questions).
Lesson 7.2 SQS — standard vs FIFO, visibility timeout, DLQ

SQS is the default "decoupling" answer. The exam tests the standard-vs-FIFO trade-off, visibility-timeout sizing for retries, and Dead Letter Queues as poison-pill defense.

Key concepts
  • Standard queue: unlimited throughput, at-least-once delivery, best-effort ordering. The default for most decoupling cases. Duplicates possible — consumers must be idempotent.
  • FIFO queue: exactly-once processing, strict ordering within a message group. 300 TPS without batching, 3000 TPS with batching. Use for financial transactions, order processing, inventory adjustments. Suffix queue name with .fifo.
  • Visibility timeout: period a message is hidden from other consumers after a Receive. Default 30s, max 12h. If consumer crashes before deleting, message reappears after timeout. Set ≥ 95th percentile of processing time.
  • Dead Letter Queue (DLQ): queue that receives messages exceeding maxReceiveCount. Prevents poison messages from blocking the main queue. Alarm on DLQ depth — every message there is a bug.
  • Long polling: Receive call waits up to 20s for a message to arrive (vs short polling = return immediately). Reduces empty receives and cost. Set WaitTimeSeconds=20 for production consumers.
  • Message retention: 1 minute to 14 days. Default 4 days. Messages older than retention are dropped — a silent data-loss trap on slow consumers.
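Visibility timeout and maxReceiveCount interact in a way that is easier to follow as a simulation. The in-memory queue below is a toy model with hypothetical names, assuming a message moves to the DLQ on the receive attempt after it has already been received maxReceiveCount times.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0
    invisible_until: float = 0.0


class ToyQueue:
    """In-memory model of visibility timeout + redrive to a DLQ (not the SQS API)."""

    def __init__(self, visibility_timeout: float = 30.0, max_receive_count: int = 3):
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.messages = []
        self.dlq = []

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float):
        for msg in list(self.messages):
            if msg.invisible_until > now:
                continue  # still hidden by an earlier receive
            if msg.receive_count >= self.max_receive_count:
                self.messages.remove(msg)  # poison message: redrive to the DLQ
                self.dlq.append(msg)
                continue
            msg.receive_count += 1
            msg.invisible_until = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)  # consumer finished successfully
```

A consumer that crashes never calls `delete`, so the message reappears after each timeout until it has been received three times, then lands in the DLQ where the alarm fires.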
Concrete example

An order-processing app handles 500 orders/sec. Requirement: orders within the same customer ID must process in strict order; consumers may crash. Design: SQS FIFO queue using customer_id as the MessageGroupId, so orders for different customers run in parallel while same-customer orders stay strict. Because 500 messages/sec exceeds the 300 TPS non-batched FIFO limit, producers send in batches (or the queue enables high-throughput FIFO mode). Visibility timeout = 60s (95th percentile processing). maxReceiveCount = 3 → after 3 fails, message goes to DLQ. CloudWatch alarm on DLQ depth > 0. Consumer Lambda uses long polling with WaitTimeSeconds=20.

Key takeaway: Standard for high-throughput at-least-once. FIFO when order matters. Visibility timeout = 95p processing time. DLQ + alarm catches bugs.
⚡ Mini-quiz
Practise SQS scenarios → quick quiz (5 questions).
Lesson 7.3 SNS fan-out, API Gateway, Step Functions

Three more pillars of event-driven AWS: SNS for pub/sub fan-out, API Gateway for synchronous front doors, Step Functions for explicit multi-step workflow orchestration.

Key concepts
  • SNS topic: publish-subscribe. One publish, fan-out to many subscribers: SQS, Lambda, HTTP, email, SMS, Kinesis Firehose. No message retention: if a subscriber is still unavailable after SNS exhausts its delivery retries, the message is lost unless the subscription has a DLQ (use an SQS subscriber for durability).
  • SNS fan-out pattern: Producer → SNS Topic → Multiple SQS Queues (one per consumer service). Each consumer service processes from its OWN queue with independent throughput, retries, and DLQ. The SAA-C03 default "decoupled multi-consumer" answer.
  • SNS FIFO topic: ordered delivery to SQS FIFO subscribers only. For ordering-preserving fan-out.
  • API Gateway types: REST API (v1) = full feature set, transformations, usage plans, API keys. HTTP API (v2) = lower latency, ~70% cheaper, native JWT auth, best for Lambda proxy. WebSocket API = bidirectional persistent connections for chat, games, live feeds.
  • API Gateway max timeout: 29 seconds. Never use API Gateway for long-running jobs. Pattern: synchronous request submits to SQS, async worker processes, client polls a status endpoint or receives a webhook.
  • AWS Step Functions: orchestrate multi-step workflows with explicit state, error handling, retries, parallel branches, wait states. Standard (long-running, expensive per transition) vs Express (high-volume short workflows, cheap). Use when "Lambda + SQS + retry logic" becomes unmanageable.
  • Amazon EventBridge: event bus with rules routing AWS service events / custom events / SaaS partner events to targets (Lambda, Step Functions, SNS, SQS, ECS, etc.). EventBridge Scheduler for cron-style schedules.
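The fan-out pattern comes down to "one publish, one independent copy per queue". A toy in-memory sketch of that idea, with hypothetical class and queue names and no real SNS or SQS calls:

```python
from collections import deque


class ToyTopic:
    """In-memory stand-in for an SNS topic with SQS-style queue subscribers."""

    def __init__(self) -> None:
        self.queues = {}

    def subscribe(self, consumer_name: str) -> deque:
        """Create a dedicated queue for one consumer service and return it."""
        queue = deque()
        self.queues[consumer_name] = queue
        return queue

    def publish(self, message: dict) -> None:
        # Every subscriber queue receives its own copy of the message.
        for queue in self.queues.values():
            queue.append(dict(message))


topic = ToyTopic()
inventory = topic.subscribe("inventory")
email = topic.subscribe("email")
topic.publish({"order_id": "o-123", "total": 42})

inventory.popleft()  # the inventory service finishes its copy
print(len(inventory), len(email))
```

Because each service drains its own queue, a slow or failing consumer backs up only its own queue and DLQ, never the others.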
Concrete example

An e-commerce platform processes new orders with: (a) update inventory, (b) email customer, (c) push to fulfillment, (d) update analytics warehouse — each independent and may retry on failure. Design: SNS topic order-placed. Four SQS queues subscribed to the topic, one per consumer service. Each consumer Lambda reads its own queue with its own visibility timeout + DLQ. The 4-step order-lifecycle workflow (validate → charge → reserve → confirm) runs in Step Functions Express — explicit state, retries with exponential backoff, parallel branch for "send notification + write analytics". Front door: HTTP API Gateway with JWT auth.

Key takeaway: SNS fan-out → SQS per consumer. API Gateway max 29s — use SQS for long jobs. Step Functions when retry / branch logic outgrows hand-rolled Lambda. EventBridge for cross-service event routing.
⚡ Mini-quiz
Drill SNS / API Gateway / Step Functions scenarios → study mode (10 questions).
08

Cost Optimization, DR Strategies & Exam Prep (3 lessons)

Cost optimization is 20% of SAA-C03 but threads through every other domain — every architecture question has a "most cost-effective" variant. Pair that with the four canonical DR strategies (backup/restore, pilot light, warm standby, multi-site active-active) and their RTO/RPO mapping, plus a final exam-prep checklist. This module is where the pieces from Modules 1–7 turn into exam-ready answers.

cost-pillars compute-optimizer cost-explorer dr-strategies rto-rpo-mapping analytics-migration exam-strategy final-checklist
~4h
📖 Read in-depth chapter
Lesson 8.1 Cost optimization pillars + Compute Optimizer + Cost Explorer

Cost optimization isn't a single service — it's a set of pillars that map to specific AWS tools. SAA-C03 expects you to know which tool surfaces which signal and which lever moves the bill.

Key concepts
  • Right-sizing: AWS Compute Optimizer = ML-driven right-sizing recommendations for EC2 / EBS / Lambda / ECS on Fargate, based on observed utilization. Free. Run quarterly, act on its findings.
  • Pricing model matching: On-Demand for unpredictable peaks, Reserved / Savings Plans for steady baseline, Spot for fault-tolerant batch. Mix-and-match within one ASG via mixed instances policy.
  • Storage tiering: S3 lifecycle policies, EBS gp3 (cheaper than gp2 at every size), EFS Intelligent-Tiering. Always abort incomplete multipart uploads at 7 days.
  • Data transfer: the hidden cost killer. In-AZ free, cross-AZ ~$0.01/GB, cross-region ~$0.02/GB, internet egress $0.05-0.09/GB, NAT Gateway ~$0.045/GB on top of per-hour. Use Gateway VPC Endpoints for S3 / DynamoDB to nuke NAT data-processing fees.
  • Cost Explorer: cube-style analytics on cost + usage. Pivot by service, tag, account, time. Save and schedule reports. The default "find runaway services" tool.
  • Cost Anomaly Detection + Budgets: ML alerts on unusual spend; per-tag / per-service budgets with SNS notifications at 80% / 100% thresholds. Wire to PagerDuty for fast detection.
  • AWS Trusted Advisor: rules-based recommendations across cost / security / performance / fault tolerance / service limits. Surface idle ELBs, unassociated EIPs ($0.005/hr each!), underutilised EBS, idle RDS.
Concrete example

A platform team audits $80k/month spend. Compute Optimizer finds 35% of EC2 instances over-provisioned — switch to recommended types. Trusted Advisor flags 22 unassociated Elastic IPs ($80/month wasted). Cost Explorer reveals NAT Gateway data-processing at $6 200/month from a chatty cross-AZ Lambda. Fix: add Gateway VPC Endpoints for S3 + DynamoDB → NAT bill drops 80%. Set up a Cost Anomaly Detection monitor + Budget alarms per tag (per-team chargeback). Total savings: ~$24k/month.
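The figures in this audit can be sanity-checked with the unit prices from the key concepts. A quick sketch, assuming a 730-hour month plus the $0.005/hr idle-EIP and ~$0.045/GB NAT processing rates quoted above; actual rates vary by Region.

```python
HOURS_PER_MONTH = 730       # assumption: average month length in hours
EIP_IDLE_PER_HOUR = 0.005   # USD per unassociated Elastic IP, from the lesson
NAT_PROCESSING_PER_GB = 0.045  # USD per GB through a NAT Gateway, from the lesson


def idle_eip_monthly_cost(count: int) -> float:
    return count * EIP_IDLE_PER_HOUR * HOURS_PER_MONTH


def nat_gb_processed(monthly_processing_usd: float) -> float:
    """Back out the data volume implied by a NAT data-processing bill."""
    return monthly_processing_usd / NAT_PROCESSING_PER_GB


def after_reduction(monthly_usd: float, reduction: float) -> float:
    return monthly_usd * (1 - reduction)


eip_waste = idle_eip_monthly_cost(22)
nat_gb = nat_gb_processed(6200)
nat_after = after_reduction(6200, 0.80)

print(f"Idle EIPs: ${eip_waste:.2f}/month")
print(f"NAT traffic implied by $6,200: {nat_gb:,.0f} GB/month")
print(f"NAT bill after an 80% drop: ${nat_after:,.0f}/month")
```

The 22 idle EIPs come to about $80/month, matching the audit, and a $6 200 processing bill implies well over 100 TB a month flowing through NAT, traffic that Gateway endpoints carry at no charge.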

Key takeaway: Compute Optimizer for right-sizing, Cost Explorer for trends, Trusted Advisor for waste, Anomaly Detection + Budgets for alerts, Gateway VPC Endpoints to kill NAT data fees.
⚡ Mini-quiz
Drill cost-optimization scenarios → study mode (10 questions).
Lesson 8.2 DR strategies — backup/restore, pilot light, warm standby, multi-site

The four DR strategies are a cost-versus-RTO/RPO ladder. SAA-C03 hands you specific RTO/RPO numbers and expects you to pick the matching strategy AND the AWS primitives implementing it.

Key concepts
  • Backup & Restore: cheapest. RTO hours, RPO hours. Daily snapshots of EBS / RDS / DynamoDB via AWS Backup; cross-region copy. Restore creates fresh infra in DR region. For non-critical workloads.
  • Pilot Light: core data services (RDS replica, DynamoDB Global Table, S3 CRR) always running in DR region. Compute is dormant — provisioned just-in-time on failover. RTO 10s of minutes, RPO seconds-minutes. Cheap.
  • Warm Standby: scaled-down full stack always running in DR region. Auto Scaling ramps up on failover. RTO minutes, RPO seconds. Higher cost than pilot light.
  • Multi-site Active-Active: full production capacity in both regions. Route 53 splits traffic; either region absorbs full load on the other's failure. RTO seconds-to-zero, RPO near-zero. Most expensive.
  • RTO / RPO → strategy mapping: RPO 24h OK → daily backups. RPO < 5 min → continuous replication (Aurora Global / DynamoDB Global / S3 CRR). RTO 4h OK → pilot light. RTO 30 min → warm standby. RTO < 5 min → active-active.
  • Cross-region failover plumbing: Route 53 Failover routing + health checks, AWS Backup cross-region copy, S3 Cross-Region Replication (CRR), Aurora Global Database, DynamoDB Global Tables. Test the runbook quarterly — paper DR plans don't work.
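The RTO/RPO ladder can be expressed as a small decision function. The thresholds below follow the mapping in this lesson (under 5 minutes, 30 minutes, 4 hours); the one-hour RPO cut-off that rules out backup and restore is an added assumption, and real exam questions also weigh cost.

```python
def pick_dr_strategy(rto_minutes: float, rpo_minutes: float) -> str:
    """Pick the cheapest strategy on the ladder that meets the stated targets."""
    if rto_minutes < 5:
        return "multi-site active-active"
    if rto_minutes <= 30:
        return "warm standby"
    # Assumption: an RPO under an hour needs live replication, which rules
    # out plain backup & restore even when the RTO is generous.
    if rto_minutes <= 4 * 60 or rpo_minutes < 60:
        return "pilot light"
    return "backup & restore"


for rto, rpo in [(1, 0), (30, 5), (240, 15), (24 * 60, 24 * 60)]:
    print(f"RTO {rto} min / RPO {rpo} min -> {pick_dr_strategy(rto, rpo)}")
```

Reading the targets first and walking down the ladder mirrors the elimination technique the exam rewards: discard every strategy that misses the RTO or RPO, then take the cheapest survivor.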
Concrete example

A regulated workload requires RTO 30 min / RPO 5 min. Choose warm standby. Primary in us-east-1, DR in us-west-2 with 25% scaled-down capacity always running. Data layer: Aurora Global Database (sub-second cross-region replication). S3 CRR on the user-uploads bucket (lag < 15s typical). AWS Backup daily cross-region copies of EFS + EBS, retained 35 days with vault lock for compliance. Route 53 Failover routing + health checks flip DNS in ~30 seconds on primary failure; Auto Scaling at DR ramps from 25% to 100% in ~5 min.

Key takeaway: match strategy to RTO/RPO numbers. Aurora Global / DynamoDB Global = sub-second RPO. Backup/restore for non-critical, active-active for tier-0. Always test the plan.
⚡ Mini-quiz
Practise DR strategy scenarios → quick quiz (5 questions).
Lesson 8.3 Analytics, migration, and final exam strategy

Expect roughly 5–8 questions on most SAA-C03 attempts to cover analytics services (Athena, Redshift, Kinesis, EMR) and migration tools (DMS, SCT, Application Migration Service); they can appear anywhere in the exam. The lesson closes with a final timing + flagging strategy for exam day.

Key concepts
  • Amazon Athena: serverless SQL on S3 data. Pay per TB scanned. Use Parquet / ORC + partitioning to slash scan size (and cost). The default "ad-hoc query against S3" answer.
  • Amazon Redshift: petabyte-scale columnar data warehouse. For complex OLAP on TB-PB datasets. Redshift Spectrum = query S3 directly from Redshift without loading. Redshift Serverless = pay-per-use mode.
  • Amazon Kinesis: Data Streams (real-time ingestion, shards, ms latency), Data Firehose (now Amazon Data Firehose: managed delivery to S3 / Redshift / OpenSearch with optional Lambda transform), Data Analytics (now Amazon Managed Service for Apache Flink: SQL / Flink on streams).
  • AWS DMS + Schema Conversion Tool: Database Migration Service for online migration with CDC (Change Data Capture) for minimal downtime. SCT converts schema between heterogeneous engines (Oracle → Aurora Postgres).
  • AWS Application Migration Service (MGN): lift-and-shift servers from on-prem / other clouds to AWS with continuous block-level replication. Cutover with minutes of downtime.
  • Final timing strategy: 130 min / 65 questions = ~2 min/question. First pass: answer obvious questions, flag the 10-15 hard ones, never spend > 3 min on one. Second pass: revisit flagged. Final pass: review flagged-and-uncertain with fresh eyes. Never leave a question blank — no negative marking.
  • Final checklist: S3 storage classes cold; Multi-AZ vs Read Replica; Gateway vs Interface endpoint; SQS DLQ + visibility timeout; SNS fan-out; Spot vs RI vs Compute SP; the four DR strategies; IAM role > user; KMS envelope; Aurora Global Database; presigned URLs; Block Public Access; WAF rate-based rules; Lambda + RDS Proxy.
Concrete example

A retailer migrates an 8 TB on-prem Oracle data warehouse to AWS for cost + scalability. Plan: the Schema Conversion Tool converts the Oracle DDL to Aurora PostgreSQL; AWS DMS runs the full load, then CDC keeps source + target in sync until cutover. New analytics layer: data lake on S3 in Parquet, partitioned by date + region. Ad-hoc analyst queries via Athena. Daily complex OLAP on Redshift Serverless. Real-time clickstream via Kinesis Data Streams → Firehose → S3 + OpenSearch. On exam day, apply the timing strategy: 90 min on the first pass, 30 min on flagged questions, 10 min on final review.
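
To illustrate the "partitioned by date + region" layout in the example, the sketch below builds Hive-style key=value prefixes, which Athena can use for partition pruning. The bucket name, table name, and the partition key name dt (chosen instead of date to avoid a reserved word in DDL) are hypothetical.

```python
from datetime import date, timedelta


def partition_prefix(bucket: str, table: str, day: date, region: str) -> str:
    """Hive-style S3 prefix for one (day, region) partition."""
    return f"s3://{bucket}/{table}/dt={day.isoformat()}/region={region}/"


def prefixes_for_range(bucket: str, table: str,
                       start: date, end: date, region: str) -> list[str]:
    """All prefixes a query filtered to [start, end] and one region would read."""
    days = (end - start).days + 1
    return [partition_prefix(bucket, table, start + timedelta(days=i), region)
            for i in range(days)]


if __name__ == "__main__":
    # Hypothetical bucket and table names, for illustration only.
    for prefix in prefixes_for_range("example-lake", "sales",
                                     date(2024, 1, 1), date(2024, 1, 3), "eu-west-1"):
        print(prefix)
```

A query that filters on both keys reads only the listed prefixes, so the scan size (and the Athena bill) tracks the date range rather than the full table.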

Key takeaway: Athena = ad-hoc S3 SQL. Redshift = warehouse OLAP. Kinesis = real-time streaming. DMS+SCT = heterogeneous DB migration. On exam day: flag-and-move, never blank.
⚡ Mini-quiz
Drill analytics + migration + exam-strategy scenarios → study mode (10 questions).

Capstone labs

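Four labs that exercise the modules end-to-end. Run each in a free-tier AWS account with a $1 budget alarm; tear everything down when finished. NAT Gateways, Transit Gateway attachments, DAX nodes, and Multi-AZ RDS are billed by the hour and are generally not covered by the free tier, so the alarm will fire quickly if anything is left running. These are the same flows that recur on SAA-C03 scenario questions; building them once burns the patterns into memory.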

Lab 1 — 3-tier highly-available web app

Build a VPC with public + private + DB subnets across 2 AZs, one NAT Gateway per AZ. Deploy an ALB in the public subnets, an Auto Scaling group of t3.micro web servers in the private subnets (target tracking CPU = 50%), and an RDS MySQL Multi-AZ in the DB subnets. Hit the ALB and watch the ASG scale out when you generate load.
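
Before clicking through the console, it can help to write down the six subnets Lab 1 needs. The sketch below carves non-overlapping blocks from a VPC range with the standard-library ipaddress module; the 10.0.0.0/16 range, the /24 subnet size, and the AZ labels "a" and "b" are assumptions, not values prescribed by the lab.

```python
import ipaddress


def plan_subnets(vpc_cidr: str = "10.0.0.0/16",
                 azs: tuple[str, ...] = ("a", "b"),
                 tiers: tuple[str, ...] = ("public", "private", "db"),
                 new_prefix: int = 24) -> dict[tuple[str, str], ipaddress.IPv4Network]:
    """Assign one non-overlapping subnet per (tier, AZ) pair, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of consecutive blocks
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = next(blocks)
    return plan


if __name__ == "__main__":
    for (tier, az), cidr in plan_subnets().items():
        print(f"{tier:<8} az-{az}  {cidr}")
```

Only the public subnets get a route to the Internet Gateway; the private subnets route 0.0.0.0/0 to the NAT Gateway in their own AZ, and the DB subnets need no default route at all.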

Lab 2 — Hybrid networking with Transit Gateway

Create 3 VPCs (shared-services, prod, dev) and connect them via Transit Gateway with two TGW route tables (prod and dev each reach shared-services; prod and dev do NOT reach each other). Simulate on-prem with a 4th VPC connected via Site-to-Site VPN over public IP. Verify route isolation with VPC Flow Logs.
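
The isolation rule in Lab 2 comes from how attachments are associated with one route table and propagated into another. The toy model below shows one common way to get that result with two route tables; the names spoke-rt and shared-rt are hypothetical, and the on-prem VPN attachment is left out for brevity.

```python
# Toy model of Transit Gateway routing (not an AWS API call).
# Each attachment is ASSOCIATED with exactly one route table, which decides
# where its outbound traffic can go. An attachment PROPAGATED into a route
# table appears there as a reachable destination.
ASSOCIATION = {
    "prod": "spoke-rt",
    "dev": "spoke-rt",
    "shared": "shared-rt",
}
PROPAGATIONS = {
    "spoke-rt": {"shared"},        # spokes only learn the shared-services route
    "shared-rt": {"prod", "dev"},  # shared-services learns both spokes
}


def can_send(src: str, dst: str) -> bool:
    """True if src's associated route table has a route to dst."""
    return dst in PROPAGATIONS[ASSOCIATION[src]]


def can_reach(a: str, b: str) -> bool:
    """Two-way connectivity needs a route in both directions."""
    return can_send(a, b) and can_send(b, a)


if __name__ == "__main__":
    for pair in [("prod", "shared"), ("dev", "shared"), ("prod", "dev")]:
        print(pair, "reachable" if can_reach(*pair) else "isolated")
```

In the real lab, the same check is done empirically: traffic between prod and dev should never show up as accepted in the Flow Logs.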

Lab 3 — Serverless API with DynamoDB + DAX

Build an HTTP API Gateway → Lambda → DynamoDB stack for a simple product catalog. Add a Global Secondary Index on category. Add DAX in front of the table and measure read latency before vs after. Add a DynamoDB Stream → Lambda that pushes change events to an SNS topic with two SQS subscribers (analytics + audit).
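
The last step of Lab 3 (DynamoDB Stream → Lambda → SNS) can be sketched as a handler that flattens stream records into messages. To keep the example runnable without AWS credentials, the publish function is injected and defaults to print; in a deployed function it would wrap a call such as boto3's SNS publish with your topic ARN. The message shape is an assumption, and NewImage is only present when the stream view type includes new images.

```python
import json


def to_messages(event: dict) -> list[dict]:
    """Turn DynamoDB stream records into small change-event dicts."""
    messages = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        messages.append({
            "event": record["eventName"],          # INSERT, MODIFY, or REMOVE
            "keys": change["Keys"],                # DynamoDB attribute-value format
            "new_image": change.get("NewImage"),   # None for REMOVE or keys-only streams
        })
    return messages


def handler(event: dict, context=None, publish=print) -> dict:
    """Lambda-style entry point; `publish` is a stand-in for an SNS publish call."""
    messages = to_messages(event)
    for message in messages:
        publish(json.dumps(message, sort_keys=True))
    return {"published": len(messages)}


if __name__ == "__main__":
    sample = {"Records": [{
        "eventName": "INSERT",
        "dynamodb": {
            "Keys": {"productId": {"S": "p-1"}},
            "NewImage": {"productId": {"S": "p-1"}, "category": {"S": "books"}},
        },
    }]}
    print(handler(sample))
```

Because SNS delivers each published message to every subscribed queue, the analytics and audit consumers receive independent copies without the Lambda knowing about either.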

Lab 4 — Security + DR for a customer-data bucket

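Create an S3 bucket with Block Public Access ON, SSE-KMS default encryption using a Customer-Managed Key with annual rotation, and versioning + Object Lock in Governance mode. Enable S3 Cross-Region Replication to a DR region. Wire AWS Config managed rule s3-bucket-server-side-encryption-enabled with SSM Automation auto-remediation. The rule evaluates the bucket's encryption configuration rather than individual objects, and S3 now encrypts every new object by default, so test it by weakening the bucket's encryption settings via CLI (for example, switching default encryption from SSE-KMS back to SSE-S3) and watch Config evaluate and remediate. If the rule stays compliant after that change, use the s3-default-encryption-kms managed rule instead, which checks specifically for KMS default encryption.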

Top 4 mistakes candidates make on SAA-C03

  • Ignoring constraint keywords: two answers will both work; the right one matches the keyword ("MOST cost-effective", "LEAST operational overhead"). Identify the keyword on the first read.
  • Confusing Multi-AZ with Read Replicas: Multi-AZ (instance deployment) = HA, same endpoint, standby NOT readable. Read Replicas = read scaling, different endpoint, app must route reads. They solve different problems.
  • Missing VPC Endpoint cost wins: heavy S3 traffic from private subnets via NAT Gateway is the #1 hidden cost. Gateway VPC Endpoints for S3 and DynamoDB are FREE — the canonical "reduce cost" answer.
  • Skipping DR strategy nuance: the four strategies map to specific RTO/RPO numbers. RPO < 1s = Aurora Global / DynamoDB Global. RTO < 1 min = active-active. Memorize the ladder.
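
The NAT Gateway point above is easy to quantify. The sketch below estimates the monthly data-processing charge for S3 traffic that flows through a NAT Gateway, which is the part a Gateway VPC Endpoint removes. The USD 0.045 per GB rate is an assumed list price (check current regional pricing), and the hourly NAT charge is ignored because the gateway usually stays in place for other internet-bound traffic.

```python
# ASSUMPTION: NAT Gateway data processing at USD 0.045 per GB.
NAT_PER_GB_USD = 0.045
# Gateway VPC Endpoints for S3 and DynamoDB carry no hourly or per-GB charge.
GATEWAY_ENDPOINT_PER_GB_USD = 0.0


def monthly_processing_cost(gb_per_month: float, per_gb: float) -> float:
    """Data-processing cost in USD for one month of traffic."""
    return gb_per_month * per_gb


def monthly_savings(s3_gb_per_month: float) -> float:
    """Savings from moving S3 traffic off the NAT Gateway onto a Gateway endpoint."""
    via_nat = monthly_processing_cost(s3_gb_per_month, NAT_PER_GB_USD)
    via_endpoint = monthly_processing_cost(s3_gb_per_month, GATEWAY_ENDPOINT_PER_GB_USD)
    return via_nat - via_endpoint


if __name__ == "__main__":
    # Hypothetical workload: 10 TB of S3 traffic per month from private subnets.
    print(f"estimated monthly savings: ${monthly_savings(10_000):.2f}")
```

The change itself is a route-table entry, with no application code changes, which is why it also fits "LEAST operational overhead" wording.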

Ready for SAA-C03?

65 scenario-based practice questions across all 4 exam domains. Free, no signup, instant feedback on every answer. Open the Cert Quest path to combine practice questions with mini-game drills.

Complete the AWS path

SAA-C03 pairs naturally with SOA-C02 and DVA-C02 to cover the three core AWS Associate tracks (architect, SysOps, developer), and is the prerequisite mindset for the SAP-C02 Professional tier.
