Amazon AWS · cloud

AWS SysOps Administrator Associate SOA-C02

Master AWS cloud operations for the SOA-C02 exam: CloudWatch monitoring, Auto Scaling lifecycle hooks, disaster recovery strategies (pilot light, warm standby, active-active), Systems Manager Patch Manager, CloudFormation StackSets, GuardDuty, Security Hub, VPC networking, Transit Gateway, and cost optimization with Reserved Instances and Compute Optimizer.

Modules: 7
Duration: 35 hours
Level: Intermediate
Exam code: SOA-C02
Exam questions: 65
Passing score: 720 / 1000
Exam duration: 180 min
Exam fee (USD): $150
Validity: 3 years
Study on the go — CertQuests Podcast

Reinforce CloudWatch alarming, Systems Manager patching, and DR strategies while commuting or working out. New episodes covering SOA-C02 operations topics drop weekly.


Why earn the AWS SysOps Administrator?

SOA-C02 is the AWS certification for cloud operators — professionals responsible for deploying, managing, and monitoring production AWS workloads. It tests real-world operational skills, not just architectural knowledge.

  • Validates hands-on operational skills: CloudWatch, SSM, Config, GuardDuty, VPC troubleshooting
  • Proves ability to implement high-availability architectures: Multi-AZ, Auto Scaling, Route 53 failover
  • Demonstrates automation expertise: CloudFormation, Systems Manager, EC2 Image Builder
  • Opens cloud operations, DevOps, and SRE roles — median salary $120–$150k for AWS-certified ops engineers
  • Completes the AWS Associate trilogy alongside SAA-C03 and DVA-C02
  • Exam labs: SOA-C02 launched with exam labs that tested hands-on AWS console skills; AWS removed them from the exam in March 2023, leaving multiple-choice and multiple-response questions
Exam strategy: SOA-C02 is the most operational AWS Associate exam. Many questions ask "what is the MOST cost-effective" or "LEAST operational overhead" solution. AWS always prefers managed services over custom code, native monitoring over third-party tools, and preventive security controls over reactive ones. When in doubt, choose the fully-managed, serverless AWS service.

SOA-C02 exam domains

Six domains spanning the full operations lifecycle. Monitoring is the heaviest domain at 20%, with Deployment and Networking next at 18% each; make CloudWatch and VPC your strongest areas.

Domain 1 — Monitoring, Logging, and Remediation 20%
Domain 2 — Reliability and Business Continuity 16%
Domain 3 — Deployment, Provisioning, and Automation 18%
Domain 4 — Security and Compliance 16%
Domain 5 — Networking and Content Delivery 18%
Domain 6 — Cost and Performance Optimization 12%

7 modules · ~35 hours

Each module maps to one or more exam domains. Work through them in order or focus on your weak areas using the practice test to guide you.

01

Monitoring, Logging & Remediation · 3 lessons

The heaviest domain at 20%. Master CloudWatch alarms (standard, composite, M-of-N evaluation), CloudWatch Logs metric filters and Insights queries, AWS Config managed rules and auto-remediation, EventBridge rules for event-driven operations, CloudTrail for audit and integrity validation, Systems Manager OpsCenter for operational incident management, and VPC Flow Logs for network analysis. Understand when to use CloudWatch vs Config vs CloudTrail vs GuardDuty for different monitoring scenarios.

cloudwatch-alarms m-of-n-evaluation logs-insights metric-filters aws-config eventbridge cloudtrail vpc-flow-logs
~6h
Lesson 1.1 CloudWatch alarms — standard, composite, anomaly

CloudWatch alarms are the SysOps engineer's primary signal. The exam tests both alarm mechanics (M-of-N evaluation, missing-data handling) and composition (combining multiple alarms into one composite signal).

Key concepts
  • Standard alarms: threshold on a metric over an evaluation period. Three states: OK, ALARM, INSUFFICIENT_DATA. Actions on state transitions — typically SNS notify or Auto Scaling action.
  • M-of-N evaluation: "alarm when 3 out of the last 5 data points exceed threshold". Reduces flapping vs simple "any breach fires". The exam-canonical setting for production CPU/memory alarms.
  • Missing data treatment: Missing / NotBreaching / Breaching / Ignore. Default Missing is fine for most cases; NotBreaching when no data legitimately means OK; Breaching for tight SLAs that need to alert on metric-pipeline failures.
  • Composite alarms: combine multiple alarms with AND/OR/NOT logic. Use for "alarm when CPU is high AND memory is high" (single noise-free incident) or "alarm when 5xx errors AND health check failing" (correlated signal).
  • Anomaly detection alarms: ML-trained baseline per metric. Alarm fires when current data deviates from the predicted band. Useful for traffic / latency where static thresholds don't fit.
  • Metric math: compose expressions over multiple metrics — e.g., (m1 / m2) * 100 for error rate percentage. Alarms can evaluate metric-math expressions directly.
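The M-of-N and missing-data mechanics can be modelled in a few lines. This is a simplified local sketch, not CloudWatch's implementation: it assumes points are ordered oldest to newest, represents a missing data point as None, handles only a "greater than threshold" comparison, and leaves out the ignore mode. The function name evaluate_m_of_n is hypothetical.

```python
from typing import Optional, Sequence


def evaluate_m_of_n(
    points: Sequence[Optional[float]],
    threshold: float,
    m: int,
    n: int,
    treat_missing: str = "missing",
) -> str:
    """Simplified model of a GreaterThanThreshold alarm with M-of-N evaluation.

    points are ordered oldest to newest; None marks a missing data point.
    Only the newest n points are evaluated. The 'ignore' mode is not modelled.
    """
    window = list(points)[-n:]
    present = [p for p in window if p is not None]
    missing = len(window) - len(present)
    breaching = sum(1 for p in present if p > threshold)

    if treat_missing == "breaching":
        breaching += missing  # missing points count as breaches
    elif treat_missing == "notBreaching":
        pass  # missing points count as healthy
    elif treat_missing == "missing":
        if not present:
            return "INSUFFICIENT_DATA"  # nothing left to evaluate
    else:
        raise ValueError(f"unsupported treat_missing: {treat_missing}")

    return "ALARM" if breaching >= m else "OK"


# 3-of-5 CPU alarm at 80%: a single spike does not fire, three breaches do.
print(evaluate_m_of_n([50, 95, 60, 55, 58], threshold=80, m=3, n=5))  # OK
print(evaluate_m_of_n([85, 95, 60, 90, 58], threshold=80, m=3, n=5))  # ALARM
print(evaluate_m_of_n([None] * 5, threshold=80, m=3, n=5))  # INSUFFICIENT_DATA
print(evaluate_m_of_n([None] * 5, 80, 3, 5, treat_missing="breaching"))  # ALARM
```

The last two calls show why missing-data treatment matters: the same silent metric pipeline yields INSUFFICIENT_DATA under the default, but a page under breaching.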
Concrete example

A web tier needs to alarm on "error rate > 1% sustained for 5 minutes" without false positives during deploy noise. A standard alarm on a single metric won't work; use metric math instead: m1 = sum(5xx), m2 = sum(requests), expression = m1/m2*100, evaluated M-of-N so the alarm fires only if the rate exceeds 1% on 3 of 5 one-minute data points. Missing data = NotBreaching (low-traffic periods don't fire). Action: SNS topic notifying the on-call team via PagerDuty.
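A sketch of this alarm expressed as the keyword arguments accepted by boto3's CloudWatch put_metric_alarm call, assuming the web tier sits behind an Application Load Balancer. The load balancer dimension value, alarm name, and SNS topic ARN are hypothetical placeholders, and the 60-second period is an assumption (five one-minute data points).

```python
import json

# Hypothetical placeholders: replace with your own resources.
LOAD_BALANCER = "app/web-alb/0123456789abcdef"
ONCALL_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:oncall-pagerduty"


def alb_metric(query_id: str, metric_name: str) -> dict:
    """One MetricDataQuery that sums an ALB metric over 60-second periods."""
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/ApplicationELB",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "LoadBalancer", "Value": LOAD_BALANCER}],
            },
            "Period": 60,
            "Stat": "Sum",
        },
        "ReturnData": False,  # inputs only; the alarm evaluates e1
    }


def error_rate_alarm_params() -> dict:
    return {
        "AlarmName": "web-tier-5xx-error-rate",
        "AlarmActions": [ONCALL_TOPIC_ARN],
        "Metrics": [
            alb_metric("m1", "HTTPCode_Target_5XX_Count"),
            alb_metric("m2", "RequestCount"),
            {"Id": "e1", "Expression": "m1 / m2 * 100",
             "Label": "ErrorRatePercent", "ReturnData": True},
        ],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 1.0,
        "EvaluationPeriods": 5,   # N
        "DatapointsToAlarm": 3,   # M
        "TreatMissingData": "notBreaching",
    }


print(json.dumps(error_rate_alarm_params(), indent=2))
```

With credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**error_rate_alarm_params())` would create the alarm; keeping the definition in a plain dictionary makes it easy to review before applying.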

Key takeaway: M-of-N evaluation kills flap. Missing-data treatment matters more than candidates expect. Composite alarms for correlated signals; metric math for computed thresholds.
Lesson 1.2 Logs, Logs Insights, and metric filters

CloudWatch Logs ingests text from EC2, Lambda, container, and on-prem sources. Metric filters convert log patterns into CloudWatch metrics; Logs Insights queries log data ad-hoc. SOA-C02 asks both.

Key concepts
  • Log groups + log streams: a log group is a namespace with retention + permissions; a stream is a single source (one EC2 instance, one Lambda invocation). Retention: 1 day to forever, per group.
  • Metric filters: pattern on log events that increments a CloudWatch metric. Pattern syntax supports JSON selectors ({ $.user = "alice" }), regex, or simple terms. Common pattern: count "ERROR" log lines → alarm on rate.
  • Subscription filters: stream log events to Kinesis Data Streams, Lambda, or OpenSearch. Use for cross-account log aggregation or for ML/SIEM ingestion.
  • Logs Insights: ad-hoc queries in a purpose-built, pipe-based query language. fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 100. Aggregations: stats count() by bin(5m).
  • Pricing: per-GB ingested + per-GB stored per month. Ingestion is billed before any subscription filter sees an event, so cut cost by dropping noisy logs at source (CloudWatch Agent filter) and by setting retention on every log group.
  • Live Tail: real-time tail of one or more log groups for quick debugging. Limited to 1 hour per session.
Concrete example

An application logs to CloudWatch Logs. Need: (1) alarm when "OutOfMemoryError" appears more than 5 times in 5 minutes. Solution: metric filter on the log group pattern OutOfMemoryError incrementing a custom metric app/OOMErrors. Alarm on the metric with sum > 5 over 5-minute period, M-of-N 1-of-1. (2) Ad-hoc investigation: Logs Insights query filter @message like /OutOfMemoryError/ | stats count() by bin(5m) to see the time distribution.
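A local model of the two steps in this example: a metric filter that emits 1 for each matching log event, then 5-minute sums like stats count() by bin(5m). The log lines and timestamps are invented for illustration, and plain substring matching stands in for CloudWatch's filter-pattern engine.

```python
from collections import Counter
from datetime import datetime

PATTERN = "OutOfMemoryError"  # same term as the metric filter pattern
BIN_SECONDS = 300             # 5-minute bins, like bin(5m)
ALARM_THRESHOLD = 5           # alarm when a 5-minute sum is > 5


def oom_counts_per_bin(events):
    """events: (ISO-8601 timestamp, message) pairs -> {bin start epoch: matches}."""
    counts = Counter()
    for ts, message in events:
        if PATTERN in message:  # a metric filter match emits the value 1
            epoch = int(datetime.fromisoformat(ts).timestamp())
            counts[epoch - epoch % BIN_SECONDS] += 1  # floor to the 5-minute bin
    return dict(counts)


def breaching_bins(counts):
    """Bin start times whose sum would breach the 'sum > 5' alarm."""
    return sorted(start for start, total in counts.items() if total > ALARM_THRESHOLD)


# Six OOM lines within one 5-minute bin, plus an unrelated INFO line.
sample = [
    (f"2024-05-01T10:00:{s:02d}+00:00", "java.lang.OutOfMemoryError: Java heap space")
    for s in range(0, 60, 10)
]
sample.append(("2024-05-01T10:07:30+00:00", "INFO request served"))

counts = oom_counts_per_bin(sample)
print(counts)
print(breaching_bins(counts))
```

The alarm side only ever sees the per-bin sums; the raw log lines stay in the log group for the ad-hoc Logs Insights query.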

Key takeaway: metric filters for "log pattern → alarm". Logs Insights for ad-hoc investigation. Subscription filters for streaming out to other systems. Set retention deliberately — forever logging is expensive.
Lesson 1.3 Config, EventBridge, CloudTrail — the audit triad

CloudTrail records WHO did WHAT WHEN. Config records WHAT THE STATE WAS at a point in time. EventBridge reacts to events as they happen. Together they give you control-plane audit + drift detection + automation. The exam contrasts them constantly.

Key concepts
  • CloudTrail: records management-plane API calls (CreateBucket, DeleteVolume) and optionally data-plane events (S3 object reads, Lambda invocations). An account can have several trails; trails created in the console are multi-region by default. Integrity validation via signed digest files. Forensic source-of-truth for "who deleted this?".
  • AWS Config: continuously records resource configuration state. Rules evaluate config against desired state (managed rules from AWS, custom rules in Lambda). Non-compliance triggers EventBridge events. Optional auto-remediation via SSM Automation.
  • EventBridge: the event bus. AWS service events (EC2 state changes, CloudTrail events, GuardDuty findings) land on the default bus; custom and partner events on additional buses. Rules match events and route to targets (Lambda, Step Functions, SNS, SQS, ECS task, …).
  • EventBridge Scheduler: cron- and rate-based scheduled triggers. An alternative to EventBridge scheduled rules (formerly CloudWatch Events) that adds time-zone support and one-time schedules.
  • OpsCenter (Systems Manager): aggregates operational issues (alarms, findings, custom events) into "OpsItems" workflow. Useful for incident management overlay; less common in greenfield.
  • Use-pattern triad: CloudTrail for forensics, Config for drift/compliance, EventBridge for automation. They overlap but solve different problems — exam scenarios usually have one clear best fit.
Concrete example

Requirement: any S3 bucket created without server-side encryption must be auto-remediated within 5 minutes. Design: AWS Config managed rule s3-bucket-server-side-encryption-enabled. Non-compliance → EventBridge rule on Config compliance change event → SSM Automation document that enables default encryption on the bucket. CloudTrail records the creation event as the auditable source-of-truth for who originally created the misconfigured bucket.
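The EventBridge half of this design can be sketched as an event pattern plus a tiny matcher. The matcher is a simplified stand-in that only supports exact-value matching, and the sample event is trimmed and illustrative. The pattern assumes the Config rule was deployed under the managed rule's identifier as its name; if you named it differently, configRuleName must change.

```python
# Event pattern for the EventBridge rule (assumes the rule name shown below).
PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "configRuleName": ["s3-bucket-server-side-encryption-enabled"],
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]},
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: exact values only.

    Real EventBridge also supports prefix, numeric, exists, and anything-but
    operators; none of those are modelled here.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


def config_event(compliance: str) -> dict:
    """Trimmed, illustrative compliance-change event (real events carry more fields)."""
    return {
        "source": "aws.config",
        "detail-type": "Config Rules Compliance Change",
        "detail": {
            "configRuleName": "s3-bucket-server-side-encryption-enabled",
            "resourceType": "AWS::S3::Bucket",
            "resourceId": "example-uploads-bucket",
            "newEvaluationResult": {"complianceType": compliance},
        },
    }


print(matches(PATTERN, config_event("NON_COMPLIANT")))  # True: route to SSM Automation
print(matches(PATTERN, config_event("COMPLIANT")))      # False: ignored
```

Filtering on complianceType in the pattern keeps the remediation target from being invoked when a bucket returns to COMPLIANT.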

Key takeaway: CloudTrail = who/what/when. Config = what state was. EventBridge = react in real time. Triad covers forensics + drift + automation.
02

Reliability & Business Continuity · 3 lessons

Build systems that survive failures. Covers EC2 Auto Scaling (target tracking, step scaling, lifecycle hooks for zero-downtime deployments), Route 53 routing policies and health checks (especially Failover routing), RDS Multi-AZ failover behavior and replica promotion, AWS Backup for cross-region backups, S3 Cross-Region Replication and versioning, Aurora Global Database, and DR strategies (backup/restore vs pilot light vs warm standby vs active-active) with their respective RTO/RPO tradeoffs.

auto-scaling lifecycle-hooks route53-failover rds-multi-az aws-backup s3-crr rto-rpo warm-standby
~5h
Lesson 2.1 Auto Scaling — policies, lifecycle hooks, warm pools

Auto Scaling is the AWS native way to match capacity to demand. The exam tests scaling policy choice, instance termination order, and lifecycle hooks for zero-downtime deploys.

Key concepts
  • Scaling policies: Target tracking (set "CPU = 50%", AWS adjusts) — simplest and recommended default. Step scaling (CloudWatch alarm-driven, +N instances per breach severity) — used for non-linear scale needs. Scheduled (calendar trigger). Predictive (ML-based forecast for recurring patterns).
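  • Cooldown: the wait between scale actions, prevents flapping. The default cooldown (300 seconds) applies to simple scaling policies; target tracking and step scaling use instance warm-up instead. Tune either to the application's warm-up time: too short = oscillation, too long = lag.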
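  • Termination policies: control which instance is terminated on scale-in. AZ balance is preserved first. The Default policy then prefers instances that best align with the allocation strategy (On-Demand vs Spot), then instances using the oldest launch template or launch configuration, then instances closest to the next billing hour, and finally picks at random. Alternatives such as OldestInstance, NewestInstance, and OldestLaunchTemplate can be set per ASG.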
  • Lifecycle hooks: pause instances during launch (Pending:Wait) or termination (Terminating:Wait) so SSM Run Command / Lambda can run pre-attach or pre-detach scripts. Critical for graceful drain on terminate.
  • Warm pools: pre-initialised instances (kept Stopped, Running, or Hibernated) ready for near-instant launch. Slashes the "minutes to first request" time when scaling responds to a sudden spike.
  • Instance refresh: rolling replacement of every instance in an ASG with the new launch template version. Configurable min healthy + warm-up time. Used for AMI updates without downtime.
Concrete example

An e-commerce ASG runs 4-20 instances. Spike response is too slow because instances take 5 minutes to boot + warm the cache. Fix: enable a warm pool holding 5 pre-initialised instances in the Stopped state. Add a lifecycle hook on Pending:Wait so an SSM Automation runs cache-prewarming before the instance attaches to the LB target group. Target tracking policy CPU=60% with a 60 s instance warm-up. Result: spike response drops from ~6 min to ~30 s.
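AWS does not publish target tracking's exact algorithm, so the sketch below uses a common proportional approximation: desired ≈ current × metric ÷ target, rounded up, then clamped to the group's min and max. The 4-20 bounds and the 60% target come from this example; the CPU readings are made up, and the function name is hypothetical.

```python
import math


def estimate_desired_capacity(current: int, metric: float, target: float,
                              min_size: int, max_size: int) -> int:
    """Proportional approximation of a target tracking scale decision."""
    if current <= 0 or target <= 0:
        raise ValueError("current capacity and target must be positive")
    desired = math.ceil(current * metric / target)  # round up to stay under target
    return max(min_size, min(max_size, desired))    # respect ASG min/max


print(estimate_desired_capacity(4, 90, 60, 4, 20))   # 6: spike at 90% CPU
print(estimate_desired_capacity(10, 30, 60, 4, 20))  # 5: quiet period
print(estimate_desired_capacity(16, 95, 60, 4, 20))  # 20 (capped at max_size)
```

In practice target tracking scales in more conservatively than it scales out, so treat the scale-in number as an upper bound on how far capacity might drop.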

Key takeaway: target tracking by default. Lifecycle hooks for "do work before instance joins or leaves". Warm pools when boot time matters. Instance refresh for AMI rollouts.
Lesson 2.2 Route 53 + RDS Multi-AZ — failover patterns

Route 53 routing policies + health checks plus database failover features (RDS Multi-AZ, Aurora Global Database) are the building blocks of regional and global failover. SOA-C02 tests when each routing policy fires and how RDS failover actually behaves.

Key concepts
  • Route 53 routing policies: Simple, Weighted (% split), Latency (region with lowest latency to user), Failover (primary + secondary with health check), Geolocation (per country/continent), Geoproximity, Multi-value answer, IP-based.
  • Health checks: endpoint, calculated (combine multiple), CloudWatch-alarm. Failover routing needs a health check on the primary record (or Evaluate Target Health on an alias record); a health check on the secondary is optional, and a record without one is treated as healthy.
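  • RDS Multi-AZ (single-region HA): synchronous standby in another AZ. Failover triggers automatically on primary instance or AZ failure (and on a reboot with failover); the endpoint's DNS record flips to the standby, typically in 60-120 seconds. In a Multi-AZ DB instance deployment the standby is NOT readable; it is purely an HA feature.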
  • RDS read replicas: async replicas. Same-region (read scale) or cross-region (DR + lower-latency reads). Promote a replica to primary for manual failover or to break replication for a separate DB.
  • Aurora Multi-AZ: different from RDS Multi-AZ. An Aurora cluster has one writer + up to 15 Aurora Replicas, all sharing a cluster volume replicated across 3 AZs. Failover promotes a replica (~30 seconds). Aurora is preferred for new builds.
  • Aurora Global Database: cross-region replication with sub-second RPO, < 1 minute promote-time for cross-region failover. Up to 5 secondary regions. Right for global active-active reads.
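The failover-routing rules for health checks can be captured in a small decision model. This is a simplified sketch, not Route 53's resolver: it treats a record without a health check as healthy, requires a check on the primary, ignores Evaluate Target Health on alias records, and leaves the "both unhealthy" case unmodelled. The class, function, and host names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverRecord:
    target: str                       # endpoint the record answers with
    health_ok: Optional[bool] = None  # None means no health check is attached

    def healthy(self) -> bool:
        # A record with no health check is always treated as healthy.
        return True if self.health_ok is None else self.health_ok


def resolve(primary: FailoverRecord, secondary: FailoverRecord) -> Optional[str]:
    """Simplified failover routing decision for one DNS query."""
    if primary.health_ok is None:
        raise ValueError("the primary failover record needs a health check")
    if primary.healthy():
        return primary.target
    if secondary.healthy():
        return secondary.target
    return None  # both unhealthy: not modelled in this sketch


secondary = FailoverRecord("app.eu-west-1.example.com")  # no health check
print(resolve(FailoverRecord("app.us-east-1.example.com", True), secondary))
print(resolve(FailoverRecord("app.us-east-1.example.com", False), secondary))
```

The second call answers with the eu-west-1 record even though it has no health check of its own, which is exactly the case candidates tend to misremember.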
Concrete example

A global SaaS needs < 5 min RTO for primary-region database failure. Design: Aurora Global Database primary in us-east-1, secondary in eu-west-1 (sub-second replication lag). On region failure: run a global database failover to the secondary, or detach and promote the secondary cluster, for an unplanned outage. Route 53 with Failover routing: primary record pointing at the us-east-1 Aurora writer endpoint with a health check (a CloudWatch-alarm-based check if the endpoint is private, because Route 53 health checkers need a publicly reachable target); secondary record pointing at eu-west-1. DNS TTL 60s. Total RTO: ~2 minutes promote + DNS propagation.

Key takeaway: Multi-AZ for in-region HA, read replicas for scale + cross-region DR, Aurora Global for < 5 min cross-region failover. Route 53 Failover routing + health checks completes the DNS side.
Lesson 2.3 Backups, S3 CRR, and DR strategy ladder

The DR strategy choice is always a cost / RTO / RPO trade-off. AWS Backup centralises the policies; S3 CRR + lifecycle covers object data. SOA-C02 tests the strategy ladder and the AWS-specific primitives that implement each step.

Key concepts
  • DR strategy ladder: Backup & Restore (cheapest, RTO hours), Pilot Light (data replicated live, core services provisioned, app servers off until needed, RTO tens of minutes), Warm Standby (scaled-down full stack, RTO minutes), Active/Active multi-region (full prod both sides, RTO < 1 minute, expensive).
  • AWS Backup: central backup management across EC2/EBS/RDS/DynamoDB/S3/EFS/FSx/Storage Gateway. Backup plans = schedule + retention + lifecycle to cold storage. Cross-region copy for DR. Backup vaults with vault lock for immutability.
  • S3 Cross-Region Replication (CRR): async object-level replication to another region. Configurable per prefix or tag. Requires versioning enabled on both source and dest. Use for cross-region DR of S3 data or compliance copies.
  • S3 Same-Region Replication (SRR): same as CRR but within one region — useful for cross-account replication or aggregation.
  • EBS snapshots: incremental block-level backups in S3. Cross-region copy is a separate API. Fast Snapshot Restore (FSR) pre-warms snapshots to eliminate the first-read latency penalty.
  • S3 Versioning + Object Lock: versioning preserves every version (ransomware recovery). Object Lock adds WORM immutability (compliance retention). Both layered for max protection.
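Picking a rung on the ladder is a "cheapest strategy that meets the RTO" search. The sketch below encodes that rule; the upper-bound RTO figures in minutes are illustrative assumptions for this example, not AWS-published numbers, and RPO would be checked separately against the replication mechanism you choose.

```python
# Illustrative upper-bound RTOs in minutes: assumptions for this sketch,
# not AWS-published figures. Ordered cheapest first.
DR_LADDER = [
    ("Backup & Restore", 24 * 60),
    ("Pilot Light", 60),
    ("Warm Standby", 15),
    ("Active/Active", 1),
]


def cheapest_strategy(rto_minutes: float) -> str:
    """Return the first (cheapest) rung whose assumed RTO fits the target."""
    for name, typical_rto in DR_LADDER:
        if typical_rto <= rto_minutes:
            return name
    raise ValueError("no strategy on the ladder meets this RTO")


print(cheapest_strategy(48 * 60))  # Backup & Restore
print(cheapest_strategy(90))       # Pilot Light
print(cheapest_strategy(30))       # Warm Standby
print(cheapest_strategy(1))        # Active/Active
```

Because the list is ordered by cost, the first match is also the cheapest; tightening the RTO only ever moves you up the ladder.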
Concrete example

A regulated workload requires RTO 30 minutes / RPO 5 minutes cross-region. Design: warm standby pattern. Primary in us-east-1; DR in us-west-2 with scaled-down 25% capacity. Data: Aurora Global Database for DB (Lesson 2.2). S3 CRR on the user-uploads bucket (most objects replicate within seconds; enable S3 Replication Time Control for a 15-minute replication SLA). AWS Backup daily backups of EC2 + EFS, cross-region copy to us-west-2, retention 35 days with vault lock for compliance. Route 53 Failover routing flips the DNS once the health check fails (health-check interval × failure threshold, plus the record TTL).

Key takeaway: match strategy to RTO/RPO. AWS Backup for centralised cross-service backup. S3 CRR + Aurora Global for hot replication. Always test the DR plan — quarterly minimum.
03

Deployment, Provisioning & Automation · 3 lessons

Automate everything a SysOps engineer manages. CloudFormation: drift detection (what changed outside the stack?), Change Sets (preview before applying), StackSets for multi-account/region deployments with automatic deployment for new accounts. Systems Manager: Run Command for ad-hoc execution, State Manager for configuration compliance, Patch Manager with baselines and maintenance windows, Automation documents for multi-step runbooks, and Session Manager for bastion-free access. EC2 Image Builder for golden AMI pipelines. Elastic Beanstalk deployment policies (All at Once, Rolling, Rolling with Additional Batch, Immutable).

cloudformation-drift stacksets ssm-run-command patch-manager state-manager session-manager ec2-image-builder beanstalk-immutable
~6h
Lesson 3.1 CloudFormation — drift detection, Change Sets, StackSets

CloudFormation is the AWS-native IaC substrate. SOA-C02 focuses on operational concerns — detecting manual changes (drift), previewing updates (Change Sets), and deploying at scale (StackSets).

Key concepts
  • Drift detection: compares stack template's expected state to actual deployed state, lists properties that differ. Run on demand from the console / API. Use to find resources someone modified outside the stack.
  • Change Sets: preview an update before executing — added/modified/removed resources, replacement vs in-place. Best practice for production stack updates: always create + review before Execute.
  • StackSets: deploy ONE template to many accounts AND/OR many regions simultaneously. Target by AWS Organizations OU. Auto-deployment applies the stack to new accounts that join the OU.
  • Stack policy: JSON that allows/denies updates on specific resources. Prevents accidental replacement of stateful resources (databases) during a CloudFormation update.
  • DeletionPolicy / UpdateReplacePolicy: per-resource lifecycle attributes. Retain keeps the resource on stack delete; Snapshot creates a final snapshot first. Critical for databases and S3 buckets with data.
  • Stack failure behaviour: rollback on failure (default), or DisableRollback to keep failed resources for forensic debugging. A stack whose creation failed and rolled back (ROLLBACK_COMPLETE) can't be updated; delete it before retrying.
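A sketch of DeletionPolicy, UpdateReplacePolicy, and a stack policy working together, built as Python dictionaries and serialised to the JSON CloudFormation accepts. The logical ID ProdDatabase and all property values are hypothetical placeholders for illustration, not recommendations.

```python
import json

DB_LOGICAL_ID = "ProdDatabase"  # hypothetical logical ID

# Template fragment: property values are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        DB_LOGICAL_ID: {
            "Type": "AWS::RDS::DBInstance",
            "DeletionPolicy": "Snapshot",       # final snapshot on stack delete
            "UpdateReplacePolicy": "Snapshot",  # snapshot the old DB if an update replaces it
            "Properties": {
                "Engine": "postgres",
                "DBInstanceClass": "db.t3.medium",
                "AllocatedStorage": "20",
                "MasterUsername": "dbadmin",
                "ManageMasterUserPassword": True,  # password kept in Secrets Manager
                "MultiAZ": True,
            },
        }
    },
}

# Stack policy: allow all updates except replacing the database.
stack_policy = {
    "Statement": [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        {
            "Effect": "Deny",
            "Action": "Update:Replace",
            "Principal": "*",
            "Resource": f"LogicalResourceId/{DB_LOGICAL_ID}",
        },
    ]
}

print(json.dumps(template, indent=2))
print(json.dumps(stack_policy, indent=2))
```

The two JSON documents could be passed as TemplateBody and StackPolicyBody to CloudFormation's CreateStack call. The policies are layered on purpose: the stack policy blocks a replacement during updates, while DeletionPolicy covers the stack-delete path the stack policy does not.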
Concrete example

Operations team adds a manual security group rule on a CloudFormation-managed VPC to unblock a customer. Six months later, the stack is updated to add a new subnet and the manual rule disappears, causing a production outage. Fix going forward: schedule weekly drift detection on critical stacks; alert via EventBridge to SNS when drift detected. Every prod stack update goes through a Change Set reviewed in the PR. Multi-account governance: a StackSet deploys the org-wide IAM password policy + Config rules to every account in the OU, auto-deployed to new accounts.

Key takeaway: drift detection on schedule, Change Sets before prod updates, StackSets for multi-account, DeletionPolicy on stateful resources. Stack policy as a brake on accidental replacements.
Lesson 3.2 Systems Manager — Run Command, State Manager, Patch Manager

Systems Manager (SSM) is the SysOps swiss army knife — run commands, enforce config, patch instances, store secrets, access without SSH. SOA-C02 tests which SSM capability fits which operational need.

Key concepts
  • Run Command: ad-hoc command execution across many managed instances. Target by tag, AZ, resource group. Audit logged. Replaces SSH-and-execute-a-script workflows.
  • Session Manager: browser- or CLI-based shell access to instances WITHOUT SSH/RDP/Bastion. Requires the SSM Agent + IAM permissions. Audit logged. The modern replacement for jump hosts.
  • State Manager: declarative config compliance — "instance X must always have application Y at version Z". Periodic enforcement, drift remediation. Use for fleet-wide standard config.
  • Patch Manager: OS patch baselines (which CVEs / packages auto-approved) + maintenance windows (when to apply). Cross-instance compliance reports. Replaces hand-rolled cron-based patching.
  • Automation documents: multi-step runbooks (e.g., "snapshot the disk, then upgrade, then test"). Trigger manually, on schedule, or via EventBridge for incident auto-response.
  • Parameter Store + Secrets Manager: both store config + secrets. Parameter Store is cheaper (standard-tier parameters up to 4 KB are free) and integrates with KMS for SecureString. Secrets Manager adds automatic rotation. Pick by whether rotation matters.
Concrete example

An ops team has 200 EC2 instances and needs: (1) standard config drift remediation, (2) monthly OS patching during a 4-hour window, (3) browser shell access without SSH. Setup: SSM Agent on every instance, with an instance profile whose role has the AmazonSSMManagedInstanceCore managed policy attached. State Manager association applies a CloudWatch Agent config document daily. Patch Manager with a custom patch baseline (approve security patches automatically after 3 days), maintenance window first Sunday of the month 02:00-06:00 UTC. Session Manager for admin access; the public SSH security group rule is deleted.
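A quick way to sanity-check the "first Sunday of the month, 02:00-06:00 UTC" window is to compute its occurrences. In Systems Manager the schedule itself would normally be written as a maintenance-window cron expression such as cron(0 2 ? * SUN#1 *); treat that exact string as an assumption to verify against the maintenance-window documentation. The function names below are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

WINDOW_START = time(2, 0, tzinfo=timezone.utc)  # 02:00 UTC
WINDOW_HOURS = 4                                # window closes at 06:00 UTC


def first_sunday(year: int, month: int) -> date:
    first = date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    return first + timedelta(days=(6 - first.weekday()) % 7)


def patch_window(year: int, month: int):
    """(start, end) of the monthly window as timezone-aware UTC datetimes."""
    start = datetime.combine(first_sunday(year, month), WINDOW_START)
    return start, start + timedelta(hours=WINDOW_HOURS)


for month in (6, 9, 12):
    start, end = patch_window(2024, month)
    print(start.isoformat(), "->", end.isoformat())
```

Listing a few upcoming windows like this is a cheap review step before registering patch tasks, since an off-by-one weekday in the schedule only shows up a month later.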

Key takeaway: Session Manager replaces SSH. Run Command for ad-hoc. State Manager for declarative config. Patch Manager for governed OS patching. Parameter Store for free secrets; Secrets Manager for rotated secrets.
Lesson 3.3 Image Builder + Elastic Beanstalk deployment policies

EC2 Image Builder governs the golden-AMI pipeline. Elastic Beanstalk's deployment policies are the SOA-C02 canonical example of progressive deployment strategies.

Key concepts
  • EC2 Image Builder: declarative pipelines for AMI building. Components (install / validate / test), recipes (combine components + parent image), pipelines (run on schedule or trigger). Replaces hand-built golden AMIs and Packer for AWS-only orgs.
  • Inspector v2 integration: Image Builder pipelines can run Amazon Inspector against the built AMI for CVE scanning — block deploy if criticals found.
  • Beanstalk deployment policies: All at once (fast, downtime risk), Rolling (batch by batch, in-place), Rolling with additional batch (one extra batch — no capacity dip), Immutable (provision NEW ASG, swap when healthy — safest, slowest), Traffic Splitting (canary).
  • Beanstalk environment types: single-instance (cheap dev), load-balanced + auto-scaled (production). Worker tier for SQS-backed background jobs.
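  • Beanstalk configuration sources: settings applied directly to the environment (console, EB CLI, API), saved configurations (reusable), and .ebextensions configuration files (YAML or JSON) in the source bundle; environment properties can be set through any of them. Precedence, highest first: direct settings, saved configurations, .ebextensions files, then defaults.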
  • Beanstalk + RDS gotcha: if you let Beanstalk create an RDS in the environment, the DB lives and dies with the environment. Better: create RDS separately and supply the connection string via env vars / Secrets Manager.
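Elastic Beanstalk resolves each option from the highest-precedence source that sets it: settings applied directly to the environment, then saved configurations, then .ebextensions configuration files, then defaults. A minimal sketch of that resolution, assuming a flat "namespace/option" key format invented for this example; the namespaces are real, the values are illustrative.

```python
# Highest precedence first: direct settings (console, EB CLI, API),
# saved configurations, .ebextensions configuration files, defaults.
PRECEDENCE = ["direct", "saved_configuration", "ebextensions", "defaults"]


def effective_options(sources: dict) -> dict:
    """sources maps a source name to {option: value}; higher precedence wins."""
    result = {}
    for source in reversed(PRECEDENCE):         # apply the lowest precedence first
        result.update(sources.get(source, {}))  # higher sources overwrite lower ones
    return result


sources = {
    "ebextensions": {
        "aws:autoscaling:asg/MinSize": "2",
        "aws:autoscaling:launchconfiguration/InstanceType": "t3.small",
    },
    "saved_configuration": {"aws:autoscaling:launchconfiguration/InstanceType": "t3.medium"},
    "direct": {"aws:autoscaling:asg/MinSize": "4"},
}
print(effective_options(sources))
```

This is why an .ebextensions change can appear to be ignored: a value set directly on the environment silently wins until it is removed.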
Concrete example

A team uses Beanstalk for a customer-facing API. Need: zero-downtime deploys, fast rollback. Choice: Immutable deployment policy. Beanstalk launches the new version on instances in a temporary ASG, runs health checks, then moves them into the original ASG and terminates the old instances. If the new instances fail health checks, Beanstalk terminates the temporary ASG while the old instances keep serving, so a failed deploy rolls back quickly. Slower than Rolling but safer. AMI built via EC2 Image Builder with Inspector v2 scanning on every pipeline run; the pipeline fails if CVEs exceed the critical threshold.

Key takeaway: Image Builder for governed AMI pipelines. Beanstalk Immutable for safe zero-downtime deploys; Traffic Splitting for canary. Never bake RDS into a Beanstalk env — use a separate DB.
04

Security & Compliance · 3 lessons

Security is 16% of SOA-C02 but underpins every other domain. Master GuardDuty (threat detection findings + EventBridge-based automated remediation), AWS Inspector v2 (CVE scanning, network reachability), AWS Security Hub (aggregated multi-account security posture), Amazon Macie (PII and sensitive data discovery in S3), IAM Access Analyzer (external access findings), KMS automatic key rotation, CloudTrail log file integrity validation, AWS Organizations SCPs for preventive controls, WAF rate-based rules and geographic restrictions, and S3 Block Public Access at account level.

guardduty inspector-v2 security-hub macie iam-access-analyzer kms-rotation scp waf
~5h
Lesson 4.1 GuardDuty, Inspector, Security Hub — detection and posture

Three security services that often appear together — GuardDuty for runtime threat detection, Inspector for vulnerability scanning, Security Hub for aggregated posture. SOA-C02 asks which one finds what.

Key concepts
  • GuardDuty: ML + threat-intelligence-based detection of anomalous behaviour from VPC Flow Logs, DNS logs, CloudTrail. Findings: cryptocurrency mining, port scanning, exfiltration, IAM key compromise. GuardDuty is regional: enable it in every region you use and aggregate findings centrally (for example, with Security Hub cross-region aggregation).
  • Inspector v2: CVE scanning for EC2 + ECR + Lambda. Scanning is continuous: resources are re-scanned when packages change or new CVEs are published. Network reachability for EC2 ("which of my instances are reachable from the internet on port 22?"). Integrated with Security Hub findings.
  • Security Hub: aggregation layer — collects findings from GuardDuty, Inspector, Macie, Config, third-party. Single dashboard, single API. Runs compliance standards (AWS Foundational Security Best Practices, CIS, PCI DSS) as checks against your environment.
  • IAM Access Analyzer: flags resources accessible from OUTSIDE your trust zone (account / org). Catches "S3 bucket policy accidentally allows another account" without manual policy review.
  • Macie: ML-driven discovery of PII / PHI / financial data in S3. Sample-based by default; full scan on demand. Outputs findings into Security Hub.
  • Cross-account aggregation: all three (GuardDuty / Inspector / Security Hub) support delegated administrator in Organizations — one security account aggregates findings from every member account.
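IAM Access Analyzer reasons over the full policy language, including conditions; the toy check below only illustrates the underlying idea of "principal outside the zone of trust" by inspecting Principal elements in Allow statements. The trusted account ID, bucket name, and function names are hypothetical, and Condition blocks and Service or Federated principals are deliberately ignored.

```python
TRUSTED_ACCOUNTS = {"111122223333"}  # hypothetical: account IDs inside your zone of trust


def account_of(principal: str) -> str:
    # "arn:aws:iam::444455556666:root" -> "444455556666"; bare IDs and "*" pass through
    return principal.split(":")[4] if principal.startswith("arn:") else principal


def external_principals(policy: dict) -> list:
    """Flag AWS principals in Allow statements that fall outside TRUSTED_ACCOUNTS.

    Simplified: ignores Condition blocks and Service/Federated principals.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            flagged.append("*")
            continue
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        for p in [aws] if isinstance(aws, str) else aws:
            if account_of(p) not in TRUSTED_ACCOUNTS:
                flagged.append(p)
    return flagged


bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::111122223333:role/app"},
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Principal": {"AWS": ["arn:aws:iam::444455556666:root"]},
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}
print(external_principals(bucket_policy))
```

Only the second statement is flagged, which mirrors the kind of cross-account grant an Access Analyzer finding would surface for review.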
Concrete example

An org with 30 accounts: designate a security tooling account. Delegate GuardDuty + Inspector + Security Hub + Macie admin to it. Auto-enable each on all member accounts via Organizations. Security Hub aggregates findings; AWS Foundational Security Best Practices + CIS Foundations Benchmark standards enabled. Critical findings route via EventBridge to PagerDuty. IAM Access Analyzer scans every account; flagged cross-account access triggers a Slack notification with the principal + resource.

Key takeaway: GuardDuty = runtime threats, Inspector = vulnerabilities, Macie = data sensitivity, Security Hub = aggregation. Delegated admin in Organizations is the multi-account pattern.
Lesson 4.2 KMS, WAF, and S3 protection

Encryption keys (KMS), web-app protection (WAF), and the per-bucket / per-account S3 defenses are perennial SOA-C02 topics. Get the defaults wrong and you ship publicly-readable buckets or encrypt with the wrong key custody.

Key concepts
  • KMS key types: AWS-managed (no monthly fee; lives in your account but AWS creates, manages, and rotates it), Customer-Managed (you control rotation, key policy, audit), AWS-owned (in AWS-internal account, free, no audit). Per-service default; you flip to CMK when key custody matters.
  • KMS rotation: automatic rotation for symmetric CMKs, yearly by default and transparent to applications. Asymmetric and HMAC keys (and keys with imported key material) require manual rotation. Rotation creates new backing material; older material kept to decrypt old data.
  • KMS key policy + grants: key policy is required (no IAM-only access). Grants are short-lived programmatic permissions used by AWS services (e.g., EBS at volume create). Auditable in CloudTrail.
  • WAF: deploys on CloudFront, ALB, API Gateway, AppSync, App Runner. Web ACL contains rules (managed, custom, rate-based). Rate-based rule limits requests per source IP — essential bot defense.
  • WAF managed rule groups: AWS-managed (Core rule set / Known bad inputs / SQL database / Linux OS / etc.), Marketplace, custom. Enable the Core rule set (AWSManagedRulesCommonRuleSet) as the baseline.
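  • S3 Block Public Access: account-level setting + bucket-level setting; S3 applies the most restrictive combination of the two. When ON, public ACLs and public bucket policies are blocked regardless of intent. SOA-C02 expects you to turn it ON at the account level and use SCPs to deny changes to the setting org-wide.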
Concrete example

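A SaaS platform with public S3 (static site) and private S3 (customer data) in the same account. Because S3 applies the most restrictive combination of account- and bucket-level Block Public Access, account-level BPA ON would also block a publicly readable site bucket. Setup: keep BPA ON at the account level and serve the static site through CloudFront with Origin Access Control, so the site bucket needs no public policy (if the bucket must be read directly, account-level BPA has to stay OFF, with bucket-level BPA ON for every other bucket). Customer-data bucket encrypted with a Customer-Managed KMS key (annual rotation ON), Inspector v2 scanning the EC2 fleet, WAF on the CloudFront distribution in front of the API with a rate-based rule (2000 req/5min per IP) + the AWS Core rule set.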
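The rate-based rule in this example (2000 requests per 5 minutes per IP) can be pictured as a per-IP sliding window. This is a simplified model for intuition only: AWS WAF's real counting mechanism is not modelled, and the class name RateLimiter and the documentation-range IPs are hypothetical.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Simplified per-IP sliding-window model of a rate-based rule."""

    def __init__(self, limit: int = 2000, window_seconds: int = 300):
        self.limit = limit
        self.window = window_seconds
        self.seen = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        q = self.seen[ip]
        while q and q[0] <= now - self.window:  # drop requests outside the window
            q.popleft()
        q.append(now)                            # every request counts, even blocked ones
        return len(q) <= self.limit


# Tiny limits so the behaviour is visible: 3 requests per 10 seconds.
rl = RateLimiter(limit=3, window_seconds=10)
print([rl.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(rl.allow("203.0.113.7", 13))                          # True: window has moved on
print(rl.allow("198.51.100.1", 3))                          # True: counted per source IP
```

Counting per source IP is what makes the rule effective against a single noisy client while leaving other clients unaffected.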

Key takeaway: CMK for key custody, annual rotation for symmetric keys. WAF managed rule groups baseline + rate-based rules for bot defense. S3 Block Public Access ON by default; opt out only when intentional, remembering that the account-level setting overrides bucket-level settings.
Lesson 4.3 Organizations, SCPs, and multi-account guardrails

AWS Organizations governs multi-account structure. Service Control Policies (SCPs) cap what any role in a member account can do. SOA-C02 expects you to design preventive controls via SCPs and detective via Config / Security Hub.

Key concepts
  • Organizational Units (OU): nested grouping of accounts. Common structure: Root → Security / Infrastructure / Sandbox / Workloads (with sub-OUs for prod / non-prod / business unit).
  • SCPs: JSON policies attached at root / OU / account level. CAP what IAM principals can do — they're a ceiling, not a grant. Best practice: deny-list rather than allow-list (allow-list breaks on every new AWS service).
  • Common SCP patterns: deny disable-CloudTrail, deny disable-GuardDuty / Config, deny IAM changes outside the management account, deny regions outside the approved list, deny S3 BPA changes, deny KMS key deletion.
  • SCPs don't apply to: the management account or service-linked roles. They DO apply to every IAM user and role in member accounts, including the member account's root user. Always provision break-glass access in the management account, which SCPs don't restrict.
  • AWS Control Tower: opinionated landing zone — pre-built OU structure, mandatory + strongly-recommended guardrails (preventive = SCP, detective = Config rule), account factory. The fastest way to stand up a multi-account org securely.
  • Centralized billing + consolidated discounts: Organizations gives one bill across all accounts. Reserved Instances + Savings Plans purchased in one account apply across the org by default.
Concrete example

An org with 50 accounts: deploy AWS Control Tower for the landing zone. OU structure: Security (audit + log archive accounts), Infrastructure (shared networking + identity), Sandbox, Workloads/Prod, Workloads/NonProd. Mandatory guardrails are enabled on all OUs. Extra preventive guardrails on Workloads/Prod: deny S3 BPA changes, deny disabling CloudTrail / GuardDuty, deny actions outside us-east-1 and eu-west-1. SCPs cannot cap spend, so the Sandbox OU relies on a separate budget alarm that triggers an auto-isolation Lambda.
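Below is a minimal sketch of a deny-list SCP matching the Workloads/Prod guardrails described above. It is built as a Python dict and serialised to the JSON you would paste into Organizations. The exempted global-service list is a shortened, illustrative subset (an assumption). AWS's published region-deny example exempts many more services.

```python
import json

APPROVED_REGIONS = ["us-east-1", "eu-west-1"]

# Illustrative subset of global-service actions exempted from the Region deny.
# AWS's published example exempts a longer list; treat this one as a placeholder.
GLOBAL_SERVICE_ACTIONS = [
    "iam:*",
    "organizations:*",
    "route53:*",
    "cloudfront:*",
    "sts:*",
    "support:*",
]


def build_guardrail_scp(approved_regions: list) -> dict:
    """Deny-list SCP: protect audit services and block requests outside approved Regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAuditTampering",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "guardduty:DeleteDetector",
                    "config:StopConfigurationRecorder",
                    "config:DeleteConfigurationRecorder",
                ],
                "Resource": "*",
            },
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # NotAction: everything EXCEPT the listed global services is denied
                # when the request targets a Region outside the approved list.
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": approved_regions}
                },
            },
        ],
    }


policy = build_guardrail_scp(APPROVED_REGIONS)
document = json.dumps(policy, indent=2)
# SCP documents are limited to 5,120 characters.
assert len(document) <= 5120
print(document)
```

Both statements only deny, so they compose with the default `FullAWSAccess` policy rather than replacing it. This is the deny-list pattern recommended above.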

Key takeaway: Organizations OU hierarchy + SCPs for preventive guardrails. Control Tower for opinionated landing zones. Deny-list SCPs over allow-list. Provision break-glass in the management account.
⚡ Mini-quiz
Drill SCP and Control Tower scenarios → study mode (10 questions).

Test your knowledge on Domains 1–4 before moving to networking and cost.

05

Networking & Content Delivery (3 lessons)

Networking is tied with Deployment for the second-heaviest domain at 18% (Monitoring leads at 20%). VPC fundamentals: subnets, route tables, Internet Gateway, NAT Gateway vs NAT Instance (HA differences), Security Groups (stateful) vs NACLs (stateless, rule ordering). VPC connectivity: Peering (missing route table entries are the #1 failure cause), Transit Gateway for hub-and-spoke replacing N×(N-1)/2 peering connections, VPC Endpoints (interface vs gateway types), and AWS PrivateLink for cross-account service exposure. Hybrid: Site-to-Site VPN dual tunnels, Direct Connect + VPN failover with BGP. CloudFront: cache behaviors, TTL settings, Origin Access Control for S3, Origin Shield to reduce origin load. Route 53 routing policies and health checks.

vpc-routing nat-gateway transit-gateway vpc-endpoints privatelink direct-connect vpn-tunnels cloudfront-oac
~7h
Lesson 5.1 VPC fundamentals — subnets, routing, SG vs NACL

VPC questions are among the most-tested topics on SOA-C02. The exam consistently asks about the SG-vs-NACL distinction, NAT Gateway HA, and the route-table interactions that determine which packets go where.

Key concepts
  • Subnet types: public (route to IGW), private (no IGW route, optional NAT for egress), VPN-only (route to a virtual private gateway). Each subnet lives in exactly one AZ, so HA designs need one subnet per tier per AZ.
  • NAT Gateway vs NAT Instance: NAT Gateway is managed, redundant within its AZ, needs no admin, and costs ~$0.045/hr + ~$0.045/GB processed (us-east-1). NAT Instance is self-managed EC2: legacy, avoid. A NAT Gateway does not survive the loss of its AZ, so for multi-AZ HA deploy ONE NAT Gateway per AZ.
  • Security Groups (stateful): allow rules only; anything not explicitly allowed is denied. Return traffic for an allowed connection is automatically allowed. Reference other SGs as source/destination (e.g., "allow from sg-webtier"). Applied at the ENI level.
  • NACLs (stateless): ordered list of allow + deny rules, evaluated from the lowest rule number; the first match wins, and the final "*" rule denies everything else. Applied at subnet level. Return traffic is NOT automatic: a subnet hosting servers must allow outbound TCP 1024-65535 for responses, and a subnet whose instances make outbound requests must allow inbound 1024-65535 for the replies. Number rules in increments of 100 to leave room for inserts.
  • SG vs NACL pick: SG always (instance-level intent). NACL as defensive secondary layer (subnet-level deny — e.g., "block this attacker IP"). Don't try to do all your filtering at NACL.
  • VPC Flow Logs: capture IP traffic metadata (not packet contents) for ENIs, subnets, or whole VPCs, delivered to CloudWatch Logs / S3. Each record includes ACCEPT/REJECT; REJECTs are typically SG/NACL drops. Use to diagnose "why can't A reach B" without packet capture.
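The NACL evaluation rules above can be modelled in a few lines: lowest number first, first match wins, implicit deny, and no automatic return traffic. This is a simplified sketch, not AWS code. It ignores protocol, models one direction at a time, and uses documentation-range CIDRs as examples.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int      # evaluated in ascending order
    cidr: str        # peer CIDR (source for inbound, destination for outbound)
    port_from: int
    port_to: int
    allow: bool


def evaluate(rules, peer_ip: str, port: int) -> str:
    """Return ALLOW/DENY for one direction of one packet, NACL-style (protocol ignored)."""
    ip = ipaddress.ip_address(peer_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if ip in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return "ALLOW" if rule.allow else "DENY"  # first match wins
    return "DENY"  # the implicit "*" rule


# Web-server subnet: block one attacker, allow HTTPS in, allow ephemeral ports out.
inbound = [
    NaclRule(90, "203.0.113.7/32", 0, 65535, allow=False),
    NaclRule(100, "0.0.0.0/0", 443, 443, allow=True),
]
outbound = [
    NaclRule(100, "0.0.0.0/0", 1024, 65535, allow=True),
]

print(evaluate(inbound, "203.0.113.7", 443))       # DENY: rule 90 matches before rule 100
print(evaluate(inbound, "198.51.100.20", 443))     # ALLOW: client request
print(evaluate(outbound, "198.51.100.20", 50000))  # ALLOW: response to client's ephemeral port
print(evaluate([], "198.51.100.20", 50000))        # DENY: stateless, no outbound rule
```

The last line is the classic exam trap. The inbound request is allowed, but without an outbound ephemeral-port rule the response is dropped, and a security group alone would never cause that.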
Concrete example

A 3-AZ VPC needs HA NAT for private-subnet egress: deploy 3 NAT Gateways, one per AZ. Each AZ's private subnet route table points 0.0.0.0/0 at its own AZ's NAT Gateway. Cross-AZ NAT routing would work, but it adds per-GB cross-AZ charges and ties every AZ's egress to one AZ's health, so keep traffic in-AZ. SGs control instance access; NACLs add a defense-in-depth layer with a specific blocked-IP list at subnet level. VPC Flow Logs capturing REJECT traffic give forensic visibility.

Key takeaway: SG stateful + intent-based; NACL stateless + defensive. NAT Gateway per AZ for HA + cost. Flow Logs for diagnostics.
⚡ Mini-quiz
Drill VPC + SG / NACL scenarios → study mode (10 questions).
Lesson 5.2 VPC connectivity — Peering, Transit Gateway, Endpoints, PrivateLink

Beyond a single VPC, the connectivity options each have different scaling and cost trade-offs. SOA-C02 tests when you should reach for each.

Key concepts
  • VPC Peering: 1:1 connection between two VPCs (same or different account / region). Non-transitive — A↔B + B↔C does NOT give A↔C. Manual route table entries on BOTH sides required (the most-common SOA failure cause).
  • Transit Gateway (TGW): hub-and-spoke for VPCs + VPN + Direct Connect. Replaces N×(N-1)/2 peerings with N attachments to one TGW per region. Route tables enable selective transitivity. Pricier per-GB than peering but scales.
  • VPC Endpoints — Gateway: S3 and DynamoDB only. Route-table entries direct subnet traffic to the AWS service over the AWS backbone — no internet, no NAT cost. Free.
  • VPC Endpoints, Interface type (PrivateLink): an ENI in your subnet with a private IP that resolves to an AWS service or your own service. Per-hour cost + per-GB. Use for cross-account / cross-VPC service exposure without peering or route-table changes in the consumer VPC.
  • PrivateLink as service-provider: expose a service (NLB-fronted) to other AWS accounts via PrivateLink — they create an interface endpoint targeting your service. Cleanly separates consumer/provider VPCs.
  • Decision pattern: 1-2 VPCs to interconnect → peering. 3+ VPCs / multi-region / hybrid → TGW. Access to AWS services → Gateway endpoints (S3/DynamoDB) or Interface endpoints. Cross-account services → PrivateLink.
Concrete example

An org with 15 VPCs across 3 accounts needs full-mesh connectivity + access to S3 + a centralised auth service in a security VPC. Choice: a Transit Gateway in each region in use, with all 15 VPCs attached. Two TGW route tables: a workload table (routes propagated from every workload VPC, giving full transitivity between them) and a shared table for the security VPC. S3 traffic stays off the TGW because each VPC has its own Gateway endpoint for S3. The auth service is exposed via PrivateLink from the security account, so consumers reach it through an interface endpoint in their own VPC.
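The scaling argument for Transit Gateway is plain arithmetic. The quick sketch below compares the peering connections a full mesh needs against TGW attachments. It counts connections only and ignores per-GB pricing and route-table entries.

```python
def full_mesh_peerings(n_vpcs: int) -> int:
    """Peering is non-transitive, so every pair of VPCs needs its own connection: N(N-1)/2."""
    return n_vpcs * (n_vpcs - 1) // 2


def tgw_attachments(n_vpcs: int) -> int:
    """Hub-and-spoke: one attachment per VPC to the regional Transit Gateway."""
    return n_vpcs


for n in (2, 3, 15):
    print(f"{n:>2} VPCs: {full_mesh_peerings(n):>3} peerings vs {tgw_attachments(n):>2} TGW attachments")
```

At 2 VPCs a single peering beats two attachments, and 3 VPCs is the break-even point. At 15 VPCs the mesh needs 105 connections, each with routes on both sides, which is why the decision pattern switches to TGW at 3+ VPCs.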

Key takeaway: TGW for 3+ VPCs or hybrid. Gateway endpoints free for S3/DynamoDB. Interface endpoints / PrivateLink for cross-account service exposure. Peering only for 2-VPC simple cases.
⚡ Mini-quiz
Practise VPC connectivity scenarios → quick quiz (5 questions).
Lesson 5.3 CloudFront and edge delivery — OAC, cache, Origin Shield

CloudFront is the AWS CDN. SOA-C02 tests origin security (OAC), cache behaviours, and invalidation patterns.

Key concepts
  • Distributions: one CloudFront distribution per public site / app. Origins: S3 bucket, ALB, EC2, MediaPackage, custom origin (any HTTP server). Multi-origin with path-based routing.
  • Origin Access Control (OAC): replaces legacy Origin Access Identity (OAI). CloudFront signs requests to S3; the bucket policy allows ONLY the distribution. Public access to the bucket can stay BLOCKED. The modern S3-origin pattern.
  • Cache behaviours: per path-pattern: which origin, TTLs, allowed methods, cookie/header forwarding. Multiple behaviours allow different caching for /api/* (TTL 0) vs /static/* (TTL 1y).
  • TTLs and invalidation: object TTL controlled by Cache-Control headers on origin response or by behaviour Min/Default/Max TTL. Invalidation forces edge cache to re-fetch — paid per path beyond the first 1000/month, slow (minutes).
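  • Origin Shield: an extra caching layer at a regional edge cache in front of the origin that collapses requests from many POPs into one origin fetch. Reduces origin load and origin egress. It adds its own per-request charge, so enable it when viewers span many regions or the origin is load-sensitive.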
  • Signed URLs / signed cookies: time-bound access to private content. Signed URLs for one-asset downloads (e.g., paywall video); signed cookies for multi-asset access (e.g., the user's whole gallery).
  • CloudFront Functions vs Lambda@Edge: CloudFront Functions (lightweight, viewer request/response only, < 1 ms, ~$0.10/M) vs Lambda@Edge (full Node/Python runtime at edge, slower, pricier). Pick Functions for header rewrites; Lambda@Edge for richer logic.
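The interaction between behaviour TTLs and origin `Cache-Control` headers can be expressed as a small function. This is a simplification under stated assumptions: it models only `max-age` and ignores `s-maxage`, `Expires`, and the `no-cache`/`no-store`/`private` directives, all of which change the outcome.

```python
from typing import Optional


def effective_ttl(origin_max_age: Optional[int], min_ttl: int, default_ttl: int, max_ttl: int) -> int:
    """Seconds an object stays in the edge cache (simplified).

    No Cache-Control max-age from the origin -> the behaviour's Default TTL.
    Otherwise max-age is clamped into [Min TTL, Max TTL].
    """
    if origin_max_age is None:
        return default_ttl
    return max(min_ttl, min(origin_max_age, max_ttl))


ONE_YEAR = 365 * 24 * 3600

# /api/*: all TTLs 0, so nothing is cached regardless of origin headers.
print(effective_ttl(300, min_ttl=0, default_ttl=0, max_ttl=0))  # 0
# /static/*: origin sends max-age=31536000, within the behaviour's limits.
print(effective_ttl(ONE_YEAR, min_ttl=0, default_ttl=86400, max_ttl=ONE_YEAR))  # 31536000
# A Min TTL above the origin's max-age wins.
print(effective_ttl(60, min_ttl=3600, default_ttl=86400, max_ttl=ONE_YEAR))  # 3600
```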
Concrete example

A photo-sharing app: a CloudFront distribution fronts an S3 bucket of user photos plus the app's API origin. OAC is wired between the distribution and the bucket; bucket BPA stays ON. Behaviour A: /api/* routed to the API origin with TTL 0 + cookie forwarding (dynamic). Behaviour B: /static/* routed to the S3 origin with TTL 1 year, with Origin Shield enabled on that origin. The private user gallery is served via signed cookies set by the auth Lambda after sign-in. A CloudFront Function adds a security header on every response.

Key takeaway: OAC replaces OAI for S3 origins. Per-behaviour TTLs match content lifetime. Origin Shield to cut origin load. Signed cookies for multi-asset private access. CloudFront Functions for cheap edge logic.
⚡ Mini-quiz
Drill CloudFront scenarios → study mode (10 questions).
06

Cost & Performance Optimization (3 lessons)

Cost optimization is 12% but deeply integrated into all other domains — every question type has a "most cost-effective" variant. Key concepts: Reserved Instances vs Savings Plans (Standard RI = max discount for fixed workloads; Compute Savings Plans = flexibility across families/regions), Spot Instances for fault-tolerant batch jobs with 2-minute interruption notice handling, S3 storage class selection and lifecycle policies (Standard → Standard-IA → Glacier Deep Archive), AWS Compute Optimizer for right-sizing recommendations, AWS Cost Anomaly Detection for ML-based spend alerts, Trusted Advisor cost checks, and inter-AZ vs cross-region data transfer costs.

reserved-instances savings-plans spot-interruption s3-lifecycle compute-optimizer cost-anomaly-detection trusted-advisor inter-az-transfer
~4h
Lesson 6.1 EC2 pricing — Reserved, Savings Plans, Spot

Compute is the biggest line item; pricing model choice is the biggest lever. SOA-C02 tests when to use each and how to handle Spot interruption gracefully.

Key concepts
  • Standard RI vs Convertible RI: Standard = max discount, locked instance family/size (with size flex inside family). Convertible = exchangeable for different family/OS, smaller discount. Almost always pick Compute Savings Plans over Convertible RI.
  • EC2 Instance Savings Plan: commit to $/hr in a specific instance family + region. Max discount, less flexible.
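  • Compute Savings Plan: commit to $/hr across EC2 (any family, size, region, OS, tenancy) + Fargate + Lambda. SageMaker usage is covered by separate SageMaker Savings Plans, not by Compute Savings Plans. Slightly less discount than an EC2 Instance SP, but the flexibility usually wins for evolving workloads.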
  • Spot Instances: up to 90% off, can be reclaimed with 2-minute notice. Best for fault-tolerant batch, dev/test, stateless. Use Spot via Auto Scaling Group with mixed instance types + multiple AZs for resilience.
  • Spot interruption handling: poll instance metadata for the 2-minute notice (http://169.254.169.254/latest/meta-data/spot/instance-action, which needs an IMDSv2 token when IMDSv1 is disabled), or match the "EC2 Spot Instance Interruption Warning" event in EventBridge. Drain the LB target, flush in-memory state, exit gracefully. Or use ASG lifecycle hooks to react automatically.
  • Cost visibility tools: AWS Cost Anomaly Detection (ML alerts on unusual spend), Cost Explorer (cube-style analytics), Cost & Usage Reports (detailed CSV for Power BI / QuickSight).
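The metadata-polling option above can be implemented with the Python standard library alone. This is a minimal sketch, assuming IMDSv2 (token-based) access from the instance itself. The drain and flush steps are left to the caller, and the one-second timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the spot/instance-action document, or None if no interruption is scheduled.

    Uses IMDSv2: get a session token with PUT, then send it on the metadata GET.
    Only works when run on an EC2 instance.
    """
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode()
    action_request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption notice yet
            return None
        raise


def seconds_until_action(document: str, now: datetime) -> float:
    """Parse {"action": "terminate", "time": "...Z"} and return seconds left before the action."""
    action = json.loads(document)
    when = datetime.strptime(action["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (when - now).total_seconds()
```

A poller would call `fetch_instance_action()` every few seconds. Once it returns a document, the poller would deregister the instance from its target group and flush state within the time reported by `seconds_until_action(doc, datetime.now(timezone.utc))`. That poller loop is hypothetical and is not shown here.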
Concrete example

A 24/7 web tier (10-instance baseline + 4× peak) and a batch-job tier: cover baseline with a 3-year Compute Savings Plan at $/hr matching 10 c5.large equivalents (~60% off On-Demand). Peak burst: On-Demand instances. Batch job tier: Spot via ASG with diversified instance types across 3 AZs. Cost Anomaly Detection alerts on +20% week-over-week. Total: ~40% lower than All-On-Demand.
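The blended-discount reasoning in this example can be checked with a back-of-the-envelope model. It treats the Savings Plan and Spot as flat percentage discounts on the hours they cover; real Savings Plans bill an hourly commitment. It reads "4× peak" as 30 extra instances at peak. All rates and hours are placeholders, not AWS prices.

```python
HOURS_PER_MONTH = 730  # common monthly approximation


def tier_cost(instances: float, hourly_rate: float, hours: float) -> float:
    return instances * hourly_rate * hours


def blended_savings(od_rate, sp_discount, spot_discount,
                    baseline, peak_extra, peak_hours, batch, batch_hours):
    """Fraction saved vs running every tier On-Demand (all inputs are assumptions)."""
    all_on_demand = (tier_cost(baseline, od_rate, HOURS_PER_MONTH)
                     + tier_cost(peak_extra, od_rate, peak_hours)
                     + tier_cost(batch, od_rate, batch_hours))
    mixed = (tier_cost(baseline, od_rate * (1 - sp_discount), HOURS_PER_MONTH)  # SP covers baseline
             + tier_cost(peak_extra, od_rate, peak_hours)                     # peaks stay On-Demand
             + tier_cost(batch, od_rate * (1 - spot_discount), batch_hours))  # batch runs on Spot
    return 1 - mixed / all_on_demand


# Placeholder inputs: $0.10/hr On-Demand, 60% SP discount, 70% Spot discount,
# 10 baseline instances 24/7, 30 extra instances for 120 peak hours,
# 20 batch instances for 200 hours per month.
print(f"{blended_savings(0.10, 0.60, 0.70, 10, 30, 120, 20, 200):.0%}")
```

The headline percentage depends heavily on how many hours fall outside the committed baseline. Rerun the model with your own usage rather than reusing a quoted figure.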

Key takeaway: Compute SP for steady baseline (flexibility wins). Spot for fault-tolerant. On-Demand for unpredictable peaks. Always wire interruption handling for Spot.
⚡ Mini-quiz
Drill EC2 pricing-model scenarios → study mode (10 questions).
Lesson 6.2 S3 storage classes and lifecycle policies

S3 storage cost differs by ~25× between Standard and Deep Archive. Picking the right class + automating tier transitions with lifecycle policies is the canonical S3 cost optimisation.

Key concepts
  • Storage classes: Standard (frequent access, default), Intelligent-Tiering (auto-tiers based on access patterns, monitoring fee per object), Standard-IA (30-day minimum storage, cheaper storage + per-GB retrieval fee), One Zone-IA (30-day minimum, single AZ, cheaper; data is lost if that AZ is destroyed, so use it for re-creatable data), Glacier Instant Retrieval (90-day minimum, ms retrieval), Glacier Flexible Retrieval (90-day minimum, minutes-to-hours retrieval), Glacier Deep Archive (180-day minimum, retrieval within 12-48 hours, cheapest).
  • Intelligent-Tiering: auto-moves objects between Frequent / Infrequent / Archive Instant Access tiers based on observed access. Avoids the "guess access pattern" problem. There is a monitoring fee per object; objects under 128 KB are never auto-tiered, so it adds little for buckets full of tiny objects.
  • Lifecycle policies: declarative rules to transition between classes and / or expire. Evaluated daily; the rules themselves are free, but each transition is billed as a per-object request. Standard pattern: transition Standard → Standard-IA at 30 days → Glacier Flexible at 90 → Deep Archive at 365.
  • Lifecycle on incomplete multipart uploads: the most overlooked cost optimisation — abort incomplete multipart uploads after 7 days. Saves storage for failed uploads sitting in S3 forever.
  • Storage Class Analysis: S3 feature that observes access patterns + recommends lifecycle transitions. Run for 30+ days on a bucket before crafting lifecycle policies.
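  • Glacier retrieval tiers: Glacier Flexible Retrieval offers Expedited (1-5 min, most expensive), Standard (3-5h), and Bulk (5-12h, cheapest); Deep Archive offers Standard (within 12h) and Bulk (within 48h). Plan retrieval cost into compliance budgets.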
Concrete example

Customer-data S3 bucket grows 1 TB/month. Access pattern: heavy first 30 days, sporadic for 90, near-zero after. Set up lifecycle policy: Standard → Standard-IA at 30 days → Glacier Flexible at 90 → Deep Archive at 365 → expire at 7 years. Add incomplete multipart upload abort at 7 days. Use Storage Class Analysis for new bucket types where pattern isn't yet known. Total cost: ~75% lower than All-Standard.
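The lifecycle policy in this example can be expressed as the configuration document S3 accepts. This is a minimal sketch: the structure follows the `LifecycleConfiguration` parameter of boto3's `put_bucket_lifecycle_configuration`, while the rule ID and bucket name are hypothetical.

```python
def build_lifecycle_configuration(prefix: str = "") -> dict:
    """Lifecycle rules for the customer-data example, in the shape S3's lifecycle API expects."""
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",      # hypothetical rule ID
                "Filter": {"Prefix": prefix},  # "" = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": 7 * 365},  # ~7 years (2,555 days)
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    }


config = build_lifecycle_configuration()
days = [t["Days"] for t in config["Rules"][0]["Transitions"]]
assert days == sorted(days), "transitions must move to colder classes over time"
assert config["Rules"][0]["Expiration"]["Days"] > days[-1]

# With boto3 (not run here; the bucket name is a placeholder):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="customer-data-bucket", LifecycleConfiguration=config)
```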

Key takeaway: lifecycle policies do the manual tiering automatically. Intelligent-Tiering when pattern is unknown. Always abort incomplete multipart uploads. Storage Class Analysis before policy design.
⚡ Mini-quiz
Practise S3 cost scenarios → quick quiz (5 questions).
Lesson 6.3 Right-sizing, transfer costs, Trusted Advisor

Right-sizing comes from data, not gut feel. AWS provides specific tooling for the data; SOA-C02 expects you to know which tool surfaces which signal — and the surprisingly common transfer-cost gotchas.

Key concepts
  • AWS Compute Optimizer: ML-driven right-sizing recommendations for EC2 / EBS / Lambda / ECS on Fargate. Identifies over-provisioned or under-provisioned resources based on observed metrics. Free for the standard recommendations (enhanced infrastructure metrics is a paid option).
  • Trusted Advisor: rules-based recommendations across cost / security / performance / fault tolerance / service limits. Business Support or higher unlocks the full set. Surfaces idle ELBs, underutilised EBS volumes, idle RDS, unassociated EIPs.
  • Cost Explorer: cube-style analytics on cost and usage data — pivot by service, tag, account, time. Save reports, schedule emails. Use to find runaway services week-over-week.
  • Cost & Usage Reports (CUR): detailed billing data delivered to S3 as CSV / Parquet. Ingest into Athena / QuickSight / Power BI for custom dashboards.
  • Data transfer costs: the biggest hidden cost on AWS. Same-AZ traffic over private IPs: free. Cross-AZ within a region: ~$0.01/GB in each direction. Cross-region: ~$0.02/GB. Internet egress: $0.05-$0.09/GB. NAT Gateway: ~$0.045/GB processed on top of the per-hour charge. Pin chatty pipelines to one AZ.
  • S3 Inventory + S3 Storage Lens: Inventory = scheduled report of all objects + metadata to a target bucket. Storage Lens = account-wide usage dashboards with cost optimisation insights.
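A quick estimator makes the transfer-cost gap concrete. The rates are the approximate figures quoted above, not values from a pricing API. 500 GB/day is an arbitrary example volume.

```python
# Approximate per-GB rates from the list above (us-east-1-style; check current pricing).
RATE_PER_GB = {
    "same_az_private_ip": 0.00,
    "cross_az": 0.01 * 2,             # $0.01/GB charged on the sending AND receiving side
    "cross_region": 0.02,
    "nat_gateway_processing": 0.045,  # on top of the NAT Gateway hourly charge
}


def monthly_transfer_cost(gb_per_day: float, path: str, days: int = 30) -> float:
    return gb_per_day * days * RATE_PER_GB[path]


for path in RATE_PER_GB:
    print(f"{path:<24} 500 GB/day -> ${monthly_transfer_cost(500, path):,.2f}/month")
```

The same 500 GB/day can cost nothing or hundreds of dollars a month depending only on topology. That is why the takeaway is to pin chatty traffic in-AZ and use endpoints instead of NAT.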
Concrete example

An audit identifies $50k/month of avoidable spend. Compute Optimizer recommendations show 30% of EC2 instances are over-provisioned. Trusted Advisor flags 15 unassociated EIPs ($0.005/hr each) and 8 idle RDS instances. Cost Explorer reveals NAT Gateway costs >$8k/month from a chatty cross-AZ Lambda; refactor the Lambda to reach S3 + DynamoDB through Gateway VPC endpoints (no endpoint fee, no NAT processing charge). S3 Storage Lens shows 12 TB of incomplete multipart uploads in older buckets, so add a lifecycle rule to abort them.

Key takeaway: Compute Optimizer for right-sizing, Trusted Advisor for waste, Cost Explorer for trends. Transfer cost is often the hidden killer — pin in-AZ and use VPC endpoints.
⚡ Mini-quiz
Drill right-sizing + transfer-cost scenarios → study mode (10 questions).
07

Exam Lab Skills — Hands-On AWS Console (3 lessons)

SOA-C02 was unique among AWS Associate exams in including exam labs, where you perform real tasks in a live AWS environment. AWS stopped including labs starting March 28, 2023 (until further notice), so check the current exam guide before deciding how much time to spend here; the skills themselves are everyday SysOps work either way. This module covers console skills you must be able to perform under time pressure: creating CloudWatch alarms and log metric filters, configuring Auto Scaling group lifecycle hooks, deploying CloudFormation stacks and detecting drift, running SSM Run Command and Session Manager sessions, configuring S3 bucket policies and lifecycle rules, creating VPC endpoints and updating route tables, and reviewing GuardDuty/Config findings. Practice these in a free-tier AWS account or AWS Skill Builder labs.

console-skills cloudwatch-console cloudformation-console ssm-console vpc-console s3-console exam-labs hands-on-practice
~2h
Lesson 7.1 Exam lab anatomy and time management

SOA-C02 labs put you in front of a live AWS console with a scenario, success criteria, and a time budget. Most candidates lose points not because they don't know AWS but because they ran out of time on a single lab while the next two went un-attempted.

Key concepts
  • Lab format: typically 2-3 labs out of 65 questions, ~20% of the score weight. You have one shared time budget across labs + MCQs (180 minutes total). Labs auto-grade against success criteria the moment you click "Done".
  • Time budget rule of thumb: 20 minutes per lab MAX. If you hit 20 min without success criteria met, save what's done and move on — partial credit is awarded per criterion.
  • Skim all labs first: when you reach a lab section, read all the success criteria across all labs before doing anything. Order your work — knock out the easiest first, leave the longest for last.
  • Read criteria literally: success criteria are checked exactly. "Create a CloudWatch alarm named app-cpu-alarm" — the name MUST match. Wrong name = 0 credit on that criterion even if everything else works.
  • Region / account context: the lab has its own AWS account + region. Check the region selector before clicking around. Resources you create are auto-deleted at lab end — no cleanup needed.
  • Save and exit pattern: if you must move on, click "Save" not "Done" — Done marks the lab complete and starts grading. Save lets you return if time allows.
Concrete example

Lab 1: 4 criteria, 3 trivial + 1 complex CloudFormation change-set. Lab 2: 6 criteria, all medium. Lab 3: 3 criteria, all hard. Strategy: tackle Lab 1's 3 trivial criteria (10 min, 75% of Lab 1 score). Lab 2 in full (18 min). Lab 3's lowest-hanging criterion if time (5 min, 33% of Lab 3 score). Skip Lab 1's hard criterion and Lab 3's complex ones. Total > 60% of lab score in < 35 min vs going deep on one lab and timing out.
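The arithmetic behind "> 60% in < 35 min" can be checked directly. This sketch rests on one explicit assumption: every criterion within a lab, and every lab, carries equal weight. AWS does not publish those weights.

```python
# Criteria completed per lab in the plan above; equal weighting is an assumption.
plan = {
    "Lab 1": {"criteria": 4, "done": 3, "minutes": 10},
    "Lab 2": {"criteria": 6, "done": 6, "minutes": 18},
    "Lab 3": {"criteria": 3, "done": 1, "minutes": 5},
}

TIME_CAP_PER_LAB = 20  # minutes, the rule of thumb from the list above

assert all(lab["minutes"] <= TIME_CAP_PER_LAB for lab in plan.values())

total_minutes = sum(lab["minutes"] for lab in plan.values())
per_lab_score = sum(lab["done"] / lab["criteria"] for lab in plan.values()) / len(plan)
per_criterion_score = (sum(lab["done"] for lab in plan.values())
                       / sum(lab["criteria"] for lab in plan.values()))

# prints: 33 min, 69% (labs weighted equally), 77% (criteria weighted equally)
print(f"{total_minutes} min, {per_lab_score:.0%} (labs weighted equally), "
      f"{per_criterion_score:.0%} (criteria weighted equally)")
```

Under either weighting the plan clears 60% in 33 minutes, which is the point of the partial-credit strategy.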

Key takeaway: labs are a partial-credit game. Skim all first, then knock out the easiest. Match names exactly. 20-minute hard cap per lab.
⚡ Mini-quiz
Drill exam-lab strategy scenarios → study mode (10 questions).
Lesson 7.2 High-frequency console tasks — the muscle memory list

Specific console flows appear over and over in SOA-C02 labs. Practicing these to muscle-memory means you spend exam time on the scenario logic, not the click hunt.

Key concepts
  • CloudWatch alarm + SNS: Alarms → Create alarm → pick metric → set threshold + period + M-of-N → action: SNS topic (existing or create new). The most-tested lab — practice the SNS topic creation flow inline.
  • Log metric filter + alarm: Log groups → log group → Create metric filter → pattern + metric name → then create alarm on that metric. Two-step flow — easy to forget the alarm step.
  • CloudFormation stack + Change Set: CloudFormation → Create stack → upload template → parameters → review → Create. For updates: Update → Replace template → Create change set → Execute. Always Change Set, never direct Update.
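  • Auto Scaling group lifecycle hook: ASG → Instance management → Lifecycle hooks → Create lifecycle hook → name + lifecycle transition (launch / terminate) + heartbeat timeout + default result (CONTINUE / ABANDON). The console form has no notification target: route notifications through an EventBridge rule, or set an SNS / SQS target via the CLI / API.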
  • SSM Session Manager: Systems Manager → Session Manager → Start session → pick instance → opens browser shell. Lab may require enabling KMS encryption on session data — check Preferences first.
  • S3 bucket policy + lifecycle: S3 → bucket → Permissions tab (bucket policy in JSON editor — they often hand you the policy). Management tab → Create lifecycle rule → scope (whole bucket / prefix / tag) → actions (transition + expire).
  • VPC endpoint: VPC → Endpoints → Create → pick service (the Gateway type exists only for S3 and DynamoDB; every other service uses Interface) → VPC + route table (Gateway) or subnet + SG (Interface).
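The lifecycle-hook flow maps onto two Auto Scaling API calls. This minimal sketch only builds the request arguments, so it runs without AWS credentials. The hook name, ASG name, and instance ID are hypothetical, and the ABANDON default is one reasonable choice, not a requirement.

```python
def launch_hook_params(asg_name: str, hook_name: str = "launch-init-hook",
                       heartbeat_seconds: int = 300) -> dict:
    """Arguments for the Auto Scaling PutLifecycleHook API (boto3: put_lifecycle_hook)."""
    if not 30 <= heartbeat_seconds <= 7200:
        raise ValueError("HeartbeatTimeout must be between 30 and 7200 seconds")
    return {
        "LifecycleHookName": hook_name,
        "AutoScalingGroupName": asg_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
        "HeartbeatTimeout": heartbeat_seconds,
        "DefaultResult": "ABANDON",  # terminate the instance if init never reports back
    }


def complete_hook_params(asg_name: str, hook_name: str, instance_id: str, init_ok: bool) -> dict:
    """Arguments for CompleteLifecycleAction, sent by the instance's init script when it finishes."""
    return {
        "LifecycleHookName": hook_name,
        "AutoScalingGroupName": asg_name,
        "InstanceId": instance_id,
        "LifecycleActionResult": "CONTINUE" if init_ok else "ABANDON",
    }

# With boto3 (not run here; "web-asg" is a placeholder):
# client = boto3.client("autoscaling")
# client.put_lifecycle_hook(**launch_hook_params("web-asg"))
# client.complete_lifecycle_action(**complete_hook_params("web-asg", "launch-init-hook", "i-...", True))
```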
Concrete example

A typical lab: "Create a CloudWatch alarm named web-cpu-high that fires when EC2 CPUUtilization > 80% for 5 minutes (1 of 1), notifying SNS topic web-alerts (create if missing). Add a metric filter on log group /aws/ec2/web matching ERROR, name WebErrors, then create an alarm on it named web-error-alarm at threshold > 5 per 5 min." Practice this exact flow until it's < 4 minutes end-to-end.
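The same lab can be scripted, which is a useful way to double-check exact names and thresholds before clicking through the console. This minimal sketch builds the arguments for the CloudWatch Logs `put_metric_filter` and CloudWatch `put_metric_alarm` calls. The instance ID, topic ARN, and `WebApp` namespace are placeholders that the lab text does not specify.

```python
# Placeholders: substitute the lab's real instance ID and SNS topic ARN.
INSTANCE_ID = "i-0123456789abcdef0"
TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:web-alerts"


def cpu_alarm() -> dict:
    """put_metric_alarm arguments: CPU > 80% over one 5-minute period (1 of 1)."""
    return {
        "AlarmName": "web-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": INSTANCE_ID}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [TOPIC_ARN],
    }


def error_metric_filter() -> dict:
    """put_metric_filter arguments: count log events containing ERROR."""
    return {
        "logGroupName": "/aws/ec2/web",
        "filterName": "WebErrors",
        "filterPattern": "ERROR",
        "metricTransformations": [
            {"metricName": "WebErrors", "metricNamespace": "WebApp", "metricValue": "1"}
        ],
    }


def error_alarm() -> dict:
    """put_metric_alarm arguments: more than 5 ERROR lines in 5 minutes."""
    return {
        "AlarmName": "web-error-alarm",
        "Namespace": "WebApp",  # must match metricNamespace in the filter
        "MetricName": "WebErrors",
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 5.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no ERROR lines means no datapoints
        "AlarmActions": [TOPIC_ARN],
    }

# With boto3 (not run here):
# boto3.client("logs").put_metric_filter(**error_metric_filter())
# boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm())
# boto3.client("cloudwatch").put_metric_alarm(**error_alarm())
```

The filter and the second alarm are separate calls. This mirrors the two-step console flow where the alarm step is easy to forget.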

Key takeaway: a few flows account for most lab work. Practice them in a free-tier account until they're reflex. Always create resources with the exact name in the criteria.
⚡ Mini-quiz
Practise high-frequency console flow scenarios → quick quiz (5 questions).
Lesson 7.3 Lab readiness — practice plan and common pitfalls

Two weeks of focused lab practice is the difference between a confident attempt and a panic. Build a runbook of the 8-10 most common scenarios and execute each one cold under a timer.

Key concepts
  • Free-tier practice account: create a separate AWS account just for lab practice. Set a $1 budget alert. Use only free-tier-eligible services (t3.micro, default S3, CloudWatch free tier: basic monitoring and up to 10 alarms).
  • AWS Skill Builder labs: some are free, others bundled with AWS-paid training. Skill Builder labs grade the same way as exam labs — closest possible simulation.
  • Build a personal runbook: 1-page cheat-sheet per scenario (CloudWatch alarm, metric filter, lifecycle hook, Change Set, etc.) with exact click paths and gotchas. Review the night before.
  • Common pitfalls: wrong region (the lab pins you to one — check before clicking); permissions errors (read the criterion — usually the IAM role is pre-provisioned and you just need to USE it); typos in resource names (criteria are exact-match).
  • Skipping criteria: if one criterion is taking too long, MARK the lab Saved and move on. Coming back with fresh eyes often unblocks. Don't sink 25 min into a single criterion when you have 3 more across other labs.
  • "Done" vs partial: remember Done = final grade. Some candidates leave one criterion unattempted and click Done, scoring partial credit — better than running out of time and getting 0 on that lab AND missing MCQs.
  • Day-of habits: read all MCQs first (skip labs); flag the hard MCQs; do labs around the 60-minute mark when fresh but acclimatised; return to flagged MCQs in the final 30 min.
Concrete example

A 2-week prep plan: Days 1-3 — build the practice account, run through 5 canonical labs cold. Days 4-7 — Skill Builder lab pack, time-box each. Days 8-10 — write the personal runbook, drill the muscle-memory list from Lesson 7.2. Days 11-13 — full mock exams + scored lab simulation. Day 14 — rest, review runbook, sleep early. Exam-day: read all MCQs first, mark hards, do labs at minute 60, return to flagged at minute 150.

Key takeaway: labs reward practice, not memorisation. Build a runbook. Time-box every drill. On exam day: MCQs first, labs in the middle, hards at the end.
⚡ Mini-quiz
Drill SOA-C02 exam-readiness scenarios → study mode (10 questions).

Top 4 mistakes candidates make on SOA-C02

  • Confusing monitoring tools: CloudWatch = metrics/logs/alarms. CloudTrail = API audit history. AWS Config = resource configuration compliance. GuardDuty = threat detection. Knowing which tool answers which question type is critical.
  • Skipping lifecycle hooks: The difference between health check grace period (prevents premature termination), default cooldown (prevents rapid scale-out), and lifecycle hooks (pauses instances for custom initialization) is heavily tested.
  • Overlooking VPC routing: VPC Peering, Transit Gateway, and Gateway VPC endpoints all require explicit route table entries (Interface endpoints rely on DNS instead). The most common trick question: "peering is set up but traffic doesn't flow" → missing routes.
  • Ignoring the exam labs: Candidates who only study theory but never use the AWS console struggle with the lab portion. Spend at least 10 hours practicing the most-tested operations in a real AWS free-tier account.

5-week study plan

Assumes 1 hour per weekday + 2 hours each weekend day (~9 hours/week). Adjust to your schedule.

Week 1

Monitoring + Foundations

Complete Module 1 (CloudWatch, CloudTrail, Config, EventBridge). Set up a free-tier AWS account and create your first CloudWatch alarms and log metric filters hands-on. Take the practice test once to establish your baseline score.

Week 2

Reliability + Deployment

Complete Modules 2–3. Practice creating Auto Scaling lifecycle hooks and CloudFormation drift detection in the console. Listen to CertQuests podcast episodes on disaster recovery strategies during commutes.

Week 3

Security + Compliance

Complete Module 4. Enable GuardDuty and Inspector on your free-tier account to see real findings. Practice creating AWS Config rules and reviewing compliance. Study SCP structure and cross-account IAM role patterns.

Week 4

Networking + Cost

Complete Modules 5–6. Build a VPC with public/private subnets, NAT Gateway, and VPC Endpoint in your account (the NAT Gateway is billed hourly even when idle, so delete it when you finish). Run a cost analysis using Cost Explorer to understand your spending patterns. Practice S3 lifecycle policy configuration.

Week 5

Exam Labs + Full Review

Complete Module 7. Take the practice test 2–3 more times targeting >85% score. Use AWS Skill Builder exam labs if available. Focus review on your consistently missed question categories. Schedule your exam.

SOA-C02 vs SAA-C03 — key differences: Both cover AWS services but from different perspectives. SAA-C03 asks "which architecture should you design?" SOA-C02 asks "how do you operate, monitor, secure, and fix running workloads?" SOA is more focused on CloudWatch, SSM, Config, and hands-on operational tasks. Many candidates take SAA first (broader architecture knowledge) then SOA, but the order is not mandatory.

Ready to test your SOA-C02 knowledge?

60 scenario-based practice questions covering all 6 exam domains. Free, no signup, instant feedback on every answer.


Complete the AWS path

SOA-C02 pairs well with SAA-C03 and DVA-C02 to cover the three core AWS Associate roles: architecture, development, and operations.
