
AWS Developer Associate DVA-C02

Master serverless development on AWS: Lambda, API Gateway, DynamoDB, SQS/SNS, Cognito, KMS, CodeDeploy, CodePipeline, Elastic Beanstalk, CloudFormation, X-Ray, and Step Functions. Covers all DVA-C02 exam domains with scenario-based practice.

Modules: 8
Duration: 35 hours
Level: intermediate
Exam code: DVA-C02
Exam questions: 65
Passing score: 720 / 1000
Exam duration: 130 min
Exam fee (USD): $150
Valid period: 3 years
Study on the go — CertQuests Podcast

Reinforce Lambda patterns, DynamoDB design, and CI/CD concepts while commuting. New episodes covering DVA-C02 topics drop weekly.


Why earn the AWS Developer Associate?

DVA-C02 is the go-to certification for developers building cloud-native and serverless applications on AWS. It validates that you can architect, build, deploy, and debug production-grade AWS workloads — not just use the console.

  • Proves hands-on AWS development skills: Lambda, DynamoDB, API Gateway, SQS/SNS
  • Validates CI/CD knowledge: CodePipeline, CodeBuild, CodeDeploy, Elastic Beanstalk
  • Demonstrates security practices: IAM, Cognito, KMS, Secrets Manager
  • Opens doors to senior developer and cloud engineer roles — median salary $130k+
  • Natural next step after Cloud Practitioner (CLF-C02) or alongside SAA-C03
  • Valid for 3 years; recertify by passing the current exam again or a higher-level AWS certification
Exam strategy: DVA-C02 is heavily scenario-based. For each question, eliminate answers that describe actions you wouldn't do in production (e.g., hardcoding credentials, using the account root user, synchronous polling in hot loops). AWS favors managed services and least-privilege IAM over DIY solutions.

DVA-C02 exam domains

Four domains weighted by importance. Development is the largest — make sure Lambda, DynamoDB, and API Gateway are your strongest areas.

Domain 1 — Development with AWS Services 32%
Domain 2 — Security 26%
Domain 3 — Deployment 24%
Domain 4 — Troubleshooting & Optimization 18%

8 modules · ~35 hours

Each module focuses on an exam domain cluster. Work through them in order or jump to your weak areas.

01

AWS Lambda: Deep Dive · 3 lessons

Lambda is the heart of DVA-C02. This module covers the full Lambda lifecycle: cold starts and provisioned concurrency, execution roles, environment variables, layers, VPC integration, event source mappings (SQS, Kinesis, DynamoDB Streams), synchronous vs asynchronous invocations, retries and DLQs, Lambda Destinations, concurrency limits, and traffic shifting with aliases and versions.

cold-start provisioned-concurrency layers event-source-mapping async-retries dlq vpc-integration aliases-versions
~5h
Lesson 1.1 Lambda lifecycle — execution context, cold starts, and concurrency

Every Lambda invocation runs inside a managed execution context. AWS reuses the context for subsequent invocations (a warm start) but creates a new one when scaling out or after idle time (a cold start). Knowing what gets reused — and what doesn't — is the difference between a 30ms response and a 2-second response in the same function.

Key concepts
  • Cold start phases: download package, start runtime, initialise handler code (this is where module-level imports and SDK clients run), then invoke the handler. Total cold-start time depends on package size + runtime: small Node.js ~150ms, Python ~250ms, .NET ~500ms-1s, Java >1s.
  • Reusable execution context: code OUTSIDE the handler runs once per cold start and stays warm across invocations. Cache SDK clients, DB connections, and config there. Anything INSIDE the handler runs on every invocation — keep it lean.
  • Provisioned Concurrency: pre-warms a set number of execution environments so they never cold-start. Configured on a published version or an alias (you can't apply it to $LATEST). It bills continuously even when idle, so reserve it for latency-sensitive APIs.
  • Reserved Concurrency: caps the maximum concurrent executions for a function. Different from Provisioned — Reserved is a limit (used to protect downstream systems), Provisioned is a pre-warm. You can set both.
  • Account concurrency: the AWS account has a soft limit (default 1,000 concurrent executions across all functions in a region). Reserved Concurrency carves out a guaranteed slice from this account pool; unreserved functions share the remainder.
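  • SnapStart: snapshots the initialised execution environment and resumes from the snapshot, cutting Java cold starts from 2-3 seconds to ~200ms. Launched for Java (no extra charge there) and later extended to Python and .NET runtimes, where snapshot caching and restores are billed. It applies only to published versions, so you must publish a new version to use it.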
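The "code outside the handler runs once per cold start" point can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: `load_config`, `INIT_COUNT`, and the returned fields are hypothetical names, and `load_config` stands in for any expensive setup such as creating an SDK client or opening a database connection.

```python
import time

INIT_COUNT = 0  # hypothetical counter, only here to make the reuse visible


def load_config():
    """Stand-in for expensive setup (SDK client, DB connection, config fetch)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"table": "orders", "loaded_at": time.time()}


# Module scope: runs once per cold start, then is reused by warm invocations.
CONFIG = load_config()


def handler(event, context):
    # Handler scope: runs on every invocation, so it only does per-request work.
    return {"table": CONFIG["table"], "order_id": event.get("order_id")}
```

Calling `handler` repeatedly in the same process leaves `INIT_COUNT` at 1, which mirrors what a warm execution environment does between invocations.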
Concrete example

A latency-sensitive API behind API Gateway sees p99 spikes of 1.5s at low traffic. Diagnosis: low traffic means Lambda scales down to zero between requests, and each new request pays a cold-start tax. Fix: enable Provisioned Concurrency = 2 on the production alias and route API Gateway at that alias. P99 drops to ~80ms. To protect a downstream legacy database from a sudden Lambda fan-out, also set Reserved Concurrency = 50 so a runaway invocation count can't take down the DB.

Key takeaway: hoist expensive init out of the handler. Provisioned Concurrency pre-warms (latency); Reserved Concurrency caps (protection). Account-level limit defaults to 1,000 and is the ceiling your functions share.
⚡ Mini-quiz
Practise cold-start and concurrency scenarios → study mode (10 questions).
Lesson 1.2 Triggers, event source mappings, and async patterns

Lambda's value comes from the breadth of trigger sources it integrates with. There are three invocation models — synchronous (API Gateway, ALB), asynchronous (S3, SNS, EventBridge), and event source mappings (SQS, Kinesis, DynamoDB Streams). The differences determine retry behaviour, error handling, and idempotency requirements.

Key concepts
  • Synchronous invocation: caller waits for the response. API Gateway → Lambda is the canonical pattern. Errors propagate back to the caller — no automatic retries. Idempotency optional; you control retry logic on the client side.
  • Asynchronous invocation: Lambda accepts the event and returns immediately; processing happens in the background. Function errors are retried twice by default, with a wait of about one minute before the first retry and two minutes before the second. After retries are exhausted, the event can be sent to a Dead Letter Queue (SQS/SNS) or a Lambda Destination (preferred, since it supports success and failure routing to SQS/SNS/EventBridge/Lambda).
  • Event source mapping (poll-based): the Lambda service polls the source (SQS, Kinesis, DynamoDB Streams, Kafka) and invokes your function with batches. For SQS, success deletes the message; failure leaves it in the queue (visibility timeout-based retry). For Kinesis/DynamoDB Streams, a failing batch blocks its shard until it succeeds or expires. Use bisect-on-error, a maximum retry count, and an on-failure destination to get poison records out of the way.
  • SQS batch size and partial batch responses: Lambda receives up to 10 messages per invocation by default. Standard queues can be raised to 10,000 when a batching window is configured; FIFO queues stay capped at 10. With ReportBatchItemFailures enabled, return only the failed message IDs and the rest are deleted, which is vital to avoid reprocessing the whole batch on one bad message.
  • Lambda Destinations vs DLQ: Destinations are richer than the legacy DLQ — they support routing both success and failure events to a target (SQS / SNS / EventBridge / Lambda), include full execution metadata, and are configured per alias/version. Prefer Destinations for new builds.
  • Idempotency: any non-sync source can deliver the same event more than once. Use idempotency keys in the event payload to detect retries (cache key in DynamoDB with TTL). The AWS Lambda Powertools libraries provide this out of the box.
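A partial batch response for an SQS event source mapping might look like the sketch below. The response shape (`batchItemFailures` with `itemIdentifier` set to the message ID) is what Lambda expects when ReportBatchItemFailures is enabled; `process_order` and its validation rule are hypothetical placeholders for real business logic.

```python
import json


def process_order(order):
    """Hypothetical business logic: rejects orders without an id."""
    if "orderId" not in order:
        raise ValueError("malformed order")


def handler(event, context):
    failures = []
    for record in event.get("Records", []):
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Report only this message; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

An empty `batchItemFailures` list tells Lambda the whole batch succeeded. Messages that are reported as failed become visible again after the visibility timeout, so the handler should still be idempotent.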
Concrete example

An order-processing pipeline reads from SQS and writes to DynamoDB. Occasional malformed orders break processing and the entire batch retries. Fix: enable ReportBatchItemFailures in the event source mapping; the function returns the failed message IDs and SQS deletes the others. Add a DLQ on the source queue with maxReceiveCount = 3 so poison messages drain to the DLQ after 3 failed receives. On-success Lambda Destinations apply only to asynchronous invocations, not to SQS event source mappings, so for the success-path notification the function publishes to an SNS topic that fans out order-confirmation emails.

Key takeaway: sync (no retries), async (2 retries + Destinations), event source mapping (source-specific retry, watch out for poison messages). Use Destinations, not the legacy DLQ. Build idempotent handlers — at-least-once is the rule everywhere except sync.
⚡ Mini-quiz
Drill invocation-model retry behaviour → quick quiz (5 questions).
Lesson 1.3 Versions, aliases, traffic shifting, and VPC integration

Lambda's deployment story is two primitives: immutable versions snapshot the code + configuration, and mutable aliases are named pointers (with weights) at one or two versions. Together they enable canary and linear deployments without re-platforming. VPC integration is the extra wrinkle: necessary for private resources, but with implications for cold start and IP planning.

Key concepts
  • $LATEST and versions: every code change updates $LATEST in place. Publishing a version creates an immutable snapshot with a numeric version (1, 2, 3...). Pinning callers (API Gateway integration, event source mapping) to a version means upgrades require a deploy of the consumer, not just a Lambda push.
  • Aliases: a named pointer at one OR two versions. With two versions the alias does weighted routing, e.g., prod at 90% version 8 and 10% version 9. API Gateway / event source mappings target the alias so you control traffic shifting without changing consumers.
  • Traffic shifting patterns: all-at-once (instant cut-over, fastest, riskiest), linear (X% every Y minutes — e.g., 10% every 10 minutes), canary (small % then jump — e.g., 10% for 10 min then 100%). All managed declaratively by CodeDeploy or SAM DeploymentPreference.
  • Lambda Layers: shared library bundles attached to one or more functions. Max 5 layers per function, max 250 MB unzipped (including all layers + function code). Common use: shared SDK clients, common utilities, large native binaries (Pillow, libpq).
  • VPC integration: attach the function to subnets + security groups to reach RDS, ElastiCache, EFS, internal ALBs. AWS auto-provisions Hyperplane ENIs in the subnet — cold-start impact is now negligible (used to be 10+ seconds). Plan subnet IP space: each Hyperplane ENI consumes one IP from the subnet.
  • Internet egress from VPC: a VPC-attached Lambda has no public internet access on its own, because its network interfaces get no public IP. To reach the internet (3rd-party API, Bedrock), place it in private subnets and route those subnets through a NAT Gateway. Cost gotcha: NAT Gateway bills per hour plus per GB processed, so prefer VPC endpoints for AWS services where they exist.
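The three traffic-shifting patterns differ only in how the new version's weight changes over time. The function below is a rough model of that schedule, not CodeDeploy's implementation: `new_version_weight` and its parameters are hypothetical, and it assumes the first increment is applied at minute 0.

```python
def new_version_weight(strategy, minutes_elapsed, step_percent=10, interval_minutes=10):
    """Percent of traffic on the new version after `minutes_elapsed` minutes."""
    if strategy == "all_at_once":
        return 100
    if strategy == "canary":
        # Hold at step_percent for one interval, then jump to 100%.
        return step_percent if minutes_elapsed < interval_minutes else 100
    if strategy == "linear":
        # Start at step_percent, add step_percent at each interval boundary.
        steps = 1 + minutes_elapsed // interval_minutes
        return min(100, steps * step_percent)
    raise ValueError(f"unknown strategy: {strategy}")
```

With the defaults, `"canary"` models a 10%-for-10-minutes canary and `"linear"` models 10% every 10 minutes. A rollback in either case amounts to setting the alias back to 100% on the old version.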
Concrete example

A team deploys a new Lambda version that interacts with an internal RDS instance. Design: attach Lambda to two private subnets (different AZs) with a security group allowing 5432 to the RDS SG. Publish version 5, create an alias prod pointing at it with weight 1.0. CI/CD then publishes version 6, updates the alias to 90% v5 + 10% v6 via SAM DeploymentPreference: Canary10Percent10Minutes. CloudWatch Alarms watch error rate; if the alarm fires within the 10-minute canary window, CodeDeploy automatically rolls back to v5. After the window, alias jumps to 100% v6.

Key takeaway: versions are immutable, aliases route weighted traffic, CodeDeploy manages the shifting. VPC integration is no longer a cold-start concern with Hyperplane ENIs — but plan subnet IPs and remember Lambda loses public egress without a NAT.
⚡ Mini-quiz
Practise alias-shift and VPC-integration scenarios → study mode (10 questions).
02

API Gateway: REST, HTTP & WebSocket APIs · 3 lessons

Build and secure APIs the AWS way. Covers REST vs HTTP API differences, Lambda proxy vs custom integration, request/response mapping templates, stage variables, caching and throttling, usage plans and API keys, Lambda authorizers, Cognito User Pool authorizers, CORS configuration, canary deployments, and WebSocket APIs for real-time bidirectional communication.

proxy-integration caching throttling lambda-authorizer cognito-authorizer websocket stage-variables cors
~4h
Lesson 2.1 REST vs HTTP API — picking the right gateway

API Gateway has two flavours that get confused all the time: REST API (the original, feature-complete) and HTTP API (newer, simpler, ~70% cheaper). The exam loves edge-case questions on which features exist in which flavour, because picking the wrong one costs you either money or capability.

Key concepts
  • HTTP API: the modern, low-cost option. JWT authorizers via Cognito or any OIDC provider, simple request/response, auto-deployed stages, CORS first-class. NO request/response mapping templates, NO API keys / usage plans, NO caching, NO direct AWS WAF association (feature parity keeps changing, so check the current docs). Use for serverless microservices that don't need transformation.
  • REST API: the full-feature option. Lambda + Cognito authorizers, request/response mapping (VTL), API keys + usage plans, response caching, WAF, private API endpoints with VPC interface endpoints. Costs ~3.5× HTTP API. Use when you need any of those features.
  • Lambda integrations: proxy integration (the default — full request passed as JSON, full response expected back) vs custom integration (you write mapping templates to translate). Proxy is what 95% of teams use; custom only when you must integrate a non-Lambda backend or transform the contract.
  • Stages and stage variables: a stage is a named deployment of the API (dev, prod). Stage variables are key/value pairs accessible from integrations and authorizers — use them to point at different Lambda aliases per environment (e.g., ${stageVariables.lambdaAlias}).
  • Throttling and caching: account-level throttle (10,000 RPS soft limit), stage-level throttle (override per stage), method-level throttle (per-route override). Caching only on REST API — TTL up to 1 hour, cache key includes path + headers you opt into. Cache cost scales with size.
  • Endpoint types: Edge-optimised (CloudFront in front, default for REST), Regional (your region only, faster from same-region clients), Private (REST only — accessible only via VPC interface endpoint). HTTP APIs are always Regional.
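With proxy integration the function receives the whole request as one JSON event and must return a response object with `statusCode`, optional `headers`, and a string `body`. The handler below is a small sketch of that contract for a REST API proxy event; the `name` query parameter and the `lambdaAlias` stage variable are illustrative assumptions.

```python
import json


def proxy_handler(event, context):
    # REST API proxy events carry query parameters here (None when absent).
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # Stage variables arrive in the event on REST API proxy integrations.
    stage_vars = event.get("stageVariables") or {}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so serialise the payload explicitly.
        "body": json.dumps({"message": f"hello {name}", "alias": stage_vars.get("lambdaAlias")}),
    }
```

Returning a dict as `body` instead of a string is a common cause of 502 errors from the gateway, which is why the sketch serialises with `json.dumps`.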
Concrete example

A SaaS startup needs a public API with three properties: (1) cost-efficient, (2) authenticated via Cognito User Pool JWT, (3) protected by simple per-stage rate limits. Choice: HTTP API with a JWT authorizer pointing at the Cognito issuer, stages dev and prod with stage-level throttling (e.g., 100 / 1000 RPS). Saves ~70% vs REST. If later the team needs request transformation or response caching, they migrate to a REST API — but only then.

Key takeaway: HTTP API for cost-effective serverless APIs with JWT auth. REST API when you need mapping templates, API keys, caching, or WAF. Proxy integration almost always; custom only when forced.
⚡ Mini-quiz
Drill REST vs HTTP API decisions → study mode (10 questions).
Lesson 2.2 Authorizers — IAM, Cognito, Lambda, JWT

API Gateway has four authorizer types, each suited to a different identity story. The exam asks you to pick the right one given a description of the client population — corporate users, consumer app, server-to-server, third-party integration.

Key concepts
  • IAM authorizer (SigV4): the caller signs requests with AWS credentials. Right for service-to-service inside AWS, e.g., a Lambda in another account calling your API. Free, no Lambda invocation per call. Doesn't fit end-user clients.
  • Cognito User Pool authorizer: the caller presents a Cognito JWT in the Authorization header. API Gateway validates the JWT against the configured Cognito pool. Right for consumer apps where Cognito handles sign-up / sign-in / MFA / federation.
  • Lambda authorizer (custom): a Lambda function you write evaluates the request (token-based on Authorization header, OR request-based on multiple inputs) and returns an IAM policy that allows or denies. Used for third-party JWT validation (Auth0, Okta, Firebase), custom business rules, or PII-aware decisions. Cache results to avoid invoking on every request — TTL up to 1 hour.
  • JWT authorizer (HTTP API only): built-in OIDC JWT validation against any issuer (Cognito, Auth0, Azure AD). Same feel as Lambda authorizer but no Lambda — config-only, free, fast. Use whenever the identity story is "any OIDC IdP".
  • API keys + usage plans: distinct from authorizers — API keys identify the CALLING SYSTEM (not the user), bind to usage plans that enforce per-key throttle + quota (RPS + monthly request count). Use for paid-tier rate limiting or partner API access. Always combine with an authorizer for real auth — API keys alone are not authentication.
  • CORS: handled by API Gateway via OPTIONS preflight responses. HTTP API has first-class CORS config (just declare allowed origins/methods/headers). REST API requires enabling CORS on each method — easy to miss and the most common "my API doesn't work in the browser" complaint.
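A token-based Lambda authorizer on a REST API returns a principal plus an IAM policy for `execute-api:Invoke`. The sketch below shows only that response shape; the hard-coded token comparison and the `demo-user` principal are placeholders, and real code would verify a JWT signature and claims instead.

```python
def authorizer_handler(event, context):
    token = event.get("authorizationToken", "")
    # Hypothetical check; real code would verify a JWT signature and claims.
    effect = "Allow" if token == "Bearer valid-demo-token" else "Deny"
    return {
        "principalId": "demo-user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```

Because API Gateway can cache the returned policy for the configured TTL, a policy scoped to a single `methodArn` may be reused for other routes under the same token; widening or narrowing the `Resource` is a deliberate design choice.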
Concrete example

A B2B SaaS API needs: (1) end-user auth via the customer's own SSO (any OIDC), (2) partner-level rate limits on shared keys, (3) admin endpoints callable only from internal AWS Lambda. Design: requirement (2) needs API keys + usage plans, which exist only on REST APIs, so build a REST API. A Lambda authorizer validates the customer's OIDC JWT on user routes; API keys + usage plans layer per-partner quotas on top of that; the internal admin routes use IAM authorization (SigV4). Three identity stories, three mechanisms, one logical API.

Key takeaway: IAM for service-to-service inside AWS. Cognito for consumer apps. JWT (HTTP API) for OIDC providers. Lambda authorizer for everything bespoke. API keys are for billing tier, not authentication.
⚡ Mini-quiz
Practise authorizer-type decisions → quick quiz (5 questions).
Lesson 2.3 Deployment, observability, and WebSockets

How you ship to API Gateway determines the blast radius of a bad release. Add CloudWatch + X-Ray and you can find regressions before customers do. WebSockets are the off-spec topic that always catches one or two candidates per exam.

Key concepts
  • Canary deployments (REST API): deploy a new version to the same stage but route a small % of traffic to it. Run for a defined window; promote the canary if metrics are clean; roll back by removing the canary settings (or setting its traffic to 0%). API Gateway does not roll back on its own, so wire a CloudWatch alarm into your pipeline if you want automatic rollback.
  • HTTP API auto-deploy: changes to an HTTP API can be auto-deployed to the configured stage — simpler but less control. Use stage variables and feature flags to decouple risky changes from the deploy.
  • CloudWatch metrics: per-stage / per-method 4XXError, 5XXError, Count, Latency, IntegrationLatency. Build dashboards for p50/p99 latency and error rate; alarm on 5XX rate > 1% over 5 minutes.
  • Access logging: per-request log entries (JSON) sent to CloudWatch Logs or Kinesis Firehose. Enable to debug specific requests by request ID. Costs ingestion fees — sample if traffic is huge.
  • X-Ray tracing: enable on the stage to capture per-segment latency through the integration. Lambda must opt in separately (active tracing on the function + X-Ray write permissions on the execution role). Surfaces "DynamoDB call took 800ms" inside an otherwise unexplainable 1.2s API response.
  • WebSocket APIs: a separate API type for bidirectional, stateful connections. Routes are keyed by $connect, $disconnect, $default, and message-payload routes you define. Storage of connection IDs is your job — DynamoDB is the canonical backing store. Backend (typically Lambda) calls PostToConnection via the management API to push messages to specific clients.
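Connection bookkeeping for a WebSocket API can be sketched as one handler that dispatches on the route key. This is an illustration only: the in-memory `CONNECTIONS` dict is a hypothetical stand-in for a DynamoDB connections table (a dict would not survive across execution environments), and reading `userId` from the query string is an assumption.

```python
CONNECTIONS = {}  # hypothetical in-memory stand-in for a DynamoDB connections table


def websocket_handler(event, context):
    ctx = event["requestContext"]
    route, connection_id = ctx["routeKey"], ctx["connectionId"]
    if route == "$connect":
        params = event.get("queryStringParameters") or {}
        CONNECTIONS[connection_id] = {"userId": params.get("userId")}
    elif route == "$disconnect":
        CONNECTIONS.pop(connection_id, None)
    # $default and custom routes would handle inbound messages here.
    return {"statusCode": 200}
```

A publisher would then iterate over the stored connection IDs and push to each one through the management API's PostToConnection call, deleting any ID that turns out to be stale.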
Concrete example

A trading-platform team needs a real-time price-update feed to thousands of browsers. Design: WebSocket API with $connect Lambda that writes the connectionId + userId to a DynamoDB connections table, $disconnect Lambda that deletes the row. A separate price-publisher Lambda triggered by a Kinesis stream reads all active connections and calls PostToConnection per subscriber. Add X-Ray for end-to-end latency tracing. Per-stage CloudWatch alarm on ConnectCount sudden drops signals a backend issue.

Key takeaway: canary deployments for safe rollouts (REST), auto-deploy for speed (HTTP). CloudWatch + access logs + X-Ray for observability. WebSocket APIs need you to manage connection state — DynamoDB is the canonical store.
⚡ Mini-quiz
Test deployment and observability scenarios → study mode (10 questions).
03

DynamoDB: Design, Query & Optimization · 3 lessons

DynamoDB is the most-tested data store on DVA-C02. Master partition key and sort key design, avoiding hot partitions, LSI vs GSI, Query vs Scan and when to use each, ProjectionExpression, ConditionExpression, optimistic locking with version attributes, DynamoDB Streams, TTL, transactions (TransactGetItems/TransactWriteItems), DAX for caching, and on-demand vs provisioned capacity.

partition-key-design gsi-lsi query-vs-scan optimistic-locking streams transactions dax ttl
~5h
Lesson 3.1 Keys, partitions, and access-pattern design

DynamoDB is designed for known access patterns — get this exact entity by this exact key. Get the key design right and your application scales linearly; get it wrong and you'll fight hot partitions for years. The exam tests partition-key design more than any other DynamoDB topic.

Key concepts
  • Partition key (PK): required, hashes to determine which physical partition stores the item. Choose a PK with high cardinality and even access distribution. Bad: status ("active"/"deleted") — all "active" items land in one partition. Good: userId — millions of distinct values, evenly accessed.
  • Sort key (SK): optional second part of the primary key. Items with the same PK are stored sorted by SK and Queryable as a range. Common pattern: PK = userId, SK = orderTimestamp — Query gives "all orders for this user, newest first" in O(log n).
  • Hot partition: traffic concentrated on one PK value (or a few). DynamoDB's adaptive capacity helps but doesn't eliminate the issue. Mitigation: write sharding — append a random suffix (userId#0...userId#9) to spread writes, then Query all 10 shards in parallel on read.
  • Single-table design: store multiple entity types (users, orders, products) in one DynamoDB table, distinguished by PK/SK prefixes (USER#123, ORDER#456, PRODUCT#789). Lets one Query return a hierarchical record (user + their last 10 orders) in one request.
  • Composite-key Query patterns: with PK + SK, Query supports =, <, <=, >, >=, between, and begins_with on the SK. With a timestamp-formatted SK, set ScanIndexForward=false to read newest-first cheaply.
  • Attribute size and item limits: max 400 KB per item, counting every attribute name as well as its value. For large objects, store metadata in DynamoDB and the blob in S3 with the S3 URL as an attribute.
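The PK/SK prefix convention and write sharding can be expressed as small key-builder helpers. This is a sketch under assumptions: the helper names are hypothetical, and the shard suffix is derived from a hash of the item ID rather than chosen at random, so that the same item always maps to the same shard and can be found again.

```python
import hashlib


def profile_key(user_id):
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}


def order_key(user_id, iso_timestamp, order_id):
    # ISO-8601 UTC timestamps sort lexicographically in time order.
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{iso_timestamp}#{order_id}"}


def sharded_pk(base, item_id, shards=10):
    """Write sharding: derive a stable suffix 0..shards-1 from the item id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return f"{base}#{digest[0] % shards}"


def all_shards(base, shards=10):
    """Every shard key a reader must Query to see the whole logical partition."""
    return [f"{base}#{n}" for n in range(shards)]
```

Reads against a sharded key pay for the spread: the caller issues one Query per entry in `all_shards(...)` and merges the results, which is why sharding is worth it only for a genuinely hot key.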
Concrete example

An e-commerce app needs: (1) get user profile by userId, (2) list all orders for a user, newest first, (3) get a single order by orderId. Single-table design: PK = USER#{userId}, SK = PROFILE for the profile item; PK = USER#{userId}, SK = ORDER#{2026-05-16T18:34:00Z}#{orderId} for each order. Query (1) is GetItem with PK + SK=PROFILE. Query (2) is Query with PK + begins_with(SK, "ORDER#") and ScanIndexForward=false. Query (3) needs a GSI with PK=ORDER#{orderId}.

Key takeaway: design from access patterns backward to keys, not from schema forward. PK = high cardinality, evenly spread access. SK enables range queries. Single-table design when multiple entity types share access patterns. Write-shard PKs only when you legitimately have a hot key.
⚡ Mini-quiz
Practise PK/SK design and hot-partition scenarios → study mode (10 questions).
Lesson 3.2 Query vs Scan, GSI, LSI, and capacity modes

Once your keys are designed, the two read operations are Query (fast, indexed) and Scan (slow, sequential). Secondary indexes give you alternate access paths into the data. Capacity mode controls the cost model — on-demand pays per request, provisioned pays per reserved capacity unit.

Key concepts
  • Query: must specify PK; can filter on SK; reads one partition. Returns up to 1 MB per page, paginated via LastEvaluatedKey. Use ProjectionExpression to fetch only needed attributes. Always the right answer for known-entity reads.
  • Scan: reads every item in the table (or index). Filter expressions are applied AFTER the read — you still pay RCU for every item read, even if filtered out. Avoid Scan except for periodic reporting jobs against small tables.
  • Global Secondary Index (GSI): alternate (PK, SK) pair indexing the SAME table. Eventually consistent (lag is usually <1 second but can spike). Has its own provisioned capacity. Up to 20 per table. Use for "I need to look up an order by orderId but the table PK is userId".
  • Local Secondary Index (LSI): alternate SK against the SAME PK. Must be created with the table (cannot add later). Limited to 5 per table. Strongly consistent reads available. Use for "all orders by user, sorted by total instead of by timestamp".
  • Capacity modes: On-demand pays per request (priced per million read/write request units), with no capacity planning and automatic scaling. Provisioned reserves RCU/WCU per second and supports auto-scaling and reserved-capacity purchases. At steady, well-utilised high traffic, provisioned can be several times cheaper; the exact ratio depends on current pricing and utilisation. On-demand is the default choice for variable / unpredictable workloads.
  • RCU/WCU math: 1 RCU = 1 strongly-consistent read up to 4 KB, or 2 eventually-consistent. 1 WCU = 1 write up to 1 KB. Transactional read costs 2× RCU. Transactional write costs 2× WCU. Plan for the largest item, not the average.
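The capacity-unit rules above reduce to a little arithmetic: round the item size up to the next 4 KB (reads) or 1 KB (writes) block, then apply the consistency or transaction multiplier. The helpers below are a sketch for single-item operations; the function names are hypothetical.

```python
import math


def read_units(item_kb, consistency="strong"):
    """RCUs for one read per second of an item of item_kb kilobytes."""
    blocks = max(1, math.ceil(item_kb / 4))  # reads are billed in 4 KB blocks
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return blocks * factor


def write_units(item_kb, transactional=False):
    """WCUs for one write per second of an item of item_kb kilobytes."""
    blocks = max(1, math.ceil(item_kb / 1))  # writes are billed in 1 KB blocks
    return blocks * (2 if transactional else 1)
```

For example, a 9 KB item needs 3 blocks per read, so a strongly consistent read costs 3 RCU and a transactional read costs 6, while writing a 2.5 KB item costs 3 WCU.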
Concrete example

The order-lookup pattern from Lesson 3.1 needs to support "find an order by orderId" without knowing the user. Create a GSI on the orders table with PK = orderId, no SK, and project KEYS_ONLY (cheapest). Lookup logic: Query the GSI by orderId to get the user/order key, then GetItem from the main table. Total cost: 0.5 RCU (eventually consistent GSI Query returning under 4 KB) + 0.5 RCU (eventually consistent GetItem on a small item) = 1 RCU. On-demand mode keeps this trivially cheap at unpredictable traffic.

Key takeaway: Query for known PK, Scan only for ad-hoc reports. GSI for alternate access paths (eventually consistent, separate capacity). LSI only when you need same-PK alternate-SK AND strong consistency AND can create it at table creation. On-demand for unpredictable; Provisioned for steady high traffic.
⚡ Mini-quiz
Drill Query/Scan + GSI/LSI decisions → quick quiz (5 questions).
Lesson 3.3 Streams, transactions, TTL, and DAX

The "everything else" lesson — features that DVA-C02 tests heavily but that don't fit the keys / queries / capacity story. Get these right and your data layer becomes resilient (transactions), event-driven (streams), self-cleaning (TTL), and microsecond-fast (DAX).

Key concepts
  • DynamoDB Streams: an ordered change log per shard for the past 24 hours. Stream view types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES. The standard event source for Lambda-driven cross-region replication, analytics fan-out, or trigger-style validation.
  • Optimistic locking: add a version attribute, include ConditionExpression: version = :expected on every update, and increment version on each write. Concurrent updates with a stale version receive ConditionalCheckFailedException — the client retries with the latest value. No actual locks taken.
  • Transactions: TransactWriteItems and TransactGetItems group up to 100 actions into one ACID-compliant operation. All succeed or all roll back. Cost: 2× the equivalent non-transactional units. Use for cross-item invariants ("transfer money from account A to account B").
  • TTL (Time To Live): add a numeric attribute holding a Unix epoch timestamp in seconds. DynamoDB deletes expired items asynchronously and for free, but not instantly: deletion can lag expiry by up to a few days (older documentation quoted 48 hours). Useful for session storage, idempotency keys, cache layers. Items past TTL but not yet deleted still appear in Query/Scan, so filter them out with a filter expression or client-side if necessary.
  • DAX (DynamoDB Accelerator): in-memory write-through cache cluster in front of DynamoDB. Microsecond reads for cache hits; writes still go through to DynamoDB, so they keep DynamoDB's single-digit-millisecond latency. Works with the native DynamoDB item and query APIs through the DAX client; strongly consistent reads pass through to the table uncached. DAX runs inside your VPC, so place it in private subnets and connect via the cluster endpoint.
  • Global Tables: multi-region active-active replication. Tables in each region accept writes; conflicts resolved by last-writer-wins on the per-region timestamp. Use for global low-latency reads/writes; not a substitute for backup (a Global Table replicates a delete to every region within seconds).
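Optimistic locking is easiest to see in a small simulation. The sketch below uses an in-memory dict in place of a table and a plain comparison in place of `ConditionExpression: version = :expected`; `TABLE`, `update_balance`, `add_with_retry`, and the exception class are all hypothetical names, not DynamoDB APIs.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


TABLE = {"acct-1": {"balance": 100, "version": 1}}  # hypothetical in-memory table


def update_balance(key, new_balance, expected_version):
    item = TABLE[key]
    # Mirrors ConditionExpression "version = :expected".
    if item["version"] != expected_version:
        raise ConditionalCheckFailed(key)
    TABLE[key] = {"balance": new_balance, "version": expected_version + 1}


def add_with_retry(key, delta, max_attempts=3):
    for _ in range(max_attempts):
        snapshot = dict(TABLE[key])  # read the latest item
        try:
            update_balance(key, snapshot["balance"] + delta, snapshot["version"])
            return TABLE[key]
        except ConditionalCheckFailed:
            continue  # someone else wrote first; re-read and try again
    raise RuntimeError("too much contention")
```

A writer holding a stale version is rejected rather than silently overwriting newer data, and the retry loop re-reads before trying again. No lock is ever held, which is the point of the pattern.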
Concrete example

A funds-transfer API debits one account and credits another. Naive design with two UpdateItem calls can leave money in limbo if the second fails. Correct: TransactWriteItems with two Update actions, where the debit carries ConditionExpression balance >= :amount and the credit is unconditional. Both succeed or both fail. Add optimistic-locking version checks if concurrent transfers on the same account are possible. Put a transfer record in the same transaction; with Streams enabled on the table, a Lambda can pick up that insert and asynchronously notify the user via SNS.

Key takeaway: Streams for change-data-capture. Transactions for cross-item ACID. Optimistic locking for concurrent single-item updates. TTL for self-cleaning data. DAX when microsecond reads are required. Global Tables for active-active geo, NOT for backup.
⚡ Mini-quiz
Practise Streams/Transactions/TTL/DAX scenarios → study mode (10 questions).
04

Messaging: SQS, SNS, Kinesis & EventBridge · 3 lessons

Decoupled event-driven architectures are a core DVA-C02 topic. Learn when to use SQS (durable queuing), SNS (fan-out pub/sub), Kinesis Data Streams (ordered high-throughput streaming), and EventBridge (event routing with schema registry). Key concepts: SQS visibility timeout and DLQ, FIFO vs Standard queues, SNS subscription filters, Kinesis shard management and Lambda parallelization factor, and EventBridge rules and targets.

sqs-visibility-timeout dlq fifo-queue sns-fan-out kinesis-shards kinesis-parallelization eventbridge-rules
~4h
Lesson 4.1 SQS — Standard, FIFO, visibility timeout, DLQs

SQS is the durable queue. Producers write, consumers poll, the queue keeps the message until acknowledged. Two queue types — Standard (high throughput, at-least-once, no ordering) and FIFO (exactly-once, ordered, lower throughput) — and a small set of operational levers that the exam tests heavily.

Key concepts
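  • Standard vs FIFO: Standard supports nearly unlimited throughput, best-effort ordering, at-least-once delivery (duplicates possible). FIFO is strict ordering inside a MessageGroupId, with deduplication inside a 5-minute window (via MessageDeduplicationId or ContentBasedDeduplication). FIFO throughput is 300 API calls/sec per action by default (3,000 msg/sec with batches of 10); high-throughput mode raises that well beyond 3,000 msg/sec, with the exact quota depending on region. FIFO queue names must end with .fifo.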
  • Visibility timeout: when a consumer receives a message, it's invisible to others for the visibility timeout (default 30s, max 12h). If the consumer fails before deleting, the message becomes visible again. Tune to slightly longer than your slowest consumer; too short causes duplicate processing.
  • Long polling: set ReceiveMessageWaitTimeSeconds = 20 (max). The Receive API call holds open until a message arrives or 20s elapse — drastically cuts cost vs short polling (~$0.40/M reqs adds up fast on a hot consumer).
  • Dead Letter Queue (DLQ): a second SQS queue that receives messages that have been received but not deleted N times (N = maxReceiveCount in the source queue's redrive policy; 3 to 5 is a common choice). Lets poison messages drain off the main queue. The DLQ must be the same type as its source: FIFO for FIFO, Standard for Standard.
  • Message size: up to 256 KB. For larger payloads, store the body in S3 and put the S3 URL in the SQS message (the Extended Client Library does this automatically). This offloads large payloads without changing producer/consumer code much.
  • Delay queues and delays per message: a queue-level DelaySeconds (up to 15 minutes) postpones visibility for every new message. Per-message delay is supported on Standard only. Useful for scheduled retries or pacing downstream systems.
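How maxReceiveCount interacts with failed processing can be modelled with a toy queue. This is a simulation, not the SQS API: `drain` and its parameters are hypothetical, and "reappears after the visibility timeout" is modelled simply by putting the message back at the end of the queue.

```python
from collections import deque


def drain(messages, process, max_receive_count=3):
    """Simulate a queue with a redrive policy; returns (processed, dead_letters)."""
    queue = deque((body, 0) for body in messages)  # (body, receive count)
    processed, dead_letters = [], []
    while queue:
        body, receives = queue.popleft()
        if receives >= max_receive_count:
            # Received max_receive_count times without a delete: move to the DLQ.
            dead_letters.append(body)
            continue
        try:
            process(body)
            processed.append(body)  # success is the equivalent of DeleteMessage
        except Exception:
            # No delete: the message reappears after the visibility timeout.
            queue.append((body, receives + 1))
    return processed, dead_letters
```

A message that always fails is handed to the consumer exactly `max_receive_count` times and then lands in the dead-letter list, while healthy messages around it are processed once.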
Concrete example

An e-commerce order pipeline processes ~5,000 orders/sec at peak. Orders for the same customer must be processed in order, but order between different customers doesn't matter. Choice: FIFO queue with high-throughput mode + MessageGroupId = customerId so each customer becomes a strict-order shard while different customers run in parallel. Visibility timeout 60s (consumer p99 is 30s). DLQ with maxReceiveCount = 5. ContentBasedDeduplication on, so the producer's retry of the same body doesn't double-process.

Key takeaway: Standard for max throughput + tolerable duplicates. FIFO when order or exactly-once matters — group by MessageGroupId to retain per-shard ordering with parallel consumers. Visibility timeout > max processing time. DLQs always — drain poison messages.
⚡ Mini-quiz
Practise SQS queue-type and DLQ scenarios → study mode (10 questions).
Lesson 4.2 SNS, EventBridge, and event-routing patterns

SNS and EventBridge both deliver events to multiple subscribers, but they target different patterns. SNS is the simple fan-out pub/sub: one topic, many subscribers, optional filter. EventBridge is the routing bus: one event, many rules that fire targets when the event's content matches the rule's event pattern. Picking between them is a frequent exam question.

Key concepts
  • SNS topics: publish once, fan-out to many subscribers. Subscriber types: SQS, Lambda, HTTP(S) endpoint, email, SMS, mobile push, Kinesis Data Firehose, application endpoints. Standard topics (best-effort order, at-least-once) and FIFO topics (with FIFO SQS subscribers, deduplication, group-by-MessageGroupId).
  • SNS subscription filters: a JSON filter policy attached to a subscription. By default it tests message attributes; set FilterPolicyScope to MessageBody to match on the payload instead. Only matching messages are delivered, so you avoid sending every message to every subscriber when only a subset cares.
  • Cross-region/cross-account fan-out: SNS topics can have subscribers in other regions and accounts. The standard pattern for "one publisher, multiple business-unit consumers".
  • EventBridge: a managed event bus with rules. Each rule has an event pattern (content-based filter) and one or more targets (Lambda, Step Functions, SQS, SNS, Kinesis, ECS, Batch, etc.). Schema registry can auto-discover and version event schemas. Three bus types: default (AWS service events), custom (your events), partner (SaaS integrations like Auth0, Datadog, Shopify).
  • Scheduled rules: EventBridge rules accept cron() and rate() schedule expressions, the same syntax CloudWatch Events used (EventBridge is its successor). Use them to trigger Lambda on a schedule. The separate EventBridge Scheduler service adds one-time schedules and time-zone support.
  • EventBridge Pipes (newer): point-to-point integration with optional filter and enrichment between source (SQS/Kinesis/DynamoDB Stream) and target (Lambda/Step Functions/SNS/etc.). Replaces the "Lambda just to translate" anti-pattern for many integrations.
Concrete example

An order-published event needs to: (1) fan out to a billing service, an inventory service, and a notifications service; (2) only the notifications service cares about orderTotal > $500 (VIP customers). Choice A (SNS): one topic, three SQS subscribers, an SNS subscription filter on the notifications subscriber matching {"orderTotal": [{"numeric": [">", 500]}]}. Choice B (EventBridge): one event, three rules. SNS is simpler and free for the fan-out itself. EventBridge wins when content routing gets more complex or you want schema discovery.
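To make the numeric filter concrete, here is a minimal sketch of how a policy of that shape could be evaluated against message attributes. It is an illustration only, not SNS's implementation: `matches` is a hypothetical helper that supports just exact matches and the "numeric" operator.

```python
import operator

# Comparison operators that a numeric filter condition can use.
_NUMERIC_OPS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def _numeric_match(value, spec: list) -> bool:
    """spec is a flat list such as [">", 500] or [">=", 0, "<", 100]."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    pairs = zip(spec[0::2], spec[1::2])
    return all(_NUMERIC_OPS[op](value, bound) for op, bound in pairs)


def matches(filter_policy: dict, attributes: dict) -> bool:
    """True when every policy key is present and one of its conditions matches."""
    for key, conditions in filter_policy.items():
        if key not in attributes:
            return False
        value = attributes[key]
        ok = False
        for cond in conditions:
            if isinstance(cond, dict) and "numeric" in cond:
                ok = ok or _numeric_match(value, cond["numeric"])
            else:
                ok = ok or cond == value  # plain exact match
        if not ok:
            return False
    return True
```

Keys in a policy are ANDed and the conditions listed under one key are ORed, which is why the notifications subscriber can add further keys later without touching the other subscriptions.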

Key takeaway: SNS for simple fan-out with optional filters. EventBridge for content-routed, schema-versioned event buses. Both are pub/sub — the choice depends on filtering richness and schema needs.
⚡ Mini-quiz
Drill SNS vs EventBridge decisions → quick quiz (5 questions).
Lesson 4.3 Kinesis Data Streams — shards, ordering, and Lambda integration

When you need ordered, high-throughput, replayable streams (real-time clickstream, telemetry, financial ticks), Kinesis Data Streams is the answer. The mental model is fundamentally different from SQS — partitioned by shard, consumers track position, retention up to 365 days. Exam questions focus on shard scaling and Lambda integration parallelism.

Key concepts
  • Shards: the unit of throughput. Each shard supports 1 MB/sec or 1000 records/sec write, 2 MB/sec read. Capacity scales by adding shards. Provisioned mode (you size manually) vs On-Demand mode (capacity is managed for you and accommodates up to double the peak write throughput seen in the previous 30 days).
  • Partition key: producer-supplied; hashed to determine which shard. Records with the same partition key always land in the same shard, preserving order WITHIN that key. Even distribution is your job — a hot partition key creates a hot shard exactly like DynamoDB.
  • Retention: 24 hours by default, configurable up to 365 days. Replay-capable: with retention set to 7 days or more, a downstream consumer can re-process the last 7 days by resetting its position. Cost increases with retention.
  • Lambda as consumer: Lambda polls the stream via event source mapping. Parallelization factor (1-10) increases how many Lambda invocations process records FROM ONE SHARD concurrently, which is useful when downstream work is slow and one invocation per shard can't keep up. Lambda still processes records that share a partition key in order; what you lose is ordering ACROSS different partition keys within the shard.
  • Failure handling for stream consumers: a failing batch BLOCKS the shard (by default Lambda retries until the records expire from the stream). Mitigations: BisectBatchOnFunctionError (split the batch and retry halves), MaximumRetryAttempts (cap retries), MaximumRecordAgeInSeconds (skip records that are too old), OnFailure destination (an SQS queue or SNS topic that receives metadata about the discarded batch, not the records themselves). Without these, one poison record stalls the entire shard until it ages out.
  • Enhanced Fan-out: per-consumer dedicated 2 MB/sec read throughput, push-based via HTTP/2. Use when multiple consumers compete for the same shard's read capacity. Costs more — only justified for 3+ consumers.
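The partition-key routing described above can be sketched in a few lines. Kinesis maps the MD5 hash of the partition key onto a 128-bit hash-key space; the sketch assumes the shards split that space into equal contiguous ranges, which is a simplification, and `shard_for` and `distribution` are hypothetical helper names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash-key range contains the key's MD5 hash.

    Assumes equal, contiguous hash-key ranges per shard.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    return min(hash_key // range_size, shard_count - 1)


def distribution(keys, shard_count: int) -> dict:
    """Count how many records each shard would receive."""
    counts = {i: 0 for i in range(shard_count)}
    for key in keys:
        counts[shard_for(key, shard_count)] += 1
    return counts
```

Running distribution over a sample of real keys is a quick way to spot a hot key before it becomes a hot shard: a key that dominates the sample sends all of its records to a single index.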
Concrete example

A telemetry pipeline ingests sensor readings at 50 MB/sec, must preserve per-sensor ordering, and runs a real-time anomaly detector. Design: Kinesis Data Stream sized for at least 50 shards of write capacity (50 MB/sec at 1 MB/sec per shard; use on-demand mode instead if the load is variable), PartitionKey = sensorId. Anomaly Lambda as consumer, starting with parallelizationFactor = 1; if the detector falls behind, the factor can be raised without breaking per-sensor order, because Lambda keeps records with the same partition key in sequence. Add BisectBatchOnFunctionError, MaximumRetryAttempts, and an OnFailure destination (SQS) so a poison record doesn't block the shard. Retention bumped to 7 days for replay during incidents.
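The shard count follows from the per-shard limits, and the arithmetic can be written down as a small helper. This is a sizing sketch only: `shards_needed` is a hypothetical function, and the record rates used in the usage note are assumed figures, not part of the scenario.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # MB/sec write limit per shard
WRITE_RECORDS_PER_SHARD = 1000  # records/sec write limit per shard
READ_MB_PER_SHARD = 2.0         # MB/sec read, shared by standard consumers


def shards_needed(write_mb_per_sec: float,
                  records_per_sec: float,
                  read_mb_per_sec: float = 0.0) -> int:
    """Smallest shard count that satisfies every per-shard limit."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )
```

For 50 MB/sec at an assumed 20,000 records/sec, shards_needed(50, 20000) gives 50 because the byte limit dominates; with small records, such as 5 MB/sec at the same record rate, the record-count limit dominates and the answer is 20.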

Key takeaway: Kinesis for ordered, replayable, partitioned streams. Shards are the throughput unit. Partition keys determine ordering scope. Lambda parallelization factor gates how concurrent processing within a shard is — be deliberate. Always configure failure handling — the default blocks the shard.
⚡ Mini-quiz
Practise shard sizing and Lambda integration scenarios → study mode (10 questions).
05

Security: Cognito, IAM, KMS & Secrets Manager · 3 lessons

Security is 26% of DVA-C02. This module covers Cognito User Pools vs Identity Pools and when to use each, ID token vs Access token vs Refresh token, fine-grained DynamoDB/S3 access with identity pool policy variables, IAM roles for Lambda and cross-account access, resource-based policies, KMS envelope encryption with GenerateDataKey, Secrets Manager automatic rotation, SSM Parameter Store SecureString, and S3 presigned URLs.

cognito-user-pools cognito-identity-pools jwt-tokens kms-envelope-encryption secrets-manager-rotation ssm-securestring presigned-url resource-based-policy
~5h
Lesson 5.1 Cognito — User Pools vs Identity Pools

Cognito has two products that share a brand but solve different problems. User Pools are the directory; Identity Pools are the AWS-credential vending machine. Most apps use both together — the exam loves the "which one for which job" distinction.

Key concepts
  • User Pool: a managed user directory — sign-up, sign-in, password reset, MFA, social logins, custom attributes, hosted UI. Issues JWTs (ID + Access + Refresh tokens) to authenticated users. Use it whenever your app needs an "account" concept.
  • Identity Pool (Federated Identities): exchanges a token (from Cognito User Pool OR third-party IdP OR Google/Facebook/Apple) for TEMPORARY AWS credentials via STS AssumeRoleWithWebIdentity. Use to grant browser/mobile apps direct AWS access (S3 upload, DynamoDB read).
  • JWT trio: the ID token contains user identity claims (use for "who is this user"). The Access token authorises access to resources (use to call APIs). The Refresh token is long-lived and is exchanged for a new ID + Access pair. Default expiry: ID/Access 1h, Refresh 30d.
  • Policy variables in Identity Pool roles: ${cognito-identity.amazonaws.com:sub} resolves to the user's unique identity ID. Use in IAM policies ("Resource": "arn:aws:s3:::mybucket/${cognito-identity.amazonaws.com:sub}/*") to give each user their own folder without writing per-user policies.
  • Cognito Triggers (User Pool Lambda): hook into pre-signup, post-confirmation, custom message, pre-auth, post-auth, pre-token-generation. Use pre-token-generation to inject custom claims (org_id, tier) into the JWT.
  • SAML / OIDC federation: User Pools can federate to a corporate IdP (Okta, Azure AD via SAML or OIDC). Users authenticate at the IdP; Cognito issues its own JWT with the federation claims merged in. Common in B2B SaaS.
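When debugging, it helps to check which of the tokens you are actually holding. Cognito marks ID and Access tokens with a token_use claim, so a minimal sketch can decode the payload and read it. The helpers `decode_claims` and `token_kind` are hypothetical, and the code deliberately skips signature verification, so it is for inspection only.

```python
import base64
import json


def decode_claims(jwt: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying the signature.

    Fine for inspecting a token while debugging; never base an
    authorization decision on unverified claims.
    """
    try:
        _header, payload, _signature = jwt.split(".")
    except ValueError:
        raise ValueError("expected three dot-separated JWT segments") from None
    padded = payload + "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def token_kind(jwt: str) -> str:
    """Cognito sets the token_use claim to 'id' or 'access'."""
    return decode_claims(jwt).get("token_use", "unknown")
```

A production API would instead verify the signature against the User Pool's published JWKS and check the expiry and audience before trusting any claim.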
Concrete example

A mobile app needs (1) user accounts with email + Google sign-in, (2) per-user private S3 photo storage. Design: Cognito User Pool for sign-up/sign-in (Google federated). After auth, the app exchanges the User Pool JWT at the Identity Pool for temporary AWS credentials. The Identity Pool authenticated role grants s3:GetObject/s3:PutObject on mybucket/${cognito-identity.amazonaws.com:sub}/* — each user can read/write only their own folder, no Lambda required between client and S3.
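The per-user folder policy from this design could look like the sketch below, built as a Python dict that serialises to IAM policy JSON. The bucket name and the identity ID in the usage note are placeholders, and `resolve_for` is a hypothetical helper that only illustrates the substitution IAM performs at request time.

```python
SUB_VARIABLE = "${cognito-identity.amazonaws.com:sub}"


def per_user_s3_policy(bucket: str) -> dict:
    """Identity-based policy for the Identity Pool's authenticated role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # IAM replaces the variable with the caller's identity ID.
                "Resource": f"arn:aws:s3:::{bucket}/{SUB_VARIABLE}/*",
            }
        ],
    }


def resolve_for(policy: dict, identity_id: str) -> list:
    """Show the resource ARNs one identity would effectively be granted."""
    return [
        stmt["Resource"].replace(SUB_VARIABLE, identity_id)
        for stmt in policy["Statement"]
    ]
```

Calling resolve_for(per_user_s3_policy("mybucket"), "us-east-1:1234") yields a single ARN scoped to that identity's prefix, which is the isolation the design relies on.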

Key takeaway: User Pool = directory + JWT issuer. Identity Pool = JWT → temporary AWS credentials. Most apps need both. Policy variables on the Identity Pool role enable per-user isolation without writing per-user IAM policies.
⚡ Mini-quiz
Drill Cognito User Pool vs Identity Pool decisions → study mode (10 questions).
Lesson 5.2 IAM for developers — roles, trust policies, cross-account

IAM on the DVA-C02 exam is heavy on practical role design rather than policy syntax memorisation. Understanding execution roles, trust relationships, and cross-account access patterns is what separates a passing score from a fail.

Key concepts
  • Execution role: the IAM role a Lambda (or ECS task, EC2, etc.) assumes when it runs. Grants the function permissions to call AWS APIs. The role's trust policy must allow lambda.amazonaws.com as principal.
  • Identity-based vs resource-based policies: identity-based policies attach to IAM users/roles ("this role can do X"). Resource-based policies attach to resources ("anyone listed here can do X to me") — S3 bucket policies, Lambda resource policies, KMS key policies. Resource-based policies enable cross-account access without role chaining.
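  • STS AssumeRole: the cross-account mechanism. Account A's role has a trust policy allowing Account B's principal to assume it. Account B calls sts:AssumeRole and gets temporary credentials that act as Account A's role. Sessions last 1 hour by default and can be extended up to the role's maximum session duration (at most 12 hours); when one role assumes another (role chaining), the session is capped at 1 hour.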
  • Lambda invocation permission (resource policy): for cross-account Lambda invokes, add a resource-based policy on the Lambda using lambda:AddPermission — that's how API Gateway in account B is allowed to invoke a Lambda in account A.
  • Policy evaluation logic: deny anywhere wins. Explicit allow needed somewhere. Resource-based policies CAN grant access without an identity-based policy in the same account (cross-account requires both ends).
  • Permissions boundary: the maximum permissions a role/user can have. Used to delegate role-creation safely — devs can create roles, but the permissions boundary caps what those roles can do.
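The core of the evaluation order (explicit deny, then explicit allow, then implicit deny) can be sketched as a toy evaluator. It is heavily simplified: `evaluate` is a hypothetical function that ignores principals, conditions, permissions boundaries, and organisation-level policies, and it uses case-sensitive glob matching as a stand-in for IAM's wildcard rules.

```python
from fnmatch import fnmatchcase


def _as_list(value) -> list:
    return value if isinstance(value, list) else [value]


def evaluate(statements: list, action: str, resource: str) -> str:
    """Simplified evaluation: explicit Deny > explicit Allow > implicit deny."""
    decision = "ImplicitDeny"
    for stmt in statements:
        action_hit = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_hit = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if not (action_hit and resource_hit):
            continue
        if stmt["Effect"] == "Deny":
            return "ExplicitDeny"  # a matching Deny ends evaluation
        decision = "Allow"
    return decision
```

Given an Allow on dynamodb:* plus a Deny on dynamodb:DeleteTable, GetItem evaluates to Allow, DeleteTable to ExplicitDeny, and any S3 action to ImplicitDeny because nothing allows it.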
Concrete example

A Lambda in account A needs to read DynamoDB in account B. Design: in account B, create an IAM role with dynamodb:GetItem on the target table; its trust policy allows the Lambda execution role from account A, and that execution role is allowed sts:AssumeRole on the account B role. The Lambda code calls STS AssumeRole at startup (cached for ~50min, since a chained session lasts at most 1 hour) and uses the temporary credentials in a DynamoDB client. Alternative: a resource-based policy on the DynamoDB table itself. It is simpler, but a table policy does not cover the table's backups, and a stream needs its own policy.
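The "assume once, reuse for about 50 minutes" idea can be sketched as a small cache around whatever function fetches the credentials. Everything here is illustrative: `CachedCredentials` and `Credentials` are hypothetical names, and the fetch callable is assumed to wrap an sts:AssumeRole call that returns a one-hour session.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expires_at: float  # epoch seconds


class CachedCredentials:
    """Re-use assumed-role credentials until shortly before they expire."""

    def __init__(self,
                 fetch: Callable[[], Credentials],
                 refresh_margin: float = 600.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch            # assumed to wrap sts:AssumeRole
        self._margin = refresh_margin  # refresh 10 min before expiry
        self._clock = clock            # injectable for tests
        self._current: Optional[Credentials] = None

    def get(self) -> Credentials:
        now = self._clock()
        expired = (self._current is None
                   or now >= self._current.expires_at - self._margin)
        if expired:
            self._current = self._fetch()
        return self._current
```

Keeping the cache at module scope lets warm Lambda invocations share one session, so STS is called roughly once per 50 minutes per execution environment rather than once per request.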

Key takeaway: execution role for in-account permissions, AssumeRole + trust policy for cross-account. Resource-based policies enable cross-account without role chaining where supported. Deny always wins; explicit allow always required.
⚡ Mini-quiz
Practise role/trust/cross-account scenarios → quick quiz (5 questions).
Lesson 5.3 KMS, Secrets Manager, Parameter Store, and presigned URLs

Encryption keys and secrets management round out the security domain. The exam distinguishes Secrets Manager from Parameter Store routinely, and asks about KMS envelope encryption mechanics whenever a workflow involves "encrypt this 500 MB file".

Key concepts
  • KMS envelope encryption: the KMS Encrypt API accepts at most 4 KB of plaintext, so a KMS key can't encrypt large objects directly. Instead: call GenerateDataKey, which returns a plaintext data key AND an encrypted copy of that data key. Encrypt your large blob locally with the plaintext data key, discard the plaintext key, and store the encrypted blob + the encrypted data key together. To decrypt: call Decrypt on the encrypted data key, then use the returned plaintext data key to decrypt the blob.
  • Customer-Managed Keys (CMK) vs AWS-managed keys vs AWS-owned keys: CMK is the only one where you control the key policy, rotation, deletion. CMK comes in symmetric (default), asymmetric (RSA/ECC), and HMAC variants.
  • Secrets Manager: rotates secrets automatically via a Lambda rotation function. Built-in rotation for RDS, DocumentDB, Redshift, Redshift Serverless. Higher cost than Parameter Store. Use whenever automatic rotation matters.
  • SSM Parameter Store SecureString: stores secrets encrypted with KMS. Free (Standard tier, up to 4 KB per parameter), no built-in rotation. Use for config that rotates manually or rarely (DB endpoints, feature flags, ENV-prefixed values).
  • S3 presigned URLs: a URL that carries a signature (made with the signer's credentials) and a short expiry, allowing the holder to GET or PUT a specific S3 object without AWS credentials of their own. The URL can do only what the signer is permitted to do. Common pattern: client calls Lambda → Lambda generates presigned URL → client uploads directly to S3, bypassing Lambda payload limits (6 MB sync, 256 KB async).
  • S3 server-side encryption options: SSE-S3 (AWS-managed key, AES256), SSE-KMS (KMS key, auditable in CloudTrail, more expensive because each object operation calls KMS unless S3 Bucket Keys are enabled), SSE-C (you provide the key on every request; rare).
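The envelope-encryption flow can be walked through end to end with stand-ins. This sketch is NOT secure and does not call AWS: `ToyKms` is a hypothetical in-process substitute for KMS, and the XOR keystream cipher is a toy replacement for a real cipher such as AES-GCM. Only the sequence of steps mirrors the real pattern.

```python
import hashlib
import hmac
import os


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """TOY cipher (XOR with an HMAC-SHA256 counter keystream). Not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hmac.new(key, offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyKms:
    """Stand-in for KMS: the master key never leaves this object."""

    def __init__(self):
        self._master_key = os.urandom(32)

    def generate_data_key(self):
        """Return (plaintext data key, encrypted copy of the data key)."""
        plaintext_key = os.urandom(32)
        return plaintext_key, _keystream_xor(self._master_key, plaintext_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        return _keystream_xor(self._master_key, encrypted_key)


def envelope_encrypt(kms: ToyKms, blob: bytes) -> dict:
    plaintext_key, encrypted_key = kms.generate_data_key()
    ciphertext = _keystream_xor(plaintext_key, blob)  # bulk work stays local
    del plaintext_key                                 # keep only the wrapped key
    return {"ciphertext": ciphertext, "encrypted_data_key": encrypted_key}


def envelope_decrypt(kms: ToyKms, envelope: dict) -> bytes:
    plaintext_key = kms.decrypt(envelope["encrypted_data_key"])
    return _keystream_xor(plaintext_key, envelope["ciphertext"])
```

Only the 32-byte data key ever crosses the KMS boundary, which is why the size of the blob does not matter to the key service.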
Concrete example

A user uploads a 500 MB report through a web app. The Lambda backend can't accept 500 MB payloads. Design: Lambda exposes an API that, on request, calls S3.getSignedUrl('putObject', ...) with a 15-minute expiry and returns the URL to the client. The browser PUTs the file directly to S3. S3 is configured with SSE-KMS using a CMK, so the role that signs the URL needs kms:GenerateDataKey on that key. After upload, the S3 ObjectCreated event triggers a downstream Lambda. To read the report, that Lambda needs s3:GetObject plus kms:Decrypt on the CMK; S3 performs the envelope decryption server-side and returns plaintext, so the function never handles the data key itself.

Key takeaway: envelope encryption is the standard pattern for >4 KB plaintext. Secrets Manager for auto-rotated DB creds; Parameter Store SecureString for cheaper static secrets. Presigned URLs bypass Lambda payload limits for large uploads/downloads.
⚡ Mini-quiz
Practise KMS, Secrets Manager, and presigned-URL scenarios → study mode (10 questions).
06

CI/CD: CodeCommit, CodeBuild, CodeDeploy & CodePipeline · 3 lessons

The deployment domain (24%) focuses on AWS-native CI/CD. Learn CodePipeline stage architecture (Source → Build → Deploy), CodeBuild buildspec.yml phases, compute types, and artifact caching, CodeDeploy deployment strategies for EC2/ECS/Lambda (AllAtOnce, Rolling, Blue/Green, Canary, Linear), AppSpec hooks (BeforeInstall, AfterInstall, ValidateService), CloudWatch alarm rollback, and CodeArtifact for private package management.

codepipeline-stages codebuild-buildspec codedeploy-strategies blue-green canary-linear appspec-hooks codeartifact rollback
~5h
Lesson 6.1 CodePipeline and CodeBuild — pipeline anatomy

CodePipeline orchestrates the source → build → deploy flow. CodeBuild executes the actual build. Together they're AWS's native CI alternative to GitHub Actions / GitLab CI / Jenkins. Exam questions focus on stages, actions, and where artifacts flow.

Key concepts
  • Pipeline anatomy: stages run sequentially; actions within a stage can run in parallel or sequentially. Each action has an input artifact and an output artifact (S3-backed). Common stages: Source → Build → Test → Deploy. Stages can include manual approval actions.
  • Sources: CodeCommit (deprecated for new accounts), GitHub, Bitbucket, GitLab, S3, ECR (for container builds). Pushes to a connected repo trigger the pipeline; CodeStar Connections (since renamed AWS CodeConnections) handles the OAuth.
  • CodeBuild buildspec.yml: phases are install (runtime + tools), pre_build (login, fetch deps), build (compile/test), post_build (push artifacts). artifacts section names what to upload. cache.paths caches dependency dirs across builds — major time savings for npm/maven.
  • CodeBuild compute types: Small (3 GB / 2 vCPU) → Large (15 GB / 8 vCPU) → 2XLarge (145 GB / 72 vCPU). Pay per minute, billed in 1-min increments. Use Arm Graviton compute for ~20% cost savings when your build supports it.
  • Environment variables and Parameter Store / Secrets Manager: CodeBuild buildspec can reference SSM Parameter Store or Secrets Manager secrets directly — no need to bake credentials into the project. Use parameter-store or secrets-manager keys in the env section.
  • CodeArtifact: managed package repo for npm, Maven, PyPI, NuGet. Supports upstream proxies to public registries with caching. Use to enforce internal package use, mitigate supply-chain attacks, and lock versions across teams.
Concrete example

A Node.js Lambda needs a pipeline: GitHub → build (with a cached npm download cache) → deploy via CloudFormation. Design: CodePipeline with three stages. Source: GitHub via CodeStar connection. Build: CodeBuild project with buildspec phases (install selects Node 20 through runtime-versions, pre_build runs npm ci, build runs npm test + sam build, artifacts publish the .aws-sam directory). Cache the npm cache directory (for example /root/.npm) in S3 rather than node_modules, because npm ci deletes node_modules before installing. Deploy: CloudFormation action with the SAM-built template. Manual approval before the prod deploy stage. Total pipeline runtime: ~3 minutes after cache warm-up.

Key takeaway: CodePipeline orchestrates, CodeBuild executes. buildspec phases + cache.paths are the leverage points. Parameter Store / Secrets Manager keep secrets out of buildspec. CodeArtifact for private packages and supply-chain control.
⚡ Mini-quiz
Practise pipeline + buildspec scenarios → study mode (10 questions).
Lesson 6.2 CodeDeploy strategies — Lambda, ECS, EC2

CodeDeploy automates the deploy phase. The exam tests deployment strategies (how traffic shifts) and AppSpec hooks (what runs at each phase). Get the strategy wrong and you either ship too fast or take down production.

Key concepts
  • Lambda strategies: AllAtOnce (instant cut-over, riskiest), Canary (X% for N minutes, then 100%), Linear (X% every N minutes, e.g., 10% every 10 min). CodeDeploy shifts traffic between two function versions by adjusting the routing weights on a single alias.
  • ECS strategies: Rolling (replace tasks in batches — supports minimumHealthyPercent / maximumPercent), Blue/Green (provision new task set, swap target group at the ALB). Blue/Green supports the same Canary/Linear shift percentages as Lambda.
  • EC2/on-prem strategies: In-place (update instances one at a time or in batches; needs the CodeDeploy agent on the instance), Blue/Green (provision new ASG, swap behind ALB). Prefer Blue/Green for prod: with in-place, rollback means redeploying the previous revision onto the same instances, which is slow and can leave the fleet in a mixed state if a deployment fails midway.
  • AppSpec hooks (Lambda): BeforeAllowTraffic (run smoke tests before any traffic shifts), AfterAllowTraffic (validation after the shift completes). Each hook is itself a Lambda that reports Succeeded or Failed back to CodeDeploy by calling PutLifecycleEventHookExecutionStatus.
  • AppSpec hooks (EC2): ApplicationStop, BeforeInstall, AfterInstall, ApplicationStart, ValidateService. Scripts in the appspec.yml run on the target instance via the CodeDeploy agent.
  • Automatic rollback: CodeDeploy can roll back when a CloudWatch alarm fires during deployment. Configure the deployment group with the alarm; if it transitions to ALARM during the shift window, CodeDeploy reverses the alias weight back to the old version. Set this up — manual rollback during an incident is too slow.
Concrete example

Production Lambda deployment with safety: deployment group uses Canary10Percent10Minutes. BeforeAllowTraffic hook runs an integration test against the new alias before any production traffic is shifted. AfterAllowTraffic hook runs a smoke test against the production alias after 100% shift. CloudWatch alarm: error rate > 1% on the new alias over 5 minutes. If the alarm fires during the 10-min canary window, CodeDeploy auto-rolls back to v(N-1). Combined with the SAM DeploymentPreference in template, this gives full GitOps deploys with no manual intervention.
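The difference between canary and linear shifting is easiest to see as a timeline of alias weights. The sketch below models the schedules only; it does not call CodeDeploy, the function names are hypothetical, and it assumes the first increment is applied at minute 0.

```python
def canary_schedule(first_percent: int, wait_minutes: int) -> list:
    """(minute, percent on the new version) for a two-step canary."""
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent: int, interval_minutes: int) -> list:
    """Equal increments every interval until 100% is reached."""
    schedule, percent, minute = [], 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


def weight_at(schedule: list, minute: int) -> int:
    """Traffic percentage on the new version at a given minute."""
    current = 0
    for start, percent in schedule:
        if minute >= start:
            current = percent
    return current
```

Under these assumptions, a 10%/10-minute canary exposes at most 10% of traffic during the alarm window and completes at minute 10, while a 10%-every-10-minutes linear shift only reaches 100% at minute 90. The trade-off is a wider blast radius late in the linear rollout against a longer time to full deployment.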

Key takeaway: Lambda uses alias weight shifts (Canary/Linear/AllAtOnce). ECS supports Blue/Green via target-group swap. EC2 prefers Blue/Green over in-place. Always configure alarm-based automatic rollback — fast detection is the whole point of canary.
⚡ Mini-quiz
Drill CodeDeploy strategy decisions → quick quiz (5 questions).
Lesson 6.3 Blue/Green at the load balancer — ECS and EC2 patterns

Blue/Green deployments for ECS and EC2 use load-balancer-level target-group swaps. The mental model is two parallel environments (blue = current, green = new) where CodeDeploy decides when to flip. Exam scenarios test whether you understand which traffic goes where during each phase.

Key concepts
  • Two target groups: the ALB has two target groups (TG-blue and TG-green) registered on different listener rules. CodeDeploy controls which TG receives production traffic (the "production listener") and which receives test traffic (the "test listener", optional).
  • Deployment flow (ECS): CodeDeploy creates the green task set, registers tasks into TG-green, runs your AppSpec test hook against the test listener (if configured), then shifts production listener from blue to green. After a configurable wait window, terminates the blue task set.
  • Original revision termination: by default 5 minutes after successful shift. Tunable up to many hours — increase if you want a manual gate before tearing down the old environment.
  • Test listener: optional separate port (typically 8080) on the ALB pointing at TG-green. Lets validation hooks call the new version with real traffic-shape requests before production users see it.
  • EC2 Blue/Green: CodeDeploy provisions a new ASG with the new version, registers its instances into TG-green, and swaps. The old ASG can be retained (manual decommission) or destroyed automatically. Faster rollback than in-place — original ASG is still running.
  • Cost vs safety trade-off: Blue/Green doubles infrastructure for the duration of the deploy window. For small services that cost is trivial. For massive ECS clusters, factor it into the deploy plan.
Concrete example

An ECS service runs 20 tasks behind an ALB. Deployment goal: shift to a new image with a 5-minute validation window, automatic rollback on 5XX rate > 0.5%. Setup: CodeDeploy deployment group references the ECS service. The ALB has TG-blue (current 20 tasks) and TG-green (empty). Listener rules: prod on :443 → TG-blue; test on :8080 → TG-green. On deploy, CodeDeploy starts 20 new tasks into TG-green, routes the test listener to them, runs the AfterAllowTestTraffic hook against :8080, then shifts :443 from TG-blue to TG-green. CloudWatch alarm watches 5XX. After 5 minutes without alarm, the old task set terminates; with alarm, traffic reverts and green is torn down.

Key takeaway: Blue/Green = two target groups + listener swap. ECS Blue/Green is the standard prod deploy pattern. EC2 Blue/Green beats in-place for rollback speed. Test listener for pre-traffic validation. Tune termination wait based on how long you want to keep the rollback option alive.
⚡ Mini-quiz
Practise Blue/Green scenarios → study mode (10 questions).
07

Infrastructure as Code: CloudFormation, SAM & CDK · 3 lessons

Every AWS developer needs IaC skills. This module covers CloudFormation template anatomy (Parameters, Mappings, Conditions, Resources, Outputs), Change Sets for safe updates, cross-stack references with Exports/ImportValue, custom resources, DeletionPolicy (Retain, Snapshot, Delete), and stack failure behavior. Then SAM: Transform declaration, AWS::Serverless::Function, sam local invoke/start-api for local testing. Finally CDK: L1/L2/L3 constructs, cdk synth/deploy, and why CDK synthesizes to CloudFormation.

change-sets cross-stack-references custom-resources deletion-policy sam-transform sam-local cdk-constructs
~4h
Lesson 7.1 CloudFormation — template anatomy and Change Sets

CloudFormation is the substrate underneath every AWS IaC tool. Templates declare desired state; CloudFormation reconciles. The exam tests template sections (Parameters, Mappings, Conditions, Resources, Outputs), update mechanics (Change Sets), and resource lifecycle attributes (DeletionPolicy, UpdateReplacePolicy).

Key concepts
  • Template sections: Parameters (input values), Mappings (lookup tables — region → AMI), Conditions (boolean expressions used to gate resources), Resources (the actual AWS resources to create), Outputs (values to expose, optionally exported for cross-stack).
  • Intrinsic functions: !Ref (resource ID or parameter value), !GetAtt Resource.Attribute (specific attribute), !Sub "${var}" (string substitution), !FindInMap, !If [Condition, ValueIfTrue, ValueIfFalse], !Join.
  • Change Sets: a preview of what an update will do — added, modified, removed resources, and whether modifications require replacement. Create a Change Set; review; Execute (or Discard). Best practice: always create a Change Set for production stack updates instead of direct UpdateStack.
  • Cross-stack references: a stack can Export outputs (export names must be unique per account and region); other stacks ImportValue them. The downside: once an export is imported, the producing stack can't change or delete that output. For loose coupling, publish the value to SSM Parameter Store and read it from there instead.
  • Lifecycle attributes: DeletionPolicy: Retain keeps a resource on stack delete (databases, S3 buckets with data); Snapshot creates a final snapshot first (RDS, EBS); Delete (default). UpdateReplacePolicy applies when a property change forces resource replacement.
  • Stack failure behaviour: default rollback on failure — undo any resources created in the failed update. DisableRollback keeps the partial state for debugging. Failed-create stacks must be DeleteStack'd before retry; failed-update stacks can be retried after fixing the template.
Concrete example

A stack defines an RDS instance and a Lambda that connects to it. Risk: an update that changes a replacement-triggering property (DBInstanceIdentifier or DBName, for example; a DBInstanceClass change only causes an interruption) makes CloudFormation create a new instance and delete the old one, losing data. Mitigations: set DeletionPolicy: Snapshot AND UpdateReplacePolicy: Snapshot on the RDS resource. Before deploying any change touching the DB, create a Change Set to preview replacement actions. For the Lambda's DB endpoint reference, prefer Parameter Store (the Lambda reads /myapp/db-endpoint at runtime) over CloudFormation ImportValue, which decouples future RDS stack rebuilds from the Lambda stack.
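A template carrying both lifecycle attributes might look like the sketch below, expressed as a Python dict that serialises to a JSON CloudFormation template. Logical IDs, property values, and the `unprotected` lint helper are hypothetical; the lint simply flags stateful resources that lack either attribute.

```python
import json

STATEFUL_TYPES = {"AWS::RDS::DBInstance", "AWS::DynamoDB::Table", "AWS::S3::Bucket"}
REQUIRED_ATTRIBUTES = {"DeletionPolicy", "UpdateReplacePolicy"}


def rds_template() -> dict:
    """Minimal template with lifecycle protection on the database."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AppDatabase": {
                "Type": "AWS::RDS::DBInstance",
                "DeletionPolicy": "Snapshot",       # applies on stack delete
                "UpdateReplacePolicy": "Snapshot",  # applies on replacement
                "Properties": {
                    "DBInstanceClass": "db.t3.micro",
                    "Engine": "postgres",
                    "AllocatedStorage": "20",
                    "MasterUsername": "appadmin",
                    "ManageMasterUserPassword": True,
                },
            }
        },
    }


def unprotected(template: dict) -> list:
    """Logical IDs of stateful resources missing a lifecycle attribute."""
    return sorted(
        name
        for name, res in template["Resources"].items()
        if res["Type"] in STATEFUL_TYPES and not REQUIRED_ATTRIBUTES <= set(res)
    )


def to_json(template: dict) -> str:
    return json.dumps(template, indent=2)
```

A check like unprotected could run in CI next to the Change Set preview, so a stateful resource without protection is caught before anyone reviews replacement actions.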

Key takeaway: Change Sets are the safety net for prod updates. DeletionPolicy + UpdateReplacePolicy protect stateful resources. Cross-stack Export/ImportValue creates tight coupling — use Parameter Store for loosely coupled references.
⚡ Mini-quiz
Drill template structure and lifecycle scenarios → study mode (10 questions).
Lesson 7.2 SAM — serverless-focused CloudFormation

SAM (Serverless Application Model) is CloudFormation with shorthand for Lambda, API Gateway, and DynamoDB. Templates start with Transform: AWS::Serverless-2016-10-31; SAM CLI handles local testing and packaging. The exam asks about the SAM-specific resources and the local-dev workflow.

Key concepts
  • Transform directive: Transform: AWS::Serverless-2016-10-31 at the top of the template tells CloudFormation to expand SAM shorthand into native resources during deployment. AWS::Serverless::Function with a few lines expands to a full Lambda function + execution role + event source wiring (and a version and alias when AutoPublishAlias is set).
  • AWS::Serverless::Function: the SAM Lambda resource. Properties: CodeUri, Handler, Runtime, Events (Api, S3, SQS, SNS, EventBridgeRule, Schedule). Permissions go in the Policies property, which accepts AWS-managed policy names, ARNs, or inline policy docs.
  • AWS::Serverless::Api: SAM API Gateway resource. Supports auth, CORS, stage variables. Often you don't need to declare it explicitly — declaring an Api event on a Function auto-creates a default API.
  • SAM CLI commands: sam build (bundles dependencies into .aws-sam/build), sam local invoke (run Lambda locally in Docker), sam local start-api (run API Gateway + Lambdas locally), sam deploy (package + deploy via CloudFormation).
  • DeploymentPreference: a property of AWS::Serverless::Function that wires in CodeDeploy; it requires AutoPublishAlias so there is an alias to shift. Type: Canary10Percent10Minutes, Type: Linear10PercentEvery10Minutes, Type: AllAtOnce. Add an Alarms list to auto-rollback if any alarm fires during the deploy.
  • SAM packaging: sam package uploads artifacts to S3 and outputs a deploy-ready template with S3 URIs. Used by CodePipeline-driven flows where a build step runs sam package and the deploy step uses the output template directly.
Concrete example

A team prototypes a new API locally. sam local start-api runs API Gateway + Lambdas on localhost:3000, executing the same code that will run in AWS. They iterate, then commit. CI runs sam build + sam deploy --no-confirm-changeset for staging, with settings read from samconfig.toml (sam deploy --guided is interactive and belongs to the first manual run). The template's Function has DeploymentPreference: Type: Canary10Percent10Minutes with a 5XX-rate alarm, so every prod deploy is auto-canary'd. This cuts iteration time compared with deploying to test every change, and adds deployment safety on top.
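A function definition with a canary preference could be sketched as follows, again as a Python dict that serialises to a JSON SAM template. The logical IDs, handler, runtime, and path are placeholders, the alarm referenced by `ApiErrorAlarm` is assumed to be defined elsewhere in the template, and `missing_alias` is a hypothetical lint for functions that set DeploymentPreference without AutoPublishAlias.

```python
import json


def sam_template(alarm_logical_id: str = "ApiErrorAlarm") -> dict:
    """Minimal SAM template: one API-backed function with a canary deploy."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": {
            "OrdersFunction": {
                "Type": "AWS::Serverless::Function",
                "Properties": {
                    "CodeUri": "src/",
                    "Handler": "app.handler",
                    "Runtime": "python3.12",
                    "AutoPublishAlias": "live",  # alias CodeDeploy shifts
                    "DeploymentPreference": {
                        "Type": "Canary10Percent10Minutes",
                        # Alarm resource assumed to exist in the same template.
                        "Alarms": [{"Ref": alarm_logical_id}],
                    },
                    "Events": {
                        "GetOrders": {
                            "Type": "Api",
                            "Properties": {"Path": "/orders", "Method": "get"},
                        }
                    },
                },
            }
        },
    }


def missing_alias(template: dict) -> list:
    """Functions with a DeploymentPreference but no AutoPublishAlias."""
    return sorted(
        name
        for name, res in template["Resources"].items()
        if res["Type"] == "AWS::Serverless::Function"
        and "DeploymentPreference" in res["Properties"]
        and "AutoPublishAlias" not in res["Properties"]
    )
```

Writing json.dumps(sam_template()) to template.json would give the build step a file that sam build and sam deploy can consume, assuming the referenced alarm is added.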

Key takeaway: SAM = CloudFormation shorthand for serverless. sam local is the productivity lever for iteration. DeploymentPreference wires CodeDeploy without writing CodeDeploy resources manually.
⚡ Mini-quiz
Practise SAM template + local-dev scenarios → quick quiz (5 questions).
Lesson 7.3 CDK — IaC in real programming languages

CDK lets you define infrastructure in TypeScript, Python, Java, C#, or Go. The CDK compiler (cdk synth) produces a CloudFormation template that gets deployed. You get types, autocomplete, loops, and unit tests — but the substrate is still CloudFormation. The exam tests construct levels and the synth/deploy flow.

Key concepts
  • Construct levels: L1 = raw CloudFormation resources (Cfn* classes), 1:1 mapping. L2 = curated AWS abstractions (e.g., Bucket, Function) with sensible defaults and fewer required props. L3 = patterns (e.g., ApplicationLoadBalancedFargateService), opinionated combos of L2 constructs solving a common problem.
  • App, Stack, Construct hierarchy: a CDK App contains one or more Stacks; a Stack contains Constructs. Each Stack synthesises to one CloudFormation stack. Use one Stack per deployable unit / environment.
  • cdk commands: cdk synth outputs CloudFormation. cdk diff compares your code's synth output to the deployed stack. cdk deploy deploys; cdk destroy tears down. cdk bootstrap creates the CDKToolkit stack (the S3 bucket and IAM roles CDK uses) — once per account/region.
  • Context and environment: the env property on a Stack ties it to a specific account and region. Without it, the Stack is environment-agnostic but can't use environment-specific Lookups. CDK pulls account-specific data (AMIs, VPCs) into cdk.context.json for reproducibility.
  • Assets: CDK auto-uploads Lambda code, Docker images, and arbitrary files referenced in your code to the bootstrap S3 bucket / ECR. No manual sam package step.
  • Testing: CDK supports unit testing of synthesised templates via the aws-cdk-lib/assertions module — assert "the stack contains an S3 bucket with versioning enabled" without deploying.
Concrete example

A team writes a CDK stack in TypeScript that creates a Lambda (from ./src), an API Gateway HTTP API integration, and a DynamoDB table. The stack uses L2 constructs: new lambda.Function(...), new apigatewayv2.HttpApi(...), new dynamodb.Table(...). A unit test asserts the table has BillingMode: PAY_PER_REQUEST. CI runs cdk synth to produce CloudFormation, then cdk deploy to push. cdk diff in the PR review shows exactly what will change before merging.

Key takeaway: CDK = programming-language IaC that compiles to CloudFormation. L1 raw / L2 curated / L3 patterns. Bootstrap once per env. Use cdk diff in CI for safety. Same deployment substrate (CloudFormation) — same rollback / change-set guarantees.
⚡ Mini-quiz
Test CDK construct-level and command scenarios → study mode (10 questions).
08

Observability & Optimization: X-Ray, CloudWatch & Step Functions · 3 lessons

The troubleshooting domain (18%) demands deep observability skills. Master AWS X-Ray: active tracing, segments/subsegments, annotations (indexed, filterable) vs metadata (not indexed), sampling rules, and capturing AWS SDK calls with captureAWSv3Client. CloudWatch: Lambda standard metrics (Duration, Errors, Throttles, ConcurrentExecutions), custom metrics via PutMetricData, Log Insights query syntax, and alarms for automated rollbacks. Step Functions: state machine design patterns (sequential, parallel, wait, error catch/retry), Express vs Standard workflows, and when to use Step Functions vs SQS for orchestration.

x-ray-annotations x-ray-sampling cloudwatch-metrics custom-metrics log-insights step-functions parallel-state error-handling
~3h
📖 Read in-depth chapter
Lesson 8.1 X-Ray — distributed tracing on AWS

X-Ray traces a request across all the AWS services it touches — API Gateway → Lambda → DynamoDB → SNS — so you can find the slow segment without guessing. The exam tests annotations vs metadata, sampling rules, and which services support native integration.

Key concepts
  • Segments and subsegments: a segment is one service's contribution to the trace (one Lambda invocation, one API Gateway request). Subsegments are nested operations (an SDK call, a DB query). Segments aggregate into a trace identified by a trace ID propagated end-to-end.
  • Active tracing: enable on Lambda and API Gateway via a single config flag (or SAM property). The runtime auto-captures segments + AWS SDK subsegments. No code change.
  • Annotations vs metadata: annotations are key/value pairs that X-Ray indexes, so they are filterable in the console (annotation.user = "alice"). X-Ray indexes up to 50 annotations per trace. Metadata is unindexed: visible in the trace UI but not searchable. Use annotations for facets (userId, region, route) and metadata for debug context (request body, headers).
  • captureAWSv3Client / captureHTTPSGlobal: SDK helpers that wrap AWS clients and Node's https module so all outgoing calls automatically generate subsegments. Without them, your downstream calls appear as black boxes.
  • Sampling rules: default rule is 1 req/sec + 5% of additional reqs. Custom rules (priority-ordered) match by service name, http method, URL — set higher sample rates for low-traffic endpoints, low rates for chatty ones. Sampling decisions made at the entry point and propagated.
  • ServiceLens and Insights: CloudWatch ServiceLens combines X-Ray traces with metrics, logs, and alarms in a service map view. X-Ray Insights surfaces anomaly-based notifications when error rates or latency suddenly deviate from baseline.
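The default sampling rule described in the list above (a reservoir of 1 request per second plus 5% of the rest) can be turned into a quick estimate of trace volume. This is a rough sketch of the arithmetic only; the function name and the traffic figures are illustrative assumptions.

```typescript
// Expected number of sampled requests per second under a single sampling rule.
// reservoir: requests per second that are always sampled.
// rate: fraction of the remaining requests that are sampled.
function expectedSampledPerSecond(rps: number, reservoir = 1, rate = 0.05): number {
  const fromReservoir = Math.min(rps, reservoir);
  const fromRate = Math.max(0, rps - reservoir) * rate;
  return fromReservoir + fromRate;
}

// A quiet endpoint at 1 req/sec is fully traced by the reservoir alone.
const quietEndpoint = expectedSampledPerSecond(1);

// A busy endpoint at 201 req/sec: 1 from the reservoir plus 5% of the other 200.
const busyEndpoint = expectedSampledPerSecond(201);

// A custom rule for a low-traffic admin route might sample everything (rate = 1).
const adminRoute = expectedSampledPerSecond(3, 1, 1);
```

The estimate makes the cost trade-off visible: raising the rate on a chatty endpoint scales trace volume linearly with traffic, while the reservoir guarantees some coverage on quiet ones.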
Concrete example

An API has intermittent p99 latency spikes. Enable X-Ray active tracing on API Gateway + Lambda. In the Lambda code, const AWSXRay = require('aws-xray-sdk-core'); wrap the AWS SDK clients; then open a subsegment and call subsegment.addAnnotation('userId', event.requestContext.authorizer.userId) on it (in Lambda the function's own segment is managed by the service, so annotations go on subsegments). After a day of traffic, query the trace UI: filter by responsetime > 1 AND annotation.userId = "X". Service map shows DynamoDB Query taking 800ms of the 1.2s response. Now you have something to optimise.
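To make the annotation/metadata split in this example concrete, here is a small sketch that separates indexed facets from debug context and builds the filter expression used above. It does not call the X-Ray SDK; buildSubsegmentFields and slowTracesFor are hypothetical helpers, and the filter string simply mirrors the form quoted in this lesson.

```typescript
// Annotation values are scalars; metadata can hold arbitrary objects.
type AnnotationValue = string | number | boolean;

interface SubsegmentFields {
  name: string;
  annotations: Record<string, AnnotationValue>; // indexed: usable in filter expressions
  metadata: Record<string, unknown>; // not indexed: shown in the trace only
}

// Annotation keys should be alphanumeric (underscore allowed) to work with filters.
const ANNOTATION_KEY = /^[A-Za-z0-9_]+$/;

function buildSubsegmentFields(
  name: string,
  facets: Record<string, AnnotationValue>,
  debugContext: Record<string, unknown>,
): SubsegmentFields {
  for (const key of Object.keys(facets)) {
    if (!ANNOTATION_KEY.test(key)) {
      throw new Error(`annotation key "${key}" must be alphanumeric or underscore`);
    }
  }
  return { name, annotations: { ...facets }, metadata: { ...debugContext } };
}

// Builds a filter expression matching slow traces for one annotated user.
function slowTracesFor(userId: string, seconds: number): string {
  return `responsetime > ${seconds} AND annotation.userId = "${userId}"`;
}

const fields = buildSubsegmentFields(
  "load-order",
  { userId: "X", route: "GET_orders" },
  { requestHeaders: { accept: "application/json" } },
);
const filter = slowTracesFor("X", 1);
```

Keeping facets small and scalar is the design point: anything you want to search on goes in annotations, and bulky payloads stay in metadata.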

Key takeaway: X-Ray active tracing + annotations on key facets + SDK capture turns "the API is slow sometimes" into "this Query against table X is slow for users in cohort Y". Sampling rules keep cost in check.
⚡ Mini-quiz
Drill X-Ray annotation, sampling, and capture scenarios → study mode (10 questions).
Lesson 8.2 CloudWatch — metrics, logs, alarms, Log Insights

CloudWatch is the operational telemetry plane — metrics for time series, Logs for searchable text, alarms for actions. The exam asks both about default Lambda metrics and about Log Insights query syntax.

Key concepts
  • Lambda standard metrics: Invocations, Errors, Duration (avg / max / p50 / p99), Throttles (concurrency-limit denials), ConcurrentExecutions, UnreservedConcurrentExecutions, ProvisionedConcurrencyUtilization. Free metrics, 1-minute granularity.
  • Custom metrics: PutMetricData writes custom metric values. Cost ~$0.30 per metric per month. Embed dimensions (function name, region) but keep dimension cardinality bounded — high cardinality = many metrics = expensive.
  • Embedded Metric Format (EMF): write logs in a special JSON schema; CloudWatch auto-parses them into metrics. No PutMetricData API call, so there are no per-request API charges or added call latency; you pay for log ingestion plus the usual custom-metric charge.
  • Alarms: evaluate a metric (or metric math expression) over N periods. Options include static thresholds, anomaly detection, and missing-data treatment (missing / notBreaching / breaching / ignore). Metric math combines metrics into one series, e.g. m1 / m2 * 100 for an error rate, and the alarm threshold (say, > 5) applies to the result.
  • Log Insights query syntax: pipe-style. fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 50. Aggregations: stats count() by bin(5m), stats avg(@duration) by @log. Queries can be saved and added to dashboards.
  • Lambda Insights: opt-in deep telemetry (memory utilisation, network throughput, runtime details) via a managed layer. Adds ~$0.20/M invocations. Use when standard metrics aren't enough to diagnose performance.
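The Embedded Metric Format bullet above is easier to remember with a concrete log line. The sketch below builds one EMF document as a JSON string; the namespace, dimension, and metric names are made-up examples, and buildEmfLine is a hypothetical helper rather than part of any AWS library.

```typescript
interface EmfMetric {
  name: string;
  unit: string; // a CloudWatch unit such as "Milliseconds" or "Count"
  value: number;
}

// Builds one EMF log line: the _aws block declares which top-level keys are
// metrics and which are dimensions; the values sit beside it at the root.
function buildEmfLine(
  namespace: string,
  dimensions: Record<string, string>,
  metrics: EmfMetric[],
  timestampMs: number,
): string {
  const doc: Record<string, unknown> = {
    _aws: {
      Timestamp: timestampMs,
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [Object.keys(dimensions)],
          Metrics: metrics.map((m) => ({ Name: m.name, Unit: m.unit })),
        },
      ],
    },
    ...dimensions,
  };
  for (const m of metrics) {
    doc[m.name] = m.value;
  }
  return JSON.stringify(doc);
}

// Hypothetical metric for a checkout function; the timestamp is fixed for clarity.
const emfLine = buildEmfLine(
  "OrdersService",
  { FunctionName: "checkout" },
  [{ name: "PaymentLatency", unit: "Milliseconds", value: 182 }],
  1700000000000,
);
```

Inside a Lambda function, writing this string to standard output is enough for it to land in CloudWatch Logs, where the metric is extracted asynchronously.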
Concrete example

A Lambda has a rising error rate. Investigation: the Log Insights query filter @message like /ERROR/ | stats count() by bin(5m) shows the spike's timeline. Drill in: filter @message like /ERROR/ | parse @message "error: *" as err | stats count() by err groups the errors by error string. The top error is "Throttling", so DynamoDB is throttling the Lambda's requests. Confirm with the DynamoDB ThrottledRequests and ReadThrottleEvents / WriteThrottleEvents metrics (the Lambda Throttles metric only counts concurrency-limit rejections of the function itself). Fix: switch DynamoDB to on-demand mode, and set an alarm on the error rate (Errors / Invocations > 5%) for future regressions.

Key takeaway: Lambda standard metrics are free and rich. Custom metrics via EMF cost less than PutMetricData at scale. Log Insights' pipe syntax answers "what's failing and how often" in 30 seconds. Alarms turn metrics into actions.
⚡ Mini-quiz
Practise metric, log-insights, and alarm scenarios → quick quiz (5 questions).
Lesson 8.3 Step Functions — orchestration patterns

Step Functions orchestrate Lambdas (and other AWS services) into state machines. They handle retries, parallel execution, human-approval waits, and cross-service rollback through compensating actions (the Saga pattern). The exam asks when to use Step Functions vs SQS vs direct Lambda invocation.

Key concepts
  • State machines: Amazon States Language (ASL) JSON defines states. State types: Task (invoke a Lambda / service), Choice (branch), Parallel (run branches concurrently), Map (iterate over a collection), Wait (pause for time/timestamp), Pass (no-op transformation), Succeed, Fail.
  • Standard vs Express workflows: Standard is durable, runs up to 1 year, executes exactly once, and costs $25/M state transitions. Express is high-throughput (100k/sec), has a 5-minute max, runs at-least-once when asynchronous (at-most-once when synchronous), and is billed by duration + memory + count. Express for high-volume short-lived; Standard for long-running workflows.
  • Retry and Catch: per-state Retry handles transient failures with exponential backoff. Catch matches specific error codes and routes to a different state. Cleaner than wrapping every Lambda in retry logic.
  • Service integrations: three patterns. Request Response (call service, get response, move on); Run a Job (.sync) (call service, wait for completion before the next state; supports Batch jobs, Glue, ECS tasks); Wait for Callback (.waitForTaskToken) (pause until an external system calls back with a task token; for human approvals, third-party webhooks).
  • Saga pattern: a transaction across multiple services modeled as a sequence of forward steps each with a compensating-undo step. If step 3 fails, run undo-step-2 and undo-step-1 in reverse order. Step Functions Parallel + Catch routes are the natural fit.
  • When NOT to use Step Functions: simple "Lambda calls Lambda" doesn't need orchestration overhead — just invoke directly. High-volume fire-and-forget event processing — SNS or SQS is cheaper. Step Functions cost adds up at hundreds-of-thousands-per-day workflows.
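The exponential backoff mentioned under Retry and Catch follows from three Retry fields: IntervalSeconds, MaxAttempts, and BackoffRate. A short sketch of the resulting wait schedule is below; it ignores the optional MaxDelaySeconds and jitter settings, and retryDelays is a hypothetical helper for illustration.

```typescript
// The three Retry fields that determine the wait between attempts.
interface Retrier {
  IntervalSeconds: number; // wait before the first retry
  MaxAttempts: number; // number of retries, not counting the original attempt
  BackoffRate: number; // multiplier applied to the wait after each retry
}

// Returns the wait (in seconds) before each retry attempt.
function retryDelays(r: Retrier): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < r.MaxAttempts; attempt++) {
    delays.push(r.IntervalSeconds * Math.pow(r.BackoffRate, attempt));
  }
  return delays;
}

// With a 2-second interval, 3 attempts, and a rate of 2, waits double each time.
const delays = retryDelays({ IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2 });
// delays is [2, 4, 8]
```

Only after these retries are exhausted does a matching Catch clause take over and route to its fallback state.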
Concrete example

An order-fulfillment workflow: validate order → reserve inventory → charge card → ship order. If charge fails, release inventory. If ship fails, refund card + release inventory. Design: a Step Functions Standard workflow. Each step is a Lambda task with Retry on transient errors and Catch on business errors routing to a compensating-action branch. Inventory release and card refund themselves are Lambda tasks. A Standard workflow runs each execution exactly once, but a retried charge task can still invoke the payment Lambda more than once, so make the charge idempotent (for example, with an idempotency key) to rule out a double charge.
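One possible Amazon States Language shape for this workflow is sketched below as a TypeScript object, with a small check that every transition points at a defined state. The ARN prefix, account ID, function names, and state names are placeholders, the task and danglingTargets helpers are hypothetical, and catching States.ALL is a simplification of "business errors".

```typescript
interface RetrySpec { ErrorEquals: string[]; IntervalSeconds: number; MaxAttempts: number; BackoffRate: number }
interface CatchSpec { ErrorEquals: string[]; Next: string }
interface TaskState { Type: "Task"; Resource: string; Retry?: RetrySpec[]; Catch?: CatchSpec[]; Next?: string; End?: boolean }
interface FailState { Type: "Fail"; Error: string }
interface StateMachine { StartAt: string; States: Record<string, TaskState | FailState> }

// Placeholder ARN prefix: region, account ID, and function names are assumptions.
const FN = "arn:aws:lambda:us-east-1:123456789012:function:";

// A Lambda task that retries transient Lambda errors, then routes any remaining
// failure to `onError` (the compensating branch).
function task(fn: string, onError: string, next?: string): TaskState {
  const state: TaskState = {
    Type: "Task",
    Resource: FN + fn,
    Retry: [{
      ErrorEquals: ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
      IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2,
    }],
    Catch: [{ ErrorEquals: ["States.ALL"], Next: onError }],
  };
  if (next) state.Next = next; else state.End = true;
  return state;
}

const orderWorkflow: StateMachine = {
  StartAt: "ValidateOrder",
  States: {
    ValidateOrder: task("validate-order", "OrderFailed", "ReserveInventory"),
    ReserveInventory: task("reserve-inventory", "OrderFailed", "ChargeCard"),
    ChargeCard: task("charge-card", "ReleaseInventory", "ShipOrder"),
    ShipOrder: task("ship-order", "RefundCard"), // last forward step: End = true
    RefundCard: task("refund-card", "OrderFailed", "ReleaseInventory"),
    ReleaseInventory: task("release-inventory", "OrderFailed", "OrderFailed"),
    OrderFailed: { Type: "Fail", Error: "OrderFailed" },
  },
};

// Lists transition targets (StartAt, Next, Catch.Next) that name no defined state.
function danglingTargets(m: StateMachine): string[] {
  const targets: string[] = [m.StartAt];
  for (const s of Object.values(m.States)) {
    if (s.Type !== "Task") continue;
    if (s.Next) targets.push(s.Next);
    for (const c of s.Catch ?? []) targets.push(c.Next);
  }
  return targets.filter((t) => !(t in m.States));
}

const definitionJson = JSON.stringify(orderWorkflow, null, 2);
```

The compensating states run in reverse order of the forward steps (refund, then release), which is the Saga shape described in the key concepts.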

Key takeaway: Step Functions for multi-step workflows that need retries, branching, parallel execution, or human approvals. Standard for durability and exactly-once; Express for high volume and short workflows. Saga pattern + Catch for cross-service rollback. Don't reach for it for simple Lambda chains.
⚡ Mini-quiz
Drill Step Functions vs alternatives → study mode (10 questions).

High-frequency DVA-C02 concepts

These patterns appear repeatedly on the exam. Knowing them cold will save minutes per question.

Lambda + SQS: Set the SQS visibility timeout to at least 6× the Lambda function timeout. Use a Dead Letter Queue (DLQ) to preserve messages that fail after maxReceiveCount attempts.
DynamoDB hot partitions: If total consumed capacity is under the provisioned limit but you still see throttling, the problem is a hot partition key. Design keys with high cardinality (random suffix, user ID, UUID) — not dates or status values.
API Gateway 502 errors: With Lambda proxy integration, your function MUST return an object shaped like { "statusCode": 200, "headers": {}, "body": "..." }, where body is a string (JSON.stringify any object). A malformed response causes a 502 Bad Gateway.
X-Ray annotations vs metadata: Annotations are indexed — use them for values you'll filter on (userId, orderId). Metadata is not indexed — use it for large debugging payloads. If the question says "filter traces by X", the answer is annotations.
Lambda VPC + AWS services: Lambda in a VPC loses internet access. Without a NAT Gateway or VPC Endpoints, it can't reach DynamoDB, S3, or other AWS public endpoints. This is a very common exam trap.
CodeDeploy Lambda deployments: Canary10Percent5Minutes shifts 10% of traffic to the new version for 5 minutes, then all traffic. Linear10PercentEvery1Minute shifts 10% per minute over 10 minutes. Both support CloudWatch alarm rollback.
Cognito tokens: Access token → API authorization. ID token → user identity claims. Refresh token → get new Access/ID tokens. Never send the Refresh token to your API.
CloudFormation custom resources: Your Lambda function MUST PUT a JSON response to the pre-signed ResponseURL in the event. If it doesn't, the stack waits up to 1 hour then fails.
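For the API Gateway 502 pattern above, a small helper that always produces the proxy-integration response shape avoids the most common mistake, returning an object as the body. This is a minimal sketch; jsonResponse and the sample payload are hypothetical names.

```typescript
// Response shape expected by API Gateway's Lambda proxy integration.
interface ProxyResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string; // must be a string, so objects are serialised first
}

// Wraps any JSON-serialisable payload in a well-formed proxy response.
function jsonResponse(statusCode: number, payload: unknown): ProxyResponse {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Hypothetical handler results.
const okResponse = jsonResponse(200, { orderId: "o-123" });
const notFound = jsonResponse(404, { message: "order not found" });
```

Returning jsonResponse(...) from every code path, including error handlers, keeps failures visible as proper 4xx/5xx responses instead of a generic 502.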

Test your DVA-C02 knowledge now

60 scenario-based questions covering all 4 exam domains. Progress saved locally — no signup required.

⚡ Start 60-question quiz Also try: CLF-C02

Pass DVA-C02 in 6 weeks

A structured week-by-week plan for working developers who can dedicate 5–6 hours per week.

  • Week 1: Module 1 (Lambda) + Module 2 (API Gateway). Build a simple REST API with Lambda and test locally with SAM.
  • Week 2: Module 3 (DynamoDB). Design a table schema from scratch, practice Query vs Scan, implement optimistic locking.
  • Week 3: Module 4 (SQS/SNS/Kinesis). Set up an SQS-triggered Lambda with a DLQ, implement an SNS fan-out pattern.
  • Week 4: Module 5 (Security). Configure Cognito User Pools, add a Cognito authorizer to API Gateway, rotate a Secrets Manager secret.
  • Week 5: Module 6 (CI/CD) + Module 7 (IaC). Build a CodePipeline: CodeCommit → CodeBuild → CodeDeploy to Lambda with Canary strategy.
  • Week 6: Module 8 (Observability) + full practice test. Review wrong answers. Listen to CertQuests podcast episodes for topic reinforcement. Sit the exam.
Hands-on practice matters: DVA-C02 tests real scenarios, not memorisation. AWS Free Tier covers Lambda (1M requests/month), DynamoDB (25 GB), API Gateway (1M calls), and CodeBuild (100 build-minutes). Build the things you're studying.
Keep the momentum — listen while you code

The CertQuests podcast covers Lambda edge cases, DynamoDB design anti-patterns, and CI/CD war stories — all mapped to DVA-C02 exam objectives. Perfect for study sessions when you can't be at a screen.


Continue your AWS journey

DVA-C02 pairs well with these certifications — they share significant topic overlap.

  • AWS CLF-C02 (Beginner): Cloud Practitioner, a great foundation before DVA-C02
  • AWS SAA-C03 (Intermediate): Solutions Architect, architecture focus, same AWS services
  • Terraform Associate (Intermediate): IaC complement to CloudFormation knowledge
  • CKA (Advanced): Kubernetes Administrator, a natural next step for DevOps