Reinforce Pod design patterns, rolling-update strategies, and probe configurations while commuting. New CKAD-focused episodes drop weekly.
About the exam
Why earn the CKAD?
CKAD is the CNCF credential for developers who ship to Kubernetes. It is hands-on, kubectl-fluent, and bound to a strict 2-hour timer — the exam that proves you can deploy, debug, and configure cloud-native apps on a live cluster.
- Hands-on performance-based exam — you type real kubectl commands in a live terminal, not multiple choice
- CNCF-recognized and vendor-neutral — same value at AWS EKS, GKE, AKS, on-prem, or any managed-K8s shop
- The default developer-side Kubernetes credential — every cloud-native backend posting either asks for or favours it
- Lighter cluster-admin scope than CKA — focused on workloads and developer ergonomics, not etcd or kubeadm
- Gateway to platform engineering and senior backend roles ($110-150k US, €70-100k EU for K8s-fluent developers)
- Pairs naturally with CKA (operate) or CKS (secure) — same exam style, same kubectl-fluency expectation
kubectl <create|run> ... --dry-run=client -o yaml | tee file.yaml. Never hand-write YAML on the clock.
Exam blueprint
CKAD exam domains
Five domains. Configuration + Security is the heaviest at 25% — ConfigMaps, Secrets, SecurityContext, ServiceAccounts, and resource limits drive a quarter of the score. The other four domains sit close together (15–20%), so no module is a low-impact skip.
Course content
8 modules · ~30 hours
Each module maps to one or more exam domains. Work through them in order: Core Concepts and Configuration set up the mental model every later module assumes. Modules 5–8 (Pod Design, Networking, State, Helm) are where most exam points are won or lost.
Core Concepts · 3 lessons
The mental model every later module depends on. Master the control-plane / worker components, the Pod lifecycle, and the imperative-then-edit kubectl workflow that wins exam time. CKAD does not test you on installing a cluster — but if you don't know which component owns the symptom, you waste minutes debugging the wrong thing.
CKAD doesn't ask you to install a cluster, but it does expect you to know which component to blame when something goes wrong. A Pod stuck Pending is a scheduler symptom; a Pod CrashLooping is a kubelet symptom; an unreachable Service is a kube-proxy symptom. The mental model is your fastest debugger.
- kube-apiserver: the front door for every kubectl call and controller action; authenticates, authorizes, validates, and persists desired state to etcd over HTTPS on port 6443.
- etcd: the consistent key-value store that holds all cluster state — the source of truth. Backed up on real clusters; on CKAD you treat it as opaque storage.
- kube-scheduler: watches Pods with no nodeName and picks a node based on requests, taints/tolerations, and affinity. A Pod stuck in Pending almost always means the scheduler can't satisfy the spec.
- kube-controller-manager: runs control loops (Deployment, ReplicaSet, Job, EndpointSlice) that continuously reconcile desired vs actual state; this is what makes a Deployment self-heal.
- kubelet + kube-proxy + container runtime: per-node agents. Kubelet starts and watches containers, kube-proxy maintains iptables/IPVS rules for Services, and the OCI-compliant runtime (containerd, CRI-O) actually runs the containers.
- Diagnosis reflex: Pending → scheduler can't place it; ImagePullBackOff → kubelet can't fetch the image; CrashLoopBackOff → kubelet keeps restarting a failing container; cannot reach a Service → kube-proxy or NetworkPolicy.
Task: a Pod is stuck Pending. Inspect: kubectl describe pod webapp. The Events tail says "0/3 nodes are available: 3 Insufficient memory". That is the scheduler talking. Verify: kubectl describe nodes | grep -A 5 "Allocated resources" shows nodes already at 95% memory commitment. Fix: lower the Pod's resources.requests.memory from 2Gi to 512Mi. A bare Pod's resource fields generally cannot be edited in place, so delete and re-apply the Pod (or edit the owning Deployment's template); the scheduler then places the new Pod. Same script, different opening event message, and you know which component to chase. That is the architecture model paying off on the clock.
The Pod is the smallest deployable unit and every CKAD question lives or dies on knowing its anatomy. The five phases (Pending, Running, Succeeded, Failed, Unknown), the three restart policies, and the imperative shortcut to scaffold one are non-negotiable muscle memory.
- Pod phases: Pending (accepted but not yet running: image pulling or unschedulable), Running (at least one container is up), Succeeded (all containers terminated with code 0), Failed (all containers have terminated and at least one exited non-zero), Unknown (kubelet lost contact).
- Single vs multi-container: one Pod shares one network namespace, one IP, and any defined volumes. Multi-container Pods communicate via localhost and shared emptyDir, covered in depth in Module 3.
- Pod YAML skeleton: apiVersion: v1, kind: Pod, metadata: { name, namespace, labels }, spec: { containers: [{ name, image, ports, env, resources, volumeMounts }] }. Memorize the shape.
- Imperative scaffolder: kubectl run nginx --image=nginx --dry-run=client -o yaml > pod.yaml generates a valid manifest you then edit. Faster than typing YAML from scratch.
- restartPolicy: Always (default, used by Deployments; restarts on any exit), OnFailure (used by Jobs; restarts only on non-zero exit), Never (never restarts; the container's exit is final).
- Restart backoff: kubelet uses exponential backoff between restart attempts: 10s, 20s, 40s, capped at 5 minutes. CrashLoopBackOff is just the wait state, not a fatal error.
Task: deploy a single nginx Pod with the label tier=frontend and verify it reaches Running. Scaffold: kubectl run nginx --image=nginx --labels=tier=frontend --dry-run=client -o yaml > pod.yaml. Apply: kubectl apply -f pod.yaml. Watch: kubectl get pod nginx -w until it shows Running. Verify the label: kubectl get pod -l tier=frontend. If it stays Pending, kubectl describe pod nginx and read the Events tail — that's where the scheduler explains itself.
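For reference, the scaffolded manifest in the task above would look roughly like the sketch below. It is trimmed by hand, and the containerPort entry is an assumption added for readability; the scaffolder does not emit it unless you pass --port.

```yaml
# pod.yaml: approximately what the kubectl run scaffold produces, trimmed.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    tier: frontend          # label requested by the task
spec:
  restartPolicy: Always     # the default; listed only to make the field visible
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80 # assumption: nginx's default listen port
```

Applying this file and running kubectl get pod -l tier=frontend should list the Pod once it is scheduled.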
kubectl run --dry-run=client -o yaml is the universal scaffolder. The five phases + three restart policies are the Pod's whole behaviour model.
CKAD is a typing exam. Your timer drains while you reach for documentation. Every minute spent hunting for a field name is one less minute spent solving the next question. kubectl explain, the imperative-then-edit reflex, and a sensible default namespace are how you reclaim those minutes.
- Namespaces: logical partitions for namespaced resources. Defaults: default, kube-system, kube-public, kube-node-lease. Create with kubectl create namespace dev.
- Resource scope: Pods, Deployments, Services, ConfigMaps, Secrets are namespace-scoped. Nodes, PersistentVolumes, ClusterRoles, Namespaces are cluster-scoped. List with kubectl api-resources --namespaced=true|false.
- Set default namespace: kubectl config set-context --current --namespace=dev removes the need to repeat -n dev on every command. Always set it at the start of a multi-step exam task.
- Core kubectl verbs: get (list), describe (deep dive + events), create (imperative create), apply -f (declarative create-or-update), edit (in-place edit), delete, explain (inline API docs).
- Output formats: -o yaml (full spec), -o wide (extra columns: node, IP), -o jsonpath='{.status.containerStatuses[0].state}' (field-precise extraction).
- API groups: core v1 (Pods, Services, ConfigMaps), apps/v1 (Deployments, StatefulSets, DaemonSets), batch/v1 (Jobs, CronJobs), networking.k8s.io/v1 (Ingress, NetworkPolicy). List with kubectl api-versions.
Task: scaffold a Deployment web with 3 replicas of nginx in a new namespace dev, then expose it as a ClusterIP. Setup: kubectl create namespace dev && kubectl config set-context --current --namespace=dev. Deployment: kubectl create deployment web --image=nginx --replicas=3. Service: kubectl expose deployment web --port=80 --target-port=80. Verify: kubectl get deploy,svc,pods -l app=web. Need a field you forgot? kubectl explain service.spec — inline docs, no browser.
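As a sketch of what the kubectl expose step in this task generates, the declarative equivalent is shown below. It assumes the Deployment's Pods carry the app=web label that kubectl create deployment applies by default.

```yaml
# Declarative equivalent of: kubectl expose deployment web --port=80 --target-port=80
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: dev
spec:
  type: ClusterIP           # the default type when none is given
  selector:
    app: web                # must match the Pod template's labels
  ports:
    - port: 80              # port the Service listens on
      targetPort: 80        # port on the Pod that receives the traffic
```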
kubectl explain is your offline manual. The verbs get/describe/create/apply/edit/delete/explain cover the bulk of the exam keystrokes.
Configuration · 3 lessons
The 25%-weight Configuration & Security domain — the single biggest scoring area. ConfigMaps and Secrets injected as env vars or volume mounts, resource requests + limits + QoS classes, SecurityContext at Pod + container scope, ServiceAccount tokens, and the immutability flag that production environments lean on. Most points on the exam pass through this module.
ConfigMaps and Secrets are how you decouple configuration from container images. CKAD tasks frequently ask you to create one and then consume it three ways: as a single env var (valueFrom), as a whole-map import (envFrom), and as a mounted volume. You need the imperative shortcuts in your fingers.
- ConfigMap creation: kubectl create configmap app-config --from-literal=LOG_LEVEL=debug --from-literal=ENV=staging, or --from-file=app.properties, or --from-env-file=.env.
- Secret creation: kubectl create secret generic db-creds --from-literal=user=admin --from-literal=password=s3cr3t. Types: Opaque (default), kubernetes.io/tls, kubernetes.io/dockerconfigjson.
- Consume as env vars (per key): env: [{ name: LOG_LEVEL, valueFrom: { configMapKeyRef: { name: app-config, key: LOG_LEVEL } } }]. Secret variant uses secretKeyRef.
- Consume as env vars (whole map): envFrom: [{ configMapRef: { name: app-config } }], which exposes every key as an env var with the same name.
- Consume as volume: mount the ConfigMap/Secret at a path; each key becomes a file. Volume-mounted ConfigMaps auto-update on change (with a propagation delay of up to a minute); env-var ConfigMaps do not.
- Immutable flag: the top-level immutable: true field on a ConfigMap or Secret (a sibling of data, not nested under a spec) prevents changes (the object must be deleted and recreated to modify it) and reduces apiserver watch load. Use for production configs that should never silently drift.
Task: create a ConfigMap with two keys, inject one as an env var and the other as a file. Create: kubectl create configmap web-config --from-literal=GREETING=hello --from-literal=index.html='<h1>Hi</h1>'. Pod manifest: scaffold with kubectl run web --image=nginx --dry-run=client -o yaml > web.yaml, then edit to add an env entry pulling GREETING from the ConfigMap, and a volumes + volumeMounts pair that maps index.html into /usr/share/nginx/html/. Verify: kubectl exec web -- env | grep GREETING and kubectl exec web -- cat /usr/share/nginx/html/index.html.
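The edited manifest for this task could look like the minimal sketch below. The volume name html is a hypothetical choice; the items list is used so that only index.html, and not GREETING, is projected as a file.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: nginx
      env:
        - name: GREETING
          valueFrom:
            configMapKeyRef:
              name: web-config      # ConfigMap created in the task
              key: GREETING
      volumeMounts:
        - name: html                # hypothetical volume name
          mountPath: /usr/share/nginx/html
  volumes:
    - name: html
      configMap:
        name: web-config
        items:                      # project only this key as a file
          - key: index.html
            path: index.html
```

Note that mounting a volume at /usr/share/nginx/html hides whatever the image shipped in that directory.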
kubectl describe, logs, and crash dumps. Volume-mounted ConfigMaps hot-update; env-var ones do not.
Requests drive scheduling, limits drive throttling and OOM. The QoS class your Pod lands in (Guaranteed, Burstable, BestEffort) decides who gets evicted first when a node runs out of memory. On the exam, a Pending Pod or an OOMKilled container traces back here.
- requests: the floor the scheduler reserves on the chosen node. limits: the ceiling the kubelet enforces — CPU over-limit = throttle, memory over-limit = OOMKill (exit 137).
- YAML shape: resources: { requests: { cpu: "250m", memory: "128Mi" }, limits: { cpu: "500m", memory: "256Mi" } }. CPU 1000m = 1 core; memory units Ki / Mi / Gi (binary, preferred) or K / M / G (decimal).
- QoS classes: Guaranteed (every container has requests = limits for both CPU + memory), Burstable (at least one container sets a CPU or memory request or limit, but the Pod does not meet the Guaranteed rule), BestEffort (no requests/limits anywhere).
- Eviction order: under MemoryPressure / DiskPressure, kubelet evicts BestEffort first, then Burstable (highest memory usage relative to request first), and only touches Guaranteed Pods as a last resort.
- LimitRange: namespace-scoped policy that supplies default requests/limits to Pods that omit them, plus min/max constraints. Without it, a Pod with no resources falls into BestEffort.
- ResourceQuota: caps the namespace's total requests.cpu, limits.memory, Pod count, etc. When a quota covers a CPU or memory resource, every new Pod must set that request or limit (explicitly or through LimitRange defaults) or it is rejected.
Task: a Pod is OOMKilled repeatedly. Diagnose: kubectl describe pod app shows Last State: Terminated, Reason: OOMKilled, Exit Code: 137. kubectl top pod app shows it's hitting 256Mi, exactly its limit. Fix: edit the Deployment to raise the limit (kubectl set resources deployment/app --limits=memory=512Mi), or shrink the application's working set. Verify QoS: kubectl get pod app -o jsonpath='{.status.qosClass}'. The Pod moves from Burstable to Guaranteed only when every container's CPU and memory requests equal their limits, so raise the requests to match if Guaranteed is the goal.
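To make the QoS rule concrete, here is a small sketch of a Pod that would be classed as Guaranteed, because its single container sets requests equal to limits for both CPU and memory. The image and the specific values are placeholders, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx              # placeholder image
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
        limits:
          cpu: "250m"           # equal to the request
          memory: "512Mi"       # equal to the request
```

Lowering either request below its limit would move the Pod to Burstable; removing the resources block entirely would make it BestEffort.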
SecurityContext is where the exam tests the "run as non-root, read-only filesystem, drop all caps" pattern that production K8s deployments live by. ServiceAccount is how a Pod authenticates to the apiserver — and getting the auto-mount flag wrong silently widens the blast radius of any workload compromise.
- Pod-level SecurityContext: spec.securityContext: { runAsUser: 1000, runAsGroup: 3000, fsGroup: 2000, runAsNonRoot: true }. fsGroup chowns mounted volumes to the group so the app can write to them.
- Container-level SecurityContext: overrides Pod-level. Key fields: runAsNonRoot: true (refuse to start if image specifies root), readOnlyRootFilesystem: true (block writes to /), allowPrivilegeEscalation: false, privileged: false.
- Linux capabilities: drop everything then re-add what you need: capabilities: { drop: ["ALL"], add: ["NET_BIND_SERVICE"] }. Common adds: NET_BIND_SERVICE (port < 1024), SYS_TIME, CHOWN.
- ServiceAccount basics: each namespace has a default SA. Pods authenticate to the apiserver using the SA's token. Create dedicated SAs for workloads that need API access: kubectl create serviceaccount deployer.
- Assign SA to a Pod: spec.serviceAccountName: deployer. Pair with a RoleBinding granting the SA the exact API verbs it needs; never reuse default.
- automountServiceAccountToken: set to false on the Pod (or the SA) for workloads that don't talk to the apiserver. Reduces attack surface: a compromised container with no token can't enumerate the API.
Task: harden an nginx Pod so it cannot run as root, cannot write to its root filesystem, and cannot escalate privileges. Scaffold: kubectl run web --image=nginx --dry-run=client -o yaml > web.yaml. Edit: add a container-level securityContext with runAsNonRoot: true, runAsUser: 101 (nginx's UID inside the image), readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, capabilities.drop: ["ALL"] + add: ["NET_BIND_SERVICE"]. Add an emptyDir volume for /var/cache/nginx and /var/run since the root FS is now read-only. Verify: kubectl exec web -- id returns UID 101; kubectl exec web -- touch /tmp.lock fails.
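One way the hardened manifest from this task might be laid out is sketched below. The volume names cache and run are hypothetical, and automountServiceAccountToken: false is an optional extra that the task does not require. Depending on the image version, the stock nginx image may need further configuration to run cleanly as a non-root user.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  automountServiceAccountToken: false   # optional: nginx does not call the API
  containers:
    - name: web
      image: nginx
      securityContext:
        runAsNonRoot: true
        runAsUser: 101                  # nginx's UID inside the image, per the task
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
          add: ["NET_BIND_SERVICE"]
      volumeMounts:
        - name: cache                   # writable scratch space for nginx
          mountPath: /var/cache/nginx
        - name: run                     # nginx writes its pid file here
          mountPath: /var/run
  volumes:
    - name: cache
      emptyDir: {}
    - name: run
      emptyDir: {}
```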
runAsNonRoot with an explicit runAsUser so the kubelet's non-root check passes. Disable automountServiceAccountToken on Pods that don't need the API.
Multi-Container Pods · 3 lessons
Multi-container Pods drive a chunk of Application Design (20%). The four patterns — Sidecar, Init, Ambassador, Adapter — each have a canonical use case the exam can dress up with different application names. Sidecar (log shipper, proxy), Init (wait-for + migration), Ambassador (smart proxy to external services), Adapter (output reshaper) — knowing which to reach for is half the battle.
A sidecar runs alongside the main container in the same Pod, sharing network and volumes. It is the workhorse of the multi-container patterns — Istio, Linkerd, Fluentd, and git-sync all use it. CKAD often asks you to "add a logging sidecar that reads /var/log/app.log and forwards it" — and that wording maps to a single canonical YAML shape.
- Sidecar = same-Pod helper: two (or more) containers in spec.containers share the network namespace (localhost), the IPC namespace, and any volumes you define at Pod scope.
- Log-shipper pattern: main container writes logs to a shared emptyDir at /var/log; sidecar (Fluentd, Fluent Bit, Vector) tails the files and forwards to Elasticsearch / Loki / S3.
- Proxy sidecar: the sidecar (Envoy, HAProxy) intercepts inbound or outbound traffic for the main container: TLS termination, mTLS, retries, circuit breaking. Service meshes inject this automatically.
- Sync sidecar: pulls data from an external source (Git repo, S3 bucket) into a shared volume the main container reads. k8s.gcr.io/git-sync/git-sync:v3 is the canonical example (the k8s.gcr.io registry is now frozen; current images are served from registry.k8s.io).
- Shared emptyDir: the universal communication channel, declared once under spec.volumes and mounted at the right path in each container under volumeMounts. Lives for the Pod's lifetime, dies with it.
- Lifecycle coupling: the Pod is "Running" while any container is up. To terminate the Pod cleanly, all containers must stop, so design sidecars to exit on SIGTERM, not retry forever.
Task: add a busybox sidecar to an existing Pod so that the sidecar continuously prints the main container's log file. Edit the manifest: define an emptyDir volume shared-logs; mount it at /var/log in the main container; add a second container log-tailer with image busybox, command ['sh', '-c', 'tail -f /var/log/app.log'], and the same volumeMount. Apply with kubectl apply -f pod.yaml. Verify: kubectl logs <pod> -c log-tailer streams the main container's log. Two containers, one volume, zero application changes.
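A minimal sketch of the resulting two-container Pod follows. The main container here is a stand-in (a busybox loop that appends timestamps to /var/log/app.log) so the example is self-contained; in the real task it would be whatever application already exists. The Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-tailer         # hypothetical name
spec:
  containers:
    - name: app                 # stand-in for the real main container
      image: busybox
      command: ['sh', '-c', 'while true; do date >> /var/log/app.log; sleep 5; done']
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log
    - name: log-tailer          # the sidecar from the task
      image: busybox
      command: ['sh', '-c', 'tail -f /var/log/app.log']
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log
  volumes:
    - name: shared-logs
      emptyDir: {}
```

If app.log does not exist yet when the sidecar starts, tail exits and the kubelet restarts that container under the default restartPolicy, so the tailer catches up once the file appears.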
emptyDir is the default channel; use medium: Memory for tmpfs when you care about speed.
Init containers run one after another before the main containers start. They are the canonical answer to "wait for the database to be ready" and "run the schema migration before the app boots". On the exam, "the Pod must not start its main container until X is true" almost always means "use an init container".
- Location: defined under spec.initContainers (not spec.containers). They run sequentially, in order, and each must exit 0 before the next starts.
- Failure handling: if an init container fails, the kubelet restarts it per the Pod's restartPolicy. Main containers never start until all init containers succeed once.
- Wait-for pattern: command: ['sh', '-c', 'until nslookup mysql; do echo waiting for db; sleep 2; done'], which blocks the Pod until the named Service resolves.
- Migration pattern: run flyway migrate, rake db:migrate, or alembic upgrade head in an init container so the app boots into a fully-migrated schema. No race conditions, no startup-time migration in app code.
- File-prep pattern: generate config from a template, set filesystem permissions, fetch a TLS cert: anything that must exist before the app reads from a shared volume.
- Debugging: kubectl logs <pod> -c <init-container-name> to view init output; kubectl describe pod shows the init container's status (Init:0/2, Init:1/2, etc.).
Task: a web app must not start until the mysql Service is reachable. Edit the manifest: under spec.initContainers, add a container named wait-for-db with image busybox:1.28 and command ['sh', '-c', 'until nslookup mysql.default.svc.cluster.local; do echo waiting; sleep 2; done']. Apply. While MySQL is offline, kubectl get pod shows Init:0/1. As soon as MySQL's Service resolves, the init container exits 0 and the main container starts. Verify: kubectl logs <pod> -c wait-for-db shows the polling output.
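Put together, the manifest for this task might look like the sketch below. The main container's name and image are placeholders for the real web app.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.28
      # Loop until the mysql Service name resolves in cluster DNS, then exit 0.
      command: ['sh', '-c', 'until nslookup mysql.default.svc.cluster.local; do echo waiting; sleep 2; done']
  containers:
    - name: web                 # placeholder for the real application
      image: nginx
```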
Init:N/M in kubectl get pod tells you which init is running.
Ambassador and adapter are sidecar specializations the exam tests conceptually — "which pattern decouples the application from external service discovery" or "which pattern reshapes the application's output for a standardised consumer". Knowing the wording mapping is fast points.
- Ambassador (outbound proxy): the sidecar proxies the main container's outbound calls to external services. The main container connects to localhost; the ambassador handles discovery, sharding, connection pooling, retry, protocol translation.
- Ambassador example: a Redis client that always talks to localhost:6379; the ambassador sidecar routes to the correct Redis shard based on the key, so the application stays simple.
- Adapter (output reshaper): the sidecar transforms the main container's output into a standardized format. The main container is free to emit anything; the adapter exposes the canonical format.
- Adapter example: a legacy app emits custom metrics on a TCP socket; the adapter sidecar consumes them and exposes /metrics in Prometheus format so the rest of the platform's tooling Just Works.
- Pattern wording cheat-sheet: "shared volume + log forwarder" → Sidecar. "wait for X / set up before app starts" → Init. "main container talks to localhost, sidecar handles external" → Ambassador. "sidecar exposes a standard interface over the main app's output" → Adapter.
- All three patterns share the sidecar mechanics — multiple containers in one Pod, shared volume or localhost. The pattern name describes the intent, not a different YAML shape.
Task: a legacy app exposes metrics on a custom TCP socket; the cluster monitors everything via Prometheus pull. Solution: add an adapter sidecar (e.g. a small Python container) that reads the TCP stream and exposes a /metrics HTTP endpoint in Prometheus exposition format. The application stays unchanged. A Service points Prometheus at the adapter's port. Identify it on the exam: the keyword is "reshape", "translate", "standardise" — that's adapter. If the keyword were "wait for", that's init; "log forwarder", that's sidecar; "proxy out to external service", that's ambassador.
Observability & Maintenance · 3 lessons
The 15%-weight Observability & Maintenance domain. Liveness / Readiness / Startup probes (the trio that, when misconfigured, kills more deployments than anything else), kubectl logs with the right flags, and the describe → logs → exec → debug ladder you climb when a container refuses to behave. Light on weight, dense in points.
Probes decide three things — when to restart your container (liveness), when to send it traffic (readiness), and when to wait longer for it to start (startup). Wire them wrong and your slow-booting app gets killed before it warms up, or your dead app keeps receiving requests. The exam tests the YAML shape and the failure-threshold math.
- Liveness probe: if it fails failureThreshold times in a row, kubelet kills the container and restarts it. Use for processes that can hang or deadlock without crashing.
- Readiness probe: if it fails, the Pod is removed from all Service endpoint sets (no traffic) but the container is not restarted. Use for warm-up time, dependency checks, or shedding load.
- Startup probe: blocks liveness + readiness until it succeeds once. Use for slow boot times so liveness's short interval doesn't kill the container during initialization.
- Probe types: httpGet { path, port } (2xx/3xx = success), tcpSocket { port } (connect = success), exec { command: ['cat', '/tmp/ready'] } (exit 0 = success), grpc { port } on Kubernetes 1.24+.
- Tuning fields: initialDelaySeconds, periodSeconds (default 10), timeoutSeconds (default 1), successThreshold (default 1), failureThreshold (default 3). Total grace = periodSeconds × failureThreshold.
- Anti-pattern: using the same endpoint for liveness and readiness. A dependency outage then restart-loops the app instead of just removing it from traffic. Keep them distinct.
Task: add a readiness probe that hits /healthz on port 8080, fails fast, and a liveness probe with a 30-second tolerance. Edit the container spec: readinessProbe: { httpGet: { path: /healthz, port: 8080 }, periodSeconds: 5, failureThreshold: 2 } (10s to remove from Service); livenessProbe: { httpGet: { path: /alive, port: 8080 }, initialDelaySeconds: 20, periodSeconds: 10, failureThreshold: 3 } (30s grace after warm-up). Verify: kubectl describe pod shows the probe definitions; kubectl get pod shows the Pod going Ready only after /healthz returns 200.
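Expanded from the inline form into block YAML, the two probes from this task would sit in the container spec roughly as follows. The Pod name and image are hypothetical; the probe values are the ones given in the task.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api                     # hypothetical name
spec:
  containers:
    - name: api
      image: example/api:1.0    # hypothetical image serving on 8080
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 2     # 5s x 2 = about 10s before traffic is removed
      livenessProbe:
        httpGet:
          path: /alive
          port: 8080
        initialDelaySeconds: 20
        periodSeconds: 10
        failureThreshold: 3     # 10s x 3 = about 30s before a restart
```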
initialDelaySeconds — cleaner separation of concerns.
Kubernetes captures everything containers write to stdout / stderr. kubectl logs is the first tool you reach for, and the candidates who pass know its flags cold. The exam loves a CrashLoopBackOff scenario where the answer is in the previous instance's logs.
- Basic forms: kubectl logs <pod> (current instance); -f to follow; --previous for the last terminated instance's logs; --tail=N for the last N lines; --since=10m for a time window.
- Multi-container: kubectl logs <pod> -c <container-name> required when more than one container exists. --all-containers=true shows every container's logs.
- Init container logs: same syntax, kubectl logs <pod> -c <init-container-name>. Essential for diagnosing why Init:0/1 is stuck.
- Stdout convention: 12-factor apps log to stdout/stderr only. Apps that log to files need a logging sidecar that tails the file and re-streams it, otherwise kubectl logs shows nothing.
- Node-level storage: container logs live as files under /var/log/containers/ on the node (symlinks into /var/log/pods/; the on-disk format depends on the container runtime); kubelet rotates them by size. Cluster-level aggregation needs a DaemonSet (Fluentd / Fluent Bit) or a sidecar per Pod.
- Selector-based logs: kubectl logs -l app=web --tail=20 --all-containers=true grabs logs from every Pod matching the selector. Faster than looping over kubectl get pods.
Task: a Pod is in CrashLoopBackOff and the current instance has nothing in its logs. Diagnose: kubectl get pod app -o wide shows 5 restarts. kubectl logs app is empty — the current container has just started and not yet emitted anything. Crucial flag: kubectl logs app --previous shows "FATAL: cannot connect to db: connection refused" from the last crashed instance. Fix: that's an init-container-wait-for-db problem (Module 3). Add the init container, redeploy, the loop stops.
kubectl logs --previous is the single most under-used flag on the CKAD. CrashLoopBackOff hides the diagnostic message in the crashed instance — always check --previous. Multi-container Pods need -c.
When logs aren't enough, you climb the debug ladder: get → describe → logs → top → exec → debug. Each rung shows you something the last one doesn't. Memorise the order, because under exam pressure you will skip a rung and waste five minutes.
- kubectl top: kubectl top pod + kubectl top node show live CPU/memory. Needs Metrics Server installed (it is on every CKAD exam env). --containers breaks down per-container.
- kubectl describe: dumps spec + status + conditions + Events. The Events tail at the bottom is where the scheduler, kubelet, and controllers explain themselves; always read it.
- Events: chronological log of state changes, with reasons such as Scheduled, Pulled, Created, Started, Killing, Unhealthy (failed probes), BackOff. Cluster-wide: kubectl get events --sort-by=.metadata.creationTimestamp. Events expire after ~1 hour.
- kubectl exec: kubectl exec -it <pod> -- /bin/sh drops you into the running container. With multi-container Pods add -c <container-name>. Useful for runtime inspection: env vars, mounted files, listening ports.
- kubectl debug (ephemeral containers): for distroless / scratch images that lack a shell, kubectl debug -it <pod> --image=busybox --target=<container-name> injects a temp container into the running Pod and shares its process namespace.
- kubectl debug (node): kubectl debug node/<node> -it --image=busybox creates a Pod in the node's host namespaces with the node's filesystem mounted at /host. Last-resort node-level diagnosis.
Task: a Pod from a distroless image keeps crashing with no logs. Climb the ladder: kubectl get pod shows CrashLoopBackOff. kubectl describe pod app Events tail says "Liveness probe failed" with a connection-refused error. kubectl logs app --previous is empty (the app wrote nothing to stdout before it was killed, and a distroless image has no shell to exec into). Last rung: kubectl debug -it app --image=busybox --target=app, then inside the debug container ps shows the main process is alive and netstat -tln shows it listening on a different port than the probe targets. Fix the probe's port field, redeploy.
kubectl debug --target shares the target container's process namespace and gives you a real shell. kubectl describe Events tail is where most bugs reveal themselves.
Pod Design & Workloads · 3 lessons
Application Design & Build (20%) and Application Deployment (20%) intersect here. Labels and selectors drive everything from Services to Deployments; Deployments + rolling updates + rollback are the default workload; Jobs and CronJobs cover one-shot and scheduled work. Pick the wrong workload kind and the rest of your manifest can be perfect but the test still fails.
Labels are the connective tissue of Kubernetes. Services find Pods by label, Deployments own ReplicaSets by label, NetworkPolicies scope by label. The exam frequently asks you to add a label to a running resource, then verify a selector picks it up.
- Labels: key-value pairs on any object, e.g. app: web, tier: frontend, env: prod. Add live: kubectl label pod nginx env=prod; remove: kubectl label pod nginx env-.
- Equality selectors: =, ==, !=. CLI: kubectl get pods -l env=prod,tier=frontend (multiple are ANDed).
- Set-based selectors: in, notin, exists. CLI: kubectl get pods -l 'env in (prod,staging)'; YAML: matchExpressions: [{ key: env, operator: In, values: [prod, staging] }].
- matchLabels vs matchExpressions: Deployments and ReplicaSets use spec.selector.matchLabels (simple) or matchExpressions (richer). Whatever the selector says, the Pod template's labels must match; a mismatch means the Deployment can't find its own Pods.
- Recommended labels: app.kubernetes.io/name, /instance, /version, /component, /managed-by. Helm's default chart scaffold sets most of these and Kustomize can add them; good hygiene to follow on your own resources.
- Annotations: non-identifying metadata such as build SHA, git commit, ingress controller hints (nginx.ingress.kubernetes.io/rewrite-target: /). Can be larger and richer than labels; not selectable.
Task: label every nginx Pod with env=prod, then list only the Pods that match. Bulk-label: kubectl label pods -l app=nginx env=prod. Verify: kubectl get pods -l env=prod -L env,tier shows the new label column. Selector test on a Service: kubectl get endpoints my-svc — only Pods matching the Service's spec.selector appear as endpoints. Change a Pod's label so the selector no longer matches and watch its IP drop from the endpoints — that is exactly how selectors gate traffic.
kubectl get endpoints is the fast way to verify a Service is actually finding its Pods.
The Deployment is the default workload type on the CKAD. You need fluency in kubectl set image, kubectl rollout, and the rolling-update knobs (maxSurge + maxUnavailable) — they are tested on every attempt.
- Imperative create: kubectl create deployment web --image=nginx:1.24 --replicas=3. Scaffold-then-edit: --dry-run=client -o yaml > web.yaml.
- RollingUpdate strategy (default): maxSurge (extra new Pods beyond replicas), maxUnavailable (how many old Pods can be down). Defaults: 25% / 25%. Set to 0 / 25% for surge-free updates.
- Recreate strategy: kills all old Pods before starting new ones, which causes downtime. Use only when the app can't run two versions side by side (schema-incompatible migrations).
- Update an image: kubectl set image deployment/web nginx=nginx:1.25 kicks off a rolling update (the --record flag seen in older guides is deprecated; set the kubernetes.io/change-cause annotation if you want a note in the history). Watch with kubectl rollout status deployment/web.
- Rollback: kubectl rollout history deployment/web lists revisions; kubectl rollout undo deployment/web reverts to the previous; --to-revision=2 jumps to a specific one.
- Pause / resume / scale: kubectl rollout pause deployment/web freezes mid-rollout for inspection; resume continues. Scale: kubectl scale deployment web --replicas=5. Autoscale: kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=70.
Task: deploy nginx with 4 replicas, update to a new image with zero unavailable Pods, then rollback. Create: kubectl create deployment web --image=nginx:1.24 --replicas=4. Edit the strategy: kubectl edit deployment web, set strategy.rollingUpdate.maxSurge: 1 and maxUnavailable: 0. Update: kubectl set image deployment/web nginx=nginx:1.25. Watch: kubectl rollout status deployment/web. Rollback (one liner): kubectl rollout undo deployment/web — instant revert, no manifest editing needed.
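For orientation, the Deployment from this task after the strategy edit would look roughly like the sketch below. The app: web label and the container name nginx are what kubectl create deployment generates by default, which is why the set image command in the task addresses the container as nginx.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1               # at most one extra Pod during the rollout
      maxUnavailable: 0         # never drop below 4 available Pods
  template:
    metadata:
      labels:
        app: web                # must match spec.selector
    spec:
      containers:
        - name: nginx
          image: nginx:1.24
```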
kubectl set image + kubectl rollout undo is your two-command rollout toolkit. maxSurge/maxUnavailable let you choose between fast (more surge) and resource-light (more unavailable). Always verify with kubectl rollout status.
Deployments are for long-running services. Jobs are for one-shot work that runs to completion (data migration, batch job). CronJobs schedule Jobs on cron syntax. The exam always asks for parallel, retry-on-failure, or history-limit variants — knowing the YAML fields cold is the difference between 30 seconds and 5 minutes per question.
- Job basics: apiVersion: batch/v1, kind: Job. Imperative: kubectl create job migrate --image=migrator:1.0. The Pod's restartPolicy must be OnFailure or Never.
- Completions + parallelism: spec.completions: 5 + parallelism: 2 = run until 5 successful exits, with up to 2 Pods active at a time. Default: 1/1.
- backoffLimit: total number of failed retries before the Job is marked Failed. Default 6. Set to 0 for "fail fast, don't retry".
- activeDeadlineSeconds: kills the Job after N seconds regardless of progress. Belt-and-suspenders against runaway batches.
- CronJob basics: kind: CronJob, spec.schedule: "*/5 * * * *" (every 5 minutes). Imperative: kubectl create cronjob hello --image=busybox --schedule="*/5 * * * *" -- echo hi.
- Concurrency + history: concurrencyPolicy: Forbid|Allow|Replace, successfulJobsHistoryLimit: 3, failedJobsHistoryLimit: 1. Forbid prevents overlap when a Job overruns the cron interval.
Task: run a Job that prints "hello" exactly 4 times, with up to 2 Pods at once, and fails fast (no retries). Scaffold: kubectl create job greet --image=busybox --dry-run=client -o yaml -- echo hello > job.yaml. Edit: add spec.completions: 4, parallelism: 2, backoffLimit: 0. Apply + watch: kubectl apply -f job.yaml && kubectl get pods -w -l job-name=greet — 2 Pods at a time, 4 successes, Job marked Complete. Now wrap it on a 5-minute cron: kubectl create cronjob greet-cron --image=busybox --schedule="*/5 * * * *" -- echo hello.
Job Pods need restartPolicy OnFailure or Never. completions / parallelism / backoffLimit / activeDeadlineSeconds are the four knobs you'll be asked to set.
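The Job from the task, and a CronJob variant of it, could be written roughly as follows. This is a sketch under the task's stated values; the concurrencyPolicy and history limits on the CronJob are illustrative choices taken from the bullet list, not something the task requires.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: greet
spec:
  completions: 4        # run until 4 Pods have exited successfully
  parallelism: 2        # at most 2 Pods active at the same time
  backoffLimit: 0       # no retries: the first failed Pod marks the Job Failed
  template:
    spec:
      restartPolicy: Never   # Jobs accept only Never or OnFailure
      containers:
        - name: greet
          image: busybox
          command: ["echo", "hello"]
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: greet-cron
spec:
  schedule: "*/5 * * * *"          # every 5 minutes
  concurrencyPolicy: Forbid        # skip a run if the previous Job is still active
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: greet-cron
              image: busybox
              command: ["echo", "hello"]
```

Note that the Job fields sit one level deeper in a CronJob, under spec.jobTemplate.spec, which is an easy place to lose time on indentation.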
Services & Networking · 3 lessons
The 20%-weight Services & Networking domain. Services (ClusterIP / NodePort / LoadBalancer / ExternalName) glue your Pods to the rest of the world; NetworkPolicy is how you firewall them off; Ingress is the HTTP front door. CKAD doesn't test you on building a CNI — it tests you on picking the right Service type and writing an ingress + policy YAML.
A Service gives a stable IP + DNS name to a set of Pods selected by label. Pick the wrong type and your app is either unreachable or accidentally public — both lose exam points.
- ClusterIP (default): stable virtual IP reachable only from inside the cluster; backed by kube-proxy iptables/IPVS rules. DNS name: my-svc.<ns>.svc.cluster.local.
- NodePort: exposes the Service on a static port (30000–32767) on every node's IP. Reach with http://<any-node-ip>:<nodePort>. Implies a ClusterIP underneath.
- LoadBalancer: provisions a cloud load balancer (AWS NLB, GCP LB, Azure LB) and routes external traffic to the Service. Implies NodePort + ClusterIP. On bare metal needs MetalLB or similar.
- ExternalName: no selector, no endpoints, just a DNS CNAME from inside the cluster to an external hostname (spec.externalName: api.example.com). Useful for migrating off external dependencies.
- Imperative expose: kubectl expose deployment web --port=80 --target-port=8080 --type=ClusterIP. Use --type=NodePort or --type=LoadBalancer to switch.
- Endpoints: kubectl get endpoints my-svc shows the Pod IPs the Service is routing to. Empty = your spec.selector doesn't match any Pod labels. First thing to check when a Service appears broken.
Task: expose a 3-replica nginx Deployment externally on port 30080 of every node. Create: kubectl create deployment web --image=nginx --replicas=3. Expose: kubectl expose deployment web --port=80 --target-port=80 --type=NodePort, then kubectl edit svc web and set nodePort: 30080. Verify: kubectl get svc web -o wide shows the nodePort; kubectl get endpoints web shows 3 Pod IPs; curl http://<any-node-ip>:30080 returns the nginx welcome page.
kubectl get endpoints — empty endpoints means a selector mismatch.
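For reference, the NodePort Service from the task might end up looking like the sketch below after the kubectl edit step. It assumes the Pods carry the app: web label that kubectl create deployment web generates; if your labels differ, the selector must change with them.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web            # must match the Pod labels, otherwise endpoints stay empty
  ports:
    - protocol: TCP
      port: 80          # Service port inside the cluster
      targetPort: 80    # container port on the nginx Pods
      nodePort: 30080   # must fall inside the NodePort range (30000-32767 by default)
```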
Pods can talk to each other by default. NetworkPolicy is how you flip that — implement zero-trust networking by declaring which Pods can talk to which. The exam loves "allow only frontend → backend on port 8080" style asks.
- Default behaviour: with no NetworkPolicy in the namespace, all Pods can reach all Pods. Apply a single policy with podSelector: {} + policyTypes: [Ingress] and no rules, and all ingress to every Pod in the namespace is denied.
- Targeting: spec.podSelector picks the Pods this policy applies to (in this namespace). {} = all Pods in the namespace.
- Ingress rules: spec.ingress[].from + .ports. from can be podSelector (same ns), namespaceSelector (other ns), ipBlock (CIDR), or a namespaceSelector combined with a podSelector. Selectors within one from entry are ANDed; separate entries are ORed.
- Egress rules: the mirror image, spec.egress[].to + .ports. Don't forget DNS: allow egress to CoreDNS in kube-system on TCP/UDP 53 or your Pods can't resolve names.
- policyTypes: list Egress whenever you define egress rules. If policyTypes is set to [Ingress] only, the egress block is not enforced; if policyTypes is omitted, Egress is implied only when egress rules are present.
- CNI must support it: NetworkPolicy enforcement requires a CNI that implements it (Calico, Cilium, Antrea). On a CNI without enforcement (flannel default), policies are silently ignored, so verify by trying a blocked connection.
Task: in namespace app, allow only Pods labelled tier=frontend to reach Pods labelled tier=backend on TCP 8080. Deny everything else. Manifest: NetworkPolicy with podSelector: { matchLabels: { tier: backend } }, policyTypes: [Ingress], and one ingress rule allowing from: [{ podSelector: { matchLabels: { tier: frontend } } }] on ports: [{ protocol: TCP, port: 8080 }]. Verify: kubectl exec -it frontend-pod -- curl backend-svc:8080 succeeds; kubectl exec -it other-pod -- curl backend-svc:8080 times out.
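Written out in full, the policy from the task could look like the first document below. The second document is an optional namespace-wide default deny, in case "deny everything else" is meant to cover Pods other than the backend as well. Both metadata names are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend   # hypothetical name
  namespace: app
spec:
  podSelector:
    matchLabels:
      tier: backend              # the Pods being protected
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend     # only these Pods (same namespace) may connect
      ports:
        - protocol: TCP
          port: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress     # hypothetical name
  namespace: app
spec:
  podSelector: {}                # every Pod in the namespace
  policyTypes:
    - Ingress                    # no ingress rules listed, so all ingress is denied
```

Policies are additive: a backend Pod selected by both documents still accepts frontend traffic on 8080, because any policy that allows a connection wins.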
An Ingress is an L7 routing resource that fronts multiple Services with hostname / path rules and TLS termination. Without an Ingress controller running (nginx, Traefik, HAProxy), Ingress objects are inert — the controller is what actually translates them into routing rules.
- Ingress vs Service: a Service exposes Pods on a port; an Ingress routes HTTP/HTTPS by hostname + path to one of several Services. Single LoadBalancer in front of many Services.
- IngressClass: picks which controller services the Ingress, e.g. spec.ingressClassName: nginx. Without one, the cluster's default IngressClass (if one is marked default) handles it.
- Rules shape: spec.rules[].host: shop.example.com + http.paths[] with { path: /api, pathType: Prefix, backend: { service: { name: api-svc, port: { number: 80 } } } }.
- pathType: Exact (literal match), Prefix (match on /-separated path segments, so /api matches /api/v1 but not /apix), ImplementationSpecific (controller's choice). Prefer Prefix unless the question says otherwise.
- TLS termination: spec.tls: [{ hosts: [shop.example.com], secretName: shop-tls }]. The Secret must be of type kubernetes.io/tls with keys tls.crt and tls.key.
- Imperative create: kubectl create ingress shop --rule="shop.example.com/api*=api-svc:80" --rule="shop.example.com/=web-svc:80" sets up multiple rules in one shot.
Task: route shop.example.com/api/* to api-svc:80 and shop.example.com/* to web-svc:80. Scaffold: kubectl create ingress shop --rule="shop.example.com/api*=api-svc:80" --rule="shop.example.com/*=web-svc:80" --dry-run=client -o yaml > ingress.yaml. Apply. Test: with DNS or a host header, curl -H 'Host: shop.example.com' http://<ingress-ip>/api/healthz hits api-svc; curl -H 'Host: shop.example.com' http://<ingress-ip>/ hits web-svc.
pathType: Prefix; TLS termination needs a Secret of type kubernetes.io/tls. Without an Ingress controller installed, the Ingress object does nothing.
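A hand-checked version of the Ingress from the task might look like this sketch. The ingressClassName: nginx line assumes an IngressClass with that name exists in the cluster, and the tls block assumes a kubernetes.io/tls Secret called shop-tls has already been created; drop either if the question does not mention them.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop
spec:
  ingressClassName: nginx        # assumption: an IngressClass named nginx exists
  tls:
    - hosts:
        - shop.example.com
      secretName: shop-tls       # assumption: Secret of type kubernetes.io/tls
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-svc
                port:
                  number: 80
          - path: /
            pathType: Prefix     # catch-all for everything not under /api
            backend:
              service:
                name: web-svc
                port:
                  number: 80
```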
State Persistence · 2 lessons
The lightest exam slice but a guaranteed appearance — emptyDir vs hostPath vs PVC, the PV/PVC binding dance, StorageClass for dynamic provisioning, and the StatefulSet basics that every database deployment leans on. Storage is where "the Pod started but writes go to nowhere" silently happens.
Volumes solve "the container's filesystem dies with the container." The CKAD exam covers three: emptyDir (scratch space tied to the Pod's life), hostPath (node-local, niche), and PersistentVolumeClaim (durable, dynamically provisioned). Knowing which to reach for is half the question.
- emptyDir: created when a Pod is assigned to a node, deleted when the Pod is removed. Shared by all containers in the Pod. Use for scratch, cache, sidecar communication. medium: Memory backs it with tmpfs.
- hostPath: mounts a node-local path into the Pod. Types include Directory, File, DirectoryOrCreate, FileOrCreate. Useful for accessing node-level files (Docker socket); avoid for application data because the Pod is tied to a single node.
- PersistentVolume (PV): a cluster-scoped storage resource, either pre-provisioned (static) or created on demand (dynamic). Has a size, an access mode, and a StorageClass.
- PersistentVolumeClaim (PVC): a namespace-scoped request for storage. The control plane binds the PVC to a matching PV, or dynamically creates one via the requested StorageClass.
- Access modes: ReadWriteOnce (RWO) = one node read-write; ReadOnlyMany (ROX) = many nodes read-only; ReadWriteMany (RWX) = many nodes read-write (NFS, CephFS); ReadWriteOncePod (RWOP) = exactly one Pod (K8s 1.27+).
- StorageClass & reclaimPolicy: StorageClass binds a provisioner (AWS EBS, GCE PD, CSI driver) + parameters. reclaimPolicy: Retain keeps the PV after the PVC is deleted; Delete removes both PV and backing storage.
Task: deploy a Pod with persistent storage that survives Pod restarts. Create PVC: apiVersion: v1, kind: PersistentVolumeClaim, metadata: { name: data }, spec: { accessModes: [ReadWriteOnce], resources: { requests: { storage: 1Gi } }, storageClassName: standard }. Mount it: in the Pod spec, volumes: [{ name: data, persistentVolumeClaim: { claimName: data } }], then volumeMounts: [{ name: data, mountPath: /var/data }]. Verify: write a file inside the Pod, delete the Pod, recreate it (same PVC reference), confirm the file is still there.
kubectl describe pvc.
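Put together, the PVC and a Pod that mounts it could look like the sketch below. The PVC fields come from the task; the Pod name, the busybox image, and the sleep command are placeholders chosen only so the example is complete, and storageClassName: standard assumes such a class exists in the cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard     # assumption: a StorageClass named standard exists
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: data-pod                 # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]   # keeps the Pod alive for testing
      volumeMounts:
        - name: data             # must match the volume name below
          mountPath: /var/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data          # must match the PVC's metadata.name
```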
Deployments give you N anonymous, interchangeable replicas. Databases and distributed systems need the opposite — stable names, ordered startup, individual persistent storage. That's StatefulSet. CKAD doesn't dig deep into operator-pattern internals, but you should be able to write the YAML and explain the three guarantees.
- Stable Pod identity: Pods are named <sts-name>-0, <sts-name>-1, ... with ordinal indices that survive reschedules. mysql-0 stays mysql-0 even after the node it ran on dies.
- Ordered startup & shutdown: Pods are created sequentially from 0 to N-1, each waiting for the previous to be Ready. Scale-down reverses the order, highest ordinal first. Critical for primary-replica setups.
- volumeClaimTemplates: defines a PVC template; each Pod gets a unique PVC named <template-name>-<sts-name>-<ordinal>. The Pod re-attaches to the same PVC after reschedule, preserving data. PVCs are not auto-deleted when the StatefulSet is.
- Headless Service: required for DNS. Set spec.clusterIP: None on a Service that selects the StatefulSet's Pods. Each Pod gets a DNS record: <pod-name>.<svc-name>.<ns>.svc.cluster.local.
- Update strategies: RollingUpdate (default) updates Pods in reverse-ordinal order; the partition field enables canary-style updates by only touching Pods with ordinal ≥ partition. OnDelete gives manual control: you delete Pods to trigger their replacement.
- Three guarantees: stable network identity, stable persistent storage per ordinal, ordered + graceful deployment / scaling. Memorise that list; it is the canonical "why StatefulSet not Deployment" answer.
Task: deploy a 3-replica StatefulSet for nginx with each Pod getting its own 1Gi PVC. Headless Service first: kind: Service, metadata: { name: nginx-svc }, spec: { clusterIP: None, selector: { app: nginx }, ports: [{ port: 80 }] }. StatefulSet: spec.serviceName: nginx-svc, replicas: 3, selector.matchLabels: { app: nginx }, template with the same labels, and volumeClaimTemplates: [{ metadata: { name: data }, spec: { accessModes: [ReadWriteOnce], resources: { requests: { storage: 1Gi } } } }]. Verify: kubectl get pvc shows data-nginx-0, data-nginx-1, data-nginx-2; nslookup nginx-0.nginx-svc from another Pod resolves.
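The task's two objects, written out as a sketch. The StatefulSet name nginx is implied by the expected PVC names (data-nginx-0 and so on); the mountPath is an assumption, picked because it is the stock nginx document root.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  clusterIP: None                # headless: DNS returns per-Pod records
  selector:
    app: nginx
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  serviceName: nginx-svc         # must name the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: data                         # matches the claim template name
              mountPath: /usr/share/nginx/html   # assumption: nginx document root
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```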
Helm & Application Deployment Patterns · 2 lessons
The last slice of Application Deployment (20%) — Helm chart basics, plus the advanced patterns the exam asks about conceptually: blue-green, canary, Kustomize overlays. Helm + Kustomize are both available in the exam environment via helm and kubectl apply -k — use whichever the question signals.
Helm is the de-facto package manager for Kubernetes — and a 2021+ addition to the CKAD curriculum. You should be able to install, upgrade, list, and roll back a release; override values; and inspect a chart's defaults.
- Chart: a package of templated YAML manifests + a Chart.yaml metadata file + a values.yaml of defaults. Release: one named install of a chart with a specific set of values.
- Repositories: helm repo add bitnami https://charts.bitnami.com/bitnami; helm repo update; helm search repo nginx. Public catalogue: Artifact Hub.
- Install: helm install my-nginx bitnami/nginx -n web --create-namespace. Override defaults: --set service.type=NodePort or -f my-values.yaml (a file is more readable for multiple overrides).
- Upgrade: helm upgrade my-nginx bitnami/nginx -n web --set image.tag=1.25. Add --install to install-if-missing in one command (idempotent CI step).
- Inspect & rollback: helm list -n web shows releases in that namespace (-A for all namespaces); helm history my-nginx -n web lists revisions; helm rollback my-nginx 2 -n web reverts to revision 2. Helm stores revisions as Secrets in the release's namespace.
- Preview before applying: helm template ./chart --values my-values.yaml renders the manifests locally without installing. Add --dry-run --debug on install/upgrade to preview what would be applied.
Task: install the bitnami/nginx chart as release web in namespace shop, override the Service type to NodePort, then upgrade the image tag. Add repo: helm repo add bitnami https://charts.bitnami.com/bitnami && helm repo update. Install: helm install web bitnami/nginx -n shop --create-namespace --set service.type=NodePort. Upgrade: helm upgrade web bitnami/nginx -n shop --set image.tag=1.25 --reuse-values. Verify: helm history web -n shop shows two revisions; kubectl get svc -n shop shows NodePort type. Rollback if needed: helm rollback web 1 -n shop.
--reuse-values preserves your overrides across upgrades. helm template + --dry-run --debug are how you preview without risk.
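When more than one override is involved, the same values can live in a file instead of repeated --set flags. A minimal sketch, assuming the file is called my-values.yaml and that the chart exposes service.type and image.tag as used in the commands above:

```yaml
# my-values.yaml (hypothetical file name)
service:
  type: NodePort
image:
  tag: "1.25"   # quoted so YAML keeps it a string rather than the number 1.25
```

It would be passed as helm install web bitnami/nginx -n shop --create-namespace -f my-values.yaml, and again with -f on each helm upgrade so the overrides travel with the file rather than with the previous release.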
Blue-green and canary are how production teams roll out new versions without taking downtime. The CKAD doesn't require you to implement them from scratch, but it does test the conceptual model and the YAML mechanics. Kustomize is the built-in alternative to Helm — overlays without templating.
- Blue-green: run two complete environments (blue = current, green = new). Test green in isolation, then switch traffic by updating the Service's selector to point at green. Rollback = flip the selector back. Requires double the resources during transition.
- Canary: run two Deployments with the same Service selector label but different replica counts (e.g. stable: 9, canary: 1 = 10% canary traffic). Increase the canary's replicas as confidence grows. No double-resource cost.
- kubectl apply vs create: create is imperative and fails if the resource exists. apply -f is declarative and creates or updates via three-way merge (last-applied annotation, live state, new config). Use apply for production / GitOps; use create for one-shot exam tasks.
- Kustomize basics: built into kubectl. kustomization.yaml with resources (base manifests), namePrefix, commonLabels, patches, configMapGenerator. Apply: kubectl apply -k ./. Preview: kubectl kustomize ./.
- Overlay pattern: a base/ directory with shared manifests, plus overlays/dev/ and overlays/prod/ each with their own kustomization.yaml that references the base and applies environment-specific patches. Same manifests, different values, no templating.
- Production hygiene: always set requests + limits, configure probes, define a PodDisruptionBudget (PDB) to protect availability during voluntary node drains, set terminationGracePeriodSeconds for clean shutdowns, use preStop hooks to drain in-flight connections.
Task: do a canary deployment of nginx:1.25 alongside an existing nginx:1.24 Deployment, with 10% of traffic on the canary. Setup: existing Deployment web-stable with labels { app: web, track: stable } and 9 replicas. Canary: create a second Deployment web-canary with image nginx:1.25, labels { app: web, track: canary }, and 1 replica. Service: spec.selector: { app: web } — matches both Deployments, distributes traffic ~9:1. Validate: kubectl get endpoints web-svc shows 10 IPs (9 stable + 1 canary). Promote: scale web-canary up + web-stable down in steps, or do the full cutover.
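A sketch of the canary half of that setup is shown below: the shared Service plus the web-canary Deployment. The web-stable Deployment would be identical apart from its name, track: stable, replicas: 9, and image nginx:1.24. Port 80 on the Service is an assumption, since the task does not state one.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web                 # no track label, so stable and canary Pods both match
  ports:
    - port: 80               # assumption: the task does not name a port
      targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1                # 1 of 10 Pods behind the Service, roughly 10% of traffic
  selector:
    matchLabels:
      app: web
      track: canary          # keeps this Deployment from adopting the stable Pods
  template:
    metadata:
      labels:
        app: web
        track: canary
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
```

The traffic split is only approximate: it follows the ratio of ready endpoints, not a configured weight.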
apply is declarative + idempotent; create is imperative + one-shot. kubectl apply -k activates Kustomize without installing anything.
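To make the overlay pattern concrete, here is a minimal sketch of a production overlay. It assumes a base/ directory two levels up that contains a Deployment named web; the prod- prefix and the replica count of 5 are illustrative values, not anything the exam prescribes.

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base               # assumption: base/ holds the shared manifests
namePrefix: prod-            # rendered names become prod-web, and so on
patches:
  - target:
      kind: Deployment
      name: web              # assumption: the base defines a Deployment called web
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 5
```

Running kubectl kustomize overlays/prod prints the rendered manifests for inspection, and kubectl apply -k overlays/prod applies them.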
Exam-day playbook
The 2-hour battle plan
15-19 scenarios in 120 minutes — a hard ~6-8 minutes per question. Triage hard, set context once, scaffold with --dry-run=client -o yaml every time, verify before moving on.
- First 5 minutes (set context once): kubectl config set-context --current --namespace=<ns-from-the-question>. Set alias k=kubectl and export do='--dry-run=client -o yaml' (e.g. k run x --image=nginx $do).
- Triage: skim every question; flag the long / multi-step ones; do the 1-2 minute easy wins first. Don't leave any question blank; partial credit exists.
- Scaffold, don't write: every Pod, Deployment, Service, Job, ConfigMap, Secret, Ingress has a kubectl create or kubectl run imperative form + --dry-run=client -o yaml. Use it. Edit the YAML, then kubectl apply -f.
- Verify every answer: kubectl get, kubectl describe, kubectl exec -- env, kubectl get endpoints. The scoring script checks the cluster, not your YAML. If your manifest didn't actually create the object correctly, you score 0 even if it looks right.
- Use kubectl explain instead of the docs tab: it stays in your terminal, no context switch, no scrolling. kubectl explain pod.spec.containers.resources describes that field and its children; add --recursive to dump the whole subtree.
- Don't fight a question: if you're 4 minutes in and stuck, mark it, skip it, move on. The killer.sh practice attempts (2 included with the exam fee) train you to feel that 4-minute mark.
Watch-outs
Top 5 mistakes that fail CKAD candidates
- Hand-writing YAML. Every minute on indentation is a minute you don't have. Scaffold imperatively, then edit.
- Forgetting the namespace. The question specifies a namespace; you run kubectl in default. The scoring script doesn't find your resource. Always set-context --namespace first.
- Selector / label mismatch. A Deployment whose spec.selector.matchLabels doesn't match the template's labels is rejected by the API server; a Service whose selector doesn't match the Pod labels silently gets no endpoints. Read the apply error, and compare kubectl get pods --show-labels against the selector.
- Skipping verification. "Looks right" isn't right. kubectl get endpoints for Services; kubectl exec -- env for ConfigMaps; kubectl logs --previous for crashed Pods.
- Not reading the Events tail. 60% of "this doesn't work" answers are in kubectl describe's last block. Read it before doing anything else.
Related credentials
After CKAD
CKAD slots into the CNCF Kubernetes triad. Natural next moves depend on which side of the wall you want to be on.
- CKA — Certified Kubernetes Administrator: same exam style, operator-side scope. kubeadm install + upgrade, etcd backup + restore, RBAC, troubleshooting (30% of CKA, vs nothing on CKAD). The natural pivot for engineers moving from app-side to platform-side.
- CKS — Certified Kubernetes Security Specialist: requires an active CKA. Security-focused: RBAC deep dive, Pod Security Standards, network policy enforcement, runtime security, supply chain (image signing, SBOM). Higher salary band, narrower audience.
- Cloud-managed K8s certs: AWS EKS specialty deep-dives, GCP Professional Cloud Architect (K8s is a domain), Azure CKAD-equivalent baked into AZ-104. Useful if your shop is single-cloud.