CNCF · security

Certified Kubernetes Security Specialist (CKS)

Complete CKS exam prep: NetworkPolicy, CIS benchmark hardening, RBAC least privilege, AppArmor, seccomp, Linux capabilities, OPA Gatekeeper, Pod Security Admission, Secrets encryption at rest, Falco runtime rules, Kubernetes audit logging, image signing with Cosign, Trivy CVE scanning, supply chain security, and gVisor RuntimeClass isolation. 60 scenario-based practice questions.

Modules: 7
Duration: 40 hours
Level: advanced

Jump straight into practice questions

60 scenario-based CKS questions covering every exam domain — free, no signup required.


CKS Exam Snapshot

Format: Performance-based
Duration: 2 hours
Questions: 15–20 tasks
Passing score: 67%
Prerequisite: Active CKA
Validity: 2 years

Exam Domain Weights

Cluster Setup: 10%
Cluster Hardening: 15%
System Hardening: 15%
Minimize Microservice Vulnerabilities: 20%
Supply Chain Security: 20%
Monitoring, Logging & Runtime Security: 20%

Key Concept: Defense in Depth

The CKS is organized around layered security: cluster-level controls (NetworkPolicy, RBAC, API server flags) + node-level controls (AppArmor, seccomp, kernel modules) + workload-level controls (SecurityContext, Pod Security Admission) + runtime detection (Falco, audit logs). No single layer is sufficient — the exam tests whether you can apply all layers together.


Learn Kubernetes security on the go

Tune in to Falco deep dives, supply chain security walkthroughs, and CKS exam strategy discussions. New episodes every week — perfect for commutes and gym sessions.


7 modules · ~40 hours

Each module maps to one or more CKS exam domains. Work through them in order: Cluster Setup and Cluster Hardening lay the defense-in-depth foundation every later module assumes. System Hardening (Module 4) and Runtime Security (Module 7) are where most CKS exam points are won or lost on the clock.

01

CKS Overview & Kubernetes Security Architecture · 3 lessons

Before any single hardening control, you need the mental model. What the CKS exam expects, the layered security model (AuthN → AuthZ → Admission → NetworkPolicy → SecurityContext → Runtime), and the attacker kill-chain that those layers are designed to break — every later module is one of those layers in depth.

cks-vs-cka security-model threat-model attack-surface killer.sh exam-tactics
~5h
Lesson 1.1 The CKS exam & the Kubernetes security model

CKS was the third hands-on Kubernetes certification from the CNCF and Linux Foundation (after CKA and CKAD) and is the advanced, security-focused one. It is performance-based (you work in a live cluster via a browser terminal) and requires an active CKA. Before grinding controls, anchor on the six-layer model that every CKS question lives inside.

Key concepts
  • CKS vs CKA vs CKAD: CKA = cluster admin (prerequisite for CKS); CKAD = developer-side workloads; CKS = security specialist, builds directly on CKA, the hardest of the three.
  • Exam format: 2 hours, ~15–20 hands-on tasks, performance-based via a Linux Foundation browser terminal — no multiple choice, no partial credit unless explicitly stated.
  • Allowed docs: kubernetes.io/docs + github.com/kubernetes + the Falco, Trivy, AppArmor and Sysdig docs (the exact allowed list is on the LF FAQ). No personal notes, no Stack Overflow.
  • The 6-layer security model: (1) Authentication — certs, tokens, OIDC; (2) Authorization — RBAC, Node authorizer; (3) Admission control — Pod Security Admission, OPA Gatekeeper, webhooks; (4) NetworkPolicy — Pod-to-Pod and Pod-to-external traffic; (5) SecurityContext — container isolation from the host; (6) Runtime security — Falco, audit logs, behavioural detection.
  • Domain weights (2024 blueprint): Cluster Setup 10%, Cluster Hardening 15%, System Hardening 15%, Minimize Microservice Vulnerabilities 20%, Supply Chain 20%, Runtime Security 20%.
Concrete example

Task: a CKS-style prompt reads "Pod nginx-uncontrolled in namespace prod can be reached from the public internet and runs under a ServiceAccount bound to cluster-admin. Apply the minimum set of controls to fix it." Map it to layers: AuthZ (rebind the ServiceAccount to a least-privilege Role) + Admission (label the namespace with PSA enforce=baseline) + NetworkPolicy (default-deny ingress, then allow only the frontend Pods that should reach it). Apply each control in that order and verify with kubectl auth can-i, kubectl get ns prod -o yaml | grep pod-security, and a curl from a Pod in another namespace, which should now be blocked. The 6-layer model is what turns a vague prompt into a checklist.
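The Admission step in this example can also be written declaratively. A minimal sketch, assuming the namespace is prod as in the prompt; the warn label is an optional extra that the task does not require. Applying it has the same effect as kubectl label ns prod pod-security.kubernetes.io/enforce=baseline. PSA only evaluates Pods at admission, so Pods that are already running are not evicted.

```yaml
# Namespace-level Pod Security Admission labels (sketch for namespace "prod")
apiVersion: v1
kind: Namespace
metadata:
  name: prod
  labels:
    # Reject new Pods that violate the "baseline" Pod Security Standard
    pod-security.kubernetes.io/enforce: baseline
    # Evaluate against the cluster's current version of the standards
    pod-security.kubernetes.io/enforce-version: latest
    # Optional: admit, but warn about, Pods that would fail "restricted"
    pod-security.kubernetes.io/warn: restricted
```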

Key takeaway: the CKS exam is 2 hours and you can use the K8s + Falco + Trivy docs — speed matters more than memorisation. Every task slots into one of the six layers (AuthN, AuthZ, Admission, NetworkPolicy, SecurityContext, Runtime). Practice every kubectl command until it is muscle memory; the killer.sh simulator (bundled with your voucher) is the single best practice surface.
⚡ Mini-quiz
Drill exam-model + security-layer scenarios → study mode (10 questions).
Lesson 1.2 Kubernetes attack surface & threat model

Every CKS hardening control exists because an attacker discovered a way to abuse the default. Walk the kill-chain — initial access, lateral movement, credential theft, persistence — and you will instantly recognise which control each exam task is reaching for.

Key concepts
  • Compromised container: attacker exploits an app vulnerability inside a Pod, then attempts to escape to the host via privileged: true, a hostPath mount, or a container-runtime CVE.
  • Stolen ServiceAccount token: attacker reads /var/run/secrets/kubernetes.io/serviceaccount/token from a compromised Pod and uses it against the API server to enumerate or exfiltrate cluster resources.
  • SSRF to cloud metadata: attacker tricks an in-cluster app into fetching 169.254.169.254 and steals the cloud credentials of the node's IAM role.
  • Malicious container image: supply-chain attack via a compromised public image or a poisoned CI pipeline — controls live in Module 6 (Trivy, Cosign, ImagePolicyWebhook, OPA registry policy).
  • Overly permissive RBAC: a ServiceAccount accidentally bound to cluster-admin lets a single compromised Pod take over the whole cluster.
  • Controls ↔ threats mapping: container escape → SecurityContext + AppArmor + seccomp + non-root + readOnlyRootFilesystem; SA-token abuse → disable automount + projected tokens + least-privilege RBAC; metadata SSRF → NetworkPolicy egress deny on 169.254.169.254; supply chain → Trivy + Cosign + OPA + distroless; runtime → Falco + audit log + immutable infrastructure.
Concrete example

Scenario: a developer image runs as root, mounts a host path, and uses the default ServiceAccount. Map the kill-chain: code execution → host filesystem write (via hostPath) → metadata SSRF (no egress policy) → cluster-admin token theft (default SA bound to wide RBAC). Fix in least-privilege order: drop the hostPath + set runAsNonRoot + drop ALL capabilities (closes the escape path); apply a NetworkPolicy that denies egress to 169.254.169.254/32 (closes the credential-theft path); disable automountServiceAccountToken + create a scoped SA (closes the token-reuse path). Each fix maps to one entry in the controls ↔ threats list above.
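The metadata-blocking step in this scenario could look like the policy below: a sketch that leaves all other egress open. The policy name and the namespace dev are hypothetical. The first rule allows in-cluster Pod traffic (including DNS to CoreDNS) explicitly, because Kubernetes recommends ipBlock only for cluster-external addresses and how it treats Pod IPs depends on the CNI plugin.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cloud-metadata   # hypothetical name
  namespace: dev              # hypothetical namespace
spec:
  podSelector: {}             # every Pod in the namespace
  policyTypes:
    - Egress
  egress:
    # Keep traffic to Pods in any namespace (CoreDNS included) working
    - to:
        - namespaceSelector: {}
    # Allow everything external except the cloud metadata endpoint
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
```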

Key takeaway: the CKS exam never presents abstract questions; it gives you a scenario and expects the least-privilege control. Ask "what is the specific attack vector?" then "what is the smallest control that addresses it?" Have the full attack-surface map in your head before the timer starts.
⚡ Mini-quiz
Practise threat-model scenarios → quick quiz (5 questions).
Lesson 1.3 Exam tactics: time budget, vimrc & killer.sh

Commonly quoted (unofficial) estimates put CKS pass rates near 25–30%, and most failures are time-management gaps rather than knowledge gaps. The candidates who pass have a vimrc, an aliases file, and a documented "skip and come back" rule. This lesson covers how to spend the 120 minutes.

Key concepts
  • Time budget: ~6–8 minutes per task on average. In the first few minutes of the exam: set aliases (alias k=kubectl), export do="--dry-run=client -o yaml", export now="--grace-period=0 --force", and configure vimrc (set ts=2 sw=2 et).
  • Skip-and-come-back rule: if a task is not solved within 10 minutes, flag it and move on; the last 20 minutes of the exam are for re-attacking flagged tasks with fresh eyes.
  • Context-switching reflex: every task names a cluster + sometimes a namespace. Always run the provided kubectl config use-context … first, then kubectl config set-context --current --namespace=<ns> to remove a class of "wrong namespace" mistakes.
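  • Know your docs pages: learn where the key kubernetes.io pages live before exam day (Pod Security Admission, NetworkPolicy YAML, audit policy schema, EncryptionConfiguration, RuntimeClass). Personal bookmarks are not available in the current remote-desktop exam environment, so practise reaching these pages through the docs search; Ctrl+F works inside the exam browser.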
  • killer.sh simulator: bundled with your CKS voucher; harder than the real exam by design, so passing killer.sh ≈ comfortably passing CKS. Use both attempts — first as diagnostic, second as drill.
  • Verification before submit: for every solved task, run a one-line verifier (kubectl auth can-i …, kubectl get netpol -n …, kubectl exec … -- curl …). Grading is automated against the final cluster state, so anything you forgot to apply scores nothing.
Concrete example

Minute 0–2: alias k=kubectl; export do="--dry-run=client -o yaml"; export now="--grace-period=0 --force"; vimrc tweaks; open the K8s, Falco and Trivy docs. Minute 2–100: cycle through tasks in order; if any task hits 10 minutes without traction, flag it (or note the task number on the scratchpad) and skip. Minute 100–115: re-attack flagged tasks. Minute 115–120: verify every solved task with a one-line check. Rehearse this rhythm on killer.sh until it is automatic.

Key takeaway: CKS rewards typing speed and ruthless time management over breadth of knowledge. Set aliases + vimrc in the first 2 minutes, skip anything that bleeds past 10 minutes, and reserve the final 5 minutes for verification. killer.sh is harder than the real thing, so use both attempts.
⚡ Mini-quiz
Drill exam-tactic scenarios → study mode (10 questions).
02

Cluster Setup: NetworkPolicy, TLS & API Hardening · 3 lessons

Cluster Setup is exam domain 1 (10%) and one of the densest typing domains. NetworkPolicy defaults are permissive, the API server ships with anonymous auth on, and TLS certs need explicit SANs — every default must be hardened. These three lessons cover the controls and the apiserver-manifest reflexes you will reach for repeatedly.

networkpolicy cis-benchmark kube-bench apiserver-flags ingress-tls cert-sans
~6h
Lesson 2.1 NetworkPolicy deep dive

By default every Pod can reach every other Pod and every external endpoint — including the cloud metadata service. NetworkPolicy is the K8s-native control that locks this down at the CNI layer. CKS NetworkPolicy tasks reward people who can read an existing policy and surgically fix it without breaking DNS.

Key concepts
  • Namespaced object: NetworkPolicy only affects Pods in its own namespace. Cross-namespace rules use namespaceSelector.
  • Default behaviour: all ingress + egress is allowed while no NetworkPolicy selects a Pod. Once even one policy selects it, traffic is denied unless explicitly listed.
  • CNI requirement: NetworkPolicy is enforced by the CNI plugin. Calico, Cilium and Weave support it; Flannel does not (so applying a policy is a silent no-op). Check which CNI is running with kubectl get pods -n kube-system.
  • Default-deny ingress pattern: spec: podSelector: {} + policyTypes: [Ingress] with empty ingress rules → deny all ingress in the namespace.
  • Namespace-by-name allowlist: use namespaceSelector: {matchLabels: {kubernetes.io/metadata.name: <ns>}}. Every namespace carries this label automatically since K8s 1.22, so you no longer need to label namespaces manually.
  • Block cloud metadata: egress with to: ipBlock.cidr: 0.0.0.0/0 and except: [169.254.169.254/32] closes the SSRF credential-theft path.
  • Don't kill DNS: any default-deny egress policy must allow port 53 UDP/TCP to the cluster CoreDNS, or every Pod loses name resolution and looks "broken".
Concrete example

Task: namespace prod currently allows all traffic. Apply: default-deny ingress + allow only Pods labelled tier=frontend from namespace web; default-deny egress except DNS and port 443 to 0.0.0.0/0 excluding metadata. Write two ingress policies (deny-all-ingress.yaml and allow-web-to-prod.yaml) and one combined egress policy. Verify: kubectl exec -n prod app-0 -- nslookup kubernetes.default (must succeed, the DNS path is open) and kubectl exec -n prod app-0 -- curl 169.254.169.254 --max-time 3 (must time out, metadata is blocked).
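Put together, the three policies from this task might look like the manifests below. This is a sketch: the first two names come from the task, restrict-egress is a hypothetical name, and it assumes the CoreDNS Pods carry the label k8s-app: kube-dns in kube-system (the kubeadm default).

```yaml
# 1) deny-all-ingress.yaml: select every Pod in prod, allow no ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# 2) allow-web-to-prod.yaml: only tier=frontend Pods from namespace web
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-prod
  namespace: prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # One list entry: namespace AND pod label must both match
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: web
          podSelector:
            matchLabels:
              tier: frontend
---
# 3) Combined egress: DNS to CoreDNS, HTTPS anywhere except metadata
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress    # hypothetical name
  namespace: prod
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
      ports:
        - protocol: TCP
          port: 443
```

In the allow rule, namespaceSelector and podSelector sit in the same from entry, so both must match; written as two separate entries they would be OR-ed, admitting every Pod in web plus tier=frontend Pods in prod.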

Key takeaway: NetworkPolicy tasks on CKS usually involve fixing an over-permissive policy. Read the existing rules carefully before editing. Two rules to never forget: allow DNS (53/UDP+TCP) on every default-deny egress, and block 169.254.169.254/32 egress on every workload namespace.
⚡ Mini-quiz
Practise NetworkPolicy edits + metadata-blocking → study mode (10 questions).
Lesson 2.2 CIS benchmark & kube-bench remediation

The CIS Kubernetes Benchmark is the industry-standard hardening checklist, and kube-bench is the Go binary that runs the checks against a live cluster. CKS tasks framed as "fix CIS check 1.2.X" are really "edit the apiserver static-pod manifest and add this flag".

Key concepts
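  • CIS check sections: 1.x control-plane components (1.1 config-file permissions, 1.2 apiserver flags, 1.3 controller-manager, 1.4 scheduler); 2.x etcd (TLS and client/peer certificate auth); 3.x control-plane configuration (authentication, audit logging); 4.x worker nodes / kubelet config; 5.x in-cluster policies (RBAC, PSA, NetworkPolicy, Secrets).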
  • Run kube-bench: kube-bench run --targets master,node,etcd. A specific check: kube-bench run --check 1.2.6. JSON output: kube-bench run --json > results.json.
  • Apiserver remediations live in: /etc/kubernetes/manifests/kube-apiserver.yaml (static-pod manifest). Edit the YAML and kubelet auto-restarts the API server within ~60s.
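  • Kubelet remediations live in: /var/lib/kubelet/config.yaml (the KubeletConfiguration file) or, for command-line flags, the kubelet systemd drop-in (10-kubeadm.conf on kubeadm nodes); /etc/kubernetes/kubelet.conf is the kubelet's kubeconfig and is checked mainly for file permissions. After edits: systemctl daemon-reload && systemctl restart kubelet.
  • Key apiserver flags to know cold: --anonymous-auth=false; --authorization-mode=Node,RBAC (remove AlwaysAllow); --enable-admission-plugins=NodeRestriction,PodSecurity,EventRateLimit (EventRateLimit also needs an --admission-control-config-file, or the apiserver will not start); --audit-log-path=/var/log/audit.log; --audit-policy-file=/etc/kubernetes/audit-policy.yaml (both audit paths must be hostPath-mounted into the apiserver Pod); --profiling=false.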
  • kube-bench output values: PASS / FAIL / WARN / INFO. WARN often means the check requires manual review — read the remediation text before assuming pass.
Concrete example

Task: kube-bench run --check 1.2.1 reports FAIL because --anonymous-auth is not set to false. Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add - --anonymous-auth=false to the kube-apiserver container's command list. Wait ~60s for the kubelet to recreate the static Pod (or watch it restart with crictl ps -a). Verify: kube-bench run --check 1.2.1 now reports PASS; curl -k https://<apiserver>:6443/api returns 401 Unauthorized (anonymous requests rejected) instead of 403 Forbidden (request authenticated as system:anonymous, then denied by RBAC). The whole loop takes under 3 minutes once you know where the manifest lives.
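The flags from this lesson end up in the static-Pod manifest roughly as in the excerpt below. It is a sketch, not a complete manifest: kubeadm writes the apiserver flags under command:, and every existing flag, the image and the probes must stay in place. The audit log path follows the kubernetes.io auditing page (/var/log/kubernetes/audit/) rather than the /var/log/audit.log in the flag list above; either works as long as it is mounted. PodSecurity is enabled by default since 1.23, and EventRateLimit is left out here because it needs its own admission configuration file.

```yaml
# Excerpt of /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm static Pod)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --anonymous-auth=false
        - --authorization-mode=Node,RBAC
        - --enable-admission-plugins=NodeRestriction
        - --profiling=false
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
        # ... all other existing flags unchanged ...
      volumeMounts:
        # The apiserver runs in a container, so host files must be mounted in
        - name: audit-policy
          mountPath: /etc/kubernetes/audit-policy.yaml
          readOnly: true
        - name: audit-log
          mountPath: /var/log/kubernetes/audit/
  volumes:
    - name: audit-policy
      hostPath:
        path: /etc/kubernetes/audit-policy.yaml
        type: File
    - name: audit-log
      hostPath:
        path: /var/log/kubernetes/audit/
        type: DirectoryOrCreate
```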

Key takeaway: most control-plane CIS remediations are an edit to /etc/kubernetes/manifests/kube-apiserver.yaml. Memorise the path, memorise the top 6 flags (--anonymous-auth, --authorization-mode, --enable-admission-plugins, --audit-log-path, --audit-policy-file, --profiling=false), and trust that kubelet will restart the static pod for you. If it doesn't, crictl ps -a + crictl logs <id> shows the manifest error.
⚡ Mini-quiz
Drill kube-bench + apiserver-flag scenarios → quick quiz (5 questions).
Lesson 2.3 Ingress TLS & apiserver certificate SANs

Two TLS skills come up frequently on CKS: terminating TLS at an Ingress, and adding a SAN to the apiserver cert (a real-world need when you front the cluster behind a new load-balancer hostname). Both are about openssl and manifest edits, not abstract crypto theory.

Key concepts
  • TLS Secret format: Kubernetes type kubernetes.io/tls with two required keys: tls.crt and tls.key. Create with kubectl create secret tls myapp-tls --cert=tls.crt --key=tls.key -n myapp.
  • Ingress TLS termination: reference the Secret in spec.tls[] with secretName + hosts list matching the cert SAN — Ingress controller terminates TLS and forwards plaintext to the backend.
  • Apiserver certificate file: /etc/kubernetes/pki/apiserver.crt (+ apiserver.key). Read the SAN list with openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout | grep -A1 "Subject Alternative".
  • Add a SAN (kubeadm): add the new host or IP to apiServer.certSANs in the kubeadm config; delete the existing apiserver.crt and apiserver.key (kubeadm reuses certificates already on disk); run kubeadm init phase certs apiserver --config kubeadm.yaml; then restart the apiserver static Pod by moving its manifest out of /etc/kubernetes/manifests and back. Deleting the mirror Pod with kubectl does not restart a static Pod.
  • Verify a SAN over the network: openssl s_client -connect <host>:6443 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name". The SAN you added should appear in the output.
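An Ingress that terminates TLS with the myapp-tls Secret from the first bullet might look like this sketch. It assumes an ingress controller with an IngressClass named nginx is installed; the host myapp.example.com and the backend Service myapp-svc are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: myapp
spec:
  ingressClassName: nginx          # assumed IngressClass
  tls:
    - hosts:
        - myapp.example.com        # must match a SAN in tls.crt
      secretName: myapp-tls        # kubernetes.io/tls Secret in the same namespace
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-svc    # hypothetical backend Service
                port:
                  number: 80       # plaintext after TLS termination
```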
Concrete example

Task: the cluster apiserver is now fronted by a new LB DNS name cluster.example.com, and kubectl against that hostname fails with an x509 error saying the certificate is not valid for cluster.example.com. Edit the kubeadm config: add cluster.example.com to apiServer.certSANs. Regenerate: rm /etc/kubernetes/pki/apiserver.{crt,key}, then kubeadm init phase certs apiserver --config /etc/kubernetes/kubeadm.yaml. Restart: mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/, wait until crictl ps no longer shows the apiserver, then move the file back. Verify: openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout | grep DNS: shows the new SAN; kubectl against the new hostname now succeeds.
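A minimal sketch of the kubeadm config edit, assuming the kubeadm.k8s.io/v1beta3 API (newer kubeadm releases also accept v1beta4). Rather than writing the file from scratch, it is safer to start from the cluster's live config (kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' > /etc/kubernetes/kubeadm.yaml) and add only the certSANs entry, so settings such as the service subnet, which also feed the default SANs, stay unchanged.

```yaml
# /etc/kubernetes/kubeadm.yaml (path from the task above)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# ... keep the existing fields exported from the kubeadm-config ConfigMap ...
apiServer:
  certSANs:
    - cluster.example.com   # new load-balancer hostname
    # keep any extra SANs (hostnames or IPs) that were already listed
```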

Key takeaway: CKS TLS tasks reduce to two recipes. Ingress TLS: kubectl create secret tls + reference in spec.tls[]. Apiserver SAN: edit kubeadm config certSANs + delete + regenerate certs + restart static pod. Both end with an openssl x509 -text -noout verification — know how to read that output.
⚡ Mini-quiz
Practise TLS Secret + SAN-regeneration scenarios → study mode (10 questions).
03

Cluster Hardening: RBAC, Service Accounts & Upgrades · 3 lessons

RBAC is the single most-tested CKS topic — and the easiest to over-grant. Service-account hardening (automount off, projected tokens, NodeRestriction) closes the most common post-compromise path. Cluster upgrades double as CVE remediation. These three lessons are the bulk of domain 2 (15%).

rbac least-privilege service-accounts automount noderestriction kubeadm-upgrade
~6h
Lesson 3.1 RBAC & the least-privilege workflow

Most RBAC bugs are accidental over-grant — a Role with verbs: ["*"], or a ClusterRoleBinding where a RoleBinding would have been enough. CKS rewards the workflow: read the current grant with kubectl auth can-i --list, replace it with the narrowest possible Role, verify with can-i again.

Key concepts
  • Role vs ClusterRole: Role is namespaced (one ns); ClusterRole is cluster-scoped but can also be bound namespace-locally via RoleBinding.
  • RoleBinding vs ClusterRoleBinding: RoleBinding grants a Role or ClusterRole inside a single namespace; ClusterRoleBinding grants a ClusterRole across all namespaces (much wider — avoid unless cluster-scoped resources are needed).
  • Least-privilege rules: always use the most namespace-restricted binding possible; avoid verbs: ["*"] and resources: ["*"]; never bind a service account to cluster-admin.
  • Dangerous verbs: escalate and bind on roles/clusterroles (the user can create or bind roles carrying permissions they do not hold themselves), impersonate on users/groups/serviceaccounts (the user can act as another identity), and create on clusterrolebindings.
  • Inspection commands: kubectl auth can-i list pods --as=system:serviceaccount:<ns>:<sa> for a single check; kubectl auth can-i --list --as=system:serviceaccount:<ns>:<sa> for the full grant matrix.
  • Quick imperative creation: kubectl create role pod-reader --verb=get,list --resource=pods -n dev + kubectl create rolebinding pod-reader-binding --role=pod-reader --serviceaccount=dev:myapp -n dev. Faster than YAML for the small ones.
Concrete example

Task: ServiceAccount ci-runner in ns build currently has a ClusterRoleBinding to cluster-admin; restrict it to listing Pods in its own namespace only. Inspect: kubectl auth can-i --list --as=system:serviceaccount:build:ci-runner shows everything granted. Replace: kubectl delete clusterrolebinding ci-runner-admin; kubectl create role pod-lister --verb=get,list,watch --resource=pods -n build; kubectl create rolebinding ci-runner-pods --role=pod-lister --serviceaccount=build:ci-runner -n build. Verify: kubectl auth can-i list pods -n build --as=system:serviceaccount:build:ci-runner → yes; kubectl auth can-i delete nodes --as=system:serviceaccount:build:ci-runner → no. The before/after can-i is your audit trail.
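The two imperative create commands in this example correspond to the manifests below, which helps when a task hands you YAML to fix instead. A sketch using the names from the task. A binding's roleRef cannot be changed after creation, so pointing an existing binding at a different role means deleting and recreating it.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-lister
  namespace: build
rules:
  - apiGroups: [""]              # "" = core API group, where Pods live
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-runner-pods
  namespace: build               # grant applies in this namespace only
subjects:
  - kind: ServiceAccount
    name: ci-runner
    namespace: build
roleRef:                         # immutable once created
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-lister
```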

Key takeaway: the CKS RBAC pattern is always (1) can-i --list before, (2) replace with the narrowest Role + RoleBinding, (3) can-i after to prove the fix. Beware the four dangerous verbs: escalate, bind, impersonate, create on ClusterRoleBindings.
⚡ Mini-quiz
Drill RBAC least-privilege scenarios → study mode (10 questions).
Lesson 3.2 Service-account hardening: automount, projected tokens, NodeRestriction

Every Pod gets a ServiceAccount token by default — and most application Pods never call the Kubernetes API. Disabling the automount, using bounded projected tokens, and enabling NodeRestriction together close the most popular post-compromise path on Kubernetes.

Key concepts
  • Default automount: every Pod gets a projected SA token at /var/run/secrets/kubernetes.io/serviceaccount/token. If the app is compromised, the attacker has API credentials immediately.
  • Disable per Pod: set automountServiceAccountToken: false in the Pod spec — Pod-level override wins.
  • Disable per ServiceAccount: set automountServiceAccountToken: false on the SA object — applies to every Pod using that SA unless overridden.
  • Legacy vs projected tokens: legacy tokens (stored in Secrets, never expire, single audience) are deprecated; modern projected tokens (TokenRequest API) have a bounded expirationSeconds, specific audience, and are tied to Pod lifetime — kubelet auto-refreshes them.
  • Inspect a projected token: kubectl exec <pod> -- cat /run/secrets/kubernetes.io/serviceaccount/token | cut -d. -f2 | base64 -d | jq . — shows aud, exp, iss.
  • NodeRestriction admission plugin: default in kubeadm clusters. Restricts kubelet to (1) only modify its own Node object, (2) only modify Pods scheduled on it. A compromised node cannot pivot to modify other nodes' Pods.
  • What NodeRestriction does NOT do: it does not limit what a container does inside its Pod — that is SecurityContext + AppArmor + seccomp (Module 4).
Concrete example

Task: Deployment web uses the default SA and doesn't call the K8s API. Disable token automount with the minimum change: patch the Pod template rather than the default SA, which would affect every Pod in the namespace that uses it. kubectl patch deploy web -p '{"spec":{"template":{"spec":{"automountServiceAccountToken":false}}}}' triggers a rollout. Verify on a new Pod: kubectl get pod <web-pod> -o yaml | grep -i automount shows the flag; kubectl exec <web-pod> -- ls /var/run/secrets/kubernetes.io/serviceaccount returns No such file or directory. The token is gone; the API attack path from inside that Pod is closed.
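The ServiceAccount-level switch and the projected-token alternative from the key concepts can be sketched as follows. All names are hypothetical (SA web-sa, Pod token-client, audience vault, image busybox:1.36); omitting audience issues a token for the API server instead. expirationSeconds must be at least 600, and the kubelet rotates the token before it expires.

```yaml
# SA-level default: Pods using this SA get no token unless they opt in
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-sa               # hypothetical
  namespace: default
automountServiceAccountToken: false
---
# A Pod that needs a token for one specific consumer mounts a short-lived,
# audience-bound projected token instead of the default mount
apiVersion: v1
kind: Pod
metadata:
  name: token-client         # hypothetical
  namespace: default
spec:
  serviceAccountName: web-sa
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scoped-token
          mountPath: /var/run/secrets/tokens
          readOnly: true
  volumes:
    - name: scoped-token
      projected:
        sources:
          - serviceAccountToken:
              path: vault-token
              audience: vault          # hypothetical consumer
              expirationSeconds: 600   # minimum allowed value
```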

Key takeaway: for any Pod that does not call the K8s API, set automountServiceAccountToken: false at Pod level — the single highest-leverage SA hardening control. NodeRestriction is already on by default in kubeadm clusters; verify with kubectl -n kube-system get pod kube-apiserver-<node> -o yaml | grep enable-admission.
⚡ Mini-quiz
Practise SA hardening scenarios → quick quiz (5 questions).
Lesson 3.3 Kubernetes upgrades as a security practice

Patch upgrades fix CVEs. Skipping a minor version is unsupported. CKS may present a cluster running a known-vulnerable patch and ask you to walk it to the next safe version using kubeadm upgrade. The order matters and the commands are exact.

Key concepts
  • Why upgrades matter: K8s patch and minor releases regularly fix CVEs (kubelet privilege escalation, etcd information disclosure, apiserver SSRF). Add-ons such as ingress-nginx (IngressNightmare) need their own upgrades. Staying current is a hardening control, not maintenance.
  • Upgrade order: control plane first → workers second. Worker node version must be ≤ control plane version; never skip a minor version (1.28 → 1.29 → 1.30, not 1.28 → 1.30).
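  • Inspect available versions: apt-cache madison kubeadm (Debian/Ubuntu) or yum --showduplicates list kubeadm (RHEL). On the community pkgs.k8s.io repositories each minor version has its own repo, so point the package source at the v1.30 repo first; package versions look like 1.30.0-1.1 (the -00 suffix belonged to the retired apt.kubernetes.io repo).
  • Upgrade kubeadm itself: apt-mark unhold kubeadm && apt-get install -y kubeadm='1.30.0-*' && apt-mark hold kubeadm.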
  • Plan + apply: kubeadm upgrade plan (read the proposal); kubeadm upgrade apply v1.30.0 on the first control-plane node; kubeadm upgrade node on additional control-plane nodes and workers.
  • Drain + upgrade + uncordon: kubectl drain <node> --ignore-daemonsets --delete-emptydir-data; apt-mark unhold kubelet kubectl && apt-get install -y kubelet='1.30.0-*' kubectl='1.30.0-*' && apt-mark hold kubelet kubectl; systemctl daemon-reload && systemctl restart kubelet; kubectl uncordon <node>.
Concrete example

Task: upgrade a 1-control-plane + 1-worker cluster from 1.29.4 to 1.30.0. Control plane: apt-mark unhold kubeadm && apt-get install -y kubeadm='1.30.0-*' && apt-mark hold kubeadm → kubeadm upgrade plan → kubeadm upgrade apply v1.30.0. Drain the CP node (kubectl drain cp-1 --ignore-daemonsets) → upgrade kubelet/kubectl → restart kubelet → uncordon. Worker: upgrade the kubeadm package on the worker → kubeadm upgrade node → drain the worker (run kubectl from the control plane) → upgrade kubelet/kubectl → restart kubelet → uncordon. Verify: kubectl get nodes shows both at v1.30.0; kubectl get pods -n kube-system shows all control-plane pods Ready.

Key takeaway: upgrades are exam-tested because they are security. The recipe is the same every time: unhold + install kubeadm → upgrade plan → upgrade apply on CP1 → upgrade node on the rest → drain + kubelet upgrade + restart + uncordon, per node. One minor version at a time; control plane before workers.
⚡ Mini-quiz
Drill kubeadm upgrade-flow scenarios → study mode (10 questions).
04

System Hardening: AppArmor, Seccomp & Linux Capabilities · 3 lessons

System Hardening (domain 3, 15%) is where most CKS candidates lose points — the LSM and syscall-filter mechanics feel unfamiliar even to experienced operators. These three lessons cover AppArmor profiles, seccomp RuntimeDefault, Linux capability minimisation, and OS-level node hardening — the controls that stop a compromised container from becoming a compromised node.

apparmor seccomp capabilities securitycontext readonlyrootfs user-namespaces
~6h
📖 Read in-depth chapter
Lesson 4.1 AppArmor for containers

AppArmor is a Linux Security Module (LSM) that restricts what a process can do using path-based profiles. CKS tasks reward two skills: load a profile on the node and reference it correctly from a Pod spec. The 1.30 API change (from annotation to appArmorProfile) is a known gotcha.

Key concepts
  • AppArmor concepts: Application Armor is an LSM that restricts a process via profiles; an alternative to SELinux (per-distro choice). Used on Ubuntu/Debian K8s nodes by default.
  • Profile modes: enforce (block violations + log), complain (log only, do not block), disabled (no enforcement).
  • Profile location: profiles live on each node (e.g., /etc/apparmor.d/), not in K8s objects. Kubernetes only references them by name.
  • Inspect / load profiles: cat /sys/kernel/security/apparmor/profiles shows loaded profiles + mode. Load a new profile: apparmor_parser -q /etc/apparmor.d/my-profile.
  • Apply to a container (K8s 1.30+): in container securityContext: appArmorProfile: { type: Localhost, localhostProfile: <profile-name> }. Types: RuntimeDefault / Localhost / Unconfined.
  • Legacy syntax (pre-1.30): Pod annotation container.apparmor.security.beta.kubernetes.io/<container-name>: localhost/<profile-name>. Deprecated but still tested on older exam clusters.
  • Profile must exist on the scheduled node: otherwise the container fails to start and the Pod's events show an AppArmor profile-not-found error. In production, distribute profiles via a DaemonSet or the node provisioner.
Concrete example

Task: load profile k8s-deny-write on all nodes and apply it to container web in Pod nginx. On each node: apparmor_parser -q /etc/apparmor.d/k8s-deny-write; verify with grep k8s-deny-write /sys/kernel/security/apparmor/profiles. A running Pod's security settings cannot be edited in place, so export the Pod YAML, add appArmorProfile: { type: Localhost, localhostProfile: k8s-deny-write } under spec.containers[0].securityContext, and recreate it with kubectl replace --force -f. Verify: kubectl exec nginx -c web -- touch /tmp/test should fail with Permission denied. Profile loaded, applied, verified: all three states.
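The container side of this task might look like the Pod below on K8s 1.30+. A sketch: the busybox image and sleep command are placeholders so the touch test has something to run in, since a profile that denies all writes could stop a real nginx from starting. The commented annotation shows the pre-1.30 form of the same reference.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  # Pre-1.30 equivalent (legacy annotation):
  # annotations:
  #   container.apparmor.security.beta.kubernetes.io/web: localhost/k8s-deny-write
spec:
  containers:
    - name: web
      image: busybox:1.36          # placeholder image
      command: ["sleep", "3600"]
      securityContext:
        appArmorProfile:
          type: Localhost
          localhostProfile: k8s-deny-write   # must already be loaded on the node
```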

Key takeaway: the AppArmor exam recipe is: (1) apparmor_parser -q on the node, (2) reference via appArmorProfile.localhostProfile in the container securityContext, (3) verify with a forbidden action and check it is denied. Profiles missing on a node = Pod stuck. The 1.30+ securityContext path replaced the legacy annotation — know both, default to securityContext.
⚡ Mini-quiz
Drill AppArmor profile-loading + referencing → study mode (10 questions).
Lesson 4.2 Seccomp profiles & RuntimeDefault

Seccomp filters which syscalls a process can make. Unless a profile is set, Kubernetes runs containers Unconfined (the kubelet's opt-in seccompDefault setting, GA in 1.27, changes that); fix it with RuntimeDefault in one line. CKS often gives you a Pod with no seccomp set and asks for the minimum-change hardening; this lesson covers that one line.

Key concepts
  • Seccomp = secure computing mode: a kernel feature that filters syscalls per process. Reduces attack surface even when an attacker has code execution.
  • Three profile types in K8s: RuntimeDefault (the container runtime's default profile — recommended baseline), Localhost (custom JSON file on the node), Unconfined (no seccomp — avoid).
  • Pod-level vs container-level: spec.securityContext.seccompProfile.type: RuntimeDefault applies to every container; per-container override at spec.containers[].securityContext.seccompProfile.
  • Custom Localhost profile: JSON file at /var/lib/kubelet/seccomp/profiles/my-profile.json on every node; referenced via type: Localhost, localhostProfile: profiles/my-profile.json.
  • CIS 5.7.2: Ensure the seccomp profile is set to RuntimeDefault (or docker/default for Docker-era clusters). Without an explicit profile (or the kubelet's seccompDefault option) containers run Unconfined, so never rely on the default; always set it.
Concrete example

Task: Pod api in ns prod has no seccomp profile. Set RuntimeDefault at Pod level. Pod securityContext is immutable, so kubectl patch against the live Pod is rejected: export it with kubectl get pod api -n prod -o yaml > api.yaml, add spec.securityContext.seccompProfile.type: RuntimeDefault, and recreate with kubectl replace --force -f api.yaml (if the Pod belongs to a Deployment, patch the Pod template instead). Verify: kubectl get pod api -n prod -o jsonpath='{.spec.securityContext.seccompProfile.type}' returns RuntimeDefault; the container can no longer call syscalls the runtime profile blocks (kubectl exec api -n prod -- unshare -U should fail with Operation not permitted, if the image ships unshare).
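A Pod-level default plus a per-container override could be sketched like this. The api container inherits RuntimeDefault; the second container, sidecar, is hypothetical and shows a Localhost override, which only starts if profiles/my-profile.json exists under /var/lib/kubelet/seccomp/ on the scheduled node. Images and commands are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
  namespace: prod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault               # applies to every container below
  containers:
    - name: api
      image: busybox:1.36                # placeholder image
      command: ["sleep", "3600"]
    - name: sidecar                      # hypothetical second container
      image: busybox:1.36
      command: ["sleep", "3600"]
      securityContext:
        seccompProfile:
          type: Localhost                # container-level override wins
          localhostProfile: profiles/my-profile.json   # relative to /var/lib/kubelet/seccomp/
```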

Key takeaway: the seccomp exam-line is spec.securityContext.seccompProfile.type: RuntimeDefault — memorise the exact field path. Pod-level applies to all containers; container-level override only when one container needs a custom profile. Unconfined is the wrong answer in every CKS scenario.
⚡ Mini-quiz
Practise seccomp + custom profile scenarios → quick quiz (5 questions).
Lesson 4.3 Linux capabilities & node OS hardening

Container default capabilities are far too generous. The CKS-grade pattern is drop ALL then add back only what is needed. This lesson also covers the SecurityContext combo (no-priv-escalation + readOnlyRootFilesystem + runAsNonRoot) and OS-level kernel-module + user-namespace hardening.

Key concepts
  • Default container caps: a subset (~14) of the full Linux capability set — still includes CHOWN, DAC_OVERRIDE, NET_RAW, etc. Way more than most apps need.
  • Minimisation pattern: capabilities.drop: ["ALL"] then capabilities.add: ["NET_BIND_SERVICE"] (or whatever a specific app legitimately needs).
  • Capabilities to never add lightly: SYS_ADMIN (almost as bad as privileged: true), NET_ADMIN (full network config), SYS_PTRACE (trace other processes; with hostPID, host processes), DAC_READ_SEARCH (bypass file read permission checks), SYS_MODULE (load kernel modules).
  • SecurityContext combo: allowPrivilegeEscalation: false (sets no_new_privs — blocks setuid escalation), readOnlyRootFilesystem: true (immutable rootfs — any write is attacker activity), runAsNonRoot: true + runAsUser: 1000 (explicit UID), and never privileged: true.
  • readOnlyRootFilesystem caveat: apps needing temp space need an emptyDir volume mounted at /tmp (and possibly /var/run, /var/cache) — otherwise they CrashLoop.
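  • OS-level kernel hardening: blacklist rarely used kernel modules with an exploit history, such as dccp, sctp, rds and tipc. Add blacklist dccp to /etc/modprobe.d/kubernetes.conf. Then unload the module if it is already loaded (modprobe -r dccp) or reboot. Afterwards, lsmod | grep dccp should print nothing.
  • User namespaces: spec.hostUsers: false maps container UID 0 to an unprivileged host UID, so even root inside the container is unprivileged outside. The feature is beta from K8s 1.30 behind the UserNamespacesSupport feature gate, which is on by default from 1.33. It also needs support in the container runtime and kernel. Verify from inside the Pod: kubectl exec <pod> -- cat /proc/self/uid_map shows container UID 0 mapped to a high host UID instead of the identity mapping 0 0 4294967295.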
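To check what capabilities.drop: ["ALL"] actually did, read the capability bitmasks the kernel exposes in /proc/<pid>/status (CapBnd, CapEff, CapPrm) from inside the container. For a non-root user CapEff is usually zero anyway, so CapBnd is the line that reflects drop/add. The sketch below decodes such a hex mask using the capability bit numbers from linux/capability.h. capsh --decode=<mask> does the same job when the image ships libcap tools. The mask 00000000a80425fb corresponds to the 14-capability default set described above.

```python
# Capability bit numbers 0..40 from include/uapi/linux/capability.h.
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]

def decode_caps(hex_mask: str) -> list:
    """Names of the capabilities set in a CapBnd/CapEff hex mask."""
    value = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (value >> bit) & 1]

def encode_caps(names) -> str:
    """Inverse of decode_caps: 16-hex-digit mask in the /proc/<pid>/status format."""
    value = 0
    for name in names:
        value |= 1 << CAP_NAMES.index(name)
    return format(value, "016x")

default_set = decode_caps("00000000a80425fb")
print(len(default_set), default_set)
print(encode_caps(["NET_BIND_SERVICE"]))  # expected CapBnd after drop ALL + add NET_BIND_SERVICE
```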
Concrete example

Task: harden Pod web in a single securityContext block. It must drop all capabilities, run as non-root with a read-only root filesystem, use the default seccomp profile and block privilege escalation. These fields cannot be changed on a running Pod, so edit the manifest (or the owning Deployment's template) and recreate the Pod. Add to each container securityContext: { allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, runAsUser: 1000, capabilities: { drop: ["ALL"] }, seccompProfile: { type: RuntimeDefault } }, plus an emptyDir mount at /tmp if the app writes temp files. Verify:
  • kubectl exec web -- id shows uid=1000.
  • kubectl exec web -- touch /test fails (read-only rootfs).
  • kubectl exec web -- grep CapBnd /proc/1/status shows 0000000000000000 (every capability dropped from the bounding set).
One securityContext, five hardening controls.
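The same hardening block can be kept as a reusable template. The minimal sketch below builds the Pod as a Python dict and prints it as JSON. kubectl apply -f accepts JSON just as it accepts YAML, since JSON is valid YAML. The image web:1.0 is a placeholder. The /tmp emptyDir is the usual escape hatch; add further emptyDir mounts (for example /var/cache) only if the app needs them.

```python
import json

def hardened_pod(name: str, image: str, uid: int = 1000) -> dict:
    """Pod manifest carrying the full CKS securityContext block (sketch)."""
    container_sc = {
        "allowPrivilegeEscalation": False,          # sets no_new_privs, blocks setuid escalation
        "readOnlyRootFilesystem": True,             # immutable rootfs
        "runAsNonRoot": True,                       # kubelet refuses to start the container as UID 0
        "runAsUser": uid,                           # explicit non-root UID
        "capabilities": {"drop": ["ALL"]},          # add back only what the app provably needs
        "seccompProfile": {"type": "RuntimeDefault"},
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "securityContext": container_sc,
                # Writable scratch space so the read-only rootfs does not CrashLoop the app.
                "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}],
            }],
            "volumes": [{"name": "tmp", "emptyDir": {}}],
        },
    }

if __name__ == "__main__":
    print(json.dumps(hardened_pod("web", "web:1.0"), indent=2))
```

Usage: python3 hardened_pod.py > web.json && kubectl apply -f web.json (the file names are only examples).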

Key takeaway: the CKS securityContext "full block" is muscle memory — allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ["ALL"], seccompProfile.type: RuntimeDefault. Add only the capabilities the app provably needs. Remember the emptyDir /tmp escape hatch when read-only root breaks an app.
⚡ Mini-quiz
Drill capability-minimisation + securityContext scenarios → study mode (10 questions).
05

Minimize Microservice Vulnerabilities: PSA, Secrets & Isolation · 3 lessons

Domain 4 (20%) is tied for the largest weight with Supply Chain Security and Monitoring, Logging & Runtime Security. Its three controls, one per lesson:
  • Pod Security Admission replaces the removed PodSecurityPolicy for namespace-level policy.
  • Secrets need encryption at rest and volume mounts (not env vars).
  • RuntimeClass + gVisor/Kata give strong isolation for untrusted workloads.
Each of them is a frequent CKS task.

pod-security-admission opa-gatekeeper secrets-encryption runtimeclass gvisor kata-containers
~6h
Lesson 5.1 Pod Security Admission & OPA Gatekeeper

Pod Security Admission is the K8s-native, namespace-level replacement for PodSecurityPolicy (deprecated in 1.21, removed in 1.25). OPA Gatekeeper handles custom rules PSA cannot express. CKS tasks usually ask you to label a namespace with the right level in the right mode, then verify that violating Pods are rejected.

Key concepts
  • The three Pod Security Standards: Privileged (no restrictions — for system / infra), Baseline (blocks known privilege escalation — no privileged, no hostPath, no hostPID/hostNetwork, limited caps), Restricted (heavily hardened — drop ALL caps, runAsNonRoot, allowPrivilegeEscalation false, seccomp required).
  • Apply at namespace level: kubectl label ns production pod-security.kubernetes.io/enforce=restricted. Labels: enforce, warn, audit.
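  • The three modes:
    • enforce rejects the API request, so the Pod is never created.
    • warn allows the request but returns a warning that kubectl prints.
    • audit allows the request and records the violation in the audit log.
    enforce checks only Pods, while warn and audit also evaluate workload resources such as Deployments, so problems surface before any Pod is created.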
  • Version pinning: pod-security.kubernetes.io/enforce-version: v1.30 locks the policy semantics to a K8s version (important when upgrading).
  • Safe rollout order: audit first → read audit log for violations → warn → fix workloads → enforce. Never go straight to enforce in prod.
  • OPA Gatekeeper for custom rules: PSA covers Pod security only. For anything else (block specific registries, require labels, enforce naming), use Gatekeeper — two CRDs: ConstraintTemplate (Rego logic) + Constraint (activates the template with parameters).
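  • Rego violation pattern: violation[{"msg": msg}] { container := input.review.object.spec.containers[_]; not startswith(container.image, "registry.example.com/"); msg := "image not from approved registry" }. Bind each container first. Writing not startswith(input.review.object.spec.containers[_].image, …) fires only when no container at all matches. A Pod mixing one approved and one unapproved image would then slip through.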
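The difference between the three PSA modes is easiest to see as data. The sketch below is a simplified model, not the real admission plugin. It takes a namespace's pod-security.kubernetes.io/* labels and the strictest level a Pod still satisfies; in the real controller, that level comes from checking the Pod against each Pod Security Standard. It reports whether the Pod is admitted, whether the client sees a warning, and whether an audit annotation is recorded. A missing label is treated as privileged, the built-in default, which nothing violates.

```python
LEVELS = ["privileged", "baseline", "restricted"]  # least to most restrictive
PREFIX = "pod-security.kubernetes.io/"

def psa_outcome(ns_labels: dict, pod_max_level: str) -> dict:
    """Toy model of PSA enforce/warn/audit outcomes for one Pod creation.

    pod_max_level is the strictest standard the Pod still satisfies,
    e.g. "privileged" for a Pod with privileged: true.
    """
    pod_rank = LEVELS.index(pod_max_level)

    def violates(mode: str) -> bool:
        # Missing label: fall back to "privileged", which no Pod violates.
        level = ns_labels.get(PREFIX + mode, "privileged")
        return LEVELS.index(level) > pod_rank

    return {
        "admitted": not violates("enforce"),
        "warning": violates("warn"),
        "audit_annotation": violates("audit"),
    }

phase1 = {PREFIX + "audit": "restricted", PREFIX + "audit-version": "v1.30"}
phase3 = {**phase1, PREFIX + "warn": "restricted", PREFIX + "enforce": "restricted"}
print(psa_outcome(phase1, "privileged"))   # admitted, but logged for audit
print(psa_outcome(phase3, "privileged"))   # rejected
print(psa_outcome(phase3, "restricted"))   # compliant Pod: admitted, no warning
```

This is why audit comes first. During phase 1 a privileged Pod still runs, but the violation is on record.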
Concrete example

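Task: enforce Restricted on namespace prod in audit-then-enforce style.
  • Phase 1 (audit): kubectl label ns prod pod-security.kubernetes.io/audit=restricted pod-security.kubernetes.io/audit-version=v1.30. Wait, then search the apiserver audit log for events carrying the pod-security.kubernetes.io/audit-violations annotation. This requires audit logging to be enabled.
  • Phase 2 (warn): add pod-security.kubernetes.io/warn=restricted.
  • Phase 3 (enforce): first, kubectl label --dry-run=server --overwrite ns prod pod-security.kubernetes.io/enforce=restricted lists the existing Pods that would violate. Then add pod-security.kubernetes.io/enforce=restricted for real. Running Pods are not evicted; only new Pods are checked.
Verify: kubectl apply -f priv-pod.yaml -n prod with a privileged Pod is rejected with violates PodSecurity "restricted:latest": …. The version suffix follows the enforce-version label and is latest if that label is unset. Three labels, three rollout phases, every workload kept running.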

Key takeaway: audit → warn → enforce is the only safe PSA rollout. Memorise the label keys (pod-security.kubernetes.io/enforce, /warn, /audit) and the three levels (Privileged, Baseline, Restricted). For policies PSA cannot express, OPA Gatekeeper with ConstraintTemplate + Constraint is the standard answer.
⚡ Mini-quiz
Drill PSA labels + Gatekeeper Constraint scenarios → study mode (10 questions).
Lesson 5.2 Secrets security: at rest, in transit, access patterns

By default, Kubernetes Secrets are stored unencrypted in etcd. The base64 you see in the API is an encoding, not encryption, and is trivially reversed. The layered fix is encryption at rest + volume mounts (not env vars) + an external secrets manager when stakes are high. CKS tasks usually expect you to write the EncryptionConfiguration recipe from memory.

Key concepts
  • Why defaults are weak: Secrets sit in etcd unencrypted, so anyone with etcd access (or an etcd backup) reads them. Env vars leak further: via /proc/<pid>/environ, into every child process, and into crash dumps or logs of apps that print their environment.
  • Fix 1 — volume mounts: mount Secrets as files instead of injecting as env vars. File-mode 0400, owner root or app UID. Volume-mounted ConfigMaps + Secrets also hot-update when the source changes; env vars do not.
  • Fix 2, encryption at rest:
    • Create /etc/kubernetes/enc/encryption.yaml with provider aescbc (what most CKS tasks use) or secretbox. The upstream docs rate secretbox stronger, and KMS v2 is the production recommendation.
    • Add --encryption-provider-config=/etc/kubernetes/enc/encryption.yaml to the apiserver manifest.
    • Re-encrypt existing Secrets with kubectl get secrets -A -o json | kubectl replace -f -.
    • Verify by reading etcd directly: the value should start with k8s:enc:aescbc:v1:.
  • Fix 3, external manager: Vault Agent Injector, or the Secrets Store CSI driver with a provider such as AWS Secrets Manager. Unless you enable the driver's optional sync to Kubernetes Secrets, the secret material never lands in etcd.
  • EncryptionConfiguration providers in order: the first provider listed under resources[].providers is the one used to encrypt new writes; all listed providers are tried for decrypt. Always keep identity last during the migration window. That way Secrets written before encryption was enabled stay readable until they are re-encrypted.
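Fix 1 can be sketched as a manifest. The Pod below mounts a Secret as read-only files and never references it from env. All names are hypothetical: Pod app, image app:1.0, Secret db-creds, mount path /etc/creds. One detail: defaultMode is an integer. YAML manifests can write the octal literal 0400, but JSON has no octal syntax, so a JSON manifest must carry the decimal 256. Python's 0o400 serialises to exactly that.

```python
import json

def pod_with_secret_files(secret_name: str, mount_path: str = "/etc/creds") -> dict:
    """Pod that consumes a Secret only as files (sketch, hypothetical names)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app"},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "app:1.0",  # hypothetical image
                # No env/envFrom referencing the Secret, so values stay out of /proc/<pid>/environ.
                "volumeMounts": [{"name": "creds", "mountPath": mount_path, "readOnly": True}],
            }],
            "volumes": [{
                "name": "creds",
                # 0o400 == 256: owner read-only; JSON output shows the decimal value.
                "secret": {"secretName": secret_name, "defaultMode": 0o400},
            }],
        },
    }

manifest = pod_with_secret_files("db-creds")
print(json.dumps(manifest, indent=2))
```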
Concrete example

Task: enable encryption at rest for Secrets on a kubeadm cluster.
  • Write /etc/kubernetes/enc/encryption.yaml with apiVersion: apiserver.config.k8s.io/v1, kind: EncryptionConfiguration and resources: [{ resources: [secrets], providers: [{ aescbc: { keys: [{ name: key1, secret: <32-byte-b64> }] } }, { identity: {} }] }].
  • Mount it into the apiserver: in /etc/kubernetes/manifests/kube-apiserver.yaml add a hostPath volume + volumeMount for /etc/kubernetes/enc and the flag - --encryption-provider-config=/etc/kubernetes/enc/encryption.yaml. The kubelet restarts the static Pod when the manifest changes.
  • Re-encrypt: kubectl get secrets -A -o json | kubectl replace -f -.
  • Verify on etcd: ETCDCTL_API=3 etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key get /registry/secrets/default/my-secret | hexdump -C shows the k8s:enc:aescbc:v1:key1:… prefix.
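The EncryptionConfiguration can be generated rather than typed. This sketch does three things:
  • Creates a random 32-byte key, the same thing head -c 32 /dev/urandom | base64 produces.
  • Emits the aescbc-then-identity provider list as JSON, which is valid YAML and can be saved as /etc/kubernetes/enc/encryption.yaml.
  • Adds a helper that checks a raw etcd value for the k8s:enc:aescbc:v1:<key name>: envelope prefix.
The key name key1 mirrors the example above; everything else is an illustrative choice.

```python
import base64
import json
import secrets

def encryption_config(key_name: str = "key1") -> dict:
    """EncryptionConfiguration with a fresh aescbc key and identity as read fallback."""
    key = base64.b64encode(secrets.token_bytes(32)).decode()  # 32 random bytes, base64
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # First provider: used to encrypt every new write.
                {"aescbc": {"keys": [{"name": key_name, "secret": key}]}},
                # Last provider: keeps Secrets written before encryption readable.
                {"identity": {}},
            ],
        }],
    }

def is_encrypted(etcd_value: bytes, key_name: str = "key1") -> bool:
    """True if a raw etcd value carries the aescbc envelope prefix for key_name."""
    return etcd_value.startswith(f"k8s:enc:aescbc:v1:{key_name}:".encode())

if __name__ == "__main__":
    print(json.dumps(encryption_config(), indent=2))
```

Treat the generated file like a private key. Anyone who can read it can decrypt every Secret in etcd.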

Key takeaway: the CKS Secrets recipe is (1) EncryptionConfiguration with aescbc + identity as fallback, (2) mount-into-apiserver via hostPath + the --encryption-provider-config flag, (3) re-encrypt with get-pipe-replace, (4) verify the etcd value starts with k8s:enc:aescbc:v1:. Always prefer volume mounts over env vars — env vars leak everywhere.
⚡ Mini-quiz
Practise Secret encryption + access-pattern scenarios → quick quiz (5 questions).
Lesson 5.3 Container isolation: gVisor, Kata & RuntimeClass

Default containers share the host kernel and are separated only by Linux namespaces + cgroups, so an exploitable kernel bug can let a container escape to the node. gVisor and Kata Containers raise the isolation bar (userspace syscall sandbox / per-Pod micro-VM). The K8s glue is RuntimeClass: a small object that maps a name Pods can reference to a runtime handler configured in each node's container runtime.

Key concepts
  • Default runtime (containerd + runc): Linux namespaces + cgroups + seccomp + AppArmor — fast, but shares the host kernel; a kernel CVE breaks the boundary.
  • gVisor (runsc): intercepts syscalls and handles them in a userspace kernel written in Go (the Sentry), so the host kernel never sees most application syscalls directly. Tradeoff: noticeable overhead for syscall-heavy and I/O-heavy apps.
  • Kata Containers: each Pod runs in a lightweight VM with its own kernel — strongest isolation. Higher memory overhead than gVisor.
  • RuntimeClass shape: apiVersion: node.k8s.io/v1; kind: RuntimeClass; metadata: {name: gvisor}; handler: runsc. The handler must match what containerd is configured for on the node.
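  • Use a RuntimeClass: set spec.runtimeClassName: gvisor on the Pod. The scheduler does not check which nodes actually have the handler, so on a mixed cluster a Pod placed on the wrong node fails to start. Add scheduling.nodeSelector to the RuntimeClass so Pods only land on nodes where the runtime is configured.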
  • Verification reflex: kubectl exec <pod> -- dmesg or uname -r inside the Pod shows the sandbox kernel string (gVisor) or a different kernel version (Kata) vs the host.
  • Auto-apply RuntimeClass: an admission webhook (Kyverno mutate or OPA mutating) can inject runtimeClassName for specific namespaces — e.g., all Pods in untrusted namespace land on gVisor automatically.
Concrete example

Task: create a RuntimeClass gvisor and ensure Pod untrusted-app uses it.
  • Apply the RuntimeClass: kubectl apply -f - with the four-field spec.
  • Set spec.runtimeClassName: gvisor in the Pod template. The field cannot be changed on a running Pod, so a bare Pod must be deleted and recreated.
  • Verify: kubectl exec untrusted-app -- dmesg shows the gVisor banner ("Starting gVisor…"), or uname -r shows a kernel version that doesn't match the host's.
The K8s side is two YAMLs. The node side (runsc binary + containerd handler config) is pre-installed on the exam cluster.
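Both Kubernetes-side objects fit in a few lines. The sketch prints a RuntimeClass with handler runsc plus a Pod that references it, wrapped in a v1 List so one kubectl apply -f creates both. The node label sandbox: gvisor is hypothetical. It is included because the scheduler otherwise has no way to know which nodes have runsc configured; drop it if every node has the handler. nginx:1.25 is only an example image.

```python
import json

runtime_class = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "gvisor"},
    "handler": "runsc",  # must equal the runtime name configured in containerd on the node
    # Optional: merged into the Pod's nodeSelector at admission, so Pods only land on
    # nodes carrying this (hypothetical) label.
    "scheduling": {"nodeSelector": {"sandbox": "gvisor"}},
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "untrusted-app"},
    "spec": {
        "runtimeClassName": "gvisor",  # immutable: set it before the Pod is created
        "containers": [{"name": "app", "image": "nginx:1.25"}],
    },
}

# kubectl accepts a v1 List, so both objects can go into a single file.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [runtime_class, pod]}, indent=2))
```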

Key takeaway: the CKS RuntimeClass task is create the RuntimeClass object + reference it via spec.runtimeClassName — never installing gVisor itself. Verify with dmesg / uname -r from inside the Pod. Use gVisor or Kata for multi-tenant or untrusted workloads, never as a blanket default — perf hit is real.
⚡ Mini-quiz
Drill RuntimeClass + gVisor / Kata scenarios → study mode (10 questions).
06

Supply Chain Security: Scanning, Signing & Admission Control · 3 lessons

Domain 5 (20%). A compromised base image or dependency arrives with your own code rather than through an exposed port, so the main defenses sit before the Pod starts. Three layers decide whether a malicious image ever runs: Trivy for CVE + IaC scanning, Cosign for image signing, and an ImagePolicyWebhook / OPA Constraint for admission control. Heavy on CLI muscle memory.

trivy kubesec cosign distroless imagepolicywebhook opa-registry-policy
~6h
Lesson 6.1 Image scanning with Trivy & kubesec

Trivy is the de facto CKS scanner — single binary, scans images, filesystems, IaC, K8s manifests, full cluster. kubesec scans a manifest for risky securityContext settings. Both surface in CI gates and admission webhooks; on the exam, you'll run them at the command line and interpret the output.

Key concepts
  • Scan an image: trivy image nginx:1.25. Severity filter: trivy image --severity HIGH,CRITICAL nginx:1.25. CI gate: trivy image --severity CRITICAL --exit-code 1 nginx:1.25 (non-zero exit = fail the build).
  • Scan a running cluster: trivy k8s --severity HIGH,CRITICAL --report all cluster. Audits all workloads in one pass.
  • Generate an SBOM: trivy image --format cyclonedx nginx:1.25 > sbom.json. SBOM ⇒ verifiable inventory for compliance.
  • Scan IaC + manifests: trivy config ./k8s-manifests/ reports YAML/HCL misconfigurations (privileged, no resource limits, hostPath, etc.) using Trivy's built-in misconfiguration checks.
  • kubesec for static analysis: kubesec scan pod.yaml returns a numerical score + advisories on missing securityContext settings, privileged flag, hostPath, missing runAsNonRoot, capability over-grants.
  • Three layers of scanning:
    • CI gate: scan before push and fail the build, so the image never enters the registry.
    • Admission webhook: scan during Pod admission, as the image enters the cluster.
    • Periodic cluster scan: trivy k8s on a schedule, or the Trivy Operator (the successor to Starboard) scanning continuously in-cluster.
Concrete example

Task: report all CRITICAL CVEs in image app:v1 and fail the build if any are found. Run: trivy image --severity CRITICAL --exit-code 1 app:v1 — output lists each CVE with package, fixed-in, severity. CI sees exit code 1 and stops the pipeline. Companion check: kubesec scan pod.yaml on the deployment manifest catches missing runAsNonRoot + missing readOnlyRootFilesystem — both before the image ever ships. CVE gate + manifest gate, both before merge.
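The --exit-code gate above is all a pipeline strictly needs. When you also want a per-severity summary in the build log, the JSON report can be post-processed. The sketch below assumes Trivy's JSON report layout: a top-level Results list whose entries carry an optional Vulnerabilities list with Severity fields. It also assumes the report was produced with something like trivy image --format json --output report.json app:v1. Mirroring --severity CRITICAL --exit-code 1, it returns 1 when any finding has a failing severity.

```python
import json
import sys
from collections import Counter

def summarize(report: dict) -> Counter:
    """Count findings per severity in a Trivy JSON report."""
    counts = Counter()
    for result in report.get("Results") or []:
        # Vulnerabilities is absent or null for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts

def gate(report: dict, fail_on=("CRITICAL",)) -> int:
    """Exit code for CI: 1 if any finding has a failing severity, else 0."""
    counts = summarize(report)
    return 1 if any(counts[sev] for sev in fail_on) else 0

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 trivy_gate.py report.json   (hypothetical script name)
    with open(sys.argv[1]) as fh:
        report = json.load(fh)
    print(dict(summarize(report)))
    sys.exit(gate(report))
```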

Key takeaway: Trivy CLI flags to know cold — --severity HIGH,CRITICAL, --exit-code 1, --format cyclonedx (SBOM), --report all. Three layers: CI gate, admission webhook, periodic cluster scan — defense in depth, not "scan once and forget". kubesec is the manifest counterpart to Trivy on images.
⚡ Mini-quiz
Practise Trivy CLI + kubesec scenarios → study mode (10 questions).
Lesson 6.2 Image signing with Cosign & minimal-base images

Cosign (from the Sigstore project) is a widely adopted open-source image signer; the workflow is sign → verify → admission-gate. Distroless and FROM scratch images shrink the attack surface: no shell and no package manager. Even with code execution, an attacker has far fewer tools to pivot with.

Key concepts
  • Cosign basics: cosign generate-key-pair produces cosign.key + cosign.pub. Keep the private key in CI secrets only.
  • Sign + verify: cosign sign --key cosign.key registry.example.com/myapp:v1.0; cosign verify --key cosign.pub registry.example.com/myapp:v1.0. Signatures are OCI artifacts stored next to the image in the registry.
  • Keyless (production): cosign sign <image> with no --key signs with an ephemeral key. Fulcio issues a short-lived certificate binding that key to an OIDC identity (GitHub Actions, GitLab CI), and the signature is recorded in the Rekor transparency log.
  • Distroless images: gcr.io/distroless/static + gcr.io/distroless/base. No shell (bash, sh), no package manager — attacker with code-exec can't bash -i or apt install.
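  • Multi-stage Dockerfile recipe: stage 1 FROM golang:1.22 AS builder compiles with CGO_ENABLED=0 so the binary is fully static; stage 2 FROM gcr.io/distroless/static copies only the binary. Final image: the binary plus CA certificates, tzdata and a minimal /etc/passwd, with no libc and no shell.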
  • FROM scratch: zero-OS base for fully static binaries (Go, Rust with musl). Smallest possible attack surface — but no nslookup, no cat, no debugging tools either.
Concrete example

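Task: sign image registry.example.com/myapp:v1 and gate cluster admission on a valid signature.
  • Sign in CI: run cosign generate-key-pair once, store cosign.key and its password in CI secrets, and commit cosign.pub. After the image push, run cosign sign --key $COSIGN_KEY registry.example.com/myapp:v1, where $COSIGN_KEY is the key file path. Cosign also accepts env://VAR to read the key from an environment variable.
  • Verify locally: cosign verify --key cosign.pub registry.example.com/myapp:v1 prints the verified signature payload.
  • Cluster gate: install the Sigstore policy-controller and label the target namespaces policy.sigstore.dev/include=true, because the controller only enforces where opted in. Then create a ClusterImagePolicy matching registry.example.com/* with cosign.pub. Unsigned images are rejected at admission.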

Key takeaway: the signing flow is generate-key-pair → sign with private key → verify with public key → gate admission on verify. Cosign keys: private signs, public verifies, the same model as SSH keys. For minimal images, prefer distroless over Alpine (Alpine still ships a shell, musl and apk); prefer FROM scratch for static binaries.
⚡ Mini-quiz
Practise Cosign signing + distroless scenarios → quick quiz (5 questions).
Lesson 6.3 Admission control for supply chain

Scanning and signing are pointless if unsigned, unscanned images can still land in the cluster. Admission control (ImagePolicyWebhook, OPA Gatekeeper, Kyverno) is the gate. CKS often asks for an OPA Constraint that blocks non-approved registries; the Rego pattern is short and worth memorising.

Key concepts
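  • ImagePolicyWebhook: a built-in K8s admission plugin specifically for image policy. Enable it in three steps:
    • Append it to the existing plugin list, e.g. --enable-admission-plugins=NodeRestriction,ImagePolicyWebhook.
    • Point --admission-control-config-file at an AdmissionConfiguration whose ImagePolicyWebhook section references a kubeconfig containing the webhook server URL.
    • Mount both files into the apiserver static Pod.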
  • defaultAllow flag: defaultAllow: false = fail-closed (safe — deny when the webhook is down); defaultAllow: true = fail-open (unsafe — allows everything if webhook unreachable). CKS almost always wants fail-closed.
  • OPA Gatekeeper for registry policy: ConstraintTemplate with Rego that checks image prefix, then a Constraint applying it to Pod kinds with a parameter for the allowed registry list.
  • Rego registry-allowlist snippet: violation[{"msg": msg}] { container := input.review.object.spec.containers[_]; not startswith(container.image, input.parameters.allowedRegistry); msg := sprintf("image %v not from approved registry", [container.image]) }.
  • Validating vs mutating webhooks: validating webhooks can reject (no modification); mutating webhooks can modify (inject sidecars, runtimeClassName, labels). Execution order: all mutating → all validating.
  • failurePolicy: Fail = webhook outage rejects the request (safe); Ignore = request proceeds (unsafe for security-critical webhooks). Webhooks that enforce security policy should use Fail.
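A fail-closed ImagePolicyWebhook setup needs an AdmissionConfiguration file, passed with --admission-control-config-file. The sketch below emits one as JSON, which is valid YAML. The kubeconfig path under /etc/kubernetes/admission/ is hypothetical. That kubeconfig (not shown) holds the webhook server URL and credentials. The TTL and backoff numbers are example values.

```python
import json

admission_config = {
    "apiVersion": "apiserver.config.k8s.io/v1",
    "kind": "AdmissionConfiguration",
    "plugins": [{
        "name": "ImagePolicyWebhook",
        "configuration": {
            "imagePolicy": {
                # kubeconfig whose cluster server is the image-policy webhook (hypothetical path).
                "kubeConfigFile": "/etc/kubernetes/admission/imagepolicy-kubeconfig.yaml",
                "allowTTL": 50,         # seconds to cache an allow decision
                "denyTTL": 50,          # seconds to cache a deny decision
                "retryBackoff": 500,    # milliseconds between retries
                "defaultAllow": False,  # fail-closed: deny when the webhook is unreachable
            }
        },
    }],
}

print(json.dumps(admission_config, indent=2))
```

The apiserver then needs ImagePolicyWebhook in --enable-admission-plugins, the --admission-control-config-file flag pointing at this file, and a hostPath mount for the directory holding both files.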
Concrete example

Task: deny any Pod whose container image is not from registry.example.com/. Install Gatekeeper (assume done on the exam cluster). Apply a ConstraintTemplate with the Rego snippet above + a K8sAllowedRegistry Constraint with parameters.allowedRegistry: registry.example.com/ and match: { kinds: [{ apiGroups: [""], kinds: [Pod] }] }. Verify: kubectl apply -f bad-pod.yaml with image docker.io/nginx is rejected with the Gatekeeper message; kubectl apply -f good-pod.yaml with image registry.example.com/web:1 succeeds.
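The two Gatekeeper objects for this task are sketched below as Python dicts. The Rego is the registry-allowlist snippet from the key concepts, and the constraint name prod-registry-only is hypothetical. One detail that often trips people up: the ConstraintTemplate's metadata.name must be the lowercase form of the kind it declares. The Constraint's kind must match that declared kind.

```python
import json

REGO = """package k8sallowedregistry

violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  not startswith(container.image, input.parameters.allowedRegistry)
  msg := sprintf("image %v not from approved registry", [container.image])
}
"""

KIND = "K8sAllowedRegistry"

template = {
    "apiVersion": "templates.gatekeeper.sh/v1",
    "kind": "ConstraintTemplate",
    "metadata": {"name": KIND.lower()},  # must be the lowercase of the declared kind
    "spec": {
        "crd": {"spec": {
            "names": {"kind": KIND},
            # Schema for the Constraint's spec.parameters.
            "validation": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"allowedRegistry": {"type": "string"}},
            }},
        }},
        "targets": [{"target": "admission.k8s.gatekeeper.sh", "rego": REGO}],
    },
}

constraint = {
    "apiVersion": "constraints.gatekeeper.sh/v1beta1",
    "kind": KIND,
    "metadata": {"name": "prod-registry-only"},
    "spec": {
        "match": {"kinds": [{"apiGroups": [""], "kinds": ["Pod"]}]},
        "parameters": {"allowedRegistry": "registry.example.com/"},
    },
}

print(json.dumps(template, indent=2))
print(json.dumps(constraint, indent=2))
```

Write each object to its own file and apply the ConstraintTemplate first. The Constraint is only accepted once Gatekeeper has created the K8sAllowedRegistry CRD from the template.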

Key takeaway: supply-chain admission has three options — ImagePolicyWebhook (built-in, custom webhook server), OPA Gatekeeper (ConstraintTemplate + Constraint with Rego), Kyverno (declarative ClusterPolicy). For CKS, know the Gatekeeper Rego pattern by heart: violation[{"msg": msg}] { … } reading input.review.object.spec.containers[_].image. Always fail-closed.
⚡ Mini-quiz
Drill OPA Constraint + ImagePolicyWebhook scenarios → study mode (10 questions).
07

Runtime Security: Falco, Audit Logs & Behavioral Analysis · 3 lessons

Domain 6 (20%). Falco watches kernel syscalls in real time; Kubernetes audit logs capture every API call; immutable infrastructure means any filesystem change is signal, not noise. Together they detect a compromise and bound its blast radius. CKS attempts commonly include both a Falco rule and an audit policy YAML.

falco falco-rules audit-policy audit-logs immutable forensics
~6h
Lesson 7.1 Falco: runtime threat detection

Falco (CNCF graduated) watches kernel syscalls via eBPF or a kernel module and fires alerts when something matches a rule. CKS uses Falco for "detect a shell in a container", "detect a read of /etc/shadow", or "detect a write under /etc". Knowing the rule schema + the field names is non-negotiable.

Key concepts
  • Falco architecture: driver (eBPF probe or kernel module) → rule engine → output sink (stdout, file, gRPC, HTTP, falcosidekick for Slack/SIEM).
  • Config files: /etc/falco/falco.yaml (engine config), /etc/falco/falco_rules.yaml (built-in rules — don't edit), /etc/falco/falco_rules.local.yaml (custom rules — edit here, survives upgrades).
  • Rule fields: rule (name), desc, condition (the match expression — Falco macros + fields), output (alert template with field interpolation), priority (EMERGENCY → DEBUG).
  • Common macros: container (event is in a container), spawned_process (new process exec'd), open_read (file opened for read), open_write (file opened for write).
  • Common fields: fd.name (file path), proc.name (process name), user.name, container.name, k8s.pod.name, fd.directory, evt.type.
  • Iconic CKS rule conditions:
    • Shell in container: spawned_process and container and proc.name in (sh, bash, zsh, dash, ash).
    • Sensitive read: open_read and container and fd.name in (/etc/shadow, /etc/passwd).
    • Write to /etc: open_write and container and fd.directory=/etc.
    • Env leak: open_read and container and fd.name=/proc/1/environ.
  • Output interpolation: output: "Shell in container (user=%user.name container=%container.name pod=%k8s.pod.name command=%proc.cmdline)" — every %field is replaced at alert time.
Concrete example

Task: write a Falco rule that fires when anyone reads /etc/shadow in any container, and verify it triggers. Whole loop < 5 minutes.
  • Edit /etc/falco/falco_rules.local.yaml and add - rule: Read shadow file with desc: detect access to /etc/shadow, condition: open_read and container and fd.name=/etc/shadow, output: "Read shadow detected (pod=%k8s.pod.name container=%container.name proc=%proc.name)", priority: WARNING.
  • Restart: systemctl restart falco.
  • Trigger: kubectl exec <pod> -- cat /etc/shadow from a container that runs as root, the default in most images. The open_read macro only matches successful opens, so a permission-denied read by a non-root user does not fire this rule.
  • Verify: journalctl -u falco -n 20 --no-pager shows the WARNING line with your output template.
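The rule's condition is plain boolean logic once the macros are expanded. The sketch below is a toy evaluator, not Falco's engine. It hand-codes open_read and container roughly as the stock falco_rules.yaml defines them: an open/openat/openat2 read of a regular file with fd.num >= 0, and container.id != host. The event dicts are made up, using Falco field names as keys. Note the fd.num term: only opens that returned a file descriptor match.

```python
# Toy model, not Falco's engine: macros expanded roughly as falco_rules.yaml defines them.
def open_read(e: dict) -> bool:
    return (e["evt.type"] in ("open", "openat", "openat2")
            and e["evt.is_open_read"]
            and e["fd.typechar"] == "f"   # regular file
            and e["fd.num"] >= 0)         # failed opens carry no valid descriptor

def container(e: dict) -> bool:
    return e["container.id"] != "host"

def read_shadow(e: dict) -> bool:
    # condition: open_read and container and fd.name=/etc/shadow
    return open_read(e) and container(e) and e["fd.name"] == "/etc/shadow"

base = {"evt.type": "openat", "evt.is_open_read": True, "fd.typechar": "f",
        "fd.name": "/etc/shadow", "container.id": "3f2a9c1b7d4e"}
as_root = {**base, "fd.num": 3}             # cat as root: open succeeds
as_uid_1000 = {**base, "fd.num": -1}        # permission denied: no descriptor
on_host = {**as_root, "container.id": "host"}

for name, event in [("root in container", as_root),
                    ("uid 1000 in container", as_uid_1000),
                    ("root on host", on_host)]:
    print(f"{name}: {'ALERT' if read_shadow(event) else 'no alert'}")
```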

Key takeaway: Falco rules have their own condition language, not PromQL and not Rego. Memorise:
  • The four iconic conditions: shell-in-container, read-shadow, write-to-/etc, read-/proc/1/environ.
  • The macros: container, open_read, open_write, spawned_process.
  • The field names: fd.name, proc.name, container.name, k8s.pod.name.
  • The test loop: edit falco_rules.local.yaml → systemctl restart falco → trigger → journalctl -u falco.
⚡ Mini-quiz
Drill Falco rule-writing scenarios → study mode (10 questions).
Lesson 7.2 Kubernetes audit logging

Audit logs answer "who called the K8s API to do what, when". Designed well, they record Secret reads + dangerous deletes without drowning in ConfigMap polling noise. The CKS killer detail is rule order — top-to-bottom, first match wins — so suppressions must come before broad rules.

Key concepts
  • Four log levels: None (do not log), Metadata (user, time, resource, verb only), Request (metadata + request body), RequestResponse (metadata + request + response body — captures returned secret values).
  • Rule ordering: rules in rules[] are evaluated top-to-bottom, first match wins. Put level: None suppressions before broader rules or they never fire.
  • Common rule shape: { level, verbs[], resources: [{ apiGroups[], resources[] }], users[], userGroups[], namespaces[], nonResourceURLs[] }. Omitting a field means "any".
  • Audit policy file: typically /etc/kubernetes/audit-policy.yaml, mounted via hostPath into the apiserver static pod.
  • Apiserver flags: --audit-policy-file=/etc/kubernetes/audit-policy.yaml, --audit-log-path=/var/log/kubernetes/audit.log, --audit-log-maxage=30 (days), --audit-log-maxbackup=10, --audit-log-maxsize=100 (MB).
  • Reading audit logs: the log is JSON-per-line. jq 'select(.verb=="delete" and .objectRef.resource=="secrets") | {user: .user.username, name: .objectRef.name, time: .requestReceivedTimestamp}' is the canonical "who deleted a Secret" query.
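Where jq is unavailable, or inside a longer forensic script, the same "who deleted a Secret" query is a few lines of Python over the JSON-per-line audit log. The field names match the jq filter above. Each request can appear once per audit stage (for example RequestReceived and ResponseComplete), so the optional stage argument keeps one row per request.

```python
import json
import sys

def secret_deletions(lines, stage=None):
    """Yield (user, secret name, timestamp) for every delete on secrets."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        if stage is not None and event.get("stage") != stage:
            continue
        ref = event.get("objectRef") or {}
        if event.get("verb") == "delete" and ref.get("resource") == "secrets":
            yield ((event.get("user") or {}).get("username"),
                   ref.get("name"),
                   event.get("requestReceivedTimestamp"))

if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. /var/log/kubernetes/audit.log
        with open(path) as fh:
            for user, name, when in secret_deletions(fh, stage="ResponseComplete"):
                print(when, user, name)
```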
Concrete example

Task: log all Secret operations at RequestResponse, suppress read-only ConfigMap traffic, catch-all at Metadata.
  • Write /etc/kubernetes/audit-policy.yaml (apiVersion: audit.k8s.io/v1, kind: Policy) with three rules. Rule 1 → level: None, verbs: [get,list,watch], resources: [{resources: [configmaps]}]. Rule 2 → level: RequestResponse, resources: [{resources: [secrets]}]. Rule 3 → level: Metadata (catch-all).
  • Edit the apiserver manifest: add the audit flags (--audit-policy-file, --audit-log-path, --audit-log-maxage) plus two hostPath volumes + volumeMounts. One is for the policy file; the other is for /var/log/kubernetes/, so the log is written to the node instead of the container's filesystem.
  • Verify: kubectl get secret my-secret -o yaml appears as a RequestResponse entry in /var/log/kubernetes/audit.log. kubectl get configmap my-cm produces no log entry. kubectl get pods produces a Metadata entry.
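First-match-wins is easy to get wrong under time pressure. Below is a deliberately simplified model of how the apiserver picks a level for a request: rules are checked top to bottom, an omitted field matches anything, and the first matching rule's level is used. A request that matches no rule is not logged. The model only looks at verbs and resources; users, groups, namespaces, apiGroups and nonResourceURLs are ignored. The three rules are the policy from the task above.

```python
# Simplified model: only verbs and resources are matched.
POLICY = [
    {"level": "None", "verbs": ["get", "list", "watch"], "resources": ["configmaps"]},
    {"level": "RequestResponse", "resources": ["secrets"]},
    {"level": "Metadata"},  # catch-all: no verbs/resources, so it matches everything
]

def audit_level(policy: list, verb: str, resource: str) -> str:
    for rule in policy:
        if "verbs" in rule and verb not in rule["verbs"]:
            continue  # an omitted field means "any"
        if "resources" in rule and resource not in rule["resources"]:
            continue
        return rule["level"]  # first match wins; later rules are never consulted
    return "None"  # nothing matched: the request is not logged

for verb, resource in [("get", "configmaps"), ("get", "secrets"),
                       ("get", "pods"), ("update", "configmaps")]:
    print(verb, resource, audit_level(POLICY, verb, resource))

# Wrong order: with the catch-all first, Secrets drop to Metadata.
print(audit_level(POLICY[::-1], "get", "secrets"))
```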

Key takeaway: audit policy is top-to-bottom, first match wins — suppressions go first. Memorise the 4 levels (None, Metadata, Request, RequestResponse) + 3 apiserver flags (--audit-policy-file, --audit-log-path, --audit-log-maxage). The hostPath-mount-into-apiserver dance is the same as encryption-at-rest from Module 5 — same recipe, different file.
⚡ Mini-quiz
Practise audit-policy + rule-ordering scenarios → quick quiz (5 questions).
Lesson 7.3 Immutable infrastructure & incident response

Immutability is the loudest signal you can buy: when the rootfs is read-only, every filesystem write is unambiguous attacker activity. Containment uses crictl from the node, not kubectl exec inside the compromised Pod. CKS may give you a "compromised" Pod and ask for the containment + forensic steps.

Key concepts
  • readOnlyRootFilesystem in practice: set readOnlyRootFilesystem: true in container securityContext; mount emptyDir volumes at /tmp + log paths the app needs to write. Verify: kubectl exec <pod> -- touch /test.txt fails with Read-only file system.
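  • Signal advantage: with readOnlyRootFilesystem, the only writable paths are the declared mounts. Any attempt to write elsewhere, or to drop and run new binaries in /tmp, is abnormal by definition, so Falco rules watching for it produce very few false positives.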
  • Forensic rule #1: never run commands inside a compromised container — you are executing attacker-controlled code (think LD_PRELOAD, hijacked binaries).
  • Container inspection from the node: crictl ps, crictl inspect <container-id> (overlay filesystem paths, mounts, runtime info), crictl logs <container-id> (stdout without entering the container).
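  • Evidence preservation: snapshot the node before remediation, and preserve the container's filesystem layers. With containerd, a running container's merged rootfs is typically under /run/containerd/io.containerd.runtime.v2.task/k8s.io/<id>/rootfs, backed by overlay snapshots under /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/. Disk image first, kill the Pod after.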
  • Behavioural detection stack: Falco watches runtime syscalls; audit logs watch the K8s control plane. Combine: Falco flags a shell in a Pod; audit log shows who reattached to it via exec. falcosidekick forwards Falco alerts to Slack / SIEM / webhook for paging.
  • Key audit-log forensic fields: user.username, verb, objectRef.{resource, name, namespace}, sourceIPs, requestReceivedTimestamp. With these five you can reconstruct most attack timelines.
Concrete example

Scenario: Falco fires "Read shadow detected (pod=web-7c9 container=app)".
  • Step 1, confirm via audit log: jq 'select(.objectRef.namespace=="prod" and .objectRef.name=="web-7c9") | {user: .user.username, verb, time: .requestReceivedTimestamp, ip: .sourceIPs}' /var/log/kubernetes/audit.log, then look for exec calls into that Pod.
  • Step 2, contain: cordon the node (kubectl cordon node-3) and cut the Pod off the network with a deny-all NetworkPolicy. A namespace-wide default-deny works but also isolates every healthy workload in prod, so prefer a policy whose podSelector matches only the compromised Pod.
  • Step 3, preserve evidence: SSH to the node; crictl ps → find the container ID; crictl inspect <id> for overlay paths; tar czf /forensic/web-7c9.tgz /var/lib/containerd/.../rootfs.
  • Step 4, kill: kubectl delete pod web-7c9 -n prod.
No kubectl exec into the live Pod at any point.
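A namespace-wide default-deny also cuts off healthy workloads. A narrower containment option is to label only the compromised Pod, e.g. kubectl label pod web-7c9 -n prod quarantine=true, and apply a deny-all policy that selects that label. It is sketched below; the label quarantine=true and the policy name are hypothetical. Listing both Ingress and Egress in policyTypes with no ingress or egress rules denies all traffic to and from the selected Pods, provided the cluster's CNI plugin enforces NetworkPolicy.

```python
import json

def quarantine_policy(namespace: str, label_key: str = "quarantine",
                      label_value: str = "true") -> dict:
    """Deny-all NetworkPolicy for Pods carrying a (hypothetical) quarantine label."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "quarantine", "namespace": namespace},
        "spec": {
            # Only Pods with the quarantine label are selected; other Pods are unaffected.
            "podSelector": {"matchLabels": {label_key: label_value}},
            # Both types listed, no ingress/egress rules given: deny all in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }

print(json.dumps(quarantine_policy("prod"), indent=2))
```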

Key takeaway: immutability turns runtime detection from probabilistic to definite — any write on a readOnlyRootFilesystem Pod is an attacker. For containment, use the node-side crictl path (not kubectl exec), preserve overlay-FS evidence, then delete the Pod. Audit-log + Falco together reconstruct the attack; jq on /var/log/kubernetes/audit.log is the muscle-memory tool.
⚡ Mini-quiz
Drill incident-response + crictl forensics scenarios → study mode (10 questions).

Key Concept: Falco vs Audit Logs

Falco watches kernel syscalls in real time — it fires the moment a container opens a sensitive file or spawns a shell. Kubernetes Audit Logs capture Kubernetes API operations — who called the API, what resource was accessed, what was the response. Use Falco for runtime container behavior; use audit logs for Kubernetes control plane activity. The CKS tests both independently — they complement each other.

6-Week CKS Study Plan

Week 1
Prerequisites & Setup: Confirm your CKA is valid. Set up a Kubernetes cluster (kubeadm on VMs or kind + Calico). Review NetworkPolicy and RBAC fundamentals. Install Falco, kube-bench, and Trivy on your lab cluster.
Week 2
Cluster Setup & Hardening: Run kube-bench against your cluster and fix all FAIL items. Practice writing NetworkPolicy YAML for namespace isolation and metadata endpoint blocking. Harden the API server flags kube-bench checks, such as --authorization-mode and --enable-admission-plugins. Practice RBAC tasks with kubectl auth can-i.
Week 3
System Hardening: Load AppArmor profiles, reference them in Pods. Configure seccomp RuntimeDefault and custom profiles. Write Pod specs with full SecurityContext hardening (drop ALL caps, readOnlyRootFilesystem, runAsNonRoot, allowPrivilegeEscalation=false). Create RuntimeClasses for gVisor scenarios.
Week 4
Microservice Vulnerabilities & Secrets: Apply Pod Security Admission to namespaces. Write OPA Gatekeeper ConstraintTemplates. Configure Secrets encryption at rest. Practice the Vault Agent Injector pattern. Apply all three layers (PSA + Gatekeeper + SecurityContext) together.
Week 5
Supply Chain Security: Scan images with Trivy in CI simulation. Sign images with Cosign and verify. Build multi-stage distroless Dockerfiles. Configure ImagePolicyWebhook (fail-closed). Write OPA policies for registry enforcement. Run trivy k8s cluster and fix findings.
Week 6
Runtime Security & Mock Exams: Write custom Falco rules and verify them. Configure Kubernetes audit logging with a tiered policy. Practice parsing audit logs with jq. Use killer.sh for 2 full mock exam sessions. Focus on speed — the real exam is 2 hours for 15–20 hands-on tasks.

Top 4 CKS Exam Mistakes

  • Forgetting DNS when writing NetworkPolicy: If you add an Egress deny-all policy without allowing port 53 UDP/TCP, DNS stops working and the Pod appears broken. Always add a DNS exception.
  • AppArmor profile not loaded on the right node: The profile must be present on every node where the Pod can schedule. If the profile isn't loaded, the Pod fails to start with a cryptic containerd error.
  • Audit policy rule order: Rules are first-match. A broad level: Metadata rule before your targeted level: None suppressions will catch everything. Always put specific suppressions first.
  • Using Falco field names that don't exist: file.path, syscall.type, and filename are not valid Falco fields. Use fd.name, evt.type, and proc.name. Always test rules by restarting Falco and checking journalctl.

CKS vs CKA — What's Different?

CKA — Administration

  • Cluster installation (kubeadm)
  • etcd backup & restore
  • Node maintenance and upgrades
  • Workload management (Deployments, rolling updates)
  • Storage: PV, PVC, StorageClass
  • Troubleshooting broken clusters

CKS — Security (requires CKA)

  • CIS benchmark hardening
  • RBAC least privilege + SA token management
  • AppArmor, seccomp, Linux capabilities
  • OPA Gatekeeper, Pod Security Admission
  • Falco runtime threat detection
  • Audit logging + supply chain security

Test your CKS knowledge

60 scenario-based questions covering all 6 CKS domains — Falco rules, RBAC, audit policy, NetworkPolicy, OPA and more.
