CNCF / Kubernetes

Certified Kubernetes Administrator (CKA)

Master every domain of the CNCF Certified Kubernetes Administrator exam. This course covers the full Kubernetes architecture, installing and configuring clusters with kubeadm, managing workloads and scheduling, services and networking, persistent storage, role-based access control, security contexts, and systematic troubleshooting — with real kubectl commands, YAML examples, and exam-aligned explanations throughout.

Intermediate · 7 modules · ~35 hours · 60 practice questions

Study on the go with our IT certification podcast

Tune in to Kubernetes tips, cluster architecture breakdowns, and CKA exam strategies while commuting or working out. New episodes weekly.

Listen on Spotify

Course Modules

01
Kubernetes Architecture & Core Concepts
3 lessons · ~4 hours
Control Plane Components

The Kubernetes Control Plane

  • kube-apiserver — the front-end for the Kubernetes control plane; all internal and external communication passes through it; exposes the Kubernetes API over HTTPS
  • etcd — a consistent, highly-available key-value store used as Kubernetes' backing store for all cluster data; treat it as the source of truth
  • kube-scheduler — watches for newly created Pods with no assigned node; selects the best node based on resource requirements, policies, taints/tolerations, and affinity rules
  • kube-controller-manager — runs all controller processes as a single binary (Node Controller, Replication Controller, Endpoints Controller, ServiceAccount Controller, etc.)
  • cloud-controller-manager — interfaces with the underlying cloud provider API to manage nodes, routes, and load balancers; separates cloud-specific logic from the core controllers
The control plane is the brain of Kubernetes. On a kubeadm cluster, control plane components run as static Pods in the kube-system namespace — you can inspect them with kubectl get pods -n kube-system. Their manifests live in /etc/kubernetes/manifests/.
Know which component is responsible for what. etcd = data store. apiserver = gateway. scheduler = Pod placement. controller-manager = reconciliation loops. The exam often asks you to identify which component is failing based on symptoms.
Worker Node Components

What Runs on Every Worker Node

  • kubelet — the primary node agent; registers the node with the API server; ensures containers described in PodSpecs are running and healthy; communicates with the container runtime via CRI
  • kube-proxy — maintains network rules (iptables or IPVS) on each node that implement Services; handles traffic routing to Pod endpoints
  • Container runtime — the software responsible for running containers (containerd, CRI-O); communicates with kubelet via the Container Runtime Interface (CRI)

Key Node Concepts

  • Node status conditions: Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable
  • Inspect a node: kubectl describe node <node-name>
  • Check kubelet status: systemctl status kubelet
  • Kubelet config: /var/lib/kubelet/config.yaml
Unlike control plane components that run as static Pods, kubelet and kube-proxy run as system services managed by systemd. When a node is NotReady, check kubelet logs first: journalctl -u kubelet -n 50.
The CKA exam often includes a broken-node scenario. The most common cause is a stopped kubelet service. Always check systemctl status kubelet and journalctl -u kubelet first.
kubectl & Core API Resources

kubectl Command Syntax

  • Basic pattern: kubectl [command] [TYPE] [NAME] [flags]
  • kubectl get pods — list Pods in current namespace
  • kubectl get pods -A — list Pods across all namespaces
  • kubectl describe pod <name> — detailed info including Events
  • kubectl delete pod <name> --force --grace-period=0 — immediate deletion
  • kubectl get pod <name> -o yaml — output full resource spec as YAML
  • kubectl explain pod.spec.containers — inline API documentation

Imperative Commands (exam speed tricks)

  • kubectl run nginx --image=nginx --restart=Never — create a Pod
  • kubectl run nginx --image=nginx --dry-run=client -o yaml > pod.yaml — generate YAML without creating
  • kubectl create deployment app --image=nginx --replicas=3 — create a Deployment
  • kubectl expose deployment app --port=80 --type=ClusterIP — create a Service
  • kubectl set image deployment/app nginx=nginx:1.25 — update container image

Core API Resources

  • Pod — smallest deployable unit; one or more containers sharing network and storage
  • ReplicaSet — ensures a specified number of Pod replicas are running; typically managed by a Deployment
  • Deployment — declarative Pod management with rolling update and rollback capabilities
  • Namespace — virtual cluster partition for isolating resources; default namespaces: default, kube-system, kube-public, kube-node-lease
The CKA is a hands-on performance-based exam. Master --dry-run=client -o yaml to generate resource templates quickly instead of writing YAML from scratch. This saves enormous time.
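For reference, the skeleton that kubectl run nginx --image=nginx --restart=Never --dry-run=client -o yaml produces looks like this (trimmed of empty fields like creationTimestamp and status):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx          # kubectl adds a run=<name> label automatically
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never  # from --restart=Never; Always is the default
```

Redirect this into a file, edit the fields you need, then kubectl apply -f — far faster than writing the manifest by hand.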
02
Cluster Installation & Configuration
3 lessons · ~5 hours
kubeadm: Bootstrapping a Cluster

kubeadm init Workflow

  • Pre-flight checks: swap disabled, required ports open, container runtime running
  • kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=<IP> — initialize control plane
  • After init, copy kubeconfig: mkdir -p $HOME/.kube && cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  • Install a CNI plugin (e.g. Calico): kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
  • Print join command: kubeadm token create --print-join-command

kubeadm join Workflow

  • Run the join command on each worker node as root: kubeadm join <apiserver-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
  • Bootstrap tokens expire after 24 hours by default — regenerate with kubeadm token create
  • Verify nodes joined: kubectl get nodes

etcd Backup & Restore

  • Take a snapshot: ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
  • Verify snapshot: ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db
  • Restore: ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db --data-dir=/var/lib/etcd-restore, then update the etcd static Pod manifest to point to the new data directory
The etcd backup/restore task appears in almost every CKA exam. Memorize the full etcdctl snapshot save command with all three certificate flags. The certs are always in /etc/kubernetes/pki/etcd/.
Set ETCDCTL_API=3 before running etcdctl commands. etcdctl releases older than 3.4 default to the v2 API, where the snapshot commands differ or fail entirely; 3.4+ defaults to v3, but setting the variable explicitly works everywhere and is a safe exam habit.
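After a restore, the edit to the etcd static Pod manifest is typically just repointing the data volume. A sketch of the relevant fragment, using the restore path from the example above (the rest of /etc/kubernetes/manifests/etcd.yaml stays unchanged):

```yaml
# /etc/kubernetes/manifests/etcd.yaml — volumes fragment only
volumes:
- hostPath:
    path: /var/lib/etcd-restore   # was /var/lib/etcd before the restore
    type: DirectoryOrCreate
  name: etcd-data
```

The kubelet watches the manifests directory and recreates the etcd Pod automatically when the file changes.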
Cluster Upgrade & kubeconfig Management

Upgrading a Cluster with kubeadm

  • Upgrades must happen one minor version at a time (e.g., 1.28 → 1.29, never 1.28 → 1.30)
  • Control plane first: apt-mark unhold kubeadm && apt-get install -y kubeadm=1.29.0-00 && apt-mark hold kubeadm
  • Check upgrade plan: kubeadm upgrade plan
  • Apply upgrade: kubeadm upgrade apply v1.29.0
  • Drain control plane node: kubectl drain <node> --ignore-daemonsets
  • Upgrade kubelet and kubectl: apt-get install -y kubelet=1.29.0-00 kubectl=1.29.0-00, then systemctl daemon-reload && systemctl restart kubelet
  • Uncordon: kubectl uncordon <node>
  • For each worker node: drain it, upgrade the kubeadm package, run kubeadm upgrade node on the worker, upgrade kubelet and kubectl, restart kubelet, then uncordon

kubeconfig: Managing Cluster Access

  • Default location: ~/.kube/config — can be overridden with KUBECONFIG env var
  • List contexts: kubectl config get-contexts
  • Switch context: kubectl config use-context <context-name>
  • View current context: kubectl config current-context
  • Set default namespace for a context: kubectl config set-context --current --namespace=dev
  • Merge multiple kubeconfig files: KUBECONFIG=~/.kube/config:~/.kube/prod-config kubectl config view --merge --flatten > ~/.kube/merged-config
The CKA exam gives you multiple clusters. Always verify which cluster you are on with kubectl config current-context before running commands. Switching to the wrong context is a common exam mistake.
RBAC & TLS Certificate Management

Role-Based Access Control (RBAC)

  • Role — namespaced; grants permissions to resources within a specific namespace
  • ClusterRole — cluster-wide; can grant access to cluster-scoped resources (nodes, PVs) or any namespace
  • RoleBinding — binds a Role or ClusterRole to subjects (users, groups, ServiceAccounts) within a namespace
  • ClusterRoleBinding — binds a ClusterRole to subjects across the entire cluster

Creating RBAC Resources

  • Create a Role: kubectl create role pod-reader --verb=get,list,watch --resource=pods -n dev
  • Create a RoleBinding: kubectl create rolebinding dev-binding --role=pod-reader --user=jane -n dev
  • Test access: kubectl auth can-i get pods --as=jane -n dev
  • Inspect effective permissions: kubectl auth can-i --list --as=jane -n dev

TLS Certificates & CSRs

  • View cluster certificate details: openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
  • Check certificate expiry: kubeadm certs check-expiration
  • Renew all certificates: kubeadm certs renew all
  • Create a user CSR: generate private key with openssl genrsa -out jane.key 2048, then openssl req -new -key jane.key -subj "/CN=jane/O=dev-team" -out jane.csr
  • Submit CSR to Kubernetes: create a CertificateSigningRequest object with the base64-encoded CSR
  • Approve CSR: kubectl certificate approve <csr-name>
Use kubectl auth can-i to verify your RBAC rules work before submitting your exam answer. A quick test like kubectl auth can-i create deployments --as=system:serviceaccount:dev:mysa -n dev confirms permissions instantly.
RBAC is heavily tested. Know the difference: Role + RoleBinding = namespaced. ClusterRole + ClusterRoleBinding = cluster-wide. You CAN bind a ClusterRole with a RoleBinding — it limits the ClusterRole's permissions to that namespace only.
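As a reference, the declarative equivalent of the imperative commands above — same names (pod-reader, jane, dev) — looks like this:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]            # "" is the core API group; Pods live there
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-binding
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role                 # could also be ClusterRole, scoped to dev by this binding
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Verify with kubectl auth can-i list pods --as=jane -n dev before moving on.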
03
Workloads & Scheduling
3 lessons · ~4 hours
Deployments: Rolling Updates, Rollbacks & Scaling

Deployment Strategy

  • RollingUpdate (default) — gradually replaces old Pods with new ones; configurable with maxSurge and maxUnavailable
  • Recreate — terminates all old Pods before creating new ones; causes downtime
  • Update image: kubectl set image deployment/webapp nginx=nginx:1.25 — the --record flag is deprecated; set the kubernetes.io/change-cause annotation instead to label revisions in rollout history
  • Check rollout status: kubectl rollout status deployment/webapp
  • View rollout history: kubectl rollout history deployment/webapp
  • Rollback to previous revision: kubectl rollout undo deployment/webapp
  • Rollback to specific revision: kubectl rollout undo deployment/webapp --to-revision=2

Scaling

  • Manual scale: kubectl scale deployment webapp --replicas=5
  • HorizontalPodAutoscaler: kubectl autoscale deployment webapp --min=2 --max=10 --cpu-percent=70
  • Pause a rollout: kubectl rollout pause deployment/webapp; resume it: kubectl rollout resume deployment/webapp

DaemonSets, StatefulSets, Jobs & CronJobs

  • DaemonSet — ensures one Pod runs on every node (or a subset via nodeSelector); used for log collectors, monitoring agents, CNI plugins
  • StatefulSet — for stateful applications; Pods get stable network identifiers (pod-0, pod-1) and persistent volume claims per Pod; ordered deployment and scaling
  • Job — runs one or more Pods to completion; use completions and parallelism to control batch execution
  • CronJob — creates Jobs on a cron schedule; schedule: "*/5 * * * *" runs every 5 minutes
Know when to use each workload type: Deployment for stateless apps, StatefulSet for databases, DaemonSet for node agents, Job for one-time tasks, CronJob for scheduled tasks. The exam will describe a scenario and ask you to pick or create the right resource.
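A minimal CronJob manifest, using the schedule from the bullet above (name and command are placeholders):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup               # hypothetical name
spec:
  schedule: "*/5 * * * *"     # every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Jobs require OnFailure or Never
          containers:
          - name: cleanup
            image: busybox
            command: ["sh", "-c", "echo cleanup run"]
```

The jobTemplate nests a full Job spec, which in turn nests a Pod template — three levels of spec is the part candidates most often get wrong under time pressure.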
Resource Management: Requests, Limits & Quotas

Container Resources

  • requests — the minimum resources a container needs; used by the scheduler for placement decisions
  • limits — the maximum resources a container can use; enforced by the kubelet; exceeding CPU limit causes throttling; exceeding memory limit causes OOMKill
  • Example YAML: resources: {requests: {cpu: "250m", memory: "128Mi"}, limits: {cpu: "500m", memory: "256Mi"}}
  • CPU units: 1 CPU = 1000m (millicores); memory: Ki, Mi, Gi

LimitRange

  • Set default requests/limits and min/max constraints per container/Pod/PVC within a namespace
  • If a Pod has no resource requests/limits and a LimitRange exists in the namespace, the LimitRange defaults are applied automatically
  • Create: kubectl create -f limitrange.yaml, inspect: kubectl describe limitrange -n dev

ResourceQuota

  • Limits total resource consumption within a namespace: CPU, memory, number of objects (Pods, Services, PVCs, etc.)
  • If a namespace has a ResourceQuota for CPU/memory, all Pods MUST specify requests and limits — otherwise they will be rejected
  • Check quota usage: kubectl describe resourcequota -n dev
A common exam scenario: a Pod fails to schedule in a namespace. Check if a ResourceQuota exists and whether the Pod's requests are within the remaining quota. Also check if a LimitRange is requiring requests/limits that the Pod doesn't define.
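A sketch of a ResourceQuota and LimitRange for the dev namespace (names and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "2"        # sum of all Pod requests may not exceed this
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
    pods: "10"               # object-count quota
---
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: dev
spec:
  limits:
  - type: Container
    default:                 # injected as limits when a container omits them
      cpu: 500m
      memory: 256Mi
    defaultRequest:          # injected as requests when omitted
      cpu: 250m
      memory: 128Mi
```

With both in place, Pods without explicit requests/limits still pass the quota check because the LimitRange fills in the defaults.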
Taints, Tolerations & Node Affinity

Taints & Tolerations

  • Taint — applied to a node to repel Pods that don't tolerate it: kubectl taint nodes node1 key=value:NoSchedule
  • Taint effects: NoSchedule (no new Pods), PreferNoSchedule (avoid if possible), NoExecute (evict existing Pods too)
  • Remove a taint: kubectl taint nodes node1 key=value:NoSchedule- (append minus)
  • Toleration — added to a Pod spec to allow scheduling onto tainted nodes

Node Affinity

  • requiredDuringSchedulingIgnoredDuringExecution — hard requirement; Pod won't schedule if no matching node exists
  • preferredDuringSchedulingIgnoredDuringExecution — soft preference; Pod schedules elsewhere if no match
  • Uses node labels; label a node: kubectl label nodes node1 disktype=ssd
  • Pod Affinity/Anti-affinity — schedule Pods relative to other Pods (e.g., co-locate app and cache, or spread replicas across nodes)

Pod Priority & Preemption

  • Define a PriorityClass with a value (higher = more important); assign to Pod with priorityClassName
  • When a high-priority Pod can't schedule due to resource constraints, the scheduler may evict lower-priority Pods to make room
  • Built-in priority classes: system-cluster-critical and system-node-critical for core Kubernetes components
Taints repel; tolerations allow. Node affinity attracts. If a Pod is stuck Pending, check: (1) Does a node have a taint the Pod doesn't tolerate? (2) Does node affinity match any available node? (3) Are there sufficient resources?
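Putting both mechanisms together: a Pod that tolerates the key=value:NoSchedule taint and requires a disktype=ssd node, using the exact key/label names from the commands above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
  containers:
  - name: app
    image: nginx
```

Remember the asymmetry: the toleration only permits scheduling onto the tainted node, while the affinity rule actively requires a matching node.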
04
Services & Networking
3 lessons · ~5 hours
Services: Types & DNS Resolution

Service Types

  • ClusterIP (default) — stable internal IP accessible only within the cluster; ideal for service-to-service communication
  • NodePort — exposes the service on each node's IP at a static port (30000–32767); allows external access via <NodeIP>:<NodePort>
  • LoadBalancer — provisions an external load balancer from the cloud provider; assigns a public IP to the service; used in managed Kubernetes environments
  • ExternalName — maps the service to a DNS name (e.g., an external database hostname); returns a CNAME record; no proxying or port mapping
  • Headless Service — clusterIP: None; returns Pod IPs directly from DNS; used with StatefulSets for direct Pod addressing

DNS in Kubernetes (CoreDNS)

  • CoreDNS runs as a Deployment in kube-system; every Pod gets its DNS configured to use the CoreDNS ClusterIP
  • Service DNS pattern: <service-name>.<namespace>.svc.cluster.local
  • Within the same namespace, just <service-name> resolves correctly
  • Pod DNS: <pod-ip-dashes>.<namespace>.pod.cluster.local
  • Debug DNS: kubectl run tmp --image=busybox --restart=Never -- nslookup kubernetes.default
CoreDNS config is stored in a ConfigMap: kubectl get configmap coredns -n kube-system -o yaml. If DNS resolution fails in a Pod, check that CoreDNS Pods are running and that the Pod's /etc/resolv.conf points to the CoreDNS ClusterIP.
When a Pod cannot reach a Service, verify the Service selector matches Pod labels exactly. A single typo in the selector means zero endpoints: kubectl get endpoints <service-name> — an empty Endpoints list confirms a selector mismatch.
Ingress: HTTP Routing & TLS Termination

Ingress Concepts

  • Ingress — API object that manages external HTTP/HTTPS access to services in a cluster; provides load balancing, name-based virtual hosting, and SSL termination
  • Requires an Ingress Controller (e.g., nginx-ingress, Traefik) to be deployed in the cluster — Ingress objects do nothing without a controller
  • Host-based routing: route app.example.com to one service, api.example.com to another
  • Path-based routing: route /api to backend-service, / to frontend-service

Ingress YAML Patterns

  • Create with: kubectl create ingress my-ingress --rule="app.example.com/=webapp:80" --rule="app.example.com/api=api-svc:8080"
  • TLS termination: reference a Secret containing tls.crt and tls.key in the Ingress spec under tls:
  • Inspect: kubectl describe ingress my-ingress shows the routing rules and backend health
Always specify ingressClassName or the kubernetes.io/ingress.class annotation to tell Kubernetes which Ingress Controller should handle the resource. Without it, no controller may pick it up.
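The declarative form of the routing rules above, with TLS added (app-tls is a hypothetical Secret containing tls.crt and tls.key; the class name must match your deployed controller):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  ingressClassName: nginx          # must match an installed Ingress Controller
  tls:
  - hosts: ["app.example.com"]
    secretName: app-tls            # Secret of type kubernetes.io/tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webapp
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc
            port:
              number: 8080
```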
NetworkPolicy & CNI Plugins

NetworkPolicy

  • By default, all Pods can communicate with all other Pods — NetworkPolicy allows you to restrict this
  • NetworkPolicy is namespace-scoped; works based on podSelector, namespaceSelector, and ipBlock
  • Ingress rules — control incoming traffic to selected Pods
  • Egress rules — control outgoing traffic from selected Pods
  • A NetworkPolicy that selects Pods but has no ingress/egress rules blocks all traffic in that direction for those Pods
  • Default deny all ingress: a NetworkPolicy with podSelector: {} (selects all Pods) and policyTypes: [Ingress] but no ingress rules

NetworkPolicy Example Pattern

  • Allow traffic to app=backend Pods only from app=frontend Pods: set podSelector: {matchLabels: {app: backend}} and ingress from: [{podSelector: {matchLabels: {app: frontend}}}]
  • Allow egress to DNS only: permit egress on UDP/TCP port 53 toward the CoreDNS Pods in the kube-system namespace
  • Test connectivity: kubectl exec -it <pod> -- nc -zv <service> <port>

CNI Plugins Overview

  • CNI (Container Network Interface) is the standard for cluster networking — every kubeadm cluster requires one
  • Calico — most popular for production; supports NetworkPolicy natively; BGP-based routing; used heavily in CKA exam environments
  • Flannel — simple VXLAN overlay; does not support NetworkPolicy natively (requires Calico for policy enforcement)
  • Weave Net — easy setup; supports NetworkPolicy; encrypted inter-node traffic optional
NetworkPolicy requires a CNI plugin that supports it. If you apply NetworkPolicy on a cluster running Flannel alone, the policies will be accepted by the API server but not enforced — traffic won't actually be blocked.
NetworkPolicy questions are common on the CKA. Read the scenario carefully to understand the direction (ingress vs. egress) and scope (namespace vs. pod selector). Drawing a quick diagram of what needs to communicate helps avoid mistakes.
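A sketch of the two policies described above — default-deny plus an explicit allow from frontend to backend (namespace dev is assumed):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: dev
spec:
  podSelector: {}              # empty selector = every Pod in the namespace
  policyTypes: ["Ingress"]     # Ingress listed but no rules -> deny all ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: dev
spec:
  podSelector:
    matchLabels:
      app: backend             # the Pods being protected
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend        # only these Pods may connect
```

Policies are additive: once any policy selects a Pod, only traffic explicitly allowed by some policy reaches it.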
05
Storage
2 lessons · ~4 hours
PersistentVolumes, PVCs & StorageClasses

The Storage Lifecycle

  • PersistentVolume (PV) — a piece of storage provisioned by an administrator or dynamically; has its own lifecycle independent of any Pod
  • PersistentVolumeClaim (PVC) — a request for storage by a user; specifies size and access mode; Kubernetes binds the PVC to a matching PV
  • StorageClass — defines the "class" of storage (provisioner, reclaim policy, parameters); enables dynamic provisioning
  • Binding is based on: storage size, access mode, and optionally storageClassName or label selectors

Access Modes

  • ReadWriteOnce (RWO) — can be mounted read-write by a single node; supported by most block storage (EBS, GCE PD)
  • ReadOnlyMany (ROX) — can be mounted read-only by many nodes simultaneously
  • ReadWriteMany (RWX) — can be mounted read-write by many nodes; requires shared filesystems (NFS, CephFS, Azure Files)
  • ReadWriteOncePod (RWOP) — mounted read-write by a single Pod only (Kubernetes 1.22+)

Dynamic Provisioning with StorageClass

  • PVC references a StorageClass; the provisioner automatically creates the underlying storage and a PV
  • Reclaim policies: Delete (PV deleted when PVC deleted; default for dynamic), Retain (PV kept for manual cleanup), Recycle (deprecated)
  • Create PVC: kubectl create -f pvc.yaml; check binding: kubectl get pvc — status should show Bound
  • Mount in a Pod: reference PVC name in volumes section, then volumeMounts in container spec
A PVC stuck in Pending state means no suitable PV was found. Check: (1) Does a PV exist with matching access mode and sufficient size? (2) Is the StorageClass name correct? (3) Does the provisioner have enough capacity or permissions?
On the CKA exam, always check kubectl get pv and kubectl get pvc to see the CAPACITY, ACCESS MODES, RECLAIM POLICY, and STATUS columns. An RWX claim will never bind to an RWO PV — access mode must match.
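A minimal end-to-end example — PV, matching PVC, and a Pod that mounts it (names, path, and size are illustrative; a hostPath PV is shown for simplicity):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: ["ReadWriteOnce"]   # must match the PV for binding
  resources:
    requests:
      storage: 1Gi                 # PV capacity must be >= this
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim        # references the PVC, never the PV directly
```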
Volume Types: emptyDir, hostPath, ConfigMap & Secret

Ephemeral Volumes

  • emptyDir — created empty when a Pod is assigned to a node; exists for the lifetime of the Pod; shared between containers in the same Pod; useful for scratch space, caching, or sidecar communication
  • Memory-backed emptyDir: emptyDir: {medium: Memory} — faster but counts against container memory limits

Node-Level Volumes

  • hostPath — mounts a file or directory from the host node filesystem into the Pod; powerful but dangerous; creates tight coupling to a specific node; use only for node-level system components (DaemonSets)
  • Types: Directory, File, Socket, DirectoryOrCreate, FileOrCreate

ConfigMap & Secret Volumes

  • configMap volume — mounts ConfigMap data as files in the Pod; each key becomes a filename; updates to the ConfigMap propagate to the Pod (with a short delay)
  • secret volume — mounts Secret data as files; stored in tmpfs (RAM) for security; each key becomes a filename in the mount path
  • Both can be projected together using the projected volume type

NFS Volumes

  • NFS volumes allow multiple Pods across multiple nodes to share the same filesystem (ReadWriteMany)
  • Specify in PV spec: nfs: {server: <nfs-server-ip>, path: /exports/data}
  • No special provisioner needed — the NFS server must be reachable from all nodes
Know how to inject a ConfigMap as environment variables vs. as a mounted file. As env vars: changes to the ConfigMap don't automatically reflect in running Pods (restart required). As a volume: changes propagate automatically.
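Both injection patterns in one Pod spec (app-config is a hypothetical ConfigMap assumed to exist in the namespace):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cm-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "env && ls /etc/config && sleep 3600"]
    envFrom:
    - configMapRef:
        name: app-config       # each key becomes an env var (snapshot at start)
    volumeMounts:
    - name: config-vol
      mountPath: /etc/config   # each key appears as a file; updates propagate
  volumes:
  - name: config-vol
    configMap:
      name: app-config
```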
06
Security
3 lessons · ~4 hours
Pod Security & Security Contexts

Pod Security Admission (PSA)

  • Kubernetes 1.25+ replaces PodSecurityPolicy with Pod Security Admission — a built-in admission controller
  • Three policy levels: privileged (unrestricted), baseline (minimally restrictive; prevents known privilege escalations), restricted (heavily restricted; current hardening best practices)
  • Applied per namespace with labels: pod-security.kubernetes.io/enforce: restricted
  • Modes: enforce (reject Pods that violate), audit (log violations), warn (warn users)

Security Contexts

  • runAsNonRoot: true — prevent containers from running as root
  • runAsUser: 1000 — set the UID for the container process
  • readOnlyRootFilesystem: true — make the container's root filesystem read-only
  • allowPrivilegeEscalation: false — prevents processes from gaining more privileges than their parent
  • capabilities: {drop: ["ALL"], add: ["NET_BIND_SERVICE"]} — drop all Linux capabilities and add only what's needed
  • Pod-level vs. container-level: pod-level securityContext applies to all containers; container-level overrides pod-level
The restricted PSA level requires: non-root user, no privilege escalation, drop all capabilities, read-only root filesystem where possible, and specific seccomp profiles. If your Pods fail to schedule in a restricted namespace, check security context settings.
Security context fields are frequently tested. Know the field names exactly: runAsNonRoot, runAsUser, readOnlyRootFilesystem, allowPrivilegeEscalation, capabilities. Use kubectl explain pod.spec.securityContext during the exam.
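The fields above combined in a restricted-compliant Pod spec (the image is a placeholder — it must actually contain a process able to run as the given non-root UID):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened
spec:
  securityContext:                 # pod-level: applies to all containers
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault         # required by the restricted PSA level
  containers:
  - name: app
    image: my-nonroot-app:1.0      # hypothetical non-root image
    securityContext:               # container-level overrides pod-level
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```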
ServiceAccounts & RBAC Best Practices

ServiceAccounts

  • Every Pod runs with a ServiceAccount (defaults to default SA in the namespace)
  • The SA token is auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/token — used by applications to authenticate to the API server
  • Create a dedicated SA: kubectl create serviceaccount my-app-sa -n production
  • Disable auto-mount if not needed: automountServiceAccountToken: false in Pod or SA spec
  • Kubernetes 1.24+: SA tokens are no longer auto-created as Secrets; use TokenRequest API or create a Secret manually with type: kubernetes.io/service-account-token

RBAC Best Practices

  • Follow least privilege — grant only the verbs and resources the application actually needs
  • Prefer namespace-scoped Roles over ClusterRoles for namespaced workloads
  • Avoid binding ClusterRoles like cluster-admin to application ServiceAccounts
  • Audit existing bindings: kubectl get rolebindings,clusterrolebindings -A -o wide
  • Use kubectl auth can-i --list --as=system:serviceaccount:<ns>:<sa> to audit effective permissions
A common CKA task: "Create a ServiceAccount, bind it to a Role, and configure a Pod to use it." Remember three steps: create SA, create Role/RoleBinding referencing the SA, set serviceAccountName in Pod spec.
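The three steps as one manifest, reusing my-app-sa from the bullets above (the Role pod-reader is assumed to already exist in the namespace):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-sa-binding
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader             # assumed pre-existing Role
subjects:
- kind: ServiceAccount
  name: my-app-sa
  namespace: production
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: production
spec:
  serviceAccountName: my-app-sa   # step 3: attach the SA to the Pod
  containers:
  - name: app
    image: nginx
```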
Secrets Management & Image Security

Kubernetes Secrets

  • Opaque — generic key-value data (base64-encoded, not encrypted by default); most common type
  • kubernetes.io/tls — TLS certificate and private key; fields: tls.crt and tls.key
  • kubernetes.io/dockerconfigjson — registry credentials for pulling private images
  • Create from literal: kubectl create secret generic db-creds --from-literal=password=s3cr3t
  • Create TLS secret: kubectl create secret tls my-tls --cert=cert.pem --key=key.pem
  • Inject as env var: valueFrom: {secretKeyRef: {name: db-creds, key: password}}
  • Inject as volume: Secret keys become files at the mount path

Image Security & Private Registries

  • Pull from private registry: create a kubernetes.io/dockerconfigjson Secret, then reference it with imagePullSecrets in Pod spec
  • Attach pull secret to a ServiceAccount: kubectl patch sa default -p '{"imagePullSecrets": [{"name": "registry-creds"}]}' — all Pods using that SA will auto-use the pull secret
  • Always use specific image tags (not :latest) in production — :latest defaults imagePullPolicy to Always and makes it impossible to know which version is actually running

Network Policies for Zero-Trust

  • Start with a default-deny NetworkPolicy in each namespace, then explicitly allow required traffic
  • Separate ingress and egress policies; explicitly allow DNS egress (port 53) or Pods can't resolve service names
  • Combine with RBAC to restrict who can create/modify NetworkPolicy objects
Secrets are base64-encoded, not encrypted, by default. For production security, enable etcd encryption at rest by configuring an EncryptionConfiguration object and referencing it in the kube-apiserver manifest with --encryption-provider-config.
The exam won't ask you to break base64 encoding. But it will ask you to create secrets and use them. Know both the env var injection pattern and the volume mount pattern — they're both tested.
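Both injection patterns side by side, using the db-creds Secret created above (the Pod name and mount path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo $DB_PASSWORD && cat /etc/creds/password && sleep 3600"]
    env:
    - name: DB_PASSWORD             # pattern 1: env var injection
      valueFrom:
        secretKeyRef:
          name: db-creds
          key: password
    volumeMounts:
    - name: creds                   # pattern 2: volume mount
      mountPath: /etc/creds
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: db-creds          # each key becomes a file under /etc/creds
```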
07
Troubleshooting
4 lessons · ~9 hours
Diagnosing Node Issues

Node NotReady: Step-by-Step

  • Identify problematic node: kubectl get nodes — look for NotReady or Unknown status
  • Inspect conditions: kubectl describe node <node-name> — check Conditions section for MemoryPressure, DiskPressure, NetworkUnavailable
  • Check kubelet service: systemctl status kubelet — is it running? Any error messages?
  • Read kubelet logs: journalctl -u kubelet -n 100 --no-pager — look for certificate errors, container runtime failures, or configuration issues
  • Verify container runtime: systemctl status containerd or systemctl status cri-o
  • Check disk: df -h — a full disk stops kubelet from running new Pods
  • Verify network: ensure the node can reach the API server IP on port 6443

Cordon, Drain & Uncordon

  • kubectl cordon <node> — marks node as unschedulable; existing Pods continue running
  • kubectl drain <node> --ignore-daemonsets --delete-emptydir-data — evicts all Pods and cordons the node; use for maintenance
  • kubectl uncordon <node> — marks node schedulable again after maintenance
In a CKA troubleshooting scenario, always SSH to the broken node and check kubelet first. The most common exam scenario: kubelet service is stopped — fix with systemctl start kubelet && systemctl enable kubelet.
Diagnosing Pod Failures

Pod Status States

  • Pending — Pod accepted but not yet scheduled; check Events in kubectl describe pod for: insufficient resources, taint/toleration mismatch, or an unbound PVC (image pull problems appear only after scheduling, as ImagePullBackOff)
  • CrashLoopBackOff — container repeatedly crashes; Kubernetes applies an exponential back-off delay between restarts; inspect logs with kubectl logs <pod> --previous
  • OOMKilled — container exceeded its memory limit and was killed by the kernel; increase memory limit or reduce application memory usage
  • ImagePullBackOff / ErrImagePull — Kubernetes cannot pull the container image; causes: wrong image name, missing imagePullSecrets, registry unreachable, rate limiting
  • CreateContainerConfigError — missing ConfigMap or Secret referenced by the Pod; verify the referenced resources exist in the same namespace
  • Terminating (stuck) — Pod stuck in Terminating state; usually a finalizer issue; force delete with kubectl delete pod <pod> --force --grace-period=0

Inspection Commands

  • kubectl describe pod <pod> — full details: events, conditions, container statuses, volume mounts
  • kubectl get events --sort-by=.lastTimestamp -n <namespace> — chronological event stream
  • kubectl logs <pod> -c <container> — logs from a specific container in a multi-container Pod
  • kubectl logs <pod> --previous — logs from the previous (crashed) container instance
  • kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].state}' — precise state in JSON
CrashLoopBackOff is one of the most common issues candidates see both in exams and in real clusters. Always check kubectl logs --previous first — the crashed container's last logs usually contain the root cause (missing env var, misconfigured mount, application startup error).
For ImagePullBackOff, check the exact image name (typos are common), verify the tag exists in the registry, and confirm imagePullSecrets are correctly configured if pulling from a private registry.
Network Connectivity Debugging

Testing Pod-to-Pod & Pod-to-Service Connectivity

  • Interactive shell in a running Pod: kubectl exec -it <pod> -- /bin/sh
  • One-shot connectivity test: kubectl exec <pod> -- nc -zv <target-service> <port>
  • DNS resolution test: kubectl exec <pod> -- nslookup <service-name>.<namespace>.svc.cluster.local
  • Deploy a debug pod on demand: kubectl run debug --image=nicolaka/netshoot --rm -it --restart=Never -- /bin/bash — netshoot includes curl, nslookup, netstat, tcpdump, and more
  • Check Service endpoints: kubectl get endpoints <service-name> — empty means no matching Pods
  • Check kube-proxy mode and config (iptables vs. IPVS): kubectl get configmap kube-proxy -n kube-system -o yaml — the actual rules live on each node

Common Network Failure Patterns

  • Empty Endpoints → Service selector doesn't match Pod labels; fix the selector or Pod labels
  • DNS resolution fails → CoreDNS is down or misconfigured; check kubectl get pods -n kube-system | grep coredns
  • NetworkPolicy blocking traffic → verify with temporary policy removal or by checking policy selectors
  • kube-proxy not running → no Service routing; check kube-proxy DaemonSet: kubectl get ds kube-proxy -n kube-system
The nicolaka/netshoot image is your best friend for network debugging. It contains every networking tool you need. Use kubectl run debug --image=nicolaka/netshoot --rm -it --restart=Never -- bash to get an instant debug environment.
etcd Health & Control Plane Troubleshooting

etcd Health Checks

  • Check etcd member health: ETCDCTL_API=3 etcdctl endpoint health --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
  • List etcd members: ETCDCTL_API=3 etcdctl member list --endpoints=... --cacert=... --cert=... --key=...
  • If etcd is unhealthy, the API server will be unresponsive — all kubectl commands will fail
  • Check etcd static Pod: cat /etc/kubernetes/manifests/etcd.yaml — verify data directory and cert paths are correct

API Server & Other Control Plane Issues

  • Control plane components run as static Pods in /etc/kubernetes/manifests/ — the kubelet monitors these files and restarts Pods on changes
  • If API server is down, check the static Pod manifest for misconfigurations (wrong port, bad cert path)
  • Logs for static Pods: kubectl logs kube-apiserver-<node> -n kube-system or crictl logs <container-id> if kubectl is unavailable
  • crictl commands (when kubectl is unavailable): crictl ps (list containers), crictl logs <id> (container logs), crictl inspect <id>

Audit Logs

  • Configure API server audit logging via --audit-log-path and --audit-policy-file flags in the kube-apiserver manifest
  • Audit policy levels: None, Metadata (request metadata only), Request (metadata plus request body), RequestResponse (metadata plus request and response bodies)
  • Audit logs are invaluable for security investigations — who deleted a resource, what ServiceAccount made an unauthorized API call
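A minimal audit policy file, as referenced by --audit-policy-file (the path and rule choices are illustrative):

```yaml
# e.g. /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse       # full request and response bodies for Secret access
  resources:
  - group: ""                  # core API group
    resources: ["secrets"]
- level: Metadata              # everything else: metadata only
```

Rules are evaluated top to bottom; the first match wins, so put the most specific rules first.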
When kubectl doesn't work, use crictl directly on the node. It talks to the container runtime and can inspect running containers, pull logs, and check container health without going through the API server. Essential for recovering a broken control plane.
Control plane troubleshooting scenarios on the CKA often involve a deliberately broken kube-apiserver manifest. Common issues: wrong etcd endpoint, missing or misspelled certificate path, incorrect port. Always check /etc/kubernetes/manifests/ and compare against a known-good reference.

Ready to test your knowledge?

Challenge yourself with 60 Certified Kubernetes Administrator practice questions — scenario-based, hands-on style, and free.
