Course Modules
01
Google Cloud Foundations & Resource Hierarchy
3 lessons · ~3 hours
What is Google Cloud?
- GCP is Google's public cloud — the same infrastructure that runs Search, YouTube, and Gmail
- Available in 40+ regions, each with multiple zones (typically 3) for high availability
- A region is a geographic area (e.g., us-central1); a zone is a single deployment area within a region (e.g., us-central1-a)
- Google's private global fiber network connects all regions for low-latency routing; inside its data centers, Jupiter is the network fabric and Andromeda is the network virtualization layer
- GCP follows the shared responsibility model: Google manages physical infrastructure; you manage your workloads, data, and access controls
GCP Resource Hierarchy
- Organization — top-level node, maps to a Google Workspace or Cloud Identity domain
- Folder — optional grouping layer (e.g., by department or environment); enables IAM/Org Policy inheritance
- Project — the primary unit: billing, API enablement, and IAM boundaries. Every resource belongs to a project
- Resources — VMs, buckets, databases, etc. within a project
- IAM allow policies set at a higher level are inherited downward; a lower-level policy can grant additional access but cannot remove access granted above it
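The inheritance rule can be sketched as a union of bindings walked from a resource up to the organization. This is a minimal model, not the IAM API: the node names, groups, and the service account below are hypothetical, and deny policies and conditions are ignored.

```python
from collections import defaultdict

# Hypothetical hierarchy: child -> parent (None marks the organization root).
PARENTS = {
    "org/example.com": None,
    "folder/backend": "org/example.com",
    "project/backend-prod": "folder/backend",
}

# Hypothetical allow-policy bindings attached at each level: role -> members.
BINDINGS = {
    "org/example.com": {"roles/viewer": {"group:eng@example.com"}},
    "folder/backend": {"roles/compute.admin": {"group:backend@example.com"}},
    "project/backend-prod": {
        "roles/storage.objectViewer": {"serviceAccount:app@example.invalid"}
    },
}


def effective_bindings(node):
    """Union the bindings on `node` with those of every ancestor."""
    merged = defaultdict(set)
    while node is not None:
        for role, members in BINDINGS.get(node, {}).items():
            merged[role] |= members
        node = PARENTS[node]
    return dict(merged)


policy = effective_bindings("project/backend-prod")
print(sorted(policy))
```

Because the result is a union, nothing set on the project can shrink what the folder or organization already granted, which is why broad roles are best kept off the upper levels.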
Setting Up Your Environment
- Install the Cloud SDK: provides gcloud, gsutil (Storage), bq (BigQuery), and kubectl
- gcloud init: interactive setup (authenticate, set default project and region)
- gcloud config set project PROJECT_ID: set active project
- gcloud config set compute/region us-central1: set default region
- gcloud config configurations create my-config: manage multiple environments
Essential gcloud Commands
- gcloud compute instances list: list all VMs in current project
- gcloud compute instances create NAME --zone=ZONE --machine-type=e2-medium
- gcloud compute ssh INSTANCE --zone=ZONE: SSH with automatic key management
- gcloud services enable compute.googleapis.com: enable APIs (required before use)
- gcloud projects list: list all accessible projects
Billing Concepts
- Each project is linked to a billing account; billing accounts can cover multiple projects
- Labels (key-value pairs on resources) enable cost allocation and reporting per team/environment
- Set budgets and alerts in Cloud Billing to receive email or Pub/Sub notifications at spending thresholds (e.g., 50%, 80%, 100%)
- Use Cloud Cost Management and Recommender for rightsizing suggestions
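The budget thresholds mentioned above boil down to a simple ratio check. The sketch below only illustrates that arithmetic; the function name and the sample figures are hypothetical, and it is not how Cloud Billing is implemented.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the threshold fractions that current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in thresholds if ratio >= t]


# Hypothetical month: 850 spent against a budget of 1000.
print(crossed_thresholds(850, 1000))  # [0.5, 0.8]
```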
Pricing Models
- Sustained Use Discounts (SUDs): automatic discounts of up to 30% for eligible VMs running most of the month; no commitment required
- Committed Use Discounts (CUDs): 1- or 3-year commitments for up to 57% off most machine types and up to 70% off memory-optimized types
- Spot VMs — up to 91% off, can be preempted with 30-second notice; ideal for batch workloads
- Free Tier: always-free products include 1 e2-micro VM/month, 5 GB Cloud Storage, Cloud Functions invocations, and BigQuery queries up to 1 TB/month
☁ Scenario — structuring a GCP resource hierarchy for a startup
Situation: A startup has 3 teams (frontend, backend, data). Each needs isolated billing and separate IAM boundaries, but all engineers share a single Google Workspace domain.
Design: One Organization node (tied to the Google Workspace domain). One Folder per team (frontend-folder, backend-folder, data-folder). One Project per environment per team (e.g., backend-dev, backend-prod). Resources (VMs, buckets, databases) live inside projects. IAM policies applied at the folder level propagate to all child projects automatically.
Why projects matter on the ACE exam: Projects are the billing unit and IAM boundary. All GCP resources belong to exactly one project. The gcloud config set project PROJECT_ID command sets the default project for CLI commands — forgetting this is a common mistake on the real exam's lab tasks.
- The hierarchy is Organization → Folder → Project → Resource; IAM policies inherit downward — bind at the lowest level that satisfies the requirement (least privilege).
- Projects are the unit of billing, quota, and API enablement; APIs are off by default in new projects (run gcloud services enable first or expect a 403).
- Discount stack: SUDs apply automatically, CUDs need a 1- or 3-year commitment, Spot/Preemptible VMs are 60–91% off but can be evicted at any time. Pick by the workload's tolerance to interruption.
02
Compute Engine & Managed Instance Groups
3 lessons · ~6 hours
Machine Types
- General purpose (E2, N2, N2D, T2D) — balanced price/performance for most workloads
- Compute-optimized (C2, C2D): high CPU frequency for compute-intensive apps
- Memory-optimized (M2, M3) — large in-memory databases, SAP HANA
- Custom machine types — specify exact vCPU and memory for right-sizing
- Accelerator-optimized (A2, G2) — NVIDIA GPUs for ML/AI workloads
Boot Disks & Persistent Storage
- Standard Persistent Disk (pd-standard) — HDD, cost-efficient, sequential workloads
- Balanced Persistent Disk (pd-balanced) — SSD, good general purpose (recommended default)
- SSD Persistent Disk (pd-ssd) — high IOPS for databases
- Local SSDs — ephemeral NVMe attached directly to the host; very fast but data lost on VM stop
- Snapshots — incremental backups of persistent disks; stored in Cloud Storage; used for disaster recovery
VM Lifecycle
- States: Provisioning → Staging → Running → Stopping → Terminated
- Stopped VMs do not incur compute charges but retain disk storage costs
- Metadata server at 169.254.169.254: VMs access instance metadata and service account tokens without needing key files
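As an illustration of the metadata server bullet, the sketch below builds the token request with only the standard library. The helper names are hypothetical; the request only succeeds when run on a GCP VM with an attached service account, so treat it as a sketch rather than a tested client.

```python
import json
import urllib.request

METADATA_URL = (
    "http://169.254.169.254/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request():
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )


def fetch_access_token(timeout=2.0):
    """Fetch a short-lived token; off GCP this raises or times out."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

Client libraries do this for you through Application Default Credentials, which is why no key file needs to be copied onto the VM.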
Instance Templates
- Define VM configuration once (machine type, disk, network, service account, startup script) — reuse for MIGs and Spot VMs
- Templates are immutable: create a new template to change the configuration; a MIG rolling update then rolls it out to the group
Managed Instance Groups (MIGs)
- MIGs deploy identical VM instances from a template, enabling autoscaling and autohealing
- Autoscaling — adds/removes VMs based on CPU utilization, HTTP load balancing capacity, or custom metrics
- Autohealing — uses health checks to detect and automatically replace unhealthy VMs
- Rolling updates — gradually deploy new templates across the MIG with configurable maxSurge and maxUnavailable
- Regional MIGs — spread instances across multiple zones for high availability
Load Balancer Types
- Global External HTTP(S) Load Balancer — Layer 7, URL routing, global Anycast IP, integrates with Cloud CDN and Cloud Armor
- Regional External TCP/UDP Network LB — Layer 4, non-proxy, preserves client IP, for non-HTTP protocols
- Internal TCP/UDP Load Balancer — Layer 4, private VPC traffic only
- Internal HTTP(S) Load Balancer — Layer 7, for microservices within VPC
- SSL Proxy and TCP Proxy LB — terminates SSL/TCP connections globally
Key Concepts
- Health checks — LBs use health checks to route only to healthy backends
- Backend services — define the backend (MIG, NEG) and health check for the LB
- URL maps — HTTP(S) LB routing rules (host/path-based)
- Cloud Armor — WAF and DDoS protection; attaches to the Global HTTP(S) LB
☁ Scenario — deploying a preemptible VM for batch processing
Situation: A data pipeline needs to process 500 GB of logs nightly. The job takes ~2 hours and can restart from a checkpoint if interrupted. Cost matters — this job runs every night.
Walk: 1) gcloud compute instances create batch-worker-1 --zone=us-central1-a --machine-type=n1-standard-4 --preemptible --image-family=debian-11 --image-project=debian-cloud. Preemptible VMs cost ~80% less but can be reclaimed by GCP with 30 seconds notice. 2) Script handles SIGTERM: saves a checkpoint to Cloud Storage before shutdown. 3) A Cloud Scheduler job retriggers the pipeline each night; if the VM was preempted, the job resumes from the last checkpoint. 4) After migration: cost drops from ~$150/night (standard) to ~$30/night (preemptible).
ACE exam note: Preemptible VMs are ideal for fault-tolerant batch jobs. Spot VMs (the successor) use the same pricing model but have no 24-hour maximum runtime. For long-running services, use standard or committed-use VMs instead.
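The cost figures in the scenario follow from simple arithmetic. The sketch below reproduces them under the scenario's own assumptions (about $150 per night at standard pricing, roughly 80% discount); the 30-night month and the function name are illustrative choices.

```python
def nightly_cost(standard_cost, discount):
    """Cost of one run after applying a fractional discount (0.80 = 80% off)."""
    return round(standard_cost * (1 - discount), 2)


standard = 150.0                       # scenario figure, USD per night
preemptible = nightly_cost(standard, 0.80)
monthly_saving = (standard - preemptible) * 30  # assumes 30 runs per month

print(preemptible, monthly_saving)  # 30.0 3600.0
```

The estimate ignores reruns after a preemption; with checkpointing, the extra cost is limited to the work lost since the last checkpoint.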
- Pick the machine type by workload: e2 general-purpose / cost-optimized, n2/n2d balanced, c2 compute-intensive, m2 memory-optimized; default disk = pd-balanced, never local SSD for persistent data.
- MIGs = autoscaling + self-healing + rolling updates; regional MIGs span zones for HA, instance templates declare the immutable spec, and stateful MIGs preserve per-VM disks/IPs across recreation.
- LB choice cheat-sheet: Global External HTTP(S) for internet web apps (Cloud CDN + Armor friendly), Internal HTTP(S) for service-to-service inside the VPC, Network LB for non-HTTP/UDP external traffic.
03
Kubernetes Engine (GKE) & Containers
3 lessons · ~6 hours
Cluster Modes
- GKE Standard — you manage node configuration, machine types, node pools; full control
- GKE Autopilot — Google manages node infrastructure; you only define pod specs; pay per pod not node
- Regional clusters — control plane and nodes replicated across 3 zones; no single zone is a SPOF; recommended for production
- Zonal clusters — single control plane in one zone; lower cost but less resilient
Node Pools
- A cluster can have multiple node pools with different machine types (e.g., standard pool + GPU pool)
- Node pools can be independently upgraded and scaled
- Cluster Autoscaler — automatically adds nodes when pods are pending; removes nodes when underutilized
- Node auto-provisioning — creates new node pools automatically for pending pods requiring specific resources
Core Objects
- Pod — smallest deployable unit; one or more containers sharing network/storage
- Deployment — manages stateless Pods with rolling updates and rollbacks; use for web apps and APIs
- StatefulSet — stateful workloads with stable network identity and persistent per-pod volumes; use for databases
- DaemonSet — ensures one Pod per node; use for log collectors, monitoring agents (Fluentd, Prometheus node exporter)
- CronJob — scheduled batch jobs on a cron schedule
Scaling
- Horizontal Pod Autoscaler (HPA) — scales Pod replicas based on CPU/memory or custom metrics
- Vertical Pod Autoscaler (VPA) — adjusts Pod resource requests/limits automatically
- kubectl scale deployment nginx --replicas=5: manual scaling
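The Horizontal Pod Autoscaler's core calculation is a ratio of the observed metric to its target. The sketch below shows that formula with min/max clamping; it leaves out the tolerance band and stabilization windows the real controller applies, and the sample numbers are hypothetical.

```python
import math


def desired_replicas(current, current_cpu, target_cpu, min_replicas, max_replicas):
    """ceil(current * observed / target), clamped to the allowed range."""
    raw = math.ceil(current * current_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, raw))


# 2 replicas averaging 95% CPU against a 70% target, bounds 2..10.
print(desired_replicas(2, 95, 70, 2, 10))  # 3
```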
Networking
- ClusterIP — internal service, reachable only within the cluster
- NodePort — exposes service on each node's IP at a static port
- LoadBalancer — provisions a GCP External Load Balancer for the service
- Ingress — HTTP(S) routing rules; on GKE creates a Global HTTP(S) Load Balancer
Workload Identity
- Workload Identity is the recommended way to grant GKE workloads access to GCP APIs
- Maps a Kubernetes Service Account (KSA) to a GCP Service Account (GSA)
- Pods use the KSA to impersonate the GSA — no key files stored in Secrets
- Enable at cluster creation: --workload-pool=PROJECT_ID.svc.id.goog
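The KSA-to-GSA mapping relies on two strings: the IAM member that represents the Kubernetes Service Account, and an annotation on the KSA that names the GCP Service Account. The sketch below only builds those strings; the helper names and sample values (my-project, default, api-ksa, api-gsa) are hypothetical.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """IAM member to grant roles/iam.workloadIdentityUser on the GSA."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def ksa_annotation(gsa_name, project_id):
    """Annotation placed on the Kubernetes Service Account."""
    return {
        "iam.gke.io/gcp-service-account":
            f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    }


print(workload_identity_member("my-project", "default", "api-ksa"))
print(ksa_annotation("api-gsa", "my-project"))
```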
Other GKE Security Best Practices
- Use Private clusters — nodes have no external IPs; API server accessible only via authorized networks
- Enable Binary Authorization — only signed, approved container images can run
- Use Network Policies — restrict pod-to-pod traffic
- Cloud SQL Auth Proxy as a sidecar for database connections — handles IAM auth and TLS
- Enable Shielded GKE Nodes for protection against rootkits and bootkits
☁ Scenario — deploying a containerised API to GKE
Situation: A REST API packaged as a Docker image needs to run on GKE, scale from 2 to 10 replicas based on CPU, and be reachable via a public load balancer.
Walk: 1) Push image: docker tag api gcr.io/my-project/api:v1 && docker push gcr.io/my-project/api:v1. 2) Create cluster: gcloud container clusters create api-cluster --zone=us-central1-a --num-nodes=3. 3) Deploy: kubectl create deployment api --image=gcr.io/my-project/api:v1 --replicas=2. 4) Expose: kubectl expose deployment api --type=LoadBalancer --port=80 --target-port=8080. GKE provisions an external Layer 4 (passthrough) load balancer and assigns a public IP. 5) Autoscale: kubectl autoscale deployment api --cpu-percent=70 --min=2 --max=10. When average CPU exceeds 70%, new pods spin up automatically.
ACE exam note: LoadBalancer Service type creates an external L4 load balancer. For L7 (HTTP routing, path-based, TLS termination) use an Ingress with a GKE Ingress controller.
- Autopilot for hands-off (pay per pod, Google manages nodes), Standard when you need GPU nodes, custom OS images, or per-node config; Regional clusters for production HA (multi-zone control plane).
- Workload types: Deployment stateless / rolling updates, StatefulSet stable identity + ordered rollout, DaemonSet one pod per node (log shipper / agent), Job + CronJob for batch.
- Identity = Workload Identity (KSA ↔ GSA mapping, no JSON keys); expose with ClusterIP (internal), NodePort (per-node), LoadBalancer (cloud LB), or Ingress (HTTP routing + TLS termination).
04
Serverless, Storage & Databases
4 lessons · ~6 hours
Cloud Run
- Runs stateless containers on a fully managed platform; scales to zero; pay per CPU/memory during request processing
- Supports any language/runtime packaged as a Docker container
- Traffic splitting — split traffic between revisions for canary deployments
- Invoke via HTTP or Pub/Sub push subscriptions
Cloud Functions (2nd gen)
- Event-driven serverless functions; trigger via HTTP, Pub/Sub, Cloud Storage, Firestore, etc.
- 2nd gen is built on Cloud Run: longer timeouts (up to 60 min for HTTP functions), larger instances
- Pair with Cloud Scheduler for cron-like scheduled execution
App Engine
- Standard environment — language-specific runtimes (Python, Node.js, Go, Java, PHP, Ruby); scales to zero; fast startup
- Flexible environment — custom Docker containers; minimum 1 instance (cannot scale to zero); use when Standard constraints are too limiting
- Versions and traffic splitting enable canary and blue/green deployments
Storage Classes
- Standard — frequently accessed data; no minimum storage duration
- Nearline — accessed at most once per month; 30-day minimum; ~50% cheaper than Standard
- Coldline — accessed at most once per 90 days; 90-day minimum
- Archive — long-term archive; <1 access/year; 365-day minimum; cheapest per GB
Key Features
- Object Lifecycle Management — rules to auto-transition or delete objects based on age, version count, etc.
- Versioning — retains every version with a generation number; enables accidental deletion recovery
- Uniform bucket-level access — disables ACLs; IAM-only access control (recommended)
- Signed URLs — time-limited, pre-signed URLs for unauthenticated access to specific objects
- Retention policies — prevent objects from being deleted or modified before a minimum age
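A lifecycle policy is usually supplied as a JSON document of rules, each pairing an action with a condition. The sketch below generates one such document; the age thresholds and the function name are hypothetical, and the field names should be checked against the current lifecycle configuration reference before use.

```python
import json


def lifecycle_config(nearline_after=30, coldline_after=90, delete_after=365):
    """Build a lifecycle document: tier down by object age, then delete."""

    def transition(storage_class, age_days):
        return {
            "action": {"type": "SetStorageClass", "storageClass": storage_class},
            "condition": {"age": age_days},
        }

    return {
        "rule": [
            transition("NEARLINE", nearline_after),
            transition("COLDLINE", coldline_after),
            {"action": {"type": "Delete"}, "condition": {"age": delete_after}},
        ]
    }


print(json.dumps(lifecycle_config(), indent=2))
```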
Cloud SQL
- Fully managed MySQL, PostgreSQL, or SQL Server; regional (not global)
- High Availability (HA) — synchronous standby in a different zone; automatic failover
- Read replicas — asynchronous copies for read-heavy workloads; reduce primary load
- Connect securely via Cloud SQL Auth Proxy (recommended) or authorized networks
- Automated backups and point-in-time recovery (PITR) up to 7 days
Cloud Spanner
- Globally distributed, horizontally scalable relational database with ACID transactions
- 99.999% SLA for multi-region instances — use when Cloud SQL's regional scope is insufficient
- Ideal for: global financial apps, inventory systems, gaming leaderboards requiring strong consistency at scale
- Significantly more expensive than Cloud SQL — use it only when global distribution is truly required
Firestore
- Serverless NoSQL document database; real-time sync; offline support
- Best for: mobile apps, web apps, user profiles, content management
- Two modes: Native mode (new apps, real-time) and Datastore mode (server-side, legacy)
Cloud Bigtable
- Fully managed, wide-column NoSQL database; petabyte scale; millisecond latency
- Best for: time-series data, IoT sensor data, financial data, ML training datasets
- HBase-compatible API; integrates with Hadoop, Dataflow, Dataproc
- NOT suitable for: transactions, complex queries, small datasets (<1 TB)
Memorystore
- Fully managed Redis and Memcached — no infrastructure management
- Use for: session caching, real-time leaderboards, message queuing, rate limiting
- In-VPC only — not publicly accessible
☁ Scenario — event-driven thumbnail generation with Cloud Functions + Cloud Storage
Situation: Users upload images to a Cloud Storage bucket. Every upload should trigger automatic thumbnail creation and save the thumbnail to a second bucket — no server should be provisioned or managed.
Walk: 1) Create two buckets: gs://uploads-raw and gs://uploads-thumbs. 2) Write a Cloud Function (Python or Node.js): triggered by google.storage.object.finalize on uploads-raw. Function downloads the uploaded file, generates a 200×200 thumbnail using Pillow/Sharp, and writes it to uploads-thumbs. 3) Deploy: gcloud functions deploy generate-thumbnail --runtime=python311 --trigger-bucket=uploads-raw --entry-point=handler. 4) Test: upload a JPG to uploads-raw → Cloud Function triggers → thumbnail appears in uploads-thumbs within 2 seconds.
ACE exam note: Cloud Functions = event-driven, serverless, per-invocation billing. Cloud Run = containerised, HTTP-triggered, also serverless. App Engine Standard = managed runtime, scales to zero. App Engine Flex = custom runtime (Docker), always warm instance.
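The decision logic of the thumbnail function can be sketched without any cloud libraries. The code below assumes the event payload carries name and contentType fields, fits the image inside a 200×200 box rather than forcing an exact square, and writes under a hypothetical thumbs/ prefix; the actual download, resize (Pillow), and upload calls are left out.

```python
def thumbnail_size(width, height, box=200):
    """Scale (width, height) to fit in a box x box square, keeping aspect ratio."""
    scale = min(box / width, box / height, 1.0)  # never upscale small images
    return max(1, round(width * scale)), max(1, round(height * scale))


def plan_thumbnail(event, dest_bucket="uploads-thumbs"):
    """Return (bucket, object_name) for the thumbnail, or None to skip."""
    if not event.get("contentType", "").startswith("image/"):
        return None  # ignore non-image uploads
    return dest_bucket, f"thumbs/{event['name']}"


# Hypothetical finalize event for a 1600x1200 JPEG.
event = {"bucket": "uploads-raw", "name": "cat.jpg", "contentType": "image/jpeg"}
print(plan_thumbnail(event), thumbnail_size(1600, 1200))
```

Writing to a separate bucket, as the scenario does, matters: a function that writes back into its trigger bucket would retrigger itself.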
- Serverless compute: Cloud Run for containerised stateless services (scale to zero), Cloud Functions for event-driven snippets, App Engine Standard for sandbox-friendly runtimes — App Engine Flexible cannot scale to zero.
- Cloud Storage classes split on access frequency: Standard (hot), Nearline (~monthly), Coldline (~quarterly), Archive (yearly); add Lifecycle rules for auto-tiering/delete and Versioning for accidental-delete recovery.
- DB picker: Cloud SQL regional relational (MySQL/Postgres/SQL Server), Spanner global relational with strong consistency, Firestore document for mobile/web, Bigtable wide-column for time-series/IoT, Memorystore for Redis/Memcached caching.
05
Networking & IAM Security
3 lessons · ~5 hours
VPC Concepts
- GCP VPCs are global — a single VPC spans all regions (unlike AWS where VPCs are regional)
- Subnets are regional — each subnet has an IP range in a specific region
- Auto mode VPC — one /20 subnet per region created automatically; easy to start but can complicate peering
- Custom mode VPC — you define all subnets; recommended for production (avoid IP overlap)
- VMs in the same VPC communicate using internal IPs regardless of region — no VPC peering needed
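Avoiding IP overlap in a custom mode VPC is easy to check before any subnet is created. The sketch below uses Python's standard ipaddress module; the subnet names and ranges are hypothetical.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(subnets):
    """Return every pair of subnet names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])
    ]


planned = {
    "frontend-us": "10.0.1.0/24",
    "backend-us": "10.0.2.0/24",
    "data-eu": "10.0.0.0/22",   # covers 10.0.0.0 - 10.0.3.255
}
print(overlapping_subnets(planned))
```

Here data-eu collides with both other subnets, the kind of clash that also blocks VPC peering later.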
Firewall Rules
- VPCs have an implicit deny-all ingress and allow-all egress by default
- Rules are stateful — established connections are tracked; return traffic is automatically allowed
- Target with tags or service accounts to apply rules to specific VMs
- Priority 0–65535 (lower = higher priority); 0.0.0.0/0 = all sources
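Priority evaluation can be modelled as "first matching rule wins, otherwise the implied deny applies". The sketch below is a simplified model of ingress only: it matches on source range and a single port, ignores tags, service accounts, and protocols, and uses hypothetical rule names.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int       # 0-65535, lower number wins
    action: str         # "allow" or "deny"
    source_range: str
    port: int


def evaluate_ingress(rules, source_ip, port):
    """Return (action, rule_name) for the highest-priority matching rule."""
    ip = ipaddress.ip_address(source_ip)
    # On equal priority, deny rules are considered before allow rules.
    ordered = sorted(rules, key=lambda r: (r.priority, r.action != "deny"))
    for rule in ordered:
        if port == rule.port and ip in ipaddress.ip_network(rule.source_range):
            return rule.action, rule.name
    return "deny", "implied-deny-ingress"


RULES = [
    Rule("allow-frontend", 1000, "allow", "10.0.1.0/24", 8080),
    Rule("deny-one-host", 900, "deny", "10.0.1.50/32", 8080),
]
print(evaluate_ingress(RULES, "10.0.1.7", 8080))
print(evaluate_ingress(RULES, "10.0.1.50", 8080))
```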
Hybrid Connectivity
- Cloud VPN — IPsec tunnels over the public internet; up to 3 Gbps per tunnel; simple setup
- Dedicated Interconnect — direct physical connection to Google's network; 10 or 100 Gbps; 99.99% SLA with redundancy
- Partner Interconnect — connect via a service provider; for locations without Dedicated Interconnect PoPs
- Cloud NAT — allows VMs without external IPs to make outbound internet connections
IAM Principals
- Google Account — individual user account (user@gmail.com)
- Service Account — machine identity for workloads (apps, VMs, functions)
- Google Group — set of users/service accounts; apply one IAM binding to many principals
- Workspace/Cloud Identity Domain — all users in your organization's domain
- allUsers — anyone on the internet (unauthenticated); use cautiously
- allAuthenticatedUsers — any signed-in Google account
Roles
- Basic roles — Owner, Editor, Viewer; coarse-grained; avoid in production
- Predefined roles — curated by Google for specific services (e.g., roles/storage.objectViewer)
- Custom roles — define exact permissions needed; enforce least privilege
Best Practices
- Principle of least privilege — grant only the minimum permissions required
- Prefer predefined roles over basic roles
- Use service accounts for workloads — never use personal accounts
- Avoid creating service account keys when possible — use Workload Identity or metadata server instead
- Organization Policy Service — enforce constraints organization-wide (e.g., prevent public IPs, restrict allowed regions)
Cloud KMS (Key Management Service)
- Manages encryption keys for GCP services
- Google-managed keys — default; Google handles rotation; no visibility to customer
- Customer-managed keys (CMEK) — you create/manage keys in Cloud KMS; GCP services use them to encrypt your data
- Customer-supplied keys (CSEK): you provide raw key material with each request; supported for Compute Engine persistent disks and Cloud Storage objects
- Key rotation, audit logs, and IAM-controlled access to keys
VPC Service Controls
- Creates a security perimeter around GCP services (Storage, BigQuery, etc.)
- Restricts access so that only requests from authorized VPC networks or IP ranges can reach the protected resources
- Prevents data exfiltration by blocking data from leaving the perimeter
Cloud Armor
- WAF (Web Application Firewall) and DDoS mitigation attached to the Global HTTP(S) LB
- Rules for: IP allowlisting/blocklisting, SQL injection protection, XSS protection, rate limiting, geo-based access
- Adaptive Protection — ML-based detection for volumetric DDoS attacks
☁ Scenario — locking down a VM with IAM + firewall rules
Situation: A backend VM should only be reachable on port 8080 from the frontend subnet (10.0.1.0/24) and only via SSH from a bastion host. No public IP. Only the deployment service account can write to the Cloud Storage bucket it reads from.
Walk: 1) No public IP: create VM with --no-address flag. 2) Firewall rules: gcloud compute firewall-rules create allow-frontend --allow=tcp:8080 --source-ranges=10.0.1.0/24 --target-tags=backend. gcloud compute firewall-rules create allow-bastion-ssh --allow=tcp:22 --source-tags=bastion --target-tags=backend. 3) Assign tag backend to the VM. 4) IAM: create service account deploy-sa@project.iam.gserviceaccount.com. Grant it roles/storage.objectViewer on the specific bucket (not the project). Attach the SA to the VM. 5) Verify: SSH from bastion works; direct SSH from internet fails; frontend can reach :8080; VM can read the bucket but not write to it.
ACE exam note: Firewall rules are stateful. The implied rules deny all ingress and allow all egress, so inbound traffic must be explicitly allowed. Prefer service account-based IAM over user-based IAM for VM workloads.
- VPC is global, subnets are regional; firewall rules are stateful, evaluated by priority (lower number = higher), default deny on ingress, default allow on egress.
- Identity: prefer Predefined roles over Basic (Owner/Editor/Viewer are too broad); bind to groups instead of individuals; for workloads use service accounts + Workload Identity, never personal credentials or downloaded JSON keys.
- Defence-in-depth: Cloud KMS for CMEK/CSEK, VPC Service Controls for service perimeters (data exfiltration protection), Cloud Armor for L7 WAF + DDoS on the Global HTTP(S) LB.
06
Operations: Monitoring, Logging & Deployment
3 lessons · ~4 hours
Cloud Monitoring (formerly Stackdriver)
- Collects metrics from GCP resources, AWS, and on-premises with the Ops Agent
- Metrics Explorer — query and visualize any metric
- Dashboards — custom or pre-built resource dashboards
- Alerting policies — trigger notifications via email, Pub/Sub, PagerDuty, Slack when metrics breach thresholds
- Uptime checks — periodic HTTP/HTTPS/TCP checks to verify service availability globally
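- Ops Agent: needed on Compute Engine VMs to collect memory, disk-usage, and process metrics plus system and application logs (basic CPU, disk I/O, and network metrics are collected without it); install with one command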
Cloud Trace & Profiler
- Cloud Trace — distributed tracing; analyzes latency across microservices; identifies slow operations
- Cloud Profiler — continuous CPU and memory profiling for production workloads
- Error Reporting — aggregates application exceptions and errors; groups similar errors; notifies on new error types
Cloud Logging
- Centralized log management for GCP services, VMs (with Ops Agent), and custom applications
- Log sinks: route log entries to Cloud Storage, BigQuery, Pub/Sub (which can feed third-party tools such as Splunk), or another Cloud Logging bucket for archiving and analytics
- Log-based metrics — create custom metrics from log patterns to trigger alerts
- Retention: Admin Activity logs = 400 days; Data Access logs = 30 days (default)
Cloud Audit Logs
- Admin Activity — records API calls that modify resources (always on, no charge)
- Data Access — records API calls that read resource configurations or data (disabled by default; can generate very high volume)
- System Event — Google-generated system events (always on)
- Policy Denied — records when access is denied by VPC Service Controls
Deployment Options
- Cloud Deployment Manager — GCP-native IaC using YAML/Python/Jinja templates; older tooling
- Terraform: industry-standard IaC with a GCP provider; state stored in a Cloud Storage bucket via the gcs backend; multi-cloud capable
- Use Terraform for new projects — better community support, state management, and multi-cloud portability
CI/CD on GCP
- Cloud Build — fully managed CI/CD service; runs build steps in containers; triggered by GitHub, Cloud Source Repositories, or manually
- Artifact Registry — stores Docker images, Maven, npm, Python packages; replaces Container Registry
- Cloud Deploy — managed continuous delivery to GKE and Cloud Run; enforces deployment pipelines with approvals
- Typical pipeline: git push → Cloud Build (build + test + push image) → Artifact Registry → Cloud Deploy / gcloud deploy
Exam-Relevant Scenarios
- "Store Terraform state for team collaboration" → GCS backend bucket with versioning
- "Build and deploy containers automatically on merge" → Cloud Build + Artifact Registry + Cloud Run
- "Enforce only approved images run on GKE" → Binary Authorization
☁ Scenario — setting up an uptime check alert with Cloud Monitoring
Situation: A public-facing web app at https://app.example.com must page the on-call engineer within 2 minutes if the site goes down. Currently there is no alerting.
Walk: 1) Cloud Console → Monitoring → Uptime Checks → Create. Target: HTTPS, hostname app.example.com, path /health, check interval 1 min, timeout 10s. Checker regions: USA, Europe, Asia (checking from several regions rules out a regional blip). 2) Create an Alerting Policy: condition = uptime check failure from ≥2 out of 3 regions for ≥1 min. Notification channel: PagerDuty webhook + email. 3) Test: temporarily block the health endpoint → Monitoring flags failure after 1 min → alert fires → PagerDuty page sent. 4) Review logs: Cloud Logging → Logs Explorer → resource.type="http_load_balancer" to see which requests errored.
ACE exam note: Cloud Monitoring = metrics + uptime + alerting. Cloud Logging = structured logs, queryable with Logging Query Language. Error Reporting = aggregates application exceptions automatically. Cloud Trace = distributed request tracing for latency analysis.
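The alert condition in the scenario ("failure from at least 2 of 3 regions for at least 1 minute") can be expressed as a small predicate. The sketch below is only a model of that rule, assuming one check result per region per minute; the function name and the sample data are hypothetical.

```python
def should_alert(results, min_failed_regions=2, min_consecutive=1):
    """results maps region -> check outcomes (True = passed), newest last."""
    failing = 0
    for outcomes in results.values():
        recent = outcomes[-min_consecutive:]
        # A region counts as failing only if all of its latest checks failed.
        if len(recent) == min_consecutive and not any(recent):
            failing += 1
    return failing >= min_failed_regions


checks = {"usa": [True, False], "europe": [False], "asia": [True]}
print(should_alert(checks))  # True: usa and europe both failed the latest check
```

Requiring more than one failing region is what keeps a single checker's network problem from paging the on-call engineer.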
- Cloud Monitoring needs the Ops Agent on Compute Engine VMs to ship memory/process metrics and logs (basic CPU and network metrics arrive without it); managed services (GKE, App Engine, Cloud Run) auto-emit; alerting policies fire on thresholds and route to channels (email, Slack, PagerDuty).
- Cloud Logging stores logs in the _Default and _Required buckets; for long-term compliance, create a sink exporting Admin Activity / Data Access logs to Cloud Storage (with a locked retention policy, i.e. Bucket Lock) or BigQuery. This is a recurring ACE scenario.
- GCP-native CI/CD chain: Cloud Build (CI) → Artifact Registry (image / package storage) → Cloud Deploy (CD with progression policies); pair with Binary Authorization on GKE to only run signed/approved images.