Google Cloud Foundations & Resource Hierarchy

3 lessons · ~3 hours

Every GCP scenario hangs off one diagram: Organization → Folder → Project → Resource, with IAM policies inheriting top-down and resources living inside regions and zones. Master that hierarchy plus the four billing levers (SUDs, CUDs, Spot, free tier) and the day-1 admin tasks — gcloud CLI, enabling APIs, setting quotas — and the ACE exam's foundations domain becomes a series of "where does this policy attach?" questions with obvious answers.

Cloud Concepts & GCP Architecture

What is Google Cloud?

GCP is Google's public cloud — the same infrastructure that runs Search, YouTube, and Gmail
Available in 40+ regions, each with multiple zones (typically 3) for high availability
A region is a geographic area (e.g., us-central1); a zone is a single deployment area within a region (e.g., us-central1-a)
Google's private global fiber backbone connects all regions for low-latency routing; inside the data centers, Jupiter provides the network fabric and Andromeda the network virtualization layer
GCP follows the shared responsibility model: Google manages physical infrastructure; you manage your workloads, data, and access controls

GCP Resource Hierarchy

Organization — top-level node, maps to a Google Workspace or Cloud Identity domain
Folder — optional grouping layer (e.g., by department or environment); enables IAM/Org Policy inheritance
Project — the primary unit: billing, API enablement, and IAM boundaries. Every resource belongs to a project
Resources — VMs, buckets, databases, etc. within a project
IAM allow policies inherit downward and are additive: a lower-level policy can grant additional access, but it cannot remove access granted at a higher level

Think of it as: Organization > Folder(s) > Project > Resource. When you want to isolate dev/staging/prod, use separate projects. When you want to apply a policy to an entire department, use a folder.

The ACE exam frequently asks about the resource hierarchy and where IAM policies should be applied. Understand inheritance: a role granted at the Organization level propagates to all child resources.
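The inheritance rule can be sketched as a toy model. This is only an illustration, not how IAM is implemented: the node names, groups, and the `PARENT`/`BINDINGS` tables are hypothetical, and the effective access of a principal on a node is taken as the union of roles bound on that node and on every ancestor.

```python
# Toy model of allow-policy inheritance; every name here is hypothetical.
PARENT = {
    "backend-prod": "backend-folder",   # project -> folder
    "backend-folder": "example-org",    # folder -> organization
    "example-org": None,                # organization is the root
}

# Roles bound directly on each node: node -> {principal: set of roles}.
BINDINGS = {
    "example-org": {"group:auditors@example.com": {"roles/viewer"}},
    "backend-folder": {"group:backend@example.com": {"roles/compute.admin"}},
    "backend-prod": {"group:backend@example.com": {"roles/storage.objectViewer"}},
}


def effective_roles(node, principal):
    """Union of the roles granted on the node and on all of its ancestors."""
    roles = set()
    while node is not None:
        roles |= BINDINGS.get(node, {}).get(principal, set())
        node = PARENT[node]
    return roles


print(sorted(effective_roles("backend-prod", "group:backend@example.com")))
```

A role bound at the organization reaches every project below it, while a role bound on one project never flows upward.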

Cloud SDK & gcloud CLI Essentials

Setting Up Your Environment

Install the Cloud SDK: provides gcloud, gsutil (Storage), and bq (BigQuery); kubectl is available as an optional component (gcloud components install kubectl)
gcloud init — interactive setup: authenticate, set default project and region
gcloud config set project PROJECT_ID — set active project
gcloud config set compute/region us-central1 — set default region
gcloud config configurations create my-config — manage multiple environments

Essential gcloud Commands

gcloud compute instances list — list all VMs in current project
gcloud compute instances create NAME --zone=ZONE --machine-type=e2-medium
gcloud compute ssh INSTANCE --zone=ZONE — SSH with automatic key management
gcloud services enable compute.googleapis.com — enable APIs (required before use)
gcloud projects list — list all accessible projects

Most APIs are disabled by default in new projects. Always enable the required API (Compute Engine API, Kubernetes Engine API, etc.) before making API calls. The exam tests this.

Billing, Quotas & Cost Management

Billing Concepts

Each project is linked to a billing account; billing accounts can cover multiple projects
Labels (key-value pairs on resources) enable cost allocation and reporting per team/environment
Set budgets and alerts in Cloud Billing to receive email or Pub/Sub notifications at spending thresholds (e.g., 50%, 80%, 100%)
Use Cloud Cost Management and Recommender for rightsizing suggestions

Pricing Models

Sustained Use Discounts (SUDs) — automatic discounts of up to 30% for eligible VMs that run for most of the month; no commitment required
Committed Use Discounts (CUDs) — 1- or 3-year commitments; up to 57% off for most machine types and up to 70% for memory-optimized types
Spot VMs — up to 91% off, can be preempted with 30-second notice; ideal for batch workloads
Free Tier — always-free products include: 1 e2-micro VM/month, 5 GB Cloud Storage, Cloud Functions invocations, BigQuery queries up to 1 TB/month

Know the difference between SUDs (automatic, no action), CUDs (commitment-based), and Spot VMs (interruptible). The exam tests when to recommend each pricing model.
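One way to internalise the rule of thumb is as a small decision function. This is a study aid under simplifying assumptions: the function name and its three boolean inputs are hypothetical, and real sizing decisions also weigh machine family and region.

```python
def recommend_pricing(interruptible: bool, steady_for_years: bool,
                      runs_most_of_month: bool) -> str:
    """Map workload traits to the pricing model the exam usually expects."""
    if interruptible:
        # Fault-tolerant work can absorb preemption in exchange for the deepest discount.
        return "Spot VM"
    if steady_for_years:
        # Predictable long-term usage justifies a 1- or 3-year commitment.
        return "Committed Use Discount"
    if runs_most_of_month:
        # No action needed: the discount is applied automatically.
        return "Sustained Use Discount (automatic)"
    return "On-demand"


print(recommend_pricing(interruptible=True, steady_for_years=False,
                        runs_most_of_month=False))
```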

☁ Scenario — structuring a GCP resource hierarchy for a startup

Situation: A startup has 3 teams (frontend, backend, data). Each needs isolated billing and separate IAM boundaries, but all engineers share a single Google Workspace account.

Design: One Organization node (tied to the Google Workspace domain). One Folder per team (frontend-folder, backend-folder, data-folder). One Project per environment per team (e.g., backend-dev, backend-prod). Resources (VMs, buckets, databases) live inside projects. IAM policies applied at the folder level propagate to all child projects automatically.

Why projects matter on the ACE exam: Projects are the billing unit and IAM boundary. All GCP resources belong to exactly one project. The gcloud config set project PROJECT_ID command sets the default project for CLI commands; forgetting it is a common source of errors, because commands then run against the wrong project.

Key takeaways

The hierarchy is Organization → Folder → Project → Resource; IAM policies inherit downward — bind at the lowest level that satisfies the requirement (least privilege).
Projects are the unit of billing, quota, and API enablement; APIs are off by default in new projects (gcloud services enable first or expect a 403).
Discount stack: SUDs apply automatically, CUDs need a 1- or 3-year commitment, Spot/Preemptible VMs are 60–91% off but can be evicted at any time — pick by workload tolerance to interruption.


02

Compute Engine & Managed Instance Groups

3 lessons · ~6 hours

Compute Engine is the IaaS workhorse: VMs with machine types and disks, Managed Instance Groups (MIGs) for autoscaling + self-healing, and Load Balancers in front (Global HTTP(S), Internal, Network, TCP/SSL Proxy). The ACE exam loves "which LB and which disk type?" — answers depend on traffic shape (internal vs internet, HTTP vs TCP) and persistence needs (pd-balanced default, local SSD ephemeral, pd-ssd for high IOPS).

VM Instance Fundamentals

Machine Types

General purpose (E2, N2, N2D, T2D) — balanced price/performance for most workloads
Compute-optimized (C2, C2D) — high CPU frequency for compute-intensive apps
Memory-optimized (M2, M3) — large in-memory databases, SAP HANA
Custom machine types — specify exact vCPU and memory for right-sizing
Accelerator-optimized (A2, G2) — NVIDIA GPUs for ML/AI workloads

Boot Disks & Persistent Storage

Standard Persistent Disk (pd-standard) — HDD, cost-efficient, sequential workloads
Balanced Persistent Disk (pd-balanced) — SSD, good general purpose (recommended default)
SSD Persistent Disk (pd-ssd) — high IOPS for databases
Local SSDs — ephemeral NVMe attached directly to the host; very fast but data lost on VM stop
Snapshots — incremental backups of persistent disks; stored in Cloud Storage; used for disaster recovery

VM Lifecycle

States: Provisioning → Staging → Running → Stopping → Terminated
Stopped VMs do not incur compute charges but retain disk storage costs
Metadata server at 169.254.169.254 — VMs access instance metadata and service account tokens without needing key files

Know when to use each disk type. For most ACE scenarios: pd-balanced is the default recommendation. Local SSDs are fast but ephemeral — don't use them for persistent data.

Instance Templates & Managed Instance Groups

Instance Templates

Define VM configuration once (machine type, disk, network, service account, startup script) — reuse for MIGs and Spot VMs
Templates are immutable — create a new version to update; MIGs rolling updates use the new template

Managed Instance Groups (MIGs)

MIGs deploy identical VM instances from a template, enabling autoscaling and autohealing
Autoscaling — adds/removes VMs based on CPU utilization, HTTP load balancing capacity, or custom metrics
Autohealing — uses health checks to detect and automatically replace unhealthy VMs
Rolling updates — gradually deploy new templates across the MIG with configurable maxSurge and maxUnavailable
Regional MIGs — spread instances across multiple zones for high availability

MIGs are the backbone of scalable, resilient Compute Engine architecture. Pair a regional MIG with a Global HTTP(S) Load Balancer for a highly available web application.

Load Balancing on GCP

Load Balancer Types

Global External HTTP(S) Load Balancer — Layer 7, URL routing, global Anycast IP, integrates with Cloud CDN and Cloud Armor
Regional External TCP/UDP Network LB — Layer 4, non-proxy, preserves client IP, for non-HTTP protocols
Internal TCP/UDP Load Balancer — Layer 4, private VPC traffic only
Internal HTTP(S) Load Balancer — Layer 7, for microservices within VPC
SSL Proxy and TCP Proxy LB — terminates SSL/TCP connections globally

Key Concepts

Health checks — LBs use health checks to route only to healthy backends
Backend services — define the backend (MIG, NEG) and health check for the LB
URL maps — HTTP(S) LB routing rules (host/path-based)
Cloud Armor — WAF and DDoS protection; attaches to the Global HTTP(S) LB

For internet-facing web apps needing global routing and DDoS protection: Global External HTTP(S) LB. For internal microservices: Internal HTTP(S) LB. For UDP/non-HTTP external: Regional Network LB.
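The selection rule above fits in a two-question lookup. This sketch is a memory aid only: the function name and parameters are hypothetical, and it deliberately ignores the SSL Proxy and TCP Proxy variants.

```python
def pick_load_balancer(internet_facing: bool, http: bool) -> str:
    """Choose a load balancer from traffic origin and protocol."""
    if internet_facing and http:
        return "Global External HTTP(S) LB"
    if internet_facing:
        return "Regional External TCP/UDP Network LB"
    if http:
        return "Internal HTTP(S) LB"
    return "Internal TCP/UDP LB"


print(pick_load_balancer(internet_facing=True, http=True))
```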

☁ Scenario — deploying a preemptible VM for batch processing

Situation: A data pipeline needs to process 500 GB of logs nightly. The job takes ~2 hours and can restart from a checkpoint if interrupted. Cost matters — this job runs every night.

Walk: 1) gcloud compute instances create batch-worker-1 --zone=us-central1-a --machine-type=n1-standard-4 --preemptible --image-family=debian-11 --image-project=debian-cloud. Preemptible VMs cost ~80% less but can be reclaimed by GCP with 30 seconds notice. 2) Script handles SIGTERM: saves a checkpoint to Cloud Storage before shutdown. 3) A Cloud Scheduler job retriggers the pipeline each night; if the VM was preempted, the job resumes from the last checkpoint. 4) After migration: cost drops from ~$150/night (standard) to ~$30/night (preemptible).

ACE exam note: Preemptible VMs are ideal for fault-tolerant batch jobs. Spot VMs (the successor) use the same pricing model but have no 24-hour maximum runtime. For long-running services, use standard or committed-use VMs instead.
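The checkpoint-and-resume pattern from the walk-through might look roughly like this. It is a minimal sketch under stated assumptions: a local JSON file stands in for the Cloud Storage checkpoint object, and `state`, `request_stop`, `install_handler`, and `process` are hypothetical names, not part of any Google library.

```python
import json
import os
import signal

# Shared flag that the signal handler flips when shutdown is requested.
state = {"stop": False}


def request_stop(signum, frame):
    """Signal handler: only record the request; the loop exits cleanly."""
    state["stop"] = True


def install_handler():
    """Call once at startup so SIGTERM triggers a graceful stop."""
    signal.signal(signal.SIGTERM, request_stop)


def process(items, checkpoint_path, handle):
    """Process items in order, resuming after the last checkpointed index.

    Returns True when all items are done, False when interrupted.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(items)):
        if state["stop"]:
            return False  # the checkpoint already points at item i
        handle(items[i])
        # Persist progress after every item so a restart skips finished work.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
    return True
```

A real worker would call `install_handler()` first and upload the checkpoint to a bucket instead of writing a local file; checkpointing per item keeps the sketch simple, while a production job would likely checkpoint per batch.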

Key takeaways

Pick the machine type by workload: e2 general-purpose / cost-optimized, n2/n2d balanced, c2 compute-intensive, m2 memory-optimized; default disk = pd-balanced, never local SSD for persistent data.
MIGs = autoscaling + self-healing + rolling updates; regional MIGs span zones for HA, instance templates declare the immutable spec, and stateful MIGs preserve per-VM disks/IPs across recreation.
LB choice cheat-sheet: Global External HTTP(S) for internet web apps (Cloud CDN + Armor friendly), Internal HTTP(S) for service-to-service inside the VPC, Network LB for non-HTTP/UDP external traffic.


03

Kubernetes Engine (GKE) & Containers

3 lessons · ~6 hours

GKE is the second-heaviest ACE domain. Three decisions drive every exam question: cluster mode (Autopilot fully managed vs Standard with node-level control), workload type (Deployment / StatefulSet / DaemonSet / Job), and identity (Workload Identity binding KSAs to GCP service accounts — never raw JSON keys). Master those and the rest is service exposure and node-pool sizing.

GKE Cluster Architecture

Cluster Modes

GKE Standard — you manage node configuration, machine types, node pools; full control
GKE Autopilot — Google manages node infrastructure; you only define pod specs; pay per pod not node
Regional clusters — control plane and nodes replicated across 3 zones; no single zone is a SPOF; recommended for production
Zonal clusters — single control plane in one zone; lower cost but less resilient

Node Pools

A cluster can have multiple node pools with different machine types (e.g., standard pool + GPU pool)
Node pools can be independently upgraded and scaled
Cluster Autoscaler — automatically adds nodes when pods are pending; removes nodes when underutilized
Node auto-provisioning — creates new node pools automatically for pending pods requiring specific resources

For the ACE exam: use Regional clusters for production HA. Use GKE Autopilot when the team wants minimal infrastructure management. Use Standard when you need GPU nodes or specific OS configurations.

Kubernetes Workload Objects

Core Objects

Pod — smallest deployable unit; one or more containers sharing network/storage
Deployment — manages stateless Pods with rolling updates and rollbacks; use for web apps and APIs
StatefulSet — stateful workloads with stable network identity and persistent per-pod volumes; use for databases
DaemonSet — ensures one Pod per node; use for log collectors, monitoring agents (Fluentd, Prometheus node exporter)
Job / CronJob — run-to-completion batch work; a CronJob creates Jobs on a cron schedule

Scaling

Horizontal Pod Autoscaler (HPA) — scales Pod replicas based on CPU/memory or custom metrics
Vertical Pod Autoscaler (VPA) — adjusts Pod resource requests/limits automatically
kubectl scale deployment nginx --replicas=5 — manual scaling
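The Horizontal Pod Autoscaler's core calculation is documented by Kubernetes as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with a default tolerance of 0.1 around the target. The sketch below reproduces only that arithmetic; the function name and parameters are hypothetical, and the real controller also handles unready pods, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count the HPA formula would ask for, clamped to [min, max]."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no scaling action
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Two pods averaging 140% of their CPU request against a 70% target.
print(desired_replicas(2, 140, 70, min_replicas=2, max_replicas=10))
```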

Networking

ClusterIP — internal service, reachable only within the cluster
NodePort — exposes service on each node's IP at a static port
LoadBalancer — provisions a GCP External Load Balancer for the service
Ingress — HTTP(S) routing rules; on GKE creates a Global HTTP(S) Load Balancer

Know which Kubernetes object to use for each scenario. Deployment = stateless. StatefulSet = stateful with stable identity. DaemonSet = one pod per node. This is heavily tested.

Workload Identity & GKE Security

Workload Identity

Workload Identity is the recommended way to grant GKE workloads access to GCP APIs
Maps a Kubernetes Service Account (KSA) to a GCP Service Account (GSA)
Pods use the KSA to impersonate the GSA — no key files stored in Secrets
Enable at cluster creation: --workload-pool=PROJECT_ID.svc.id.goog

Other GKE Security Best Practices

Use Private clusters — nodes have no external IPs; API server accessible only via authorized networks
Enable Binary Authorization — only signed, approved container images can run
Use Network Policies — restrict pod-to-pod traffic
Cloud SQL Auth Proxy as a sidecar for database connections — handles IAM auth and TLS
Enable Shielded GKE Nodes for protection against rootkits and bootkits

Workload Identity replaces the pattern of downloading a service account JSON key and mounting it as a Kubernetes Secret — which is a security risk if the Secret is misconfigured or exposed.
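The KSA-to-GSA mapping comes down to two strings: the IAM member that is granted roles/iam.workloadIdentityUser on the GSA, and the annotation placed on the KSA. The helpers below only build those strings; the function names and the example project, namespace, and account names are hypothetical.

```python
def workload_identity_member(project_id: str, namespace: str, ksa: str) -> str:
    """IAM member that represents a Kubernetes Service Account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def ksa_annotation(gsa_name: str, gsa_project: str) -> dict:
    """Annotation that points the KSA at the GCP service account it uses."""
    email = f"{gsa_name}@{gsa_project}.iam.gserviceaccount.com"
    return {"iam.gke.io/gcp-service-account": email}


# Hypothetical names, for illustration only.
print(workload_identity_member("my-project", "default", "api-ksa"))
print(ksa_annotation("api-gsa", "my-project"))
```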

☁ Scenario — deploying a containerised API to GKE

Situation: A REST API packaged as a Docker image needs to run on GKE, scale from 2 to 10 replicas based on CPU, and be reachable via a public load balancer.

Walk: 1) Push image: docker tag api gcr.io/my-project/api:v1 && docker push gcr.io/my-project/api:v1. 2) Create cluster: gcloud container clusters create api-cluster --zone=us-central1-a --num-nodes=3. 3) Deploy: kubectl create deployment api --image=gcr.io/my-project/api:v1 --replicas=2. 4) Expose: kubectl expose deployment api --type=LoadBalancer --port=80 --target-port=8080. GKE provisions an external Layer 4 (passthrough) Network Load Balancer and assigns a public IP. 5) Autoscale: kubectl autoscale deployment api --cpu-percent=70 --min=2 --max=10. When average CPU utilization exceeds 70%, new pods spin up automatically, up to the maximum of 10.

ACE exam note: LoadBalancer Service type creates an external L4 load balancer. For L7 (HTTP routing, path-based, TLS termination) use an Ingress with a GKE Ingress controller.

Key takeaways

Autopilot for hands-off (pay per pod, Google manages nodes), Standard when you need GPU nodes, custom OS images, or per-node config; Regional clusters for production HA (multi-zone control plane).
Workload types: Deployment stateless / rolling updates, StatefulSet stable identity + ordered rollout, DaemonSet one pod per node (log shipper / agent), Job + CronJob for batch.
Identity = Workload Identity (KSA ↔ GSA mapping, no JSON keys); expose with ClusterIP (internal), NodePort (per-node), LoadBalancer (cloud LB), or Ingress (HTTP routing + TLS termination).


04

Serverless, Storage & Databases

4 lessons · ~6 hours

The serverless + data domain is a giant decision tree: App Engine Std / Cloud Run / Cloud Functions for compute; Cloud Storage with four classes (Standard / Nearline / Coldline / Archive) + Lifecycle + Versioning for objects; Cloud SQL / Spanner / Firestore / Bigtable / Memorystore for databases. The exam asks the same question repeatedly: given these constraints (scale, consistency, latency, schema), which managed service? Module 04 walks each leaf of that tree.

Serverless Compute: Cloud Run, Cloud Functions & App Engine

Cloud Run

Runs stateless containers on a fully managed platform; scales to zero; pay per CPU/memory during request processing
Supports any language/runtime packaged as a Docker container
Traffic splitting — split traffic between revisions for canary deployments
Invoke via HTTP or Pub/Sub push subscriptions

Cloud Functions (2nd gen)

Event-driven serverless functions; trigger via HTTP, Pub/Sub, Cloud Storage, Firestore, etc.
2nd gen is built on Cloud Run — longer timeouts (up to 60 min for HTTP-triggered functions), larger instances
Pair with Cloud Scheduler for cron-like scheduled execution

App Engine

Standard environment — language-specific runtimes (Python, Node.js, Go, Java, PHP, Ruby); scales to zero; fast startup
Flexible environment — custom Docker containers; minimum 1 instance (cannot scale to zero); use when Standard constraints are too limiting
Versions and traffic splitting enable canary and blue/green deployments

Scale-to-zero requires App Engine Standard or Cloud Run — not Flexible. This is a common exam trap. If cost optimization for idle apps is the goal, avoid Flexible.

Cloud Storage Deep Dive

Storage Classes

Standard — frequently accessed data; no minimum storage duration
Nearline — accessed at most once per month; 30-day minimum; ~50% cheaper than Standard
Coldline — accessed at most once per 90 days; 90-day minimum
Archive — long-term archive; <1 access/year; 365-day minimum; cheapest per GB

Key Features

Object Lifecycle Management — rules to auto-transition or delete objects based on age, version count, etc.
Versioning — retains every version with a generation number; enables accidental deletion recovery
Uniform bucket-level access — disables ACLs; IAM-only access control (recommended)
Signed URLs — time-limited, pre-signed URLs for unauthenticated access to specific objects
Retention policies — prevent objects from being deleted or modified before a minimum age

Lifecycle rules + Versioning are frequently tested together. A common question: "automatically delete objects older than 30 days" → Lifecycle rule with Age=30 + Delete action. "Prevent accidental deletion" → enable Versioning.
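A lifecycle rule is ultimately a small JSON document. The sketch below builds one in the shape accepted by lifecycle configuration files (a top-level "rule" list of action/condition pairs); the helper name and its parameters are hypothetical, and only the Delete and SetStorageClass actions are covered.

```python
import json


def lifecycle_config(delete_after_days, nearline_after_days=None):
    """Build a lifecycle configuration as a plain dict."""
    rules = []
    if nearline_after_days is not None:
        # Optional tiering step before the object is eventually deleted.
        rules.append({
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": nearline_after_days},
        })
    rules.append({
        "action": {"type": "Delete"},
        "condition": {"age": delete_after_days},
    })
    return {"rule": rules}


# "Automatically delete objects older than 30 days"
print(json.dumps(lifecycle_config(30), indent=2))
```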

Relational Databases: Cloud SQL & Cloud Spanner

Cloud SQL

Fully managed MySQL, PostgreSQL, or SQL Server; regional (not global)
High Availability (HA) — synchronous standby in a different zone; automatic failover
Read replicas — asynchronous copies for read-heavy workloads; reduce primary load
Connect securely via Cloud SQL Auth Proxy (recommended) or authorized networks
Automated backups and point-in-time recovery (PITR) up to 7 days

Cloud Spanner

Globally distributed, horizontally scalable relational database with ACID transactions
99.999% SLA for multi-region instances — use when Cloud SQL's regional scope is insufficient
Ideal for: global financial apps, inventory systems, gaming leaderboards requiring strong consistency at scale
Significantly more expensive than Cloud SQL — use it only when global distribution is truly required

Cloud SQL = regional relational DB. Cloud Spanner = global relational DB. If the scenario mentions "global", "multiple regions", and "strong consistency", the answer is Spanner.

NoSQL Databases: Firestore, Bigtable & Memorystore

Firestore

Serverless NoSQL document database; real-time sync; offline support
Best for: mobile apps, web apps, user profiles, content management
Two modes: Native mode (new apps, real-time) and Datastore mode (server-side, legacy)

Cloud Bigtable

Fully managed, wide-column NoSQL database; petabyte scale; millisecond latency
Best for: time-series data, IoT sensor data, financial data, ML training datasets
HBase-compatible API; integrates with Hadoop, Dataflow, Dataproc
NOT suitable for: transactions, complex queries, small datasets (<1 TB)

Memorystore

Fully managed Redis and Memcached — no infrastructure management
Use for: session caching, real-time leaderboards, message queuing, rate limiting
In-VPC only — not publicly accessible

Database choice questions: Firestore = mobile/web app document data. Bigtable = time-series/IoT at massive scale. Memorystore = in-memory caching/session. Cloud SQL/Spanner = relational/transactional.
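The same decision tree, written as a function. It is a study aid rather than an architecture tool: the function name and flags are hypothetical, and real choices also depend on cost, query patterns, and data volume.

```python
def pick_database(relational: bool, global_scale: bool = False,
                  in_memory: bool = False, time_series: bool = False) -> str:
    """Map scenario keywords to the managed database the exam expects."""
    if in_memory:
        return "Memorystore"       # caching and session data
    if relational:
        return "Cloud Spanner" if global_scale else "Cloud SQL"
    if time_series:
        return "Cloud Bigtable"    # wide-column, massive scale
    return "Firestore"             # document data for mobile/web apps


print(pick_database(relational=True, global_scale=True))
```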

☁ Scenario — event-driven thumbnail generation with Cloud Functions + Cloud Storage

Situation: Users upload images to a Cloud Storage bucket. Every upload should trigger automatic thumbnail creation and save the thumbnail to a second bucket — no server should be provisioned or managed.

Walk: 1) Create two buckets: gs://uploads-raw and gs://uploads-thumbs. 2) Write a Cloud Function (Python or Node.js): triggered by google.storage.object.finalize on uploads-raw. Function downloads the uploaded file, generates a 200×200 thumbnail using Pillow/Sharp, and writes it to uploads-thumbs. 3) Deploy: gcloud functions deploy generate-thumbnail --runtime=python311 --trigger-bucket=uploads-raw --entry-point=handler. 4) Test: upload a JPG to uploads-raw → Cloud Function triggers → thumbnail appears in uploads-thumbs within 2 seconds.

ACE exam note: Cloud Functions = event-driven, serverless, per-invocation billing. Cloud Run = containerised, HTTP-triggered, also serverless. App Engine Standard = managed runtime, scales to zero. App Engine Flex = custom runtime (Docker), always warm instance.

Key takeaways

Serverless compute: Cloud Run for containerised stateless services (scale to zero), Cloud Functions for event-driven snippets, App Engine Standard for sandbox-friendly runtimes — App Engine Flexible cannot scale to zero.
Cloud Storage classes split on access frequency: Standard (hot), Nearline (~monthly), Coldline (~quarterly), Archive (yearly); add Lifecycle rules for auto-tiering/delete and Versioning for accidental-delete recovery.
DB picker: Cloud SQL regional relational (MySQL/Postgres/SQL Server), Spanner global relational with strong consistency, Firestore document for mobile/web, Bigtable wide-column for time-series/IoT, Memorystore for Redis/Memcached caching.


05

Networking & IAM Security

3 lessons · ~5 hours

Two pieces of trivia define this domain: a VPC is global in GCP (subnets are regional, the VPC spans them — different from AWS), and IAM binds roles to principals on resources; permissions are never assigned to users directly. From there, master firewall rules (stateful, evaluated by priority), VPC peering vs Shared VPC vs VPN/Interconnect, and the three IAM role types: Basic / Predefined / Custom. Least privilege via predefined roles is the recurring exam answer.

VPC Networking Fundamentals

VPC Concepts

GCP VPCs are global — a single VPC spans all regions (unlike AWS where VPCs are regional)
Subnets are regional — each subnet has an IP range in a specific region
Auto mode VPC — one /20 subnet per region created automatically; easy to start but can complicate peering
Custom mode VPC — you define all subnets; recommended for production (avoid IP overlap)
VMs in the same VPC communicate using internal IPs regardless of region — no VPC peering needed
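When planning a custom mode VPC, overlapping ranges can be caught before any subnet is created. This is a minimal sketch using Python's standard ipaddress module; the function name and the example ranges are hypothetical.

```python
import ipaddress


def overlapping_pairs(subnets):
    """Return pairs of subnet names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical plan: one subnet per region.
plan = {
    "us-central1": "10.0.1.0/24",
    "europe-west1": "10.0.2.0/24",
    "asia-east1": "10.0.1.128/25",  # sits inside the us-central1 range
}
print(overlapping_pairs(plan))
```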

Firewall Rules

VPCs have an implicit deny-all ingress and allow-all egress by default
Rules are stateful — established connections are tracked; return traffic is automatically allowed
Target with tags or service accounts to apply rules to specific VMs
Priority 0–65535 (lower = higher priority); 0.0.0.0/0 = all sources
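Priority evaluation can be illustrated with a toy ingress matcher. This is a simplification for study purposes, not GCP's implementation: the rule tuples and function name are hypothetical, rules match on source range, protocol, and a single port only, and targets (tags or service accounts) are left out. The lowest priority number wins, a deny beats an allow at equal priority, and unmatched traffic hits the implied deny.

```python
import ipaddress

# (priority, action, source CIDR, protocol, port) -- hypothetical rules.
RULES = [
    (1000, "allow", "10.0.1.0/24", "tcp", 8080),
    (900, "deny", "10.0.1.50/32", "tcp", 8080),
]


def ingress_decision(rules, src_ip, protocol, port):
    """Return the action of the first matching rule in priority order."""
    src = ipaddress.ip_address(src_ip)
    # Sort by priority; at equal priority, deny rules come first.
    ordered = sorted(rules, key=lambda r: (r[0], r[1] != "deny"))
    for _priority, action, cidr, proto, rule_port in ordered:
        if (proto == protocol and rule_port == port
                and src in ipaddress.ip_network(cidr)):
            return action
    return "deny"  # implied deny-all ingress


print(ingress_decision(RULES, "10.0.1.7", "tcp", 8080))
print(ingress_decision(RULES, "10.0.1.50", "tcp", 8080))
```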

Hybrid Connectivity

Cloud VPN — IPsec tunnels over the public internet; up to 3 Gbps per tunnel; simple setup
Dedicated Interconnect — direct physical connection to Google's network; 10 or 100 Gbps; 99.99% SLA with redundancy
Partner Interconnect — connect via a service provider; for locations without Dedicated Interconnect PoPs
Cloud NAT — allows VMs without external IPs to make outbound internet connections

A VPC is global in GCP (unlike AWS). This means a VM in us-central1 and a VM in europe-west1 in the same VPC can communicate via internal IPs without VPC peering.

IAM: Identity & Access Management

IAM Principals

Google Account — individual user account (user@gmail.com)
Service Account — machine identity for workloads (apps, VMs, functions)
Google Group — set of users/service accounts; apply one IAM binding to many principals
Workspace/Cloud Identity Domain — all users in your organization's domain
allUsers — anyone on the internet (unauthenticated); use cautiously
allAuthenticatedUsers — any signed-in Google account

Roles

Basic roles — Owner, Editor, Viewer; coarse-grained; avoid in production
Predefined roles — curated by Google for specific services (e.g., roles/storage.objectViewer)
Custom roles — define exact permissions needed; enforce least privilege

Best Practices

Principle of least privilege — grant only the minimum permissions required
Prefer predefined roles over basic roles
Use service accounts for workloads — never use personal accounts
Avoid creating service account keys when possible — use Workload Identity or metadata server instead
Organization Policy Service — enforce constraints organization-wide (e.g., prevent public IPs, restrict allowed regions)

IAM questions often test least privilege. When asked which role to grant, pick the most specific predefined role that covers only what's needed. Don't grant Editor or Owner unless explicitly required.
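A quick audit for overly broad grants can be sketched against the bindings structure of an IAM policy (a list of role/members pairs). The function name and the sample policy are hypothetical; in practice the policy would come from the output of a get-iam-policy call.

```python
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def basic_role_bindings(policy):
    """List (role, member) pairs that use a coarse-grained basic role."""
    return [
        (binding["role"], member)
        for binding in policy.get("bindings", [])
        if binding["role"] in BASIC_ROLES
        for member in binding["members"]
    ]


# Hypothetical policy, for illustration only.
policy = {
    "bindings": [
        {"role": "roles/editor", "members": ["user:dev@example.com"]},
        {"role": "roles/storage.objectViewer",
         "members": ["serviceAccount:app@my-project.iam.gserviceaccount.com"]},
    ]
}
print(basic_role_bindings(policy))
```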

Security Services: Cloud KMS, VPC Service Controls & Cloud Armor

Cloud KMS (Key Management Service)

Manages encryption keys for GCP services
Google-managed keys — default; Google handles rotation; no visibility to customer
Customer-managed keys (CMEK) — you create/manage keys in Cloud KMS; GCP services use them to encrypt your data
Customer-supplied keys (CSEK) — you provide the raw key material yourself, outside Cloud KMS; supported for Compute Engine persistent disks and Cloud Storage
Key rotation, audit logs, and IAM-controlled access to keys

VPC Service Controls

Creates a security perimeter around GCP services (Storage, BigQuery, etc.)
Restricts access to resources to only requests from authorized VPCs or IP ranges
Prevents data exfiltration by blocking data from leaving the perimeter

Cloud Armor

WAF (Web Application Firewall) and DDoS mitigation attached to the Global HTTP(S) LB
Rules for: IP allowlisting/blocklisting, SQL injection protection, XSS protection, rate limiting, geo-based access
Adaptive Protection — ML-based detection of Layer 7 (application-level) DDoS attacks

☁ Scenario — locking down a VM with IAM + firewall rules

Situation: A backend VM should only be reachable on port 8080 from the frontend subnet (10.0.1.0/24) and only via SSH from a bastion host. No public IP. Only the deployment service account can write to the Cloud Storage bucket it reads from.

Walk: 1) No public IP: create VM with --no-address flag. 2) Firewall rules: gcloud compute firewall-rules create allow-frontend --allow=tcp:8080 --source-ranges=10.0.1.0/24 --target-tags=backend. gcloud compute firewall-rules create allow-bastion-ssh --allow=tcp:22 --source-tags=bastion --target-tags=backend. 3) Assign tag backend to the VM. 4) IAM: create service account deploy-sa@project.iam.gserviceaccount.com. Grant it roles/storage.objectViewer on the specific bucket (not the project). Attach the SA to the VM. 5) Verify: SSH from bastion works; direct SSH from internet fails; frontend can reach :8080; VM can read the bucket but not write to it.

ACE exam note: Firewall rules are stateful. Every VPC has an implied deny-all ingress rule and an implied allow-all egress rule, so inbound traffic must be explicitly allowed. Prefer service account-based IAM over user-based IAM for VM workloads.

Key takeaways

VPC is global, subnets are regional; firewall rules are stateful, evaluated by priority (lower number = higher), default deny on ingress, default allow on egress.
Identity: prefer Predefined roles over Basic (Owner/Editor/Viewer are too broad); bind to groups instead of individuals; for workloads use service accounts + Workload Identity, never personal credentials or downloaded JSON keys.
Defence-in-depth: Cloud KMS for customer-managed keys (CMEK), with CSEK as the supply-your-own-key option outside KMS; VPC Service Controls for service perimeters (data exfiltration protection); Cloud Armor for L7 WAF + DDoS on the Global HTTP(S) LB.


06

Operations: Monitoring, Logging & Deployment

3 lessons · ~4 hours

Day-2 operations on GCP rest on three pillars: Cloud Monitoring (metrics, dashboards, alerting policies), Cloud Logging (log buckets, sinks, exports to BigQuery/GCS for retention), and a CI/CD pipeline built on Cloud Build + Artifact Registry + Cloud Deploy. The exam frequently asks "what do I need to install on a Compute Engine VM to get metrics?" — answer is always the Ops Agent (managed services auto-emit).

Cloud Monitoring & Alerting

Cloud Monitoring (formerly Stackdriver)

Collects metrics from GCP resources, AWS, and on-premises with the Ops Agent
Metrics Explorer — query and visualize any metric
Dashboards — custom or pre-built resource dashboards
Alerting policies — trigger notifications via email, Pub/Sub, PagerDuty, Slack when metrics breach thresholds
Uptime checks — periodic HTTP/HTTPS/TCP checks to verify service availability globally
Ops Agent — needed on Compute Engine VMs to collect in-guest metrics (memory, disk usage, processes) and logs; install with one command

Cloud Trace & Profiler

Cloud Trace — distributed tracing; analyzes latency across microservices; identifies slow operations
Cloud Profiler — continuous CPU and memory profiling for production workloads
Error Reporting — aggregates application exceptions and errors; groups similar errors; notifies on new error types

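Compute Engine automatically reports only hypervisor-level metrics (such as CPU utilization, disk I/O, and network traffic). For memory and disk-usage metrics and for system or application logs, you must install the Ops Agent. Managed services (GKE, App Engine, Cloud Run) auto-send logs.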

Cloud Logging & Audit Logs

Cloud Logging

Centralized log management for GCP services, VMs (with Ops Agent), and custom applications
Log sinks — route log entries to Cloud Storage, BigQuery, Pub/Sub, or another log bucket for archiving and analytics; third-party tools such as Splunk are fed through Pub/Sub
Log-based metrics — create custom metrics from log patterns to trigger alerts
Retention: Admin Activity logs = 400 days; Data Access logs = 30 days (default)

Cloud Audit Logs

Admin Activity — records API calls that modify resources (always on, no charge)
Data Access — records API calls that read resource configurations or data (disabled by default; can generate very high volume)
System Event — Google-generated system events (always on)
Policy Denied — records when access is denied by VPC Service Controls

For compliance, export Admin Activity and Data Access logs to a Cloud Storage bucket with a locked retention policy (Bucket Lock, WORM) or to BigQuery for long-term retention. This is a common ACE exam scenario.
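A sink that exports audit logs needs a filter selecting them by log name. The sketch below only assembles that filter string, using the cloudaudit.googleapis.com log names with the slash URL-encoded as %2F; the function name and the example project ID are hypothetical.

```python
def audit_log_filter(project_id, kinds=("activity", "data_access")):
    """Build a Logging query that matches the chosen audit log streams."""
    names = [
        f'"projects/{project_id}/logs/cloudaudit.googleapis.com%2F{kind}"'
        for kind in kinds
    ]
    return "logName=(" + " OR ".join(names) + ")"


# Hypothetical project ID, for illustration only.
print(audit_log_filter("my-project"))
```

The resulting string can be supplied as the filter when the sink is created, so that only Admin Activity and Data Access entries reach the archive destination.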

Infrastructure as Code & CI/CD on GCP

Deployment Options

Cloud Deployment Manager — GCP-native IaC using YAML/Python/Jinja templates; older tooling
Terraform — industry standard IaC; GCP provider; state stored remotely in a Cloud Storage bucket (gcs backend); multi-cloud capable
Use Terraform for new projects — better community support, state management, and multi-cloud portability

CI/CD on GCP

Cloud Build — fully managed CI/CD service; runs build steps in containers; triggered by GitHub, Cloud Source Repositories, or manually
Artifact Registry — stores Docker images, Maven, npm, Python packages; replaces Container Registry
Cloud Deploy — managed continuous delivery to GKE and Cloud Run; enforces deployment pipelines with approvals
Typical pipeline: git push → Cloud Build (build + test + push image) → Artifact Registry → Cloud Deploy/gcloud deploy

Exam-Relevant Scenarios

"Store Terraform state for team collaboration" → GCS backend bucket with versioning
"Build and deploy containers automatically on merge" → Cloud Build + Artifact Registry + Cloud Run
"Enforce only approved images run on GKE" → Binary Authorization

Cloud Build = CI (build/test). Cloud Deploy = CD (release management). Artifact Registry = image storage. Know how these three connect for a complete GCP-native pipeline.

☁ Scenario — setting up an uptime check alert with Cloud Monitoring

Situation: A public-facing web app at https://app.example.com must page the on-call engineer within 2 minutes if the site goes down. Currently there is no alerting.

Walk: 1) Cloud Console → Monitoring → Uptime Checks → Create. Target: HTTPS, hostname app.example.com, path /health, check interval 1 min, timeout 10s. Checker regions: USA, Europe, Asia (multi-region validates it's not a regional blip). 2) Create an Alerting Policy: condition = uptime check failure from ≥2 out of 3 regions for ≥1 min. Notification channel: PagerDuty webhook + email. 3) Test: temporarily block the health endpoint → Monitoring flags failure after 1 min → alert fires → PagerDuty page sent. 4) Review logs: Cloud Logging → Logs Explorer → resource.type="http_load_balancer" to see which requests errored.

ACE exam note: Cloud Monitoring = metrics + uptime + alerting. Cloud Logging = structured logs, queryable with Logging Query Language. Error Reporting = aggregates application exceptions automatically. Cloud Trace = distributed request tracing for latency analysis.

Key takeaways

Cloud Monitoring needs the Ops Agent on Compute Engine VMs to ship in-guest metrics (memory, disk usage) + logs; managed services (GKE, App Engine, Cloud Run) auto-emit; alerting policies fire on thresholds and route to channels (email, Slack, PagerDuty).
Cloud Logging stores logs in _Default + _Required buckets; for long-term compliance, create a sink exporting Admin Activity / Data Access logs to GCS (with Bucket Lock) or BigQuery — a recurring ACE scenario.
GCP-native CI/CD chain: Cloud Build (CI) → Artifact Registry (image / package storage) → Cloud Deploy (CD with progression policies); pair with Binary Authorization on GKE to only run signed/approved images.

