Exam fact	Details
Exam code	AZ-104
Full name	Microsoft Azure Administrator Associate
Questions	40–60 (MCQ, case studies, drag-and-drop, hot-area)
Passing score	700 / 1000
Duration	120 minutes
Price	$165 USD
Prerequisites	6+ months hands-on Azure experience recommended
Renewal	Free annual online assessment (no re-exam required)

Exam domain weights

Domain 1 — Manage Azure Identities and Governance 20–25%

Domain 2 — Implement and Manage Storage 15–20%

Domain 3 — Deploy and Manage Azure Compute Resources 20–25%

Domain 4 — Implement and Manage Virtual Networking 15–20%

Domain 5 — Monitor and Maintain Azure Resources 10–15%

Course modules

Module 1 (3 lessons)

Azure Identities — Microsoft Entra ID and Governance

Understand the identity backbone of every Azure subscription. Learn to create and manage users, groups, and external guests (B2B). Configure role-based access control (RBAC) at management group, subscription, resource group, and resource scope. Apply Azure Policy definitions and initiatives to enforce organisational standards, and use resource locks to prevent accidental deletion or modification.

Microsoft Entra ID users and groups · B2B guest access · Management groups and subscriptions · RBAC built-in and custom roles · Azure Policy definitions and initiatives · Resource locks (CanNotDelete / ReadOnly) · Entra ID administrative units


Lesson 1.1 Microsoft Entra ID — users, groups, and the tenant model

Every Azure subscription is anchored to exactly one Microsoft Entra ID tenant. The tenant holds the directory of users and groups that any resource in any subscription it owns can authorise against. Understanding how members, guests, and security groups interact is the foundation everything else in AZ-104 sits on.

Key concepts

Tenants vs subscriptions: a tenant is an identity boundary; a subscription is a billing boundary. One tenant can own many subscriptions, but a single subscription can trust only one tenant at a time. Tenants are identified by both a name (contoso.onmicrosoft.com) and a tenant ID GUID.
User types: Member users exist inside the tenant directory. Guest users are externals invited via B2B; they authenticate at their home tenant and are authorised here. Hybrid identities are on-prem AD users synchronised into Entra ID by Microsoft Entra Connect (formerly Azure AD Connect) or Microsoft Entra Cloud Sync.
Security groups vs Microsoft 365 groups: security groups are for access-control assignments (RBAC, Conditional Access). Microsoft 365 groups bundle a SharePoint site, Outlook mailbox, and Teams team — they should not be used for cloud-resource RBAC because of the attached collaboration workspace.
Dynamic groups: membership is computed from a rule against user attributes (e.g., (user.department -eq "Engineering")). They require an Entra ID P1 licence and are essential for scaling RBAC without manual roster maintenance.
Administrative Units (AUs): sub-divide the tenant for delegated administration without creating a new tenant. A helpdesk admin scoped to an AU can reset passwords only for users inside that AU. Useful for multi-region or multi-business-unit orgs.

Concrete example

Acme acquires a small subsidiary. Rather than create a separate tenant, they place all subsidiary users into an Administrative Unit called AU-Subsidiary, then create a dynamic group grp-subsidiary-all whose rule is (user.companyName -eq "Subsidiary Inc"). The subsidiary's IT lead gets the User Administrator role scoped only to that AU, allowing them to manage subsidiary users without touching the parent's directory.
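The membership logic behind the dynamic group in this example can be modelled in a few lines of Python. This is only a sketch under stated assumptions: it handles a single `-eq` clause, the `matches_rule` helper and the sample users are hypothetical, and the case-insensitive comparison is an assumption. Entra ID evaluates real rules server-side with a much richer syntax.

```python
import re

# Hypothetical helper: handles only rules shaped like (user.<attribute> -eq "<value>").
RULE_PATTERN = re.compile(r'^\(user\.(\w+)\s+-eq\s+"([^"]*)"\)$')


def matches_rule(user: dict, rule: str) -> bool:
    """Return True when the user's attribute equals the value in the rule."""
    match = RULE_PATTERN.match(rule.strip())
    if match is None:
        raise ValueError(f"unsupported rule: {rule}")
    attribute, expected = match.groups()
    # Case-insensitive comparison is an assumption of this sketch.
    return str(user.get(attribute, "")).lower() == expected.lower()


# Made-up directory entries for illustration.
users = [
    {"userPrincipalName": "ana@acme.example", "companyName": "Subsidiary Inc"},
    {"userPrincipalName": "bo@acme.example", "companyName": "Acme"},
    {"userPrincipalName": "raj@acme.example", "companyName": "Subsidiary Inc"},
]

rule = '(user.companyName -eq "Subsidiary Inc")'
members = [u["userPrincipalName"] for u in users if matches_rule(u, rule)]
print(members)
```

Because membership is recomputed from attributes, changing a user's `companyName` is all it takes to move them in or out of the group; nobody edits a roster.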

Key takeaway: the tenant is the identity boundary. Solve for delegation with groups, AUs, and scoped roles — not by spinning up new tenants. New tenants mean separate licensing, separate Conditional Access, and a real boundary to cross.

⚡ Mini-quiz

Practise this lesson with the Identity questions in the AZ-104 bank → study mode (10 questions).

Lesson 1.2 RBAC at scale — scope, role assignments, and custom roles

Azure RBAC sits on top of Entra ID identities and grants permissions through three-part assignments: security principal + role definition + scope. Choosing the right scope is the single biggest design decision in any Azure environment.

Key concepts

Scope hierarchy: management group → subscription → resource group → resource. Assignments inherit downward. A role granted at the management-group level applies to every subscription beneath it. Always assign at the narrowest scope that satisfies the requirement.
Built-in roles to memorise: Owner (full + grants), Contributor (full minus grants), Reader (read-only), User Access Administrator (manages access without managing resources), plus service-specific roles like Virtual Machine Contributor and Storage Blob Data Reader.
Data plane vs control plane: Storage Account Contributor can manage the account but not read the blobs. Storage Blob Data Reader can read blob data but cannot change the account configuration. The split is intentional — least privilege depends on it.
Custom roles: define your own roleDefinition in JSON with actions, notActions, dataActions, notDataActions, and a list of assignableScopes. Useful when no built-in role matches — e.g., "restart VMs but never resize them".
Privileged Identity Management (PIM): turns standing access into just-in-time activation with approval workflow and audit. AZ-104 covers the basics; the deeper PIM material is on AZ-500. Assignments are either Eligible (the user must activate the role before use) or Active (no activation needed), and each can be time-bound or permanent.
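As an illustration of the custom-role bullet above, here is a sketch of a "restart VMs but never resize them" definition built as a Python dictionary and serialised to JSON. The role name and subscription ID are hypothetical, and the property casing follows the style commonly used in Azure CLI examples; treat the exact shape as an assumption to check against the current documentation before use.

```python
import json

# Hypothetical subscription ID, used only for illustration.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"

vm_restart_operator = {
    "Name": "VM Restart Operator (example)",
    "IsCustom": True,
    "Description": "Can view and restart VMs but cannot resize or delete them.",
    # Resizing needs Microsoft.Compute/virtualMachines/write, which is
    # deliberately absent, so the role cannot change the VM size.
    "Actions": [
        "Microsoft.Compute/virtualMachines/read",
        "Microsoft.Compute/virtualMachines/restart/action",
    ],
    "NotActions": [],
    "DataActions": [],
    "NotDataActions": [],
    "AssignableScopes": [f"/subscriptions/{SUBSCRIPTION_ID}"],
}

role_json = json.dumps(vm_restart_operator, indent=2)
print(role_json)
```

The design choice here is to grant only what is needed rather than grant broadly and subtract with `NotActions`; an allow-list stays correct when Microsoft adds new operations to the resource provider.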

Concrete example

A FinOps engineer needs to tag and analyse every resource across all subscriptions but must not be able to deploy or modify resources. The right grant is the built-in Tag Contributor role at the management group scope (so the assignment inherits to every subscription, current and future), combined with Reader at the same scope. No custom role is required.
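Downward inheritance can be sketched as a walk up the scope hierarchy. The model below is a simplification with hypothetical scope names and principals; it is not how Azure stores assignments, only a way to see why a management-group grant reaches every resource beneath it.

```python
# Hypothetical hierarchy: child scope -> parent scope.
PARENT = {
    "sub-prod": "mg-root",
    "sub-dev": "mg-root",
    "rg-app": "sub-prod",
    "vm-web01": "rg-app",
}

# Role assignments as (principal, role, scope) triples.
ASSIGNMENTS = [
    ("finops@acme.example", "Tag Contributor", "mg-root"),
    ("finops@acme.example", "Reader", "mg-root"),
    ("dev@acme.example", "Contributor", "rg-app"),
]


def effective_roles(principal: str, scope: str) -> list:
    """Collect roles assigned at the scope itself or at any ancestor scope."""
    chain = []
    current = scope
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return sorted(
        {role for who, role, at in ASSIGNMENTS if who == principal and at in chain}
    )


print(effective_roles("finops@acme.example", "vm-web01"))
print(effective_roles("dev@acme.example", "sub-prod"))
```

The second call returns an empty list: an assignment at `rg-app` flows down to `vm-web01` but never up to the subscription.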

Key takeaway: scope downward; assign upward only when a role legitimately needs to apply to many subscriptions. The exam loves trick questions where the correct answer is a narrower scope than the obvious one.

⚡ Mini-quiz

Drill on RBAC scope decisions → quick quiz (5 questions).

Lesson 1.3 Azure Policy and resource locks — guardrails that scale

RBAC tells you who can act. Azure Policy and resource locks tell you what can happen, regardless of who's trying. Together they create the governance layer that turns ad-hoc cloud usage into a managed estate.

Key concepts

Policy definitions: JSON documents with a policyRule (an if / then with effect). Effects you must know: Deny, Audit, Append, Modify, DeployIfNotExists, AuditIfNotExists, Disabled. Built-in definitions cover common controls — start there.
Initiatives: a named group of policy definitions assigned together. The built-in Microsoft cloud security benchmark initiative (formerly Azure Security Benchmark) bundles ~200 controls. Assigning an initiative is one operation that switches on many policies at once.
Assignment scope and exclusions: assignments inherit, just like RBAC. Exclude individual subscriptions or resource groups via the notScopes list. Use parameters to tune a single definition (allowed locations, required tags) per assignment.
Resource locks: ReadOnly blocks any write or delete; CanNotDelete blocks only deletes. Locks apply regardless of RBAC — even an Owner can't bypass them without removing the lock first. Apply at resource-group level for whole-environment protection.
Remediation: for DeployIfNotExists and Modify effects, existing non-compliant resources need a remediation task to be brought into line. The task uses the assignment's managed identity to perform the corrective deploy.

Concrete example

A regulated workload must never be deployed outside eastus2 or westus2, must always have the tag CostCenter, and the production resource group must never be deletable. Solution: assign the built-in Allowed locations policy (effect: Deny) at subscription scope with the two regions as parameters, assign the Require a tag on resources policy at the same scope with CostCenter as the parameter, and place a CanNotDelete lock on the production resource group. The two policy assignments could equally be grouped into one custom initiative; the lock is a separate control applied directly to the resource group.
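The if / then shape of a policy rule is easier to see in code. The sketch below mirrors the idea of the Allowed locations policy in a simplified form; the real built-in definition takes the region list as a parameter and has extra conditions, and the `evaluate` function is a hypothetical stand-in for the policy engine.

```python
ALLOWED_LOCATIONS = ["eastus2", "westus2"]

# Simplified policyRule: "if the location is not in the allowed list, then deny".
policy_rule = {
    "if": {"not": {"field": "location", "in": ALLOWED_LOCATIONS}},
    "then": {"effect": "deny"},
}


def evaluate(rule: dict, resource: dict):
    """Return the effect name when the rule's condition matches, else None."""
    condition = rule["if"]["not"]
    value = resource.get(condition["field"])
    condition_matches = value not in condition["in"]
    return rule["then"]["effect"] if condition_matches else None


print(evaluate(policy_rule, {"name": "vm-a", "location": "eastus2"}))
print(evaluate(policy_rule, {"name": "vm-b", "location": "northeurope"}))
```

Swapping the effect string from `deny` to `audit` would keep the same condition but only record non-compliance, which is the usual first step before enforcing a rule on an existing estate.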

Key takeaway: RBAC is "who can act", Policy is "what is allowed", Locks are "what can't be undone". Most AZ-104 governance questions are really asking you to pick the correct layer — and the answer is almost never RBAC alone.

⚡ Mini-quiz

Test policy effects and lock behaviour → study mode (10 questions).

Module 2 (3 lessons)

Governance Tools — Cost Management, Tags, and Blueprints

Apply financial governance and resource organisation at scale. Use Azure Cost Management + Billing to set budgets, create cost alerts, and analyse spend by tag. Design tagging strategies that support chargebacks and policy compliance. Use Azure Blueprints (now largely superseded by deployment stacks) to bundle policies, role assignments, and ARM templates into a repeatable governance artefact.

Azure Cost Management budgets and alerts · Tags (inheritance, Azure Policy enforcement) · Azure Blueprints artifacts · Subscription and resource group organisation · Azure Advisor recommendations · Azure Service Health alerts


Lesson 2.1 Cost Management — budgets, alerts, and analysis

Azure meters usage for every resource (billing granularity varies by service) and surfaces the total in Cost Management. Setting it up correctly is half "configure budgets" and half "tag everything so you can break costs down". AZ-104 expects you to know which control sits at which scope and which alert pattern is correct for which audience.

Key concepts

Cost Management scope: a budget can be scoped to a management group, subscription, resource group, or even a billing scope (EA, MCA). Lower scopes inherit upward visibility but you set the budget at the level you want to control. Use management-group budgets for shared services, subscription budgets for business units, resource-group budgets for projects.
Budgets vs cost alerts: a budget is a target amount with threshold alerts (e.g., 50%, 80%, 100% of $5,000/month). "Cost alerts" is the umbrella term in Cost Management and covers budget alerts, credit alerts, and department spending quota alerts (the last two apply to Enterprise Agreement customers). Budgets are the standard answer for "percentage-of-plan" notifications.
Actual vs Forecasted: alert thresholds can fire on the actual spend or on the forecast Azure projects for the remaining period. Forecasted alerts let you act before the overage happens. Pair both — actual at 100% as a tripwire, forecasted at 120% as a "we will overshoot" warning.
Cost analysis: the analytics UI for slicing spend by tag, resource group, service, region, meter, time period. Save common views for monthly reviews. Export Cost Management data to a Storage Account on a schedule for long-term retention or Power BI dashboards.
Action targets: budget alerts route through Action Groups (the same primitive as Azure Monitor alerts — see Module 8). Email and webhook are the most common. Integrate to PagerDuty / Slack / Teams via the webhook target.
Reservations and Savings Plans: not directly tested at the configuration level on AZ-104, but you should know that Reservations are commitment-based discounts for fixed SKUs/regions, Savings Plans are flexible commitment to a $/hour spend, and Azure Advisor surfaces both as recommendations when usage warrants.

Concrete example

A FinOps team needs notification when monthly spend on the prod subscription hits 80% of its $30k budget AND when forecast spend exceeds $36k. Design: a subscription-scope budget of $30,000/month with alert thresholds at 50% / 80% / 100% on Actual cost, plus a forecast threshold at 120% on Forecasted cost. Action group ag-finops-alerts routes to email + the Slack webhook. The 80% Actual alert and the 120% Forecast alert are independent signals — both fire if both conditions trip, giving the team multiple chances to react.
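The threshold arithmetic in this design is worth checking by hand. The sketch below converts the percentages into dollar amounts and lists which alerts would fire for a given actual and forecast spend. The `fired_alerts` helper is hypothetical, and treating "reaches the threshold" as greater-than-or-equal is an assumption of the sketch.

```python
BUDGET = 30_000  # monthly budget in USD, from the example

ACTUAL_THRESHOLDS = [50, 80, 100]   # percent of budget, evaluated on actual cost
FORECAST_THRESHOLDS = [120]         # percent of budget, evaluated on forecast cost


def threshold_amounts(budget: float, percents: list) -> dict:
    """Map each percentage threshold to its dollar amount."""
    return {pct: budget * pct / 100 for pct in percents}


def fired_alerts(actual: float, forecast: float) -> list:
    """List the alerts whose threshold has been reached."""
    fired = []
    for pct, amount in threshold_amounts(BUDGET, ACTUAL_THRESHOLDS).items():
        if actual >= amount:
            fired.append(f"Actual {pct}%")
    for pct, amount in threshold_amounts(BUDGET, FORECAST_THRESHOLDS).items():
        if forecast >= amount:
            fired.append(f"Forecast {pct}%")
    return fired


print(threshold_amounts(BUDGET, ACTUAL_THRESHOLDS + FORECAST_THRESHOLDS))
print(fired_alerts(actual=25_000, forecast=37_000))
```

The 80% threshold works out to $24,000 and the 120% forecast threshold to $36,000, matching the two conditions the FinOps team asked for.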

Key takeaway: budgets at the scope you want to control, alerts on both Actual and Forecast, Action Groups for routing. Reservations and Savings Plans are the cost-optimisation lever — surface them via Advisor.

⚡ Mini-quiz

Practise budget-scope and alert-mode decisions → study mode (10 questions).

Lesson 2.2 Tags and tag enforcement via Azure Policy

Tags are key/value pairs you attach to resources (and resource groups, and subscriptions). They cost nothing, but every downstream control — cost allocation, automated start/stop, RBAC scoping, lifecycle policies — depends on them being consistent. The exam asks how to apply them at scale and how to enforce they exist.

Key concepts

Tag mechanics: up to 50 tags per resource (some services cap lower), case-insensitive keys, values up to 256 chars. Tags are NOT inherited automatically from resource group or subscription to child resources — you have to set them on each resource explicitly (or enforce them via Policy).
Policy Append effect: the Append policy effect adds a tag to a resource at create time, but ONLY if the resource is being created or updated. It does NOT modify existing non-compliant resources. Use when you want defaults applied to new deployments without breaking existing ones.
Policy Modify effect: the Modify effect can add/remove/replace tags on existing resources via a remediation task. Modify is the right answer when "fix every resource currently missing the CostCenter tag". Modify needs a managed identity assigned to the policy assignment.
Inherit tag from resource group: Azure ships a built-in policy called Inherit a tag from the resource group with Modify effect. Assign it once per subscription with the tag name parameterised; every child resource gets the RG's tag value applied via remediation. The standard pattern for "stamp the env tag everywhere".
Tag-based billing and Cost Management: Cost Management can group/slice by tag IF the tag exists on the resource at the time the cost meter is recorded. Tags applied after the fact don't retroactively reattribute past costs — start tagging on day 1.
RBAC for tag management: the built-in Tag Contributor role grants tag management at the assigned scope without access to the resources themselves. There is no separate built-in role for reading tags; the Reader role already covers that. Use Tag Contributor to delegate tag management to FinOps without granting broader resource permissions.

Concrete example

A subsidiary needs every new resource to inherit the resource group's CostCenter tag, and the FinOps team needs to back-fill the tag on existing resources that lack it. Two assignments solve this: (1) the built-in Inherit a tag from the resource group policy with parameter tagName=CostCenter, Modify effect, scope: subscription. Run a remediation task after assignment to apply to existing resources. (2) A separate Require a tag on resources policy with Deny effect blocks future creates that omit the tag. Together: existing resources are fixed, new ones can't be created non-compliant.
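How the two assignments interact can be modelled in a short simulation. This is a sketch, not the policy engine: the resource shapes and function names are hypothetical, and it assumes that Modify is evaluated before Deny on a create request and that the inherit policy adds or replaces the tag with the resource group's value.

```python
import copy

TAG_NAME = "CostCenter"


def inherit_tag(resource: dict, rg_tags: dict) -> dict:
    """Model the Modify effect: add or replace the tag with the resource group's value."""
    updated = copy.deepcopy(resource)
    if TAG_NAME in rg_tags:
        updated["tags"][TAG_NAME] = rg_tags[TAG_NAME]
    return updated


def create_resource(resource: dict, rg_tags: dict):
    """Model a create request: Modify runs first, then Deny checks the result."""
    candidate = inherit_tag(resource, rg_tags)
    if TAG_NAME not in candidate["tags"]:
        return None  # denied by the "require a tag" policy
    return candidate


def remediate(existing: list, rg_tags: dict) -> list:
    """Model a remediation task that back-fills the tag on existing resources."""
    return [inherit_tag(r, rg_tags) for r in existing]


rg_tags = {"CostCenter": "CC-1042"}  # hypothetical value
existing = [{"name": "vm-old", "tags": {}}, {"name": "st-old", "tags": {"env": "prod"}}]

print(remediate(existing, rg_tags))
print(create_resource({"name": "vm-new", "tags": {}}, rg_tags))
print(create_resource({"name": "vm-new", "tags": {}}, {}))
```

The last call returns `None`: when the resource group itself has no CostCenter tag there is nothing to inherit, so the Deny policy blocks the create. Tagging the resource groups first is therefore part of the rollout.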

Key takeaway: tags don't inherit by default — use Policy with the Modify effect to enforce them on existing AND new resources. Append for "new only", Modify (+ remediation) for "everything". Tag from day 1 or your Cost Management views will be useless.

⚡ Mini-quiz

Drill tag policy effects → quick quiz (5 questions).

Lesson 2.3 Service Health, Advisor, and deployment governance

Three more governance surfaces complete the AZ-104 picture: Service Health tells you when Microsoft has an issue, Advisor tells you when YOU have an issue, and Blueprints / deployment stacks let you bundle infrastructure + policy as one artefact.

Key concepts

Azure Service Health: tenant-aware view of Azure-side incidents, planned maintenance, and health advisories filtered to subscriptions and regions you actually use. Distinct from the public status page (azure.status.microsoft) which is global and noisy. Always configure Service Health alerts — they're free.
Service Health alerts: create an alert rule of type Service Health, pick service / region / event type (Service Issue, Planned Maintenance, Health Advisory, Security Advisory), route through an Action Group. The right answer for "page me when Azure has an outage in my region".
Resource Health: per-resource Microsoft-reported health status (Available, Unavailable, Unknown). Useful as a fast first check during incident triage. Resource Health alerts can fire when a specific resource transitions to Unavailable.
Azure Advisor: recommendation engine across five pillars: Cost, Security, Reliability, Operational Excellence, Performance. Cost recommendations surface Reservation/Savings-Plan opportunities (see Lesson 2.1) and underutilised VMs. Security recommendations are shared with Microsoft Defender for Cloud. Set up monthly Advisor digest emails for each subscription.
Azure Blueprints: a now-legacy artefact that bundles ARM templates + policies + role assignments + resource groups into a single deployable unit, with version locking. Microsoft is deprecating Blueprints in favour of Template Specs and Deployment Stacks — know Blueprints exist (they still appear on the exam) but recommend Deployment Stacks for new builds.
Deployment Stacks: the modern replacement — a managed deployment that owns its child resources, prevents drift via deny assignments, and supports clean teardown of everything in the stack. The GitOps-compatible answer for "deploy a landing zone reproducibly".

Concrete example

A platform team builds an Azure Landing Zone — a repeatable bundle of management groups, subscriptions, policy assignments, and shared services. Old way: Azure Blueprint with locked version. Modern way: a Deployment Stack containing the ARM/Bicep templates with deny-delete on the resources it creates, plus an associated Policy Initiative at the management-group scope. Add a Service Health alert on all subscriptions for region-wide Service Issues routed to the platform team's Teams channel, and an Advisor digest email for the cost-management lead.

Key takeaway: Service Health for Microsoft-side incidents, Advisor for your own optimisation backlog. Blueprints still appear on the exam but Deployment Stacks are what you'd build today. Always wire Service Health alerts — they cost nothing and catch the "is it me or Azure" question before you're 30 minutes into a wild-goose debug.

⚡ Mini-quiz

Test Service Health and Advisor scenarios → study mode (10 questions).

Module 3 (3 lessons)

Azure Storage — Accounts, Redundancy, and Access Control

Master the Azure Storage platform that underpins blobs, files, queues, and tables. Choose the right redundancy tier (LRS, ZRS, GRS, GZRS, RA-GRS) for each workload's RPO requirements. Configure storage accounts with the correct performance tier (Standard vs Premium), access tier (Hot, Cool, Cold, Archive), and lifecycle management policies. Secure storage with Shared Access Signatures, Microsoft Entra ID RBAC, and storage account firewall rules with private endpoints.

LRS vs ZRS vs GRS vs GZRS · Blob access tiers and lifecycle policies · SAS tokens (account / service / user delegation) · Storage account firewall and virtual network rules · Azure Files and File Sync · Immutable storage (WORM policies) · Storage encryption and CMK


Lesson 3.1 Storage accounts and the redundancy ladder

Every blob, file share, queue, and table in Azure lives inside a storage account. The account fixes three things you can't easily change later: the kind (general-purpose v2, BlockBlob, FileStorage), the performance tier (Standard or Premium), and the redundancy model (LRS / ZRS / GRS / GZRS, with optional RA suffix). Picking the right combination is the most-asked storage topic on AZ-104.

Key concepts

Account kinds: StorageV2 (GPv2) is the modern default — supports blobs, files, queues, tables, all tiers. BlockBlobStorage is Premium-only for blobs needing low-latency. FileStorage is Premium-only for SMB/NFS file shares. Avoid the legacy Storage (GPv1) — it can't do access tiers.
LRS (Locally Redundant): 3 synchronous copies inside one datacentre. Cheapest, 11 nines durability. Survives disk and rack failure, NOT a building failure. Default for dev/test and anything you can re-derive.
ZRS (Zone Redundant): 3 synchronous copies across 3 availability zones in one region. Survives a whole AZ outage. Costs more than LRS and is available only in regions with availability zones; should be the default for new production accounts.
GRS (Geo Redundant): LRS in the primary region + asynchronous copy to a paired secondary region. Survives region-wide failure, but the secondary copy lags the primary (typically by minutes; Microsoft publishes no SLA for the lag), so recent writes can be lost on failover. Read-Access variant (RA-GRS) lets you read from the secondary at any time.
GZRS (Geo-Zone Redundant): ZRS in primary + async copy to a secondary region. The strongest option short of customer-managed multi-region replication. RA-GZRS adds secondary read access. The read-access variants (RA-GRS and RA-GZRS) are the ones that qualify for the 99.99% read-availability SLA.
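What you CAN change vs CAN'T: redundancy is changeable in-place between LRS↔GRS and ZRS↔GZRS once provisioned. Crossing the LRS↔ZRS divide requires a conversion (customer-initiated where supported, or via a support request) or a manual migration to a new account. Account kind is largely fixed: legacy GPv1 and Blob storage accounts can be upgraded to GPv2, but you cannot convert between Standard GPv2 and the Premium kinds, so pick correctly.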

Concrete example

A SaaS company stores customer file uploads. The legal team requires the data survive a regional outage and the read SLA must be 99.99%. Picking StorageV2 + Standard + RA-GZRS satisfies both requirements. If they later relax the regional-outage requirement (the data can be re-uploaded), they can downgrade to ZRS in-place to cut cost roughly in half. The account kind stays GPv2 throughout.
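The redundancy ladder reduces to three yes/no questions, which the sketch below turns into a storage SKU name. The `pick_redundancy` function is a hypothetical helper written for this lesson; the returned strings follow the Standard-tier SKU naming used by storage accounts.

```python
def pick_redundancy(zone_resilient: bool, geo_resilient: bool,
                    read_secondary: bool = False) -> str:
    """Map resilience requirements to a Standard-tier storage SKU name."""
    if read_secondary and not geo_resilient:
        raise ValueError("read access to a secondary requires geo-redundancy")
    if geo_resilient:
        base = "GZRS" if zone_resilient else "GRS"
        return f"Standard_RA{base}" if read_secondary else f"Standard_{base}"
    return "Standard_ZRS" if zone_resilient else "Standard_LRS"


# The SaaS example: survive a regional outage and keep a readable secondary.
print(pick_redundancy(zone_resilient=True, geo_resilient=True, read_secondary=True))
# After relaxing the regional requirement.
print(pick_redundancy(zone_resilient=True, geo_resilient=False))
```

Running the two calls prints the before and after SKUs for the scenario above, RA-GZRS first and then ZRS.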

Key takeaway: ZRS for new production by default. Add geo (G prefix) when you need cross-region durability; add RA (read access) when you need a hot secondary; switch to GZRS only when you legitimately need both. LRS is the dev/test default — never the production default.

⚡ Mini-quiz

Drill the LRS / ZRS / GRS / GZRS decisions → study mode (10 questions).

Lesson 3.2 Blob access tiers, lifecycle policies, and immutability

Storage cost on Azure is dominated by which tier your blobs sit in and how long they stay there. Tiering by hand doesn't scale; lifecycle policies move data through tiers declaratively. Add immutability when compliance needs the data to be tamper-proof.

Key concepts

Four access tiers: Hot (frequent reads, highest storage cost, lowest access fee), Cool (≥30 days, cheaper storage, per-GB access fee), Cold (≥90 days, even cheaper, higher access fee), Archive (offline, cheapest storage, rehydration hours). Tier per blob, not per container.
Rehydration from Archive: two priorities — Standard (up to 15 hours) and High (under 1 hour, more expensive). You can't read from Archive directly; rehydrate first to Hot or Cool, then read. Plan the cost of priority retrievals into compliance budgets.
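Lifecycle management policies: JSON rules with filters (prefix, blob type, blob index tags) and conditions such as days since last modified, mapped to actions (transition tier, delete). Policies run roughly once a day; the feature itself is free, but the tier-change operations it performs are billed as normal. The canonical 7-year compliance policy: move to Cool at 30 days, Cold at 90, Archive at 180, delete at 7 years.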
Soft delete vs versioning: blob soft-delete is a recycle bin (retention 1-365 days), versioning keeps every write as a new version. Container soft-delete protects whole containers. Point-in-time restore (PITR) for block blobs combines versioning + change feed for granular restore.
Immutable storage (WORM): two flavours — time-based retention policies block writes/deletes until the timer expires; legal holds block them until removed by an authorised user. Apply at container level. Required for SEC 17a-4 / FINRA compliance. Once locked, even the account owner cannot delete.

Concrete example

A bank stores trade-confirmation PDFs that the regulator requires for seven years and must not be alterable. The right design: a dedicated container with a time-based immutability policy set to 7 years (and locked, so the policy itself can't be shortened), plus a lifecycle policy that transitions blobs Hot → Cool at 30 days → Cold at 90 days → Archive at 180 days. Immutability runs alongside tiering — Archive can still be immutable. After 7 years the immutability expires and a separate lifecycle rule deletes the blob.
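The tiering half of this design can be written down as a lifecycle policy document. The sketch below builds one as a Python dictionary and adds a small helper that reports where a blob of a given age would sit. The rule name and the `trade-confirmations/` prefix are hypothetical, seven years is approximated as 7 × 365 days, and the property names reflect the lifecycle policy schema as best understood here; verify them against the current documentation before applying.

```python
import json

RETENTION_DAYS = 7 * 365  # seven years, ignoring leap days (an approximation)

lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "trade-confirmations-tiering",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["trade-confirmations/"],  # hypothetical prefix
                },
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToCold": {"daysAfterModificationGreaterThan": 90},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 180},
                        "delete": {"daysAfterModificationGreaterThan": RETENTION_DAYS},
                    }
                },
            },
        }
    ]
}


def tier_for_age(days_since_modified: int) -> str:
    """Return the state a blob of this age would be in under the policy above."""
    actions = lifecycle_policy["rules"][0]["definition"]["actions"]["baseBlob"]
    checks = [
        ("delete", "Deleted"),
        ("tierToArchive", "Archive"),
        ("tierToCold", "Cold"),
        ("tierToCool", "Cool"),
    ]
    for action, state in checks:
        if days_since_modified > actions[action]["daysAfterModificationGreaterThan"]:
            return state
    return "Hot"


print(json.dumps(lifecycle_policy, indent=2))
print([tier_for_age(d) for d in (10, 31, 91, 181, RETENTION_DAYS + 1)])
```

Note that the conditions are strict "greater than" comparisons, so a blob modified exactly 30 days ago is still Hot. The immutability policy itself is configured separately on the container and is not part of this document.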

Key takeaway: Hot/Cool/Cold/Archive plus lifecycle policy is the cost-optimisation pattern. Immutability is the compliance pattern. They compose — choose tier for cost, immutability for trust, and lifecycle to drive both.

⚡ Mini-quiz

Practise tier-transition and immutability scenarios → quick quiz (5 questions).

Lesson 3.3 Securing storage — SAS, firewall, RBAC, and encryption

Storage accounts have public endpoints by default. Locking them down has four layers, each addressing a different attacker. The exam loves "the credential was leaked — what's the safest replacement?" questions; the answer is almost always user-delegation SAS or a private endpoint.

Key concepts

Shared Key vs SAS: the account's two access keys grant full control; never embed them in clients. SAS tokens are time-bounded, scope-limited URL signatures. Three SAS types: account SAS (signed by access key, broad), service SAS (signed by access key, scoped to one service / resource), user-delegation SAS (signed with a user delegation key obtained through Entra ID credentials, so no account key is needed; revocable, and the modern preferred form).
RBAC on the data plane: roles like Storage Blob Data Reader, Storage Blob Data Contributor, Storage File Data SMB Share Reader. These are SEPARATE from the management-plane roles (Storage Account Contributor can manage the account but cannot read blobs). Always assign data-plane roles for application access — no SAS needed for service principals.
Storage firewall & VNet rules: set Public access to "Selected networks", then allow specific IP CIDRs and specific subnets (subnet must have the Storage service endpoint enabled). Default-deny everything else. Combine with Trusted Microsoft Services exception so Azure Monitor / Backup can still reach the account.
Private endpoints: the modern way to lock down to a VNet. The storage service gets a private IP inside your subnet; public endpoint can be disabled entirely. Pair with a Private DNS zone (privatelink.blob.core.windows.net) linked to the VNet so DNS resolves to the private IP. Replaces service endpoints for new builds.
Encryption: data at rest is always encrypted (Microsoft-managed keys by default). Customer-Managed Keys (CMK) replace the platform key with one you hold in Azure Key Vault; required for some compliance regimes. Encryption in transit is enforced by setting Secure transfer required (HTTPS only) together with the account's Minimum TLS version setting (TLS 1.2).

Concrete example

A leaked SAS token is overwriting blobs in production. Two-step fix: (1) rotate the storage account's access keys to invalidate every account- and service-SAS token issued under those keys; (2) reissue access to applications as user-delegation SAS tokens backed by Entra ID credentials, so a future leak can be contained by revoking the user delegation key or removing the identity's data-plane role assignment instead of rotating account keys. Long-term, disable Shared Key access entirely on the account and require Entra ID auth.
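Why key rotation kills a leaked account or service SAS comes down to how the signature is made: an HMAC-SHA256 over the token's fields, keyed with the account key. The sketch below shows that mechanism with made-up keys and a deliberately simplified string-to-sign; the real format has more fields in a fixed order, so treat this as an illustration of the principle only.

```python
import base64
import hashlib
import hmac


def sign(string_to_sign: str, account_key_b64: str) -> str:
    """Compute a base64 HMAC-SHA256 signature using a base64-encoded key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid(string_to_sign: str, signature: str, current_keys: list) -> bool:
    """A token is accepted only if it verifies against a key that is still active."""
    return any(
        hmac.compare_digest(sign(string_to_sign, key), signature)
        for key in current_keys
    )


# Made-up key material for illustration.
old_key = base64.b64encode(b"old-account-key-material").decode("ascii")
new_key = base64.b64encode(b"new-account-key-material").decode("ascii")

# Simplified stand-in for the real string-to-sign (permissions, expiry, resource).
token_fields = "sp=rw\nse=2030-01-01T00:00:00Z\nsr=c"
leaked_signature = sign(token_fields, old_key)

print(is_valid(token_fields, leaked_signature, [old_key]))  # before rotation
print(is_valid(token_fields, leaked_signature, [new_key]))  # after rotation
```

The service never stores the token; it recomputes the signature on each request. Once the old key is gone, no recomputation can match, which is also why rotation invalidates every token signed with that key and not just the leaked one.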

Key takeaway: user-delegation SAS > service SAS > account SAS > access keys. Data-plane RBAC for services. Private endpoints > service endpoints > firewall > nothing. CMK only when compliance requires it.

⚡ Mini-quiz

Test SAS-type and firewall decisions → study mode (10 questions).

Module 4 (3 lessons)

Azure Virtual Machines — Deployment, Sizing, and High Availability

Deploy and manage Azure VMs across the full lifecycle. Select the right VM size family (general purpose, compute optimised, memory optimised, GPU) for each workload. Architect for high availability using Availability Sets (fault domains, update domains) and Availability Zones. Configure VM Scale Sets with autoscale rules, custom images, and instance health probes. Manage OS disks, data disks (managed vs unmanaged), disk caching modes, and Azure Disk Encryption (ADE).

VM size families (B, D, E, F, N series) · Availability Sets (fault domains and update domains) · Availability Zones · VM Scale Sets (Uniform vs Flexible) · Managed disks (Standard HDD/SSD, Premium SSD, Ultra) · Azure Disk Encryption (ADE) · Custom script extensions and cloud-init · Azure Dedicated Hosts


Lesson 4.1 VM size families — picking the right SKU

Azure has ~400 VM SKUs. The exam doesn't ask you to memorise all of them — it asks you to map a workload to the right family letter and to know the gotchas of the families candidates love (B-series burstable) and hate (oversized memory-optimised running CPU-bound code). Pick the wrong family and you either pay 3× or hit a credit cliff.

Key concepts

General purpose (B, D, Dsv5, Av2): balanced CPU-to-memory. B-series is burstable with CPU credits — cheap for idle workloads, performance cliff when credits exhaust. D-series is the production default for web tier / app tier; Dsv5 with Premium SSD is current-gen.
Compute optimised (F, Fsv2): high CPU-to-memory ratio. Good for build agents, batch processing, gaming servers, transcoding. Never use for in-memory databases — you'll run out of RAM before you saturate CPU.
Memory optimised (E, Esv5, M): high memory-to-CPU. Right for SAP HANA, large SQL servers, in-memory caches, Redis at scale. The M-series goes up to multiple TB of RAM but is expensive per hour.
Storage optimised (L, Lsv3): local NVMe attached. Best for NoSQL, Cassandra, MongoDB, big-data — anything that wants ephemeral fast scratch. Data on the local NVMe is LOST on deallocate; never store production state there.
GPU (N — NC for compute, ND for deep learning, NV for visualisation): CUDA workloads, ML training, remote desktop rendering. Expensive, capacity-constrained — production deployments often hit quota limits before cost limits.
Dedicated Hosts: physical-server allocation reserved to your subscription. Required for some BYOL licensing (Windows Server, SQL Server) and for compliance regimes that demand hardware isolation. Pay for the host whether VMs run on it or not.

Concrete example

A team complains that their B4ms VMs running a chat app "go slow at peak". Diagnosis: B-series CPU credit exhaustion under sustained load. Two-step fix: (1) right-size to D4ds_v5 for the production tier — D-series has no credit mechanism, sustained CPU is a non-event; (2) keep the B-series for staging where workloads are bursty. Always look at the CPU Credits Remaining metric in Azure Monitor when investigating "slow" B-series VMs.
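The credit cliff is easier to reason about with a small simulation. The model below assumes one credit equals one vCPU at 100% for one minute and tracks the balance hour by hour. The baseline, starting balance, and cap are illustrative numbers chosen for the sketch, not the published figures for any specific B-series size.

```python
def simulate_credits(start_credits: float, baseline_pct: int,
                     usage_pct_per_hour: list, max_credits: float) -> list:
    """Return the credit balance after each hour of the given CPU usage.

    Percentages are totals across all vCPUs (300 means three vCPUs fully busy).
    """
    balance = float(start_credits)
    history = []
    for usage_pct in usage_pct_per_hour:
        # Running below the baseline banks credits; running above it spends them.
        balance += (baseline_pct - usage_pct) * 60 / 100
        balance = max(0.0, min(balance, max_credits))
        history.append(balance)
    return history


# Illustrative numbers: 90% baseline, three hours of sustained peak, then idle.
balances = simulate_credits(
    start_credits=300,
    baseline_pct=90,
    usage_pct_per_hour=[300, 300, 300, 20],
    max_credits=1296,
)
print(balances)
```

In this run the balance hits zero during the third hour of peak load, which is the point where the VM would be throttled to its baseline and users would report the app going slow.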

Key takeaway: B for burstable / dev / non-prod, D for balanced production, F for CPU-bound batch, E/M for memory-bound, L for local-NVMe workloads, N for GPU, dedicated hosts only when licensing or compliance forces it.

⚡ Mini-quiz

Practise VM-family decisions → study mode (10 questions).

Lesson 4.2 Availability — Sets, Zones, and Scale Sets

A single VM qualifies for an SLA only on the strength of its disks (99.9% when every disk is Premium SSD or Ultra Disk, lower with Standard SSD or Standard HDD). Higher availability requires deploying redundantly using one of three patterns: Availability Sets (in-region rack spread), Availability Zones (in-region datacentre spread), or VM Scale Sets (autoscaled fleet). Picking between them is a question of cost, scale, and the SLA the workload actually requires.

Key concepts

Availability Set: a logical grouping that spreads VMs across fault domains (different racks/PDUs in the same datacentre) and update domains (groups updated separately during maintenance). Up to 3 FDs (the maximum varies by region) and 5 UDs by default (configurable up to 20). SLA: 99.95% with 2+ VMs in the set. Same-region, same-datacentre: does not survive a building outage.
Availability Zone: physically separate datacentres in one region (typically 3 zones, often miles apart). Place ≥2 VMs across ≥2 zones for SLA 99.99% — the strongest in-region option. Resources are zonal (pinned to a zone) or zone-redundant (LB / IP spans zones).
VM Scale Set (VMSS): auto-scaling group. Two orchestration modes: Uniform (identical instances, simpler, faster scale, can't mix sizes) and Flexible (heterogeneous, can mix sizes/zones, supports up to 1000 VMs, modern default). Flexible mode combines the SLA of AZs with autoscale.
Autoscale rules: trigger on metric (CPU, memory, custom Application Insights) or schedule. Always pair scale-out and scale-in rules; set a cooldown long enough that new instances absorb load before scale-in fires (avoids the flapping you saw in Module 6's UDR lesson, applied to compute).
Proximity Placement Group (PPG): places VMs in the same physical rack/cluster for lowest latency between them. Required for tight-latency workloads (HPC, in-region clusters). Can be combined with Availability Sets (the set is placed inside the PPG) but NOT with multi-zone deployments.
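
The interplay of paired scale-out/scale-in rules and a cooldown can be sketched in a few lines of Python. This is an illustrative model only, not the Azure autoscale engine: the names `AutoscaleRule` and `decide_scale`, the 30% scale-in threshold, and the 10-minute cooldown are assumptions made for the example, while the 70% scale-out threshold and the 4-40 instance range follow this lesson's retail scenario.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleRule:
    scale_out_cpu: float = 70.0   # scale out above this average CPU %
    scale_in_cpu: float = 30.0    # scale in below this average CPU % (assumed)
    min_instances: int = 4
    max_instances: int = 40
    cooldown_s: int = 600         # assumed 10-minute cooldown


def decide_scale(rule: AutoscaleRule, avg_cpu: float, instances: int,
                 seconds_since_last_scale: int) -> int:
    """Return the instance count after one evaluation cycle."""
    # Inside the cooldown window nothing changes, so freshly added
    # instances get time to absorb load before a scale-in can fire.
    if seconds_since_last_scale < rule.cooldown_s:
        return instances
    if avg_cpu > rule.scale_out_cpu:
        return min(instances + 1, rule.max_instances)
    if avg_cpu < rule.scale_in_cpu:
        return max(instances - 1, rule.min_instances)
    return instances


rule = AutoscaleRule()
print(decide_scale(rule, avg_cpu=85, instances=4, seconds_since_last_scale=900))  # 5
print(decide_scale(rule, avg_cpu=20, instances=5, seconds_since_last_scale=120))  # 5
print(decide_scale(rule, avg_cpu=20, instances=5, seconds_since_last_scale=900))  # 4
```

The second call shows the cooldown at work: CPU is low, but the set scaled two minutes ago, so the count holds instead of flapping.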

Concrete example

A retail site needs to handle Black Friday traffic with a 99.99% SLA. Design: deploy a VMSS in Flexible mode spread across three Availability Zones, autoscale 4-40 instances on CPU>70%, behind a zone-redundant Standard Load Balancer. SLA matches AZ deployment (99.99%) AND auto-scales for the peak. The same workload on an Availability Set would be capped at 99.95% AND wouldn't survive a zone outage.

Key takeaway: Availability Set = 99.95% in-region rack spread, easy retrofit. Availability Zone = 99.99% datacentre spread, modern default. VMSS Flexible across zones = same 99.99% PLUS autoscale. Use PPG only when latency matters more than the AZ spread.
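
To make the gap between 99.95% and 99.99% concrete, the allowed downtime can be worked out directly. The sketch below is plain arithmetic over a non-leap year; the helper name `allowed_downtime_minutes` is made up for this example, and real SLA credits are typically assessed per billing month rather than per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Maximum yearly downtime that still meets the given SLA percentage."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


for name, sla in [("Availability Set", 99.95), ("Availability Zones", 99.99)]:
    print(f"{name}: {sla}% -> {allowed_downtime_minutes(sla):.1f} min/year")
```

Roughly 263 minutes a year versus roughly 53: the extra "9" cuts the downtime budget by a factor of five.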

⚡ Mini-quiz

Drill Availability Set vs Zone vs VMSS decisions → quick quiz (5 questions).

Lesson 4.3 Managed disks, encryption, and VM extensions

Disks and extensions are the bookends of VM provisioning. Disks determine the performance ceiling and the cost floor; extensions are how you wire automation into first boot so you never need to manually log into a VM after deploy.

Key concepts

Managed disk tiers: Standard HDD (cheap, dev/test, low IOPS), Standard SSD (entry-level production, web servers), Premium SSD (production database OS, low-latency tier), Premium SSD v2 (independently configurable IOPS / throughput / size — the modern default for performance-sensitive workloads), Ultra Disk (sub-millisecond, expensive, narrow VM-family support).
Disk caching modes: None (Ultra / write-heavy logs), ReadOnly (database data files), ReadWrite (OS disks). Misconfigured caching is a common database-performance complaint — a SQL log file with ReadWrite cache is slower than no cache because Azure must flush on every write.
Azure Disk Encryption (ADE) vs Server-Side Encryption (SSE): SSE is on by default for every disk (platform-managed key or CMK). ADE is BitLocker (Windows) / dm-crypt (Linux) inside the guest, with the unlock key in Azure Key Vault — needed when the OS itself must enforce disk-level encryption (compliance regimes). The two compose: SSE encrypts the underlying storage, ADE encrypts inside the OS.
Cloud-init (Linux) and Custom Script Extension (Windows + Linux): two ways to run first-boot configuration. Cloud-init runs from a YAML file in the OS profile — declarative, idempotent, the Linux default. Custom Script Extension downloads and runs a shell/PowerShell script after boot — works on any OS, supported by ARM/Bicep templates.
Other key extensions: Domain Join (joins Windows VMs to AD), Azure Monitor Agent (AMA) (sends metrics + logs to Log Analytics), Network Watcher Agent (enables packet capture + IP flow verify), VM Backup (consistent snapshots via Recovery Services Vault). Extensions are declarative in templates and run on every (re)deploy.

Concrete example

A SQL Server VM is hitting IOPS ceilings on its P30 Premium SSD (5,000 IOPS). Options: (a) bump to P40 — doubles cost, only marginally faster; (b) migrate to Premium SSD v2 with 16,000 provisioned IOPS — cheaper than P50 and IOPS independently dialled. While doing the migration, leave the OS disk on P10 (read-cache enabled, only the OS reads matter), set the SQL log disk caching to None, and the data disk caching to ReadOnly. Lock the OS-level disk encryption via ADE pulling its key from Key Vault. Disk choice + caching together drop the IOPS complaint without touching the VM size.

Key takeaway: Premium SSD v2 is the modern performance default. Caching mode follows the workload (None for logs, ReadOnly for data files, ReadWrite for OS). SSE is automatic; reach for ADE only when compliance demands guest-level encryption. Use extensions for every post-deploy step — never SSH/RDP to configure.
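
The caching guidance above is easy to turn into a small audit. The following is a minimal sketch, assuming a hypothetical inventory format (disk name mapped to role and current caching mode); it does not call any Azure API, and the disk names are invented.

```python
from enum import Enum


class DiskRole(Enum):
    OS = "os"
    DATA_FILES = "data_files"
    LOG = "log"


# Mapping taken from the caching guidance in this lesson.
CACHING_BY_ROLE = {
    DiskRole.OS: "ReadWrite",
    DiskRole.DATA_FILES: "ReadOnly",
    DiskRole.LOG: "None",
}


def audit(disks: dict) -> list:
    """Return one finding per disk whose caching differs from the guidance."""
    findings = []
    for name, (role, caching) in disks.items():
        expected = CACHING_BY_ROLE[role]
        if caching != expected:
            findings.append(f"{name}: {caching} -> {expected}")
    return findings


print(audit({
    "os-disk": (DiskRole.OS, "ReadWrite"),
    "sql-data": (DiskRole.DATA_FILES, "ReadOnly"),
    "sql-log": (DiskRole.LOG, "ReadWrite"),
}))  # ['sql-log: ReadWrite -> None']
```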

⚡ Mini-quiz

Practise disk tier and caching scenarios → study mode (10 questions).

Module 5 · 3 lessons

App Service, Containers, and Serverless Compute

Deploy web applications and containerised workloads using Azure PaaS compute. Configure Azure App Service plans (Free, Shared, Basic, Standard, Premium, Isolated), deployment slots for blue-green releases, and auto-scale rules. Run containers with Azure Container Instances (ACI) for quick ad-hoc tasks and Azure Kubernetes Service (AKS) for orchestrated workloads. Understand Azure Functions consumption vs Premium plan and when to use them.

App Service plan tiers and scaling · Deployment slots and slot swap · Azure Container Instances (ACI) · Azure Kubernetes Service (AKS) basics · Azure Container Registry (ACR) · Azure Functions: Consumption vs Premium · Managed identities for App Service

Lesson 5.1 App Service — plans, scaling, and deployment slots

Azure App Service is the PaaS workhorse for web apps and APIs. The unit of compute is the App Service plan: the VM SKU and instance count that hosts one or more apps. Picking the right plan tier and using deployment slots properly are the two App Service decisions the exam tests most heavily.

Key concepts

Plan tiers ladder: F1 Free / D1 Shared (dev only: shared compute with CPU-minute quotas and no scale-out; custom domains are available on Shared but not on Free), B Basic (small workloads, manual scale), S Standard (production minimum: autoscale, 5 deployment slots, custom domains + SNI), P/Pv2/Pv3 Premium (faster VMs, 20 slots, better autoscale), I Isolated (App Service Environment, single-tenant in your VNet).
Scale up vs scale out: Scale up changes the instance SKU (more CPU/RAM per instance). Scale out increases the instance count. Scale out can be automated via autoscale rules on Standard and above (the same metric/schedule pattern as VMSS). Scale up means moving the plan to a larger tier or size, which briefly restarts each app.
Deployment slots: staging environments inside the same App Service plan — separate hostnames, separate app settings, but shared compute. The canonical pattern: deploy to staging, run smoke tests, then swap with production. Swap warms up the new instance before flipping the hostname — zero-downtime deploys.
Slot settings: some app settings are "slot-specific" (don't swap) — connection strings to staging databases, environment markers. Mark them as slot settings in the configuration UI; they stay with the slot during swap.
Custom domains and TLS: bind a custom domain via DNS verification (CNAME or TXT record). Upload a TLS cert (PFX) or use App Service Managed Certificates (free, auto-renewing, for non-naked-domain CNAMEs). For App-Service-managed certs, the domain must already be bound and the validation record visible.
VNet integration vs Private Endpoint: VNet integration gives the App outbound access into a VNet (talks to your databases). Private endpoint exposes the App over a private IP for inbound traffic from a VNet. Both can be combined for fully private apps.

Concrete example

A startup runs production on a Standard S1 App Service plan and wants zero-downtime deploys. Design: enable a staging deployment slot, configure the CI pipeline to deploy each release to staging, run health checks against the staging URL, then issue a swap. Mark the APPINSIGHTS_INSTRUMENTATIONKEY as a slot setting so staging and production each use their own Application Insights resource. Autoscale rule on the plan: scale out from 2 to 6 instances when CPU > 70% for 10 minutes.

Key takeaway: Standard or Premium for production (slots + autoscale + custom domains). Deployment slot swap is the AZ-104 answer for zero-downtime releases. Mark environment-specific config as slot settings. VNet integration for outbound; private endpoint for inbound.
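
The effect of slot settings during a swap can be modelled with two dictionaries. This is a simplified sketch of the behaviour described above, not the App Service implementation: `swap_slots` is a hypothetical helper, and `APP_VERSION` plus the key values are invented for illustration.

```python
def swap_slots(production: dict, staging: dict, sticky: set) -> tuple:
    """Swap two slots' settings; names in `sticky` stay with their slot."""
    def merged(incoming: dict, current: dict) -> dict:
        # Non-sticky settings travel with the swapped content...
        result = {k: v for k, v in incoming.items() if k not in sticky}
        # ...while sticky (slot) settings keep the slot's own value.
        result.update({k: v for k, v in current.items() if k in sticky})
        return result

    return merged(staging, production), merged(production, staging)


prod = {"APP_VERSION": "1.4", "APPINSIGHTS_INSTRUMENTATIONKEY": "prod-key"}
stage = {"APP_VERSION": "1.5", "APPINSIGHTS_INSTRUMENTATIONKEY": "staging-key"}

new_prod, new_stage = swap_slots(prod, stage,
                                 sticky={"APPINSIGHTS_INSTRUMENTATIONKEY"})
print(new_prod)   # {'APP_VERSION': '1.5', 'APPINSIGHTS_INSTRUMENTATIONKEY': 'prod-key'}
print(new_stage)  # {'APP_VERSION': '1.4', 'APPINSIGHTS_INSTRUMENTATIONKEY': 'staging-key'}
```

After the swap, production runs the new release but still reports to its own Application Insights resource, which is exactly why environment-specific values are marked as slot settings.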

⚡ Mini-quiz

Practise plan-tier and slot-swap scenarios → study mode (10 questions).

Lesson 5.2 Containers in Azure — ACR, ACI, AKS

Azure has three primary container services: a registry to store images (ACR), a fastest-path-to-run primitive (ACI), and a full orchestrator (AKS). The exam asks you to pick the right one for the workload — most candidates over-reach for AKS when ACI or App Service Containers would do.

Key concepts

Azure Container Registry (ACR): private Docker / OCI registry. Tiers: Basic (dev), Standard (small prod), Premium (geo-replication, private endpoints, content trust, customer-managed keys). Image scanning via Microsoft Defender for Containers.
Authentication to ACR: three options — admin user (single account, fine for testing — disable in production), service principal (CI/CD pipelines), managed identity (the modern preferred way — AKS / ACI / App Service pull directly using their assigned managed identity, no secrets).
Azure Container Instances (ACI): single-container or container-group serverless compute. Pay per second of CPU/memory. Best for: scheduled batch jobs, ephemeral CI build agents, bursty workloads from AKS via the Virtual Kubelet. NOT a long-running web tier — no built-in load balancing or autoscaling group.
Azure Kubernetes Service (AKS): managed Kubernetes. Microsoft runs the control plane (at no charge on the Free tier), so you pay mainly for worker nodes. The right answer when you need K8s primitives (Deployments, Services, NetworkPolicies, custom controllers, Helm charts). Node pools can mix VM SKUs and zone-spreads; integrates with Azure CNI for VNet-native pod IPs.
AKS cluster choices: auto-upgrade channels (none / patch / stable / rapid), node-image-only upgrades vs full cluster upgrades, system vs user node pools, scaling: Cluster Autoscaler (node count) plus Horizontal Pod Autoscaler (replica count). AKS-specific RBAC integrates with Entra ID for kubectl auth.
App Service for Containers: often forgotten — App Service can run a custom Docker image directly. No Kubernetes, no learning curve, slot swap and managed identities all still work. The right answer when "we packaged the app as a container but only need one to a few replicas".

Concrete example

A team wants to run a nightly data-import job from a custom Python image. They don't run any other containers. Design: build the image and push to an ACR Basic registry. Use an ACI container group with restart policy OnFailure, scheduled via an Azure Logic App or Automation runbook every night at 02:00 UTC. ACI's managed identity pulls from ACR — no service-principal secret to rotate. AKS would be overkill (no orchestration needed); App Service Containers would keep the import running as a server, not as a one-off job.

Key takeaway: ACR for images. ACI for one-off / bursty / no-orchestration. AKS when you need K8s. App Service for Containers as a quiet middle option. Always authenticate via managed identity — never the ACR admin user in production.

⚡ Mini-quiz

Drill ACI vs AKS vs App Service decisions → quick quiz (5 questions).

Lesson 5.3 Azure Functions — triggers, plans, and managed identities

Functions is Azure's event-driven serverless compute — write code that runs in response to a trigger (HTTP request, timer, queue message, blob created) and let the platform handle scaling. The exam tests you on hosting plans (which has the biggest cost/cold-start trade-off) and on the security primitive (managed identities) that ties Functions to the rest of Azure.

Key concepts

Hosting plans: Consumption (true serverless — pay per execution and per GB-second, cold starts of a few hundred ms, 10-minute max execution), Premium (pre-warmed instances, no cold start, VNet integration, longer execution), App Service Plan (run Functions on an existing plan — same compute, predictable cost). Consumption is the default; Premium when cold start hurts.
Triggers and bindings: each function has exactly one trigger (HTTP, Timer, Queue, Blob, Event Grid, Service Bus, Cosmos DB) and zero or more bindings (input/output for blob, queue, table, Cosmos). Bindings remove boilerplate — declare them in function.json and the runtime injects the connection.
Durable Functions: extension for orchestrating long-running workflows with checkpointing — chained activities, fan-out/fan-in patterns, human-approval steps with timers. The right answer when you need workflow state to survive a process restart.
Managed identities: system-assigned (lifecycle tied to the resource) or user-assigned (independent, reusable across resources). Enable on the Function App and grant data-plane RBAC on target services (Storage Blob Data Reader, etc.). No connection strings, no secrets.
Key Vault references: in App Settings, write @Microsoft.KeyVault(SecretUri=...) and the Function fetches the secret at runtime using its managed identity. Centralises secret management without code changes.
Cold starts and mitigation: on Consumption, the first request to an idle instance pays a cold-start penalty (200ms-1s for .NET, more for Python). Mitigations: use Premium plan (always-warm), use a Timer-triggered Function to ping the HTTP endpoint every 5 minutes, or accept the latency for low-traffic apps.

Concrete example

A function ingests files from a Storage Account blob container, processes each one, and writes the result to a Cosmos DB collection. Design: Function App on the Consumption plan, blob trigger on the source container, Cosmos DB output binding. Enable system-assigned managed identity, grant it Storage Blob Data Contributor on the source storage and Cosmos DB Built-in Data Contributor on the target — no connection strings anywhere. Cold start is fine; the trigger is batch and doesn't need sub-100ms latency. If the workload grows to need VNet integration to a private Cosmos endpoint, upgrade the plan to Premium.

Key takeaway: Consumption plan for true serverless cost. Premium when you can't tolerate cold starts or need VNet integration. Managed identities + Key Vault references for secrets. Durable Functions for stateful workflows. One trigger per function, many bindings.
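
The Key Vault reference syntax is just a specially formatted app-setting value. The sketch below builds and recognises that string; it does not resolve the secret (the platform does that at runtime with the app's managed identity). The helper names and the vault and secret names are hypothetical.

```python
import re


def key_vault_reference(secret_uri: str) -> str:
    """Format an app-setting value that points at a Key Vault secret."""
    return f"@Microsoft.KeyVault(SecretUri={secret_uri})"


_REFERENCE = re.compile(
    r"^@Microsoft\.KeyVault\(SecretUri=(?P<uri>https://[^)]+)\)$")


def parse_reference(setting_value: str):
    """Return the secret URI if the value is a reference, else None."""
    match = _REFERENCE.match(setting_value)
    return match.group("uri") if match else None


# Hypothetical vault and secret names.
uri = "https://contoso-vault.vault.azure.net/secrets/CosmosKey"
value = key_vault_reference(uri)
print(value)
print(parse_reference(value) == uri)         # True
print(parse_reference("plain-text-secret"))  # None
```

A check like `parse_reference` is handy in a deployment pipeline to flag app settings that still hold a literal secret instead of a reference.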

⚡ Mini-quiz

Test Functions plan and managed-identity decisions → study mode (10 questions).

Module 6 · 3 lessons

Virtual Networking — VNets, Subnets, and Connectivity

Design and implement Azure virtual networks. Plan IP address spaces, subnets, and CIDR allocations that avoid overlap. Configure Network Security Groups (NSG) with inbound/outbound rules using service tags and Application Security Groups (ASG). Implement VNet peering (local and global) and understand the non-transitive peering limitation. Connect on-premises to Azure using VPN Gateway (Policy-based vs Route-based, active-active, BGP) and Azure ExpressRoute (private peering vs Microsoft peering).

VNet address space and subnet planning · NSG rules: priority, service tags, ASG · VNet peering: local and global (non-transitive) · VPN Gateway SKUs: Basic, VpnGw1-5 · Site-to-site, point-to-site, and VNet-to-VNet VPN · ExpressRoute circuits and peering types · Azure Bastion for secure VM access · User-defined routes (UDR) and forced tunneling

Lesson 6.1 VNets, subnets, and address-space planning

Every Azure deployment with private IP traffic lives inside a Virtual Network. The decisions you make on day one — address space, subnet layout, reserved IPs — are the hardest to undo, because changing a VNet's address range later requires either repeering everything or accepting downtime. AZ-104 expects you to size CIDRs correctly the first time.

Key concepts

Address space rules: a VNet can carry one or more RFC1918 CIDR blocks. Avoid overlap with on-prem, with peered VNets, and with any subnet you might peer to later. Plan a region-wide supernet (e.g., 10.10.0.0/16 for eastus2, 10.20.0.0/16 for westus2) and carve subnets out of it.
Subnet sizing: Azure reserves 5 IPs per subnet (network + first three + broadcast). A /29 gives you 3 usable addresses, not 8, which is too small for most gateway deployments. Size the GatewaySubnet at /27 or larger (/26 recommended) to support all SKUs and active-active deployments.
Reserved special subnets: name matters. GatewaySubnet is the only subnet a VPN/ExpressRoute Gateway can deploy into. AzureBastionSubnet must be /26 or larger. AzureFirewallSubnet must be /26 or larger. These names are exact-match; misspell them and the deploy fails.
Service endpoints vs private endpoints: a service endpoint routes traffic from your subnet to a PaaS service (Storage, SQL, Key Vault) over the Microsoft backbone, but the service keeps its public IP. A private endpoint assigns the PaaS service a private IP inside your subnet and pairs with a private DNS zone for name resolution; it is the preferred pattern for everything new.
Subnet delegation: certain services (Azure Container Instances, App Service VNet Integration, SQL Managed Instance) require a subnet they fully control. Delegate the subnet to the service before deploying into it; a delegated subnet is dedicated to that service, so plan it into the address space up front rather than retrofitting it later.

Concrete example

A two-region web app. Region 1 takes 10.10.0.0/16, region 2 takes 10.20.0.0/16 — no overlap, peerable. Inside region 1, carve 10.10.1.0/24 for the app tier (~250 usable IPs is plenty), 10.10.2.0/24 for data, and 10.10.255.0/27 for the GatewaySubnet (32 IPs, satisfies the /27 minimum). Add a private endpoint for the Storage Account into the data subnet so blob traffic never traverses the public internet, and link the private DNS zone privatelink.blob.core.windows.net to both VNets.

Key takeaway: CIDR mistakes are migration-expensive. Plan a region-wide supernet on day one, leave gaps for future subnets, and use private endpoints — not service endpoints — for anything new. Memorise the reserved subnet names exactly.
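
The sizing and overlap rules can be checked with Python's standard `ipaddress` module before anything is deployed. This is a planning sketch only; the helper names `usable_ips` and `find_overlaps` are made up for the example, and the CIDRs are the ones from the two-region scenario above.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # network, first three host addresses, broadcast


def usable_ips(cidr: str) -> int:
    """Addresses left for your resources after Azure's five reservations."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def find_overlaps(cidrs: list) -> list:
    """Return every pair of CIDR blocks that overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b))
            for i, a in enumerate(nets)
            for b in nets[i + 1:]
            if a.overlaps(b)]


print(usable_ips("10.10.1.0/24"))    # 251
print(usable_ips("10.10.255.0/27"))  # 27
print(usable_ips("10.10.0.0/29"))    # 3

region1 = ipaddress.ip_network("10.10.0.0/16")
subnets = ["10.10.1.0/24", "10.10.2.0/24", "10.10.255.0/27"]
print(all(ipaddress.ip_network(s).subnet_of(region1) for s in subnets))  # True
print(find_overlaps(subnets))                                            # []
print(find_overlaps(["10.10.0.0/16", "10.20.0.0/16"]))                   # []
```

Running the same overlap check against the on-prem ranges before peering or connecting a gateway catches the most expensive class of mistake early.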

⚡ Mini-quiz

Practise VNet sizing and reserved-subnet rules → study mode (10 questions).

Lesson 6.2 NSGs, ASGs, and user-defined routes

Once the VNet is built, two layers control traffic: Network Security Groups filter what's allowed, and User-Defined Routes change where traffic goes. Both are simple in isolation, but exam scenarios almost always combine them — and the priority and direction rules trip up the most candidates.

Key concepts

NSG rule evaluation: rules have a priority (100-4096) and a direction (inbound / outbound). The lowest priority number wins; once a rule matches, no further rules are evaluated. Each direction also carries three default rules at priorities 65000, 65001, and 65500. They cannot be deleted, only overridden by a higher-priority rule; inbound, they allow intra-VNet traffic and Load Balancer health probes and deny everything else.
Service tags: symbolic names (Internet, VirtualNetwork, AzureLoadBalancer, Storage.EastUS) that resolve to Microsoft-maintained IP lists. Use them instead of hard-coding CIDRs — Microsoft updates the underlying lists as services scale.
Application Security Groups (ASG): tag VMs / NICs with an ASG name and reference the ASG in NSG rules instead of IPs. Lets you express "allow web-tier to talk to data-tier" semantically. Both source and destination of an NSG rule can be ASGs in the same VNet.
NSG attachment: at the NIC or at the subnet (or both — they both apply). When both attach, the rule must allow at BOTH layers for inbound and at BOTH layers for outbound. Attaching at the subnet is the recommended scale-out pattern.
User-Defined Routes: a route table associated to a subnet overrides Azure's default system routes. Common pattern: route 0.0.0.0/0 with next hop Virtual Appliance at the NVA's IP to force all egress through a firewall. Don't forget to enable IP forwarding on the NVA NIC itself.

Concrete example

A three-tier app. Tag web VMs with ASG asg-web, app VMs with asg-app, DB VMs with asg-db. Subnet NSG: priority 100 inbound — allow from Internet to asg-web on 443 only; priority 110 inbound — allow from asg-web to asg-app on 8080 only; priority 120 — allow from asg-app to asg-db on 1433 only; priority 4096 — deny everything else inbound. Egress to all internet flows through a UDR pointing to a Palo Alto NVA in the hub VNet.

Key takeaway: the lowest priority number wins, both NIC and subnet NSGs apply, ASGs let you describe intent instead of IPs, and UDRs are how you force traffic through an inspection appliance. NSGs filter, UDRs steer: different tools, never substitutes for each other.
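
First-match evaluation by priority number is easiest to internalise by simulating it. The sketch below models the three-tier inbound rule set from the example; it is a deliberate simplification (ASGs and service tags are plain strings, and the `Rule` type and `evaluate` function are hypothetical), not how Azure implements NSGs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int
    source: str           # ASG name, service tag, or "*"
    destination: str      # ASG name or "*"
    port: Optional[int]   # None matches any port
    allow: bool


def evaluate(rules: list, source: str, destination: str, port: int) -> bool:
    """Return True if the flow is allowed; the lowest priority number wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (rule.source in ("*", source)
                and rule.destination in ("*", destination)
                and rule.port in (None, port)):
            return rule.allow   # first match ends evaluation
    return False                # stands in for the default deny-all rule


inbound = [
    Rule(100, "Internet", "asg-web", 443, True),
    Rule(110, "asg-web", "asg-app", 8080, True),
    Rule(120, "asg-app", "asg-db", 1433, True),
    Rule(4096, "*", "*", None, False),
]

print(evaluate(inbound, "Internet", "asg-web", 443))  # True
print(evaluate(inbound, "Internet", "asg-db", 1433))  # False
print(evaluate(inbound, "asg-web", "asg-app", 8080))  # True
print(evaluate(inbound, "asg-web", "asg-db", 1433))   # False
```

The last call is the point of the tiered design: the web tier cannot skip the app tier and talk to the database directly.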

⚡ Mini-quiz

Drill NSG priority and UDR mechanics → quick quiz (5 questions).

Lesson 6.3 Hybrid connectivity — peering, VPN Gateway, ExpressRoute

Real environments stitch Azure to other Azure and to on-prem. Three primary options exist and the exam routinely asks which to pick given a bandwidth, latency, and SLA constraint. The single rule that catches most candidates: VNet peering is not transitive.

Key concepts

VNet peering: connects two VNets at the Azure backbone — sub-millisecond, fully-meshed once configured. Local peering is within a region; global peering crosses regions. Always non-transitive: A↔B and B↔C does NOT give you A↔C. Hub-and-spoke designs use a Network Virtual Appliance or Azure Firewall in the hub to forward between spokes.
Gateway transit: a peering flag that lets a spoke use the hub's VPN/ExpressRoute Gateway to reach on-prem. Enable Use remote gateways on the spoke side and Allow gateway transit on the hub side. Without it, each spoke would need its own gateway.
VPN Gateway: Policy-based for legacy IKEv1 + static routes, single tunnel — basically deprecated. Route-based for IKEv2, BGP, multiple tunnels, point-to-site (P2S) and VNet-to-VNet — the choice for everything new. SKUs VpnGw1–VpnGw5 scale bandwidth and concurrent tunnels.
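Active-active VPN: both gateway instances are active, each with its own public IP, and both terminate tunnels concurrently for higher throughput and faster failover than the default active-standby mode. Choose a zone-redundant SKU (VpnGw1AZ-VpnGw5AZ) to spread the instances across Availability Zones. Recommended for production workloads that cannot tolerate the reconnection gap of an active-standby failover.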
ExpressRoute: private circuit through a connectivity provider, bypassing the public internet. Private peering connects on-prem to your Azure VNets; Microsoft peering connects on-prem to Microsoft 365 / public Azure PaaS over the private circuit. ExpressRoute Global Reach connects two ExpressRoute circuits to each other over the Microsoft backbone (useful for branch-to-branch over Azure).

Concrete example

Hub-and-spoke design. Hub VNet holds the VPN Gateway terminating an on-prem IPsec tunnel. Two spokes (prod, dev) peer to the hub with Allow gateway transit on the hub side and Use remote gateways on the spoke side. The prod spoke can now reach on-prem through the hub gateway. To allow prod ↔ dev traffic (which peering alone won't deliver because peering is non-transitive), deploy an Azure Firewall in the hub and add UDRs in each spoke pointing inter-spoke traffic at the firewall.

Key takeaway: peering is fast and non-transitive — that one fact drives every hub-and-spoke design. Route-based VPN Gateway for everything new, ExpressRoute when you need a private circuit with SLA, Global Reach to connect circuits. Memorise the gateway-transit flag pair — that question is almost guaranteed.
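
Non-transitivity is simple to model: two VNets can talk only if a direct peering exists between them, unless traffic is explicitly steered through a forwarder. The sketch below mirrors the hub-and-spoke example; `can_reach` and `can_reach_via` are hypothetical helpers, and the model ignores NSGs, firewall rules, and route details.

```python
def can_reach(peerings: set, a: str, b: str) -> bool:
    """Peering is non-transitive: only a direct peering link counts."""
    return a == b or frozenset((a, b)) in peerings


def can_reach_via(peerings: set, a: str, b: str, forwarder: str) -> bool:
    """Reachability when UDRs send traffic to an appliance in `forwarder`."""
    return can_reach(peerings, a, b) or (
        can_reach(peerings, a, forwarder) and can_reach(peerings, forwarder, b))


peerings = {frozenset(("hub", "prod")), frozenset(("hub", "dev"))}

print(can_reach(peerings, "prod", "hub"))             # True
print(can_reach(peerings, "prod", "dev"))             # False: no direct peering
print(can_reach_via(peerings, "prod", "dev", "hub"))  # True: via the hub firewall
```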

⚡ Mini-quiz

Test your hybrid-connectivity decisions → study mode (10 questions).

Module 7 · 3 lessons

Load Balancing, DNS, and Network Services

Distribute traffic and resolve names at scale with Azure network services. Compare Azure Load Balancer (Layer 4, internal vs public, Basic vs Standard SKU) with Application Gateway (Layer 7, WAF, SSL termination, URL-based routing). Use Azure DNS to host public and private zones, configure custom domains for App Services, and delegate subdomains. Understand Traffic Manager (DNS-based global routing) and Azure Front Door (anycast CDN with WAF) for multi-region scenarios.

Azure Load Balancer: Basic vs Standard SKU · Internal vs public load balancer · Application Gateway: WAF, URL routing, SSL offload · Azure DNS public and private zones · Private DNS zones and VNet links · Traffic Manager routing methods · Azure Front Door vs CDN · Network Watcher tools (IP flow verify, NSG diagnostics)

Lesson 7.1 Azure Load Balancer — Layer 4 traffic distribution

Azure Load Balancer is the regional Layer-4 distributor — it forwards TCP/UDP packets to a backend pool based on a five-tuple hash, without ever looking inside the payload. It's free for inbound rules, blisteringly fast, and the right choice anywhere you don't need URL-aware routing. The exam routinely contrasts the SKUs and the public-vs-internal split, so know both cold.

Key concepts

Public vs Internal: a public Load Balancer has a public frontend IP and faces the internet (typical: web tier). An internal Load Balancer has a private frontend IP in a VNet (typical: app-tier balancing for tier-to-tier traffic). The backend pool can be VMs, VMSS, or specific NICs.
Basic vs Standard SKU: Basic is legacy (deprecating Sept 2025), no SLA, no Availability Zones, no HA Ports. Standard is the modern default — 99.99% SLA when zone-redundant, Availability Zone support, HA Ports rule, secure-by-default (must be allowed by NSG explicitly). Always pick Standard for new builds.
Load balancing rules: map a frontend (IP+port) to a backend pool + health probe. Distribution mode: 5-tuple hash (default; spreads flows roughly evenly, but it is hash-based rather than round-robin), Source IP (2-tuple, session affinity), Source IP and Protocol (3-tuple). Use 5-tuple unless an app needs sticky sessions and can't store state externally.
Health probes: TCP, HTTP, or HTTPS (HTTPS Standard SKU only). Backend instance is considered unhealthy after configurable consecutive failures. Probe path/protocol is what tells the LB whether to keep sending traffic. Build a real probe endpoint in your app (e.g., /healthz) — don't probe /.
Outbound rules and SNAT: Standard LB needs explicit outbound rules (or assign a Public IP per VM) — internet egress is NOT default-on like the Basic SKU. SNAT port exhaustion is the most common production failure: each backend gets a finite pool of ephemeral ports for outbound connections. Mitigation: more frontend IPs, NAT Gateway, or fewer egress connections.
HA Ports: a single rule (port=0) that load-balances every TCP/UDP flow simultaneously. Required for active-active NVA deployments (firewalls, security appliances) — without HA Ports you'd need a rule per port, which doesn't scale to "all ports".
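
The difference between the distribution modes comes down to which fields feed the hash. The sketch below illustrates the idea with SHA-256; the hash Azure actually uses is not specified here, so treat `pick_backend`, the backend names, and the addresses (documentation ranges) as assumptions for illustration.

```python
import hashlib


def pick_backend(backends: list, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, protocol: str,
                 mode: str = "5-tuple") -> str:
    """Map a flow to a backend by hashing the fields the mode includes."""
    if mode == "5-tuple":
        fields = (src_ip, src_port, dst_ip, dst_port, protocol)
    elif mode == "3-tuple":   # Source IP and Protocol
        fields = (src_ip, dst_ip, protocol)
    elif mode == "2-tuple":   # Source IP
        fields = (src_ip, dst_ip)
    else:
        raise ValueError(f"unknown mode: {mode}")
    digest = hashlib.sha256(repr(fields).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["vm-1", "vm-2", "vm-3"]  # hypothetical backend names
flows = [("203.0.113.7", port) for port in range(50000, 50010)]

five = {pick_backend(backends, ip, p, "198.51.100.10", 443, "tcp")
        for ip, p in flows}
two = {pick_backend(backends, ip, p, "198.51.100.10", 443, "tcp", mode="2-tuple")
       for ip, p in flows}
print(sorted(five))  # new source ports can land on different backends
print(sorted(two))   # one client IP always lands on the same backend
```

Because the source port changes with every new connection, 5-tuple mode spreads one client's connections around, while 2-tuple mode pins the client to a single backend, which is the session-affinity behaviour.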

Concrete example

A web tier of 6 Linux VMs across 3 Availability Zones needs internet-facing load balancing with 99.99% SLA. Design: deploy a Standard SKU public Load Balancer, zone-redundant frontend IP, backend pool of all 6 VMs, an HTTP health probe pointing at /healthz on port 8080, a load-balancing rule mapping public 443 to backend 8080 with 5-tuple distribution. Add an outbound rule with 16 SNAT ports per VM (or attach NAT Gateway for cleaner egress). Basic SKU would not meet the 99.99% requirement and is being retired anyway.

Key takeaway: Standard SKU, zone-redundant, dedicated health-probe endpoint, deliberate SNAT planning (NAT Gateway or extra frontend IPs). 5-tuple distribution unless you genuinely need affinity. HA Ports for NVA pairs.
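
SNAT planning is mostly division. The sketch below estimates an even per-instance port budget, assuming 64,000 SNAT ports per frontend public IP and allocations in multiples of 8; both figures are assumptions taken from Azure's published guidance rather than from this lesson, so verify them against current documentation. The function name is hypothetical.

```python
PORTS_PER_FRONTEND_IP = 64_000  # assumed figure per public frontend IP


def snat_ports_per_instance(frontend_ips: int, instances: int) -> int:
    """Even split of SNAT ports, rounded down to a multiple of 8."""
    if frontend_ips < 1 or instances < 1:
        raise ValueError("need at least one frontend IP and one instance")
    share = (PORTS_PER_FRONTEND_IP * frontend_ips) // instances
    return share - share % 8


print(snat_ports_per_instance(frontend_ips=1, instances=6))  # 10664
print(snat_ports_per_instance(frontend_ips=2, instances=6))  # 21328
```

Adding a second frontend IP doubles the budget, which is why "more frontend IPs" appears in the mitigation list alongside NAT Gateway.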

⚡ Mini-quiz

Drill Load Balancer SKU and SNAT decisions → study mode (10 questions).

Lesson 7.2 Application Gateway — Layer 7 routing and WAF

When traffic decisions depend on the HTTP method, path, host header, or cookie, you've moved out of Layer 4 territory and need an Application Gateway. AG terminates TLS, reads the request, and applies routing rules — plus an optional Web Application Firewall that blocks OWASP-top-10 attacks inline.

Key concepts

v1 vs v2 SKU: v1 is deprecated. v2 (Standard_v2, WAF_v2) is the modern default — autoscaling, zone-redundant, supports HTTP/2 and private endpoints, header rewrites, static VIP for the lifetime of the gateway. Never deploy v1 for new workloads.
Listeners: the entry point — protocol (HTTP/HTTPS), port, hostname, optional TLS cert. Multi-site listeners use the SNI hostname to route different domains through one frontend IP. Wildcard listeners (e.g., *.contoso.com) reduce listener sprawl.
Backend pools and HTTP settings: the backend pool can be VMs, VMSS, App Services, IP addresses, or FQDNs. HTTP settings define the protocol and port AG uses to talk to the backend, the cookie-based affinity setting, connection draining, and which probe to use.
Routing rules: basic (one listener → one backend pool) or path-based (listener → URL path map with multiple paths each going to a backend pool). Path-based routing lets you fan /api/* to one backend pool and /static/* to another, behind a single IP and TLS cert.
SSL termination and end-to-end TLS: AG can terminate TLS at the gateway (offload — backend traffic is plaintext, simpler) OR re-encrypt to the backend (end-to-end, backend cert must be trusted by AG). End-to-end is what compliance regimes typically demand.
WAF: the WAF_v2 SKU adds Web Application Firewall with the OWASP Core Rule Set. Two modes: Detection (logs only) and Prevention (blocks). Start in Detection to find false positives, then switch to Prevention. Custom rules support rate limiting per IP.

Concrete example

A SaaS platform hosts app.contoso.com (web tier in eastus2) and api.contoso.com (API tier in the same region). They want one public IP, one TLS certificate (wildcard), WAF protection, and SQL-injection blocking. Design: Application Gateway WAF_v2 with one frontend IP, two multi-site listeners (one per hostname using SNI), two backend pools (one VMSS each), end-to-end TLS using the same wildcard cert. WAF mode set to Prevention with OWASP CRS 3.2 plus a custom rate-limit rule of 100 req/min per source IP. A plain Standard Load Balancer would not enable any of the host-aware routing or WAF protection.

Key takeaway: Layer 4 = Standard Load Balancer (cheap, fast, no app awareness). Layer 7 = Application Gateway v2 (host/path routing, TLS offload, WAF). Pick WAF_v2 anytime traffic from the public internet hits HTTP — not just for compliance.
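
Host-based and path-based routing can be pictured as a two-step lookup: pick the listener by hostname, then pick the backend pool by path. The sketch below is a simplified model with a hypothetical route table and pool names; it uses longest-prefix matching for clarity and does not reproduce Application Gateway's exact rule-evaluation semantics.

```python
from typing import Optional

# Hypothetical configuration: hostname -> list of (path prefix, backend pool).
ROUTES = {
    "app.contoso.com": [("/static/", "pool-static"), ("/", "pool-web")],
    "api.contoso.com": [("/", "pool-api")],
}


def route(host: str, path: str) -> Optional[str]:
    """Match the listener by host, then the longest matching path prefix."""
    path_map = ROUTES.get(host.lower())
    if path_map is None:
        return None  # no listener for this hostname
    matches = [(prefix, pool) for prefix, pool in path_map
               if path.startswith(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


print(route("app.contoso.com", "/static/logo.png"))  # pool-static
print(route("app.contoso.com", "/checkout"))         # pool-web
print(route("api.contoso.com", "/v1/orders"))        # pool-api
print(route("unknown.example", "/"))                 # None
```

A Layer 4 load balancer never sees the host header or the path, so none of these decisions is available to it.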

⚡ Mini-quiz

Practise L4-vs-L7 routing decisions → quick quiz (5 questions).

Lesson 7.3 Global load balancing and DNS — Traffic Manager, Front Door, Azure DNS

Load Balancer and Application Gateway are regional — they distribute traffic inside one region. When you need to balance across regions or serve global users from the closest edge, you reach for one of two distinct primitives: Traffic Manager (DNS-based) or Azure Front Door (anycast). DNS itself — Azure DNS public and private zones — sits underneath everything.

Key concepts

Azure DNS public zones: host a public domain (contoso.com) in Azure. You delegate at the registrar by setting NS records to Azure's name servers. Records: A, AAAA, CNAME, MX, TXT, NS, SOA, SRV, CAA. Alias records (special A/AAAA/CNAME) auto-update when they point at an Azure resource (e.g., a Public IP — IP changes, alias follows).
Azure Private DNS zones: private resolution inside one or more VNets. Link the zone to each VNet (registration link for the home VNet — auto-creates records for VMs; resolution links for VNets that should just resolve). The standard pattern for Private Endpoint name resolution (privatelink.blob.core.windows.net).
Traffic Manager (DNS): a global DNS-based load balancer — clients resolve the Traffic Manager hostname to one of N regional endpoints. Routing methods: Performance (closest endpoint by latency), Geographic (by source country/region — compliance), Weighted (manual %), Priority (active/passive failover), Multivalue, Subnet. DNS-based means failover latency is bounded by client TTL caching (~30s to several minutes).
Azure Front Door (anycast): a global L7 reverse proxy with an anycast IP that terminates at the nearest Microsoft edge POP. Includes WAF, caching, URL-based routing, instant failover. Use over Traffic Manager whenever you need (a) sub-second failover, (b) WAF at the edge, or (c) CDN caching. Standard / Premium tiers — Premium adds private origin support.
Azure CDN: a content delivery network for static assets. Front Door overlaps in functionality and is increasingly the recommended choice for both CDN and global LB. CDN profiles from Microsoft, Akamai, Verizon — Akamai/Verizon variants are being retired in favour of Microsoft-native Front Door Standard.
Decision pattern: regional internal/external traffic → Load Balancer. Regional HTTP with routing/WAF → Application Gateway. Global, multi-region with DNS-failover acceptable → Traffic Manager. Global L7 with WAF + caching + instant failover → Front Door. Static assets → Front Door / CDN.
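
The decision pattern above reads naturally as a small function. This is a study aid under simplifying assumptions, not an architecture tool: the function name and its boolean parameters are invented for the example, and real designs often combine services (for instance, Front Door in front of regional Application Gateways).

```python
def pick_service(global_scope: bool, needs_l7_features: bool = False,
                 needs_edge_waf_or_caching: bool = False,
                 needs_instant_failover: bool = False) -> str:
    """Return the service suggested by the lesson's decision pattern."""
    if global_scope:
        # DNS-based failover is bounded by client TTLs, so anything that
        # needs edge WAF, caching, or instant failover points to Front Door.
        if needs_edge_waf_or_caching or needs_instant_failover:
            return "Azure Front Door"
        return "Traffic Manager"
    # Regional: HTTP-aware routing or WAF needs Layer 7.
    if needs_l7_features:
        return "Application Gateway"
    return "Load Balancer"


print(pick_service(global_scope=False))                               # Load Balancer
print(pick_service(global_scope=False, needs_l7_features=True))       # Application Gateway
print(pick_service(global_scope=True))                                # Traffic Manager
print(pick_service(global_scope=True, needs_instant_failover=True))   # Azure Front Door
```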

Concrete example

A global e-commerce site needs (1) edge caching of static assets, (2) WAF at the edge, (3) sub-second failover from primary (eastus2) to secondary (westeurope) on regional outage. Design: Azure Front Door Premium with two origin groups (one per region), priority-based routing (eastus2 primary, westeurope failover), WAF policy in Prevention mode, caching policy on /static/* with 1-hour TTL. Host the domain in Azure DNS public zone with an alias CNAME pointing at the Front Door endpoint. Traffic Manager would not give edge caching or sub-second failover; a single regional Application Gateway would not survive a region outage.
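
Priority-based routing in this design amounts to "send traffic to the healthy origin with the lowest priority number". A minimal sketch, assuming a hypothetical Origin record and a health flag that would come from health probes in a real deployment; none of these names are Front Door API objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Origin:
    region: str
    priority: int  # lower number = preferred
    healthy: bool  # hypothetical flag; real systems derive it from probes


def select_origin(origins: list[Origin]) -> Optional[Origin]:
    """Return the healthy origin with the lowest priority number, or None."""
    healthy = [o for o in origins if o.healthy]
    return min(healthy, key=lambda o: o.priority) if healthy else None


normal = [Origin("eastus2", 1, True), Origin("westeurope", 2, True)]
outage = [Origin("eastus2", 1, False), Origin("westeurope", 2, True)]
```

With the normal list the function picks eastus2; with the outage list it falls through to westeurope.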

Key takeaway: regional = Load Balancer / Application Gateway. Global = Traffic Manager (DNS-based) or Front Door (anycast + edge). Front Door is the modern default for global L7 + WAF + caching. Azure DNS underneath both.

⚡ Mini-quiz

Drill global vs regional load-balancing decisions → study mode (10 questions).

Module 8 (3 lessons)

Monitoring, Backup, and Azure Site Recovery

Keep Azure resources healthy and recoverable. Configure Azure Monitor with metrics, diagnostic settings, and Log Analytics workspaces. Build alerts using metric alert rules, log query alerts, and activity log alerts routed to action groups (email, SMS, webhook, Logic App). Protect VMs and SQL Server on Azure VMs with Azure Backup (Recovery Services Vault, backup policies, soft-delete). Replicate VMs to a secondary region with Azure Site Recovery for disaster recovery, and understand RPO, RTO, and the differences between test, planned, and unplanned failover.

Azure Monitor metrics and diagnostic settings · Log Analytics workspaces and KQL basics · Alert rules (metric, log query, activity log) · Action groups (email, SMS, webhook, ITSM) · Azure Backup and the Recovery Services Vault · VM backup policies and instant restore · Azure Site Recovery (replication, failover, failback) · Azure Monitor Workbooks and Insights

Lesson 8.1 Azure Monitor — metrics, logs, and diagnostic settings

Every Azure resource emits two kinds of telemetry: metrics (numeric time series, 1-minute granularity, kept 93 days for free) and logs (timestamped JSON records, sent to a Log Analytics workspace you own, kept as long as you pay for). Knowing which signal your alert needs — and where it lives — is what AZ-104 tests on this topic.

Key concepts

Platform metrics vs custom metrics: platform metrics (CPU %, network bytes, request count) are emitted by every Azure service automatically. Custom metrics come from inside your app via the Azure Monitor Metrics API or Application Insights SDK. Both are stored in the same time-series store; both feed metric alerts.
Activity log: subscription-level control-plane log recording who created, modified, or deleted what, when, and from which IP. Default-on, immutable, 90-day retention (export it through a diagnostic setting to keep it longer). The auditing source of record (you saw this in Module 1's policy lesson).
Diagnostic settings: the bridge that exports resource-specific logs and metrics to a destination (Log Analytics workspace, Storage Account, or Event Hub). Without a diagnostic setting, resource logs are NOT collected. Set per-resource or via Azure Policy at scale.
Log Analytics workspace: the queryable log store. One workspace per region per "data sovereignty boundary" is the practical pattern. Pricing tiers: Pay-as-you-go (per GB ingested), Commitment Tiers (capacity reservation, ~30% cheaper), Sentinel-tier when SIEM is on.
KQL (Kusto Query Language): the read language for Log Analytics. Pipe-style, for example: AzureActivity | where TimeGenerated > ago(1h) | where OperationNameValue =~ "Microsoft.Compute/virtualMachines/delete" | project Caller, _ResourceId. Know where, summarize, join, project, ago(), bin().
Insights: pre-built dashboards on top of Log Analytics — VM Insights (perf, dependency map), Container Insights (AKS), Application Insights (web apps, distributed traces). Each Insights solution provisions the right diagnostic settings + queries automatically.
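
Two of the KQL functions listed above, ago() and bin(), are easier to remember once you see what they compute. The sketch below mimics them in Python on hypothetical timestamps. It is an analogy for study purposes, not how Log Analytics is implemented, and the helper names are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def ago(span: timedelta, now: datetime) -> datetime:
    """Analogue of KQL ago(1h): the instant `span` before `now`."""
    return now - span


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Analogue of KQL bin(TimeGenerated, 5m): floor ts to a multiple of size."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed reference point
    return ts - ((ts - epoch) % size)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical log timestamps: 1, 3, 7, 8 and 90 minutes before `now`.
events = [now - timedelta(minutes=m) for m in (1, 3, 7, 8, 90)]

# Roughly: | where TimeGenerated > ago(1h)
#          | summarize count() by bin(TimeGenerated, 5m)
recent = [t for t in events if t > ago(timedelta(hours=1), now)]
per_bucket = Counter(bin_time(t, timedelta(minutes=5)) for t in recent)
```

The 90-minute-old event is filtered out, and the remaining four land in two 5-minute buckets (11:50 and 11:55) with two events each.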

Concrete example

You need to know which storage account had the most blob requests in the last 24 hours, by client IP. Steps: (1) on each storage account, create a diagnostic setting sending StorageRead, StorageWrite, and StorageDelete categories to your Log Analytics workspace; (2) wait for ingestion; (3) run the KQL query StorageBlobLogs | where TimeGenerated > ago(24h) | summarize count() by AccountName, CallerIpAddress | top 10 by count_ desc. Without the diagnostic setting in step 1, the data simply doesn't exist, and metrics alone wouldn't break down by IP.
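
To make the summarize and top steps of that query concrete, here is the same grouping done in Python over a handful of made-up rows. The account names and IP addresses are hypothetical sample data, and top_callers is an illustrative helper, not part of any Azure SDK.

```python
from collections import Counter

# Hypothetical rows standing in for StorageBlobLogs records.
rows = (
    [{"AccountName": "stprodlogs", "CallerIpAddress": "203.0.113.10"}] * 3
    + [{"AccountName": "stassets", "CallerIpAddress": "203.0.113.10"}] * 2
    + [{"AccountName": "stprodlogs", "CallerIpAddress": "198.51.100.7"}]
)


def top_callers(rows, n=10):
    """Count rows per (AccountName, CallerIpAddress); return the n largest."""
    counts = Counter((r["AccountName"], r["CallerIpAddress"]) for r in rows)
    return counts.most_common(n)
```

Calling top_callers(rows, 2) yields the stprodlogs / 203.0.113.10 pair with 3 requests first, then stassets / 203.0.113.10 with 2.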

Key takeaway: metrics for fast numeric monitoring, logs (via diagnostic settings to Log Analytics) for forensic detail. KQL is the language of logs — learn enough where / summarize / project to answer "who did what" questions in seconds.

⚡ Mini-quiz

Practise metric-vs-log decisions → study mode (10 questions).

Lesson 8.2 Alert rules and Action Groups

Telemetry only matters if someone is told when it crosses a line. Azure has three alert-rule types and one reusable notification primitive (Action Groups). The exam routinely asks you to pick the right rule type given a condition — most candidates default to "metric alert" even when a "log query alert" is the correct answer.

Key concepts

Metric alert: evaluates a platform-or-custom metric on a frequency (every minute) against a static or dynamic threshold. Fast (sub-minute latency), cheap (~$0.10/month per signal), no KQL. Right for CPU %, request count, queue depth — anything that is a numeric time series.
Log query (scheduled query) alert: runs a KQL query against Log Analytics on a schedule (as often as every minute, though 5 minutes or longer is typical). Right when the condition can only be expressed as a query, such as "more than 5 failed sign-ins from one IP in 10 minutes" or "any 5xx response from the API tier". Slower and pricier than metric alerts but vastly more expressive.
Activity log alert: fires when a control-plane event matches a filter — "anyone deleting a VM in subscription X", "any role assignment created at subscription scope". Free, sub-minute latency, the standard "governance event" alarm.
Action Groups: a reusable bundle of notification targets (email, SMS, voice call, push, webhook, Azure Function, Logic App, Event Hub, ITSM connector). One Action Group is referenced by many alert rules — change the on-call rotation once, every alert follows.
Alert processing rules (formerly action rules) and suppression: alert processing rules change what happens when an alert fires, selected by scope and filter. They can suppress notifications during a maintenance window or attach an action group to every alert in a subscription or resource group. This keeps alert rules clean and handles exceptions declaratively.
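Severity and auto-resolution: severities run from Sev 0 (critical) to Sev 4 (verbose). Stateful alerts (metric alerts, and log query alerts with auto-resolve enabled) resolve when the condition clears, so dashboards reflect current state, not historical fires. Activity log alerts are stateless and fire once per matching event. Action Groups send a notification on both fire and resolve unless explicitly suppressed.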
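
The fire-then-resolve behaviour of a static-threshold metric alert can be sketched as a tiny state machine. This is a simplification under stated assumptions: the function name and the sample CPU values are hypothetical, and real rules aggregate over a window and may wait for several healthy evaluations before resolving.

```python
def evaluate(samples: list[float], threshold: float) -> list[tuple[int, str]]:
    """Walk a metric series; emit (index, "Fired" / "Resolved") on state changes."""
    events = []
    firing = False
    for i, value in enumerate(samples):
        breached = value > threshold
        if breached and not firing:
            events.append((i, "Fired"))
        elif not breached and firing:
            events.append((i, "Resolved"))
        firing = breached
    return events


cpu = [40, 85, 90, 88, 60, 55, 92]  # hypothetical per-minute CPU % averages
```

With an 80% threshold, the alert fires at index 1, resolves at index 4, and fires again at index 6. Samples 2 and 3 produce no new notification because the alert is already in the fired state.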

Concrete example

A pager rotation needs to be notified when a production VM is deleted by ANY user. The right design: an Activity Log alert with the filter Operation: Microsoft.Compute/virtualMachines/delete + Status: Succeeded, scoped to the production subscription, severity Sev 1, action group ag-prod-oncall (PagerDuty webhook + email). A metric alert would not work — there's no metric for "deletion happened". A log query alert would work but is slower and costs more. Activity log alert is the right primitive.
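
The filter in this example boils down to two equality checks on each control-plane event. A rough sketch, assuming a simplified, hypothetical event shape (a flat dict with operationName and status keys); real activity log records are nested and carry many more fields.

```python
def matches(event: dict, operation: str, status: str = "Succeeded") -> bool:
    """Case-insensitive match on operation name and status."""
    return (
        event.get("operationName", "").lower() == operation.lower()
        and event.get("status", "").lower() == status.lower()
    )


VM_DELETE = "Microsoft.Compute/virtualMachines/delete"

# Hypothetical events for illustration only.
deleted = {"operationName": "MICROSOFT.COMPUTE/VIRTUALMACHINES/DELETE",
           "status": "Succeeded", "caller": "alice@contoso.com"}
failed = {"operationName": VM_DELETE, "status": "Failed"}
restarted = {"operationName": "Microsoft.Compute/virtualMachines/restart/action",
             "status": "Succeeded"}
```

Only the first event would page the rotation. The failed delete and the restart do not match the filter.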

Key takeaway: metric alert for numeric thresholds, log query alert when the condition is a query, activity log alert for governance events. Always route through Action Groups — never hard-code recipients on individual alert rules.

⚡ Mini-quiz

Drill alert-rule-type decisions → quick quiz (5 questions).

Lesson 8.3 Azure Backup and Azure Site Recovery

Backup is "I deleted the data, give me a copy". Site Recovery is "the whole region is gone, give me a running system". They sound similar and run in the same Recovery Services Vault, but they answer different questions — and the exam loves to put both choices in the same answer list to see which one you reach for.

Key concepts

Recovery Services Vault: the regional container for Azure Backup AND Azure Site Recovery. One vault per region, per workload group. Storage redundancy (LRS / ZRS / GRS) must be set before the first item is protected; GRS gives cross-region copies of backups for "the primary region is gone" scenarios.
Azure Backup for VMs: snapshot-based, application-consistent on Windows (VSS) and Linux (pre/post scripts). A backup policy defines the schedule (daily / weekly), retention (daily / weekly / monthly / yearly, the GFS pattern), and instant-restore snapshot retention. Soft delete keeps deleted backup data recoverable for 14 days by default (up to 180 days with enhanced soft delete).
Azure Backup — other workloads: SQL on IaaS (transaction-log backups, point-in-time restore to second-level granularity), Azure Files (snapshot-based with policy), SAP HANA, MARS agent for individual files/folders on-prem, Azure Backup Server for Hyper-V / VMware on-prem.
Azure Site Recovery (ASR): continuous replication to a secondary region. Two terms to know cold: RPO (Recovery Point Objective: how much data you can afford to lose) and RTO (Recovery Time Objective: how long restoration takes, typically minutes to hours). Because ASR replicates continuously, its achieved RPO is usually measured in seconds to a few minutes, depending on data change rate and network throughput.
Three failover modes: Test failover creates an isolated copy in the secondary region — non-disruptive, the ONLY mode safe to run on a real Tuesday. Planned failover shuts the primary down cleanly, replicates the last delta, then brings up the secondary — zero data loss, used for region migrations. Unplanned failover brings up the secondary immediately, accepting whatever data was replicated — used during real disasters.
Failback: after primary is healthy again, replicate the secondary's writes back to primary, then planned-failover to primary. Don't skip the test-failover step before going live — it's the only safe way to validate the DR runbook quarterly.
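
RPO and RTO are both simple time differences, which the following sketch makes explicit. The timestamps are hypothetical, and the helper names are illustrative only; they do not correspond to any ASR API.

```python
from datetime import datetime, timedelta


def data_loss(last_recovery_point: datetime, failure: datetime) -> timedelta:
    """Achieved RPO: writes made after the last recovery point are lost."""
    return failure - last_recovery_point


def downtime(failure: datetime, service_restored: datetime) -> timedelta:
    """Achieved RTO: how long users were without the system."""
    return service_restored - failure


def meets(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the achieved value does not exceed it."""
    return achieved <= objective


failure = datetime(2024, 3, 5, 10, 0, 0)
last_point = datetime(2024, 3, 5, 9, 59, 30)  # hypothetical: 30 s before outage
restored = datetime(2024, 3, 5, 10, 25, 0)    # hypothetical: 25 min to fail over
```

Here the achieved RPO is 30 seconds and the achieved RTO is 25 minutes, so objectives of "RPO under 1 minute" and "RTO under 30 minutes" would both be met.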

Concrete example

A regulated workload has two requirements: (1) recover deleted SQL data within 5 minutes of a user error; (2) survive a regional outage with an RPO under 1 minute and an RTO under 30 minutes. Two distinct controls: enable Azure Backup for SQL Server on Azure VMs with transaction-log retention of 30 days (satisfies #1, because point-in-time restore has second-level granularity); separately enable Azure Site Recovery on the SQL Server VMs replicating to a paired region, and run a quarterly test failover to validate the DR runbook (satisfies #2). Both run in the same Recovery Services Vault but solve different problems.

Key takeaway: Backup for granular point-in-time recovery, ASR for regional failover. Both live in the Recovery Services Vault. Memorise the three failover modes — Test (safe), Planned (zero data loss, primary cooperative), Unplanned (immediate, may lose seconds of writes).

⚡ Mini-quiz

Practise Backup-vs-ASR scenarios → study mode (10 questions).

Reinforce what you just read 60 scenario-based questions covering every AZ-104 domain — track your score, no signup.

VNet peering is non-transitive — always

If VNet A is peered to VNet B, and VNet B is peered to VNet C, traffic from A cannot reach C automatically. You must add a direct peering between A and C, or route through a hub VNet using a Network Virtual Appliance (NVA) with UDRs. This is the single most common AZ-104 networking trap.
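
Non-transitivity is easy to model: reachability over peering is a direct lookup, not a graph traversal. A minimal sketch using hypothetical VNet names A, B, and C; it ignores the hub-and-NVA workaround, which relies on routing, not on peering itself.

```python
# Each peering is an unordered pair of VNet names.
peerings = {frozenset({"A", "B"}), frozenset({"B", "C"})}


def can_reach(src: str, dst: str, peerings: set[frozenset[str]]) -> bool:
    """Peering is evaluated pairwise: only a direct peering link counts."""
    return frozenset({src, dst}) in peerings


# Adding a direct A-C peering is what makes A-to-C traffic possible.
with_direct = peerings | {frozenset({"A", "C"})}
```

can_reach("A", "C", peerings) is False even though B is peered with both; it becomes True only with the extra direct peering.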

Redundancy: GRS vs ZRS — which one?

ZRS replicates synchronously across 3 availability zones in one region — best for zone outages. GRS replicates asynchronously to a paired region — best for region outages. GZRS combines both. For exam scenarios asking about "regional disaster recovery" pick GRS or GZRS; for "datacenter failure in the same region" pick ZRS.
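
The rule of thumb above fits in a four-way branch. The function and parameter names are hypothetical, and LRS is included only as the fallback when neither kind of outage needs to be survived.

```python
def pick_redundancy(survive_zone_outage: bool, survive_region_outage: bool) -> str:
    """Choose a storage redundancy option from the failure scope to survive."""
    if survive_zone_outage and survive_region_outage:
        return "GZRS"
    if survive_region_outage:
        return "GRS"
    if survive_zone_outage:
        return "ZRS"
    return "LRS"  # neither requirement: locally redundant storage
```

A "regional disaster recovery" scenario maps to pick_redundancy(False, True), which returns "GRS"; adding zone resilience on top returns "GZRS".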

Availability Sets ≠ Availability Zones

Availability Sets protect against hardware failures within a single datacenter via fault domains and update domains — they give 99.95% SLA. Availability Zones protect against entire datacenter failures by spreading VMs across physically separate buildings — they give 99.99% SLA. The exam will test which one you pick for a given scenario.
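
The gap between 99.95% and 99.99% looks small until it is converted into minutes. A quick worked calculation, assuming a 30-day month; the helper name is illustrative.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime per period that still meets the SLA."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return round(total_minutes * (100 - sla_percent) / 100, 2)


availability_set = allowed_downtime_minutes(99.95)   # about 21.6 minutes
availability_zone = allowed_downtime_minutes(99.99)  # about 4.32 minutes
```

So the Availability Zones SLA allows roughly one fifth of the monthly downtime that the Availability Sets SLA does.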

6-week study plan

Week 1

Identity, governance, and cost management (Modules 1–2). Create users and groups in a free Entra ID tenant. Build a custom RBAC role and assign it at resource group scope. Create a deny policy for unapproved regions and assign it to a subscription. Set up a budget with a 90% alert.

Week 2

Storage accounts and access patterns (Module 3). Create a GPv2 storage account with ZRS. Upload blobs to Hot and Cool tiers and move one to Archive. Generate a user delegation SAS with a 1-hour expiry. Enable the storage account firewall and add a VNet rule. Configure a lifecycle management policy to transition blobs after 30 days.

Week 3

Virtual machines and compute (Modules 4–5). Deploy a VM in an Availability Set (2 fault domains, 5 update domains). Attach a managed Premium SSD data disk and set caching to ReadOnly. Create a VM Scale Set with a CPU-based autoscale rule. Deploy an App Service with a deployment slot and perform a slot swap.

Week 4

Virtual networking and connectivity (Module 6). Create two VNets with non-overlapping CIDRs and set up bidirectional peering. Deploy a VPN Gateway (VpnGw1, route-based) and verify connectivity. Add NSG rules using service tags. Configure a UDR to force-tunnel internet traffic through an NVA.
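
Before creating the two VNets, it helps to confirm that the address spaces really do not overlap, since peering requires non-overlapping ranges. A small check using Python's standard ipaddress module; the CIDR values are example ranges, not a recommendation.

```python
import ipaddress


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """True if the two address spaces share any IP address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


ok_pair = overlaps("10.0.0.0/16", "10.1.0.0/16")   # separate /16 blocks
bad_pair = overlaps("10.0.0.0/16", "10.0.1.0/24")  # /24 sits inside the /16
```

ok_pair is False (safe to peer) and bad_pair is True (pick a different range for one of the VNets).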

Week 5

Load balancing, DNS, and network services (Module 7). Create a Standard public Load Balancer with a health probe and load balancing rule. Create an Application Gateway with a WAF policy and URL path-based routing. Host a custom domain in Azure DNS and link a private DNS zone to a VNet. Test Traffic Manager with weighted routing across two App Service instances.

Week 6

Monitoring, backup, and review (Module 8). Set up a Log Analytics workspace and connect it to VM diagnostics. Create a metric alert for CPU > 80% with an action group sending email. Configure a VM backup policy (daily, 7-day retention). Enable ASR replication for a VM to a paired region and run a test failover. Take the full 60-question CertQuests AZ-104 quiz.

⚠️ Top 4 AZ-104 exam mistakes

Confusing Availability Sets and Availability Zones: Availability Sets protect within one datacenter (fault domains / update domains, 99.95% SLA). Availability Zones protect across datacenters in the same region (99.99% SLA). Exam scenarios that mention "multiple datacenters in the same region" require AZs, not Availability Sets.
Peering transitivity trap: VNet peering is not transitive. A → B → C does not allow A to reach C. Memorise this: it appears in almost every AZ-104 networking scenario.
SAS token type confusion: An account SAS gives access to multiple services. A service SAS is scoped to one service. A user delegation SAS is signed with an Entra ID credential (most secure). The exam tests which type to choose based on the security requirement.
Load Balancer SKU mismatch: Standard Load Balancer requires Standard SKU public IP addresses. You cannot mix Basic and Standard SKUs. Standard LB is also secure by default: inbound traffic is blocked until an NSG on the backend subnet or NICs explicitly allows it, unlike Basic LB, which is open by default.