Top 10 AZ-305 interview questions for Azure architect loops in 2026
The AZ-305 exam proves you can pick a service. The Azure architect interview proves you can pick the right one when it costs ten times more and a wrong call gets noticed. These ten questions are what comes up in 2026 Azure Solutions Architect and Principal Cloud Engineer loops — landing zones, networking topology, identity boundaries, data tier choice, DR design, and the FinOps conversation. They test design judgment, trade-off explanations, and the operational depth a hiring manager wants from someone they pay $145–185k.
If you’re still weighing the cert, start with our AZ-305 ROI breakdown and the foundational AZ-104 interview questions.
The 10 questions
1. Hub-and-spoke vs Azure Virtual WAN — when do you pick which?
Hub-and-spoke is the default for one to three regions with a stable, small-branch footprint: one VNet hub per region, spokes peered in, Azure Firewall or an NVA in the hub, and a VPN or ExpressRoute gateway terminating on-prem traffic. Predictable, well-understood, and route tables you can read in your head.
Virtual WAN earns its premium at 10+ branches, multi-region any-to-any with transit between ExpressRoute and VPN, or when a managed Microsoft backbone is required for SD-WAN integration with Cisco Viptela, Aruba EdgeConnect, or Fortinet Secure SD-WAN. The trade-off: vWAN is a managed service with less knob-level control and a meaningfully higher monthly bill at small scale.
Pick hub-and-spoke first. Migrate to vWAN when branch count or any-to-any transit makes hub-and-spoke route tables a Tuesday-morning maintenance window. The interview signal is that you can name the threshold, not that you reflexively pick the newest service.
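To make "name the threshold" concrete, here is a minimal sketch that encodes the thresholds from this answer as a function. The class and field names are hypothetical, and the cut-offs are the rules of thumb above, not Microsoft guidance.

```python
from dataclasses import dataclass


@dataclass
class NetworkFootprint:
    regions: int
    branches: int
    needs_any_to_any_transit: bool = False  # e.g. transit between ExpressRoute and VPN
    needs_sdwan_integration: bool = False


def pick_topology(fp: NetworkFootprint) -> str:
    """Return 'virtual-wan' once any threshold named in the answer is crossed."""
    if fp.branches >= 10:
        return "virtual-wan"
    if fp.regions > 1 and fp.needs_any_to_any_transit:
        return "virtual-wan"
    if fp.needs_sdwan_integration:
        return "virtual-wan"
    # Default: one to three regions with a small, stable branch footprint.
    return "hub-and-spoke"
```

In a loop, the useful part is saying the conditions out loud; the function only shows that each one is a testable statement rather than a preference.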
2. Azure Front Door vs Application Gateway vs Load Balancer vs Traffic Manager — what goes where?
Four products, four jobs. Memorize the layers, because architect loops will sketch a diagram and ask you to label the boxes.
- Front Door: global L7. Anycast edge in 180+ POPs, WAF, CDN caching, TLS offload, path-based routing across regions. The 2026 default at the public edge.
- Application Gateway: regional L7 inside a VNet. WAF, mTLS, private backends, URL rewrite, cookie-based session affinity. Lives in the landing zone, not at the edge.
- Standard Load Balancer: L4 inside a region. TCP/UDP, no inspection, ultra-low latency. In front of AKS LoadBalancer services, VMSS, or any non-HTTP workload.
- Traffic Manager: DNS-based global routing without proxying. Use only when the workload needs a protocol Front Door can’t carry, or when you must avoid the TLS hop. Increasingly niche in 2026.
A typical production stack: Front Door (WAF) at the edge → Application Gateway per region inside the spoke → Standard Load Balancer or AKS Ingress in front of the workload. The non-obvious signal: explain why you don’t put App Gateway and Front Door in the same diagram for purely regional workloads — double WAF cost and double TLS terminations for no resilience win.
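The four-way split above reduces to two questions (global or regional, HTTP or not) plus one exception. A rough sketch, with a hypothetical function name and parameters chosen for illustration:

```python
def pick_load_balancer(is_http: bool, is_global: bool,
                       avoid_edge_tls_hop: bool = False) -> str:
    """Map a workload to one of the four products described above."""
    if is_global:
        # Global HTTP(S) goes to Front Door unless the edge TLS hop must be avoided;
        # anything Front Door can't carry falls back to DNS-based routing.
        if is_http and not avoid_edge_tls_hop:
            return "Front Door"
        return "Traffic Manager"
    # Regional: L7 inside the VNet, or plain L4 for non-HTTP traffic.
    return "Application Gateway" if is_http else "Standard Load Balancer"
```

This deliberately ignores stacking (Front Door in front of Application Gateway); it only labels a single box in the diagram.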
3. How do you design Entra ID Conditional Access for a mid-size enterprise?
Start from personas and named locations, not from individual policies. The reference layout interviewers like:
- Block legacy auth everywhere. Single policy, no exclusions other than break-glass.
- Require MFA for all users with two break-glass accounts excluded. Break-glass accounts use FIDO2 hardware keys stored offline; their sign-ins are alerted on by Sentinel.
- Require compliant or hybrid-joined device for any privileged role (Global Admin, Security Admin, subscription Owner).
- Sign-in risk and user-risk gating via Entra ID Protection — medium risk forces password change, high risk blocks.
- Privileged Identity Management for every Global Admin and subscription Owner. Time-bound, justification required, MFA on activation.
Two non-negotiables architects get burned on if they skip them: at least two break-glass accounts excluded from every CA policy (because one Conditional Access mistake at 3 a.m. has locked entire tenants out), and What-If + report-only rollout before every enforce flip. The interview signal is that you understand CA as layered defense with a recovery path, not a single “require MFA” checkbox.
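The break-glass rule is easy to state and easy to drift away from, so it is worth checking mechanically. The sketch below assumes policies have been exported as dictionaries loosely shaped like the Microsoft Graph conditionalAccessPolicy resource (displayName, state, conditions.users.excludeUsers). The account IDs and the sample policies are made up.

```python
BREAK_GLASS_IDS = {"bg-account-1", "bg-account-2"}  # hypothetical object IDs


def policies_missing_break_glass(policies: list[dict]) -> list[str]:
    """Names of policies that do not exclude every break-glass account."""
    missing = []
    for policy in policies:
        excluded = set(policy["conditions"]["users"].get("excludeUsers", []))
        if not BREAK_GLASS_IDS <= excluded:
            missing.append(policy["displayName"])
    return missing


# Illustrative export: the second policy forgot one break-glass account.
policies = [
    {"displayName": "Block legacy auth", "state": "enabled",
     "conditions": {"users": {"includeUsers": ["All"],
                              "excludeUsers": ["bg-account-1", "bg-account-2"]}}},
    {"displayName": "Require MFA for all users",
     "state": "enabledForReportingButNotEnforced",
     "conditions": {"users": {"includeUsers": ["All"],
                              "excludeUsers": ["bg-account-1"]}}},
]

print(policies_missing_break_glass(policies))  # ['Require MFA for all users']
```

A check like this could run as a pipeline gate before any policy moves from report-only to enforced.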
4. Walk me through the Azure landing zone you would deploy on day one.
Azure Landing Zone (the modern name for CAF Enterprise Scale) is the answer. The layout interviewers expect:
- Management Group hierarchy: Tenant Root > Platform / Landing Zones / Decommissioned / Sandbox. Platform splits into Identity, Management, Connectivity.
- Azure Policy initiatives applied at MG level: Microsoft Cloud Security Benchmark + ISO 27001 + an internal CIS overlay. Audit-then-deny rollout.
- Connectivity subscription: hub VNet, ExpressRoute or VPN gateway, Azure Firewall Premium, Private DNS resolver, central DNS zones.
- Identity subscription: Entra Domain Services if any legacy AD-joined workloads remain. Otherwise the identity plane sits in Entra ID at tenant level.
- Management subscription: central Log Analytics workspace, Microsoft Sentinel, Azure Backup vaults, Azure Monitor action groups.
- Workload subscriptions peered as spokes, deployed via Bicep or Terraform from a central repo with PR gates and policy-as-code.
The architect signal is three things: you separate platform from workload, you scale policy via Management Groups instead of per-subscription assignments, and you treat the landing zone as code with a deployment pipeline from day one. Candidates who describe ClickOps in the Portal score one tier below candidates who say “all of this is Bicep, gated by a PR review and an Azure DevOps pipeline.”
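Why Management Groups scale better than per-subscription assignments is easiest to see as inheritance over a tree. The following is a toy model only: the hierarchy mirrors the layout above, while the subscription name and the placement of each initiative are assumptions for illustration.

```python
# Child -> parent. The subscription name is hypothetical.
PARENT = {
    "Platform": "Tenant Root",
    "Landing Zones": "Tenant Root",
    "Decommissioned": "Tenant Root",
    "Sandbox": "Tenant Root",
    "Identity": "Platform",
    "Management": "Platform",
    "Connectivity": "Platform",
    "sub-payments-prod": "Landing Zones",
}

# Scope -> initiatives assigned directly at that scope (illustrative placement).
ASSIGNMENTS = {
    "Tenant Root": ["Microsoft Cloud Security Benchmark"],
    "Landing Zones": ["ISO 27001", "Internal CIS overlay"],
}


def effective_initiatives(scope: str) -> list[str]:
    """Walk from the scope up to the root; children inherit parent assignments."""
    found: list[str] = []
    current = scope
    while current is not None:
        found = ASSIGNMENTS.get(current, []) + found
        current = PARENT.get(current)
    return found
```

Two assignments cover every current and future subscription under Landing Zones, which is the whole argument for assigning at the Management Group level.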
5. Storage replication — LRS, ZRS, GRS, RA-GZRS — how do you pick?
Four tiers, doubling in cost. The trap is reflexively picking GRS for “production.”
- LRS (3 copies in one datacenter) — non-critical dev/test, anything regenerable from upstream.
- ZRS (3 copies across availability zones in one region) — the 2026 default for production data that needs to survive a datacenter failure without crossing regions. Most modern Azure workloads land here.
- GRS / RA-GRS (async copy to the paired region; RA makes the secondary readable) — data with cross-region resilience requirements and a tolerable 15-minute RPO.
- GZRS / RA-GZRS — zone resilience plus geo-replication. The highest tier; cost roughly 4× LRS.
Architect-level traps: GRS replication is asynchronous (RPO ~15 min, occasionally higher under regional stress), and account failover is a one-way door under the legacy customer-managed failover model — you cannot fail back without re-replication. The right answer is “ZRS by default; GZRS only when we’ve modeled the RPO and the read pattern justifies the double cost.”
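The "ZRS by default" rule can be written down as a short decision function. This is a sketch of the reasoning in this answer, not a pricing or compliance tool, and the parameter names are invented for the example.

```python
def pick_redundancy(regenerable: bool, needs_cross_region: bool,
                    needs_zone_resilience: bool = True,
                    needs_secondary_reads: bool = False) -> str:
    """Choose a storage redundancy SKU from the requirements discussed above."""
    if regenerable:
        return "LRS"  # dev/test or data that can be rebuilt from upstream
    if not needs_cross_region:
        return "ZRS"  # the production default
    base = "GZRS" if needs_zone_resilience else "GRS"
    # The RA- variants make the secondary region readable.
    return f"RA-{base}" if needs_secondary_reads else base
```

Note what is missing: the function cannot decide whether an asynchronous RPO of roughly 15 minutes is acceptable. That still has to be modeled with the product team.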
6. Design a DR strategy for a tier-1 Azure-hosted application with RPO 15 min, RTO 1 hour.
Warm Standby in the paired region. The layers:
- Compute: AKS or VMSS in primary region active, secondary region scaled to a minimal footprint with the same Bicep templates and a pipeline that can scale it on declared failover (declarative target, not a manual runbook).
- Data: Azure SQL Hyperscale with active geo-replication (sub-second RPO, async commit), or Cosmos DB with multi-region writes if the workload tolerates eventual consistency. Storage on RA-GZRS plus Object Replication for hot containers.
- Networking: Azure Front Door with health-probe-driven priority routing: primary at priority 1, secondary at priority 2. Private DNS zones (a global resource) linked to the VNets in both regions. Hub VNet provisioned in both regions, with ExpressRoute connectivity into each hub.
- Identity: Entra is global, no DR design needed there.
- Operations: runbook + automated failover scripts versioned in the same repo as the infrastructure.
The architect-level closer: a DR design that has never been rehearsed has an effective RTO of “maybe.” The plan is the quarterly failover game day with measured RTO, not the diagram. Interviewers want to hear you mention it before they ask.
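Two pieces of this design are simple enough to model: priority routing across healthy origins, and the RTO you actually measure on a game day. The sketch below is a simplified simulation with hypothetical names. It does not call Front Door, and the timestamps are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Origin:
    name: str
    priority: int  # lower number wins, as with priority 1 and priority 2 above
    healthy: bool


def route(origins: list[Origin]) -> str:
    """Send traffic to the healthy origin with the lowest priority number."""
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        raise RuntimeError("no healthy origin")
    return min(healthy, key=lambda o: o.priority).name


def measured_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """RTO as observed during a rehearsal, not as drawn on the diagram."""
    return service_restored - outage_start


origins = [Origin("primary", 1, True), Origin("secondary", 2, True)]
before = route(origins)           # primary serves while healthy
origins[0].healthy = False        # health probe marks the primary down
after = route(origins)            # traffic shifts to the secondary

rto = measured_rto(datetime(2026, 3, 10, 9, 0), datetime(2026, 3, 10, 9, 42))
meets_target = rto <= timedelta(hours=1)
```

Recording `rto` from every game day gives you a trend to show the interviewer, which is stronger than quoting the target.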
7. How do you enforce governance and cost discipline across 200 subscriptions?
Policy at scope, tags at deploy, chargeback as the cultural lever.
- Management Groups for policy scope. Initiatives (MCSB, ISO 27001, internal overlay) assigned at the right MG level with audit-then-deny rollouts.
- Deny-tag policy on every workload subscription so a resource missing owner, cost-center, or environment tags can’t deploy. The earlier you enforce tags, the easier chargeback becomes.
- Azure Cost Management: subscription budgets with action-group alerts at 50/75/90/100%. Weekly Advisor exports to Log Analytics.
- Reserved Instances and Savings Plans on the steady-state baseline only. Never on the variable layer — that locks you into spend you may not need.
- Chargeback model. Cost lands on the team that spent it. Nothing else aligns incentives at scale — central FinOps teams without chargeback authority become advisory roles that get ignored.
Senior architects describe the cultural lever as much as the technical controls. That’s the signal.
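The two mechanical controls in that list, required tags and budget thresholds, are small enough to sketch. The real enforcement would live in Azure Policy and Cost Management. This Python version only illustrates the logic, using the tag names and percentages from the list above.

```python
REQUIRED_TAGS = {"owner", "cost-center", "environment"}
ALERT_THRESHOLDS = (50, 75, 90, 100)  # percent of budget


def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Required tags that are absent or empty; a non-empty result means deny."""
    present = {key for key, value in resource_tags.items() if value}
    return REQUIRED_TAGS - present


def crossed_thresholds(spend: float, budget: float) -> list[int]:
    """Alert thresholds already reached for the current spend."""
    pct = spend / budget * 100
    return [t for t in ALERT_THRESHOLDS if pct >= t]


print(missing_tags({"owner": "team-a", "environment": "prod"}))  # {'cost-center'}
print(crossed_thresholds(8000, 10000))                           # [50, 75]
```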
8. Private Endpoint vs Service Endpoint vs Public Access — when do you use each?
Three tiers, decreasing public exposure.
- Public access with firewall rules: the PaaS service has a public endpoint, IP allowlist controls access. Acceptable for dev; almost never for production data planes in 2026.
- Service Endpoints: a VNet-level allowlist over the Microsoft backbone. The resource still has a public endpoint, but its firewall accepts traffic only from the subnets you allow. Cheaper than Private Endpoint; survives for cost-sensitive secondary workloads.
- Private Endpoints: place a network interface with a private IP from your VNet in front of the PaaS service (Storage, SQL, Key Vault, Cosmos, App Service). The public endpoint can be disabled entirely. Traffic stays on the Microsoft backbone, works cross-tenant via Private Link.
The 2026 architect default: Private Endpoint for any production PaaS hosting customer data, with Private DNS zones centralized in the connectivity hub and resolved via Azure Private DNS resolver. The detail interviewers probe on: DNS. Private Endpoints fail silently when a spoke can’t resolve the privatelink zone — central DNS resolution is half the design.
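Because the failure mode is DNS, the quickest diagnostic is to resolve the service name from inside the spoke and check whether the answer is a private address. A minimal sketch using only the Python standard library; the storage account name in the usage comment is hypothetical.

```python
import ipaddress
import socket


def resolves_privately(addresses: list[str]) -> bool:
    """True only if at least one address was returned and all are private-range."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


def resolve(fqdn: str) -> list[str]:
    """Resolve through the system resolver; run this from a VM in the spoke."""
    infos = socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


# Usage from inside the spoke (hypothetical account name):
#   resolves_privately(resolve("contosodata.blob.core.windows.net"))
# A False result means the spoke is still getting the public answer,
# so the privatelink zone is not being resolved for this VNet.
```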
9. Cosmos DB vs Azure SQL Hyperscale vs PostgreSQL Flexible Server — design call.
Pick by access pattern, not by vendor preference. The decision matrix:
- Cosmos DB: global writes, single-digit-millisecond latency at any region, schemaless documents, predictable RU/s cost. APIs, gaming, IoT telemetry, real-time leaderboards. Avoid for OLTP that needs cross-document transactions.
- Azure SQL Hyperscale: OLTP that needs strong ACID, full T-SQL, 100+ TB storage, and read-scale via secondary replicas. Line-of-business apps, ISV multi-tenant SaaS where strong consistency is a hard requirement.
- PostgreSQL Flexible Server: open-source compatibility, lower TCO than SQL on commodity workloads, lift-and-shift migrations from on-prem Postgres. Pick it when the team already runs Postgres on EC2 or VMs.
The trap: candidates default to Cosmos because it’s the modern Azure-branded service. The architect-level call is “I’d run the read/write mix, latency target, consistency model, and TCO through a decision matrix — here are the four questions I’d ask the product team first.” Naming the questions matters more than naming the service.
10. AKS in an enterprise landing zone — what does your reference architecture look like?
AKS shows up in roughly two-thirds of 2026 architect loops. The expected layout:
- Private cluster with API server VNet integration. No public API endpoint.
- System and user node pools separated: the system pool carries the CriticalAddonsOnly=true:NoSchedule taint, so workload pods without a matching toleration can’t schedule on it.
- Azure CNI Overlay or Cilium for pod networking. Kubenet is legacy; do not propose it.
- Workload Identity for pod-to-PaaS auth. AAD Pod Identity is deprecated — mentioning it costs you points.
- Azure Key Vault provider for Secrets Store CSI instead of mounting secrets directly.
- Defender for Containers on, with Azure Policy for Kubernetes for admission control (no privileged pods, no hostNetwork, image-pull policy enforcement).
- Ingress: Application Gateway for Containers or NGINX behind App Gateway. Never a public LoadBalancer service in production.
- Lifecycle: Bicep or Terraform with a GitOps reconciler (Flux or Argo) for in-cluster state. Surge node pools and PDB-aware drains during upgrades.
The interview signal is depth on the operational layer — upgrades, surge capacity, PDB strategy, observability — not just the day-one diagram. AZ-305 candidates who can sketch the diagram but stall on “walk me through a control-plane upgrade” flag as junior. Read up on AKS LTS releases and node OS upgrades before the loop.
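If the system-pool taint comes up, it helps to be able to explain the scheduling rule itself. The sketch below is a deliberately simplified model of taints and tolerations. It only handles exact key, value, and effect matches and ignores the Exists operator and the other effects that real Kubernetes supports. The pod names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass
class Pod:
    name: str
    tolerations: list[Taint] = field(default_factory=list)


SYSTEM_POOL_TAINT = Taint("CriticalAddonsOnly", "true", "NoSchedule")


def can_schedule(pod: Pod, node_taints: list[Taint]) -> bool:
    """Simplified rule: every NoSchedule taint needs an exactly matching toleration."""
    return all(
        taint in pod.tolerations
        for taint in node_taints
        if taint.effect == "NoSchedule"
    )


workload = Pod("orders-api")
addon = Pod("coredns", tolerations=[SYSTEM_POOL_TAINT])

on_system_pool = [SYSTEM_POOL_TAINT]
on_user_pool: list[Taint] = []
```

With this model, `can_schedule(workload, on_system_pool)` is false while the add-on pod is admitted, which is the separation the system pool taint is meant to give you.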
What these questions test
The AZ-305 exam screens for service-by-service knowledge. The architect interview screens for design judgment: which service, why this one over that one, where the trade-off bites, and what you do when it bites. Every answer above pivots on a trade-off (hub-and-spoke vs vWAN, ZRS vs GZRS, Private Endpoint vs Service Endpoint, Cosmos vs SQL) and on operational evidence (Log Analytics queries, KQL, Bicep templates, game days). Memorize the resource and service names: privatelink.* DNS zones, Microsoft.Network/azureFirewalls, Workload Identity, Application Gateway for Containers. Architect interviewers screen on whether you reach for them unprompted.
Frequently asked questions
Hub-and-spoke vs Virtual WAN — when do you pick which in 2026?
Hub-and-spoke is the default at one to three regions with a small-branch footprint. Virtual WAN earns its premium at 10+ branches, multi-region any-to-any transit, or SD-WAN integration. Pick hub-and-spoke first; migrate to vWAN when branch count or transit complexity makes hub-and-spoke route tables unmanageable.
Front Door vs Application Gateway vs Load Balancer vs Traffic Manager?
Front Door is global L7 (anycast edge + WAF). Application Gateway is regional L7 inside a VNet (WAF + mTLS + URL rewrite). Standard Load Balancer is regional L4 (TCP/UDP, no inspection). Traffic Manager is DNS-based global routing without proxying. Typical stack: Front Door at the edge → App Gateway per region → LB in front of the workload.
How do you design Entra ID Conditional Access at enterprise scale?
Persona-based policy: block legacy auth everywhere, require MFA for all users, require compliant device for privileged roles, sign-in/user-risk gating via Identity Protection, PIM for every Global Admin and Owner. Two non-negotiables: two FIDO2 break-glass accounts excluded, and What-If + report-only before every enforce flip.
How do you pick LRS vs ZRS vs GRS vs RA-GZRS?
LRS for dev/test, ZRS as the production default (zone resilience in one region), GRS/RA-GRS when cross-region resilience and a 15-minute RPO justify the cost, GZRS/RA-GZRS for the highest tier (zone + geo). Cost roughly doubles per tier. Don’t default to GRS without modeling the read pattern — geo-replication is async with a one-way failover under the legacy model.
What does a tier-1 Azure DR design look like for RPO 15 min, RTO 1 hour?
Warm Standby in the paired region: compute scaled down with the same Bicep templates and a pipeline that can scale on declared failover, SQL Hyperscale with active geo-replication or Cosmos multi-region, Storage on RA-GZRS, Front Door with health-probe priority routing, Private DNS zones linked to the VNets in both regions. The closer: a quarterly failover game day with measured RTO. Designs that have never been rehearsed have an effective RTO of “maybe.”
How we wrote this
No Microsoft, training-vendor, or bootcamp revenue. Questions were sourced from Azure architect and principal cloud engineer interview reports on Reddit (r/AZURE, r/cscareerquestions), the Microsoft Tech Community, LinkedIn interview threads, and the Azure architecture review channels on Microsoft Learn, cross-referenced against the Microsoft Cloud Adoption Framework, the Azure Architecture Center, and the BLS Occupational Outlook for compensation context. Tell us what you’d update.
Last reviewed: June 25, 2026.