Course Modules
01
Cloud Concepts
3 lessons
Key Concepts
- Cloud Computing Defined: The delivery of computing services — servers, storage, databases, networking, software, analytics, and intelligence — over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale
- Shared Responsibility Model: Security responsibilities are split between the cloud provider and the customer. The provider secures the physical infrastructure; the customer secures their data, identities, and application configurations. The boundary shifts depending on the service model (IaaS, PaaS, SaaS)
- IaaS (Infrastructure as a Service): Provides virtualized computing resources on demand — VMs, networking, and storage. The customer manages the OS, middleware, and applications. Examples: Azure Virtual Machines, Azure Virtual Networks
- PaaS (Platform as a Service): Provides a managed platform for building and deploying applications without managing the underlying infrastructure. The provider handles the OS, patching, and runtime. Examples: Azure App Service, Azure SQL Database
- SaaS (Software as a Service): Fully managed software delivered over the Internet on a subscription basis. The provider manages everything from infrastructure to the application itself. Examples: Microsoft 365, Microsoft Teams, Dynamics 365
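The responsibility split across the three service models can be written down as a small lookup table. The sketch below is illustrative only: the layer names and the `responsible_party` helper are assumptions made for this example, not part of any Azure API, and the split is a simplification of the bullets above.

```python
# Toy model of the shared responsibility split (simplified, hypothetical layer names).
LAYERS = ["physical", "host", "os", "runtime", "application", "data", "identities"]

# Layers the customer manages under each service model; everything else is the provider's.
CUSTOMER_MANAGED = {
    "IaaS": {"os", "runtime", "application", "data", "identities"},
    "PaaS": {"application", "data", "identities"},
    "SaaS": {"data", "identities"},
}


def responsible_party(model: str, layer: str) -> str:
    """Return 'customer' or 'provider' for a layer under a service model."""
    if model not in CUSTOMER_MANAGED:
        raise ValueError(f"unknown service model: {model}")
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return "customer" if layer in CUSTOMER_MANAGED[model] else "provider"


if __name__ == "__main__":
    for model in CUSTOMER_MANAGED:
        row = {layer: responsible_party(model, layer) for layer in LAYERS}
        print(model, row)
```

Note that data and identities stay with the customer in every row, while the physical layer always stays with the provider.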
Key Concepts
- High Availability (HA): Cloud services are designed to provide continuous operation through redundant infrastructure. Azure offers SLAs (Service Level Agreements) guaranteeing uptime percentages — for example, 99.99% uptime means roughly 52 minutes of downtime per year
- Scalability & Elasticity: Scalability lets you add resources to handle increased load (vertical: bigger VM; horizontal: more VMs). Elasticity automatically adjusts resources based on demand in near real time, which helps you avoid both over-provisioning and under-provisioning
- Agility & Speed: Cloud resources can be provisioned in minutes rather than weeks. Teams can quickly spin up development environments, test new ideas, and tear them down when finished, accelerating the innovation cycle
- Disaster Recovery & Geo-Distribution: Cloud providers offer built-in backup, replication, and recovery services across geographically distributed datacenters. Azure Site Recovery and geo-redundant storage ensure business continuity even during regional failures
- CapEx vs OpEx: Traditional IT requires Capital Expenditure (CapEx) — large upfront investment in physical hardware. Cloud computing shifts to Operational Expenditure (OpEx) — pay-as-you-go pricing where you only pay for what you consume, reducing financial risk
Key Concepts
- On-Premises vs Cloud: On-premises requires the organization to purchase, manage, and maintain all hardware and software. Cloud offloads infrastructure management to the provider, letting teams focus on business value rather than datacenter operations
- Public Cloud: Resources owned and operated by a third-party provider (like Microsoft Azure) and shared across multiple tenants. Offers the greatest scalability and lowest upfront cost. No capital expenditure — everything is OpEx
- Private Cloud: Dedicated infrastructure used exclusively by a single organization, either on-premises or hosted by a third party. Provides maximum control and security but requires higher CapEx and operational overhead
- Hybrid Cloud: Combines public and private clouds, allowing data and applications to move between them. Organizations can keep sensitive workloads on-premises while leveraging the public cloud for burst capacity, disaster recovery, or less-sensitive workloads
- Choosing the Right Model: IaaS for maximum control (lift-and-shift migrations), PaaS for developer productivity (web apps, APIs), SaaS for ready-to-use solutions (email, CRM). Many real-world architectures combine all three models
☁ Scenario — picking the right cloud service model
Situation: A 200-person company wants to move email, calendar, and document collaboration to the cloud. The IT team has 2 people and can only manage user accounts — they cannot patch servers, deploy application updates, or maintain uptime.
Decision: This is SaaS (Microsoft 365). The provider manages everything from the physical datacenter to the running application. IT manages only identities, licenses, and data. If the team needed to host their own app, the answer would be PaaS; if they needed full OS control for a legacy system, IaaS.
Shared responsibility check: Under SaaS the customer is still responsible for which users have access, what data they put in, and MFA enrollment. "The provider handles security" is never the full answer — data and identities stay with the customer regardless of model.
- Service-model responsibility: IaaS = you manage OS+app, PaaS = you manage app+data, SaaS = you manage data+identities only. Provider always handles physical + host; customer always handles data + access.
- Scalability means you can add resources (vertical = bigger VM, horizontal = more VMs); elasticity means the platform does it automatically based on load. The exam tests this wording precisely.
- CapEx (datacenter capital upfront) → OpEx (pay-as-you-go) is the cornerstone economic argument. Know SLA math: 99.9% ≈ 8.7h/year downtime, 99.99% ≈ 52 min/year, 99.999% ≈ 5 min/year.
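The SLA figures above follow from simple arithmetic: allowed downtime is the fraction of the year not covered by the uptime percentage. A minimal sketch, assuming a 365-day year (the helper name `max_downtime_minutes` is made up for this example):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def max_downtime_minutes(sla_percent: float) -> float:
    """Maximum yearly downtime, in minutes, permitted by an uptime SLA."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("SLA must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.9, 99.99, 99.999):
        minutes = max_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Each extra "nine" divides the allowed downtime by ten: roughly 525.6 minutes (about 8.76 hours), then 52.6 minutes, then 5.3 minutes.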
02
Azure Architecture
3 lessons
Key Concepts
- Azure Regions: A region is a set of datacenters deployed within a latency-defined perimeter and connected through a dedicated low-latency network. Azure has 60+ regions worldwide, more than any other cloud provider, enabling you to deploy resources close to your users
- Availability Zones: Physically separate locations within an Azure region, each with independent power, cooling, and networking. Deploying across availability zones protects against datacenter-level failures. Not all regions support availability zones
- Region Pairs: Each Azure region is paired with another region within the same geography (e.g., East US paired with West US). During planned maintenance, Azure updates one region at a time. In a widespread outage, one region from each pair is prioritized for recovery
- Sovereign Regions: Isolated Azure instances for government or regulatory compliance needs. Azure Government (US) and Azure China (operated by 21Vianet) are physically and logically separated from the main Azure cloud
- Choosing a Region: Consider proximity to users (latency), service availability (not all services are in all regions), compliance requirements (data residency laws), and pricing (costs vary by region)
Key Concepts
- Azure Resources: Any entity managed by Azure — virtual machines, storage accounts, databases, virtual networks, web apps, and more. Every resource must belong to exactly one resource group
- Resource Groups: Logical containers that hold related Azure resources for an application or project. Resources in a group share the same lifecycle — deploy, update, and delete together. Resource groups cannot be nested, but resources can interact across groups
- Azure Subscriptions: A billing and access control boundary. Each subscription links to an Azure account and provides authenticated and authorized access to Azure resources. Organizations often use multiple subscriptions to separate environments (dev, staging, prod) or departments
- Management Groups: Containers that sit above subscriptions and allow you to apply governance conditions (Azure Policy, RBAC) at scale. Management groups can be nested up to six levels deep, creating a hierarchy that mirrors your organizational structure
- Hierarchy: The full hierarchy is: Microsoft Entra ID tenant (formerly Azure AD) → Management Groups → Subscriptions → Resource Groups → Resources. Policies and RBAC applied at a higher level are inherited by all levels below
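Downward inheritance through the hierarchy can be pictured as walking from a scope up to the root and collecting everything assigned along the way. The sketch below is a toy model, not how Azure stores assignments; the `Scope` class and all scope and policy names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A node in the hierarchy: management group, subscription, resource group, or resource."""
    name: str
    parent: Optional["Scope"] = None
    policies: list[str] = field(default_factory=list)

    def effective_policies(self) -> list[str]:
        """Policies assigned here plus everything inherited from ancestor scopes."""
        inherited = self.parent.effective_policies() if self.parent else []
        return inherited + self.policies


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    mg = Scope("mg-company", policies=["allowed-regions-europe"])
    sub = Scope("sub-dev", parent=mg)
    rg = Scope("webapp-dev-rg", parent=sub, policies=["require-env-tag"])
    print(rg.effective_policies())
```

The resource group ends up with both its own policy and the one assigned two levels above it, without anyone repeating the assignment.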
Key Concepts
- What Is ARM: Azure Resource Manager is the deployment and management layer for Azure. All requests — whether from the Azure portal, CLI, PowerShell, SDKs, or REST APIs — go through ARM, which authenticates and authorizes them before passing them to the Azure service
- ARM Templates: JSON files that define the infrastructure and configuration for your Azure deployment. They enable Infrastructure as Code (IaC) — declarative, repeatable, and version-controlled deployments. You define what you want, and ARM handles the how
- Bicep: A domain-specific language (DSL) that provides a simpler syntax for authoring ARM templates. Bicep files compile down to ARM template JSON but are easier to read and write. Microsoft recommends Bicep over raw JSON for new projects
- Declarative vs Imperative: ARM templates and Bicep are declarative — you describe the desired end state. Imperative approaches (CLI scripts, PowerShell) describe step-by-step commands. Declarative templates are idempotent: deploying the same template multiple times produces the same result
- Benefits of ARM: Consistent management layer, declarative templates, dependency management (resources deploy in correct order), RBAC integration, tagging for cost tracking and organization, and the ability to deploy, manage, and monitor resources as a group
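Idempotency is easier to see in a toy reconciler than in prose. The sketch below is not ARM itself: `deploy` is a hypothetical function that compares a desired state with the current one and only changes what differs. Like ARM's incremental mode, it leaves resources that the template does not mention alone.

```python
def deploy(current: dict[str, dict], desired: dict[str, dict]) -> list[str]:
    """Bring `current` in line with `desired` (in place) and return the changes made."""
    changes = []
    for name, config in desired.items():
        if name not in current:
            current[name] = dict(config)
            changes.append(f"create {name}")
        elif current[name] != config:
            current[name] = dict(config)
            changes.append(f"update {name}")
        # Identical resources are left untouched, so nothing is recorded.
    return changes


if __name__ == "__main__":
    # Hypothetical resource names and settings.
    template = {"storage1": {"sku": "Standard_LRS"}, "vnet1": {"cidr": "10.0.0.0/16"}}
    environment: dict[str, dict] = {}
    print(deploy(environment, template))  # first run creates both resources
    print(deploy(environment, template))  # second run finds nothing to change
```

An imperative script that always runs "create storage1" would fail or duplicate work on the second run; the declarative version converges to the same end state however often it is applied.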
☁ Scenario — designing an Azure resource hierarchy for 3 teams
Situation: You're onboarding Dev, QA, and Production teams. Each needs isolated billing and separate governance policies, but all employees use the same company directory (Entra ID tenant).
Design: One Entra ID tenant (single directory, shared identities). One Management Group with baseline Policy applied (e.g., "allowed regions = West Europe + North Europe"). Three Subscriptions — one per team — for billing isolation and quota separation. Within each subscription, Resource Groups per project lifecycle (e.g., webapp-prod-rg, database-prod-rg). Tags (env=prod, team=production) on every resource for cost reporting.
Why not one subscription? You can't easily split billing or apply different governance rules without subscription-level boundaries. The management group enforces the "must not deploy outside Europe" policy across all three subscriptions automatically — no need to repeat it per subscription.
- Region pairs are typically at least 300 miles apart where possible; planned platform updates roll out to one region of a pair at a time, and geo-redundant storage (GRS) replicates data to the paired region.
- Resource groups are the unit of lifecycle and access — every resource belongs to exactly one. Tags inherit nothing automatically; you must propagate via Policy or scripts.
- ARM templates / Bicep are idempotent — re-running an unchanged deployment is a no-op. ARM is also where RBAC, locks, and Azure Policy are evaluated, before the resource provider sees the request.
03
Core Azure Services
4 lessons
Key Concepts
- Azure Virtual Machines (VMs): IaaS offering that provides full control over the operating system, installed software, and configuration. Choose from Windows or Linux images, select VM sizes (CPU, RAM, disk), and manage patching and updates yourself. Ideal for lift-and-shift migrations and custom workloads
- Azure App Service: A PaaS platform for building, deploying, and scaling web apps, REST APIs, and mobile backends. Supports .NET, Java, Node.js, Python, and PHP. Handles infrastructure management, OS patching, and auto-scaling so developers focus on code
- Azure Container Instances (ACI): The simplest way to run containers in Azure without managing servers or orchestrators. Provides fast startup, per-second billing, and is ideal for short-lived tasks, batch processing, or simple container workloads
- Azure Kubernetes Service (AKS): A managed Kubernetes orchestration service for deploying, scaling, and managing containerized applications at scale. Azure manages the control plane (API server, etcd, scheduler); you manage the worker nodes and application containers
- Azure Functions: A serverless compute service that runs event-driven code without provisioning infrastructure. You write functions triggered by HTTP requests, timers, queue messages, or database changes. Pay only for execution time (consumption plan) — zero cost when idle
Key Concepts
- Azure Virtual Network (VNet): The fundamental building block for private networking in Azure. VNets enable Azure resources to communicate with each other, with the Internet, and with on-premises networks. You define address spaces and subnets. VNets are region-scoped and can be connected via VNet peering
- VPN Gateway: Sends encrypted traffic between an Azure VNet and an on-premises network over the public Internet using IPsec/IKE tunnels (site-to-site VPN) or allows individual clients to connect (point-to-site VPN). Cost-effective for moderate bandwidth needs
- Azure ExpressRoute: A private, dedicated connection from your on-premises infrastructure to Azure that does not traverse the public Internet. Provides higher bandwidth (up to 100 Gbps), lower latency, and more reliability than VPN. Used by enterprises with strict performance or compliance requirements
- Network Security Groups (NSGs): Act as a virtual firewall for controlling inbound and outbound traffic to Azure resources. NSG rules filter by source/destination IP, port, and protocol. Applied at the subnet or network interface level. Evaluated by priority (lowest number = highest priority)
- Azure DNS: A hosting service for DNS domains that provides name resolution using Microsoft's global network of DNS servers. Supports both public DNS zones (Internet-facing) and private DNS zones (VNet-internal name resolution). Does not allow purchasing domain names — use App Service Domains or a third-party registrar
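The NSG priority rule (lowest number is evaluated first, and the first match decides) can be sketched as follows. This is a simplified model that matches on destination port only; the `Rule` class and `evaluate` function are assumptions for illustration, and the fallback `Deny` stands in for the built-in deny-all inbound rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int          # lower number = evaluated earlier
    port: Optional[int]    # None matches any port
    action: str            # "Allow" or "Deny"


def evaluate(rules: list[Rule], port: int, default: str = "Deny") -> str:
    """Return the action of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port is None or rule.port == port:
            return rule.action
    return default


if __name__ == "__main__":
    rules = [
        Rule(priority=200, port=None, action="Deny"),   # block everything else
        Rule(priority=100, port=443, action="Allow"),   # but let HTTPS through first
    ]
    print(evaluate(rules, 443))
    print(evaluate(rules, 22))
```

Because the HTTPS rule has the lower priority number, it wins for port 443 even though a broader deny rule also matches.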
Key Concepts
- Azure Blob Storage: Object storage optimized for massive amounts of unstructured data — images, videos, documents, backups, and logs. Access via REST API, Azure CLI, or client libraries. Supports three blob types: Block (most common), Append (log files), and Page (VHD disks)
- Azure File Storage: Fully managed file shares accessible via the SMB (Server Message Block) and NFS protocols. Can be mounted by cloud and on-premises machines simultaneously. Ideal for replacing or extending on-premises file servers and for shared configuration files
- Azure Queue Storage: A simple message queuing service for storing large numbers of messages (up to 64 KB each). Enables asynchronous communication between application components. Used to decouple services and handle traffic spikes gracefully
- Azure Table Storage: A NoSQL key-value store for semi-structured data. Provides fast access to large datasets without the complexity of a relational database. Suitable for flexible datasets like user profiles, device information, or metadata. Azure Cosmos DB Table API offers a premium alternative
- Storage Access Tiers: Hot (frequently accessed, highest storage cost, lowest access cost), Cool (infrequently accessed, 30-day minimum, lower storage cost), Cold (rarely accessed, 90-day minimum, lower storage cost than Cool), and Archive (rarely accessed, 180-day minimum, lowest storage cost, highest access cost and retrieval latency of hours)
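The minimum retention periods matter because moving or deleting a blob before the period ends usually incurs an early-deletion charge for the remaining days. A minimal sketch of that calculation, using the day counts listed above (the helper name is hypothetical, and actual billing details should be checked against current Azure pricing):

```python
# Minimum retention in days per access tier, as listed above.
MIN_RETENTION_DAYS = {"Hot": 0, "Cool": 30, "Cold": 90, "Archive": 180}


def early_deletion_days(tier: str, days_stored: int) -> int:
    """Days of the minimum retention period still outstanding when a blob leaves the tier."""
    if tier not in MIN_RETENTION_DAYS:
        raise ValueError(f"unknown tier: {tier}")
    if days_stored < 0:
        raise ValueError("days_stored cannot be negative")
    return max(0, MIN_RETENTION_DAYS[tier] - days_stored)


if __name__ == "__main__":
    print(early_deletion_days("Cool", 10))      # deleted after 10 days in Cool
    print(early_deletion_days("Archive", 200))  # kept past the minimum
```

A blob deleted from Cool after 10 days still has 20 days of the minimum outstanding; one kept in Archive for 200 days has none.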
Key Concepts
- Azure SQL Database: A fully managed PaaS relational database engine based on Microsoft SQL Server. Handles patching, backups, monitoring, and high availability automatically. Supports elastic pools for cost-efficient management of multiple databases with varying usage patterns
- Azure Cosmos DB: A globally distributed, multi-model NoSQL database designed for low-latency and high-throughput applications. Supports multiple APIs: NoSQL (formerly SQL/Core), MongoDB, Cassandra, Gremlin (graph), and Table. Offers single-digit millisecond reads and writes, with an SLA of up to 99.999% availability for multi-region accounts
- Azure Database for MySQL: A managed MySQL community edition service in the cloud. Provides built-in high availability, automated backups, and scaling without application downtime. Fully compatible with existing MySQL tools, drivers, and applications
- Azure Database for PostgreSQL: A managed PostgreSQL service. Flexible Server is the recommended deployment mode, offering more control over maintenance, HA configurations, and cost optimization; the older Single Server mode is on Microsoft's retirement path and should not be chosen for new workloads. Supports PostgreSQL extensions and is compatible with existing tools
- Choosing the Right Database: Use Azure SQL for relational workloads requiring T-SQL compatibility. Use Cosmos DB for globally distributed, low-latency NoSQL workloads. Use MySQL or PostgreSQL managed services for applications already built on those open-source engines. The managed service approach reduces administrative overhead for all options
☁ Scenario — choosing the right compute service for a web migration
Situation: A startup runs a Django web app on 3 on-premises Linux VMs. They want to move to Azure, eliminate OS patching overhead, and add auto-scaling for Black Friday traffic spikes — without rewriting any code.
Answer: Azure App Service (PaaS). No OS to patch, built-in auto-scale rules, HTTPS/TLS included, custom domain support, and Django is natively supported. The team just pushes code.
Why not the others? Azure VMs would preserve the existing setup but keep the patching burden. Azure Functions would require rewriting the app as event-driven handlers — too invasive. AKS is powerful but adds Kubernetes operational complexity they don't need. App Service is the sweet spot: less management than VMs, less rework than Functions, less ops than AKS.
Exam pattern: "Reduce infrastructure management" + "no code changes" = App Service. "Respond to events/triggers" = Functions. "Full OS control" = VMs.
- Compute decision tree: VMs = full control / lift-and-shift, App Service = managed web apps, Functions = event-driven serverless, AKS = orchestrated containers.
- Storage tiers (Hot / Cool / Cold / Archive) trade off retrieval cost vs storage cost; Archive is cheapest to store but takes hours to rehydrate.
- Cosmos DB is global, multi-model, single-digit-ms latency — pick it whenever the scenario says "globally distributed" or "low-latency NoSQL". SQL Database is single-region (or read replicas) relational.
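The compute decision tree can be expressed as a few ordered checks. This is a study aid rather than an official selection algorithm: the function name, the flags, and the order in which they are tested are all assumptions for this sketch.

```python
def pick_compute(*, needs_os_control: bool = False,
                 event_driven: bool = False,
                 needs_orchestration: bool = False) -> str:
    """Map coarse requirements to a compute service, following the decision tree above."""
    if needs_os_control:
        return "Virtual Machines"   # full control / lift-and-shift
    if event_driven:
        return "Functions"          # serverless, trigger-based code
    if needs_orchestration:
        return "AKS"                # orchestrated containers
    return "App Service"            # managed web apps and APIs


if __name__ == "__main__":
    # The Django migration scenario: no OS control needed, not event-driven, no Kubernetes.
    print(pick_compute())
    print(pick_compute(event_driven=True))
```

With none of the flags set, as in the Django scenario, the function falls through to App Service.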
04
Azure Identity & Security
3 lessons
Key Concepts
- Microsoft Entra ID (formerly Azure AD): Microsoft's cloud-based identity and access management service. It authenticates users and provides single sign-on (SSO) to thousands of cloud applications including Microsoft 365, Azure, and third-party SaaS apps. Every Azure subscription is associated with an Entra ID tenant
- Tenants, Users & Groups: A tenant represents an organization and contains users, groups, and registered applications. Users can be members (internal to the org) or guests (B2B collaboration). Groups simplify access management by assigning permissions to a group rather than individual users
- Multi-Factor Authentication (MFA): Requires two or more verification methods drawn from different categories: something you know (password), something you have (phone, security key), or something you are (biometrics). Microsoft reports that MFA can block over 99.9% of account compromise attacks, and it is strongly recommended for all users
- Single Sign-On (SSO): Users authenticate once and gain access to all connected applications without re-entering credentials. Reduces password fatigue, improves user experience, and centralizes access management. Supported via SAML, OIDC, and OAuth 2.0 protocols
- Conditional Access: Policies that enforce access requirements based on signals such as user location, device compliance, application sensitivity, and risk level. Example: require MFA when signing in from outside the corporate network or block access from untrusted countries entirely
Key Concepts
- Defense in Depth: A layered security strategy where multiple layers of protection slow down an attacker. The layers from outer to inner: physical security, identity & access, perimeter (DDoS, firewalls), network (segmentation, NSGs), compute (secure VMs, patching), application (secure coding, vulnerability scanning), and data (encryption at rest and in transit)
- Zero Trust Model: Assumes breach — never trust, always verify. Every access request is fully authenticated, authorized, and encrypted regardless of where it originates. The three principles: verify explicitly (authenticate based on all available data points), use least privilege access (JIT/JEA), and assume breach (minimize blast radius, segment access)
- Least Privilege Access: Users and applications receive only the minimum permissions needed to perform their tasks. Just-In-Time (JIT) access grants elevated permissions temporarily when needed and revokes them automatically. Reduces the attack surface and limits damage from compromised accounts
- Network Segmentation: Dividing the network into smaller segments using VNets, subnets, and NSGs to limit lateral movement if an attacker compromises one segment. Microsegmentation applies fine-grained policies at the workload level for even tighter control
- Encryption: Data should be encrypted at rest (Azure Storage Service Encryption, Azure Disk Encryption using BitLocker/DM-Crypt) and in transit (TLS/SSL for data moving between services or to users). Azure manages encryption keys by default, or you can bring your own keys (BYOK) via Azure Key Vault
Key Concepts
- Microsoft Defender for Cloud: A unified security management and threat protection platform that provides a secure score, security recommendations, and advanced threat detection across Azure, hybrid, and multi-cloud environments. Continuously assesses your resources and provides prioritized hardening recommendations
- Azure Key Vault: A cloud service for securely storing and managing secrets (API keys, passwords, connection strings), encryption keys, and SSL/TLS certificates. Applications authenticate to Key Vault using managed identities to retrieve secrets programmatically, avoiding hardcoded credentials in code
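- Azure DDoS Protection: Provides defense against Distributed Denial of Service attacks. Basic infrastructure-level protection is free and automatically enabled for all Azure services. The paid tier (originally called Standard, now offered as DDoS Network Protection and DDoS IP Protection) adds advanced mitigation tuned to your public-facing resources, real-time attack metrics, and diagnostic logging. It targets network-layer (L3/L4) attacks; pair it with a web application firewall for application-layer protection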
- Azure Firewall: A managed, cloud-native network security service that provides stateful packet inspection, application-level filtering (FQDN), threat intelligence-based filtering, and centralized network policy enforcement. Integrates with Azure Monitor for logging and analytics
- Microsoft Sentinel: A cloud-native SIEM (Security Information and Event Management) and SOAR (Security Orchestration, Automation, and Response) solution. Collects security data from across your entire organization, detects threats using AI and analytics, investigates incidents, and automates responses through playbooks
☁ Scenario — enforcing MFA only for remote logins via Conditional Access
Situation: The CFO accesses Azure-hosted financial reports from a personal laptop at home. Company policy: MFA is required whenever a sign-in originates outside the corporate office network. Office logins should remain seamless (no MFA prompt).
Conditional Access policy: Users = Finance group; Cloud apps = All cloud apps; Conditions → Locations = "Any location" EXCEPT the Named Location "Corporate HQ" (defined as the office's public IP range); Grant = Require multi-factor authentication. Save and enable the policy.
Result: Sign-ins from the office IP match the exception and pass through without MFA. Remote sign-ins (home, coffee shop, hotel) fall into "any location not excluded" and trigger the MFA grant control automatically.
Exam note: Conditional Access evaluates signals — user identity, device compliance, location, risk level — and returns a grant or block decision in real time. It is the Zero Trust policy engine in Entra ID.
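The policy in this scenario boils down to two checks: is the user in scope, and does the sign-in come from the excluded named location? The sketch below models only that logic. It is not the Entra ID API; the group membership, the user names, and the office IP range (taken from a range reserved for documentation) are placeholders.

```python
import ipaddress

# Placeholder values: a documentation-only IP range and made-up accounts.
CORPORATE_HQ = [ipaddress.ip_network("203.0.113.0/24")]
FINANCE_GROUP = {"cfo@example.com"}


def requires_mfa(user: str, source_ip: str) -> bool:
    """True if the sketch policy would demand MFA for this sign-in."""
    if user not in FINANCE_GROUP:
        return False  # policy does not apply to this user
    address = ipaddress.ip_address(source_ip)
    in_office = any(address in network for network in CORPORATE_HQ)
    return not in_office  # any location except the named location triggers MFA


if __name__ == "__main__":
    print(requires_mfa("cfo@example.com", "203.0.113.10"))  # office sign-in
    print(requires_mfa("cfo@example.com", "198.51.100.7"))  # remote sign-in
```

The office sign-in matches the exclusion and proceeds without a prompt, while the same user signing in from any other address is asked for MFA.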
- Entra ID ≠ on-prem AD: it's a cloud identity service for SaaS apps; you'd use Entra Domain Services if you need traditional AD features (Kerberos, LDAP) in the cloud.
- Conditional Access is the policy engine — combine signals (user, device, location, risk) with controls (require MFA, block, require compliant device) to gate access dynamically.
- Zero Trust's three pillars: verify explicitly (auth every request), least privilege (just-in-time, just-enough), assume breach (segment + monitor everything).
05
Azure Governance & Compliance
3 lessons
Key Concepts
- Azure Policy: A service that creates, assigns, and manages policies to enforce rules and ensure compliance across your Azure resources. Policies can audit existing resources and prevent non-compliant deployments. Example: enforce that all resources must be deployed in specific regions or require specific tags on resources
- Policy Definitions & Initiatives: A policy definition is a single rule (e.g., "allowed VM sizes"). An initiative is a collection of related policy definitions grouped together for a larger compliance goal (e.g., "ISO 27001 compliance" initiative containing multiple policies). Initiatives simplify management and assignment
- Policy Effects: Deny (block non-compliant resource creation), Audit (allow but flag as non-compliant), Append (add fields to a resource, e.g., add a tag), DeployIfNotExists (deploy a remediation resource if missing), and Disabled (policy is not enforced). Understanding effects is key to predicting policy behavior
- Azure Blueprints: A declarative way to orchestrate the deployment of resource templates, role assignments, policy assignments, and resource groups as a single versioned package. Blueprints enable repeatable, governed environment setup, which suits new subscription provisioning and compliance-ready environments. Blueprints never left preview and Microsoft has announced its deprecation, recommending Template Specs and Deployment Stacks for new work
- Compliance Dashboard: Azure Policy provides a compliance view showing which resources comply with assigned policies and which do not. Non-compliant resources can be remediated manually or automatically using remediation tasks that bring resources into compliance
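How an effect changes the outcome of the same rule can be shown with a toy "allowed regions" check. The sketch below covers only Deny, Audit, and Disabled; the function name and the outcome strings are assumptions for this example and do not mirror the Azure Policy engine's real output.

```python
def evaluate_policy(resource: dict, allowed_regions: set[str], effect: str) -> str:
    """Outcome of an 'allowed regions' rule for a requested deployment."""
    if effect not in {"Deny", "Audit", "Disabled"}:
        raise ValueError(f"effect not modeled in this sketch: {effect}")
    if effect == "Disabled" or resource["location"] in allowed_regions:
        return "allowed"
    if effect == "Deny":
        return "denied"            # deployment is blocked
    return "allowed-flagged"       # Audit: deployed, but marked non-compliant


if __name__ == "__main__":
    europe = {"westeurope", "northeurope"}
    vm = {"name": "vm1", "location": "eastus"}  # hypothetical resource
    for effect in ("Deny", "Audit", "Disabled"):
        print(effect, "->", evaluate_policy(vm, europe, effect))
```

The same non-compliant request is blocked under Deny, deployed but flagged under Audit, and ignored entirely when the policy is Disabled.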
Key Concepts
- Role-Based Access Control (RBAC): Manages who has access to Azure resources, what they can do, and what scope they can access. RBAC uses role assignments that combine a security principal (user, group, service principal, managed identity), a role definition (collection of permissions), and a scope (management group, subscription, resource group, or resource)
- Built-In Roles: Owner (full access including the ability to assign roles), Contributor (full access except assigning roles), Reader (view-only access), and User Access Administrator (manage user access only). Hundreds of service-specific roles exist (e.g., Virtual Machine Contributor, Storage Blob Data Reader)
- Custom Roles: When built-in roles do not match your needs, create custom role definitions with precisely the permissions required. Custom roles are defined in JSON and can be scoped to management groups, subscriptions, or resource groups
- Resource Locks: Prevent accidental deletion or modification of critical resources. Two lock levels: CanNotDelete (resources can be read and modified but not deleted) and ReadOnly (resources can only be read — no modifications or deletions). Locks are inherited from parent scopes
- Scope & Inheritance: RBAC and locks applied at a higher scope (management group or subscription) automatically apply to all child resources. RBAC is additive: when a user has multiple role assignments, the effective permissions are the union of all of them (minus any deny assignments). Locks apply regardless of RBAC role: even an Owner cannot delete a CanNotDelete-locked resource without first removing the lock
Key Concepts
- Azure Pricing Calculator: A web-based tool for estimating the monthly cost of Azure services before deployment. You select services, configure options (region, tier, instance count), and get a detailed cost estimate. Essential for budgeting and proposals
- Total Cost of Ownership (TCO) Calculator: Compares the cost of running workloads on-premises versus in Azure. Input your current infrastructure (servers, databases, storage, networking) and the TCO calculator shows potential savings from migrating to Azure over 1-5 years, including labor, electricity, and datacenter costs
- Azure Cost Management + Billing: A built-in Azure service for monitoring, allocating, and optimizing cloud spending. Provides cost analysis dashboards, budget alerts (notify when spending approaches thresholds), and recommendations for cost optimization. Supports exporting cost data for external analysis
- Tags for Cost Tracking: Name-value pairs applied to Azure resources for organizing and categorizing. Common tags: Environment (prod, dev), Department (finance, engineering), CostCenter, Owner. Tags enable cost allocation by filtering and grouping spending in cost analysis reports
- Cost-Saving Strategies: Azure Reservations (commit to 1- or 3-year terms for up to 72% savings on VMs, databases, and storage), Azure Spot VMs (use spare capacity at deep discounts for interruptible workloads), right-sizing (choose the smallest VM size that meets performance needs), and auto-shutdown for dev/test VMs
☁ Scenario — preventing accidental deletion with a resource lock
Situation: A junior admin accidentally deleted the production SQL server last month during routine cleanup, causing 4 hours of downtime. Your task: prevent this from happening again, but don't block routine deployments or configuration updates.
Solution: CanNotDelete lock on the production resource group. Navigate to the resource group → Settings → Locks → Add lock. Set Name = "prod-no-delete", Lock type = Delete (the portal's label for CanNotDelete). Save.
Why CanNotDelete and not ReadOnly? ReadOnly would block all writes including legitimate deployments and configuration changes — too restrictive. CanNotDelete allows updates and new resource creation, but prevents the delete operation. Developers can still deploy; they just can't destroy production resources.
Key exam point: Locks override RBAC. Even a user with the Owner role cannot delete a CanNotDelete-locked resource without first removing the lock — which creates an intentional, auditable pause before destruction.
- Azure Policy is preventive (denies the deployment) — RBAC is permission-based (who can do what). Use Policy for "must" rules, RBAC for "who". Both inherit downward through the hierarchy.
- Resource locks: ReadOnly blocks modifications, CanNotDelete blocks delete but allows changes. They override RBAC: even an Owner can't delete a locked resource without removing the lock first.
- Budgets in Cost Management trigger alerts, not hard caps, so they don't stop services. To actually stop spending you must script automation against the alert (e.g., shut down VMs).
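The "alerts, not caps" behavior of budgets can be modeled as a threshold check that reports which alerts have fired and does nothing else. A minimal sketch, assuming alert thresholds at 50%, 80%, and 100% of the budget (both the thresholds and the function name are made up for this example):

```python
def fired_thresholds(budget: float, spend: float,
                     thresholds: tuple[int, ...] = (50, 80, 100)) -> list[int]:
    """Percent thresholds that current spend has reached; spending itself is never blocked."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent_used = spend / budget * 100
    return [t for t in thresholds if percent_used >= t]


if __name__ == "__main__":
    print(fired_thresholds(1000, 850))   # 85% of budget used
    print(fired_thresholds(1000, 1200))  # over budget: all alerts fire, services keep running
```

Even at 120% of the budget the function only returns a list of alerts; any shutdown has to be automation you attach to those alerts yourself.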
06
Monitoring & Management Tools
3 lessons
Key Concepts
- Azure Portal: A web-based graphical interface for managing all Azure resources. Provides dashboards, resource creation wizards, and monitoring views. Ideal for visual exploration, one-off tasks, and learning Azure services. Accessible from any browser without installing software
- Azure CLI: A cross-platform command-line tool for managing Azure resources using concise commands (e.g., az vm create). Available on Windows, macOS, and Linux. Commands follow a consistent az [group] [subgroup] [action] pattern. Ideal for scripting and automation in Bash environments
- Azure PowerShell: A set of cmdlets for managing Azure resources using PowerShell syntax (e.g., New-AzVM). Runs on Windows, macOS, and Linux via PowerShell Core. Preferred by administrators already familiar with PowerShell scripting and the verb-noun naming convention
- Azure Cloud Shell: A browser-based shell environment accessible directly from the Azure portal. Supports both Bash (with Azure CLI) and PowerShell. Pre-configured with Azure tools, Git, and common utilities. Requires an Azure Storage account for persisting files across sessions
- Azure Mobile App: Allows you to monitor Azure resources, check service health, run Cloud Shell commands, and respond to alerts from your mobile device. Useful for on-the-go management and receiving push notifications for critical events
Key Concepts
- Azure Monitor: The comprehensive monitoring platform for Azure that collects, analyzes, and acts on telemetry from cloud and on-premises resources. It gathers metrics (numerical time-series data) and logs (structured records of events) into a centralized data platform for visualization, alerting, and automation
- Log Analytics Workspace: A centralized repository for log data collected by Azure Monitor. You query this data using Kusto Query Language (KQL) to search, filter, and analyze logs. KQL queries power dashboards, alerts, and workbooks. Example: find all failed login attempts in the past 24 hours
- Application Insights: An Application Performance Management (APM) feature of Azure Monitor for live web applications. Automatically detects performance anomalies, tracks request rates, response times, failure rates, and dependency calls. Supports .NET, Java, Node.js, Python, and JavaScript applications
- Alerts & Action Groups: Azure Monitor alerts notify you when specific conditions are met in metrics or logs (e.g., CPU > 90% for 5 minutes, or a critical error in application logs). Action groups define what happens when an alert fires: send email/SMS, call a webhook, trigger an Azure Function, or create an ITSM ticket
- Metrics vs Logs: Metrics are lightweight numerical values sampled at regular intervals (e.g., CPU percentage every minute) — ideal for real-time monitoring and alerts. Logs are detailed, time-stamped records (e.g., full request traces, error messages) — ideal for deep investigation and root cause analysis
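To make the alert example above concrete ("CPU > 90% for 5 minutes"), here is a simplified model of a metric alert rule and an action group. It is a sketch under stated assumptions, not how Azure Monitor is implemented: the rule averages one-minute samples over the most recent window, and `MetricSample`, `alert_fires`, and `run_action_group` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    minute: int          # minutes since the start of the series
    cpu_percent: float   # one sample per minute (assumed interval)


def alert_fires(samples, threshold=90.0, window_minutes=5):
    """True if average CPU over the most recent window exceeds the threshold."""
    if not samples:
        return False
    latest = max(s.minute for s in samples)
    recent = [s.cpu_percent for s in samples
              if s.minute > latest - window_minutes]
    return sum(recent) / len(recent) > threshold


def run_action_group(actions, message):
    """Call every action in the group with the alert message."""
    for action in actions:
        action(message)


calm = [MetricSample(i, v) for i, v in
        enumerate([20, 25, 30, 22, 28, 95, 40, 35, 30, 25])]
busy = [MetricSample(i, v) for i, v in
        enumerate([20, 25, 30, 22, 28, 92, 95, 97, 93, 96])]

sent = []  # stand-in for real email/webhook delivery
action_group = [
    lambda msg: sent.append(("email", msg)),
    lambda msg: sent.append(("webhook", msg)),
]

for name, series in (("calm", calm), ("busy", busy)):
    if alert_fires(series):
        run_action_group(action_group, f"{name}: CPU above 90% for 5 minutes")

print(sent)
```

A single one-minute spike (the 95 in the calm series) does not fire the rule because the window average stays low. Only the sustained load in the busy series triggers both actions.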
Key Concepts
- Azure Advisor: A personalized cloud consultant that analyzes your deployed resources and provides actionable recommendations across five categories: Reliability (high availability improvements), Security (vulnerability fixes), Performance (speed optimizations), Cost (spending reduction opportunities), and Operational Excellence (best practices)
- Advisor Recommendations: Each recommendation includes a description, impact level (High, Medium, Low), and step-by-step remediation instructions. Examples: resize underutilized VMs to save costs, enable MFA for admin accounts, configure backups for unprotected databases. Some recommendations can be applied directly from the Advisor dashboard
- Azure Service Health: Provides a personalized view of the health of Azure services and regions that you use. Three components: Azure Status (broad view of all Azure service health), Service Health (targeted view of services and regions you use, with alerts for outages and planned maintenance), and Resource Health (health of your specific individual resources)
- Service Health Alerts: Configure alerts to be notified when Azure service issues, planned maintenance, or health advisories affect your subscriptions and resources. Alerts can trigger email, SMS, webhook, or logic app actions. Proactive notification helps you prepare for maintenance and respond quickly to incidents
- Advisor vs Monitor vs Service Health: Azure Advisor provides proactive best-practice recommendations. Azure Monitor collects and analyzes operational telemetry from your resources. Service Health tracks the health of Azure platform services. Together, they provide a complete picture of optimization, operations, and platform reliability
☁ Scenario — diagnosing a $3,000 bill spike
Situation: This month's Azure invoice came in $3,000 higher than last month's. Your manager wants an explanation before tomorrow's meeting.
Step 1 — Cost Management: Portal → Cost Management + Billing → Cost Analysis. Set date range to current month vs prior month. Group by Resource type → Virtual Machines cost doubled. Group by Resource → four new D4s_v3 VMs show up that weren't there last month, each running 24/7.
Step 2 — Azure Advisor: Portal → Advisor → Cost tab. Advisor flags those same four VMs: average CPU utilization under 5% over the past 7 days. Recommendation: right-size from D4s_v3 ($0.192/hr) to D2s_v3 ($0.096/hr). Estimated monthly saving: about $280 (4 VMs × $0.096/hr saved × roughly 730 hours).
Step 3 — Set a budget: Cost Management → Budgets → Create budget at the previous month's baseline + 10%. Add an alert at 80% threshold → email the team. Note: budgets send alerts; they do not stop provisioning.
For the meeting: "Cost spike traced to 4 oversized VMs provisioned for load testing and never shut down. Right-sizing will save about $280/month. Budget alert added to catch this next time."
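The right-sizing and budget arithmetic in this scenario can be checked with a few lines of Python. This is a rough sketch: it uses only the hourly rates quoted in Step 2, assumes about 730 billable hours per month (8,760 hours / 12), and ignores disks, licensing, and other charges. The $10,000 baseline in the budget example is a made-up figure, since the scenario does not state last month's total.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def monthly_cost(hourly_rate, vm_count, hours=HOURS_PER_MONTH):
    """Compute-only monthly cost for VMs running 24/7."""
    return hourly_rate * vm_count * hours


def budget_and_alert(baseline, headroom=0.10, alert_at=0.80):
    """Budget = baseline + headroom; the alert fires at a fraction of it."""
    budget = baseline * (1 + headroom)
    return budget, budget * alert_at


current = monthly_cost(0.192, 4)  # four D4s_v3 VMs at the quoted rate
resized = monthly_cost(0.096, 4)  # four D2s_v3 VMs at the quoted rate
saving = current - resized

print(f"current: ${current:,.2f}")  # current: $560.64
print(f"resized: ${resized:,.2f}")  # resized: $280.32
print(f"saving:  ${saving:,.2f}")   # saving:  $280.32

# Hypothetical baseline of $10,000 for last month's spend.
budget, alert = budget_and_alert(10_000)
print(f"budget ${budget:,.2f}, alert at ${alert:,.2f}")
# budget $11,000.00, alert at $8,800.00
```

At these rates, resizing alone recovers roughly $280 per month. Shutting the VMs down once load testing ends removes the remaining compute charge as well.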
- CLI vs PowerShell: az ... uses dash-flags (cross-platform, JSON output); Get-AzVM-style cmdlets return PowerShell objects (richer object pipelines, also cross-platform). Both can do everything the portal does.
- Azure Monitor is the umbrella: Metrics (numeric, real-time), Logs (queryable with KQL in Log Analytics workspaces), Alerts (rules on metrics/logs), and Application Insights (app-level APM) all live under it.
- Azure Advisor surfaces five recommendation pillars: cost, security, reliability, operational excellence, performance. Service Health alerts you to platform-side incidents affecting your subscription.