Azure DP-100: Data Scientist Associate Exam Guide

What DP-100 actually tests

The DP-100 exam measures your ability to use Azure Machine Learning (Azure ML) to build, train, deploy, and manage machine learning models in a production environment. It is not a pure data science theory exam — you will not be tested on the mathematics of gradient descent or the derivation of backpropagation. It is an engineering certification: can you navigate the Azure ML workspace, configure the right compute, manage data assets, run training jobs, evaluate models responsibly, and deploy to endpoints that can handle real traffic?

Microsoft revised the DP-100 objectives significantly in late 2023, replacing the older Azure ML v1 SDK content with the Azure ML SDK v2 and the Azure ML CLI v2. Candidates who studied for the original DP-100 using Python v1 SDK tutorials will find the exam now expects azure-ai-ml rather than azureml-sdk, YAML-defined jobs rather than RunConfiguration objects, and MLflow for experiment tracking rather than the legacy run logging API. If your study materials reference RunConfiguration or estimators, they are out of date.

The four domains (2025–2026 objectives)

Domain 1 — Design a machine learning solution (20–25%)

This domain tests your ability to define the right Azure ML architecture before writing a single line of training code. Key topics include:

Selecting the appropriate compute type: compute instances for interactive development, compute clusters for training jobs, serverless compute for cost optimisation, and Kubernetes-attached compute for hybrid scenarios.
Configuring Azure ML workspace components: datastores, data assets, environments, and the role of the workspace registry for sharing assets across teams.
Designing for cost efficiency: choosing between on-demand and dedicated clusters, deciding when low-priority VMs are acceptable, and setting autoscaling policies.
Connectivity and security: private endpoints, virtual network integration, managed identity for data access, and role assignments in the workspace.

Domain 2 — Explore data and train models (35–40%)

The heaviest domain, and the one where most of the SDK v2 content lives. You need to be comfortable with the full training workflow:

Data access patterns — Creating and consuming mltable and uri_file/uri_folder data assets; reading from Azure Blob Storage, Azure Data Lake Gen2, and Azure SQL via registered datastores.
AutoML — Configuring AutoML jobs for classification, regression, and time-series forecasting; understanding featurization, validation splits, exit criteria, and primary metric selection. AutoML appears more heavily on the exam than many candidates expect.
Custom training jobs — Writing command and sweep jobs with the SDK v2; structuring training scripts to accept arguments and log metrics via mlflow.log_metric(); building reusable environments from curated images vs. custom Docker images.
Hyperparameter tuning — Configuring sweep jobs with random, grid, and Bayesian sampling; early termination policies (Bandit, Median Stopping, Truncation Selection); understanding the trade-off between exploration and exploitation.
MLflow experiment tracking — Logging parameters, metrics, artefacts, and models with MLflow autologging vs. explicit logging; comparing runs in the Azure ML Studio UI.
Pipelines — Defining multi-step pipelines with the @pipeline decorator or YAML; passing data between steps; scheduling and triggering pipeline runs.
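
To make the command job item above concrete, here is a minimal sketch of a job specification written as a Python dict that mirrors the top-level fields of a CLI v2 command job YAML file (command, code, inputs, outputs, environment, compute). It is paired with a small standard-library check that every ${{inputs.x}} or ${{outputs.x}} placeholder in the command string is actually declared, which is the kind of mismatch scenario questions like to hide. The data asset, environment, and compute names are hypothetical placeholders, and undeclared_references is an illustrative study helper, not part of the Azure ML SDK.

```python
import re

# Matches ${{inputs.name}} and ${{outputs.name}} placeholders in a command string.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def undeclared_references(job: dict) -> list:
    """Return placeholders used in job['command'] that the job never declares."""
    missing = []
    for section, name in PLACEHOLDER.findall(job.get("command", "")):
        if name not in job.get(section, {}):
            missing.append(f"{section}.{name}")
    return missing


# Hypothetical job spec; every "azureml:..." reference is a made-up name.
example_job = {
    "command": (
        "python train.py --data ${{inputs.training_data}} "
        "--lr ${{inputs.learning_rate}} --out ${{outputs.model_dir}}"
    ),
    "code": "./src",
    "inputs": {
        "training_data": {"type": "uri_file", "path": "azureml:titanic-csv:1"},
        "learning_rate": 0.01,
    },
    "outputs": {"model_dir": {"type": "uri_folder"}},
    "environment": "azureml:my-sklearn-env@latest",
    "compute": "azureml:cpu-cluster",
}

if __name__ == "__main__":
    print(undeclared_references(example_job))
```

Deleting learning_rate from inputs while leaving the placeholder in the command makes the helper report inputs.learning_rate, which is the same inconsistency you are expected to spot by eye in an exam snippet.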

Domain 3 — Prepare a model for deployment (20–25%)

Before a model ships, it needs to be evaluated, registered, and packaged. This domain covers:

Model registration — Registering models from training runs, from local files, or from external sources; versioning and tagging; the difference between registering as mlflow_model vs. custom_model.
Responsible AI evaluation — Using the Azure ML Responsible AI dashboard to assess fairness (disaggregated metrics across cohorts), model interpretability (SHAP-based feature importances), error analysis (tree-based error explorer), and counterfactual analysis. The exam tests your ability to read the dashboard and draw the right conclusion — not the underlying mathematics.
Model packaging — Writing scoring scripts (init() and run() functions) for custom deployment; understanding when MLflow model packaging removes the need for a scoring script entirely.
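
The init() and run() pattern from the model packaging item can be sketched as follows. This is a simplified illustration under stated assumptions: the model artefact is a hypothetical weights.json file holding linear weights and a bias, and the request body is assumed to be JSON of the form {"data": [[...], ...]}. A real scoring script would typically load a serialised model with the framework that trained it. AZUREML_MODEL_DIR is the environment variable that the deployment sets to the folder containing the registered model files; the exact sub-path under it depends on how the model was registered.

```python
import json
import os

model = None  # populated once by init(), reused by every run() call


def init():
    """Called once when the deployment container starts: load the model."""
    global model
    model_dir = os.environ["AZUREML_MODEL_DIR"]
    # Hypothetical artefact name and format, chosen to keep the sketch stdlib-only.
    with open(os.path.join(model_dir, "weights.json")) as f:
        model = json.load(f)


def run(raw_data):
    """Called once per request: parse the JSON body, score, return a JSON-serialisable result."""
    rows = json.loads(raw_data)["data"]
    predictions = [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]
    return {"predictions": predictions}
```

The split matters for scenario questions: expensive work such as loading the model belongs in init(), so it is paid once per container rather than once per request.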

Domain 4 — Deploy and retrain a model (10–15%)

The smallest domain by weight but high-stakes in production scenarios:

Managed online endpoints — Deploying to managed online endpoints with blue/green traffic splitting; configuring instance count, SKU, and autoscaling; testing with the Azure ML Studio test UI and REST client.
Batch endpoints — When to use batch vs. online endpoints; configuring parallel scoring for large datasets; monitoring batch endpoint run status.
Model monitoring — Setting up data drift and prediction drift monitors; configuring baseline datasets; understanding alert thresholds and the signal types (data drift, prediction drift, feature attribution drift).
Retraining triggers — Automating retraining with pipeline schedules or monitoring alerts; the pattern of re-registering a retrained model and updating endpoint traffic allocation.
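
To build intuition for what a drift signal and its alert threshold mean, the sketch below computes the population stability index (PSI), one widely used drift statistic, between a baseline sample and a current sample of a single numeric feature. It is a hand-rolled illustration, not the computation Azure ML performs internally; the bin edges and the 0.2 threshold are assumptions chosen for the example, and both samples are assumed to be non-empty.

```python
import bisect
import math


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values per bin; out-of-range values are clamped to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(max(bisect.bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[i] += 1
    # Floor at eps so empty bins do not cause log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def psi(baseline, current, edges):
    """Population stability index: sum over bins of (c - b) * ln(c / b)."""
    b = bin_fractions(baseline, edges)
    c = bin_fractions(current, edges)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def drift_alert(baseline, current, edges, threshold=0.2):
    """Hypothetical alert rule: flag drift when PSI exceeds the threshold."""
    return psi(baseline, current, edges) > threshold
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the baseline. That is why the choice of baseline dataset and threshold determines how often a monitor fires.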

Prerequisites and the broader Azure certification path

Microsoft does not enforce prerequisites for DP-100, but candidates without prior Azure experience will struggle. In practice, DP-100 expects you to already understand core Azure compute and storage concepts at approximately the AZ-900 level — what a storage account is, how resource groups work, what a service principal does. Candidates who also hold or are preparing for AZ-104 will find the workspace networking and RBAC sections much easier.

Within the data and AI certification stack, DP-100 sits at the associate tier. A common companion is the Azure AI Engineer Associate (AI-102), also an associate-level certification, which covers the Azure AI Services portfolio (Computer Vision, Language, Document Intelligence, and Azure OpenAI) and complements DP-100’s focus on classical and custom ML workflows. Many senior Azure ML engineers hold both.

Where DP-100 sits in the Azure AI certification stack

AZ-900 — Azure Fundamentals (prerequisite knowledge, not required)
DP-100 — Azure Data Scientist Associate ← you are here
AI-102 — Azure AI Engineer Associate (Azure AI Services + Azure OpenAI)
DP-203 — Azure Data Engineer Associate (Synapse, Data Factory, Databricks)
AZ-305 — Azure Solutions Architect Expert (enterprise architecture, includes AI/ML patterns)

Salary data for Azure Data Scientists in 2026

DP-100 has moved from a “nice to have” to a line-item requirement in enterprise Azure AI job descriptions over the past two years. The shift accelerated as organisations moved Azure OpenAI fine-tuning and RAG pipelines into Azure ML managed compute, making the workspace skills directly transferable to generative AI workloads.

Current market compensation for DP-100 holders in 2026:

Azure Data Scientist (mid-level, 3–5 years): $120k–$140k base in the US; €75k–€95k in Western Europe.
Senior Azure ML Engineer (5–8 years): $140k–$165k base in the US; €90k–€115k in Western Europe.
Azure AI/ML Architect (DP-100 + AI-102 + AZ-305): $165k–$195k base in the US; roles scarce but compensation at the top of the Azure certification pay scale.

Candidates who pair DP-100 with either AI-102 (for generative AI breadth) or DP-203 (for data pipeline depth) see the strongest offers in 2026, as most enterprise roles now require both the ML engineering component and either a data engineering or applied AI component.

Study strategy: what actually appears on the exam

The DP-100 exam skews heavily toward scenario questions rather than service-knowledge recall. You will rarely be asked “what is the default node count for a compute cluster?” You will be asked “a data science team needs to run 200 parallel hyperparameter sweep trials cost-effectively with minimal idle time between runs — what compute and sampling configuration is correct?”

High-value study areas based on the 2025–2026 objective weights

SDK v2 job syntax (YAML and Python) — You need to recognise correct vs. incorrect command job configurations, input/output declarations, and environment references. Practice writing jobs from scratch, not just reading them.
AutoML configuration — AutoML questions appear disproportionately often relative to the time most candidates spend on it. Know featurization options, primary metrics per task type, and how to interpret the AutoML leaderboard.
MLflow tracking vs. Azure ML logging — The exam tests whether you know when to use mlflow.autolog(), when to call mlflow.log_metric() explicitly, and how these appear in the Azure ML Studio UI.
Responsible AI dashboard interpretation — Practice reading cohort analysis results, identifying model fairness issues from disaggregated metrics, and recommending mitigation strategies. You do not need to code the dashboard — you need to interpret it.
Endpoint types and traffic management — Know when to use online vs. batch endpoints, how to configure blue/green splits, and what happens when an endpoint autoscales.
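
For the traffic management item above, a blue/green rollout comes down to a mapping from deployment name to traffic percentage that is updated in stages. The sketch below generates such a staged plan and validates each mapping. It assumes percentages must be non-negative and sum to 100, or to 0 when the endpoint should receive no traffic; the function names are hypothetical study helpers, not SDK calls.

```python
def validate_traffic(traffic):
    """Raise ValueError unless percentages are non-negative and sum to 0 or 100."""
    total = sum(traffic.values())
    if any(p < 0 for p in traffic.values()) or total not in (0, 100):
        raise ValueError(f"invalid traffic allocation: {traffic}")
    return traffic


def rollout_plan(old, new, steps):
    """Traffic mappings that shift 100% from `old` to `new` in equal steps."""
    plan = []
    for i in range(1, steps + 1):
        new_pct = round(100 * i / steps)
        plan.append(validate_traffic({old: 100 - new_pct, new: new_pct}))
    return plan


if __name__ == "__main__":
    for stage in rollout_plan("blue", "green", 4):
        print(stage)
```

In a real workspace each stage would be applied by updating the endpoint's traffic setting and checking error rates and latency before moving to the next one.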

The candidates who pass DP-100 on first attempt are not the ones who memorised every Azure ML SDK class. They are the ones who understood the end-to-end workflow well enough to reason through scenarios they had never seen — and who spent the bulk of their preparation running actual experiments in an Azure ML workspace, not reading documentation.

Hands-on preparation: the Azure ML workspace advantage

Unlike certifications where you can prepare purely through practice questions, DP-100 rewards candidates who have actually used the Azure ML workspace. Microsoft offers a free trial Azure account with $200 of credit valid for 30 days, which is more than sufficient for DP-100 lab work. A complete four-week preparation workflow on that credit:

Week 1: Provision a workspace, create a compute instance, and run Jupyter notebooks exploring data with an mltable asset. Register a data asset from Azure Blob Storage. Create a simple command job with the SDK v2 and log metrics with MLflow.
Week 2: Run an AutoML classification job on a sample dataset (Titanic or Heart Disease work well). Explore the leaderboard, inspect the best model, register it. Intentionally configure a mismatched primary metric to understand the error.
Week 3: Build a two-step pipeline — a data preprocessing step that outputs a processed dataset, and a training step that consumes it. Schedule the pipeline. Configure a sweep job with 20 child runs using random sampling and a Bandit early termination policy.
Week 4: Deploy the best model from your sweep to a managed online endpoint. Split traffic 80/20 between two deployments. Set up a data drift monitor against a baseline dataset. Practice reading the Responsible AI dashboard for a classification model.
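
Before configuring the Bandit policy in Week 3, it helps to see the rule it applies. The sketch below covers the slack-factor form for a metric that is being maximised: at an evaluation point, a run is stopped if its metric falls below the best run's metric divided by (1 + slack_factor). It is a simplified stand-alone illustration, not the service's implementation. It ignores the evaluation interval and delayed evaluation settings, and the run names and metric values are made up.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric a run may report and still survive (maximisation case)."""
    return best_metric / (1 + slack_factor)


def surviving_runs(metrics, slack_factor):
    """Keep runs whose metric is at or above the Bandit cutoff."""
    cutoff = bandit_cutoff(max(metrics.values()), slack_factor)
    return {name: m for name, m in metrics.items() if m >= cutoff}


if __name__ == "__main__":
    # Hypothetical accuracy values reported at the same evaluation interval.
    reported = {"run_a": 0.80, "run_b": 0.70, "run_c": 0.60}
    print(surviving_runs(reported, slack_factor=0.2))
```

With a best accuracy of 0.80 and a slack factor of 0.2, the cutoff is roughly 0.667, so run_c is stopped while run_a and run_b continue. A smaller slack factor terminates more aggressively, which saves compute at the risk of killing slow starters.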

Four weeks of hands-on work costs approximately $20–40 in Azure compute at the standard student/trial rates. Candidates who complete this workflow arrive at the exam able to reason from experience rather than from memory — which is exactly what the scenario questions reward.

Bottom line for 2026 DP-100 candidates

The DP-100 is a worthwhile investment for any Azure professional moving into ML engineering, data science, or AI platform roles. The exam measures real engineering skill (workspace configuration, SDK v2 fluency, AutoML interpretation, and responsible AI evaluation) rather than theory. Pair it with practice questions to build scenario-recognition speed, and log 20–30 hours of actual Azure ML workspace time to build the intuition the exam expects. The $165 voucher cost and 700/1000 pass threshold make it one of the more accessible associate-level certifications in the Microsoft stack.
