CompTIA Data+ DA0-001

Data+ is the certification that separates people who consume data from people who work with it.

Every organisation collects data. The people who know how to query a database to extract the right subset, profile a dataset to identify quality issues before analysis begins, choose the correct chart type to make a trend visible rather than obscure it, and apply the right statistical test to a business question—those are the people CompTIA Data+ certifies. The exam is not a data science credential; it does not require machine learning implementation or Python programming at depth. Data+ tests the analytical reasoning and tooling knowledge that separates a data-literate professional from someone who can only read reports others have built.

The credential targets three audiences: IT generalists adding data skills to their profile, business analysts formalising self-taught analytics knowledge with a vendor-neutral certification, and career changers transitioning from operations, finance, or support roles into data analyst positions. The exam assumes familiarity with spreadsheets and basic database concepts but does not require prior certification. Candidates who have passed CompTIA A+, Network+, or Security+ will recognise the exam format and question style, though Data+ question scenarios are heavier on interpretation and reasoning than on memorised definitions.

What DA0-001 tests: the five domains

Domain 1 — Data Concepts and Environments (15%)

The foundational domain covering the landscape of data systems, storage formats, and processing paradigms. Questions test whether candidates can identify the right tool for a given data scenario rather than memorise definitions.

Structured, semi-structured, and unstructured data: Structured data lives in relational databases with a fixed schema—every row has the same columns. Semi-structured data has a flexible schema defined in the data itself (JSON, XML, CSV with variable fields). Unstructured data (images, audio, video, email text) has no predefined schema. The exam tests which format matches a described source system: a customer transaction table is structured; a sensor log in JSON is semi-structured; call centre recordings are unstructured. Processing unstructured data for analytics requires transformation into a queryable format, which is a frequent scenario question pattern.
Databases: relational vs. non-relational: Relational databases (RDBMS) store data in normalised tables with primary and foreign key relationships, enforcing referential integrity. SQL is the query language. Non-relational databases (NoSQL) optimise for specific access patterns: document stores (MongoDB) for flexible JSON-like schemas, key-value stores (Redis) for fast lookup by ID, column-family stores (Cassandra) for wide-row time-series data, and graph databases (Neo4j) for relationship traversal. The exam tests when to recommend each type: high-volume, low-latency lookups by a known key map to key-value; complex many-to-many relationships map to relational or graph; rapidly evolving schemas map to document.
Data warehouses, data marts, and data lakes: A data warehouse stores integrated, cleaned, historical data optimised for analytical queries—OLAP workloads with aggregations across large time ranges. A data mart is a subject-specific subset of a warehouse (sales data mart, finance data mart) serving a specific business unit with pre-aggregated metrics. A data lake stores raw data in its native format (structured, semi-structured, or unstructured) at low cost, with schema applied on read rather than on write. The exam tests the distinction: a request for a single source of truth for historical reporting maps to data warehouse; a request for low-cost storage of raw sensor streams for future analysis maps to data lake; a request for a department-specific reporting database maps to data mart.
ETL vs. ELT and OLTP vs. OLAP: ETL (Extract, Transform, Load) transforms data before loading it into the target system, which is appropriate when the destination is a traditional data warehouse with a rigid schema. ELT (Extract, Load, Transform) loads raw data first and transforms it inside the destination system, which is preferred for cloud data warehouses (Snowflake, BigQuery, Redshift) that have enough compute to run transformations after loading. OLTP (Online Transaction Processing) databases handle high-volume, low-latency read/write operations for application transactions. OLAP (Online Analytical Processing) databases handle complex aggregation queries across large historical datasets for reporting. The exam tests which processing model fits a given workload: processing customer orders in real time maps to OLTP; running a quarterly sales summary across three years of transactions maps to OLAP.
Common file formats: CSV (comma-separated values) for flat tabular data, simple to produce but carrying no schema or data types; JSON (JavaScript Object Notation) for nested, flexible structures; XML (Extensible Markup Language) for hierarchical data that can be validated against a schema (XSD); Parquet (columnar binary format) for efficient analytical queries on large datasets in data lakes; Avro (row-based binary format with schema) for streaming data pipelines. The exam tests which format is appropriate for a described use case: Parquet for analytical queries on a data lake, CSV for simple flat exports from spreadsheet tools, JSON for API responses with nested objects.
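
The difference between a flat format and a nested one is easier to see with a record in hand. The sketch below is illustrative only: the order record and its field names are made up, and it uses only the Python standard library. It serialises the same order as JSON (nesting preserved) and as CSV (one row per line item, with order-level fields repeated).

```python
import csv
import io
import json

# Hypothetical order record with nested customer details and line items.
order = {
    "order_id": 1001,
    "customer": {"id": "C-17", "region": "EMEA"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-9", "qty": 1},
    ],
}

# JSON keeps the nesting intact: one document per order.
as_json = json.dumps(order)
restored = json.loads(as_json)

# CSV is flat, so the nested order becomes one row per line item,
# with the order-level fields repeated on every row.
flat_rows = [
    {
        "order_id": order["order_id"],
        "customer_id": order["customer"]["id"],
        "region": order["customer"]["region"],
        "sku": item["sku"],
        "qty": item["qty"],
    }
    for item in order["items"]
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat_rows[0].keys()))
writer.writeheader()
writer.writerows(flat_rows)
as_csv = buffer.getvalue()

# Reading the CSV back yields strings only: the format carries no types.
read_back = list(csv.DictReader(io.StringIO(as_csv)))

print(as_csv)
print(type(restored["items"][0]["qty"]).__name__)  # type after JSON round trip
print(type(read_back[0]["qty"]).__name__)          # type after CSV round trip
```

The JSON round trip returns the quantity as an integer, while the CSV round trip returns it as text. That is the kind of data type issue profiling is meant to catch after a flat-file import.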

Domain 2 — Data Mining (25%)

The heaviest domain by weight, covering the practical mechanics of acquiring, profiling, manipulating, and cleaning datasets. This is where the exam tests whether candidates can actually work with data, not just describe it.

SQL fundamentals: Data+ tests SQL at the level required to extract and manipulate data—not database administration or query optimisation. The core topics: SELECT statements with WHERE filters, ORDER BY, GROUP BY, and HAVING for aggregation. Aggregate functions: COUNT, SUM, AVG, MIN, MAX. Joins: INNER JOIN (returns rows matching in both tables), LEFT JOIN (returns all rows from the left table and matching rows from the right; unmatched right-side columns are NULL), RIGHT JOIN (opposite), and FULL OUTER JOIN (returns all rows from both tables). Subqueries (a SELECT nested inside another SELECT) and basic views. The exam presents a data requirement and asks which SQL construct or join type satisfies it.
Data profiling: Profiling a dataset before analysis identifies its characteristics and potential quality issues. Key profiling activities: checking row count and column count; examining data types (are numeric columns stored as text?); computing value distributions (frequency counts, min/max, percentiles); identifying null counts per column; detecting duplicate rows; checking referential integrity (do foreign key values exist in the referenced table?). The exam tests which profiling output reveals a specific problem: a column with 40% null values has a completeness issue; a numeric column containing the string “N/A” has a data type issue; two tables with the same customer appearing under different IDs have a deduplication issue.
Data manipulation: Transforming raw data into an analysis-ready format. Filtering: removing rows outside the scope of the analysis (date ranges, geographic regions, product categories). Sorting: ordering rows by one or more columns for ranking or sequential analysis. Aggregating: collapsing rows into summary statistics (total sales by region, average order value by month). Pivoting: rotating row values into columns to create a cross-tabulation (one row per sales rep, one column per product, each cell holding that rep's monthly sales for the product). Transposing: swapping rows and columns so that each original row becomes a column. The related unpivot (melt) operation converts wide-format data (one column per time period) to long format (one row per time period) for consistent analysis across time series. These transformations are frequently tested through scenario questions describing a messy dataset and asking which manipulation step is needed next.
Data cleansing: Handling the four most common data quality problems. Null values: replace with a calculated substitute (mean imputation for continuous data, mode imputation for categorical data), remove rows with nulls when the null percentage is low and the missing data is not systematic, or leave as null and exclude from calculations where appropriate. Duplicates: identify duplicate rows using composite keys (customer ID + order date + amount) and deduplicate by retaining one record per unique combination—the most recent or the one with the most complete data. Outliers: values more than 2–3 standard deviations from the mean may be errors (a customer age of 250 is a data entry error) or valid extremes (a single very large transaction) requiring domain judgment before removal. Inconsistent formatting: date formats mixed between MM/DD/YYYY and YYYY-MM-DD, city names with varying capitalisation or abbreviations, currency values with and without symbols—all require standardisation before joining or aggregating across sources.
Data acquisition methods: How data enters an analysis pipeline. Manual data entry: high error rate, requires validation rules at input. Database query: precise extraction from an existing RDBMS using SQL. API calls: retrieving data from external services in JSON or XML; requires authentication (API keys, OAuth) and rate limit management. File imports: CSV, Excel, or JSON files delivered via FTP, S3, or email—batch processing with scheduled imports. Web scraping: extracting data from HTML pages using parsing tools; appropriate only when no API or structured export exists and terms of service permit it. The exam tests which acquisition method is appropriate for a given data source scenario.
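
The join, aggregation, and profiling ideas in this domain can be tried without installing a database. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, columns, and rows are invented for illustration, and only INNER and LEFT joins are shown because older SQLite builds do not support RIGHT or FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: customer 3 has no orders and no region,
# and order 13 points at a customer that does not exist.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada', 'North'), (2, 'Ben', 'South'), (3, 'Cy', NULL);
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 70.0), (12, 2, 20.0), (13, 9, 15.0);
""")

# INNER JOIN: only rows that match in both tables.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL (None) when unmatched.
left = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# GROUP BY with HAVING: customers whose total order value exceeds 100.
big_spenders = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    HAVING SUM(o.amount) > 100
""").fetchall()

# Profiling: null count for one column (a completeness check).
null_regions = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE region IS NULL"
).fetchone()[0]

# Profiling: orders whose customer_id has no match (a referential integrity check).
orphans = conn.execute("""
    SELECT o.order_id
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

print(inner)
print(left)
print(big_spenders, null_regions, orphans)
```

Comparing the INNER and LEFT results shows the row for the customer without orders appearing only in the LEFT JOIN, with None in the amount column. That is the behaviour the join-type questions turn on.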

Domain 3 — Data Analysis (23%)

This domain tests statistical reasoning, analysis type selection, and the ability to interpret results correctly. Data+ does not require advanced statistics at the level of a data scientist, but it does require understanding the right tool for each analysis question.

Descriptive statistics: Summarise the central tendency and spread of a dataset. Measures of central tendency: mean (arithmetic average, sensitive to outliers), median (middle value when sorted, robust to outliers), and mode (most frequent value, useful for categorical data). Measures of spread: range (max minus min), variance (average squared deviation from the mean), standard deviation (square root of variance, in the same units as the data), and interquartile range (IQR, the range of the middle 50% of values, robust to outliers). The exam tests when to use each measure: a salary dataset with a few executives earning ten times the median benefits from median over mean; a manufacturing process tolerance analysis benefits from standard deviation to quantify variation; a dataset with heavy outliers benefits from IQR for spread description.
Types of analysis: The four analytical approaches map to progressively harder business questions. Descriptive analysis answers “what happened?”—summarising historical data without interpretation (total monthly sales, customer count by region, product return rate). Diagnostic analysis answers “why did it happen?”—drilling into the data to find causes of an observed outcome (why did sales drop 15% in Q3? correlation with a competitor launch, a supply chain delay, or a marketing budget cut). Predictive analysis answers “what will happen?”—using historical patterns to forecast future outcomes (next quarter sales volume, customer churn probability). Prescriptive analysis answers “what should we do?”—recommending actions based on predicted outcomes (which customers to target with a retention offer, how to allocate inventory across regions). The exam tests which analysis type matches a described business question.
Correlation and regression: Correlation measures the strength and direction of the linear relationship between two variables. Pearson correlation coefficient (r) ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear relationship. Positive correlation: as one variable increases, the other tends to increase (advertising spend and sales). Negative correlation: as one variable increases, the other tends to decrease (price and demand). Correlation does not imply causation—the exam tests this distinction frequently. Linear regression models the relationship between a dependent variable (what you want to predict) and one or more independent variables (predictors), producing a line that minimises the sum of squared residuals. Multiple regression extends this to several predictors simultaneously.
Hypothesis testing concepts: Hypothesis testing evaluates whether an observed result is statistically significant or within the range of chance variation. The null hypothesis (H₀) states there is no effect or difference; the alternative hypothesis (Hₐ) states there is an effect. A p-value below the significance threshold (typically 0.05) means the result is unlikely under the null hypothesis, supporting rejection of H₀. Type I error (false positive): rejecting a true null hypothesis, for example concluding a treatment works when it does not. Type II error (false negative): failing to reject a false null hypothesis, that is, missing a real effect. The exam tests these concepts at a definitional level rather than requiring manual calculation of test statistics.
Trend analysis and time series: Identifying patterns in data over time. Trend: the long-term direction of a metric (growing, declining, or stable). Seasonality: predictable periodic fluctuations tied to the calendar (higher retail sales in Q4, lower service demand in summer). Cyclical variation: longer-term fluctuations not tied to a fixed calendar period. Noise: random variation that cannot be predicted. Moving averages smooth out short-term fluctuations to make the underlying trend more visible—a 3-month moving average averages the current month with the two prior months. Year-over-year (YoY) comparison removes seasonal effects by comparing the same period in consecutive years. The exam tests which analysis technique answers a specific trend or seasonality question.
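
Several of the measures in this domain are quick to compute by hand, which helps when checking intuition against a scenario question. The sketch below uses made-up salary, sales, and advertising figures and only the Python standard library. It shows the mean being pulled by an outlier while the median and IQR stay put, a 3-month moving average, and a Pearson correlation coefficient computed from its definition.

```python
import math
import statistics

# Hypothetical salaries: six typical values and one executive outlier.
salaries = [42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 400_000]

mean_salary = statistics.mean(salaries)      # sensitive to the outlier
median_salary = statistics.median(salaries)  # robust to the outlier
q1, _, q3 = statistics.quantiles(salaries, n=4)
iqr = q3 - q1                                # spread of the middle 50%
std_dev = statistics.stdev(salaries)         # sample standard deviation


def moving_average(values, window):
    """Average each value with the (window - 1) values before it."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


# Hypothetical monthly sales and advertising spend.
monthly_sales = [100, 120, 90, 130, 150, 110]
trend = moving_average(monthly_sales, window=3)

ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 29, 41, 55]
r = pearson_r(ad_spend, sales)

print(f"mean={mean_salary:.0f} median={median_salary} iqr={iqr:.0f}")
print([round(v, 2) for v in trend])
print(f"r={r:.3f}")
```

With these figures the mean lands near 98,700 while the median stays at 50,000, which is why a salary scenario with a few extreme earners points to the median. The quartiles here use the default method of statistics.quantiles; other tools may use a different quartile convention and return slightly different IQR values.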

Domain 4 — Visualization (23%)

Tied with Data Analysis at 23%, the Visualization domain tests chart selection, design principles, and dashboard thinking. The most common failure mode on Data+ is knowing the data but presenting it in a chart type that obscures the insight rather than revealing it.

Chart type selection: Each chart type matches a specific comparison or relationship. Bar chart (vertical or horizontal): compare discrete categories—sales by region, headcount by department. Use a horizontal bar when category labels are long. Line chart: show change over time for continuous data—monthly revenue, daily active users. Multiple lines compare trends across groups. Scatter plot: show the relationship between two continuous variables—advertising spend vs. sales, temperature vs. energy consumption. A trend line overlaid on the scatter reveals correlation direction. Pie chart: show proportional parts of a whole—market share by vendor, budget allocation by department. Limit to 5–6 slices; more than that obscures the comparison. Histogram: show the frequency distribution of a single continuous variable—age distribution of customers, delivery time distribution. Not a bar chart: bins represent ranges, not discrete categories. Box plot (box-and-whisker): show the distribution of a dataset including median, IQR, and outliers simultaneously—comparing distributions across groups (salary distribution by department). Heat map: show a matrix of values using colour intensity—correlation matrices, website click density, time-of-day sales patterns. Waterfall chart: show cumulative effect of sequential positive and negative values—profit/loss attribution, budget variance analysis. The exam presents a described dataset and asks which chart type best communicates the intended insight.
Design best practices: Effective data visualisation communicates the insight without requiring the reader to interpret it. Use descriptive titles that state the insight ("Q3 Sales Down 12% Year-Over-Year") rather than the data source ("Q3 Sales"). Label axes with units. Use consistent colour for the same category across all charts in a report. Avoid chartjunk: 3D effects, unnecessary gridlines, decorative images, and shadow effects that add visual noise without adding information. Use colour purposefully: single-hue sequential scales for ordered continuous data, diverging scales for data with a meaningful midpoint, categorical palettes for nominal groups. Ensure sufficient contrast for accessibility—do not rely solely on colour to distinguish data series.
Dashboard design: A dashboard surfaces the most important metrics for a specific audience and decision context. Key design principles: place the most important metric in the top-left (reading order); group related metrics visually; use consistent formatting for all numbers of the same type (currency, percentage, count); include date-range context for every metric so viewers know what period is displayed; provide drill-down paths from summary metrics to supporting detail. The exam tests which dashboard component serves a specific audience need: an executive summary dashboard shows KPIs and trend direction; an operational dashboard shows real-time or near-real-time counts and alerts; an analytical dashboard provides filters and drill-down for exploratory analysis.
Storytelling with data: Data storytelling structures analysis findings as a narrative. The three components: data (the evidence), narrative (the explanation of what the data means), and visuals (the charts that make the data accessible). A well-structured data story opens with the business question, presents the relevant data with appropriate visualisation, draws a clear conclusion, and recommends an action. The exam distinguishes between descriptive storytelling (what the data shows) and persuasive storytelling (why the audience should act on it). Annotations on charts—highlighting a specific data point with a callout explaining why it matters—guide the audience to the insight without requiring them to spot it independently.
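
The point made under chart type selection, that a histogram's bins represent ranges while a bar chart's bars represent discrete categories, can be illustrated without a plotting library. The sketch below uses invented delivery times and region labels and an assumed bin width of 2 days. It counts values per range for the histogram and values per label for the bar chart, then prints both as text.

```python
from collections import Counter

# Hypothetical delivery times in days (a continuous variable).
delivery_days = [1.5, 2.0, 2.2, 3.1, 3.4, 3.9, 4.0, 4.8, 5.5, 7.2]

# Hypothetical region labels (a discrete, categorical variable).
regions = ["North", "South", "North", "East", "South", "North"]

BIN_WIDTH = 2  # assumed bin width; changing it changes the histogram's shape


def bin_label(value, width):
    """Return the half-open range [lower, lower + width) containing value."""
    lower = int(value // width) * width
    return f"{lower}-{lower + width}"


# Histogram: continuous values are grouped into ranges before counting.
histogram = Counter(bin_label(d, BIN_WIDTH) for d in delivery_days)

# Bar chart: categories are counted as they are, with no grouping step.
bar_chart = Counter(regions)

for label in sorted(histogram, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>5} days | {'#' * histogram[label]}")

for region, count in bar_chart.most_common():
    print(f"{region:>5} | {'#' * count}")
```

The histogram bins have a natural order and adjacent ranges touch, so their sequence carries meaning. The bar chart categories could be reordered freely without changing what the chart says.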

Domain 5 — Data Governance, Quality, and Controls (14%)

The smallest domain by weight but increasingly prominent in job descriptions as organisations face regulatory scrutiny over how data is collected, stored, and used. The exam tests the vocabulary and frameworks of data governance rather than implementation depth.

Data classification: Organisations classify data by sensitivity to determine appropriate handling. Common classification tiers: Public—freely shareable with no restrictions (published product catalogues, press releases); Internal—for employees only, not intended for external audiences (internal wikis, operational procedures); Confidential—sensitive business data requiring access controls (financial forecasts, contract terms, HR data); Restricted—highest sensitivity, access on a strict need-to-know basis (personally identifiable information, payment card data, health records). The exam tests which classification tier applies to a described data type and what handling controls that classification requires.
Data governance framework components: Data governance defines who owns data, who can access it, how it is documented, and how its quality is maintained. Data stewards own the business definitions and quality of specific datasets. Data owners are accountable for data within their business domain and approve access requests. Data catalog: a centralised inventory of available datasets with metadata—description, source, owner, data types, update frequency, and usage restrictions. A data catalog enables data discovery: analysts find the right dataset without having to ask the data team where to look. Data lineage: the record of where data came from and how it was transformed at each stage—from source system to data warehouse to analytical report. Lineage enables impact analysis (what reports break if this table changes?) and regulatory audit (prove where this customer record came from).
Data quality dimensions: The six standard data quality dimensions used to evaluate a dataset. Accuracy: does the data correctly represent the real-world entity? (Is the customer’s address the one they actually live at?) Completeness: are all required fields populated? (Are there nulls in columns that should always have a value?) Consistency: does the same entity appear with the same values across systems? (Does the customer ID in the CRM match the customer ID in the billing system?) Timeliness: is the data current enough for the intended use? (A week-old inventory count is not timely for real-time stock management.) Validity: does the data conform to defined formats, ranges, and reference values? (Is the date in a valid calendar format? Is the age value within a plausible range?) Uniqueness: does each real-world entity appear only once in the dataset? (Are there duplicate customer records for the same person?) The exam tests which quality dimension is violated by a described data problem.
Privacy regulations: Data+ tests awareness of the major privacy frameworks that affect how organisations collect and process personal data. GDPR (General Data Protection Regulation): EU regulation requiring lawful basis for processing personal data, right to access and deletion for individuals, breach notification within 72 hours, and data protection by design. HIPAA (Health Insurance Portability and Accountability Act): US regulation protecting individually identifiable health information (Protected Health Information / PHI) held by covered entities and their business associates. CCPA (California Consumer Privacy Act): California state law giving consumers the right to know what personal data is collected, to opt out of its sale, and to request deletion. The exam tests which regulation applies to a described data scenario and what obligation it creates for the organisation.
Data lifecycle management: Data passes through defined stages from creation to retirement. Creation/capture: data enters the system from a transaction, sensor, form, or external source. Storage: data is persisted in a database, file system, or object store with appropriate access controls. Processing: data is transformed, enriched, and loaded into analytical systems. Use: data is queried, reported on, and consumed by business users and applications. Archival: data that is no longer actively used is moved to lower-cost, lower-performance storage (cold storage, tape archives) but retained for compliance or historical analysis. Destruction: data that has reached end-of-retention is securely deleted or de-identified. The exam tests which lifecycle stage applies to a described activity and what policies govern it.
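
Some of the quality dimensions described above translate directly into row-level checks. The sketch below is a rough illustration with invented customer records; the field names and the plausible age range of 0 to 120 are assumptions. It flags completeness, validity, and uniqueness violations. Accuracy, consistency, and timeliness usually need a second system or a reference date to check against, so they are left out.

```python
from datetime import datetime

# Hypothetical customer records with deliberately planted problems.
records = [
    {"customer_id": "C1", "email": "a@example.com", "signup": "2024-01-15", "age": 34},
    {"customer_id": "C2", "email": None, "signup": "2024-02-30", "age": 41},
    {"customer_id": "C1", "email": "a@example.com", "signup": "2024-01-15", "age": 34},
    {"customer_id": "C3", "email": "c@example.com", "signup": "2024-03-02", "age": 250},
]


def is_valid_date(text):
    """True when text is a real calendar date in YYYY-MM-DD format."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Completeness: required fields that are missing.
incomplete = [i for i, r in enumerate(records) if r["email"] is None]

# Validity: values outside the defined format or range.
invalid_dates = [i for i, r in enumerate(records) if not is_valid_date(r["signup"])]
invalid_ages = [i for i, r in enumerate(records) if not 0 <= r["age"] <= 120]

# Uniqueness: the same entity appearing more than once.
seen = set()
duplicates = []
for i, r in enumerate(records):
    if r["customer_id"] in seen:
        duplicates.append(i)
    seen.add(r["customer_id"])

print("completeness issues at rows:", incomplete)
print("invalid dates at rows:", invalid_dates)
print("invalid ages at rows:", invalid_ages)
print("duplicate entities at rows:", duplicates)
```

Each list holds zero-based row positions, so one record can appear under more than one dimension. The second record fails both completeness (no email) and validity (30 February is not a calendar date).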

The most commonly missed Data+ question type: the scenario that describes two analysis approaches and asks which one satisfies the business requirement. Knowing the definition of predictive vs. prescriptive analysis is not enough—you need to apply the distinction to a business context. Practice identifying the question type (what happened / why / what will happen / what should we do) before selecting the analysis method.

Exam format and what to expect

DA0-001 consists of up to 90 questions in a 90-minute window delivered via Pearson VUE at a testing centre or via online proctoring. The passing score is 675 on a scale of 100–900. Question types include multiple-choice (one correct answer from four), multiple-response (two or more correct answers from four or five options), and performance-based questions (PBQs) that simulate interactions with a dataset or tool. PBQs on Data+ typically require selecting the correct SQL query for a described data retrieval task, identifying the right chart type for a described dataset, or determining which data quality issue is present in a sample dataset. PBQs typically appear early in the exam. They can be flagged and revisited later, so candidates who spend too long on them at the start risk running out of time for the multiple-choice section.

CompTIA positions Data+ at the intermediate level—above A+ and Network+ in assumed technical literacy but not requiring deep programming or statistics background. The exam is explicitly vendor-neutral: questions do not test knowledge of specific tools like Tableau, Power BI, or Python libraries. Candidates with experience in any data analysis environment (Excel, Google Sheets, SQL databases, or BI tools) will find the tool-agnostic framing familiar. The 90-minute window is adequate for most candidates; the primary challenge is domain breadth rather than time pressure.

Certification stack and career context

Where Data+ fits in the analytics certification landscape

CompTIA A+ or Network+ (optional prerequisite) — CompTIA recommends 18–24 months of hands-on data analytics experience before sitting Data+, but no formal prerequisites are required. Candidates from an IT background who have A+ or Network+ will recognise the exam format; candidates from a business analytics background may find the IT-adjacent framing of some questions (database types, file formats) less familiar.
Microsoft Power BI Data Analyst (PL-300) — The natural tool-specific complement to Data+. Where Data+ certifies vendor-neutral analytical reasoning, PL-300 certifies Power BI implementation: DAX formula language, Power Query transformations, data model relationships, and report publishing to the Power BI Service. Data+ plus PL-300 is a common pairing for analysts whose primary tool is the Microsoft Power Platform.
Tableau Certified Data Analyst — The Tableau-specific counterpart for organisations standardised on Tableau. The certification tests calculated fields, table calculations, LOD expressions, dashboard design, and data preparation in Tableau Prep. Data+ provides the conceptual foundation; the Tableau cert proves tool proficiency.
Google Professional Data Engineer (GCP PDE) — The advanced progression for data professionals moving from analysis to engineering—building and operating the pipelines that produce the data Data+ analysts query. GCP PDE covers BigQuery, Dataflow, Pub/Sub, and Dataproc: the infrastructure level above the analytical layer Data+ certifies.
AWS Certified Data Engineer – Associate (DEA-C01) — The AWS equivalent for analysts moving toward data engineering on AWS. DEA-C01 covers Glue ETL, Kinesis streaming, Redshift, Lake Formation, and Athena—the AWS service layer that data engineers use to build the pipelines that analysts query.
CompTIA DataSys+ (DS0-001) — CompTIA’s database-specific certification, positioned as a peer to Data+ for professionals focused on database administration and data management rather than analytics. The two certifications complement rather than supersede each other.

Data+ salary data and job market (2026)

Data Analysts and Business Intelligence Analysts holding CompTIA Data+ earn roughly $55k–$140k in 2026 across North American markets, with the wide range reflecting variation in seniority, industry, and geographic market. Entry-level data analysts in general industries (retail, logistics, healthcare administration) start at $55k–$75k; mid-level analysts with 3–5 years of experience and a tool-specific certification (Power BI or Tableau) reach $80k–$110k; senior analysts and BI leads with 6+ years and business domain expertise reach $110k–$140k. Technology, financial services, and consulting sectors pay 15–25% above these ranges for the same experience level.

The demand for data-literate professionals continues to grow faster than overall IT hiring in 2026. The specific driver is not advanced machine learning but business analytics maturity: organisations that built data warehouses and BI platforms in 2020–2024 are now hiring analysts who can use those systems to produce actionable reports, not just technical engineers who built them. Data+ sits precisely in this gap—it certifies the analytical reasoning that business stakeholders need and IT teams often lack without formal data training. Roles where Data+ appears in job postings: Data Analyst, Business Analyst, BI Analyst, Reporting Analyst, Marketing Analyst, and Operations Analyst.

How to prepare for DA0-001

Data+ rewards candidates who can apply analytical concepts to realistic business scenarios rather than recall definitions in isolation. The exam question style describes a situation (a dataset with a specific quality problem, a business question requiring a specific analysis type, or a report requirement calling for a specific chart) and expects the candidate to reason to the correct answer. Candidates who study only by reading the CompTIA objectives document, without practising scenario application, often fall short of the passing score on the first attempt.

CompTIA official resources: The CompTIA CertMaster Learn for Data+ course covers all five domains with interactive lessons and end-of-lesson assessments. CertMaster Practice provides adaptive question bank access with explanations. Outside the official catalogue, the Sybex CompTIA Data+ Study Guide (Mike Chapple and Sharif Nijim) is a comprehensive single resource covering all exam objectives with worked examples.
SQL practice: Even though Data+ is not a deep SQL exam, the Data Mining domain (25%) tests basic query construction. Practice writing SELECT statements with WHERE, GROUP BY, HAVING, and the four join types (INNER, LEFT, RIGHT, FULL OUTER). Free platforms like SQLZoo, W3Schools SQL exercises, and Mode Analytics SQL Tutorial provide browser-based practice without requiring a local database setup.
Statistics fundamentals: Review the core descriptive statistics concepts (mean, median, mode, standard deviation, IQR) and the analysis type framework (descriptive, diagnostic, predictive, prescriptive) until you can apply them to a business scenario without hesitation. Khan Academy’s statistics course covers exactly the level required for the Data Analysis domain—no calculus or probability theory needed.
Visualization practice: The Visualization domain (23%) requires fast chart type selection. Build a quick reference: bar chart for categorical comparison, line chart for trends over time, scatter for two-variable relationships, histogram for continuous distribution, box plot for group distribution comparison, pie for parts of a whole (limited slices). Practice with free tools—Google Sheets, Excel, or Tableau Public—to build intuition for which chart type makes a specific dataset’s insight visible rather than obscured.
Exam-style practice questions: Use CompTIA-authorised practice tests and third-party question banks to practice the scenario-based question format before the real exam. Aim for consistent 80%+ accuracy on practice tests before scheduling. Pay particular attention to the Governance domain—the vocabulary of data classification, data quality dimensions, and privacy regulations is specific and easily confused without repetition.

Study time estimate for DA0-001

Candidates with active data analysis experience (SQL queries, Excel or BI tool use, report building in a professional context) typically need 4–8 weeks of focused study covering the exam objectives and practising scenario questions. Candidates transitioning from a non-data IT role (network technician, sysadmin, help desk) or from a non-technical business role (finance, operations, marketing) typically need 8–12 weeks, with additional time for SQL fundamentals and statistics concepts. The Governance domain consistently surprises first-time candidates who underestimate it—budget at least one full study session specifically for data classification tiers, the six quality dimensions, and the three major privacy regulations (GDPR, HIPAA, CCPA) with their key obligations.
