Stats for Scholars

Descriptive Statistics

Purpose
Summarizes and describes the main features of a dataset using measures of central tendency (mean, median, mode) and variability (standard deviation, variance, range).
When to Use
Always — descriptive statistics should be reported for every variable in every study before any inferential tests are conducted.
Data Type
Continuous (interval or ratio) for means and standard deviations; ordinal for medians; nominal for modes
Key Assumptions
Mean and standard deviation assume approximately symmetric distributions without extreme outliers. For skewed distributions, the median and interquartile range are preferred.

What Are Descriptive Statistics?

Descriptive statistics are numerical summaries that describe, organize, and simplify a dataset. They reduce a collection of raw numbers into a few meaningful values that capture the essential features of your data: where the center is, how spread out the values are, and what the distribution looks like.

Every research paper, thesis, or report begins with descriptive statistics. Before you run a t-test, ANOVA, or regression, you must first describe your data. Descriptive statistics serve two purposes:

  1. Summarize the data for your readers so they understand what you measured and what the values look like.
  2. Check assumptions for inferential tests — many statistical procedures require data to be approximately normally distributed, free of outliers, and have adequate variability.

Descriptive statistics fall into two broad families:

  • Measures of central tendency — Where is the center of the distribution? (Mean, median, mode)
  • Measures of variability (dispersion) — How spread out are the values? (Standard deviation, variance, range, interquartile range)

When to Use It

Descriptive statistics are used in every quantitative study. Specifically:

  • Report means and standard deviations for all continuous variables in your study (APA 7th edition requires this).
  • Use medians and interquartile ranges when your data are skewed or contain outliers.
  • Report frequencies and percentages for categorical variables (e.g., 58% female, 42% male).
  • Include a descriptive statistics table in your results section before presenting any inferential analyses.

Assumptions

Descriptive statistics themselves have minimal assumptions, but choosing the right measure depends on the characteristics of your data:

  1. Level of measurement. The mean requires interval or ratio data. The median requires at least ordinal data. The mode can be used with any level of measurement.
  2. Distribution shape. The mean is appropriate for roughly symmetric distributions. For highly skewed data, the median is a better measure of center.
  3. Outliers. The mean is sensitive to extreme values; the median is resistant. If you have outliers, report both and explain the discrepancy.

Formula

Measures of Central Tendency

Mean (Arithmetic Average)

The sum of all values divided by the number of values:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

The mean uses every data point, which makes it the most informative measure of center for symmetric data — but also the most sensitive to outliers.

Median

The middle value when data are ordered from lowest to highest. For $n$ values:

  • If $n$ is odd, the median is the value at position $\frac{n+1}{2}$.
  • If $n$ is even, the median is the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

The median is resistant to outliers and is the preferred measure of center for skewed distributions (e.g., income, reaction times).

Mode

The most frequently occurring value. A distribution can be:

  • Unimodal — one mode
  • Bimodal — two modes (suggesting two subgroups)
  • Multimodal — three or more modes

The mode is the only measure of central tendency that can be used with nominal data (e.g., the most common political affiliation).
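
The three measures of center can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the `scores` and `affiliations` lists are made-up data, not values from this article.

```python
import statistics

# Hypothetical data, chosen only to illustrate the three measures.
scores = [2, 3, 3, 4, 5, 5, 5, 9]
affiliations = ["Party A", "Party B", "Party A", "Party C"]

mean_score = statistics.mean(scores)        # sum of values / number of values
median_score = statistics.median(scores)    # n is even: average of the two middle values
score_modes = statistics.multimode(scores)  # list of every most-frequent value

# The mode is the only one of the three that works for nominal categories.
top_affiliation = statistics.mode(affiliations)

print(mean_score, median_score, score_modes, top_affiliation)
```

`multimode` returns a list, so a bimodal or multimodal variable shows all of its modes rather than just one.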

Measures of Variability

Range

The simplest measure of spread:

$$\text{Range} = X_{max} - X_{min}$$

The range uses only the two most extreme values, making it highly sensitive to outliers and unstable across samples.

Variance

The average squared deviation from the mean. For a sample:

$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$$

We divide by $n - 1$ (not $n$) to correct for the bias that occurs when estimating the population variance from a sample. This is called Bessel's correction.

Variance is measured in squared units, which makes it difficult to interpret directly (e.g., "squared years" or "squared points"). That is why we typically take the square root to get the standard deviation.
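
To see the effect of the denominator, the following sketch compares the two versions using Python's `statistics` module, where `variance` divides by $n - 1$ and `pvariance` divides by $n$. The four data values are hypothetical.

```python
import statistics

data = [4, 8, 6, 2]  # hypothetical sample
n = len(data)

sample_variance = statistics.variance(data)       # divides by n - 1
population_variance = statistics.pvariance(data)  # divides by n

# The two results differ only by the factor (n - 1) / n.
ratio = population_variance / sample_variance

print(sample_variance, population_variance, ratio)
```

Because the ratio is $(n - 1)/n$, the gap between the two formulas is noticeable in small samples and shrinks as $n$ grows.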

Standard Deviation

The square root of the variance:

$$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$$

The standard deviation is expressed in the same units as the original data, making it far more interpretable than the variance. It can be read as the typical distance between a data point and the mean (strictly, the square root of the average squared distance).

Interquartile Range (IQR)

The range of the middle 50% of the data:

$$\text{IQR} = Q_3 - Q_1$$

Where $Q_1$ is the 25th percentile and $Q_3$ is the 75th percentile. Like the median, the IQR is resistant to outliers and is preferred for skewed distributions.
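
A minimal sketch of the variability measures in Python follows. The helper `quartiles_median_of_halves` is a hypothetical name introduced here; it takes $Q_1$ and $Q_3$ as the medians of the lower and upper halves of the ordered data and, as an assumption, leaves the middle value out of both halves when $n$ is odd. The data are made up.

```python
import statistics

def quartiles_median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the middle value belongs to neither half.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])   # lower half
    q3 = statistics.median(ordered[-half:])  # upper half
    return q1, q3

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical

data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (n - 1)
sd = statistics.stdev(data)           # square root of the sample variance
q1, q3 = quartiles_median_of_halves(data)
iqr = q3 - q1

print(data_range, round(variance, 2), round(sd, 2), q1, q3, iqr)
```

Quartile conventions differ between textbooks and software. For example, `statistics.quantiles` interpolates between data points, so it can return slightly different quartiles than the median-of-halves approach used here.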

The Normal Distribution

Many inferential statistics assume that data follow a normal (Gaussian) distribution, the familiar bell curve. Its probability density function, with mean $\mu$ and standard deviation $\sigma$, is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

In a normal distribution:

  • About 68% of values fall within $\pm 1$ standard deviation of the mean
  • About 95% of values fall within $\pm 2$ standard deviations
  • About 99.7% of values fall within $\pm 3$ standard deviations

This is known as the 68-95-99.7 rule (or the empirical rule).
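
The percentages in the empirical rule can be checked against the normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(f"within ±{k} SD: {share:.4f}")
```

The exact shares are close to, but not identical with, the rounded 68%, 95%, and 99.7% figures.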

You can assess normality using:

  • Visual methods: Histograms, Q-Q plots, box plots
  • Statistical tests: Shapiro-Wilk test (best for $n < 50$), Kolmogorov-Smirnov test (larger samples)
  • Skewness and kurtosis values: Values between $-2$ and $+2$ are generally considered acceptable
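
As a rough illustration of the skewness and kurtosis check, the sketch below computes the simple moment-based coefficients. The function name `moment_skew_kurtosis` and the data are hypothetical, and this is only one of several definitions in use: many statistical packages report bias-adjusted versions, so their output may differ from these values, especially in small samples.

```python
def moment_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) from central moments.

    Assumption: plain moment estimators with no small-sample adjustment.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]      # hypothetical, perfectly symmetric
right_skewed = [1, 1, 1, 1, 10]  # hypothetical, one large value

print(moment_skew_kurtosis(symmetric))
print(moment_skew_kurtosis(right_skewed))
```

A symmetric sample gives a skewness of zero, while a long right tail gives a positive value.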

Worked Example

Scenario: A cognitive psychologist measures reaction time (in milliseconds) for $n = 10$ participants in a word recognition task.

Raw data (in ms): 420, 385, 510, 390, 400, 395, 880, 410, 405, 415

Step 1: Order the data.

385, 390, 395, 400, 405, 410, 415, 420, 510, 880

Step 2: Compute the mean.

$$\bar{X} = \frac{385 + 390 + 395 + 400 + 405 + 410 + 415 + 420 + 510 + 880}{10} = \frac{4610}{10} = 461.0 \text{ ms}$$

Step 3: Compute the median.

With $n = 10$ (even), the median is the average of the 5th and 6th values:

$$\text{Median} = \frac{405 + 410}{2} = 407.5 \text{ ms}$$

Step 4: Note the discrepancy. The mean (461.0) is considerably higher than the median (407.5). This signals a positively skewed distribution, pulled by the outlier value of 880 ms.

Step 5: Compute the standard deviation.

First, compute each squared deviation from the mean:

$X_i$   $X_i - \bar{X}$   $(X_i - \bar{X})^2$
385     -76.0             5,776
390     -71.0             5,041
395     -66.0             4,356
400     -61.0             3,721
405     -56.0             3,136
410     -51.0             2,601
415     -46.0             2,116
420     -41.0             1,681
510     49.0              2,401
880     419.0             175,561

$$\sum(X_i - \bar{X})^2 = 206{,}390$$

$$s = \sqrt{\frac{206{,}390}{10 - 1}} = \sqrt{\frac{206{,}390}{9}} = \sqrt{22{,}932.2} = 151.4 \text{ ms}$$

Step 6: Compute the range and IQR.

$$\text{Range} = 880 - 385 = 495 \text{ ms}$$

$Q_1$ (median of lower half: 385, 390, 395, 400, 405) $= 395$

$Q_3$ (median of upper half: 410, 415, 420, 510, 880) $= 420$

$$\text{IQR} = 420 - 395 = 25 \text{ ms}$$

Step 7: Interpret the results.

The middle half of the data spans only 25 ms (the IQR), yet the full range is 495 ms. A gap this large points to extreme values at the upper end, chiefly the 880 ms observation. For these data, the median and IQR are better summaries than the mean and SD because the distribution is heavily right-skewed.

Measure    Value
Mean       461.0 ms
Median     407.5 ms
Mode       None (all unique)
SD         151.4 ms
Variance   22,932.2 ms$^2$
Range      495 ms
IQR        25 ms
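
The summary values can be re-derived from the raw reaction times with Python's `statistics` module. This is a sketch rather than a prescribed workflow; the quartiles follow the same median-of-halves approach as Step 6, and the variable names are illustrative.

```python
import statistics

# Raw data from the scenario (ms).
reaction_times = [420, 385, 510, 390, 400, 395, 880, 410, 405, 415]

ordered = sorted(reaction_times)
half = len(ordered) // 2

mean = statistics.mean(reaction_times)
median = statistics.median(reaction_times)
modes = statistics.multimode(reaction_times)    # every value is listed when all are unique
variance = statistics.variance(reaction_times)  # n - 1 in the denominator
sd = statistics.stdev(reaction_times)
data_range = ordered[-1] - ordered[0]
q1 = statistics.median(ordered[:half])  # median of the lower half
q3 = statistics.median(ordered[half:])  # median of the upper half (n is even)
iqr = q3 - q1

print(f"Mean     {mean:.1f} ms")
print(f"Median   {median:.1f} ms")
print(f"SD       {sd:.1f} ms")
print(f"Variance {variance:,.1f} ms^2")
print(f"Range    {data_range} ms")
print(f"IQR      {iqr} ms")
```

Because every value occurs exactly once, `multimode` returns all ten values, which corresponds to the "None (all unique)" entry for the mode.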

Interpretation

Choosing the Right Measure

Data Characteristic       Central Tendency   Variability
Symmetric, no outliers    Mean               SD
Skewed or has outliers    Median             IQR
Nominal (categories)      Mode               --
Ordinal (ranked)          Median             IQR

Reading a Standard Deviation

The standard deviation tells you the "typical" distance from the mean. In the worked example, $s = 151.4$ ms is very large relative to $\bar{X} = 461.0$ ms, indicating high variability. The coefficient of variation (CV) puts this in perspective:

$$CV = \frac{s}{\bar{X}} \times 100 = \frac{151.4}{461.0} \times 100 = 32.8\%$$

A CV above 30% usually signals high variability. In this case, the outlier is the primary driver.
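
A short sketch of the CV calculation from the raw worked-example data follows. The helper name `coefficient_of_variation` is hypothetical. Note that the CV is only meaningful for ratio-scale variables with a positive mean, such as reaction time.

```python
import statistics

def coefficient_of_variation(values):
    """Sample SD as a percentage of the mean (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

reaction_times = [420, 385, 510, 390, 400, 395, 880, 410, 405, 415]  # ms

cv = coefficient_of_variation(reaction_times)
print(f"CV = {cv:.1f}%")  # prints "CV = 32.8%"
```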

Standard Deviation vs. Standard Error

Students frequently confuse these two:

  • Standard deviation ($s$) describes variability in the data: how spread out individual scores are.
  • Standard error of the mean ($SE$) describes variability in the sampling distribution of the mean: how much the sample mean would fluctuate across repeated samples.

$$SE = \frac{s}{\sqrt{n}}$$

Report $SD$ when describing your data. Report $SE$ (or confidence intervals) when making inferences about the population mean.
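
The difference is easy to see numerically. The sketch below computes both quantities for the worked-example reaction times and then, as a purely hypothetical comparison, shows what the standard error would be if the same standard deviation came from a sample of 40. The helper `standard_error` is an illustrative name.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

reaction_times = [420, 385, 510, 390, 400, 395, 880, 410, 405, 415]  # ms

sd = statistics.stdev(reaction_times)
se = standard_error(sd, len(reaction_times))

# Hypothetical: same SD, four times the sample size, so the SE is halved.
se_if_n_40 = standard_error(sd, 40)

print(f"SD = {sd:.1f} ms, SE = {se:.1f} ms, SE at n = 40 would be {se_if_n_40:.1f} ms")
```

The standard deviation describes the scores themselves and does not shrink systematically with more data, whereas the standard error falls as $n$ increases.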

Common Mistakes

  1. Reporting the mean for skewed data. If income data are right-skewed, the mean is inflated by high earners. The median provides a more representative picture. Always inspect histograms before choosing your summary statistics.

  2. Confusing SD and SE. Reporting $SE$ in a descriptive table makes the variability look artificially small (since $SE = s / \sqrt{n}$). APA style requires $SD$ in descriptive tables unless you are specifically reporting precision of a mean estimate.

  3. Ignoring outliers. A single extreme value can dramatically change the mean and SD. Always check for outliers using box plots or z-scores (values with $|z| > 3$ are typically flagged).

  4. Reporting too many decimal places. A mean reaction time of 461.00000 ms implies false precision. Generally, report one more decimal place than the original measurement. For whole-number data, one decimal place is sufficient.

  5. Forgetting to report variability. A mean without a measure of spread is incomplete. Saying "the average score was 75" does not tell the reader whether scores ranged from 70 to 80 or from 30 to 100. Always pair a central tendency measure with a variability measure.

  6. Computing the mean of ordinal data. Strictly speaking, you should not average Likert-scale items (e.g., 1--5 ratings) because the intervals between values may not be equal. In practice, researchers often do compute means of Likert-type scales, but this should be done thoughtfully and acknowledged as a limitation.

  7. Using the population formula instead of the sample formula. When computing variance and SD from a sample, always divide by $n - 1$, not $n$. Dividing by $n$ underestimates the population variance.
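
The outlier checks mentioned in mistake 3 can be sketched as follows, using the reaction times from the worked example. Two assumptions are made here: the box-plot rule is taken to be the conventional 1.5 × IQR fences, and the quartiles are the medians of the lower and upper halves of the ordered data.

```python
import statistics

reaction_times = [420, 385, 510, 390, 400, 395, 880, 410, 405, 415]  # ms

# z-score rule: flag values more than 3 sample SDs from the mean.
mean = statistics.mean(reaction_times)
sd = statistics.stdev(reaction_times)
z_scores = {x: (x - mean) / sd for x in reaction_times}
z_flagged = sorted(x for x, z in z_scores.items() if abs(z) > 3)

# Box-plot rule (assumed 1.5 * IQR fences, median-of-halves quartiles).
ordered = sorted(reaction_times)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
fence_flagged = sorted(
    x for x in reaction_times if x < lower_fence or x > upper_fence
)

print("largest |z|:", round(max(abs(z) for z in z_scores.values()), 2))
print("flagged by |z| > 3:", z_flagged)
print("flagged by box-plot fences:", fence_flagged)
```

In this small sample the z-score rule flags nothing, because the 880 ms value inflates the very standard deviation it is measured against; with $n = 10$ no value can reach $|z| = 3$. The box-plot fences, which rely on resistant quartiles, flag both 510 and 880. For small samples it is therefore worth checking a box plot rather than relying on z-scores alone.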

How to Report in APA Format

In-text

Participants' average reaction time was $M = 461.0$ ms ($SD = 151.4$). Due to positive skew, the median ($Mdn = 407.5$ ms) may better represent the typical response.

Descriptive Statistics Table

APA recommends a table for studies with multiple variables:

Table 1

Descriptive Statistics for Study Variables

Variable             $n$   $M$     $SD$    Min   Max   Skewness
Reaction time (ms)   10    461.0   151.4   385   880   2.88
Accuracy (%)         10    88.3    6.2     78    97    -0.31

Key formatting guidelines:

  • Use $M$ and $SD$ (italicized) as column headers
  • Report values to one or two decimal places consistently
  • Include $n$, range, and skewness when space permits
  • Note if medians and IQRs are reported instead of means and SDs, and explain why
  • For categorical variables, report frequencies and percentages rather than means
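
If you build such a table programmatically, consistent rounding is easy to enforce in code. The sketch below formats one row of descriptive statistics from raw data; `describe_row` and its column widths are hypothetical choices for illustration, and plain text cannot show the italics that APA style calls for.

```python
import statistics

def describe_row(label, values, decimals=1):
    """Format one table row: label, n, M, SD, Min, Max (plain text)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return (
        f"{label:<20}{len(values):>4}"
        f"{m:>8.{decimals}f}{sd:>8.{decimals}f}"
        f"{min(values):>6}{max(values):>6}"
    )

reaction_times = [420, 385, 510, 390, 400, 395, 880, 410, 405, 415]  # ms

header = f"{'Variable':<20}{'n':>4}{'M':>8}{'SD':>8}{'Min':>6}{'Max':>6}"
row = describe_row("Reaction time (ms)", reaction_times)

print(header)
print(row)
```

Passing the number of decimals as a parameter keeps every row rounded the same way.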




© 2026 Angel Reyes / Subthesis. All rights reserved.
