Stats for Scholars


Mann-Whitney U Test

**Purpose:** Compares two independent groups to determine whether they differ on an ordinal or continuous variable when the assumptions of the independent t-test are not met.

**When to Use:** When you have two independent groups and the dependent variable is ordinal, or is continuous but not normally distributed, or when sample sizes are very small.

**Data Type:** Ordinal or continuous dependent variable; binary categorical independent variable.

**Key Assumptions:** Independence of observations; similarly shaped distributions in each group (for interpreting the result as a difference in medians); at least ordinal measurement level.

**Tools:** Effect Size Calculator on Subthesis →

What Is the Mann-Whitney U Test?

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a non-parametric test that compares two independent groups. It is the rank-based alternative to the independent samples t-test and does not require the dependent variable to be normally distributed.

Instead of comparing means directly, the Mann-Whitney U test ranks all observations from both groups together and then asks whether the ranks are systematically higher in one group than the other. Formally, it tests whether a randomly selected observation from one group tends to be larger (or smaller) than a randomly selected observation from the other group.

The test is especially useful when:

  • Your data are measured on an ordinal scale (e.g., Likert ratings, rankings)
  • Your continuous data are severely skewed or contain outliers
  • Your sample size is too small to rely on the Central Limit Theorem

When to Use It

Use a Mann-Whitney U test when:

  • You have one dependent variable that is at least ordinal
  • You have one categorical independent variable with exactly two independent groups
  • The normality assumption of the independent t-test is violated and sample sizes are small ($n < 30$ per group)
  • Your data are ranks or ratings rather than true continuous measurements

Examples:

  • Comparing customer satisfaction ratings (1-10 scale) between two stores
  • Comparing pain severity rankings between two treatment groups
  • Comparing income (often heavily skewed) between two regions with small samples

When to stick with the t-test instead: If your data are continuous and approximately normal (or your samples are large), the independent t-test is more powerful. The Mann-Whitney U test sacrifices some statistical power in exchange for fewer assumptions.

Assumptions

  1. Independence of observations. Each participant contributes only one data point, and participants in one group are unrelated to those in the other group.

  2. At least ordinal measurement. The dependent variable must be measured on an ordinal, interval, or ratio scale so that values can be meaningfully ranked.

  3. Similarly shaped distributions. If you want to interpret the result as a difference in medians, both groups must have distributions of the same shape (though they can differ in location). If the shapes differ, the test is still valid but is interpreted as a general test of stochastic dominance — whether values in one group tend to be larger.

  4. Continuous underlying distribution (for no ties). Ideally, there are no tied ranks. In practice, ties are common (especially with ordinal data), and software applies a correction automatically.

Formula

Step 1: Rank all observations. Combine both groups and assign ranks from 1 (smallest) to $N$ (largest), where $N = n_1 + n_2$. Tied values receive the average of the ranks they would have occupied.
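The average-rank method in Step 1 can be sketched in Python with `scipy.stats.rankdata`, which assigns tied values the mean of the positions they occupy (the data here are hypothetical):

```python
from scipy.stats import rankdata

# Combined observations from both groups (hypothetical values)
values = [3, 5, 5, 7, 9]

# rankdata uses the average-rank method for ties by default
ranks = rankdata(values)
print(ranks)  # [1.  2.5 2.5 4.  5. ] — the two 5s share positions 2 and 3
```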

Step 2: Calculate the U statistic for each group.

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$

Where $R_1$ and $R_2$ are the sums of ranks for Group 1 and Group 2. Note that $U_1 + U_2 = n_1 \times n_2$.

The test statistic is $U = \min(U_1, U_2)$.

Step 3: For large samples ($n_1, n_2 > 20$), the distribution of $U$ is approximately normal:

$$z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

Effect size (rank-biserial correlation):

$$r_{rb} = 1 - \frac{2U}{n_1 n_2}$$

Where $|r_{rb}|$ ranges from 0 to 1. Use the same benchmarks as a correlation: .10 (small), .30 (medium), .50 (large).
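As a check on the formulas above, here is a minimal Python sketch that computes $U$, the (uncorrected) large-sample $z$, and $r_{rb}$ from two lists; the function name and sample data are illustrative, not from the article:

```python
import math
from scipy.stats import rankdata

def mann_whitney(group1, group2):
    """Compute U, the (uncorrected) large-sample z, and r_rb by hand."""
    n1, n2 = len(group1), len(group2)
    ranks = rankdata(group1 + group2)   # average-rank method for ties
    r1 = ranks[:n1].sum()               # rank sum of group 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1                   # since U1 + U2 = n1 * n2
    u = min(u1, u2)
    mu = n1 * n2 / 2                    # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma                # no tie correction applied here
    r_rb = 1 - 2 * u / (n1 * n2)        # rank-biserial correlation
    return u, z, r_rb

# Illustrative data: group 1 tends to score higher
u, z, r_rb = mann_whitney([12, 15, 11, 18], [9, 10, 14, 8])
print(u, r_rb)  # 2.0 0.75
```

Note that samples this small call for an exact test in practice; the $z$ value here only illustrates the large-sample formula.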

Worked Example

Scenario: A retail company wants to compare customer satisfaction ratings (1-10 scale) between Store A and Store B. Because ratings are ordinal and the distributions are skewed, a Mann-Whitney U test is appropriate.

| Store A | Store B |
|---------|---------|
| 7       | 5       |
| 8       | 6       |
| 6       | 4       |
| 9       | 7       |
| 8       | 5       |
| 7       | 3       |
| 10      | 6       |

$n_1 = 7$ (Store A), $n_2 = 7$ (Store B), $N = 14$.

Step 1: Rank all 14 observations.

| Value | Group | Rank |
|-------|-------|------|
| 3     | B     | 1    |
| 4     | B     | 2    |
| 5     | B     | 3.5  |
| 5     | B     | 3.5  |
| 6     | A     | 6    |
| 6     | B     | 6    |
| 6     | B     | 6    |
| 7     | A     | 9    |
| 7     | A     | 9    |
| 7     | B     | 9    |
| 8     | A     | 11.5 |
| 8     | A     | 11.5 |
| 9     | A     | 13   |
| 10    | A     | 14   |

Tied values receive the average of the positions they occupy: the two 5s share positions 3-4 (rank 3.5), the three 6s share positions 5-7 (rank 6), the three 7s share positions 8-10 (rank 9), and the two 8s share positions 11-12 (rank 11.5).

Step 2: Sum the ranks for each group.

$$R_A = 6 + 9 + 9 + 11.5 + 11.5 + 13 + 14 = 74$$

$$R_B = 1 + 2 + 3.5 + 3.5 + 6 + 6 + 9 = 31$$

Check: $74 + 31 = 105$, which matches the expected sum of all ranks, $\frac{14 \times 15}{2} = 105$.

Step 3: Calculate UUU.

$$U_A = 7 \times 7 + \frac{7 \times 8}{2} - 74 = 49 + 28 - 74 = 3$$

$$U_B = 7 \times 7 + \frac{7 \times 8}{2} - 31 = 49 + 28 - 31 = 46$$

$$U = \min(3, 46) = 3$$

Step 4: Determine significance.

For $n_1 = n_2 = 7$ at $\alpha = .05$ (two-tailed), the critical value of $U$ from a Mann-Whitney table is 8. Since $U = 3 < 8$, the result is statistically significant.

Using the normal approximation:

$$z = \frac{3 - \frac{49}{2}}{\sqrt{\frac{49 \times 15}{12}}} = \frac{3 - 24.5}{\sqrt{61.25}} = \frac{-21.5}{7.83} = -2.75$$

This corresponds to $p \approx .006$ (two-tailed).

Step 5: Calculate effect size.

$$r_{rb} = 1 - \frac{2 \times 3}{49} = 1 - 0.12 = 0.88$$

This is a large effect, indicating that satisfaction ratings at Store A are substantially higher than at Store B.
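The worked example can be checked in Python. Note that `scipy.stats.mannwhitneyu` reports the $U$ statistic for the first sample ($R_A - \frac{n_A(n_A+1)}{2} = 46$ here), so take the minimum of the statistic and $n_1 n_2$ minus the statistic to match the hand calculation:

```python
from scipy.stats import mannwhitneyu

store_a = [7, 8, 6, 9, 8, 7, 10]
store_b = [5, 6, 4, 7, 5, 3, 6]

res = mannwhitneyu(store_a, store_b, alternative='two-sided', method='asymptotic')

# scipy reports U for the first sample; the smaller U is n1*n2 minus it
u = min(res.statistic, len(store_a) * len(store_b) - res.statistic)
print(u)           # 3.0, matching the hand calculation
print(res.pvalue)  # close to .006; scipy applies tie and continuity corrections
```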

Interpretation

The Mann-Whitney U test revealed a statistically significant difference in customer satisfaction ratings between Store A (Median = 8) and Store B (Median = 5), $U = 3$, $z = -2.75$, $p = .006$, $r_{rb} = .88$. Store A customers gave significantly higher satisfaction ratings than Store B customers, with a large effect size.

What to consider:

  • The Mann-Whitney U test tells you that one group tends to score higher, but it does not directly estimate how much higher in the original scale units.
  • If the distributions have different shapes (e.g., one is skewed and the other is symmetric), avoid interpreting the result as a difference in medians. Instead, describe it as one group tending to produce higher values than the other.
  • With very small samples, use exact p-values rather than the normal approximation. Most statistical software provides this option.

Common Mistakes

  1. Using it when the t-test is appropriate. If your data are continuous, roughly normal, and reasonably sized, the t-test is more powerful. The Mann-Whitney is not always "safer" — it can be less sensitive to real differences.

  2. Interpreting as a test of medians without checking distribution shape. The Mann-Whitney only compares medians when both group distributions have the same shape. Otherwise, it compares the overall tendency for one group to outscore the other.

  3. Forgetting to report an effect size. The rank-biserial correlation $r_{rb}$ or the common-language effect size (probability of superiority) should accompany the test result.

  4. Ignoring tied ranks. Many ordinal variables produce ties. Software applies corrections automatically, but if you compute by hand, use the average-rank method for ties and the tie-corrected z-formula.

  5. Claiming the test requires no assumptions. Non-parametric does not mean assumption-free. Independence and ordinal measurement are still required, and distribution shape matters for interpretation.
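The tie-corrected z-formula mentioned in mistake 4 reduces the variance of $U$ by $\frac{\sum_j (t_j^3 - t_j)}{N(N-1)}$, where $t_j$ is the number of observations sharing the $j$-th tied value. A sketch using the worked example's data (the function name is illustrative):

```python
import math
from collections import Counter

def tie_corrected_z(u, group1, group2):
    """Large-sample z for the Mann-Whitney U with the standard tie correction."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    # For each distinct value, t is how many observations share it
    tie_term = sum(t**3 - t for t in Counter(group1 + group2).values())
    sigma2 = (n1 * n2 / 12) * ((n + 1) - tie_term / (n * (n - 1)))
    return (u - n1 * n2 / 2) / math.sqrt(sigma2)

store_a = [7, 8, 6, 9, 8, 7, 10]
store_b = [5, 6, 4, 7, 5, 3, 6]
print(tie_corrected_z(3, store_a, store_b))  # about -2.78, vs -2.75 uncorrected
```

With many ties the correction shrinks the variance, so ignoring it makes the test slightly conservative.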

How to Run It

**In R:**

```r
# Mann-Whitney U test in R (called wilcox.test)
wilcox.test(score ~ group, data = mydata, exact = FALSE)

# With effect size (rank-biserial correlation)
library(effectsize)
rank_biserial(score ~ group, data = mydata)
```

**In Python:**

```python
from scipy import stats
import pingouin as pg

# Using scipy
u_stat, p_value = stats.mannwhitneyu(group1, group2, alternative='two-sided')

# Using pingouin (includes effect size)
result = pg.mwu(group1, group2, alternative='two-sided')
print(result)
```
**In SPSS:**

  1. Go to Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples
  2. Move your dependent variable into the Test Variable List
  3. Move your grouping variable into the Grouping Variable box
  4. Click Define Groups and enter the two group codes (e.g., 1 and 2)
  5. Ensure Mann-Whitney U is checked under Test Type
  6. Click OK

SPSS reports the U statistic, the z-value, and the p-value (two-tailed). Calculate the effect size manually: r = z / √N.

**In Excel:**

Excel does not have a built-in Mann-Whitney U function. You can compute it manually:

  1. Combine both groups into one column and add a group label column
  2. Use RANK.AVG to rank all values (handles ties automatically)
  3. Use SUMIF to sum ranks for each group
  4. Calculate U using the formula: =n1*n2 + n1*(n1+1)/2 - R1
  5. For a large-sample z-test, compute the z-value and use NORM.S.DIST for the p-value

Alternatively, install the Real Statistics Resource Pack add-in, which provides a dedicated Mann-Whitney test function.

How to Report in APA Format

> A Mann-Whitney U test was conducted to compare customer satisfaction ratings between Store A and Store B. Satisfaction ratings were significantly higher for Store A (Mdn = 8) than for Store B (Mdn = 5), $U$ = 3, $z$ = -2.75, $p$ = .006, $r_{rb}$ = .88. The large effect size indicates that Store A customers consistently rated their satisfaction higher than Store B customers.

Ready to calculate?

Now that you understand the concept, use the free Effect Size Calculator on Subthesis to run your own analysis.


Related Concepts

Independent Samples t-Test

Learn how to conduct and interpret an independent samples t-test, including assumptions, formulas, worked examples, and APA reporting guidelines.

Effect Size

Learn what effect size is, why it matters more than p-values alone, and how to calculate and interpret Cohen's d, Hedges' g, and eta-squared for your research.

Sample Size Determination

Learn how to calculate the right sample size for your research study using power analysis, effect size estimates, and practical planning considerations.

© 2026 Angel Reyes / Subthesis. All rights reserved.
