Mann-Whitney U Test
What Is the Mann-Whitney U Test?
The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a non-parametric test that compares two independent groups. It is the rank-based alternative to the independent samples t-test and does not require the dependent variable to be normally distributed.
Instead of comparing means directly, the Mann-Whitney U test ranks all observations from both groups together and then asks whether the ranks are systematically higher in one group than the other. Formally, it tests whether a randomly selected observation from one group tends to be larger (or smaller) than a randomly selected observation from the other group.
The test is especially useful when:
- Your data are measured on an ordinal scale (e.g., Likert ratings, rankings)
- Your continuous data are severely skewed or contain outliers
- Your sample size is too small to rely on the Central Limit Theorem
When to Use It
Use a Mann-Whitney U test when:
- You have one dependent variable that is at least ordinal
- You have one categorical independent variable with exactly two independent groups
- The normality assumption of the independent t-test is violated and sample sizes are small
- Your data are ranks or ratings rather than true continuous measurements
Examples:
- Comparing customer satisfaction ratings (1-10 scale) between two stores
- Comparing pain severity rankings between two treatment groups
- Comparing income (often heavily skewed) between two regions with small samples
When to stick with the t-test instead: If your data are continuous and approximately normal (or your samples are large), the independent t-test is more powerful. The Mann-Whitney U test sacrifices some statistical power in exchange for fewer assumptions.
Assumptions
- Independence of observations. Each participant contributes only one data point, and participants in one group are unrelated to those in the other group.
- At least ordinal measurement. The dependent variable must be measured on an ordinal, interval, or ratio scale so that values can be meaningfully ranked.
- Similarly shaped distributions. If you want to interpret the result as a difference in medians, both groups must have distributions of the same shape (though they can differ in location). If the shapes differ, the test is still valid but is interpreted as a general test of stochastic dominance: whether values in one group tend to be larger.
- Continuous underlying distribution (for no ties). Ideally, there are no tied ranks. In practice, ties are common (especially with ordinal data), and software applies a correction automatically.
Formula
Step 1: Rank all observations. Combine both groups and assign ranks from 1 (smallest) to N (largest), where N = n₁ + n₂. Tied values receive the average of the ranks they would have occupied.
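The average-rank rule for ties can be sketched in plain Python (a minimal illustration; the data below are hypothetical):

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; ties share the average
        shared = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

# The two 6s occupy ranks 3 and 4, so each receives (3 + 4) / 2 = 3.5
print(average_ranks([7, 8, 6, 5, 6, 4]))  # [5.0, 6.0, 3.5, 2.0, 3.5, 1.0]
```

In practice you would use scipy.stats.rankdata, which applies the same average-rank method by default.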
Step 2: Calculate the U statistic for each group.

U₁ = n₁n₂ + n₁(n₁ + 1)/2 − R₁
U₂ = n₁n₂ + n₂(n₂ + 1)/2 − R₂

Where R₁ and R₂ are the sums of ranks for Group 1 and Group 2. Note that U₁ + U₂ = n₁n₂.

The test statistic is U = min(U₁, U₂).

Step 3: For large samples (both n₁ and n₂ greater than about 20), the U statistic is approximately normal:

z = (U − μ_U) / σ_U, where μ_U = n₁n₂ / 2 and σ_U = √(n₁n₂(n₁ + n₂ + 1) / 12)

Effect size (rank-biserial correlation):

r = 1 − 2U / (n₁n₂)

Where r ranges from 0 to 1. Use the same benchmarks as a correlation: .10 (small), .30 (medium), .50 (large).
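The formula steps above can be combined into a short sketch (the rank sum and group sizes below are made-up numbers for illustration only):

```python
import math

def mann_whitney_summary(r1, n1, n2):
    """From the rank sum of Group 1, compute U1, U2, U = min, the large-sample z, and rank-biserial r."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1                 # because U1 + U2 = n1 * n2
    u = min(u1, u2)
    mu = n1 * n2 / 2                  # mean of U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # SD of U (no tie correction)
    z = (u - mu) / sigma
    r = 1 - 2 * u / (n1 * n2)         # rank-biserial correlation
    return u1, u2, u, z, r

# Illustrative input: rank sum R1 = 130 with n1 = n2 = 10
u1, u2, u, z, r = mann_whitney_summary(130, 10, 10)
print(u1, u2, u, round(z, 2), r)  # 25.0 75.0 25.0 -1.89 0.5
```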
Worked Example
Scenario: A retail company wants to compare customer satisfaction ratings (1-10 scale) between Store A and Store B. Because ratings are ordinal and the distributions are skewed, a Mann-Whitney U test is appropriate.
| Store A | Store B |
|---|---|
| 7 | 5 |
| 8 | 6 |
| 6 | 4 |
| 9 | 7 |
| 8 | 5 |
| 7 | 3 |
| 10 | 6 |
n₁ = 7 (Store A), n₂ = 7 (Store B), N = 14.
Step 1: Rank all 14 observations.
| Value | Group | Rank |
|---|---|---|
| 3 | B | 1 |
| 4 | B | 2 |
| 5 | B | 3.5 |
| 5 | B | 3.5 |
| 6 | A | 6 |
| 6 | B | 6 |
| 6 | B | 6 |
| 7 | A | 9 |
| 7 | A | 9 |
| 7 | B | 9 |
| 8 | A | 11.5 |
| 8 | A | 11.5 |
| 9 | A | 13 |
| 10 | A | 14 |
Step 2: Sum the ranks for each group.

R₁ = 6 + 9 + 9 + 11.5 + 11.5 + 13 + 14 = 74 (Store A)
R₂ = 1 + 2 + 3.5 + 3.5 + 6 + 6 + 9 = 31 (Store B)

Check: R₁ + R₂ = 74 + 31 = 105, which equals the expected sum of all ranks, N(N + 1)/2 = 14(15)/2 = 105.
Step 3: Calculate U.

U₁ = (7)(7) + 7(8)/2 − 74 = 49 + 28 − 74 = 3
U₂ = (7)(7) + 7(8)/2 − 31 = 49 + 28 − 31 = 46

Check: U₁ + U₂ = 3 + 46 = 49 = n₁n₂. The test statistic is U = min(3, 46) = 3.
Step 4: Determine significance.
For n₁ = n₂ = 7 at α = .05 (two-tailed), the critical value of U from a Mann-Whitney table is 8. Since U = 3 ≤ 8, the result is statistically significant.
Using the normal approximation:

μ_U = (7)(7) / 2 = 24.5
σ_U = √(49(15) / 12) ≈ 7.83
z = (3 − 24.5) / 7.83 ≈ −2.75

This corresponds to p ≈ .006 (two-tailed).
Step 5: Calculate effect size.

r = 1 − 2(3)/49 = 1 − 0.12 ≈ 0.88

This is a large effect, indicating that satisfaction ratings at Store A are substantially higher than at Store B.
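As a sanity check, the worked example can be reproduced with scipy. Note that scipy.stats.mannwhitneyu reports the U statistic for the first sample, so take min(U, n₁n₂ − U) to match the hand calculation:

```python
from scipy import stats

store_a = [7, 8, 6, 9, 8, 7, 10]
store_b = [5, 6, 4, 7, 5, 3, 6]

res = stats.mannwhitneyu(store_a, store_b, alternative='two-sided')
# scipy's statistic is U for store_a; the smaller of the two U values is 3
u = min(res.statistic, len(store_a) * len(store_b) - res.statistic)
print(res.statistic, u)  # 46.0 3.0
print(round(res.pvalue, 3))
```

Because ties are present, scipy uses a (tie-corrected) normal approximation here, so its p-value will be close to, but not exactly, the hand-computed value.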
Interpretation
The Mann-Whitney U test revealed a statistically significant difference in customer satisfaction ratings between Store A (Median = 8) and Store B (Median = 5), U = 3.00, z = −2.75, p = .006, r = .88. Store A customers gave significantly higher satisfaction ratings than Store B customers, with a large effect size.
What to consider:
- The Mann-Whitney U test tells you that one group tends to score higher, but it does not directly estimate how much higher in the original scale units.
- If the distributions have different shapes (e.g., one is skewed and the other is symmetric), avoid interpreting the result as a difference in medians. Instead, describe it as one group tending to produce higher values than the other.
- With very small samples, use exact p-values rather than the normal approximation. Most statistical software provides this option.
Common Mistakes
- Using it when the t-test is appropriate. If your data are continuous, roughly normal, and reasonably sized, the t-test is more powerful. The Mann-Whitney is not always "safer": it can be less sensitive to real differences.
- Interpreting it as a test of medians without checking distribution shape. The Mann-Whitney U test only compares medians when both group distributions have the same shape. Otherwise, it compares the overall tendency for one group to outscore the other.
- Forgetting to report an effect size. The rank-biserial correlation or the common-language effect size (probability of superiority) should accompany the test result.
- Ignoring tied ranks. Many ordinal variables produce ties. Software applies corrections automatically, but if you compute by hand, use the average-rank method for ties and the tie-corrected z-formula.
- Claiming the test requires no assumptions. Non-parametric does not mean assumption-free. Independence and ordinal measurement are still required, and distribution shape matters for interpretation.
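For hand calculations with ties, the tie-corrected standard deviation of U mentioned above can be sketched as follows (assuming the standard correction formula; the pooled data reuse this article's worked example):

```python
import math
from collections import Counter

def sigma_u_ties(values, n1, n2):
    """SD of U under H0 with the standard tie correction, given the pooled sample of both groups."""
    n = n1 + n2
    # Sum of t^3 - t over each group of t tied values
    tie_term = sum(t**3 - t for t in Counter(values).values())
    return math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))

pooled = [7, 8, 6, 9, 8, 7, 10] + [5, 6, 4, 7, 5, 3, 6]
print(round(sigma_u_ties(pooled, 7, 7), 3))  # 7.74, vs. about 7.83 without the correction
```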
How to Run It
In R:

```r
# Mann-Whitney U test (implemented in R as the Wilcoxon rank-sum test)
wilcox.test(score ~ group, data = mydata)

# With effect size (rank-biserial correlation)
library(effectsize)
rank_biserial(score ~ group, data = mydata)
```

In Python:
```python
from scipy import stats
import pingouin as pg
# Using scipy
u_stat, p_value = stats.mannwhitneyu(group1, group2, alternative='two-sided')
# Using pingouin (includes effect size)
result = pg.mwu(group1, group2, alternative='two-sided')
print(result)
```
In SPSS:
1. Go to Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples
2. Move your dependent variable into the Test Variable List
3. Move your grouping variable into the Grouping Variable box
4. Click Define Groups and enter the two group codes (e.g., 1 and 2)
5. Ensure Mann-Whitney U is checked under Test Type
6. Click OK
SPSS reports the U statistic, the z-value, and the p-value (two-tailed). Calculate the effect size manually: r = z / √N.
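The z-to-r conversion for SPSS output is a one-liner. Note that this z-based r is a different effect-size measure from the rank-biserial correlation used elsewhere in this article, so the two values will not match numerically:

```python
import math

def effect_size_r_from_z(z, n_total):
    """Effect size r = |z| / sqrt(N), for use with the z-value SPSS reports."""
    return abs(z) / math.sqrt(n_total)

# For example, with z = -2.75 and N = 14
print(round(effect_size_r_from_z(-2.75, 14), 2))  # 0.73
```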
In Excel:

Excel does not have a built-in Mann-Whitney U function. You can compute it manually:
1. Combine both groups into one column and add a group label column
2. Use RANK.AVG to rank all values (it handles ties automatically)
3. Use SUMIF to sum the ranks for each group
4. Calculate U using the formula: =n1*n2 + n1*(n1+1)/2 - R1
5. For a large-sample z-test, compute the z-value and use NORM.S.DIST for the p-value
Alternatively, install the Real Statistics Resource Pack add-in, which provides a dedicated Mann-Whitney test function.
Ready to calculate?
Now that you understand the concept, use the free Effect Size Calculator on Subthesis to run your own analysis.
Related Concepts
Independent Samples t-Test
Learn how to conduct and interpret an independent samples t-test, including assumptions, formulas, worked examples, and APA reporting guidelines.
Effect Size
Learn what effect size is, why it matters more than p-values alone, and how to calculate and interpret Cohen's d, Hedges' g, and eta-squared for your research.
Sample Size Determination
Learn how to calculate the right sample size for your research study using power analysis, effect size estimates, and practical planning considerations.