Mann-Whitney U Test
What Is the Mann-Whitney U Test?
The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a non-parametric test that compares two independent groups. It is the rank-based alternative to the independent samples t-test and does not require the dependent variable to be normally distributed.
Instead of comparing means directly, the Mann-Whitney U test ranks all observations from both groups together and then asks whether the ranks are systematically higher in one group than the other. Formally, it tests whether a randomly selected observation from one group tends to be larger (or smaller) than a randomly selected observation from the other group.
The test is especially useful when:
- Your data are measured on an ordinal scale (e.g., Likert ratings, rankings)
- Your continuous data are severely skewed or contain outliers
- Your sample size is too small to rely on the Central Limit Theorem
When to Use It
Use a Mann-Whitney U test when:
- You have one dependent variable that is at least ordinal
- You have one categorical independent variable with exactly two independent groups
- The normality assumption of the independent t-test is violated and sample sizes are small
- Your data are ranks or ratings rather than true continuous measurements
Examples:
- Comparing customer satisfaction ratings (1-10 scale) between two stores
- Comparing pain severity rankings between two treatment groups
- Comparing income (often heavily skewed) between two regions with small samples
When to stick with the t-test instead: If your data are continuous and approximately normal (or your samples are large), the independent t-test is more powerful. The Mann-Whitney U test sacrifices some statistical power in exchange for fewer assumptions.
Assumptions
- Independence of observations. Each participant contributes only one data point, and participants in one group are unrelated to those in the other group.
- At least ordinal measurement. The dependent variable must be measured on an ordinal, interval, or ratio scale so that values can be meaningfully ranked.
- Similarly shaped distributions. If you want to interpret the result as a difference in medians, both groups must have distributions of the same shape (though they can differ in location). If the shapes differ, the test is still valid but is interpreted as a general test of stochastic dominance: whether values in one group tend to be larger.
- Continuous underlying distribution (for no ties). Ideally, there are no tied ranks. In practice, ties are common (especially with ordinal data), and software applies a correction automatically.
Formula
Step 1: Rank all observations. Combine both groups and assign ranks from 1 (smallest) to N (largest), where N = n₁ + n₂. Tied values receive the average of the ranks they would have occupied.
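The average-rank rule for ties can be sketched in plain Python (a minimal illustration; the data below are hypothetical):

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; ties share the average
        shared = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

# The two 6s occupy ranks 3 and 4, so each receives (3 + 4) / 2 = 3.5
print(average_ranks([7, 8, 6, 5, 6, 4]))  # [5.0, 6.0, 3.5, 2.0, 3.5, 1.0]
```

In practice you would use scipy.stats.rankdata, which applies the same average-rank method by default.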
Step 2: Calculate the U statistic for each group.

U₁ = n₁n₂ + n₁(n₁ + 1)/2 − R₁
U₂ = n₁n₂ + n₂(n₂ + 1)/2 − R₂

Where R₁ and R₂ are the sums of ranks for Group 1 and Group 2. Note that U₁ + U₂ = n₁n₂.

The test statistic is U = min(U₁, U₂).

Step 3: For large samples (both n₁ and n₂ greater than about 20), the U statistic is approximately normal:

z = (U − μ_U) / σ_U, where μ_U = n₁n₂ / 2 and σ_U = √(n₁n₂(n₁ + n₂ + 1) / 12)

Effect size (rank-biserial correlation):

r = 1 − 2U / (n₁n₂)

Where r ranges from 0 to 1. Use the same benchmarks as a correlation: .10 (small), .30 (medium), .50 (large).
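The formula steps above can be combined into a short sketch (the rank sum and group sizes below are made-up numbers for illustration only):

```python
import math

def mann_whitney_summary(r1, n1, n2):
    """From the rank sum of Group 1, compute U1, U2, U = min, the large-sample z, and rank-biserial r."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1                 # because U1 + U2 = n1 * n2
    u = min(u1, u2)
    mu = n1 * n2 / 2                  # mean of U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # SD of U (no tie correction)
    z = (u - mu) / sigma
    r = 1 - 2 * u / (n1 * n2)         # rank-biserial correlation
    return u1, u2, u, z, r

# Illustrative input: rank sum R1 = 130 with n1 = n2 = 10
u1, u2, u, z, r = mann_whitney_summary(130, 10, 10)
print(u1, u2, u, round(z, 2), r)  # 25.0 75.0 25.0 -1.89 0.5
```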
Worked Example
Scenario: A retail company wants to compare customer satisfaction ratings (1-10 scale) between Store A and Store B. Because ratings are ordinal and the distributions are skewed, a Mann-Whitney U test is appropriate.
| Store A | Store B |
|---|---|
| 7 | 5 |
| 8 | 6 |
| 6 | 4 |
| 9 | 7 |
| 8 | 5 |
| 7 | 3 |
| 10 | 6 |
n₁ = 7 (Store A), n₂ = 7 (Store B), N = 14.
Step 1: Rank all 14 observations.
| Value | Group | Rank |
|---|---|---|
| 3 | B | 1 |
| 4 | B | 2 |
| 5 | B | 3.5 |
| 5 | B | 3.5 |
| 6 | A | 6 |
| 6 | B | 6 |
| 6 | B | 6 |
| 7 | A | 9 |
| 7 | A | 9 |
| 7 | B | 9 |
| 8 | A | 11.5 |
| 8 | A | 11.5 |
| 9 | A | 13 |
| 10 | A | 14 |
Step 2: Sum the ranks for each group.

R₁ = 6 + 9 + 9 + 11.5 + 11.5 + 13 + 14 = 74 (Store A)
R₂ = 1 + 2 + 3.5 + 3.5 + 6 + 6 + 9 = 31 (Store B)

Check: R₁ + R₂ = 74 + 31 = 105, which equals the expected sum of all ranks, N(N + 1)/2 = 14(15)/2 = 105.
Step 3: Calculate U.

U₁ = (7)(7) + 7(8)/2 − 74 = 49 + 28 − 74 = 3
U₂ = (7)(7) + 7(8)/2 − 31 = 49 + 28 − 31 = 46

Check: U₁ + U₂ = 3 + 46 = 49 = n₁n₂. The test statistic is U = min(3, 46) = 3.
Step 4: Determine significance.
For n₁ = n₂ = 7 at α = .05 (two-tailed), the critical value of U from a Mann-Whitney table is 8. Since U = 3 ≤ 8, the result is statistically significant.
Using the normal approximation:

μ_U = (7)(7) / 2 = 24.5
σ_U = √(49(15) / 12) ≈ 7.83
z = (3 − 24.5) / 7.83 ≈ −2.75

This corresponds to p ≈ .006 (two-tailed).
Step 5: Calculate effect size.

r = 1 − 2(3)/49 = 1 − 0.12 ≈ 0.88

This is a large effect, indicating that satisfaction ratings at Store A are substantially higher than at Store B.
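As a sanity check, the worked example can be reproduced with scipy. Note that scipy.stats.mannwhitneyu reports the U statistic for the first sample, so take min(U, n₁n₂ − U) to match the hand calculation:

```python
from scipy import stats

store_a = [7, 8, 6, 9, 8, 7, 10]
store_b = [5, 6, 4, 7, 5, 3, 6]

res = stats.mannwhitneyu(store_a, store_b, alternative='two-sided')
# scipy's statistic is U for store_a; the smaller of the two U values is 3
u = min(res.statistic, len(store_a) * len(store_b) - res.statistic)
print(res.statistic, u)  # 46.0 3.0
print(round(res.pvalue, 3))
```

Because ties are present, scipy uses a (tie-corrected) normal approximation here, so its p-value will be close to, but not exactly, the hand-computed value.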
Interpretation
The Mann-Whitney U test revealed a statistically significant difference in customer satisfaction ratings between Store A (Median = 8) and Store B (Median = 5), U = 3.00, z = −2.75, p = .006, r = .88. Store A customers gave significantly higher satisfaction ratings than Store B customers, with a large effect size.
What to consider:
- The Mann-Whitney U test tells you that one group tends to score higher, but it does not directly estimate how much higher in the original scale units.
- If the distributions have different shapes (e.g., one is skewed and the other is symmetric), avoid interpreting the result as a difference in medians. Instead, describe it as one group tending to produce higher values than the other.
- With very small samples, use exact p-values rather than the normal approximation. Most statistical software provides this option.
Common Mistakes
- Using it when the t-test is appropriate. If your data are continuous, roughly normal, and reasonably sized, the t-test is more powerful. The Mann-Whitney is not always "safer": it can be less sensitive to real differences.
- Interpreting it as a test of medians without checking distribution shape. The Mann-Whitney U test only compares medians when both group distributions have the same shape. Otherwise, it compares the overall tendency for one group to outscore the other.
- Forgetting to report an effect size. The rank-biserial correlation or the common-language effect size (probability of superiority) should accompany the test result.
- Ignoring tied ranks. Many ordinal variables produce ties. Software applies corrections automatically, but if you compute by hand, use the average-rank method for ties and the tie-corrected z-formula.
- Claiming the test requires no assumptions. Non-parametric does not mean assumption-free. Independence and ordinal measurement are still required, and distribution shape matters for interpretation.
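For hand calculations with ties, the tie-corrected standard deviation of U mentioned above can be sketched as follows (assuming the standard correction formula; the pooled data reuse this article's worked example):

```python
import math
from collections import Counter

def sigma_u_ties(values, n1, n2):
    """SD of U under H0 with the standard tie correction, given the pooled sample of both groups."""
    n = n1 + n2
    # Sum of t^3 - t over each group of t tied values
    tie_term = sum(t**3 - t for t in Counter(values).values())
    return math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))

pooled = [7, 8, 6, 9, 8, 7, 10] + [5, 6, 4, 7, 5, 3, 6]
print(round(sigma_u_ties(pooled, 7, 7), 3))  # 7.74, vs. about 7.83 without the correction
```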
How to Run It
In R:

```r
# Mann-Whitney U test (implemented in R as the Wilcoxon rank-sum test)
wilcox.test(score ~ group, data = mydata)

# With effect size (rank-biserial correlation)
library(effectsize)
rank_biserial(score ~ group, data = mydata)
```

In Python:
```python
from scipy import stats
import pingouin as pg
# Using scipy
u_stat, p_value = stats.mannwhitneyu(group1, group2, alternative='two-sided')
# Using pingouin (includes effect size)
result = pg.mwu(group1, group2, alternative='two-sided')
print(result)
```
In SPSS:
1. Go to Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples
2. Move your dependent variable into the Test Variable List
3. Move your grouping variable into the Grouping Variable box
4. Click Define Groups and enter the two group codes (e.g., 1 and 2)
5. Ensure Mann-Whitney U is checked under Test Type
6. Click OK
SPSS reports the U statistic, the z-value, and the p-value (two-tailed). Calculate the effect size manually: r = z / √N.
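The z-to-r conversion for SPSS output is a one-liner. Note that this z-based r is a different effect-size measure from the rank-biserial correlation used elsewhere in this article, so the two values will not match numerically:

```python
import math

def effect_size_r_from_z(z, n_total):
    """Effect size r = |z| / sqrt(N), for use with the z-value SPSS reports."""
    return abs(z) / math.sqrt(n_total)

# For example, with z = -2.75 and N = 14
print(round(effect_size_r_from_z(-2.75, 14), 2))  # 0.73
```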
In Excel:

Excel does not have a built-in Mann-Whitney U function. You can compute it manually:
1. Combine both groups into one column and add a group label column
2. Use RANK.AVG to rank all values (it handles ties automatically)
3. Use SUMIF to sum the ranks for each group
4. Calculate U using the formula: =n1*n2 + n1*(n1+1)/2 - R1
5. For a large-sample z-test, compute the z-value and use NORM.S.DIST for the p-value
Alternatively, install the Real Statistics Resource Pack add-in, which provides a dedicated Mann-Whitney test function.
Ready to calculate?
Now that you understand the concept, use the free Effect Size Calculator on Subthesis to run your own analysis.
Related Concepts
Independent Samples t-Test
Learn how to conduct and interpret an independent samples t-test, including assumptions, formulas, worked examples, and APA reporting guidelines.
Effect Size
Learn what effect size is, why it matters more than p-values alone, and how to calculate and interpret Cohen's d, Hedges' g, and eta-squared for your research.
Sample Size Determination
Learn how to calculate the right sample size for your research study using power analysis, effect size estimates, and practical planning considerations.