Kruskal-Wallis H Test
What Is the Kruskal-Wallis H Test?
The Kruskal-Wallis H test (also called the Kruskal-Wallis one-way analysis of variance by ranks) is a non-parametric test that compares three or more independent groups. It is the rank-based alternative to the one-way ANOVA and extends the Mann-Whitney U test to more than two groups.
Like other rank-based tests, the Kruskal-Wallis test works by ranking all observations from all groups together and then testing whether the average ranks differ significantly across groups. If one group tends to have consistently higher (or lower) values, its average rank will deviate from the overall average rank.
The test produces an H statistic, which approximately follows a chi-square distribution when the null hypothesis is true. A sufficiently large H indicates that at least one group differs from the others.
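The pooled ranking described above can be sketched in a few lines of plain Python. The group names and scores below are hypothetical, and `average_ranks` is an illustrative helper rather than a library function; it assigns tied values the mean of the ranks they occupy and then compares each group's mean rank with the overall mean rank, $(N + 1)/2$.

```python
def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2  # mean of the 1-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical scores for three small groups (illustration only).
groups = {"A": [3, 5, 4], "B": [6, 5, 8], "C": [9, 7, 10]}

pooled = [v for vals in groups.values() for v in vals]
labels = [name for name, vals in groups.items() for _ in vals]
ranks = average_ranks(pooled)

overall_mean_rank = (len(pooled) + 1) / 2
mean_rank = {
    name: sum(r for r, g in zip(ranks, labels) if g == name) / len(vals)
    for name, vals in groups.items()
}

print("overall mean rank:", overall_mean_rank)
for name, value in mean_rank.items():
    print(f"group {name}: mean rank = {value:.2f}")
```

Group A's mean rank falls well below the overall mean and group C's well above it, which is the pattern the H statistic summarizes.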
When to Use It
Use a Kruskal-Wallis H test when:
- You have one dependent variable that is at least ordinal
- You have one categorical independent variable with three or more independent groups
- The normality or homogeneity of variances assumption of one-way ANOVA is violated
- Your sample sizes are small and you cannot rely on the robustness of ANOVA
Examples:
- Comparing anxiety scores across three therapy types (CBT, psychodynamic, medication)
- Comparing customer satisfaction rankings across four product brands
- Comparing pain levels across three dosage groups when data are ordinal
When to use one-way ANOVA instead: If your data are continuous, approximately normal within each group, and the variances are similar, one-way ANOVA has more statistical power. With large, balanced samples in every group, ANOVA is also robust to moderate violations of normality.
Assumptions
- Independence of observations. Each participant contributes one data point, and participants in different groups are unrelated.
- At least ordinal measurement. The dependent variable must be rankable.
- Similarly shaped distributions. For interpreting the result as a comparison of medians, the distributions in all groups should have the same shape (but may differ in central tendency). If the shapes differ, the Kruskal-Wallis test is still valid but is interpreted as a test of whether the groups differ in their overall distribution of ranks.
Formula
Step 1: Rank all observations. Combine all groups and assign ranks from 1 to $N$, where $N$ is the total sample size. Tied values receive average ranks.
Step 2: Calculate the H statistic.
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$
Where:
- $k$ = number of groups
- $n_j$ = sample size in group $j$
- $R_j$ = sum of ranks in group $j$
- $N$ = total sample size
Tie correction: When there are tied ranks, divide $H$ by:
$$C = 1 - \frac{\sum_{i} (t_i^3 - t_i)}{N^3 - N}$$
Where $t_i$ is the number of tied observations in the $i$th group of ties.
Degrees of freedom:
$$df = k - 1$$
Under $H_0$, the test statistic approximately follows a $\chi^2$ distribution with $k - 1$ degrees of freedom (provided each group has at least 5 observations).
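As a rough sketch of the two steps and the tie correction, the function below computes $H = \frac{12}{N(N+1)} \sum_j R_j^2 / n_j - 3(N+1)$ and divides it by $C = 1 - \sum_i (t_i^3 - t_i) / (N^3 - N)$. The name `kruskal_wallis_h` and the sample data are hypothetical; `scipy.stats.rankdata` supplies the average ranks, and `scipy.stats.kruskal` is called only as a cross-check.

```python
from collections import Counter

import numpy as np
from scipy import stats


def kruskal_wallis_h(*groups):
    """Return (uncorrected H, tie-corrected H, degrees of freedom)."""
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = pooled.size
    ranks = stats.rankdata(pooled)  # tied values receive average ranks

    # Rank sum R_j for each group, walking through the pooled rank vector.
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(ranks[start:start + len(g)].sum())
        start += len(g)

    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)

    # Tie correction: t_i is the size of each set of tied values.
    tie_sizes = Counter(pooled.tolist()).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h, h / correction, len(groups) - 1


# Hypothetical data containing a few ties (illustration only).
g1, g2, g3 = [1, 3, 5, 7, 9], [2, 3, 6, 8, 10], [5, 9, 11, 12, 13]
h_raw, h_corrected, df = kruskal_wallis_h(g1, g2, g3)
print(f"H = {h_raw:.4f}, tie-corrected H = {h_corrected:.4f}, df = {df}")
print(f"scipy.stats.kruskal: {stats.kruskal(g1, g2, g3).statistic:.4f}")
```

SciPy applies the tie correction itself, so its statistic should agree with the corrected value rather than the raw one.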
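Effect size (epsilon-squared):
$$\epsilon^2 = \frac{H}{(N^2 - 1)/(N + 1)} = \frac{H}{N - 1}$$
This ranges from 0 to 1 and represents the proportion of variance in ranks explained by group membership. An alternative is eta-squared based on H:
$$\eta^2_H = \frac{H - k + 1}{N - k}$$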
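Both effect sizes are simple arithmetic on $H$. The helpers below are a minimal sketch that assumes the commonly used definitions $\epsilon^2 = H / (N - 1)$ and $\eta^2_H = (H - k + 1) / (N - k)$; the function names and the input values are hypothetical.

```python
def epsilon_squared(h, n_total):
    """Epsilon-squared: H / ((N^2 - 1) / (N + 1)), which simplifies to H / (N - 1)."""
    return h / (n_total - 1)


def eta_squared_h(h, k, n_total):
    """Eta-squared based on H: (H - k + 1) / (N - k)."""
    return (h - k + 1) / (n_total - k)


# Hypothetical result: H = 8.0 from k = 3 groups and N = 30 observations.
eps2 = epsilon_squared(8.0, 30)
eta2 = eta_squared_h(8.0, 3, 30)
print(f"epsilon-squared = {eps2:.3f}")  # 8 / 29
print(f"eta-squared (H) = {eta2:.3f}")  # 6 / 27
```

The two estimators usually give similar values; $\eta^2_H$ is slightly smaller because it subtracts the degrees of freedom from $H$.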
Worked Example
Scenario: A clinical psychologist compares anxiety scores (measured on a 0-20 ordinal self-report scale) across three therapy types: Cognitive-Behavioral Therapy (CBT), Psychodynamic Therapy, and Medication Only. Each group has 6 patients, and the outcome is measured after 12 weeks of treatment.
| CBT | Psychodynamic | Medication |
|---|---|---|
| 5 | 9 | 12 |
| 3 | 11 | 14 |
| 7 | 8 | 10 |
| 4 | 10 | 15 |
| 6 | 7 | 13 |
| 2 | 12 | 11 |
$k = 3$, $N = 18$.
Step 1: Rank all 18 observations.
Each group contributes exactly 6 values. In ascending order they are CBT: 2, 3, 4, 5, 6, 7; Psychodynamic: 7, 8, 9, 10, 11, 12; Medication: 10, 11, 12, 13, 14, 15.
Combined and sorted: 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 11, 12, 12, 13, 14, 15.
Four values (7, 10, 11, and 12) each occur twice. Each tied pair receives the average of the two ranks it occupies; for example, the two 7s occupy ranks 6 and 7, so each is ranked 6.5.
| Rank | Value | Group |
|---|---|---|
| 1 | 2 | CBT |
| 2 | 3 | CBT |
| 3 | 4 | CBT |
| 4 | 5 | CBT |
| 5 | 6 | CBT |
| 6.5 | 7 | CBT |
| 6.5 | 7 | Psych |
| 8 | 8 | Psych |
| 9 | 9 | Psych |
| 10.5 | 10 | Psych |
| 10.5 | 10 | Med |
| 12.5 | 11 | Psych |
| 12.5 | 11 | Med |
| 14.5 | 12 | Psych |
| 14.5 | 12 | Med |
| 16 | 13 | Med |
| 17 | 14 | Med |
| 18 | 15 | Med |
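Step 2: Sum the ranks for each group.
$$R_{\text{CBT}} = 1 + 2 + 3 + 4 + 5 + 6.5 = 21.5$$
$$R_{\text{Psych}} = 6.5 + 8 + 9 + 10.5 + 12.5 + 14.5 = 61$$
$$R_{\text{Med}} = 10.5 + 12.5 + 14.5 + 16 + 17 + 18 = 88.5$$
Check: $21.5 + 61 + 88.5 = 171 = \frac{18 \times 19}{2}$. Correct.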
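Steps 1 and 2 can be checked with a short script. This sketch uses the scores from the data table above and relies on `scipy.stats.rankdata`, whose default method assigns average ranks to ties; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Scores from the worked example table.
cbt = [5, 3, 7, 4, 6, 2]
psych = [9, 11, 8, 10, 7, 12]
med = [12, 14, 10, 15, 13, 11]

pooled = np.array(cbt + psych + med)
ranks = stats.rankdata(pooled)  # ties receive the average of the ranks they span

rank_sums = {
    "CBT": float(ranks[:6].sum()),
    "Psychodynamic": float(ranks[6:12].sum()),
    "Medication": float(ranks[12:].sum()),
}
n_total = pooled.size

for name, total in rank_sums.items():
    print(f"{name}: rank sum = {total}")

# The rank sums must add up to N(N + 1) / 2.
print("check:", sum(rank_sums.values()), "==", n_total * (n_total + 1) / 2)
```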
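Step 3: Calculate the H statistic.
$$H = \frac{12}{18(19)} \left( \frac{21.5^2}{6} + \frac{61^2}{6} + \frac{88.5^2}{6} \right) - 3(19) = \frac{12}{342}(2002.58) - 57 \approx 13.27$$
Applying the tie correction for the four tied pairs (7, 10, 11, and 12) gives $C = 1 - \frac{24}{5814} \approx 0.996$ and $H / C \approx 13.32$, so the correction barely changes the result here.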
Step 4: Determine the p-value.
With $df = k - 1 = 2$, we compare $H$ to the chi-square distribution. The critical value at $\alpha = .05$ with $df = 2$ is 5.99. Since $H = 13.27 > 5.99$, the result is statistically significant ($p \approx .001$).
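The critical value and p-value can be reproduced with SciPy. This is a sketch using the worked example's scores; note that `scipy.stats.kruskal` applies the tie correction automatically, so its H can differ slightly from an uncorrected hand calculation.

```python
from scipy import stats

# Scores from the worked example table.
cbt = [5, 3, 7, 4, 6, 2]
psych = [9, 11, 8, 10, 7, 12]
med = [12, 14, 10, 15, 13, 11]

result = stats.kruskal(cbt, psych, med)  # tie-corrected H and asymptotic p-value
df = 3 - 1

critical = stats.chi2.ppf(0.95, df)               # critical value at alpha = .05
p_from_chi2 = stats.chi2.sf(result.statistic, df)  # upper-tail chi-square probability

print(f"H = {result.statistic:.2f}, p = {result.pvalue:.4f}")
print(f"critical value (alpha = .05, df = {df}) = {critical:.2f}")
print("significant:", result.statistic > critical)
```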
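Step 5: Calculate effect size.
$$\epsilon^2 = \frac{H}{N - 1} = \frac{13.27}{17} \approx 0.78$$
The alternative estimate is $\eta^2_H = \frac{H - k + 1}{N - k} = \frac{13.27 - 2}{15} \approx 0.75$. Either way this is a large effect: group membership explains roughly three-quarters of the variance in ranks.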
Post-Hoc Tests
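A significant Kruskal-Wallis test tells you that at least one group differs, but not which specific groups differ. Follow up with pairwise Mann-Whitney U tests using a Bonferroni correction:
$$\alpha_{\text{adjusted}} = \frac{\alpha}{m}$$
Where $m$ is the number of pairwise comparisons. With 3 groups: $m = \frac{3 \times 2}{2} = 3$, so $\alpha_{\text{adjusted}} = .05 / 3 \approx .017$.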
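A minimal sketch of the Bonferroni-corrected follow-up, using the worked example's scores. It runs `scipy.stats.mannwhitneyu` on every pair of groups and compares each p-value with $\alpha / m$; the dictionary layout and variable names are illustrative choices.

```python
from itertools import combinations

from scipy import stats

groups = {
    "CBT": [5, 3, 7, 4, 6, 2],
    "Psychodynamic": [9, 11, 8, 10, 7, 12],
    "Medication": [12, 14, 10, 15, 13, 11],
}

alpha = 0.05
pairs = list(combinations(groups, 2))  # every unordered pair of group names
alpha_adjusted = alpha / len(pairs)    # Bonferroni: alpha divided by m comparisons

pairwise_p = {}
for name_a, name_b in pairs:
    res = stats.mannwhitneyu(groups[name_a], groups[name_b], alternative="two-sided")
    pairwise_p[(name_a, name_b)] = res.pvalue
    verdict = "significant" if res.pvalue < alpha_adjusted else "not significant"
    print(f"{name_a} vs {name_b}: U = {res.statistic}, p = {res.pvalue:.4f} ({verdict})")

print(f"adjusted alpha = {alpha_adjusted:.4f}")
```

An equivalent approach is to multiply each p-value by $m$ and compare it with the original $\alpha$.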
An alternative is Dunn's test, which is specifically designed as a post-hoc test for the Kruskal-Wallis and uses the rank sums from the omnibus test rather than re-ranking within each pair.
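To show how Dunn's test reuses the pooled ranking, here is a hand-rolled sketch on the worked example's scores. It assumes the usual z statistic, the difference in mean ranks divided by $\sqrt{\left[\frac{N(N+1)}{12} - \frac{\sum (t^3 - t)}{12(N-1)}\right]\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$, with Bonferroni-adjusted two-sided p-values. It is meant as an illustration, not a replacement for a tested package.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np
from scipy import stats

groups = {
    "CBT": [5, 3, 7, 4, 6, 2],
    "Psychodynamic": [9, 11, 8, 10, 7, 12],
    "Medication": [12, 14, 10, 15, 13, 11],
}

pooled = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
n_total = pooled.size
ranks = stats.rankdata(pooled)  # one pooled ranking, shared by every comparison

# Mean rank and size of each group, taken from the pooled ranking.
mean_ranks, sizes, start = {}, {}, 0
for name, vals in groups.items():
    mean_ranks[name] = float(ranks[start:start + len(vals)].mean())
    sizes[name] = len(vals)
    start += len(vals)

# Variance term with the tie adjustment.
tie_term = sum(t ** 3 - t for t in Counter(pooled.tolist()).values()) / (12 * (n_total - 1))
base_var = n_total * (n_total + 1) / 12 - tie_term

pairs = list(combinations(groups, 2))
adjusted_p = {}
for a, b in pairs:
    se = math.sqrt(base_var * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_ranks[a] - mean_ranks[b]) / se
    p = 2 * stats.norm.sf(abs(z))                # two-sided p-value
    adjusted_p[(a, b)] = min(1.0, p * len(pairs))  # Bonferroni adjustment
    print(f"{a} vs {b}: z = {z:.2f}, adjusted p = {adjusted_p[(a, b)]:.4f}")
```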
Interpretation
The Kruskal-Wallis test revealed a statistically significant difference in anxiety scores across the three therapy types, $H(2) = 13.27$, $p = .001$, $\epsilon^2 = .78$.
Post-hoc pairwise comparisons with Bonferroni correction are needed to confirm which of the following differences in medians are statistically significant:
- CBT (Mdn = 4.5) < Psychodynamic (Mdn = 9.5)
- CBT (Mdn = 4.5) < Medication (Mdn = 12.5)
- Psychodynamic (Mdn = 9.5) < Medication (Mdn = 12.5)
The CBT group had the lowest post-treatment anxiety scores, followed by the Psychodynamic group, with the Medication Only group having the highest anxiety.
Common Mistakes
- Stopping at the omnibus test. A significant H tells you that at least one group differs. You must run post-hoc pairwise tests (e.g., Dunn's test or Bonferroni-corrected Mann-Whitney U tests) to identify which groups differ.
- Running multiple Mann-Whitney tests without correction. Performing all pairwise comparisons at $\alpha = .05$ each inflates the Type I error rate, just as running multiple t-tests does. Apply a correction (Bonferroni or use Dunn's test).
- Interpreting as a test of medians when distributions differ in shape. If one group is skewed and another is symmetric, the Kruskal-Wallis test may be significant even when medians are identical. It is technically a test of the distribution of ranks.
- Using it with very small groups. The chi-square approximation for $H$ is unreliable when any group has fewer than 5 observations. Use the exact test in such cases.
- Forgetting to report an effect size. Report $\epsilon^2$ or $\eta^2_H$ alongside $H$, $df$, and $p$ for a complete picture.
- Using Kruskal-Wallis for repeated measures. If the same participants are measured across three or more conditions, use the Friedman test instead (the non-parametric equivalent of repeated-measures ANOVA).
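For the small-group case mentioned above, one option is a permutation test. The sketch below is a Monte Carlo approximation to the exact permutation distribution of H rather than a full enumeration; the function name `permutation_kruskal`, the number of resamples, and the sample data are all hypothetical choices.

```python
import numpy as np
from scipy import stats


def permutation_kruskal(groups, n_resamples=2000, seed=0):
    """Approximate the permutation p-value of H by reshuffling group labels."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    split_points = np.cumsum([len(g) for g in groups])[:-1]
    observed = stats.kruskal(*groups).statistic

    hits = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        h = stats.kruskal(*np.split(shuffled, split_points)).statistic
        if h >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1

    # Add-one adjustment keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical data: three groups of three observations each.
small_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_obs, p_perm = permutation_kruskal(small_groups)
print(f"H = {h_obs:.2f}, permutation p = {p_perm:.4f}")
```

With groups this small, the permutation p-value is generally more trustworthy than the chi-square approximation.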
How to Run It
In R:
```r
# Kruskal-Wallis test
kruskal.test(score ~ group, data = mydata)

# Post-hoc pairwise comparisons (Dunn's test)
library(dunn.test)
dunn.test(mydata$score, mydata$group, method = "bonferroni")

# Effect size (epsilon-squared)
library(effectsize)
rank_epsilon_squared(score ~ group, data = mydata)
```
```python
from scipy import stats
import scikit_posthocs as sp
# Kruskal-Wallis test
h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
# Post-hoc Dunn's test with Bonferroni correction
dunn = sp.posthoc_dunn([group1, group2, group3], p_adjust='bonferroni')
print(dunn)
```
In SPSS:
1. Go to Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples
2. Move your dependent variable into the Test Variable List
3. Move your grouping variable into the Grouping Variable box
4. Click Define Range and enter the minimum and maximum group codes
5. Ensure Kruskal-Wallis H is checked
6. Click OK
SPSS reports the H statistic, degrees of freedom, and asymptotic p-value. For post-hoc tests, use Analyze > Nonparametric Tests > Independent Samples (the newer dialog), which offers pairwise comparisons automatically.
Excel does not have a built-in Kruskal-Wallis test. To compute it manually:
1. Combine all data into one column with a group label column
2. Use RANK.AVG to rank all values
3. Use SUMIF and COUNTIF to calculate the rank sum and sample size for each group
4. Apply the H formula: =12/(N*(N+1)) * (SUM(Rj^2/nj)) - 3*(N+1)
5. Use CHISQ.DIST.RT(H, df) to obtain the p-value
For a more automated approach, install the Real Statistics Resource Pack add-in, which includes a dedicated Kruskal-Wallis function with post-hoc tests.