Kruskal-Wallis H Test

Purpose

Tests whether three or more independent groups differ on an ordinal or continuous variable when the assumptions of one-way ANOVA are not met.

When to Use

When you have three or more independent groups and the dependent variable is ordinal or not normally distributed.

Data Type

Ordinal or continuous dependent variable; categorical independent variable with 3+ groups

Key Assumptions

Independence of observations, similarly shaped distributions in each group, at least ordinal measurement level.

Tools

Effect Size Calculator on Subthesis →

What Is the Kruskal-Wallis H Test?

The Kruskal-Wallis H test (also called the Kruskal-Wallis one-way analysis of variance by ranks) is a non-parametric test that compares three or more independent groups. It is the rank-based alternative to the one-way ANOVA and extends the Mann-Whitney U test to more than two groups.

Like other rank-based tests, the Kruskal-Wallis test works by ranking all observations from all groups together and then testing whether the average ranks differ significantly across groups. If one group tends to have consistently higher (or lower) values, its average rank will deviate from the overall average rank.

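The test produces an H statistic which, under the null hypothesis that all groups come from the same distribution, approximately follows a chi-square distribution with $k - 1$ degrees of freedom ($k$ = number of groups). A large H indicates that at least one group differs from the others.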

When to Use It

Use a Kruskal-Wallis H test when:

You have one dependent variable that is at least ordinal
You have one categorical independent variable with three or more independent groups
The normality or homogeneity of variances assumption of one-way ANOVA is violated
Your sample sizes are small and you cannot rely on the robustness of ANOVA

Examples:

Comparing anxiety scores across three therapy types (CBT, psychodynamic, medication)
Comparing customer satisfaction rankings across four product brands
Comparing pain levels across three dosage groups when data are ordinal

When to use one-way ANOVA instead: If your data are continuous, approximately normal within each group, and the variances are similar, one-way ANOVA has more statistical power. With large, balanced samples ($n \geq 25$ per group), ANOVA is robust even to moderate violations.

Assumptions

Independence of observations. Each participant contributes one data point, and participants in different groups are unrelated.
At least ordinal measurement. The dependent variable must be rankable.
Similarly shaped distributions. For interpreting the result as a comparison of medians, the distributions in all groups should have the same shape (but may differ in central tendency). If the shapes differ, the Kruskal-Wallis test is still valid but is interpreted as a test of whether the groups differ in their overall distribution of ranks.

Formula

Step 1: Rank all observations. Combine all groups and assign ranks from 1 to $N$, where $N = \sum n_j$ is the total sample size. Tied values receive average ranks.

Step 2: Calculate the H statistic.

H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)

Where:

$k$ = number of groups
$n_j$ = sample size in group $j$
$R_j$ = sum of ranks in group $j$
$N$ = total sample size
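The formula can be checked with a few lines of code. The following is a minimal sketch, not a reference implementation: it assumes NumPy and SciPy are installed, uses SciPy's `rankdata` (which assigns average ranks to ties by default), and the function name `kruskal_h` is our own. It returns the uncorrected H and is applied here to the anxiety scores from the worked example.

```python
import numpy as np
from scipy.stats import rankdata


def kruskal_h(*groups):
    """Uncorrected Kruskal-Wallis H for two or more independent samples."""
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = pooled.size

    # Rank all observations together; tied values share the average rank.
    ranks = rankdata(pooled)

    # Split the pooled ranks back into their groups and sum them (R_j).
    rank_sums = [r.sum() for r in np.split(ranks, np.cumsum(sizes)[:-1])]

    # H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1)
    return 12.0 / (n_total * (n_total + 1)) * sum(
        r_j**2 / n_j for r_j, n_j in zip(rank_sums, sizes)
    ) - 3 * (n_total + 1)


cbt = [5, 3, 7, 4, 6, 2]
psych = [9, 11, 8, 10, 7, 12]
med = [12, 14, 10, 15, 13, 11]
print(round(kruskal_h(cbt, psych, med), 2))  # 13.27
```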

Tie correction: When there are tied ranks, divide $H$ by:

1 - \frac{\sum (t_i^3 - t_i)}{N^3 - N}

Where $t_i$ is the number of tied observations in the $i$th group of ties.
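The correction factor depends only on how many observations share each value in the pooled data. A short sketch, assuming SciPy is available; `tie_correction` is a hypothetical helper name, and SciPy's `tiecorrect` (which takes ranks rather than raw values) is shown only as a cross-check.

```python
from collections import Counter

from scipy.stats import rankdata, tiecorrect


def tie_correction(values):
    """Return 1 - sum(t^3 - t) / (N^3 - N) for the pooled raw values."""
    n = len(values)
    # t_i = number of observations sharing each distinct value;
    # untied values (t = 1) contribute 1 - 1 = 0 to the sum.
    tie_sizes = Counter(values).values()
    return 1 - sum(t**3 - t for t in tie_sizes) / (n**3 - n)


# Pooled anxiety scores from the worked example (all three groups)
pooled = [5, 3, 7, 4, 6, 2, 9, 11, 8, 10, 7, 12, 12, 14, 10, 15, 13, 11]
c = tie_correction(pooled)
print(round(c, 4))                             # 0.9959
print(round(tiecorrect(rankdata(pooled)), 4))  # same factor via SciPy
```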

Degrees of freedom: $df = k - 1$

Under $H_0$, the test statistic $H$ approximately follows a $\chi^2$ distribution with $k - 1$ degrees of freedom (provided each group has at least 5 observations).

Effect size (epsilon-squared):

\epsilon^2 = \frac{H}{(N^2 - 1)/(N + 1)} = \frac{H(N + 1)}{N^2 - 1}

This ranges from 0 to 1 and represents the proportion of variance in ranks explained by group membership. An alternative is eta-squared based on H:

\eta^2_H = \frac{H - k + 1}{N - k}
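Both effect sizes are simple functions of $H$, $k$ and $N$. Because $(N^2 - 1)/(N + 1) = N - 1$, epsilon-squared simplifies to $H/(N - 1)$. The helper names below are illustrative, and the inputs are the values from the worked example. Note that $\eta^2_H$ can come out slightly negative when $H < k - 1$.

```python
def epsilon_squared(h, n_total):
    """Rank epsilon-squared: H / ((N^2 - 1) / (N + 1)), i.e. H / (N - 1)."""
    return h * (n_total + 1) / (n_total**2 - 1)


def eta_squared_h(h, k, n_total):
    """Eta-squared based on H: (H - k + 1) / (N - k)."""
    return (h - k + 1) / (n_total - k)


h, k, n_total = 13.27, 3, 18  # values from the worked example
print(round(epsilon_squared(h, n_total), 2))   # 0.78
print(round(eta_squared_h(h, k, n_total), 2))  # 0.75
```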

Worked Example

Scenario: A clinical psychologist compares anxiety scores (measured on a 0-20 ordinal self-report scale) across three therapy types: Cognitive-Behavioral Therapy (CBT), Psychodynamic Therapy, and Medication Only. Each group has 6 patients, and the outcome is measured after 12 weeks of treatment.

CBT	Psychodynamic	Medication
5	9	12
3	11	14
7	8	10
4	10	15
6	7	13
2	12	11

$n_1 = n_2 = n_3 = 6$, $N = 18$.

Step 1: Rank all 18 observations.

Sorted within each group, the scores are CBT: 2, 3, 4, 5, 6, 7; Psychodynamic: 7, 8, 9, 10, 11, 12; Medication: 10, 11, 12, 13, 14, 15.

Combined and sorted, the 18 values are 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 11, 12, 12, 13, 14, 15. Tied values share the average of the ranks they occupy; for example, the two 7s occupy ranks 6 and 7, so each receives rank 6.5.

Rank	Value	Group
1	2	CBT
2	3	CBT
3	4	CBT
4	5	CBT
5	6	CBT
6.5	7	CBT
6.5	7	Psych
8	8	Psych
9	9	Psych
10.5	10	Psych
10.5	10	Med
12.5	11	Psych
12.5	11	Med
14.5	12	Psych
14.5	12	Med
16	13	Med
17	14	Med
18	15	Med
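Hand ranking with ties is easy to get wrong, so it can help to reproduce the table programmatically. This sketch assumes NumPy and SciPy are installed; the short group labels are our own.

```python
import numpy as np
from scipy.stats import rankdata

groups = {
    "CBT": [5, 3, 7, 4, 6, 2],
    "Psych": [9, 11, 8, 10, 7, 12],
    "Med": [12, 14, 10, 15, 13, 11],
}

pooled = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
ranks = rankdata(pooled)  # method="average" by default: ties share the mean rank

# Print the combined ranking in ascending order of value
for i in np.argsort(pooled, kind="stable"):
    print(f"{ranks[i]:>5}  {pooled[i]:>3}  {labels[i]}")

rank_sums = {g: float(ranks[labels == g].sum()) for g in groups}
print(rank_sums)  # {'CBT': 21.5, 'Psych': 61.0, 'Med': 88.5}

# The rank sums must add up to N(N+1)/2
n = len(pooled)
print(sum(rank_sums.values()), n * (n + 1) / 2)  # 171.0 171.0
```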

Step 2: Sum the ranks for each group.

R_{\text{CBT}} = 1 + 2 + 3 + 4 + 5 + 6.5 = 21.5

R_{\text{Psych}} = 6.5 + 8 + 9 + 10.5 + 12.5 + 14.5 = 61

R_{\text{Med}} = 10.5 + 12.5 + 14.5 + 16 + 17 + 18 = 88.5

Check: the rank sums must add up to $N(N+1)/2$, and indeed $21.5 + 61 + 88.5 = 171 = 18 \times 19 / 2$.

Step 3: Calculate the H statistic.

H = \frac{12}{18 \times 19}\left(\frac{21.5^2}{6} + \frac{61^2}{6} + \frac{88.5^2}{6}\right) - 3(19)

= \frac{12}{342}\left(\frac{462.25}{6} + \frac{3721}{6} + \frac{7832.25}{6}\right) - 57

= 0.03509 \times (77.04 + 620.17 + 1305.38) - 57

= 0.03509 \times 2002.58 - 57

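= 70.27 - 57 = 13.27

This is the uncorrected H. The data contain four pairs of tied values (7, 10, 11 and 12), so $\sum (t_i^3 - t_i) = 4 \times (2^3 - 2) = 24$ and the tie-correction factor is $1 - 24/(18^3 - 18) = 1 - 24/5814 \approx 0.996$. The corrected statistic is $13.27 / 0.996 \approx 13.32$, which is what software that applies the correction automatically (such as R's kruskal.test or SciPy's stats.kruskal) reports. The correction is small and does not change the conclusion; the rest of this example uses the uncorrected value.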
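To check the hand calculation, the sketch below recomputes the uncorrected H from the Step 2 rank sums and compares it with SciPy's `stats.kruskal`, which divides H by the tie-correction factor automatically. It assumes SciPy is installed.

```python
from scipy import stats

cbt = [5, 3, 7, 4, 6, 2]
psych = [9, 11, 8, 10, 7, 12]
med = [12, 14, 10, 15, 13, 11]

# Uncorrected H from the rank sums in Step 2 (each group has n_j = 6)
n_total = 18
rank_sums = [21.5, 61.0, 88.5]
h_uncorrected = (
    12 / (n_total * (n_total + 1)) * sum(r**2 / 6 for r in rank_sums)
    - 3 * (n_total + 1)
)

# scipy.stats.kruskal reports the tie-corrected H and its chi-square p-value
result = stats.kruskal(cbt, psych, med)

print(round(h_uncorrected, 2))     # 13.27
print(round(result.statistic, 2))  # 13.32
print(round(result.pvalue, 4))
```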

Step 4: Determine the p-value.

With $df = 3 - 1 = 2$, we compare $H = 13.27$ to the chi-square distribution. The critical value at $\alpha = .05$ with $df = 2$ is 5.99. Since $13.27 > 5.99$, the result is statistically significant ($p = .001$).
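Both numbers come from the upper tail of the chi-square distribution. A minimal sketch using SciPy's `chi2` distribution object, assuming SciPy is installed:

```python
from scipy.stats import chi2

h, df = 13.27, 2
critical = chi2.ppf(0.95, df)  # critical value at alpha = .05
p_value = chi2.sf(h, df)       # upper-tail probability P(chi2_df >= H)
print(round(critical, 2))      # 5.99
print(round(p_value, 3))       # 0.001
```

For $df = 2$ the survival function has the closed form $e^{-H/2}$, so $p = e^{-13.27/2} \approx .0013$.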

Step 5: Calculate effect size.

\eta^2_H = \frac{H - k + 1}{N - k} = \frac{13.27 - 3 + 1}{18 - 3} = \frac{11.27}{15} = 0.75

This is a large effect: group membership explains approximately 75% of the variance in ranks.

Post-Hoc Tests

A significant Kruskal-Wallis test tells you that at least one group differs, but not which specific groups differ. Follow up with pairwise Mann-Whitney U tests using a Bonferroni correction:

\alpha_{\text{adjusted}} = \frac{.05}{m}

Where $m$ is the number of pairwise comparisons. For $k$ groups there are $m = k(k-1)/2$ comparisons; with 3 groups, $m = 3$, so $\alpha_{\text{adjusted}} = .05/3 = .0167$.
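A sketch of the Bonferroni-corrected Mann-Whitney approach on the worked-example data, assuming SciPy is installed. SciPy's `mannwhitneyu` chooses between an exact and a normal-approximation p-value depending on sample size and ties, so the printed p-values may differ slightly across SciPy versions.

```python
from itertools import combinations

from scipy.stats import mannwhitneyu

groups = {
    "CBT": [5, 3, 7, 4, 6, 2],
    "Psych": [9, 11, 8, 10, 7, 12],
    "Med": [12, 14, 10, 15, 13, 11],
}

pairs = list(combinations(groups, 2))  # all k(k-1)/2 pairs of group names
alpha_adjusted = 0.05 / len(pairs)     # Bonferroni: .05 / m

results = {}
for a, b in pairs:
    res = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    results[(a, b)] = res.pvalue
    verdict = "significant" if res.pvalue < alpha_adjusted else "not significant"
    print(f"{a} vs {b}: U = {res.statistic}, p = {res.pvalue:.4f} ({verdict})")
```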

An alternative is Dunn's test, which is specifically designed as a post-hoc test for the Kruskal-Wallis and uses the rank sums from the omnibus test rather than re-ranking within each pair.
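Dunn's test compares mean ranks from the pooled (omnibus) ranking with a z statistic whose standard error is $\sqrt{\left(\frac{N(N+1)}{12} - \frac{\sum (t_i^3 - t_i)}{12(N-1)}\right)\left(\frac{1}{n_a} + \frac{1}{n_b}\right)}$. The from-scratch sketch below, assuming NumPy and SciPy are installed, spells out each step with a Bonferroni adjustment; in practice a package such as `scikit_posthocs.posthoc_dunn` in Python or `dunn.test` in R does this for you.

```python
from itertools import combinations

import numpy as np
from scipy.stats import norm, rankdata

groups = {
    "CBT": [5, 3, 7, 4, 6, 2],
    "Psych": [9, 11, 8, 10, 7, 12],
    "Med": [12, 14, 10, 15, 13, 11],
}

pooled = np.concatenate(list(groups.values())).astype(float)
n_total = pooled.size
ranks = rankdata(pooled)  # one ranking of all observations (no re-ranking per pair)

# Mean rank of each group, taken from the pooled ranking
bounds = np.cumsum([0] + [len(v) for v in groups.values()])
mean_rank = {g: ranks[bounds[i]:bounds[i + 1]].mean() for i, g in enumerate(groups)}
sizes = {g: len(v) for g, v in groups.items()}

# Tie adjustment to the rank variance: sum(t^3 - t) / (12 (N - 1))
_, tie_counts = np.unique(pooled, return_counts=True)
tie_term = np.sum(tie_counts**3 - tie_counts) / (12 * (n_total - 1))

pairs = list(combinations(groups, 2))
adjusted = {}
for a, b in pairs:
    se = np.sqrt((n_total * (n_total + 1) / 12 - tie_term) * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_rank[a] - mean_rank[b]) / se
    p = 2 * norm.sf(abs(z))                      # two-sided p-value
    adjusted[(a, b)] = min(p * len(pairs), 1.0)  # Bonferroni adjustment
    print(f"{a} vs {b}: z = {z:.2f}, adjusted p = {adjusted[(a, b)]:.4f}")
```

On the worked-example data, only the CBT vs. Medication comparison stays below .05 after adjustment (adjusted p < .001), while CBT vs. Psychodynamic (about .10) and Psychodynamic vs. Medication (about .41) do not. Because Dunn's test uses the pooled ranks and a different variance, it can disagree with Bonferroni-corrected Mann-Whitney tests on borderline pairs.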

Interpretation

The Kruskal-Wallis test revealed a statistically significant difference in anxiety scores across the three therapy types, $H(2) = 13.27$, $p = .001$, $\eta^2_H = .75$.

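Dunn's post-hoc test with Bonferroni correction, applied to these data, gives:

CBT (Mdn = 4.5) vs. Medication (Mdn = 12.5): significant, adjusted $p < .001$
CBT (Mdn = 4.5) vs. Psychodynamic (Mdn = 9.5): not significant, adjusted $p \approx .10$
Psychodynamic (Mdn = 9.5) vs. Medication (Mdn = 12.5): not significant, adjusted $p \approx .41$

Descriptively, the CBT group had the lowest post-treatment anxiety scores, followed by the Psychodynamic group, with the Medication Only group having the highest anxiety. However, only the difference between CBT and Medication Only remains statistically significant after correction.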

Common Mistakes

Stopping at the omnibus test. A significant H tells you that at least one group differs. You must run post-hoc pairwise tests (e.g., Dunn's test or Bonferroni-corrected Mann-Whitney U tests) to identify which groups differ.
Running multiple Mann-Whitney tests without correction. Performing all pairwise comparisons at $\alpha = .05$ inflates the Type I error rate, just as running multiple t-tests does. Apply a multiple-comparison correction such as Bonferroni, whether you use Mann-Whitney U tests or Dunn's test.
Interpreting as a test of medians when distributions differ in shape. If one group is skewed and another is symmetric, the Kruskal-Wallis test may be significant even when medians are identical. It is technically a test of the distribution of ranks.
Using it with very small groups. The chi-square approximation for $H$ is unreliable when any group has fewer than 5 observations. Use the exact test in such cases.
Forgetting to report an effect size. Report $\eta^2_H$ or $\epsilon^2$ alongside $H$, $df$, and $p$ for a complete picture.
Using Kruskal-Wallis for repeated measures. If the same participants are measured across three or more conditions, use the Friedman test instead (the non-parametric equivalent of repeated-measures ANOVA).
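When groups are small, one practical substitute for an exact test is a Monte Carlo permutation test: shuffle the pooled scores across groups many times, recompute H each time, and count how often the shuffled H is at least as large as the observed one. The sketch below assumes NumPy and SciPy; the function name, the 5,000 resamples and the fixed seed are arbitrary illustrative choices, and the result approximates the exact p-value rather than computing it exactly.

```python
import numpy as np
from scipy.stats import kruskal


def permutation_kruskal(*groups, n_resamples=5000, seed=0):
    """Monte Carlo permutation p-value for H (an approximation to the exact test)."""
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    split_at = np.cumsum(sizes)[:-1]
    h_obs = kruskal(*groups).statistic

    count = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled scores to groups of the original sizes
        shuffled = np.split(rng.permutation(pooled), split_at)
        # Small tolerance so floating-point noise does not hide ties with h_obs
        if kruskal(*shuffled).statistic >= h_obs - 1e-12:
            count += 1
    # Adding 1 to numerator and denominator keeps the p-value above 0
    return h_obs, (count + 1) / (n_resamples + 1)


cbt = [5, 3, 7, 4, 6, 2]
psych = [9, 11, 8, 10, 7, 12]
med = [12, 14, 10, 15, 13, 11]
h_obs, p_perm = permutation_kruskal(cbt, psych, med)
print(f"H = {h_obs:.2f}, permutation p = {p_perm:.4f}")
```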

How to Run It

```r
# Kruskal-Wallis test in R
kruskal.test(score ~ group, data = mydata)

# Post-hoc pairwise comparisons (Dunn's test)
library(dunn.test)
dunn.test(mydata$score, mydata$group, method = "bonferroni")

# Effect size (epsilon-squared)
library(effectsize)
rank_epsilon_squared(score ~ group, data = mydata)
```

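```python
from scipy import stats
import scikit_posthocs as sp

# Example data: one list of scores per group (from the worked example)
group1 = [5, 3, 7, 4, 6, 2]        # CBT
group2 = [9, 11, 8, 10, 7, 12]     # Psychodynamic
group3 = [12, 14, 10, 15, 13, 11]  # Medication Only

# Kruskal-Wallis test (SciPy applies the tie correction automatically)
h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")

# Post-hoc Dunn's test with Bonferroni correction
dunn = sp.posthoc_dunn([group1, group2, group3], p_adjust='bonferroni')
print(dunn)
```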


To run the test in SPSS:

Go to Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples
Move your dependent variable into the Test Variable List
Move your grouping variable into the Grouping Variable box
Click Define Range and enter the minimum and maximum group codes
Ensure Kruskal-Wallis H is checked
Click OK

SPSS reports the H statistic, degrees of freedom, and asymptotic p-value. For post-hoc tests, use Analyze > Nonparametric Tests > Independent Samples (the newer dialog), which offers pairwise comparisons automatically.


Excel does not have a built-in Kruskal-Wallis test. To compute it manually:

Combine all data into one column with a group label column
Use RANK.AVG to rank all values
Use SUMIF and COUNTIF to calculate the rank sum and sample size for each group
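Apply the H formula, where N is the total sample size and Rj and nj stand for the ranges holding the rank sums and group sizes: =12/(N*(N+1))*SUMPRODUCT(Rj^2/nj)-3*(N+1) (this gives the uncorrected H; divide it by the tie-correction factor if there are ties)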
Use CHISQ.DIST.RT(H, df) to obtain the p-value

For a more automated approach, install the Real Statistics Resource Pack add-in, which includes a dedicated Kruskal-Wallis function with post-hoc tests.



## How to Report in APA Format

> A Kruskal-Wallis H test was conducted to compare post-treatment anxiety scores across three therapy types (CBT, Psychodynamic, and Medication Only). The test indicated a statistically significant difference in anxiety scores, $H$(2) = 13.27, $p$ = .001, $\eta^2_H$ = .75. Dunn's post-hoc tests with Bonferroni correction revealed that the CBT group (Mdn = 4.5) reported significantly lower anxiety than the Medication Only group (Mdn = 12.5), $p$ < .001. The Psychodynamic group (Mdn = 9.5) did not differ significantly from either the CBT group or the Medication Only group after correction (both adjusted $p$s > .05).

