Stats for Scholars

Kruskal-Wallis H Test


Purpose
Tests whether three or more independent groups differ on an ordinal or continuous variable when the assumptions of one-way ANOVA are not met.
When to Use
When you have three or more independent groups and the dependent variable is ordinal or not normally distributed.
Data Type
Ordinal or continuous dependent variable; categorical independent variable with 3+ groups
Key Assumptions
Independence of observations, similarly shaped distributions in each group, at least ordinal measurement level.
Tools
Effect Size Calculator on Subthesis →

What Is the Kruskal-Wallis H Test?

The Kruskal-Wallis H test (also called the Kruskal-Wallis one-way analysis of variance by ranks) is a non-parametric test that compares three or more independent groups. It is the rank-based alternative to the one-way ANOVA and extends the Mann-Whitney U test to more than two groups.

Like other rank-based tests, the Kruskal-Wallis test works by ranking all observations from all groups together and then testing whether the average ranks differ significantly across groups. If one group tends to have consistently higher (or lower) values, its average rank will deviate from the overall average rank.

The test produces an H statistic, which follows an approximate chi-square distribution. A large H indicates that at least one group differs from the others.

When to Use It

Use a Kruskal-Wallis H test when:

  • You have one dependent variable that is at least ordinal
  • You have one categorical independent variable with three or more independent groups
  • The normality or homogeneity of variances assumption of one-way ANOVA is violated
  • Your sample sizes are small and you cannot rely on the robustness of ANOVA

Examples:

  • Comparing anxiety scores across three therapy types (CBT, psychodynamic, medication)
  • Comparing customer satisfaction rankings across four product brands
  • Comparing pain levels across three dosage groups when data are ordinal

When to use one-way ANOVA instead: If your data are continuous, approximately normal within each group, and the variances are similar, one-way ANOVA has more statistical power. With large, balanced samples ($n \geq 25$ per group), ANOVA is robust even to moderate violations.

Assumptions

  1. Independence of observations. Each participant contributes one data point, and participants in different groups are unrelated.

  2. At least ordinal measurement. The dependent variable must be rankable.

  3. Similarly shaped distributions. For interpreting the result as a comparison of medians, the distributions in all groups should have the same shape (but may differ in central tendency). If the shapes differ, the Kruskal-Wallis test is still valid but is interpreted as a test of whether the groups differ in their overall distribution of ranks.

Formula

Step 1: Rank all observations. Combine all groups and assign ranks from 1 to $N$, where $N = \sum n_j$ is the total sample size. Tied values receive average ranks.
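Averaged ranks for ties can be produced with scipy's `rankdata` (shown here on a small illustrative vector, not data from the article):

```python
from scipy.stats import rankdata

# The two 7s occupy ranks 2 and 3, so each receives (2 + 3) / 2 = 2.5
print(rankdata([2, 7, 7, 8]))  # ranks 1, 2.5, 2.5, 4
```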

Step 2: Calculate the H statistic.

$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$

Where:

  • $k$ = number of groups
  • $n_j$ = sample size in group $j$
  • $R_j$ = sum of ranks in group $j$
  • $N$ = total sample size

Tie correction: When there are tied ranks, divide $H$ by:

$$1 - \frac{\sum (t_i^3 - t_i)}{N^3 - N}$$

Where $t_i$ is the number of tied observations in the $i$th group of ties.
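The correction factor is straightforward to compute by counting duplicates. A minimal sketch (the helper name `tie_correction` and the example vector are illustrative, not from the article):

```python
from collections import Counter

def tie_correction(values):
    """Kruskal-Wallis tie-correction factor: 1 - sum(t^3 - t) / (N^3 - N)."""
    n = len(values)
    # untied values (count 1) contribute 1^3 - 1 = 0, so no filtering needed
    ties = sum(c**3 - c for c in Counter(values).values())
    return 1 - ties / (n**3 - n)

# One pair of 2s and a triple of 3s: sum(t^3 - t) = 6 + 24 = 30, N = 6
print(tie_correction([1, 2, 2, 3, 3, 3]))  # 1 - 30/210, about 0.8571
```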

Degrees of freedom: $df = k - 1$

Under $H_0$, the test statistic $H$ approximately follows a $\chi^2$ distribution with $k - 1$ degrees of freedom (provided each group has at least 5 observations).
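To make the formula concrete, here is a minimal hand-rolled computation of $H$ from pooled ranks, checked against `scipy.stats.kruskal` on tie-free illustrative data (with no ties scipy's tie correction equals 1, so the two results agree; the helper name `kruskal_h` is hypothetical):

```python
import numpy as np
from scipy import stats

def kruskal_h(groups):
    """H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), from pooled ranks."""
    pooled = np.concatenate(groups)
    ranks = stats.rankdata(pooled)  # ties would receive average ranks
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_j = ranks[start:start + len(g)].sum()  # rank sum R_j for group j
        total += r_j**2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_h(groups))                 # about 7.2
print(stats.kruskal(*groups).statistic)  # matches (no ties, correction = 1)
```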

Effect size (epsilon-squared):

$$\epsilon^2 = \frac{H}{(N^2 - 1)/(N + 1)} = \frac{H(N + 1)}{N^2 - 1}$$

This ranges from 0 to 1 and represents the proportion of variance in ranks explained by group membership. An alternative is eta-squared based on H:

$$\eta^2_H = \frac{H - k + 1}{N - k}$$

Worked Example

Scenario: A clinical psychologist compares anxiety scores (measured on a 0-20 ordinal self-report scale) across three therapy types: Cognitive-Behavioral Therapy (CBT), Psychodynamic Therapy, and Medication Only. Each group has 6 patients, and the outcome is measured after 12 weeks of treatment.

CBT Psychodynamic Medication
5 9 12
3 11 14
7 8 10
4 10 15
6 7 13
2 12 11

$n_1 = n_2 = n_3 = 6$, $N = 18$.

Step 1: Rank all 18 observations.

Group values, sorted within group: CBT: 2, 3, 4, 5, 6, 7. Psychodynamic: 7, 8, 9, 10, 11, 12. Medication: 10, 11, 12, 13, 14, 15.

Combined and sorted: 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 11, 12, 12, 13, 14, 15. The tied values (7, 10, 11, and 12 each appear twice) receive the average of the two ranks they occupy.

Rank Value Group
1 2 CBT
2 3 CBT
3 4 CBT
4 5 CBT
5 6 CBT
6.5 7 CBT
6.5 7 Psych
8 8 Psych
9 9 Psych
10.5 10 Psych
10.5 10 Med
12.5 11 Psych
12.5 11 Med
14.5 12 Psych
14.5 12 Med
16 13 Med
17 14 Med
18 15 Med

Step 2: Sum the ranks for each group.

$$R_{\text{CBT}} = 1 + 2 + 3 + 4 + 5 + 6.5 = 21.5$$

$$R_{\text{Psych}} = 6.5 + 8 + 9 + 10.5 + 12.5 + 14.5 = 61$$

$$R_{\text{Med}} = 10.5 + 12.5 + 14.5 + 16 + 17 + 18 = 88.5$$

Check: $21.5 + 61 + 88.5 = 171 = 18 \times 19 / 2$. Correct.

Step 3: Calculate the H statistic.

$$H = \frac{12}{18 \times 19}\left(\frac{21.5^2}{6} + \frac{61^2}{6} + \frac{88.5^2}{6}\right) - 3(19)$$

$$= \frac{12}{342}\left(\frac{462.25}{6} + \frac{3721}{6} + \frac{7832.25}{6}\right) - 57$$

$$= 0.03509 \times (77.04 + 620.17 + 1305.38) - 57$$

$$= 0.03509 \times 2002.58 - 57 = 70.27 - 57 = 13.27$$
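Reproducing this worked example in scipy is a useful check. Note that `scipy.stats.kruskal` applies the tie correction automatically, so it reports a slightly larger statistic (about 13.32) than the uncorrected hand calculation:

```python
from scipy import stats

cbt = [5, 3, 7, 4, 6, 2]
psych = [9, 11, 8, 10, 7, 12]
med = [12, 14, 10, 15, 13, 11]

# scipy divides H by the tie-correction factor, so H here is ~13.32,
# slightly above the uncorrected hand-computed 13.27
h, p = stats.kruskal(cbt, psych, med)
print(f"H = {h:.2f}, p = {p:.4f}")
```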

Step 4: Determine the p-value.

With $df = 3 - 1 = 2$, we compare $H = 13.27$ to the chi-square distribution. The critical value at $\alpha = .05$ with $df = 2$ is 5.99. Since $13.27 > 5.99$, the result is statistically significant ($p = .001$).

Step 5: Calculate effect size.

$$\eta^2_H = \frac{H - k + 1}{N - k} = \frac{13.27 - 2}{18 - 3} = \frac{11.27}{15} = 0.75$$

This is a large effect — group membership explains approximately 75% of the variance in ranks.
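Both effect sizes from the Formula section can be computed directly from $H$. A sketch using this example's values (the helper names are illustrative):

```python
def epsilon_squared(h, n):
    """epsilon^2 = H * (N + 1) / (N^2 - 1)."""
    return h * (n + 1) / (n**2 - 1)

def eta_squared_h(h, n, k):
    """eta^2_H = (H - k + 1) / (N - k)."""
    return (h - k + 1) / (n - k)

h, n, k = 13.27, 18, 3
print(round(epsilon_squared(h, n), 2))   # 0.78
print(round(eta_squared_h(h, n, k), 2))  # 0.75
```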

Post-Hoc Tests

A significant Kruskal-Wallis test tells you that at least one group differs, but not which specific groups differ. Follow up with pairwise Mann-Whitney U tests using a Bonferroni correction:

$$\alpha_{\text{adjusted}} = \frac{.05}{m}$$

Where $m$ is the number of pairwise comparisons. With 3 groups: $m = 3$, so $\alpha_{\text{adjusted}} = .05/3 = .0167$.

An alternative is Dunn's test, which is specifically designed as a post-hoc test for the Kruskal-Wallis and uses the rank sums from the omnibus test rather than re-ranking within each pair.
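The Bonferroni-corrected pairwise procedure can be sketched with scipy's `mannwhitneyu`, using the anxiety data from the worked example below; multiplying each p-value by the number of comparisons is equivalent to testing the raw p-value against the adjusted alpha:

```python
from itertools import combinations
from scipy import stats

samples = {
    "CBT": [5, 3, 7, 4, 6, 2],
    "Psych": [9, 11, 8, 10, 7, 12],
    "Med": [12, 14, 10, 15, 13, 11],
}

m = 3  # number of pairwise comparisons among 3 groups
for (name_a, a), (name_b, b) in combinations(samples.items(), 2):
    u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    p_adj = min(p * m, 1.0)  # Bonferroni-adjusted p-value
    print(f"{name_a} vs {name_b}: U = {u}, adjusted p = {p_adj:.4f}")
```

With these data, the comparisons involving CBT remain significant after adjustment, while the Psych vs. Med comparison does not.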

Interpretation

The Kruskal-Wallis test revealed a statistically significant difference in anxiety scores across the three therapy types, $H(2) = 13.27$, $p = .001$, $\eta^2_H = .75$.

Post-hoc pairwise Mann-Whitney U tests with Bonferroni correction ($\alpha_{\text{adjusted}} = .0167$) show:

  • CBT (Mdn = 4.5) < Psychodynamic (Mdn = 9.5), $p < .0167$
  • CBT (Mdn = 4.5) < Medication (Mdn = 12.5), $p < .0167$
  • Psychodynamic (Mdn = 9.5) vs. Medication (Mdn = 12.5): the uncorrected $p$ (about .04) exceeds the adjusted threshold of .0167, so this comparison is not significant after correction

The CBT group had the lowest post-treatment anxiety scores, followed by the Psychodynamic group, with the Medication Only group having the highest anxiety.

Common Mistakes

  1. Stopping at the omnibus test. A significant H tells you that at least one group differs. You must run post-hoc pairwise tests (e.g., Dunn's test or Bonferroni-corrected Mann-Whitney U tests) to identify which groups differ.

  2. Running multiple Mann-Whitney tests without correction. Performing all pairwise comparisons at $\alpha = .05$ inflates the Type I error rate, just as running multiple t-tests does. Apply a correction (Bonferroni or use Dunn's test).

  3. Interpreting as a test of medians when distributions differ in shape. If one group is skewed and another is symmetric, the Kruskal-Wallis test may be significant even when medians are identical. It is technically a test of the distribution of ranks.

  4. Using it with very small groups. The chi-square approximation for $H$ is unreliable when any group has fewer than 5 observations. Use the exact test in such cases.

  5. Forgetting to report an effect size. Report $\eta^2_H$ or $\epsilon^2$ alongside $H$, $df$, and $p$ for a complete picture.

  6. Using Kruskal-Wallis for repeated measures. If the same participants are measured across three or more conditions, use the Friedman test instead (the non-parametric equivalent of repeated-measures ANOVA).

How to Run It

R

```r
# Kruskal-Wallis test in R
kruskal.test(score ~ group, data = mydata)

# Post-hoc pairwise comparisons (Dunn's test)
library(dunn.test)
dunn.test(mydata$score, mydata$group, method = "bonferroni")

# Effect size (epsilon-squared)
library(effectsize)
rank_epsilon_squared(score ~ group, data = mydata)
```

Python

```python
from scipy import stats
import scikit_posthocs as sp

# Kruskal-Wallis test
h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")

# Post-hoc Dunn's test with Bonferroni correction
dunn = sp.posthoc_dunn([group1, group2, group3], p_adjust='bonferroni')
print(dunn)
```

SPSS
  1. Go to Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples
  2. Move your dependent variable into the Test Variable List
  3. Move your grouping variable into the Grouping Variable box
  4. Click Define Range and enter the minimum and maximum group codes
  5. Ensure Kruskal-Wallis H is checked
  6. Click OK

SPSS reports the H statistic, degrees of freedom, and asymptotic p-value. For post-hoc tests, use Analyze > Nonparametric Tests > Independent Samples (the newer dialog), which offers pairwise comparisons automatically.

Excel

Excel does not have a built-in Kruskal-Wallis test. To compute it manually:

  1. Combine all data into one column with a group label column
  2. Use RANK.AVG to rank all values
  3. Use SUMIF and COUNTIF to calculate the rank sum and sample size for each group
  4. Apply the H formula: =12/(N*(N+1)) * (SUM(Rj^2/nj)) - 3*(N+1)
  5. Use CHISQ.DIST.RT(H, df) to obtain the p-value

For a more automated approach, install the Real Statistics Resource Pack add-in, which includes a dedicated Kruskal-Wallis function with post-hoc tests.

How to Report in APA Format

A Kruskal-Wallis H test was conducted to compare post-treatment anxiety scores across three therapy types (CBT, Psychodynamic, and Medication Only). The test indicated a statistically significant difference in anxiety scores, $H$(2) = 13.27, $p$ = .001, $\eta^2_H$ = .75. Pairwise Mann-Whitney U tests with Bonferroni correction revealed that the CBT group (Mdn = 4.5) reported significantly lower anxiety than both the Psychodynamic group (Mdn = 9.5) and the Medication Only group (Mdn = 12.5). The Psychodynamic and Medication Only groups did not differ significantly after correction.

Ready to calculate?

Now that you understand the concept, use the free Effect Size Calculator on Subthesis to run your own analysis.

Calculate Effect Size for Your ANOVA on Subthesis

Related Concepts

One-Way ANOVA

Learn how to conduct a one-way ANOVA to compare three or more group means, including F-ratio formulas, post-hoc tests, and effect size with eta-squared.

Effect Size

Learn what effect size is, why it matters more than p-values alone, and how to calculate and interpret Cohen's d, Hedges' g, and eta-squared for your research.

© 2026 Angel Reyes / Subthesis. All rights reserved.