Wilcoxon Signed-Rank Test
What Is the Wilcoxon Signed-Rank Test?
The Wilcoxon signed-rank test is a non-parametric test for comparing two related measurements. It is the rank-based alternative to the paired samples t-test and does not require the difference scores to be normally distributed.
Like the paired t-test, this test works with difference scores — it calculates the difference between each pair of observations. But instead of using the raw differences, it ranks the absolute differences and then compares the sum of ranks for positive differences against the sum of ranks for negative differences.
The test is named after Frank Wilcoxon, who introduced it in 1945. It is one of the most widely used non-parametric methods in the biomedical and social sciences.
When to Use It
Use a Wilcoxon signed-rank test when:
- The same participants are measured under two conditions or at two time points
- The dependent variable is ordinal (e.g., pain ratings, Likert-scale responses)
- The difference scores are not normally distributed (skewed, heavy-tailed, or contain outliers) and sample sizes are small
- You have matched pairs where each observation in one condition corresponds to an observation in the other
Examples:
- Pain ratings before and after a medical treatment
- Quality-of-life scores before and after rehabilitation
- Preference ratings for two products rated by the same consumers
- Anxiety levels before and after a therapy session
When to use the paired t-test instead: If the difference scores are approximately normal, the paired t-test has somewhat greater statistical power. With a large sample, it is also robust to moderate departures from normality.
Assumptions
- Dependent (paired) observations. Each participant provides two measurements, or participants are matched in pairs.
- Independence between pairs. While the two measurements within each pair are related, different pairs must be independent of each other.
- At least ordinal measurement. The differences must be rankable: you need to know which differences are larger than others.
- Symmetric distribution of differences. The distribution of the differences $d_i$ should be roughly symmetric around their median. This is a weaker assumption than normality, but it is required for the test to be a valid test of the median difference. If the differences are highly asymmetric, consider the sign test instead.
Formula
Step 1: Calculate difference scores.
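$$d_i = x_{2i} - x_{1i}$$
where $x_{1i}$ and $x_{2i}$ are the first and second measurements of pair $i$ (in the worked example, Before and After). Discard any pairs where $d_i = 0$ (no change). Let $n$ be the number of remaining pairs.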
Step 2: Rank the absolute differences.
Rank the absolute differences $|d_i|$ from smallest to largest (rank 1 = smallest). Tied absolute values receive the average of the ranks they would otherwise occupy.
Step 3: Calculate signed rank sums.
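$$W^+ = \sum_{d_i > 0} R_i, \qquad W^- = \sum_{d_i < 0} R_i$$
where $R_i$ is the rank of $|d_i|$. As a check, $W^+ + W^- = n(n+1)/2$. The test statistic is $W = \min(W^+, W^-)$.
Note: Some software reports $W^+$ (the sum of positive ranks) instead. Check your output to know which convention is used.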
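Steps 1 to 3 can be sketched in plain Python. This is a minimal illustration rather than a library implementation: `average_ranks` and `signed_rank_sums` are hypothetical helper names, and the data are the pain ratings from the worked example below.

```python
def average_ranks(values):
    """Rank values from smallest to largest (rank 1 = smallest).
    Tied values all receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(before, after):
    """Steps 1-3: differences, drop zeros, rank |d|, sum ranks by sign."""
    diffs = [a - b for b, a in zip(before, after)]  # d_i = After - Before
    diffs = [d for d in diffs if d != 0]            # discard zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), w_plus, w_minus, min(w_plus, w_minus)


before = [7, 5, 8, 6, 9, 4, 7, 8, 6, 5]
after = [4, 5, 5, 4, 6, 3, 5, 7, 3, 3]
n, w_plus, w_minus, w = signed_rank_sums(before, after)
print(n, w_plus, w_minus, w)  # 9 0 45.0 0
```

Sorting indices (rather than values) keeps track of which original pair each rank belongs to, so the sign of each difference can be reattached in Step 3.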
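Step 4: For larger samples, use the normal approximation:
$$z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
Effect size:
$$r = \frac{|z|}{\sqrt{N}}$$
Where $N$ is the total number of pairs (including zeros, depending on convention). Benchmarks: $r = 0.1$ (small), $r = 0.3$ (medium), $r = 0.5$ (large).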
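The Step 4 and effect-size formulas translate directly into code. This is a minimal standard-library sketch; `wilcoxon_z`, `two_tailed_p`, and `effect_size_r` are hypothetical helper names, and no tie or continuity correction is applied, so results may differ slightly from statistical software.

```python
import math


def wilcoxon_z(w, n):
    """Normal approximation for W (no tie or continuity correction)."""
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mean_w) / sd_w


def two_tailed_p(z):
    """Two-tailed p-value from the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def effect_size_r(z, total_pairs):
    """Effect size r = |z| / sqrt(N)."""
    return abs(z) / math.sqrt(total_pairs)


# Illustrative values: W = 0 with n = 9 nonzero differences and N = 10 pairs
z = wilcoxon_z(0, 9)
print(round(z, 2), round(two_tailed_p(z), 4), round(effect_size_r(z, 10), 2))
```

`math.erfc(|z| / sqrt(2))` equals the two-tailed normal tail area, which avoids needing SciPy for this step.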
Worked Example
Scenario: A physiotherapist measures pain ratings (0-10 scale) in 10 patients before and after a new stretching protocol.
| Patient | Before | After | $d_i$ | $\lvert d_i \rvert$ | Rank of $\lvert d_i \rvert$ | Signed Rank |
|:-------:|:------:|:-----:|:------:|:--------:|:----------------:|:-----------:|
| 1 | 7 | 4 | -3 | 3 | 7.5 | -7.5 |
| 2 | 5 | 5 | 0 | 0 | excluded | excluded |
| 3 | 8 | 5 | -3 | 3 | 7.5 | -7.5 |
| 4 | 6 | 4 | -2 | 2 | 4 | -4 |
| 5 | 9 | 6 | -3 | 3 | 7.5 | -7.5 |
| 6 | 4 | 3 | -1 | 1 | 1.5 | -1.5 |
| 7 | 7 | 5 | -2 | 2 | 4 | -4 |
| 8 | 8 | 7 | -1 | 1 | 1.5 | -1.5 |
| 9 | 6 | 3 | -3 | 3 | 7.5 | -7.5 |
| 10 | 5 | 3 | -2 | 2 | 4 | -4 |
Step 1: Calculate differences.
All differences are computed as After - Before. Patient 2 has $d_i = 0$ and is excluded, leaving $n = 9$ usable pairs.
Step 2: Rank the absolute differences.
- $|d_i| = 1$: Patients 6 and 8 share ranks 1-2; each gets rank 1.5
- $|d_i| = 2$: Patients 4, 7, and 10 share ranks 3-5; each gets rank 4
- $|d_i| = 3$: Patients 1, 3, 5, and 9 share ranks 6-9; each gets rank $(6 + 7 + 8 + 9)/4 = 7.5$
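Step 3: Calculate the signed rank sums.
$$W^+ = 0 \quad \text{(no patient had a positive difference)}$$
$$W^- = 4(7.5) + 3(4) + 2(1.5) = 30 + 12 + 3 = 45$$
$$W = \min(W^+, W^-) = 0$$
As a check, $W^+ + W^- = 45 = \frac{9(10)}{2}$.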
Step 4: Determine significance.
For $n = 9$ at $\alpha = .05$ (two-tailed), the critical value of $W$ from a Wilcoxon table is 5. Since $W = 0 \le 5$, the result is statistically significant.
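Table critical values such as the 5 for $n = 9$ come from the exact null distribution of $W^+$, in which every pattern of signs on the ranks $1, \ldots, n$ is equally likely. The sketch below counts those patterns; `exact_null_counts` and `critical_value` are hypothetical helper names, and the sketch assumes untied ranks, so with the tied ranks in this example the table value is an approximation.

```python
def exact_null_counts(n):
    """counts[s] = number of sign patterns on untied ranks 1..n with W+ = s.

    Dynamic programming over subsets: each rank k either contributes to
    the positive-rank sum or not (subset-sum counting, processed in reverse
    so each rank is used at most once).
    """
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for k in range(1, n + 1):
        for s in range(max_sum, k - 1, -1):
            counts[s] += counts[s - k]
    return counts


def critical_value(n, alpha=0.05):
    """Largest w with P(W+ <= w) <= alpha / 2 under H0.

    This is the two-tailed table value for W = min(W+, W-) (no ties).
    Returns None when even w = 0 is not extreme enough (very small n).
    """
    counts = exact_null_counts(n)
    total = 2 ** n
    cumulative = 0
    crit = None
    for w, c in enumerate(counts):
        cumulative += c
        if cumulative / total <= alpha / 2:
            crit = w
        else:
            break
    return crit


print(critical_value(9))  # 5
```

For $n = 9$, 10 of the 512 sign patterns give $W^+ \le 5$ (probability about .020), while 14 give $W^+ \le 6$ (about .027), which is why 5 is the cutoff.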
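Using the normal approximation (without a correction for ties):
$$z = \frac{0 - \frac{9(10)}{4}}{\sqrt{\frac{9(10)(19)}{24}}} = \frac{-22.5}{\sqrt{71.25}} \approx -2.67$$
This yields $p \approx .008$ (two-tailed).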
Step 5: Calculate effect size.
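$$r = \frac{|z|}{\sqrt{N}} = \frac{2.67}{\sqrt{10}} \approx 0.84$$
Here $N = 10$ counts all pairs, including the patient with no change. This is a large effect.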
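To double-check Steps 1 to 5, the whole calculation can be repeated in a short script. This sketch assumes the table data above. It also computes a tie-corrected $z$, which subtracts $\sum (t^3 - t)/48$ from the variance for each group of $t$ tied absolute differences; statistical software often applies this correction, so it is shown only for comparison.

```python
import math
from collections import Counter

before = [7, 5, 8, 6, 9, 4, 7, 8, 6, 5]
after = [4, 5, 5, 4, 6, 3, 5, 7, 3, 3]
N = len(before)  # all pairs, including the zero difference

# Nonzero differences d = After - Before, and their sorted absolute values
diffs = [a - b for b, a in zip(before, after) if a != b]
n = len(diffs)
abs_sorted = sorted(abs(d) for d in diffs)


def avg_rank(value):
    """Mean of the 1-based sorted positions occupied by `value`."""
    first = abs_sorted.index(value) + 1
    last = first + abs_sorted.count(value) - 1
    return (first + last) / 2


w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
w_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
w = min(w_plus, w_minus)

mean_w = n * (n + 1) / 4
var_w = n * (n + 1) * (2 * n + 1) / 24
# Tie correction: subtract sum(t^3 - t) / 48 over groups of tied |d|
var_w_ties = var_w - sum(t**3 - t for t in Counter(abs_sorted).values()) / 48

z = (w - mean_w) / math.sqrt(var_w)
z_ties = (w - mean_w) / math.sqrt(var_w_ties)
p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed, uncorrected z
r = abs(z) / math.sqrt(N)

print(f"W = {w}, z = {z:.2f}, z (tie-corrected) = {z_ties:.2f}")
print(f"p = {p:.4f}, r = {r:.2f}")
# W = 0, z = -2.67, z (tie-corrected) = -2.70
# p = 0.0077, r = 0.84
```

The tie correction shrinks the variance slightly, so $|z|$ grows from about 2.67 to about 2.70; the conclusion does not change.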
Interpretation
The Wilcoxon signed-rank test indicated that pain ratings were significantly lower after the stretching protocol (Mdn = 4.5) than before (Mdn = 6.5), $W = 0$, $z = -2.67$, $p = .008$, $r = .84$. All nine patients with a change reported reduced pain, and the large effect size indicates a substantial and consistent reduction.
What to consider:
- A $W$ of 0 (or close to it) means that virtually all differences point in the same direction, which is a very strong result.
- Report medians rather than means when using non-parametric tests, because the test is based on ranks rather than raw values.
- If the symmetry assumption is questionable, consider the simpler sign test, which only uses the direction of differences (positive vs. negative) without ranking their magnitudes.
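For comparison, the sign test mentioned above needs only the counts of positive and negative differences. This minimal sketch, with `sign_test_p` as a hypothetical helper name, computes the exact two-tailed binomial p-value with success probability 0.5 under the null hypothesis.

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Exact two-tailed sign test on counts of positive and negative
    differences (zero differences already discarded)."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # Probability of a split at least this lopsided, in one direction
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double for two tails, cap at 1


# Worked example: 0 positive and 9 negative differences
print(sign_test_p(0, 9))  # 0.00390625
```

Because it ignores the sizes of the differences, the sign test discards information that the signed-rank test uses, which is why it is generally less powerful.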
Common Mistakes
- Using it when the paired t-test would be appropriate. If difference scores are approximately normal and you have an adequate sample size, the paired t-test is more powerful. Reserve the Wilcoxon for clear violations of normality.
- Including zero differences. Pairs with no change ($d_i = 0$) must be excluded before ranking. Failing to remove them inflates the sample size and distorts the test statistic.
- Ignoring the symmetry assumption. The Wilcoxon signed-rank test assumes the difference scores are symmetrically distributed. If differences are heavily skewed, the sign test is a safer alternative (though less powerful).
- Reporting means instead of medians. Since the Wilcoxon is rank-based, medians are the appropriate measure of central tendency to report alongside the test.
- Forgetting the effect size. Always compute $r$ or another appropriate effect size. The p-value alone does not convey the magnitude of the change.
- Confusing it with the Wilcoxon rank-sum test. The signed-rank test is for paired data. The rank-sum test (equivalent to the Mann-Whitney U test) is for independent groups. The names are similar, but the tests are fundamentally different.
How to Run It
R: effect size (rank-biserial correlation)
```r
library(effectsize)
rank_biserial(mydata$before, mydata$after, paired = TRUE)
```
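Python
```python
from scipy import stats
import pingouin as pg

# Example data: pain ratings from the worked example, one entry per patient
before = [7, 5, 8, 6, 9, 4, 7, 8, 6, 5]
after = [4, 5, 5, 4, 6, 3, 5, 7, 3, 3]

# Using scipy (zero differences are discarded by default, zero_method='wilcox')
stat, p_value = stats.wilcoxon(before, after, alternative='two-sided')
print(stat, p_value)

# Using pingouin (output also includes the rank-biserial correlation, RBC)
result = pg.wilcoxon(before, after, alternative='two-sided')
print(result)
```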
SPSS
Go to Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples
Select your two measurement variables (e.g., Before and After) and move them into the Test Pairs box
Ensure Wilcoxon is checked under Test Type
Click OK
SPSS reports the Z statistic and the asymptotic p-value (two-tailed). It also shows the number of negative ranks, positive ranks, and ties. Calculate the effect size manually as r = |Z| / √N.
Excel does not have a built-in Wilcoxon signed-rank test. To compute it manually:
Calculate the difference for each pair in a new column
Remove any rows where the difference is zero
Take the absolute value of each difference
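Use RANK.AVG with an order argument of 1, e.g. RANK.AVG(value, range, 1), to rank the absolute differences in ascending order (without it, Excel ranks in descending order and the largest difference would get rank 1)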
Multiply each rank by the sign of the original difference (+1 or -1)
Sum the positive signed ranks (W+) and the negative signed ranks (W-) separately
Compare W = min(W+, W-) to a critical value table
Alternatively, install the Real Statistics Resource Pack add-in for an automated Wilcoxon signed-rank test.