Wilcoxon Signed-Rank Test

Purpose

Compares two related measurements to determine if there is a statistically significant difference when the paired t-test assumptions are not met.

When to Use

When you have two related measurements (pre/post, matched pairs) and the difference scores are ordinal or not normally distributed.

Data Type

Ordinal or continuous dependent variable measured twice on the same participants (or matched pairs)

Key Assumptions

The difference scores are independent across participants, symmetrically distributed around the median, and measured on at least an ordinal scale.

What Is the Wilcoxon Signed-Rank Test?

The Wilcoxon signed-rank test is a non-parametric test for comparing two related measurements. It is the rank-based alternative to the paired samples t-test and does not require the difference scores to be normally distributed.

Like the paired t-test, this test works with difference scores — it calculates the difference between each pair of observations. But instead of using the raw differences, it ranks the absolute differences and then compares the sum of ranks for positive differences against the sum of ranks for negative differences.

The test is named after Frank Wilcoxon, who introduced it in 1945. It is one of the most widely used non-parametric methods in the biomedical and social sciences.

When to Use It

Use a Wilcoxon signed-rank test when:

The same participants are measured under two conditions or at two time points
The dependent variable is ordinal (e.g., pain ratings, Likert-scale responses)
The difference scores are not normally distributed (skewed, heavy-tailed, or contain outliers) and sample sizes are small
You have matched pairs where each observation in one condition corresponds to an observation in the other

Examples:

Pain ratings before and after a medical treatment
Quality-of-life scores before and after rehabilitation
Preference ratings for two products rated by the same consumers
Anxiety levels before and after a therapy session

When to use the paired t-test instead: If the difference scores are approximately normal (or your sample is large, $n > 30$), the paired t-test generally has greater statistical power.
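One informal way to look at the normality of the difference scores is a Shapiro-Wilk test. The sketch below is only an illustration under stated assumptions: the function name `normality_check`, the `alpha` default, and the sample scores are hypothetical, and a formal normality test is just one input to the decision, alongside plots and sample size.

```python
import numpy as np
from scipy.stats import shapiro


def normality_check(pre, post, alpha=0.05):
    """Run a Shapiro-Wilk test on the paired difference scores.

    Returns the test statistic, the p-value, and whether normality
    is rejected at the (assumed) significance level alpha.
    """
    d = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    stat, p = shapiro(d)
    return {
        "statistic": float(stat),
        "p": float(p),
        "normality_rejected": bool(p < alpha),
    }


# Hypothetical scores with one extreme change (not from this guide)
pre = [10, 11, 12, 10, 13, 12, 11, 10, 12, 11]
post = [11, 12, 14, 12, 14, 14, 12, 12, 13, 61]
print(normality_check(pre, post))
```

If normality is rejected and the sample is small, the signed-rank test is the more defensible choice; otherwise the paired t-test remains an option.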

Assumptions

Dependent (paired) observations. Each participant provides two measurements, or participants are matched in pairs.
Independence between pairs. While the two measurements within each pair are related, different pairs must be independent of each other.
At least ordinal measurement. The differences must be rankable — you need to know which differences are larger than others.
Symmetric distribution of differences. The distribution of $D_i = X_{i,2} - X_{i,1}$ should be roughly symmetric around the median. This is a weaker assumption than normality but is required for the test to be valid. If the differences are highly asymmetric, consider the sign test instead.
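The symmetry assumption can be inspected informally with the sample skewness of the difference scores. This is a rough sketch, not a formal test: the helper name `difference_skewness` and the data are hypothetical, and the source gives no cut-off for how much skew is too much, so treat the number as a prompt to look at a histogram.

```python
import numpy as np
from scipy.stats import skew


def difference_skewness(pre, post):
    """Sample skewness of the paired differences (0 means perfectly symmetric)."""
    d = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    return float(skew(d))


# Hypothetical data: the first set of differences is symmetric, the second is right-skewed
symmetric = difference_skewness([0, 0, 0, 0, 0], [-2, -1, 0, 1, 2])
skewed = difference_skewness([0, 0, 0, 0, 0], [1, 1, 1, 1, 10])
print(symmetric, skewed)
```

A value near zero is consistent with symmetry; a clearly positive or negative value suggests considering the sign test.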

Formula

Step 1: Calculate difference scores.

$$D_i = X_{i,\text{post}} - X_{i,\text{pre}}$$

Discard any pairs where $D_i = 0$ (no change). Let $n_r$ be the number of remaining pairs.

Step 2: Rank the absolute differences.

Rank $|D_1|, |D_2|, \ldots, |D_{n_r}|$ from smallest to largest. Tied absolute values receive the average rank.

Step 3: Calculate signed rank sums.

$$W^+ = \sum \text{ranks of positive differences}$$

$$W^- = \sum \text{ranks of negative differences}$$

The test statistic is $W = \min(W^+, W^-)$.

Note: Some software reports $W = W^+$ (the sum of positive ranks). Check your output to know which convention is used.
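Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration rather than a replacement for a statistics library; the function name `signed_rank_sums` and the sample scores are made up for the example, and it assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.stats import rankdata


def signed_rank_sums(pre, post):
    """Return (W+, W-, n_r) for paired measurements."""
    d = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    d = d[d != 0]                          # Step 1: discard zero differences
    ranks = rankdata(np.abs(d))            # Step 2: tied values get the average rank
    w_plus = float(ranks[d > 0].sum())     # Step 3: sum of ranks of positive differences
    w_minus = float(ranks[d < 0].sum())    #         sum of ranks of negative differences
    return w_plus, w_minus, int(d.size)


# Hypothetical scores (not from this guide)
pre = [10, 12, 9, 14, 11]
post = [12, 11, 9, 17, 15]
w_plus, w_minus, n_r = signed_rank_sums(pre, post)
print(w_plus, w_minus, n_r, min(w_plus, w_minus))
```

A quick sanity check is that $W^+ + W^-$ always equals $n_r(n_r + 1)/2$.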

Step 4: For large samples ($n_r > 20$), use the normal approximation:

$$z = \frac{W^+ - \frac{n_r(n_r + 1)}{4}}{\sqrt{\frac{n_r(n_r + 1)(2n_r + 1)}{24}}}$$
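The normal approximation translates directly into code. The sketch below follows the formula exactly as written here, so it applies no tie correction and no continuity correction (statistical packages often apply one or both, which is why their output can differ slightly). The function name `wilcoxon_z` and the input values are hypothetical.

```python
import math
from scipy.stats import norm


def wilcoxon_z(w_plus, n_r):
    """Normal approximation for the signed-rank statistic W+.

    Assumes no tie correction and no continuity correction.
    Returns (z, two-sided p-value).
    """
    mean_w = n_r * (n_r + 1) / 4
    sd_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_two_sided = 2 * norm.sf(abs(z))
    return z, p_two_sided


# Hypothetical input: W+ = 40 from 25 non-zero pairs
z, p = wilcoxon_z(40, 25)
print(round(z, 2), round(p, 4))
```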

Effect size:

$$r = \frac{z}{\sqrt{n}}$$

where $n$ is the number of pairs. Conventions differ on whether pairs with $D_i = 0$ are counted; the worked example in this guide counts them. Benchmarks: $|r| = .10$ (small), $.30$ (medium), $.50$ (large).
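As a small sketch, the effect size and its benchmark label can be computed like this. The function names `effect_size_r` and `benchmark_label` are hypothetical, and the wording for values under .10 is an assumption, since the benchmarks only name small, medium, and large.

```python
import math


def effect_size_r(z, n):
    """r = z / sqrt(n), where n is the number of pairs under your chosen convention."""
    return z / math.sqrt(n)


def benchmark_label(r):
    """Map |r| onto the .10 / .30 / .50 benchmarks."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"  # assumed label; not part of the benchmarks


# Hypothetical input: z = -2.10 from 30 pairs
r = effect_size_r(-2.10, 30)
print(round(r, 2), benchmark_label(r))
```

The sign of $r$ follows the sign of $z$; the benchmarks apply to its absolute value.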

Worked Example

Scenario: A physiotherapist measures pain ratings (0-10 scale) in 10 patients before and after a new stretching protocol.

| Patient | Before | After | $D_i$ | $\lvert D_i \rvert$ | Rank of $\lvert D_i \rvert$ | Signed Rank |
|:-------:|:------:|:-----:|:-----:|:-------------------:|:---------------------------:|:-----------:|
| 1 | 7 | 4 | -3 | 3 | 7.5 | -7.5 |
| 2 | 5 | 5 | 0 | — | — | — |
| 3 | 8 | 5 | -3 | 3 | 7.5 | -7.5 |
| 4 | 6 | 4 | -2 | 2 | 4 | -4 |
| 5 | 9 | 6 | -3 | 3 | 7.5 | -7.5 |
| 6 | 4 | 3 | -1 | 1 | 1.5 | -1.5 |
| 7 | 7 | 5 | -2 | 2 | 4 | -4 |
| 8 | 8 | 7 | -1 | 1 | 1.5 | -1.5 |
| 9 | 6 | 3 | -3 | 3 | 7.5 | -7.5 |
| 10 | 5 | 3 | -2 | 2 | 4 | -4 |

Step 1: Calculate differences.

All differences are computed as After - Before. Patient 2 has $D = 0$ and is excluded. We have $n_r = 9$ usable pairs.

Step 2: Rank the absolute differences.

$|D| = 1$: Patients 6 and 8 share ranks 1-2; each gets rank 1.5
$|D| = 2$: Patients 4, 7, and 10 share ranks 3-5; each gets rank 4
$|D| = 3$: Patients 1, 3, 5, and 9 share ranks 6-9; each gets rank $(6+7+8+9)/4 = 7.5$

Step 3: Calculate the signed rank sums.

$$W^+ = 0 \quad \text{(no positive differences)}$$

$$W^- = 7.5 + 7.5 + 4 + 7.5 + 1.5 + 4 + 1.5 + 7.5 + 4 = 45$$

$$W = \min(0, 45) = 0$$

Step 4: Determine significance.

For $n_r = 9$ at $\alpha = .05$ (two-tailed), the critical value of $W$ from a Wilcoxon table is 5. Since $W = 0 < 5$, the result is statistically significant.

Using the normal approximation (for illustration only, since $n_r = 9$ is below the $n_r > 20$ guideline):

$$z = \frac{0 - \frac{9 \times 10}{4}}{\sqrt{\frac{9 \times 10 \times 19}{24}}} = \frac{0 - 22.5}{\sqrt{71.25}} = \frac{-22.5}{8.44} = -2.67$$

This yields $p \approx .008$ (two-tailed).

Step 5: Calculate effect size.

$$r = \frac{|{-2.67}|}{\sqrt{10}} = \frac{2.67}{3.16} = 0.84$$

This is a large effect.

Interpretation

The Wilcoxon signed-rank test indicated that pain ratings were significantly lower after the stretching protocol (Mdn = 4.5) than before (Mdn = 6.5), $W = 0$, $z = -2.67$, $p = .008$, $r = .84$. All nine patients with a change reported reduced pain, and the large effect size indicates a substantial and consistent reduction.

What to consider:

A $W$ of 0 (or close to it) means that virtually all differences point in the same direction — a very strong result.
Report medians rather than means when using non-parametric tests, because the test is based on ranks rather than raw values.
If the symmetry assumption is questionable, consider the simpler sign test, which only uses the direction of differences (positive vs. negative) without ranking their magnitudes.
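The sign test mentioned above amounts to a binomial test on the number of positive differences. A minimal sketch, assuming SciPy 1.7 or later (which provides `binomtest`); the helper name `sign_test` is hypothetical.

```python
from scipy.stats import binomtest


def sign_test(differences):
    """Two-sided sign test: are positive and negative differences equally likely?

    Returns (number of positive differences, number of non-zero differences, p-value).
    """
    nonzero = [d for d in differences if d != 0]
    n_pos = sum(1 for d in nonzero if d > 0)
    result = binomtest(n_pos, n=len(nonzero), p=0.5, alternative="two-sided")
    return n_pos, len(nonzero), result.pvalue


# Differences from the worked example (After - Before)
diffs = [-3, 0, -3, -2, -3, -1, -2, -1, -3, -2]
n_pos, n_nonzero, p_value = sign_test(diffs)
print(n_pos, n_nonzero, p_value)
```

With 0 positive differences out of 9, the two-tailed p-value is $2 \times 0.5^9 \approx .004$. The sign test ignores the size of each change, which is why it needs no symmetry assumption but usually has less power.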

Common Mistakes

Using it when data are paired but the test should be the paired t-test. If difference scores are approximately normal and you have adequate sample size, the paired t-test is more powerful. Reserve the Wilcoxon for clear violations of normality.
Including zero differences. Pairs with no change ($D = 0$) must be excluded before ranking. Failing to remove them inflates the sample size and distorts the test statistic.
Ignoring the symmetry assumption. The Wilcoxon signed-rank test assumes the difference scores are symmetrically distributed. If differences are heavily skewed, the sign test is a safer alternative (though less powerful).
Reporting means instead of medians. Since the Wilcoxon is rank-based, medians are the appropriate measure of central tendency to report alongside the test.
Forgetting the effect size. Always compute $r = z / \sqrt{n}$ or another appropriate effect size. The p-value alone does not convey the magnitude of the change.
Confusing with the Wilcoxon rank-sum test. The signed-rank test is for paired data. The rank-sum test (Mann-Whitney U) is for independent groups. The names are similar, but the tests are fundamentally different.

How to Run It

```r
# Wilcoxon signed-rank test in R
wilcox.test(mydata$before, mydata$after, paired = TRUE, exact = FALSE)

# Effect size: rank-biserial correlation
# (a different measure from r = z / sqrt(n))
library(effectsize)
rank_biserial(mydata$before, mydata$after, paired = TRUE)
```

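```python
from scipy import stats
import pingouin as pg

# Pain ratings from the worked example
before = [7, 5, 8, 6, 9, 4, 7, 8, 6, 5]
after = [4, 5, 5, 4, 6, 3, 5, 7, 3, 3]

# Using scipy (for a two-sided test, the statistic is the smaller of W+ and W-)
stat, p_value = stats.wilcoxon(before, after, alternative='two-sided')
print(stat, p_value)

# Using pingouin (returns a DataFrame with the test results)
result = pg.wilcoxon(before, after, alternative='two-sided')
print(result)
```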


In SPSS, go to Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples
Select your two measurement variables (e.g., Before and After) and move them into the Test Pairs box
Ensure Wilcoxon is checked under Test Type
Click OK

SPSS reports the Z statistic and the asymptotic p-value (two-tailed). It also shows the number of negative ranks, positive ranks, and ties. Calculate the effect size manually as r = |Z| / √N.


Excel does not have a built-in Wilcoxon signed-rank test. To compute it manually:

Calculate the difference for each pair in a new column
Remove any rows where the difference is zero
Take the absolute value of each difference
Use RANK.AVG to rank the absolute differences
Multiply each rank by the sign of the original difference (+1 or -1)
Sum the positive signed ranks (W⁺) and the negative signed ranks (W⁻) separately
Compare W = min(W⁺, W⁻) to a critical value table

Alternatively, install the Real Statistics Resource Pack add-in for an automated Wilcoxon signed-rank test.



How to Report in APA Format

> A Wilcoxon signed-rank test was conducted to evaluate the effect of a stretching protocol on patient pain ratings. Results indicated a statistically significant reduction in pain from pre-treatment (Mdn = 6.5) to post-treatment (Mdn = 4.5), $W$ = 0, $z$ = -2.67, $p$ = .008, $r$ = .84. The large effect size indicates that the stretching protocol produced a substantial and consistent decrease in pain ratings across patients.
