Repeated Measures ANOVA

Purpose

Tests whether the mean of a continuous variable differs significantly across three or more related conditions or time points measured on the same participants.

When to Use

When the same participants are measured on a continuous outcome under three or more conditions or at three or more time points.

Data Type

One continuous dependent variable measured at 3+ levels of a within-subjects factor

Key Assumptions

Approximate normality of the outcome in each condition, no significant outliers, and sphericity (equal variances of the differences between all pairs of conditions), which is assessed with Mauchly's test.

What Is Repeated Measures ANOVA?

Repeated measures ANOVA is an extension of the paired-samples t-test to three or more related conditions. It tests whether the mean of a continuous outcome differs across multiple time points, treatments, or conditions when the same participants are measured in all conditions.

Like the one-way ANOVA, it produces an F-ratio. However, whereas the one-way ANOVA compares independent groups (between-subjects), repeated measures ANOVA compares related measurements (within-subjects). Because each participant serves as their own control, stable individual differences are removed from the error term, which typically gives this design more statistical power than a between-subjects design of the same size.

The F-ratio for a repeated measures design is:

$$F = \frac{MS_{condition}}{MS_{error}}$$

where $MS_{error}$ reflects variability in how participants respond differently across conditions (the condition $\times$ subjects interaction), rather than overall within-group variability.

When to Use It

Use repeated measures ANOVA when:

- The same participants are measured under three or more conditions or at three or more time points.
- Your dependent variable is continuous (interval or ratio scale).
- You want to determine whether there is a statistically significant change across the conditions.

Examples:

- Measuring anxiety scores before, during, and after an intervention
- Comparing reaction times under easy, medium, and hard task conditions
- Testing pain levels at 0, 2, 4, and 8 weeks post-surgery

If you have only two time points, use a paired-samples t-test. If different participants are in each group, use a one-way ANOVA.

Assumptions

- Continuous dependent variable. The outcome must be measured at the interval or ratio level.
- Related groups. The same participants provide data for all levels of the within-subjects factor (or the groups are matched).
- No significant outliers. Extreme values in any condition can distort results. Check boxplots for each condition.
- Normality. The dependent variable should be approximately normally distributed at each time point. With sample sizes above 20-30, repeated measures ANOVA is fairly robust to moderate departures from normality.
- Sphericity. This is the critical assumption unique to repeated measures designs. Sphericity requires that the variances of the differences between all pairs of conditions are equal. For example, with three conditions (A, B, C), the variance of (A - B) should equal the variance of (A - C) and the variance of (B - C).
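To make the sphericity condition concrete, the following sketch computes the sample variance of the difference scores for every pair of conditions. It uses the six-participant anxiety data from the worked example later in this guide; the variable names are illustrative only.

```python
from itertools import combinations
from statistics import variance

# Anxiety scores from the worked example (one list per condition,
# participants in the same order in every list).
scores = {
    "Pre":  [38, 42, 35, 40, 45, 36],
    "Mid":  [32, 35, 30, 34, 38, 31],
    "Post": [25, 28, 22, 26, 30, 24],
}

# Sample variance of the difference scores for each pair of conditions.
diff_variances = {}
for a, b in combinations(scores, 2):
    diffs = [x - y for x, y in zip(scores[a], scores[b])]
    diff_variances[(a, b)] = variance(diffs)

for pair, var in diff_variances.items():
    print(pair, round(var, 2))
```

Sphericity concerns whether these variances are equal in the population. Sample values will always differ somewhat, which is why a formal test or an epsilon estimate is used rather than eyeballing the numbers.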

Testing Sphericity: Mauchly's Test

Mauchly's test evaluates whether the sphericity assumption holds:

- $H_0$: Sphericity is met (variances of differences are equal)
- $H_1$: Sphericity is violated

If Mauchly's test is significant ($p < .05$), sphericity is violated and corrections are needed:

- Greenhouse-Geisser correction ($\epsilon_{GG}$): Reduces the degrees of freedom to correct for the violation. More conservative. Use when $\epsilon < .75$.
- Huynh-Feldt correction ($\epsilon_{HF}$): Less conservative than Greenhouse-Geisser. Use when $\epsilon \geq .75$.

The epsilon ( $\epsilon$ ) value ranges from $\frac{1}{k-1}$ (maximum violation) to 1.0 (perfect sphericity), where $k$ is the number of conditions.

Formula

Partitioning Variance

In a repeated measures design, total variability is decomposed as:

$$SS_{total} = SS_{between\text{-}subjects} + SS_{within\text{-}subjects}$$

The within-subjects variability is further decomposed:

$$SS_{within\text{-}subjects} = SS_{condition} + SS_{error}$$

Where:

- $SS_{condition}$ reflects differences among the condition means
- $SS_{error}$ reflects the condition $\times$ subjects interaction (how inconsistently participants respond across conditions)

Degrees of Freedom

- $df_{condition} = k - 1$ (where $k$ = number of conditions)
- $df_{error} = (k - 1)(n - 1)$ (where $n$ = number of participants)

Mean Squares and F-Ratio

$$MS_{condition} = \frac{SS_{condition}}{k - 1}$$

$$MS_{error} = \frac{SS_{error}}{(k-1)(n-1)}$$

$$F = \frac{MS_{condition}}{MS_{error}}$$
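The decomposition and F-ratio above can be sketched in a few lines of plain Python. This is a minimal illustration for complete data only (every participant measured in every condition). The function name `rm_anova` and the small 4 x 3 data set are hypothetical, chosen only to show the arithmetic.

```python
def rm_anova(data):
    """One-way repeated measures ANOVA for a complete n x k table.

    data: list of rows, one row per participant, one column per condition.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_between = k * sum((m - grand) ** 2 for m in subj_means)
    ss_condition = n * sum((m - grand) ** 2 for m in cond_means)
    ss_error = ss_total - ss_between - ss_condition

    df_condition, df_error = k - 1, (k - 1) * (n - 1)
    ms_condition = ss_condition / df_condition
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_condition": ss_condition,
        "ss_error": ss_error,
        "df": (df_condition, df_error),
        "F": ms_condition / ms_error,
    }


# Hypothetical scores: 4 participants (rows) x 3 conditions (columns).
example = [
    [10, 12, 15],
    [8, 11, 13],
    [12, 13, 18],
    [9, 12, 14],
]
table = rm_anova(example)
for name, value in table.items():
    print(name, value)
```

Here $SS_{error}$ is obtained by subtraction, which is equivalent to summing the squared cell residuals when no data are missing.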

Effect Size: Partial Eta-Squared

$$\eta_p^2 = \frac{SS_{condition}}{SS_{condition} + SS_{error}}$$

| $\eta_p^2$ | Interpretation |
|---|---|
| .01 | Small |
| .06 | Medium |
| .14 | Large |

Greenhouse-Geisser Corrected Degrees of Freedom

When sphericity is violated, multiply the degrees of freedom by $\epsilon_{GG}$ :

$$df_{condition}^* = \epsilon_{GG} \times (k - 1)$$

$$df_{error}^* = \epsilon_{GG} \times (k - 1)(n - 1)$$

The F-value itself does not change — only the degrees of freedom (and hence the p-value) change.
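As a rough sketch of where $\epsilon_{GG}$ comes from, the code below estimates it from the double-centered sample covariance matrix of the conditions, which is one common way of writing the Greenhouse-Geisser estimator, and then scales the degrees of freedom. The function name `gg_epsilon` is an assumption made for this illustration, and the data are the anxiety scores from the worked example in this guide. In practice, statistical software reports this value directly.

```python
from statistics import mean


def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for an n x k table (rows = participants)."""
    n, k = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(k)]
    col_means = [mean(c) for c in cols]

    # Sample covariance matrix of the k conditions.
    cov = [[sum((cols[a][i] - col_means[a]) * (cols[b][i] - col_means[b])
                for i in range(n)) / (n - 1)
            for b in range(k)] for a in range(k)]

    # Double-center: subtract row and column means, add back the grand mean.
    row_means = [mean(r) for r in cov]
    grand = mean(row_means)
    dc = [[cov[a][b] - row_means[a] - row_means[b] + grand
           for b in range(k)] for a in range(k)]

    trace = sum(dc[a][a] for a in range(k))
    sum_sq = sum(v * v for r in dc for v in r)
    return trace ** 2 / ((k - 1) * sum_sq)


# Rows are participants; columns are Pre, Mid, Post.
data = [
    [38, 32, 25],
    [42, 35, 28],
    [35, 30, 22],
    [40, 34, 26],
    [45, 38, 30],
    [36, 31, 24],
]
n, k = len(data), len(data[0])
eps = gg_epsilon(data)

# Corrected degrees of freedom: the F-value is unchanged, only the df shrink.
df_condition_corrected = eps * (k - 1)
df_error_corrected = eps * (k - 1) * (n - 1)
print(round(eps, 3), round(df_condition_corrected, 2), round(df_error_corrected, 2))
```

The estimate always falls between $\frac{1}{k-1}$ and 1, so the corrected degrees of freedom can never exceed the uncorrected ones.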

Worked Example

Scenario: A clinical psychologist measures test anxiety scores (0-50 scale) in $n = 6$ students at three time points: before an intervention (Pre), at the midpoint (Mid), and after the intervention (Post).

| Participant | Pre ($T_1$) | Mid ($T_2$) | Post ($T_3$) |
|---|---|---|---|
| 1 | 38 | 32 | 25 |
| 2 | 42 | 35 | 28 |
| 3 | 35 | 30 | 22 |
| 4 | 40 | 34 | 26 |
| 5 | 45 | 38 | 30 |
| 6 | 36 | 31 | 24 |

Step 1: Calculate the condition means and grand mean.

- $\bar{X}_{Pre} = \frac{38+42+35+40+45+36}{6} = 39.33$
- $\bar{X}_{Mid} = \frac{32+35+30+34+38+31}{6} = 33.33$
- $\bar{X}_{Post} = \frac{25+28+22+26+30+24}{6} = 25.83$
- $\bar{X}_{grand} = \frac{39.33 + 33.33 + 25.83}{3} = 32.83$

Step 2: Calculate $SS_{condition}$ .

$$
\begin{aligned}
SS_{condition} &= n \sum_{j=1}^{k} (\bar{X}_j - \bar{X}_{grand})^2 \\
&= 6[(39.33-32.83)^2 + (33.33-32.83)^2 + (25.83-32.83)^2] \\
&= 6[42.25 + 0.25 + 49.00] = 6 \times 91.50 = 549.00
\end{aligned}
$$

Step 3: Calculate $SS_{error}$ .

$SS_{error}$ is computed as the residual variability after removing both subject effects and condition effects. For each cell, the residual is:

$$e_{ij} = X_{ij} - \bar{X}_j - \bar{P}_i + \bar{X}_{grand}$$

Where $\bar{P}_i$ is the mean for participant $i$ across all conditions.

Participant means: $\bar{P}_1 = 31.67$ , $\bar{P}_2 = 35.00$ , $\bar{P}_3 = 29.00$ , $\bar{P}_4 = 33.33$ , $\bar{P}_5 = 37.67$ , $\bar{P}_6 = 30.33$ .

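Squaring the residuals and summing them within each participant gives approximately 0.17, 0.50, 0.50, 0.17, 1.17, and 1.17 for participants 1 through 6. Summing across participants yields:

$$SS_{error} = 3.67$$

Step 4: Calculate mean squares and the F-ratio.

$$MS_{condition} = \frac{549.00}{3-1} = \frac{549.00}{2} = 274.50$$

$$MS_{error} = \frac{3.67}{(3-1)(6-1)} = \frac{3.67}{10} = 0.367$$

$$F = \frac{274.50}{0.367} = 748.64$$

The F-ratio is computed from the unrounded $MS_{error}$ (0.3667); dividing by the rounded 0.367 gives a slightly different value.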
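The residual step is tedious to do by hand, so Steps 2 through 4 can be rechecked directly from the data table. The sketch below is plain Python with illustrative variable names and applies the residual formula from Step 3 cell by cell.

```python
# Rows are participants 1-6; columns are Pre, Mid, Post.
data = [
    [38, 32, 25],
    [42, 35, 28],
    [35, 30, 22],
    [40, 34, 26],
    [45, 38, 30],
    [36, 31, 24],
]
n, k = len(data), len(data[0])

grand = sum(sum(row) for row in data) / (n * k)
cond_means = [sum(row[j] for row in data) / n for j in range(k)]
subj_means = [sum(row) / k for row in data]

# Residual for each cell: e_ij = X_ij - cond_mean_j - subj_mean_i + grand
residuals = [[data[i][j] - cond_means[j] - subj_means[i] + grand
              for j in range(k)] for i in range(n)]

ss_condition = n * sum((m - grand) ** 2 for m in cond_means)
ss_error = sum(e ** 2 for row in residuals for e in row)
ms_condition = ss_condition / (k - 1)
ms_error = ss_error / ((k - 1) * (n - 1))
f_ratio = ms_condition / ms_error

print(f"SS_condition = {ss_condition:.2f}")
print(f"SS_error     = {ss_error:.2f}")
print(f"MS_error     = {ms_error:.4f}")
print(f"F            = {f_ratio:.2f}")
```

Because the residuals sum to zero within every row and every column, checking those sums is a quick way to catch arithmetic slips before squaring.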

Step 5: Determine degrees of freedom and p-value.

With $df_1 = 2$ and $df_2 = 10$ , this enormous F-value yields $p < .001$ .
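Because this example has exactly three conditions, the numerator degrees of freedom equal 2, and in that special case the upper-tail probability of the F distribution has a simple closed form. The helper below is a sketch under that assumption only. Its name is hypothetical, and it does not apply to other numerator df (including Greenhouse-Geisser corrected, fractional df), for which a statistics library is needed.

```python
def f_pvalue_df1_2(f_value, df2):
    """Upper-tail p-value for an F statistic with numerator df = 2.

    For df1 = 2 the survival function is (1 + 2F/df2) ** (-df2/2).
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


# Sanity check against the tabled 5% critical value for F(2, 10), about 4.10.
p_at_critical = f_pvalue_df1_2(4.10, 10)
print(round(p_at_critical, 3))

# Any F-value in the hundreds with df2 = 10 is far below .001.
p_large_f = f_pvalue_df1_2(700, 10)
print(p_large_f < 0.001)
```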

Step 6: Check sphericity.

Suppose Mauchly's test gives $W = 0.89$, $p = .42$. Because $p > .05$, there is no evidence that sphericity is violated, and no correction is needed.

Step 7: Calculate effect size.

$$\eta_p^2 = \frac{549.00}{549.00 + 3.67} = \frac{549.00}{552.67} = .993$$

This is an extremely large effect. Test anxiety declined dramatically across the three time points.

Step 8: Post-hoc pairwise comparisons.

With a significant omnibus F, conduct Bonferroni-corrected pairwise comparisons:

- Pre vs. Mid: $\bar{X}_{Pre} - \bar{X}_{Mid} = 6.00$, $p < .001$
- Mid vs. Post: $\bar{X}_{Mid} - \bar{X}_{Post} = 7.50$, $p < .001$
- Pre vs. Post: $\bar{X}_{Pre} - \bar{X}_{Post} = 13.50$, $p < .001$

All pairwise comparisons are significant — anxiety decreased significantly at each stage of the intervention.
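Each pairwise comparison is a paired-samples t-test on the difference scores. The sketch below reproduces the mean differences and the corresponding t statistics with the standard library only. The helper name `paired_t` is illustrative, and converting t to a p-value would additionally require a t distribution from a statistics package, which is not shown here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

scores = {
    "Pre":  [38, 42, 35, 40, 45, 36],
    "Mid":  [32, 35, 30, 34, 38, 31],
    "Post": [25, 28, 22, 26, 30, 24],
}


def paired_t(x, y):
    """Return (mean difference, t statistic, df) for paired samples."""
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs), mean(diffs) / se, len(diffs) - 1


pairs = list(combinations(scores, 2))
results = {(a, b): paired_t(scores[a], scores[b]) for a, b in pairs}
for (a, b), (md, t, df) in results.items():
    print(f"{a} vs. {b}: mean diff = {md:.2f}, t({df}) = {t:.2f}")

# Bonferroni: test each comparison at alpha divided by the number of
# comparisons (equivalently, multiply each raw p-value by that number).
alpha_per_test = 0.05 / len(pairs)
print(round(alpha_per_test, 4))
```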

Interpretation

The repeated measures ANOVA revealed a significant effect of time on test anxiety, $F(2, 10) = 748.64$, $p < .001$, $\eta_p^2 = .99$. Anxiety scores decreased from pre-intervention ($M = 39.33$) to mid-intervention ($M = 33.33$) to post-intervention ($M = 25.83$), and every pairwise comparison was statistically significant. The reduction in test anxiety was large and consistent across participants, although without a control group it cannot be attributed to the intervention with certainty.

What If Sphericity Is Violated?

If Mauchly's test had been significant, you would report the corrected results. For example, with a hypothetical $\epsilon_{GG} = 0.68$:

- Corrected $df_1 = 0.68 \times 2 = 1.36$
- Corrected $df_2 = 0.68 \times 10 = 6.80$
- Report: $F(1.36, 6.80) = 748.64$, $p < .001$ (Greenhouse-Geisser corrected)

Common Mistakes

- Ignoring sphericity. Always check Mauchly's test and apply a correction (Greenhouse-Geisser or Huynh-Feldt) when it is significant. Failing to correct inflates the Type I error rate.
- Using a one-way ANOVA instead. If the same participants appear in every condition, a between-subjects ANOVA is incorrect because it treats the repeated measurements as independent, violating the independence assumption and wasting statistical power.
- Not conducting post-hoc comparisons. A significant F-test tells you that at least one time point differs, but not which ones. Use Bonferroni-corrected pairwise comparisons or polynomial contrasts to identify specific differences.
- Ignoring missing data. Standard repeated measures ANOVA uses listwise deletion: a participant missing one time point is dropped entirely. Consider mixed-effects models for data with missing observations.
- Over-interpreting a time effect as a treatment effect. If there is no control group, changes over time could reflect maturation, practice effects, or regression to the mean rather than the intervention. A mixed ANOVA (between-within design) with a control group is stronger.
- Reporting partial $\eta^2$ as $\eta^2$. These are different in repeated measures designs. Clearly label which effect size you report.
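To illustrate the last point, the sketch below computes both $\eta^2$ (condition variability relative to total variability, including between-subjects differences) and partial $\eta^2$ for the anxiety data from the worked example. The variable names are illustrative, and the definition of $\eta^2$ as $SS_{condition} / SS_{total}$ is the classical one assumed here.

```python
# Rows are participants 1-6; columns are Pre, Mid, Post.
data = [
    [38, 32, 25],
    [42, 35, 28],
    [35, 30, 22],
    [40, 34, 26],
    [45, 38, 30],
    [36, 31, 24],
]
n, k = len(data), len(data[0])
grand = sum(sum(row) for row in data) / (n * k)
cond_means = [sum(row[j] for row in data) / n for j in range(k)]
subj_means = [sum(row) / k for row in data]

ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_between = k * sum((m - grand) ** 2 for m in subj_means)
ss_condition = n * sum((m - grand) ** 2 for m in cond_means)
ss_error = ss_total - ss_between - ss_condition

# Classical eta-squared keeps between-subjects variability in the denominator.
eta_sq = ss_condition / ss_total
# Partial eta-squared removes it, so it is never smaller than eta-squared.
partial_eta_sq = ss_condition / (ss_condition + ss_error)

print(f"eta squared         = {eta_sq:.3f}")
print(f"partial eta squared = {partial_eta_sq:.3f}")
```

The two values differ noticeably whenever participants differ from one another in their overall level, which is why the label matters in a report.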

How to Run It

```r
# Repeated measures ANOVA in R using ez
library(ez)

# Data must be in long format with columns:
# participant, time (factor), score
result <- ezANOVA(
  data = mydata_long,
  dv = .(score),
  wid = .(participant),
  within = .(time),
  detailed = TRUE
)
print(result)
# Includes Mauchly's test and GG/HF corrections

# Post-hoc pairwise comparisons (Bonferroni)
# With paired = TRUE, rows must list participants in the same order
# within each time level so that observations are paired correctly.
pairwise.t.test(mydata_long$score, mydata_long$time,
                paired = TRUE, p.adjust.method = "bonferroni")
```

```python
import pingouin as pg

# Data must be in long format with columns:
# participant, time, score
aov = pg.rm_anova(
    data=df_long,
    dv='score',
    within='time',
    subject='participant',
    correction=True,  # also report the Greenhouse-Geisser corrected p-value
    effsize='np2'     # partial eta-squared, the effect size used in this guide
)
print(aov)

# Sphericity test
spher = pg.sphericity(
    data=df_long, dv='score',
    within='time', subject='participant'
)
print(spher)

# Post-hoc pairwise comparisons
posthoc = pg.pairwise_tests(
    data=df_long, dv='score',
    within='time', subject='participant',
    padjust='bonf'
)
print(posthoc)
```
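Both the R and Python examples expect long-format data, whereas data are often entered in wide format (one row per participant). A minimal reshaping sketch in plain Python follows. The two rows and the names `wide_rows` and `long_rows` are purely illustrative.

```python
# Wide format: one row per participant, one column per time point.
wide_rows = [
    {"participant": 1, "Pre": 38, "Mid": 32, "Post": 25},
    {"participant": 2, "Pre": 42, "Mid": 35, "Post": 28},
]
time_levels = ["Pre", "Mid", "Post"]

# Long format: one row per participant-by-time observation.
long_rows = [
    {"participant": row["participant"], "time": level, "score": row[level]}
    for row in wide_rows
    for level in time_levels
]

for r in long_rows:
    print(r)

# With pandas installed, pandas.DataFrame(long_rows) gives a table with
# the participant, time, and score columns that the examples expect.
```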


In SPSS:

1. Go to Analyze > General Linear Model > Repeated Measures.
2. In the dialog, define your within-subjects factor (e.g., name it "Time" with 3 levels), click Add, then click Define.
3. Move the three measurement variables (Pre, Mid, Post) into the within-subjects slots.
4. Click Options and check Descriptive statistics, Estimates of effect size, and Observed power.
5. Click Compare main effects and select Bonferroni as the confidence interval adjustment.
6. Click Plots, move your within-subjects factor to the horizontal axis, and click Add.
7. Click OK.

SPSS outputs Mauchly's Test of Sphericity, the Tests of Within-Subjects Effects table (with Sphericity Assumed, Greenhouse-Geisser, and Huynh-Feldt rows), Pairwise Comparisons with Bonferroni adjustment, and partial eta-squared as the effect size.


Excel does not have a built-in repeated measures ANOVA tool. The Data Analysis ToolPak offers "Anova: Two-Factor Without Replication," which can approximate a repeated measures design:

1. Arrange data so each row is a participant and each column is a condition (Pre, Mid, Post).
2. Go to Data > Data Analysis > Anova: Two-Factor Without Replication.
3. Select the data range (including headers).
4. Set alpha to 0.05 and click OK.

The "Rows" factor represents participants and the "Columns" factor represents your repeated measure. The F and p-value for the Columns factor test the within-subjects effect. Note that this method does not provide Mauchly's test, epsilon corrections, or post-hoc comparisons. For proper repeated measures analysis with sphericity tests, use R, Python, or SPSS.



## How to Report in APA Format

> A one-way repeated measures ANOVA was conducted to compare test anxiety scores across three time points (pre-intervention, mid-intervention, and post-intervention). Mauchly's test indicated that the assumption of sphericity was met, $W = 0.89$, $p = .42$. There was a statistically significant effect of time on test anxiety, $F(2, 10) = 748.64$, $p < .001$, $\eta_p^2 = .99$. Bonferroni-corrected post-hoc comparisons revealed significant decreases from pre- to mid-intervention ($M_{diff} = 6.00$, $p < .001$), from mid- to post-intervention ($M_{diff} = 7.50$, $p < .001$), and from pre- to post-intervention ($M_{diff} = 13.50$, $p < .001$).

If sphericity were violated, report the corrected values:

> Mauchly's test indicated that the assumption of sphericity was violated, $\chi^2(2) = 7.34$, $p = .026$. Therefore, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity ($\epsilon = .68$). The results showed a significant effect of time, $F(1.36, 6.80) = 748.64$, $p < .001$, $\eta_p^2 = .99$.

Key elements to include:

- Mauchly's test result (and correction used if sphericity is violated)
- The F-statistic with corrected degrees of freedom if applicable
- Partial eta-squared ($\eta_p^2$) as effect size
- Condition means and standard deviations
- Post-hoc pairwise comparisons with correction method
