One-Way ANOVA
What Is One-Way ANOVA?
ANOVA stands for Analysis of Variance. The one-way ANOVA tests whether the means of three or more independent groups differ significantly. It is the extension of the independent samples t-test to more than two groups.
Despite its name, ANOVA actually compares means by analyzing variance — specifically, it compares the variability between groups to the variability within groups. If the between-group variability is substantially larger than the within-group variability, there is evidence that at least one group mean differs from the others.
The test produces an F-ratio:

$$F = \frac{\text{between-group variability}}{\text{within-group variability}} = \frac{MS_{between}}{MS_{within}}$$

If the null hypothesis is true (all group means are equal), the F-ratio should be close to 1.0. The further $F$ is above 1, the stronger the evidence against $H_0$.
When to Use It
Use a one-way ANOVA when:
- You have one continuous dependent variable (e.g., test scores, reaction time, weight loss)
- You have one categorical independent variable (the factor) with three or more levels (groups)
- The groups are independent (different participants in each group)
Examples:
- Comparing patient satisfaction scores across four hospital departments
- Comparing crop yield for three fertilizer types
- Comparing exam performance across five study methods
Why not just run multiple t-tests? Running all pairwise t-tests inflates the Type I error rate. With 3 groups, you would need 3 comparisons; with 5 groups, 10 comparisons. At $\alpha = .05$, the probability of at least one false positive with $m$ comparisons is:

$$P(\text{at least one false positive}) = 1 - (1 - \alpha)^m$$

For 3 comparisons: $1 - (0.95)^3 = .143$ (14.3% false positive rate instead of 5%). ANOVA controls this by testing all groups simultaneously with a single omnibus test.
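The inflation is easy to check numerically; a minimal Python sketch of the formula above:

```python
# Family-wise Type I error rate for m independent tests at significance
# level alpha: P(at least one false positive) = 1 - (1 - alpha)^m
def familywise_error(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

print(round(familywise_error(3), 3))   # 0.143 (3 groups: 3 comparisons)
print(round(familywise_error(10), 3))  # 0.401 (5 groups: 10 comparisons)
```

With 5 groups, roughly a 40% chance of at least one spurious "significant" pairwise difference.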
Assumptions
- Independence of observations. Participants are randomly and independently assigned to groups. Each person appears in only one group.
- Normality. The dependent variable is approximately normally distributed within each group. ANOVA is robust to moderate violations when sample sizes are roughly equal and each group has $n \geq 15$.
- Homogeneity of variances. The population variances are equal across all groups. Test this with Levene's test. If violated, use Welch's ANOVA or the Brown-Forsythe test as alternatives.
Rule of thumb: If the largest group standard deviation is no more than twice the smallest, the assumption is reasonably satisfied.
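Both the rule of thumb and Levene's test take only a few lines; a sketch using the three-group data from the worked example later in this article:

```python
from scipy import stats
import statistics

# The three groups from the worked example (n = 8 each)
humor         = [7, 6, 8, 7, 5, 8, 6, 7]
emotional     = [8, 9, 7, 8, 9, 8, 7, 6]
informational = [5, 4, 6, 5, 3, 6, 4, 5]

# Rule of thumb: largest SD no more than twice the smallest
sds = [statistics.stdev(g) for g in (humor, emotional, informational)]
print(max(sds) / min(sds) <= 2)  # True: assumption looks reasonable

# Formal check: Levene's test (H0: population variances are equal)
stat, p = stats.levene(humor, emotional, informational)
print(p > 0.05)  # True: no evidence of unequal variances
```

Here the three groups happen to have identical spread, so both checks pass comfortably.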
Formula
Decomposing Variance
The total variability in the data is partitioned into two sources:
Between-groups sum of squares (variability due to group differences):

$$SS_{between} = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2$$

Within-groups sum of squares (variability due to individual differences within groups):

$$SS_{within} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

Together, $SS_{total} = SS_{between} + SS_{within}$.
Mean Squares
$$MS_{between} = \frac{SS_{between}}{k - 1} \qquad MS_{within} = \frac{SS_{within}}{N - k}$$

Where $k$ is the number of groups and $N$ is the total sample size.
F-Ratio
$$F = \frac{MS_{between}}{MS_{within}}$$

The F-statistic follows an $F$-distribution with $df_1 = k - 1$ and $df_2 = N - k$.
Effect Size: Eta-Squared
$$\eta^2 = \frac{SS_{between}}{SS_{total}}$$

This tells you the proportion of total variance explained by the group variable.
| $\eta^2$ | Interpretation |
|---|---|
| .01 | Small |
| .06 | Medium |
| .14 | Large |
Partial eta-squared ($\eta_p^2$) is often reported in factorial designs and equals $\eta^2$ in a one-way ANOVA. Be aware that SPSS reports $\eta_p^2$ by default.
Omega-squared ($\omega^2$) is a less biased alternative:

$$\omega^2 = \frac{SS_{between} - (k - 1)MS_{within}}{SS_{total} + MS_{within}}$$
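To see how the two effect sizes compare, a sketch computing both from the summary quantities of the worked example later in this article:

```python
# Eta-squared and omega-squared from ANOVA summary quantities
# (values taken from this article's worked example)
ss_between, ss_within, k, N = 37.3333, 22.5, 3, 24

ss_total = ss_between + ss_within   # total sum of squares
ms_within = ss_within / (N - k)     # mean square within

eta_sq = ss_between / ss_total
omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)

print(round(eta_sq, 2), round(omega_sq, 2))  # 0.62 0.58
```

As expected, $\omega^2$ (.58) is slightly smaller than $\eta^2$ (.62); the gap shrinks as sample size grows.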
Worked Example
Scenario: A marketing researcher tests three advertising strategies (humor, emotional, and informational) to see which produces the highest purchase intention ratings (scale of 1-10). Each group has 8 participants.
| Humor | Emotional | Informational |
|---|---|---|
| 7 | 8 | 5 |
| 6 | 9 | 4 |
| 8 | 7 | 6 |
| 7 | 8 | 5 |
| 5 | 9 | 3 |
| 8 | 8 | 6 |
| 6 | 7 | 4 |
| 7 | 6 | 5 |
Step 1: Calculate group means and grand mean.
- $\bar{x}_1 = 6.75$ (Humor)
- $\bar{x}_2 = 7.75$ (Emotional)
- $\bar{x}_3 = 4.75$ (Informational)
- Grand mean: $\bar{x} = 154/24 = 6.42$
Step 2: Calculate $SS_{between}$.

$$SS_{between} = 8(6.75 - 6.42)^2 + 8(7.75 - 6.42)^2 + 8(4.75 - 6.42)^2 \approx 37.33$$
Step 3: Calculate $SS_{within}$.

For each group, sum the squared deviations from the group mean:

- Humor: $\sum(x_i - 6.75)^2 = 7.50$
- Emotional: $\sum(x_i - 7.75)^2 = 7.50$
- Informational: $\sum(x_i - 4.75)^2 = 7.50$

$$SS_{within} = 7.50 + 7.50 + 7.50 = 22.50$$
Step 4: Calculate mean squares.

$$MS_{between} = \frac{37.33}{3 - 1} = 18.67 \qquad MS_{within} = \frac{22.50}{24 - 3} = 1.07$$
Step 5: Calculate the F-ratio.

$$F = \frac{18.67}{1.07} = 17.42$$
Step 6: Determine the p-value.
With $df_1 = 2$ and $df_2 = 21$, the critical value of $F$ at $\alpha = .05$ is approximately 3.47. Our $F = 17.42$ far exceeds this, so $p < .001$.
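Both lookups can be reproduced with scipy's F distribution; a quick sketch:

```python
from scipy.stats import f

df1, df2 = 2, 21  # k - 1 and N - k for the worked example

# Critical value of F at alpha = .05
f_crit = f.ppf(0.95, df1, df2)
print(round(f_crit, 2))  # 3.47

# Upper-tail p-value for the observed F = 17.42
p = f.sf(17.42, df1, df2)
print(p < 0.001)  # True
```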
Step 7: Calculate effect size.

$$\eta^2 = \frac{37.33}{37.33 + 22.50} = \frac{37.33}{59.83} = .62$$

This is a very large effect: advertising strategy explains 62% of the variance in purchase intention.
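The hand calculation can be verified end to end; a sketch using scipy and the example data:

```python
from scipy import stats

humor         = [7, 6, 8, 7, 5, 8, 6, 7]
emotional     = [8, 9, 7, 8, 9, 8, 7, 6]
informational = [5, 4, 6, 5, 3, 6, 4, 5]
groups = [humor, emotional, informational]

# Omnibus one-way ANOVA
f_stat, p_value = stats.f_oneway(*groups)
print(round(f_stat, 2), p_value < 0.001)  # 17.42 True

# Eta-squared from the sums of squares
scores = [x for g in groups for x in g]
grand_mean = sum(scores) / len(scores)
ss_total = sum((x - grand_mean) ** 2 for x in scores)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
eta_sq = (ss_total - ss_within) / ss_total
print(round(eta_sq, 2))  # 0.62
```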
Post-Hoc Tests
A significant F-test tells you that at least one group mean differs, but not which groups differ. Post-hoc tests identify specific pairwise differences while controlling the family-wise error rate.
Tukey's HSD (Honestly Significant Difference)
The most commonly used post-hoc test. Compares all possible pairs of means while maintaining the overall $\alpha$ at .05.

$$HSD = q\sqrt{\frac{MS_{within}}{n}}$$

Where $q$ is the studentized range statistic and $n$ is the per-group sample size.
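As an illustration, the HSD for the worked example can be computed with scipy's `studentized_range` distribution (available in scipy 1.7+); a sketch:

```python
import math
from scipy.stats import studentized_range

# Worked-example quantities: MS_within = 22.5/21, n = 8 per group,
# k = 3 groups, df_within = 21
ms_within, n, k, df_w = 22.5 / 21, 8, 3, 21

# Critical studentized range value q at alpha = .05
q_crit = studentized_range.ppf(0.95, k, df_w)

# Any pair of means differing by more than HSD is significant
hsd = q_crit * math.sqrt(ms_within / n)
print(round(hsd, 2))
```

With the example's means (6.75, 7.75, 4.75), the two differences involving the Informational group (2.0 and 3.0 points) exceed the HSD of roughly 1.3, while the 1.0-point Humor vs. Emotional gap does not.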
Bonferroni Correction
Divides $\alpha$ by the number of comparisons. More conservative than Tukey for many comparisons but works with unequal sample sizes.

$$\alpha_{adjusted} = \frac{\alpha}{m}$$

Where $m$ is the number of pairwise comparisons: $m = \frac{k(k-1)}{2}$. For 3 groups, $m = 3$ and $\alpha_{adjusted} = .05/3 \approx .0167$.
Which Post-Hoc Test to Use?
| Test | Best When |
|---|---|
| Tukey HSD | Equal sample sizes, all pairwise comparisons needed |
| Bonferroni | Unequal sample sizes, or only a few planned comparisons |
| Games-Howell | Unequal variances and/or unequal sample sizes |
| Dunnett | Comparing each group to a single control group |
Interpretation
For our example, $F(2, 21) = 17.42$, $p < .001$, $\eta^2 = .62$.
This means:
- The omnibus test is significant. At least one advertising strategy produces different purchase intention ratings than the others.
- The effect is large. Advertising strategy explains 62% of the variance in purchase intention.
- Post-hoc tests are needed to determine which specific groups differ.
Tukey HSD post-hoc tests would likely reveal:
- Emotional > Informational (mean difference of 3.0 points)
- Humor > Informational (mean difference of 2.0 points)
- Emotional vs. Humor (a 1.0-point difference that may or may not reach significance)
Common Mistakes
- Running multiple t-tests instead of ANOVA. This inflates the family-wise Type I error rate. Use ANOVA first, then post-hoc tests if significant.
- Skipping post-hoc tests after a significant F. The ANOVA F-test only tells you something differs; you need post-hoc tests to learn what differs.
- Running post-hoc tests after a non-significant F. If the omnibus F is not significant, do not go fishing for pairwise differences.
- Ignoring unequal variances. If Levene's test is significant, use Welch's ANOVA with Games-Howell post-hoc tests instead of the standard ANOVA with Tukey.
- Reporting $\eta^2$ as partial $\eta_p^2$ (or vice versa). In a one-way ANOVA these are identical, but in factorial designs they differ. Be explicit about which you report.
- Confusing Cohen's $d$ and Cohen's $f$. For ANOVA, use Cohen's $f$ for power analysis:

$$f = \sqrt{\frac{\eta^2}{1 - \eta^2}}$$

Where $f = .10$ (small), $f = .25$ (medium), $f = .40$ (large).
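The conversion from $\eta^2$ to $f$ is a one-liner; a sketch (the helper name `cohens_f` is ours):

```python
import math

# Convert eta-squared to Cohen's f for power analysis
def cohens_f(eta_sq):
    return math.sqrt(eta_sq / (1 - eta_sq))

# The small/medium/large eta-squared benchmarks map onto
# the f = .10 / .25 / .40 benchmarks
print(round(cohens_f(0.01), 2))  # 0.1
print(round(cohens_f(0.06), 2))  # 0.25
print(round(cohens_f(0.14), 2))  # 0.4

# The worked example's eta-squared of .62 is a very large f
print(round(cohens_f(0.62), 2))  # 1.28
```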
How to Run It

In R, fit the model and get the effect size (eta-squared), assuming a data frame `df` with columns `score` and `group`:

```r
library(effectsize)
model <- aov(score ~ group, data = df)  # fit the one-way ANOVA
summary(model)                          # omnibus F-test
eta_squared(model)
```

Post-hoc pairwise comparisons (Tukey HSD):

```r
TukeyHSD(model)
```
```python
from scipy import stats
import pingouin as pg
# Using scipy
f_stat, p_value = stats.f_oneway(group1, group2, group3)
# Using pingouin (includes effect size and post-hoc)
aov = pg.anova(dv='score', between='group', data=df)
print(aov)
# Post-hoc tests
posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
print(posthoc)
```
In SPSS:

1. Go to Analyze > Compare Means > One-Way ANOVA
2. Move your dependent variable into the Dependent List
3. Move your grouping variable into the Factor box
4. Click Post Hoc and select Tukey (or Bonferroni for unequal sample sizes)
5. Click Options and check Descriptive and Homogeneity of variance test
6. Click OK

If Levene's test is significant, check Welch under Analyze > Compare Means > One-Way ANOVA > Options and use Games-Howell post-hoc tests instead of Tukey.
In Excel, use the Data Analysis ToolPak (enable via File > Options > Add-ins):

1. Go to Data > Data Analysis > Anova: Single Factor
2. Select your input range (each group in a separate column)
3. Set Grouped By: Columns
4. Set alpha to 0.05
5. Click OK
Excel produces the ANOVA summary table with F-statistic, p-value, and F-critical. It does not produce post-hoc tests or effect sizes — calculate eta-squared manually as SSbetween / SStotal.
Ready to calculate?
Now that you understand the concept, use the free Effect Size Calculator on Subthesis to run your own analysis.
Related Concepts
Independent Samples t-Test
Learn how to conduct and interpret an independent samples t-test, including assumptions, formulas, worked examples, and APA reporting guidelines.
Effect Size
Learn what effect size is, why it matters more than p-values alone, and how to calculate and interpret Cohen's d, Hedges' g, and eta-squared for your research.
Statistical Power & Power Analysis
Learn what statistical power is, why 80% is the standard threshold, and how to conduct a power analysis to determine if your study can detect real effects.
Kruskal-Wallis H Test
Learn how to conduct and interpret a Kruskal-Wallis H test, the non-parametric alternative to one-way ANOVA, with formulas, a worked example, and APA reporting guidelines.
Two-Way (Factorial) ANOVA
Learn how to conduct and interpret a two-way ANOVA, including main effects, interaction effects, formulas, a worked example, and APA reporting guidelines.
Repeated Measures ANOVA
Learn how to conduct and interpret a repeated measures ANOVA: compare means across three or more time points or conditions from the same participants, test sphericity, and apply corrections.
Sample Size Determination
Learn how to calculate the right sample size for your research study using power analysis, effect size estimates, and practical planning considerations.