Getting Started with R for Research
R is a free, open-source programming language built for statistical computing and graphics. Researchers use it across many disciplines, from psychology and education to public health and economics. If you are coming from a point-and-click tool such as SPSS or Excel, R will feel different at first: you type commands instead of clicking menus. That learning curve pays off quickly, though. R is more flexible than menu-driven software, extensible through thousands of free packages, and costs nothing.
This guide will get you from zero to running your first analysis.
Installing R and RStudio
You need two things:
- R — the language itself. Download it from CRAN. Choose the version for your operating system and install it with the default settings.
- RStudio — the interface that makes R usable. Download the free Desktop version from Posit. RStudio gives you a code editor, a console, a file browser, and a plot viewer all in one window.
Always install R first, then RStudio. RStudio will automatically detect your R installation.
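Once both are installed, you can check which copy of R RStudio is using by typing the lines below into the Console. They use only base R, so nothing extra needs to be installed. Your version string and installation path will differ from anyone else's, and the 4.0.0 threshold is just an arbitrary example.

```r
# Print the version of R that this session is running
R.version.string

# Print the folder where that copy of R is installed
R.home()

# TRUE if the running R is at least version 4.0.0 (example threshold only)
getRversion() >= "4.0.0"
```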
The RStudio Interface
When you open RStudio, you will see four panels:
- Source (top left): Where you write and save your R scripts. Think of this as your analysis notebook.
- Console (bottom left): Where commands actually run. You can type directly here for quick calculations, but for anything you want to reproduce, write it in a script file first.
- Environment (top right): Shows your loaded data and variables. When you import a dataset, it appears here.
- Files/Plots/Help (bottom right): A multipurpose panel for browsing files, viewing plots, and reading documentation.
Basic R Syntax
R works by typing commands. Here are the essentials:
```r
# Assign a value to a variable
x <- 5

# Create a vector (an ordered set of numbers)
scores <- c(85, 90, 78, 92, 88, 76, 95, 83)

# Calculate the mean
mean(scores)

# Calculate the standard deviation
sd(scores)

# Get a quick summary
summary(scores)
```
The <- symbol is how you assign values in R. The c() function combines values into a vector. These two concepts are the foundation of everything else.
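To demystify what mean() and sd() do, here is a short sketch that recomputes both by hand for the same scores vector. It also shows that arithmetic on a vector applies to every element at once, and that sd() is the sample standard deviation (it divides by n - 1, not n).

```r
scores <- c(85, 90, 78, 92, 88, 76, 95, 83)

# mean(): the sum of the values divided by how many there are
manual_mean <- sum(scores) / length(scores)
manual_mean   # 85.875

# sd(): the sample standard deviation, which divides by n - 1.
# `scores - manual_mean` subtracts the mean from every element at once.
n <- length(scores)
manual_sd <- sqrt(sum((scores - manual_mean)^2) / (n - 1))

# Both comparisons should print TRUE
all.equal(manual_mean, mean(scores))
all.equal(manual_sd, sd(scores))
```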
Tip: Use the # symbol to add comments to your code. Future-you will thank present-you when you reopen a script six months later and can actually understand what it does.
Loading Data
Most researchers work with CSV or Excel files. To load a CSV file:
```r
# Load a CSV file
mydata <- read.csv("path/to/your/datafile.csv")

# View the first few rows
head(mydata)

# Check the structure
str(mydata)
```
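If you want to practice before you have a real file, the sketch below writes a tiny made-up dataset to a temporary CSV and reads it back with read.csv(). The column names and values are invented for illustration. Note row.names = FALSE, which stops write.csv() from adding an extra unnamed index column.

```r
# A tiny made-up dataset (illustrative values only)
example <- data.frame(
  id    = 1:4,
  group = c("control", "treatment", "control", "treatment"),
  score = c(72, 80, 70, 76)
)

# Write it to a temporary CSV file, then read it back the usual way
path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)
mydata <- read.csv(path)

head(mydata)   # first rows
str(mydata)    # id and score come back as integers, group as character
```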
For Excel files, you will need the readxl package:
```r
# Install the package (only needed once)
install.packages("readxl")

# Load it
library(readxl)

# Read the Excel file
mydata <- read_excel("path/to/your/datafile.xlsx")
```
You can also use RStudio's point-and-click import: go to File > Import Dataset and follow the prompts. This is a perfectly fine approach while you are learning.
Running a t-Test
Suppose you have two groups and you want to compare their means — the classic independent samples t-test. If your data has a column called score and a column called group with values "treatment" and "control":
```r
# Independent samples t-test
t.test(score ~ group, data = mydata)
```
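That single line replaces a series of menu selections in SPSS. The ~ symbol means "predicted by": you are testing whether score differs by group. By default, t.test() runs Welch's t-test, which does not assume the two groups have equal variances; add var.equal = TRUE if you want the classic Student's t-test.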
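Here is a self-contained sketch you can run without your own data. It simulates 20 made-up scores per group (the means and standard deviations are arbitrary) and runs both the default Welch test and the equal-variance Student test, so you can compare them.

```r
set.seed(42)  # makes the random numbers reproducible

# Simulated data: 20 participants per group (invented values for illustration)
mydata <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 72, sd = 7), rnorm(20, mean = 78, sd = 7))
)

# Welch's t-test (R's default: does not assume equal variances)
welch <- t.test(score ~ group, data = mydata)

# Student's t-test (assumes equal variances in both groups)
student <- t.test(score ~ group, data = mydata, var.equal = TRUE)

welch$method
student$method
c(welch = welch$p.value, student = student$p.value)
```

Student's test always has n1 + n2 - 2 = 38 degrees of freedom here, while Welch's df is usually a non-integer no larger than that.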
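For a paired samples t-test (e.g., pre-test and post-test scores from the same participants, stored in two columns called pretest and posttest), use the $ operator to pull each column out of the data frame: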
```r
# Paired samples t-test
t.test(mydata$pretest, mydata$posttest, paired = TRUE)
```
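A paired t-test is a one-sample t-test on the within-person differences. The sketch below, using simulated pre/post scores for 15 hypothetical participants, checks that both versions give the same result. It also explains the sign: R computes pretest minus posttest, so a negative t means scores went up.

```r
set.seed(1)

# Simulated pre/post scores for 15 participants (invented for illustration)
pretest  <- rnorm(15, mean = 70, sd = 8)
posttest <- pretest + rnorm(15, mean = 4, sd = 5)

paired   <- t.test(pretest, posttest, paired = TRUE)
one_samp <- t.test(pretest - posttest)  # one-sample test of the differences against 0

# Same t statistic and p-value (both print TRUE)
all.equal(unname(paired$statistic), unname(one_samp$statistic))
all.equal(paired$p.value, one_samp$p.value)
```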
Reading R Output
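When you run t.test(), R returns something like this:

```text
        Welch Two Sample t-test

data:  score by group
t = -2.45, df = 38.7, p-value = 0.019
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.15 -0.85
sample estimates:
mean in group control mean in group treatment
                 72.5                    77.5
```

Here is what matters:
- t = -2.45: the test statistic. It is negative because R subtracts the second group from the first. Groups are ordered alphabetically by default (or by factor level, if group is a factor), so this is control minus treatment.
- df = 38.7: degrees of freedom. Welch's correction adjusts them for unequal group variances, which is why the value is not a whole number.
- p-value = 0.019: below .05, so the difference is statistically significant at the conventional .05 level.
- Group means: control averaged 72.5, treatment averaged 77.5.
- 95% confidence interval: a range of plausible values for the difference in means (control minus treatment), from -9.15 to -0.85. Because it does not include 0, it agrees with the significant p-value.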
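Every number in the printed output is also stored in the object that t.test() returns, which is handy for reporting. The sketch below, on simulated data (invented values), pulls those pieces out and then recomputes Welch's t and its adjusted degrees of freedom (the Welch-Satterthwaite formula) by hand to confirm they match.

```r
set.seed(42)
mydata <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 72, sd = 7), rnorm(20, mean = 78, sd = 7))
)

res <- t.test(score ~ group, data = mydata)

# Each piece of the printed output lives in the result object
res$statistic   # t
res$parameter   # df
res$p.value     # p-value
res$conf.int    # 95% CI for control minus treatment
res$estimate    # group means

# Recompute Welch's t and df by hand
ctrl <- mydata$score[mydata$group == "control"]
trt  <- mydata$score[mydata$group == "treatment"]
v1 <- var(ctrl) / length(ctrl)  # squared standard error of the control mean
v2 <- var(trt)  / length(trt)   # squared standard error of the treatment mean

t_manual  <- (mean(ctrl) - mean(trt)) / sqrt(v1 + v2)
df_manual <- (v1 + v2)^2 / (v1^2 / (length(ctrl) - 1) + v2^2 / (length(trt) - 1))

all.equal(unname(res$statistic), t_manual)   # TRUE
all.equal(unname(res$parameter), df_manual)  # TRUE
```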
t.test() does not report an effect size, but the effectsize package (listed below) computes Cohen's d from the same formula and data, for example with effectsize::cohens_d(score ~ group, data = mydata).
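If you want to see what Cohen's d is before installing anything, here is a base R sketch of the common pooled-standard-deviation version, on simulated data (invented values). It reports treatment minus control; a package that follows R's alphabetical group order may report the same magnitude with the opposite sign, and packages may offer variants (such as Hedges' g) that differ slightly.

```r
set.seed(42)
mydata <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 72, sd = 7), rnorm(20, mean = 78, sd = 7))
)

ctrl <- mydata$score[mydata$group == "control"]
trt  <- mydata$score[mydata$group == "treatment"]
n1 <- length(ctrl)
n2 <- length(trt)

# Pooled standard deviation: a df-weighted average of the two group variances
sd_pooled <- sqrt(((n1 - 1) * var(ctrl) + (n2 - 1) * var(trt)) / (n1 + n2 - 2))

# Cohen's d: the mean difference expressed in pooled-SD units
d <- (mean(trt) - mean(ctrl)) / sd_pooled
d
```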
Recommended Packages
R's power comes from its packages: free add-ons that extend its capabilities. Install these early:

```r
install.packages(c("tidyverse", "psych", "effectsize", "rstatix"))
```
- tidyverse: A collection of packages for data manipulation and visualization. Includes dplyr for data wrangling and ggplot2 for publication-quality plots. This is the single most important package to learn.
- psych: Built for behavioral science research. The describe() function gives you means, standard deviations, skewness, kurtosis, and more in one clean table. Also includes functions for reliability analysis and factor analysis.
- effectsize: Calculates Cohen's d, Hedges' g, eta-squared, and other effect size measures directly from your test results.
- rstatix: Provides pipe-friendly versions of common statistical tests. Makes it easier to run t-tests, ANOVAs, and correlations within a tidy workflow.
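Group-wise summaries are a typical first data-wrangling task. The sketch below does it with base R's aggregate() on simulated data (invented values), so it runs without any packages; the commented-out lines show roughly how the same summary looks with dplyr, assuming the tidyverse is installed and your R supports the native |> pipe (R 4.1 or later).

```r
set.seed(42)
mydata <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 72, sd = 7), rnorm(20, mean = 78, sd = 7))
)

# Mean and standard deviation of score for each group, base R only
group_means <- aggregate(score ~ group, data = mydata, FUN = mean)
group_sds   <- aggregate(score ~ group, data = mydata, FUN = sd)
group_means
group_sds

# Roughly equivalent dplyr version (requires the tidyverse):
# library(dplyr)
# mydata |>
#   group_by(group) |>
#   summarise(mean = mean(score), sd = sd(score), n = n())
```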
Tips for Beginners
- Write scripts, not console commands. Always save your work in .R script files. This makes your analysis reproducible and shareable.
- Google your errors. Every R user gets error messages constantly. Copy the error message into a search engine; someone has almost certainly had the same problem and solved it on Stack Overflow.
- Start with tidyverse. Learning dplyr and ggplot2 early will make everything else easier. The tidyverse style is more readable than base R for most tasks.
- Use projects. Create an RStudio Project for each research study. This keeps your files organized and makes file paths easier to manage.
- Do not memorize — reference. Nobody remembers every function. Keep cheat sheets handy (RStudio publishes excellent ones for free) and look things up as needed.
When You Need a Quick Calculation
Sometimes you just need a fast effect size or power analysis result without writing a full script, especially during a proposal defense or committee meeting. The free calculators on Subthesis let you compute effect sizes, power analyses, and reliability coefficients in your browser. They pair well with R when you want to double-check a result or get a quick estimate before coding up the full analysis.
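For a quick power analysis inside R itself, base R's stats package (loaded by default) includes power.t.test(). The sketch below asks how many participants per group a two-sided independent samples t-test needs to detect a standardized difference of 0.5 (delta = 0.5 with sd = 1, i.e. Cohen's d = 0.5) with 80% power at alpha = .05. These inputs are example values, not recommendations for your study.

```r
# Sample size per group for d = 0.5, 80% power, alpha = .05 (two-sided, two groups)
pw <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
pw

# Round up: you cannot recruit a fraction of a participant
ceiling(pw$n)   # 64 per group
```

Leave exactly one of n, delta, power, or sig.level unspecified (here n) and power.t.test() solves for it.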
R has a steep first hour and a gentle slope after that. Write your first script, run your first t-test, and you will see why so many researchers swear by it.