Getting Started with R for Research

R is a free, open-source programming language built for statistical computing. It is used across many research disciplines, from psychology and education to public health and economics. If you are coming from a point-and-click background with SPSS or Excel, R will feel different at first: you type commands instead of clicking menus. That learning curve pays off quickly, because every analysis you run is saved as code you can rerun, share, and adapt, and the software costs nothing.

This guide will get you from zero to running your first analysis.

Installing R and RStudio

You need two things:

R — the language itself. Download it from CRAN. Choose the version for your operating system and install it with the default settings.
RStudio — the interface that makes R usable. Download the free Desktop version from Posit. RStudio gives you a code editor, a console, a file browser, and a plot viewer all in one window.

Always install R first, then RStudio. RStudio will automatically detect your R installation.

The RStudio Interface

When you open RStudio, you will see four panels (the Source panel appears once you open or create a script):

Source (top left): Where you write and save your R scripts. Think of this as your analysis notebook.
Console (bottom left): Where commands actually run. You can type directly here for quick calculations, but for anything you want to reproduce, write it in a script file first.
Environment (top right): Shows your loaded data and variables. When you import a dataset, it appears here.
Files/Plots/Help (bottom right): A multipurpose panel for browsing files, viewing plots, and reading documentation.

Basic R Syntax

R works by typing commands. Here are the essentials:

# Assign a value to a variable
x <- 5

# Create a vector (an ordered set of values of the same type)
scores <- c(85, 90, 78, 92, 88, 76, 95, 83)

# Calculate the mean
mean(scores)

# Calculate the standard deviation
sd(scores)

# Get a quick summary
summary(scores)

The <- symbol is how you assign values in R. The c() function combines values into a vector. These two concepts are the foundation of everything else.
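
To see those two ideas working together, here is a small sketch that reuses the scores vector from above. The names curved and high_scores are made up for this illustration.

```r
# Same scores vector as in the example above
scores <- c(85, 90, 78, 92, 88, 76, 95, 83)

# Arithmetic applies to every element of a vector at once
curved <- scores + 2

# Square brackets pull out elements; here, only scores of 90 or more
high_scores <- scores[scores >= 90]

# Count how many values the vector holds
n_scores <- length(scores)

print(curved)
print(high_scores)
print(n_scores)
```

Because operations apply to whole vectors, you rarely need to write loops for everyday calculations in R.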

Tip: Use the # symbol to add comments to your code. Future-you will thank present-you when you reopen a script six months later and can actually understand what it does.

Loading Data

Most researchers work with CSV or Excel files. To load a CSV file:

# Load a CSV file
mydata <- read.csv("path/to/your/datafile.csv")

# View the first few rows
head(mydata)

# Check the structure
str(mydata)
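
If you do not have a data file handy yet, the sketch below writes a small CSV to a temporary location and reads it back, so you can try the import steps end to end. The values and the column names (id, group, score) are invented for illustration.

```r
# Hypothetical demo data, written to a temporary CSV file
demo <- data.frame(
  id    = 1:4,
  group = c("control", "treatment", "control", "treatment"),
  score = c(70, 78, 74, 81)
)
tmp <- tempfile(fileext = ".csv")
write.csv(demo, tmp, row.names = FALSE)

# Read it back the same way you would read your own file
mydata <- read.csv(tmp)

# Text columns arrive as character; convert grouping variables to factors
mydata$group <- factor(mydata$group)

# Check the structure and count missing values in each column
str(mydata)
missing_counts <- colSums(is.na(mydata))
print(missing_counts)
```

Checking str() and the missing-value counts right after import catches most problems (numbers read as text, unexpected blanks) before they affect an analysis.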

For Excel files, you will need the readxl package:

# Install the package (only needed once)
install.packages("readxl")

# Load it
library(readxl)

# Read the Excel file
mydata <- read_excel("path/to/your/datafile.xlsx")
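
A workbook often has more than one sheet. The sketch below assumes readxl is installed and, in place of your own file, uses the example workbook that ships with the package (located with readxl_example()). It lists the sheets and then reads the first one; the name exceldata is made up for this illustration.

```r
# Runs only if readxl is installed
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook bundled with readxl (stands in for your own file)
  path <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names in the workbook
  sheets <- readxl::excel_sheets(path)
  print(sheets)

  # Read one sheet by name
  exceldata <- readxl::read_excel(path, sheet = sheets[1])
  print(head(exceldata))
} else {
  message("readxl is not installed; run install.packages(\"readxl\") first")
}
```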

You can also use RStudio's point-and-click import: go to File > Import Dataset and follow the prompts. This is a perfectly fine approach while you are learning.

Running a t-Test

Suppose you have two groups and you want to compare their means — the classic independent samples t-test. If your data has a column called score and a column called group with values "treatment" and "control":

# Independent samples t-test
t.test(score ~ group, data = mydata)

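That single line does what takes five clicks in SPSS. Read the ~ symbol as "predicted by": you are testing whether score differs by group. Note that t.test() runs Welch's t-test by default, which does not assume equal variances in the two groups. Add var.equal = TRUE if you specifically want the classic Student's t-test.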
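
To try this without your own data, here is a minimal sketch on simulated scores. The group sizes, means, and standard deviation are assumptions chosen only for illustration. It runs the default test and, for comparison, the equal-variance version.

```r
# Simulated data: 20 participants per group (illustrative values only)
set.seed(42)
demo <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 72, sd = 8), rnorm(20, mean = 78, sd = 8))
)

# Default: Welch's t-test (does not assume equal variances)
welch <- t.test(score ~ group, data = demo)

# Student's t-test (assumes equal variances)
student <- t.test(score ~ group, data = demo, var.equal = TRUE)

print(welch$method)
print(welch$parameter)    # Welch degrees of freedom, usually not a whole number
print(student$parameter)  # n1 + n2 - 2
```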

For a paired samples t-test (e.g., pre-test and post-test on the same participants):

# Paired samples t-test
t.test(mydata$pretest, mydata$posttest, paired = TRUE)
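
A paired t-test is the same thing as a one-sample t-test on the difference scores. The sketch below checks that on a small made-up dataset; the pretest and posttest values are invented for illustration.

```r
# Hypothetical pre/post scores for six participants
mydata <- data.frame(
  pretest  = c(60, 72, 65, 80, 58, 77),
  posttest = c(66, 75, 70, 84, 63, 79)
)

# Paired samples t-test
paired <- t.test(mydata$pretest, mydata$posttest, paired = TRUE)

# Equivalent: one-sample t-test on the differences (pretest minus posttest)
one_sample <- t.test(mydata$pretest - mydata$posttest)

print(paired)
print(all.equal(unname(paired$statistic), unname(one_sample$statistic)))
```

Because R subtracts the second argument from the first, an improvement from pre-test to post-test shows up as a negative mean difference here. Swap the two arguments if you prefer a positive sign.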

Reading R Output

When you run t.test(), R returns something like this:

	Welch Two Sample t-test

data:  score by group
t = -2.45, df = 38.7, p-value = 0.019
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.15 -0.85
sample estimates:
mean in group control  mean in group treatment
                 72.5                    77.5

Here is what matters:

t = -2.45: the test statistic. It is negative because R subtracts the second group from the first (control minus treatment), and groups are ordered alphabetically by default
df = 38.7: degrees of freedom (Welch's correction adjusts this, which is why it is not a whole number)
p-value = 0.019: below .05, so the difference is statistically significant at the conventional 5% level
Group means: control averaged 72.5, treatment averaged 77.5
95% confidence interval: the true difference (control minus treatment) plausibly falls between -9.15 and -0.85, a range that excludes 0
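
You do not have to copy these numbers from the console by hand. t.test() returns an object whose pieces you can pull out by name. The sketch below uses simulated data (illustrative values only) and the made-up names result and diff_means.

```r
# Simulated data: 15 participants per group (illustrative values only)
set.seed(1)
demo <- data.frame(
  group = rep(c("control", "treatment"), each = 15),
  score = c(rnorm(15, mean = 72, sd = 8), rnorm(15, mean = 78, sd = 8))
)

# Store the result instead of just printing it
result <- t.test(score ~ group, data = demo)

print(result$statistic)  # t value
print(result$parameter)  # degrees of freedom
print(result$p.value)    # p-value
print(result$conf.int)   # confidence interval for the difference
print(result$estimate)   # the two group means

# Difference in means as R computes it: first group minus second group
diff_means <- unname(result$estimate[1] - result$estimate[2])
print(diff_means)
```

Storing results this way lets you round them consistently or drop them straight into a table or report.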

t.test() does not report an effect size, but you can get Cohen's d with one extra line using a package such as effectsize (see Recommended Packages).
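
To show what such a package computes, here is a minimal base R sketch of Cohen's d for two independent groups, using the pooled standard deviation. The scores are invented for illustration, and this hand calculation is a teaching aid, not a replacement for a tested package function.

```r
# Hypothetical scores for two independent groups
control   <- c(70, 74, 68, 77, 73, 72)
treatment <- c(76, 80, 75, 82, 79, 78)

n1 <- length(control)
n2 <- length(treatment)

# Pooled standard deviation across both groups
pooled_sd <- sqrt(((n1 - 1) * var(control) + (n2 - 1) * var(treatment)) /
                    (n1 + n2 - 2))

# Cohen's d: mean difference in pooled-SD units (treatment minus control)
cohens_d <- (mean(treatment) - mean(control)) / pooled_sd
print(cohens_d)
```

The sign of d depends on which group you subtract from which, so state the direction when you report it.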

Recommended Packages

R's power comes from its packages — free add-ons that extend its capabilities. Install these early:

install.packages(c("tidyverse", "psych", "effectsize", "rstatix"))

tidyverse — A collection of packages for data manipulation and visualization. Includes dplyr for data wrangling and ggplot2 for publication-quality plots. This is the single most important package to learn.
psych — Built for behavioral science research. The describe() function gives you means, standard deviations, skewness, kurtosis, and more in one clean table. Also includes functions for reliability analysis and factor analysis.
effectsize — Calculates Cohen's d, Hedges' g, eta-squared, and other effect size measures directly from your test results.
rstatix — Provides pipe-friendly versions of common statistical tests. Makes it easier to run t-tests, ANOVAs, and correlations within a tidy workflow.
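
Here is a brief sketch of three of these packages on simulated data. Each call runs only if the package is installed, and the data values are invented for illustration. The calls shown are describe() from psych, cohens_d() from effectsize, and t_test() from rstatix.

```r
# Simulated data: 10 participants per group (illustrative values only)
set.seed(7)
demo <- data.frame(
  group = rep(c("control", "treatment"), each = 10),
  score = c(rnorm(10, mean = 72, sd = 8), rnorm(10, mean = 78, sd = 8))
)

# Descriptive statistics in one table
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(demo$score))
}

# Cohen's d for the group difference
if (requireNamespace("effectsize", quietly = TRUE)) {
  print(effectsize::cohens_d(score ~ group, data = demo))
}

# t-test returned as a tidy data frame
if (requireNamespace("rstatix", quietly = TRUE)) {
  print(rstatix::t_test(demo, score ~ group))
}
```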

Tips for Beginners

Write scripts, not console commands. Always save your work in .R script files. This makes your analysis reproducible and shareable.
Google your errors. Every R user gets error messages constantly. Copy the error message into a search engine — someone has almost certainly had the same problem and solved it on Stack Overflow.
Start with tidyverse. Learning dplyr and ggplot2 early will make everything else easier. The tidyverse style is more readable than base R for most tasks.
Use projects. Create an RStudio Project for each research study. This keeps your files organized and makes file paths easier to manage.
Do not memorize — reference. Nobody remembers every function. Keep cheat sheets handy (RStudio publishes excellent ones for free) and look things up as needed.

When You Need a Quick Calculation

Sometimes you just need a fast effect size or power analysis result without writing a full script — especially during a proposal defense or committee meeting. The free calculators on Subthesis let you compute effect sizes, power analyses, and reliability coefficients in your browser. They pair well with R for when you want to double-check a result or get a quick estimate before coding up the full analysis.
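
R can also do this kind of quick calculation itself. As a minimal sketch, the built-in power.t.test() function solves for whichever quantity you leave out. The effect size (half a standard deviation), significance level, and target power below are assumed values for illustration, not recommendations.

```r
# Sample size per group for an independent samples t-test
# Assumed inputs: difference of 0.5 SD, alpha = .05, power = .80
pw <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
print(pw)

# Round up, since you cannot recruit a fraction of a participant
n_per_group <- ceiling(pw$n)
print(n_per_group)
```

Leave out power and supply n instead to get the power of a design you have already planned.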

R has a steep first hour and a gentle slope after that. Write your first script, run your first t-test, and you will see why so many researchers swear by it.