Getting Started with R for Research

R is a free, open-source programming language built for statistical computing. It is used across research disciplines, from psychology and education to public health and economics. If you are coming from a point-and-click background with SPSS or Excel, R will feel different at first, because you type commands instead of clicking menus. That learning curve pays off quickly: R is more flexible than menu-driven tools, every analysis is saved as code you can rerun, and it costs nothing.

This guide will get you from zero to running your first analysis.

Installing R and RStudio

You need two things:

  1. R: the language itself. Download it from CRAN (the Comprehensive R Archive Network). Choose the version for your operating system and install it with the default settings.
  2. RStudio: an interface that makes R much easier to work with. Download the free Desktop version from Posit. RStudio gives you a code editor, a console, a file browser, and a plot viewer all in one window.

Always install R first, then RStudio. RStudio will automatically detect your R installation.

The RStudio Interface

When you open RStudio, you will see four panels:

  • Source (top left): Where you write and save your R scripts. Think of this as your analysis notebook.
  • Console (bottom left): Where commands actually run. You can type directly here for quick calculations, but for anything you want to reproduce, write it in a script file first.
  • Environment (top right): Shows your loaded data and variables. When you import a dataset, it appears here.
  • Files/Plots/Help (bottom right): A multipurpose panel for browsing files, viewing plots, and reading documentation.

Basic R Syntax

R works by typing commands. Here are the essentials:

# Assign a value to a variable
x <- 5

# Create a vector (an ordered set of values of the same type)
scores <- c(85, 90, 78, 92, 88, 76, 95, 83)

# Calculate the mean
mean(scores)

# Calculate the standard deviation
sd(scores)

# Get a quick summary
summary(scores)

The <- symbol is how you assign values in R. The c() function combines values into a vector. These two concepts are the foundation of everything else.
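To see both ideas working together, here is a minimal sketch that reuses the scores vector from above. The names curved and high are made up for this illustration.

```r
# Re-create the vector from the example above
scores <- c(85, 90, 78, 92, 88, 76, 95, 83)

# Arithmetic applies to every element at once (no loop needed)
curved <- scores + 2

# Square brackets pick out elements by position (R counts from 1)
scores[1]      # first score
scores[2:4]    # second through fourth scores

# A logical condition inside the brackets keeps only matching elements
high <- scores[scores >= 90]

# How many values the vector holds
length(scores)
```

Because most R functions accept whole vectors, you rarely need to loop over values one at a time.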

Tip: Use the # symbol to add comments to your code. Future-you will thank present-you when you reopen a script six months later and can actually understand what it does.

Loading Data

Most researchers work with CSV or Excel files. To load a CSV file:

# Load a CSV file
mydata <- read.csv("path/to/your/datafile.csv")

# View the first few rows
head(mydata)

# Check the structure
str(mydata)

For Excel files, you will need the readxl package:

# Install the package (only needed once)
install.packages("readxl")

# Load it
library(readxl)

# Read the Excel file
mydata <- read_excel("path/to/your/datafile.xlsx")

You can also use RStudio's point-and-click import: go to File > Import Dataset and follow the prompts. This is a perfectly fine approach while you are learning.
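If you do not have a data file handy yet, the following sketch lets you practice the whole load-and-inspect cycle. It builds a small made-up dataset, writes it to a temporary CSV file, and reads it back. The object name practice and all values are invented for the example.

```r
# Build a tiny made-up dataset (hypothetical values, for practice only)
practice <- data.frame(
  id    = 1:6,
  group = c("control", "control", "control",
            "treatment", "treatment", "treatment"),
  score = c(70, 74, 69, 78, 81, 75)
)

# Write it to a temporary CSV file, then read it back in
csv_path <- tempfile(fileext = ".csv")
write.csv(practice, csv_path, row.names = FALSE)
mydata <- read.csv(csv_path)

# Inspect what was loaded
head(mydata)   # first rows
str(mydata)    # column names and types
dim(mydata)    # number of rows and columns
```

Checking str() right after loading is a good habit: it shows whether each column came in as numbers or as text.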

Running a t-Test

Suppose you have two groups and you want to compare their means — the classic independent samples t-test. If your data has a column called score and a column called group with values "treatment" and "control":

# Independent samples t-test
t.test(score ~ group, data = mydata)

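That single line does what takes five clicks in SPSS. The ~ symbol means "predicted by": you are testing whether score differs by group. By default, t.test() runs Welch's version of the test, which does not assume the two groups have equal variances.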
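To try the test without your own data, here is a sketch using simulated scores. The group means, standard deviations, and sample sizes are invented for illustration. It also shows the var.equal = TRUE argument, which requests the pooled-variance (Student's) test instead of R's default.

```r
set.seed(42)  # makes the simulated numbers repeatable

# Simulated data: 20 participants per group (made-up means and SDs)
mydata <- data.frame(
  group = rep(c("control", "treatment"), each = 20),
  score = c(rnorm(20, mean = 72, sd = 8),
            rnorm(20, mean = 78, sd = 8))
)

# Default call: Welch's t-test
welch <- t.test(score ~ group, data = mydata)

# Student's t-test: var.equal = TRUE pools the two variances
student <- t.test(score ~ group, data = mydata, var.equal = TRUE)

welch$method       # name of the test that was run
student$method
student$parameter  # degrees of freedom: 20 + 20 - 2 = 38
```

Storing the result in an object (here welch and student) lets you pull out individual values later instead of reading them off the printed output.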

For a paired samples t-test (e.g., pre-test and post-test on the same participants):

# Paired samples t-test
t.test(mydata$pretest, mydata$posttest, paired = TRUE)
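The sketch below runs the paired test on made-up pre-test and post-test scores for eight participants. It also runs a one-sample test on the difference scores, which is the same calculation under the hood. All values and the names paired_result, diffs, and diff_result are invented for the example.

```r
# Made-up pre/post scores for eight participants (one person per row)
mydata <- data.frame(
  pretest  = c(60, 72, 55, 68, 80, 63, 70, 58),
  posttest = c(66, 75, 61, 70, 84, 62, 77, 65)
)

# Paired test: compares each participant with themselves
paired_result <- t.test(mydata$pretest, mydata$posttest, paired = TRUE)

# Equivalent one-sample test on the difference scores
diffs <- mydata$pretest - mydata$posttest
diff_result <- t.test(diffs, mu = 0)

paired_result$statistic  # same t value as diff_result$statistic
paired_result$parameter  # degrees of freedom: number of pairs - 1
paired_result$estimate   # mean of the differences (pretest minus posttest)
```

Note the direction of subtraction: the first argument minus the second. An improvement from pre-test to post-test therefore shows up as a negative mean difference.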

Reading R Output

When you run t.test(), R returns something like this:

	Welch Two Sample t-test

data:  score by group
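t = -2.45, df = 38.7, p-value = 0.019
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.15 -0.85
sample estimates:
mean in group control  mean in group treatment
                 72.5                    77.5

Here is what matters:

  • t = -2.45: the test statistic. It is negative because R subtracts the second group from the first, and groups are ordered alphabetically (control minus treatment).
  • df = 38.7: degrees of freedom (Welch's correction adjusts this, which is why it is not a whole number)
  • p-value = 0.019: below .05, so the difference is statistically significant at the conventional .05 level
  • Group means: control averaged 72.5, treatment averaged 77.5
  • 95% confidence interval: the plausible range for the true difference (control minus treatment) runs from -9.15 to -0.85. Because the interval excludes 0, it agrees with the significant p-value.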
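You do not have to copy these numbers from the printed output by hand. A t.test() result is a list, so each piece can be pulled out by name. The sketch below uses simulated data; the means, standard deviations, and the name mean_diff are assumptions made for illustration.

```r
set.seed(1)  # repeatable simulated data (values are made up)

mydata <- data.frame(
  group = rep(c("control", "treatment"), each = 15),
  score = c(rnorm(15, mean = 72, sd = 9),
            rnorm(15, mean = 78, sd = 9))
)

result <- t.test(score ~ group, data = mydata)

result$statistic  # t value
result$parameter  # degrees of freedom
result$p.value    # p-value
result$conf.int   # confidence interval for the difference in means
result$estimate   # the two group means

# The tested difference is first group minus second group
# (groups are ordered alphabetically: control, then treatment)
mean_diff <- unname(result$estimate[1] - result$estimate[2])
mean_diff
```

Extracting values this way is handy when you want to round them consistently or drop them into a results table.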

The t.test() output does not include an effect size, but the effectsize package (listed under Recommended Packages) can compute Cohen's d for you.
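For intuition about what such a function calculates, here is a base R sketch of Cohen's d for two independent groups, using the pooled standard deviation. The helper name cohens_d_manual and the scores are hypothetical; for results you plan to report, a tested package function is the safer choice.

```r
# Hypothetical helper: Cohen's d for two independent groups (pooled SD)
cohens_d_manual <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  # Pooled variance weights each group's variance by its degrees of freedom
  pooled_var <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)
  (mean(x) - mean(y)) / sqrt(pooled_var)
}

# Made-up scores for illustration
treatment <- c(78, 82, 75, 80, 85)
control   <- c(72, 70, 76, 74, 68)

d <- cohens_d_manual(treatment, control)
d  # mean difference of 8 divided by a pooled SD of 3.5, about 2.29
```

The sign of d depends on which group you pass first, so state the direction (here, treatment minus control) when you report it.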

Recommended Packages

R's power comes from its packages — free add-ons that extend its capabilities. Install these early:

install.packages(c("tidyverse", "psych", "effectsize", "rstatix"))

  • tidyverse: A collection of packages for data manipulation and visualization. Includes dplyr for data wrangling and ggplot2 for publication-quality plots. If you learn only one set of packages, learn this one.
  • psych: Built for behavioral science research. The describe() function gives you means, standard deviations, skewness, kurtosis, and more in one clean table. Also includes functions for reliability analysis and factor analysis.
  • effectsize: Calculates Cohen's d, Hedges' g, eta-squared, and other effect size measures directly from your test results.
  • rstatix: Provides pipe-friendly versions of common statistical tests. Makes it easier to run t-tests, ANOVAs, and correlations within a tidy workflow.
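As a taste of the tidyverse style, the sketch below computes group summaries with dplyr and then shows a base R equivalent for the group means. The data are made up, and the dplyr part runs only if the package is already installed, so the example works either way.

```r
# Made-up data for illustration
mydata <- data.frame(
  group = rep(c("control", "treatment"), each = 4),
  score = c(70, 74, 69, 73, 78, 81, 75, 80)
)

# Run the tidyverse version only if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Read the pipe (%>%) as "then": take mydata, then group it, then summarise
  group_summary <- mydata %>%
    group_by(group) %>%
    summarise(
      n          = n(),
      mean_score = mean(score),
      sd_score   = sd(score)
    )

  print(group_summary)
}

# Base R equivalent for the group means
base_means <- aggregate(score ~ group, data = mydata, FUN = mean)
base_means
```

Both approaches give the same means; the dplyr version simply makes it easier to add more summary columns in one step.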

Tips for Beginners

  • Write scripts, not console commands. Always save your work in .R script files. This makes your analysis reproducible and shareable.
  • Google your errors. Every R user gets error messages constantly. Copy the error message into a search engine — someone has almost certainly had the same problem and solved it on Stack Overflow.
  • Start with tidyverse. Learning dplyr and ggplot2 early will make everything else easier. The tidyverse style is more readable than base R for most tasks.
  • Use projects. Create an RStudio Project for each research study. This keeps your files organized and makes file paths easier to manage.
  • Do not memorize — reference. Nobody remembers every function. Keep cheat sheets handy (RStudio publishes excellent ones for free) and look things up as needed.

When You Need a Quick Calculation

Sometimes you just need a fast effect size or power analysis result without writing a full script — especially during a proposal defense or committee meeting. The free calculators on Subthesis let you compute effect sizes, power analyses, and reliability coefficients in your browser. They pair well with R for when you want to double-check a result or get a quick estimate before coding up the full analysis.

R has a steep first hour and a gentle slope after that. Write your first script, run your first t-test, and you will see why so many researchers swear by it.