Lab 1: Introduction to R

Author

Feng Qiu, Liyuan Xuan

Published

September 15, 2025

1 Download and Install

R
R is a free software environment for statistical computing and graphics. To download, go to CRAN and select the installer for your operating system (Windows, Mac, or Linux).
RStudio
RStudio is a user-friendly application that helps you write R code and enhances your programming experience. To download, visit the RStudio download page and select the installer for your operating system.

After R and RStudio are installed, we will only need to use RStudio for this and future labs. The default RStudio layout has three panes: Console, Environment, and Output.

You can customize your RStudio working environment via Tools > Global Options in the top menu bar.

2 Working Directory

The working directory is a folder path on your computer that sets the default location for files you read into R or save out of R. Think of it as a little “flag” on your computer tied to your project.

To find your current working directory, type the code below in your Console pane and press Enter.
```
getwd()
```
```
[1] "D:/University of Alberta/0. Teaching/AREC513/Fall2025/labs"
```
To change your current working directory, you can
1. Use the code below
```
setwd("C:/Program Files")  
```
Tip

In R, file paths are separated by the forward slash /, not the backslash \ (on Windows, a doubled backslash \\ also works). The full path must be wrapped in quotation marks " ".
2. Use the top menu bar Session > Set Working Directory > Choose Directory
3. To set a default working directory, go to Tools > Global Options > General > Default working directory
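As a minimal sketch of the commands above, the code below changes the working directory and then restores it. It uses tempdir(), a temporary folder that R creates for every session, as a stand-in for a project folder, and the file name grades.csv is hypothetical.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Move to a folder that is guaranteed to exist (stand-in for a project folder)
setwd(tempdir())
getwd()

# file.path() joins folder and file names with forward slashes
csv_path <- file.path("data", "grades.csv")   # hypothetical file name
csv_path

# Go back to where we started
setwd(old_wd)
```

Storing the old path first is a cautious habit: it lets you undo setwd() without retyping the full path.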

3 R Basics

3.1 Calculations

1 + 1

[1] 2

5 * 6

[1] 30

2 ^ 4

[1] 16

(8 + 5 * 6 / 3) / 2

[1] 9
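The last calculation relies on the order in which R evaluates operators. The sketch below breaks it into steps; the object names are only for illustration.

```r
inner <- 5 * 6 / 3                 # * and / run left to right: 30 / 3 = 10
total <- (8 + inner) / 2           # brackets force the addition first: 18 / 2 = 9
total

no_brackets <- 8 + 5 * 6 / 3 / 2   # without brackets: 8 + 5 = 13
no_brackets

power_first <- 2 * 3 ^ 2           # ^ runs before *: 2 * 9 = 18
power_first
```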

3.2 Creating Objects

a <- 5 * 6
a

[1] 30

b <- 8 + a / 3
b

[1] 18

c <- "AREC 513"
c

[1] "AREC 513"

Tip

In RStudio, you can use the shortcut Alt + - (hyphen) to write the assignment operator <-.

3.3 Data Types

There are four main data types in R we will use in the labs: numeric, integer, logical, and character.

Numeric data, also known as quantitative data

is.numeric(b)     # test if the object "b" is numeric

[1] TRUE

Integer data, stores whole numbers without decimals

d <- 8L     # to generate integer variable, add "L" after the number
is.integer(d)

[1] TRUE

Logical data, stores only TRUE or FALSE
```
e <- TRUE
is.logical(e)
```
```
[1] TRUE
```
Character data, stores text strings
```
is.character(c)
```
```
[1] TRUE
```

Tip

You can also use class() to check the class of an object. For example, try class(a).

To convert your objects to a specific type, use as.numeric(), as.integer(), as.logical(), or as.character().
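A short sketch of these conversion functions is shown below. The object names are made up for illustration. Note that text which does not look like a number is converted to NA (R's missing value) with a warning, which is suppressed here.

```r
x_num  <- as.numeric("8")          # the text "8" becomes the number 8
x_int  <- as.integer(8.9)          # decimals are dropped, not rounded: 8
x_lgl  <- as.logical(c(0, 1, 5))   # 0 is FALSE, any other number is TRUE
x_back <- as.character(30)         # the number 30 becomes the text "30"

class(x_num)
class(x_int)
class(x_lgl)
class(x_back)

# Text that is not a number cannot be converted and becomes NA
bad <- suppressWarnings(as.numeric("AREC 513"))
is.na(bad)
```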

3.4 Data Structures

When analyzing data, you rarely deal with objects that store one single value or datasets with one single variable. This section discusses some common data structures.

3.4.1 Vectors

3.4.1.1 Create a Vector

Vectors play a crucial role in R because R is a vectorized language. A vector is a collection of values (elements), all of the same type. We can create a vector with the function c(), which stands for “combine”.

grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
grades_1

 [1] 75 76 77 78 79 80 81 82 83 84

grades_2 <- c(75:84) # to generate consecutive integers from 75 to 84, you can use ":" directly.
grades_2

 [1] 75 76 77 78 79 80 81 82 83 84

names <- c("Marshall", "Ruby", "Peppa", "George", "Suzy", "Danny", 
           "Pedro", "Rebecca", "Rubble", "Ryder", "Max", "Chase")
names

 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"
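Because all elements of a vector must share one type, R silently converts mixed inputs to a common type. The sketch below illustrates this with made-up object names, and also shows seq() as a more flexible alternative to ":".

```r
mixed <- c(75, "Ruby", TRUE)     # everything is converted to character
mixed
class(mixed)

num_lgl <- c(75, TRUE, FALSE)    # logicals become 1 and 0 next to numbers
num_lgl

every_third <- seq(75, 84, by = 3)   # 75 78 81 84
every_third

length(mixed)                    # number of elements in the vector
```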

3.4.1.2 Vector Operations

grades_1 + 5 # operations apply to each element in the vector

 [1] 80 81 82 83 84 85 86 87 88 89

grades_1 * 2

 [1] 150 152 154 156 158 160 162 164 166 168

sqrt(grades_1)

 [1] 8.660254 8.717798 8.774964 8.831761 8.888194 8.944272 9.000000 9.055385
 [9] 9.110434 9.165151

grades_1 >= 78

 [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

grades_1 == 78

 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

Warning

When testing for equality, double == has to be used.
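Comparisons return a logical vector, and logical vectors can be counted or located. The following sketch, with illustrative object names, shows a few common follow-ups; TRUE is treated as 1 and FALSE as 0 in arithmetic.

```r
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)

at_least_78 <- grades_1 >= 78

n_at_least_78 <- sum(at_least_78)        # how many grades are 78 or above: 7
share_at_least_78 <- mean(at_least_78)   # proportion of such grades: 0.7
pos_78 <- which(grades_1 == 78)          # position of the grade equal to 78: 4

any(grades_1 == 100)                     # is any grade equal to 100? FALSE
```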

3.4.1.3 Factor Vectors

Factor variables are important when building statistical models. There are two types of factor variables:

nominal: categorical variable with no inherent order among the categories

jbp <- c("poor", "excellent", "fair", "poor", "good") # create a nominal job performance vector
jbp

[1] "poor"      "excellent" "fair"      "poor"      "good"

class(jbp)

[1] "character"

jbp.1 <- as.factor(jbp)     # converting your character vector to factor
jbp.1

[1] poor      excellent fair      poor      good     
Levels: excellent fair good poor

class(jbp.1)

[1] "factor"

ordinal: categorical variable with a defined ranking among the categories

# create a job performance vector that is ordinal
jbp.2 <- factor(jbp, levels = c("poor", "fair", "good", "excellent"))     # set the level order from lowest to highest
jbp.2

[1] poor      excellent fair      poor      good     
Levels: poor fair good excellent
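Setting levels fixes the order in which the categories are stored and displayed. If you also want R to treat the ranking as meaningful, for example to compare categories with >=, one option is the ordered = TRUE argument of factor(). The sketch below assumes this is the behaviour you want; the name jbp.3 is only for illustration.

```r
jbp <- c("poor", "excellent", "fair", "poor", "good")

jbp.3 <- factor(jbp,
                levels = c("poor", "fair", "good", "excellent"),
                ordered = TRUE)      # mark the factor as ordered
jbp.3

is.ordered(jbp.3)                    # TRUE

at_least_good <- jbp.3 >= "good"     # comparisons follow the level order
at_least_good

table(jbp.3)                         # counts per category, in level order
```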

3.4.1.4 Subset Vectors and Select Elements

To subset a vector or select certain elements from a vector, we use the square brackets [ ].

By Position

grades_1[1]         # select the 1st element

[1] 75

grades_1[-2]        # exclude the 2nd element

[1] 75 77 78 79 80 81 82 83 84

grades_1[3:5]       # select elements 3 to 5

[1] 77 78 79

grades_1[c(1,3,5)]  # select the 1st, 3rd, and 5th elements

[1] 75 77 79

By Condition

grades_1[grades_1 >= 78]               # select grades_1 that are equal to or above 78

[1] 78 79 80 81 82 83 84

grades_1[grades_1 != 78]               # select grades_1 that are not equal to 78

[1] 75 76 77 79 80 81 82 83 84

jbp.2[jbp.2 %in% c("poor", "fair")]    # select job performance that is "poor" or "fair"

[1] poor fair poor
Levels: poor fair good excellent

Tip

If you wish to apply multiple conditions, the common operators are & (Shift + 7) for AND and | (Shift + \) for OR. Try grades_1[grades_1 >= 78 & grades_1 != 80].
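The example from the tip can be run as follows, together with an OR condition for comparison. The object names both and either are only for illustration.

```r
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)

# AND: both conditions must be TRUE
both <- grades_1[grades_1 >= 78 & grades_1 != 80]
both      # 78 79 81 82 83 84

# OR: at least one condition must be TRUE
either <- grades_1[grades_1 < 77 | grades_1 > 82]
either    # 75 76 83 84
```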

3.4.2 Matrices

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. All elements of a matrix must be of the same data type. Matrices are commonly used in mathematics and statistics, but R matrices are a broader concept than mathematical matrices: they can hold any single data type, such as character strings, not only numbers.

ma.1 <- matrix(1:15, nrow = 5, ncol = 3)
ma.1

     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15

ma.2 <- matrix(names, nrow = 6, ncol = 2)
ma.2

     [,1]       [,2]     
[1,] "Marshall" "Pedro"  
[2,] "Ruby"     "Rebecca"
[3,] "Peppa"    "Rubble" 
[4,] "George"   "Ryder"  
[5,] "Suzy"     "Max"    
[6,] "Danny"    "Chase"

Since matrices are two-dimensional, to subset or select elements from a matrix using [ ], you need to specify the row and/or column index as [ROW, COL].

ma.1[2, ]   # select all elements in the 2nd row of ma.1

[1]  2  7 12

ma.1[, 2]   # select all elements in the 2nd column of ma.1

[1]  6  7  8  9 10

ma.1[2, 2]  # select the element in row 2 and column 2 of ma.1

[1] 7

ma.2[3, 2]  # select the element in row 3 and column 2 of ma.2

[1] "Rubble"
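A few more matrix operations are sketched below. By default matrix() fills the values column by column; the byrow = TRUE argument fills row by row instead. The names ma.3 and block are only for illustration.

```r
ma.1 <- matrix(1:15, nrow = 5, ncol = 3)

dim(ma.1)                       # number of rows and columns: 5 3

# Fill the same values row by row instead of column by column
ma.3 <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)
ma.3[1, ]                       # first row is now 1 2 3

# Select several rows and columns at once: rows 2 to 3, columns 1 and 3
block <- ma.1[2:3, c(1, 3)]
block

dim(t(ma.1))                    # t() transposes the matrix: 3 5
```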

3.4.3 Arrays

In R, arrays are data objects that can store data in more than two dimensions. As with matrices, all elements must be of the same type.

array.1 <- array(1:24, dim = c(2, 3, 4))      # 2 rows * 3 columns * 4 matrices
array.1

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
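Selecting from an array works like selecting from a matrix, with one index per dimension. A minimal sketch using the array above (the names one_value and slice_2 are only for illustration):

```r
array.1 <- array(1:24, dim = c(2, 3, 4))

dim(array.1)                      # 2 3 4

one_value <- array.1[1, 2, 3]     # row 1, column 2 of the 3rd matrix: 15
one_value

slice_2 <- array.1[, , 2]         # the whole 2nd matrix (values 7 to 12)
slice_2
```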

3.4.4 Lists

Lists are R objects that can contain multiple components of different data types, data structures, and dimensions.

names

 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"

grades_1

 [1] 75 76 77 78 79 80 81 82 83 84

list.1 <- list(name = names, grade = grades_1, random_num = 9)      # create a list with 3 components
list.1

$name
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"   

$grade
 [1] 75 76 77 78 79 80 81 82 83 84

$random_num
[1] 9

names(list.1)

[1] "name"       "grade"      "random_num"

To select from a list, you can keep using index numbers with square brackets [ ], or you can use the extractor operator $.

list.1[1]     # create a new list with only the 1st component from list.1

$name
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"

list.1[[1]]   # extract all elements from the 1st component of list.1 (i.e. the name vector)

 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"

list.1$name   # extract all the elements from the component named "name" from list.1

 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"

list.1[[1]][1]

[1] "Marshall"

list.1$name[1]

[1] "Marshall"
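The difference between [ ] and [[ ]] is easiest to see by checking the class of the result. The sketch below uses a shortened version of the list above so that it runs on its own; the component passed is made up for illustration.

```r
list.1 <- list(name = c("Marshall", "Ruby"), grade = c(75, 76), random_num = 9)

length(list.1)            # number of components: 3

class(list.1[1])          # single brackets return a list
class(list.1[[1]])        # double brackets return the component itself

list.1[["grade"]]         # components can also be extracted by name

# Add a new component with $
list.1$passed <- list.1$grade >= 76
names(list.1)

str(list.1)               # compact overview of the list's structure
```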

3.4.5 Data Frames

A data frame is a list whose elements are equal-length vectors. Basically, a data frame is a more restricted version of a list, or a more flexible version of a matrix: the vectors (variables) can be of different types, but they must all have the same length.

df1 <- data.frame(Name = names[1:10], Grade = grades_1)
df1

Name	Grade
Marshall	75
Ruby	76
Peppa	77
George	78
Suzy	79
Danny	80
Pedro	81
Rebecca	82
Rubble	83
Ryder	84

df1$Name

 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"

df1$Grade[1:5]

[1] 75 76 77 78 79

df1$Grade[df1$Grade >= 80]

[1] 80 81 82 83 84

df1$Name[df1$Grade >= 80]

[1] "Danny"   "Pedro"   "Rebecca" "Rubble"  "Ryder"

df1[df1$Grade >= 80, ]

	Name	Grade
6	Danny	80
7	Pedro	81
8	Rebecca	82
9	Rubble	83
10	Ryder	84

mean(df1$Grade)

[1] 79.5
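A few other common data frame operations are sketched below. The data frame is rebuilt here so that the code runs on its own; the vector student and the column Above80 are made-up names for illustration.

```r
student <- c("Marshall", "Ruby", "Peppa", "George", "Suzy",
             "Danny", "Pedro", "Rebecca", "Rubble", "Ryder")
df1 <- data.frame(Name = student,
                  Grade = c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84))

nrow(df1)                              # number of observations: 10
ncol(df1)                              # number of variables: 2
str(df1)                               # type of each variable

# Add a new logical variable with $
df1$Above80 <- df1$Grade >= 80

# Select rows by condition and one column by name
top_names <- df1[df1$Grade >= 80, "Name"]
top_names

summary(df1$Grade)                     # minimum, quartiles, mean, and maximum
```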

4 R Packages

To extend the capabilities of R, various packages have been developed to handle different tasks, such as data manipulation, analysis, and visualization.

4.1 Base Packages

The base packages, which provide basic functions and datasets, are included with the R installation. One example is the package datasets, which provides a collection of datasets.

Try data(), which lists the datasets available in the currently loaded packages, including this one.
To load one of the listed datasets, e.g. mtcars, type data(mtcars). This dataset provides statistics from road tests on multiple models of cars.
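For instance, a quick first look at mtcars could go as follows. This sketch uses only functions that come with R.

```r
data(mtcars)          # load the dataset into the Environment

head(mtcars, 3)       # first 3 rows
car_dim <- dim(mtcars)
car_dim               # 32 rows and 11 columns

names(mtcars)         # variable names
mean(mtcars$mpg)      # average miles per gallon across the cars
```

Type ?mtcars to read the description of each variable.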

4.2 Contributed Packages

Many other contributed packages, designed to implement specific operations, are not included with R by default. To utilize these packages, we need to install and load them individually.

To install a package, type install.packages("tidyverse")
To load the installed package, type library(tidyverse)

Warning

When installing, the package name must be in quotes " ". When loading with library(), the quotes are optional.
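A package only needs to be installed once, but it must be loaded in every new R session. One possible way to avoid reinstalling is sketched below. The helper install_if_missing() is hypothetical (it is not part of R) and relies on requireNamespace(), which checks whether a package is available without attaching it. The demonstration uses the base package stats so that nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not available yet.
# Returns TRUE if an installation was attempted, FALSE otherwise.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)               # already installed, nothing to do
  }
  install.packages(pkg)         # the package name is passed as a character string
  TRUE
}

needed <- install_if_missing("stats")   # stats ships with R, so this is FALSE
needed

# For this lab you would call:
# install_if_missing("tidyverse")
# library(tidyverse)
```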

After loading the package, we can use the functions in tidyverse. tidyverse is a collection of packages, such as dplyr and tidyr for data manipulation, ggplot2 for data visualization, and lubridate for working with dates and times.

As an example, we use select() from dplyr. Type ?select first to see the R Documentation for this function.

data(mpg)     # the dataset mpg ships with ggplot2, which is part of tidyverse

head(mpg)     # head() displays the first 6 observations of a dataset

manufacturer	model	displ	year	cyl	trans	drv	cty	hwy	fl	class
audi	a4	1.8	1999	4	auto(l5)	f	18	29	p	compact
audi	a4	1.8	1999	4	manual(m5)	f	21	29	p	compact
audi	a4	2.0	2008	4	manual(m6)	f	20	31	p	compact
audi	a4	2.0	2008	4	auto(av)	f	21	30	p	compact
audi	a4	2.8	1999	6	auto(l5)	f	16	26	p	compact
audi	a4	2.8	1999	6	manual(m5)	f	18	26	p	compact

head(select(mpg, c(manufacturer, model, year)))      # select() selects named variables from a data frame

manufacturer	model	year
audi	a4	1999
audi	a4	1999
audi	a4	2008
audi	a4	2008
audi	a4	1999
audi	a4	1999

head(filter(mpg, year >= 2000))     # filter() subsets a data frame based on defined conditions

manufacturer	model	displ	year	cyl	trans	drv	cty	hwy	fl	class
audi	a4	2.0	2008	4	manual(m6)	f	20	31	p	compact
audi	a4	2.0	2008	4	auto(av)	f	21	30	p	compact
audi	a4	3.1	2008	6	auto(av)	f	18	27	p	compact
audi	a4 quattro	2.0	2008	4	manual(m6)	4	20	28	p	compact
audi	a4 quattro	2.0	2008	4	auto(s6)	4	19	27	p	compact
audi	a4 quattro	3.1	2008	6	auto(s6)	4	17	25	p	compact

See how packages simplify the code and make it more intuitive. More details about tidyverse will be discussed in Lab 2.
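For comparison, here is a sketch of how similar selections look with base R indexing only. It uses the built-in mtcars dataset rather than mpg so that it runs without tidyverse; the dplyr calls in the comments are the assumed counterparts, and the object names are only for illustration.

```r
data(mtcars)

# Base R counterpart of select(mtcars, mpg, cyl, hp)
picked_cols <- mtcars[, c("mpg", "cyl", "hp")]
head(picked_cols)

# Base R counterpart of filter(mtcars, cyl >= 6)
picked_rows <- mtcars[mtcars$cyl >= 6, ]
head(picked_rows)
```

With base R, the data frame name has to be repeated inside the brackets, which is part of what select() and filter() save you from typing.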

5 R Script

An R script is simply a text file containing a set of commands and comments, which allows you to save, edit, and execute your code. To create an R script, in the top toolbar, select File > New File > R Script.

Opening an R script adds a new pane to RStudio, the Source pane. In this pane, you can edit and save your code. To run your code, hold Ctrl and press Enter (Cmd and Return on a Mac), or click the Run button. This runs the current line of code, or all of the lines you have selected.

Note

Everything entered in an R script is interpreted as code to be run by R, except text that follows a # (Shift + 3) on the same line. Using # is useful for documenting and explaining your code, or for temporarily disabling sections of code.
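A small script might look like the sketch below. The contents are made up for illustration; the point is the three ways # is used.

```r
# Lab 1 practice script: this whole line is a comment

grades <- c(75, 80, 84)     # a comment can also follow code on the same line

# print(grades * 2)         # this line is temporarily disabled

avg <- mean(grades)         # average of the three grades
avg
```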

6 Useful Resources

This section lists some useful resources for your exploration of R.

6.1 Base R Cheat Sheet

Click here for more information

6.2 An Introduction to R

An Introduction to R, by Venables, W. N., Smith, D. M., & R Development Core Team

7 Exercise

Load the dataset “mpg” and work through the exercises below. Note, “mpg” is included in the tidyverse package, so you will need to load the package first.

1. Calculate the mean, range, minimum, and maximum of the variable “hwy” across all models. Then, combine these statistics into one vector. (Tip: look up the RDocumentation for the functions mean, range, min, and max.)
2. Since “hwy” is measured in miles per gallon, create a new variable in mpg that expresses “hwy” in litres per 100 km.
3. Identify the models of cars that are most fuel-efficient. Which classes of cars are least fuel-efficient?
4. Compute the quantiles of “hwy”. Can you calculate the tertiles instead? (Tip: look up the RDocumentation for the function quantile.)
5. Now, based on the tertiles you calculated, assign “least efficient”, “medium”, and “most efficient” labels to all models. Try using both base R indexing and the function ifelse.