Lab 1: Introduction to R
1 Download and Install
R
R is a free software environment for statistical computing and graphics. To download, go to CRAN and select the installer for your operating system (Windows, Mac, or Linux).
RStudio
RStudio is a user-friendly application that helps you write in R and enhances your programming experience. To download, visit this website and select the installer for your operating system.
After R and RStudio are installed, we will only need to use RStudio for this and future labs. The default RStudio layout has three panes: Console, Environment, and Output.
You can customize your RStudio working environment via Tools > Global Options in the top menu bar.
2 Working Directory
The working directory is a folder path on your computer that sets the default location for files you read into R or save out of R. Think of it as a little “flag” on your computer tied to your project.
To find your current working directory, type the code below in your Console pane and press Enter.
getwd()
[1] "D:/University of Alberta/0. Teaching/AREC513/Fall2025/labs"
To change your current working directory, you can
- Use the code below
setwd("C:/Program Files")
Tip: In R, file paths are separated by forward slashes /, not backslashes \. A Windows path copied from File Explorer uses backslashes, so change them to / (or double them as \\). The full path must be wrapped in quotation marks " ".
- Use the top menu bar: Session > Set Working Directory > Choose Directory
To set a default working directory, go to Tools > Global Options > General > Default working directory
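The sketch below changes the working directory from code and then restores it. It uses tempdir(), R's per-session temporary folder, only so that it runs on any computer. Substitute your own project folder. It also shows file.path(), a base R function that joins folder names with forward slashes, so you do not have to type the separators yourself. The folder names passed to file.path() are illustrative.

```r
old_wd <- getwd()          # remember the current working directory
setwd(tempdir())           # switch to R's temporary folder (exists on every system)
getwd()                    # confirm the change
setwd(old_wd)              # switch back to the original folder

# file.path() joins folder names with "/" on every operating system
file.path("D:", "University of Alberta", "labs")   # "D:/University of Alberta/labs"
```

Saving the old directory in an object first means you can always return to it, even after several setwd() calls.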
3 R Basics
3.1 Calculations
1 + 1
[1] 2
5 * 6
[1] 30
2 ^ 4
[1] 16
(8 + 5 * 6 / 3) / 2
[1] 9
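The parentheses in the last example matter because R follows the usual order of operations. It evaluates ^ first, then * and /, then + and -, and works left to right within the same level. A quick check:

```r
8 + 5 * 6 / 3 / 2     # 13: 5 * 6 / 3 / 2 = 5 is computed before the addition
(8 + 5 * 6 / 3) / 2   # 9: the parentheses force the sum to be computed first
2 * 3 ^ 2             # 18: the power is computed before the multiplication
-2 ^ 2                # -4: ^ binds more tightly than the minus sign
(-2) ^ 2              # 4
```

When in doubt, add parentheses. They cost nothing and make the intended order explicit.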
3.2 Creating Objects
a <- 5 * 6
a
[1] 30
b <- 8 + a / 3
b
[1] 18
c <- "AREC 513"
c
[1] "AREC 513"
In RStudio, you can use the shortcut Alt + - (hyphen) to write the assignment operator <- (Option + - on macOS).
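You can reuse objects in later calculations and overwrite them at any time. R keeps only the most recent value. In the short sketch below, the object name total is a hypothetical example.

```r
a <- 5 * 6
total <- a + 10                  # use an existing object in a new calculation
total                            # 40
total <- total * 2               # overwrite total with a new value
total                            # 80
rm(total)                        # remove the object from the Environment
exists("total", inherits = FALSE) # FALSE: total no longer exists
```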
3.3 Data Types
There are four main data types in R we will use in the labs: numeric, integer, logical, and character.
Numeric data stores numbers, including decimals (also known as quantitative data).
is.numeric(b) # test if the object "b" is numeric
[1] TRUE
Integer data stores whole numbers without decimals.
d <- 8L # to create an integer, add "L" after the number
is.integer(d)
[1] TRUE
Logical data stores only TRUE or FALSE.
e <- TRUE
is.logical(e)
[1] TRUE
Character data stores text strings.
is.character(c)
[1] TRUE
You can also use class() to check the class of an object. For example, try class(a).
To convert your objects to a specific type, use as.numeric(), as.integer(), as.logical(), or as.character().
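The sketch below recreates the objects from this section and checks their classes. It then tries each conversion function, including one conversion that fails on purpose.

```r
a <- 5 * 6
d <- 8L
e <- TRUE
c <- "AREC 513"

class(a)            # "numeric"
class(d)            # "integer"
class(e)            # "logical"
class(c)            # "character"

as.integer(3.9)     # 3: the decimal part is dropped, not rounded
as.numeric("3.14")  # 3.14: text that looks like a number converts cleanly
as.character(18)    # "18"
as.logical(0)       # FALSE: 0 becomes FALSE, any other number becomes TRUE
as.numeric("AREC")  # NA, with a warning: this text cannot be read as a number
```

An NA after a conversion is a signal to check your data. It usually means some values were not in the format you expected.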
3.4 Data Structures
When analyzing data, you rarely deal with objects that store one single value or datasets with one single variable. This section discusses some common data structures.
3.4.1 Vectors
3.4.1.1 Create a Vector
Vectors play a crucial role in R because R is a vectorized language. A vector is a collection of values (elements), all of the same type. We can create a vector with the function c(), which stands for “combine”.
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
grades_1
[1] 75 76 77 78 79 80 81 82 83 84
grades_2 <- 75:84 # to generate consecutive whole numbers from 75 to 84, you can use ":" directly
grades_2
[1] 75 76 77 78 79 80 81 82 83 84
names <- c("Marshall", "Ruby", "Peppa", "George", "Suzy", "Danny",
           "Pedro", "Rebecca", "Rubble", "Ryder", "Max", "Chase")
names
[1] "Marshall" "Ruby" "Peppa" "George" "Suzy" "Danny"
[7] "Pedro" "Rebecca" "Rubble" "Ryder" "Max" "Chase"
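Every element of a vector must share one type. When you mix types, c() silently converts all inputs to the most flexible type present, in the order logical → integer → numeric → character. The sketch below checks this. It also shows length() and seq(), two base functions for inspecting and building vectors.

```r
mixed_1 <- c(1, TRUE, FALSE)   # logical values become numbers: 1 1 0
class(mixed_1)                 # "numeric"
mixed_2 <- c(1, TRUE, "a")     # everything becomes text: "1" "TRUE" "a"
class(mixed_2)                 # "character"

grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
length(grades_1)               # 10: number of elements
seq(75, 84, by = 3)            # 75 78 81 84: a sequence with a step other than 1
```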
3.4.1.2 Vector Operations
grades_1 + 5 # operations apply to each element in the vector
[1] 80 81 82 83 84 85 86 87 88 89
grades_1 * 2
[1] 150 152 154 156 158 160 162 164 166 168
sqrt(grades_1)
[1] 8.660254 8.717798 8.774964 8.831761 8.888194 8.944272 9.000000 9.055385
[9] 9.110434 9.165151
grades_1 >= 78
[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
grades_1 == 78
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
When testing for equality, you must use a double equals sign ==. A single = assigns a value instead of comparing.
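Arithmetic also works element by element between two vectors of the same length. Because TRUE counts as 1 and FALSE as 0, sum() of a logical vector gives a count and mean() gives a share. For example:

```r
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
grades_2 <- 75:84

grades_1 + grades_2     # 150 152 ... 168: 1st + 1st, 2nd + 2nd, and so on
sum(grades_1 >= 78)     # 7: how many grades are 78 or above
mean(grades_1 >= 78)    # 0.7: the share of grades that are 78 or above
```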
3.4.1.3 Factor Vectors
Factor variables are important when building statistical models. There are two types of factor variables:
- nominal: categorical variable with no inherent order among the categories
jbp <- c("poor", "excellent", "fair", "poor", "good") # create a nominal job performance vector
jbp
[1] "poor" "excellent" "fair" "poor" "good"
class(jbp)
[1] "character"
jbp.1 <- as.factor(jbp) # convert the character vector to a factor
jbp.1
[1] poor excellent fair poor good
Levels: excellent fair good poor
class(jbp.1)
[1] "factor"
- ordinal: categorical variable with a defined ranking among the categories
# set the order of the levels explicitly, from lowest to highest
jbp.2 <- factor(jbp, levels = c("poor", "fair", "good", "excellent"))
jbp.2
[1] poor excellent fair poor good
Levels: poor fair good excellent
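The levels argument sets the order of the levels. However, jbp.2 is still an unordered factor, which is why its levels print without <. R will therefore not compare its values by rank. To make R treat the variable as truly ordinal, add ordered = TRUE. The object name jbp.3 below is just an illustration.

```r
jbp <- c("poor", "excellent", "fair", "poor", "good")
jbp.3 <- factor(jbp, levels = c("poor", "fair", "good", "excellent"),
                ordered = TRUE)
jbp.3              # Levels: poor < fair < good < excellent
is.ordered(jbp.3)  # TRUE
jbp.3 >= "good"    # FALSE TRUE FALSE FALSE TRUE: comparisons follow the ranking
table(jbp.3)       # counts per level: poor 2, fair 1, good 1, excellent 1
```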
3.4.1.4 Subset Vectors and Select Elements
To subset a vector or select certain elements from a vector, we use square brackets [ ].
By Position
grades_1[1] # select the 1st element
[1] 75
grades_1[-2] # exclude the 2nd element
[1] 75 77 78 79 80 81 82 83 84
grades_1[3:5] # select elements 3 to 5
[1] 77 78 79
grades_1[c(1, 3, 5)] # select the 1st, 3rd, and 5th elements
[1] 75 77 79
By Condition
grades_1[grades_1 >= 78] # select grades that are equal to or above 78
[1] 78 79 80 81 82 83 84
grades_1[grades_1 != 78] # select grades that are not equal to 78
[1] 75 76 77 79 80 81 82 83 84
jbp.2[jbp.2 %in% c("poor", "fair")] # select job performance that is "poor" or "fair"
[1] poor fair poor
Levels: poor fair good excellent
Tip: To apply multiple conditions, use the operators & (Shift + 7) for AND and | (Shift + \) for OR. Try grades_1[grades_1 >= 78 & grades_1 != 80].
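The sketch below combines conditions with & and | and uses which() to return positions rather than values. Its last line applies a condition on grades_1 to the first 10 names. This works because the two vectors line up element by element.

```r
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
names <- c("Marshall", "Ruby", "Peppa", "George", "Suzy", "Danny",
           "Pedro", "Rebecca", "Rubble", "Ryder", "Max", "Chase")

grades_1[grades_1 >= 78 & grades_1 != 80]  # 78 79 81 82 83 84: both conditions TRUE
grades_1[grades_1 < 77 | grades_1 > 82]    # 75 76 83 84: at least one condition TRUE
which(grades_1 >= 82)                      # 8 9 10: positions, not values
names[1:10][grades_1 >= 82]                # "Rebecca" "Rubble" "Ryder"
```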
3.4.2 Matrices
A matrix is a collection of data elements arranged in a two-dimensional rectangular layout of rows and columns. All elements of a matrix must be of the same data type. Matrices are commonly used in mathematics and statistics, but a matrix in R is a broader concept than a mathematical matrix. Besides numbers, it can hold other types such as text, as ma.2 below shows.
ma.1 <- matrix(1:15, nrow = 5, ncol = 3)
ma.1
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 13
[4,] 4 9 14
[5,] 5 10 15
ma.2 <- matrix(names, nrow = 6, ncol = 2)
ma.2
[,1] [,2]
[1,] "Marshall" "Pedro"
[2,] "Ruby" "Rebecca"
[3,] "Peppa" "Rubble"
[4,] "George" "Ryder"
[5,] "Suzy" "Max"
[6,] "Danny" "Chase"
Since matrices are two-dimensional, to subset or select elements from a matrix using [ ], you need to specify the row and/or column index as [ROW, COL].
ma.1[2, ] # select all elements in the 2nd row of ma.1
[1] 2 7 12
ma.1[, 2] # select all elements in the 2nd column of ma.1
[1] 6 7 8 9 10
ma.1[2, 2] # select the element in row 2 and column 2 of ma.1
[1] 7
ma.2[3, 2] # select the element in row 3 and column 2 of ma.2
[1] "Rubble"
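By default, matrix() fills values column by column, which is why 1:15 runs down the columns of ma.1. The sketch below shows byrow = TRUE, dim(), and a few matrix operations. The small matrix m is a made-up example. Note that * works element by element, while %*% is the matrix product from linear algebra.

```r
ma.1 <- matrix(1:15, nrow = 5, ncol = 3)
dim(ma.1)                  # 5 3: number of rows and columns

m <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row by row instead
m                          # row 1: 1 2 3; row 2: 4 5 6
t(m)                       # transpose: a 3 x 2 matrix
m * 2                      # element-wise: every entry doubled
m %*% t(m)                 # matrix product (2 x 3 times 3 x 2 = 2 x 2): 14 32 / 32 77
```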
3.4.3 Arrays
In R, arrays are data objects that can store data in more than two dimensions. As with matrices, all elements must be of the same type.
array.1 <- array(1:24, dim = c(2, 3, 4)) # 2 rows * 3 columns * 4 matrices
array.1
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
, , 3
[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18
, , 4
[,1] [,2] [,3]
[1,] 19 21 23
[2,] 20 22 24
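Selecting from an array extends the [ROW, COL] pattern with one index per extra dimension. For array.1 this is [ROW, COL, MATRIX]. A short sketch:

```r
array.1 <- array(1:24, dim = c(2, 3, 4))
dim(array.1)        # 2 3 4
array.1[1, 2, 3]    # 15: row 1, column 2 of the 3rd matrix
array.1[, , 2]      # the whole 2nd matrix: rows 7 9 11 and 8 10 12
array.1[2, 3, ]     # 6 12 18 24: row 2, column 3 across all four matrices
```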
3.4.4 Lists
Lists are R objects that can contain multiple components of different data types, data structures, and dimensions.
names
[1] "Marshall" "Ruby" "Peppa" "George" "Suzy" "Danny"
[7] "Pedro" "Rebecca" "Rubble" "Ryder" "Max" "Chase"
grades_1
[1] 75 76 77 78 79 80 81 82 83 84
list.1 <- list(name = names, grade = grades_1, random_num = 9) # create a list with 3 components
list.1
$name
[1] "Marshall" "Ruby" "Peppa" "George" "Suzy" "Danny"
[7] "Pedro" "Rebecca" "Rubble" "Ryder" "Max" "Chase"
$grade
[1] 75 76 77 78 79 80 81 82 83 84
$random_num
[1] 9
names(list.1)
[1] "name" "grade" "random_num"
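To select from a list, you can keep using index numbers with square brackets. Single brackets [ ] return a smaller list, while double brackets [[ ]] return the component itself. Alternatively, you can use the extractor operator $ with the component's name.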
list.1[1] # create a new list with only the 1st component from list.1
$name
[1] "Marshall" "Ruby" "Peppa" "George" "Suzy" "Danny"
[7] "Pedro" "Rebecca" "Rubble" "Ryder" "Max" "Chase"
list.1[[1]] # extract all elements from the 1st component of list.1 (i.e. the name vector)
[1] "Marshall" "Ruby" "Peppa" "George" "Suzy" "Danny"
[7] "Pedro" "Rebecca" "Rubble" "Ryder" "Max" "Chase"
list.1$name # extract all the elements from the component named "name" from list.1
[1] "Marshall" "Ruby" "Peppa" "George" "Suzy" "Danny"
[7] "Pedro" "Rebecca" "Rubble" "Ryder" "Max" "Chase"
list.1[[1]][1]
[1] "Marshall"
list.1$name[1]
[1] "Marshall"
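The sketch below confirms the difference between [ ] and [[ ]] with class(). It then adds a new component to the list. The component name passed is a hypothetical example.

```r
names <- c("Marshall", "Ruby", "Peppa", "George", "Suzy", "Danny",
           "Pedro", "Rebecca", "Rubble", "Ryder", "Max", "Chase")
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
list.1 <- list(name = names, grade = grades_1, random_num = 9)

class(list.1["grade"])     # "list": single brackets return a smaller list
class(list.1[["grade"]])   # "numeric": double brackets return the component itself
list.1$passed <- list.1$grade >= 80   # add a new (logical) component
length(list.1)             # 4 components now
names(list.1)              # "name" "grade" "random_num" "passed"
```

Even though an object called names exists here, names(list.1) still calls the names() function. When a name is followed by parentheses, R looks only for functions.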
3.4.5 Data Frames
A data frame is a list whose elements are equal-length vectors. Each vector is a column (variable), and different columns can hold different data types. You can think of a data frame as a restricted list, because all components must have the same length. You can also think of it as a flexible matrix, because columns may differ in type.
df1 <- data.frame(Name = names[1:10], Grade = grades_1)
df1
Name | Grade |
---|---|
Marshall | 75 |
Ruby | 76 |
Peppa | 77 |
George | 78 |
Suzy | 79 |
Danny | 80 |
Pedro | 81 |
Rebecca | 82 |
Rubble | 83 |
Ryder | 84 |
df1$Name
[1] "Marshall" "Ruby" "Peppa" "George" "Suzy" "Danny"
[7] "Pedro" "Rebecca" "Rubble" "Ryder"
df1$Grade[1:5]
[1] 75 76 77 78 79
df1$Grade[df1$Grade >= 80]
[1] 80 81 82 83 84
df1$Name[df1$Grade >= 80]
[1] "Danny" "Pedro" "Rebecca" "Rubble" "Ryder"
df1[df1$Grade >= 80, ]
Row | Name | Grade |
---|---|---|
6 | Danny | 80 |
7 | Pedro | 81 |
8 | Rebecca | 82 |
9 | Rubble | 83 |
10 | Ryder | 84 |
mean(df1$Grade)
[1] 79.5
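The sketch below shows a few more everyday data frame operations. It counts rows and columns, adds a column, and selects a cell by a condition and a column name. The column name Passed is a hypothetical example.

```r
names <- c("Marshall", "Ruby", "Peppa", "George", "Suzy", "Danny",
           "Pedro", "Rebecca", "Rubble", "Ryder", "Max", "Chase")
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
df1 <- data.frame(Name = names[1:10], Grade = grades_1)

nrow(df1)                          # 10 rows (observations)
ncol(df1)                          # 2 columns (variables)
df1$Passed <- df1$Grade >= 80      # add a new logical column
df1[df1$Name == "Suzy", "Grade"]   # 79: row chosen by condition, column by name
sum(df1$Passed)                    # 5 students with a grade of 80 or above
summary(df1$Grade)                 # minimum, quartiles, mean, and maximum of Grade
```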
4 R Packages
Packages extend the capabilities of R. Different packages are developed to handle different tasks, such as data manipulation, analysis, and visualization.
4.1 Base Packages
The base packages, which provide basic functions and datasets, come pre-installed with R. One example is the datasets package, which provides a collection of example datasets.
Try data(), which lists the datasets available in the currently attached packages, including datasets. To load one of the listed datasets, e.g. mtcars, type data(mtcars). This dataset provides road-test statistics for 32 car models.
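After loading a dataset, it is good practice to check its size and look at the first few rows before analyzing it. A minimal sketch with base functions:

```r
data(mtcars)      # load the dataset into the Environment
dim(mtcars)       # 32 11: 32 car models (rows) and 11 variables (columns)
head(mtcars, 3)   # first 3 rows only
mean(mtcars$mpg)  # average miles per gallon across the 32 models (about 20.09)
```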
4.2 Contributed Packages
Many other contributed packages, designed to implement specific operations, are not included with R by default. To utilize these packages, we need to install and load them individually.
To install a package, type
install.packages("tidyverse")
To load the installed package, type
library(tidyverse)
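When installing, the package name must be in quotes " ". When loading, the quotes are not necessary. You only need to install a package once, but you must load it with library() in every new R session.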
After loading the package, we can use the functions in tidyverse. tidyverse is a collection of packages, such as dplyr and tidyr for data manipulation, ggplot2 for data visualization, and lubridate for working with dates and times.
As an example, take select() from dplyr. First type ?select to see the R Documentation for this function.
data(mpg) # the dataset mpg also comes with tidyverse (it is part of ggplot2)
head(mpg) # head() displays the first 6 observations of a dataset
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|
audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | compact |
head(select(mpg, c(manufacturer, model, year))) # select() selects named variables from a data frame
manufacturer | model | year |
---|---|---|
audi | a4 | 1999 |
audi | a4 | 1999 |
audi | a4 | 2008 |
audi | a4 | 2008 |
audi | a4 | 1999 |
audi | a4 | 1999 |
head(filter(mpg, year >= 2000)) # filter() subsets a data frame based on defined conditions
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
audi | a4 | 3.1 | 2008 | 6 | auto(av) | f | 18 | 27 | p | compact |
audi | a4 quattro | 2.0 | 2008 | 4 | manual(m6) | 4 | 20 | 28 | p | compact |
audi | a4 quattro | 2.0 | 2008 | 4 | auto(s6) | 4 | 19 | 27 | p | compact |
audi | a4 quattro | 3.1 | 2008 | 6 | auto(s6) | 4 | 17 | 25 | p | compact |
See how packages simplify the code and make it more intuitive. More details about tidyverse will be discussed in Lab 2.
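To see what dplyr saves you, here is a sketch of the same two operations in base R, using the square-bracket indexing from Section 3. It assumes tidyverse is installed. Only ggplot2, which supplies mpg, and dplyr are actually used.

```r
library(tidyverse)   # attaches dplyr (select, filter) and ggplot2 (mpg)

# Base R: choose columns by name inside [ , ]
head(mpg[, c("manufacturer", "model", "year")])

# Base R: choose rows by a condition, keeping all columns
head(mpg[mpg$year >= 2000, ])

# dplyr versions for comparison: no quotes and no repeated mpg$
head(select(mpg, c(manufacturer, model, year)))
head(filter(mpg, year >= 2000))
```

Both approaches return the same rows and columns. The dplyr versions read more like plain English, which becomes more valuable as conditions get longer.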
5 R Script
An R script is simply a text file containing a set of commands and comments, which lets you save, edit, and re-run your code. To create an R script, select File > New File > R Script in the top menu bar.
Opening an R script adds a new pane to RStudio, the Source pane, where you can edit and save your code. To run your code, hold Ctrl and press Enter (Cmd + Enter on macOS), or click the Run button. This runs the current line of code, or all the lines you have selected.
R interprets every line in a script as code, except text that follows a # (Shift + 3). That text is treated as a comment and ignored. A # can start a whole line or follow code on the same line. Using # is useful for documenting and explaining your code, or for temporarily disabling sections of code.
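A short example of how comments look in a script. The file contents and the object name grades are hypothetical.

```r
# Lab 1 practice script
grades <- c(75, 80, 85)   # everything after # on a line is ignored by R
mean(grades)              # 80
# mean(grades * 2)        # this whole line is "disabled" and will not run
```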
6 Useful Resources
This section lists some useful resources for your exploration of R.
6.1 Base R Cheat Sheet
A compact reference sheet of commonly used base R functions.
6.2 An Introduction to R
An Introduction to R, by Venables, W. N., Smith, D. M., & R Development Core Team
7 Exercise
Load the dataset “mpg” and work through the exercises below. Note that “mpg” is included in the tidyverse package, so you will need to load the package first.
Calculate the mean, range, minimum, and maximum of the variable “hwy” across all models. Then, combine these statistics into one vector. (Tip: look up the RDocumentation for the functions mean, range, min, and max.)
Since “hwy” is measured in miles per gallon, create a new variable in mpg that expresses “hwy” in litres per 100 km.
Identify the models of cars that are most fuel-efficient. Which classes of cars are least fuel-efficient?
Compute the quantiles of “hwy”. Can you also calculate the tertiles instead? (Tip: look up the RDocumentation for the function quantile.)
Now, based on the tertiles you calculated, assign “least efficient”, “medium”, and “most efficient” labels to all models. Try using both base R indexing and the function ifelse.
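For the conversion to litres per 100 km in the exercises above, the formula is sketched below. It assumes “hwy” is in US miles per gallon (the mpg data come from the US EPA). It uses the exact definitions 1 US gallon = 3.785411784 litres and 1 mile = 1.609344 km.

```latex
\text{L/100 km} = \frac{100 \times 3.785411784}{1.609344 \times \text{hwy}} \approx \frac{235.215}{\text{hwy}}
```

For example, hwy = 29 gives about 235.215 / 29 ≈ 8.11 L/100 km. The scale is inverted: a smaller L/100 km value means a more fuel-efficient car.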