R Labs for AREC 513 - Econometric Applications

Section

  • 1 Download and Install
  • 2 Working Directory
  • 3 R Basics
    • 3.1 Calculations
    • 3.2 Creating Objects
    • 3.3 Data Types
    • 3.4 Data Structures
      • 3.4.1 Vectors
      • 3.4.2 Matrices
      • 3.4.3 Arrays
      • 3.4.4 Lists
      • 3.4.5 Data Frames
  • 4 R Packages
    • 4.1 Base Packages
    • 4.2 Contributed Packages
  • 5 R Script
  • 6 Useful Resources
    • 6.1 Base R Cheat Sheet
    • 6.2 An Introduction to R
  • 7 Exercise

Lab 1: Introduction to R

Author

Feng Qiu, Liyuan Xuan

Published

September 15, 2025

1 Download and Install

  • R
    R is a free software environment for statistical computing and graphics. To download, go to CRAN and select the installer for your operating system (Windows, Mac, or Linux).

  • RStudio
    RStudio is a user-friendly application that helps you write in R and enhances your programming experience. To download, visit this website and select the installer for your operating system.

After R and RStudio are installed, we will only need to use RStudio for this and future labs. The default RStudio layout has three panes: Console, Environment, and Output.

screenshot of RStudio

You can customize your RStudio working environment via Tools > Global Options in the top menu bar.

2 Working Directory

The working directory is a folder path on your computer that sets the default location for files you read into R or save out of R. Think of it as a little “flag” on your computer tied to your project.

  • To find your current working directory, type the code below in your Console pane and press Enter.

    getwd()
    [1] "D:/University of Alberta/0. Teaching/AREC513/Fall2025/labs"
  • To change your current working directory, you can

    1. Use the code below
    setwd("C:/Program Files")  
    Tip

    In R, separate the folders in a file path with a forward slash /, not a single backslash \. The backslash is R's escape character, so it only works when doubled (\\). The full path must be wrapped in quotation marks " ".

    2. Use the top menu bar: Session > Set Working Directory > Choose Directory

    3. To set a default working directory for every session, go to Tools > Global Options > General > Default working directory
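
The steps above can be sketched in code. This is a minimal sketch that assumes a throwaway folder from tempdir() stands in for your project folder; the names old_wd, lab_dir, and test.txt are illustrative only.

```r
# Remember where we started so we can switch back afterwards
old_wd <- getwd()

# tempdir() is a stand-in for a real project folder on your computer
lab_dir <- tempdir()
setwd(lab_dir)

# Files are now written to, and read from, lab_dir by default
writeLines("hello", "test.txt")
made <- file.exists("test.txt")
made

# file.path() joins folder names with forward slashes
example_path <- file.path("labs", "lab1", "data.csv")
example_path

# Clean up and return to the original working directory
unlink("test.txt")
setwd(old_wd)
```

Saving the old working directory first makes it easy to undo the change once you are done.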

3 R Basics

3.1 Calculations

1 + 1
[1] 2
5 * 6
[1] 30
2 ^ 4
[1] 16
(8 + 5 * 6 / 3) / 2
[1] 9

3.2 Creating Objects

a <- 5 * 6
a
[1] 30
b <- 8 + a / 3
b
[1] 18
c <- "AREC 513"
c
[1] "AREC 513"
Tip

In RStudio, you can use the shortcut Alt + - (hyphen) to write the assignment operator <-.

3.3 Data Types

There are four main data types in R we will use in the labs: numeric, integer, logical, and character.

  • Numeric data, also known as quantitative data

    is.numeric(b)     # test if the object "b" is numeric
    [1] TRUE
  • Integer data, stores whole numbers without decimals

    d <- 8L     # to create an integer, add "L" after the number
    is.integer(d)
    [1] TRUE
  • Logical data, stores only TRUE or FALSE

    e <- TRUE
    is.logical(e)
    [1] TRUE
  • Character data, stores text strings

    is.character(c)
    [1] TRUE
Tip

You can also use class() to check the class of an object. For example, try class(a).

To convert your objects to a specific type, use as.numeric(), as.integer(), as.logical(), or as.character().
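
A short sketch of the conversion functions in action. The object names (x, x_num, and so on) are illustrative and not part of the lab.

```r
x <- "3.7"                    # a number stored as text
class(x)                      # "character"

x_num <- as.numeric(x)        # 3.7
x_int <- as.integer(x_num)    # 3: the decimal part is dropped, not rounded
x_chr <- as.character(513)    # "513"
x_lgl <- as.logical("TRUE")   # TRUE

# Text that is not a number cannot be converted: the result is NA (with a warning)
bad <- suppressWarnings(as.numeric("AREC"))
is.na(bad)                    # TRUE
```

Checking class() before and after a conversion is a quick way to confirm that it did what you expected.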

3.4 Data Structures

When analyzing data, you rarely deal with objects that store one single value or datasets with one single variable. This section discusses some common data structures.

3.4.1 Vectors

3.4.1.1 Create a Vector

Vectors play a crucial role in R because R is a vectorized language: most operations work on a whole vector at once. A vector is a collection of values (elements) that are all of the same type. We can create a vector with the function c(), which stands for “combine”.

grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)
grades_1
 [1] 75 76 77 78 79 80 81 82 83 84
grades_2 <- c(75:84) # to generate consecutive whole numbers from 75 to 84, you can use ":" directly.
grades_2
 [1] 75 76 77 78 79 80 81 82 83 84
names <- c("Marshall", "Ruby", "Peppa", "George", "Suzy", "Danny", 
           "Pedro", "Rebecca", "Rubble", "Ryder", "Max", "Chase")
names
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"   

3.4.1.2 Vector Operations

grades_1 + 5 # operations apply to each element in the vector
 [1] 80 81 82 83 84 85 86 87 88 89
grades_1 * 2
 [1] 150 152 154 156 158 160 162 164 166 168
sqrt(grades_1)
 [1] 8.660254 8.717798 8.774964 8.831761 8.888194 8.944272 9.000000 9.055385
 [9] 9.110434 9.165151
grades_1 >= 78
 [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
grades_1 == 78
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Warning

When testing for equality, use the double equals sign ==. A single = assigns a value instead of comparing.
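
Operations also work element by element between two vectors of the same length. A small sketch, using two made-up vectors (midterm and final) that are not part of the lab data:

```r
midterm <- c(70, 82, 91)
final   <- c(80, 78, 95)

total   <- midterm + final        # 150 160 186
average <- (midterm + final) / 2  # 75 80 93
improved <- final > midterm       # TRUE FALSE TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_improved <- sum(improved)       # 2
n_improved
```

Counting with sum() on a logical vector is a common way to answer "how many observations meet this condition?".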

3.4.1.3 Factor Vectors

Factor variables are important when building statistical models. There are two types of factor variables:

  • nominal: categorical variable with no inherent order among the categories
jbp <- c("poor", "excellent", "fair", "poor", "good") # create a nominal job performance vector
jbp
[1] "poor"      "excellent" "fair"      "poor"      "good"     
class(jbp)
[1] "character"
jbp.1 <- as.factor(jbp)     # converting your character vector to factor
jbp.1
[1] poor      excellent fair      poor      good     
Levels: excellent fair good poor
class(jbp.1)
[1] "factor"
  • ordinal: categorical variable with a defined ranking among the categories
# create a job performance factor with the levels listed from worst to best
jbp.2 <- factor(jbp, levels = c("poor", "fair", "good", "excellent"))     # sets the level order; add ordered = TRUE to make R treat it as a ranking
jbp.2
[1] poor      excellent fair      poor      good     
Levels: poor fair good excellent
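
The difference between setting the level order and declaring a ranking can be seen in a short sketch. The names lev, f_plain, and f_ordered are illustrative; ordered = TRUE is an argument of the base function factor().

```r
jbp <- c("poor", "excellent", "fair", "poor", "good")
lev <- c("poor", "fair", "good", "excellent")

f_plain   <- factor(jbp, levels = lev)                  # level order set, but not a ranking
f_ordered <- factor(jbp, levels = lev, ordered = TRUE)  # ordered (ordinal) factor

is.ordered(f_plain)     # FALSE
is.ordered(f_ordered)   # TRUE

# Comparisons such as >= are only meaningful for ordered factors
at_least_good <- f_ordered >= "good"   # FALSE TRUE FALSE FALSE TRUE
at_least_good

# table() counts the observations per level, in level order
table(f_ordered)
```

An ordered factor prints its levels with < between them, which is a quick visual check that the ranking was registered.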

3.4.1.4 Subset Vectors and Select Elements

To subset a vector or select certain elements from a vector, we use the square brackets [ ].

  • By Position

    grades_1[1]         # select the 1st element
    [1] 75
    grades_1[-2]        # exclude the 2nd element
    [1] 75 77 78 79 80 81 82 83 84
    grades_1[3:5]       # select elements 3 to 5
    [1] 77 78 79
    grades_1[c(1,3,5)]  # select the 1st, 3rd, and 5th elements
    [1] 75 77 79
  • By Condition

    grades_1[grades_1 >= 78]               # select grades that are greater than or equal to 78
    [1] 78 79 80 81 82 83 84
    grades_1[grades_1 != 78]               # select grades that are not equal to 78
    [1] 75 76 77 79 80 81 82 83 84
    jbp.2[jbp.2 %in% c("poor", "fair")]    # select job performance that is "poor" or "fair" 
    [1] poor fair poor
    Levels: poor fair good excellent
    Tip

    If you wish to apply multiple conditions, the common operators are & (Shift + 7) for AND and | (Shift + \) for OR. Try grades_1[grades_1 >= 78 & grades_1 != 80].
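
A brief sketch of combining conditions, re-creating grades_1 so the block runs on its own. The use of ! and which() goes slightly beyond the tip above and is shown only for illustration.

```r
grades_1 <- c(75, 76, 77, 78, 79, 80, 81, 82, 83, 84)

both   <- grades_1[grades_1 >= 78 & grades_1 != 80]   # AND: 78 79 81 82 83 84
either <- grades_1[grades_1 < 77 | grades_1 > 82]     # OR:  75 76 83 84
not_ge <- grades_1[!(grades_1 >= 78)]                 # ! negates: 75 76 77

# which() returns the positions, rather than the values, that meet a condition
pos <- which(grades_1 >= 82)                          # 8 9 10
pos
```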

3.4.2 Matrices

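A matrix is a collection of data elements arranged in a two-dimensional rectangular layout of rows and columns. All elements of a matrix must be of the same data type. Matrices are commonly used in mathematics and statistics, but an R matrix is a broader concept than a mathematical matrix: it can hold any single data type, such as character strings (see ma.2 below), not only numbers. By default, matrix() fills the matrix column by column.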

ma.1 <- matrix(1:15, nrow = 5, ncol = 3)
ma.1
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15
ma.2 <- matrix(names, nrow = 6, ncol = 2)
ma.2
     [,1]       [,2]     
[1,] "Marshall" "Pedro"  
[2,] "Ruby"     "Rebecca"
[3,] "Peppa"    "Rubble" 
[4,] "George"   "Ryder"  
[5,] "Suzy"     "Max"    
[6,] "Danny"    "Chase"  

Since matrices are two-dimensional, to subset or select elements from a matrix using [ ], you need to define row and/or column index in [ROW, COL].

ma.1[2, ]   # select all elements in the 2nd row of ma.1
[1]  2  7 12
ma.1[, 2]   # select all elements in the 2nd column of ma.1
[1]  6  7  8  9 10
ma.1[2, 2]  # select the element in row 2 and column 2 of ma.1
[1] 7
ma.2[3, 2]  # select the element in row 3 and column 2 of ma.2
[1] "Rubble"
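
The following sketch shows how the fill direction can be changed and how element-wise and matrix multiplication differ. The object names m_col and m_row are illustrative; byrow, dim(), t(), and %*% are all base R.

```r
m_col <- matrix(1:6, nrow = 2)                 # filled column by column (the default)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

m_col
m_row
dim(m_row)            # 2 3  (rows, columns)
t(m_row)              # transpose: 3 rows, 2 columns

m_row * 2             # element-wise multiplication
prod_mat <- m_row %*% t(m_row)   # matrix multiplication: a 2 x 2 result
prod_mat
```

Only numeric matrices support %*%; a character matrix such as ma.2 can be stored and subset but not multiplied.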

3.4.3 Arrays

In R, arrays are data objects that can store data in more than two dimensions. As with vectors and matrices, all elements must be of the same type.

array.1 <- array(1:24, dim = c(2, 3, 4))      # 2 rows * 3 columns * 4 matrices
array.1
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
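
Subsetting an array works like subsetting a matrix, with one index per dimension. A minimal sketch using the same array.1 (the names one_value, second_mat, and row2_all are illustrative):

```r
array.1 <- array(1:24, dim = c(2, 3, 4))

dim(array.1)                     # 2 3 4
one_value  <- array.1[1, 2, 3]   # row 1, column 2 of the 3rd matrix: 15
second_mat <- array.1[, , 2]     # the whole 2nd matrix (2 rows, 3 columns)
row2_all   <- array.1[2, , ]     # row 2 of every matrix, returned as a 3 x 4 matrix

one_value
second_mat
row2_all
```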

3.4.4 Lists

Lists are R objects that can contain multiple components of different data types, data structures, and dimensions.

names
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"   
grades_1
 [1] 75 76 77 78 79 80 81 82 83 84
list.1 <- list(name = names, grade = grades_1, random_num = 9)      # create a list with 3 components
list.1
$name
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"   

$grade
 [1] 75 76 77 78 79 80 81 82 83 84

$random_num
[1] 9
names(list.1)
[1] "name"       "grade"      "random_num"

To select from a list, you can keep using index numbers with square brackets [ ], or you can use the extractor operator $.

list.1[1]     # create a new list with only the 1st component from list.1
$name
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"   
list.1[[1]]   # extract all elements from the 1st component of list.1 (i.e. the name vector)
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"   
list.1$name   # extract all the elements from the component named "name" from list.1
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"    "Max"      "Chase"   
list.1[[1]][1]
[1] "Marshall"
list.1$name[1]
[1] "Marshall"
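
The difference between [ ] and [[ ]] matters as soon as you pass the result to a function. A small sketch, using a shortened version of list.1 so the block runs on its own (the component passed is added purely for illustration):

```r
list.1 <- list(name = c("Marshall", "Ruby"), grade = c(75, 76), random_num = 9)

class(list.1[2])      # "list": single brackets keep the list wrapper
class(list.1[[2]])    # "numeric": double brackets return the component itself
avg <- mean(list.1[[2]])   # 75.5; mean(list.1[2]) would return NA with a warning
avg

# Add a new component with $
list.1$passed <- list.1$grade >= 76
names(list.1)
length(list.1)        # 4
```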

3.4.5 Data Frames

A data frame is a list whose components are equal-length vectors. You can think of it as a restricted version of a list (every component must have the same length) or a flexible version of a matrix (each column can have a different data type).

df1 <- data.frame(Name = names[1:10], Grade = grades_1)
df1
       Name Grade
1  Marshall    75
2      Ruby    76
3     Peppa    77
4    George    78
5      Suzy    79
6     Danny    80
7     Pedro    81
8   Rebecca    82
9    Rubble    83
10    Ryder    84
df1$Name
 [1] "Marshall" "Ruby"     "Peppa"    "George"   "Suzy"     "Danny"   
 [7] "Pedro"    "Rebecca"  "Rubble"   "Ryder"   
df1$Grade[1:5]
[1] 75 76 77 78 79
df1$Grade[df1$Grade >= 80]
[1] 80 81 82 83 84
df1$Name[df1$Grade >= 80]
[1] "Danny"   "Pedro"   "Rebecca" "Rubble"  "Ryder"  
df1[df1$Grade >= 80, ]
      Name Grade
6    Danny    80
7    Pedro    81
8  Rebecca    82
9   Rubble    83
10   Ryder    84
mean(df1$Grade)
[1] 79.5
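
A few more common data frame operations, sketched on a three-row version of df1 so the block is self-contained. The new columns Bonus and Passed are made up for illustration.

```r
df1 <- data.frame(Name  = c("Marshall", "Ruby", "Peppa"),
                  Grade = c(75, 76, 77))

str(df1)                  # structure: number of rows, columns, and each column's type
dim(df1)                  # 3 2
df1[2, "Grade"]           # row 2 of column Grade: 76

df1$Bonus  <- df1$Grade + 5      # add a column computed from another column
df1$Passed <- df1$Grade >= 76    # add a logical column
df1
```

Because each column is a vector, everything from the vector section (arithmetic, comparisons, subsetting) applies to df1$Grade directly.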

4 R Packages

To extend the capabilities of R, various packages are developed to handle different tasks: data manipulation, analysis, and visualization.

4.1 Base Packages

The base packages, which provide basic functions and datasets, are included with every R installation. One example is the package datasets, which provides a collection of datasets.

  • Try data(), which lists the datasets available in the packages that are currently loaded, including datasets.

  • To load one of the listed datasets, e.g. mtcars, type data(mtcars). This dataset provides statistics from road tests on multiple models of cars.
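
Once loaded, mtcars behaves like any other data frame. A minimal sketch (the name avg_mpg is illustrative):

```r
data(mtcars)        # load the dataset into your environment
dim(mtcars)         # 32 11  (32 cars, 11 variables)
head(mtcars, 3)     # first 3 rows

avg_mpg <- mean(mtcars$mpg)   # average miles per gallon, about 20.09
avg_mpg
```

Typing ?mtcars opens the help page that describes each variable.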

4.2 Contributed Packages

Many other contributed packages, designed to implement specific operations, are not included with R by default. To utilize these packages, we need to install and load them individually.

  • To install a package, type install.packages("tidyverse"). You only need to do this once per computer.

  • To load the installed package, type library(tidyverse). You need to do this in every new R session.

Warning

When installing, the package name must be in quotes " ". When loading with library(), the quotes are optional.
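
One common pattern is to install a package only when it is missing. The sketch below is an assumption about how you might organize this, not part of the lab. It uses the base package stats as a stand-in so that it runs without downloading anything; replace "stats" with "tidyverse" in practice.

```r
pkg <- "stats"   # stand-in so the sketch runs anywhere; use "tidyverse" in practice

# requireNamespace() returns TRUE if the package is already installed
is_installed <- requireNamespace(pkg, quietly = TRUE)
if (!is_installed) {
  install.packages(pkg)   # install only when missing
}

# character.only = TRUE tells library() that pkg holds the package name as a string
library(pkg, character.only = TRUE)

# package::function calls a function from a specific package explicitly
med <- stats::median(c(1, 3, 10))   # 3
med
```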

After loading the package, we can use the functions in tidyverse. tidyverse is a collection of packages, including dplyr and tidyr for data manipulation, ggplot2 for data visualization, and lubridate for working with dates and times.

  • Example with select() from dplyr. Type ?select first to see the R documentation for this function.

    data(mpg)     # the mpg dataset ships with ggplot2, which is loaded as part of tidyverse
    
    head(mpg)     # head() displays the first 6 observations of a dataset
    manufacturer model displ year cyl trans      drv cty hwy fl class
    audi         a4    1.8   1999 4   auto(l5)   f   18  29  p  compact
    audi         a4    1.8   1999 4   manual(m5) f   21  29  p  compact
    audi         a4    2.0   2008 4   manual(m6) f   20  31  p  compact
    audi         a4    2.0   2008 4   auto(av)   f   21  30  p  compact
    audi         a4    2.8   1999 6   auto(l5)   f   16  26  p  compact
    audi         a4    2.8   1999 6   manual(m5) f   18  26  p  compact
    head(select(mpg, c(manufacturer, model, year)))      # select() selects named variables from a data frame
    manufacturer model year
    audi         a4    1999
    audi         a4    1999
    audi         a4    2008
    audi         a4    2008
    audi         a4    1999
    audi         a4    1999
    head(filter(mpg, year >= 2000))     # filter() subsets a data frame based on defined conditions
    manufacturer model      displ year cyl trans      drv cty hwy fl class
    audi         a4         2.0   2008 4   manual(m6) f   20  31  p  compact
    audi         a4         2.0   2008 4   auto(av)   f   21  30  p  compact
    audi         a4         3.1   2008 6   auto(av)   f   18  27  p  compact
    audi         a4 quattro 2.0   2008 4   manual(m6) 4   20  28  p  compact
    audi         a4 quattro 2.0   2008 4   auto(s6)   4   19  27  p  compact
    audi         a4 quattro 3.1   2008 6   auto(s6)   4   17  25  p  compact

    See how packages simplify the code and make it more intuitive. More details about tidyverse will be discussed in Lab 2.
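
For comparison, here is a sketch of the same kind of column selection and row filtering written in base R indexing. It uses the built-in mtcars dataset rather than mpg so that it runs without tidyverse installed; the names cols, four_cyl, and preview are illustrative.

```r
data(mtcars)

# Base R counterpart of select(): pick columns by name
cols <- mtcars[, c("mpg", "cyl", "hp")]

# Base R counterpart of filter(): keep the rows where the condition is TRUE
four_cyl <- mtcars[mtcars$cyl == 4, ]
nrow(four_cyl)   # 11

# Both steps at once, then the first 3 rows
preview <- head(mtcars[mtcars$cyl == 4, c("mpg", "cyl", "hp")], 3)
preview
```

The base R version repeats the data frame name inside the brackets, which is the repetition that select() and filter() let you avoid.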

5 R Script

An R script is simply a text file containing a set of commands and comments, which allows you to save, edit, and execute your code. To create an R script, in the top toolbar, select File > New File > R Script.

Opening an R script adds a new pane to RStudio, the Source pane. In this pane, you can edit and save your code. To run your code, hold Ctrl and press Enter (Cmd + Return on a Mac), or click the Run button. This runs the current line of code, or all the lines you have selected.

The Source Pane and R Script
Note

All lines entered in an R script are interpreted as code to be run by R, unless they begin with # (Shift + 3). Using # is useful for documenting and explaining your code, or for temporarily disabling sections of code.
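
A minimal sketch of what a short script might look like. The file name lab1_example.R and the object names are hypothetical.

```r
# lab1_example.R  (hypothetical file name)
# Purpose: summarize a small vector of grades

grades <- c(75, 80, 84)    # input data

avg <- mean(grades)        # average grade
avg

# print(max(grades))       # disabled: R skips this line because it starts with #
```

A header comment that states the purpose of the script makes it much easier to find the right file later in the term.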

6 Useful Resources

This section lists some useful resources for your exploration of R.

6.1 Base R Cheat Sheet

Click here for more information

6.2 An Introduction to R

An Introduction to R, by Venables, W. N., Smith, D. M., & R Development Core Team

7 Exercise

Load the dataset “mpg” and work through the exercises below. Note that “mpg” ships with ggplot2, which is part of tidyverse, so you will need to load the package first.

  1. Calculate the mean, range, minimum, and maximum of the variable “hwy” across all models. Then, combine these statistics into one vector. (Tip: look up the RDocumentation for the functions mean, range, min, and max).

  2. Since “hwy” is measured in miles per gallon, create a new variable in mpg that expresses “hwy” in litres per 100 km.

  3. Identify the models of cars that are most fuel-efficient. Which classes of cars are least fuel-efficient?

  4. Compute the quantiles of “hwy”. Can you also calculate the tertiles instead? (Tip: look up the RDocumentation for the function quantile).

  5. Now, based on the tertiles you calculated, assign “least efficient”, “medium”, and “most efficient” labels to all models. Try using both base R indexing and the function ifelse.

  • © 2025 Liyuan Xuan — Built with Quarto