The answer key is currently withheld and will be made available within one week after the lab.
Lab 1
Load the dataset “mpg” and work through the exercises below. Note that “mpg” ships with ggplot2, which is attached when you load the tidyverse, so you will need to load the tidyverse package first.
Calculate the mean, range, minimum, and maximum of the variable “hwy” across all models. Then, combine these statistics into one vector. (Tip: look up the RDocumentation for the functions mean, range, min, and max).
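One possible approach is sketched below on a small made-up vector so that it runs without any packages. The names hwy_demo and hwy_stats are illustrative and not part of the lab; with the tidyverse loaded you would substitute mpg$hwy for hwy_demo.

```r
# Hypothetical stand-in for mpg$hwy; replace with mpg$hwy once the tidyverse is loaded.
hwy_demo <- c(29, 29, 31, 30, 26, 26, 27)

# range() returns two values (minimum and maximum),
# so the combined vector has five elements in total.
hwy_stats <- c(
  mean  = mean(hwy_demo),
  range = range(hwy_demo),
  min   = min(hwy_demo),
  max   = max(hwy_demo)
)
hwy_stats
```

Because range() contributes two values, c() names them range1 and range2; they repeat the min and max entries.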
unique(mpg$class[mpg$hwy_Lp100km_1 == max(mpg$hwy_Lp100km_1)]) # using unique function to remove duplicates
[1] "pickup" "suv"
Compute the quantiles of “hwy”. Can you also calculate the tertiles? (Tip: look up the RDocumentation for the function quantile.)
quantile(mpg$hwy)
0% 25% 50% 75% 100%
12 18 24 27 44
quantile(mpg$hwy, probs = seq(0, 1, 0.333))
0% 33.3% 66.6% 99.9%
12.000 19.589 26.000 44.000
quantile(mpg$hwy, c(0, 0.333, 0.666, 1)) # you can achieve the same using this
0% 33.3% 66.6% 100%
12.000 19.589 26.000 44.000
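The 99.9% label in the first output appears because seq(0, 1, 0.333) stops at 0.999 rather than 1, and 0.333 is itself only an approximation of one third. A minimal sketch of exact tertile cut points follows, again on a made-up vector (hwy_demo, approx_probs, exact_probs, and tertile_demo are hypothetical names, with hwy_demo standing in for mpg$hwy).

```r
# Hypothetical stand-in for mpg$hwy.
hwy_demo <- c(12, 17, 20, 24, 26, 27, 29, 31, 44)

# A step of 0.333 never reaches 1: the last probability is 0.999.
approx_probs <- seq(0, 1, 0.333)

# length.out = 4 yields exactly 0, 1/3, 2/3, 1.
exact_probs <- seq(0, 1, length.out = 4)

quantile(hwy_demo, probs = approx_probs)
tertile_demo <- quantile(hwy_demo, probs = exact_probs)
tertile_demo
```

On a dataset of this size the two versions give nearly identical cut points, so the choice mostly affects the labels and whether the top probability is exactly 1.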
Now, based on the tertiles you calculated, assign “least efficient”, “medium”, and “most efficient” labels to all models. Try using both base R indexing and the function ifelse.
tertile <- quantile(mpg$hwy, c(0, 0.333, 0.666, 1))
mpg$efficient_class_1 <- NA # optional, this eliminates the warning message of "Unknown or uninitialised column"
mpg$efficient_class_1[mpg$hwy < tertile[2]] <- "least efficient"
mpg$efficient_class_1[mpg$hwy >= tertile[2] & mpg$hwy < tertile[3]] <- "medium"
mpg$efficient_class_1[mpg$hwy >= tertile[3]] <- "most efficient"
# OR
mpg$efficient_class_2 <- NA # given how ifelse functions are specified below, this step becomes necessary
mpg$efficient_class_2 <- ifelse(mpg$hwy < tertile[2], "least efficient", mpg$efficient_class_2)
mpg$efficient_class_2 <- ifelse(mpg$hwy >= tertile[2] & mpg$hwy < tertile[3], "medium", mpg$efficient_class_2)
mpg$efficient_class_2 <- ifelse(mpg$hwy >= tertile[3], "most efficient", mpg$efficient_class_2)
unique(mpg$efficient_class_1 == mpg$efficient_class_2) # sanity check
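As an aside that goes beyond what the exercise asks for, the same labelling rule can be sketched with cut() or with a single nested ifelse. The example uses a made-up vector so it runs without packages; hwy_demo, tertile_demo, labels_demo, class_cut, and class_ifelse are all hypothetical names.

```r
# Hypothetical stand-in for mpg$hwy.
hwy_demo <- c(12, 17, 20, 24, 26, 27, 29, 31, 44)
tertile_demo <- quantile(hwy_demo, c(0, 0.333, 0.666, 1))
labels_demo <- c("least efficient", "medium", "most efficient")

# right = FALSE makes the intervals closed on the left, [a, b), matching the
# "< tertile[2]" and ">= tertile[2]" comparisons used in the lab code;
# include.lowest = TRUE then closes the last interval so the maximum is labelled too.
class_cut <- cut(hwy_demo, breaks = tertile_demo, labels = labels_demo,
                 right = FALSE, include.lowest = TRUE)

# The same rule written as one nested ifelse, for comparison.
class_ifelse <- ifelse(hwy_demo < tertile_demo[2], "least efficient",
                ifelse(hwy_demo < tertile_demo[3], "medium", "most efficient"))

all(as.character(class_cut) == class_ifelse) # sanity check
```

Note that cut() returns a factor rather than a character vector, and it fails if the break points are not unique, so this variant assumes the tertile values are distinct.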