Remember, before you can use the tidyverse, you need to load the package.
library(tidyverse)
Note that the sample tables used in the presentation can be accessed once the tidyverse is loaded by using table1, table2, table3, table4a, and table4b.
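As a quick check that the sample tables are available, the sketch below prints one of them and inspects another. It assumes only that the tidyverse is installed.

```r
library(tidyverse)

# The sample tables ship with tidyr, so they exist once the tidyverse is loaded.
table1            # one row per country and year, one column per variable

# glimpse() shows column names and types; table4a has one column per year.
glimpse(table4a)
```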
ChickWeight dataset is tidy? Why are the others not? (Taken from R4DS)
Why are gather() and spread() not perfectly symmetrical? Carefully consider the following example:
stocks <- tibble(
  year = c(2015, 2015, 2016, 2016),
  half = c(1, 2, 1, 2),
  return = c(1.88, 0.59, 0.92, 0.17)
)
stocks %>%
  spread(year, return) %>%
  gather("year", "return", `2015`, `2016`)
(Hint: look at the variable types and think about column names.)
Both spread() and gather() have a convert argument. What does it do?
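One way to explore the two questions above is to run the round trip with and without convert and compare the column types. This is only a sketch; the names round_trip and round_trip_converted are made up for illustration.

```r
library(tidyverse)

stocks <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  half   = c(1, 2, 1, 2),
  return = c(1.88, 0.59, 0.92, 0.17)
)

# Round trip without conversion: the year values pass through column names.
round_trip <- stocks %>%
  spread(year, return) %>%
  gather("year", "return", `2015`, `2016`)

# Same round trip, asking gather() to re-guess the type of the key column.
round_trip_converted <- stocks %>%
  spread(year, return) %>%
  gather("year", "return", `2015`, `2016`, convert = TRUE)

# Compare column order and column types across the three tibbles.
map_chr(stocks, class)
map_chr(round_trip, class)
map_chr(round_trip_converted, class)
```

Looking at the three outputs side by side should make it easier to say which information survives the round trip and which does not.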
Why does this code fail?
table4a %>%
gather(1999, 2000, key = "year", value = "cases")
#> Error in eval(expr, envir, enclos):
#> Position must be between 0 and n
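A possible reading of the error above is that bare numbers are treated as column positions, not column names. The sketch below quotes the non-syntactic names with backticks; tidy4a is a hypothetical name used only here.

```r
library(tidyverse)

# Backticks mark 1999 and 2000 as column names rather than column positions.
tidy4a <- table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")

tidy4a
```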
distinct(). Could this offer an alternative solution?
people <- tribble(
~name, ~key, ~value,
#-----------------|--------|------
"Phillip Woods", "age", 45,
"Phillip Woods", "height", 186,
"Phillip Woods", "age", 50,
"Jessica Cordero", "age", 37,
"Jessica Cordero", "height", 156
)
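The people tibble above has two "age" rows for the same person, so spread() cannot tell which value belongs in the cell. Two possible ways around this are sketched below; the column name obs and the result names are assumptions made for illustration, and the second option discards data.

```r
library(tidyverse)

people <- tribble(
  ~name,             ~key,     ~value,
  "Phillip Woods",   "age",    45,
  "Phillip Woods",   "height", 186,
  "Phillip Woods",   "age",    50,
  "Jessica Cordero", "age",    37,
  "Jessica Cordero", "height", 156
)

# Option 1: number repeated name/key pairs so every row is uniquely identified.
people_numbered <- people %>%
  group_by(name, key) %>%
  mutate(obs = row_number()) %>%
  ungroup() %>%
  spread(key, value)

# Option 2: keep only the first row per name/key pair (later rows are dropped).
people_first <- people %>%
  distinct(name, key, .keep_all = TRUE) %>%
  spread(key, value)

people_numbered
people_first
```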
preg <- tribble(
~pregnant, ~male, ~female,
"yes", NA, 10,
"no", 20, 12
)
(Taken from R4DS)
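For the preg tibble above, one possible tidy form treats male and female as values of a single variable. The names sex and count are assumptions chosen for this sketch, not names given in the exercise.

```r
library(tidyverse)

preg <- tribble(
  ~pregnant, ~male, ~female,
  "yes",     NA,    10,
  "no",      20,    12
)

# Gather the two sex columns into a key column and a value column.
preg_tidy <- preg %>%
  gather(male, female, key = "sex", value = "count")

# na.rm = TRUE drops the structurally missing combination instead of keeping NA.
preg_tidy_no_na <- preg %>%
  gather(male, female, key = "sex", value = "count", na.rm = TRUE)

preg_tidy
preg_tidy_no_na
```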
What do the extra and fill arguments do in separate()? Experiment with the various options for the following two toy datasets:
tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
separate(x, c("one", "two", "three"))
Expected 3 pieces. Additional pieces discarded in 1 rows [2].
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
separate(x, c("one", "two", "three"))
Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [2].
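A starting point for the experiment, using the two toy datasets above. The names too_many, too_few and the result names are made up for this sketch.

```r
library(tidyverse)

too_many <- tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))
too_few  <- tibble(x = c("a,b,c", "d,e", "f,g,i"))
cols <- c("one", "two", "three")

# extra = "merge" keeps surplus pieces in the last column instead of dropping them.
merged <- too_many %>% separate(x, cols, extra = "merge")

# extra = "drop" discards surplus pieces without a warning.
dropped <- too_many %>% separate(x, cols, extra = "drop")

# fill = "left" pads missing pieces with NA on the left, fill = "right" on the right.
filled_left  <- too_few %>% separate(x, cols, fill = "left")
filled_right <- too_few %>% separate(x, cols, fill = "right")

merged
filled_left
```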
Both unite() and separate() have a remove argument. What does it do? Why would you set it to FALSE?
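To see remove in action, the sketch below applies both functions to table3 and table5, two of the sample tables that ship with tidyr. The result names and the column name new are hypothetical.

```r
library(tidyverse)

# remove = FALSE keeps the input column next to the newly created ones.
kept_rate <- table3 %>%
  separate(rate, into = c("cases", "population"), remove = FALSE)

# The same argument on unite() keeps century and year alongside the combined column.
kept_parts <- table5 %>%
  unite(new, century, year, sep = "", remove = FALSE)

names(kept_rate)
names(kept_parts)
```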
(HARD) Compare and contrast separate() and extract(). Why are there three variations of separation (by position, by separator, and with groups), but only one unite?
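The three variations mentioned above can be tried on a small made-up tibble. This is a sketch only; the data and column names are invented for illustration.

```r
library(tidyverse)

df <- tibble(x = c("a-1", "b-2", "c-3"))

# By separator: split wherever the "-" character occurs.
by_sep <- df %>% separate(x, c("letter", "number"), sep = "-")

# By position: split after the first character.
by_position <- df %>% separate(x, c("letter", "rest"), sep = 1)

# With groups: each parenthesised group in the regex becomes one column.
by_groups <- df %>%
  tidyr::extract(x, c("letter", "number"), regex = "([a-z])-([0-9])")

by_sep
by_position
by_groups
```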
olympics.csv from this session’s data folder
summary(), head() and str()
Install nycflights13 using install.packages() and load it with library()
planes, flights, and airlines datasets. The following diagram may help with understanding the relations.
Join the flights and airlines datasets using the shared column carrier. Only include observations that appear in the flights dataset.
Join the planes and flights datasets using the shared column tailnum. Only include observations that appear in both datasets.
current_employees <- tibble(name = c('Ann', 'Brian', 'Dan', 'Elsa'),
                            years_experience = c(2.5, 4, 1.5, 0))
sales <- tibble(value = c(7, 5, 8, 4, 9, 4, 8, 5, 6, 7, 2, 5, 6, 2),
name = c('Brian', 'Ann', 'Brian', 'Dan', 'Brian', 'Cat', 'Brian',
'Ann', 'Ann', 'Dan', 'Dan', 'Cat', 'Ann', 'Dan'))
sales %>%
{???}_join(current_employees, by = 'name') %>%
group_by(name, years_experience) %>%
summarise(total_sales = sum(value)) %>%
mutate(total_sales = ifelse(is.na(total_sales), 0, total_sales)) %>%
ggplot(aes(x = years_experience, y = total_sales)) +
geom_point() +
geom_smooth(method = 'lm')
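The mutating joins used in the exercises above differ mainly in which rows they keep. The sketch below compares them on band_members and band_instruments, two small example datasets that ship with dplyr and are not part of these exercises.

```r
library(tidyverse)

# band_members has Mick, John, Paul; band_instruments has John, Paul, Keith.
left  <- band_members %>% left_join(band_instruments, by = "name")   # all rows of x
right <- band_members %>% right_join(band_instruments, by = "name")  # all rows of y
inner <- band_members %>% inner_join(band_instruments, by = "name")  # matches only
full  <- band_members %>% full_join(band_instruments, by = "name")   # rows of either

# Count the rows each join keeps.
c(left = nrow(left), right = nrow(right), inner = nrow(inner), full = nrow(full))
```

Checking which names appear in each result, and where NA values show up, is a useful way to decide which join fits a given question.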
What does the complete() function do? When might you want to use it?
What does the fill() function do? When might you want to use it?
How do the fill arguments for spread() and complete() differ?
fill() do?
semi_join() and anti_join()
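For the complete() and fill() questions above, a small experiment may help. The two tibbles below are made up for this sketch, as are all result names.

```r
library(tidyverse)

stocks <- tibble(
  year   = c(2015, 2015, 2016),
  half   = c(1, 2, 2),
  return = c(1.88, 0.59, 0.17)
)

# complete() adds a row for every combination of year and half that is absent.
completed <- stocks %>% complete(year, half)

# Its fill argument takes a named list giving a replacement value per column.
completed_zero <- stocks %>% complete(year, half, fill = list(return = 0))

treatment <- tribble(
  ~person, ~response,
  "A",     7,
  NA,      10,
  "B",     4
)

# fill() carries the last non-missing value forward (the default direction) ...
filled_down <- treatment %>% fill(person)

# ... or pulls the next non-missing value backward with .direction = "up".
filled_up <- treatment %>% fill(person, .direction = "up")

completed
filled_down
```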
Using the flights and planes datasets imported above, filter planes to only show planes that have flown at least 400 times
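The general pattern of filtering one table by counts taken from another can be sketched with toy data. The tibbles flights_toy and planes_toy and the threshold of 2 are hypothetical stand-ins, not the nycflights13 data.

```r
library(tidyverse)

flights_toy <- tibble(tailnum = c("N1", "N1", "N1", "N2", "N3", "N3"))
planes_toy  <- tibble(
  tailnum = c("N1", "N2", "N3", "N4"),
  seats   = c(100, 150, 200, 50)
)

# Count flights per plane and keep the tail numbers that meet the threshold.
frequent <- flights_toy %>%
  count(tailnum) %>%
  filter(n >= 2)

# semi_join() keeps rows of planes_toy that have a match in frequent.
kept <- planes_toy %>% semi_join(frequent, by = "tailnum")

# anti_join() keeps the rows that have no match.
not_kept <- planes_toy %>% anti_join(frequent, by = "tailnum")

kept
not_kept
```

Unlike the mutating joins, neither function adds columns from the second table; both only filter the first.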