---
title: "Into the Tidyverse"
subtitle: "Session One Exercises"
output: html_notebook
---

Remember, before you can use the tidyverse, you need to load the package.

```{r message=FALSE}
library(tidyverse)
```

## First Steps

### More plots with the mpg dataset

(**Taken from R4DS**)

1. Run `ggplot(data = mpg)`. What do you see?
2. How many rows are in mpg? How many columns?
3. What does the `drv` variable describe? You may want to use `?mpg` to find out
4. Make a scatter plot of `hwy` vs `cyl`
5. What happens if you make a scatter plot of `class` vs `drv`? Why is the plot not useful?

### Plots with other datasets

1. Take a look at the `iris` dataset. What are its dimensions? What do its columns represent?
2. What is the range of each of the numeric columns? The `summary()` function might help you here
3. Make a plot of sepal width vs sepal length, and set all of the points to be green
4. Repeat the previous plot but colour each point by the species of the flower
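
As a hint for the last two questions, here is a minimal sketch of the difference between *setting* an aesthetic to a fixed value and *mapping* it to a variable. It uses the built-in `mtcars` data rather than `iris`, so the exercise itself is left to you. The object names `p_fixed` and `p_mapped` are made up for illustration.

```{r eval = FALSE}
library(ggplot2)  # already attached if you have loaded the tidyverse

# Setting: colour is given outside aes(), so every point gets the same colour
p_fixed <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg), colour = "red")

# Mapping: colour is given inside aes(), so it varies with a column of the data
p_mapped <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(gear)))

p_fixed
p_mapped
```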

### Remaking plots

1. Take a look at the first and last few rows of the `mtcars` dataset
2. Access the `cyl` column of the dataset. Is this variable categorical, discrete, or continuous?
3. What steps would you take to remake the following plot?

```{r echo=FALSE}
ggplot(mtcars) +
  geom_point(aes(x = disp, y = hp, colour = factor(cyl)))
```
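
If you are unsure how to approach questions 1 and 2, the sketch below shows one way to inspect a data frame and a single column. It uses the built-in `ToothGrowth` data as a stand-in for `mtcars`; the variable names `dose` and `dose_values` are only for illustration.

```{r eval = FALSE}
# First and last few rows of a data frame
head(ToothGrowth)
tail(ToothGrowth, n = 3)

# Pull out one column with `$`, then look at how it is stored
dose <- ToothGrowth$dose
class(dose)

# Listing the distinct values helps you judge whether a numeric column
# behaves more like a continuous measurement or a handful of categories
dose_values <- unique(dose)
dose_values
```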

## More Aesthetics 

### Size, Transparency, and Shape

1. Using the `mpg` dataset, make a plot of city mileage vs highway mileage where the size of each point is determined by engine size (`displ`)
2. Plot sepal length vs sepal width using the `iris` dataset and control the transparency (`alpha`) of each point using the `Species` variable (note the capital 'S').
3. Have a play with the `Orange` data set (note the capital 'O'). Make a scatter plot of circumference against age where the shape of each point is determined by which tree the observation belongs to
4. Remake the standard `hwy` vs `displ` plot using the `mpg` data set but make all of the points hollow diamonds. How about solid triangles?
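
For question 4 you need to know which number corresponds to which point shape. As a hint, this sketch draws a lookup chart of the shape codes 0 to 25 that `ggplot2` understands. The data frame `shape_chart` and its column names are invented for this example.

```{r eval = FALSE}
library(ggplot2)  # already attached if you have loaded the tidyverse

# One row per shape code, laid out on a grid so each symbol is easy to find
shape_chart <- data.frame(
  code = 0:25,
  col  = 0:25 %% 7,
  row  = -(0:25 %/% 7)
)

p_shapes <- ggplot(shape_chart, aes(x = col, y = row)) +
  geom_point(aes(shape = code), size = 5, fill = "grey70") +
  geom_text(aes(label = code), nudge_y = -0.35) +
  scale_shape_identity() +  # use the numbers in `code` directly as shapes
  theme_void()

p_shapes
```

The `fill` setting only affects shapes 21 to 25, which have a separate outline and interior.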

### Choosing Appropriate Aesthetics

(**Q1/2 from R4DS**)

1. Which variables in `mpg` are categorical? Which are continuous/discrete? (The data set help file may be of use)
2. Map a continuous variable to colour, size, and shape. How does this differ from when you map a categorical variable?
3. Plot the standard `hwy` vs `displ` graph using `mpg` and map the variable `class` to the `size` aesthetic. Was this a good idea?
4. Have a discussion with a partner or think for yourself: Which of the aesthetics you know are the clearest for displaying categorical data and which are best for continuous? 
5. In your own opinion, order the following aesthetics by how clear they are in representing a continuous variable: size, colour, transparency

### Common Problems

1. Using the `mpg` data set, make a plot of city mileage against engine size. Map the variable `class` to the aesthetic `shape`. Is everything as you would expect?
2. Type the following code into the console. Why do you receive an error message?

```{r eval = FALSE}
ggplot(iris) +
  geom_point(x = Sepal.Length, y = Petal.Length)
```

3. Take a look at the `airquality` dataset. Type the following code into the console. Is the plot as you expected?

```{r eval = FALSE}
ggplot(airquality) +
  geom_point(aes(x = Wind, y = Temp, col = Month))
```

4. Why are the points in this plot not blue?

```{r message = FALSE}
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = 'blue'))
```

5. What happens when you map a variable to multiple aesthetics (say colour and size)? (It's okay to answer, "nothing", to this question but make sure you verify that first!)

## Faceting

### Basic Faceting

(**Taken from R4DS**)

1. What happens when you facet on a continuous variable?
2. What do the empty cells in a plot with `facet_grid(drv ~ cyl)` mean? How do they relate to this plot?

```{r eval = FALSE}
ggplot(mpg) +
  geom_point(aes(x = drv, y = cyl))
```

3. What plots does the following code make? What does the `.` do?

```{r eval = FALSE}
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
```

4. Take the first faceted plot from the presentation:

```{r eval = FALSE}
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
```

What are the advantages of faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger data set?

5. Read `?facet_wrap`. What does `nrow` do? What does `ncol` do? What other options control the layout of the individual panels? Why doesn't `facet_grid()` have `nrow` and `ncol` parameters?
6. When using `facet_grid()` you should usually put the variable with more unique levels in the columns. Why?


### Combining Facets with Aesthetics

1. Create a scatter plot of petal length vs petal width using the `iris` dataset and facet by species
2. Repeat the above plot whilst also colouring the species. Don't forget to hide the colour legend
3. Using the `mpg` dataset, plot `hwy` vs `cty`, map `displ` to the `size` aesthetic, map `class` to point colour, and facet columns by `cyl` and rows by `drv`. This plot is ridiculous but it does demonstrate the flexibility of `ggplot2`
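
Question 2 asks you to hide the colour legend. In case that was not covered in the session, here is one possible approach, sketched on `mtcars` instead of `iris`. It relies on the `show.legend` argument of `geom_point()`; adding `theme(legend.position = "none")` to the plot is another common option. The object name `p_no_legend` is just for illustration.

```{r eval = FALSE}
library(ggplot2)  # already attached if you have loaded the tidyverse

# The facets already tell the reader which group each panel shows,
# so the colour legend would only repeat that information
p_no_legend <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(cyl)),
             show.legend = FALSE) +  # drop this layer from the legend
  facet_wrap(~ cyl)

p_no_legend
```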

## Going Beyond

### Labelling

1. Run the following code. What does the extra `labs(...)` layer do?

```{r eval = FALSE}
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class)) +
  labs(x = "Engine Displacement (litres)", y = "Highway Mileage (miles/gallon)",
       colour = "Car Type",
       title = "A scatter plot of engine displacement vs highway mileage",
       subtitle = "Coloured by car type",
       caption = "Source: EPA (http://fueleconomy.gov)")
```

2. Use this to take the plot from the 'Remaking plots' section and beautify it
3. Pick any plot of your choosing and give it appropriate axis labels, a title, and (if possible) a data source

### Diamonds and Overplotting

1. Have a look at the `diamonds` dataset
2. Make a scatter plot of `price` against `carat` (this may take a long time to run). Is this plot easy to read?
3. How could you fix this problem? (perhaps you could manually set a certain aesthetic)

### Explanatory and Response variables

1. How do you decide which variable to map to the x-axis and which to map to the y-axis?
2. If you are unsure, web-search for the phrase "explanatory and response variables"

### Positional Arguments

Begin with the following code

```{r eval = FALSE}
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = factor(class)))
```

1. Try removing `x = ` and `y = ` from your `geom_point` call. Does everything still work?
2. Try removing `colour = ` from your `geom_point` call. Does everything still work?
3. Take the original plot and specify the aesthetics in a different order, say `y` then `colour` then `x`. Does everything still work?
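
Before trying these, it may help to recall how R matches arguments in general. The sketch below uses a toy function, `show_args()`, which is hypothetical and only copies the argument layout of `aes(x, y, ...)`: two named arguments followed by `...`. It is not how `aes()` is implemented, just a way to experiment with positional and named matching.

```{r eval = FALSE}
# A toy function with the same argument layout as aes(): x, y, then `...`
show_args <- function(x, y, ...) {
  extras <- list(...)
  list(x = x, y = y, n_extras = length(extras), extra_names = names(extras))
}

by_position <- show_args(1, 2)              # matched to x and y by position
by_name     <- show_args(y = 2, x = 1)      # order is irrelevant when named
with_extra  <- show_args(1, 2, colour = 3)  # anything else lands in `...`

identical(by_position, by_name)
with_extra$extra_names
```

Compare what you see here with what happens in the three exercises above.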
