Note: This exercise sheet is much shorter than the rest to allow more time to prepare for the DataViz Battle. If you feel like you need more practice with these topics, please have a look at the last part of chapter 3 from [R4DS](https://r4ds.had.co.nz/data-visualisation.html).
Remember, before you can use the tidyverse, you need to load the package.
library(tidyverse)
Missing Pieces
R Scripts
- Create an R Script that will produce a histogram of the city mileages in the `mpg` dataset
- Run this script
- Why is it important to include library imports at the start of our R scripts?
- Create an R Script containing a pipeline that takes the `diamonds` dataset, groups by `color`, and summarises the mean of each group
- Leave a blank line and then add code to create a bar chart of the classes in `mpg`
- Place your cursor on the first line of the script. What happens when you use `ctrl-enter`? What code is executed? Where does your cursor move to?
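If you get stuck, the sketch below shows one possible shape for such a script. It is only a starting point: the exercise does not say which variable to average, so `price` is an assumption, and the bin width and object names (`cty_hist`, `mean_by_color`, `class_bar`) are arbitrary choices.

```r
# Load packages first so the script also runs in a fresh R session
library(tidyverse)

# Histogram of city mileage; binwidth = 2 is an arbitrary choice
cty_hist <- ggplot(data = mpg, mapping = aes(x = cty)) +
  geom_histogram(binwidth = 2)
print(cty_hist)  # print() so the plot is also drawn when run with source()

# Pipeline: mean price for each colour (price is an assumed choice of variable)
mean_by_color <- diamonds %>%
  group_by(color) %>%
  summarise(mean_price = mean(price))
print(mean_by_color)

# Bar chart of the vehicle classes in mpg
class_bar <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()
print(class_bar)
```

Try running it line by line with `ctrl-enter` and compare that with running the whole file at once.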
Saving output
- Write the head of the `iris` dataset to a CSV file called ‘iris.csv’
- Now, append the tail of the same dataset to the same CSV file
- Instead, overwrite the existing CSV with just the head
- What happens if you try to write a CSV to a directory (folder) that doesn’t exist?
- Create a jittered scatter plot of highway mileage against engine size for the `mpg` dataset. Use `ggsave()` to save the output as a PNG image
- Create a boxplot of price for each cut in the `diamonds` dataset. Save this as an A5 PDF. You may find the `units = "cm"` argument useful.
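As a rough guide, the sketch below walks through one way of saving both data and plots. It makes a few assumptions that are not part of the exercise: files go into a temporary folder (via `tempdir()`) rather than your working directory, the file names other than `iris.csv` are made up, and the PNG size is arbitrary.

```r
library(tidyverse)

# Assumption: write into a temporary folder so the sketch leaves no files behind;
# in the exercise you would write "iris.csv" to your working directory instead
out_dir <- tempdir()
csv_path <- file.path(out_dir, "iris.csv")

# Write the first six rows, then append the last six
# (with append = TRUE, write_csv() does not write a second header row)
write_csv(head(iris), csv_path)
write_csv(tail(iris), csv_path, append = TRUE)
n_after_append <- nrow(read.csv(csv_path))

# Calling write_csv() again without append = TRUE replaces the file
write_csv(head(iris), csv_path)

# Jittered scatter plot saved as a PNG; ggsave() picks the device from the extension
jitter_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_jitter()
png_path <- file.path(out_dir, "mpg_jitter.png")
ggsave(png_path, plot = jitter_plot, width = 6, height = 4)

# Box plot saved as a portrait A5 PDF (A5 is 14.8 cm x 21 cm)
box_plot <- ggplot(data = diamonds, mapping = aes(x = cut, y = price)) +
  geom_boxplot()
pdf_path <- file.path(out_dir, "diamonds_boxplot.pdf")
ggsave(pdf_path, plot = box_plot, width = 14.8, height = 21, units = "cm")
```

Passing `plot =` explicitly avoids relying on whichever plot happened to be drawn last.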
Advanced Data Visualisation
Positional Adjustments
- Create a bar chart of the `cut` variable from the `diamonds` dataset filled by `clarity`. Use `position = "identity"` and set transparency to a low value so all bars are visible. Is this a good plot?
- Repeat the above plot but now use `position = "dodge"` and then `position = "fill"`
- Which of the above plots is best for answering questions about how the clarities are distributed within each cut?
- Create a jitter plot both by using `geom_point()` with `position = "jitter"` and `geom_jitter()`
- What parameters can be used to control the behaviour of `geom_jitter()`? Instead of using `position = "jitter"`, use `position = position_jitter(...)` and confirm that you can pass in these same parameters
- Compare and contrast the use of `geom_jitter()` and `geom_count()` in plotting `hwy` against `cty` with the `mpg` dataset
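The positional adjustments in this section can be set up along the following lines. This is a sketch rather than a model answer: the transparency value, the jitter amounts and every object name are arbitrary assumptions, and judging which plot works best is still up to you.

```r
library(tidyverse)

# Shared base: cut on the x axis, bars filled by clarity
base_bar <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

# Three position adjustments for the same bars
p_identity <- base_bar + geom_bar(position = "identity", alpha = 0.2)
p_dodge <- base_bar + geom_bar(position = "dodge")
p_fill <- base_bar + geom_bar(position = "fill")

# Shared base for the mileage plots: hwy against cty
base_mpg <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))

# Jittering written three ways
p_jitter_a <- base_mpg + geom_point(position = "jitter")
p_jitter_b <- base_mpg + geom_jitter(width = 0.3, height = 0.3)
p_jitter_c <- base_mpg +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 1))

# geom_count() sizes each point by the number of observations at that location
p_count <- base_mpg + geom_count()
```

Print any of these objects (for example `p_dodge`) to draw it. Because jittering is random, `p_jitter_a` and `p_jitter_b` look slightly different on every draw, whereas the `seed` argument makes `p_jitter_c` reproducible.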
Coordinate Systems
- Create a horizontal bar plot of the mean price for each cut for the `diamonds` dataset. You will need to use `group_by()`, `summarise()` and then either `geom_col()` or `geom_bar()` with `stat = "identity"`
- Create a bar plot of the cylinder numbers in the `mpg` dataset and convert it into a Coxcomb plot using `coord_polar()`
- Create a pie chart of any appropriate variable you wish from the `diamonds` dataset
- What does the plot below tell you about the relationship between city and highway mileage in the `mpg` dataset? Why is `coord_fixed()` important and what does `geom_abline()` do?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline() +
  coord_fixed()
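For the first three coordinate-system exercises, one possible approach is sketched below. The use of `coord_flip()` for the horizontal bars, the choice of `cut` for the pie chart and all object names are assumptions made for illustration, not the only way to do it.

```r
library(tidyverse)

# Mean price for each cut, computed before plotting
mean_price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(mean_price = mean(price))

# geom_col() plots the values as given; coord_flip() makes the bars horizontal
horizontal_bar <- ggplot(data = mean_price_by_cut,
                         mapping = aes(x = cut, y = mean_price)) +
  geom_col() +
  coord_flip()

# Coxcomb plot: a bar chart of cylinder counts drawn in polar coordinates
# (factor(cyl) treats the cylinder number as a category; width = 1 removes gaps)
coxcomb <- ggplot(data = mpg, mapping = aes(x = factor(cyl))) +
  geom_bar(width = 1) +
  coord_polar()

# Pie chart: a single stacked bar whose height is mapped to the angle
pie_chart <- ggplot(data = diamonds, mapping = aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```

Print `horizontal_bar`, `coxcomb` or `pie_chart` to draw each plot.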
Theming
- Create a hex plot of petal length and width with the `iris` dataset. Apply different themes to this and decide on your favourite
- Create a scatter plot of sepal length by width coloured by species. Use `theme(legend.position = "bottom")` to move the legend. What about `"left"` and `"top"`? What does `legend.position = "none"` do?
- Create a plot of your choice and use the `theme()` function to make the gridlines green
- How can we use theming to improve the look of the pie chart we generated earlier?
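The theming exercises might be approached as in the sketch below. Several things here are assumptions: `geom_hex()` needs the hexbin package to be installed before the plot can be drawn, `theme_minimal()` and `theme_void()` are just two of the built-in themes you could try, and the axis assignment in the scatter plot is one reading of "length by width".

```r
library(tidyverse)

# Hex plot of petal measurements (drawing it requires the hexbin package)
hex_plot <- ggplot(data = iris,
                   mapping = aes(x = Petal.Length, y = Petal.Width)) +
  geom_hex() +
  theme_minimal()  # try theme_bw(), theme_classic() or theme_dark() instead

# Scatter plot coloured by species, with the legend moved below the panel
sepal_plot <- ggplot(data = iris,
                     mapping = aes(x = Sepal.Length, y = Sepal.Width,
                                   colour = Species)) +
  geom_point() +
  theme(legend.position = "bottom")

# Green gridlines: panel.grid covers both major and minor gridlines
green_grid <- sepal_plot +
  theme(panel.grid = element_line(colour = "green"))

# Pie chart with axes, background and gridlines removed by theme_void()
clean_pie <- ggplot(data = diamonds, mapping = aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  theme_void()
```

Print any object to draw it, and swap the theme functions around to compare them.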
Going Beyond
The DataViz battle is your opportunity to ‘go beyond’ so this section will be short this week. All it will contain are some suggested areas to look at to help improve your plot and improve your chances of winning the prize for an innovative plot.
Further Research
- Look at the latest ggplot cheatsheet [here](https://rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf). What geometries have we not mentioned?
- Read the section titled ‘Statistical transformations’ from Chapter 3 of [R4DS](https://r4ds.had.co.nz/data-visualisation.html). This goes into more detail regarding the interplay between stats and geometries. Try creating a bar chart using `stat_count()`
- Look at the help page for `coord_cartesian()`. How can we use this to zoom in on our plot?
- Read [this guide](http://www.sthda.com/english/wiki/ggplot2-axis-scales-and-transformations) on axis scales and transformations in ggplot. In particular, have a look at the `scale_*_date()` family of functions
- Read [this page](https://ggplot2.tidyverse.org/reference/labeller.html) of the ggplot documentation, showing how to set up custom facet labels
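To get started on these research topics, the sketch below touches each of the functions mentioned. The datasets, axis limits, break spacing and facet label text are all illustrative assumptions, so treat it as something to experiment with rather than a finished plot.

```r
library(tidyverse)

# A bar chart built from the stat rather than the geom
count_plot <- ggplot(data = diamonds, mapping = aes(x = cut)) +
  stat_count()

# coord_cartesian() zooms the view without removing any observations
zoom_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 4), ylim = c(20, 35))

# A date axis with a break every ten years, labelled by year
# (economics is a dataset that ships with ggplot2)
date_plot <- ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_line() +
  scale_x_date(date_breaks = "10 years", date_labels = "%Y")

# Custom facet labels supplied as a named lookup vector (label text is made up)
drive_labels <- c("4" = "Four-wheel drive", f = "Front-wheel drive",
                  r = "Rear-wheel drive")
facet_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv, labeller = labeller(drv = drive_labels))
```

Print any object to draw it. Compare `zoom_plot` with a version that sets the limits through `xlim()` and `ylim()` instead, and see how the two treat points outside the range.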