Note: This exercise sheet is much shorter than the others, to leave more time to prepare for the DataViz Battle. If you would like more practice with these topics, have a look at the last part of Chapter 3 of R4DS (https://r4ds.had.co.nz/data-visualisation.html).

Remember, before you can use the tidyverse, you need to load the package.

library(tidyverse)

Missing Pieces

R Scripts

  1. Create an R Script that will produce a histogram of the city mileages in the mpg dataset
  2. Run this script
  3. Why is it important to include library imports at the start of our R scripts?
  4. Create an R Script containing a pipeline that takes the diamonds dataset, groups by color, and summarises the mean of each group
  5. Leave a blank line and then add code to create a bar chart of the classes in mpg
  6. Place your cursor on the first line of the script. What happens when you press Ctrl+Enter? What code is executed? Where does your cursor move to?

Saving output

  1. Write the head of the iris dataset to a CSV file called ‘iris.csv’
  2. Now, append the tail of the same dataset to the same CSV file
  3. Instead, overwrite the existing CSV with just the head
  4. What happens if you try to write a CSV to a directory (folder) that doesn’t exist?
  5. Create a jittered scatter plot of highway mileage against engine size for the mpg dataset. Use ggsave() to save the output as a PNG image
  6. Create a boxplot of price for each cut in the diamonds dataset. Save this as an A5 PDF. You may find the units = "cm" argument useful.
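
As a hint for the exercises above, here is a minimal sketch of one possible approach rather than a model answer. It assumes readr's write_csv() for the CSV steps and writes everything to a temporary directory; the names out_dir, csv_path and pdf_path are made up for illustration. The PDF size relies on A5 paper measuring 14.8 cm by 21 cm.

```r
library(readr)
library(ggplot2)

# Hypothetical output location: a temporary directory, so the working directory is untouched
out_dir <- tempdir()
csv_path <- file.path(out_dir, "iris.csv")

# Write the first six rows, creating the file (or replacing it if it exists)
write_csv(head(iris), csv_path)

# Append the last six rows; with append = TRUE no second header row is written
write_csv(tail(iris), csv_path, append = TRUE)
combined <- read.csv(csv_path)

# Calling write_csv() again without append = TRUE overwrites the file
write_csv(head(iris), csv_path)
head_only <- read.csv(csv_path)

# Save a boxplot as a portrait A5 PDF (14.8 cm x 21 cm)
price_by_cut <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()
pdf_path <- file.path(out_dir, "price_by_cut.pdf")
ggsave(pdf_path, plot = price_by_cut, width = 14.8, height = 21, units = "cm")
```

ggsave() picks the graphics device from the file extension, so changing ".pdf" to ".png" in the file name should be enough to produce a PNG instead.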

Advanced Data Visualisation

Statistical Transformations

  1. In our proportional bar chart we had to write group = -1. Why was that needed and what would happen without it?

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = -1))

  2. What are the default stats for geom_line(), geom_histogram(), and geom_density()?

  3. What variables does stat_smooth() compute? What parameters control its behaviour?
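
When exploring these questions it can help to look at the values a stat actually computes. The sketch below is one way to do that, assuming ggplot2 3.3.0 or later, where after_stat(prop) is the newer spelling of ..prop.. and layer_data() returns the data frame a layer ends up drawing. The object names p and computed are only illustrative.

```r
library(ggplot2)

# Same proportional bar chart, written with after_stat() instead of ..prop..
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = -1))

# Extract the data computed by the stat for the first (and only) layer
computed <- layer_data(p)

# Inspect the computed count and prop columns, one row per bar
computed[, c("x", "count", "prop")]

# With every row in a single group, the proportions add up to one
sum(computed$prop)
```

Removing group = -1 and calling layer_data() again is a quick way to see how the computed prop column changes.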

Positional Adjustments

  1. Create a bar chart of the cut variable from the diamonds dataset filled by clarity. Use position = "identity" and a low alpha value, so that the bars are semi-transparent and all of them are visible. Is this a good plot?

  2. Repeat the above plot but now use position = "dodge" and then position = "fill"

  3. Which of the above plots is best for answering questions regarding what distribution the clarities have for each cut?

  4. Create a jitter plot in two ways: using geom_point() with position = "jitter", and using geom_jitter()

  5. What parameters can be used to control the behaviour of geom_jitter()? Instead of using position = "jitter", use position = position_jitter(...) and confirm that you can pass in these same parameters

  6. Compare and contrast the use of geom_jitter() and geom_count() in plotting hwy against cty with the mpg dataset
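
As a starting point for the jitter exercises, the sketch below shows the two spellings side by side. The width, height and seed values are arbitrary choices for illustration, not recommended settings.

```r
library(ggplot2)

# Jitter via an explicit position object; seed makes the random noise reproducible
via_position <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 42))

# The geom_jitter() shortcut takes width and height directly
via_geom <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_jitter(width = 0.2, height = 0)

# Every observation is still drawn; only its position is perturbed
jittered <- layer_data(via_position)
nrow(jittered)
```

Setting height = 0 restricts the noise to the x direction, which leaves the highway mileage values exactly where the data puts them.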

Coordinate Systems

  1. Create a horizontal bar plot of the mean price for each cut for the diamonds dataset. You will need to use group_by(), summarise() and then either geom_col() or geom_bar() with stat = "identity"

  2. Create a bar plot of the cylinder numbers in the mpg dataset and convert it into a Coxcomb plot using coord_polar()

  3. Create a pie chart of any appropriate variable you wish from the diamonds dataset

  4. What does the plot below tell you about the relationship between city and highway mileage in the mpg dataset? Why is coord_fixed() important and what does geom_abline() do?

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() + 
  geom_abline() +
  coord_fixed()
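
For the first exercise in this section, one possible shape for the pipeline is sketched below, using the functions the exercise names plus coord_flip() to turn the bars sideways. The names mean_price_by_cut and mean_price are made up for this example.

```r
library(dplyr)
library(ggplot2)

# Summarise first: one row per cut, holding the mean price of that cut
mean_price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(mean_price = mean(price))

# geom_col() plots the values as given; coord_flip() swaps the x and y axes
horizontal_bars <- ggplot(mean_price_by_cut, aes(x = cut, y = mean_price)) +
  geom_col() +
  coord_flip()

horizontal_bars
```

In ggplot2 3.3.0 and later, mapping cut to y and mean_price to x should give a similar horizontal chart without coord_flip().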

Theming

  1. Create a hex plot of petal length and width with the iris dataset. Apply different themes to this and decide on your favourite

  2. Create a scatter plot of sepal length by width coloured by species. Use theme(legend.position = "bottom") to move the legend. What about "left" and "top"? What does legend.position = "none" do?

  3. Create a plot of your choice and use the theme() function to make the gridlines green

  4. How can we use theming to improve the look of the pie chart we generated earlier?
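
The sketch below illustrates how a complete theme and individual theme() tweaks combine. It is only one arrangement, and the choice of theme_minimal() is arbitrary.

```r
library(ggplot2)

themed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  # A complete theme sets every theme element at once
  theme_minimal() +
  # Individual tweaks go after the complete theme, otherwise it would replace them
  theme(
    legend.position = "bottom",
    panel.grid.major = element_line(colour = "green")
  )

themed
```

The order matters here: adding theme_minimal() after the theme() call would reset the legend position and gridline colour to the complete theme's defaults.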

Going Beyond

The DataViz Battle is your opportunity to ‘go beyond’, so this section is short this week. It only contains some suggested areas to look at that may help you improve your plot and your chances of winning the prize for an innovative plot.

Further Research

  1. Look at the latest ggplot cheatsheet here (https://rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf). What geometries have we not mentioned?

  2. Read the section titled ‘Statistical transformations’ from Chapter 3 of R4DS. This goes into more detail regarding the interplay between stats and geometries. Try creating a bar chart using stat_count()

  3. Look at the help page for coord_cartesian(). How can we use this to zoom in on our plot?

  4. Read this guide on axis scales and transformations in ggplot (http://www.sthda.com/english/wiki/ggplot2-axis-scales-and-transformations). In particular, have a look at the scale_*_date() family of functions

  5. Read this page of the ggplot documentation (https://ggplot2.tidyverse.org/reference/labeller.html), showing how to set up custom facet labels
