Beautiful and informative data visualisation

All the files you need to complete this tutorial can be downloaded from this repository. Click on `Clone or Download/Download ZIP` to download the repo as a zip file, then unzip it.

1. Good data visualisation and ggplot2 syntax

We’ve learned how to import our data in RStudio, format and manipulate them, and now it’s time we talk about communicating the results of our analyses - data visualisation! When it comes to data visualisation, the package `ggplot2` by Hadley Wickham has won over many scientists’ hearts. In this tutorial, we will learn how to make beautiful and informative graphs and how to arrange them in a panel. Before we take on the `ggplot2` syntax, let’s briefly cover what good graphs have in common.

`ggplot2` is a great package to guide you through those steps. The `gg` in `ggplot2` stands for grammar of graphics. Writing the code for your graph is like constructing a sentence made up of different parts that logically follow from one another. In a data visualisation context, the different elements of the code represent layers - first you make an empty plot, then you add a layer with your data points, then your measure of uncertainty, the axis labels and so on.

When using `ggplot2`, you usually start your code with `ggplot(your_data, aes(x = independent_variable, y = dependent_variable))`, then you add the type of plot you want to make using `+ geom_boxplot()`, `+ geom_histogram()`, etc. `aes` stands for aesthetics, hinting at the fact that using `ggplot2` you can make aesthetically pleasing graphs - there are many `ggplot2` functions to help you clearly communicate your results, and we will now go through some of them.
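To make the idea of layers concrete, here is a minimal sketch that builds a plot one layer at a time. It uses `mtcars`, a dataset built into R, rather than the tutorial data, and the object names (`p_empty`, `p_points` and so on) are made up for illustration.

```r
library(ggplot2)

# 1. Empty plot: data and aesthetics only, no layers yet
p_empty <- ggplot(mtcars, aes(x = wt, y = mpg))

# 2. Add a layer with the data points
p_points <- p_empty + geom_point()

# 3. Add a measure of uncertainty: a linear fit with its confidence band
p_fit <- p_points + geom_smooth(method = "lm")

# 4. Add the axis labels
p_final <- p_fit + labs(x = "Car weight (1000 lbs)", y = "Miles per gallon")

# Display the finished plot
p_final
```

Printing `p_empty`, `p_points` and `p_fit` in turn shows what each `+` contributes.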

2. Making different plots with ggplot2

Open RStudio, select `File/New File/R script` and start writing your script with the help of this tutorial.

``````
# Purpose of the script
# Your name, date and email

# Libraries - if you haven't installed them before, run the code install.packages("package_name")
library(tidyr)
library(dplyr)
library(readr)      # needed for parse_number(), used further down
library(ggplot2)
library(gridExtra)
``````

We will use data from the Living Planet Index, which you have already downloaded from the repository (Click on `Clone or Download/Download ZIP` and then unzip the files).

``````
# Import data from the Living Planet Index - population trends of vertebrate species from 1970 to 2014
``````

The data are in wide format - the different years are column names, when really they should be rows in the same column. We will reshape the data using the `gather()` function from the `tidyr` package.

``````
# Reshape data into long form
# By adding 9:53, we select columns from 9 to 53, the ones for the different years of monitoring
LPI2 <- gather(LPI, "year", "abundance", 9:53)
View(LPI2)
``````

There is an ‘X’ in front of all the years because R column names cannot start with a number: when we imported the data, R put an ‘X’ in front of each year to make the column names syntactically valid. Now that the years are rows, not columns, we need them to be proper numbers, so we will transform them using `parse_number()` from the `readr` package.

``````
LPI2$year <- parse_number(LPI2$year)

# When manipulating data it's always good to check if the variables have stayed how we want them
# Use the str() function
str(LPI2)

# Abundance is also a character variable, when it should be numeric, let's fix that
LPI2$abundance <- as.numeric(LPI2$abundance)
``````
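The ‘X’ prefix is easy to reproduce on a small scale. The sketch below assumes the data are read with `read.csv()` and uses a tiny made-up table (not LPI data, and with hypothetical object names `wide` and `long`) to show the prefix appearing and `parse_number()` stripping it.

```r
library(tidyr)
library(readr)

# A tiny made-up table in wide format: one column per year
wide <- read.csv(text = "species,1970,1971\nvulture,10,12\nheron,5,7")

# The year columns now carry an "X" prefix
names(wide)

# Reshape to long form: columns 2 and 3 are the year columns
long <- gather(wide, "year", "abundance", 2:3)

# Strip the prefix and turn the years into numbers
long$year <- parse_number(long$year)
str(long)
```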

This is a very large dataset, so for the first few graphs we will focus on how the population of one species has changed. Pick a species of your choice and make sure you spell it exactly as it is entered in the dataframe. In this example we are using the “Griffon vulture”, but you can use whatever species you want. To see which species are available, use the following to get a list:

``````
unique(LPI2$Common.Name)
``````

Then keep only the records for that species using the following code, substituting the name of your chosen species for `"Griffon vulture / Eurasian griffon"`.

``````
vulture <- filter(LPI2, Common.Name == "Griffon vulture / Eurasian griffon")

# There are a lot of NAs in this dataframe, so we will get rid of the empty rows using na.omit()
vulture <- na.omit(vulture)
``````

2a. Histograms to visualise data distribution

We will do a quick comparison between base R graphics and `ggplot2` - of course both can make good graphs when used well, but here at Coding Club, we like working with `ggplot2`.

``````
# With base R graphics
base_hist <- hist(vulture$abundance)
# For another way to check whether your data is normally distributed, you can either create density plots using package ggpubr and command ggdensity() OR use functions qqnorm() and qqline()
``````

Note that putting your entire ggplot code in brackets () creates the graph and then shows it in the plot viewer. If you don’t have the brackets, you’ve only created the object, but haven’t visualised it. You would then have to call the object to display it, by typing `vulture_hist` after you’ve created the “vulture_hist” object.

``````
# With ggplot2: creating graph with no brackets
vulture_hist <- ggplot(vulture, aes(x = abundance)) +
  geom_histogram()

# Calling the object to display it in the plot viewer
vulture_hist

# With brackets: you create and display the graph at the same time
(vulture_hist <- ggplot(vulture, aes(x = abundance)) +
    geom_histogram())
``````

The ggplot one is a bit prettier, but the default ggplot settings are not ideal: there is lots of unnecessary grey space behind the histogram, the axis labels are quite small, and the bars blend with each other. So let’s beautify the histogram a bit. This is where the true power of `ggplot2` shines! For more information on ggplot, check out this follow up ggplot tutorial.

``````
(vulture_hist <- ggplot(vulture, aes(x = abundance)) +
    geom_histogram(binwidth = 250, colour = "#8B5A00", fill = "#CD8500") +  # Changing the binwidth and colours
    geom_vline(aes(xintercept = mean(abundance)),                           # Adding a line for mean abundance
               colour = "red", linetype = "dashed", size = 1) +             # Changing the look of the line
    theme_bw() +                                                            # Changing the theme to get rid of the grey background
    ylab("Count\n") +                                                       # Changing the text of the y axis label
    xlab("\nGriffon vulture abundance") +                                   # \n adds a blank line between axis and text
    theme(axis.text = element_text(size = 12),                              # Changing font size of axis labels and title
          axis.title.x = element_text(size = 14, face = "plain"),           # face = "plain" is the default, you can change it to italic, bold, etc.
          panel.grid = element_blank(),                                     # Removing the grey grid lines
          plot.margin = unit(c(1, 1, 1, 1), units = "cm")))                 # Putting a 1 cm margin around the plot

# We can see from the histogram that the data are very skewed - a typical distribution of count abundance data
``````

Figure 1. Histogram of Griffon vulture abundance in populations included in the LPI dataset. Red line shows mean abundance.

Pressing enter after each “layer” of your plot (i.e. indenting it) prevents the code from being one gigantic line and makes it much easier to read.

Learning how to use colourpicker

In the code above, you can see a colour code `colour = "#8B5A00"` - each colour has a code, called a “hex code”, a combination of letters and numbers. You can get the codes for different colours online, from Paint, Photoshop or similar programs, or even from RStudio, which is very convenient! There is an RStudio Colourpicker addin - to install it, run the following code:

``````
install.packages("colourpicker")
``````

To find the code for a colour you like, click on `Addins/Colour picker`.

When you click on `All R colours` you will see lots of different colours you can choose from - a good colour scheme makes your graph stand out, but of course, don’t go crazy with the colours. When you click on `1`, and then on a certain colour, you fill up `1` with that colour, same goes for `2`, `3` - you can add more colours with the `+`, or delete them by clicking the bin. Once you’ve made your pick, click `Done`. You will see a line of code `c("#8B5A00", "#CD8500")` appear - in this case, we just need the colour code, so we can copy that, and delete the rest. Try changing the colour of the histogram you made just now.

2b. Scatter plot to examine how Griffon vulture populations have changed between 1970 and 2014 in Croatia and Italy

``````
# Filtering the data to get records only from Croatia and Italy using the `filter()` function from the `dplyr` package
vultureITCR <- filter(vulture, Country.list %in% c("Croatia", "Italy"))

# Using default base graphics
# Note: the two colours are recycled across the points in row order, they are not matched to the countries
plot(vultureITCR$year, vultureITCR$abundance, col = c("#1874CD", "#68228B"))

# Using default ggplot2 graphics
(vulture_scatter <- ggplot(vultureITCR, aes(x = year, y = abundance, colour = Country.list)) +
    geom_point())
``````

Hopefully by now we’ve convinced you of the perks of ggplot2, but again like with the histogram, the graph above needs a bit more work. You might have noticed that sometimes we have the `colour =` argument surrounded by `aes()` and sometimes we don’t. If you are designating colours based on a certain variable in your data, like here `colour = Country.list`, then that goes in the `aes()` argument. If you just want to give the lines, dots or bars a certain colour, then you can use e.g. `colour = "blue"` and that does not need to be surrounded by `aes()`.

``````
(vulture_scatter <- ggplot(vultureITCR, aes(x = year, y = abundance, colour = Country.list)) +
    geom_point(size = 2) +                                               # Changing point size
    geom_smooth(method = "lm", aes(fill = Country.list)) +               # Adding linear model fit, colour-code by country
    theme_bw() +
    scale_fill_manual(values = c("#EE7600", "#00868B")) +                # Adding custom colours
    scale_colour_manual(values = c("#EE7600", "#00868B"),                # Adding custom colours
                        labels = c("Croatia", "Italy")) +                # Adding labels for the legend
    ylab("Griffon vulture abundance\n") +
    xlab("\nYear") +
    theme(axis.text.x = element_text(size = 12, angle = 45, vjust = 1, hjust = 1),  # Putting the years at a bit of an angle
          axis.text.y = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),
          panel.grid = element_blank(),                                  # Removing the background grid lines
          plot.margin = unit(c(1, 1, 1, 1), units = "cm"),               # Adding a 1 cm margin around the plot
          legend.text = element_text(size = 12, face = "italic"),        # Setting the font for the legend text
          legend.title = element_blank(),                                # Removing the legend title
          legend.position = c(0.9, 0.9)))                                # Setting legend position - 0 is left/bottom, 1 is top/right
``````

Figure 2. Population trends of Griffon vulture in Croatia and Italy. Data points represent raw data with a linear model fit and 95% confidence intervals. Abundance is measured in number of breeding individuals.
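The difference between mapping a colour to a variable and setting a fixed colour can be seen side by side in the sketch below. It uses `iris`, a dataset built into R, rather than the vulture data, and the object names `mapped` and `fixed` are made up for illustration.

```r
library(ggplot2)

# Mapping: colour depends on a variable, so it goes inside aes() and gets a legend
mapped <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point()

# Setting: one fixed colour for every point, so it stays outside aes()
fixed <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(colour = "blue")

mapped
fixed
```

Putting `colour = "blue"` inside `aes()` by mistake would treat "blue" as a data value rather than a colour name, which is a common source of confusion.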

If your axis labels need to contain fancy characters or superscript, you can get `ggplot2` to plot that, too. It might require some googling regarding your specific case, but for example, the code `ylab(expression(paste("Grain yield", " ", "(ton.", ha^-1, ")")))` will create a y axis label reading Grain yield (ton.ha^-1), with the -1 shown as a superscript.
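As a self-contained illustration of such a label, the sketch below attaches it to a small made-up data frame (`yields`, with hypothetical values that are not from the tutorial data).

```r
library(ggplot2)

# Made-up yields for illustration only
yields <- data.frame(year = 2001:2005,
                     yield = c(2.1, 2.4, 2.2, 2.8, 3.0))

# Inside expression(), paste() joins the pieces and ^ makes a superscript
y_label <- expression(paste("Grain yield (ton.", ha^-1, ")"))

(yield_plot <- ggplot(yields, aes(x = year, y = yield)) +
    geom_point() +
    ylab(y_label))
```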

2c. Boxplot to examine whether vulture abundance differs between Croatia and Italy

``````
(vulture_boxplot <- ggplot(vultureITCR, aes(Country.list, abundance)) + geom_boxplot())

# Beautifying

(vulture_boxplot <- ggplot(vultureITCR, aes(Country.list, abundance)) +
    geom_boxplot(aes(fill = Country.list)) +
    theme_bw() +
    scale_fill_manual(values = c("#EE7600", "#00868B")) +               # Adding custom colours
    scale_colour_manual(values = c("#EE7600", "#00868B")) +             # Adding custom colours
    ylab("Griffon vulture abundance\n") +
    xlab("\nCountry") +
    theme(axis.text = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),
          panel.grid = element_blank(),                                 # Removing the background grid lines
          plot.margin = unit(c(1, 1, 1, 1), units = "cm"),              # Adding a margin
          legend.position = "none"))                                    # Removing legend - not needed with only 2 factor levels
``````

Figure 3. Griffon vulture abundance in Croatia and Italy.

2d. Barplot to examine the species richness of a few European countries

``````
# Calculating species richness using pipes %>% from the dplyr package
richness <- LPI2 %>%
  filter(Country.list %in% c("United Kingdom", "Germany", "France", "Netherlands", "Italy")) %>%
  group_by(Country.list) %>%
  mutate(richness = length(unique(Common.Name)))  # create new column based on how many unique common names (or species) there are in each country

(richness_barplot <- ggplot(richness, aes(x = Country.list, y = richness)) +
    geom_bar(position = position_dodge(), stat = "identity", colour = "black", fill = "#00868B") +
    theme_bw() +
    ylab("Species richness\n") +
    xlab("Country") +
    theme(axis.text.x = element_text(size = 12, angle = 45, vjust = 1, hjust = 1),  # Angled labels, so text doesn't overlap
          axis.text.y = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),
          panel.grid = element_blank(),
          plot.margin = unit(c(1, 1, 1, 1), units = "cm")))
``````

Figure 4. Species richness in five European countries. Based on LPI data.
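The `mutate()` call above keeps every row of the original data, so the same richness value is repeated for each record. If you would rather have one row per country, a possible alternative is `summarise()` with `n_distinct()` from `dplyr`. The sketch below uses a small made-up data frame (`records`, not LPI data) to show the idea.

```r
library(dplyr)

# Made-up records for illustration only
records <- data.frame(
  Country.list = c("Italy", "Italy", "Italy", "France", "France"),
  Common.Name  = c("Heron", "Heron", "Vulture", "Otter", "Otter")
)

# One row per country, with the number of distinct species names
richness_summary <- records %>%
  group_by(Country.list) %>%
  summarise(richness = n_distinct(Common.Name))

richness_summary
```

A summary table like this can be passed to `geom_bar(stat = "identity")` in the same way, with one bar drawn per row.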

You might be picking up on the fact that we repeat a lot of the same code - same font size, same margins, etc. Less repetition makes for tidier code and it’s important to have consistent formatting across graphs for the same project, so please check out our tutorial on writing your own functions to learn how to use functions and loops. We also have some guidance on making your own `ggplot2` theme in this second follow up tutorial - you can now reuse this theme in all your ggplots!
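As a rough idea of what such a reusable theme could look like, the sketch below wraps the settings repeated in the plots above into one function. The name `theme_tutorial` is hypothetical, and the example plot uses the built-in `mtcars` data rather than the LPI data.

```r
library(ggplot2)

# A hypothetical theme function collecting the settings used throughout this tutorial
theme_tutorial <- function() {
  theme_bw() +
    theme(axis.text = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),
          panel.grid = element_blank(),
          plot.margin = unit(c(1, 1, 1, 1), units = "cm"))
}

# Reusing the theme: one line instead of five
(example_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    theme_tutorial())
```

Any plot-specific tweaks, such as angled axis text or a legend position, can still be added with a further `+ theme(...)` after `theme_tutorial()`.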

Arranging plots in a panel using `grid.arrange()` from the package `gridExtra`

``````
grid.arrange(vulture_hist, vulture_scatter, vulture_boxplot, ncol = 1)

# This doesn't look right - the graphs are too stretched, the legend and text are all messed up, the white margins are too big
# Fixing the problems - adding ylab() again overrides the previous settings

(panel <- grid.arrange(
  vulture_hist + ggtitle("(a)") + ylab("Count") + xlab("Abundance") +   # adding labels to the different plots
    theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = "cm")),
  vulture_boxplot + ggtitle("(b)") + ylab("Abundance") + xlab("Country") +
    theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = "cm")),
  vulture_scatter + ggtitle("(c)") + ylab("Abundance") + xlab("Year") +
    theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = "cm")) +
    theme(legend.text = element_text(size = 12, face = "italic"),
          legend.title = element_blank(),
          legend.position = c(0.85, 0.85)),  # changing the legend position so that it fits within the panel
  ncol = 1))  # ncol determines how many columns you have
``````

If you want to change the relative width or height of any of your plots, you can add either `widths = c(1, 1, 1)` (one value per column) or `heights = c(0.8, 0.8, 0.8)` (one value per row), for example, as an extra argument inside your `grid.arrange()` call. This is helpful when you have different sized figures or if you want to highlight the most important figure in your panel.
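In `gridExtra`, relative sizes are passed as the `widths` and `heights` arguments of `grid.arrange()`. Here is a minimal sketch with two plots stacked in one column, where the second plot gets twice the height of the first. It uses the built-in `mtcars` data and made-up object names (`p1`, `p2`, `two_rows`).

```r
library(ggplot2)
library(gridExtra)

p1 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2)
p2 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# arrangeGrob() builds the layout without drawing it, which is handy for saving later
two_rows <- arrangeGrob(p1, p2, ncol = 1, heights = c(1, 2))

# grid.arrange() builds the same layout and draws it straight away
grid.arrange(p1, p2, ncol = 1, heights = c(1, 2))
```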

To get around the too stretched/too squished panel problems, we will save the file and give it exact dimensions using `ggsave` from the `ggplot2` package. The default `width` and `height` are measured in inches. If you want to swap to pixels or centimetres, you can add `units = "px"` or `units = "cm"` inside the `ggsave()` brackets, e.g. `ggsave(object, filename = "mymap.png", width = 1000, height = 1000, units = "px")`. The file will be saved to wherever your working directory is, which you can check by running `getwd()` in the console.

``````
ggsave(filename = "vulture_panel2.png", plot = panel, width = 5, height = 12)
``````
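To try out the `units` argument without touching your working directory, the sketch below saves a small example plot to a temporary folder with its size given in centimetres. The plot, the file name `example_plot.png` and the object `out_file` are made up for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Save into R's temporary directory rather than the working directory
out_file <- file.path(tempdir(), "example_plot.png")

# 10 cm wide by 8 cm high
ggsave(filename = out_file, plot = p, width = 10, height = 8, units = "cm")

# Check that the file was written
file.exists(out_file)
```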

Figure 5. Examining Griffon vulture populations from the LPI dataset. (a) shows histogram of abundance data distribution, (b) shows a boxplot comparison of abundance in Croatia and Italy, and (c) shows population trends between 1970 and 2014 in Croatia and Italy.

A team figure beautification challenge

To practice making graphs, open the `Graph_challenge.R` script file that you unzipped from the repository at the start of this tutorial and follow the instructions. Once you have made your figures, please upload them to this Google Drive folder.

Check out this page to learn how you can get involved! We are very happy to have people use our tutorials and adapt them to their needs. We are also very keen to expand the content on the website, so feel free to get in touch if you’d like to write a tutorial!