Beautiful and informative data visualisation


Tutorial Aims:

1. Get familiar with the ggplot2 syntax

2. Practice making different plots with ggplot2

3. Learn to arrange graphs in a panel and to save files

All the files you need to complete this tutorial can be downloaded from this repository. Clone the repo, or download it as a zip file and unzip it.

1. Good data visualisation and ggplot2 syntax

We’ve learned how to import our data in RStudio, format and manipulate them, and now it’s time we talk about communicating the results of our analyses - data visualisation! When it comes to data visualisation, the package ggplot2 by Hadley Wickham has won over many scientists’ hearts. In this tutorial, we will learn how to make beautiful and informative graphs and how to arrange them in a panel. Before we take on the ggplot2 syntax, let’s briefly cover what good graphs have in common.


ggplot2 is a great package to guide you through those steps. The gg in ggplot2 stands for grammar of graphics. Writing the code for your graph is like constructing a sentence made up of different parts that logically follow from one another. In a data visualisation context, the different elements of the code represent layers - first you make an empty plot, then you add a layer with your data points, then your measure of uncertainty, the axis labels and so on.

When using ggplot2, you usually start your code with ggplot(your_data, aes(x = independent_variable, y = dependent_variable)), then you add the type of plot you want to make using + geom_boxplot(), + geom_histogram(), etc. aes stands for aesthetics, the mappings between variables in your data and visual properties such as position and colour, and also a hint that with ggplot2 you can make aesthetically pleasing graphs. There are many ggplot2 functions to help you clearly communicate your results, and we will now go through some of them.
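
The layering idea can be sketched with a tiny made-up data frame. The `toy` object and its columns are invented for illustration and are not part of the tutorial data:

```r
library(ggplot2)

# Hypothetical toy data, invented purely for illustration
toy <- data.frame(year = 2001:2010,
                  count = c(12, 15, 14, 18, 21, 19, 24, 26, 25, 30))

# Step 1: an empty plot that only knows the data and the axes
p_empty <- ggplot(toy, aes(x = year, y = count))

# Step 2: add a layer with the data points
p_points <- p_empty + geom_point()

# Step 3: add a linear fit with its confidence band, plus axis labels
p_full <- p_points +
  geom_smooth(method = "lm") +
  labs(x = "Year", y = "Count")

p_full
```

Each `+` adds one more part to the "sentence", so you can build a plot up step by step and look at it after every addition.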

2. Making different plots with ggplot2

Open RStudio, select File/New File/R script and start writing your script with the help of this tutorial.

# Purpose of the script
# Your name, date and email

# Your working directory, set to the folder you just downloaded from Github, e.g.:

# Libraries - if you haven't installed them before, run the code install.packages("package_name")
library(tidyr)
library(dplyr)
library(ggplot2)
library(readr)
library(gridExtra)

We will use data from the Living Planet Index, which you have already downloaded from the repository (Click on Clone or Download/Download ZIP and then unzip the files).

# Import data from the Living Planet Index - population trends of vertebrate species from 1970 to 2014
LPI <- read.csv("LPIdata_CC.csv")

The data are in wide format - the different years are column names, when really they should be rows in the same column. We will reshape the data using the gather() function from the tidyr package.

# Reshape data into long form
# By adding 9:53, we select columns from 9 to 53, the ones for the different years of monitoring
LPI2 <- gather(LPI, "year", "abundance", 9:53)

There is an ‘X’ in front of all the years because valid column names in R cannot start with a digit, so when we imported the data, R added an ‘X’ to turn the years into valid names. Now that the years are rows, not columns, we need them to be proper numbers, so we will transform them using parse_number() from the readr package.

LPI2$year <- parse_number(LPI2$year)

# When manipulating data it's always good to check if the variables have stayed how we want them
# Use the str() function
str(LPI2)

# Abundance is also a character variable, when it should be numeric, let's fix that
LPI2$abundance <- as.numeric(LPI2$abundance)
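
To see the whole reshaping step on something small, here is a sketch using a made-up wide data frame. The `wide` object and its values are invented, and the column range 2:3 applies to this toy example only:

```r
library(tidyr)
library(readr)

# R prefixes an "X" when a column name would otherwise start with a digit
make.names("1970")

# Hypothetical wide data: one row per population, one column per year
wide <- data.frame(id = c("a", "b"),
                   X1970 = c("10", "20"),
                   X1971 = c("12", NA),
                   stringsAsFactors = FALSE)

# Columns 2:3 hold the yearly counts, so those are the ones we gather
long <- gather(wide, "year", "abundance", 2:3)

# Strip the "X" from the years and convert both columns to numbers
long$year <- parse_number(long$year)
long$abundance <- as.numeric(long$abundance)

# Check that the variables now have the types we expect
str(long)
```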

This is a very large dataset, so for the first few graphs we will focus on how the population of one species has changed. Pick a species of your choice and make sure you spell its name exactly as it is entered in the dataframe. In this example we are using the “Griffon vulture”, but you can use whatever species you want. To see which species are available, use the following to get a list:

unique(LPI2$Common.Name)

Then keep just the records for that species using the following code, substituting the name in quotes with the common name of your chosen species.

vulture <- filter(LPI2, Common.Name == "Griffon vulture / Eurasian griffon")

# There are a lot of NAs in this dataframe, so we will get rid of the empty rows using na.omit()
vulture <- na.omit(vulture)
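
It is worth knowing that na.omit() removes every row that has an NA in any column, not just in abundance. A small sketch with made-up records (the `records` data frame is invented for illustration) shows both steps:

```r
library(dplyr)

# Hypothetical records, invented for illustration
records <- data.frame(Common.Name = c("Griffon vulture", "Griffon vulture", "Red kite"),
                      year = c(1990, 1991, 1990),
                      abundance = c(40, NA, 7))

# Keep one species, matching the spelling in the data exactly
one_species <- filter(records, Common.Name == "Griffon vulture")

# na.omit() drops every row that has an NA in any column
one_species <- na.omit(one_species)
one_species
```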

2a. Histograms to visualise data distribution

We will do a quick comparison between base R graphics and ggplot2 - of course both can make good graphs when used well, but here at Coding Club, we like working with ggplot2.

# With base R graphics
base_hist <- hist(vulture$abundance)
# For another way to check whether your data is normally distributed, you can either create density plots using package ggpubr and command ggdensity() OR use functions qqnorm() and qqline()

Note that putting your entire ggplot code in brackets () creates the graph and then shows it in the plot viewer. If you don’t have the brackets, you’ve only created the object, but haven’t visualised it. You would then have to call the object to display it, by typing vulture_hist after you’ve created the “vulture_hist” object.

# With ggplot2: creating graph with no brackets
vulture_hist <- ggplot(vulture, aes(x = abundance)) +
  geom_histogram()
# Calling the object to display it in the plot viewer
vulture_hist

# With brackets: you create and display the graph at the same time
(vulture_hist <- ggplot(vulture, aes(x = abundance)) +
  geom_histogram())

The ggplot one is a bit prettier, but the default ggplot settings are not ideal: there is lots of unnecessary grey space behind the histogram, the axis labels are quite small, and the bars blend with each other. So let’s beautify the histogram a bit. This is where the true power of ggplot2 shines! For more on ggplot2, check out this follow up ggplot tutorial.

(vulture_hist <- ggplot(vulture, aes(x = abundance)) +
  geom_histogram(binwidth = 250, colour = "#8B5A00", fill = "#CD8500") +    # Changing the binwidth and colours
  geom_vline(aes(xintercept = mean(abundance)),                       # Adding a line for mean abundance
             colour = "red", linetype = "dashed", size=1) +           # Changing the look of the line
    theme_bw() +                                                      # Changing the theme to get rid of the grey background
  ylab("Count\n") +                                                   # Changing the text of the y axis label
  xlab("\nGriffon vulture abundance")  +                              # \n adds a blank line between axis and text
  theme(axis.text = element_text(size = 12),                          # Changing font size of axis labels and title
        axis.title.x = element_text(size = 14, face = "plain"),       # face="plain" is the default, you can change it to italic, bold, etc. 
        panel.grid = element_blank(),                                 # Removing the grey grid lines
        plot.margin = unit(c(1, 1, 1, 1), units = "cm")))                # Putting a 1 cm margin around the plot

# We can see from the histogram that the data are very skewed - a typical distribution of count abundance data


Figure 1. Histogram of Griffon vulture abundance in populations included in the LPI dataset. Red line shows mean abundance.

Pressing enter after each “layer” of your plot (i.e. indenting it) prevents the code from being one gigantic line and makes it much easier to read.

Learning how to use colourpicker

In the code above, you can see a colour code colour = "#8B5A00". Each colour has a code, called a “hex code”, a combination of letters and numbers. You can get the codes for different colours online, from Paint, Photoshop or similar programs, or even from RStudio, which is very convenient! There is an RStudio Colourpicker addin - to install it, run the following code:

install.packages("colourpicker")


To find the code for a colour you like, click on Addins/Colour picker.


When you click on All R colours you will see lots of different colours you can choose from - a good colour scheme makes your graph stand out, but of course, don’t go crazy with the colours. When you click on 1, and then on a certain colour, you fill up 1 with that colour, same goes for 2, 3 - you can add more colours with the +, or delete them by clicking the bin. Once you’ve made your pick, click Done. You will see a line of code c("#8B5A00", "#CD8500") appear - in this case, we just need the colour code, so we can copy that, and delete the rest. Try changing the colour of the histogram you made just now.
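
If you prefer to work out hex codes in the console, base R can convert between red/green/blue values and hex codes. This is a side note rather than part of the Colourpicker workflow, and the object names below are made up:

```r
# Convert red, green and blue values (0-255) into a hex code
hex_code <- rgb(139, 90, 0, maxColorValue = 255)
hex_code   # "#8B5A00"

# Go the other way: split a hex code into its red, green and blue parts
rgb_parts <- col2rgb("#CD8500")
rgb_parts

# Browse the names of R's built-in colours that contain "orange"
head(grep("orange", colours(), value = TRUE))
```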


2b. Scatter plot to examine how Griffon vulture populations have changed between 1970 and 2014 in Croatia and Italy

# Filtering the data to get records only from Croatia and Italy using the `filter()` function from the `dplyr` package
vultureITCR <- filter(vulture, Country.list %in% c("Croatia", "Italy"))

# Using default base graphics
plot(vultureITCR$year, vultureITCR$abundance, col = c("#1874CD", "#68228B"))

# Using default ggplot2 graphics
(vulture_scatter <- ggplot(vultureITCR, aes(x = year, y = abundance, colour = Country.list)) +
    geom_point())

Hopefully by now we’ve convinced you of the perks of ggplot2, but as with the histogram, the graph above needs a bit more work. You might have noticed that sometimes we have the colour = argument surrounded by aes() and sometimes we don’t. If you are designating colours based on a certain variable in your data, like here colour = Country.list, then that goes inside aes(). If you just want to give the lines, dots or bars a certain colour, then you can use e.g. colour = "blue" and that does not need to be surrounded by aes().
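
The difference is easiest to see side by side. Here is a minimal sketch with made-up data (the `toy` data frame and the `site` column are invented for illustration):

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration
toy <- data.frame(year = rep(2001:2005, times = 2),
                  count = c(5, 7, 6, 9, 11, 20, 18, 17, 15, 12),
                  site = rep(c("North", "South"), each = 5))

# Colour mapped to a variable: goes inside aes(), and ggplot2 adds a legend
p_mapped <- ggplot(toy, aes(x = year, y = count, colour = site)) +
  geom_point()

# One fixed colour for every point: goes outside aes(), and there is no legend
p_fixed <- ggplot(toy, aes(x = year, y = count)) +
  geom_point(colour = "blue")

p_mapped
p_fixed
```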

(vulture_scatter <- ggplot(vultureITCR, aes (x = year, y = abundance, colour = Country.list)) +
    geom_point(size = 2) +                                               # Changing point size
    geom_smooth(method = "lm", aes(fill = Country.list)) +               # Adding linear model fit, colour-code by country
    theme_bw() +
    scale_fill_manual(values = c("#EE7600", "#00868B")) +                # Adding custom colours
    scale_colour_manual(values = c("#EE7600", "#00868B"),                # Adding custom colours
                        labels = c("Croatia", "Italy")) +                # Adding labels for the legend
    ylab("Griffon vulture abundance\n") +                             
    xlab("\nYear")  +
    theme(axis.text.x = element_text(size = 12, angle = 45, vjust = 1, hjust = 1),     # making the years at a bit of an angle
          axis.text.y = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),                        
          panel.grid = element_blank(),                                   # Removing the background grid lines               
          plot.margin = unit(c(1, 1, 1, 1), units = "cm"),                # Adding a 1cm margin around the plot
          legend.text = element_text(size = 12, face = "italic"),         # Setting the font for the legend text
          legend.title = element_blank(),                                 # Removing the legend title
          legend.position = c(0.9, 0.9)))                      # Setting legend position - 0 is left/bottom, 1 is top/right


Figure 2. Population trends of Griffon vulture in Croatia and Italy. Data points represent raw data with a linear model fit and 95% confidence intervals. Abundance is measured in number of breeding individuals.

If your axis labels need to contain fancy characters or superscript, you can get ggplot2 to plot that, too. It might require some googling regarding your specific case, but for example, the code ylab(expression(paste("Grain yield", " ", "(ton.", ha^-1, ")"))) will create a y axis label that reads Grain yield (ton.ha-1), with the -1 shown as a superscript.
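
As a sketch of how such a label fits into a full plot, here is an example with made-up yield data (the `yield` data frame is invented for illustration):

```r
library(ggplot2)

# Hypothetical yield data, invented for illustration
yield <- data.frame(plot_id = 1:5, grain = c(2.1, 2.8, 3.0, 2.5, 3.4))

# Inside expression(), ^ raises to a power and paste() joins the pieces
p_yield <- ggplot(yield, aes(x = plot_id, y = grain)) +
  geom_point() +
  ylab(expression(paste("Grain yield (ton.", ha^-1, ")"))) +
  xlab("Plot")

p_yield
```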

2c. Boxplot to examine whether vulture abundance differs between Croatia and Italy

(vulture_boxplot <- ggplot(vultureITCR, aes(Country.list, abundance)) + geom_boxplot())

# Beautifying

(vulture_boxplot <- ggplot(vultureITCR, aes(Country.list, abundance)) + geom_boxplot(aes(fill = Country.list)) +
    theme_bw() +
    scale_fill_manual(values = c("#EE7600", "#00868B")) +               # Adding custom colours
    scale_colour_manual(values = c("#EE7600", "#00868B")) +             # Adding custom colours
    ylab("Griffon vulture abundance\n") +                             
    xlab("\nCountry")  +
    theme(axis.text = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),                     
          panel.grid = element_blank(),                                 # Removing the background grid lines               
          plot.margin = unit(c(1, 1, 1, 1), units = "cm"),              # Adding a margin
          legend.position = "none"))                                    # Removing legend - not needed with only 2 factors


Figure 3. Griffon vulture abundance in Croatia and Italy.

2d. Barplot to examine the species richness of a few European countries

# Calculating species richness using pipes %>% from the dplyr package
richness <- LPI2 %>% filter (Country.list %in% c("United Kingdom", "Germany", "France", "Netherlands", "Italy")) %>%
            group_by(Country.list) %>%
            mutate(richness = (length(unique(Common.Name)))) # create new column based on how many unique common names (or species) there are in each country 

(richness_barplot <- ggplot(richness, aes(x = Country.list, y = richness)) +
    geom_bar(position = position_dodge(), stat = "identity", colour = "black", fill = "#00868B") +
    theme_bw() +
    ylab("Species richness\n") +                             
    xlab("Country")  +
    theme(axis.text.x = element_text(size = 12, angle = 45, vjust = 1, hjust = 1),  # Angled labels, so text doesn't overlap
          axis.text.y = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),                      
          panel.grid = element_blank(),                                          
          plot.margin = unit(c(1, 1, 1, 1), units = "cm")))


Figure 4. Species richness in five European countries. Based on LPI data.
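
The mutate() call above repeats the richness value on every row of the data. If you would rather have a table with one row per country, one possible alternative (a sketch on made-up records, not the approach used in this tutorial) is summarise() with n_distinct():

```r
library(dplyr)

# Hypothetical long-format records, invented for illustration
records <- data.frame(
  Country.list = c("Italy", "Italy", "Italy", "France", "France"),
  Common.Name  = c("Red kite", "Red kite", "Griffon vulture", "Red kite", "Red kite")
)

# One row per country: n_distinct() counts the unique species names
richness_summary <- records %>%
  group_by(Country.list) %>%
  summarise(richness = n_distinct(Common.Name))

richness_summary
```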

You might be picking up on the fact that we repeat a lot of the same code - same font size, same margins, etc. Less repetition makes for tidier code and it’s important to have consistent formatting across graphs for the same project, so please check out our tutorial on writing your own functions to learn how to use functions and loops. We also have some guidance on making your own ggplot2 theme in this second follow up tutorial - you can now reuse this theme in all your ggplots!
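
As a rough sketch of what such a reusable theme could look like (the function name theme_tutorial is made up here, and the settings are simply copied from the plots above):

```r
library(ggplot2)

# Hypothetical helper: bundles the theme settings repeated in the plots above
theme_tutorial <- function() {
  theme_bw() +
    theme(axis.text = element_text(size = 12),
          axis.title = element_text(size = 14, face = "plain"),
          panel.grid = element_blank(),
          plot.margin = unit(c(1, 1, 1, 1), units = "cm"))
}

# Usage with a built-in dataset: one line replaces the whole theme() block
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_tutorial()
```

Changing a font size inside the function then updates every graph that uses it, which keeps the formatting consistent across a project.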

3. Arranging plots in a panel using grid.arrange() from the package gridExtra

grid.arrange(vulture_hist, vulture_scatter, vulture_boxplot, ncol = 1)

# This doesn't look right - the graphs are too stretched, the legend and text are all messed up, the white margins are too big
# Fixing the problems - adding ylab() again overrides the previous settings

(panel <- grid.arrange(vulture_hist + ggtitle("(a)") + ylab("Count") + xlab("Abundance") +   # adding labels to the different plots
                 theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = "cm")),
               vulture_boxplot + ggtitle("(b)") + ylab("Abundance") + xlab("Country") +
                 theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = "cm")),
               vulture_scatter + ggtitle("(c)") + ylab("Abundance") + xlab("Year") +
                 theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = "cm")) +
                 theme(legend.text = element_text(size = 12, face = "italic"),               
                       legend.title = element_blank(),                                   
                       legend.position = c(0.85, 0.85)), # changing the legend position so that it fits within the panel
               ncol = 1)) # ncol determines how many columns you have

If you want to change the width or height of any of your pictures, you can add the widths or heights argument inside your grid.arrange() call, for example `heights = c(0.8, 0.8, 1.2)` next to `ncol = 1`. The values are relative sizes, one per column for widths and one per row for heights. This is helpful when you have different sized figures or if you want to highlight the most important figure in your panel.
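
As a sketch, relative row heights can be passed to grid.arrange() through its heights argument. The two plots below are throwaway examples made from a built-in dataset, and the object names are made up:

```r
library(ggplot2)
library(gridExtra)

# Two throwaway plots from a built-in dataset
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# heights are relative: the second row is twice as tall as the first
panel_demo <- grid.arrange(p1, p2, ncol = 1, heights = c(1, 2))
```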

To get around the too stretched/too squished panel problems, we will save the file and give it exact dimensions using ggsave() from the ggplot2 package. The default width and height are measured in inches. If you want to swap to pixels or centimetres, you can add units = "px" or units = "cm" inside the ggsave() brackets, e.g. ggsave(object, filename = "mymap.png", width = 1000, height = 1000, units = "px"). The file will be saved to wherever your working directory is, which you can check by running getwd() in the console.

ggsave(panel, filename = "vulture_panel2.png", width = 5, height = 12)
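
Here is a small sketch of saving with dimensions given in centimetres. It writes to a temporary folder so that it does not clutter your working directory; the file name and sizes are arbitrary choices for illustration:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Save to a temporary file here; in practice use a path inside your project
out_file <- file.path(tempdir(), "demo_plot.png")
ggsave(filename = out_file, plot = p, width = 12, height = 8, units = "cm", dpi = 300)

# Check that the file was written
file.exists(out_file)
```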


Figure 5. Examining Griffon vulture populations from the LPI dataset. (a) shows histogram of abundance data distribution, (b) shows a boxplot comparison of abundance in Croatia and Italy, and (c) shows population trends between 1970 and 2014 in Croatia and Italy.

A team figure beautification challenge

To practice making graphs, open the Graph_challenge.R script file that you unzipped from the repository at the start of this tutorial and follow the instructions. Once you have made your figures, please upload them to this Google Drive folder.

Check out this page to learn how you can get involved! We are very happy to have people use our tutorials and adapt them to their needs. We are also very keen to expand the content on the website, so feel free to get in touch if you’d like to write a tutorial!

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
