Beautiful and informative data visualisation
By Gergana on January 29, 2017

Tutorial Aims:
1. Get familiar with the ggplot2 syntax
2. Practice making different plots with ggplot2
3. Learn to arrange graphs in a panel and to save files
Note : all the files you need to complete this tutorial can be downloaded from this repository. Clone and download the repo as a zip file, then unzip it.
1. Good data visualisation and ggplot2 syntax
We’ve learned how to import our data in RStudio, format and manipulate them, and now it’s time we talk about communicating the results of our analyses - data visualisation! When it comes to data visualisation, the package ggplot2
by Hadley Wickham has won over many scientists’ hearts. In this tutorial, we will learn how to make beautiful and informative graphs and how to arrange them in a panel. Before we take on the ggplot2
syntax, let’s briefly cover what good graphs have in common.

ggplot2
is a great package to guide you through those steps. The gg
in ggplot2
stands for grammar of graphics. Writing the code for your graph is like constructing a sentence made up of different parts that logically follow from one another. In a data visualisation context, the different elements of the code represent layers - first you make an empty plot, then you add a layer with your data points, then your measure of uncertainty, the axis labels and so on.
When using ggplot2
, you usually start your code with ggplot(your_data, aes(x=independent_variable, y=dependent_variable))
, then you add the type of plot you want to make using + geom_boxplot()
, + geom_histogram()
, etc. aes
stands for aesthetics, hinting to the fact that using ggplot2
you can make aesthetically pleasing graphs - there are many ggplot2
functions to help you clearly communicate your results, and we will now go through some of them.
2. Making different plots with ggplot2
Open RStudio, select File/New File/R script
and start writing your script with the help of this tutorial.
# Purpose of the script
# Your name, date and email
# Your working directory, set to the folder you just downloaded from Github, e.g.:
setwd("~/Downloads/CC-4-Datavis-master")
# Libraries - if you haven't installed them before, run the code install.packages("package_name")
library(tidyr)
library(dplyr)
library(ggplot2)
library(readr)
library(gridExtra)
We will use data from the Living Planet Index, which you have already downloaded from the repository (Click on Clone or Download/Download ZIP
and then unzip the files).
# Import data from the Living Planet Index - population trends of vertebrate species from 1970 to 2014
LPI <- read.csv("LPIdata_CC.csv")
The data are in wide format - the different years are column names, when really they should be rows in the same column. We will reshape the data using the gather()
function from the tidyr
package.
# Reshape data into long form
# By adding 9:53, we select columns from 9 to 53, the ones for the different years of monitoring
LPI2 <- gather(LPI, "year", "abundance", 9:53)
View(LPI2)
There is an ‘X’ in front of all the years because when we imported the data, all column names become characters. R puts an ‘X’ in front of the years to turn the numbers into characters. Now that the years are rows, not columns, we need them to be proper numbers, so we will transform them using parse_number()
from the readr
package.
LPI2$year <- parse_number(LPI2$year)
# When manipulating data it's always good check if the variables have stayed how we want them
# Use the str() function
str(LPI2)
# Abundance is also a character variable, when it should be numeric, let's fix that
LPI2$abundance <- as.numeric(LPI2$abundance)
This is a very large dataset, so for the first few graphs we will focus on how the population of one species has changed. Pick a species of your choice, make sure you spell it the same way as it is entered in the dataframe, in this example we are using the “Griffon vulture”, but you can use whatever species you want. To see what species are available use the following to get a list:
unique(LPI2$Common.Name)
Then filter out just the records for that species using the following code, substituting Common.Name
for your chosen species.
vulture <- filter(LPI2, Common.Name == "Griffon vulture / Eurasian griffon")
head(vulture)
# There are a lot of NAs in this dataframe, so we will get rid of the empty rows using na.omit()
vulture <- na.omit(vulture)
Histogram to visualise data distribution
We will do a quick comparison between base R graphics and ggplot2
- of course both can make good graphs when used well, but here at Coding Club, we like working with ggplot2
.
# With base R graphics
base_hist <- hist(vulture$abundance)
# For another way to check whether your data is normally distributed, you can either create density plots using package ggpubr and command ggdensity() OR use functions qqnorm() and qqline()
# With ggplot2
vulture_hist <- ggplot(vulture, aes(x=abundance)) +
geom_histogram()
# putting your entire ggplot code in () creates the graph and shows it in the plot viewer. Without brackets, you have to call the object, so that it gets displayed, e.g.
vulture_hist
# Here is how it looks with the brackets
(vulture_hist <- ggplot(vulture, aes(x=abundance)) +
geom_histogram())


The ggplot one is a bit prettier, but the default ggplot settings are not ideal, there is lots of unnecessary grey space behind the histogram, the axes labels are quite small, and the bars blend with each other; so lets beautify the histogram a bit. This is where the true power of ggplot2
shines!
(vulture_hist <- ggplot(vulture, aes(x=abundance)) +
geom_histogram(binwidth=250, colour="#8B5A00", fill="#CD8500") + # Changing the binwidth and colours
geom_vline(aes(xintercept=mean(abundance)), # Adding a line for mean abundance
colour="red", linetype="dashed", size=1) + # Changing the look of the line
theme_bw() + # Changing the theme to get rid of the grey background
ylab("Count\n") + # Changing the text of the y axis label
xlab("\nGriffon vulture abundance") + # \n adds a blank line
theme(axis.text.x=element_text(size=12), # Changing font size of axis labels
axis.text.y=element_text(size=12),
axis.title.x=element_text(size=14, face="plain"), # Changing font size of axis titles
axis.title.y=element_text(size=14, face="plain"), # face="plain" changes font type, could also be italic, etc
panel.grid.major.x=element_blank(), # Removing the grey grid lines
panel.grid.minor.x=element_blank(),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank(),
plot.margin = unit(c(1,1,1,1), units = , "cm"))) # Putting a 1 cm margin around the plot
# We can see from the histogram that the data are very skewed - a typical distribution of count abundance data
Figure 1. Histogram of Griffon vulture abundance in populations included in the LPI dataset. Red line shows mean abundance.
Pressing enter after each “layer” of your plot (i.e. indenting it) prevents the code from being one gigantic line and makes it much easier to read.
In the code above you can see a colour code colour="#8B5A00"
- each colour has a code, called a “hex code”, a combination of letters and numbers. You can get the codes for different colours online, from Paint, Photoshop or similar programs, or even from RStudio, which is very convenient! There is an RStudio Colourpicker addin - to install it, run the following code:
install.packages("colourpicker")
To find out what is the code for a colour you like, click on Addins/Colour picker
.

When you click on All R colours
you will see lots of different colours you can choose from - a good colour scheme makes your graph stand out, but of course, don’t go crazy with the colours. When you click on 1
, and then on a certain colour, you fill up 1
with that colour, same goes for 2
, 3
- you can add mode colours with the +
, or delete them by clicking the bin. Once you’ve made your pick, click Done
. You will see a line of code c("#8B5A00", "#CD8500")
appear - in this case, we just need the colour code, so we can copy that, and delete the rest. Try changing the colour of the histogram you made just now.

Scatter plot to examine how Griffon vulture populations have changed between 1970 and 2017 in Croatia and Italy
# Filtering the data to get records only from Croatia and Italy using the `filter()` function from the `dplyr` package
vultureITCR <- filter(vulture, Country.list == c("Croatia", "Italy"))
# Using default base graphics
plot(vultureITCR$year, vultureITCR$abundance, col = vultureITCR$Country.list)
# Using default ggplot2 graphics
(vulture_scatter <- ggplot(vultureITCR, aes (x=year, y=abundance, colour=Country.list)) +
geom_point())


Hopefully by now we’ve convinced you of the perks of ggplot2, but again like with the histogram, the graph above needs a bit more work.
You might have noticed that sometimes we have the colour =
argument surrounded by aes()
and sometimes we don’t. If you are designating colours based on a certain variable in your data, like here colour = Country.list
, then that goes in the aes()
argument. If you just want to give the lines, dots or bars a certain colour, then you can use e.g. colour = "blue"
and that does not need to be surrounded by aes()
.
(vulture_scatter <- ggplot(vultureITCR, aes (x=year, y=abundance, colour=Country.list)) +
geom_point(size=2) + # Changing point size
geom_smooth(method=lm, aes(fill=Country.list)) + # Adding a linear model fit and colour-coding by country
theme_bw() +
scale_fill_manual(values = c("#EE7600", "#00868B")) + # Adding custom colours
scale_colour_manual(values = c("#EE7600", "#00868B"), # Adding custom colours
labels=c("Croatia", "Italy")) + # Adding labels for the legend
ylab("Griffon vulture abundance\n") +
xlab("\nYear") +
theme(axis.text.x=element_text(size=12, angle=45, vjust=1, hjust=1), # making the years at a bit of an angle
axis.text.y=element_text(size=12),
axis.title.x=element_text(size=14, face="plain"),
axis.title.y=element_text(size=14, face="plain"),
panel.grid.major.x=element_blank(), # Removing the background grid lines
panel.grid.minor.x=element_blank(),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank(),
plot.margin = unit(c(1,1,1,1), units = , "cm")) + # Adding a 1cm margin around the plot
theme(legend.text = element_text(size=12, face="italic"), # Setting the font for the legend text
legend.title = element_blank(), # Removing the legend title
legend.position=c(0.9, 0.9))) # Setting the position for the legend - 0 is left/bottom, 1 is top/right
Figure 2. Population trends of Griffon vulture in Croatia and Italy. Data points represent raw data with a linear model fit and 95% confidence intervals. Abundance is measured in number of breeding individuals.
If your axis labels need to contain fancy characters or superscript, you can get ggplot2
to plot that, too. It might require some googling regarding your specific case, but for example, this code ylabs(expression(paste("Grain yield"," ","(ton.", ha^-1,")", sep="")))
will create a y axis with a Grain yield ton. ha^-1 label.
Boxplot to examine whether vulture abundance differs between Croatia and Italy
(vulture_boxplot <- ggplot (vultureITCR, aes(Country.list, abundance)) + geom_boxplot())
# Beautifying
(vulture_boxplot <- ggplot (vultureITCR, aes(Country.list, abundance)) + geom_boxplot(aes(fill=Country.list)) +
theme_bw() +
scale_fill_manual(values = c("#EE7600", "#00868B")) + # Adding custom colours
scale_colour_manual(values = c("#EE7600", "#00868B")) + # Adding custom colours
ylab("Griffon vulture abundance\n") +
xlab("\nCountry") +
theme(axis.text.x=element_text(size=12),
axis.text.y=element_text(size=12),
axis.title.x=element_text(size=14, face="plain"),
axis.title.y=element_text(size=14, face="plain"),
panel.grid.major.x=element_blank(), # Removing the background grid lines
panel.grid.minor.x=element_blank(),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank(),
plot.margin = unit(c(1,1,1,1), units = , "cm"), # Adding a margin
legend.position="none")) # Removing the legend - not needed with only two factors
Figure 3. Griffon vulture abundance in Croatia and Italy.
Barplot to examine the species richness of a few European countries
# Calculating species richness using pipes ``%>%` from the `dplyr` package
richness <- LPI2 %>% filter (Country.list == c("United Kingdom", "Germany", "France", "Netherlands", "Italy")) %>%
group_by(Country.list) %>%
mutate (., richness=(length(unique(Common.Name))))
(richness_barplot <- ggplot(richness, aes(x=Country.list, y=richness)) +
geom_bar(position=position_dodge(), stat="identity", colour="black", fill="#00868B") +
theme_bw() +
ylab("Species richness\n") +
xlab("Country") +
theme(axis.text.x=element_text(size=12, angle=45, vjust=1, hjust=1), # x axis labels angled, so that text doesn't overlap
axis.text.y=element_text(size=12),
axis.title.x=element_text(size=14, face="plain"),
axis.title.y=element_text(size=14, face="plain"),
panel.grid.major.x=element_blank(),
panel.grid.minor.x=element_blank(),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank(),
plot.margin = unit(c(1,1,1,1), units = , "cm")))
Figure 4. Species richness in five European countries. Based on LPI data.
You might be picking up on the fact that we repeat a lot of the same code - same font size, same margins, etc. Less repetition makes for tidier code and it’s important to have consistent formatting across graphs for the same project, so please check out our tutorial on writing your own functions to learn how to make your own ggplot2
theme, that you can re use in all your ggplots!
Arranging plots in a panel using grid.arrange()
from the package gridExtra
grid.arrange(vulture_hist, vulture_scatter, vulture_boxplot, ncol=1)
# This doesn't look right - the graphs are too stretched, the legend and text are all messed up, the white margins are too big
# Fixing the problems - adding ylab() again overrides the previous settings
panel <- grid.arrange(vulture_hist + ggtitle("(a)") + ylab("Count") + xlab("Abundance") + # adding labels to the different plots
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = , "cm")),
vulture_boxplot + ggtitle("(b)") + ylab("Abundance") + xlab("Country") +
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = , "cm")),
vulture_scatter + ggtitle("(c)") + ylab("Abundance") + xlab("Year") +
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), units = , "cm")) +
theme(legend.text = element_text(size=12, face="italic"),
legend.title = element_blank(),
legend.position=c(0.85, 0.85)), # changing the legend position so that it fits within the panel
ncol=1) # ncol determines how many columns you have
To get around the too stretched/too squished panel problems, we will save the file and give it exact dimensions using ``ggsave`.
ggsave(panel, file="vulture_panel2.png", width=5, height=12) # the file is saved in your working directory, find it with getwd()
Figure 5. Examining Griffon vulture populations from the LPI dataset. (a) shows histogram of abundance data distribution, (b) shows a boxplot comparison of abundance in Croatia and Italy, and (c) shows population trends between 1970 and 2014 in Croatia and Italy.
A team figure beautification challenge
To practice making graphs, open the Graph_challenge.R
script file that you unzipped from the repository at the start of this tutorial and follow the instructions. Once you have made your figures, please upload them to this Google Drive folder.
We would love to hear your feedback, please fill out our survey!
You can contact us with any questions on ourcodingclub@gmail.com
Related tutorials:
- - Quantifying and visualising population trends
- - Data visualisation 2
- - Intro to data clustering
- - Working efficiently with large datasets
- - Getting Started with Shiny
- - Beautiful and informative data visualisation
- - Spatial Data and Maps