This challenge will require the use of data manipulation, visualisation and analysis skills, and is the culmination of the Mastering Modelling course stream. You will find here all the instructions you need to complete the challenge.

Challenge outline and objectives

The Isle of May, located off the east coast of Scotland, is a nature reserve home to large colonies of seabirds. A long-term monitoring programme is in place, and every year scientists and volunteers record information about the abundance, breeding success, and diet of seabirds such as puffins, fulmars, kittiwakes, shags, guillemots, and razorbills. There is concern that, with a changing climate, the abundance of sandeels (and the plankton upon which they depend), a favourite food resource for the birds, will decrease or shift in time, reducing their availability during the critical period of breeding and chick rearing.

Your mission will be to analyse the breeding success and other behaviours of seabirds compiled and summarised by the Centre for Ecology and Hydrology to assess the health of these seabird populations. You will look for temporal trends, but also for environmental factors which may influence the breeding of the birds.

Data overview

You will use the following datasets, available to download from the Challenge repository on GitHub. To be able to answer the quiz questions properly, it is important that you use these datasets and not potentially updated versions available through the original providers.

CEH’s Isle of May Long-Term Study data

Climate data from the Met Office

The dataset also contains seasonal averages of these variables. Winter: Dec-Feb, Spring: Mar-May, Summer: June-Aug, Autumn: Sept-Nov. (For winter, year refers to Jan/Feb). Original data link here.

Specific tasks

Here is a detailed list of the tasks you should achieve within this challenge. Remember that a challenge is meant to be, well, challenging, and therefore we are setting you goals but the choice of workflow and functions to achieve them is up to you! We also list the questions that will be asked in the quiz at the end to confirm your successful completion - we suggest you take note of your answers as you go.

You will import the breeding success data, and plot the time series and a line of best fit for each species. Specifically, you should:

Be prepared to answer the questions:

2. Does climate affect breeding success?

There is growing evidence that climate change affects the dynamics of seabird populations, for instance by disrupting the timing and availability of food resources such as sandeels (and the plankton upon which sandeels depend).

You will design a hierarchical model to test for the influence of climate on breeding success. As a first step, you may assume that all species respond in a similar way, and therefore predict seabird breeding success as a function of climate only, treating other factors as possible sources of non-independence in the data.
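To make the idea of "climate as the only fixed effect, other factors as sources of non-independence" more concrete, here is one possible sketch using the lme4 package. Everything in it is an assumption for illustration: the data are simulated, the column names (success, temp, species, year) are hypothetical, and the choice of random intercepts is only one of several structures you might justify for the real data.

```r
library(lme4)

set.seed(42)

# Simulated toy data: six species observed over fifteen years.
# Column names are hypothetical and do not come from the real datasets.
species <- c("puffin", "shag", "razorbill", "kittiwake", "guillemot", "fulmar")
years   <- 2000:2014
toy     <- expand.grid(year = years, species = species, stringsAsFactors = FALSE)

# One temperature value per year, shared by all species
temp_by_year <- setNames(rnorm(length(years), mean = 8, sd = 0.5), years)
toy$temp     <- unname(temp_by_year[as.character(toy$year)])

# Species-level and year-level deviations create non-independence
sp_effect <- setNames(rnorm(length(species), sd = 0.10), species)
yr_effect <- setNames(rnorm(length(years), sd = 0.05), years)

toy$success <- 0.7 - 0.05 * (toy$temp - 8) +
  unname(sp_effect[toy$species]) +
  unname(yr_effect[as.character(toy$year)]) +
  rnorm(nrow(toy), sd = 0.05)

# Climate as the fixed effect; species and year as random intercepts
mod <- lmer(success ~ temp + (1 | species) + (1 | year), data = toy)

fixef(mod)    # estimated intercept and temperature slope
VarCorr(mod)  # variance attributed to species, year and residual
```

Whether year belongs in the random part, and which climate variable to use, are decisions to make and defend with the real data; this sketch does not settle them.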

Specifically, you should:

Be prepared to answer the questions:

Remember that we are working with summarised data rather than the raw data, which limits our modelling options. If we had access to the raw dataset, it would contain counts (integer) of actual fledglings per nest, with a row for each of the hundreds of nests surveyed. With this in mind, have a think about:

3. Dive deeper! (Optional)

The Dive times and depths dataset contains information about the diving behaviour of monitored seabirds. It is a fairly large dataset with some interesting features, and is therefore ideal to test your data manipulation skills. So if you feel like going further in your data wrangling and modelling journey, try to answer the following questions:

The dataset contains the logged start and end time of each dive – to get you started, you’ll need to convert these to POSIXct format and calculate the duration of the interval. Don’t forget to remove the obvious outliers of impossible dive times (very long or negative), probably indicative of logger failure!
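As a starting point, the conversion and filtering described above might look like the sketch below. It runs on a small made-up data frame: the column names (start, end), the timestamp format and the 600-second upper limit are all assumptions for illustration, so check them against the real dataset before reusing anything.

```r
# Toy data with hypothetical column names and timestamp format
dives <- data.frame(
  start = c("2018-01-05 10:00:00", "2018-01-05 10:05:00",
            "2018-01-05 10:10:00", "2018-01-05 11:00:00"),
  end   = c("2018-01-05 10:00:45", "2018-01-05 10:04:30",
            "2018-01-06 10:10:00", "2018-01-05 11:01:10"),
  stringsAsFactors = FALSE
)

# Convert the character timestamps to POSIXct
fmt <- "%Y-%m-%d %H:%M:%S"
dives$start <- as.POSIXct(dives$start, format = fmt, tz = "UTC")
dives$end   <- as.POSIXct(dives$end,   format = fmt, tz = "UTC")

# Dive duration in seconds
dives$duration_s <- as.numeric(difftime(dives$end, dives$start, units = "secs"))

# Drop negative, zero and implausibly long dives.
# The 600 s cut-off is an arbitrary placeholder, not a biological fact.
max_dive_s <- 600
clean <- dives[dives$duration_s > 0 & dives$duration_s <= max_dive_s, ]

clean$duration_s
```

In this toy example the second dive ends before it starts and the third lasts a full day, so only the first and last rows survive the filter.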

For an extra challenge, why don’t you try answering those same questions using a Bayesian framework?

Getting started

Download the challenge repository, which contains all the data you need, and create a new script for your challenge. Refer to this page to make sure you are answering all the questions.

There is no script or code provided for this challenge: how you go about solving the tasks is entirely up to you! You may want to refer to the tutorials listed below (and other online resources).

Finished? Take the quiz!

Once you have a fully working script and have completed the specific tasks, take the quiz.

Help & hints

Here is a list of tutorials that might help you:

Need a hint? The questions below might help.

How do I avoid copying my linear model code for the six different species?

There is a handy package in the tidyverse called broom. We suggest you take a look at the tidy and glance functions. Combined with some of our favourite dplyr functions for grouping, you’ll be unstoppable!
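Here is a minimal sketch of that pattern, fitting one linear model per group and collecting the results in a single table. The data are simulated and the column names (year, species, success) are hypothetical; group_modify is just one of several dplyr ways to apply a function per group.

```r
library(dplyr)
library(broom)

set.seed(1)

# Simulated toy data with hypothetical column names
toy <- expand.grid(year = 2000:2009,
                   species = c("puffin", "shag", "razorbill"),
                   stringsAsFactors = FALSE)
toy$success <- 0.6 + 0.01 * (toy$year - 2000) + rnorm(nrow(toy), sd = 0.05)

# One row per model term and species: estimate, std.error, p.value, ...
slopes <- toy %>%
  group_by(species) %>%
  group_modify(~ tidy(lm(success ~ year, data = .x))) %>%
  ungroup() %>%
  filter(term == "year")

# One row per species with model-level statistics such as r.squared
fit_stats <- toy %>%
  group_by(species) %>%
  group_modify(~ glance(lm(success ~ year, data = .x))) %>%
  ungroup()

slopes
fit_stats
```

The model formula is written once, so adding or removing a species changes nothing in the code.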

How do I bring the climate data into all this?

The first thing you probably want to do is to subset the climate data to the period and variables of interest: the filter function will be your friend here.

Then, find a variable that is shared by both datasets (there’s only one!) and merge or join them together.
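The two steps above can be sketched as follows. Both data frames are made up, and the column names (including the shared key, assumed here to be called year) are placeholders; inspect the real datasets to find the actual names.

```r
library(dplyr)

# Toy breeding data (hypothetical columns)
breeding <- data.frame(
  year    = 2000:2004,
  species = "puffin",
  success = c(0.70, 0.65, 0.72, 0.60, 0.68)
)

# Toy climate data covering a longer period and more variables than needed
climate <- data.frame(
  year        = 1995:2010,
  spring_temp = seq(7, 8.5, length.out = 16),
  summer_rain = seq(200, 230, length.out = 16)
)

# Step 1: keep only the years and variables of interest
climate_sub <- climate %>%
  filter(year >= min(breeding$year), year <= max(breeding$year)) %>%
  select(year, spring_temp)

# Step 2: join on the shared column
combined <- left_join(breeding, climate_sub, by = "year")

combined
```

A left join keeps every breeding record even if a climate value is missing for that year, which makes gaps easy to spot as NA values.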

We love getting your feedback, and will add more hints to this section if you get in touch and tell us where you struggled in this challenge!


We thank all the organisations that provided open access data for this challenge. The dataset licences are as follows:

*Newell, M.; Harris, M.P.; Wanless, S.; Burthe, S.; Bogdanova, M.; Gunn, C.M.; Daunt, F. (2016). The Isle of May long-term study (IMLOTS) seabird annual breeding success 1982-2016. NERC Environmental Information Data Centre. (Dataset). (available under an Open Government Licence)

*Dunn, R.E.; Wanless, S.; Green, J.A.; Harris, M.P.; Daunt, F. (2019). Dive times and depths of auks (Atlantic puffin, common guillemot and razorbill) from the Isle of May outside the seabird breeding season. NERC Environmental Information Data Centre. (Dataset). (available under an Open Government Licence)

*Met Office (2019). Regional time series of monthly, seasonal and annual values. Available from the Met Office Datasets page under an Open Government Licence. Crown Copyright.

Get in touch

Bee in your bonnet? Technical issues? Don't hesitate to get in touch with any questions or suggestions concerning the course. Please keep in mind that this is a brand new course and we are still testing and implementing some features, so if you notice errors or some areas of the site are not working as they should, please tell us!