This challenge will test your data manipulation, visualisation and analysis skills, and is the culmination of the MASTERING MODELLING course stream. Here you will find all the instructions you need to complete it.
Data overview
You will use the following datasets, available to download from the Challenge repository on GitHub. To answer the quiz questions correctly, it is important that you use these datasets rather than any updated versions available from the original providers.
CEH’s Isle of May Long-Term Study data
- Breeding success of seabirds (IMLOTSBSDataset1982-2016.csv): compiled as the number of chicks fledged per nest. Original data: Newell et al. (2016), see Acknowledgements.
- Dive times and depths of auks (IoM_AukDiving.csv): from Isle of May birds outside the breeding season, obtained by fitting the birds with data loggers. Original data: Dunn et al. (2019), see Acknowledgements.
Climate data from the Met Office
- Climate for the East of Scotland (clim_east_Scotland.csv): extracted and compiled by us. It contains:
  - minimum, mean and maximum monthly temperatures (°C)
  - monthly sunshine (hours)
  - monthly rainfall (mm)
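The dataset also contains seasonal averages of these variables: winter (Dec-Feb), spring (Mar-May), summer (Jun-Aug) and autumn (Sep-Nov). For winter, the year refers to January/February, so winter 2000 covers December 1999 to February 2000. Original data: Met Office (2019), see Acknowledgements.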
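The winter convention matters when you line up seasonal climate values with breeding years. As a rough illustration of the logic (sketched in Python rather than the R used in the course, with a hypothetical helper name), each month can be mapped to a season and a "season year" so that December counts towards the following year's winter:

```python
# Hypothetical helper illustrating the season convention described above:
# winter (Dec-Feb) takes the year of its January/February months.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def season_and_year(year: int, month: int) -> tuple[str, int]:
    """Return (season, season_year) for a calendar year and month (1-12)."""
    season = SEASONS[month]
    # December belongs to the winter of the following year.
    season_year = year + 1 if month == 12 else year
    return season, season_year


print(season_and_year(1999, 12))  # ('winter', 2000)
print(season_and_year(2000, 2))   # ('winter', 2000)
print(season_and_year(2000, 6))   # ('summer', 2000)
```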
Specific tasks
Here is a detailed list of the tasks you should complete in this challenge. Remember that a challenge is meant to be, well, challenging: we set you goals, but the choice of workflow and functions to achieve them is up to you! We also list the questions that will be asked in the quiz at the end to confirm that you have completed the challenge successfully; we suggest you note down your answers as you go.
1. Temporal trends in breeding success
You will import the breeding success data and plot the time series and a line of best fit for each species. Specifically, you should:
- Reshape the data into long format for analysis, with a “species” column.
- Create a faceted plot showing the time series and a line of best fit for each species.
- Run a linear regression for each species and extract the slopes, confidence intervals and goodness-of-fit information from these models. (If you cannot find a way to automate this, look at the Help & Hints section: you should not copy and paste your code six times!)
- Create a visualisation of your choice showing the slope estimate and confidence interval for each species, so that it is clear which slopes differ from zero.
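If the breeding success file is in wide format (one column per species, which is what the reshaping step suggests, but check the actual file), reshaping to long format means turning each species column into rows. The Python sketch below shows that logic on made-up values and hypothetical column names; in R, tidyr::pivot_longer() (with values_drop_na = TRUE to drop missing years) does the same in one call.

```python
# Made-up values and hypothetical column names, for illustration only.
wide = [
    {"year": 1982, "species_a": 0.81, "species_b": 0.62},
    {"year": 1983, "species_a": 0.75, "species_b": None},  # None = no data that year
]


def wide_to_long(rows, id_col="year", value_col="breeding_success"):
    """Turn one-column-per-species rows into one row per (year, species),
    skipping missing values."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col or value is None:
                continue  # skip the id column itself and missing values
            long_rows.append({id_col: row[id_col], "species": col, value_col: value})
    return long_rows


for r in wide_to_long(wide):
    print(r)
```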
Be prepared to answer the questions:
- From looking at the plot, which two species seem to have the greatest inter-annual variability in breeding success?
- From the model, which species has/have experienced a significant increase?
- From the model, which species has/have experienced the strongest decrease?
- For which species did you get the best goodness of fit?
2. Does climate affect breeding success?
There is growing evidence that climate change affects the dynamics of seabird populations, for instance by disrupting the timing and availability of food resources such as sandeels (and the plankton upon which sandeels depend).
You will design a hierarchical model to test for the influence of climate on breeding success. As a first approach, you may assume that species show similar responses, and therefore predict seabird breeding success as a function of climate only, while allowing for other factors that may introduce some non-independence in the data.
Specifically, you should:
- Subset your breeding success dataset to exclude shags (if you’ve completed the first section, you probably saw that they do not follow the same trends as the other species).
- Design a random-intercept mixed model to answer the question. Use June maximum temperature (the month when chicks hatch and are reared) as the explanatory variable, but feel free to experiment with other potentially meaningful climate variables.
- Extract and plot the predicted values from the model using the ggeffects package, and overlay the raw data on the graph.
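For reference, a generic random-intercept model with a Gaussian error can be written as below. Here y_ij is breeding success for observation i in group j, x_ij is the climate predictor (for example June maximum temperature), β1 is the fixed-effect slope you are testing, and u_j is a group-level deviation from the overall intercept β0. Which grouping factor(s) j should stand for is part of the challenge, so it is left generic here; with more than one random intercept, you would add one such term per grouping factor. In lme4 formula syntax this corresponds roughly to y ~ x + (1 | group).

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij}, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```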
Be prepared to answer the questions:
- What are your random effects?
- Does June temperature affect the breeding success of seabirds?
Remember that we are working with summarised data rather than raw data, which limits our modelling options. If we had access to the raw dataset, it would contain integer counts of fledglings per nest, with a row for each of the hundreds of nests surveyed. With this in mind, have a think about:
- What data distribution would you use to answer the same question as above?
- What random effect structure would you choose?
3. Dive deeper! (Optional)
The Dive times and depths dataset contains information about the diving behaviour of monitored seabirds. It is a fairly large dataset with some interesting features, and is therefore ideal for testing your data manipulation skills. So if you feel like going further in your data wrangling and modelling journey, try to answer the following questions:
- Does dive depth vary among species, and between males and females of the same species?
- Does dive duration also vary?
The dataset contains the logged start and end times of each dive. To get you started, you’ll need to convert these to POSIXct format and calculate the duration of each interval. Don’t forget to remove obvious outliers with impossible dive durations (negative or very long), which probably indicate logger failure!
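A minimal sketch of the duration step, written in Python for illustration: it parses start and end timestamps, takes the difference in seconds and keeps only positive durations up to a cut-off. Both the timestamp format and the 300-second cut-off are placeholder assumptions, so check the real format in IoM_AukDiving.csv and choose the threshold from your own data exploration. In R, the equivalent steps would use as.POSIXct() with a format argument and difftime(..., units = "secs").

```python
from datetime import datetime

# Assumed timestamp format: check it against the actual columns in IoM_AukDiving.csv.
TIME_FORMAT = "%d/%m/%Y %H:%M:%S"

# Placeholder upper bound (seconds) above which a dive is treated as a logger error.
MAX_PLAUSIBLE_DURATION_S = 300


def dive_duration_seconds(start: str, end: str) -> float:
    """Parse start/end timestamps and return the interval length in seconds."""
    t0 = datetime.strptime(start, TIME_FORMAT)
    t1 = datetime.strptime(end, TIME_FORMAT)
    return (t1 - t0).total_seconds()


def clean_durations(dives, max_s=MAX_PLAUSIBLE_DURATION_S):
    """Keep durations that are strictly positive and no longer than max_s."""
    kept = []
    for start, end in dives:
        d = dive_duration_seconds(start, end)
        if 0 < d <= max_s:
            kept.append(d)
    return kept


# Made-up example dives.
dives = [
    ("01/11/2014 10:00:00", "01/11/2014 10:01:10"),  # 70 s: plausible
    ("01/11/2014 10:05:00", "01/11/2014 10:04:50"),  # negative: dropped
    ("01/11/2014 10:10:00", "01/11/2014 12:10:00"),  # 2 hours: dropped
]
print(clean_durations(dives))  # [70.0]
```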
For an extra challenge, why don’t you try answering those same questions using a Bayesian framework?
Finished? Take the quiz!
Once you have a fully working script and have completed the specific tasks, take the quiz.
Help & hints
Here is a list of tutorials that might help you:
- Intro to model design
- Efficient data manipulation
- Intro to linear mixed models
- Working efficiently with large datasets: this one was not part of the stream, but it has some very useful snippets that might help you run multiple linear models and extract their outputs (wink wink).
How do I avoid copying my linear model code for each of the six species?
There is a handy package in the tidyverse called broom. We suggest you take a look at the tidy and glance functions. Combined with some of our favourite dplyr functions for grouping, you’ll be unstoppable!
How do I bring the climate data into all this?
The first thing you probably want to do is subset the climate data to the period and variables of interest: the filter function will be your friend here (along with select, for choosing columns). Then, find a variable that is shared by both datasets (there’s only one!) and merge or join them together.
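Conceptually, these two steps are a row filter on the climate table followed by a join on the shared key. The Python sketch below uses made-up rows and assumed column names (including the key) purely to illustrate the logic; in R, dplyr's filter(), select() and left_join(), or base merge(), cover the same ground.

```python
# Made-up rows with assumed column names: check the real headers in both files.
climate = [
    {"year": 1981, "jun_tmax": 16.9},
    {"year": 1982, "jun_tmax": 17.4},
    {"year": 1983, "jun_tmax": 16.2},
]
breeding = [
    {"year": 1982, "species": "species_a", "breeding_success": 0.81},
    {"year": 1983, "species": "species_a", "breeding_success": 0.75},
]


def filter_years(rows, first, last, year_col="year"):
    """Keep rows whose year falls within [first, last]."""
    return [r for r in rows if first <= r[year_col] <= last]


def left_join(left, right, key):
    """Attach matching columns from `right` to every row of `left`.

    Assumes at most one row per key in `right`; unmatched rows in `left`
    keep only their own columns."""
    lookup = {r[key]: r for r in right}
    return [{**lookup.get(row[key], {}), **row} for row in left]


joined = left_join(breeding, filter_years(climate, 1982, 2016), key="year")
for r in joined:
    print(r)
```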
We love getting your feedback, and will add more hints to this section if you get in touch and tell us where you struggled in this challenge!
Acknowledgements
We thank all the organisations that provided open access data for this challenge. The dataset licences are as follows:
- Newell, M.; Harris, M.P.; Wanless, S.; Burthe, S.; Bogdanova, M.; Gunn, C.M.; Daunt, F. (2016). The Isle of May long-term study (IMLOTS) seabird annual breeding success 1982-2016. NERC Environmental Information Data Centre. (Dataset). https://doi.org/10.5285/02c98a4f-8e20-4c48-8167-1cd5044c4afe (available under an Open Government Licence)
- Dunn, R.E.; Wanless, S.; Green, J.A.; Harris, M.P.; Daunt, F. (2019). Dive times and depths of auks (Atlantic puffin, common guillemot and razorbill) from the Isle of May outside the seabird breeding season. NERC Environmental Information Data Centre. (Dataset). https://doi.org/10.5285/6ab0ee70-96f8-41e6-a3e3-6f4c31fa5372 (available under an Open Government Licence)
- Met Office (2019). Regional time series of monthly, seasonal and annual values. Available from the Met Office Datasets page under an Open Government Licence. Crown Copyright.
Get in touch
Bee in your bonnet? Technical issues? Don't hesitate to get in touch with any questions or suggestions concerning the course. Please keep in mind that this is a brand-new course and that we are still testing and implementing some features, so if you notice errors or some areas of the site are not working as they should, please tell us!