This challenge will require the use of data manipulation, plotting and linear modelling skills, and is the culmination of the STATS FROM SCRATCH course stream. Scroll for more information on your tasks and how to complete the challenge.

Challenge outline and objectives

Red squirrels, once widespread throughout the UK, have declined sharply in the last century following the introduction of grey squirrels from North America. Most of the remaining populations are now restricted to parts of Scotland and are still threatened by the expansion of grey squirrels, which are more competitive and carry the deadly squirrel pox.

Red squirrels are a protected species and, with conservation efforts from dedicated organisations, are able to maintain strongholds in various parts of Scotland. These organisations also collect information on red and grey squirrel sightings, and we will use these data in the challenge to learn more about red squirrel population trends and habitat preferences.

Data overview

You will use the following datasets, available from the challenge GitHub repository. To be able to answer the quiz questions properly, it is important that you use these datasets and not potentially updated versions available through the original providers.

The Scottish Squirrel Database

squirrels.csv: A dataset of grey and red squirrel observations compiled by the Scottish Wildlife Trust and hosted on the NBN Atlas. The most relevant variables in the dataset for this challenge are:

Forest cover

forestcoverOS.csv: This dataset contains the forest cover (in % and total area) in each OS grid cell. This dataset was created by us*, using:

Fancy a more advanced challenge? Why don’t you try re-creating this dataset yourself? (Best suited to someone with notions of spatial analysis: all you have to do is intersect the files and extract the area.)

Specific tasks

Here is a detailed list of the tasks you should achieve within this challenge. Remember that a challenge is meant to be, well, challenging, and therefore we are setting you goals but the choice of workflow and functions to achieve them is up to you! We also list the questions that will be asked in the quiz at the end to confirm your successful completion - we suggest you take note of your answers as you go.

1. Data manipulation

Clean the squirrel dataset for the last decade, so it’s ready to analyse. Specifically, you should:

Be prepared to answer the question:

To the nearest thousand, how large is your cleaned dataset?

2. Temporal trends

Determine if there is a temporal trend in the number of observations for red and grey squirrels (2008-2017). Specifically, you should:

Be prepared to answer the questions:

Think about the following: what could be the reasons for this trend? Is it ecologically meaningful? Are there any biases in the data to be aware of?

3. Do red and grey squirrels prefer different habitats?

We usually think of grey squirrels as city dwellers, while red squirrels require extensive forest cover. Determine whether recent squirrel counts in 10 km OS grid cells are linked to forest cover in that cell. Specifically, you should:

Be prepared to answer the questions:

4. Re-classify forest cover

Building on the previous point, try turning the forest cover data into a categorical variable, and use the visual representation of your choice to display the median abundance of grey and red squirrels in these classes, and the uncertainty around these measures. Specifically, you should:

Be prepared to answer the question:

How to get started

Download the challenge GitHub repository, which contains all the data you need, and create a new script for your challenge. Refer to this page to make sure you are answering all the questions.

There is no script or code provided for this challenge: how you go about solving the tasks is entirely up to you! You may want to refer to the tutorials listed below (and other online resources).

Finished? Take the quiz!

Once you have a fully working script and have completed the specific tasks, take the quiz.

Help & hints

Here is a list of tutorials that might help you complete this challenge:

Need a hint? Find your question below.

How do I remove unwanted data points?

You can specify a variety of logical statements in the filter() function from {dplyr}.
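
To give a feel for how logical statements combine inside filter(), here is a minimal sketch on a made-up data frame. The object and column names (sightings, species, year, count) are invented for illustration and are not the ones in squirrels.csv, so treat this as a pattern to adapt rather than a solution.

```r
library(dplyr)

# Toy data: names and values are hypothetical, not taken from the challenge files
sightings <- data.frame(
  species = c("Red", "Grey", "Red", "Grey"),
  year    = c(2005, 2010, 2015, 2017),
  count   = c(3, 1, 2, 5)
)

# Conditions separated by a comma must ALL be true (logical AND)
recent_red <- sightings %>%
  filter(year >= 2008, species == "Red")

# %in% keeps rows whose value matches any element of a vector;
# ! negates a condition
recent_not_red <- sightings %>%
  filter(year %in% 2008:2017, !(species == "Red"))
```

Printing recent_red and recent_not_red (or checking them with nrow()) is a quick way to confirm that the filter kept what you expected.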

I can't figure out how to replace NA values with something else.

NA values are something special in R, and there are special functions to handle them. Take a look at the is.na() logical function, and see if you can use it within a mutate call to create a new column based on existing values.

You’ll want mutate() to replace the value in a cell IF the original value was NA, and ELSE you’ll want to keep the original value. Oh, hey, do you know the ifelse() function?
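
The IF/ELSE pattern described above could look something like the sketch below. The data frame, its column names and the replacement value of 1 are all assumptions chosen for illustration; which value makes sense as a replacement depends on your own reading of the data.

```r
library(dplyr)

# Toy data: names and values are hypothetical
obs <- data.frame(
  site  = c("A", "B", "C"),
  count = c(4, NA, 2)
)

# is.na(count) is TRUE where the value is missing;
# ifelse() returns the replacement there and the original value everywhere else
obs_filled <- obs %>%
  mutate(count_filled = ifelse(is.na(count), 1, count))
```

Writing the result to a new column, as here, keeps the original values available so you can check which rows were changed.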

We love getting your feedback, and will add more hints to this section if you get in touch and tell us where you struggled in this challenge!

Acknowledgements

We thank all the organisations that provided open access data for this challenge. The dataset licences are as follows:


Get in touch


Bee in your bonnet? Technical issues? Don't hesitate to get in touch with any questions or suggestions concerning the course. Please keep in mind that this is a brand new course and we are still testing and implementing some features, so if you notice errors or some areas of the site are not working as they should, please tell us!