This challenge will require the use of data manipulation, plotting and mapping skills, and is the culmination of the WIZ OF DATA VIS course stream. Scroll for more information on your tasks and how to complete the challenge.

Challenge outline and objectives

While Scotland is best known for its endless rolling heather hills, it used to be covered in wide swathes of forest. Less than 20% of Scotland is now afforested, and only 4% of the territory consists of native woodlands (Woodland Trust, Scottish Natural Heritage).

The Scottish government has included woodland expansion goals in its Climate Change plan, and several governmental and non-governmental organisations are working towards the creation of new woodlands that will support native species and provide a wider range of ecosystem services than just timber.

You have been asked to provide a report on the extent and structure of some high-priority conservation habitats in national nature reserves (NNR) of Scotland. For selected woodland types, you are required to prepare maps of their distribution in the Cairngorms, Glen Affric, and Trossachs nature reserve areas. You have also been tasked with calculating their respective extent within the reserve boundaries, and some basic biodiversity indices.

Data overview

You will use the following datasets, available from the Challenge repository on GitHub. To be able to answer the quiz questions properly, it is important that you use these datasets and not potentially updated versions available through the original providers.

NOTE: The data files have been saved as RDS objects because of their relatively large size. You can easily read an RDS file in R using the readRDS() function.
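As a minimal sketch of how readRDS() works, the example below writes a small made-up data frame to a temporary RDS file and reads it back. The SPECIES column, the toy values, and the commented-out file path are assumptions for illustration only; check the repository for the actual file locations.

```r
# Toy object standing in for one of the challenge datasets (values are made up)
toy <- data.frame(
  SPECIES   = c("Scots pine", "Birch"),  # hypothetical column name
  ESTIMT_HA = c(12.5, 3.2)
)

# Save to a temporary file so the example runs anywhere
tmp <- tempfile(fileext = ".RDS")
saveRDS(toy, tmp)

# readRDS() returns the stored object, which you assign to a name of your choice
toy_back <- readRDS(tmp)
str(toy_back)

# With the challenge data (the path is an assumption about where you saved the repo):
# species <- readRDS("data/species_structure.RDS")
```

Unlike load(), readRDS() does not create an object in your environment by itself, so remember to assign its result.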

Native Woodland Survey of Scotland

Original data link here and more information about the survey here.

Original data link here.

National Nature Reserves

Original data link here.

About spatial data

Two of the three datasets are shapefiles, which means that they contain geometric information that allows the data to be represented as shapes (polygons), points or lines. But don’t panic! When you import the files into R, you will see that you can preview and manipulate the data much like any other dataframe.

The spatial objects have been saved using the sf package, which allows for integration with the tidyverse: the sf functions are pipe-friendly and you can do pretty much everything to an sf object that you would do to a regular dataframe (e.g. merge with another dataset, subset to some values or conditions, etc). Remember, in the end, a spatial dataset is just like any other dataset, with extra geographic information tucked in one column!

You will not have to do any complex spatial analysis for this, but the instructions will point you in the right direction when functions specific to the sf package might be needed. More hints can be found at the bottom of the page.

Specific tasks

Here is a detailed list of the tasks you should achieve within this challenge. Remember that a challenge is meant to be, well, challenging, and therefore we are setting you goals but the choice of workflow and functions to achieve them is up to you! We also list the questions that will be asked in the quiz at the end to confirm your successful completion - we suggest you take note of your answers as you go.

1. Clean the data

You will need to clean and filter the data to the sites and woodland types of interest. Specifically, you should:

NB: There are 6 more NNRs within the Cairngorms National Park, but these three are large ones within the core of the park, and the only ones we’ll be considering for this analysis.

HINT: Once you have filtered both datasets to only keep the regions and habitats of interest, the best way forward is to create one object that combines the two: i.e. only keep the habitats of interest that are found within the regions of interest. You may need some independent research to figure it out, but only one function from the sf package is required to achieve this. To get you started, know that all sf functions begin with “st_”, and this type of spatial operation is called an intersection.

2. Map the areas of interest

Create a map for each of the three areas (Cairngorms, Trossachs, and Glen Affric) showing the geographical distribution of the priority habitats. Specifically, you should:

HINT: Producing a map is not very different from producing any other plot. The sf package integrates almost seamlessly with ggplot2, so you can use all your favourite ways of selecting colours based on factor levels, adding text and legends, etc. The only difference is that the sf objects are called in your plot through geom_sf.

3. Calculate the proportion of land (in %) covered by each habitat in the three areas.

The total NNR area is found in the column SITE_HA, and the habitat polygon size is contained in the column HECTARES. (Note that there is more than one polygon per habitat type! Think about grouping observations first.)
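One possible way to approach the grouping is sketched below on a made-up table with one row per habitat polygon. The NAME column, the reserve and habitat labels, and all the numbers are assumptions for illustration; only SITE_HA, HECTARES and DOM_HABITA come from the challenge data.

```r
library(dplyr)

# Toy table: one row per habitat polygon (all values are made up)
polys <- data.frame(
  NAME       = c("Reserve A", "Reserve A", "Reserve A", "Reserve B"),  # hypothetical column
  DOM_HABITA = c("Native pinewood", "Native pinewood",
                 "Upland birchwood", "Native pinewood"),
  HECTARES   = c(10, 15, 5, 8),
  SITE_HA    = c(100, 100, 100, 40)
)

cover <- polys %>%
  group_by(NAME, DOM_HABITA) %>%
  summarise(
    habitat_ha = sum(HECTARES),   # add up all polygons of the same habitat
    site_ha    = first(SITE_HA),  # the site area is repeated on every row
    .groups    = "drop"
  ) %>%
  mutate(prop_pct = 100 * habitat_ha / site_ha)

cover
```

If you run this kind of summary on an sf object, it can be quicker to drop the geometry column first with st_drop_geometry(), since the geometry is not needed for the percentages.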

Specifically, you should:

Be prepared to answer the questions:

4. Calculate the species richness and evenness of the three areas.

Species richness simply corresponds to the number of different species in each area. (Tip: all the species information can be found in species_structure.RDS.)

Species evenness is a value between 0 (not even at all) and 1 (perfectly even) indicating how equitably species are represented, abundance-wise (i.e., is there one very dominant species, or are all species found in similar proportions?). One way of calculating this is to divide H’, the Shannon diversity index, by the natural logarithm (ln) of the species richness that you have previously calculated. The Shannon diversity index is calculated as follows:

H’ = -1 * sum of all (p_i * ln(p_i)), where p_i in our case is the proportion of cover of species i (ESTIMT_HA) relative to the cover of all species.
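The calculation can be sketched as follows on a made-up species table. The NAME and SPECIES columns and all the values are assumptions for illustration (only ESTIMT_HA is taken from the challenge data), so treat this as one possible workflow rather than the expected solution.

```r
library(dplyr)

# Toy species table (all values are made up)
species <- data.frame(
  NAME      = c(rep("Reserve A", 4), rep("Reserve B", 3)),  # hypothetical column
  SPECIES   = c("Scots pine", "Birch", "Rowan", "Birch",
                "Scots pine", "Birch", "Rowan"),            # hypothetical column
  ESTIMT_HA = c(50, 20, 10, 20, 10, 10, 10)
)

diversity <- species %>%
  # Total cover per species within each site (a species can appear on several rows)
  group_by(NAME, SPECIES) %>%
  summarise(cover_ha = sum(ESTIMT_HA), .groups = "drop") %>%
  group_by(NAME) %>%
  summarise(
    richness = n_distinct(SPECIES),
    # p_i is the cover of species i divided by the cover of all species in the site
    shannon  = -sum((cover_ha / sum(cover_ha)) * log(cover_ha / sum(cover_ha))),
    evenness = shannon / log(richness)
  )

diversity
```

In this toy data, Reserve B has three species with equal cover, so its evenness comes out as 1, while Reserve A is dominated by one species and scores lower. Note that log() in R is the natural logarithm, and that a species with zero cover would produce NaN here, so you may want to filter those rows out first.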

Specifically, you should:

Be ready to answer the questions:

How to get started

Download the challenge repository, which contains all the data you need, and create a new script for your challenge. Refer to this page to make sure you are answering all the questions.

There is no script or code provided for this challenge: how you go about solving the tasks is entirely up to you! You may want to refer to the tutorials listed below (and other online resources).

Finished? Take the quiz!

Once you have a fully working script and have completed the specific tasks, take the quiz.

Help & hints

Here is a list of tutorials that might help you complete this challenge:


How do I crop the NWSS to just the NNRs I want?

First, make sure that you have filtered both datasets to only keep the 6 habitats and 3 NNRs required. You can do this with the filter function from dplyr. Then, you need to do a spatial operation called an intersection with your two data objects, which will keep only the observations of A found within the boundaries of B. You can achieve this with st_intersection(A, B).
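A minimal sketch of this filter-then-intersect workflow on toy polygons is shown below. The square() helper, the NAME column, and the reserve and habitat labels are all made up for illustration; the real datasets have their own column names and values, which you will need to look up.

```r
library(sf)
library(dplyr)

# Hypothetical helper that builds a square polygon (for toy data only)
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Toy habitat polygons, standing in for the woodland survey
habitats <- st_sf(
  DOM_HABITA = c("Native pinewood", "Upland birchwood", "Wet woodland"),
  geometry   = st_sfc(square(0, 0, 2), square(3, 3, 2), square(10, 10, 1))
)

# Toy reserve boundaries, standing in for the NNR dataset
reserves <- st_sf(
  NAME     = c("Reserve A", "Reserve B"),  # hypothetical column
  geometry = st_sfc(square(1, 1, 3), square(20, 20, 1))
)

# Step 1: keep only the habitats and reserves of interest
habitats_keep <- habitats %>% filter(DOM_HABITA != "Wet woodland")
reserves_keep <- reserves %>% filter(NAME == "Reserve A")

# Step 2: keep only the parts of the habitat polygons that fall inside the reserves
habitats_in_reserves <- st_intersection(habitats_keep, reserves_keep)

habitats_in_reserves
```

The result keeps the attribute columns of both inputs, so each clipped habitat polygon also carries the name of the reserve it falls in. With real data, both objects need to share the same coordinate reference system, which you can check with st_crs().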

How do I plot spatial data?

You can plot sf objects from the comfort of ggplot2.

You can try something like: ggplot() + geom_sf(data = nwss, aes(fill = DOM_HABITA)) + theme_minimal()

Can I calculate the biodiversity metrics for the 3 sites at once?

Of course you can! Think of our favourite dplyr functions group_by() and summarise().

How do I make my colour scheme consistent across plots?

We have a tutorial that shows exactly how to create a custom colour palette.
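As a rough illustration of the idea, one common approach is to store the colours in a named vector and pass it to scale_fill_manual() in every plot, so each habitat keeps the same colour whichever habitats appear on a given map. The habitat labels, colour codes and square() helper below are assumptions made up for this sketch.

```r
library(sf)
library(ggplot2)

# Hypothetical helper that builds a square polygon (for toy data only)
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Toy habitat polygons (labels are made up)
habitats <- st_sf(
  DOM_HABITA = c("Native pinewood", "Upland birchwood"),
  geometry   = st_sfc(square(0, 0, 2), square(3, 1, 2))
)

# Named vector: the names must match the factor levels used for the fill
habitat_colours <- c(
  "Native pinewood"  = "#1B7837",
  "Upland birchwood" = "#D9A441"
)

p <- ggplot() +
  geom_sf(data = habitats, aes(fill = DOM_HABITA), colour = NA) +
  scale_fill_manual(values = habitat_colours, name = "Habitat type") +
  theme_minimal()
```

Printing p displays the map. Reusing habitat_colours in the plots for all three areas keeps the legend consistent between them.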

We love getting your feedback, and will add more hints to this section if you get in touch and tell us where you struggled in this challenge!

Acknowledgements

We thank all the organisations that provided open access data for this challenge. The dataset licences are as follows:


Get in touch


Bee in your bonnet? Technical issues? Don't hesitate to get in touch with any questions or suggestions concerning the course. Please keep in mind that this is a brand new course and we are still testing and implementing some features, so if you notice errors or some areas of the site are not working as they should, please tell us!