This challenge will require the use of data manipulation, plotting and mapping skills, and is the culmination of the WIZ OF DATA VIS course stream. Scroll for more information on your tasks and how to complete the challenge.
Data overview
You will use the following datasets, available from the Challenge repository on GitHub. To be able to answer the quiz questions properly, it is important that you use these datasets and not potentially updated versions available through the original providers.
NOTE: The data files have been saved as RDS objects because of their relatively large size. You can easily read an RDS file in R using the readRDS() function.
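As a minimal sketch of how readRDS() behaves, the example below writes a small toy data frame to a temporary RDS file and reads it back. The toy values and the temporary path are made up for illustration; in the challenge you would point readRDS() at the files downloaded from the repository instead.

```r
# Toy data frame standing in for one of the challenge datasets (values are invented)
toy <- data.frame(
  SPECIES   = c("Scots pine", "Downy birch"),
  ESTIMT_HA = c(1.5, 0.7)
)

# Save to a temporary RDS file, then read it back
path <- tempfile(fileext = ".RDS")
saveRDS(toy, path)
restored <- readRDS(path)

# The restored object is identical to the one that was saved
identical(toy, restored)

# For the challenge, something like the line below should work,
# assuming the file sits in your working directory:
# nwss <- readRDS("NWSS.RDS")
```

For the two spatial files, it is probably safest to load the sf package before working with the imported objects, so that they print and behave as sf objects.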
Native Woodland Survey of Scotland
- NWSS.RDS: a shapefile of all woodland patches in Scotland. The most important variables in the dataset are:
- DOM_HABITA: the main habitat type for the polygon. We will only retain some habitats of interest.
- HECTARES: the area of a given patch
Original data link here and more information about the survey here.
- species_structure.RDS: a spreadsheet containing tree species information from the woodlands. The most important variables in the dataset are:
- SCPTDATA_I: a unique identifier code that allows you to match the observations to the spatial data in NWSS.RDS
- SPECIES: the name of the species recorded
- ESTIMT_HA: the estimated area, in hectares, covered by a given species at this location
Original data link here.
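To illustrate how SCPTDATA_I can link the two NWSS datasets, here is a small sketch using a dplyr join on toy data frames. All values are invented, and plain data frames are used to keep the example short; the same kind of join should also work when the first table is an sf object.

```r
library(dplyr)

# Toy stand-in for the woodland patches (invented values)
patches <- data.frame(
  SCPTDATA_I = c(1, 2, 3),
  DOM_HABITA = c("Native pinewood", "Wet woodland", "Upland oakwood")
)

# Toy stand-in for the species records (invented values)
species <- data.frame(
  SCPTDATA_I = c(1, 1, 2, 4),
  SPECIES    = c("Scots pine", "Downy birch", "Alder", "Rowan"),
  ESTIMT_HA  = c(3.0, 1.0, 2.5, 0.4)
)

# Keep only species records that match a patch, one row per patch-species pair
patch_species <- inner_join(patches, species, by = "SCPTDATA_I")
patch_species
```

An inner join drops patches without species records and species records without a matching patch; left_join() would keep every patch instead.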
National Nature Reserves
- SNH_national_reserves.RDS: a shapefile containing the outlines of Scotland’s NNRs. The most important variables in the dataset are:
- NAME: The name of the reserve
- SITE_HA: The area of the site in hectares
Original data link here.
About spatial data
Two of the three datasets are shapefiles, which means that they contain geometric information that allows the data to be represented as shapes (polygons), points or lines. But don’t panic! When you import the files into R, you will see that you can preview and manipulate the data much like any other dataframe.
The spatial objects have been saved using the sf package which allows for integration with the tidyverse: the sf functions are pipe-friendly and you can pretty much do everything to a sf object that you would do to a regular dataframe (e.g. merge with another dataset, subset to some values or conditions, etc). Remember, in the end, a spatial dataset is just like any other dataset, with extra geographic information tucked in one column!
You will not have to do any complex spatial analysis for this, but the instructions will point you in the right direction when functions specific to the sf package might be needed. More hints can be found at the bottom of the page.
Specific tasks
Here is a detailed list of the tasks you should achieve within this challenge. Remember that a challenge is meant to be, well, challenging, and therefore we are setting you goals but the choice of workflow and functions to achieve them is up to you! We also list the questions that will be asked in the quiz at the end to confirm your successful completion - we suggest you take note of your answers as you go.
1. Clean the data
You will need to clean and filter the data to the sites and woodland types of interest. Specifically, you should:
- Restrict the NWSS observations to the following dominant habitat types:
- Native pinewood
- Upland birchwood
- Upland mixed ashwood
- Upland oakwood
- Wet woodland
- Lowland mixed deciduous woodland
- Restrict the NNR shapefile to the following areas, lump the last three under the same name, and rename as indicated:
- The Great Trossachs Forest (rename to “Trossachs”)
- Glen Affric (leave as such)
- Cairngorms (part of the “Cairngorms” group)
- Mar Lodge Estate (part of the “Cairngorms” group)
- Abernethy (part of the “Cairngorms” group)
NB: There are 6 more NNRs within the Cairngorms National Park, but these three are large ones within the core of the park, and the only ones we’ll be considering for this analysis.
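One possible way to approach the filtering and renaming is sketched below with dplyr on a toy table. The SITE_HA values and the extra reserve are placeholders, and the exact spelling of the names in the real data should be checked before reusing anything like this.

```r
library(dplyr)

# Toy stand-in for the NNR attribute table (SITE_HA values are placeholders)
reserves <- data.frame(
  NAME = c("The Great Trossachs Forest", "Glen Affric", "Cairngorms",
           "Mar Lodge Estate", "Abernethy", "Some Other Reserve"),
  SITE_HA = c(100, 200, 300, 400, 500, 600)
)

cairngorms_group <- c("Cairngorms", "Mar Lodge Estate", "Abernethy")
keep <- c("The Great Trossachs Forest", "Glen Affric", cairngorms_group)

areas <- reserves %>%
  filter(NAME %in% keep) %>%                      # drop reserves we do not need
  mutate(NAME = case_when(
    NAME == "The Great Trossachs Forest" ~ "Trossachs",
    NAME %in% cairngorms_group           ~ "Cairngorms",
    TRUE                                 ~ NAME   # leave Glen Affric as it is
  ))

areas
```

The same pattern, with filter() and %in%, should apply to restricting DOM_HABITA to the six habitat types. If NAME is stored as a factor in your data, converting it with as.character() first may be needed.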
HINT: Once you have filtered both datasets to only keep the regions and habitats of interest, the best way forward is to create one object that combines the two: i.e. only keep the habitats of interest that are found within the regions of interest. You may need some independent research to figure it out, but only one function from the sf package is required to achieve this. To get you started, know that most sf functions begin with “st_”, and this type of spatial operation is called an intersection…
2. Map the areas of interest
Create a map for each of the three areas (Cairngorms, Trossachs, and Glen Affric) showing the geographical distribution of the priority habitats. Specifically, you should:
- Create a colour palette that you will use consistently to refer to the habitat types
- Produce a map for each region, complete with a legend. Be prepared to answer the question:
- What type(s) of priority habitat is (are) found in the Trossachs but not in the other two areas?
HINT: Producing a map is not very different than producing any other plot. The sf package integrates almost seamlessly with ggplot2, so you can use all your favourite ways of selecting colours based on factor levels, adding text and legends, etc. The only difference is that the sf objects are called in your plot through geom_sf.
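As a rough sketch of the idea, the example below builds three toy squares as an sf object and maps them with a named colour palette. The square() helper, the coordinates and the hex codes are all invented for illustration; only the geom_sf() and scale_fill_manual() pattern is the point.

```r
library(sf)
library(ggplot2)

# Named vector: each habitat always maps to the same colour (hex codes are arbitrary)
habitat_palette <- c(
  "Native pinewood"  = "#1B7837",
  "Upland birchwood" = "#E6AB02",
  "Wet woodland"     = "#2C7FB8"
)

# Hypothetical helper that builds a toy square polygon
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(
    c(x, y), c(x + size, y), c(x + size, y + size), c(x, y + size), c(x, y)
  )))
}

# Toy sf object standing in for the habitat polygons of one area
toy_habitats <- st_sf(
  DOM_HABITA = c("Native pinewood", "Upland birchwood", "Wet woodland"),
  geometry   = st_sfc(square(0, 0), square(2, 0), square(1, 2))
)

habitat_map <- ggplot(toy_habitats) +
  geom_sf(aes(fill = DOM_HABITA), colour = NA) +
  scale_fill_manual(values = habitat_palette, name = "Habitat") +
  theme_minimal()

habitat_map
```

Because the palette is a named vector, reusing it in every plot should keep each habitat tied to the same colour even when an area lacks some habitat types.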
3. Calculate the proportion of land (in %) covered by each habitat in the three areas.
The total NNR area is found in the column SITE_HA, and the size of each habitat polygon is in the column HECTARES. (Note that there is more than one polygon per habitat type! Think about grouping observations first.)
Specifically, you should:
- Create a graph of your choice to represent the proportion of each habitat within the three reserves.
Be prepared to answer the questions:
- What type of graph did you create?
- What proportion of Glen Affric is covered in pinewoods?
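The grouping logic for the percentage cover can be sketched as follows on a toy table. The site names and all numbers are invented, and the sketch assumes a plain data frame with one row per habitat polygon; with an sf object you may want to drop the geometry first, for example with st_drop_geometry().

```r
library(dplyr)

# Toy stand-in: one row per habitat polygon, with the area of its site repeated
polygons <- data.frame(
  NAME       = c("Site A", "Site A", "Site A", "Site B"),
  DOM_HABITA = c("Native pinewood", "Native pinewood", "Wet woodland", "Native pinewood"),
  HECTARES   = c(10, 30, 20, 5),
  SITE_HA    = c(200, 200, 200, 50)
)

cover <- polygons %>%
  group_by(NAME, DOM_HABITA) %>%
  summarise(
    habitat_ha = sum(HECTARES),    # total area of this habitat in this site
    site_ha    = first(SITE_HA),   # site area, assumed constant within a site
    .groups    = "drop"
  ) %>%
  mutate(percent_cover = 100 * habitat_ha / site_ha)

cover
```

One caveat to keep in mind: when several reserves are lumped under one name, SITE_HA is no longer constant within the group, so the total area of the group would presumably need to be the sum of its member reserves' areas rather than the first value.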
4. Calculate the species richness and evenness of the three areas.
Species richness simply corresponds to the number of different species in each area. (Tip: all the species information can be found in species_structure.RDS.)
Species evenness is a value between 0 (not even at all) and 1 (perfectly even) indicating how equitably species are represented, abundance-wise (i.e., is there one very dominant species, or are all species found in similar proportions?). A way of calculating this is to divide H’, the Shannon diversity index, by the natural logarithm (ln) of species richness that you have previously calculated. The Shannon diversity index is calculated as such:
$H' = -\sum_i p_i \ln(p_i)$, where $p_i$ in our case is the proportion of species $i$ cover (ESTIMT_HA) relative to the cover of all species.
Specifically, you should:
- Calculate the richness, the Shannon index, and the evenness for all three sites. (Hint: some pipe chains involving our favourite dplyr functions may be useful here!)
- Create a map that visually represents the difference in evenness among the three sites. (Think colour gradient.)
Be ready to answer the questions:
- Which area has the most species?
- Which area has the lowest evenness?
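The richness, Shannon index and evenness calculations described above can be sketched with a dplyr pipe chain on toy data. The site names and cover values are invented, and the sketch assumes that repeated records of a species within a site should be summed before computing proportions.

```r
library(dplyr)

# Toy stand-in for species records already matched to sites (invented values)
records <- data.frame(
  NAME      = c("Site A", "Site A", "Site B", "Site B", "Site B", "Site B"),
  SPECIES   = c("Scots pine", "Birch", "Scots pine", "Scots pine", "Birch", "Rowan"),
  ESTIMT_HA = c(5, 5, 6, 2, 1, 1)
)

metrics <- records %>%
  group_by(NAME, SPECIES) %>%
  summarise(cover_ha = sum(ESTIMT_HA), .groups = "drop") %>%  # one row per species per site
  group_by(NAME) %>%
  mutate(p = cover_ha / sum(cover_ha)) %>%                    # proportion of total cover
  summarise(
    richness = n_distinct(SPECIES),
    shannon  = -sum(p * log(p)),       # log() is the natural logarithm in R
    evenness = shannon / log(richness)
  )

metrics
```

In this toy example, Site A has two equally abundant species and so reaches an evenness of 1, while Site B is dominated by one species and scores lower.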
Finished? Take the quiz!
Once you have a fully working script and have completed the specific tasks, take the quiz.
Help & hints
Here is a list of tutorials that might help you complete this challenge:
Need a hint? See the questions below.
How do I crop the NWSS to just the NNRs I want?
First, make sure that you have filtered both datasets to only keep the 6 habitats and 3 NNR areas required. You can do this with the filter function from dplyr.
Then, you need to do a spatial operation called an intersection with your two data objects, which will keep only the observations of A found within the boundaries of B. You can achieve this with st_intersection(A, B).
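Here is a small sketch of st_intersection() on toy polygons, to show what the operation returns. The square() helper and all coordinates are invented, and no coordinate reference system is set on the toy data; the real datasets come with their own.

```r
library(sf)

# Hypothetical helper that builds a toy square polygon
square <- function(x, y, size) {
  st_polygon(list(rbind(
    c(x, y), c(x + size, y), c(x + size, y + size), c(x, y + size), c(x, y)
  )))
}

# A: two toy woodland patches
woods <- st_sf(
  DOM_HABITA = c("Native pinewood", "Wet woodland"),
  geometry   = st_sfc(square(0, 0, 2), square(10, 10, 2))
)

# B: one toy reserve that overlaps only the first patch
reserve <- st_sf(
  NAME     = "Glen Affric",
  geometry = st_sfc(square(1, 0, 5))
)

# Keep the parts of A that fall inside B; attributes of both are carried over
woods_in_reserve <- st_intersection(woods, reserve)
woods_in_reserve
```

The result keeps only the overlapping part of the first patch, with both DOM_HABITA and NAME attached. sf may print a warning about attributes being assumed spatially constant, which is expected for this kind of operation.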
How do I plot spatial data?
You can plot sf objects from the comfort of ggplot2.
You can try something like: ggplot() + geom_sf(data = nwss, aes(fill = DOM_HABITA)) + theme_minimal()
Can I calculate the biodiversity metrics for the 3 sites at once?
Of course you can! Think of our favourite dplyr functions group_by() and summarise().
How do I make my colour scheme consistent across plots?
We have a tutorial that shows exactly how to create a custom colour palette.
We love getting your feedback, and will add more hints to this section if you get in touch and tell us where you struggled in this challenge!
Acknowledgements
We thank all the organisations that provided open access data for this challenge. The dataset licences are as follows:
- Scottish Natural Heritage (2018). National Nature Reserves. Shapefile available here under Open Government Licence (Crown copyright).
- Forestry Commission (2018). Native Woodland Survey of Scotland (NWSS). Available on the Forestry Commission Open Data portal under Open Government Licence (Crown copyright).
- Forestry Commission (2018). Native Woodland Survey of Scotland (NWSS) - Species structure. Available on the Forestry Commission Open Data portal under Open Government Licence (Crown copyright).
Get in touch
Bee in your bonnet? Technical issues? Don't hesitate to get in touch with any questions or suggestions concerning the course. Please keep in mind that this is a brand new course and we are still testing and implementing some features, so if you notice errors or some areas of the site are not working as they should, please tell us!