Intro to Github for version control


Img

Tutorial Aims:

1. Get familiar with version control

2. Learn to use RStudio and Github together

What is version control?

Version control allows you to keep track of your work and helps you to easily explore what changes you have made, be it data, coding scripts, or papers. You are probably already doing some type of version control, if you save multiple files, such as Dissertation_script_25thFeb.R, Dissertation_script_26thFeb.R, etc. This approach will leave you with tens, or hundreds, of similar files, it makes it rather cumbersome to directly compare different versions, and is not easy to share among collaborators. With version control software such as Git, version control is much smoother and easier to implement. Using an online platform like Github to store your files also means that you have an online back up of your work, so you won’t need to panic when your laptop dies or files mysteriously disappear.

Git uses the command line to perform more advanced actions and we encourage you to look through the extra resources we have added at the end of the tutorial later, to get more comfortable with version control, using the command line and Github. But until then, here we offer a gentle introduction to syncing RStudio and Github, so you can start using version control in minutes.

Please register on the Github website and download and install Git for your operating system.


What is a repository?

You can think of a repository (aka a repo) as a “master folder” - a repository can have folders within it, or be just separate files. In any case, a repository is what holds all of the files associated with a certain project.

Using RStudio and Github together

Log into your Github account and navigate to the repository for this workshop . Click Fork - this means you are making a copy of this repository for yourself - think of it as a working copy.

Img

This is a tiny repo, so forking it will only take a few seconds, note that when working with lots of data, it can take a while. Once the forking is done, you will be automatically redirected to your forked copy of the CC-7-Github repo. Notice that now the repo is located within your own Github account - e.g. where you see “gndaskalova”, you should see your name.

Img

Click Clone or download and if you are on a Windows computer, copy the HTTPS link (that’s the one that automatically appears in the box). If you have a Mac, click Use SSH and copy that link.

Img

Now open RStudio, click File/ New Project/ Version control/ Git and paste the link you copied from Github. Select a directory on your computer - that is where the “local” copy of your repository will be (the online one being on Github).

On some Macs, RStudio will fail to find Git. To fix this open the terminal and type sudo mv /usr/bin/git /usr/bin/git-system. Then open RStudio/ Preferences/ Git/SVN and change Git executable: to /usr/local/bin/git. Then restart RStudio and try again. More information can be found in the README.txt for the Mac git installer.

Img

Once the files have finished copying across, you will notice that a few things about your RStudio session have changed:

Img

You are now ready to start making changes and documenting them through Github! Open the Sample_script.R file - there is a simple task outlined, just so that you can make changes to the script and see how they are reflected on Github. Load in the data file temp_elevation.csv and make a plot that shows how soil temperature changes with elevation. Try to make the plot on your own with ggplot2 and come back to this code if you get stuck.

(temp.el <- ggplot (temp_elevation, aes(x = Elevation.m, y = Soil.temp.mean)) +
  geom_point(colour = "#8B4513") +
  geom_smooth(method = lm, colour = "#8B4513", fill = "#8B4513", alpha = 0.6) +
  labs(x = "Elevation (m)", y = "Mean soil temperature (°C)") +
  theme.clean())

Save your plot in the project directory (that is the directory that it will automatically get saved into) and click File / Save to save the changes you have made to the script. If you click on the Git tab you will see that now there are four files there - a .gitignore file, an .Rproj file, your plot and your script. Tick them all. The .gitignore file tells Git which files to ignore and not put up on Github - you can leave that as it is and not worry about it, since we want all of our files uploaded. The plot file will have an A - you have added that script, whereas the script file has an M - this file you have modified. If you select the script file and click on Diff, you will see the changes you have made. Make sure that all of your files are selected - they are now staged, ready to be commited to Github.

Img

Click on Commit and add in your Commit message - aim to be concise and informative - what did you do? Once you have clicked on Commit, you will get a message about what changes you have made.

Img

You will also see a message saying that your branch is now one commit ahead of the origin/master branch - that is the branch that is on Github - we now need to let Github know about the changes we have made.

Img

It is good practice to always Pull before you Push - Pull means that you are pulling the most recent of the Github repository onto your local branch - this command is especially useful if several people are working within the same repository - imagine there was a second script examining soil pH along this elevation gradient, and your collaborator was working on it the same time as you - you wouldn’t want to “overwrite” their work and cause trouble. In this case, you are the only one working on these files, but it’s still good to develop the practice of pulling before you push. Once you’ve pulled, you’ll see a message that you are already up to date, you can now push! Click on Push, wait for the loading to be over and then click on Close - that was it, you have successfully pushed your work to Github!

Go back to your repository on Github, where you can now see all of your files (your new plot included) online.

Img

Click on your script file and then on History - this is where you can see the different versions of your script - obviously in real life situations you will make many changes as your work progresses - here we just have two. Thanks to Github and version control, you don’t need to save hundreds of almost identical files (e.g. Dissertation_script_25thFeb.R, Dissertation_script_26thFeb.R) - you have one file and by clicking on the different commits, you can see what it looked like at different points in time.

Img

So far we have used version control for a repository that already existed, the Coding Club practice repo, but what about if you want to create a project about your own work and use version control?

Go back to RStudio, click File/ New Project/ New directory/ Empty project. Add a project name, select a filepath for the project and make sure that Create a git repository is clicked. Afterwards you can save your scripts, plots, data files, etc. to your new project directory and follow the same workflow as outlined above - stage your files, commit, pull, push.



Git in the command line

Traditionally, Git uses the command line to perform actions on local Git repositories. In this tutorial we ignored the command line but it is necessary if you want more control over Git. There are several excellent introductory guides on version control using Git, e.g. Prof Simon Mudd’s Numeracy, Modelling and Data management guide, The Software Carpentry guide, and this guide from the British Ecological Society Version Control workshop . We have also created a neat cheatsheet with some basic Git commands and how they fit into the git/github ecosystem. A couple of the commands require hub a wrapper for Git that increases its functionality, but not having this won’t prevent you using the other commands:

Img
Command Is hub required? Origin Destination Description
git fork Y Other Github Personal Github Creates github repo in your personal account from a previously cloned github repo.
git clone git@github.com:user/repo.git N Personal Github Local Creates a local copy of a github repo called "repo" owned by the user "user". This can be copied from github.com.
git add README.md N Working Dir Staging Area Add "README.md" to staging area.
git commit -m "Message" N Staging Area Local Commits changes to files to the local repo with the commit message "Message".
git commit -a -m "Message" N Working Dir Local adds and commits all file changes to the local repo with the commit message "Message".
git pull N Personal Github Local Retrieve any changes from a github repo.
git push N Local Personal Github Sends commited file changes to github repo.
git create Y Local Personal Github Create a github repo with the same name as the local repo.
git merge N NA NA Merge any changes in the named branch with the current branch.
git checkout -b patch1 N NA NA Create a branch called "patch1" from the current branch and switch to it.


If you have any questions about completing this tutorial, please contact us on ourcodingclub@gmail.com

We would love to hear your feedback on the tutorial, whether you did it in the classroom or online:

https://www.surveymonkey.co.uk/r/NXNHYYX

  Subscribe to our mailing list:

Back to blog