Intro to Github for version control


Img

Tutorial Aims:

1. Get familiar with version control

2. Learn to use RStudio and Github together

3. Create your own repository and sync through RStudio

What is version control?

Version control allows you to keep track of your work and helps you to easily explore the changes you have made, be it data, coding scripts, notes, etc. You are probably already doing some type of version control, if you save multiple files, such as Dissertation_script_25thFeb.R, Dissertation_script_26thFeb.R, etc. This approach will leave you with tens, or hundreds, of similar files, it makes it rather cumbersome to directly compare different versions, and is not easy to share among collaborators. With version control software such as Git, version control is much smoother and easier to implement. Using an online platform like Github to store your files means that you have an online back up of your work, which is beneficial for both you and your collaborators.

Git uses the command line to perform more advanced actions and we encourage you to look through the extra resources we have added at the end of the tutorial later, to get more comfortable with Git. But until then, here we offer a gentle introduction to syncing RStudio and Github, so you can start using version control in minutes.

Please register on the Github website and download and install Git for your operating system.


How does GitHub work?

What is a repository?

You can think of a repository (aka a repo) as a “master folder”, everything associated with a specific project should be kept in a repo for that project. Repos can have folders within them, or be just separate files.

You will have a local copy (i.e. on your computer) and an online copy (on GitHub) of all the files in the repository.

The workflow

The GitHub workflow can be summarised by the “commit-pull-push” mantra.

Commit

Once you’ve saved your files, you need to commit them - this means the changes you have made to files in your repo will be saved as a version of the repo, and your changes are now ready to go up on GitHub (the online copy of the repository).

Pull

Now, before you send your changes to Github, you need to pull, i.e. make sure you are completely up to date with the latest version of the online version of the files - other people could have been working on them even if you haven’t.

Push

Once you are up to date, you can push your changes - at this point in time your local copy and the online copy of the files will be the same.

Each file on GitHub has a history, so instead of having many files like Dissertation_1st_May.R, Dissertation_2nd_May.R, you can have only one and by exploring its history, you can see what it looked at different points in time.

For example, here is the history for a repo with an R script inside it, as viewed on Github. Obviously it took me a while to calculate those model predictions!

Img

Using RStudio and Github together

The “commit-pull-push” workflow can be embedded within RStudio using “Projects” and enabling version control for them - we will be doing that shortly in the tutorial.

Log into your Github account and navigate to the repository for this workshop . Click Fork - this means you are making a copy of this repository to your own Github account - think of it as a working copy.

Img

This is a tiny repo, so forking it will only take a few seconds, note that when working with lots of data, it can take a while. Once the forking is done, you will be automatically redirected to your forked copy of the CC-7-Github repo. Notice that now the repo is located within your own Github account - e.g. where you see “gndaskalova”, you should see your name.

Img

Click Clone or download and copy the HTTPS link.

Img

Now open RStudio, click File/ New Project/ Version control/ Git and paste the HTTPS link into the Repository URL:. Select a folder on your computer - that is where the “local” copy of your repository will be (the online one being on Github).

On some Macs, RStudio will fail to find Git. To fix this, first make sure all your work is saved then close R Studio, open up a terminal window by going to Applications/ Utilities/ Terminal then install Homebrew by typing the following, then pressing Enter:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Now enter:

brew install git

and follow any instructions in the terminal window, you may need to enter your Mac’s password or agree to questions by typing yes. Now you should be able to go back to RStudio and repeat the steps above to clone the git repository. We can’t guarantee the above fix for Mac will work forever, some googling may be required.

Once the files have finished copying across, you will notice that a few things about your RStudio session have changed, there is a Git tab in the top right corner of RStudio, and all the files that are in the repo are now on your computer as well.

Img

You are now ready to start making changes and documenting them through Github! Open the Sample_script.R file - there is a simple task outlined, just so that you can make changes to the script and see how they are reflected on Github. Load in the data file temp_elevation.csv using read.csv() and make a plot that shows how soil temperature changes with elevation. Try to make the plot on your own with ggplot2 and come back to this code if you get stuck.

(temp.el <- ggplot (temp_elevation, aes(x = Elevation.m, y = Soil.temp.mean)) +
  geom_point(colour = "#8B4513") +
  geom_smooth(method = lm, colour = "#8B4513", fill = "#8B4513", alpha = 0.6) +
  labs(x = "Elevation (m)", y = "Mean soil temperature (°C)") +
  theme.clean())

Save your plot in the project directory by clicking Export/ Image in the graphics window and click File / Save to save the changes you have made to the script. If you click on the Git tab you will see that now there are three files listed, your plot your script, and a .Rproj file. Tick all three. The plot file and .Rproj file will have an A - indicating that you have “Added” those files, whereas the script file has an M - this file you have “Modified”. If you select the script file and click on Diff, you will see the changes you have made. Make sure that all of the files are selected - they are now staged, ready to be commited to Github.

Img

Click on Commit and add in your commit message - aim to be concise and informative - what did you do? Once you have clicked on Commit, you will get a message about what changes you have made.

Img

If you are making your first ever commit, clicking on Commit may result in an error message - git will tell you that you need to configure your username and email. This is easily done, and you only need to do it once, afterwards you can commit-pull-push at your convenience!

In the top right corner of the RStudio screen, click on More/Shell.

Img

Copy the following code:

git config --global user.email your_email@example.com
# Add the email with which you registered on GitHub and click Enter

git config --global user.name "Your GitHub Username"
# Add your username and click Enter

If it worked fine, there will be no messages, you can close the shell window and do your commit again, this time it will work!

You will see a message saying that your branch is now one commit ahead of the origin/master branch - that is the branch that is on Github - we now need to let Github know about the changes we have made.

Img

It is good practice to always Pull before you Push. Pull means that you are retrieving the most recent version of the Github repository onto your local branch - this command is especially useful if several people are working within the same repository - imagine there was a second script examining soil pH along this elevation gradient, and your collaborator was working on it the same time as you - you wouldn’t want to “overwrite” their work and cause trouble. In this case, you are the only one working on these files, but it’s still good to develop the practice of pulling before you push. Once you’ve pulled, you’ll see a message that you are already up to date, you can now push! Click on Push, wait for the loading to be over and then click on Close - that was it, you have successfully pushed your work to Github!

Go back to your repository on Github, where you can now see all of your files (your new plot included) online.

Img

Click on your script file and then on History - this is where you can see the different versions of your script - obviously in real life situations you will make many changes as your work progresses - here we just have two. Thanks to Github and version control, you don’t need to save hundreds of almost identical files (e.g. Dissertation_script_25thFeb.R, Dissertation_script_26thFeb.R) - you have one file and by clicking on the different commits, you can see what it looked like at different points in time.

Img

So far we have used version control for a repository that already exists, the Coding Club practice repo, but what about if you want to create a project about your own work and use version control?

Creating your own repository and syncing with your computer and laptop through RStudio

Go to your GitHub profile online and click on the Repositories tab, then select New. If you haven’t made or forked any repositories so far, your list of repositories will be empty, otherwise you might see the Coding Club repo you forked earlier. Here is a screenshot of my GitHub profile:

Img

Choose a sensible name for your repo (shorter is better), add a brief description and click on New repository. You will notice you have a choice between public and private repositories. Unless you choose otherwise, all GitHub repositories are public - anyone can see them, but only you and/or people you have given access can make changes. Public repositories are free, private repositories are paid for. If your repository has an educational purpose and you are based at a university or another educational institution, you can apply for a free private repository via this link.

Img

Your repository is now ready to be imported in RStudio! Just like we did with the Coding Club repository earlier, copy the HTTPS link (that’s the one that automatically appears in the Clone or Download box on Github.

Img

Now open RStudio, click File/ New Project/ Version control/ Git and paste the link you copied from Github. Select a directory on your computer - that is where the “local” copy of your repository will be (the online one being on Github).

Making a README.md file

A README.md file outlines what the purpose of the repository is, who is the owner, are there any rules that should be followed - in general everything someone might need to read and know before checking out the rest of the repository’s content. You can open your favourite plain text editor (Atom and Brackets are some free options), create a new file, write some text, and save it as README.md or README.txt in the folder that contains the local copy of your repository. Note that when you are saving files in Atom, you have to write the file extension when saving the file, not choose it from a drop down list, so you should write README.md. When you go back to RStudio, under the Git tab in the top right corner, you will see your newly created file with an A next to it, standing for “added” - you have just added this file, and you need to select it, commit-pull-push, and then the file will be up on GitHub, too.

Making a .gitignore file

A .gitignore file is a list of all the files, or types of files, that you do not want to upload on GitHub. This is especially useful if you are using multiple computers (e.g. your work computer and laptop at home) and if you are collaborating with someone through Github. To set up a .gitignore file, go back to Atom or Brackets and make a new file. Here is an example content for a .gitignore file that you can copy and paste:

# Prevent users commiting RProj files. .Rproj files cause problems if using two computers or different people are working together - you don't want your RProject to overwrite someone else's (or your laptop's RProject file to overwrite your computer's RProject file).
.Rproj.user
.Rproj
# Prevent users to commit their own .Rhistory files
.Rhistory
.Rapp.history
# Temporary files
*~
~$*.doc*
~$*.xls*
*.xlk
~$*.ppt*
# Prevent mac users to commit .DS_Store files
*.DS_Store
# Prevent users to commit the README files created by RStudio
*README.html
*README_cache/
#*README_files/

Afterwards you can save the file just as .gitignore in the folder that holds the local copy of your repository and commit it to GitHub through RStudio like you did with the README.md file.

You are now ready to add your scripts, plots, data files, etc. to your new project directory and follow the same workflow as outlined above - stage your files, commit, pull, push.

Potential problems

Sometimes you will see error messages as you try to commit-pull-push. Usually the error message identifies the problem and which file it’s associated with, if the message is more obscure, googling it is a good step towards solving the problem. Here are some potential problems that might arise:

Code conflicts

While you were working on a certain part of a script, someone else was working on it, too. When you go through commit-pull-push, GitHub will make you decide which version you want to keep. This is called a code conflict, and you can’t proceed until you’ve resolved it. You will see arrows looking like >>>>>>>>> around the two versions of the code - delete the version of the code you don’t want to keep, as well as the arrows, and your conflict should disappear.

Pushing the wrong files

If you accidentally push what you didn’t intend to, deleted many things (or everything!) and then pushed empty folders, you can revert your commit. You can keep reverting until you reach the point in time when everything was okay.



Git in the command line

Traditionally, Git uses the command line to perform actions on local Git repositories. In this tutorial we ignored the command line but it is necessary if you want more control over Git. There are several excellent introductory guides on version control using Git, e.g. Prof Simon Mudd’s Numeracy, Modelling and Data management guide, The Software Carpentry guide, and this guide from the British Ecological Society Version Control workshop . For more generic command line tools, look at this General cheat-sheet and this Cheat-sheet for mac users. We have also created a neat cheatsheet with some basic Git commands and how they fit into the Git/Github ecosystem. A couple of the commands require hub a wrapper for Git that increases its functionality, but not having this won’t prevent you using the other commands. The orange lines refer to the core workflow, while the blue lines are extra:

Img
Command Is hub required? Origin Destination Description
git fork Y Other Github Personal Github Creates github repo in your personal account from a previously cloned github repo.
git clone REPO_URL N Personal Github Local Creates a local copy of a github repo. The URL can be copied from Github.com by clicking the `Clone or Download` button.
git add README.md N Working Dir Staging Area Add "README.md" to staging area.
git commit -m "Message" N Staging Area Local Commits changes to files to the local repo with the commit message "Message".
git commit -a -m "Message" N Working Dir Local adds and commits all file changes to the local repo with the commit message "Message".
git pull N Personal Github Local Retrieve any changes from a github repo.
git push N Local Personal Github Sends commited file changes to github repo.
git create Y Local Personal Github Create a github repo with the same name as the local repo.
git merge N NA NA Merge any changes in the named branch with the current branch.
git checkout -b patch1 N NA NA Create a branch called "patch1" from the current branch and switch to it.


  We would love to hear your feedback, please fill out our survey!


  You can contact us with any questions on ourcodingclub@gmail.com


  Related tutorials:

  - Transferring quantitative skills among scientists

  - Setting up a GitHub repository for your lab


  Subscribe to our mailing list:

Back to blog