Intro to Github for version control


Img

Tutorial Aims:

1. Get familiar with version control, git and GitHub

2. Create your own repository and project folder structure

3. Sync and interact with your repository through RStudio

4. Sync and interact with your repository through the command line

1. Get familiar with version control, Git and GitHub

What is version control?

Version control allows you to keep track of your work and helps you to easily explore the changes you have made, be it data, coding scripts, notes, etc. You are probably already doing some type of version control, if you save multiple files, such as Dissertation_script_25thFeb.R, Dissertation_script_26thFeb.R, etc. This approach will leave you with tens or hundreds of similar files, making it rather cumbersome to directly compare different versions, and is not easy to share among collaborators. With version control software such as Git, version control is much smoother and easier to implement. Using an online platform like Github to store your files means that you have an online back up of your work, which is beneficial for both you and your collaborators.

Git uses the command line to perform more advanced actions and we encourage you to look through the extra resources we have added at the end of the tutorial later, to get more comfortable with Git. But until then, here we offer a gentle introduction to syncing RStudio and Github, so you can start using version control in minutes.

What are the benefits of using version control?

Having a GitHub repo makes it easy for you to keep track of collaborative and personal projects - all files necessary for certain analyses can be held together and people can add in their code, graphs, etc. as the projects develop. Each file on GitHub has a history, making it easy to explore the changes that occurred to it at different time points. You can review other people’s code, add comments to certain lines or the overall document, and suggest changes. For collaborative projects, GitHub allows you to assign tasks to different users, making it clear who is responsible for which part of the analysis. You can also ask certain users to review your code. For personal projects, version control allows you to keep track of your work and easily navigate among the many versions of the files you create, whilst also maintaining an online backup.

How to get started

Please register on the Github website.

If you are on a personal Windows machine, download and install git for your operating system. Below are some recommended installation instructions, to keep things simple. However, if you know what these options do, and want to change them to suit you, go ahead:

  1. For “Select Components”, check:
    • “Git Bash Here”
    • “Git GUI Here”
    • “Git LFS (Large File Support)”
    • “Associate .git* …”
    • “Associate .sh …”
  2. When prompted to choose the default editor, pick Nano (a simple terminal editor) or Notepad++ (a simple graphical editor):
  3. For “Adjust your PATH environment”, select: “Use Git from Git Bash only”
  4. For “Choose HTTPS transport backend”, select: “Use the OpenSSL library”
  5. For “Configure the line ending conversions”, select: “Checkout Windows-style,…”
  6. For “Configure the terminal emulator …”, select: “Use MinTTY …”
  7. For “Configure extra options”, select: “Enable file system caching”
  8. “Enable Git Credential Manager”

If you are on a personal Mac machine, install Git via Homebrew, which is a package manager for command line programs on Mac. First, open a terminal, which can be found at ~/Application/Utilities/Terminal.app. Then, copy and paste this line into the terminal and hit “Enter”:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Now enter the following to install Git:

brew install git

Follow any instructions in the terminal window, you may need to enter your Mac’s password or agree to questions by typing yes.

The files you put on GitHub will be public (i.e. everyone can see them & suggest changes, but only the people with access to the repository can directly edit and add/remove files). You can also have private repositories on GitHub, which means that only you can see the files. GitHub now offers free private repositories as standard with up to three collaborators per repository. They also offer a free education package, with access to software and other perks, you can apply for one using this link.

How does version control work?

What is a repository?

You can think of a repository (aka a repo) as a “master folder”, everything associated with a specific project should be kept in a repo for that project. Repos can have folders within them, or just be separate files.

You will have a local copy (on your computer) and an online copy (on GitHub) of all the files in the repository.

The workflow

The GitHub workflow can be summarised by the “commit-pull-push” mantra.

Commit

Once you’ve saved your files, you need to commit them - this means the changes you have made to files in your repo will be saved as a version of the repo, and your changes are now ready to go up on GitHub (the online copy of the repository).

Pull

Now, before you send your changes to Github, you need to pull, i.e. make sure you are completely up to date with the latest version of the online version of the files - other people could have been working on them even if you haven’t.

Push

Once you are up to date, you can push your changes - at this point in time your local copy and the online copy of the files will be the same.

Each file on GitHub has a history, so instead of having many files like Dissertation_1st_May.R, Dissertation_2nd_May.R, you can have only one and by exploring its history, you can see what it looked at different points in time.

For example, here is the history for a repo with an R script inside it, as viewed on Github. Obviously it took me a while to calculate those model predictions!

Img

2. Create your own repository and project folder structure

To make a repository, go to Repositories/New repository - choose a concise and informative name that has no spaces or funky characters in it. This can be your master repo that holds together past and ongoing research, data, scripts, manuscripts. Later on you might want to have more repositories - e.g. a repository associated with a particular project that you want to make public or a project where you are actively seeking feedback from a wider audience. For now, we will focus on organising and using your main repository that holds the files for all your work. With a free GitHub account, you can use public or private respositories.

Img

Click on Initialise repo with a README.md file. It’s common practice for each repository to have a README.md file, which contains information about the project, the purpose of the repository, as well as any comments on licensing and data sources. Github understands several text formats, including .txt and .md. .md stands for a file written in Markdown - you might have used Markdown before from within RStudio to create neatly organised reports of your code and its outputs (you can also check out our Markdown tutorial). You can also use Markdown to write plain text files, for example the file you are reading now was written in Markdown.

Img

You can directly edit your README.md file on Github by clicking on the file and then selecting Edit this file.

Img

Exercise 1: Write an informative README.md file

You can now write the README.md file for your repository. To make headings and subheadings, put hashtags before a line of text - the more hashtags, the smaller the heading will appear. You can make lists using - and numbers 1, 2, 3, etc.. You can discuss the information you want to include among your lab members - here are some things you might want to consider:

- Your name 

- Project title

- Links to website & social media

- Contact details

Exercise 2: Writing a .gitignore file

Repositories often have a file called .gitignore and we are about to make one shortly. In this file you specify which files you want Git to ignore when users make changes and add files. Examples include temporary Word, Excel and Powerpoint files, .Rproj files, .Rhist files, etc. Some files you might want to only have on your local repository (i.e. on your computer), but not online as they might be too big to store online.

Go to Create new file and write a .gitignore file within your main repository (not within any folders). You need to call the file .gitignore and then add the types of files that Git should ignore on separate lines. You can make this specific to your needs, but as a start, you can copy over this code:

# Prevent users to commit their own RProject
.Rproj.user
.Rproj
# Prevent users to commit their own .RData and .Rhistory in mutual area
.RData
.Rhistory
.Rapp.history
# Temporary files
*~
~$*.doc*
~$*.xls*
*.xlk
~$*.ppt*
# Prevent mac users to commit .DS_Store files
*.DS_Store
# Prevent users to commit the README files created by RStudio
*README.html
*README_cache/
#*README_files/

Exercise 3: Create folders

Discuss among your lab what folders your repository will contain - some examples include: manuscripts, data, figures, scripts, scripts/users/personal_folder_your_name. To make a new folder, click on Create new file and add in the name of your new folder, e.g. manuscripts/ before the file name, in this case a quick README.md file. When creating folders within your repo through GitHub’s website, you always need to make at least one file associated with them, you can’t just create an empty folder. Add a brief explanation of what the folder is for in the README.md file, scroll down and click on Commit new file. Add a quick message where it says Create README.md file in light grey text - we will cover GitHub etiquette later, but for now, when creating/editing files, it’s always a good idea to change the default message to a more precise description of what was done and who did it. Stick with the default option of Commit directly to master branch - we will explain branches and pull requests at a later stage of the tutorial.

Img

GitHub etiquette

If you’ll be sharing the repository with collaborators and even for your own benefit, it’s a good idea to define some rules on how to use the repository before we start working within it - for example what GitHub and coding etiquette should people be following? Is there a prefered folder structure, file naming system?

We can make a new github-etiquette.md file that outlines the rules that people with access to your repository should follow.

Exercise 4: Write a github-etiquette.md file

Go to your lab’s main repository, click on Create new file and add github-etiquette.md as a file name. Remember to include the file extension .md - otherwise GitHub won’t know what’s the file format.

Here is a set of sample GitHub rules:

Keep file paths short and sensible.
Don't use funky characters and spaces in your file names, these cause trouble because of differences in Mac/Windows systems.
Always pull before you push in case someone has done any work since the last time you pulled - you wouldn't want anyone's work to get lost or to have to resolve many coding conflicts.

3. Sync and interact with your repository through RStudio

The “commit-pull-push” workflow can be embedded within RStudio using “Projects” and enabling version control for them - we will be doing that shortly in the tutorial.

Log into your Github account and navigate to the repository you created earlier

Click Clone or download and copy the HTTPS link.

Img

Now open RStudio, click File/ New Project/ Version control/ Git and paste the HTTPS link from the Github repository into the Repository URL: field. Select a folder on your computer - that is where the “local” copy of your repository will be (the online one being on Github).

We also know that there might be problems with the newest updates of the Mac software and installing git and linking it with RStudio. The solutions appear to be very specific to the Mac version you have, so if the above code didn’t work, a good starting point is googling “rstudio can’t find git mac **your version**” and trying out the suggested solutions.

Once the files have finished copying across, you will notice that a few things about your RStudio session have changed, there is a Git tab in the top right corner of RStudio, and all the files that are in the repo are now on your computer as well.

Img

You are now ready to start making changes and documenting them through Github! Note that you can’t push empty folders.

You can open some of the files you made online earlier - for example if you click on your README.md file, it will open in RStudio and you can make changes. Add some more text just for the sake of exemplifying how version control works. Save the file in the same location (i.e., your repository).

Img

If you click on the Git tab you will see that now your README.md file is listed there. Add a tick next to it. Now it has an M - this means you have modified the file. If there’s an A, that’s an added file, and a D is a deleted file.

If you select the README.md file and click on Diff, you will see the changes you have made. Once the file is selected, it is staged, ready to be commited to Github.

Click on Commit and add in your commit message - aim to be concise and informative - what did you do? Once you have clicked on Commit, you will get a message about what changes you have made.

Img

If you are making your first ever commit, clicking on Commit may result in an error message - git will tell you that you need to configure your username and email. This is easily done, and you only need to do it once, afterwards you can commit-pull-push at your convenience!

In the top right corner of the RStudio screen, click on More/Shell.

Img

Copy the following code:

git config --global user.email your_email@example.com
# Add the email with which you registered on GitHub and click Enter

git config --global user.name "Your GitHub Username"
# Add your username and click Enter

If it worked fine, there will be no messages, you can close the shell window and do your commit again, this time it will work!

You will see a message saying that your branch is now one commit ahead of the origin/master branch - that is the branch that is on Github - we now need to let Github know about the changes we have made.

Img

It is good practice to always Pull before you Push. Pull means that you are retrieving the most recent version of the Github repository onto your local branch - this command is especially useful if several people are working within the same repository - imagine there was a second script examining soil pH along this elevation gradient, and your collaborator was working on it the same time as you - you wouldn’t want to “overwrite” their work and cause trouble. In this case, you are the only one working on these files, but it’s still good to develop the practice of pulling before you push. Once you’ve pulled, you’ll see a message that you are already up to date, you can now push! Click on Push, wait for the loading to be over and then click on Close - that was it, you have successfully pushed your work to Github!

Go back to your repository on Github, where you can now see all of your updated files online.

Img

Click on your script file and then on History - this is where you can see the different versions of your script - obviously in real life situations you will make many changes as your work progresses - here we just have two. Thanks to Github and version control, you don’t need to save hundreds of almost identical files (e.g. Dissertation_script_25thFeb.R, Dissertation_script_26thFeb.R) - you have one file and by clicking on the different commits, you can see what it looked like at different points in time.

Img

You are now ready to add your scripts, plots, data files, etc. to your new project directory and follow the same workflow as outlined above - stage your files, commit, pull, push.

Potential problems

Sometimes you will see error messages as you try to commit-pull-push. Usually the error message identifies the problem and which file it’s associated with, if the message is more obscure, googling it is a good step towards solving the problem. Here are some potential problems that might arise:

Code conflicts

While you were working on a certain part of a script, someone else was working on it, too. When you go through commit-pull-push, GitHub will make you decide which version you want to keep. This is called a code conflict, and you can’t proceed until you’ve resolved it. You will see arrows looking like >>>>>>>>> around the two versions of the code - delete the version of the code you don’t want to keep, as well as the arrows, and your conflict should disappear.

Pushing the wrong files

If you accidentally push what you didn’t intend to, deleted many things (or everything!) and then pushed empty folders, you can revert your commit. You can keep reverting until you reach the point in time when everything was okay. This is an easy way out if you’re the only person working in the repository - be aware that if there are other people that have committed to the repository, reverting will also undo all of their work, as reverting refers to the repository as a whole, not just your own work in it.



4. Sync and interact with your repository through the command line

Traditionally, Git uses the command line to perform actions on local Git repositories. In this tutorial we ignored the command line but it is necessary if you want more control over Git. There are several excellent introductory guides on version control using Git, e.g. Prof Simon Mudd’s Numeracy, Modelling and Data management guide, The Software Carpentry guide, and this guide from the British Ecological Society Version Control workshop . For more generic command line tools, look at this general cheat-sheet and this cheat-sheet for mac users. We have also created a table and flow diagram with some basic Git commands and how they fit into the Git/Github workflow. Orange lines refer to the core workflow, the blue lines describe extra functions and the green lines deal with branches:

Img
Command Origin Destination Description
git clone REPO_URL Personal Github Local Creates a local copy of a Github repo. The URL can be copied from Github.com by clicking the `Clone or Download` button.
git add README.md Working Dir Staging Area Add "README.md" to staging area.
git commit Staging Area Local Commits changes to files to the local repo.
git commit -a Working Dir Local adds and commits all file changes to the local repo.
git pull Personal Github Local Retrieve any changes from a Github repo.
git push Local Personal Github Sends commited file changes to Github repo.
git merge Other branch Current branch Merge any changes in the named branch with the current branch.
git checkout -b patch1 NA NA Create a branch called "patch1" from the current branch and switch to it.
git init NA NA Initialise a directory as a Git repo.
git log NA NA Display the commit history for the current repo
git status NA NA See which files are staged/unstaged/changed
git diff NA NA See the difference between staged uncomitted changes and the most recent commit
git stash NA NA Save uncommitted changes in a temporary version and revert to the most recent commit

Below is a quick exercise so you can familiarise yourself with these command line tools. There are a few ways to use interact with Git using the terminal:

  1. If you are already in RStudio on a Mac or Linux machine, you can open a terminal within RStudio by going to Tools -> Terminal -> New Terminal in the menu.
Img
  1. If you are on a Mac or Linux machine you could just open a terminal program and run Git from there. Most Mac and Linux machines will have Git installed by default. On Mac you can go open a terminal by going to: Applications/Utilities/Terminal.app.
  2. If you are on a personal Windows machine, you can run Git using Git Bash, which can be installed when you installed Git.

Once you have opened a terminal using one of the above methods, start by creating a folder somewhere on your local system called git_test, using the mkdir (make directory) command by typing the following into the terminal and hitting “Enter”:

mkdir git_test

Then enter that folder using cd (change directory):

cd git_test

Then, make the folder into a Git repository:

git init

Now the folder has been made into a Git repository, allowing you to track changes to files. Now, lets create a README.md file inside the repository and put some text in it, using whatever text editor you are comfortable with. Make sure to place this README.md file into the repository folder on your device so it can be found!

Now, to add the file to be tracked by the Git repository:

git add README.md

The file has now been added to the staging area, but has not yet been committed to a version of the repository. To commit a version:

git commit

You then have to enter a commit message using the text editor which appears, If you have selected Vim as the default text editor, you will need to press i before you can type, then Esc when you are finished typing. To save and exit, type :wq.

Currently, the Git repository is still only on our local computer. Versions are being committed, but they are not being backed up to a remote version of the repository on Github. Go to Github and create a repository called git_test, like you did earlier on in the workshop, but this time don’t create a README.md because we have just made one on the local computer. Now, copy the HTTPS link for that repository. In the terminal, link the local Git repository with the remote repository using the following code, replacing <HTTPS_LINK> with the link you copied:

git remote add origin <HTTPS_LINK>

Then make the first push to that newly linked remote repository:

git push -u origin master

Now you can continue editing files, adding changes (git add <FILE>), committing changes (git commit), pulling (git pull) and pushing (git push) changes, similar to the process you did with clicking buttons in RStudio. Feel free to explore some of the more advanced commands laid out in the table and flow diagram above. You can also check out a more advanced command line tutorial written by Prof Simon Mudd for Numeracy, Modelling and Data management guide.

This tutorial was developed as part of the collaboration between Coding Club and the NERC E3 Doctoral Training Programme. To learn more about the E3 DTP, check out the programme’s website.

Img



Check out this page to learn how you can get involved! We are very happy to have people use our tutorials and adapt them to their needs. We are also very keen to expand the content on the website, so feel free to get in touch if you’d like to write a tutorial!

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Img

  We would love to hear your feedback, please fill out our survey!


  You can contact us with any questions on ourcodingclub@gmail.com


  Related tutorials:


  Subscribe to our mailing list:

Back to blog