Why use Git and GitHub?
If you write code for your research or analysis, perhaps in R or Python, there’s a good chance you have heard of Git and GitHub. But many people haven’t, and in my opinion using Git and GitHub (or a competitor like GitLab) is essential for anyone who codes.
In this post I’ll explain what exactly Git and GitHub are and why they are so important for high-quality analysis.
23 January 2023
What is Git?
Git is a piece of software that runs on your laptop. It is used for version control. Have you ever got yourself into a mess with file names? Something like:
“file v3 final”
“file actual final — v4”
Well, technically you’ve been practising version control already. Congratulations! But clearly this is chaos, and this is just one file. Add a few more files into the mix and you have a recipe for confusion and mistakes. Git takes care of this by keeping the history for you, so each file can keep a single name.
Git is the world’s most popular version control system and has been a mainstay for software developers and programmers for many years. But it’s also popular among data scientists, data analysts, statisticians and researchers. I see Git and GitHub pop up as a requirement on lots of data and analytical job adverts, so it’s an in-demand skill.
How do you use it?
There’s a bit of a problem with Git: it’s quite unfriendly to use. It runs in the terminal, so it doesn’t have a nice point-and-click interface, and it’s full of jargon. All this makes it quite intimidating for new users.
There are point-and-click tools you can download, but I haven’t found one that makes things less confusing for beginners.
Let’s say you’ve got a folder you are working in. In Git-speak this is called a repository, or repo for short. You can tell Git to start tracking that folder and to take snapshots of everything inside it. Taking a snapshot is known as making a commit.
Imagine you have a piece of working code and you make a change. Now things aren’t working. Or maybe you notice a mistake and need to revert to an old version. Without a version control system in place, it would be time to panic. But Git allows you to travel back in time through your commit history to any point where you took a snapshot.
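To make the idea of snapshots concrete, here is a toy sketch in Python. It is a hypothetical model, not Git itself: `ToyRepo`, `Commit`, `commit` and `restore` are made-up names for illustration, and the "folder" is just a dictionary mapping file names to their contents. Each commit stores a complete copy of the folder, and restoring copies an old snapshot back.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One snapshot: a message plus a frozen copy of every file."""
    message: str
    files: dict[str, str]


@dataclass
class ToyRepo:
    """Hypothetical, heavily simplified model of a repository.

    The "folder" is a dict of file names to contents. This is NOT how Git
    stores data internally; it only illustrates the snapshot idea.
    """
    working_folder: dict[str, str] = field(default_factory=dict)
    history: list[Commit] = field(default_factory=list)

    def commit(self, message: str) -> int:
        # Take a snapshot: deep-copy the folder so later edits can't alter it.
        self.history.append(Commit(message, copy.deepcopy(self.working_folder)))
        return len(self.history) - 1  # position of this commit in the history

    def restore(self, index: int) -> None:
        # "Travel back in time": replace the folder's contents with an
        # earlier snapshot. Later commits stay in the history.
        self.working_folder = copy.deepcopy(self.history[index].files)


repo = ToyRepo()
repo.working_folder["analysis.py"] = "print(mean(x))"
good = repo.commit("Working analysis")

repo.working_folder["analysis.py"] = "print(mean(x)"  # a change that breaks things
repo.commit("Tweak output")

repo.restore(good)
print(repo.working_folder["analysis.py"])  # print(mean(x))
```

With real Git you would take snapshots with commands such as `git init`, `git add` and `git commit`, and go back with `git log` plus `git checkout` (or `git restore`). Git also stores snapshots far more efficiently than copying everything each time.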
What are branches?
In Git, you work inside branches. The simplest approach is to work in a single branch, called main or master by default depending on your setup, making commits along the way. This creates a detailed version history that you can travel back in time through.
But imagine we want to try something out. We’re not sure it will work, so we don’t want to risk mucking up our main branch. Instead, we can create a new branch and do the work there.
If things don’t work out, no harm done: we can simply delete the new branch. If the work goes well, though, we will probably want to combine the new code into our main branch. This is known as merging.
Branches are also useful when collaborating: each person can work on one aspect of the code inside their own branch. To collaborate easily, though, we need more than just Git. Enter GitHub.
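Branches can be sketched in the same toy style. In this hypothetical Python model (again, not Git's real internals), each branch is simply a list of snapshots. Creating a branch copies the history of the branch it starts from, and deleting it leaves `main` untouched. `merge` handles only the simplest case, where `main` hasn't changed since the branch was created. Git calls this a fast-forward merge. When both branches have new commits, real Git performs a three-way merge instead, which this sketch doesn't attempt. The function names are made up for illustration.

```python
from __future__ import annotations

import copy

# Hypothetical toy model: each branch name maps to a list of snapshots,
# where a snapshot is a dict of file names to contents.
branches: dict[str, list[dict[str, str]]] = {
    "main": [{"analysis.py": "v1"}],
}


def create_branch(new: str, source: str = "main") -> None:
    # A new branch starts with the same history as the branch it comes from.
    branches[new] = copy.deepcopy(branches[source])


def commit(branch: str, files: dict[str, str]) -> None:
    # Add a snapshot to the end of one branch only.
    branches[branch].append(copy.deepcopy(files))


def delete_branch(name: str) -> None:
    # The experiment failed? Throw the branch away; other branches are untouched.
    del branches[name]


def merge(source: str, target: str = "main") -> None:
    # Fast-forward only: target must have no commits that source lacks.
    src, tgt = branches[source], branches[target]
    if src[: len(tgt)] != tgt:
        raise ValueError("Branches have diverged; real Git would do a three-way merge")
    branches[target] = copy.deepcopy(src)


create_branch("experiment")
commit("experiment", {"analysis.py": "v2 (faster)"})
merge("experiment")
print(branches["main"][-1])  # {'analysis.py': 'v2 (faster)'}

create_branch("risky-idea")
commit("risky-idea", {"analysis.py": "broken"})
delete_branch("risky-idea")
print(sorted(branches))  # ['experiment', 'main']
```

The real commands are `git branch` (or `git switch -c`) to create a branch, `git merge` to merge it and `git branch -d` to delete it.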
What is GitHub?
GitHub is a platform for hosting Git repositories online and is extremely popular (more than 94 million users as of 23 January 2023). Using GitHub with Git has a few advantages:
GitHub enables easy collaboration and code sharing. Code is hosted in a single place that collaborators can access easily.
GitHub acts as a backup for your code. If something happens to your laptop, your code will be preserved on GitHub.
GitHub is fairly user-friendly and offers much better tools than Git alone for viewing your code and commit history.
There are a few more key terms for working with remote repositories, such as those hosted on GitHub.
Downloading a repository from a remote source, like GitHub, for the first time is known as cloning.
Downloading new changes from the remote repository after that is known as pulling.
Uploading your changes to GitHub is known as pushing.
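The three remote terms can be modelled in the same toy style. In this hypothetical Python sketch, `ToyRemote` stands in for a repository on GitHub and `ToyLocal` for a copy on someone's laptop. Both names are made up, and commits are reduced to their messages. The only conflict handling is to refuse when histories have diverged, much as Git by default refuses a push when the remote contains commits you haven't pulled.

```python
from __future__ import annotations


class ToyRemote:
    """Hypothetical stand-in for a repository hosted online, e.g. on GitHub."""

    def __init__(self) -> None:
        self.commits: list[str] = []  # commit messages only, for simplicity


class ToyLocal:
    """Hypothetical local copy of the repository on someone's laptop."""

    def __init__(self, remote: ToyRemote) -> None:
        # Cloning: download the full history the first time.
        self.remote = remote
        self.commits = list(remote.commits)

    def commit(self, message: str) -> None:
        self.commits.append(message)

    def pull(self) -> None:
        # Pulling: download commits others have pushed since we last synced.
        remote = self.remote.commits
        if self.commits[: len(remote)] == remote:
            return  # nothing new on the remote
        if remote[: len(self.commits)] != self.commits:
            raise RuntimeError("Local and remote have diverged; real Git would merge")
        self.commits = list(remote)

    def push(self) -> None:
        # Pushing: upload our new commits, but refuse if the remote has
        # commits we haven't pulled yet.
        if self.commits[: len(self.remote.commits)] != self.remote.commits:
            raise RuntimeError("Remote has changes you don't have; pull first")
        self.remote.commits = list(self.commits)


github = ToyRemote()
alice = ToyLocal(github)  # Alice clones the (empty) repository
alice.commit("Add data cleaning script")
alice.push()

bob = ToyLocal(github)  # Bob clones and gets Alice's work
bob.commit("Add plots")
bob.push()

alice.pull()  # Alice picks up Bob's commit
print(alice.commits)  # ['Add data cleaning script', 'Add plots']
```

In practice these steps are `git clone <url>`, `git pull` and `git push`. When two people's changes overlap, Git asks you to merge them rather than refusing outright.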
GitHub has free and paid accounts. The free accounts are extremely powerful and include unlimited public and private repositories.
Should I learn Git and GitHub?