13. Version control (Git)

Note: Once again, this is a very high-level introduction. The Software Carpentry project has another excellent tutorial that is worth exploring.

13.1. Why use git for your science

As your Python projects grow more complex, you may reach the point where it becomes useful to learn version control software. Nearly universally, this means git, together with the online platform GitHub.

There are a number of reasons to do this:

  • To keep track of changes in your code in a more rigorous manner than e.g. OneDrive. You will be able to track versions of code, see what changed, and when you changed it. You will also be able to introduce automated testing to check that changes to your code haven’t broken anything or inadvertently changed your results.

  • To collaborate on code with others. Through features such as branching and pull requests, git and GitHub help work done by separate people on separate machines come together in a consistent and functional manner.

  • To aid reproducibility. Sharing your code online allows others to use, benefit from, and develop your work. A great example is sharing the code necessary to reproduce figures rather than just sharing the raw data.

  • To go further and develop entire packages that help people use your data and methods more easily. See e.g. pypromice, which helps people download PROMICE data. Uploading these packages to GitHub could be the last step, or it could be the first step towards making them installable via e.g. pip and conda.

13.1.1. Difference between git and GitHub

The names git and GitHub are often used interchangeably, but they are not the same thing. git is an open-source version control system that runs on your own machine and keeps track of changes locally. GitHub is an online platform (for-profit, owned by Microsoft) that hosts your repositories. Online hosting is a real strength that massively enhances your ability to perform open and reproducible science.

GitHub is, ultimately, the primary game in town, but it’s important to note that it isn’t necessarily part of the same feel-good open-source community as the rest of the tools I’ve been recommending. There are open-source alternatives, the main one being GitLab, but this is less beginner-friendly and some of its features require a paid plan.
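
To make the distinction concrete, here is a minimal sketch showing that git works entirely on your own machine, with no GitHub account or internet connection involved. The scratch folder, file name, and identity values are placeholders chosen for illustration.

```shell
set -eu

# Work in a throwaway folder so nothing else on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

git init -q
# Identity for this demo repository only (placeholder values).
git config user.name "Example User"
git config user.email "user@example.com"

echo "print('hello')" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"

# The history is stored in the local .git folder.
git log --oneline

# No remotes are configured, so this prints nothing: GitHub is not involved.
git remote -v
```

GitHub only enters the picture once you add a remote and push to it.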

13.2. Installing git

13.2.1. Linux / WSL

If git is not already available on your machine, you can install it via your distro’s package manager, e.g. sudo apt-get install git on Debian or Ubuntu.

13.2.2. macOS

From the Terminal app, type git --version. If it’s not installed already, follow the instructions to install the “command line developer tools”. Don’t bother with “Get Xcode”: that will install the full set of tools which will take ages and is not necessary for our purposes.
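
Once git is installed, a first commit will usually fail until git knows who you are. The sketch below checks the installation and sets a name and email. The values are placeholders, and the scratch HOME line is only there so the demo does not touch your real settings.

```shell
set -eu

# Sandbox only: use a scratch HOME so this demo leaves your real settings alone.
# On your own machine, skip this line.
export HOME="$(mktemp -d)"

# Confirm that git is installed and on your PATH.
git --version

# Tell git who you are; the name and email below are placeholders.
git config --global user.name "Example User"
git config --global user.email "user@example.com"

# Read the values back to check that they were stored.
git config --global user.name
git config --global user.email
```

You only need to do this once per machine; every repository you create afterwards picks up the same identity.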

13.3. Setting up a repository

Repositories (or repos) are folders whose history git tracks.

13.3.1. On GitHub

  1. Log into GitHub.

  2. Click New repository.

  3. Choose a name, description, and set visibility (private for personal work, public to share).

  4. Leave “Initialize with README” unchecked if you plan to connect an existing local folder.

13.3.2. Locally

  1. Navigate to your project folder:

    cd my-project
    
  2. Initialize git:

    git init
    
  3. Link it to GitHub (replace with your repo URL):

    git remote add origin https://github.com/username/my-project.git
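
If you would like to rehearse these steps without touching GitHub, the sketch below uses a bare repository in a scratch folder as a stand-in for the empty GitHub repository. The folder names are assumptions for illustration; with a real project you would use your GitHub URL instead of the local path.

```shell
set -eu

sandbox="$(mktemp -d)"

# Stand-in for the empty repository you would create on GitHub.
git init -q --bare "$sandbox/my-project.git"

# The existing local project folder.
mkdir "$sandbox/my-project"
cd "$sandbox/my-project"
git init -q

# Link the local repository to the "remote".
git remote add origin "$sandbox/my-project.git"

# List the configured remotes to confirm the link.
git remote -v
```

git remote -v lists origin twice, once for fetch and once for push, which is a quick way to check that the URL is the one you intended.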
    

13.4. Using a personal repository

13.4.1. Committing and pushing

Typical workflow:

  1. Stage changes:

    # stage all new and modified files in the current directory
    git add .
    
  2. Commit with a message:

    git commit -m "Add NDSI function"
    
  3. Push to GitHub (this assumes your branch is named main; older versions of git name it master by default):

    git push origin main
    

Think: add → commit → push.
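
The whole cycle can be tried end to end in a scratch folder. This is a sketch under stated assumptions: a local bare repository stands in for GitHub, ndsi.py is a hypothetical file invented to match the commit message above, and the identity values are placeholders.

```shell
set -eu

sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"   # stand-in for GitHub
mkdir "$sandbox/my-project"
cd "$sandbox/my-project"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$sandbox/remote.git"

# Hypothetical file, loosely matching the commit message used above.
echo "def ndsi(green, swir): return (green - swir) / (green + swir)" > ndsi.py

git status --short                      # lists ndsi.py as untracked (??)
git add .                               # 1. stage
git commit -q -m "Add NDSI function"    # 2. commit
git branch -M main                      # make sure the branch is called main
git push -q -u origin main              # 3. push; -u records origin/main as the upstream

git log --oneline
```

After the first push with -u, a plain git push from this branch is enough, because git remembers where the branch goes.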

13.4.2. Branching

Branches let you experiment without breaking your main work.

  • Create a new branch:

    git checkout -b new-analysis
    
  • Push the new branch to GitHub:

    git push origin new-analysis
    

You can now keep two separate strands of files going: your original, and one with your new idea, function, or analysis. When you are done, you can merge the branches on GitHub or locally. Merging on GitHub is worthwhile: afterwards, run git pull to update your local main branch before deleting the local non-main branch.
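
For the local route, a minimal sketch of the branch, merge, and clean-up cycle might look like this. It runs in a scratch folder; results.txt and the identity values are placeholders for illustration.

```shell
set -eu

workdir="$(mktemp -d)"
cd "$workdir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "baseline" > results.txt
git add results.txt
git commit -q -m "Add baseline results"
git branch -M main

# Start the experiment on its own branch.
git checkout -q -b new-analysis
echo "new idea" >> results.txt
git commit -q -am "Try new analysis"

# main is untouched while the experiment lives on new-analysis.
git checkout -q main
cat results.txt

# Happy with the experiment: merge it into main, then delete the branch.
git merge -q new-analysis
git branch -d new-analysis
git log --oneline
```

git branch -d refuses to delete a branch that has not been merged, which makes it a safe default for tidying up.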

13.5. Working with others

Tip: Start small — just version-control your code, README, and environment files. Collaboration and branching come naturally once you’re comfortable.

13.5.1. Forking

If you want to contribute to someone else’s repository but don’t have write access:

  • Fork it on GitHub (creates your copy).

  • Clone your fork locally:

    git clone https://github.com/yourname/their-project.git
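
The fork-then-clone relationship can be rehearsed offline. In this sketch, local folders stand in for the two GitHub repositories, and a bare clone plays the role of the Fork button. All names are assumptions for illustration. Registering the original as a second remote called upstream is a common convention rather than a requirement.

```shell
set -eu

sandbox="$(mktemp -d)"

# Stand-in for someone else's repository on GitHub.
git init -q "$sandbox/their-project"
cd "$sandbox/their-project"
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "# their-project" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main

# Stand-in for clicking "Fork" on GitHub: a copy that you own.
git clone -q --bare "$sandbox/their-project" "$sandbox/fork.git"

# Clone your fork to get a working copy.
git clone -q "$sandbox/fork.git" "$sandbox/my-copy"
cd "$sandbox/my-copy"

# Optional convention: also register the original project as "upstream".
git remote add upstream "$sandbox/their-project"
git remote -v
```

Here origin points at your fork, where you can push, and upstream points at the original, which you can only read from.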
    

13.5.2. Pull requests

When you’ve made changes in your fork and want the original repo to adopt them:

  1. Push your branch to your fork.

  2. On GitHub, open a Pull Request (PR) from your branch into the main project.

  3. Project maintainers will review, discuss, and merge if appropriate.
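
The command-line half of this workflow (step 1) can be sketched as follows. A local bare repository again stands in for your fork on GitHub, and the branch name fix-readme, the file contents, and the identity values are hypothetical. Opening the Pull Request itself happens in the GitHub web interface and is not shown.

```shell
set -eu

sandbox="$(mktemp -d)"

# Stand-in for your fork on GitHub, seeded with one commit on main.
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Example User"
git config user.email "user@example.com"
echo "# their-project" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main
git clone -q --bare "$sandbox/seed" "$sandbox/fork.git"

# Your local clone of the fork.
git clone -q "$sandbox/fork.git" "$sandbox/my-copy"
cd "$sandbox/my-copy"
git config user.name "Example User"
git config user.email "user@example.com"

# Make the change on a dedicated branch (hypothetical name).
git checkout -q -b fix-readme
echo "Usage notes." >> README.md
git commit -q -am "Add usage notes to README"

# Step 1: push the branch to your fork.
git push -q -u origin fix-readme

# The branch now exists on the fork, ready for a Pull Request on GitHub.
git ls-remote --heads origin
```

Keeping each contribution on its own branch, rather than on main, lets you have several Pull Requests open at once without them interfering with each other.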