13. Version control (Git)
Note: Once again, this is a very high-level introduction. The Software Carpentry project has another excellent tutorial that may be worth exploring.
13.1. Why use git for your science
As your Python projects grow more complex, you may reach the point where it becomes useful to learn version control software. Almost universally, this means git, together with the online platform GitHub.
There are a number of reasons to do this:
To keep track of changes in your code more rigorously than with e.g. OneDrive. You will be able to track versions of your code and see what changed and when. You will also be able to introduce automated testing to ensure that changes you make to your code haven’t broken anything or inadvertently changed your results.
This becomes key when you begin collaborating on code with others. Through features such as branching and pull requests, git ensures that work done by separate people on separate machines can come together in a consistent and functional manner.
To aid reproducibility. Sharing your code online allows others to use, benefit from, and develop your work. A great example is sharing the code necessary to reproduce figures rather than just sharing the raw data.
You can go further and develop entire packages to help people use your data and methods more easily. See e.g. pypromice, which helps people download Promice data. Uploading these packages to GitHub could be the last step, or it could be the first step towards making them installable via e.g. pip and conda.
13.1.1. Difference between git and GitHub
The terms git and GitHub are often used interchangeably, but it is important to note that they are not the same. git is a local, open-source version control system that can be used to keep track of changes on your own machine. GitHub is an online web platform (for-profit, owned by Microsoft) that allows you to host your repositories online. Online hosting is a real strength that massively enhances your ability to perform open and reproducible science.
GitHub is, ultimately, the primary game in town, but it’s important to note that it isn’t necessarily part of the same feel-good open-source community as the rest of the tools I’ve been recommending. There are open-source alternatives, the main one being GitLab, but this is less beginner-friendly and, depending on the plan or hosting you choose, may cost money.
13.2. Installing git
13.2.1. Linux / WSL
If git is not already available on your machine, you can install it via your distro’s package manager, e.g. sudo apt-get install git.
13.2.2. macOS
From the Terminal app, type git --version. If git is not already installed, follow the prompts to install the “command line developer tools”. Don’t bother with “Get Xcode”: that installs the full set of tools, which takes a long time and is not necessary for our purposes.
13.3. Setting up a repository
Repositories (or repos) are folders whose history git tracks.
13.3.1. On GitHub
Log into GitHub.
Click New repository.
Choose a name, description, and set visibility (private for personal work, public to share).
Leave “Initialize with README” unchecked if you plan to connect an existing local folder.
13.3.2. Locally
Navigate to your project folder:
cd my-project
Initialize git:
git init
Link it to GitHub (replace with your repo URL):
git remote add origin https://github.com/username/my-project.git
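As a minimal sketch, the local steps above can be tried end to end in a throwaway folder. The temporary directory is an assumption made so that nothing real is touched, and the URL is the same placeholder as above. Adding a remote only records the address, so nothing is contacted over the network at this point.

```shell
# Work in a throwaway directory (assumption: you are only experimenting).
workdir="$(mktemp -d)"
mkdir "$workdir/my-project"
cd "$workdir/my-project"

git init                 # creates the hidden .git folder that holds the history
git remote add origin https://github.com/username/my-project.git

git remote -v            # lists the address stored under the name "origin"
git status               # reports the state of the (still empty) repository
```

If you mistype the URL, git remote set-url origin followed by the corrected address replaces it.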
13.3.3. Recommended basic files
These files help organize and communicate your project.
13.3.3.1. .gitignore
Lists files/folders git should not track (e.g., large datasets, temporary files).
Example:
*.pyc
__pycache__/
data/raw/
.env
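One way to check that the patterns behave as expected is git check-ignore, which reports the rule that matches a given path. The sketch below assumes a throwaway repository, and the file names are hypothetical.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet

# The same four patterns as in the example above.
printf '%s\n' '*.pyc' '__pycache__/' 'data/raw/' '.env' > .gitignore

# Hypothetical files: one that should be tracked, three that should not.
mkdir -p data/raw __pycache__
touch analysis.py data/raw/scene.tif __pycache__/analysis.cpython-311.pyc .env

# For each ignored path, print the .gitignore line that matched it.
git check-ignore -v data/raw/scene.tif __pycache__/analysis.cpython-311.pyc .env

# Untracked files that git would still pick up; ignored files are left out.
git status --short
```

Note that .gitignore only affects files git is not yet tracking. A file that has already been committed stays tracked until it is removed from the index with git rm --cached.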
13.3.3.2. README.md
Explains your project: purpose, how to install dependencies, and how to run analyses. This is the first thing others see.
13.3.3.3. LICENSE
Defines how others can use your code. Common choices for science:
MIT License (permissive, simple)
GPL (ensures sharing and openness)
13.3.3.4. environment.yml
For reproducibility. Defines the Python environment (packages, versions) used in your project.
Example:
name: climate-analysis
channels:
- conda-forge
dependencies:
- python=3.11
- numpy
- xarray
- pandas
- matplotlib
- jupyter
13.4. Using a personal repository
13.4.1. Committing and pushing
Typical workflow:
Stage changes:
# stage all new and modified files in the current directory, if necessary
git add .
Commit with a message:
git commit -m "Add NDSI function"
Push to GitHub:
git push origin main
Think: add → commit → push.
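The whole cycle can be rehearsed without touching GitHub. In this sketch a local bare repository stands in for the GitHub remote, and the identity, the file name ndsi.py and its contents are all placeholders chosen for illustration.

```shell
workdir="$(mktemp -d)"

# Assumption: a local bare repository plays the role of the GitHub remote.
git init --quiet --bare "$workdir/remote.git"

git init --quiet "$workdir/my-project"
cd "$workdir/my-project"
git config user.name "Example User"        # placeholder identity, this repo only
git config user.email "user@example.com"
git remote add origin "$workdir/remote.git"

# Hypothetical file to go with the commit message used above.
printf 'def ndsi(green, swir):\n    return (green - swir) / (green + swir)\n' > ndsi.py

git status --short        # ndsi.py is listed as untracked
git add ndsi.py           # stage one file (git add . would stage everything)
git commit --quiet -m "Add NDSI function"
git branch -M main        # make sure the branch is called main
git push --quiet origin main

git log --oneline         # the commit that now also exists on the remote
```

Running git status between steps is a cheap way to see what each command changed.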
13.4.2. Branching
Branches let you experiment without breaking your main work.
Create a new branch:
git checkout -b new-analysis
Push the new branch to GitHub:
git push origin new-analysis
Now, you have the option to keep two separate strands of files going: one with your original work, and one with your new idea/function/analysis. At the end, you can merge the branches on GitHub (or locally, but it’s worth doing on GitHub and then running git pull to update your main branch before deleting the local non-main branch).
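For comparison, here is a rough sketch of the local alternative: branch, commit, merge back into main, and delete the branch, all on one machine. The repository, identity and file names are placeholders, and no remote is involved.

```shell
workdir="$(mktemp -d)"
git init --quiet "$workdir/my-project"
cd "$workdir/my-project"
git config user.name "Example User"        # placeholder identity, this repo only
git config user.email "user@example.com"

# A first commit on main so there is something to branch from.
echo "# my-project" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main

# Experiment on a separate branch.
git checkout --quiet -b new-analysis
echo "print('new idea')" > new_analysis.py     # hypothetical file
git add new_analysis.py
git commit --quiet -m "Try new analysis"

# main is untouched until the branches are merged.
git checkout --quiet main
ls                                # only README.md is present on main
git merge --quiet new-analysis    # main now includes the new commit
git branch -d new-analysis        # -d refuses to delete an unmerged branch
git log --oneline
```

Because main had no new commits of its own, the merge here is a simple fast-forward. If both branches had changed the same lines, git would stop and ask you to resolve the conflict.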
13.5. Working with others
Tip: Start small — just version-control your code, README, and environment files. Collaboration and branching come naturally once you’re comfortable.
13.5.1. Forking
If you want to contribute to someone else’s repository but don’t have write access:
Fork it on GitHub (creates your copy).
Clone your fork locally:
git clone https://github.com/yourname/their-project.git
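A common convention, though not something the steps above require, is to also register the original project as a second remote named upstream, so that your fork can pick up changes made there. The sketch below assumes local bare repositories standing in for the original project and for your fork. On GitHub the upstream address would be the original repository’s URL instead.

```shell
workdir="$(mktemp -d)"

# Assumption: build a stand-in "original project" containing one commit.
git init --quiet "$workdir/seed"
cd "$workdir/seed"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "# their-project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main
git clone --quiet --bare "$workdir/seed" "$workdir/their-project.git"      # the original
git clone --quiet --bare "$workdir/their-project.git" "$workdir/fork.git"  # your fork

# Clone your fork; it is registered automatically as "origin".
git clone --quiet "$workdir/fork.git" "$workdir/their-project"
cd "$workdir/their-project"

# Register the original project as "upstream" and fetch its history.
git remote add upstream "$workdir/their-project.git"
git remote -v
git fetch --quiet upstream
git merge --quiet upstream/main    # bring your local main up to date
```

The names origin and upstream are only labels. What matters is that you push to the repository you can write to (your fork) and fetch from the one you cannot.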
13.5.2. Pull requests
When you’ve made changes in your fork and want the original repo to adopt them:
Push your branch to your fork.
On GitHub, open a Pull Request (PR) from your branch into the main project.
Project maintainers will review, discuss, and merge if appropriate.
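The command-line half of that process might look like the sketch below: make the change on its own branch and push that branch to your fork. A local bare repository stands in for the fork, and the branch name, identity and file contents are hypothetical. Opening the pull request itself still happens on the GitHub website.

```shell
workdir="$(mktemp -d)"

# Assumption: a local bare repository stands in for your fork on GitHub.
git init --quiet --bare "$workdir/fork.git"

git init --quiet "$workdir/their-project"
cd "$workdir/their-project"
git config user.name "Example User"        # placeholder identity, this repo only
git config user.email "user@example.com"
git remote add origin "$workdir/fork.git"
echo "# their-project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main
git push --quiet origin main

# Make the change on its own branch rather than on main.
git checkout --quiet -b expand-readme          # hypothetical branch name
echo "Usage notes go here." >> README.md
git commit --quiet -am "Expand README"
git push --quiet -u origin expand-readme       # -u links the local and remote branch

git branch -vv     # shows which remote branch each local branch tracks
```

Keeping each pull request on a separate branch means later review comments can be addressed with further commits and pushes to that same branch.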