14.1. Durham University#

14.1.1. Accessing Hamilton, the Durham HPC#

The main centrally run supercomputer at Durham is Hamilton, for which the main page can be found here. It is free to access for Durham researchers and just requires registration, which can be found on the Hamilton website. The Advanced Research Computing (ARC) team also runs regular training sessions on using Hamilton and other topics related to scientific computing.

NB: if you are interested in performing deep learning or other GPU-heavy tasks, you may instead be interested in the Bede Supercomputer, the regional GPU supercomputer. Its application process is slightly more involved, but still not arduous.

14.1.2. Accessing Hamilton via the Terminal#

14.1.2.1. Using ssh#

Within the terminal, you can log into Hamilton via the command:

ssh <username>@hamilton8.dur.ac.uk

Here, <username> is your University user ID. Note that Hamilton access requires you to be within the University network. If you are connecting from off campus, access.durham.ac.uk provides a convenient VPN - just turn it on before ssh-ing in.

You can transfer data to Hamilton using the scp command:

scp <filename> <username>@hamilton8.dur.ac.uk:<destination-path>

There are also other, possibly slightly more convenient, methods available to you - for example rsync, or the VS Code workflow described below.
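For instance, rsync can resume interrupted transfers and only copies files that have changed. A minimal sketch, assuming rsync is installed on both your machine and Hamilton, copying into your scratch space:

# -a preserves permissions/timestamps, -v is verbose, -P shows progress and allows resuming
rsync -avP <filename> <username>@hamilton8.dur.ac.uk:/nobackup/<username>/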

14.1.2.2. Making ssh access convenient#

14.1.2.3. SSH config#

To avoid having to type in the full ssh command every single time, it is worth setting up an ssh config file. You can create a file at $HOME/.ssh/config (where $HOME is your home directory), and add a configuration block:

Host hamilton
  HostName hamilton8.dur.ac.uk
  User YOURUSERNAME

Now you can write ssh hamilton instead of ssh <username>@hamilton8.dur.ac.uk.

(Of course, you can also set up an alias in your bashrc or zshrc file - e.g. alias hamilton="ssh <username>@hamilton8.dur.ac.uk", so now you can just type hamilton to get on board!)
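The Host entry in your ssh config is also picked up by scp and rsync, so file transfers become shorter too. For example (the destination path here is just illustrative):

# copy a local file into your Hamilton scratch space using the config alias
scp <filename> hamilton:/nobackup/<username>/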

14.1.2.4. SSH keys#

With a bit more effort, you can even skip having to type in your password - instead authenticating your login via a public key.

First, check whether you already have a keypair by looking in your ~/.ssh directory for files such as id_rsa and id_rsa.pub, or id_ed25519 and id_ed25519.pub (ed25519 is a newer format than RSA, but both should work).

If not, create a keypair as follows:

ssh-keygen -t rsa

The command line will ask a few questions, which you can mostly press Enter through. It will ask for a passphrase - you can give one if you want, or leave it empty for a fully passwordless login. The latter leaves you at risk if your laptop is broken into, of course.
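If you would rather use the newer ed25519 format mentioned above, the equivalent command is:

ssh-keygen -t ed25519

The remaining steps are the same - just substitute id_ed25519.pub for id_rsa.pub below.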

Once you have a key, you can copy your public key to Hamilton:

ssh-copy-id hamilton

(Assuming you have set up hamilton as a config host name.) If ssh-copy-id isn’t available, you can also do it the old-fashioned way:

cat ~/.ssh/id_rsa.pub | ssh <username>@hamilton8.dur.ac.uk "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"

Either way, your public key should now be on your Hamilton home directory in ~/.ssh/authorized_keys, and you shouldn’t need your password to log in.

If you set a passphrase, you will still need to enter it - but you can add your key to ssh-agent, in which case you only need to enter the passphrase once per session:

eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa

14.1.2.5. An additional note - VS Code#

If you have been using VS Code, it is possible to set up the “Remote - SSH” extension. From here, you can open a new VS Code window that is ssh-ed into your Hamilton home directory, allowing you to edit your code and run the terminal from within VS Code. This is my preferred way of interacting with the HPC - this is a good guide.

14.1.3. Using the HPC#

Once logged in, you will see that you have a home directory at /home/username with 10GB of storage space, and a scratch space of 600GB at /nobackup/username. More here.
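If you want to keep an eye on how much of this space you are using, standard Linux tools are enough - a quick sketch (nothing Hamilton-specific here):

du -sh $HOME            # total size of your home directory (10GB quota)
du -sh /nobackup/$USER  # total size of your scratch space (600GB)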

14.1.3.1. Login nodes#

By default, when you log in, you will be on the login node. This is a low-resource node shared by everyone: it’s totally fine to use it to manage your files, edit code, manage your environments, and so on, but doing any serious processing on it will make life harder for everyone else and will likely get you told off.

Regardless, take the opportunity to (re)set up your environment: installing Python using Miniforge; setting up your conda environment; getting your bashrc file (Hamilton uses bash) exactly how you like it; moving data across; and so on.
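As a minimal sketch of the environment step - assuming you have installed Miniforge, and using the geospatial environment name that the batch script below activates (the package list is purely illustrative):

# create and activate a conda environment on the login node
conda create -n geospatial python=3.12 numpy pandas geopandas
conda activate geospatial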

14.1.3.2. Running code on compute nodes#

The official Hamilton guide on this is located here.

To run serious programs, you will need to create a .sh batch script that you submit to the job scheduler, SLURM.

Let’s say you have a serious_python_script.py. First, make sure the script is executable (chmod u+x serious_python_script.py).

Then, you could run the script by creating a file called my_batch_job.sh which reads as follows:

#!/bin/bash
 
# Request resources:
#SBATCH -c 1           # 1 CPU core
#SBATCH --mem=1G       # memory required, up to 250G on standard nodes.
#SBATCH --time=1:0:0   # time limit for job (format:  days-hours:minutes:seconds)
#SBATCH --gres=tmp:1G  # temporary disk space required on the compute node ($TMPDIR),
                       # up to 400G
#SBATCH -p shared      # Run in the 'shared' queue (job may share node with other jobs)
 
# Commands to be run: 
conda activate geospatial
cd ~/path/to/my/script/directory
python serious_python_script.py

The first part of the file is your SLURM instructions. Note that I have requested a fairly tame amount of resources (arguably not worth using an HPC for). However, smaller requests get served quickest when space becomes available, so it is not worth asking for more than you need. Additionally, your processing might not be about sheer memory - it could be about needing to run a moderate task for a long time and not wanting your own machine overheating for days in the meantime. The exception to this is the number of cores, for which it is almost always worth choosing ‘1’ when using Python, unless you know what you’re doing with libraries that go beyond serial processing, such as multiprocessing, joblib, mpi4py, etc., or are using libraries that parallelise for you (e.g. dask or tensorflow/pytorch).

Note that -p selects your “queue” (partition), the names of which are specific to Hamilton. shared is the default queue, with 119 nodes and a job time limit of 3 days. test has 1 node and a time limit of 15 minutes - for short tests, you might want to submit to this queue as the wait time will be far shorter. long goes up to 7 days and bigmem allows >250GB memory, but both of these are more competitive and will require longer wait times. Don’t ask for more than you need!
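You can list the queues available to you, along with their time limits and node counts, using SLURM’s sinfo command:

sinfo -s    # summarised view: one line per partition, with time limit and node counts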

The second part of the file is the .sh script of commands you want to run. When the workload manager wakes a node and begins running your script, you have to imagine that you have just logged in for the first time - hence, I have activated my conda environment and cd’d to my desired directory.

Once you’re happy, you can submit the job to the queue with the command sbatch my_batch_job.sh.
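When you submit, sbatch prints the ID of the newly queued job, which you will want for the monitoring commands below. The output looks something like this (job number illustrative, matching the squeue example below):

sbatch my_batch_job.sh
Submitted batch job 3141717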

14.1.3.3. Further SLURM stuff#

Once you have submitted a job to the queueing system with the sbatch command, you will be given a job ID, as shown above (you can also ask SLURM to email you status updates via directives in your batch script - see the sketch below). You can check how your submitted jobs are doing in the queue using squeue:

squeue -u myusername
             JOBID PARTITION     NAME       USER ST       TIME  NODES NODELIST(REASON)
           3141717    shared myscript   foobar22 PD       0:00      1 (Resources)

This, for instance, shows that your script is waiting for resources to become free. scancel <jobid> can be used to cancel your job if you change your mind.
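As mentioned above, you can ask SLURM to email you when your job starts, finishes, or fails by adding directives to the top of your batch script - a minimal sketch (the address is a placeholder):

#SBATCH --mail-type=BEGIN,END,FAIL        # which events trigger an email
#SBATCH --mail-user=<your-email-address>  # where to send it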

When the job has started running, a file called slurm-<JOBID>.out will be created that contains any printed output from your script - useful so you can keep track of things. You can alter where this file is written using the #SBATCH --output directive in your SLURM file.
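For example, to collect your logs in one place (the directory name is just a suggestion, and it must already exist before the job starts):

#SBATCH --output=logs/%x_%j.out   # %x expands to the job name, %j to the job ID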

You can also run interactive jobs, if you would like to work interactively with more resources than the login node should be used for. You can do this via e.g.

srun --pty --mem=2G -c 1 -p test bash

This request goes through the same queue as batch jobs, however, so you may still have to wait. You could also replace bash with another command.
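For example, you could run the earlier Python script directly on a compute node (assuming the relevant conda environment is already active in the shell you launch from):

srun --mem=2G -c 1 -p test python serious_python_script.py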

14.1.4. A convenient GUI, OpenOnDemand#

There now exists a convenient alternative to CLI ssh login, known as the Hamilton Portal, an instance of OpenOnDemand. This interface allows you to log in through an online GUI, monitor your filespace and jobs, and even access interactive apps including Jupyter Notebooks. This is by far the recommended way to use Jupyter Notebooks on Hamilton, but it is also useful for other tasks.

To get started with the portal, navigate to portal.hamilton8.dur.ac.uk and enter your username and password when requested.