14.1. Durham University#
14.1.1. Accessing Hamilton, the Durham HPC#
The main centrally run supercomputer at Durham is Hamilton, for which the main page can be found here. This is free to access for Durham researchers and just requires registration via the Hamilton website. The Advanced Research Computing (ARC) team also runs regular training sessions on using Hamilton and other topics related to scientific computing.
NB: if you are interested in performing deep learning or other GPU-heavy tasks, you may instead be interested in the Bede Supercomputer, the regional GPU supercomputer. It requires a slightly more involved application process, but nothing arduous.
14.1.2. Accessing Hamilton via the Terminal#
14.1.2.1. Using ssh#
Within the terminal, you can log into Hamilton via the command:
ssh <username>@hamilton8.dur.ac.uk
Note that Hamilton access requires you to be within the University network. If you are connecting from off campus, access.durham.ac.uk provides a convenient VPN - just turn it on before ssh-ing in.
Where <username> is your University user ID. You can transfer data using the scp command:
scp <filename> <username>@hamilton8.dur.ac.uk:<remote destination>
There are other, possibly slightly more convenient, methods available to you as well.
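For example, rsync works over the same ssh connection and only transfers files that have changed, which is handy for larger directories. A minimal sketch, assuming rsync is available at both ends and using a hypothetical mydata directory:
# -a preserves permissions/timestamps, -v is verbose, -z compresses in transit
rsync -avz mydata/ <username>@hamilton8.dur.ac.uk:/nobackup/<username>/mydata/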
14.1.2.2. Making ssh access convenient#
14.1.2.3. SSH config#
To avoid having to type in the full ssh command every single time, it is worth setting up an ssh config file. You can create a file at $HOME/.ssh/config (where $HOME is your home directory), and add a configuration block:
Host hamilton
HostName hamilton8.dur.ac.uk
User YOURUSERNAME
Now you can write ssh hamilton instead of ssh <username>@hamilton8.dur.ac.uk.
(Of course, you can also set up an alias in your bashrc or zshrc file - e.g. alias hamilton="ssh <username>@hamilton8.dur.ac.uk" - so now you can just type hamilton to get on board!)
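A nice side effect is that the config host name works anywhere ssh does, including scp, so transfers get shorter to type too. A quick sketch with hypothetical file names:
scp results.csv hamilton:/nobackup/<username>/   # push a file to your scratch space
scp hamilton:/nobackup/<username>/output.log .   # pull a file back to the current directory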
14.1.2.4. SSH keys#
With a bit more effort, you can even skip having to type in your password - instead authenticating your login via public key.
First, check whether you already have a keypair by looking in your ~/.ssh directory for files such as id_rsa, id_rsa.pub, id_ed25519, or id_ed25519.pub (ed25519 is a newer format than rsa, but both should work).
If not, create a keypair as follows:
ssh-keygen -t rsa
The command line will ask a few questions, which you can mostly press Enter through. It will ask for a passphrase - you can give one if you want, or leave it empty for a fully passwordless login. The latter leaves you at risk if your laptop is broken into, of course.
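If you would rather use the newer ed25519 format mentioned above, the equivalent command is:
ssh-keygen -t ed25519
This creates id_ed25519 and id_ed25519.pub instead, and the steps below are otherwise unchanged (just swap the file names).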
Once you have a key, you can copy your public key to Hamilton:
ssh-copy-id hamilton
(Assuming you have set up hamilton as a config host name.) If ssh-copy-id isn't available, you can also do it the old-fashioned way:
cat ~/.ssh/id_rsa.pub | ssh <username>@hamilton8.dur.ac.uk "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"
Either way, your public key should now be in ~/.ssh/authorized_keys in your Hamilton home directory, and you shouldn't need your password to log in.
If you set a passphrase, you will still need that - but you can add your key to ssh-agent, in which case you will only need to enter the passphrase once per reboot:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa
14.1.2.5. An additional note - VS Code#
If you have been using VS Code, it is possible to set up the "Remote - SSH" extension. From here, you can open a new VS Code window that is ssh-ed in to your Hamilton home directory, allowing you to edit your code and run the terminal from within VS Code. This is my preferred way of interacting with the HPC - this is a good guide.
14.1.3. Using the HPC#
Once in, you will see you have a home directory /home/username with 10GB of storage space and a scratch space of 600GB at /nobackup/username. More here.
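To keep an eye on how much of that space you are using, the standard Linux tools are enough (a rough sketch - the Hamilton documentation linked above may describe its own quota commands):
du -sh ~            # total size of your home directory
df -h /nobackup     # usage of the filesystem holding your scratch space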
14.1.3.1. Login nodes#
By default, when you log in, you will be on the login node. This is a low-resource shared node, so you shouldn't use it for anything serious. It's totally fine for managing your files, editing code, managing your environments, and so on, but doing any heavy processing here will make life harder for everyone else and will likely get you told off.
Regardless, take the opportunity to (re)set up your environment: installing Python using Miniforge; setting up your conda environment; getting your bashrc file (Hamilton uses bash) exactly how you like it; moving data across; and so on.
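As a rough sketch of that first-time setup (the installer URL follows the pattern given in the Miniforge README, and geospatial is simply the environment name used in the batch script below):
# Download and run the Miniforge installer in your home directory
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
# Create and activate an environment for your work
conda create -n geospatial python
conda activate geospatial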
14.1.3.2. Running code on compute nodes#
The official Hamilton guide on this is located here.
To run serious programs, you will need to create a .sh batch script that you submit to the SLURM scheduler.
Let’s say you have a serious_python_script.py. First, make sure the script is executable (chmod u+x serious_python_script.py).
Then, you could run the script by creating a file called my_batch_job.sh which reads as follows:
#!/bin/bash
# Request resources:
#SBATCH -c 1 # 1 CPU core
#SBATCH --mem=1G # memory required, up to 250G on standard nodes.
#SBATCH --time=1:0:0 # time limit for job (format: days-hours:minutes:seconds)
#SBATCH --gres=tmp:1G # temporary disk space required on the compute node ($TMPDIR),
# up to 400G
#SBATCH -p shared # Run in the 'shared' queue (job may share node with other jobs)
# Commands to be run:
conda activate geospatial
cd ~/path/to/my/script/directory
python serious_python_script.py
The first part of the file is your SLURM instructions. Note that I have requested a fairly tame amount of resources (arguably not worth using an HPC for). However, in the queue, this will get served the quickest when space is available, so it is not worth asking for more than you need. Additionally, your processing might not be about sheer memory - it could be about needing to run a moderate task for a long time and not wanting your own machine overheating for days in the meantime. The exception to this resource choice is the number of cores, for which it is almost always worth choosing '1' when using Python, unless you know what you're doing with libraries that go beyond serial processing, such as multiprocessing, joblib, mpi4py, etc., or are using libraries that do (e.g. dask or tensorflow/pytorch).
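If you do have a genuinely parallel workload, the idea is to request the cores in the header and size your worker pool from the allocation rather than hard-coding it. A sketch (parallel_script.py is a hypothetical script):
#SBATCH -c 4    # request 4 CPU cores for a multiprocessing/joblib workload
# SLURM exposes the allocation as $SLURM_CPUS_PER_TASK; inside Python you could use
# multiprocessing.Pool(int(os.environ["SLURM_CPUS_PER_TASK"])) so the pool matches the request
python parallel_script.py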
Note that -p selects your "queue" (partition), the names of which are specific to Hamilton. shared is the default queue, with 119 nodes and a job time limit of 3 days. test has 1 node and a time limit of 15 minutes - for short tests, you might want to submit to this queue, as the wait time will be far shorter. long goes up to 7 days and bigmem allows >250GB of memory, but both of these are more competitive and will involve longer wait times. Don't ask for more than you need!
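For a quick sanity check of a new script, you might therefore just swap the relevant header lines, something like this (the rest of the batch file stays the same):
#SBATCH -p test          # submit to the short 'test' queue instead of 'shared'
#SBATCH --time=0:15:0    # stay within its 15-minute limit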
The second part of the file is the .sh script of commands you want to run. When the workload manager wakes a node and begins running your script, you have to imagine that you have just logged in for the first time. Hence, I have activated my conda environment and cd-ed to my desired directory.
Once you're happy, you can submit the job to the queue with the command sbatch my_batch_job.sh.
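Submission is instant and simply reports the id of the newly queued job, something like:
sbatch my_batch_job.sh
Submitted batch job 3141717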
14.1.3.3. Further SLURM stuff#
Once you have submitted a job to the queueing system with the sbatch command, you will be provided with a job number and/or job id (you can also set this to be emailed to you, with updates if wanted, in your SLURM file). You can check how your submitted runs are doing in the queue using squeue:
squeue -u myusername
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3141717 shared myscript foobar22 PD 0:00 1 (Resources)
This, for instance, shows that your script is waiting for resources to become free. scancel <jobid> can be used to cancel your job if you change your mind.
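As a sketch, using the job id from the example above (sacct needs job accounting to be enabled, as it usually is on SLURM clusters):
scancel 3141717                                          # cancel the job
sacct -j 3141717 --format=JobID,State,Elapsed,MaxRSS     # check its state and usage once it has run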
When the job has started running, a file called slurm-<JOBID>.out will be created that contains any printed output from your script - useful for keeping track of things. In the SLURM file, you can alter where this output will be written.
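For reference, the relevant header options look something like this (a sketch - %x and %j expand to the job name and id, the directory must already exist, and the email address is a placeholder):
#SBATCH --output=slurm_logs/%x_%j.out    # write the printed output here instead
#SBATCH --mail-type=END,FAIL             # email when the job finishes or fails
#SBATCH --mail-user=your.name@durham.ac.uk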
You can also run interactive jobs, if you would like to work interactively with more resources than a login node can offer. You can do this via e.g.
srun --pty --mem=2G -c 1 -p test bash
This will be put in the same queue, however. You could also replace bash with another command.
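For example, you could run the script itself directly under srun rather than opening a shell - a sketch, reusing the hypothetical script from earlier:
srun --mem=2G -c 1 -p test python serious_python_script.py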
14.1.4. A convenient GUI, OpenOnDemand#
There now exists a convenient alternative to CLI ssh login known as the Hamilton Portal, an instance of OpenOnDemand. This interface allows you to log in through an online GUI, monitor your filespace and jobs, and even access interactive apps including Jupyter Notebooks. This is by far the recommended way to use Jupyter Notebooks on Hamilton, but it is also useful for other tasks.
To get started with the portal, navigate to portal.hamilton8.dur.ac.uk and enter your username and password when requested.