1. Introducing the command line
Interacting with programmatic data requires a basic understanding of how to use a command line interface (CLI), i.e. interacting with our computer using text-based commands.
We do this through a program known as a ‘terminal’, which runs a program known as a ‘shell’; common shells include bash and zsh. You can probably ignore such details here: for our purposes, phrases such as ‘terminal’, ‘shell’, and ‘console’ are basically synonymous.
You may not end up interacting with the terminal much at all, so don’t be put off by the fact that we have to start this journey with possibly the most low-level technical part. However, if you progress to a more advanced level, this will become essential knowledge!
1.1. Opening the command line
1.1.1. Linux and MacOS
Linux and MacOS are closely related operating systems, part of a family known as ‘Unix’ (or Unix-like) operating systems. Even though Windows is the most popular consumer OS, Unix-like operating systems form the basis of most computing infrastructure (web, high-performance computing, etc.), as well as nearly all scientific computing.
On MacOS and most Linux distributions, you can access the command line through an application called ‘Terminal’. On some Linux distributions, it may also be called ‘Konsole’ or ‘gnome-terminal’ (although if you are using those distributions you probably don’t need this guide). Some power users choose to install their own alternative terminal applications (iTerm2 is quite popular on MacOS), but this isn’t necessary for our purposes at this stage.
1.1.2. Windows
Windows is not a Unix-like OS, and so it has its own distinct command line interface for interacting with the Windows OS, most commonly ‘Command Prompt’ (cmd) or ‘PowerShell’. However, these are distinct programs with their own languages and tools. Nearly all scientific computing documentation and tutorials, including this one, assume a Unix-like CLI. The differences range from how filepaths are formatted (e.g. /users/documents/file.txt on a Unix system vs. C:\Users\Documents\file.txt on Windows) through to the actual commands used. At some point (usually quickly), attempting to perform scientific computing on a Windows machine will cause you some problems.
Luckily, Windows now has an official way to install a Unix-like subsystem called Windows Subsystem for Linux (WSL). This allows you to run a Linux environment on your Windows machine, enabling you to interact with a command-line interface in the same way as described in this repo and most scientific computing guides.
WSL can come preinstalled on some setups, but if not, information on installing WSL can be found on the official Microsoft website. I am not a Windows user, so cannot offer any specific advice, but if you end up installing it let me know if there’s anything worth adding to this guide (especially if it’s related to Durham computing specifically!)
1.2. Basic Interactions
1.2.2. Some introductory commands
Now that we’re comfortable moving around, here are some other common beginner commands:
mkdir: Makes a directory, e.g. mkdir new_directory.
echo: Prints text to the terminal window, e.g. echo hello world.
touch: Creates a new empty file, e.g. touch new_file.txt.
mv: Moves or renames a file or directory, e.g. mv new_directory ~/newlocation/new_name_of_directory.
rm: Deletes a file, e.g. rm new_file.txt. On its own, rm will not delete a directory; that needs the -r flag, described below.
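To try these commands without risk, here is a minimal sketch that runs them inside a throwaway directory made with mktemp -d. That command is not covered above; it is used here so that nothing in your home directory is touched. The file and directory names follow the examples above, except that newlocation is created inside the scratch directory rather than in ~.

```shell
# Work inside a temporary scratch directory, so nothing else is affected.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# mkdir: make a new directory.
mkdir new_directory

# echo: print text to the terminal.
echo hello world

# touch: create a new, empty file (here, inside the new directory).
touch new_directory/new_file.txt

# mv: move and rename in one step. The destination's parent directory
# (newlocation) must already exist, so we create it first.
mkdir newlocation
mv new_directory newlocation/new_name_of_directory

# rm: delete the file, which travelled with the directory it was in.
rm newlocation/new_name_of_directory/new_file.txt

# Check what is left: just the renamed (now empty) directory.
remaining=$(ls newlocation)
echo "$remaining"

# Tidy up: return to the previous directory, then delete the scratch
# directory and everything in it (rm -r, explained in the next section).
cd - > /dev/null
rm -r "$scratch"
```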
1.2.3. Slightly more advanced commands
Most commands have a series of options you can activate via ‘flags’. rm is a useful example. For instance, -i activates ‘interactive mode’, where you are prompted for confirmation before each deletion. This is very useful for such a powerful command, which can cause damage by deleting files. -r deletes ‘recursively’, removing a directory along with all of its sub-directories and files. This is very dangerous: the command rm -r my_directory will blast away absolutely everything within my_directory without asking, and there is no ‘recycle bin’ to recover the files from. A slip of a finger could delete everything within your home directory or even your hard drive!
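The difference between rm with and without -r can be seen safely in a throwaway directory. This is a minimal sketch. my_directory and its contents are hypothetical, and mkdir -p (which creates any missing parent directories) and 2> /dev/null (which hides an error message) are extras not covered above. The interactive -i flag is shown only as a comment because it waits for you to type answers.

```shell
# Scratch directory containing my_directory/sub/notes.txt
# (mkdir -p creates any missing parent directories in one go).
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir -p my_directory/sub
touch my_directory/sub/notes.txt

# Without -r, rm refuses to delete a directory
# (2> /dev/null hides the error message it prints).
if rm my_directory 2> /dev/null; then
    plain_rm_deleted=yes
else
    plain_rm_deleted=no
fi

# With -r, the directory and everything inside it is deleted, no questions asked.
rm -r my_directory
if [ -e my_directory ]; then still_exists=yes; else still_exists=no; fi

# Flags can be combined: rm -ri asks for confirmation (answer y or n)
# as it goes. It needs you to type answers, so it is only a comment here:
#   rm -ri my_directory

cd - > /dev/null
rm -r "$scratch"
```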
Another useful thing to know is the use of * as a wildcard. It tells the shell that * can stand for any sequence of characters (including none). For instance, ls *.txt would list all .txt files in the current directory, rm report_* would delete all files beginning with report_, and cp *.jpg ~/pictures would copy all files ending in .jpg in the current working directory to the directory ~/pictures.
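Here is a small sketch of the three wildcard examples above, run on hypothetical files in a throwaway directory (pictures stands in for ~/pictures). It also shows a habit worth adopting: because the shell expands the wildcard before the command runs, echo can preview exactly what a pattern will match before you hand it to rm.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical example files and a destination directory.
touch notes.txt todo.txt report_jan.csv report_feb.csv cat.jpg holiday.jpg
mkdir pictures

# Preview what a pattern matches before using it with a destructive command.
echo *.jpg            # prints: cat.jpg holiday.jpg

ls *.txt              # lists notes.txt and todo.txt
rm report_*           # deletes report_jan.csv and report_feb.csv
cp *.jpg pictures     # copies both .jpg files into pictures/

# Record the results for checking.
txt_count=$(ls *.txt | wc -l | tr -d ' ')
if ls report_* > /dev/null 2>&1; then reports_left=yes; else reports_left=no; fi
pictures_count=$(ls pictures | wc -l | tr -d ' ')

cd - > /dev/null
rm -r "$scratch"
```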
Some more advanced commands that I use a fair amount include the following. I will not explain how to use them in detail, but leave you to google them if they are useful (or use the man command, which prints a ‘manual’ for any command, e.g. man ls):
curl or wget (depending on OS): Download files found at given URLs.
grep: Searches for fragments of text within larger bodies of text.
wc: Counts the number of words, lines, characters, etc. in a file or output.
| (pipe): Links commands together by passing the output of one command as the input to the next. For instance, ls | wc -l ‘pipes’ the output of ls into wc -l, counting the number of lines in the ls output. As ls prints one entry per line when its output is piped, this gives a count of the (non-hidden) files and directories in your working directory.
> (right angle bracket): Sends the output into whatever is specified (usually a file) instead of the terminal, overwriting any existing contents, e.g. ls > my_directories.txt.
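Below is a minimal sketch combining the pipe, output redirection, grep and wc in a throwaway directory with hypothetical contents. It also uses < (feeding a file in as input), which is not covered above. Note the subtle point it demonstrates: the shell creates my_directories.txt before ls runs, so the saved listing includes the file itself.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch alpha.txt beta.txt
mkdir gamma

# Pipe: when its output goes into a pipe, ls prints one entry per line,
# so wc -l counts the entries (3 here). tr -d ' ' strips the padding
# that some versions of wc add.
entries=$(ls | wc -l | tr -d ' ')

# Redirect: save the listing to a file instead of printing it.
# The shell creates my_directories.txt before ls runs, so the file
# appears in its own listing (4 lines).
ls > my_directories.txt
# < feeds the file to wc as input, so wc prints only the number.
saved_lines=$(wc -l < my_directories.txt | tr -d ' ')

# grep: print the lines of the file that contain "beta".
found=$(grep beta my_directories.txt)

echo "$entries entries; $saved_lines lines saved; grep found: $found"

cd - > /dev/null
rm -r "$scratch"
```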
It is also possible to view and edit files within the command line using editors such as vim. I mention this only to note that it is possible: terminal editors tend to suit advanced power users. I prefer to edit my code using GUI software (specifically, VS Code), and will recommend this in a later section.
There’s plenty more, but I will leave you to explore online material for help.
1.3. Shell scripts (.sh files)
A shell script is a file (conventionally ending in .sh) that contains a series of commands for a Unix-style command-line interpreter (a shell) to execute. We can use these to automate repetitive tasks. For instance, let’s pretend we have a file called counthome.sh that prints an estimate of the number of files in our home directory:
#!/bin/zsh
# `ls -1` lists each file on a new line, and `wc -l` counts the lines.
count=$(ls -1 ~ | wc -l)
# Print the result to the terminal.
echo "You have $count files in your home directory."
Note a couple of new techniques here. The first is #, which indicates a ‘comment’: the computer ignores these lines when running, allowing us to make notes for ourselves. The second is setting a variable (count=), which allows us to refer to it later by writing $count. The top line (#!/bin/zsh), known as a ‘shebang’, tells the system which shell to use when the script is run directly, in this case zsh. At our level, a simple script like this would run just as well under other shells such as bash (you would then write #!/bin/bash).
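Here is a small sketch of the two variable techniques used in counthome.sh: plain assignment, and capturing a command’s output with $( ... ). The variable names name, words and greeting are illustrative only. The quoting behaviour shown (double quotes expand $variables, single quotes do not) is standard shell behaviour.

```shell
# Assigning a variable: no spaces are allowed around =
# (name = "world" would make the shell look for a command called "name").
name="world"

# $( ... ) runs a command and stores its output, just as count=$(...)
# does in counthome.sh. Here wc -w counts the words it is given.
words=$(echo one two three | wc -w | tr -d ' ')

# Use $ to read a variable back. Double quotes allow this; single quotes
# would print the text $name literally.
greeting="hello $name"
echo "$greeting"              # hello world
echo "There are $words words" # There are 3 words
```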
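Before we can run the file, we need to give it permission to be executed, using chmod +x counthome.sh (this only needs doing once). Now we can run the file by entering ./counthome.sh. Note that ./ is necessary, meaning ‘the file within the current directory’: by default, the shell does not look in the current directory for commands. When we do so, only the output of the echo statement is printed to the terminal.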
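This sketch shows the full cycle in a throwaway directory. It writes counthome.sh using a ‘here document’ (everything up to the EOF line), makes the file executable with chmod +x, which is needed because newly created files are normally not executable, and then runs it with ./. As an assumption, the shebang is swapped to #!/bin/sh so the sketch also works where zsh is not installed; the body of the script is unchanged.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Write the script to a file. Quoting 'EOF' stops $count and ~ from
# being expanded while the file is written.
cat > counthome.sh << 'EOF'
#!/bin/sh
# `ls -1` lists each file on a new line, and `wc -l` counts the lines.
count=$(ls -1 ~ | wc -l)
# Print the result to the terminal.
echo "You have $count files in your home directory."
EOF

# At this point ./counthome.sh would fail with a "permission denied" error.
chmod +x counthome.sh

# Run it: ./ means "the counthome.sh in the current directory".
output=$(./counthome.sh)
echo "$output"

cd - > /dev/null
rm -r "$scratch"
```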
This is a simple and relatively useless example, but later (when learning about running code on the HPC) we will make use of more practical .sh scripts.
1.4. Run commands (rc file)
Every time you start a new shell, it will ‘forget’ all the instructions, variables, and commands you gave it in the previous session. As you become more experienced with the command line, there may be a number of commands and instructions you want to have at your fingertips.
An rc file (e.g. ~/.bashrc or ~/.zshrc, depending on your shell) is effectively a shell script that is run every time you start a new shell, e.g. when you open a new terminal window. This allows you to set up your environment and preferences automatically. If the file doesn’t already exist, you can create an empty one and edit it in a text editor (e.g. cd ~, then touch .bashrc). The . prepended to the file name indicates that this is a ‘hidden file’, which is not shown by default in e.g. file managers or by the ls command (ls -a will show it). This does mean that finding and editing it in a graphical file manager takes an extra step, which you can google according to your OS.
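Here is a minimal sketch of how hidden files behave with ls, using hypothetical file names in a throwaway directory. It also uses ls -A, a variant of -a that leaves out the . and .. entries (this directory and its parent), which makes the entries easier to count.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical files: one normal, one hidden (its name starts with a dot).
touch visible.txt .hidden_settings

ls      # shows only visible.txt
ls -a   # also shows .hidden_settings, plus . and ..

# -A is like -a but leaves out . and .., which makes counting easier.
visible_count=$(ls | wc -l | tr -d ' ')
all_count=$(ls -A | wc -l | tr -d ' ')

cd - > /dev/null
rm -r "$scratch"

# On your own machine, this lists any rc files in your home directory:
#   ls -a ~ | grep rc
```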
A useful example of what an rc file can do is managing your PATH. The PATH is a list of directories where the shell looks for commands (this is how it finds commands such as ls that you are already using!). As you create your own scripts and commands (such as counthome.sh above), it would be useful to run them from anywhere without typing the full filepath (e.g. ~/foo/bar/usefultool). As a result, people often put these scripts into a directory called bin in an accessible location (e.g. ~/bin/usefultool). This directory can in practice be called anything you like; it is just convention that it is called bin, as this communicates its function clearly when other people begin exploring your directories. We can add the ~/bin directory to the PATH using the command export PATH=$PATH:~/bin.
However, this only lasts for the current session, so without an rc file we would have to do this every time we open a new one. Instead, by including the line export PATH=$PATH:~/bin in your rc file, the ~/bin directory is added to your PATH automatically every time you start a new session. Now, your terminal will search the ~/bin directory when looking for a command to execute, and your usefultool will work anywhere (as long as it is executable)!
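This minimal sketch uses a throwaway directory as a stand-in for your home directory, so your real PATH and rc file are left alone. It creates a hypothetical usefultool in a bin directory, adds that directory to PATH for the current session, and then runs the tool by name from elsewhere. command -v, which reports where the shell found a command, is an extra not covered above. The rc-file step is shown only as a comment.

```shell
# A scratch directory stands in for your home directory here.
scratch=$(mktemp -d)
mkdir "$scratch/bin"

# A hypothetical tool: a tiny executable script placed in the bin directory.
cat > "$scratch/bin/usefultool" << 'EOF'
#!/bin/sh
echo "usefultool is working"
EOF
chmod +x "$scratch/bin/usefultool"

# PATH is a colon-separated list of directories, searched in order.
echo "$PATH"

# Add the bin directory for this session only
# (the equivalent of export PATH=$PATH:~/bin for a real ~/bin).
export PATH="$PATH:$scratch/bin"

# The tool can now be run by name from any directory.
cd /
tool_output=$(usefultool)
echo "$tool_output"
found_at=$(command -v usefultool)   # the full path the shell found
cd - > /dev/null

# To make the change permanent, append the line to your rc file instead,
# e.g. (not run here; >> adds to the end of the file rather than overwriting it):
#   echo 'export PATH=$PATH:~/bin' >> ~/.zshrc

rm -r "$scratch"
```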
Another useful way of using the rc file is defining aliases. Say you have a long command you use quite often, for instance changing to a directory with a very long filepath (e.g. ~/my/very/long/directory/path). We can use the alias command to save our fingers by creating a shortcut. Running alias gotodir="cd ~/my/very/long/directory/path" means that every time you run gotodir, the shell will instead run the longer command you specified. Once again, the shell will ‘forget’ these aliases when it is closed, so you will want to add your alias commands to your rc file.
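Here is a short sketch of defining and inspecting the example alias, using the same hypothetical path as above. Running alias with just a name prints its definition. The lines for saving it to an rc file and reloading that file with source are left as comments so your real ~/.zshrc is not modified.

```shell
# Define the alias for the current session. Because of the double quotes,
# the ~ is stored as-is and only expanded when the alias is used.
alias gotodir="cd ~/my/very/long/directory/path"

# alias followed by a name shows what that shortcut expands to.
alias gotodir
definition=$(alias gotodir)

# To keep the alias for future sessions, append the same line to your rc
# file (not run here), then open a new terminal or reload the file:
#   echo 'alias gotodir="cd ~/my/very/long/directory/path"' >> ~/.zshrc
#   source ~/.zshrc
```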
Later in this material, various parts of the Python installation will add their own lines to your rc file. Don’t be surprised if things start appearing that you haven’t added! It is always best to understand where these lines have come from and what they do, however, because lines are often added that conflict with each other or break if executed in the wrong order.