Simple Linux Utility for Resource Management (Slurm) is an open-source job scheduler used by high-performance computing (HPC) clusters to manage and allocate resources for jobs submitted by users. It provides features such as job queuing, resource allocation, and job execution according to a defined scheduling policy. The official documentation is a good starting point for getting hands-on with Slurm, and the official cheat sheet summarizes the available commands.
All of the following commands should be run from the master node, diufrd200, which is the entry point for the faith cluster.
Command | Description |
---|---|
`sbatch <script_name>` | Submit a job script to the queue, possibly for later execution |
`squeue` | Show the job queue |
`scancel <job_id>` | Cancel a queued or running job |
`sinfo` | Show the status of the nodes in the cluster |
It is a good idea to run `sinfo` first to check the available nodes and their partitions.
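A minimal first look at the cluster might therefore be something like the sketch below (the `GPU` partition name is only taken from the example script further down this page; adapt it to what `sinfo` actually reports):

```bash
# List all nodes with their partition and state (idle, mixed, alloc, down, ...)
sinfo

# Show only the nodes of one partition (partition name assumed, check the sinfo output)
sinfo --partition=GPU

# List your own jobs currently queued or running
squeue -u $USER
```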
Inside a Slurm job script (a bash script), `#SBATCH` directives define how the job should be handled by the scheduler. These lines must begin with `#SBATCH` and are placed at the top of the script. The table below shows the main options; the complete list can be found in the official documentation. Be careful about the default values that apply when options are not defined in the script.
Option | Description |
---|---|
`--job-name=NAME` | Set the name of the job |
`--nodelist=NODES` | Specify the host(s)/node(s) on which to launch the script |
`--partition=PARTITION` | Select which partition (queue) to submit to |
`--ntasks=N` | Number of tasks/processes to run |
`--cpus-per-task=N` | Number of CPU cores per task |
`--output=FILE` | File to which standard output is written |
`--error=FILE` | File to which standard error is written (optional) |
`--gres=gpu:N` | Request N GPUs (if available) |
`--mem=4G` | Request a specific amount of memory |
`--mail-user=EMAIL` | Email address for notifications |
`--mail-type=BEGIN,END,FAIL` | When to send email notifications |
More examples can be found here. The script below is a complete example: in the log file names, `%x` expands to the job name and `%j` to the job ID. Note that the `logs/` directory must exist before submission, otherwise the log files cannot be created.
```bash
#!/bin/bash
#SBATCH --job-name=my_nice_job
#SBATCH --nodelist=diufrd202
#SBATCH --partition=GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=logs/%x_%j.log
#SBATCH --error=logs/%x_%j.log
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH --mail-user=john.doe@unifr.ch
#SBATCH --mail-type=END,FAIL

python <python_script_name>
```
Then launch the bash script with `sbatch <bash_script_name>`.
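For reference, a typical submit-and-monitor cycle could look like the following sketch (the script name and the job ID 12345 are placeholders):

```bash
# Submit the job; Slurm replies with a line such as "Submitted batch job 12345"
sbatch my_nice_job.sh

# Check the state of your jobs (ST column: PD = pending, R = running)
squeue -u $USER

# Cancel the job if needed, using the job ID printed by sbatch
scancel 12345
```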
- Slurm does not print job output to the shell, which can be annoying when you want to follow progress or other output. You can continuously stream new lines of the log file to the terminal with the `tail -f <log_file>` command, as shown below.
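For example, with the `--output=logs/%x_%j.log` pattern used in the script above, and assuming the job got ID 12345:

```bash
# Print new lines of the log file as the job writes them (Ctrl+C to stop)
tail -f logs/my_nice_job_12345.log
```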
More examples and details can be found on the UniFR intranet page (only accessible through the UniFR VPN).