Slurm Small Documentation (Faith Cluster)

Overview

Simple Linux Utility for Resource Management, or Slurm, is an open-source job scheduler used by high-performance computing (HPC) clusters to manage and allocate resources for jobs submitted by users. It provides job queuing, resource allocation, and job execution based on a defined scheduling policy. The official documentation offers a quick, hands-on introduction to Slurm, and the official cheat sheet lists the available commands.

All of the following commands should be run from the master node, diufrd200, which is the entry point to the Faith cluster.

Basic Commands

| Command | Description |
|---|---|
| sbatch <script_name> | Submit a job script to the queue, possibly for later execution |
| squeue | Show the job queue |
| scancel <job_id> | Cancel a queued or running job |
| sinfo | Show the status of the nodes in the cluster |

It is a good idea to run sinfo first to check which nodes are available and which partitions they belong to.
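The commands above can be chained into a short session. The sketch below assumes that sbatch prints its usual confirmation line of the form "Submitted batch job <id>"; the helper name job_id_from_sbatch and the script name my_job.sh are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's
# confirmation line, e.g. "Submitted batch job 12345".
job_id_from_sbatch() {
    # The job ID is the last whitespace-separated field of the line.
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# Typical session on the master node (commented out so the snippet
# can be sourced on a machine without Slurm):
#   sinfo                                  # check nodes and partitions
#   out=$(sbatch my_job.sh)                # submit; my_job.sh is a placeholder
#   job_id=$(job_id_from_sbatch "$out")
#   squeue -j "$job_id"                    # show only this job
#   scancel "$job_id"                      # cancel it if needed
```

Keeping the job ID in a variable avoids copying it by hand from the terminal between squeue and scancel.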

Common #SBATCH Options in Job Script

Inside a Slurm job script (a bash script), #SBATCH directives define how the scheduler should handle the job. These lines must begin with #SBATCH and be placed at the top of the script, before any executable command. The table below shows the main options; the complete list can be found in the official documentation. Any option not set in the script falls back to its default value, so check that the defaults suit the job.

| Option | Description |
|---|---|
| --job-name=NAME | Set the name of the job |
| --nodelist=NODES | Specify the host/node on which to launch the script |
| --partition=PARTITION | Select the partition (queue) to submit to |
| --ntasks=N | Number of tasks/processes to run |
| --cpus-per-task=N | Number of CPU cores per task |
| --output=FILE | File to write standard output to |
| --error=FILE | File to write standard error to (optional) |
| --gres=gpu:N | Request N GPUs (if available) |
| --mem=4G | Request a specific amount of memory |
| --mail-user=EMAIL | Email address for notifications |
| --mail-type=BEGIN,END,FAIL | When to send email notifications |

Example Job Script for Python

More examples can be found here.

#!/bin/bash
#SBATCH --job-name=my_nice_job
#SBATCH --nodelist=diufrd202
#SBATCH --partition=GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=logs/%x_%j.log
#SBATCH --error=logs/%x_%j.log
#SBATCH --gres=gpu:1
#SBATCH --mem=4G
#SBATCH --mail-user=john.doe@unifr.ch
#SBATCH --mail-type=END,FAIL

python <python_script_name>

Then launch the bash script with sbatch <bash_script_name>.
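The example script writes its logs to a logs/ directory. Slurm generally does not create missing directories for --output or --error paths, so the job can fail right away if logs/ does not exist. A minimal sketch of a pre-submission check follows; the helper name ensure_log_dir is hypothetical, and it assumes the directive is written in the --output=FILE form used above.

```shell
#!/bin/bash
# Hypothetical helper: create the directory referenced by the
# "#SBATCH --output=..." line of a job script, if there is one.
ensure_log_dir() {
    script="$1"
    # Take the value after "--output=" on the first matching directive.
    out_path=$(sed -n 's/^#SBATCH[[:space:]]*--output=\([^[:space:]]*\).*/\1/p' "$script" | head -n 1)
    # No --output directive: nothing to create.
    [ -n "$out_path" ] || return 0
    # Create the parent directory of the log file (relative to the
    # current directory, which is where sbatch would be run from).
    mkdir -p "$(dirname "$out_path")"
}

# Usage on the master node (commented out; requires Slurm):
#   ensure_log_dir my_job.sh && sbatch my_job.sh
```

Running the helper from the same directory as the later sbatch call matters, because relative log paths are resolved from the submission directory.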

Tricks

  • Slurm does not print any job output to the shell, which can be annoying when we want to follow the progress of a job. To stream new lines of the log file to the terminal as they are written, use the tail -f <log_file> command.
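With the --output=logs/%x_%j.log pattern from the example script, %x expands to the job name and %j to the job ID, so the log path can be rebuilt once the job ID is known. A small sketch follows; the helper name log_path_for is hypothetical and the job ID 12345 is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: rebuild the log path produced by the
# "--output=logs/%x_%j.log" pattern.
#   $1 = job name (what %x expands to)
#   $2 = job ID   (what %j expands to)
log_path_for() {
    printf 'logs/%s_%s.log\n' "$1" "$2"
}

# Follow the log of job 12345 named my_nice_job (commented out, since
# tail -f blocks until interrupted with Ctrl-C):
#   tail -f "$(log_path_for my_nice_job 12345)"
```

If Python output shows up in the log only in large delayed chunks, output buffering is a likely cause; launching the script with python -u disables it.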

UniFR documentation

More examples and details can be found on the UniFR intranet page, which is only accessible through the UniFR VPN.
