Migration from SGE to Slurm


Overview

The HPC clusters at MPCDF use the Slurm job scheduler for batch job management and execution.

This reference guide provides information on migrating from SGE (Sun Grid Engine) to Slurm.

Common job commands

Command                               SGE                 Slurm
-------                               ---                 -----
Cluster status                        -                   sinfo
Job submission                        qsub <job_script>   sbatch <job_script>
Start an interactive job              qlogin or qrsh      srun <args> --pty bash
Job deletion                          qdel <job_ID>       scancel <job_ID>
Job status (all)                      qstat               squeue
Job status by job                     qstat -j <job_ID>   squeue -j <job_ID>
Job status by user                    qstat -u <user>     squeue -u <user>
Job status detailed                   qstat -j <job_ID>   scontrol show job <job_ID>
Show expected start time              qstat -j <job_ID>   squeue -j <job_ID> --start
Hold a job                            qhold <job_ID>      scontrol hold <job_ID>
Release a job                         qrls <job_ID>       scontrol release <job_ID>
Queue list / information              qconf -sql          scontrol show partition
Queue details                         qconf -sq <queue>   scontrol show partition <queue>
Node list                             qhost               scontrol show nodes
Node details                          qhost -F <node>     scontrol show node <node>
X forwarding                          qsh <args>          salloc <args> or srun <args> --pty
Monitor or review job resource usage  qacct -j <job_ID>   sacct -j <job_ID>
GUI                                   qmon                sview
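
For example, starting an interactive session and performing basic job control with the Slurm commands above might look like this (the resource requests and the job ID 123456 are placeholders, not site defaults):

# Start an interactive shell on a compute node (placeholder resource request)
srun --ntasks=1 --time=00:30:00 --pty bash

# Hold, release, and finally cancel a queued batch job (placeholder job ID)
scontrol hold 123456
scontrol release 123456
scancel 123456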

Job submission options in scripts

Option                            SGE (qsub)                      Slurm (sbatch)
------                            ----------                      --------------
Script directive                  #$                              #SBATCH
Job name                          -N <name>                       --job-name=<name>
Standard output file              -o <file_path>                  --output=<file_path>
Standard error file               -e <file_path>                  --error=<file_path>
Combine stdout/stderr to stdout   -j yes                          --output=<file_path> (without --error)
Working directory                 -wd <directory_path>            --workdir=<directory_path>
Request notification              -m <events>                     --mail-type=<events>
Email address                     -M <email_address>              --mail-user=<email_address>
Job dependency                    -hold_jid [job_ID | job_name]   --dependency=after:job_ID[:job_ID…] (also afterok, afternotok, afterany)
Copy environment                  -V                              --export=ALL (default)
Copy environment variable         -v <variable[=value][,…]>       --export=<variable[=value][,…]>
Node count                        -                               --nodes=<count>
Request specific nodes            -l hostname=<node>              --nodelist=<node[,node2[,…]]> or --nodefile=<node_file>
Processor count per node          -pe <count>                     --ntasks-per-node=<count>
Processor count per task          -                               --cpus-per-task=<count>
Memory limit                      -l mem_free=<limit>             --mem=<limit> (in megabytes, MB)
Minimum memory per processor      -                               --mem-per-cpu=<memory>
Wall time limit                   -l h_rt=<seconds>               --time=<hh:mm:ss>
Queue                             -q <queue>                      --partition=<queue>
Request specific resource         -l resource=<value>             --gres=gpu:<count> or --gres=mic:<count>
Job array                         -t <array_indices>              --array=<array_indices>
Licences                          -l licence=<licence_spec>       --licenses=<licence_spec>
Assign job to the project         -P <project_name>               --account=<project_name>
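
As a sketch of how several of these options combine, the following batch script and submission commands use placeholder values (the partition name, file names, e-mail address, and job ID are not MPCDF defaults):

#!/bin/bash -l
#SBATCH --job-name=my_test
#SBATCH --output=job_%j.out            # %j expands to the job ID
#SBATCH --error=job_%j.err
#SBATCH --partition=general            # placeholder partition name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --mem=4000                     # memory per node in MB
#SBATCH --time=08:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=YourID@some.domain

srun ./my_program                      # placeholder executable

# On the command line, dependent jobs and job arrays could be submitted as:
#   sbatch --dependency=afterok:<job_ID> post_process.sh
#   sbatch --array=1-10 array_job.sh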

Job environments

Information                    SGE                Slurm                    Comments
-----------                    ---                -----                    --------
Version                        -                  -                        Can be extracted with sbatch --version
Job name                       $JOBNAME           $SLURM_JOB_NAME
Job ID                         $JOBID             $SLURM_JOB_ID
Batch or interactive           $ENVIRONMENT       -
Submit host                    $SGE_O_HOST        $SLURM_SUBMIT_HOST
Submit directory               $SGE_O_WORKDIR     $SLURM_SUBMIT_DIR        Slurm jobs start from the submit directory by default
Node file                      $PE_HOSTFILE       -                        Path of the file that lists the nodes allocated to the job
Node list                      cat $PE_HOSTFILE   $SLURM_JOB_NODELIST      To get a list of nodes: scontrol show hostnames $SLURM_JOB_NODELIST
Hostname                       $HOSTNAME          $SLURM_SUBMIT_HOST
Job user                       $USER              $SLURM_JOB_USER
Job array index                $SGE_TASK_ID       $SLURM_ARRAY_TASK_ID
Queue name                     $QUEUE             $SLURM_JOB_PARTITION
Number of allocated nodes      $NHOSTS            $SLURM_JOB_NUM_NODES
Number of processes            $NSLOTS            $SLURM_NTASKS
Number of processes per node   -                  $SLURM_TASKS_PER_NODE
Requested tasks per node       -                  $SLURM_NTASKS_PER_NODE
Requested CPUs per task        -                  $SLURM_CPUS_PER_TASK
Scheduling priority            -                  $SLURM_PRIO_PROCESS
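
For SGE users who relied on $PE_HOSTFILE, the following sketch shows one way a batch script might report its Slurm environment and write an explicit host list (the file names and resource requests are placeholders):

#!/bin/bash -l
#SBATCH --job-name=env_demo
#SBATCH --output=env_demo.out
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:05:00

echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) in partition ${SLURM_JOB_PARTITION}"
echo "Running ${SLURM_NTASKS} tasks on ${SLURM_JOB_NUM_NODES} nodes"

# Expand the compact node list (e.g. node[01-02]) into one hostname per line,
# similar to the host file provided by $PE_HOSTFILE under SGE
scontrol show hostnames "${SLURM_JOB_NODELIST}" > hostfile.${SLURM_JOB_ID}
cat hostfile.${SLURM_JOB_ID}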

OpenMP applications may require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script.

Set OMP_NUM_THREADS

# Set the number of OpenMP threads per process if $SLURM_CPUS_PER_TASK is defined,
# otherwise fall back to a single thread
if [ -n "${SLURM_CPUS_PER_TASK}" ] ; then
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
else
    export OMP_NUM_THREADS=1
fi
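
For context, a minimal hybrid MPI/OpenMP batch script might combine this snippet with --cpus-per-task as sketched below (node counts, time limit, and program name are placeholder values, not MPCDF recommendations):

#!/bin/bash -l
#SBATCH --job-name=hybrid_test
#SBATCH --output=hybrid_test.out
#SBATCH --nodes=2                      # placeholder values
#SBATCH --ntasks-per-node=4            # MPI tasks per node
#SBATCH --cpus-per-task=8              # OpenMP threads per MPI task
#SBATCH --time=01:00:00

# Use the allocation to size the OpenMP thread pool
if [ -n "${SLURM_CPUS_PER_TASK}" ] ; then
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
else
    export OMP_NUM_THREADS=1
fi

srun ./my_hybrid_program               # placeholder executable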

Sample job scripts

SGE script

#!/bin/bash
#
#$ -N sge_test
#$ -j y
#$ -o test.output
# Current working directory
#$ -cwd
#$ -M YourID@some.domain
#$ -m bea
# Request 8 hours of run time
#$ -l h_rt=8:0:0
# Specify the project for the job
#$ -P your_project_name_here
# Set memory for the job
#$ -l mem=4G
echo "start job"
sleep 120
echo "bye"

Slurm script (1)

#!/bin/bash -l
# NOTE the -l flag!
#
#SBATCH -J slurm_test
#SBATCH -o test.output
#SBATCH -e test.output
# Default in Slurm
#SBATCH -D ./
#SBATCH --mail-user YourID@some.domain
#SBATCH --mail-type=ALL
# Request 8 hours of run time
#SBATCH -t 8:0:0
# Specify the project for the job
#SBATCH -A your_project_name_here
# Set memory for the job
#SBATCH --mem=4000
echo "start job"
sleep 120
echo "bye"
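
Assuming the Slurm script above were saved as slurm_test.sh (a placeholder file name), it could be submitted and monitored as follows:

sbatch slurm_test.sh                   # prints: Submitted batch job <job_ID>
squeue -j <job_ID>                     # status while the job is pending or running
scontrol show job <job_ID>             # detailed job information
sacct -j <job_ID>                      # resource usage after the job has finished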

Remarks

(1) #SBATCH -A can simply be ignored, as it is not used in the same way as in SGE at MPCDF.

More examples can be found on the home page of the general-purpose compute cluster Cobra and on the page with sample scripts.