Migration from SGE to Slurm


Overview

The HPC clusters at MPCDF use the Slurm job scheduler for batch job management and execution.

This reference guide provides information on migrating from SGE (Sun Grid Engine) to Slurm.

Common job commands

Command                               SGE                 Slurm
-------                               ---                 -----
Cluster status                        -                   sinfo
Job submission                        qsub <job_script>   sbatch <job_script>
Start an interactive job              qlogin or qrsh      srun <args> --pty bash
Job deletion                          qdel <job_ID>       scancel <job_ID>
Job status (all)                      qstat               squeue
Job status by job                     qstat -j <job_ID>   squeue -j <job_ID>
Job status by user                    qstat -u <user>     squeue -u <user>
Job status detailed                   qstat -j <job_ID>   scontrol show job <job_ID>
Show expected start time              qstat -j <job_ID>   squeue -j <job_ID> --start
Hold a job                            qhold <job_ID>      scontrol hold <job_ID>
Release a job                         qrls <job_ID>       scontrol release <job_ID>
Queue list / information              qconf -sql          scontrol show partition
Queue details                         qconf -sq <queue>   scontrol show partition <queue>
Node list                             qhost               scontrol show nodes
Node details                          qhost -F <node>     scontrol show node <node>
X forwarding                          qsh <args>          salloc <args> or srun <args> --pty
Monitor or review job resource usage  qacct -j <job_ID>   sacct -j <job_ID>
GUI                                   qmon                sview
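
For example, starting an interactive session and performing basic job control with the Slurm commands above might look like this (the resource requests and the job ID 123456 are placeholders, not site defaults):

# Start an interactive shell on a compute node (placeholder resource request)
srun --ntasks=1 --time=00:30:00 --pty bash

# Hold, release, and finally cancel a queued batch job (placeholder job ID)
scontrol hold 123456
scontrol release 123456
scancel 123456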

Job submission options in scripts

Option                            SGE (qsub)                      Slurm (sbatch)
------                            ----------                      --------------
Script directive                  #$                              #SBATCH
Job name                          -N <name>                       --job-name=<name>
Standard output file              -o <file_path>                  --output=<file_path>
Standard error file               -e <file_path>                  --error=<file_path>
Combine stdout/stderr to stdout   -j yes                          --output=<file_path> (without --error)
Working directory                 -wd <directory_path>            --workdir=<directory_path>
Request notification              -m <events>                     --mail-type=<events>
Email address                     -M <email_address>              --mail-user=<email_address>
Job dependency                    -hold_jid [job_ID | job_name]   --dependency=after:job_ID[:job_ID…] (also afterok, afternotok, afterany)
Copy environment                  -V                              --export=ALL (default)
Copy environment variable         -v <variable[=value][,…]>       --export=<variable[=value][,…]>
Node count                        -                               --nodes=<count>
Request specific nodes            -l hostname=<node>              --nodelist=<node[,node2[,…]]> or --nodefile=<node_file>
Processor count per node          -pe <count>                     --ntasks-per-node=<count>
Processor count per task          -                               --cpus-per-task=<count>
Memory limit                      -l mem_free=<limit>             --mem=<limit> (in megabytes, MB)
Minimum memory per processor      -                               --mem-per-cpu=<memory>
Wall time limit                   -l h_rt=<seconds>               --time=<hh:mm:ss>
Queue                             -q <queue>                      --partition=<queue>
Request specific resource         -l resource=<value>             --gres=gpu:<count> or --gres=mic:<count>
Job array                         -t <array_indices>              --array=<array_indices>
Licences                          -l licence=<licence_spec>       --licenses=<licence_spec>
Assign job to the project         -P <project_name>               --account=<project_name>
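
As a sketch of how several of these options combine, the following batch script and submission commands use placeholder values (the partition name, file names, e-mail address, and job ID are not MPCDF defaults):

#!/bin/bash -l
#SBATCH --job-name=my_test
#SBATCH --output=job_%j.out            # %j expands to the job ID
#SBATCH --error=job_%j.err
#SBATCH --partition=general            # placeholder partition name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --mem=4000                     # memory per node in MB
#SBATCH --time=08:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=YourID@some.domain

srun ./my_program                      # placeholder executable

# On the command line, dependent jobs and job arrays could be submitted as:
#   sbatch --dependency=afterok:<job_ID> post_process.sh
#   sbatch --array=1-10 array_job.sh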

Job environments

Information                    SGE                Slurm                    Comments
-----------                    ---                -----                    --------
Version                        -                  -                        Can be extracted with sbatch --version
Job name                       $JOBNAME           $SLURM_JOB_NAME
Job ID                         $JOBID             $SLURM_JOB_ID
Batch or interactive           $ENVIRONMENT       -
Submit host                    $SGE_O_HOST        $SLURM_SUBMIT_HOST
Submit directory               $SGE_O_WORKDIR     $SLURM_SUBMIT_DIR        Slurm jobs start from the submit directory by default
Node file                      $PE_HOSTFILE       -                        Path of the file that lists the nodes allocated to the job
Node list                      cat $PE_HOSTFILE   $SLURM_JOB_NODELIST      To get a list of nodes: scontrol show hostnames $SLURM_JOB_NODELIST
Hostname                       $HOSTNAME          $SLURM_SUBMIT_HOST
Job user                       $USER              $SLURM_JOB_USER
Job array index                $SGE_TASK_ID       $SLURM_ARRAY_TASK_ID
Queue name                     $QUEUE             $SLURM_JOB_PARTITION
Number of allocated nodes      $NHOSTS            $SLURM_JOB_NUM_NODES
Number of processes            $NSLOTS            $SLURM_NTASKS
Number of processes per node   -                  $SLURM_TASKS_PER_NODE
Requested tasks per node       -                  $SLURM_NTASKS_PER_NODE
Requested CPUs per task        -                  $SLURM_CPUS_PER_TASK
Scheduling priority            -                  $SLURM_PRIO_PROCESS
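
For SGE users who relied on $PE_HOSTFILE, the following sketch shows one way a batch script might report its Slurm environment and write an explicit host list (the file names and resource requests are placeholders):

#!/bin/bash -l
#SBATCH --job-name=env_demo
#SBATCH --output=env_demo.out
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:05:00

echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) in partition ${SLURM_JOB_PARTITION}"
echo "Running ${SLURM_NTASKS} tasks on ${SLURM_JOB_NUM_NODES} nodes"

# Expand the compact node list (e.g. node[01-02]) into one hostname per line,
# similar to the host file provided by $PE_HOSTFILE under SGE
scontrol show hostnames "${SLURM_JOB_NODELIST}" > hostfile.${SLURM_JOB_ID}
cat hostfile.${SLURM_JOB_ID}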

OpenMP applications may require the environment variable OMP_NUM_THREADS to be set. Its value can be obtained from the Slurm environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script.

Set OMP_NUM_THREADS

# Set the number of OpenMP threads per process if $SLURM_CPUS_PER_TASK is defined,
# otherwise fall back to a single thread
if [ -n "${SLURM_CPUS_PER_TASK}" ] ; then
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
else
    export OMP_NUM_THREADS=1
fi
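
For context, a minimal hybrid MPI/OpenMP batch script might combine this snippet with --cpus-per-task as sketched below (node counts, time limit, and program name are placeholder values, not MPCDF recommendations):

#!/bin/bash -l
#SBATCH --job-name=hybrid_test
#SBATCH --output=hybrid_test.out
#SBATCH --nodes=2                      # placeholder values
#SBATCH --ntasks-per-node=4            # MPI tasks per node
#SBATCH --cpus-per-task=8              # OpenMP threads per MPI task
#SBATCH --time=01:00:00

# Use the allocation to size the OpenMP thread pool
if [ -n "${SLURM_CPUS_PER_TASK}" ] ; then
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
else
    export OMP_NUM_THREADS=1
fi

srun ./my_hybrid_program               # placeholder executable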

Sample job scripts

SGE script

#!/bin/bash
#
#$ -N sge_test
#$ -j y
#$ -o test.output
# Current working directory
#$ -cwd
#$ -M YourID@some.domain
#$ -m bea
# Request 8 hours of run time
#$ -l h_rt=8:0:0
# Specify the project for the job
#$ -P your_project_name_here
# Set memory for the job
#$ -l mem=4G
echo "start job"
sleep 120
echo "bye"

Slurm script (1)

#!/bin/bash -l
# NOTE the -l flag!
#
#SBATCH -J slurm_test
#SBATCH -o test.output
#SBATCH -e test.output
# Default in Slurm
#SBATCH -D ./
#SBATCH --mail-user YourID@some.domain
#SBATCH --mail-type=ALL
# Request 8 hours of run time
#SBATCH -t 8:0:0
# Specify the project for the job
#SBATCH -A your_project_name_here
# Set memory for the job
#SBATCH --mem=4000
echo "start job"
sleep 120
echo "bye"
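
Assuming the Slurm script above were saved as slurm_test.sh (a placeholder file name), it could be submitted and monitored as follows:

sbatch slurm_test.sh                   # prints: Submitted batch job <job_ID>
squeue -j <job_ID>                     # status while the job is pending or running
scontrol show job <job_ID>             # detailed job information
sacct -j <job_ID>                      # resource usage after the job has finished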

Remarks

(1) #SBATCH -A can simply be ignored, as it is not used in the same way as in SGE at MPCDF.

More examples can be found on the home page of the general-purpose compute cluster Cobra and on the page with sample scripts.