Batch Jobs

A Slurm cluster is designed primarily to execute batch jobs, i.e., workloads that are time-limited and non-interactive.

To submit a batch job, define it in a batch script (a standard shell script) and submit it with the sbatch command. After submission, Slurm allocates the requested resources, creating a job allocation (referred to as a job), and runs the script on the first of the allocated compute nodes.

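You specify all job requirements (time, CPUs, memory, GPUs, etc.) as #SBATCH directives at the top of your script, before the first executable command. Slurm stops reading directives once it reaches that command, so any #SBATCH line placed later is ignored.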
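
As a rough illustration of such directives, the sketch below writes a batch script that requests time, CPUs, memory, and a GPU, then checks its syntax with bash -n. The file name resources_job.slurm, the job name, and the requested amounts are hypothetical, and the --gres=gpu:1 line assumes the cluster defines a generic resource named "gpu"; check your site's documentation for the values it accepts.

```shell
#!/usr/bin/env bash
# Write a hypothetical batch script; the quoted 'EOF' keeps $VARIABLES unexpanded.
cat > resources_job.slurm <<'EOF'
#!/usr/bin/env bash
# Hypothetical job name
#SBATCH --job-name=resources_job
# Wall-clock limit (HH:MM:SS)
#SBATCH --time=01:00:00
# One task (process) with four CPU cores
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Memory per node
#SBATCH --mem=8G
# One GPU; assumes the cluster defines a "gpu" generic resource
#SBATCH --gres=gpu:1

# First executable command: no #SBATCH line after this point is read.
echo "Job ${SLURM_JOB_ID:-unknown} was given ${SLURM_CPUS_PER_TASK:-unknown} CPUs per task"
EOF

# The directives are ordinary comments to bash, so a syntax check works anywhere.
bash -n resources_job.slurm && echo "syntax OK"
```

Because the directives are plain comments to the shell, the script can also be run directly with bash for a quick smoke test before it is submitted.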

Slurm will:

  1. Place your job in the queue.
  2. Wait for resources to become available.
  3. Run the job automatically on the assigned nodes.
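
The three steps above can be observed from the command line. The following is a minimal sketch, not a recommended monitoring tool: it assumes a batch script named test_job.slurm exists in the current directory, and the helper names parse_job_id and wait_for_job, as well as the 10-second polling interval, are made up for illustration.

```shell
#!/usr/bin/env bash
# Strip the optional ";cluster" suffix that `sbatch --parsable` can append.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Poll squeue until the job no longer appears (finished, failed, or cancelled).
wait_for_job() {
    local job_id=$1 state
    while state=$(squeue --noheader --jobs="$job_id" --format="%T" 2>/dev/null) \
            && [ -n "$state" ]; do
        echo "job $job_id: $state"   # typically PENDING, then RUNNING
        sleep 10                     # hypothetical polling interval
    done
    echo "job $job_id has left the queue"
}

if command -v sbatch >/dev/null 2>&1 && [ -f test_job.slurm ]; then
    # --parsable makes sbatch print only the job ID (and cluster name, if any).
    job_id=$(parse_job_id "$(sbatch --parsable test_job.slurm)")
    wait_for_job "$job_id"
else
    echo "Slurm or test_job.slurm not available; nothing submitted"
fi
```

Polling is shown only to make the queue states visible; for routine work a single squeue call is usually enough.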

Example:

#!/usr/bin/env bash
#SBATCH --job-name=test_job
#SBATCH --output=output.%j.out
#SBATCH --time=00:10:00
#SBATCH --partition=cpu
#SBATCH --ntasks=1

echo "Running on $HOSTNAME"

To run this, save the script as test_job.slurm and submit it:

sbatch test_job.slurm
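
On success, sbatch normally replies with a line of the form "Submitted batch job <id>", and the %j in the --output pattern is replaced by that job ID. The sketch below shows one way to work out the resulting file name; the helper output_file_for is hypothetical, it handles only the %j placeholder, and the job ID 1234 in the fallback branch is made up.

```shell
#!/usr/bin/env bash
# Expand the %j placeholder of an --output pattern for a given job ID.
# Only %j is handled here; Slurm supports further placeholders.
output_file_for() {
    local pattern=$1 job_id=$2
    printf '%s\n' "${pattern//%j/$job_id}"
}

if command -v sbatch >/dev/null 2>&1 && [ -f test_job.slurm ]; then
    # The job ID is the last word of sbatch's reply.
    reply=$(sbatch test_job.slurm)
    job_id=${reply##* }
    echo "Output will appear in $(output_file_for 'output.%j.out' "$job_id")"
else
    # Without Slurm, demonstrate the expansion with a made-up job ID.
    output_file_for 'output.%j.out' 1234
fi
```

The output file is created only once the job starts running, so it may not exist immediately after submission.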

Use sbatch when:

  1. You want non-interactive job execution.
  2. You don’t need to monitor output in real time.
  3. You’re running longer or repeatable workloads (training, simulations, etc.).