Containerized Jobs

The SLURM cluster is configured with Apptainer and Enroot/Pyxis so that jobs can run inside containers.


Apptainer

On SLURM clusters, you can run workloads inside containers using Apptainer, a lightweight, secure container engine built for HPC and scientific computing environments. Formerly known as Singularity, Apptainer integrates with SLURM to provide portable, reproducible job execution and continues to use the familiar .sif image format.

Follow these steps to run containers with Apptainer.

  • Access the login node via SSH.
  • Run the following command to pull the container image and convert it to the .sif format used by Apptainer:
srun apptainer pull cuda_image.sif docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8
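
Apptainer refuses to overwrite an existing image file unless told to, so repeating the pull step can fail. The following is a minimal sketch of a guard around the pull command; the helper name pull_if_missing is hypothetical and not part of Apptainer or SLURM.

```shell
#!/bin/bash
# Hypothetical helper: pull an image only if the .sif file is not already present.
# Arguments: <image file> <image URI>
pull_if_missing() {
    local image="$1" uri="$2"
    if [ -f "$image" ]; then
        echo "$image already exists, skipping pull"
    else
        # Same command as above, with the file name and URI passed in
        srun apptainer pull "$image" "$uri"
    fi
}
```

It could be called as pull_if_missing cuda_image.sif docker://nvidia/cuda:12.4.1-cudnn-devel-rockylinux8.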

Create a file named apptainer.sbatch on the shared volume, /mnt/data, with the following content:

#!/bin/bash
#SBATCH --job-name=apptainer
#SBATCH --output=output.log
#SBATCH --error=error.log
apptainer exec --nv cuda_image.sif date

For more information about other parameters available for apptainer, see the Apptainer documentation. Once the image is converted, run the job by executing the following command:

sbatch apptainer.sbatch

Check the output.log file in the directory from which the job was submitted to see the date printed by the container.
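
The job above only prints the date, so it does not exercise the --nv flag. As a minimal sketch, the variant below requests a GPU and runs nvidia-smi inside the container. It assumes the cluster exposes GPUs through the generic resource name gpu and that cuda_image.sif is in the submission directory; the file names apptainer_gpu.sbatch, gpu_output.log and gpu_error.log and the helper submit_and_show are hypothetical.

```shell
#!/bin/bash
# Write a hypothetical GPU variant of apptainer.sbatch to the current directory.
cat > apptainer_gpu.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=apptainer-gpu
#SBATCH --output=gpu_output.log
#SBATCH --error=gpu_error.log
#SBATCH --gres=gpu:1
# --nv binds the host's NVIDIA driver libraries and GPU devices into the container
apptainer exec --nv cuda_image.sif nvidia-smi
EOF

# Submit the job, block until it terminates (--wait), then print its output.
submit_and_show() {
    sbatch --wait apptainer_gpu.sbatch && cat gpu_output.log
}
```

Calling submit_and_show from the shared volume should print the nvidia-smi report once the job has finished, provided a GPU node was allocated.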


Enroot/Pyxis

SLURM clusters integrate Enroot and Pyxis to enable containerized job execution:

  • Enroot is a lightweight container runtime developed by NVIDIA for machine learning and high-performance computing (HPC) workloads. It supports Docker-compatible images and runs them efficiently within SLURM environments.
  • Pyxis is a SLURM plugin that extends Enroot’s functionality, allowing users to launch containers directly with the srun command by adding --container-* options.

With this setup, you can easily execute a SLURM job inside a container, whether the container image resides in a registry or is stored locally. Follow these steps to run containers with Enroot/Pyxis.

  • Access the login node via SSH.
  • Create a file named pyxis.sbatch on the shared volume, /mnt/data, with the following content:
#!/bin/bash

#SBATCH --job-name=pyxis
#SBATCH --output=output.log
#SBATCH --error=error.log
#SBATCH --partition=pyxis

srun --container-image="nvcr.io#nvidia/tensorflow:23.02-tf1-py3" \
   python -c "import tensorflow as tf; print(tf.__version__)"

For more information about other parameters available for srun, see the Pyxis documentation. Run the job by executing the following command:

sbatch pyxis.sbatch
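
The example above pulls the image from a registry. For the locally stored case mentioned earlier, the following is a rough sketch that points --container-image at a squashfs file and mounts the shared volume into the container. The path /mnt/data/tensorflow.sqsh, the script name pyxis_local.sbatch and the log file names are hypothetical, and the image file is assumed to have been created beforehand with enroot import.

```shell
#!/bin/bash
# Assumption: the squashfs image was created once beforehand, for example with
#   srun --partition=pyxis enroot import -o /mnt/data/tensorflow.sqsh \
#       'docker://nvcr.io#nvidia/tensorflow:23.02-tf1-py3'
cat > pyxis_local.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=pyxis-local
#SBATCH --output=local_output.log
#SBATCH --error=local_error.log
#SBATCH --partition=pyxis

# An absolute file path instead of a registry reference selects a local image;
# --container-mounts makes the shared volume visible inside the container.
srun --container-image=/mnt/data/tensorflow.sqsh \
     --container-mounts=/mnt/data:/mnt/data \
     python -c "import tensorflow as tf; print(tf.__version__)"
EOF
```

The generated script can then be submitted with sbatch pyxis_local.sbatch, in the same way as pyxis.sbatch.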