Project Slinky: Bringing Slurm Scheduling to Kubernetes
As high-performance computing (HPC) environments evolve, there is increasing demand to bridge the gap between traditional HPC job schedulers and modern cloud-native infrastructure. Project Slinky is an open-source effort that integrates Slurm, the industry-standard workload manager for HPC, with Kubernetes, the de facto orchestration platform for containers.
This enables organizations to deploy and operate Slurm-based workloads on Kubernetes clusters, combining the best of both worlds: Slurm's mature, job-centric HPC scheduling model and Kubernetes's scalable, cloud-native runtime environment.
Why Integrate Slurm with Kubernetes?
Slurm is trusted by many of the world’s largest supercomputing centers. It offers fine-grained control for batch job scheduling, queue management, prioritization, and accounting. However, it’s not designed to work natively in containerized environments, nor does it inherently support the cloud-native deployment models Kubernetes excels at.
Kubernetes, on the other hand, is optimized for microservices and long-running containerized applications. Out of the box, it lacks the batch-scheduling depth (queueing, prioritization, job-centric semantics) needed to run large MPI jobs or the batch workflows typical of HPC.
Project Slinky attempts to bridge this gap by allowing Slurm to schedule jobs that run in containers inside Kubernetes. This lets HPC teams containerize their workloads, adopt DevOps practices, and extend compute capacity into the cloud—without giving up the Slurm interface or sacrificing control.
How Does It Work?
At its core, Project Slinky introduces a Slurm Kubernetes plugin and supporting services that allow Slurm to delegate job execution to Kubernetes pods. Here’s how it works:
1. Job Submission
Nothing changes for users who are already familiar with Slurm: they continue to submit jobs through the standard Slurm interfaces (sbatch, srun, and so on).
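For example, a routine batch script like the sketch below is submitted exactly as it would be on a conventional Slurm cluster. The partition name and workload are placeholders, not Slinky-specific requirements:

```bash
#!/bin/bash
#SBATCH --job-name=hello-slinky    # name shown in squeue/sacct
#SBATCH --partition=batch          # partition name is site-specific (placeholder)
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --output=hello-%j.out      # %j expands to the Slurm job ID

# Ordinary shell commands; under Slinky these run inside a container on Kubernetes.
srun hostname
```

The user submits it with `sbatch hello.sh` and monitors it with `squeue` and `sacct`, just as before.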
2. Pod Translation
The plugin translates Slurm job specifications into Kubernetes pod definitions.
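To make the idea concrete, the sketch below shows roughly what such a translation could look like for a job requesting 4 CPUs and 8 GiB of memory. The pod name, labels, namespace, and container image are hypothetical illustrations, not Slinky's actual output format:

```bash
# Hypothetical illustration of a Slurm job rendered as a Kubernetes pod spec.
# The namespace, names, labels, and image below are assumptions for the example.
kubectl apply -n slurm -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: slurm-job-12345              # e.g. derived from the Slurm job ID (hypothetical)
  labels:
    app: slurm-job                   # hypothetical label
spec:
  restartPolicy: Never               # batch jobs run to completion rather than restart
  containers:
    - name: job
      image: registry.example.com/hpc/app:latest   # user-supplied image (placeholder)
      command: ["/bin/bash", "-c", "./run_simulation"]
      resources:
        requests:
          cpu: "4"
          memory: 8Gi
        limits:
          cpu: "4"
          memory: 8Gi
EOF
```

In practice the plugin generates and submits such objects itself; the point is simply that a Slurm resource request has a natural expression as pod resource requests and limits.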
3. Kubernetes Execution
Slurm launches and manages the pod lifecycle through Kubernetes APIs.
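Because the resulting pods are ordinary Kubernetes objects, operators can inspect them with standard tooling alongside the familiar Slurm commands. A minimal sketch, assuming a `slurm` namespace, an `app=slurm-job` label, and a job ID of 12345 purely for illustration:

```bash
# Slurm-side view of the job (standard Slurm commands)
squeue -u "$USER"                                  # list this user's pending/running jobs
sacct -j 12345 --format=JobID,State,Elapsed        # accounting record for the job

# Kubernetes-side view of the same work (namespace and label are assumptions)
kubectl get pods -n slurm -l app=slurm-job
kubectl describe pod slurm-job-12345 -n slurm      # events, scheduling decisions, resources
kubectl logs slurm-job-12345 -n slurm --follow     # stream the container's stdout/stderr
```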
4. Resource Synchronization
Slinky maps Slurm partitions and nodes to Kubernetes namespaces and node pools, enabling Slurm to “see” Kubernetes resources as schedulable targets.
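One way to picture this synchronization is to compare what each side reports; the mapping itself is maintained by Slinky, and the partition and namespace names below are assumptions:

```bash
# Slurm's view: partitions, node counts, CPUs, memory, and generic resources (GRES)
sinfo -o "%P %D %c %m %G"

# Kubernetes' view: the namespaces and nodes those partitions are assumed to map onto
kubectl get namespaces
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
```

If the two views stay in sync, Slurm can make placement decisions against capacity that actually lives in Kubernetes.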
Note that Slinky supports features such as GPU scheduling, MPI job orchestration, and accounting integration. It also keeps Slurm as the single point of interaction for job lifecycle management, maintaining compatibility with existing Slurm workflows.
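For instance, a GPU job can still be expressed with standard Slurm directives. The sketch below uses Slurm's generic resource (GRES) syntax; the partition name, GPU tooling, and training script are placeholders for this example:

```bash
#!/bin/bash
#SBATCH --job-name=train-model
#SBATCH --partition=gpu            # site-specific partition name (placeholder)
#SBATCH --gres=gpu:1               # request one GPU via Slurm generic resources
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=02:00:00

# Runs inside a container on Kubernetes; nvidia-smi works only when the image
# and node expose the NVIDIA runtime (assumption for this example).
nvidia-smi
srun python train.py               # train.py is a placeholder for the user's workload
```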
Benefits
Listed below are some of the benefits of unifying Slurm with Kubernetes in this way.
✅ Automatic Containerization: HPC workloads run in containers without disrupting the user experience for existing Slurm users
✅ Capacity Bursting: Users can offload Slurm jobs to managed Kubernetes clusters where additional capacity may be available
✅ Unified & Converged Infrastructure: Run HPC and cloud-native apps on the same cluster
✅ Scalability: Let Kubernetes handle pod placement, autoscaling, and fault tolerance
Conclusion
As HPC workloads increasingly move toward the cloud and container-based execution, Project Slinky provides a bridge by respecting the legacy and power of Slurm while embracing the operational efficiency and scale of Kubernetes. Slinky doesn’t replace Slurm or Kubernetes—it makes them better together.
Project Slinky can be an excellent solution for:
- Research institutions wanting to modernize their HPC stack
- Enterprises running AI/ML workflows that require batch GPU scheduling
- Cloud-native teams looking to unify infrastructure and reduce operational complexity
In the next blog post, we will look at how customers use Rafay's GPU PaaS to give users self-service access to Slinky-based Slurm clusters running on multi-tenant Kubernetes clusters.
- Free Org: Sign up for a free Org if you want to try this yourself with our Get Started guides.
- Live Demo: Schedule time with us to watch a demo in action.