

Enable Dynamic Resource Allocation (DRA) in Kubernetes

In the previous blog, we introduced Dynamic Resource Allocation (DRA), which went GA in Kubernetes v1.34, released in August 2025.

In this post, we’ll configure DRA on a Kubernetes v1.34 cluster.

Info

The steps are designed so you can try this on a macOS or Windows laptop in less than 15 minutes; the commands in this blog are written for macOS users.

Introduction to Dynamic Resource Allocation (DRA) in Kubernetes

In the previous blog, we reviewed the limitations of Kubernetes GPU scheduling. These often result in:

  1. Resource fragmentation – large portions of GPU memory remain idle and unusable.
  2. Topology blindness – multi-GPU workloads may be scheduled suboptimally.
  3. Cost explosion – teams overprovision GPUs to work around scheduling inefficiencies.

In this post, we’ll look at how Dynamic Resource Allocation (DRA), a feature that went GA in Kubernetes v1.34, aims to solve these problems and transform GPU scheduling in Kubernetes.
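As a quick preview of what DRA looks like in practice, device requests move out of opaque counters like `nvidia.com/gpu` and into first-class API objects. A minimal sketch follows; the `gpu.example.com` DeviceClass name is hypothetical, and the exact request fields depend on the DRA driver installed in your cluster:

```yaml
# ResourceClaimTemplate: describes the device each pod instance wants
# (resource.k8s.io/v1 is the GA API group in Kubernetes v1.34).
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com   # hypothetical DeviceClass published by a DRA driver
---
# Pod that references the claim template instead of counting GPUs
apiVersion: v1
kind: Pod
metadata:
  name: dra-demo
spec:
  containers:
  - name: app
    image: registry.example.com/cuda-app:latest   # illustrative image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

The scheduler allocates a concrete device satisfying the claim before binding the pod, which is what enables the topology- and fragmentation-aware placement discussed above.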

Rethinking GPU Allocation in Kubernetes

Kubernetes has cemented its position as the de-facto standard for orchestrating containerized workloads in the enterprise. In recent years, its role has expanded beyond web services and batch processing into one of the most demanding domains of all: AI/ML workloads.

Organizations now run everything from lightweight inference services to massive, distributed training pipelines on Kubernetes clusters, relying heavily on GPU-accelerated infrastructure to fuel innovation.

But there’s a problem. In this blog, we will explore why the current GPU allocation model falls short, what a more advanced approach looks like, and how it can unlock efficiency, performance, and cost savings at scale.

Demystifying Fractional GPUs in Kubernetes: MIG, Time Slicing, and Custom Schedulers

As GPU acceleration becomes central to modern AI/ML workloads, Kubernetes has emerged as the orchestration platform of choice. However, allocating full GPUs to many real-world workloads is overkill, resulting in underutilization and soaring costs.

Enter the need for fractional GPUs: ways to share a physical GPU among multiple containers without compromising performance or isolation.

In this post, we'll walk through three approaches to achieve fractional GPU access in Kubernetes:

  1. MIG (Multi-Instance GPU)
  2. Time Slicing
  3. Custom Schedulers (e.g., KAI)

For each, we’ll break down how it works, its pros and cons, and when to use it.
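To give a flavor of the simplest of the three, time slicing with the NVIDIA device plugin is driven by a small configuration file. A sketch, with an illustrative replica count:

```yaml
# Time-slicing config for the NVIDIA k8s-device-plugin:
# each physical GPU is advertised as 4 schedulable replicas.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```

With this in place, a node with 2 physical GPUs advertises `nvidia.com/gpu: 8`. Note that time slicing provides no memory isolation between sharers, which is precisely the trade-off against MIG that we’ll examine.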

Self-Service Slurm Clusters on Kubernetes with Rafay GPU PaaS

In the previous blog, we discussed how Project Slinky bridges the gap between Slurm, the de facto job scheduler in HPC, and Kubernetes, the standard for modern container orchestration.

Together, Project Slinky and Rafay’s GPU Platform-as-a-Service (PaaS) give enterprises and cloud providers secure, multi-tenant, self-service access to Slurm-based HPC environments on shared Kubernetes clusters. They allow cloud providers and enterprise platform teams to offer Slurm-as-a-Service on Kubernetes—without compromising on performance, usability, or control.

Design

Project Slinky: Bringing Slurm Scheduling to Kubernetes

As high-performance computing (HPC) environments evolve, there’s an increasing demand to bridge the gap between traditional HPC job schedulers and modern cloud-native infrastructure. Project Slinky is an open-source project that integrates Slurm, the industry-standard workload manager for HPC, with Kubernetes, the de facto orchestration platform for containers.

This enables organizations to deploy and operate Slurm-based workloads on Kubernetes clusters, allowing them to leverage the best of both worlds: Slurm’s mature, job-centric HPC scheduling model and Kubernetes’s scalable, cloud-native runtime environment.

Project Slinky

Get Started with Cilium as a Load Balancer for On-Premises Kubernetes Clusters

Organizations deploying Kubernetes in on-premises data centers or hybrid cloud environments often face challenges with exposing services externally. Unlike public cloud providers that offer managed load balancers out of the box, bare metal environments require custom solutions. This is where Cilium steps in as a powerful alternative, offering native load balancing capabilities using BGP (Border Gateway Protocol).

Cilium is more than just a CNI plugin. It enables advanced networking features, such as observability, security, and load balancing—all integrated deeply with the Kubernetes networking model. Specifically, Cilium can advertise Kubernetes LoadBalancer service IPs to external routers using BGP, making these services reachable directly from external networks without needing to rely on cloud-native load balancers or manual proxy setups. This is ideal for enterprises running bare metal Kubernetes clusters, air-gapped environments, or hybrid cloud setups.
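As a sketch of what this looks like, Cilium’s LB IPAM and BGP control plane are configured through two CRDs. The addresses, ASNs, and labels below are illustrative, and field names vary slightly across Cilium versions:

```yaml
# Pool of addresses Cilium may assign to LoadBalancer Services
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: external-pool
spec:
  blocks:
  - cidr: "192.0.2.0/24"          # example range; use a block routable in your network
---
# Advertise those Service IPs to an upstream router over BGP
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: tor-peering
spec:
  nodeSelector:
    matchLabels:
      bgp: enabled                # illustrative label on BGP-speaking nodes
  virtualRouters:
  - localASN: 64512
    neighbors:
    - peerAddress: "10.0.0.1/32"  # example ToR router address
      peerASN: 64513
    serviceSelector:
      matchLabels:
        advertise: bgp            # only matching Services are advertised
```

Once the session is established, the external router learns a route to each LoadBalancer Service IP and traffic flows in without any cloud load balancer.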

Want to dive deeper? Check out our introductory blog on Cilium’s Kubernetes load balancing capabilities, and see the detailed step-by-step instructions for more.

Using Cilium as a Kubernetes Load Balancer: A Powerful Alternative to MetalLB

In Kubernetes, exposing services of type LoadBalancer in on-prem or bare-metal environments typically requires a dedicated "Layer 2" or "BGP-based" software load balancer—such as MetalLB. While MetalLB has been the go-to solution for this use case, recent advances in Cilium, a powerful eBPF-based Kubernetes networking stack, offer a modern and more integrated alternative.

Cilium isn’t just a fast, scalable Container Network Interface (CNI). It also includes cilium-lb, a built-in eBPF-powered load balancer that can replace MetalLB with a more performant, secure, and cloud-native approach.
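For environments without a BGP-capable router, Cilium can also mimic MetalLB’s Layer 2 mode. A minimal sketch, assuming Cilium’s L2 announcements feature is enabled (interface and label names below are illustrative):

```yaml
# Answer ARP for LoadBalancer Service IPs from selected nodes,
# making those IPs reachable on the local L2 segment.
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-services
spec:
  loadBalancerIPs: true
  interfaces:
  - eth0                                   # example NIC; match your nodes
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""   # illustrative node label
```

Paired with a CiliumLoadBalancerIPPool for address assignment, this covers the classic MetalLB L2 use case without running a separate load balancer component.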

Cilium based k8s Load Balancer

Introduction to Slurm: The Backbone of HPC

This is Part 1 in a blog series on Slurm, covering introductory concepts. We are not talking about the fictional soft drink in the world of Futurama. Instead, this blog is about Slurm (Simple Linux Utility for Resource Management), an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system used in high-performance computing (HPC) environments.

Slurm was originally conceptualized in 2002 at Lawrence Livermore National Laboratory (LLNL) and has been actively developed and maintained, primarily by SchedMD. Since then, Slurm has become the de facto workload manager for HPC, with more than 50% of the Top 500 supercomputers using it.
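To give a flavor of Slurm’s job-centric model, here is a minimal batch job script; the partition name and resource counts are illustrative. You would submit it with `sbatch job.sh` and watch it with `squeue`:

```bash
#!/bin/bash
#SBATCH --job-name=hello-hpc     # name shown in squeue
#SBATCH --partition=debug        # illustrative partition name
#SBATCH --nodes=2                # nodes to allocate
#SBATCH --ntasks-per-node=4      # tasks (e.g., MPI ranks) per node
#SBATCH --time=00:05:00          # wall-clock limit
#SBATCH --output=%x-%j.out       # log file: jobname-jobid.out

# srun launches one copy of the command per allocated task
srun hostname
```

The `#SBATCH` directives declare the resources the job needs; Slurm queues the job until a matching allocation is available, which is the scheduling model the rest of this series builds on.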

Slurm Logo

Get Started with Auto Mode for Amazon EKS with Rafay

This is Part 3 in our series on Amazon EKS Auto Mode. In the previous posts, we explored:

  1. Part 1: An Introduction: Learn the core concepts and benefits of EKS Auto Mode.
  2. Part 2: Considerations: Understand the key considerations before configuring EKS Auto Mode.

In this post, we will dive into the steps required to build and manage an Amazon EKS cluster with the Auto Mode template using the Rafay Platform. This exercise is particularly well suited for platform teams that want to provide their end users with a controlled self-service experience under centralized governance.

EKS Auto Mode Cluster in Rafay