GPU Resource Management in Kubernetes: From Extended Resource to DRA
This blog is part of our DRA series, continuing from our earlier posts: Introduction to DRA, Enabling DRA with Kind, and MIG with DRA. This post focuses on pre-DRA vs post-DRA GPU management on Rafay upstream Kubernetes clusters.
Overview
With the rise of AI, ML, and HPC workloads, GPU resource management has become a cornerstone of Kubernetes scheduling. Over time, Kubernetes has evolved from static, count-based GPU allocation using extended resources (nvidia.com/gpu) to the more flexible Dynamic Resource Allocation (DRA) framework, now a stable feature in Kubernetes v1.34.
This guide walks through the evolution from pre-DRA GPU management to DRA-based allocation and sharing, complete with examples.
Pre-DRA: GPU Management Using Extended Resources
Before DRA, Kubernetes clusters relied on the NVIDIA device plugin to expose GPUs as extended resources, which pods could then request just like CPU or memory.
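Once the device plugin is running, each GPU node advertises its GPUs under the node's allocatable resources. A quick way to confirm this (the node name below is illustrative):

kubectl describe node gpu-node-1 | grep -A 6 "Allocatable:"
# Look for a line such as:
#   nvidia.com/gpu:  2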
GPU Operator Components
To enable GPU scheduling, the NVIDIA GPU Operator packaged all required components:
- Host components:
  - NVIDIA GPU driver
- Kubernetes components:
  - NVIDIA device plugin
  - MIG Manager
  - DCGM Exporter
  - GPU Feature Discovery (GFD)
Each of these components was deployed as a DaemonSet on GPU nodes, ensuring the scheduler could detect and allocate GPU resources properly.
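A typical installation uses NVIDIA's public Helm chart. A minimal sketch (flags kept to a minimum; consult the GPU Operator documentation for driver, toolkit, and MIG options):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace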
Requesting GPUs (Pre-DRA)
Here's an example of how users would request GPU access before DRA:
apiVersion: v1
kind: Pod
metadata:
  name: pod-gpu-classic
spec:
  containers:
  - name: app-container
    image: nvidia/cuda
    resources:
      limits:
        nvidia.com/gpu: 2
This tells Kubernetes to assign two GPUs to the container. The scheduler and device plugin work together to:
- Locate a node with at least two available GPUs
- Schedule the pod there
- Inject the GPU devices into the container
If a specific GPU type was needed (e.g., A100-40GB), node labels and selectors could be used to ensure the pod landed on the right hardware.
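For example, GPU Feature Discovery publishes node labels such as nvidia.com/gpu.product, which can be combined with a nodeSelector. The label value below is illustrative and depends on the hardware GFD detects on your nodes:

spec:
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB
  containers:
  - name: app-container
    image: nvidia/cuda
    resources:
      limits:
        nvidia.com/gpu: 1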
Post-DRA: Dynamic Resource Allocation
Kubernetes v1.34 graduates DRA to GA (general availability): a new, flexible, and vendor-extensible approach for requesting resources such as GPUs.
Why DRA?
DRA addresses key limitations of the old model:
- Enables fine-grained GPU sharing and supports complex constraints when requesting a GPU (see the sketch after this list)
- Allows custom APIs and parameters from vendors
- Supports better isolation and resource reusability
- Makes cross-pod sharing possible through claims
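As a peek ahead (the objects involved are introduced in the next section), here is a hedged sketch of such a constraint: a device request that only matches GPUs whose advertised model attribute equals a given value. The attribute name and value follow the example driver used later in this post and are illustrative, not a universal API:

# Fragment of a claim's device request with a CEL selector (illustrative)
devices:
  requests:
  - name: gpu
    exactly:
      deviceClassName: gpu.example.com
      selectors:
      - cel:
          expression: device.attributes["gpu.example.com"].model == "LATEST-GPU-MODEL"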
Requesting GPUs with DRA
Instead of requesting GPUs via a simple count (nvidia.com/gpu: 2), DRA introduces three main objects:
1. DeviceClass – defines a category of devices that can be claimed and how claims select devices by their attributes.
2. ResourceClaimTemplate – defines how a claim should be created.
3. ResourceClaim – represents an actual allocated resource.
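A DeviceClass is usually created by the DRA driver's installation rather than by workload authors. For reference, here is a minimal sketch consistent with the example driver used in this post (the selector expression is illustrative):

apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.example.com"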
Here's the DRA equivalent for requesting two GPUs:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: gpu
  name: multiple-gpus
spec:
  spec:
    devices:
      requests:
      - name: gpu-1
        exactly:
          deviceClassName: gpu.example.com
      - name: gpu-2
        exactly:
          deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpus
  resourceClaims:
  - name: gpus
    resourceClaimTemplateName: multiple-gpus
This example shows 1 pod with 1 container requesting 2 GPUs using the DRA mechanism.
kubectl get pods -n gpu
NAME READY STATUS RESTARTS AGE
pod0 1/1 Running 0 81s
Check the logs of this pod to verify that GPUs were allocated to it:
kubectl logs -f -n gpu pod0
declare -x DRA_RESOURCE_DRIVER_NAME="gpu.example.com"
declare -x GPU_DEVICE_3="gpu-3"
declare -x GPU_DEVICE_3_RESOURCE_CLAIM="c2d2a1d2-b52b-4b4c-b5ae-c1cace625493"
declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_3_TIMESLICE_INTERVAL="Default"
declare -x GPU_DEVICE_4="gpu-4"
declare -x GPU_DEVICE_4_RESOURCE_CLAIM="c2d2a1d2-b52b-4b4c-b5ae-c1cace625493"
declare -x GPU_DEVICE_4_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_4_TIMESLICE_INTERVAL="Default"
declare -x HOME="/root"
declare -x HOSTNAME="pod0"
declare -x KUBERNETES_NODE_NAME="mks-demo"
declare -x KUBERNETES_PORT="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.96.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.96.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"
- Two device environment variables are present, GPU_DEVICE_3="gpu-3" and GPU_DEVICE_4="gpu-4", confirming that two GPUs were allocated to the container.
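You can also inspect the ResourceClaim that was generated from the template; its name is derived from the pod and its claim entry, and it should be reported as allocated and reserved for pod0 (the exact name and columns may vary with the cluster version):

kubectl get resourceclaims -n gpu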
Controlled GPU Sharing with DRA
One of DRA's most powerful capabilities is controlled GPU sharing, which allows multiple containers or pods to access the same GPU safely.
1. Intra-Pod GPU Sharing (Multiple Containers in One Pod)
Multiple containers within the same pod can reference a single ResourceClaim, giving them shared access to the same GPU.
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: gpu0
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu0
  name: pod0
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  - name: ctr1
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu
    resourceClaimTemplateName: single-gpu
This example shows one pod with two containers sharing a single GPU in time (time-slicing) using the Dynamic Resource Allocation (DRA) mechanism.
kubectl get pods -n gpu0
NAME READY STATUS RESTARTS AGE
pod0 2/2 Running 0 9s
kubectl logs -f -n gpu0 pod0 -c ctr0
declare -x DRA_RESOURCE_DRIVER_NAME="gpu.example.com"
declare -x GPU_DEVICE_3="gpu-3"
declare -x GPU_DEVICE_3_RESOURCE_CLAIM="99944ed0-806f-496f-bdb5-e457b6a66a2d"
declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_3_TIMESLICE_INTERVAL="Default"
declare -x HOME="/root"
declare -x HOSTNAME="pod0"
declare -x KUBERNETES_NODE_NAME="mks-demo"
declare -x KUBERNETES_PORT="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.96.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.96.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"
kubectl logs -f -n gpu0 pod0 -c ctr1
declare -x DRA_RESOURCE_DRIVER_NAME="gpu.example.com"
declare -x GPU_DEVICE_3="gpu-3" # β HIGHLIGHTED
declare -x GPU_DEVICE_3_RESOURCE_CLAIM="99944ed0-806f-496f-bdb5-e457b6a66a2d"
declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_3_TIMESLICE_INTERVAL="Default"
declare -x HOME="/root"
declare -x HOSTNAME="pod0"
declare -x KUBERNETES_NODE_NAME="mks-demo"
declare -x KUBERNETES_PORT="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.96.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.96.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"
- GPU_DEVICE_3="gpu-3" is present in both ctr0 and ctr1
2. Inter-Pod GPU Sharing (Global Claim Across Pods)
You can create a standalone ResourceClaim and reference it across multiple pods – ideal for workloads that need coordinated access (like shared inference).
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  namespace: gpu1
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu1
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu
    resourceClaimName: single-gpu
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu1
  name: pod1
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu
    resourceClaimName: single-gpu
This example shows two pods, each with one container, sharing a single GPU in time using the Dynamic Resource Allocation mechanism.
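To confirm that both pods are attached to the same claim, inspect its status: status.reservedFor should list both pod0 and pod1, and status.allocation shows the device that was assigned (output omitted here, as it varies by driver):

kubectl get resourceclaim single-gpu -n gpu1 -o yaml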
Conclusion
The move from extended resources to Dynamic Resource Allocation represents a major leap in how Kubernetes manages GPUs and other accelerators.
DRA brings flexibility, fine-grained control, and vendor extensibility, making it the future of GPU scheduling in Kubernetes. Whether you're enabling fractional GPU usage, managing shared inference workloads, or defining custom device policies, DRA unlocks capabilities that were not possible with the extended-resource model.