
GPU Resource Management in Kubernetes: From Extended Resource to DRA

This blog is part of our DRA series, continuing from our earlier posts: Introduction to DRA, Enabling DRA with Kind, and MIG with DRA. This post focuses on pre-DRA vs. post-DRA GPU management on Rafay upstream Kubernetes clusters.


Overview

With the rise of AI, ML, and HPC workloads, GPU resource management has become a cornerstone of Kubernetes scheduling. Over time, Kubernetes has evolved from static, count-based GPU allocation using extended resources (nvidia.com/gpu) to the more flexible Dynamic Resource Allocation (DRA) framework, now a stable feature in Kubernetes v1.34.

This guide walks through the evolution from pre-DRA GPU management to DRA-based allocation and sharing, complete with examples.

Pre-DRA: GPU Management Using Extended Resources

Before DRA, Kubernetes workloads used the NVIDIA Device Plugin to expose GPUs as extended resources. These resources could then be requested by pods just like CPU or memory.

GPU Operator Components

To enable GPU scheduling, the NVIDIA GPU Operator packaged all required components:

  • Host components:
    • NVIDIA GPU driver
  • Kubernetes components:
    • NVIDIA device plugin
    • MIG Manager
    • DCGM Exporter
    • GPU Feature Discovery (GFD)

Each of these components was deployed as a DaemonSet on GPU nodes, ensuring the scheduler could detect and allocate GPU resources properly.
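
To see what the operator deployed, you can list its DaemonSets. A quick sketch, assuming the operator was installed into the conventional gpu-operator namespace (the component names shown are typical, not exhaustive):

kubectl get daemonsets -n gpu-operator

# Typical entries include:
#   nvidia-driver-daemonset
#   nvidia-device-plugin-daemonset
#   nvidia-mig-manager
#   nvidia-dcgm-exporter
#   gpu-feature-discovery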


Requesting GPUs (Pre-DRA)

Here's an example of how users would request GPU access before DRA:

apiVersion: v1
kind: Pod
metadata:
  name: pod-gpu-classic
spec:
  containers:
    - name: app-container
      image: nvidia/cuda
      resources:
        limits:
          nvidia.com/gpu: 2

This tells Kubernetes to assign two GPUs to the container. The scheduler and device plugin work together to:

  • Locate a node with at least two available GPUs
  • Schedule the pod there
  • Inject the GPU devices into the container

If a specific GPU type was needed (e.g., A100-40GB), node labels and selectors could be used to ensure the pod landed on the right hardware.
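
For example, GPU Feature Discovery publishes node labels describing the installed hardware, so a nodeSelector can pin the pod to a particular GPU model. A minimal sketch, assuming GFD is running and that the label value below matches the nodes in your cluster:

apiVersion: v1
kind: Pod
metadata:
  name: pod-gpu-a100
spec:
  # GFD publishes labels such as nvidia.com/gpu.product; the exact
  # value depends on the hardware present in your cluster.
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB
  containers:
    - name: app-container
      image: nvidia/cuda
      resources:
        limits:
          nvidia.com/gpu: 2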


Post-DRA: Dynamic Resource Allocation

Kubernetes v1.34 graduates DRA to GA (general availability): a new, flexible, and vendor-extensible approach for requesting resources such as GPUs.

Why DRA?

DRA addresses key limitations of the old model:

  • Enables fine-grained GPU sharing, with support for complex constraints when requesting a GPU (see the selector sketch below)
  • Allows custom APIs and parameters from vendors
  • Supports better isolation and resource reusability
  • Makes cross-pod sharing possible through claims
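
As an illustration of those constraints, a claim request can carry a CEL selector that filters on device attributes or capacity. The sketch below is an assumption modeled on the dra-example-driver's published capacity; the capacity key and quantity are illustrative:

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: large-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com
          selectors:
          - cel:
              # Only match GPUs advertising at least 40Gi of memory.
              expression: device.capacity['gpu.example.com'].memory.compareTo(quantity('40Gi')) >= 0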

Requesting GPUs with DRA

Instead of requesting GPUs via a simple count (nvidia.com/gpu: 2), DRA introduces three main objects:

1. DeviceClass – defines a category of devices that can be claimed and how to select specific device attributes in claims
2. ResourceClaimTemplate – defines how a claim should be created
3. ResourceClaim – represents an actual allocated resource
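
The examples below rely on the gpu.example.com device class from the dra-example-driver. For reference, a minimal DeviceClass of that shape might look like the following sketch; the CEL selector is an assumption about how the driver matches its devices:

apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
  - cel:
      # Match every device published by this driver.
      expression: device.driver == 'gpu.example.com'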

Here's the DRA equivalent for requesting two GPUs:

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: gpu
  name: multiple-gpus
spec:
  spec:
    devices:
      requests:
      - name: gpu-1
        exactly:
          deviceClassName: gpu.example.com
      - name: gpu-2
        exactly:
          deviceClassName: gpu.example.com
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpus
  resourceClaims:
  - name: gpus
    resourceClaimTemplateName: multiple-gpus


This example shows 1 pod with 1 container requesting 2 GPUs using the DRA mechanism.

kubectl get pods -n gpu

NAME   READY   STATUS    RESTARTS   AGE
pod0   1/1     Running   0          81s

Check the logs of this pod to verify that GPUs were allocated to it:

kubectl logs -f -n gpu pod0
declare -x DRA_RESOURCE_DRIVER_NAME="gpu.example.com"
declare -x GPU_DEVICE_3="gpu-3"
declare -x GPU_DEVICE_3_RESOURCE_CLAIM="c2d2a1d2-b52b-4b4c-b5ae-c1cace625493"
declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_3_TIMESLICE_INTERVAL="Default"
declare -x GPU_DEVICE_4="gpu-4"
declare -x GPU_DEVICE_4_RESOURCE_CLAIM="c2d2a1d2-b52b-4b4c-b5ae-c1cace625493"
declare -x GPU_DEVICE_4_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_4_TIMESLICE_INTERVAL="Default"
declare -x HOME="/root"
declare -x HOSTNAME="pod0"
declare -x KUBERNETES_NODE_NAME="mks-demo"
declare -x KUBERNETES_PORT="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.96.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.96.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"
  • The container sees two device environment variables, GPU_DEVICE_3="gpu-3" and GPU_DEVICE_4="gpu-4", confirming that two GPUs were allocated to the claim.
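
You can also inspect the ResourceClaim that the template generated for the pod. A quick sketch; the generated claim name carries a random suffix, and the output below is illustrative:

kubectl get resourceclaims -n gpu

# NAME              STATE                AGE
# pod0-gpus-xxxxx   allocated,reserved   2m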

Controlled GPU Sharing with DRA

One of DRA's most powerful capabilities is controlled GPU sharing, which allows multiple containers or pods to access the same GPU safely.

1. Intra-Pod GPU Sharing (Multiple Containers in One Pod)

Multiple containers within the same pod can reference a single ResourceClaim, giving them shared access to the same GPU.

---
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  namespace: gpu0
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com

---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu0
  name: pod0
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  - name: ctr1
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu
    resourceClaimTemplateName: single-gpu


This example shows 1 pod with 2 containers sharing 1 GPU in time using the Dynamic Resource Allocation (DRA) mechanism.

kubectl get pods -n gpu0

NAME   READY   STATUS    RESTARTS   AGE
pod0   2/2     Running   0          9s

Check the environment of each container to confirm that both see the same GPU:

kubectl logs -f -n gpu0 pod0 -c ctr0
declare -x DRA_RESOURCE_DRIVER_NAME="gpu.example.com"
declare -x GPU_DEVICE_3="gpu-3"
declare -x GPU_DEVICE_3_RESOURCE_CLAIM="99944ed0-806f-496f-bdb5-e457b6a66a2d"
declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_3_TIMESLICE_INTERVAL="Default"
declare -x HOME="/root"
declare -x HOSTNAME="pod0"
declare -x KUBERNETES_NODE_NAME="mks-demo"
declare -x KUBERNETES_PORT="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.96.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.96.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"

kubectl logs -f -n gpu0 pod0 -c ctr1
declare -x DRA_RESOURCE_DRIVER_NAME="gpu.example.com"
declare -x GPU_DEVICE_3="gpu-3"
declare -x GPU_DEVICE_3_RESOURCE_CLAIM="99944ed0-806f-496f-bdb5-e457b6a66a2d"
declare -x GPU_DEVICE_3_SHARING_STRATEGY="TimeSlicing"
declare -x GPU_DEVICE_3_TIMESLICE_INTERVAL="Default"
declare -x HOME="/root"
declare -x HOSTNAME="pod0"
declare -x KUBERNETES_NODE_NAME="mks-demo"
declare -x KUBERNETES_PORT="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://10.96.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="10.96.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_SERVICE_HOST="10.96.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"
  • GPU_DEVICE_3="gpu-3" is present in both ctr0 and ctr1, confirming that the two containers share the same GPU through a single claim.

2. Inter-Pod GPU Sharing (Global Claim Across Pods)

You can create a single shared ResourceClaim and reference it across multiple pods, which is ideal for workloads that need coordinated access to the same device (like shared inference).

---
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  namespace: gpu1
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com

---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu1
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu
    resourceClaimName: single-gpu

---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu1
  name: pod1
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu
    resourceClaimName: single-gpu


This example shows 2 pods with 1 container each sharing 1 GPU in time using the Dynamic Resource Allocation mechanism.
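
To verify the sharing, confirm that both pods are running and that the single claim is allocated and reserved. A quick sketch; the output shown is illustrative:

kubectl get pods -n gpu1

# NAME   READY   STATUS    RESTARTS   AGE
# pod0   1/1     Running   0          15s
# pod1   1/1     Running   0          15s

kubectl get resourceclaims -n gpu1

# NAME         STATE                AGE
# single-gpu   allocated,reserved   20s

Comparing the GPU_DEVICE_* variables in both pods' logs, as in the earlier examples, should show the same device and the same resource claim UID.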


Conclusion

The move from extended resources to Dynamic Resource Allocation represents a major leap in how Kubernetes manages GPUs and other accelerators.

DRA brings flexibility, fine-grained control, and vendor extensibility, making it the future of GPU scheduling in Kubernetes. Whether you're enabling fractional GPU usage, managing shared inference workloads, or defining custom device policies, DRA unlocks capabilities that were not possible with the extended-resource model.