Custom GPU Resource Classes in Kubernetes¶
In the modern era of containerized machine learning and AI infrastructure, GPUs are a critical and expensive asset. Kubernetes makes scheduling and isolation easier, but managing GPU utilization efficiently requires more than assigning a request such as:

```yaml
nvidia.com/gpu: 1
```
In this blog post, we will explore what custom GPU resource classes are, why they matter, and when to use them for maximum impact. Custom GPU resource classes are a powerful technique for fine-grained GPU management in multi-tenant, cost-sensitive, and performance-critical environments.
Info
If you are new to GPU sharing approaches, we recommend reading the following introductory blogs: Demystifying Fractional GPUs in Kubernetes and Choosing the Right Fractional GPU Strategy.
What Are Custom GPU Resource Classes?¶
By default, Kubernetes exposes GPUs through a single resource name: nvidia.com/gpu. As an end user, you have no visibility into how the underlying GPU is set up and configured. For example, the GPU you receive may be any of the following:
- Full exclusive GPUs
- Time-sliced shared GPUs
- MIG (Multi-Instance GPU) slices
- Fractional (e.g., ¼) allocations
Custom resource classes allow administrators to define new GPU resource names that make the underlying configuration obvious to users. These names are configured through the GPU device plugin (typically via the NVIDIA GPU Operator) and allow you to expose multiple logical GPU types from the same physical hardware.
Some examples are shown below.

1. nvidia.com/gpu-time-slice

   As the custom resource class name suggests, this is a time-sliced GPU.

2. nvidia.com/gpu-mig-1g.5gb

   As the custom resource class name suggests, this is a MIG instance with the 1g.5gb profile (one compute slice and 5 GB of GPU memory).

3. nvidia.com/gpu-fraction-0.25

   As the custom resource class name suggests, this is a fractional (0.25) GPU.
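With names like these, a workload can request exactly the slice it needs. The manifest below is a sketch: the class names above are illustrative, and the image is a placeholder, since the actual names depend on how your administrator configured the device plugin.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: notebook
spec:
  containers:
    - name: jupyter
      image: jupyter/base-notebook   # hypothetical image, for illustration only
      resources:
        limits:
          nvidia.com/gpu-mig-1g.5gb: 1   # request one MIG 1g.5gb slice
```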
Why Custom Resource Classes Matter¶
Customers sometimes ask us why custom resource classes matter. Here are some common reasons:
Better Scheduling and Workload Matching¶
Different workloads can have vastly different GPU requirements. For example,
- Dev notebooks or small inference tasks only need a fraction of a GPU.
- Real-time inference needs isolated and predictable performance.
- Training jobs require full, exclusive access.
Custom classes can help align GPU access mode with application intent, improving performance and minimizing waste.
Enabling Multi-Tenancy¶
In shared environments—such as internal ML platforms, GPU clouds, or research clusters—custom classes allow administrators to achieve the following:
- Partition GPU usage across teams
- Enforce resource quotas per class
- Prevent one user from monopolizing all full GPUs
This ensures fair access, cost visibility, and clear accountability.
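As a concrete example, a namespaced ResourceQuota can cap how much of each class a team may consume. The namespace and the limits below are illustrative; note that quotas on extended resources use the requests. prefix.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a   # hypothetical team namespace
spec:
  hard:
    requests.nvidia.com/gpu: "2"             # at most 2 full GPUs for this team
    requests.nvidia.com/gpu-time-slice: "8"  # at most 8 time-sliced units
```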
Cost Optimization¶
GPU costs add up quickly, and using full GPUs for lightweight jobs is inefficient. Custom classes enable the following:
- Time-sliced sharing for low-duty jobs
- MIG slices for sandbox or model testing
- Fine-grained billing per resource type
By aligning consumption with actual needs, you reduce idle capacity and lower cloud or on-prem GPU costs.
Transparency and Observability¶
Custom resource names make GPU usage explicit in YAMLs and dashboards. For example, consider the following YAML:

```yaml
resources:
  limits:
    nvidia.com/gpu-time-slice: 1
```

It tells the user (and the platform) exactly what type of resource is requested. This clarity supports better monitoring, debugging, and user education.
How to Set It Up?¶
Custom resource classes are defined in the NVIDIA device plugin's sharing configuration, which the GPU Operator consumes through its Helm values.yaml (typically as a named entry in a ConfigMap referenced by devicePlugin.config). A time-slicing override looks roughly like the following; the exact schema varies between device plugin versions, so treat this as a sketch and check the documentation for your release:

```yaml
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        rename: nvidia.com/gpu-time-slice  # advertise the replicas under a custom name
        replicas: 4
```

In this example, the configuration exposes each physical GPU as 4 logical time-sliced units. Users can then request a time-sliced unit with the following YAML.
```yaml
resources:
  limits:
    nvidia.com/gpu-time-slice: 1
```
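Putting it together, a complete Pod manifest might look like this. The pod name and image are placeholders; substitute your own workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: time-sliced-inference   # hypothetical pod name
spec:
  containers:
    - name: app
      image: my-registry/inference-server:latest   # hypothetical image
      resources:
        limits:
          nvidia.com/gpu-time-slice: 1   # one of the 4 time-sliced units per GPU
```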
Conclusion¶
Custom GPU resource classes offer the flexibility, cost-efficiency, and isolation required for scalable and sustainable GPU operations in Kubernetes. Whether you’re a platform engineer, ML researcher, or infrastructure architect, adopting this pattern can dramatically improve your cluster’s GPU utilization and user experience.
- Free Org: Sign up for a free Org if you want to try this yourself with our Get Started guides.
- Live Demo: Schedule time with us to watch a demo in action.