
2025

Why Inventory Management is Table Stakes for GPU Clouds

In the world of GPU clouds, where speed, scalability, and efficiency are paramount, it’s surprising how many “Neo cloud” providers still manage their infrastructure the old-fashioned way—through spreadsheets.

As laughable as it sounds, this is the harsh reality. Inventory management, one of the most foundational aspects of a reliable cloud platform, is often overlooked or underbuilt. And for modern GPU clouds, that's a deal-breaker.

Inventory Management

Introducing Platform Versioning for Rafay MKS clusters.

Our upcoming release introduces support for a number of new features and enhancements. One such enhancement is Platform Versioning for Rafay MKS clusters, a major feature in our v3.5 release. This new capability is designed to simplify and standardize the upgrade lifecycle of critical components in upstream Kubernetes clusters managed by Rafay MKS.

Why Platform Version?

Upgrading Kubernetes clusters is essential, but core components such as etcd, CRI, and Salt Minion also require updates for:

  • Security patches
  • Compatibility with new Kubernetes features
  • Performance improvements

Platform Versioning introduces a structured, reliable, and repeatable upgrade path for these foundational components, reducing risk and operational overhead.

What is a Platform Version?

A Platform Version defines a tested and validated set of component versions that can be safely upgraded together. This ensures compatibility and stability across your clusters.

We are introducing v1.0.0 as the very first Platform Version for new clusters. This version includes:

  • CRI: v2.0.4
  • etcd: v3.5.21
  • Salt Minion: v3006.9
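One way to picture a Platform Version is as a pinned, validated set of component versions that a cluster can be checked against. The sketch below is purely illustrative (the class and function names are made up, and this is not Rafay's actual implementation); the version numbers mirror the v1.0.0 example above.

```python
from dataclasses import dataclass

# Illustrative model of a Platform Version: a named, pinned set of
# component versions that were tested and validated together.
@dataclass(frozen=True)
class PlatformVersion:
    name: str
    components: dict  # component name -> pinned version

PLATFORM_V1 = PlatformVersion(
    name="v1.0.0",
    components={"cri": "v2.0.4", "etcd": "v3.5.21", "salt-minion": "v3006.9"},
)

def drift(cluster_components: dict, target: PlatformVersion) -> dict:
    """Return components whose installed version differs from the target set,
    mapped to (installed, expected) pairs."""
    return {
        name: (installed, target.components[name])
        for name, installed in cluster_components.items()
        if name in target.components and installed != target.components[name]
    }
```

For example, a cluster running an older etcd would report `{"etcd": ("v3.5.12", "v3.5.21")}` as its drift from v1.0.0, making it clear which foundational components an upgrade would touch.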

Note

For existing clusters, the initial platform version is shown as v0.1.0, a reference version assigned to clusters created before platform versioning was introduced. Please perform the upgrade to v1.0.0 during scheduled downtime, as it updates core components such as etcd and CRI.


How Does Platform Versioning Work?

You can upgrade the Platform Version in two ways:

  • During a Kubernetes version upgrade
  • As a standalone platform upgrade

This flexibility allows you to keep your clusters secure and up to date, regardless of your Kubernetes upgrade schedule.

Platform Version


Controlled and Responsive Update Cadence

Platform Versions are not released frequently. New versions are published only when:

  • A high severity CVE or vulnerability is addressed
  • A major performance or compatibility feature is introduced
  • There are significant version changes in core components

This approach ensures that upgrades are meaningful and necessary, minimizing disruption.

Whenever a new Platform Version is released, existing clusters can seamlessly upgrade to the latest version, ensuring they benefit from the latest security patches and improvements without manual intervention.

Evolving Platform Versions and Expanding Coverage

We are committed to continuously improving Platform Versioning. In future releases, we will expand its scope by including more critical components in each platform version. For this initial release, we started with three foundational components, etcd, CRI, and Salt Minion, because of their critical importance to cluster stability. Over time, we will extend coverage to additional components, ensuring your clusters remain robust, secure, and up to date.

In Summary

Platform Versioning makes it easier than ever to keep your clusters current and secure by managing the upgrade lifecycle of foundational components like etcd, CRI, and Salt Minion.

Whether you apply it alongside a Kubernetes version bump or independently, Platform Versioning ensures your infrastructure remains stable, secure, and optimized now and in the future.


Comparing HPA and KEDA: Choosing the Right Tool for Kubernetes Autoscaling

In Kubernetes, autoscaling is key to ensuring application performance while managing infrastructure costs. Two powerful tools that help achieve this are the Horizontal Pod Autoscaler (HPA) and Kubernetes Event-Driven Autoscaling (KEDA). While they share the goal of scaling workloads, their approaches and capabilities differ significantly.

In this introductory blog, we will provide a bird's eye view of how they compare, and when you might choose one over the other.
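A rough feel for the core difference: HPA reacts to observed resource or custom metrics using a documented replica formula, while KEDA drives scaling from external event sources (queue depth, stream lag, and so on) and can scale workloads down to zero. The sketch below implements only the HPA replica formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric); the function name is ours, and the 10% tolerance matches the default `--horizontal-pod-autoscaler-tolerance`.

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         tolerance: float = 0.1) -> int:
    """HPA scaling rule: desiredReplicas = ceil(currentReplicas * ratio),
    where ratio = currentMetric / targetMetric. If the ratio is within the
    tolerance band around 1.0, the replica count is left unchanged."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

So 4 replicas averaging 200m CPU against a 100m target scale to 8, while 4 replicas at 105m stay at 4 because the 5% deviation sits inside the tolerance band. KEDA's scalers feed comparable math, but from event-source metrics rather than pod resource usage.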

HPA vs KEDA

Support for Parallel Execution with Rafay's Integrated GitOps Pipeline

At Rafay, we are continuously evolving our platform to deliver powerful capabilities that streamline and accelerate the software delivery lifecycle. One such enhancement is the recent update to our GitOps pipeline engine, designed to optimize execution time and flexibility — enabling a better experience for platform teams and developers alike.

Integrated Pipeline for Diverse Use Cases

Rafay provides a tightly integrated pipeline framework that supports a range of common operational use cases, including:

  • System Synchronization: Use Git as the single source of truth to orchestrate controller configurations
  • Application Deployment: Define and automate your app deployment process directly from version-controlled pipelines
  • Approval Workflows: Insert optional approval gates to control when and how specific pipeline stages are triggered, offering an added layer of governance and compliance

This comprehensive design empowers platform teams to standardize delivery patterns while still accommodating organization-specific controls and policies.

From Sequential to Parallel Execution with DAG Support

Historically, Rafay’s GitOps pipeline executed all stages sequentially, regardless of interdependencies. While effective for simpler workflows, this model imposed time constraints for more complex operations.

With our latest update, the pipeline engine now supports Directed Acyclic Graphs (DAGs) — allowing stages to execute in parallel, wherever dependencies allow.
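The benefit of DAG execution can be sketched with a small scheduling exercise: group stages into "waves" so that every stage in a wave has all of its dependencies satisfied by earlier waves, letting stages within a wave run concurrently. This is a generic Kahn-style illustration of the idea, not Rafay's actual pipeline engine.

```python
def execution_waves(deps: dict) -> list:
    """Group pipeline stages into parallelizable waves.
    `deps` maps each stage to the set of stages it depends on."""
    remaining = {stage: set(d) for stage, d in deps.items()}
    waves = []
    while remaining:
        # Stages with no unsatisfied dependencies can run together.
        ready = sorted(s for s, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle detected: pipeline is not a DAG")
        waves.append(ready)
        for s in ready:
            del remaining[s]
        for d in remaining.values():
            d -= set(ready)
    return waves
```

For a pipeline where two test stages both depend on a build stage and a deploy stage depends on both tests, this yields three waves, with the two tests running in parallel in the middle wave instead of back to back.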

Simplifying Blueprint and Add-on Management with Draft Versions

Managing infrastructure at scale demands both agility and precision—especially when it comes to version control. At Rafay, we have long supported versioning for key configuration objects such as Blueprints and Add-ons, enabling platform teams to roll out changes systematically and maintain operational consistency.

However, as many teams have discovered, managing these versions during testing and validation phases can introduce unnecessary complexity. We are excited to announce a major usability enhancement: Support for Draft Versions.

Why Versioning Matters

Versioning in Rafay’s platform delivers several key advantages:

  • Change Tracking: Keep a historical record of changes made to Blueprints and Add-ons over time
  • Staged Rollouts: Gradually deploy updates across environments and clusters to minimize risk
  • Compliance Assurance: Demonstrate adherence to organizational policies and track Day-2 changes in a controlled way

These capabilities are especially crucial for teams responsible for maintaining secure, production-grade Kubernetes environments.

The Challenge: Version Sprawl During Testing

While versioning is powerful, it has traditionally introduced friction during the testing and validation phase. Each time a platform engineer made a minor change to an Add-on or Blueprint, a new version needed to be created—even if the version wasn’t production-ready.

This led to:

  • Version fatigue, with large volumes of partially validated versions cluttering the system
  • Increased manual overhead and inefficiency for platform teams
  • Risk of accidental usage of incomplete configurations in downstream projects
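The draft-version idea can be sketched as follows: a single mutable draft absorbs iterative edits during testing, and only an explicit publish step mints an immutable version that downstream projects can consume. This is a hypothetical model of the concept, not Rafay's actual API.

```python
class VersionedObject:
    """Illustrative Blueprint/Add-on model with draft support."""

    def __init__(self, name: str):
        self.name = name
        self.published = []   # immutable (version, config) pairs, usable downstream
        self.draft = None     # work in progress, invisible to downstream projects

    def edit(self, config: dict) -> None:
        # Repeated edits overwrite the draft instead of minting new versions,
        # avoiding the version sprawl described above.
        self.draft = config

    def publish(self, version: str) -> None:
        if self.draft is None:
            raise ValueError("nothing to publish")
        self.published.append((version, self.draft))
        self.draft = None

    def latest_published(self):
        return self.published[-1] if self.published else None
```

Ten experimental edits now produce one published version rather than ten partially validated ones, and downstream projects can never pick up an unpublished draft by accident.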

A Fresh New Look: Rafay Console Gets a UI/UX Makeover for Enhanced Usability

At Rafay, we believe that user experience is as critical as the powerful automation capabilities we deliver. With that commitment in mind, we’ve been working for the last few months on a revamp of the Rafay Console User Interface (UI). The changes are purposeful and designed to streamline navigation, increase operational clarity, and elevate your productivity. Whether you’re managing clusters, deploying workloads, or orchestrating environments, the new interface will put everything you need right at your fingertips.

The new UI will launch as part of our v3.5 release, scheduled to roll out at the end of May 2025. We understand change can be hard, and it may take users a little time to get used to the new experience. Note that existing projects and configurations remain unchanged, and users can continue managing their infrastructure and applications without interruption.

In this blog, we provide a closer look at the most impactful improvements and how they will benefit our users.

Migration

Introduction to User Namespaces in Kubernetes

In Kubernetes, some features arrive quietly but leave a massive impact, and v1.33 is shaping up to be one such release. In the previous blog, my colleague described how you can provision and operate Kubernetes v1.33 clusters on bare-metal and VM-based environments using Rafay.

In this blog, we will discuss a new feature in v1.33 called User Namespaces. It is not a headline grabber like a service mesh, but it is a game changer for container security.
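At its core, a user namespace remaps UIDs: a process running as root (UID 0) inside the container corresponds to an unprivileged UID on the host, so escaping the container no longer yields host root. A minimal sketch of that translation, following the Linux `/proc/<pid>/uid_map` format of (inside-start, outside-start, length) ranges; the pod's host range below is made up for illustration.

```python
def map_to_host_uid(container_uid: int, uid_map: list) -> int:
    """Translate a UID as seen inside the container to the host UID,
    using /proc/<pid>/uid_map-style (inside, outside, length) entries."""
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    raise ValueError(f"UID {container_uid} is not mapped")

# Hypothetical mapping for one pod: container UIDs 0-65535 map onto
# host UIDs 100000-165535, so in-container root is unprivileged on the host.
POD_UID_MAP = [(0, 100000, 65536)]
```

With this mapping, container root maps to host UID 100000, and an app running as container UID 1000 maps to 101000, which is the property that makes user namespaces such a strong isolation boundary.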

Container in a Jail

Kubernetes v1.33 for Rafay MKS

As part of our upcoming May release, alongside other enhancements and features, we are adding support for Kubernetes v1.33 to Rafay MKS (i.e., upstream Kubernetes for bare-metal and VM-based environments).

Both new cluster provisioning and in-place upgrades of existing clusters are supported. As with most Kubernetes releases, v1.33 deprecates and removes several features. To ensure zero impact to our customers, we have validated every feature of the Rafay Kubernetes Operations Platform on this Kubernetes version.

Kubernetes v1.33 Release

Powering Multi-Tenant, Serverless AI Inference for Cloud Providers

The AI revolution is here, and Large Language Models (LLMs) are at its forefront. Cloud providers are uniquely positioned to offer powerful AI inference services to their enterprise and retail customers. However, delivering these services in a scalable, multi-tenant, and cost-effective serverless manner presents significant operational challenges.

Rafay enables cloud providers to deliver Serverless Inference to hundreds of users and enterprises.

Info

Earlier this week, we announced our Multi-Tenant Serverless Inference offering for GPU & Sovereign Cloud Providers. Learn more about this here.

Multi Tenant

Family vs. Lineage: Unpacking Two Often-Confused Ideas in the LLM World

LLMs have begun to resemble sprawling family trees. Those relatively new to LLMs will notice two words that appear constantly in technical blogs: "family" and "lineage".

They sound interchangeable, and users frequently conflate them, but they describe different slices of an LLM's life story.

Important

Understanding the difference is more than trivia: it determines how you pick models, tune them, and keep inference predictable at scale.

LLM Family vs Lineage