Cloud providers offering GPU or Neo Cloud services need accurate and automated mechanisms to track resource consumption. Usage data becomes the foundation for billing, showback, or chargeback models that customers expect. The Rafay Platform provides usage metering APIs that can be easily integrated into a provider’s billing system.
In this blog, we’ll walk through how to use these APIs with a sample Python script to generate detailed usage reports.
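To make the shape of such an integration concrete, here is a minimal sketch in Python. The endpoint path, query parameters, and API-key header below are illustrative assumptions, not the documented Rafay API; consult the platform's API reference for the actual contract.

```python
import csv
import os

import requests

# Assumptions (hypothetical): the base URL, endpoint path, query
# parameters, and auth header below are placeholders for illustration,
# not the documented Rafay API.
BASE_URL = "https://console.example.rafay.dev"
ENDPOINT = "/v1/usage/metering"
API_KEY = os.environ["RAFAY_API_KEY"]  # provider-issued API key

def fetch_usage(start: str, end: str) -> list[dict]:
    """Fetch raw usage records for a time window (ISO 8601 timestamps)."""
    resp = requests.get(
        BASE_URL + ENDPOINT,
        headers={"X-API-KEY": API_KEY},
        params={"startTime": start, "endTime": end},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("records", [])

def write_report(records: list[dict], path: str = "usage_report.csv") -> None:
    """Flatten usage records into a CSV suitable for a billing pipeline."""
    if not records:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(records[0]))
        writer.writeheader()
        writer.writerows(records)

if __name__ == "__main__":
    write_report(fetch_usage("2025-09-01T00:00:00Z", "2025-09-30T23:59:59Z"))
```

The same pattern extends naturally to paginated responses or a direct push into a billing system's ingestion API.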
Our upcoming release will add support for a number of new features and enhancements. This blog focuses on the upcoming support for Upstream Kubernetes on nodes based on Red Hat Enterprise Linux (RHEL) v10.0. Lifecycle management will support both new cluster provisioning and in-place upgrades of existing Kubernetes clusters.
At Rafay, we are continuously evolving our platform to deliver powerful capabilities that streamline and accelerate the software delivery lifecycle. One such enhancement is the recent update to our GitOps pipeline engine, designed to optimize execution time and flexibility — enabling a better experience for platform teams and developers alike.
Rafay provides a tightly integrated pipeline framework that supports a range of common operational use cases, including:
System Synchronization: Use Git as the single source of truth to orchestrate controller configurations
Application Deployment: Define and automate your app deployment process directly from version-controlled pipelines
Approval Workflows: Insert optional approval gates to control when and how specific pipeline stages are triggered, offering an added layer of governance and compliance
This comprehensive design empowers platform teams to standardize delivery patterns while still accommodating organization-specific controls and policies.
Historically, Rafay’s GitOps pipeline executed all stages sequentially, regardless of interdependencies. While effective for simpler workflows, this model added unnecessary wall-clock time to complex pipelines, because independent stages still had to wait on one another.
With our latest update, the pipeline engine now supports Directed Acyclic Graphs (DAGs), allowing independent stages to execute in parallel wherever dependencies allow.
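Conceptually, the engine behaves like the sketch below: any stage whose dependencies are complete becomes runnable, and runnable stages execute concurrently. This illustrates DAG scheduling in general, not Rafay's internal implementation; the stage names and `DEPENDS_ON` map are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pipeline: "build" and "scan" have no interdependency, so
# they can run in parallel; "deploy" waits for both to finish.
DEPENDS_ON = {
    "build": [],
    "scan": [],
    "deploy": ["build", "scan"],
    "notify": ["deploy"],
}

def run_stage(name: str) -> None:
    print(f"running stage: {name}")

def run_pipeline(graph: dict[str, list[str]]) -> None:
    done: set[str] = set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(graph):
            # Every stage whose dependencies are all complete is ready.
            ready = [s for s in graph
                     if s not in done and all(d in done for d in graph[s])]
            if not ready:
                raise ValueError("cycle detected in pipeline graph")
            # Ready stages are independent of each other by construction,
            # so they can safely execute in parallel.
            list(pool.map(run_stage, ready))
            done.update(ready)

run_pipeline(DEPENDS_ON)
```

In this example, the sequential model would run four stages back to back, while the DAG model finishes in three steps because build and scan overlap.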
Recently, Bitnami announced significant changes to its container image distribution. As part of this update, the Bitnami public catalog (docker.io/bitnami) will be permanently deleted on September 29th.
All existing container images (including older or versioned tags such as 2.50.0, 10.6, etc.) will be moved from the public catalog (docker.io/bitnami) to a Bitnami Legacy repository (docker.io/bitnamilegacy).
The legacy catalog will no longer receive updates or support. It is intended only as a temporary migration solution to give users time to transition.
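One short-term mitigation is to repoint existing manifests at the legacy repository while planning a move to a supported image source. The sketch below rewrites image references in local YAML manifests; the directory layout is an assumption for illustration, and this should be treated as breathing room only, since the legacy catalog receives no updates.

```python
from pathlib import Path

# Stopgap only: the legacy catalog is unsupported, so use this to buy
# time while migrating to a supported image source.
OLD_PREFIX = "docker.io/bitnami/"
NEW_PREFIX = "docker.io/bitnamilegacy/"

def repoint_manifests(root: str = "manifests") -> None:
    """Rewrite Bitnami image references in YAML manifests under `root`."""
    for path in Path(root).rglob("*.yaml"):
        text = path.read_text()
        if OLD_PREFIX in text:
            path.write_text(text.replace(OLD_PREFIX, NEW_PREFIX))
            print(f"updated {path}")

if __name__ == "__main__":
    repoint_manifests()
```

Note that manifests sometimes reference images without the docker.io prefix (for example, bitnami/postgresql), so audit your references before relying on a simple string rewrite.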
Implementing Day-2 Operations such as agent replacement is cumbersome today because every configuration tied to a previous agent must be reconfigured manually. This makes tasks like scaling, retiring agents, or handling failures both error-prone and time-consuming.
To address this pain point, we are introducing the concept of an Agent Pool.
Instead of binding configurations directly to individual agents, customers can now attach multiple agents to a shared Agent Pool. Configurations such as Environment Templates and Resource Templates reference the pool, rather than a single agent.
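The sketch below models this indirection in plain Python: a template holds a pool name rather than an agent identity, so pool membership can change without touching any template. The class and field names are illustrative, not Rafay's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentPool:
    name: str
    agents: set[str] = field(default_factory=set)  # mutable membership

@dataclass
class ResourceTemplate:
    name: str
    agent_pool: str  # references the pool, never an individual agent

pool = AgentPool("us-west-pool", {"agent-1", "agent-2"})
template = ResourceTemplate("vpc-template", agent_pool="us-west-pool")

# Day-2 operation: retire agent-1 and bring agent-3 into service.
# The template is untouched because it only knows the pool name.
pool.agents.discard("agent-1")
pool.agents.add("agent-3")
```
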
This simple shift brings significant operational benefits:
Seamless Failover and Replacement: Add or remove agents from a pool without reconfiguring existing associations.
Simplified Day-2 Operations: Manage scaling, upgrades, and retirements without disruption.
Load Balancing: Distribute load across multiple agents within a pool for higher availability and performance.
In the previous blog, we introduced the concept of Dynamic Resource Allocation (DRA), which went GA in Kubernetes v1.34, released in August 2025.
In this blog post, we’ll configure DRA on a Kubernetes 1.34 cluster.
Info
You can try this on a macOS or Windows laptop in under 15 minutes. The steps in this blog are written for macOS users.
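As a quick sanity check before following along, the snippet below uses the kubernetes Python client to confirm that the cluster exposes the GA resource.k8s.io/v1 API and to list any installed DeviceClasses. It assumes a working kubeconfig pointing at the 1.34 cluster.

```python
from kubernetes import client, config

# Assumes a kubeconfig pointing at a Kubernetes v1.34 cluster.
config.load_kube_config()
api = client.CustomObjectsApi()

# DRA went GA in v1.34 under the resource.k8s.io/v1 API group.
device_classes = api.list_cluster_custom_object(
    group="resource.k8s.io", version="v1", plural="deviceclasses"
)
for dc in device_classes.get("items", []):
    print("DeviceClass:", dc["metadata"]["name"])
```

If the call fails with a 404, the cluster either predates v1.34 or does not have the DRA API enabled.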
Artificial intelligence (AI) and high-performance computing (HPC) workloads are evolving at unprecedented speed. Enterprises today require infrastructure that can scale elastically, provide consistent performance, and ensure secure multi-tenant operation. NVIDIA’s Performance Reference Architecture (PRA), built on HGX platforms with Shared NVSwitch GPU Passthrough Virtualization, delivers precisely this capability.
This is the introductory blog in a multi-part series. In this blog, we explain why PRA is critical for modern enterprises and service providers, highlight the benefits of adoption, and outline the key steps required to successfully deploy and support the PRA architecture.
Whether you're training deep learning models, running simulations, or just curious about your GPU's performance, nvidia-smi is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance.
In this blog, we’ll explore what nvidia-smi is, how to use it, and walk through a real output from a system using an NVIDIA T1000 8GB GPU.
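For scripting and monitoring, nvidia-smi's query mode is especially handy. The snippet below shells out to it from Python and parses the CSV output; the --query-gpu fields used here (name, memory, utilization, temperature) are standard nvidia-smi query properties.

```python
import subprocess

# Query a few key GPU metrics in machine-readable CSV form.
FIELDS = "name,memory.total,memory.used,utilization.gpu,temperature.gpu"

out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

# One line per GPU; memory values are reported in MiB with `nounits`.
for line in out.strip().splitlines():
    name, mem_total, mem_used, util, temp = [v.strip() for v in line.split(",")]
    print(f"{name}: {mem_used}/{mem_total} MiB, {util}% util, {temp}°C")
```

Running nvidia-smi with no arguments prints the familiar full status table instead, which is what we will walk through next.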
In the previous blog, we reviewed the limitations of Kubernetes GPU scheduling. These often result in:
Resource fragmentation – large portions of GPU memory remain idle and unusable.
Topology blindness – multi-GPU workloads may be scheduled suboptimally.
Cost explosion – teams overprovision GPUs to work around scheduling inefficiencies.
In this post, we’ll look at how a new GA feature in Kubernetes v1.34 — Dynamic Resource Allocation (DRA) — aims to solve these problems and transform GPU scheduling in Kubernetes.
Kubernetes has cemented its position as the de-facto standard for orchestrating containerized workloads in the enterprise. In recent years, its role has expanded beyond web services and batch processing into one of the most demanding domains of all: AI/ML workloads.
Organizations now run everything from lightweight inference services to massive, distributed training pipelines on Kubernetes clusters, relying heavily on GPU-accelerated infrastructure to fuel innovation.
But there’s a problem: the current GPU allocation model in Kubernetes was not designed for these demands. In this blog, we will explore why it falls short, what a more advanced GPU allocation approach looks like, and how it can unlock efficiency, performance, and cost savings at scale.