Mohan Atreya

August 26, 2025
in Product Blog, Performance Reference Architecture, Nvidia, GPU VM, PRA
4 min read

NVIDIA Performance Reference Architecture: An Introduction

Artificial intelligence (AI) and high-performance computing (HPC) workloads are evolving at unprecedented speed. Enterprises today require infrastructure that can scale elastically, provide consistent performance, and ensure secure multi-tenant operation. NVIDIA’s Performance Reference Architecture (PRA), built on HGX platforms with Shared NVSwitch GPU Passthrough Virtualization, delivers precisely this capability.

This is the introductory blog in a multi-part series. In this blog, we explain why PRA is critical for modern enterprises and service providers, highlight the benefits of adoption, and outline the key steps required to successfully deploy and support the PRA design.

August 24, 2025
in Product Blog, Nvidia, GPU, nvidia-smi utility
3 min read

Deep Dive into `nvidia-smi`: Monitoring Your NVIDIA GPU with Real Examples

Whether you're training deep learning models, running simulations, or just curious about your GPU's performance, nvidia-smi is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance.

In this blog, we’ll explore what nvidia-smi is and how to use it, and walk through real output from a system using an NVIDIA T1000 8GB GPU.

What is `nvidia-smi`?

nvidia-smi is a CLI utility bundled with the NVIDIA driver. It enables:

Real-time GPU monitoring
Driver and CUDA version discovery
Process visibility and control
GPU configuration and performance tuning

You can execute it using:

nvidia-smi
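For scripted monitoring, the same information can be pulled in machine-readable form. The following is a minimal sketch, assuming the `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi; the helper names `parse_gpu_csv` and `query_gpus` and the choice of fields are illustrative, not part of the original post.

```python
import csv
import io
import shutil
import subprocess

# Fields to request from nvidia-smi (an illustrative selection).
FIELDS = ["name", "driver_version", "memory.total", "memory.used", "utilization.gpu"]


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output (no header, no units) into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (cell.strip() for cell in row))) for row in rows if row]


def query_gpus():
    """Run nvidia-smi and return parsed GPU stats, or [] if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on a machine with the NVIDIA driver installed returns one dictionary per GPU; all values are strings, so convert memory and utilization to numbers before doing arithmetic on them.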

August 23, 2025
in DRA, Kubernetes
4 min read

Introduction to Dynamic Resource Allocation (DRA) in Kubernetes

In the previous blog, we reviewed the limitations of Kubernetes GPU scheduling. These often result in:

Resource fragmentation – large portions of GPU memory remain idle and unusable.
Topology blindness – multi-GPU workloads may be scheduled suboptimally.
Cost explosion – teams overprovision GPUs to work around scheduling inefficiencies.

In this post, we’ll look at how a new GA feature in Kubernetes v1.34 — Dynamic Resource Allocation (DRA) — aims to solve these problems and transform GPU scheduling in Kubernetes.

August 20, 2025
in DRA, Kubernetes
4 min read

Rethinking GPU Allocation in Kubernetes

Kubernetes has cemented its position as the de-facto standard for orchestrating containerized workloads in the enterprise. In recent years, its role has expanded beyond web services and batch processing into one of the most demanding domains of all: AI/ML workloads.

Organizations now run everything from lightweight inference services to massive, distributed training pipelines on Kubernetes clusters, relying heavily on GPU-accelerated infrastructure to fuel innovation.

But there’s a problem. In this blog, we will explore why the current model falls short, what a more advanced GPU allocation approach looks like, and how it can unlock efficiency, performance, and cost savings at scale.

August 6, 2025
in Product Blog, ArgoCD, Zero Trust Kubectl, Kubectl Proxy
4 min read

GitOps Without Borders: Running Argo CD Across Isolated Security Domains with Rafay’s Zero-Trust Kubectl

Modern enterprises rarely run applications in a single cluster. A production fleet might include on-prem clusters in Singapore and London, a regulated environment in AWS us-east-1, and a developer sandbox on someone’s laptop. GitOps with Argo CD is the natural way to keep all those clusters in the desired state, but the moment clusters live in different security domains (firewalled data centers, private VPCs, or even air-gapped networks) the simple argocd cluster add story breaks down:

Bespoke bastion hosts or VPN tunnels for every hop
Long-lived bearer-token Secrets stashed in Argo’s namespace
High latency between the GitOps engine and far-flung clusters, turning reconciliations into a slog

Rafay’s Zero-Trust Kubectl Access (ZTKA) solves all three problems in one stroke by fronting the connection with a hardened Kube API Access Proxy and issuing just-in-time (JIT), short-lived ServiceAccounts inside every cluster.

In this blog, we will describe how Rafay Zero Trust Kubectl Access Proxy gives Argo CD a secure path to every cluster in the fleet, even when those clusters sit deep behind corporate firewalls.

Argo CD integration with Rafay

August 6, 2025
in Product Blog, Regional Proxy, Dedicated Proxy
3 min read

Turbo-charging kubectl: How Rafay’s Zero-Trust Access + Regional Proxies Deliver Lightning-Fast CLI Performance

When developers are halfway around the world from their clusters, every kubectl get pods can feel like it’s moving through molasses. Rafay’s Zero-Trust Kubectl Access (ZTKA) service addresses both the security risks and the lag by adding a network of regional proxies between the user and the cluster.

Zero-Trust Kubectl in a Nutshell

Rafay ZTKA routes all CLI and web-terminal traffic through its Kube API Access Proxy. The key design goals are:

Friction-free for users (“vanilla kubectl”)
Zero infrastructure to manage for platform teams
Centralized RBAC + audit
“Great performance” even for clusters behind firewalls

Under the hood, users authenticate to Rafay; Rafay spins up just-in-time service accounts inside the target cluster and tears them down after idle timeouts, eliminating credential sprawl.

August 5, 2025
in Product Blog, Drift Detection, Drift Prevention
5 min read

Drift Prevention vs Detection: Does a Polling Approach Make Sense at Scale?

Many organizations rely on pull-based GitOps tools (e.g. Argo CD) to detect and remediate drift on their Kubernetes clusters. This approach allows clusters to diverge from the desired state until they are reconciled at the next polling interval. For the last 4 years, Rafay customers have benefited from an architecturally different approach that focuses on true drift prevention, backed by robust detection capabilities across both cluster blueprints and application workloads.

Info

In a previous blog, we discussed how ArgoCD's reconciliation works and its best practices.

Drift Block

August 4, 2025
in Product Blog, ArgoCD, Reconciliation, Best Practices
5 min read

Understanding ArgoCD Reconciliation: How It Works, Why It Matters, and Best Practices

ArgoCD is a powerful GitOps controller for Kubernetes, enabling declarative configuration and automated synchronization of workloads. One of its core functions is reconciliation, a continuous process by which ArgoCD ensures that the live state of a Kubernetes cluster matches the desired state defined in a Git repository.

While this might sound straightforward, reconciliation plays a critical role in the GitOps lifecycle, and its default behavior can be surprisingly aggressive. In this blog post, we’ll explore:

What reconciliation in ArgoCD actually does
Why it exists and how it ensures cluster integrity
The pitfalls of the default timer
Best practices for tuning reconciliation to balance responsiveness and resource efficiency
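The core idea of reconciliation, comparing desired state with live state and correcting any difference, can be illustrated with a toy sketch. This is not ArgoCD's implementation: the function names `diff_state` and `reconcile` are hypothetical, state is modeled as plain dictionaries, and the sketch always prunes resources missing from the desired state, which is a simplifying assumption.

```python
def diff_state(desired, live):
    """Return every resource whose live state differs from the desired state."""
    drift = {}
    for name in desired.keys() | live.keys():
        want, have = desired.get(name), live.get(name)
        if want != have:
            drift[name] = {"desired": want, "live": have}
    return drift


def reconcile(desired, live):
    """Run one reconciliation pass: mutate live to match desired, return the drift found."""
    drift = diff_state(desired, live)
    for name, change in drift.items():
        if change["desired"] is None:
            live.pop(name)                  # not declared in Git: prune it
        else:
            live[name] = change["desired"]  # missing or modified: create or update
    return drift


# Hypothetical example: "web" was scaled by hand, "db" is missing, "tmp" is undeclared.
desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
live = {"web": {"replicas": 5}, "tmp": {"replicas": 1}}
changed = reconcile(desired, live)
```

After the first pass `live` equals `desired`, and a second pass finds no drift. Because a controller repeats this comparison on a timer, for every application, the length of that interval directly affects both responsiveness and resource usage.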

Info

In a related blog, we describe how customers using Rafay are able to Block Drift in the first place.

ArgoCD Reconciliation

July 11, 2025
in Product Blog, GPU, Custom Resources
3 min read

Custom GPU Resource Classes in Kubernetes

In the modern era of containerized machine learning and AI infrastructure, GPUs are a critical and expensive asset. Kubernetes makes scheduling and isolation easier—but managing GPU utilization efficiently requires more than just assigning something like

nvidia.com/gpu: 1

In this blog post, we will explore what custom GPU resource classes are, why they matter, and when to use them for maximum impact. Custom GPU resource classes are a powerful technique for fine-grained GPU management in multi-tenant, cost-sensitive, and performance-critical environments.
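For context, the plain whole-GPU request mentioned above sits under a container's resource limits. The following is a minimal sketch that builds such a Pod manifest as a Python dictionary; the helper name `gpu_pod_manifest` and the pod and image names are hypothetical.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest that requests whole GPUs via the nvidia.com/gpu resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


# Hypothetical pod and image names, for illustration only.
manifest = gpu_pod_manifest("gpu-demo", "example.com/trainer:latest")
manifest_json = json.dumps(manifest, indent=2)
```

A request like this always reserves an entire GPU, however small the workload, which is the limitation that custom GPU resource classes aim to address.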

Info

If you are new to GPU sharing approaches, we recommend reading the following introductory blogs: Demystifying Fractional GPUs in Kubernetes and Choosing the Right Fractional GPU Strategy.

July 10, 2025
in Product Blog, GPU Sharing, Fractional GPUs, Cloud Providers
3 min read

Choosing the Right Fractional GPU Strategy for Cloud Providers

As demand for GPU-accelerated workloads soars across industries, cloud providers are under increasing pressure to offer flexible, cost-efficient, and isolated access to GPUs. While full GPU allocation remains the norm, it often leads to resource waste—especially for lightweight or intermittent workloads.

In the previous blog, we described the three primary technical approaches for fractional GPUs. In this blog, we'll explore the most viable approaches to offering fractional GPUs in a GPU-as-a-Service (GPUaaS) model, and evaluate their suitability for cloud providers serving end customers.