Product Blog

August 6, 2025
in Product Blog, Regional Proxy, Dedicated Proxy
3 min read

Turbo-charging kubectl: How Rafay’s Zero-Trust Access + Regional Proxies Deliver Lightning-Fast CLI Performance

When developers are halfway around the world from their clusters, every kubectl get pods can feel like it’s moving through molasses. Rafay’s Zero-Trust Kubectl (ZTKA) service addresses both the security risks and the lag by placing a network of regional proxies between the user and the cluster.

Zero-Trust Kubectl in a Nutshell

Rafay ZTKA routes all CLI and web-terminal traffic through its Kube API Access Proxy. The key design goals are:

Friction-free for users (“vanilla kubectl”)
Zero infrastructure to manage for platform teams
Centralized RBAC + audit
“Great performance” even for clusters behind firewalls

Under the hood, users authenticate to Rafay; Rafay spins up just-in-time service accounts inside the target cluster and tears them down after idle timeouts, eliminating credential sprawl.
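
The just-in-time lifecycle described above can be sketched as a small model. This is only an illustration, not Rafay's implementation: the class `JitAccountManager`, its methods, and the 300-second timeout are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class JitAccountManager:
    """Toy model of just-in-time service accounts with idle expiry (hypothetical)."""

    idle_timeout_s: float
    # Maps (user, cluster) -> timestamp of the last recorded activity.
    _last_used: dict = field(default_factory=dict)

    def access(self, user: str, cluster: str, now: float) -> str:
        """Record activity; create the account on first use or after it expired."""
        self.reap(now)
        key = (user, cluster)
        status = "reused" if key in self._last_used else "created"
        self._last_used[key] = now
        return status

    def reap(self, now: float) -> list:
        """Delete accounts idle for longer than the timeout; return what was removed."""
        expired = [k for k, t in self._last_used.items() if now - t > self.idle_timeout_s]
        for k in expired:
            del self._last_used[k]
        return expired


mgr = JitAccountManager(idle_timeout_s=300)
first = mgr.access("alice", "prod-eu", now=0)     # first use creates the account
second = mgr.access("alice", "prod-eu", now=120)  # still inside the idle window
removed = mgr.reap(now=500)                       # 380 s idle, so it is torn down
third = mgr.access("alice", "prod-eu", now=600)   # recreated on demand
```

Because the account only exists while it is in use, no long-lived credential remains in the cluster to be leaked or forgotten.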

August 5, 2025
in Product Blog, Drift Detection, Drift Prevention
5 min read

Drift Prevention vs Detection: Does a Polling Approach Make Sense at Scale?

Many organizations rely on pull-based GitOps tools (e.g. Argo CD) to detect and remediate drift on their Kubernetes clusters. With this approach, a cluster can diverge from its desired state and stay that way until the next polling interval reconciles it. For the last 4 years, Rafay customers have benefited from an architecturally different approach that focuses on true drift prevention, backed by robust detection capabilities across both cluster blueprints and application workloads.
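
To make the difference concrete, here is a minimal sketch of how long a drifted resource stays live under each model. The function names and the 180-second interval are illustrative assumptions, not values taken from any tool's configuration, and the model assumes polls fire at exact multiples of the interval.

```python
import math


def polling_exposure_s(drift_time_s: int, interval_s: int) -> int:
    """Seconds a drifted resource stays live until the next poll reconciles it."""
    next_poll = math.ceil(drift_time_s / interval_s) * interval_s
    return next_poll - drift_time_s


def prevention_exposure_s(drift_time_s: int) -> int:
    """With prevention the change is rejected up front, so drift is never live."""
    return 0


interval = 180  # illustrative polling interval in seconds (assumption)
drift_times = [10, 95, 200]

polled = [polling_exposure_s(t, interval) for t in drift_times]
prevented = [prevention_exposure_s(t) for t in drift_times]
```

In this simplified model the exposure under polling averages about half the interval per drift event, while prevention keeps it at zero regardless of how many clusters are involved.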

Info

In a previous blog, we discussed how ArgoCD's reconciliation works and its best practices.

August 4, 2025
in Product Blog, ArgoCD, Reconciliation, Best Practices
5 min read

Understanding ArgoCD Reconciliation: How It Works, Why It Matters, and Best Practices

ArgoCD is a powerful GitOps controller for Kubernetes, enabling declarative configuration and automated synchronization of workloads. One of its core functions is reconciliation, a continuous process by which ArgoCD ensures that the live state of a Kubernetes cluster matches the desired state defined in a Git repository.

While this might sound straightforward, reconciliation plays a critical role in the GitOps lifecycle, and its default behavior can be surprisingly aggressive. In this blog post, we’ll explore:

What reconciliation in ArgoCD actually does
Why it exists and how it ensures cluster integrity
The pitfalls of the default timer
Best practices for tuning reconciliation to balance responsiveness and resource efficiency
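
The core idea of reconciliation, comparing desired state with live state and deriving the actions that close the gap, can be sketched in a few lines. This is a simplified illustration and not ArgoCD's actual algorithm; the `reconcile` function and the resource dictionaries are hypothetical, and a real controller only deletes extra resources when pruning is enabled.

```python
def reconcile(desired: dict, live: dict) -> dict:
    """Return the actions needed to make the live state match the desired state."""
    shared = desired.keys() & live.keys()
    return {
        "create": sorted(desired.keys() - live.keys()),
        "update": sorted(k for k in shared if desired[k] != live[k]),
        "delete": sorted(live.keys() - desired.keys()),
    }


# Desired state as it might be declared in Git (hypothetical resources).
desired = {"web": {"replicas": 3}, "api": {"replicas": 2}}
# Live state after someone scaled "web" by hand and left a debug pod behind.
live = {"web": {"replicas": 5}, "debug": {"replicas": 1}}

plan = reconcile(desired, live)
```

Running this comparison on a timer for every application is what makes the default interval matter: each pass costs work even when nothing has changed.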

Info

In a related blog, we describe how customers using Rafay are able to Block Drift in the first place.

July 11, 2025
in Product Blog, GPU, Custom Resources
3 min read

Custom GPU Resource Classes in Kubernetes

In the modern era of containerized machine learning and AI infrastructure, GPUs are a critical and expensive asset. Kubernetes makes scheduling and isolation easier—but managing GPU utilization efficiently requires more than just assigning something like

nvidia.com/gpu: 1

In this blog post, we will explore what custom GPU resource classes are, why they matter, and when to use them for maximum impact. Custom GPU resource classes are a powerful technique for fine-grained GPU management in multi-tenant, cost-sensitive, and performance-critical environments.

Info

If you are new to GPU sharing approaches, we recommend reading the following introductory blogs: Demystifying Fractional GPUs in Kubernetes and Choosing the Right Fractional GPU Strategy.

July 10, 2025
in Product Blog, GPU Sharing, Fractional GPUs, Cloud Providers
3 min read

Choosing the Right Fractional GPU Strategy for Cloud Providers

As demand for GPU-accelerated workloads soars across industries, cloud providers are under increasing pressure to offer flexible, cost-efficient, and isolated access to GPUs. While full GPU allocation remains the norm, it often leads to resource waste—especially for lightweight or intermittent workloads.

In the previous blog, we described the three primary technical approaches for fractional GPUs. In this blog, we'll explore the most viable approaches to offering fractional GPUs in a GPU-as-a-Service (GPUaaS) model, and evaluate their suitability for cloud providers serving end customers.

July 8, 2025
in Product Blog, GPU Sharing, Fractional GPUs, Kubernetes
4 min read

Demystifying Fractional GPUs in Kubernetes: MIG, Time Slicing, and Custom Schedulers

As GPU acceleration becomes central to modern AI/ML workloads, Kubernetes has emerged as the orchestration platform of choice. However, allocating a full GPU to many real-world workloads is overkill, resulting in underutilization and soaring costs.

Enter the need for fractional GPUs: ways to share a physical GPU among multiple containers without compromising performance or isolation.

In this post, we'll walk through three approaches to achieve fractional GPU access in Kubernetes:

MIG (Multi-Instance GPU)
Time Slicing
Custom Schedulers (e.g., KAI)

For each, we’ll break down how it works, its pros and cons, and when to use it.

July 3, 2025
in Product Blog, Agents, AI Agents
5 min read

The Rise of AI Agents: From Zero to Production

Artificial Intelligence (AI) has moved far beyond simple chatbots and rigid automation. At the frontier of this evolution lies a powerful new paradigm: AI agents. These autonomous, intelligent programs can understand their environment, reason through complex problems, and take meaningful actions.

Whether you’re a developer, product leader, or startup founder, understanding AI agents isn't just a competitive advantage; it’s a necessity. In this blog, we explain what agents are, how they differ from regular applications, and how you can build them.

June 27, 2025
in Product Blog, Quotas, Per User, Per Project, Per Team
3 min read

Configure and Manage GPU Resource Quotas in Multi-Tenant Clouds

In multi-tenant GPU cloud environments, effective resource management is critical to ensure fair usage and prevent contention. GPU resource quotas allow organizations to allocate computing capacity at multiple levels: across the entire organization, at individual project scopes, and even down to the per-user level. In this blog, we will describe how GPU clouds can give tenants and their admins fine-grained control over limited resources.
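
A layered quota check of this kind can be sketched as follows. This is an illustrative model only: the `Scope` class, the `try_allocate` function, and all limits are hypothetical and do not reflect Rafay's API. A request succeeds only if it fits within the user's, the project's, and the organization's limit at the same time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """One quota level (user, project, or organization) with a GPU limit."""

    name: str
    gpu_limit: int
    parent: Optional["Scope"] = None
    gpu_used: int = 0


def try_allocate(scope: Scope, gpus: int) -> bool:
    """Allocate GPUs only if every level from the user up to the org has room."""
    chain = []
    node = scope
    while node is not None:
        if node.gpu_used + gpus > node.gpu_limit:
            return False  # reject without changing any usage counter
        chain.append(node)
        node = node.parent
    for level in chain:
        level.gpu_used += gpus
    return True


org = Scope("org", gpu_limit=8)
project = Scope("project-a", gpu_limit=4, parent=org)
alice = Scope("alice", gpu_limit=2, parent=project)
bob = Scope("bob", gpu_limit=4, parent=project)

r1 = try_allocate(alice, 2)  # fits the user, project, and org limits
r2 = try_allocate(alice, 1)  # rejected: alice's own limit of 2 is used up
r3 = try_allocate(bob, 3)    # rejected: the project would reach 5 of 4
r4 = try_allocate(bob, 2)    # accepted: the project is now at 4 of 4
```

Checking the whole chain before updating any counter keeps a rejected request from leaving partial usage behind.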

June 23, 2025
in Product Blog, Approvals, Compliance
3 min read

Enforcing ServiceNow-Based Approvals with Rafay

Enterprises often require explicit approvals before critical actions can proceed, especially when provisioning infrastructure or making configuration changes. With Rafay’s out-of-the-box (OOB) workflow handlers, customers can easily integrate with popular ITSM systems such as ServiceNow (SNOW).

This post explains how to configure and use Rafay’s ServiceNow Workflow Handler to enforce approval gates.

Workflow Handlers in Rafay

Rafay enables platform teams to attach Workflow Handlers to key actions as pre-hooks or post-hooks:

Pre-hook Handlers: Triggered before an action (e.g., pause provisioning until approval is received)
Post-hook Handlers: Triggered after an action (e.g., notify stakeholders after infrastructure (environment) creation)
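
The pre-hook and post-hook pattern can be sketched as a small runner. This is a generic illustration, not Rafay's handler interface: `run_with_hooks`, `approval_gate`, `notify`, and `create_environment` are hypothetical names, and the approval decision is passed in directly instead of being fetched from ServiceNow.

```python
from typing import Callable, List


def run_with_hooks(
    action: Callable[[], str],
    pre_hooks: List[Callable[[List[str]], bool]],
    post_hooks: List[Callable[[List[str]], bool]],
) -> List[str]:
    """Run pre-hooks, then the action, then post-hooks; stop if a pre-hook fails."""
    log: List[str] = []
    for hook in pre_hooks:
        if not hook(log):
            log.append("action blocked")
            return log
    log.append(action())
    for hook in post_hooks:
        hook(log)
    return log


def approval_gate(approved: bool) -> Callable[[List[str]], bool]:
    """Build a pre-hook that passes only when approval was granted (stand-in)."""
    def hook(log: List[str]) -> bool:
        log.append("approval granted" if approved else "approval denied")
        return approved
    return hook


def notify(log: List[str]) -> bool:
    """Post-hook that records a stakeholder notification."""
    log.append("stakeholders notified")
    return True


def create_environment() -> str:
    return "environment created"


approved_run = run_with_hooks(create_environment, [approval_gate(True)], [notify])
denied_run = run_with_hooks(create_environment, [approval_gate(False)], [notify])
```

A denied approval stops the run before the action executes, so nothing is provisioned and no post-hook fires.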

Typical Scenarios

Here are a few use cases where ServiceNow-based approvals come into play:

Developers request a vCluster to test their app before raising a PR
Platform admins initiate a Kubernetes upgrade for a fleet of clusters that requires approval

June 21, 2025
in Product Blog, Amazon EKS, Graviton, Cost Savings
5 min read

Slash EKS Cluster Costs by 20-30% Instantly with AWS Graviton

If you’re running Kubernetes workloads on Amazon EKS backed by Intel-based instances, you’re leaving significant savings on the table. In this blog, we will look at the way many Rafay customers have immediately cut compute costs by ~20-30% with minimal effort, quickly complying with internal cost-saving mandates.
