
OpenClaw and NemoClaw: A Better Way to Consume AI Services Through Token Factory

As AI adoption accelerates, most businesses do not actually want to manage GPU clusters, model serving stacks, or low-level infrastructure. What they want is simple, reliable access to powerful models through tools their teams can use immediately. That is exactly the value of combining OpenClaw and NVIDIA NemoClaw with a service provider’s deployment of Rafay Token Factory.

OpenClaw is the user-facing interface where people interact with models and AI assistants. NemoClaw extends that experience with additional security and control for long-running or always-on agents. In both cases, the user experience can remain simple: connect to the provider, use tokens, and start working.

The complexity of GPUs, inference infrastructure, scaling, and capacity planning stays behind the scenes. OpenClaw is an open-source AI agent platform, while NVIDIA describes NemoClaw as an open-source reference stack for running OpenClaw more safely, with policy-based privacy and security guardrails.

OpenClaw with Token Factory

Running GPU Infrastructure on Kubernetes: What Enterprise Platform Teams Must Get Right

KubeCon + CloudNativeCon Europe 2026, Amsterdam


If you are at KubeCon this week in Amsterdam, you are likely hearing the same question repeatedly: how do we actually operate GPU infrastructure on Kubernetes at enterprise scale? The announcements from NVIDIA — the DRA Driver donation, the KAI Scheduler entering the CNCF Sandbox, and GPU support for Kata Containers — expand what is technically possible. But for enterprise platform teams, the harder problem is not capability. It is operating GPU infrastructure efficiently and responsibly once demand arrives.

This post is written for platform teams building internal GPU platforms — on-premises, in sovereign environments, or in hybrid models. You are not just provisioning infrastructure. You are governing access to some of the most expensive and constrained resources in the organization.

At scale, GPU inefficiency is not accidental. It is structural:

  • Idle GPUs that remain allocated but unused
  • Over-provisioned workloads consuming more than needed
  • Fragmented capacity that cannot satisfy real workloads
  • Lack of cost visibility and accountability

Solving this requires more than infrastructure. It requires a governed platform model.
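The first two failure modes above can be surfaced mechanically from utilization data. As a minimal sketch, the snippet below embeds sample output from `nvidia-smi`'s query interface as static data (the query flags are real `nvidia-smi` options; the numbers are made up for illustration) and flags GPUs that are allocated but effectively idle:

```shell
# Sample output of:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
# captured here as static data so the sketch runs anywhere.
sample="0, 87, 68412
1, 0, 0
2, 3, 1024"

# Flag GPUs whose utilization is below 5% -- allocated but effectively idle.
echo "$sample" | awk -F', ' '$2 < 5 { print "GPU " $1 " looks idle (util " $2 "%, mem " $3 " MiB)" }'
```

In a real platform the same signal would come from a metrics pipeline (e.g. DCGM exporter into Prometheus) rather than ad hoc parsing, but the accounting logic is the same: compare what is allocated against what is actually used.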

Advancing GPU Scheduling and Isolation in Kubernetes

KubeCon + CloudNativeCon Europe 2026, Amsterdam


At KubeCon Europe 2026, NVIDIA made a set of significant open-source contributions that advance how GPUs are managed in Kubernetes. These developments span resource allocation (DRA), scheduling (KAI), and isolation (Kata Containers). Specifically, NVIDIA donated its DRA Driver for GPUs to the Cloud Native Computing Foundation, transferring governance from a single vendor to full community ownership under the Kubernetes project. The KAI Scheduler was formally accepted as a CNCF Sandbox project, marking its transition from an NVIDIA-governed tool to a community-developed standard. And NVIDIA collaborated with the CNCF Confidential Containers community to introduce GPU support for Kata Containers, extending hardware-level workload isolation to GPU-accelerated workloads. Together, these contributions move GPU infrastructure closer to a first-class, community-owned, scheduler-integrated model.

OpenClaw on Kubernetes: A Platform Engineering Pattern for Always-On AI

AI is moving beyond chat windows. The next useful form factor is an Always-On AI service that can live behind messaging channels, expose a control surface, invoke tools, and be operated like any other platform workload. OpenClaw is interesting because it is built around that model.

OpenClaw is a Gateway-centric runtime with onboarding, workspace/config, channels, and skills, plus a documented Kubernetes install path for hosting.

For platform teams, that makes OpenClaw more than an AI app. It looks like an AI gateway layer that can be deployed, secured, and managed on Kubernetes using the same operational patterns you would use for internal developer platforms, control planes, or multi-service middleware.

OpenClaw

Flexible GPU Billing Models for Modern Cloud Providers — Powering the AI Factory with Rafay

The GPU cloud market is evolving fast. At NVIDIA GTC 2026, one theme came through loud and clear: enterprises are no longer experimenting with AI; they are committing to it at scale. Training frontier models, fine-tuning domain-specific LLMs, and running large-scale inference workloads on NVIDIA hardware require sustained, predictable access to high-end GPU infrastructure. That kind of commitment demands a billing model to match.

If you are running a GPU cloud business, you already know that a simple pay-as-you-go model doesn't cut it anymore. Your enterprise customers want options, and your ability to offer those options is a direct competitive advantage. That's where Rafay comes in.

Accelerating the AI Factory: Rafay & NVIDIA NCX Infra Controller (NICo)

Acquiring GPU hardware is the easy part. Turning it into a productive, multi-tenant AI service with proper isolation, self-service provisioning, and the governance to operate it at scale is where most get stuck. Custom integration work piles up, timelines slip, and the gap between racked hardware and revenue widens.

Rafay is closing that gap through a new integration with the NVIDIA NCX Infrastructure Controller (NICo), NVIDIA's open-source component for automated bare-metal lifecycle management. Together, Rafay and NICo give operators a unified platform to manage their GPU fleet and deliver cloud-like, self-service experiences to end users.

How Rafay and NVIDIA Help Neoclouds Monetize Accelerated Computing with Token Factories

The AI boom has created an unprecedented demand for GPUs. In response, a new generation of GPU-first cloud providers purpose-built for AI workloads—known as neoclouds—has emerged to deliver the infrastructure needed to power AI applications.

However, a critical shift is happening in the market. Selling raw GPU infrastructure is no longer enough. The real opportunity lies in turning GPU capacity into AI services. Developers and enterprises don't want GPUs. They want models, APIs, and intelligence on demand.

With Rafay's Token Factory offering, Neoclouds can transform GPU clusters into a self-service AI platform that exposes models through token-metered APIs. The result is a marketplace where neoclouds monetize infrastructure, model developers reach users, and developers build applications, all on the same platform.

This is where Rafay and NVIDIA have come together to unlock a powerful new business model for AI infrastructure providers.
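From the application developer's side, a token-metered service typically looks like an ordinary model API call. As a hedged sketch, assuming the provider exposes an OpenAI-compatible endpoint (the URL, model name, and token variable below are placeholders, not actual Token Factory values):

```shell
# Placeholder endpoint, model, and credential -- substitute the values
# issued by your provider's Token Factory deployment.
curl https://api.example-neocloud.com/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN_FACTORY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-3.1-70b-instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

In OpenAI-compatible APIs, each response carries a `usage` object with prompt and completion token counts — the raw material for token-metered billing.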

End User Portal Token Factory

NVIDIA AICR Generates It. Rafay Runs It. Your GPU Clusters, Finally Under Control

Deploying GPU-accelerated Kubernetes infrastructure for AI workloads has never been simple. Administrators face a relentless compatibility matrix: matching GPU driver versions to CUDA releases, pinning Kubernetes versions to container runtimes, tuning configurations differently for NVIDIA H100s versus A100s, and doing all of it differently again for training versus inference.

One wrong version combination and workloads fail silently, or worse, perform far below hardware capability. For years, the answer was static documentation, tribal knowledge, and hoping that whoever wrote the runbook last week remembered to update it.

NVIDIA's AI Cluster Runtime (AICR) and the Rafay Platform represent a new approach — one where GPU infrastructure configuration is treated as code, generated deterministically, validated against real hardware, and enforced continuously across fleets of clusters.

Together, they cover the full lifecycle from the first aicr snapshot to production-grade day-2 operations, with cluster blueprints as the critical bridge between the two.

Baton Pass

From Slurm to Kubernetes: A Guide for HPC Users

If you've spent years submitting batch jobs with Slurm, moving to a Kubernetes-based cluster can feel like learning a new language. The concepts are familiar — resource requests, job queues, priorities — but the vocabulary and tooling are different. This guide bridges that gap, helping HPC veterans understand how Kubernetes handles workloads and what that means day-to-day.
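To make the mapping concrete, here is a rough translation of a familiar Slurm batch script into a Kubernetes Job, under stated assumptions: the container image and training script are placeholders, and the directive-to-field mapping is approximate rather than one-to-one.

```shell
# A Slurm batch script like this:
#
#   #!/bin/bash
#   #SBATCH --job-name=train
#   #SBATCH --gres=gpu:1
#   #SBATCH --time=01:00:00
#   python train.py
#
# maps roughly onto a Kubernetes Job manifest:
cat > job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: train                       # --job-name
spec:
  activeDeadlineSeconds: 3600       # --time=01:00:00
  backoffLimit: 0                   # fail fast, like --no-requeue
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: train
        image: my-registry/train:latest   # placeholder image
        command: ["python", "train.py"]
        resources:
          limits:
            nvidia.com/gpu: 1       # --gres=gpu:1
EOF

# Submit:  kubectl apply -f job.yaml      (the sbatch equivalent)
# Monitor: kubectl get jobs               (roughly squeue)
#          kubectl logs job/train         (roughly the Slurm output file)
```

One notable difference: Slurm queues and priorities have no single Kubernetes equivalent — they map onto a combination of namespaces, resource quotas, and a scheduler such as KAI.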

Slurm to k8s

Run nvidia-smi on Remote GPU Kubernetes Clusters Using Rafay Zero Trust Access

Infra operators managing GPU-enabled Kubernetes clusters often need a fast and secure way to validate GPU visibility, driver health, and runtime readiness without exposing the cluster directly or relying on bastion hosts, VPNs, or manually managed kubeconfigs.

With Rafay's zero trust kubectl, operators can securely access remote Kubernetes resources and execute commands inside running pods from the Rafay platform. A simple but powerful example is running nvidia-smi inside a GPU Operator pod to confirm that the NVIDIA driver stack, CUDA runtime, and GPU devices are functioning correctly on a remote cluster.

In this post, we walk through how infra operators can use Rafay's zero trust access workflow to run nvidia-smi on a remote GPU-based Kubernetes cluster.
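Mechanically, the check looks like the commands below. This is a sketch: the namespace and label match the GPU Operator's common defaults but may differ in your deployment, and with Rafay's zero trust access the kubeconfig used here is served by the Rafay platform rather than pointing at the cluster's API server directly.

```shell
# Find a driver daemonset pod on the target cluster
# (GPU Operator defaults; adjust namespace/label if your install differs).
kubectl -n gpu-operator get pods -l app=nvidia-driver-daemonset

# Exec into one of the pods and query driver, CUDA, and device status.
kubectl -n gpu-operator exec -it <driver-pod-name> -- nvidia-smi
```

A healthy result shows the driver version, CUDA version, and every expected GPU device — confirming the stack is ready before any workload is scheduled.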

Nvidia SMI over ZTKA