How Rafay and NVIDIA Help Neoclouds Monetize Accelerated Computing with Token Factories¶
The AI boom has created unprecedented demand for GPUs. In response, a new generation of GPU-first cloud providers purpose-built for AI workloads, known as neoclouds, has emerged to deliver the infrastructure that powers AI applications.
However, a critical shift is happening in the market. Selling raw GPU infrastructure is no longer enough. The real opportunity lies in turning GPU capacity into AI services. Developers and enterprises don't want GPUs. They want models, APIs, and intelligence on demand.
With Rafay's Token Factory offering, neoclouds can transform GPU clusters into a self-service AI platform that exposes models through token-metered APIs. The result is a marketplace where neoclouds monetize infrastructure, model developers reach users, and application developers build on top, all on the same platform.
This is where Rafay and NVIDIA have come together to unlock a powerful new business model for AI infrastructure providers.
The Evolution of Neoclouds¶
The first generation of neoclouds focused primarily on GPU access. They offered bare metal GPU servers, GPU-enabled VMs, and Kubernetes clusters, an approach that helped address the global AI infrastructure shortage. But the market has evolved. Today's AI developers expect something very different. They want:
- OpenAI-compatible APIs
- Instant access to models
- Elastic inference scaling
- Pay-as-you-go pricing
Most importantly, developers want zero infrastructure management. To meet these expectations, neoclouds must move beyond infrastructure and deliver models as services.
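As a concrete illustration of what "OpenAI-compatible" means in practice, the sketch below builds a standard `/chat/completions` request body and reads the `usage` object that token-metered billing relies on. The endpoint URL and model name are hypothetical, and the response is mocked rather than fetched from a live service:

```python
import json

# Hypothetical neocloud endpoint; a real client would POST the body below here.
BASE_URL = "https://api.example-neocloud.ai/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body of an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def billable_tokens(response: dict) -> int:
    """OpenAI-compatible responses report consumption in a 'usage' object."""
    usage = response.get("usage", {})
    return usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)

# A mocked response in the standard OpenAI-compatible shape, for illustration.
mock_response = {
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

body = build_chat_request("llama-3.1-8b-instruct", "Say hello")
print(json.dumps(body))
print(billable_tokens(mock_response))  # 15
```

Because the request and response shapes match the OpenAI API, existing SDKs and tooling work against a neocloud endpoint with only a base-URL change.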
Introducing the Token Factory¶
Rafay's Token Factory offering, available as a self-standing offering or as part of the broader Rafay Platform, is designed to turn GPU clusters into a self-service AI monetization engine. Instead of exposing raw GPUs, operators publish models as scalable APIs backed by GPU inference infrastructure.
The Token Factory automates the following:
- Model deployment
- Inference infrastructure orchestration
- API endpoint generation
- Token usage counting and tracking
- Tenant access control
- Billing integration
With Rafay's Token Factory, every model becomes an AI product. Every API call consumes tokens. Every token becomes billable consumption. For neoclouds, this transforms infrastructure into a highly efficient, scalable marketplace for AI services and use cases.
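To make the token-to-revenue path concrete, here is a minimal metering sketch. The model name and per-million-token rates are invented for illustration; actual pricing is set by each operator:

```python
from decimal import Decimal

# Illustrative per-million-token rates (USD); real prices are operator-defined.
RATES = {
    "llama-3.1-8b-instruct": {"input": Decimal("0.10"), "output": Decimal("0.40")},
}

def charge(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Convert metered token counts into a billable amount in USD."""
    rate = RATES[model]
    total = prompt_tokens * rate["input"] + completion_tokens * rate["output"]
    return total / Decimal(1_000_000)

# 500k input tokens + 250k output tokens at the rates above.
print(charge("llama-3.1-8b-instruct", 500_000, 250_000))  # 0.15
```

Using `Decimal` rather than floats avoids rounding drift when many small charges are aggregated into an invoice.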
Built on NVIDIA NIM and Dynamo¶
A key advantage of the Rafay Token Factory is its deep, turnkey integration with NVIDIA's AI inference technology stack. Models from sources such as NVIDIA NGC, Hugging Face, or partner ecosystems can be deployed as production-grade AI APIs in minutes.
For neocloud operators, this dramatically reduces the complexity of running large-scale AI inference while ensuring that models run with NVIDIA-tuned performance out of the box.
Rafay's Token Factory has also been validated on NVIDIA's Hopper, Blackwell, and Grace Blackwell platforms, with validation on Vera Rubin coming soon.
NVIDIA Blackwell dominates SemiAnalysis' InferenceMax benchmarks.
NVIDIA NIM¶
NVIDIA NIM (NVIDIA Inference Microservices) provides pre-optimized, containerized inference engines for a wide range of AI models. NIM handles model packaging, runtime optimization, and hardware compatibility, so operators can stand up production-ready inference endpoints without manual tuning. For neoclouds, this means faster time-to-market and consistent performance across GPU clusters.
NVIDIA Dynamo¶
Dynamo provides advanced capabilities for distributed inference optimization. It dramatically improves performance by enabling:
- Intelligent request routing
- Multi-node inference scaling
- Efficient GPU utilization
- Lower latency and higher throughput
When combined with Rafay's platform orchestration, Dynamo helps ensure that inference workloads run efficiently across GPU clusters at scale. For neoclouds, this translates into higher requests per GPU, better cost efficiency, improved customer experience, and the ability to handle bursty inference traffic.
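The routing idea can be sketched in a few lines. This toy least-loaded router only illustrates the general principle; Dynamo's actual scheduling (for example, KV-cache-aware routing) is considerably more sophisticated:

```python
class LeastLoadedRouter:
    """Toy request router: send each request to the replica with the
    fewest in-flight requests. Illustrative only -- not Dynamo's algorithm."""

    def __init__(self, replicas):
        # Track in-flight request count per GPU replica.
        self.in_flight = {r: 0 for r in replicas}

    def route(self) -> str:
        # Pick the replica with the lowest current load.
        replica = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[replica] += 1
        return replica

    def complete(self, replica: str):
        # Called when a request finishes, freeing capacity on that replica.
        self.in_flight[replica] -= 1

router = LeastLoadedRouter(["gpu-0", "gpu-1"])
assignments = [router.route() for _ in range(4)]
print(assignments)  # ['gpu-0', 'gpu-1', 'gpu-0', 'gpu-1']
```

Even this naive policy spreads bursty traffic evenly; production routers additionally weigh prompt length, cache locality, and replica health.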
Together, NIM and Dynamo create the optimized inference layer, while Rafay provides the multi-tenant platform, automation, and monetization framework on top.
Monetization for Every Customer Type¶
One of the most powerful capabilities of the Token Factory is its ability to serve multiple customer segments simultaneously—allowing neoclouds to maximize both GPU utilization and revenue.
Enterprises¶
Enterprises require secure, governed environments where they can build AI-powered applications. With the Token Factory, enterprise tenants receive:
- Dedicated tenants with user and team management
- Centralized API key management
- Usage tracking and reporting
- Secure access to curated models
- Invoice-based billing
This allows enterprises to build production applications—AI copilots, agentic workflows, knowledge assistants—without managing GPU infrastructure.
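At its core, the tenant model pairs tenant-scoped API keys with per-tenant usage aggregation. The sketch below is a hypothetical illustration of that bookkeeping, not the Token Factory's actual implementation:

```python
import secrets
from collections import defaultdict

class TenantGateway:
    """Toy sketch of tenant-scoped API keys and usage tracking.
    All names and key formats here are illustrative."""

    def __init__(self):
        self.keys = {}                 # api_key -> tenant id
        self.usage = defaultdict(int)  # tenant id -> tokens consumed

    def issue_key(self, tenant: str) -> str:
        # Mint a random key bound to one tenant.
        key = "sk-" + secrets.token_hex(8)
        self.keys[key] = tenant
        return key

    def record(self, api_key: str, tokens: int):
        # Unknown keys raise KeyError, i.e. the request is rejected.
        tenant = self.keys[api_key]
        self.usage[tenant] += tokens

gw = TenantGateway()
key = gw.issue_key("acme-corp")
gw.record(key, 1200)
gw.record(key, 800)
print(gw.usage["acme-corp"])  # 2000
```

The per-tenant totals accumulated here are exactly what flows into the usage reports and invoice-based billing described above.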
Developers¶
Neoclouds can open the platform to solo developers and startups as well. These users can instantly access models, generate API keys, pay per token with credit card billing, and experiment freely. This developer ecosystem drives continuous demand for AI services, increasing GPU utilization across the platform.
Watch a video showcasing the experience for a solo developer.
A Marketplace of Models¶
AI innovation is moving quickly, and developers need access to a wide range of models. With the Token Factory, neoclouds can onboard models from multiple sources, including:
- NVIDIA NGC Catalog
- Hugging Face
- Internally developed models
- Partner-developed models
This allows operators to offer a catalog spanning frontier LLMs, domain-specific models, NVIDIA-optimized models such as Nemotron, and open-source models such as Qwen, Gemma, Mistral, Llama, and Kimi.
Each model can be deployed as a scalable inference service. For developers, this feels like an AI app marketplace. For providers, it creates a continuous stream of monetizable AI services.
The Future of AI Clouds¶
AI demand is growing faster than any previous wave of cloud computing. But the neoclouds that succeed will not simply offer the most GPUs—they will offer the best AI services. Traditional cloud business models revolve around infrastructure consumption. AI workloads behave differently.
Customers don't want to buy GPUs; they want AI outcomes: faster answers, lower cost per inference, and reliable uptime at scale. Rafay's Token Factory aligns monetization with how AI is actually consumed.
Instead of selling GPUs, neoclouds sell intelligence as a service. Instead of billing for infrastructure, they monetize AI tokens. Instead of serving isolated workloads, they build ecosystems where infrastructure providers, model creators, and application developers all participate and all benefit.
Powered by world-class NVIDIA inference technology and Rafay's multi-tenant platform, the Token Factory gives neoclouds everything they need to make that transition—and build a durable AI services business in the process.
- Free Org: Sign up for a free Org if you want to try this yourself with our Get Started guides.
- Live Demo: Schedule time with us to watch a demo in action.


