Capabilities of the Bare Metal GPU Service

The Bare Metal GPU Service provides a way to consume powerful, pre-configured physical machines that are optimized for advanced AI/ML workloads. These nodes offer full access to GPUs, CPUs, memory, storage, and networking resources with no virtualization overhead.


Key Capabilities

The following capabilities are supported as part of the Bare Metal GPU Service:

| Capability | Description |
| --- | --- |
| Multi-GPU Support | Enables nodes with 1, 4, or 8 high-performance GPUs for scale-out training and inference workloads. |
| Kubernetes Integration | Supports Kubernetes-native workflows; users can deploy workloads using standard manifests and Helm charts. |
| Custom OS Images | Boots bare metal nodes with pre-approved base operating systems such as Ubuntu 22.04 LTS. |
| GPU Sharing (Optional) | Offers full node access by default, but can also support GPU sharing configurations when enabled at the cluster level. |
| High-Speed Interconnects | Nodes are equipped with NVLink, NVSwitch, and NDR InfiniBand for high-bandwidth GPU-to-GPU communication. |
| Dedicated CPU Nodes | Allows provisioning of CPU-only nodes for non-GPU workloads such as orchestration, preprocessing, or storage. |
| User-Controlled Lifecycle | End users can start, stop, and terminate nodes through self-service controls with quota enforcement. |
| Custom Initialization Hooks | Supports bootstrap scripts and environment-specific initialization logic. |
| Telemetry & Monitoring | Integrates with monitoring dashboards and system metrics for observability (requires setup). |
| Networking & Security | Supports workload isolation through Kubernetes namespaces, CNI-based policies, and secure ingress/egress. |
| No Virtualization Overhead | Direct access to hardware ensures maximum performance for demanding AI/ML pipelines. |
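As a sketch of the Kubernetes-native workflow above, the snippet below builds a pod manifest that requests a full eight-GPU node. The manifest is expressed as a Python dict for illustration; the `nvidia.com/gpu` resource name assumes the standard NVIDIA device plugin is installed, and the node selector label and container image are hypothetical.

```python
# Sketch: a pod spec claiming all 8 GPUs on a bare metal node.
# "nvidia.com/gpu" assumes the NVIDIA device plugin; the nodeSelector
# label and image are illustrative, not a required platform convention.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-training"},
    "spec": {
        "nodeSelector": {"node-type": "baremetal-gpu"},  # hypothetical label
        "containers": [{
            "name": "trainer",
            "image": "nvcr.io/nvidia/pytorch:24.01-py3",  # illustrative image
            "resources": {
                # Full-node allocation: all 8 GPUs scheduled to this pod.
                "limits": {"nvidia.com/gpu": 8},
            },
        }],
    },
}

print(json.dumps(pod, indent=2))
```

The same manifest could equally be written as YAML and applied with `kubectl apply` or packaged in a Helm chart, per the Kubernetes Integration capability above.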

Supported Workloads

The service is optimized for:

  • Large Language Model (LLM) training and fine-tuning
  • Multi-GPU distributed training jobs
  • High-throughput inference pipelines
  • Data preprocessing and feature engineering
  • Serving orchestration or control plane components
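Multi-GPU distributed training jobs typically follow a data-parallel pattern: each worker computes gradients on its own data shard, then an all-reduce averages the gradients so every worker applies the same update. The CPU-only sketch below illustrates that step with toy numbers; real jobs would run the all-reduce with NCCL over the NVLink/NVSwitch and NDR InfiniBand fabric described above.

```python
# Minimal data-parallel sketch: one "GPU" worker per shard computes a
# local gradient, then an all-reduce averages them across workers.
# Real jobs use NCCL over NVLink / NDR InfiniBand for this exchange.
def local_gradient(shard):
    # Toy gradient: mean of the shard's values.
    return sum(shard) / len(shard)

def all_reduce_mean(grads):
    # What an averaging all-reduce computes across all workers.
    return sum(grads) / len(grads)

shards = [[1.0, 2.0], [3.0, 5.0], [2.0, 2.0], [4.0, 4.0]]  # one shard per GPU
grads = [local_gradient(s) for s in shards]
global_grad = all_reduce_mean(grads)
print(global_grad)  # 2.875 -> identical update applied on every worker
```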

Access Patterns

Users can consume bare metal resources through:

  • Compute Profiles with the baremetal type
  • Environment Templates mapped to supported node types
  • Custom Providers to inject hooks, data, and logic into provisioning workflows
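To make the first access pattern concrete, a Compute Profile with the `baremetal` type might look like the following sketch. The field names are illustrative only, not the exact platform schema; the point is that the profile pins the node class and enforces a per-project quota.

```python
# Sketch of a Compute Profile selecting the baremetal type.
# Field names are illustrative, not the exact Rafay schema.
profile = {
    "name": "gpu-training-large",
    "type": "baremetal",           # selects bare metal nodes, not VMs
    "nodeClass": "gpu-8x",         # hypothetical: 8-GPU node class
    "quota": {"maxNodes": 4},      # self-service limit enforced per project
}

print(profile["type"], profile["quota"]["maxNodes"])
```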

Platform Setup Overview

The platform team is responsible for the initial configuration and enablement of the Bare Metal GPU Service. This setup includes onboarding physical nodes into the Rafay platform, defining system-level resource pools (such as public IP pools and VLANs), configuring networking interfaces (including DPUs), and enabling self-service compute profiles for specific projects.

The architecture typically involves physically provisioned servers with GPU and CPU roles, high-speed interconnects (e.g., NVLink, NDR InfiniBand), and secure tenant-facing network configurations. The platform ensures these resources are exposed to end users in a controlled and quota-enforced manner.

The following sequence diagram outlines the high-level process for preparing the platform for bare metal consumption:

```mermaid
sequenceDiagram
    participant Admin as NCP-Admin
    participant Infra as Bare Metal Infrastructure
    participant Rafay as Rafay Platform

    Admin->>Infra: Rack & Provision Bare Metal Servers
    Admin->>Infra: Install Base OS (e.g., Ubuntu 22.04 LTS)

    Admin->>Infra: Setup Networking (VLANs, IP Pools, DPU Config)
    Admin->>Infra: Attach High-Speed Storage (e.g., NVMe, Ceph)

    Admin->>Rafay: Register Bare Metal Node Resources
    Rafay-->>Infra: Perform Hardware Discovery and Validation

    Admin->>Rafay: Configure Compute Profiles (baremetal type)
    Admin->>Rafay: Setup Environment Templates and Custom Init Hooks

    Admin->>Rafay: Provision Workload Environments using Bare Metal Nodes
    Rafay->>Infra: Bootstrap Kubernetes, System Components, GPU Drivers

    Rafay-->>Admin: Nodes Ready for AI/ML Workloads
```
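The "Perform Hardware Discovery and Validation" step in the diagram can be sketched as a simple inventory check: the platform compares what it discovers on each registered node against the expected specification. The expected values and field names below are illustrative.

```python
# Illustrative sketch of hardware discovery and validation: compare a
# node's discovered inventory against the expected specification.
# Expected values and field names are hypothetical.
EXPECTED = {"gpus": 8, "os": "Ubuntu 22.04 LTS", "interconnect": "NDR InfiniBand"}

def validate_node(inventory):
    """Return the list of fields where discovered hardware mismatches."""
    return [key for key, want in EXPECTED.items() if inventory.get(key) != want]

discovered = {"gpus": 8, "os": "Ubuntu 22.04 LTS", "interconnect": "NDR InfiniBand"}
print(validate_node(discovered))  # [] -> node passes validation
```

A node that fails validation (for example, fewer GPUs than registered) would be flagged before it is exposed to end users through compute profiles.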

Supported Integrations

| Integration | Availability |
| --- | --- |
| GitOps Workflows | ✅ Supported |
| Service Account Injection | ✅ Supported |
| Container Runtime Options (e.g., Kata) | ⚙️ Configurable (on request) |
| GPU Monitoring Dashboards | ⚙️ Requires setup |
| Storage Plugins (e.g., CSI) | ✅ Supported |

Summary

The Bare Metal GPU Service is designed for users who need full hardware access and control to maximize AI/ML performance. It supports highly parallelized workloads with multiple GPUs, dedicated networking, and deep customization for training pipelines and inference systems.