Requirements
This section describes the prerequisites that must be in place before you can deploy and operate the Serverless Inference offering.
Rafay Control Plane
The Operations Console in the Rafay Controller is where the administrator configures and deploys models, manages their lifecycle, and shares the models with tenant orgs. This is also where usage (token counts) is aggregated and persisted for purposes such as billing. The serverless inference components can be installed as an add-on to the customer’s existing Rafay Controller deployment.
Info
To ensure compatibility, the Rafay Controller version needs to be v3.1-36 or higher.
Data Plane for Serverless Inferencing
The data plane is where the actual inference requests from users are handled and processed. It consists of a number of GPU servers running in a datacenter. We will deploy Rafay MKS (upstream Kubernetes) on these servers, and it will act as the substrate for the data plane software components of the Serverless Inference solution.
Kubernetes Master
For an HA deployment, the Kubernetes control plane needs to comprise at least three CPU nodes (8 CPUs, 16 GB memory, and 100 GB raw storage each).
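As a quick sanity check, a short script along the following lines can be run on each candidate control plane node to confirm it meets this sizing guideline. This is only a sketch: the thresholds simply mirror the numbers above, the memory comparison allows a small tolerance because the kernel reports slightly less than the nominal capacity, and the disk check assumes the relevant storage is mounted at `/`.

```python
# Sizing sanity check for a candidate control plane node (sketch only).
# Thresholds mirror the guideline above: 8 CPUs, 16 GB memory, 100 GB storage.
import os
import shutil

MIN_CPUS, MIN_MEM_GB, MIN_DISK_GB = 8, 16, 100

cpus = os.cpu_count() or 0

# Total memory from /proc/meminfo (Linux only); MemTotal is reported in kB.
with open("/proc/meminfo") as f:
    mem_kb = next(int(line.split()[1]) for line in f if line.startswith("MemTotal:"))
mem_gb = mem_kb / (1024 * 1024)

# Capacity of the root filesystem; adjust the path if your data lives on a dedicated disk.
disk_gb = shutil.disk_usage("/").total / (1024 ** 3)

print(f"CPUs: {cpus}  Memory: {mem_gb:.1f} GB  Disk: {disk_gb:.1f} GB")
checks = [
    cpus >= MIN_CPUS,
    mem_gb >= MIN_MEM_GB * 0.95,  # small tolerance for kernel-reported memory
    disk_gb >= MIN_DISK_GB,
]
print("Node meets the sizing guideline" if all(checks) else "WARNING: node is under-sized")
```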
Worker Nodes
These will be GPU servers that are converted into Kubernetes worker nodes. The number of worker nodes depends on the LLMs the operator wishes to deploy and the expected scale at which they plan to operate their service.
Info
Another consideration is whether some models need to be deployed as dedicated endpoints for specific customers/tenants.
Operating System
Ensure that the bare metal servers (nodes) are installed with standard 64-bit Ubuntu 24.04 LTS.
Important
Please do not install any GPU drivers on the Linux servers. The drivers will be automatically installed and configured via the GPU Operator.
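To confirm each node matches this baseline before installation, a quick check along these lines can be run on the node. It is a sketch that reads the standard `/etc/os-release` file and assumes an x86_64 platform.

```python
# Quick OS baseline check for a node: 64-bit Ubuntu 24.04 LTS (sketch only).
import platform

os_release = {}
with open("/etc/os-release") as f:
    for line in f:
        key, sep, value = line.strip().partition("=")
        if sep:
            os_release[key] = value.strip('"')

is_ubuntu_2404 = os_release.get("ID") == "ubuntu" and os_release.get("VERSION_ID") == "24.04"
is_64bit = platform.machine() == "x86_64"  # adjust if you run another 64-bit architecture

print(os_release.get("PRETTY_NAME", "unknown OS"), "/", platform.machine())
if not (is_ubuntu_2404 and is_64bit):
    print("WARNING: node does not match the expected baseline (64-bit Ubuntu 24.04 LTS)")
```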
Networking
The cluster's nodes (control plane and worker nodes) need to interact with each other over a local, high-speed network. Please ensure that all the servers can communicate with each other over all ports on this network.
Important
It is not recommended to deploy firewalls or proxies between these nodes because they will significantly impact the performance and latency of the end-user-facing service.
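A minimal reachability probe such as the one below can be run from each node against its peers. The node IPs are placeholders, and the sampled ports are common Kubernetes defaults (kube-apiserver, etcd, kubelet); passing on a handful of ports does not prove that every port is open, it only flags obvious firewalling between nodes.

```python
# Probe TCP reachability from this node to its peers on a few sample ports (sketch only).
# NODE_IPS are placeholders; the ports are common Kubernetes defaults.
import socket

NODE_IPS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # peer control plane / worker nodes
PORTS = [6443, 2379, 10250]                          # kube-apiserver, etcd, kubelet

for ip in NODE_IPS:
    for port in PORTS:
        try:
            with socket.create_connection((ip, port), timeout=3):
                print(f"{ip}:{port} open")
        except ConnectionRefusedError:
            # Refused means the packet reached the host: the path is not blocked,
            # there is simply nothing listening on that port yet.
            print(f"{ip}:{port} reachable (nothing listening yet)")
        except OSError as exc:
            # A timeout here usually points to a firewall or routing problem.
            print(f"{ip}:{port} BLOCKED or unreachable ({exc})")
```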
Internet Connectivity
For providers planning to offer the serverless inferencing service to customers over the Internet, please ensure that all worker nodes have access to the Internet on port 443.
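A simple way to verify this from a worker node is to open an outbound TLS connection to a well-known public host; huggingface.co is used below purely as a convenient test target.

```python
# Confirm outbound Internet access on port 443 from this node (sketch only).
# huggingface.co is used purely as a reachable public test target.
import socket
import ssl

HOST = "huggingface.co"
context = ssl.create_default_context()

try:
    with socket.create_connection((HOST, 443), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=HOST) as tls:
            print(f"Outbound 443 OK: negotiated {tls.version()} with {HOST}")
except OSError as exc:
    print(f"Outbound 443 check failed: {exc}")
```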
Local Object Storage
Ensure the GPU servers are configured to have network access to low-latency, S3-compatible object storage. The size/capacity will depend on the size and number of LLMs to be deployed. As a guideline, more than 2 TB of storage with the ability to expand later is a good starting point.
Note
The solution can also be optionally configured to "dynamically download and cache" the model's weights from repos such as HuggingFace and Nvidia's NGC. This option is not recommended because it can be error-prone for large models. Admins are strongly recommended to download the models and host them in their local storage namespace backed by high-speed storage.
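To confirm that a GPU node can reach the object store with read/write access, a short boto3 script along these lines can round-trip a small test object. The endpoint URL, credentials, and bucket name are placeholders for your environment.

```python
# Verify access to the local S3-compatible object store from a GPU node (sketch only).
# ENDPOINT, credentials, and BUCKET are placeholders for your environment.
import boto3

ENDPOINT = "https://s3.local.example.com"
BUCKET = "model-weights"

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT,
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Confirm the bucket exists and is reachable over the network.
s3.head_bucket(Bucket=BUCKET)

# Round-trip a small test object to verify read/write access.
s3.put_object(Bucket=BUCKET, Key="connectivity-check.txt", Body=b"ok")
print(s3.get_object(Bucket=BUCKET, Key="connectivity-check.txt")["Body"].read())
s3.delete_object(Bucket=BUCKET, Key="connectivity-check.txt")
print("Object storage check passed")
```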
Load Balancer
All user requests will be serviced via a load balancer (MetalLB or an alternative) on port 443.
Public IP Pool
The inference endpoints will need at least one public IP address (three preferred) so that multiple models can be served from the same, unified endpoint.
TLS Certificates
The endpoints serving the serverless inferencing offering will terminate HTTPS. They will require trusted TLS certificates for the domain on which the endpoint is served.
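Once a certificate is installed, a check along the following lines confirms that the endpoint presents a certificate that is trusted by the system trust store and matches the serving domain. The hostname is a placeholder for your actual inference endpoint.

```python
# Verify that the inference endpoint serves a trusted TLS certificate
# matching its domain (sketch only). HOST is a placeholder.
import socket
import ssl

HOST = "inference.example.com"

# The default context uses the system trust store and enforces hostname matching,
# so the handshake raises ssl.SSLCertVerificationError on an untrusted or mismatched cert.
context = ssl.create_default_context()

with socket.create_connection((HOST, 443), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()
        print("Issuer: ", dict(item[0] for item in cert["issuer"]))
        print("Expires:", cert["notAfter"])
        print(f"Certificate for {HOST} is trusted and matches the domain")
```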