Models

New Model

To create a new model, click "New Model" and follow the workflow described below. At a high level, there are two distinct steps:

  1. Create Model
  2. Create Model Deployment

General

Provide a unique name and an optional description for the model.

Create New Model


Provider

Select a provider from the dropdown list of existing providers. Admins can also click "Create New" to open the workflow for creating a new provider.

Create New Model


Configuration

Select the type of repository from which the model and its weights will be accessed. The following repository types are currently supported:

  1. NGC
  2. Hugging Face
  3. Storage Namespace (weights downloaded and stored locally)

Repository Type: NGC

When this option is selected, the model weights and related information are downloaded from NVIDIA's NGC Catalog during deployment. You need to provide the following details for the Rafay Platform to access NGC:

  • NGC API Key (authentication)
  • Source
  • Revision

Optionally, you can enable caching of the downloaded model so that autoscaled deployments and additional replicas do not have to download it repeatedly from NGC.
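
For reference, the sketch below shows roughly what happens at deployment time: the worker node uses the NGC API Key, Source, and Revision to pull the weights from the NGC Catalog. It is a minimal illustration using NVIDIA's NGC CLI from Python, not the platform's actual implementation; the source path and revision are placeholder values, and it assumes the NGC CLI is installed and reads its API key from the NGC_CLI_API_KEY environment variable.

    import os
    import subprocess

    # Placeholder values; substitute the Source and Revision entered in the form.
    source = "nvidia/nemo/llama-3_1-8b-instruct"  # org/[team/]model path in the NGC Catalog
    revision = "1.0"                              # model version

    # Assumption: the NGC CLI picks up the API key from this environment variable.
    env = {**os.environ, "NGC_CLI_API_KEY": os.environ["NGC_API_KEY"]}

    # Download the specified model version from NGC, as a worker node would.
    subprocess.run(
        ["ngc", "registry", "model", "download-version", f"{source}:{revision}"],
        env=env,
        check=True,
    )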

NGC Configuration

Info

Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from NGC when required.


Repository Type: Hugging Face

When this option is selected, the model weights and related information are downloaded from Hugging Face during deployment. You need to provide the following details for the Rafay Platform to access Hugging Face:

  • Hugging Face API Key (authentication)
  • Source
  • Revision
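
These fields map naturally onto huggingface_hub's snapshot_download function, a common way to pull weights from Hugging Face. The sketch below is a minimal manual equivalent of what happens at deployment time; the repo ID and revision are placeholder values, and the token is only required for gated or private repositories.

    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    # Placeholder values; substitute the Source and Revision entered in the form.
    local_path = snapshot_download(
        repo_id="meta-llama/Llama-3.1-8B-Instruct",  # Source: the Hugging Face repo ID
        revision="main",                             # Revision: branch, tag, or commit SHA
        token="hf_xxx",                              # Hugging Face API Key
    )
    print(f"Model weights downloaded to {local_path}")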

HF Configuration

Info

Ensure the data plane is configured with access to the Internet so that the worker nodes can download the model from Hugging Face when required.


Repository Type: Storage Namespace

When this option is selected, the model weights and related information are downloaded during deployment from the locally hosted storage namespace. Select the required storage namespace from the dropdown list. Alternatively, admins can initiate the workflow to create a new storage namespace directly from the model configuration page.
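
As an illustration only, assuming the storage namespace is backed by an S3-compatible object store reachable on the local network (the actual backing implementation is managed by the platform), a worker-side fetch might look like the sketch below; the endpoint, credentials, bucket, and key are all hypothetical.

    import boto3  # pip install boto3

    # All values below are hypothetical; the storage namespace layout is platform-managed.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://storage-namespace.local:9000",  # local endpoint, no Internet needed
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Pull the weights over the local network, which is what makes deployments fast.
    s3.download_file(
        Bucket="model-weights",
        Key="llama-3.1-8b-instruct/model.safetensors",
        Filename="/models/llama-3.1-8b-instruct/model.safetensors",
    )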

Storage NS Configuration

Info

For this selection, the data plane does not require connectivity to the Internet. Since the worker nodes retrieve the model and its weights from the locally hosted storage namespace, model deployments can be significantly faster than with the other two options.


List All Models

In the Ops Console, click on GenAI and then Models. This will display the list of configured and deployed models.

List of Models


View Model Details

In the Ops Console, click a model to view its details. Shown below is an example of a "Llama 3.1-8b Instruct" model powered by the "NIM" inference engine.

Model Details


Delete Model

In the Ops Console, click the ellipsis (three dots on the far right) under Action for an existing model.

  • Click on "Delete" to delete the model.
  • You will be prompted to confirm the deletion.

Confirm Delete

Info

Deletion is not reversible. All associated infrastructure and resources will be torn down during this process.


Share Model

In the Ops Console, click the ellipsis (three dots on the far right) under Action for an existing model. Now, click "Manage Sharing" to initiate a workflow that shares the model with all or selected tenant orgs.

  • By default, a newly created model is not shared with any tenant org.
  • Select "All Orgs" to make the model available to all tenant orgs under management
  • Select "Select Orgs" to make the model available to selected tenant orgs.

Sharing Model