Configure
In this section, you will create a standardized cluster blueprint with the kuberay-operator add-on. You can then reuse this blueprint with all your clusters.
Step 1: Create Namespace
In this step, you will create a namespace for the KubeRay operator.
- Navigate to a project in your Org
- Select Infrastructure -> Namespaces
- Click New Namespace
- Enter the name kuberay
- Select Wizard for the type
- Click Save
- Click Discard Changes & Exit
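
Optionally, once the namespace has been published to a cluster, you can confirm it with kubectl (assuming your kubeconfig points at that cluster):

# The kuberay namespace should be present on the cluster
kubectl get namespace kuberay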
 
Step 2: Create Repository
In this step, you will create a repository in your project so that the controller can automatically retrieve the KubeRay operator Helm chart.
- Select Integrations -> Repositories
- Click New Repository
- Enter the name kuberay
- Select Helm for the type
- Click Create
- Enter https://ray-project.github.io/kuberay-helm/ for the endpoint
- Click Save
 
Optionally, click the Validate button on the repository to confirm connectivity.
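If you want to double-check the chart source outside the controller, you can query the same repository with the Helm CLI. This assumes Helm 3 is installed locally; it is purely a local check and has no effect on the controller configuration:

# Add the KubeRay Helm repository locally and confirm the operator chart is listed
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm search repo kuberay/kuberay-operator
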
Step 3: Create kuberay-operator Add-On
In this step, you will create a custom add-on for the kuberay-operator that pulls the Helm chart from the previously created repository. This add-on will be added to a custom cluster blueprint in a later step.
- Select Infrastructure -> Add-Ons
- Click New Add-On -> Create New Add-On
- Enter the name kuberay-operator
- Select Helm 3 for the type
- Select Pull files from repository
- Select Helm for the repository type
- Select kuberay for the namespace
- Click Create
- Click New Version
- Enter a version name
- Select the previously created repository
- Enter kuberay-operator for the chart name
- Click Save Changes
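
For reference, the add-on is roughly equivalent to installing the chart directly with Helm, as in the sketch below. The controller performs this for you on managed clusters, so there is no need to run it yourself:

# Approximate Helm equivalent of the add-on (do not run against managed clusters)
helm install kuberay-operator kuberay/kuberay-operator --namespace kuberay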
 
Step 4: Create Blueprint
In this step, you will create a custom cluster blueprint containing the kuberay-operator add-on created previously. A cluster blueprint can be applied to one or more clusters.
- Select Infrastructure -> Blueprints
- Click New Blueprint
- Enter the name kuberay
- Click Save
- Enter a version name
- Select Minimal for the base blueprint
- In the add-ons section, click Configure Add-Ons
- Click the + symbol next to the previously created add-on to add it to the blueprint
- Click Save Changes
 
Step 5: Apply Blueprint
In this step, you will apply the previously created cluster blueprint to an existing cluster. The blueprint will deploy the kuberay-operator add-on to the cluster.
- Select Infrastructure -> Clusters
- Click the gear icon on the cluster card -> Update Blueprint
- Select the previously created kuberay blueprint and version
- Click Save and Publish
 
The controller will publish and reconcile the blueprint on the target cluster. This can take a few seconds to complete.
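Once the blueprint has been published, you can verify the deployment with kubectl (assuming your kubeconfig points at the target cluster):

# The kuberay-operator pod should reach the Running state
kubectl get pods -n kuberay

# The chart installs the KubeRay CRDs (rayclusters, rayjobs, rayservices)
kubectl get crds | grep ray.io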
Step 6: Create RayLLM Workload
In this step, you will create a workload for RayLLM.
- Save the below YAML to a file named ray-service-llm.yaml
 
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayllm
spec:
  serviceUnhealthySecondThreshold: 1200
  deploymentUnhealthySecondThreshold: 1200
  serveConfigV2: |
      applications:
      - name: router
        import_path: rayllm.backend:router_application
        route_prefix: /
        args:
          models:
            - ./models/continuous_batching/amazon--LightGPT.yaml
            - ./models/continuous_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 2}"'
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: anyscale/ray-llm:latest
            resources:
              limits:
                cpu: 2
                memory: 8Gi
              requests:
                cpu: 2
                memory: 8Gi
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 0
      maxReplicas: 1
      groupName: gpu-group
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 1, \"accelerator_type_a10\": 1, \"accelerator_type_a100_80g\": 1}"'
      template:
        spec:
          containers:
          - name: llm
            image: anyscale/ray-llm:latest
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh","-c","ray stop"]
            resources:
              limits:
                cpu: "48"
                memory: "192G"
                nvidia.com/gpu: 1
              requests:
                cpu: "1"
                memory: "1G"
                nvidia.com/gpu: 1
            ports:
            - containerPort: 8000
              name: serve
          tolerations:
            - key: "ray.io/node-type"
              operator: "Equal"
              value: "gpu"
              effect: "NoSchedule"
Note
Make sure the GPU nodes in the cluster have the taint ray.io/node-type=gpu with the NoSchedule effect; it matches the toleration in the worker group spec above. A sketch of applying it follows below.
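A minimal example of applying that taint, where <gpu-node-name> is a placeholder for the name of each GPU node:

# Taint each GPU node; the effect matches the toleration in ray-service-llm.yaml
kubectl taint nodes <gpu-node-name> ray.io/node-type=gpu:NoSchedule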
- Select Applications -> Workloads
- Click New Workload -> Create New Workload
- Enter the name kuberay-llm
- Select K8s YAML for the type
- Select Upload files manually
- Select kuberay for the namespace
- Click Create
- Click Upload and select the previously saved ray-service-llm.yaml file
- Click Save And Goto Placement
- Select the cluster on the placement page
- Click Save And Goto Publish
- Click Publish
- Wait for the pods to come up and for the rayllm-serve-svc service to be created (see the checks below)
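
You can follow the rollout with kubectl as well; pulling the ray-llm image and downloading model weights can take a while. A few useful checks, assuming kubectl access to the cluster:

# Watch the RayService resource and its pods come up
kubectl get rayservice rayllm -n kuberay
kubectl get pods -n kuberay -w

# Confirm the Serve service was created
kubectl get svc rayllm-serve-svc -n kuberay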
 
Next Step
At this point, you have done everything required to get the kuberay-operator installed and operational on your cluster, along with RayLLM. In the next step, we will query the models deployed with RayLLM.
