Configure
In this section, you will create a standardized cluster blueprint with the kuberay-operator add-on. You can then reuse this blueprint with all your clusters.
Step 1: Create Namespace
In this step, you will create a namespace for the KubeRay operator.
- Navigate to a project in your Org
- Select Infrastructure -> Namespaces
- Click New Namespace
- Enter the name kuberay
- Select Wizard for the type
- Click Save
- Click Discard Changes & Exit
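
Optionally, once the namespace has been published to a cluster, you can confirm it with kubectl (assuming your kubeconfig points at that cluster):

# The kuberay namespace should be present on the cluster
kubectl get namespace kuberay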
 
Step 2: Create Repository
In this step, you will create a repository in your project so that the controller can automatically retrieve the KubeRay operator Helm chart.
- Select Integrations -> Repositories
- Click New Repository
- Enter the name kuberay
- Select Helm for the type
- Click Create
- Enter https://ray-project.github.io/kuberay-helm/ for the endpoint
- Click Save
 
Optionally, click the Validate button on the repository to confirm connectivity.
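If you want to double-check the chart source outside the controller, you can query the same repository with the Helm CLI. This assumes Helm 3 is installed locally; it is purely a local check and has no effect on the controller configuration:

# Add the KubeRay Helm repository locally and confirm the operator chart is listed
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm search repo kuberay/kuberay-operator
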
Step 3: Create kuberay-operator Add-On
In this step, you will create a custom add-on for the kuberay-operator that pulls the Helm chart from the previously created repository. This add-on will be added to a custom cluster blueprint in a later step.
- Select Infrastructure -> Add-Ons
- Click New Add-On -> Create New Add-On
- Enter the name kuberay-operator
- Select Helm 3 for the type
- Select Pull files from repository
- Select Helm for the repository type
- Select kuberay for the namespace
- Click Create
- Click New Version
- Enter a version name
- Select the previously created repository
- Enter kuberay-operator for the chart name
- Click Save Changes
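
For reference, the add-on is roughly equivalent to installing the chart directly with Helm, as in the sketch below. The controller performs this for you on managed clusters, so there is no need to run it yourself:

# Approximate Helm equivalent of the add-on (do not run against managed clusters)
helm install kuberay-operator kuberay/kuberay-operator --namespace kuberay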
 
Step 4: Create Blueprint
In this step, you will create a custom cluster blueprint containing the kuberay-operator add-on created previously. A cluster blueprint can be applied to one or more clusters.
- Select Infrastructure -> Blueprints
- Click New Blueprint
- Enter the name kuberay
- Click Save
- Enter a version name
- Select Minimal for the base blueprint
- In the add-ons section, click Configure Add-Ons
- Click the + symbol next to the previously created add-on to add it to the blueprint
- Click Save Changes
 
Step 5: Apply Blueprint
In this step, you will apply the previously created cluster blueprint to an existing cluster. The blueprint will deploy the kuberay-operator add-on to the cluster.
- Select Infrastructure -> Clusters
- Click the gear icon on the cluster card -> Update Blueprint
- Select the previously created kuberay blueprint and version
- Click Save and Publish
 
The controller will publish and reconcile the blueprint on the target cluster. This can take a few seconds to complete.
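Once the blueprint has been published, you can verify the deployment with kubectl (assuming your kubeconfig points at the target cluster):

# The kuberay-operator pod should reach the Running state
kubectl get pods -n kuberay

# The chart installs the KubeRay CRDs (rayclusters, rayjobs, rayservices)
kubectl get crds | grep ray.io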
Step 6: Create RayLLM Workload
In this step, you will create a workload for RayLLM.
- Save the below YAML to a file named ray-service-llm.yaml
 
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayllm
spec:
  serviceUnhealthySecondThreshold: 1200
  deploymentUnhealthySecondThreshold: 1200
  serveConfigV2: |
      applications:
      - name: router
        import_path: rayllm.backend:router_application
        route_prefix: /
        args:
          models:
            - ./models/continuous_batching/amazon--LightGPT.yaml
            - ./models/continuous_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 2}"'
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: anyscale/ray-llm:latest
            resources:
              limits:
                cpu: 2
                memory: 8Gi
              requests:
                cpu: 2
                memory: 8Gi
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 0
      maxReplicas: 1
      groupName: gpu-group
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 1, \"accelerator_type_a10\": 1, \"accelerator_type_a100_80g\": 1}"'
      template:
        spec:
          containers:
          - name: llm
            image: anyscale/ray-llm:latest
            lifecycle:
              preStop:
                exec:
                  command: ["/bin/sh","-c","ray stop"]
            resources:
              limits:
                cpu: "48"
                memory: "192G"
                nvidia.com/gpu: 1
              requests:
                cpu: "1"
                memory: "1G"
                nvidia.com/gpu: 1
            ports:
            - containerPort: 8000
              name: serve
          tolerations:
            - key: "ray.io/node-type"
              operator: "Equal"
              value: "gpu"
              effect: "NoSchedule"
Note
Make sure the GPU nodes in the cluster have the taint ray.io/node-type=gpu with the NoSchedule effect; it matches the toleration in the worker group spec above. A sketch of applying it follows below.
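A minimal example of applying that taint, where <gpu-node-name> is a placeholder for the name of each GPU node:

# Taint each GPU node; the effect matches the toleration in ray-service-llm.yaml
kubectl taint nodes <gpu-node-name> ray.io/node-type=gpu:NoSchedule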
- Select Applications -> Workloads
- Click New Workload -> Create New Workload
- Enter the name kuberay-llm
- Select K8s YAML for the type
- Select Upload files manually
- Select kuberay for the namespace
- Click Create
- Click Upload and select the previously saved ray-service-llm.yaml file
- Click Save And Goto Placement
- Select the cluster on the placement page
- Click Save And Goto Publish
- Click Publish
- Wait for the pods to come up and for the rayllm-serve-svc service to be created (see the checks below)
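
You can follow the rollout with kubectl as well; pulling the ray-llm image and downloading model weights can take a while. A few useful checks, assuming kubectl access to the cluster:

# Watch the RayService resource and its pods come up
kubectl get rayservice rayllm -n kuberay
kubectl get pods -n kuberay -w

# Confirm the Serve service was created
kubectl get svc rayllm-serve-svc -n kuberay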
 
Next Step
At this point, you have done everything required to get the kuberay-operator installed and operational on your cluster, along with RayLLM. In the next step, we will query the models deployed with RayLLM.
