Intermediate

In this guide you will review usage metrics at the operator level by deploying a load generator to populate usage metrics within your existing Token Factory deployment.

Assumptions¶

This exercise assumes you have completed the Token Factory Basics Get Started Guide

1. Create Load Generator¶

In this section, you will create a GenAI load generator on your existing Kubernetes cluster node.

SSH into the Kubernetes Cluster node
Run the following commands to install the load testing tool hey

sudo apt update
sudo apt install hey -y

Verify the installation be running the following command to see the version of "hey"

hey -v

Create the load test script by running the following command

vi run_load_test.sh

Add the following content to the script and save it. Be sure to update the API Key and the Endpoint URL. These values can be found within the model used in the previous Basics Get Started guide.

#!/bin/bash
PATH=/usr/bin

API_KEY="<YOUR_API_KEY>"
ENDPOINT_URL="<YOUR_ENDPOINT_URL>"
MODEL_NAME="gs-deployment"
PROMPT="What is the best opensource inference library in general?"

hey -n 300 -c 15 -t 90 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d "{\"model\": \"$MODEL_NAME\", \"temperature\": 0.1, \"max_tokens\": 128, \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}]}" \
  "ENDPOINT_URL"

Run the following command to make the file executable

chmod +x run_load_test.sh

Run the following command to setup a cronJob to run the script

crontab -e

Add the following line to the crontab to run the script every 2 minutes. Be sure to update the Script Path in the command

*/2 * * * * /<SCRIPT_PATH>/run_load_test.sh >> /var/log/loadtest.log 2>&1

After 2 minutes, run the following command to verify the script is working

tail -f /var/log/loadtest.log

You will see output similar to the following:

  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0006 secs
  resp wait:    1.6844 secs, 0.7029 secs, 2.4664 secs
  resp read:    0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 300 responses

2. View Metrics¶

Next, you will use the Ops console to view the Token Factory usage metrics across models and tenants.

In the Ops console, navigate to GenAI -> Token Usage

You will see the Overview Dashboard

Click on the Token Usage tab to see the token usage metrics

Click on the Model Analytics tab to see the model analytics