Update the following values in the spec file to match your environment:
project: defaultproject
cloudCredentials: azure-cc
location: centralindia
resourceGroupName: Resource-Group
apiVersion: infra.k8smgmt.io/v3
kind: Cluster
metadata:
  # The name of the cluster
  name: demo-gpu-aks
  # The name of the project the cluster will be created in
  project: defaultproject
spec:
  blueprintConfig:
    # The name of the blueprint the cluster will use
    name: default-aks
  # The name of the cloud credential that will be used to create the cluster
  cloudCredentials: azure-cc
  config:
    kind: aksClusterConfig
    metadata:
      # The name of the cluster
      name: demo-gpu-aks
    spec:
      managedCluster:
        apiVersion: "2022-07-01"
        identity:
          # The identity type the AKS cluster will use to access Azure resources
          type: SystemAssigned
        # The Azure geo-location where the resources will reside
        location: centralindia
        properties:
          apiServerAccessProfile:
            # Make network traffic between the API server and node pools on a private network
            enablePrivateCluster: true
          # DNS name prefix of the Kubernetes API server FQDN
          dnsPrefix: demo-gpu-aks-dns
          # The Kubernetes version that will be installed on the cluster
          kubernetesVersion: 1.29.4
          networkProfile:
            loadBalancerSku: standard
            # Network plugin used for building the Kubernetes network. Valid values are azure, kubenet, none
            networkPlugin: kubenet
        sku:
          # The name of a managed cluster SKU
          name: Basic
          # If not specified, the default is Free. See uptime SLA for more details. Valid values are Paid, Free
          tier: Free
        type: Microsoft.ContainerService/managedClusters
      nodePools:
      - apiVersion: "2022-07-01"
        # The Azure geo-location where the node pools will reside
        location: centralindia
        # The name of the node pool
        name: primary
        properties:
          # The desired number of nodes that can run in the node pool
          count: 1
          # Whether to enable auto-scaler
          enableAutoScaling: true
          # The maximum number of nodes that can run in the node pool
          maxCount: 1
          # The maximum number of pods that can run on a node
          maxPods: 110
          # The minimum number of nodes that can run in the node pool
          minCount: 1
          mode: System
          # The kubernetes version that will run on the node pool
          orchestratorVersion: 1.29.4
          # The operating system type that the nodes in the node pool will run
          osType: Linux
          # Valid values are VirtualMachineScaleSets, AvailabilitySet
          type: VirtualMachineScaleSets
          # The size of the VMs that the nodes will run on
          vmSize: Standard_NC4as_T4_v3
        type: Microsoft.ContainerService/managedClusters/agentPools
      # The resource group where the cluster will be created
      resourceGroupName: Resource-Group
  proxyConfig: {}
  type: aks
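If you prefer to script the substitution rather than edit the file by hand, the following is a minimal Python sketch of one way to do it. It is not part of RCTL. The helper name customize_spec and the replacement values are hypothetical, and the sketch assumes the spec is laid out with one key per line. It rewrites every line whose key is project, cloudCredentials, location, or resourceGroupName, so both location entries (managed cluster and node pool) are updated together.

```python
import re


def customize_spec(spec_text, values):
    """Return spec_text with the value of every matching `key: value` line replaced.

    Only lines that begin (after indentation) with one of the given keys are
    touched; comment lines and all other lines are left unchanged.
    """
    for key, value in values.items():
        pattern = re.compile(
            rf"^([ \t]*{re.escape(key)}:[ \t]*).*$", flags=re.MULTILINE
        )
        # A function replacement avoids any backslash interpretation in value.
        spec_text = pattern.sub(lambda m, v=value: m.group(1) + v, spec_text)
    return spec_text


# Hypothetical replacement values; substitute the ones for your environment.
my_values = {
    "project": "my-project",
    "cloudCredentials": "my-azure-credential",
    "location": "eastus",
    "resourceGroupName": "my-resource-group",
}

# A short excerpt of the spec, used here only to demonstrate the helper.
excerpt = """\
metadata:
  name: demo-gpu-aks
  # The name of the project the cluster will be created in
  project: defaultproject
spec:
  cloudCredentials: azure-cc
  config:
    spec:
      managedCluster:
        location: centralindia
      nodePools:
      - apiVersion: "2022-07-01"
        location: centralindia
      resourceGroupName: Resource-Group
"""

updated = customize_spec(excerpt, my_values)
print(updated)
```

To apply this to the real file, read aks-gpu.yaml into a string, pass it through customize_spec, and write the result back before running the apply step. Review the result before applying it, since a plain text substitution does not validate the YAML.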
On your command line, navigate to the cluster subfolder.
Run the following command:
rctl apply -f aks-gpu.yaml
If there are no errors, you will be presented with a "Task ID" that you can use to check progress/status. Note that this step requires creation of infrastructure in your Azure account and can take ~20-30 minutes to complete.
{
  "taskset_id": "x28y6ek",
  "operations": [
    {
      "operation": "ClusterCreation",
      "resource_name": "demo-gpu-aks",
      "status": "PROVISION_TASK_STATUS_PENDING"
    },
    {
      "operation": "NodegroupCreation",
      "resource_name": "primary",
      "status": "PROVISION_TASK_STATUS_PENDING"
    },
    {
      "operation": "BlueprintSync",
      "resource_name": "demo-gpu-aks",
      "status": "PROVISION_TASK_STATUS_PENDING"
    }
  ],
  "comments": "The status of the operations can be fetched using taskset_id",
  "status": "PROVISION_TASKSET_STATUS_PENDING"
}
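If you capture this response in a script, a small helper can tell you which operations are still waiting. The sketch below is illustrative only: the function name pending_operations is hypothetical, and because the output above shows only the PENDING status strings, the code checks for that suffix and makes no assumption about what other status values look like.

```python
import json


def pending_operations(response_text):
    """Return (operation, resource_name) pairs whose status ends in _PENDING."""
    response = json.loads(response_text)
    return [
        (op["operation"], op["resource_name"])
        for op in response.get("operations", [])
        if op.get("status", "").endswith("_PENDING")
    ]


# The response shown above, reproduced as a string for demonstration.
sample_response = """
{
  "taskset_id": "x28y6ek",
  "operations": [
    {"operation": "ClusterCreation", "resource_name": "demo-gpu-aks",
     "status": "PROVISION_TASK_STATUS_PENDING"},
    {"operation": "NodegroupCreation", "resource_name": "primary",
     "status": "PROVISION_TASK_STATUS_PENDING"},
    {"operation": "BlueprintSync", "resource_name": "demo-gpu-aks",
     "status": "PROVISION_TASK_STATUS_PENDING"}
  ],
  "comments": "The status of the operations can be fetched using taskset_id",
  "status": "PROVISION_TASKSET_STATUS_PENDING"
}
"""

for operation, resource in pending_operations(sample_response):
    print(f"still pending: {operation} ({resource})")
```

Immediately after the apply step, all three operations are expected to be reported as pending, as in the response above.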
Navigate to the project in your Org.
Click on Infrastructure -> Clusters. You should see the demo-gpu-aks cluster listed.
Congratulations! At this point, you have successfully configured and provisioned an Azure AKS cluster with a GPU node pool in your account using the RCTL CLI. You are now ready to move on to the next step, where you will create and deploy a custom cluster blueprint that contains the GPU Operator as an addon.