System GKE Template
Overview
This system template lets you configure, templatize, and provision a GKE cluster using the native GCP provider for OpenTofu. The templates are designed to support both Day 0 (initial setup) and Day 2 (ongoing management and maintenance) operations.
The template enables users to provision and manage the lifecycle of a GKE cluster and the add-ons defined in cluster blueprints. As part of the template output, the end user receives a kubeconfig file that carries cluster-wide privileges and enables secure access to the cluster.
Initial Setup
The platform team is responsible for the initial configuration and setup of the GKE template. The sequence diagram below outlines the high-level steps: the platform team selects the template from the system catalog, shares it to the project they manage, and then shares it downstream with the end user.
sequenceDiagram
participant Admin as Platform Admin
participant Catalog as System Catalog
participant Project as End User Project
Admin->>Catalog: Selects GKE Cluster Template from System Catalog
Admin->>Project: Shares Template with Predefined Controls
Project-->>Admin: Template Available in End User's Project
End User Flow
The end user launches a shared template, provides required input values, and deploys the cluster.
sequenceDiagram
participant User as End User
participant Project as Rafay Project
participant Cluster as GCP Infra
User->>Project: Launches Shared Template for GKE
User->>Project: Provides Required Input Values (API Key, GCP Service Account Details)
User->>Project: Clicks "Deploy"
Project->>Cluster: Provisions a GKE Cluster on GCP Infra
Cluster-->>User: Cluster Deployed Successfully
Resources
This system template will deploy the following resources:
- GKE Cluster on the GCP Infrastructure.
Pre-Requisites
- GCP Credentials:
  - Ensure the credentials carry the permissions needed to create and manage GCP resources.
  - Refer to the required IAM roles listed here.
  - Service Account JSON credentials (gcp-credentials.json) must be provided at the time of launching the template.
- Rafay Configuration: At template launch, supply the following configuration values:
  - API_KEY: The API key for the Rafay controller.
  - gcp-credentials.json: The GCP service account JSON file.
- Agent Configuration: An agent must be configured in the project where the template will be used. Follow these instructions to deploy an agent. Existing agents can also be reused.
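Because the service account JSON file is supplied by hand at launch, it can help to sanity-check it before deploying. The following is a minimal sketch, not part of the template: the function name `check_service_account_key` is hypothetical, and the list of required fields is an assumption based on the usual layout of a Google Cloud service account key file.

```python
import json
from typing import List

# Assumption: fields normally present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw: str) -> List[str]:
    """Return a list of problems found in the key file contents (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]

    # A key of another credential type (for example a user credential) is flagged.
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

To use it, read the contents of gcp-credentials.json into a string, pass that string to the function, and fix any reported problems before launching the template. The check only looks at the file's shape; it does not verify that the account holds the required IAM roles.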
Input Variables for GKE System Template
Name | Default Value | Value Type | Description |
---|---|---|---|
Logging components | [] | HCL | List of services to monitor: SYSTEM_COMPONENTS, APISERVER, CONTROLLER_MANAGER, SCHEDULER, and WORKLOADS. Empty list disables logging. |
Enable Cloud TPU | false | HCL | Enable Cloud TPU resources in the cluster. WARNING: changing this after cluster creation is destructive! |
GCP Project ID of shared VPC's host | | Text | The project ID of the shared VPC's host (for shared VPC support). |
Default max pods per node | 110 | HCL | The maximum number of pods to schedule per node. |
Datapath provider | DATAPATH_PROVIDER_UNSPECIFIED | Text | Allowed: [DATAPATH_PROVIDER_UNSPECIFIED, LEGACY_DATAPATH, ADVANCED_DATAPATH]. Sets the desired datapath provider for this cluster. |
Node pools | [ { name="default-node-pool", ... } ] | HCL | List of maps containing node pool configurations (e.g., machine type, disk size, autoscaling settings). |
Use control plane's external IP | true | HCL | When false, the cluster's private endpoint is used, and access through the public endpoint is disabled. |
Disable default SNAT | false | Text | Whether to disable the default SNAT for private use of public IP addresses. |
Enable legacy authorization | false | HCL | Enable the ABAC authorizer. Provides static permissions beyond those in RBAC or IAM. Defaults to false. |
Services secondary range name | null | HCL | The name of the secondary subnet IP range for services. |
Enable network policy addon | false | HCL | Enable the network policy addon. |
Enable Dataplane V2 metrics | false | HCL | Whether advanced datapath metrics are enabled. |
Issue a client certificate | false | HCL | Issues a client certificate for authentication. Changing this after cluster creation is destructive. |
Network name | default | String | The VPC network to host the cluster. |
Services IP address range | null | String | IP address range for services IPs. Can be blank, netmask (e.g., /14), or CIDR (e.g., 10.96.0.0/14). |
Enable private cluster | false | Boolean | Whether to enable private cluster network access. |
Enable binary authorization | false | Boolean | Enable Binary Authorization Admission controller. |
Network policy provider | CALICO | String | Allowed: [PROVIDER_UNSPECIFIED, CALICO]. Sets the network policy provider for the cluster. |
Authorized networks | [] | HCL | List of master authorized networks in CIDR format. |
Enable Backup for GKE | false | HCL | Enable Backup for GKE agent in the cluster. |
Gateway API Channel | CHANNEL_DISABLED | Text | Allowed: [CHANNEL_DISABLED, CHANNEL_STANDARD]. Configures the gateway API channel for the cluster. |
Private Endpoint Subnetwork | null | HCL | Subnetwork for master's private endpoint. |
Pods IP address range (route-based) | null | String | IP address range for Kubernetes pods. Defaults to an automatically assigned CIDR. |
Enable Compute Engine Persistent Disk CSI Driver | true | Boolean | Whether to enable the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. |
Blueprint Version | latest | String | Version of the blueprint assigned to the cluster. Use latest for system blueprints. |
Enable Maintenance window | false | Boolean | Whether to enable maintenance windows. |
Kubernetes version | 1.30 | String | Allowed: [1.28, 1.29, 1.30, 1.31]. Sets the Kubernetes version for the cluster. |
Pods secondary range name | null | String | Name of the secondary subnet IP range for pods. |
Maintenance exclusions | [] | List | List of maintenance exclusions with start/end times and exclusion scopes. |
Enable Intranode visibility | false | Boolean | Enable intra-node visibility for VPC network traffic. |
Ray Operator Config | { enabled=false, ... } | HCL | Configuration for the Ray Operator Addon. |
Regional cluster | true | HCL | Whether the cluster is regional. Setting to false creates a zonal cluster. |
Region | us-central1 | String | The region to host the cluster. |
Rafay project name | $(environment.project.name)$ | Expressions | Name of the Rafay project. Defaults to the environment name. |
Cluster description | Rafay managed cluster | String | Description of the cluster. |
Cloud monitoring components | [] | HCL | List of services to monitor in GCP. |
Google Groups for RBAC | null | String | Name of the RBAC security group for use with Google security groups in Kubernetes RBAC. |
Cluster Name | $(environment.name)$ | Expressions | Name of the cluster. Defaults to the environment name. |
Subnetwork name | default | Text | The subnetwork to host the cluster. |
GCP Project ID | dev-382813 | Text | The project ID to host the cluster (required). |
Enable Secret manager | false | HCL | Enable the Secret Manager add-on for this cluster. |
Maintenance Recurrence | FREQ=WEEKLY;BYDAY=MO,... | Text | Recurrence frequency for maintenance windows in RFC5545 format. |
Firewall Rules | [ { name="my-custom-rule", ... } ] | HCL | Firewall rules to be created for clusters. |
Maintenance start time | 05:00 | Text | Start time for daily or recurring maintenance operations (RFC3339 format). |
Rest Endpoint | console.rafay.dev | String | Select the endpoint of the controller. |
API Key | | String | Enter the API key of the controller. |
gcp-credentials.json | | String | Provide the cloud credentials for creating the GKE cluster. |
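Several inputs in the table accept only a fixed set of values, and the services IP address range accepts three distinct shapes. The sketch below shows one way to pre-check such values before launch. It is illustrative only: the function names are hypothetical, the dictionary keys are the display names from the table (the template's internal variable names are not shown here), and the assumption that ranges are IPv4 is mine.

```python
import ipaddress
import re
from typing import Dict, List

# Allowed values copied from the "Allowed: [...]" notes in the input variable table.
ALLOWED = {
    "Datapath provider": {"DATAPATH_PROVIDER_UNSPECIFIED", "LEGACY_DATAPATH", "ADVANCED_DATAPATH"},
    "Network policy provider": {"PROVIDER_UNSPECIFIED", "CALICO"},
    "Gateway API Channel": {"CHANNEL_DISABLED", "CHANNEL_STANDARD"},
    "Kubernetes version": {"1.28", "1.29", "1.30", "1.31"},
}

# Inputs documented as "blank, netmask, or CIDR".
IP_RANGE_FIELDS = ("Services IP address range",)


def is_valid_ip_range(value: str) -> bool:
    """Accept blank, a bare netmask such as /14, or an IPv4 CIDR such as 10.96.0.0/14."""
    if value == "":
        return True
    netmask = re.fullmatch(r"/(\d{1,2})", value)
    if netmask:
        return 0 <= int(netmask.group(1)) <= 32
    if "/" not in value:
        return False  # a bare address is not a range
    try:
        # strict=True rejects ranges whose host bits are set, e.g. 10.97.0.0/14.
        ipaddress.IPv4Network(value, strict=True)
    except ValueError:
        return False
    return True


def validate_inputs(values: Dict[str, str]) -> List[str]:
    """Return one message per invalid value; inputs that are not supplied are skipped."""
    problems = []
    for name, allowed in ALLOWED.items():
        if name in values and values[name] not in allowed:
            problems.append(f"{name}: {values[name]!r} is not one of {sorted(allowed)}")
    for name in IP_RANGE_FIELDS:
        if name in values and not is_valid_ip_range(values[name]):
            problems.append(f"{name}: {values[name]!r} is not blank, a netmask, or a CIDR")
    return problems
```

Calling `validate_inputs` with a dictionary of the values you intend to enter returns an empty list when every checked value is acceptable. Inputs that are left out fall back to the defaults in the table and are not checked.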
Launch Time
Launching a GKE cluster with this template takes approximately 15 to 20 minutes.