
System GKE Template

Overview

This system template lets you configure, templatize, and provision a GKE cluster using GCP's native OpenTofu (IaC) provider. The template is designed to support both Day 0 (initial setup) and Day 2 (ongoing management and maintenance) operations.

The template enables users to provision and manage the lifecycle of a GKE cluster and the add-ons defined in cluster blueprints. As part of the template output, the end user receives a kubeconfig file that carries cluster-wide privileges and enables secure access to the cluster.

Initial Setup

The platform team performs the initial configuration and setup of the GKE template. The sequence diagram below outlines the high-level steps: the platform team selects the template in the system catalog, configures it, shares it with the project they manage, and then shares it downstream with the end user.

sequenceDiagram
    participant Admin as Platform Admin
    participant Catalog as System Catalog
    participant Project as End User Project

    Admin->>Catalog: Selects GKE Cluster Template from System Catalog
    Admin->>Project: Shares Template with Predefined Controls
    Project-->>Admin: Template Available in End User's Project

End User Flow

The end user launches a shared template, provides required input values, and deploys the cluster.

sequenceDiagram
    participant User as End User
    participant Project as Rafay Project
    participant Cluster as GCP Infra

    User->>Project: Launches Shared Template for GKE
    User->>Project: Provides Required Input Values (API Key, GCP Service Account Details)
    User->>Project: Clicks "Deploy"
    Project->>Cluster: Provisions a GKE Cluster on GCP Infra
    Cluster-->>User: Cluster Deployed Successfully

Resources

This system template deploys the following resources:

  • A GKE cluster on GCP infrastructure.

Prerequisites

  1. GCP Credentials:

    • Ensure the service account has the permissions needed to create and manage GCP resources.
    • Refer to the required IAM roles listed here.
    • Service Account JSON credentials (gcp-credentials.json) must be provided at the time of launching the template.
  2. Rafay Configuration:

    At template launch, supply the following configuration values:

    • API_KEY: The API key for the Rafay controller.
    • gcp-credentials.json: The GCP service account JSON file.

  3. Agent Configuration:

    An agent must be configured in the project where the template will be used. Follow these instructions to deploy an agent. Existing agents can also be reused.
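
Because gcp-credentials.json is supplied by hand at launch, it can help to sanity-check the file first. The following is a minimal sketch, not part of the template: the function name `check_service_account_key` is hypothetical, and the field list assumes the usual layout of a Google Cloud service account key file. It does not verify that the key carries the required IAM roles.

```python
import json

# Fields assumed to be present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text: str) -> list[str]:
    """Return a list of problems found in the key text; an empty list means none."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]

    # A key downloaded for a service account declares this type explicitly.
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

To use it, read the file contents into a string, pass them to `check_service_account_key`, and launch the template only if the returned list is empty.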


Input Variables for GKE System Template

| Name | Default Value | Value Type | Description |
| --- | --- | --- | --- |
| Logging components | [] | HCL | List of services to monitor: SYSTEM_COMPONENTS, APISERVER, CONTROLLER_MANAGER, SCHEDULER, and WORKLOADS. Empty list disables logging. |
| Enable Cloud TPU | false | HCL | Enable Cloud TPU resources in the cluster. WARNING: changing this after cluster creation is destructive! |
| GCP Project ID of shared VPC's host | | Text | The project ID of the shared VPC's host (for shared VPC support). |
| Default max pods per node | 110 | HCL | The maximum number of pods to schedule per node. |
| Datapath provider | DATAPATH_PROVIDER_UNSPECIFIED | Text | Allowed: [DATAPATH_PROVIDER_UNSPECIFIED, LEGACY_DATAPATH, ADVANCED_DATAPATH]. Sets the desired datapath provider for this cluster. |
| Node pools | [ { name="default-node-pool", ... } ] | HCL | List of maps containing node pool configurations (e.g., machine type, disk size, autoscaling settings). |
| Use control plane's external IP | true | HCL | When false, the cluster's private endpoint is used, and access through the public endpoint is disabled. |
| Disable default SNAT | false | Text | Whether to disable the default SNAT for private use of public IP addresses. |
| Enable legacy authorization | false | HCL | Enable the ABAC authorizer. Provides static permissions beyond those in RBAC or IAM. Defaults to false. |
| Services secondary range name | null | HCL | The name of the secondary subnet IP range for services. |
| Enable network policy addon | false | HCL | Enable the network policy addon. |
| Enable Dataplane V2 metrics | false | HCL | Whether advanced datapath metrics are enabled. |
| Issue a client certificate | false | HCL | Issues a client certificate for authentication. Changing this after cluster creation is destructive. |
| Network name | default | String | The VPC network to host the cluster. |
| Services IP address range | null | String | IP address range for services IPs. Can be blank, netmask (e.g., /14), or CIDR (e.g., 10.96.0.0/14). |
| Enable private cluster | false | Boolean | Whether to enable private cluster network access. |
| Enable binary authorization | false | Boolean | Enable Binary Authorization Admission controller. |
| Network policy provider | CALICO | String | Allowed: [PROVIDER_UNSPECIFIED, CALICO]. Sets the network policy provider for the cluster. |
| Authorized networks | [] | HCL | List of master authorized networks in CIDR format. |
| Enable Backup for GKE | false | HCL | Enable Backup for GKE agent in the cluster. |
| Gateway API Channel | CHANNEL_DISABLED | Text | Allowed: [CHANNEL_DISABLED, CHANNEL_STANDARD]. Configures the gateway API channel for the cluster. |
| Private Endpoint Subnetwork | null | HCL | Subnetwork for master's private endpoint. |
| Pods IP address range (route-based) | null | String | IP address range for Kubernetes pods. Defaults to an automatically assigned CIDR. |
| Enable Compute Engine Persistent Disk CSI Driver | true | Boolean | Whether to enable the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. |
| Blueprint Version | latest | String | Version of the blueprint assigned to the cluster. Use latest for system blueprints. |
| Enable Maintenance window | false | Boolean | Whether to enable maintenance windows. |
| Kubernetes version | 1.30 | String | Allowed: [1.28, 1.29, 1.30, 1.31]. Sets the Kubernetes version for the cluster. |
| Pods secondary range name | null | String | Name of the secondary subnet IP range for pods. |
| Maintenance exclusions | [] | List | List of maintenance exclusions with start/end times and exclusion scopes. |
| Enable Intranode visibility | false | Boolean | Enable intra-node visibility for VPC network traffic. |
| Ray Operator Config | { enabled=false, ... } | HCL | Configuration for the Ray Operator Addon. |
| Regional cluster | true | HCL | Whether the cluster is regional. Setting to false creates a zonal cluster. |
| Region | us-central1 | String | The region to host the cluster. |
| Rafay project name | $(environment.project.name)$ | Expressions | Name of the Rafay project. Defaults to the environment name. |
| Cluster description | Rafay managed cluster | String | Description of the cluster. |
| Cloud monitoring components | [] | HCL | List of services to monitor in GCP. |
| Google Groups for RBAC | null | String | Name of the RBAC security group for use with Google security groups in Kubernetes RBAC. |
| Cluster Name | $(environment.name)$ | Expressions | Name of the cluster. Defaults to the environment name. |
| Subnetwork name | default | Text | The subnetwork to host the cluster. |
| GCP Project ID | dev-382813 | Text | The project ID to host the cluster (required). |
| Enable Secret manager | false | HCL | Enable the Secret Manager add-on for this cluster. |
| Maintenance Recurrence | FREQ=WEEKLY;BYDAY=MO,... | Text | Recurrence frequency for maintenance windows in RFC5545 format. |
| Firewall Rules | [ { name="my-custom-rule", ... } ] | HCL | Firewall rules to be created for clusters. |
| Maintenance start time | 05:00 | Text | Start time for daily or recurring maintenance operations (RFC3339 format). |
| Rest Endpoint | console.rafay.dev | String | Select the endpoint of the controller. |
| API Key | | String | Enter the API key of the controller. |
| gcp-credentials.json | | String | Provide the cloud credentials for creating the GKE cluster. |
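
Several inputs accept only a fixed set of values or a specific address format, so a quick check before launch can catch typos early. The sketch below is illustrative only. It keys inputs by the display names used in this document, because the template's internal variable identifiers are not listed here. The helper names `valid_ip_range` and `check_inputs` are hypothetical, and the address check assumes IPv4.

```python
import ipaddress
import re

# Allowed values copied from the input variable descriptions in this document.
ALLOWED = {
    "Datapath provider": {"DATAPATH_PROVIDER_UNSPECIFIED", "LEGACY_DATAPATH", "ADVANCED_DATAPATH"},
    "Network policy provider": {"PROVIDER_UNSPECIFIED", "CALICO"},
    "Gateway API Channel": {"CHANNEL_DISABLED", "CHANNEL_STANDARD"},
    "Kubernetes version": {"1.28", "1.29", "1.30", "1.31"},
}


def valid_ip_range(value):
    """Accept blank/None, a bare netmask such as /14, or a CIDR such as 10.96.0.0/14."""
    if value is None or value == "":
        return True
    if re.fullmatch(r"/\d{1,2}", value):
        return 0 <= int(value[1:]) <= 32
    if "/" not in value:
        return False  # a bare address is not a range
    try:
        # strict=True rejects ranges with host bits set, e.g. 10.96.0.1/14.
        ipaddress.IPv4Network(value, strict=True)
    except ValueError:
        return False
    return True


def check_inputs(inputs):
    """Return a list of problems for a {display name: value} mapping."""
    problems = []
    for name, allowed in ALLOWED.items():
        if name in inputs and inputs[name] not in allowed:
            problems.append(f"{name}: {inputs[name]!r} is not one of {sorted(allowed)}")
    if not valid_ip_range(inputs.get("Services IP address range")):
        problems.append("Services IP address range: expected blank, a netmask, or a CIDR")
    return problems
```

Inputs that are omitted from the mapping are skipped, which mirrors leaving a field at its default value.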

Launch Time

Launching a GKE cluster with this template takes approximately 15 to 20 minutes.
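
Automation that waits for the cluster can size its timeout from this estimate. The following is a minimal sketch under stated assumptions: `get_status` is a hypothetical callable supplied by the caller, the `"ready"` status string is an illustrative placeholder rather than a value defined by the template, and the 30-minute default is simply the upper estimate plus some margin.

```python
import time


def wait_until_ready(get_status, timeout_s=30 * 60, interval_s=30,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns "ready".

    Returns True once the status is "ready", or False if timeout_s
    elapses first. clock and sleep are injectable to simplify testing.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "ready":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Polling every 30 seconds keeps the number of status checks low over a launch of this length. A caller would pass a function that looks up the cluster state by whatever means is available to them.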