In this release, we have added an improved cluster deletion experience in the UI. This enhancement streamlines the deletion process and provides greater clarity during cluster removal.
By default, the "Delete Cluster Completely" option will now be pre-selected for managed clusters. This simplifies the process for complete cluster removal.
Users still have the flexibility to choose an alternative option based on their specific cluster state. The UI will continue to display available deletion options.
This update ensures a more user-friendly and efficient cluster deletion workflow. Below is a screenshot showcasing the enhanced cluster deletion user experience:
New EKS clusters can now be provisioned based on Kubernetes v1.30. Existing clusters managed by the controller can be upgraded "in-place" to Kubernetes v1.30.
New Cluster
In-Place Upgrade
Important Note: Please review the information for EKS 1.30 here before creating new clusters based on EKS 1.30.
Debugging & Troubleshooting: Enhanced Cloud Messaging for EKS Provisioning
Further enhancements have been implemented to the provisioning logs to help users pinpoint issues causing provisioning failures.
Prior to this release, provisioning failure logs were limited to the control plane and default node groups. Starting with this release, cloud-init logs for the Bootstrap NodeGroup are also displayed, offering deeper insight into the node initialization process. Real-time visibility into CloudFormation events for your EKS cluster during provisioning can be obtained using:
* RCTL CLI: The rctl cloudevents command allows retrieval of CloudEvents for specific infrastructure resources such as clusters, node groups, and EKS managed add-ons
* Swagger API: CloudEvents can be accessed through the /apis/infra.k8smgmt.io/v3/projects/defaultproject/clusters/<cluster-name>/provision/cloudevents endpoint
This enhanced monitoring capability aids in effectively troubleshooting and diagnosing provisioning issues. A sketch of querying the CloudEvents endpoint is shown below.
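For example, if you want to surface these provisioning events from Terraform output while debugging, a minimal sketch using the hashicorp/http data source follows. The controller host, cluster name, and API-key header/variable are assumptions for illustration; substitute the values and authentication scheme used in your organization (any HTTP client, or the rctl command above, works equally well).

```hcl
# Illustrative sketch only: fetch CloudEvents for an EKS cluster's provisioning run
# via the documented endpoint. Host, cluster name, and auth header are placeholders.
variable "rafay_api_key" {
  type      = string
  sensitive = true
}

data "http" "eks_provision_cloudevents" {
  url = "https://console.example.com/apis/infra.k8smgmt.io/v3/projects/defaultproject/clusters/demo-eks-cluster/provision/cloudevents"

  request_headers = {
    "X-API-KEY" = var.rafay_api_key # hypothetical header name; use your org's auth scheme
  }
}

output "eks_provision_cloudevents" {
  value     = data.http.eks_provision_cloudevents.response_body
  sensitive = true
}
```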
A previous release added support for Azure CNI Overlay for AKS clusters via the supported automation interfaces, i.e., RCTL CLI, Terraform, GitOps (System Sync), and Swagger API. This release adds the same capability to the UI.
New upstream clusters based on Rafay's MKS distribution can be provisioned based on Kubernetes v1.30.x. Existing upstream Kubernetes clusters managed by the controller can be upgraded in-place to Kubernetes v1.30.x. Read more about this in this blog post
Upstream Kubernetes clusters based on Kubernetes v1.30 (and prior Kubernetes versions) will be fully CNCF conformant.
Known Issue: Upgrading Kubernetes Cluster to 1.30.x with Windows Nodes
Upgrading a Kubernetes cluster to version 1.30.x with Windows nodes will result in upgrade failure. This is a known upstream Kubernetes issue tracked on GitHub (issue).
Workaround: Before initiating the upgrade, drain the Windows nodes using the kubectl drain <nodename> command, then retry the upgrade.
Fix: The fix is included in k8s version 1.30.2 and will be available in the 2.8 release.
New VMware clusters can be provisioned based on Kubernetes v1.29.x and v1.30.x. Existing VMware Kubernetes clusters managed by the controller can be upgraded in-place to Kubernetes v1.29.x and v1.30.x. Read more about the Kubernetes versions on this page
By default, only the Rafay management operator components are treated as “CRITICAL”. Customers will have the option to designate custom add-ons as CRITICAL in a blueprint based on their nature/importance. If a critical add-on fails to install during a blueprint sync operation, the blueprint state will be marked as FAILED and operations on the cluster, such as workload deployment, will be blocked until the issue is resolved.
The CRITICAL badge can be used for add-ons, such as security and monitoring tools, that are deemed critical and mandatory to maintain compliance. In summary,
* Blueprint state is marked as FAILED and operations on the cluster are blocked only if one or more critical add-ons fail to install
* Blueprint state is marked as PARTIAL SUCCESS if one or more non-critical add-ons fail to install. Operations such as workload deployment and scaling nodes up/down are still allowed in this condition.
The ingress class name for Rafay's managed ingress controller (shipped with the managed ingress add-on) is being updated from nginx to default-rafay-nginx to prevent potential naming conflicts with other ingress controllers. This name change also resolves various ingress-related issues encountered during blueprint synchronization.
Note
The new ingress class name is used when the base blueprint version is 2.7+ for both new and existing managed ingress add-on installations.
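For illustration, a minimal sketch of an Ingress that targets the managed controller via the new class name, written with the Terraform Kubernetes provider (plain YAML manifests work just as well); the namespace, host, and service names below are placeholders:

```hcl
# Minimal, illustrative Ingress referencing Rafay's managed ingress controller
# via the new class name. All names other than "default-rafay-nginx" are placeholders.
resource "kubernetes_ingress_v1" "demo" {
  metadata {
    name      = "demo-ingress"
    namespace = "demo"
  }

  spec {
    ingress_class_name = "default-rafay-nginx" # previously "nginx"

    rule {
      host = "demo.example.com"
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "demo-service"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```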
An indicator is being added in the Group listing page to distinguish between an "IDP group" and an "Override/Local group". This will enable customers to easily differentiate between the two group types in the Rafay Console.
Security guidelines may mandate that the Rafay Operator and managed add-on images be pulled from a custom registry instead of the Rafay-hosted registry. This release adds the ability to do so.
Please reach out to the Rafay Customer Success team if this is a requirement in your organization.
In this release, we have added driver support in hooks for both resource templates and environment templates, offering enhanced flexibility and reusability in your workflows.
What's New
* Select Drivers in Hooks: Users can now choose a driver within hooks, enabling reuse of pre-configured drivers.
* Override Driver Timeouts: Timeouts configured in hooks now take precedence over driver timeouts.
* Interface Support: This functionality is available across all supported interfaces, including UI, System Sync, Terraform, and Swagger API.
Benefits
* Improved Reusability: Simplify workflows by leveraging existing drivers within hooks.
* Enhanced Control: Tailor timeouts to specific scenarios with hook-level overrides.
In previous releases, Volume Backup and Restore functionality was accessible through the UI and API. This functionality is now also integrated into GitOps System Sync. It enables users to save configuration data stored in volumes associated with multi-template deployments. Volumes serve as storage locations for configuration data required by resources within your environment. Enabling backup ensures you can restore this data if needed, minimizing downtime and data loss. When the environment is destroyed, these volumes are cleaned up.
This release includes enhancements and bug fixes for the resources listed below:
Existing Resources
rafay_eks_cluster: The rafay_eks_cluster resource now offers improved policy configuration options.
You can now specify policy information in JSON format using the new attach_policy_v2 field, which provides more flexibility for defining complex policies (see the sketch after the list below).
The addons and nodegroup IAM sections within the resource now support additional fields in attach_policy:
* Condition
* NotAction
* NotPrincipal
* NotResource
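As a minimal sketch of the JSON form, the policy document below is built with jsonencode() and assigned to attach_policy_v2. The statement contents are placeholders, and the exact IAM block the field is placed in depends on your existing rafay_eks_cluster configuration:

```hcl
# Illustrative policy document for attach_policy_v2 (JSON form). The statement is a
# placeholder; elements such as Condition, NotAction, NotPrincipal, and NotResource
# can now also be expressed in attach_policy.
locals {
  nodegroup_policy_json = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject"]
        Resource = "arn:aws:s3:::example-bucket/*"
        Condition = {
          StringEquals = { "aws:RequestedRegion" = "us-west-2" }
        }
      }
    ]
  })
}

# Assign the document inside the relevant IAM section (node group or addon) of your
# existing rafay_eks_cluster definition; the placement shown here is indicative only:
#   attach_policy_v2 = local.nodegroup_policy_json
```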
rafay_eks_cluster: Previously, Terraform would reject configurations lacking mandatory addons in the rafay_eks_cluster definition. This validation has been removed. Terraform now accepts configurations without explicit addon definitions and implicitly adds them during cluster creation using terraform apply.
rafay_cloud_credential and rafay_cloud_credential_v3:
These resources now allow updating credentials without errors. Simply update the resource definition and run terraform apply to reflect the changes.
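For example, assuming an AWS role-based credential (the attribute names below are illustrative placeholders and should mirror whatever your existing resource already declares), rotating the role is just an in-place edit followed by terraform apply:

```hcl
# Hypothetical credential definition; attribute names are placeholders and should
# match your existing rafay_cloud_credential resource.
resource "rafay_cloud_credential" "aws" {
  name    = "aws-provisioner"
  project = "defaultproject"

  # Update the value in place (e.g., a rotated role ARN) and run `terraform apply`;
  # the change is now accepted without errors.
  rolearn = "arn:aws:iam::123456789012:role/rafay-provisioner-v2"
}
```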