Version: main 🚧

What happens during control plane outages?

vCluster Platform is the management plane for projects, access, templates, lifecycle operations, and connected cluster visibility. Tenant clusters have their own control planes. Those control planes might run on the same Kubernetes cluster as Platform or on separate connected clusters.

Because these layers are separate, an outage affects different capabilities depending on what is unavailable.

Platform outage

If vCluster Platform is unavailable but the connected clusters and tenant cluster control planes remain healthy, tenant clusters do not automatically stop, and a Platform outage does not by itself cause existing workloads to be deleted.

| Capability | Status | Impact |
| --- | --- | --- |
| Platform UI and API | Unavailable | Users and automation cannot reliably use Platform while it is down. |
| Lifecycle operations | Unavailable | Creating, deleting, sleeping, waking, updating, or restoring tenant clusters can fail or wait. |
| Access, audit, and integrations | Limited | Platform-provided access features, audit, integrations, and automation can be unavailable or delayed. |
| Tenant cluster runtime | Available | Tenant clusters do not automatically stop if their own control planes and connected clusters remain healthy. |
| External deployment tools | Available | Externally deployed tenant clusters continue to be managed by their original deployment tool. |

Platform high availability protects the management plane by running multiple Platform replicas. It doesn't replace high availability for tenant cluster control planes or for the Kubernetes clusters that host tenant clusters. See High Availability Installation.
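
Because lifecycle operations can fail or wait while Platform is down, automation that calls Platform may benefit from retrying with backoff instead of giving up on the first error. The following is a minimal Python sketch and not part of vCluster Platform: `retry_with_backoff` and the `operation` callable are hypothetical names, and the sketch assumes the caller's client raises `ConnectionError` when Platform is unreachable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or the attempts run out.

    `operation` is a hypothetical zero-argument callable that wraps one
    Platform request and raises ConnectionError when Platform is unreachable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Retrying is only safe for operations that are idempotent or that check current state first, so a request that partially succeeded before the outage is not applied twice.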

Connected control plane cluster outage

A connected control plane cluster is a Kubernetes cluster where Platform can deploy or manage tenant clusters. If a connected control plane cluster is unavailable, Platform can't complete operations that require access to that cluster.

| Capability | Status | Impact |
| --- | --- | --- |
| Platform UI and API | Available | Platform can remain reachable and continue managing other healthy connected clusters. |
| Connected cluster visibility | Limited | Platform can mark the connected cluster or its network peer as offline until connectivity returns. |
| Lifecycle operations on that cluster | Unavailable | Platform can't reliably create, update, delete, sleep, wake, or inspect tenant clusters hosted there. |
| Tenant cluster reconciliation | Unavailable | Platform-managed reconciliation pauses or fails for tenant clusters on the unavailable cluster. |
| Tenant workloads | Limited | Availability depends on whether their control planes and worker nodes are still running and reachable. |

If tenant cluster control planes are hosted on the unavailable control plane cluster, tenant users can't reliably use those tenant cluster APIs until the control plane cluster recovers. For the tenant cluster view of this behavior, see "What happens during control plane outages?".
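
To estimate which tenant clusters lose API access when a connected control plane cluster goes offline, it can help to keep an inventory of where each tenant cluster control plane is hosted. The sketch below assumes a hypothetical, operator-maintained `hosting` mapping. The cluster names are illustrative, and Platform does not necessarily expose this data in this form.

```python
def affected_tenant_clusters(
    hosting: dict[str, str], offline_clusters: set[str]
) -> list[str]:
    """Return tenant clusters whose control plane runs on an offline cluster."""
    return sorted(
        tenant for tenant, cluster in hosting.items() if cluster in offline_clusters
    )


# Hypothetical inventory: tenant cluster name -> connected cluster hosting its control plane.
hosting = {
    "team-a": "cluster-east",
    "team-b": "cluster-west",
    "team-c": "cluster-east",
}

print(affected_tenant_clusters(hosting, {"cluster-east"}))  # ['team-a', 'team-c']
```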

Tenant cluster control plane outage

If an individual tenant cluster control plane is unavailable but Platform remains healthy, Platform can still be reachable, but operations against that tenant cluster are limited. Platform may show the tenant cluster as pending, failed, not ready, or offline depending on the deployment model and connectivity path.

| Capability | Status | Impact |
| --- | --- | --- |
| Platform UI and API | Available | Platform can remain reachable and continue managing other healthy tenant clusters. |
| Operations for this tenant cluster | Limited | Operations that require the unavailable tenant cluster API can fail or remain pending. |
| Tenant cluster status | Limited | Platform may show the tenant cluster as pending, failed, not ready, or offline. |
| Existing tenant workloads | Limited | Platform does not intentionally delete them, but runtime behavior depends on the tenant cluster and worker nodes. |
| Other tenant clusters | Available | Healthy tenant clusters can continue to be managed independently. |

Pending operations can resume once the tenant cluster control plane becomes reachable again.

For production environments, plan availability at each layer: Platform, the connected control plane clusters, and the tenant cluster control planes.
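
The three scenarios on this page can be summarized as a small decision function, which may be useful in a runbook or alerting script. This is a sketch under stated assumptions: `LayerHealth`, `Outage`, and `classify` are hypothetical names, the health flags come from the operator's own probes, not from a Platform API, and each `LayerHealth` describes one tenant cluster together with the connected cluster that hosts its control plane.

```python
from dataclasses import dataclass
from enum import Enum


class Outage(Enum):
    NONE = "no outage"
    PLATFORM = "Platform outage"
    CONNECTED_CLUSTER = "connected control plane cluster outage"
    TENANT_CONTROL_PLANE = "tenant cluster control plane outage"


@dataclass(frozen=True)
class LayerHealth:
    # Each flag is supplied by the caller's own health probes (assumption).
    platform_ok: bool
    connected_cluster_ok: bool
    tenant_control_plane_ok: bool


def classify(health: LayerHealth) -> list[Outage]:
    """Return every outage scenario that applies, outermost layer first."""
    found = []
    if not health.platform_ok:
        found.append(Outage.PLATFORM)
    if not health.connected_cluster_ok:
        # The hosting cluster is down, so the tenant control plane on it is
        # treated as part of this scenario instead of being reported separately.
        found.append(Outage.CONNECTED_CLUSTER)
    elif not health.tenant_control_plane_ok:
        found.append(Outage.TENANT_CONTROL_PLANE)
    return found or [Outage.NONE]
```

Returning a list instead of a single value reflects that the layers fail independently, so a Platform outage and a connected cluster outage can overlap.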