Symptom: control plane is unreachable with x509 errors
You run a routine kubectl get nodes and the cluster is gone:
Unable to connect to the server: x509: certificate has expired or is not yet valid:
current time 2026-04-24T10:14:23Z is after 2026-04-23T16:42:11Z
Or, on a control plane node, the kube-apiserver static pod is in CrashLoopBackOff and crictl logs for the etcd or apiserver container shows:
W0424 10:14:23.118233 authentication.go:73] Unable to authenticate the request
due to an error: x509: certificate has expired or is not yet valid
If you self-manage the control plane and the cluster has been up for roughly one year without a kubeadm upgrade, this is almost certainly expired kubeadm-managed PKI. The fix is real but mechanical: renew, restart, verify.
This article applies to clusters bootstrapped with kubeadm. EKS, GKE, and AKS handle control-plane certificate rotation for you, so this troubleshooting path does not apply there. AKS specifically auto-rotates non-CA certificates at 80% of their valid time on clusters created or upgraded after March 2022. Worker-side kubelet client certificates are managed by Kubernetes itself, separately from the kubeadm path discussed here.
What this actually means
kubeadm bootstraps a self-signed PKI hierarchy under /etc/kubernetes/pki/. There are three CAs (Kubernetes CA, etcd CA, front-proxy CA) plus a service account key pair, and a fan of leaf certificates that authenticate every component-to-component connection in the control plane.
By default, leaf certificates last 1 year (8760h) and CAs last 10 years (87600h). When a leaf certificate expires, the component that presents it can no longer authenticate itself to the API server. That trips the entire control plane: the kube-apiserver cannot reach etcd, controllers cannot list resources, scheduler cannot bind pods, kubectl from your workstation rejects the server certificate.
The CAs are still valid. Only the leaves expired. That is why renewal works without rebuilding the cluster: kubeadm uses the CA keys still on disk to issue fresh leaves with the same SANs.
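You can see this directly with openssl: a leaf such as the API server certificate is past its notAfter, while the CA that signed it still has years left (paths are the kubeadm defaults):

```bash
# Leaf certificate: expired
sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
# CA that signed it: still valid for years
sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/ca.crt
```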
There are two important misconceptions to clear up before you start:
- kubeadm certs renew all does not rotate the CAs. It only renews leaf certificates and kubeconfig client certs. The CAs keep their original 10-year lifetime. If a CA itself is approaching expiry, that is a different (and harder) procedure.
- After renewal, the API server keeps using the old certificate until you restart it. Static pods on the control plane do not pick up new files on disk automatically. You must restart them.
Common causes, ordered by likelihood
- The cluster was bootstrapped with kubeadm and not upgraded for roughly a year. This is by far the most common cause. kubeadm renews certificates automatically during a kubeadm upgrade apply if they are within 180 days of expiry, which is roughly 50% of the leaf lifetime. Skip enough upgrade cycles and you hit the wall.
- A node was offline during the renewal window. If kubeadm upgrade ran but a control-plane node was down at the time, that node's leaves were not renewed.
- External CA mode is in use. When kubeadm finds ca.crt without ca.key, it activates external CA mode and refuses to issue certificates. You must renew externally and reload them onto each node. kubeadm certs renew all will fail with external CA mode: cannot renew certificates.
- kubelet client certificate expired separately. The kubelet rotates its own client certificate between 30% and 10% of remaining lifetime, but if the API server was already unreachable when the rotation window opened, the kubelet could not reach the CSR endpoint to renew. This produces system:node x509 errors specifically and needs a separate fix (covered below).
- System clock skew. If the node clock jumps forward, certificates appear "not yet valid" or "already expired" even when they should be fine. Check timedatectl status before assuming expiry is the cause; a quick triage sketch follows this list.
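A quick triage pass on a control plane node, assuming the default kubeadm layout, separates the last three causes before you commit to the renewal procedure:

```bash
# Clock skew? A clock that drifted forward mimics expiry.
timedatectl status | grep -E 'Local time|System clock synchronized'
# External CA mode? ca.crt present but ca.key missing means kubeadm cannot renew.
sudo ls -l /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key
# Kubelet client certificate expired separately?
sudo openssl x509 -noout -enddate \
  -in /var/lib/kubelet/pki/kubelet-client-current.pem
```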
Diagnosis: identify which certificates expired
SSH to any control plane node and run:
sudo kubeadm certs check-expiration
Expected output (truncated columns shown):
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Apr 23, 2026 16:42 UTC   <invalid>       no
apiserver                  Apr 23, 2026 16:42 UTC   <invalid>       no
apiserver-etcd-client      Apr 23, 2026 16:42 UTC   <invalid>       no
apiserver-kubelet-client   Apr 23, 2026 16:42 UTC   <invalid>       no
controller-manager.conf    Apr 23, 2026 16:42 UTC   <invalid>       no
etcd-healthcheck-client    Apr 23, 2026 16:42 UTC   <invalid>       no
etcd-peer                  Apr 23, 2026 16:42 UTC   <invalid>       no
etcd-server                Apr 23, 2026 16:42 UTC   <invalid>       no
front-proxy-client         Apr 23, 2026 16:42 UTC   <invalid>       no
scheduler.conf             Apr 23, 2026 16:42 UTC   <invalid>       no
super-admin.conf           Apr 23, 2026 16:42 UTC   <invalid>       no

CERTIFICATE AUTHORITY      EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                         Apr 21, 2035 16:42 UTC   8y              no
etcd-ca                    Apr 21, 2035 16:42 UTC   8y              no
front-proxy-ca             Apr 21, 2035 16:42 UTC   8y              no
<invalid> in the RESIDUAL TIME column confirms expiry. The CAs at the bottom should still have years left. super-admin.conf shows up on clusters created or upgraded with kubeadm 1.29 or later, where it was introduced as a break-glass credential separate from the day-to-day admin.conf.
If you cannot reach the cluster from your workstation, also confirm node clock is sane:
timedatectl status
You want System clock synchronized: yes. A node that drifted forward by months will show "expired" certificates that are actually fine.
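Healthy output looks roughly like this (exact fields vary slightly between systemd versions):

```
               Local time: Fri 2026-04-24 10:20:01 UTC
           Universal time: Fri 2026-04-24 10:20:01 UTC
                 RTC time: Fri 2026-04-24 10:20:00
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
```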
For a single certificate file (handy when check-expiration itself fails for some reason):
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
Output:
notBefore=Apr 23 16:42:11 2025 GMT
notAfter=Apr 23 16:42:11 2026 GMT
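To sweep every certificate under the PKI directory at once, without relying on kubeadm at all, a short loop over the kubeadm default paths works:

```bash
# Print the expiry date of every certificate kubeadm manages on this node
for crt in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  printf '%-55s %s\n' "$crt" "$(sudo openssl x509 -noout -enddate -in "$crt")"
done
```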
Step 1: back up the PKI directory and admin.conf
Renewal rewrites files in place. Before doing anything destructive, snapshot the existing state on every control plane node:
sudo cp -a /etc/kubernetes/pki /etc/kubernetes/pki.bak-$(date +%Y%m%d)
sudo cp -a /etc/kubernetes/admin.conf /etc/kubernetes/admin.conf.bak-$(date +%Y%m%d)
sudo cp -a /etc/kubernetes/controller-manager.conf /etc/kubernetes/controller-manager.conf.bak-$(date +%Y%m%d)
sudo cp -a /etc/kubernetes/scheduler.conf /etc/kubernetes/scheduler.conf.bak-$(date +%Y%m%d)
sudo cp -a /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.bak-$(date +%Y%m%d)
If you maintain etcd snapshots (which you should, see Kubernetes etcd: backup, restore, and disaster recovery), take a fresh one now too. Renewal does not touch etcd data, but if anything else goes wrong during the procedure, you want a recent snapshot.
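If you have no recent snapshot, you can attempt one now. This is a sketch assuming the default stacked-etcd layout, using the etcdctl bundled in the etcd container; note that etcdctl verifies the etcd serving certificate, so this may fail with a TLS error while that certificate is expired, in which case fall back to your newest existing snapshot:

```bash
# Take an etcd snapshot from inside the running etcd container.
# /var/lib/etcd is a hostPath mount, so the snapshot lands on the node's disk.
ETCD_ID=$(sudo crictl ps --name etcd -q | head -n1)
sudo crictl exec "$ETCD_ID" etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save /var/lib/etcd/snapshot-pre-renewal.db
```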
Step 2: renew with kubeadm certs renew all
Run on the first control plane node:
sudo kubeadm certs renew all
Expected output:
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
certificate embedded in the kubeconfig file for the super-admin renewed
Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager,
kube-scheduler and etcd, so that they can use the new certificates.
The command runs unconditionally regardless of expiry and reuses the SANs from the existing certificates. The kubeadm reference documents every individual sub-command if you ever need to renew a single certificate (kubeadm certs renew apiserver, etc.).
For HA clusters, repeat this command on every control plane node. Each node has its own copy of the PKI; renewing on one does not propagate.
If kubeadm certs renew all fails with external CA mode: cannot renew certificates, you are not in kubeadm-managed mode. The CA private key is intentionally not on disk. You must issue replacement leaf certificates from your external CA and write them into /etc/kubernetes/pki/ manually. kubeadm cannot help you here.
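kubeadm can still do the mechanical part for you in external CA mode: generating fresh keys and CSRs for everything it would otherwise renew. The signing step depends entirely on your CA tooling. A sketch, using scratch directories so nothing live is overwritten:

```bash
# Generate new private keys and CSRs for the control-plane certificates and
# kubeconfig files into scratch directories. Sign the *.csr files with your
# external CA, then copy the signed certificates (and new keys) into
# /etc/kubernetes/pki/ and the kubeconfig files on each control plane node.
# (Add --config with your ClusterConfiguration if the defaults would produce
# the wrong SANs.)
sudo kubeadm certs generate-csr \
  --cert-dir /root/renewal/pki \
  --kubeconfig-dir /root/renewal/kubeconfigs
```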
Step 3: restart control-plane components
The static pods are still running with the old in-memory certificates. They will not pick up the new files on disk until they restart. The clean way to force a restart is to move each manifest out of /etc/kubernetes/manifests/ and back. The kubelet polls that directory every 20 seconds (fileCheckFrequency) and reconciles based on which manifests it sees.
# Move all four control-plane manifests out
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sudo mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/
# Wait ~25 seconds, then confirm the containers are stopped
sleep 25
sudo crictl ps | grep -E 'apiserver|controller-manager|scheduler|etcd'
You should see no output. If containers are still listed, wait another 10 seconds and check again. The kubelet sometimes takes one extra poll cycle.
Then move them back:
sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
sudo mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
Bring etcd up first. The API server depends on it. Wait 30 to 60 seconds for the kubelet to recreate all four containers.
A note on etcd specifically: since etcd v3.2.0, TLS certificates are reloaded on every client connection, so an etcd that is already running will pick up the new server.crt and peer.crt as new connections come in. In theory you do not need to restart etcd for the certificate change. In practice, a clean restart is simpler than reasoning about which existing connections are using which certificate, and it gives you a clear before/after for verification.
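If you want positive confirmation that etcd is serving with the renewed certificates before moving on, you can run etcdctl from inside the etcd container; the paths below assume the default stacked-etcd layout:

```bash
# Health check against etcd using the renewed certificates
ETCD_ID=$(sudo crictl ps --name etcd -q | head -n1)
sudo crictl exec "$ETCD_ID" etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health
```

A healthy member answers with something like https://127.0.0.1:2379 is healthy.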
Finally, restart the kubelet itself so that it re-establishes its connections to the restarted control plane. Note that kubeadm certs renew all does not renew kubelet.conf or the kubelet's own client certificate (see Special case below):
sudo systemctl restart kubelet
Step 4: verify renewal and restore cluster access
On the control plane node, confirm the new expiration dates:
sudo kubeadm certs check-expiration
Every leaf should now show approximately one year of residual time.
Verify the static pods came back:
sudo crictl ps | grep -E 'apiserver|controller-manager|scheduler|etcd'
You should see all four containers running.
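It is also worth confirming that the restarted API server is presenting the renewed certificate rather than the old one it held in memory. Assuming the default secure port 6443 on the local node:

```bash
# Inspect the certificate the running kube-apiserver is actually serving
echo | openssl s_client -connect 127.0.0.1:6443 2>/dev/null \
  | openssl x509 -noout -dates
```

notAfter should now be roughly a year out.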
From your workstation, the local kubeconfig still references the old client certificate. Even if the API server is healthy, kubectl will not authenticate. Refresh it from the renewed admin.conf:
# On the control plane node
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Or, from your workstation, copy the file back over SSH and overwrite ~/.kube/config. This step is the one that catches people most often: the API server is fine, etcd is fine, kubectl says x509: certificate has expired, and the cause is a stale local kubeconfig. The Kubernetes discussion forum has multiple threads on exactly this confusion.
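One way to do that from the workstation, assuming your SSH user can run sudo on the control plane node (cp-node-1 is a placeholder) and you are happy to overwrite your current kubeconfig:

```bash
# Back up the existing kubeconfig, then pull the renewed admin.conf.
# admin.conf is root-readable only, hence sudo cat rather than plain scp.
cp ~/.kube/config ~/.kube/config.bak 2>/dev/null || true
ssh cp-node-1 'sudo cat /etc/kubernetes/admin.conf' > ~/.kube/config
```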
You will know the recovery succeeded when:
kubectl get nodes
kubectl get pods --all-namespaces
return their normal output without x509 errors and all nodes report Ready.
Special case: etcd certificates (separate renewal path; external etcd caveat)
kubeadm certs renew all covers etcd certificates when etcd runs as a kubeadm-managed static pod (the default stacked etcd topology). The four etcd-related certificates are:
| Certificate | What it does |
|---|---|
| etcd-server | Server cert presented by etcd to clients on port 2379 |
| etcd-peer | Server + client cert for member-to-member traffic on port 2380 |
| etcd-healthcheck-client | Client cert used by etcd's liveness probe |
| apiserver-etcd-client | Client cert the kube-apiserver uses to talk to etcd |
If you run etcd externally (separate VMs or its own cluster, not stacked on the control plane nodes), kubeadm cannot renew etcd certificates for you. You manage etcd's PKI through whatever process produced it originally (often a separate etcd-managed CA, or manually-issued certificates). On the kubeadm side, only apiserver-etcd-client is in scope, because that one is signed by the etcd CA but stored in /etc/kubernetes/pki/ for the API server's use. The others live with etcd.
In stacked-etcd kubeadm setups (which is what kubeadm init produces by default), you do not need a separate procedure. kubeadm certs renew all covers all four, and the static-pod restart in Step 3 picks them up.
For the broader operations picture around etcd, including snapshots and disaster recovery, see Kubernetes etcd: backup, restore, and disaster recovery.
Special case: kubelet client certificate (auto-rotation, not in renew all)
The kubelet's client certificate (/var/lib/kubelet/pki/kubelet-client-current.pem) is not managed by kubeadm certs renew all. It is auto-rotated by the kubelet itself, between 30% and 10% of remaining lifetime, via the RotateKubeletClientCertificate feature gate (stable and on by default since Kubernetes 1.19). The kubelet posts a CertificateSigningRequest to the API server, the controller-manager auto-approves it, and the kubelet writes the new cert to disk.
There is one edge case where this falls over: if the API server is already unreachable when the rotation window opens, the kubelet cannot reach the CSR endpoint, and the certificate expires anyway. You will see this on a node that was offline during the rotation window, or on any node if the API server itself was down (precisely the scenario this article addresses).
kubelet.conf itself is also handled separately. The historical kubeadm issues #1361 and #2185 explain why: the embedded client certificate in kubelet.conf was meant to be replaced by the auto-rotated certificate stored elsewhere (/var/lib/kubelet/pki/), so kubeadm intentionally does not touch kubelet.conf during certs renew all.
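Before running the recovery, confirm the kubelet client certificate really is the problem: check where the current symlink points and when that certificate expires.

```bash
# kubelet-client-current.pem is normally a symlink to the newest rotated cert
ls -l /var/lib/kubelet/pki/kubelet-client-current.pem
sudo openssl x509 -noout -enddate \
  -in /var/lib/kubelet/pki/kubelet-client-current.pem
```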
If you have an expired kubelet client certificate and the auto-rotation already failed, the recovery procedure is:
# Regenerate kubelet.conf from the cluster CA. This must run on a node that has
# the CA key (/etc/kubernetes/pki/ca.key), i.e. a control plane node.
sudo kubeadm kubeconfig user \
  --org system:nodes \
  --client-name system:node:$(hostname) \
  > /tmp/kubelet.conf
# If the affected node is a worker, substitute that node's name for $(hostname)
# above and copy /tmp/kubelet.conf to it. Then, on the affected node, replace
# the existing kubelet.conf:
sudo mv /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.expired
sudo mv /tmp/kubelet.conf /etc/kubernetes/kubelet.conf
# Remove the old auto-rotated certs so the kubelet bootstraps fresh ones
sudo rm /var/lib/kubelet/pki/kubelet-client-*.pem
# Restart the kubelet
sudo systemctl restart kubelet
After restart, the kubelet uses kubelet.conf to bootstrap, posts a CSR, and the controller-manager (which now has a renewed certificate of its own and can sign things again) auto-approves it. New kubelet-client-current.pem appears in /var/lib/kubelet/pki/ within a few seconds. Watch the kubelet logs: journalctl -u kubelet -f.
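You can also watch the CSR itself being created and approved; the REQUESTOR column identifies the node, and the CONDITION column should flip to Approved,Issued within a few seconds:

```bash
# Newest CSRs are not shown first by default, so sort by creation time
kubectl get csr --sort-by=.metadata.creationTimestamp | tail -n 5
```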
You will know it worked when the node returns to Ready:
kubectl get node $(hostname)
When to escalate
If the procedure does not restore the cluster, collect the following before asking for help:
- Output of sudo kubeadm certs check-expiration from each affected node
- Output of kubectl version (specifically the server build, even if the client cannot reach the cluster)
- Output of timedatectl status from each affected node (clock skew is a silent killer)
- The exact x509 error string from crictl logs for the kube-apiserver and etcd containers
- The first 50 lines of journalctl -u kubelet --since "30 minutes ago"
- Whether the cluster uses external CA mode (check whether /etc/kubernetes/pki/ca.key exists)
- Whether etcd is stacked or external
- The kubeadm version that originally bootstrapped the cluster (often visible in the /etc/kubernetes/manifests/kube-apiserver.yaml image tag history)
- Whether /var/lib/kubelet/pki/kubelet-client-current.pem is a symlink and where it points
- Whether the workstation kubeconfig has been refreshed from the renewed admin.conf
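A rough collection sketch for the node-local items in that list, run as root on each affected control plane node (adjust paths to your environment):

```bash
#!/usr/bin/env bash
# Bundle escalation data into a single file per node.
OUT=/tmp/cert-escalation-$(hostname)-$(date +%Y%m%d%H%M).txt
{
  echo "== kubeadm certs check-expiration =="; kubeadm certs check-expiration
  echo "== timedatectl status ==";             timedatectl status
  echo "== external CA mode? (no ca.key = external) =="
  ls -l /etc/kubernetes/pki/ca.key
  echo "== kubelet client cert symlink =="
  ls -l /var/lib/kubelet/pki/kubelet-client-current.pem
  echo "== control plane container logs =="
  for name in kube-apiserver etcd; do
    for c in $(crictl ps -a --name "$name" -q); do
      echo "--- $name ($c) ---"; crictl logs --tail 50 "$c" 2>&1
    done
  done
  echo "== kubelet journal (last 30 min, first 50 lines) =="
  journalctl -u kubelet --since "30 minutes ago" | head -n 50
} > "$OUT" 2>&1
echo "Wrote $OUT"
```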
Multi-node HA failures (where more than one control plane node is affected and quorum is unclear) are the most likely scenario where escalation is genuinely needed. Single-node renewal is mechanical; HA renewal where node 1 succeeded but node 2 lost etcd quorum during the restart is where things get interesting.
Preventing recurrence
The point is to never see this article's symptom again.
- Run kubeadm upgrade at least every 6 months. This automatically renews all certificates within the 180-day window. It is also a safer cadence for staying on supported Kubernetes versions, since each Kubernetes minor release is supported for roughly 14 months.
- Monitor certificate expiry actively. A simple Prometheus alert on the apiserver_client_certificate_expiration_seconds metric, or a CronJob that runs kubeadm certs check-expiration and emails on <60d residual, catches this before it becomes an outage. Alert at 90 days, page at 30 days. A minimal check script is sketched after this list.
- If you can move to managed Kubernetes, do. The operational cost of running your own kubeadm cluster includes this article. EKS, GKE, and AKS rotate control-plane certificates for you. AKS specifically auto-rotates non-CA certificates at 80% of valid time on clusters from March 2022 onward. GKE rotates etcd certificates 6 months before expiry automatically.
- For workload TLS, use cert-manager. kubeadm's PKI is for the cluster's internal control plane; it is not the right tool for issuing certificates to your applications. See Kubernetes TLS with cert-manager: automated certificate management for that layer.
- Increase the default lifetime if your operational cadence is long. kubeadm 1.31 and later support certificateValidityPeriod in the ClusterConfiguration (defaults: 8760h for leaves, 87600h for CAs). A 3-year leaf lifetime is unusual but not wrong if you have a long operational cadence and accept the longer blast radius if a cert is compromised.
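A minimal check script for that CronJob (or a plain node-level cron entry), assuming the default kubeadm certificate paths; wire its exit status into whatever alerting you already run:

```bash
#!/usr/bin/env bash
# Exit non-zero if any control-plane certificate expires within THRESHOLD_DAYS.
THRESHOLD_DAYS=${THRESHOLD_DAYS:-60}
status=0
for crt in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  if ! openssl x509 -checkend $(( THRESHOLD_DAYS * 86400 )) -noout -in "$crt" >/dev/null; then
    echo "WARNING: $crt expires within ${THRESHOLD_DAYS} days ($(openssl x509 -noout -enddate -in "$crt"))"
    status=1
  fi
done
exit $status
```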
The goal is not zero certificate work. The goal is no surprises. A 5-minute renewal during a planned upgrade window beats a 4 AM x509-induced outage by every operational metric.