Roll back to the previous revision in one command
The fastest rollback is one command:
kubectl rollout undo deployment/myapp
This tells the Deployment controller to switch back to the previous ReplicaSet. New pods spin up against the older pod template, old pods drain, and traffic shifts back as readiness probes pass. The command respects the same maxUnavailable/maxSurge settings as a normal rollout, so a rollback is itself a rolling update, not an instant snap. It is not idempotent, though: once the rollback lands, the revision you just left becomes the "previous" one, so running undo a second time flips you straight back to the broken template.
Watch the rollback complete with kubectl rollout status deployment/myapp. If the previous revision was healthy, the cluster is back on it within seconds to a minute, depending on replica count and probe timing.
That command line answers the production-down question. The rest of this article exists because the moment you ask "back to what, exactly?" or "what if the previous version was also broken?", rollout undo stops being self-explanatory.
How deployment revision history actually works
A Deployment does not save a copy of its YAML each time you update it. Kubernetes tracks rollouts at the ReplicaSet level. Each unique pod template (any change to image, env, labels, resources, probes) produces a new ReplicaSet, and the Deployment controller scales the new one up while it scales the old one down.
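You can see this mechanism directly by listing the ReplicaSets behind the Deployment; old revisions sit there scaled to zero. The label selector and ReplicaSet name below are illustrative, borrowed from the myapp examples in this article:

kubectl get replicaset -l app=myapp

# Each ReplicaSet records which Deployment revision it corresponds to in an annotation
kubectl get replicaset myapp-5d59d67564 \
  -o jsonpath='{.metadata.annotations.deployment\.kubernetes\.io/revision}'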
The Deployment's .spec.revisionHistoryLimit controls how many old ReplicaSets are retained for rollback. The default is 10, set when the apps/v1 API graduated to GA in Kubernetes 1.9 (the older apps/v1beta1 default was 2). Once the limit is hit, the oldest ReplicaSets are garbage-collected and you can no longer roll back to them.
kubectl rollout undo deployment/myapp does exactly one thing: it switches the active ReplicaSet to a previous one (by default the immediately previous one). It does not restore the YAML you applied, it does not reconfigure managed external resources, and it does not put your cluster back in the state it was in at the moment of the previous deploy. Anything outside the Deployment template (a Service rewrite, a ConfigMap edit, a database migration that ran via a Job) stays exactly as it is.
This matters when the rollback question is "back to what state?" The answer for kubectl rollout undo is: back to the previous pod template. Everything else is yours to coordinate.
For StatefulSets and DaemonSets the same idea applies, but the storage object is a ControllerRevision rather than a ReplicaSet. The behavior diverges in a few important ways, covered later in this article.
Inspect the rollout history before you undo
Before reverting, check what you would actually be reverting to:
kubectl rollout history deployment/myapp
Expected output:
deployment.apps/myapp
REVISION  CHANGE-CAUSE
2         <none>
3         <none>
4         <none>
The revision numbers are monotonically increasing. The highest is the current rollout. Numbers can skip if older ReplicaSets were pruned by revisionHistoryLimit.
To see exactly what is in a specific revision:
kubectl rollout history deployment/myapp --revision=3
This prints the full pod template for revision 3, including the image tag, environment variables, probes, and resources. Always confirm the image tag before rolling back, especially when several rollouts happened within a short window. If the previous revision was also broken (a common scenario in incidents that span multiple bad deploys), you need to roll back to a specific older revision rather than the immediately previous one.
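To compare against what is running right now, the current image can be read straight off the Deployment (deployment name as in the examples above):

kubectl get deployment myapp \
  -o jsonpath='{.spec.template.spec.containers[*].image}'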
If CHANGE-CAUSE is <none> for every row, the deployment never had the legacy --record flag set on the kubectl set image or kubectl apply calls that produced the rollouts. There is a way to fix this going forward; see Annotate deployments for meaningful history.
Roll back to a specific revision (--to-revision)
When the previous revision is itself broken, target an older one explicitly:
kubectl rollout undo deployment/myapp --to-revision=2
The same flag works on DaemonSets and StatefulSets. The behavior is identical: switch the active pod template to the one stored in revision 2, regardless of how many revisions ahead the current one is.
If the revision number you target no longer exists (pruned by revisionHistoryLimit), the command fails with an error like unable to find specified revision 2 in history. Increase revisionHistoryLimit proactively if your team makes frequent deploys and you want a longer rollback window. The trade-off is that each retained ReplicaSet keeps a small amount of metadata in etcd; for typical workloads the overhead is negligible until you go above 50 or so.
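Raising the limit is a one-line change to the Deployment spec, for example:

spec:
  revisionHistoryLimit: 25   # keep 25 old ReplicaSets available for rollback instead of the default 10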
The default for --to-revision is 0, which means "the previous revision" rather than literal zero. The official docs describe it as: "The revision to rollback to. Default to 0 (last revision)."
For batch operations across multiple Deployments that share a label, the -l flag works:
kubectl rollout undo deployment -l app.kubernetes.io/part-of=payments
This rolls back every Deployment matching the selector to its previous revision. Use it carefully in production. If only some Deployments need reverting, rolling back the others can introduce its own incidents.
Annotate deployments for meaningful history (replacing --record)
The CHANGE-CAUSE column in kubectl rollout history is populated from the kubernetes.io/change-cause annotation on each ReplicaSet. Older Kubernetes versions had a --record flag that set this annotation automatically with the command line that triggered the rollout. The flag has been deprecated since Kubernetes 1.22 and prints a deprecation warning when used; most teams should treat it as gone.
The replacement is to set the annotation yourself, ideally as part of your CI/CD pipeline:
kubectl annotate deployment/myapp \
  kubernetes.io/change-cause="release v2.4.1 - PR #1234 - hotfix login bug" \
  --overwrite
Run this command in the same pipeline step that updates the image. The annotation is then attached to the new ReplicaSet that Kubernetes creates, and kubectl rollout history shows a meaningful row instead of <none>. The --record flag only saved the literal command string anyway, which is far less useful than a release tag plus a PR link.
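A minimal sketch of such a pipeline step, with an illustrative registry path, container name, and release details:

# Update the image, then record why this rollout happened
kubectl set image deployment/myapp myapp=registry.example.com/myapp:v2.4.1
kubectl annotate deployment/myapp \
  kubernetes.io/change-cause="release v2.4.1 - PR #1234 - hotfix login bug" \
  --overwrite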
Reality check on what change-cause actually is: it is a free-text string. Kubernetes does not validate, parse, or use it for any logic. It is a hint for humans inspecting history. Pipelines that automate rollback decisions should reference the image tag or a Deployment label, not the change-cause annotation.
Monitor rollback progress with kubectl rollout status
Once you have triggered the rollback, watch it actually finish:
kubectl rollout status deployment/myapp --timeout=5m
Expected output during a rollback in progress:
Waiting for deployment "myapp" rollout to finish: 2 out of 4 new replicas have been updated...
Waiting for deployment "myapp" rollout to finish: 3 out of 4 new replicas have been updated...
deployment "myapp" successfully rolled out
The --timeout flag exits non-zero if the rollback does not complete in time. Wire that into your incident automation so the operator does not have to babysit the terminal.
In a second terminal, watch the pod transitions:
kubectl get pods -l app=myapp -w
You should see new pods (running the older image) reach 1/1 Running before the current-but-broken pods enter Terminating. If the rollback hangs with new pods stuck in Pending or CrashLoopBackOff, the previous revision is also unhealthy and you need to roll back to an even earlier one or fix forward. See When rollback does not fix the problem at the end of this article.
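A quick way to confirm which image the surviving pods are actually running (same illustrative label selector):

kubectl get pods -l app=myapp \
  -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[0].image'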
For deeper context on why a rollback is itself a rolling update and how to keep it zero-downtime, see Kubernetes rolling updates and zero-downtime deployments.
Set progressDeadlineSeconds, then act on it
Engineers often assume progressDeadlineSeconds triggers an automatic rollback when a rollout stalls. It does not. The official documentation states explicitly that Kubernetes only sets a status condition with reason ProgressDeadlineExceeded when the deadline is hit; the cluster takes no further action.
The default is 600 seconds (10 minutes). Configure it on the Deployment:
spec:
  progressDeadlineSeconds: 300  # mark rollout failed after 5 minutes of no progress
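One way to read the resulting condition, using a jsonpath filter on the Progressing condition:

kubectl get deployment myapp \
  -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
# Prints ProgressDeadlineExceeded once the deadline passes without progress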
To act on the condition, watch for it in your CI/CD or alerting system and call kubectl rollout undo from there. A minimal pipeline check looks like:
# After kubectl apply, give the rollout 5 minutes to complete
if ! kubectl rollout status deployment/myapp --timeout=5m; then
  echo "Rollout failed; rolling back"
  kubectl rollout undo deployment/myapp
  exit 1
fi
This is what most teams call automatic rollback in their CI/CD platform. The cluster does not do it; the pipeline does. Tools like Argo Rollouts extend this by tying rollback to metric analysis, but the underlying primitive is still a script that watches kubectl rollout status and runs undo on failure.
Preview a rollback safely with --dry-run=server
Before pulling the trigger in production, preview what will happen:
kubectl rollout undo deployment/myapp --dry-run=server
The --dry-run=server flag submits the request to the API server, runs admission and validation, but does not persist the change. The output is the resulting Deployment object as it would exist after the rollback, including the new (rolled-back) pod template. This is useful for confirming you are about to revert to the right image before committing to it during an incident.
There is also --dry-run=client, which just prints what kubectl would send without contacting the server. For rollbacks, server is more useful because it catches admission webhook rejections (PSP, OPA, Kyverno) that would block the real command.
StatefulSet and DaemonSet rollback differences
The same kubectl rollout undo command works on all three workload controllers, but the behavior differs in ways that catch people out.
DaemonSets
DaemonSets gained a RollingUpdate strategy in Kubernetes 1.6 and revision history support in 1.7. Rollback works through the same flow as Deployments:
kubectl rollout history daemonset/fluentd
kubectl rollout undo daemonset/fluentd --to-revision=2
The catch: DaemonSet revisions only roll forward. The Kubernetes documentation puts it directly: "DaemonSet revisions only roll forward. That is to say, after a rollback completes, the revision number (.revision field) of the ControllerRevision being rolled back to will advance. For example, if you have revision 1 and 2 in the system, and roll back from revision 2 to revision 1, the ControllerRevision with .revision: 1 will become .revision: 3."
This means rolling back to revision 1 produces revision 3, and the next rollback's --to-revision=1 will fail because revision 1 no longer exists at that number. If you script DaemonSet rollbacks, do not pin to specific revision numbers across operations; resolve the desired revision from history each time.
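One way to do that is to list the ControllerRevisions behind the DaemonSet and read the current numbers before picking a target (the label selector here is illustrative):

kubectl get controllerrevision -l app=fluentd --sort-by=.revision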
StatefulSets
StatefulSets gained RollingUpdate and revision history in Kubernetes 1.7. The pod template stored in the ControllerRevision is restored on kubectl rollout undo, but the practical implications are different from a Deployment because pods are ordinal-bound to persistent volumes.
Three things to keep in mind:
- Pods update in reverse ordinal order. A rolling update or rollback walks from the highest ordinal down to the lowest, waiting for each pod to be Ready before moving to the next. A 5-replica StatefulSet rollback takes roughly 5x as long as a Deployment rollback of the same size.
- PersistentVolumes do not roll back. The pod template is restored, but the underlying PVCs and PVs keep whatever data the previous (broken) version wrote. If the bug corrupted on-disk state, rollout undo is not enough; restore from backup or run a separate data-fix Job.
- Forced manual intervention is sometimes required. The Kubernetes docs warn: "When using Rolling Updates with the default Pod Management Policy (OrderedReady), it's possible to get into a broken state that requires manual intervention to repair." If a rolled-back StatefulSet pod cannot become Ready (because its data is incompatible with the older application version), the controller will not skip it. You may need to delete the pod manually, fix the data, and let the controller recreate it.
If you are partitioning a StatefulSet update with .spec.updateStrategy.rollingUpdate.partition, the partition value also applies to rollbacks. Pods with an ordinal below the partition are not touched, even by a rollback.
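As a sketch, the relevant part of the StatefulSet spec:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3   # ordinals 0-2 keep their current revision, even during a rollback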
GitOps: rollback through Argo CD or git revert
If your cluster is managed by Argo CD or another GitOps controller, kubectl rollout undo works but creates a problem: Argo CD will see the cluster as drifted from Git and, if selfHeal: true is set, will re-apply the broken manifest within seconds. Your rollback gets undone before it stabilizes.
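The relevant piece of the Argo CD Application spec looks like this (field names are Argo CD's; the values shown are illustrative):

spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true   # re-applies the Git manifest whenever live state drifts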
The right rollback path under GitOps is git revert:
git revert <commit-of-the-bad-deploy>
git push
Argo CD detects the new commit, syncs the cluster back to the previous manifest, and the rollback flows through the same pipeline as any other change. This preserves the audit trail and avoids fighting the controller.
For incident speed, two options:
- Temporarily disable auto-sync. In the Argo CD UI or with argocd app set <app> --sync-policy none, then run kubectl rollout undo. Re-enable auto-sync once the fix is committed to Git. Use this only when you cannot afford the time for a Git round-trip.
- Use the Argo CD UI's built-in rollback. The History and Rollback panel deploys a previous Git commit's manifest without changing Git itself. Argo CD will report the application as OutOfSync until you commit the revert. This is closer to a kubectl rollout undo but stays inside the GitOps tool.
Most teams find option 2 the right balance during incidents and follow up with a git revert PR once the immediate fire is out.
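If you do reach for option 1, the full sequence looks roughly like this; the application name is hypothetical:

# Pause auto-sync so Argo CD stops reverting the rollback
argocd app set payments-api --sync-policy none
kubectl rollout undo deployment/myapp
# ...commit the git revert, then restore auto-sync...
argocd app set payments-api --sync-policy automated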
When rollback does not fix the problem
Rollback assumes the previous revision was healthy. When that assumption breaks, common failure modes:
The previous revision is also broken. Inspect history (kubectl rollout history deployment/myapp --revision=N) and roll back further with --to-revision. If the breakage goes back further than revisionHistoryLimit retains, the only path is to fix forward. Increase revisionHistoryLimit once the incident is over so future incidents have a longer rollback window.
A database migration ran during the deploy. kubectl rollout undo reverts the pod template, not the database schema. If the new application version migrated the schema in a backward-incompatible way, the rolled-back pods will fail to start. Either roll the migration back (when it has a down step) or fix-forward with a hotfix that is compatible with the new schema.
External resources changed. A new IAM role, a Service that points to the wrong selector, an Ingress with a new path: kubectl rollout undo does not touch any of them. Audit what else was changed during the deploy and revert each piece intentionally.
The previous ReplicaSet was pruned. If revisionHistoryLimit is low and several deploys happened before you noticed the breakage, the target ReplicaSet may already be gone. The error unable to find specified revision N in history is final; you cannot recover a pruned ReplicaSet. Fix forward and raise the limit afterwards.
Stuck on ProgressDeadlineExceeded after the rollback. The previous version cannot start either. Inspect events with kubectl describe deployment/myapp and pod logs from the new (rolled-back) pods. The fix depends on the underlying error, but a stuck rollback is no different from a stuck deploy. See the troubleshooting section of Kubernetes rolling updates and zero-downtime deployments for common probe and resource issues that hang rollouts in either direction.
When rollback is genuinely impossible, the next best move is a focused hotfix that runs through the normal deploy path. Treat the rollback failure as data: it tells you the previous revision was not actually a good fallback, and the next incident postmortem should cover why.