The pod termination lifecycle
When a pod delete request arrives (rolling update, scale-in, node eviction, kubectl delete), Kubernetes runs a specific sequence of events. The critical detail: two parallel tracks start simultaneously, and their interaction is the root cause of most shutdown-related errors.
Track A (kubelet):
- The API server sets deletionTimestamp on the pod. The pod enters the Terminating state.
- The kubelet on the pod's node executes the preStop hook (if configured).
- After the preStop hook completes, the kubelet sends SIGTERM to PID 1 in each container.
- If the container does not exit within terminationGracePeriodSeconds, the kubelet sends SIGKILL.
Track B (network):
- The endpoint controller removes the pod from EndpointSlices.
- The API server propagates the change to every kube-proxy instance.
- Each kube-proxy updates its iptables/ipvs rules to stop routing to the pod.
- Ingress controllers refresh their upstream lists.
Both tracks start at the same moment. Neither waits for the other. That parallelism is the problem.
The endpoint removal race condition
Track A and Track B race against each other. If your application shuts down (Track A) before all kube-proxy instances finish updating their routing rules (Track B), requests still land on a pod that is no longer listening. The result: 502 Bad Gateway errors during rolling updates.
In small clusters, endpoint propagation might finish in under a second. In large clusters with 100+ nodes, or with ingress controllers that use polling instead of watches, it can take 10 to 30 seconds. During that window, traffic keeps arriving at a pod that is already shutting down.
Since Kubernetes v1.28, KEP-1669 (ProxyTerminatingEndpoints) is stable: when a Service has no ready endpoints left, kube-proxy falls back to routing to terminating endpoints that are still serving. This reduces black-holed traffic during rolling updates, but it also means Kubernetes may keep sending requests to a pod that is shutting down. Graceful shutdown in your application matters more, not less.
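You can observe this directly by inspecting EndpointSlice conditions while a pod terminates. A rough check (my-service is a placeholder; use your own Service name in the label selector):
# Terminating pods appear in the EndpointSlice with terminating: true; with
# ProxyTerminatingEndpoints they can remain serving: true at the same time.
kubectl get endpointslices -l kubernetes.io/service-name=my-service -o yaml | grep -B2 -A4 'conditions:'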
Step 1: add a preStop hook
The preStop hook runs before SIGTERM is sent. A short sleep inside it gives the endpoint propagation machinery time to finish removing the pod from all routing tables.
For Kubernetes 1.30+ (native sleep action, no shell binary required):
lifecycle:
preStop:
sleep:
seconds: 15 # delay SIGTERM until endpoints are updated
For Kubernetes < 1.30 (requires sleep binary in the container):
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
The sleep does not drain connections itself. It delays the moment SIGTERM arrives, buying time for kube-proxy and ingress controllers to stop routing new traffic to the pod.
Recommended sleep values:
| Cluster profile | Sleep duration |
|---|---|
| Small cluster (< 50 nodes) | 5 to 10 seconds |
| Medium cluster (50 to 100 nodes) | 10 to 15 seconds |
| Large cluster (100+ nodes) or external load balancers | 15 to 30 seconds |
There is no universal correct value. Measure endpoint propagation latency in your cluster during testing.
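One rough way to measure it (this only captures when the EndpointSlice object itself changes; kube-proxy and ingress controllers add their own lag on top). The service and pod names below are placeholders:
# Terminal 1: watch the EndpointSlice and note when the pod's IP disappears
kubectl get endpointslices -l kubernetes.io/service-name=my-service -w

# Terminal 2: record the time, then delete one pod
date +%T; kubectl delete pod <pod-name> --wait=false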
Step 2: set terminationGracePeriodSeconds
The grace period is a shared budget. It starts counting the moment the pod enters Terminating, and it covers both the preStop hook and the application's own shutdown time. When it expires, the kubelet sends SIGKILL.
The formula:
terminationGracePeriodSeconds >= preStop_duration + app_shutdown_duration + safety_buffer
For a stateless HTTP service with a 15-second preStop sleep, a 20-second drain window, and a 10-second safety buffer, that gives 15 + 20 + 10 = 45 seconds minimum:
spec:
terminationGracePeriodSeconds: 60
containers:
- name: app
image: my-app:v2.4.1
lifecycle:
preStop:
sleep:
seconds: 15
Recommended values by workload type:
| Workload | Grace period | Rationale |
|---|---|---|
| Stateless HTTP microservice | 45 to 60s | preStop sleep + request drain + buffer |
| WebSocket / long-poll service | 60 to 120s | Long-lived connections need time to drain |
| Batch worker / job | 120 to 300s | May be mid-chunk of large work |
| Stateful workload (database) | 60 to 120s | Flush writes, close WAL, replication handoff |
The default is 30 seconds. For most production workloads with a preStop sleep, that default is too low.
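To spot workloads still relying on the default, list the configured value for every Deployment (a <none> in the last column means the 30-second default applies):
kubectl get deployments -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,GRACE_PERIOD:.spec.template.spec.terminationGracePeriodSeconds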
Step 3: handle SIGTERM in your application
Kubernetes sends SIGTERM to PID 1 inside the container. Your application must catch it, stop accepting new connections, finish in-flight requests, close resources, and exit cleanly.
The PID 1 requirement
If your application is not PID 1, it will not receive the signal. This is a common trap with Dockerfiles:
# WRONG: shell form makes /bin/sh PID 1, and the shell does not forward SIGTERM to myapp
CMD myapp --flag
# CORRECT: myapp is PID 1
CMD ["myapp", "--flag"]
If you must use a shell entrypoint script, replace the shell process with exec:
#!/bin/sh
# setup steps here
exec myapp "$@" # exec replaces the shell with myapp (same PID)
For containers where neither option works, use tini as a minimal init process that forwards signals correctly:
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y tini
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["myapp"]
Go
Go's signal.NotifyContext provides an idiomatic way to tie SIGTERM to context cancellation. The standard library's http.Server.Shutdown stops accepting new connections and waits for in-flight requests to finish:
ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
defer stop()
srv := &http.Server{
Addr: ":8080",
Handler: mux,
    // Deliberately no BaseContext tied to the signal context: Shutdown lets
    // in-flight requests finish, and cancelling their contexts on SIGTERM
    // would abort them mid-drain.
}
go func() {
if err := srv.ListenAndServe(); err != http.ErrServerClosed {
log.Fatalf("ListenAndServe: %v", err)
}
}()
<-ctx.Done()
log.Println("SIGTERM received, draining...")
shutdownCtx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
defer cancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
log.Printf("Shutdown error: %v", err)
srv.Close() // force-close if drain takes too long
}
Set the Shutdown() timeout to less than terminationGracePeriodSeconds minus the preStop sleep duration.
Node.js
server.close() stops accepting new connections but does not close idle HTTP keep-alive connections. Load balancers maintain persistent connections to your pods, and those will never close on their own. You must destroy them explicitly:
const server = http.createServer(app);
let isShuttingDown = false;
// Track connections to handle keep-alive sockets
const connections = new Set();
server.on('connection', (socket) => {
connections.add(socket);
socket.on('close', () => connections.delete(socket));
});
function gracefulShutdown(signal) {
  if (isShuttingDown) return;
  isShuttingDown = true;
  console.log(`${signal} received, draining...`);
  // Stop accepting new connections; the callback fires once every socket has closed
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
  // After the drain window, destroy any sockets still open: idle keep-alive
  // connections never close on their own and would block server.close() forever
  setTimeout(() => {
    for (const socket of connections) {
      socket.destroy();
    }
  }, 20000);
  setTimeout(() => process.exit(1), 25000); // hard deadline
}
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
A common PID 1 mistake in Node.js: CMD ["npm", "start"] in the Dockerfile makes npm PID 1 instead of your application. npm does not forward SIGTERM. Use CMD ["node", "server.js"] directly.
Java (Spring Boot)
Spring Boot 2.3+ has built-in graceful shutdown support. In application.yml:
server:
shutdown: graceful
spring:
lifecycle:
timeout-per-shutdown-phase: 30s
management:
endpoint:
health:
probes:
enabled: true
When SIGTERM arrives, Spring stops accepting new requests, waits up to timeout-per-shutdown-phase for in-flight requests to finish, then shuts down the application context. The Actuator readiness endpoint automatically transitions to OUT_OF_SERVICE, which causes Kubernetes to stop routing traffic.
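With probes enabled, Actuator serves Kubernetes-style probe endpoints, so the Deployment's readinessProbe and livenessProbe can point straight at them. The paths below assume the default management port 8080; reach them via kubectl port-forward or from inside the container:
curl -s http://localhost:8080/actuator/health/readiness   # {"status":"UP"}, then {"status":"OUT_OF_SERVICE"} during shutdown
curl -s http://localhost:8080/actuator/health/liveness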
Python
For Flask/WSGI applications, register a SIGTERM handler that sets a shutdown flag:
import signal
is_shutting_down = False
def handle_sigterm(signum, frame):
    # Set the flag so the readiness endpoint below returns 503 and Kubernetes
    # stops routing new traffic. Do not exit here: an immediate sys.exit()
    # would drop in-flight requests before they finish draining.
    global is_shutting_down
    is_shutting_down = True
signal.signal(signal.SIGTERM, handle_sigterm)
@app.route('/healthz/ready')
def readiness():
if is_shutting_down:
return '', 503
return '', 200
The flag only handles readiness; the WSGI server itself is responsible for finishing in-flight requests and stopping the process. FastAPI with uvicorn handles SIGTERM natively via uvicorn's signal handling, and Gunicorn shuts down gracefully on SIGTERM, letting workers finish in-flight requests before they exit. With Gunicorn + uvicorn workers, verify that SIGTERM actually propagates from the Gunicorn master to the worker processes in your specific setup.
Python's signal handlers run only in the main thread. If your app uses multiprocessing or alternative async frameworks (gevent, trio), test signal propagation separately.
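Whatever the language, a simple way to confirm the handler actually runs in the cluster is to delete a single pod and watch its logs for the shutdown message (the pod name is a placeholder):
# Terminal 1: stream the pod's logs
kubectl logs -f <pod-name>

# Terminal 2: delete the pod and watch Terminal 1 for the drain/shutdown log line
kubectl delete pod <pod-name>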
Step 4: verify the result
After configuring preStop, grace period, and signal handling, you must test under load. The race condition only manifests when real traffic is in-flight during a pod restart.
Run a load test and a rolling restart simultaneously:
# Terminal 1: sustained load
hey -z 60s -c 10 http://my-service.default.svc.cluster.local/
# Terminal 2: trigger rolling restart while load runs
kubectl rollout restart deployment/my-app
Expected result: zero non-2xx responses in the load test output. If you see 502 or connection-refused errors, increase the preStop sleep or verify that your application is handling SIGTERM correctly.
To check preStop hook execution:
kubectl describe pod <pod-name>
# Look for "Normal Killing" and "Warning FailedPreStopHook" in events
To verify endpoint removal timing:
kubectl get endpointslices -w # watch endpoint updates during a restart
Complete configuration
Putting it all together. This Deployment configuration handles the endpoint race condition, gives the application time to drain, and ensures the grace period covers the full shutdown window:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0 # never remove a pod without a ready replacement
maxSurge: 1
template:
spec:
terminationGracePeriodSeconds: 60
containers:
- name: app
image: my-app:v2.4.1
ports:
- containerPort: 8080
lifecycle:
preStop:
sleep:
seconds: 15 # wait for endpoint propagation (K8s 1.30+)
readinessProbe:
httpGet:
path: /healthz/ready
port: 8080
periodSeconds: 5
failureThreshold: 2
livenessProbe:
httpGet:
path: /healthz/live
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
failureThreshold: 3
The time budget for this configuration:
t=0s Pod marked Terminating; endpoint removal starts (Track B)
t=0s preStop sleep begins (Track A)
t=15s preStop completes; SIGTERM sent to application
t=15s Application stops accepting connections, starts draining
t=40s Application exits (25s drain window)
t=60s SIGKILL would fire (never reached if shutdown succeeds)
Common problems
preStop hook fails silently. Hooks that depend on a binary not present in the image (like sleep on distroless images) fail with a FailedPreStopHook event. Check kubectl describe pod for this warning. On Kubernetes 1.30+, use the native sleep: action instead of exec.
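To scan a whole namespace for this instead of describing pods one by one, filter events by reason:
kubectl get events --field-selector reason=FailedPreStopHook --sort-by=.lastTimestamp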
Application does not receive SIGTERM. Almost always a PID 1 problem. Run kubectl exec <pod> -- ps aux and verify your application process is PID 1. If it is not, fix the Dockerfile or add tini.
Grace period too short. If terminationGracePeriodSeconds is less than the preStop duration plus the application's drain time, the kubelet sends SIGKILL before the application finishes. Exit code 137 in pod status confirms this.
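One way to pull just the exit code and reason out of the pod description, while the pod object is still visible:
kubectl describe pod <pod-name> | grep -E 'Exit Code|Reason'
# Exit Code 137 means the kubelet sent SIGKILL; 143 or 0 means the app exited on SIGTERM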
Nginx requires SIGQUIT. Nginx's default SIGTERM handler triggers a fast shutdown that drops connections. For graceful shutdown, send SIGQUIT via a preStop hook: command: ["/usr/sbin/nginx", "-s", "quit"].
Service mesh sidecar exits first. With Istio, both your application and the Envoy sidecar receive SIGTERM simultaneously. If Envoy exits first, outbound calls from your application fail during drain. Set EXIT_ON_ZERO_ACTIVE_CONNECTIONS=true on the sidecar to make Envoy wait for active connections to close.
When to escalate
If you still see 502 errors during deployments after implementing the configuration above, collect the following before asking for help:
- Kubernetes version (kubectl version)
- Cluster size (number of nodes)
- Ingress controller type and version
- kubectl describe pod output from a terminated pod showing events
- kubectl get endpointslices -w output captured during a rolling restart
- Load test results showing the error rate and timing
- Whether you run a service mesh and its version
- Pod spec showing terminationGracePeriodSeconds, preStop hook, and probe configuration
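Most of that can be captured in one pass. The resource names and output files below are placeholders for your own:
kubectl version > debug-info.txt
kubectl get nodes --no-headers | wc -l >> debug-info.txt        # cluster size
kubectl describe pod <terminated-pod-name> > pod-events.txt     # termination events
kubectl get deployment my-app -o yaml > deployment.yaml         # grace period, preStop, probes
# capture EndpointSlice changes here while triggering a rolling restart in another terminal
kubectl get endpointslices -l kubernetes.io/service-name=my-service -w | tee endpointslices.log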