Kubernetes liveness, readiness, and startup probes: configuration guide

Adding probes to your first Deployment is mostly a series of decisions: which probe types do I need, which mechanism fits my service, and what starter values make sense for my runtime. This guide walks through each of those decisions in order, gives you a complete Deployment manifest at the end, and points to a deeper troubleshooting article when something misbehaves.

What you will have at the end

A complete apps/v1 Deployment manifest with the right combination of liveness, readiness, and startup probes for your service, populated with starter values that match your runtime and ready to kubectl apply. The configuration is conservative on purpose: it errs on the side of stability, then points you at the troubleshooting article when you need to tune.

Prerequisites

  • kubectl connected to a Kubernetes 1.27+ cluster (1.27 is when the gRPC probe went GA; earlier versions are fine if you skip gRPC).
  • An application container image that exposes a health endpoint, or that you can wrap with a small endpoint. If your app does not expose anything yet, the endpoint design section in the troubleshooting article covers what each endpoint should and should not do.
  • Familiarity with Deployment specs at a beginner level: apiVersion, kind, spec.template.spec.containers.
  • Some idea of how long your application takes to boot. A rough number is fine; the decision tree below covers both fast and slow runtimes.

Which probe do you need? A decision tree

Three probes, three different jobs. Most first-time Deployments do not need all three. Walk this list top to bottom and add only what applies.

1. Do you want to send traffic to this pod from a Service? Almost always yes. Add a readiness probe. Without one, Kubernetes routes traffic to a pod the moment its container starts, including during boot when the application cannot yet serve requests.

2. Can the application get into a state where it is alive but stuck (deadlock, frozen event loop, exhausted thread pool) and only a restart fixes it? If yes, add a liveness probe. If the process always crashes itself when it goes wrong, you do not need one. Kubernetes will restart a crashed container regardless.

3. Does your container take longer to boot than failureThreshold x periodSeconds of your liveness probe (default: 30 seconds)? If yes, add a startup probe. It blocks liveness and readiness until boot is complete, so you do not need to inflate initialDelaySeconds on the liveness probe to compensate.

A common mistake is adding all three probes by reflex. For a Go web service that boots in two seconds, a readiness probe alone is enough. For a Spring Boot application that needs 90 seconds, all three probes earn their keep.

| Your situation | Readiness | Liveness | Startup |
|---|---|---|---|
| Stateless HTTP API, fast boot (< 10s), self-crashing on errors | Yes | Optional | No |
| Long-running service that can deadlock | Yes | Yes | No |
| Slow-booting JVM, .NET, or large Python/Ruby app | Yes | Yes | Yes |
| Worker pod with no incoming traffic | No | Optional | If slow boot |

Probe types: HTTP, TCP, exec, gRPC

Each probe definition picks exactly one mechanism. The right choice is almost always determined by the protocol your service speaks.

HTTP probe (httpGet)

The kubelet sends an HTTP GET request. Status codes 200 through 399 count as success; anything else is a failure. This is the right choice for any service that already speaks HTTP.

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3

Starter values: leave defaults alone. Bump timeoutSeconds from 1 to 2 or 3 if your endpoint touches a database. The kubelet sends User-Agent: kube-probe/<version> by default, and you can add custom headers under httpHeaders if your endpoint requires them, as in the variant below.
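
For example, a variant for an endpoint that checks a database and expects a custom header might look like this. The header name X-Probe-Token and its value are illustrative, not a Kubernetes convention; use whatever your endpoint actually expects.

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
      - name: X-Probe-Token      # illustrative header; match your endpoint's requirements
        value: local-probe
  periodSeconds: 10
  timeoutSeconds: 3              # raised from the default 1s because the endpoint queries a database
  failureThreshold: 3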

TCP probe (tcpSocket)

The kubelet opens a TCP connection. Success means the connection was accepted; it does not verify that the application is processing requests. Use this for databases, message brokers, and other non-HTTP services.

readinessProbe:
  tcpSocket:
    port: 5432
  periodSeconds: 10
  failureThreshold: 3

Starter values: same defaults. The biggest gotcha is that a TCP probe will pass on a half-broken application that accepts connections but never responds.

Exec probe (exec)

The kubelet runs a command inside the container. Exit code 0 is success, anything else is failure. Use this when no network endpoint is available, for example a worker that writes a heartbeat file.

livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy
  periodSeconds: 30
  failureThreshold: 3

Starter values: use a longer periodSeconds (30 instead of 10) because exec probes spawn a child process every time. If your container's PID 1 is your application (not tini or dumb-init), short intervals can leak zombies and exhaust the node's PID space. Prefer HTTP or TCP whenever the application supports it.

gRPC probe (grpc)

The kubelet calls the standard gRPC Health Checking Protocol (grpc.health.v1.Health/Check). GA since Kubernetes 1.27, no feature gate required.

readinessProbe:
  grpc:
    port: 50051
  periodSeconds: 10
  failureThreshold: 3

Starter values: defaults work. Limitations to know up front:

  • Port must be numeric. Named ports are not supported.
  • No client certificate support. Services that require mTLS for the health endpoint cannot use the built-in probe.
  • No server certificate validation. The probe ignores certificate errors entirely.
  • No service-name chaining. You cannot point one probe at a multi-service health aggregator.

If any of those are dealbreakers, fall back to wrapping a small HTTP endpoint or running grpc-health-probe as an exec command.
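
As a sketch of the exec fallback, assuming the grpc-health-probe binary has been copied into the image at /bin/grpc_health_probe and the server listens on 50051:

readinessProbe:
  exec:
    command:
      - /bin/grpc_health_probe       # binary must be baked into the image at build time
      - -addr=localhost:50051        # address of the gRPC server inside the container
  periodSeconds: 10
  failureThreshold: 3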

Starter values by runtime

These are conservative starting points, not optimal values. They aim to keep your first deploy stable so you can tune from real data instead of guessing in the dark.

Fast-starting runtimes (Go, Rust, statically linked binaries, Node.js)

Boot time is typically under 5 seconds. A readiness probe alone is usually enough; add liveness only if you have a specific deadlock concern.

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 6   # restart only after 60s of consecutive failures

No startup probe needed. Even the default failureThreshold x periodSeconds budget of 30 seconds covers a sub-five-second cold start comfortably.

Interpreted runtimes (Python, Ruby, PHP)

Boot time for Django, Rails, or Laravel is typically 10 to 40 seconds, depending on the number of installed packages and how much work happens at startup. A startup probe is recommended once the cold start crosses the 30-second mark.

startupProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 10
  failureThreshold: 12   # 12 x 10 = 120s, comfortably above typical Django/Rails cold start
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 6

The startup probe absorbs migration runs, cache warm-up, and lazy module imports.

JVM and .NET runtimes (Spring Boot, Quarkus, ASP.NET Core)

Boot time can be 30 to 180 seconds depending on dependency injection, classpath scanning, and JIT warm-up. A startup probe is essentially required.

startupProbe:
  httpGet:
    path: /actuator/health/liveness   # Spring Boot Actuator path; ASP.NET Core apps typically map /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # 30 x 10 = 300s, covers up to 5 minutes of cold start
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 15
  timeoutSeconds: 3
  failureThreshold: 4   # restart after a full minute of failure, not 30s

The 5-minute startup window is generous on purpose. JVM cold start under CPU pressure can stretch unpredictably, and a probe that kills the pod before it finishes booting produces a CrashLoopBackOff that masks the real symptom.

If your application uses Spring Boot Actuator's liveness and readiness groups, use the dedicated paths shown above. Otherwise, expose two endpoints yourself, one that returns 200 once the application has finished booting (liveness) and one that also checks critical dependencies (readiness).
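
If you are on Spring Boot and the probe endpoints are not exposed yet, a minimal application.yaml sketch looks roughly like this. The db entry assumes a DataSource health indicator is on your classpath; adjust the include list to your actual dependencies.

management:
  endpoint:
    health:
      probes:
        enabled: true                 # exposes /actuator/health/liveness and /actuator/health/readiness
      group:
        readiness:
          include: readinessState,db  # readiness also checks the database; liveness stays dependency-free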

Reference: every probe parameter at a glance

Six parameters shape probe behavior. Defaults are sensible for fast services; only the timeouts typically need tuning.

| Parameter | Default | What it controls | Tuning hint |
|---|---|---|---|
| initialDelaySeconds | 0 | Seconds before the first probe fires | Prefer a startup probe over inflating this |
| periodSeconds | 10 | Seconds between probe executions | Lower for readiness (faster traffic gating), higher for liveness |
| timeoutSeconds | 1 | Seconds the kubelet waits for a response | Raise to 2-3 if the endpoint touches a dependency |
| successThreshold | 1 | Consecutive successes to mark healthy | Must be 1 for liveness and startup probes (API enforces) |
| failureThreshold | 3 | Consecutive failures before action | Raise for liveness to avoid premature restarts |
| terminationGracePeriodSeconds | Inherits pod-level value | Grace period for probe-triggered restarts | Liveness/startup only; rejected on readiness (1.25+) |

All defaults are documented in the official Kubernetes probe reference. Two API constraints are worth knowing in advance:

  • successThreshold for liveness and startup probes must be 1. The API server rejects any other value.
  • Probe-level terminationGracePeriodSeconds (GA in 1.25) only applies to liveness and startup. Setting it on a readiness probe is rejected by the API server, since readiness failure does not cause termination in the first place.
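
To see the parameters in context, here is a liveness probe with every field set explicitly. The values are illustrative, not recommendations:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0              # prefer a startup probe over raising this
  periodSeconds: 10                   # one attempt every 10 seconds
  timeoutSeconds: 2                   # each attempt fails if no response within 2 seconds
  successThreshold: 1                 # must be 1 for liveness and startup probes
  failureThreshold: 3                 # restart after 3 consecutive failures (~30 seconds)
  terminationGracePeriodSeconds: 30   # probe-level override, liveness/startup only (1.25+)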

Version-dependent behavior across cluster versions

Probe features land in tranches. If you are not sure what your cluster supports, run kubectl version and check the rows below.

| Feature | First available | Stable | Notes |
|---|---|---|---|
| Liveness, readiness probes | All supported versions | All | Foundation feature |
| Startup probe | 1.16 (alpha) | 1.20 (GA) | Beta in 1.18 |
| gRPC probe | 1.23 (alpha, off by default) | 1.27 (GA) | Beta in 1.24 |
| Probe-level terminationGracePeriodSeconds | 1.22 (alpha) | 1.25 (GA) | Liveness and startup only; readiness rejected |

If you target a cluster older than 1.27 and want gRPC, you have two choices: enable the GRPCContainerProbe feature gate (only viable on a self-managed cluster) or fall back to running grpc-health-probe as an exec command.

What probes are NOT for

Probes are widely misused as a generic monitoring layer. They are not.

Liveness probes are not for checking external dependencies. This is the single most damaging anti-pattern. If every pod's liveness check pings the database and the database has a brief outage, every pod restarts at the same time. The cluster goes from "degraded database" to "degraded database plus zero healthy application pods". The Google SRE Book is explicit that monitoring should not chain dependencies into alerting logic; the same caution applies to probes that take destructive action.

Probes are not a substitute for metrics. A liveness probe tells the kubelet "kill this pod"; it does not tell you why. Always pair probes with proper observability. The Kubernetes OpenTelemetry observability article walks through what to capture instead.

Readiness probes are not "startup-only". They run for the entire lifetime of the pod. A pod that was Ready five minutes ago can become not-Ready instantly when the probe starts failing. Treat them as "should this pod receive traffic right now?", not "has this pod finished booting?".

Probes are not application-level health monitoring. They check whether the kubelet should take a specific action: restart the container, or stop sending traffic. Anything more nuanced (queue depth, business-level correctness, downstream SLO breaches) belongs in metrics and alerting, not in probes.

Putting it together: complete Deployment manifest

A production-ready manifest for a Spring Boot service. Adjust the probe section to match the starter values for your runtime from the section above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels:
    app: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: registry.internal/orders-api:1.7.4   # Spring Boot 3.4 image
          ports:
            - name: http
              containerPort: 8080
          # Startup probe: blocks the others until the JVM is up
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: http
            periodSeconds: 10
            failureThreshold: 30          # 5 minutes worst-case cold start
          # Readiness probe: gate traffic on dependency health
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: http
            periodSeconds: 10
            timeoutSeconds: 3             # /readiness queries the database
            failureThreshold: 3
          # Liveness probe: last-resort restart on deadlock
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: http
            periodSeconds: 15
            timeoutSeconds: 3
            failureThreshold: 4           # restart only after ~60s of failure
            terminationGracePeriodSeconds: 10   # fast restart on deadlock (1.25+)
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              memory: "1Gi"

Apply it and verify the pods reach Ready without restart events:

kubectl apply -f deployment.yaml
kubectl rollout status deployment/orders-api --timeout=5m
kubectl describe pod -l app=orders-api | grep -A 3 "Events:"

Expected output: deployment "orders-api" successfully rolled out, and the events section should show only normal lifecycle entries (Scheduled, Pulled, Created, Started) with no Unhealthy warnings. If you see Liveness probe failed events while the application is still booting, the startup probe is misconfigured: raise failureThreshold until the boot window comfortably fits.
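
If the describe output is noisy, one way to surface probe failures across the namespace is to filter events by their reason:

kubectl get events --field-selector reason=Unhealthy --sort-by=.lastTimestamp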

This manifest pairs naturally with the rollout settings in Kubernetes rolling updates and zero-downtime deployments, which adds maxUnavailable: 0, a preStop hook, and a PodDisruptionBudget for full production hardening.
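
As a preview of that hardening, the Deployment strategy block those settings build on looks roughly like this; maxSurge: 1 is an assumption here, and the preStop hook and PodDisruptionBudget are covered in the linked article.

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count during a rollout
      maxSurge: 1         # add at most one extra pod at a time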

What to do when probes misbehave

If your pods enter CrashLoopBackOff, get restarted under load, or simultaneously go not-Ready during a transient dependency hiccup, the symptoms map to specific configuration mistakes. The companion article How to configure Kubernetes health probes covers each in depth: the endpoint-design rules, the five common misconfiguration patterns, the diagnostic commands, and the escalation checklist when a probe keeps failing despite a configuration that looks correct.

For the related case where pods are 1/1 Running but get killed during deploys, the issue is not the probe but the endpoint-removal race condition during termination. That is covered by Kubernetes graceful shutdown.

