What you will have at the end
A Deployment manifest with the right combination of liveness, readiness, and startup probes for your service, populated with starter values that match your runtime, and wired into a complete apps/v1 Deployment spec you can kubectl apply. The configuration is conservative on purpose: it errs on the side of stability, then points you at the troubleshooting article when you need to tune.
Prerequisites
- kubectl connected to a Kubernetes 1.27+ cluster (1.27 is when the gRPC probe went GA; earlier versions are fine if you skip gRPC).
- An application container image that exposes a health endpoint, or that you can wrap with a small endpoint. If your app does not expose anything yet, the endpoint design section in the troubleshooting article covers what each endpoint should and should not do.
- Familiarity with Deployment specs at a beginner level: apiVersion, kind, spec.template.spec.containers.
- Some idea of how long your application takes to boot. A rough number is fine; the decision tree below covers both fast and slow runtimes.
Which probe do you need? A decision tree
Three probes, three different jobs. Most first-time Deployments do not need all three. Walk this list top to bottom and add only what applies.
1. Do you want to send traffic to this pod from a Service? Almost always yes. Add a readiness probe. Without one, Kubernetes routes traffic to a pod the moment its container starts, including during boot when the application cannot yet serve requests.
2. Can the application get into a state where it is alive but stuck (deadlock, frozen event loop, exhausted thread pool) and only a restart fixes it? If yes, add a liveness probe. If the process always crashes itself when it goes wrong, you do not need one. Kubernetes will restart a crashed container regardless.
3. Does your container take longer to boot than failureThreshold x periodSeconds of your liveness probe (default: 30 seconds)? If yes, add a startup probe. It blocks liveness and readiness until boot is complete, so you do not need to inflate initialDelaySeconds on the liveness probe to compensate.
A common mistake is adding all three probes by reflex. For a Go web service that boots in two seconds, a readiness probe alone is enough. For a Spring Boot application that needs 90 seconds, all three probes earn their keep.
| Your situation | Readiness | Liveness | Startup |
|---|---|---|---|
| Stateless HTTP API, fast boot (< 10s), self-crashing on errors | Yes | Optional | No |
| Long-running service that can deadlock | Yes | Yes | No |
| Slow-booting JVM, .NET, or large Python/Ruby app | Yes | Yes | Yes |
| Worker pod with no incoming traffic | No | Optional | If slow boot |
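Whatever combination you land on, the probes are declared side by side on the container spec. A minimal skeleton for orientation (the image and values are placeholders; runtime-specific examples follow below):

containers:
- name: app
  image: registry.example.com/app:1.0      # placeholder image
  startupProbe:                            # gates liveness and readiness until it first succeeds
    httpGet: { path: /healthz, port: 8080 }
    failureThreshold: 30
  readinessProbe:                          # gates Service traffic for the pod's whole lifetime
    httpGet: { path: /readyz, port: 8080 }
  livenessProbe:                           # restarts the container when it fails
    httpGet: { path: /healthz, port: 8080 }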
Probe types: HTTP, TCP, exec, gRPC
Each probe definition picks exactly one mechanism. The right choice is almost always determined by the protocol your service speaks.
HTTP probe (httpGet)
The kubelet sends an HTTP GET request. Status codes 200 through 399 count as success; anything else is a failure. This is the right choice for any service that already speaks HTTP.
readinessProbe:
httpGet:
path: /healthz
port: 8080
periodSeconds: 10
failureThreshold: 3
Starter values: leave defaults alone. Bump timeoutSeconds from 1 to 2 or 3 if your endpoint touches a database. The kubelet sends User-Agent: kube-probe/<version> by default, and you can add custom headers under httpHeaders if your endpoint requires them, as sketched below.
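A sketch of the header mechanism (the header name and value here are illustrative, not a requirement of any real endpoint):

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: X-Probe        # hypothetical header your endpoint might filter on
      value: kubelet
  periodSeconds: 10
  timeoutSeconds: 3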
TCP probe (tcpSocket)
The kubelet opens a TCP connection. Success means the connection was accepted; it does not verify that the application is processing requests. Use this for databases, message brokers, and other non-HTTP services.
readinessProbe:
tcpSocket:
port: 5432
periodSeconds: 10
failureThreshold: 3
Starter values: same defaults. The biggest gotcha is that a TCP probe will pass on a half-broken application that accepts connections but never responds.
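For PostgreSQL specifically (the port 5432 example above), you can close that gap with an exec probe running pg_isready, which checks that the server actually answers the protocol handshake. A sketch, assuming an image that ships the binary, such as the official postgres image:

readinessProbe:
  exec:
    # exit code 0 only when the server is accepting connections
    command: ["pg_isready", "-U", "postgres"]
  periodSeconds: 10
  failureThreshold: 3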
Exec probe (exec)
The kubelet runs a command inside the container. Exit code 0 is success, anything else is failure. Use this when no network endpoint is available, for example a worker that writes a heartbeat file.
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
periodSeconds: 30
failureThreshold: 3
Starter values: use a longer periodSeconds (30 instead of 10) because exec probes spawn a child process every time. If your container's PID 1 is your application (not tini or dumb-init), short intervals can leak zombies and exhaust the node's PID space. Prefer HTTP or TCP whenever the application supports it.
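Note that cat /tmp/healthy only proves the file exists; a stale heartbeat still passes. A common refinement is to check freshness instead, sketched here under the assumption that your worker touches /tmp/healthy at least once a minute and that the image ships a POSIX shell and find:

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # exit 0 only if the heartbeat file was modified within the last minute
    - test -n "$(find /tmp/healthy -mmin -1 2>/dev/null)"
  periodSeconds: 30
  failureThreshold: 3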
gRPC probe (grpc)
The kubelet calls the standard gRPC Health Checking Protocol (grpc.health.v1.Health/Check). GA since Kubernetes 1.27, no feature gate required.
readinessProbe:
grpc:
port: 50051
periodSeconds: 10
failureThreshold: 3
Starter values: defaults work. Limitations to know up front:
- Port must be numeric. Named ports are not supported.
- No client certificate support. Services that require mTLS for the health endpoint cannot use the built-in probe.
- No server certificate validation. The probe ignores certificate errors entirely.
- No service-name chaining. You cannot point one probe at a multi-service health aggregator.
If any of those are dealbreakers, fall back to wrapping a small HTTP endpoint or running grpc-health-probe as an exec command.
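The exec fallback looks like this; it assumes you have copied the grpc-health-probe binary into your image, and the path and port shown are illustrative:

livenessProbe:
  exec:
    # grpc-health-probe exits 0 when the Health/Check RPC reports SERVING
    command: ["/bin/grpc_health_probe", "-addr=localhost:50051"]
  periodSeconds: 10
  failureThreshold: 3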
Recommended starting values by runtime
These are conservative starting points, not optimums. They aim to keep your first deploy stable so you can tune from real data instead of guessing in the dark.
Fast-starting runtimes (Go, Rust, statically linked binaries, Node.js)
Boot time is typically under 5 seconds. A readiness probe alone is usually enough; add liveness only if you have a specific deadlock concern.
readinessProbe:
httpGet:
path: /readyz
port: 8080
periodSeconds: 5
timeoutSeconds: 2
failureThreshold: 3
livenessProbe:
httpGet:
path: /healthz
port: 8080
periodSeconds: 10
timeoutSeconds: 2
failureThreshold: 6 # restart only after 60s of consecutive failures
No startup probe needed. The default liveness window of failureThreshold x periodSeconds = 3 x 10 = 30 seconds covers cold starts comfortably.
Interpreted runtimes (Python, Ruby, PHP)
Boot time is typically 10 to 40 seconds for Django, Rails, or Laravel, depending on how many packages load at startup. A startup probe is recommended once cold start crosses the 30-second mark.
startupProbe:
httpGet:
path: /readyz
port: 8080
periodSeconds: 10
failureThreshold: 12 # 12 x 10 = 120s, comfortably above typical Django/Rails cold start
readinessProbe:
httpGet:
path: /readyz
port: 8080
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
livenessProbe:
httpGet:
path: /healthz
port: 8080
periodSeconds: 10
timeoutSeconds: 2
failureThreshold: 6
The startup probe absorbs migration runs, cache warm-up, and lazy module imports.
JVM and .NET runtimes (Spring Boot, Quarkus, ASP.NET Core)
Boot time can be 30 to 180 seconds depending on dependency injection, classpath scanning, and JIT warm-up. A startup probe is essentially required.
startupProbe:
httpGet:
    path: /actuator/health/liveness # Spring Boot Actuator path; ASP.NET Core commonly maps /healthz
port: 8080
periodSeconds: 10
failureThreshold: 30 # 30 x 10 = 300s, covers up to 5 minutes of cold start
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
periodSeconds: 15
timeoutSeconds: 3
failureThreshold: 4 # restart after a full minute of failure, not 30s
The 5-minute startup window is generous on purpose. JVM cold start under CPU pressure can stretch unpredictably, and a probe that kills the pod before it finishes booting produces a CrashLoopBackOff that masks the real symptom.
If your application uses Spring Boot Actuator's liveness and readiness groups, use the dedicated paths shown above. Otherwise, expose two endpoints yourself, one that returns 200 once the application has finished booting (liveness) and one that also checks critical dependencies (readiness).
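For Spring Boot, a sketch of the explicit opt-in in application.yaml (Spring Boot 3.x; Actuator enables the probe groups automatically when it detects Kubernetes, and the db entry below is an assumption about which health indicators you consider critical):

management:
  endpoint:
    health:
      probes:
        enabled: true                    # exposes /actuator/health/liveness and /readiness
      group:
        readiness:
          include: readinessState,db     # add critical dependency checks to readiness only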
Reference: every probe parameter at a glance
Six parameters shape probe behavior. Defaults are sensible for fast services; only the timeouts typically need tuning.
| Parameter | Default | What it controls | Tuning hint |
|---|---|---|---|
| initialDelaySeconds | 0 | Seconds before the first probe fires | Prefer a startup probe over inflating this |
| periodSeconds | 10 | Seconds between probe executions | Lower for readiness (faster traffic gating), higher for liveness |
| timeoutSeconds | 1 | Seconds the kubelet waits for a response | Raise to 2-3 if the endpoint touches a dependency |
| successThreshold | 1 | Consecutive successes to mark healthy | Must be 1 for liveness and startup probes (API enforces) |
| failureThreshold | 3 | Consecutive failures before action | Raise for liveness to avoid premature restarts |
| terminationGracePeriodSeconds | Inherits pod-level | Override for probe-triggered restarts | Liveness/startup only; rejected on readiness |
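A quick worked example of how the knobs combine: with periodSeconds: 10, failureThreshold: 3, and timeoutSeconds: 3, the kubelet must see three consecutive failures before it acts, so the reaction time is roughly failureThreshold x periodSeconds = 30 seconds from the first failed probe, plus up to one more periodSeconds depending on where in the cycle the failure starts. For a readiness probe that is up to roughly 40 seconds of traffic still reaching a broken pod, which is why the fast-runtime example above drops periodSeconds to 5.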
All defaults are documented in the official Kubernetes probe reference. Two API constraints are worth knowing in advance:
- successThreshold for liveness and startup probes must be 1. The API server rejects any other value.
- Probe-level terminationGracePeriodSeconds (on by default since 1.25) only applies to liveness and startup. Setting it on a readiness probe is rejected by the API server, since readiness failure does not cause termination in the first place.
Probe support across cluster versions
Probe features land in tranches. If you are not sure what your cluster supports, run kubectl version and check the rows below.
| Feature | First available | Stable | Notes |
|---|---|---|---|
| Liveness, readiness probes | All supported versions | All | Foundation feature |
| Startup probe | 1.16 (alpha) | 1.20 (GA) | Beta in 1.18 |
| gRPC probe | 1.23 (alpha, off by default) | 1.27 (GA) | Beta in 1.24 |
| Probe-level terminationGracePeriodSeconds | 1.21 (alpha) | 1.28 (GA) | On by default since 1.25; liveness and startup only, readiness rejected |
If you target a cluster older than 1.27 and want gRPC, you have two choices: rely on the GRPCContainerProbe feature gate (beta and on by default from 1.24; on 1.23 it must be enabled explicitly, which is only viable on a self-managed cluster) or fall back to running grpc-health-probe as an exec command.
What probes are NOT for
Probes are widely misused as a generic monitoring layer. They are not.
Liveness probes are not for checking external dependencies. This is the single most damaging anti-pattern. If every pod's liveness check pings the database and the database has a brief outage, every pod restarts at the same time. The cluster goes from "degraded database" to "degraded database plus zero healthy application pods". The Google SRE Book is explicit that monitoring should not chain dependencies into alerting logic; the same caution applies to probes that take destructive action.
Probes are not a substitute for metrics. A liveness probe tells the kubelet "kill this pod"; it does not tell you why. Always pair probes with proper observability. The Kubernetes OpenTelemetry observability article walks through what to capture instead.
Readiness probes are not "startup-only". They run for the entire lifetime of the pod. A pod that was Ready five minutes ago can become not-Ready instantly when the probe starts failing. Treat them as "should this pod receive traffic right now?", not "has this pod finished booting?".
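You can watch this happen live with a label selector (substitute your own app label):

kubectl get pods -l app=orders-api -w

The READY column flips from 1/1 to 0/1 as soon as the readiness probe fails failureThreshold times, and back once it passes successThreshold times.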
Probes are not application-level health monitoring. They check whether the kubelet should take a specific action: restart the container, or stop sending traffic. Anything more nuanced (queue depth, business-level correctness, downstream SLO breaches) belongs in metrics and alerting, not in probes.
Putting it together: complete Deployment manifest
A production-ready manifest for a Spring Boot service. Adjust the probe section to match the runtime block from the previous section.
apiVersion: apps/v1
kind: Deployment
metadata:
name: orders-api
labels:
app: orders-api
spec:
replicas: 3
selector:
matchLabels:
app: orders-api
template:
metadata:
labels:
app: orders-api
spec:
terminationGracePeriodSeconds: 60
containers:
- name: app
image: registry.internal/orders-api:1.7.4 # Spring Boot 3.4 image
ports:
- name: http
containerPort: 8080
# Startup probe: blocks the others until the JVM is up
startupProbe:
httpGet:
path: /actuator/health/liveness
port: http
periodSeconds: 10
failureThreshold: 30 # 5 minutes worst-case cold start
# Readiness probe: gate traffic on dependency health
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: http
periodSeconds: 10
timeoutSeconds: 3 # /readiness queries the database
failureThreshold: 3
# Liveness probe: last-resort restart on deadlock
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: http
periodSeconds: 15
timeoutSeconds: 3
failureThreshold: 4 # restart only after ~60s of failure
terminationGracePeriodSeconds: 10 # fast restart on deadlock (1.25+)
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
memory: "1Gi"
Apply it and verify the pods reach Ready without restart events:
kubectl apply -f deployment.yaml
kubectl rollout status deployment/orders-api --timeout=5m
kubectl describe pod -l app=orders-api | grep -A 3 "Events:"
Expected output: deployment "orders-api" successfully rolled out, and the events section should show probe successes only, no Unhealthy entries. If you see Liveness probe failed events while the application is still booting, the startup probe is misconfigured: raise failureThreshold until the boot window comfortably fits.
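If you want the probe-failure history for the whole namespace rather than one pod, the kubelet's Unhealthy events aggregate it (events are retained for about an hour by default):

kubectl get events --field-selector reason=Unhealthy --sort-by=.lastTimestamp

Each entry names the pod, the probe type, and the failure message the kubelet recorded.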
This manifest pairs naturally with the rollout settings in Kubernetes rolling updates and zero-downtime deployments, which adds maxUnavailable: 0, a preStop hook, and a PodDisruptionBudget for full production hardening.
What to do when probes misbehave
If your pods enter CrashLoopBackOff, get restarted under load, or simultaneously go not-Ready during a transient dependency hiccup, the symptoms map to specific configuration mistakes. The companion article How to configure Kubernetes health probes covers each in depth: the endpoint-design rules, the five common misconfiguration patterns, the diagnostic commands, and the escalation checklist when a probe keeps failing despite a configuration that looks correct.
For the related case where pods are 1/1 Running but get killed during deploys, the issue is not the probe but the endpoint-removal race condition during termination. That is covered by Kubernetes graceful shutdown.