Kubeasy
Concepts

Understanding Kubernetes Probes: Liveness, Readiness & Startup

Deep dive into Kubernetes health probes. When to use each type, configuration examples, and common mistakes to avoid.

Paul Brissaud
6 min read
#pods#beginner

Your pod is Running but traffic never reaches it. Or worse, Kubernetes keeps restarting a perfectly healthy container. Both problems usually come down to the same thing: misconfigured probes. Probes are how Kubernetes knows whether your application is alive, ready to serve, and has finished starting up. Get them right and your deployments are resilient. Get them wrong and you'll spend hours chasing phantom crashes.

Quick Answer

Kubernetes has three types of probes:

  • Startup probe — has the application finished initializing? On failure, the container is restarted.
  • Liveness probe — is the container still functioning? On failure, the container is restarted.
  • Readiness probe — can the container serve traffic right now? On failure, the pod is removed from Service endpoints.

The key insight: liveness and readiness have different consequences. Liveness kills the container. Readiness just stops sending it traffic. Mixing them up is the most common probe mistake.


How Probes Work

The kubelet on each node runs the probes at regular intervals. Each probe performs a check and gets one of three results:

  • Success — the check passed
  • Failure — the check failed
  • Unknown — the check didn't complete (treated as failure)

Every probe type supports four check mechanisms:

HTTP GET

The kubelet sends an HTTP GET request. Any status code between 200 and 399 is a success.

httpGet:
  path: /health
  port: 8080

This is the most common approach. Your application exposes a health endpoint, and Kubernetes pings it.

TCP Socket

The kubelet tries to open a TCP connection. If the port is open, it's a success.

tcpSocket:
  port: 3306

Useful for databases and services that don't speak HTTP.

Exec Command

The kubelet runs a command inside the container. Exit code 0 is a success.

exec:
  command:
  - cat
  - /tmp/healthy

Useful when health depends on something a simple HTTP check can't capture.

gRPC

The kubelet sends a gRPC health check request following the gRPC Health Checking Protocol. A SERVING status is a success.

grpc:
  port: 50051
  service: myapp-liveness  # Optional: target a specific service

This was introduced as alpha in Kubernetes 1.23 and became stable (GA) in 1.27. Your application must implement the standard gRPC Health Checking Protocol (grpc.health.v1.Health). The port field is required and must be a number — unlike HTTP and TCP probes, you can't reference a port by name.

The optional service field lets you differentiate probe types on the same gRPC endpoint. For example, you can have your health server respond differently to myapp-liveness vs myapp-readiness requests, instead of running two separate gRPC servers. The Kubernetes project recommends concatenating your service name with the probe type (e.g. myservice-liveness) as a naming convention.
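For example, a single container could declare both probes against the same gRPC server, distinguished only by the service field. A sketch, assuming your health server registers both names (the names and period values here are illustrative):

```yaml
livenessProbe:
  grpc:
    port: 50051
    service: myapp-liveness   # health server returns SERVING unless the process is wedged
  periodSeconds: 10
readinessProbe:
  grpc:
    port: 50051
    service: myapp-readiness  # health server may return NOT_SERVING while warming up
  periodSeconds: 5
```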

This is the natural choice for gRPC-based microservices — no need to bolt on an HTTP endpoint just for health checks.

Important caveats:

  • gRPC probes don't support TLS or authentication parameters
  • Configuration errors (wrong port, unimplemented protocol) count as probe failures
  • The probe runs against the pod IP, so make sure your gRPC endpoint listens on 0.0.0.0, not just localhost

Startup Probe

The startup probe runs first, before liveness and readiness kick in. It answers one question: has the application finished its initialization?

startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

This gives the application up to 300 seconds (30 × 10s) to start. During this time, liveness and readiness probes are disabled. Once the startup probe succeeds, it never runs again — liveness and readiness take over.

When to Use It

Use a startup probe when your application has a slow or unpredictable startup time. Java apps loading Spring contexts, apps running database migrations on boot, or services that need to warm up caches are all good candidates.

Without a startup probe, you'd have to inflate initialDelaySeconds on your liveness probe — which means Kubernetes can't detect a truly dead container during that window.
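For comparison, here's what that workaround looks like — a sketch of the anti-pattern, not a recommendation. The delay value is illustrative; the point is that liveness is blind for the entire window, even if the process dies at second 5:

```yaml
# Anti-pattern: covering slow startup with initialDelaySeconds
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120  # no liveness checks at all during these 120s
  periodSeconds: 10
  failureThreshold: 3
```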

What Happens on Failure

If the startup probe exhausts all its attempts (failureThreshold reached), Kubernetes kills and restarts the container. This is the same behavior as a liveness probe failure.


Liveness Probe

The liveness probe runs continuously after the startup probe succeeds (or immediately if there's no startup probe). It answers: is this container still functioning?

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 5

When to Use It

Use a liveness probe when your application can enter a broken state that it can't recover from on its own. Deadlocked threads, corrupted internal state, or infinite loops are classic examples.

What Happens on Failure

After failureThreshold consecutive failures, Kubernetes kills the container and restarts it according to the pod's restartPolicy. This is a hard reset — the process is terminated and a new one starts.

When NOT to Use It

Don't point your liveness probe at a dependency. If your app's /healthz checks the database and the database goes down, Kubernetes will restart all your pods — making things worse, not better. Liveness should check whether this container is healthy, not whether the entire system is.


Readiness Probe

The readiness probe also runs continuously. It answers: can this container handle incoming requests right now?

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
  timeoutSeconds: 3

When to Use It

Use a readiness probe when your application might temporarily be unable to serve traffic. Loading a large dataset, waiting for a cache to warm up, or experiencing backpressure from a downstream service are all good reasons.

What Happens on Failure

The pod is removed from Service endpoints. No traffic is routed to it. Crucially, the container is not restarted — it keeps running. Once the readiness probe succeeds again, the pod is added back to the endpoints.

This is the fundamental difference from liveness: readiness is gentle. It gives your app a chance to recover on its own.

Readiness vs Liveness: The Critical Distinction

  • Liveness failure — the container is killed and restarted. Use it for states the app cannot recover from.
  • Readiness failure — the pod is removed from Service endpoints but keeps running. Use it for temporary inability to serve.
  • A pod can be alive but not ready (e.g. warming a cache). It should never be ready but not alive.


Probe Configuration Parameters

All three probe types share the same configuration options:

  • initialDelaySeconds — wait before the first check (default: 0)
  • periodSeconds — interval between checks (default: 10)
  • timeoutSeconds — how long a single check may take (default: 1)
  • failureThreshold — consecutive failures before Kubernetes acts (default: 3)
  • successThreshold — consecutive successes to be considered healthy again (default: 1; must be 1 for liveness and startup probes)

The time before Kubernetes takes action on failure is: periodSeconds × failureThreshold. With defaults, that's 30 seconds (10s × 3).
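The same arithmetic, annotated on a probe (the values are illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5      # check every 5s
  failureThreshold: 3   # act after 3 consecutive failures
  # time before removal from endpoints: 5s × 3 = 15s
```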


Complete Example

Here's a production-ready configuration for a typical web application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: web-app:v1
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /health
            port: 8080
          failureThreshold: 30
          periodSeconds: 10
          # Allows up to 5 min to start
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 5
          # Restarts after 45s of failures
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
          failureThreshold: 3
          timeoutSeconds: 3
          # Removes from traffic after 15s of failures

Notice the three different endpoints. This is intentional:

  • /health — basic "am I booted?" check (startup)
  • /healthz — "am I alive and not deadlocked?" check (liveness)
  • /ready — "can I handle requests?" which might check downstream dependencies (readiness)

Common Mistakes

Mistake 1: Using the Same Endpoint for All Probes

If your /health endpoint checks the database, and you use it for liveness, a database outage will restart all your pods. Use separate endpoints with different logic.

Mistake 2: No Startup Probe on Slow Apps

Without a startup probe, you'll set initialDelaySeconds: 120 on your liveness probe. During those 120 seconds, if your app crashes, Kubernetes won't know. A startup probe is more precise.

Mistake 3: Liveness Probe Too Aggressive

# Don't do this
livenessProbe:
  periodSeconds: 1
  failureThreshold: 1
  timeoutSeconds: 1

One slow response and your container gets killed. Use reasonable thresholds — a liveness failure should mean the container is truly broken, not just momentarily slow.

Mistake 4: Missing Readiness Probe

Without a readiness probe, Kubernetes sends traffic to your pod as soon as the container starts. If your app takes 10 seconds to initialize, users will see errors during that window. Always add a readiness probe.

Mistake 5: timeoutSeconds Too Low

The default timeoutSeconds is 1 second. If your health endpoint queries a database or external service, 1 second might not be enough under load. A timeout failure counts as a probe failure.
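A more forgiving configuration might look like this — the values are illustrative, so tune them to your endpoint's real latency under load:

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  timeoutSeconds: 3   # default is 1s; too tight for a check that touches a database
  periodSeconds: 5
  failureThreshold: 3
```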


Troubleshooting Probe Issues

Pod Keeps Restarting (CrashLoopBackOff)

If Kubernetes keeps killing your container during startup, the liveness probe is probably firing before the app is ready.

Diagnose:

kubectl describe pod <pod-name>

Look for events like:

Warning  Unhealthy  kubelet  Liveness probe failed: connection refused
Normal   Killing    kubelet  Container failed liveness probe, will be restarted

Fix: Add a startup probe or increase initialDelaySeconds on the liveness probe.

Pod Running but Not Receiving Traffic

If your pod is Running but shows 0/1 READY, the readiness probe is failing.

Diagnose:

# Check readiness status
kubectl get pods

# Check probe failures
kubectl describe pod <pod-name>

# Test the endpoint from inside the pod
kubectl exec <pod-name> -- curl -s localhost:8080/ready

Fix: Check what the readiness endpoint returns. Common issues: the app is waiting for a dependency, the endpoint path is wrong, or the port doesn't match.

Intermittent Restarts Under Load

If pods restart during traffic spikes, the liveness probe might be timing out when the app is busy.

Fix: Increase timeoutSeconds and failureThreshold on the liveness probe. Consider separating the liveness endpoint from any heavy logic — it should be as lightweight as possible.
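One possible adjustment, with illustrative values: more headroom per check, and more consecutive failures tolerated before a restart.

```yaml
livenessProbe:
  httpGet:
    path: /healthz   # keep this handler cheap and dependency-free
    port: 8080
  periodSeconds: 15
  timeoutSeconds: 5    # up from the 1s default
  failureThreshold: 5  # tolerate brief overload: 15s × 5 = 75s before restart
```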


Practice This Scenario

Theory is useful, but nothing beats hands-on troubleshooting. This Kubeasy challenge drops you into a broken cluster and asks you to fix it:

Start the Probes Drift Challenge →

A notification service keeps getting killed mid-startup, even though the app itself is fine. You'll investigate why Kubernetes is restarting a healthy container and fix the probe configuration. (~15 min, medium difficulty)


Prevention Tips

  1. Always implement a readiness probe — This is the bare minimum. Without it, users will hit your app before it's ready
  2. Use startup probes for slow apps — They're more precise than inflating initialDelaySeconds
  3. Keep liveness checks lightweight — A simple "am I alive" check, not a full dependency audit
  4. Never check external dependencies in liveness — A database outage shouldn't cascade-restart all your pods
  5. Test probe behavior locally — Use kubectl port-forward and curl to verify your health endpoints return what you expect
  6. Monitor probe failures — Prometheus can scrape kube_pod_container_status_restarts_total to catch flapping probes early

Written by

Paul Brissaud

Paul Brissaud is a DevOps / Platform Engineer and the creator of Kubeasy. He believes Kubernetes education is often too theoretical and that real understanding comes from hands-on, failure-driven learning.
