Zero-Downtime Deploys with Kubernetes
PodDisruptionBudgets, readiness probes, and rolling updates done right.
Zero-downtime deployment is one of those things that sounds simple and isn't. The theory is easy: keep old pods running while new ones start, route traffic only to healthy pods, drain old pods gracefully. The practice involves half a dozen moving parts that all need to be configured correctly, or you get 500s.
This post covers the configuration I use for production services.
What Can Go Wrong
Before the solution, the failure modes:
- New pods receive traffic before they're ready — the readiness probe isn't configured, so Kubernetes marks pods Running and routes traffic immediately, before your app has finished initializing.
- Old pods are killed mid-request — the old pod gets SIGTERM but doesn't finish in-flight requests before shutting down.
- All pods are updated simultaneously — maxUnavailable: 100% plus maxSurge: 0 means zero pods for the duration of the update.
- PodDisruptionBudget isn't set — cluster autoscaler or node maintenance evicts all replicas of a service at once.
Each of these has a specific fix.
Readiness Probes
A readiness probe tells Kubernetes when a pod is ready to accept traffic. Until the probe passes, the pod is excluded from Service endpoints.
spec:
  containers:
    - name: api
      image: myapp:v2
      readinessProbe:
        httpGet:
          path: /api/health
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3
        successThreshold: 1
      livenessProbe:
        httpGet:
          path: /api/health
          port: 3000
        initialDelaySeconds: 15
        periodSeconds: 10
        failureThreshold: 5

The key distinction:
- Readiness: gates traffic. A failing readiness probe removes the pod from the load balancer but doesn't restart it.
- Liveness: gates existence. A failing liveness probe restarts the pod.
Your /health endpoint should return 200 only when the app is fully initialized — database connections established, caches warm, etc.
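A minimal sketch of what that can look like in Express. The checkDependencies function and the appReady flag are illustrative placeholders, not anything framework-provided; wire them to your own clients.

// Sketch: /api/health returns 200 only after startup work has finished
// and dependencies respond. Adapt checkDependencies() to your stack.
const express = require('express');
const app = express();

let appReady = false; // flipped to true once initialization completes

async function checkDependencies() {
  // e.g. await db.query('SELECT 1'); await redis.ping();
  return true;
}

app.get('/api/health', async (req, res) => {
  if (!appReady) return res.status(503).send('starting');
  try {
    await checkDependencies();
    res.status(200).send('ok');
  } catch (err) {
    res.status(503).send('dependency unavailable');
  }
});

async function main() {
  // real initialization here: connect pools, warm caches, then:
  appReady = true;
  app.listen(3000);
}

main();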
Rolling Update Strategy
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1

With maxUnavailable: 0 and maxSurge: 1, the update proceeds as:
- Kubernetes creates 1 new pod (now 4 total, 3 old + 1 new)
- Waits for the new pod's readiness probe to pass
- Terminates 1 old pod (back to 3, now 2 old + 1 new)
- Repeats until all old pods are replaced
At no point are there fewer than 3 healthy pods. Traffic is never interrupted.
Graceful Shutdown
When Kubernetes terminates a pod, it sends SIGTERM. Your app must:
- Stop accepting new connections
- Finish processing in-flight requests
- Close database connections cleanly
- Exit with code 0
// Express graceful shutdown
const server = app.listen(3000);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');

  server.close(() => {
    // All connections closed
    db.pool.end(() => {
      process.exit(0);
    });
  });

  // Force exit after 30s if shutdown hangs
  setTimeout(() => {
    console.error('Forcing shutdown after timeout');
    process.exit(1);
  }, 30_000);
});

And in your Kubernetes config:
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            # Delay SIGTERM by 5s to let the load balancer drain
            command: ['/bin/sh', '-c', 'sleep 5']

The preStop hook matters because there's a race: Kubernetes sends SIGTERM and updates the Endpoints object at the same time, but kube-proxy (which updates iptables rules) may lag behind. If SIGTERM arrives before iptables is updated, new requests can still be routed to a pod that's shutting down. The 5-second sleep gives kube-proxy time to catch up.
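If adding a preStop hook isn't an option, the same drain window can live in the application instead: wait a few seconds after SIGTERM before closing the server. A rough sketch as an alternative to the handler above; the 5-second figure is an assumption mirroring the sleep in the hook.

// Alternative: absorb the endpoint-propagation lag in the app itself.
// `server` is the HTTP server from the graceful-shutdown snippet above.
process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining for 5s before closing');
  setTimeout(() => {
    server.close(() => process.exit(0));
  }, 5_000);
});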
PodDisruptionBudget
A PodDisruptionBudget (PDB) limits how many pods of a deployment can be disrupted simultaneously. "Disruption" includes voluntary evictions: node maintenance, cluster upgrades, autoscaler scale-downs.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

With minAvailable: 2 and 3 replicas, only 1 pod can be evicted at a time. If two nodes are drained simultaneously, the second drain will block until the first completes.
Use minAvailable (absolute count) over maxUnavailable (percentage) for small replica counts. With 3 replicas, maxUnavailable: 33% rounds down to 0, making eviction impossible.
What PDB Doesn't Protect Against
PDB only protects against voluntary disruptions. Node failures, OOM kills, and pod crashes are involuntary and bypass PDB. For those, you need multiple replicas spread across availability zones.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: api

Putting It Together
A deployment that survives rolling updates, node maintenance, and graceful shutdown:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 60
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api
      containers:
        - name: api
          image: myapp:v2
          lifecycle:
            preStop:
              exec:
                command: ['/bin/sh', '-c', 'sleep 5']
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 10
            failureThreshold: 5

The most common mistake I see is teams deploying with all of this configured except the preStop hook. They see zero errors in staging (no load balancer lag) and mysterious 500s in production during deploys.
Set up the hook. Measure with real traffic. Watch your error rates stay flat during deploys.
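One low-tech way to do that measurement: hit the service in a tight loop while a rollout is in progress and count anything that isn't a 200. A sketch for Node 18+; the URL is a placeholder for wherever your service is exposed.

// Quick-and-dirty deploy probe: request the health endpoint every 100ms and
// count failures. Run it during a rollout; with the config above the failed
// counter should stay at zero.
const URL = process.env.TARGET_URL || 'http://api.example.com/api/health';

let ok = 0;
let failed = 0;

setInterval(async () => {
  try {
    const res = await fetch(URL, { signal: AbortSignal.timeout(2000) });
    if (res.status === 200) ok++;
    else failed++;
  } catch (err) {
    failed++; // network error or timeout counts as a failure
  }
  process.stdout.write(`\rok=${ok} failed=${failed}`);
}, 100);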