# Zero-Downtime Deployments with GitLab CI + Kubernetes
2026-01-05 · 9 min read
## What zero-downtime actually means
"Zero-downtime deployment" means no requests fail during a deploy. Not "we deploy at 3am when nobody is using the app." Here's how to achieve it properly.
## The building blocks
1. **Rolling updates** in Kubernetes (built-in)
2. **Readiness probes** (Kubernetes won't route traffic until pod is ready)
3. **PreStop hook** (graceful shutdown)
4. **Pipeline that waits for rollout** (GitLab CI)
## Kubernetes deployment config
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # One extra pod during the deploy
      maxUnavailable: 0  # Never drop below 3 ready pods
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: api
          image: registry.example.com/api:${CI_COMMIT_SHA}
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
```
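The `preStop` sleep buys time for load balancers to stop routing to the pod, but the app still has to exit cleanly when the SIGTERM arrives afterwards. A common pitfall: if the container's entrypoint is a shell script, the shell runs as PID 1 and does not forward signals to the app by default, so the app never sees the SIGTERM at all. A minimal sketch of the signal-forwarding pattern, written as a function so it is self-contained (in a real image this body would be your entrypoint script and `"$@"` would be the server command, e.g. `/app/server`):

```sh
# Signal-forwarding wrapper: run the given command in the background,
# trap SIGTERM, pass it on to the child, and wait for the child to exit.
# Without the trap, a shell running as PID 1 swallows SIGTERM and the
# app only dies when Kubernetes sends SIGKILL after the grace period.
run_as_pid1() {
  trap 'kill -TERM "$child" 2>/dev/null' TERM
  "$@" &
  child=$!
  wait "$child"
}
```

If your runtime already installs its own SIGTERM handler (stop accepting connections, drain in-flight requests, exit), run the binary directly as PID 1 and skip the wrapper.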
## GitLab CI pipeline
```yaml
# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

variables:
  IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

build:
  stage: build
  before_script:
    # Authenticate to the GitLab container registry before pushing
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $IMAGE .
    - docker push $IMAGE

test:
  stage: test
  script:
    - docker run --rm $IMAGE npm test

deploy-production:
  stage: deploy
  environment: production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    # Update the image (returns immediately, before any pod changes)
    - kubectl set image deployment/api api=$IMAGE -n production
    # Wait for the rollout to complete (fails if pods don't become ready)
    - kubectl rollout status deployment/api -n production --timeout=5m
    # Verify with a smoke test
    - curl --fail https://api.example.com/healthz
  after_script:
    # Roll back if the job failed at any step above
    - |
      if [ "$CI_JOB_STATUS" = "failed" ]; then
        kubectl rollout undo deployment/api -n production
      fi
```
## Why `kubectl rollout status` is critical
Without `rollout status`, your pipeline says "deployed" as soon as `kubectl set image` runs — before any pods are actually updated. If the new image crashes, you won't know until a monitor fires.
With `rollout status --timeout=5m`, the pipeline blocks until every updated pod is Ready, and the job fails if the rollout doesn't finish within 5 minutes. That failure is a real deployment signal your pipeline can act on, instead of a green checkmark that means nothing.
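What `rollout status` waits for is visible in the Deployment's own status fields (`status.updatedReplicas`, `status.readyReplicas`). A simplified sketch of the condition it polls; the real check also compares observed vs. desired generation and accounts for `maxUnavailable`:

```sh
# Simplified version of the condition `kubectl rollout status` polls.
# Inputs correspond to spec.replicas, status.updatedReplicas, and
# status.readyReplicas (readable via `kubectl get deployment -o jsonpath`).
rollout_complete() {
  desired=$1
  updated=$2
  ready=$3
  [ "$updated" -eq "$desired" ] && [ "$ready" -eq "$desired" ]
}
```

For example, mid-surge with `maxSurge: 1` you might see desired=3, updated=1, ready=4 (old pods still counting as ready), which correctly fails the check; only when all three numbers agree does the rollout count as complete.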
## Canary deployments for high-risk changes
For database migrations or major API changes, a canary release is safer:
```sh
# Deploy 1 canary pod alongside 9 stable pods
kubectl scale deployment api-stable --replicas=9
kubectl scale deployment api-canary --replicas=1

# Monitor the error rate for 15 minutes.
# If it holds steady, promote the canary image to api-stable.
# If not, scale api-canary back to 0.
```
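The "monitor error rate" step can be scripted rather than eyeballed. A hedged sketch of a canary gate: `fetch_error_count` and `fetch_request_count` are hypothetical hooks into your metrics backend (e.g. a Prometheus query), and the threshold is expressed in basis points because POSIX shell arithmetic is integer-only:

```sh
# Hypothetical canary gate: succeeds when the canary's error rate is
# within budget. Rates are compared in basis points (100 bp == 1%).
canary_healthy() {
  errors=$1
  total=$2
  max_error_bp=$3
  [ "$total" -gt 0 ] || return 1            # no traffic means no signal
  rate_bp=$(( errors * 10000 / total ))
  [ "$rate_bp" -le "$max_error_bp" ]
}

# Sketch of the promote/abort decision (metric hooks are placeholders):
# if canary_healthy "$(fetch_error_count)" "$(fetch_request_count)" 100; then
#   kubectl set image deployment/api-stable api=$IMAGE   # promote
# fi
# kubectl scale deployment api-canary --replicas=0       # wind down either way
```

Note the no-traffic case returns failure on purpose: a canary that received zero requests tells you nothing, so it should not be promoted.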
## The 5-minute smoke test habit
Always run a quick smoke test after deploy. A simple `curl --fail` against your health endpoint catches the most common deployment failures (bad image tag, crashed process, broken routing) before they affect users.
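A single `curl` can still flake on a DNS hiccup or a cold connection, so a small retry loop makes the smoke test less noisy. A sketch, where the URL is whatever your health route happens to be:

```sh
# Retry a health-check URL a few times before declaring the deploy bad.
# Succeeds on the first passing request; fails once attempts run out.
smoke_test() {
  url=$1
  attempts=${2:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --fail --silent --max-time 5 "$url" >/dev/null; then
      return 0
    fi
    echo "smoke test attempt $i/$attempts failed" >&2
    i=$((i + 1))
    sleep 2
  done
  return 1
}
```

If you prefer a one-liner, curl's built-in `--retry 5 --retry-delay 2 --retry-all-errors` flags cover much of the same ground (`--retry-all-errors` needs curl 7.71+).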
## Need help implementing this?
We set this up for teams every week. Book a free call and let's talk about your specific situation.