DevOpsCraft
GitLab CI · Kubernetes · CI/CD

# Zero-Downtime Deployments with GitLab CI + Kubernetes

2026-01-05 · 9 min read

## What zero-downtime actually means

"Zero-downtime deployment" means no requests fail during a deploy. Not "we deploy at 3am when nobody is using the app." Here's how to achieve it properly.

## The building blocks

1. **Rolling updates** in Kubernetes (built-in)

2. **Readiness probes** (Kubernetes won't route traffic to a pod until it reports ready)

3. **PreStop hook** (graceful shutdown)

4. **Pipeline that waits for rollout** (GitLab CI)

## Kubernetes deployment config

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # One extra pod during deploy
      maxUnavailable: 0  # Never reduce below 3 ready pods
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: api
          image: registry.example.com/api:${CI_COMMIT_SHA}
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          lifecycle:
            preStop:
              exec:
                # Keep serving briefly so the endpoints controller can
                # stop routing traffic here before SIGTERM arrives
                command: ["/bin/sh", "-c", "sleep 5"]
```
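Before merging manifest changes like these, a quick local validation plus a server-side diff catches typos and surprise changes. A sketch (the filename `deploy.yaml` is illustrative, and `kubectl diff` needs cluster access):

```shell
# Validate the manifest client-side, then preview what would
# actually change on the cluster (deploy.yaml is a hypothetical
# filename for the Deployment manifest).
kubectl apply --dry-run=client -f deploy.yaml
kubectl diff -f deploy.yaml
```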

## GitLab CI pipeline

```yaml
# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

variables:
  IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

build:
  stage: build
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $IMAGE .
    - docker push $IMAGE

test:
  stage: test
  script:
    - docker run --rm $IMAGE npm test

deploy-production:
  stage: deploy
  environment: production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    # Update the image
    - kubectl set image deployment/api api=$IMAGE -n production
    # Wait for the rollout to complete (fails if pods don't become ready)
    - kubectl rollout status deployment/api -n production --timeout=5m
    # Verify with a smoke test
    - curl --fail https://api.example.com/healthz
  after_script:
    # Roll back if the job failed
    - |
      if [ "$CI_JOB_STATUS" = "failed" ]; then
        kubectl rollout undo deployment/api -n production
      fi
```

## Why `kubectl rollout status` is critical

Without `rollout status`, your pipeline says "deployed" as soon as `kubectl set image` runs — before any pods are actually updated. If the new image crashes, you won't know until a monitor fires.

With `rollout status --timeout=5m`, the pipeline blocks until all pods are Running + Ready, or fails if they don't become ready within 5 minutes. This is how you get real deployment signals.
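The same exit-code signal works outside CI. For a manual deploy you can chain the rollback directly onto a failed rollout, a sketch assuming the same deployment and namespace as above:

```shell
# Manual equivalent of the pipeline's deploy-then-rollback logic:
# rollout status exits non-zero if pods never become ready,
# and the || branch then reverts to the previous ReplicaSet.
kubectl rollout status deployment/api -n production --timeout=5m \
  || kubectl rollout undo deployment/api -n production
```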

## Canary deployments for high-risk changes

For database migrations or major API changes, a canary rollout is safer:

```shell
# Deploy 1 canary pod alongside 9 stable pods
kubectl scale deployment api-stable --replicas=9
kubectl scale deployment api-canary --replicas=1

# Monitor error rate for 15 minutes.
# If OK, promote canary to stable.
# If not OK, scale canary to 0.
```
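The "promote" step in the comments above can be sketched as plain kubectl commands. This assumes the container inside `api-stable` is named `api` and that the canary image tag is known; both are illustrative:

```shell
# Promote: point the stable deployment at the canary's image
# (container name "api" and the tag are illustrative), wait for
# the rollout, then retire the canary.
kubectl set image deployment/api-stable api=registry.example.com/api:canary-tag
kubectl rollout status deployment/api-stable --timeout=5m
kubectl scale deployment api-canary --replicas=0
```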

## The 5-minute smoke test habit

Always run a quick smoke test after deploy. A simple `curl --fail` against your health endpoint catches most common deployment failures (bad image, crash loop, broken config) before they affect users.
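A single curl can be tripped up by a slow-warming cache or a transient blip right after cutover. A slightly more patient version retries for a short window before failing the job (the endpoint is illustrative; `--retry-all-errors` needs curl 7.71+):

```shell
# Retry the health check for up to ~25s before failing;
# --retry-all-errors also retries on HTTP errors such as 503,
# not just transient network errors.
curl --fail --silent --show-error \
  --retry 5 --retry-delay 5 --retry-all-errors \
  https://api.example.com/healthz
```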

## Need help implementing this?

We set this up for teams every week. Book a free call and let's talk about your specific situation.

Book a Discovery Call