Grafana · Loki · Observability · Kubernetes

Production Observability: Grafana + Loki from Scratch

2026-02-01 · 10 min read

Why Grafana + Loki?

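Before Loki, most teams used the ELK stack (Elasticsearch + Logstash + Kibana). ELK is powerful, but it is expensive and operationally heavy because it builds a full-text index over every log line. Loki takes a different approach: it indexes only the labels attached to each log stream, not the log content itself. The result is a much smaller index, storage that can be around 10x cheaper, and ingestion that stays fast at scale.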
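To make the "index only the labels" idea concrete, here is a toy model in Python. It is only a sketch of the concept, not how Loki is implemented: the class name `TinyLabelIndex` and its methods are made up for illustration. Streams are looked up by label set, and the log lines inside a matching stream are scanned grep-style.

```python
from collections import defaultdict


class TinyLabelIndex:
    """Toy model: index label sets only, keep log lines as unindexed chunks."""

    def __init__(self):
        # label set (frozenset of key/value pairs) -> list of raw log lines
        self.streams = defaultdict(list)

    def push(self, labels, line):
        self.streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        """Select streams by label match, then brute-force filter their lines."""
        wanted = set(selector.items())
        hits = []
        for stream_labels, lines in self.streams.items():
            if wanted <= stream_labels:  # cheap step: only labels are compared
                hits.extend(l for l in lines if contains in l)  # grep-style scan
        return hits


store = TinyLabelIndex()
store.push({"namespace": "production", "app": "api"}, "level=error msg=timeout")
store.push({"namespace": "production", "app": "api"}, "level=info msg=ok")
store.push({"namespace": "staging", "app": "api"}, "level=error msg=boom")

print(store.query({"namespace": "production", "app": "api"}, contains="error"))
# ['level=error msg=timeout']
```

The index here has two entries for three log lines. The trade-off is that a content search has to scan every line in the selected streams, which is why narrow label selectors matter.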

Architecture overview

Applications → Promtail (agent) → Loki → Grafana

Kubernetes → kube-state-metrics → Prometheus → Grafana

Step 1: Deploy with Helm

# Add the Grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Deploy Loki (single binary mode for small/medium setups)
helm upgrade --install loki grafana/loki \
  --namespace monitoring --create-namespace \
  -f loki-values.yaml

# Deploy Promtail (log collector on every node)
# Recent promtail charts take the push URL under config.clients;
# older chart versions used config.lokiAddress instead.
helm upgrade --install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"
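Once both releases are up, a quick smoke test is to push one log line at the same endpoint Promtail uses. The sketch below assumes Loki is reachable on `http://localhost:3100` (for example through `kubectl -n monitoring port-forward svc/loki 3100:3100`, where the service name `loki` is itself an assumption) and that multi-tenancy is off. The helper names and the `smoke-test` label are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: Loki is port-forwarded to localhost; adjust for your cluster.
LOKI_URL = "http://localhost:3100"


def build_push_payload(labels, line, ts_ns=None):
    """Build the JSON body for Loki's push endpoint (one stream, one line)."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Timestamps are nanosecond Unix epochs encoded as strings.
    return {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}


def push_test_line(base_url=LOKI_URL):
    """POST one test line; Loki answers 204 No Content on success."""
    body = json.dumps(build_push_payload({"app": "smoke-test"}, "hello from smoke test"))
    req = urllib.request.Request(
        f"{base_url}/loki/api/v1/push",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Inspect the payload without touching the network.
print(json.dumps(build_push_payload({"app": "smoke-test"}, "hello", ts_ns=1700000000000000000)))
```

Calling `push_test_line()` against a live instance should return `204`. A connection error usually means the port-forward or service name is off.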

Step 2: Loki values for production

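# loki-values.yaml
loki:
  auth_enabled: false             # single tenant, no tenant header required
  storage:
    type: s3
    s3:
      # Key names follow the original post. Recent grafana/loki charts expect
      # bucket names under loki.storage.bucketNames, so check your chart's values.yaml.
      bucketnames: my-loki-logs
      region: ap-southeast-1
  limits_config:
    retention_period: 30d
    ingestion_rate_mb: 16         # per-tenant rate limit, MB per second
    ingestion_burst_size_mb: 32   # per-tenant burst allowance, MB
  compactor:
    retention_enabled: true       # the compactor enforces retention_period
    retention_delete_delay: 2h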
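It helps to sanity-check what these limits imply before you commit to a bucket budget. The arithmetic below is a rough sketch: the 50 GB/day volume and the 8x compression ratio are made-up inputs for illustration, so substitute your own measurements.

```python
def max_daily_ingest_gb(ingestion_rate_mb):
    """Upper bound on what one tenant can ingest per day at the rate limit."""
    seconds_per_day = 24 * 60 * 60
    return ingestion_rate_mb * seconds_per_day / 1024


def retained_storage_gb(daily_gb, retention_days, compression_ratio):
    """Rough steady-state footprint in object storage."""
    return daily_gb * retention_days / compression_ratio


# Ceiling implied by ingestion_rate_mb: 16 if traffic sat at the limit all day.
print(max_daily_ingest_gb(16))          # 1350.0

# Hypothetical workload: 50 GB/day of raw logs, 30d retention, 8x compression.
print(retained_storage_gb(50, 30, 8))   # 187.5
```

The first number is a ceiling, not a forecast. If your real daily volume is nowhere near it, the rate limit is mostly protecting you from a runaway logger.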

Step 3: Useful LogQL queries

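# All errors from a service
{namespace="production", app="api"} |= "error" | logfmt | level="error"

# Request rate per endpoint (grouped by the extracted path label)
sum by (path) (rate({namespace="production", app="api"} | logfmt | path!="" [5m]))

# P99 latency from structured logs
# Assumes duration is a plain number; for Go-style values such as 150ms use: unwrap duration(duration)
quantile_over_time(0.99, {app="api"} | logfmt | unwrap duration [5m]) by (app)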
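The same queries can be run outside Grafana through Loki's HTTP API, which is handy for scripts and CI checks. This is a minimal sketch that assumes Loki is reachable on `http://localhost:3100`; the function names are hypothetical and the timestamps are arbitrary example values.

```python
import json
import urllib.parse
import urllib.request


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a query_range URL; start and end are nanosecond Unix epochs."""
    params = urllib.parse.urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url}/loki/api/v1/query_range?{params}"


def run_query(url):
    """Fetch the URL and return the list of matching streams or series."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]["result"]


url = build_query_range_url(
    "http://localhost:3100",  # assumption: port-forwarded Loki
    '{namespace="production", app="api"} |= "error"',
    start_ns=1700000000000000000,
    end_ns=1700000300000000000,
)
print(url)
```

Passing `url` to `run_query` against a live instance returns the raw result list. `urlencode` takes care of escaping the braces, quotes, and pipes in the LogQL expression.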

Step 4: Connect Grafana

1. Add Loki as a datasource (URL: `http://loki:3100`)

2. Install the Kubernetes Monitoring dashboard (ID: 15760)

3. Create a logs panel linked to your metrics panels

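The key feature is correlation: click on a spike in a Prometheus graph and Grafana can show the Loki logs for that exact time range and service. This works best when metrics and logs share labels such as namespace and app. In our experience, this correlation can cut debugging time roughly in half.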
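The reason shared labels matter becomes clearer if you sketch the jump from a metric to its logs by hand. The snippet below illustrates the idea only; it is not Grafana's internal logic. The function names, the example series, and the 60-second window are assumptions, and label values are not escaped.

```python
def logql_selector(metric_labels, shared=("namespace", "app")):
    """Build a LogQL stream selector from the labels of a Prometheus series."""
    pairs = [f'{k}="{metric_labels[k]}"' for k in shared if k in metric_labels]
    return "{" + ", ".join(pairs) + "}"


def spike_window_ns(spike_unix_s, before_s=60, after_s=60):
    """Return (start, end) around a spike as nanosecond Unix epochs."""
    return (spike_unix_s - before_s) * 10**9, (spike_unix_s + after_s) * 10**9


# Hypothetical series behind the spike on the metrics panel.
series = {
    "__name__": "http_requests_total",
    "namespace": "production",
    "app": "api",
    "instance": "10.0.0.7:8080",
}

print(logql_selector(series))       # {namespace="production", app="api"}
print(spike_window_ns(1700000000))  # (1699999940000000000, 1700000060000000000)
```

If Promtail labels streams differently from how Prometheus labels series, there is nothing to join on, so it pays to agree on label names early.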

Production checklist

  • [ ] S3 backend (not local storage)
  • [ ] Retention policy configured
  • [ ] Resource limits on Loki and Promtail
  • [ ] Structured logging in your applications (JSON)
  • [ ] Alert on Loki ingestion errors
  • [ ] Grafana dashboards per service
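For the structured logging item, here is one way an application might emit JSON lines using only the Python standard library. Treat it as a sketch: the field names (`ts`, `level`, `msg`, `path`, `duration`) and the logger name `api` are assumptions chosen to line up with the queries in Step 3.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("path", "duration"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("api")
log.addHandler(handler)
log.setLevel(logging.INFO)

# Emits one JSON line containing level, msg, path, and duration.
log.info("request served", extra={"path": "/orders", "duration": 0.142})
```

With JSON lines like these, the LogQL pipelines from Step 3 would use `| json` in place of `| logfmt`.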