Table of contents
- What you will learn
- Prerequisites
- How Kubernetes logging works (and where it breaks)
- Why Fluent Bit, not Fluentd
- Deploy Elasticsearch and Kibana with ECK
- Deploy Fluent Bit as a DaemonSet
- Verify logs reach Kibana
- Harden for production
- Common troubleshooting
- What you learned
- Where to go next
What you will learn
By the end of this tutorial you will have a working centralised logging pipeline: Fluent Bit collecting container logs from every node, enriching them with Kubernetes metadata, shipping them over TLS to Elasticsearch 8.x, and a Kibana data view ready for querying. You will understand why each configuration choice exists and what to change when your cluster grows.
Prerequisites
Before starting, make sure you have:
- A running Kubernetes cluster, version 1.25 or later, using containerd as the container runtime. Managed clusters (GKE, EKS, AKS) and local clusters (kind, minikube) both work. This tutorial targets Fluent Bit 5.0 and Elasticsearch 8.17.
- Helm 3 installed and configured to talk to your cluster.
- kubectl configured with cluster-admin privileges. The ECK operator installs CRDs and cluster-wide resources.
- Enough cluster capacity for the Elasticsearch StatefulSet: at least 3 nodes with 8 Gi memory available per ES pod. For a quick test on kind/minikube, a single-node ES setup works but is not production-grade.
- If you already have Prometheus running, Fluent Bit exposes a /api/v1/metrics/prometheus endpoint you can scrape. Not required for this tutorial, but worth knowing.
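If your Prometheus discovers targets through the common prometheus.io pod annotations (an assumption; kube-prometheus-stack uses ServiceMonitors by default), a sketch of the hints you would add to the Fluent Bit pod template from Step 9:

```yaml
# Sketch: scrape annotations for the Fluent Bit pod template.
# Assumes your Prometheus honours the prometheus.io annotation convention.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "2020"                       # Fluent Bit HTTP server port
    prometheus.io/path: "/api/v1/metrics/prometheus" # built-in metrics endpoint
```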
How Kubernetes logging works (and where it breaks)
Containers write to stdout and stderr. The container runtime (containerd) captures that output and writes it to log files on the node at /var/log/pods/<namespace>_<pod>_<uid>/<container>/0.log. Symlinks at /var/log/containers/ provide a flat directory of all containers on the node.
The kubelet rotates these files. The defaults are 10 MiB per file (containerLogMaxSize) and 5 rotated files (containerLogMaxFiles). kubectl logs reads only the latest file, so it returns at most 10 MiB of the most recent output for one pod at a time.
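Both limits are kubelet configuration fields. If you manage your own nodes, this sketch shows them in the kubelet config file with their default values:

```yaml
# Excerpt from the kubelet config file (often /var/lib/kubelet/config.yaml;
# the exact path varies by distribution). Values shown are the defaults.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi" # rotate a container's log file at this size
containerLogMaxFiles: 5     # keep at most this many files per container
```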
That is fine for debugging a single pod in development. In production, it breaks down fast:
- Pod deletion removes logs. When a pod is deleted or evicted, the kubelet removes its log files from the node. There is no post-mortem analysis possible.
- No aggregation. A Deployment with 10 replicas means 10 separate kubectl logs calls. Correlation across services requires manual effort.
- API server load. Every kubectl logs call routes through the API server to the kubelet. At scale (hundreds of pods, multiple teams), this becomes a bottleneck.
- Volume. A 50-node cluster with 500 pods can generate 10-50 GB of logs per day. Manual retrieval is not realistic.
The Kubernetes documentation describes three cluster-level logging architectures. The most common is the node logging agent: a DaemonSet that reads container logs from the node filesystem and forwards them to a central store. That is the EFK pattern.
Why Fluent Bit, not Fluentd
Fluent Bit and Fluentd are both CNCF graduated projects under the Fluent umbrella, which causes confusion. They are distinct tools built for different roles.
| | Fluentd | Fluent Bit |
|---|---|---|
| Language | Ruby + C | Pure C |
| Memory footprint | ~40 MB base | Sub-30 MB working memory |
| Runtime dependencies | Ruby runtime, gem management | Zero external dependencies |
| Plugin ecosystem | 1,000+ Ruby gems | ~100 built-in plugins |
| CNCF status | Graduated | Graduated |
| Design role | Central aggregation, complex transforms | Node-level collection agent |
For a Kubernetes DaemonSet (one pod per node), the sub-30 MB footprint and zero-dependency binary make Fluent Bit the better fit. It handles log collection, Kubernetes metadata enrichment, and Elasticsearch forwarding in a single binary without needing Fluentd as an intermediary.
Deploy Elasticsearch and Kibana with ECK
The Elastic Cloud on Kubernetes (ECK) operator automates TLS certificate provisioning, rolling upgrades, and password management. Raw manifests work, but ECK removes a large surface area of manual certificate and lifecycle management.
Step 1: Install the ECK operator
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elastic-operator elastic/eck-operator \
-n elastic-system --create-namespace
Verify the operator is running:
kubectl get pods -n elastic-system
Expected output:
NAME READY STATUS RESTARTS AGE
elastic-operator-<hash> 1/1 Running 0 45s
Step 2: Create the logging namespace and deploy Elasticsearch
# elasticsearch.yaml
apiVersion: v1
kind: Namespace
metadata:
name: logging
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: elasticsearch
namespace: logging
spec:
version: 8.17.0
nodeSets:
- name: default
count: 3 # minimum for production HA
config:
      node.store.allow_mmap: false # avoids having to raise vm.max_map_count on every node
podTemplate:
spec:
containers:
- name: elasticsearch
resources:
requests:
memory: 4Gi
cpu: 500m
limits:
memory: 8Gi
cpu: "2"
env:
- name: ES_JAVA_OPTS
value: "-Xms4g -Xmx4g" # heap = 50% of memory limit, identical min/max
kubectl apply -f elasticsearch.yaml
The JVM heap rule matters: set -Xms and -Xmx to the same value at exactly 50% of the pod memory limit. Never exceed 31 GB because above that threshold the JVM cannot use compressed object pointers, which wastes memory.
Wait for the cluster to go green:
kubectl get elasticsearch -n logging
Expected output (may take 2-3 minutes):
NAME HEALTH NODES VERSION PHASE AGE
elasticsearch green 3 8.17.0 Ready 3m
Step 3: Deploy Kibana
# kibana.yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
name: kibana
namespace: logging
spec:
version: 8.17.0
count: 1
elasticsearchRef:
name: elasticsearch
kubectl apply -f kibana.yaml
Step 4: Retrieve the Elasticsearch password
ECK generates a password for the elastic superuser and stores it in a Secret:
kubectl get secret elasticsearch-es-elastic-user -n logging \
-o go-template='{{.data.elastic | base64decode}}'
Save this password. You will need it for both Fluent Bit configuration and Kibana login.
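Before wiring up Fluent Bit, you can sanity-check the password and the in-cluster endpoint from a throwaway pod. A quick sketch; -k skips CA verification for this one-off check only (Fluent Bit will verify properly via the mounted CA):

```sh
kubectl run curl-test --rm -it --restart=Never -n logging \
  --image=curlimages/curl -- \
  curl -sk -u "elastic:<password-from-step-4>" \
  "https://elasticsearch-es-http.logging.svc.cluster.local:9200/_cluster/health?pretty"
```

A JSON response with "status": "green" confirms both the credentials and in-cluster DNS resolution.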
Checkpoint: Elasticsearch is green with 3 nodes, Kibana pod is Running, and you have the elastic password.
Deploy Fluent Bit as a DaemonSet
Step 5: Create the RBAC resources
Fluent Bit needs read access to pod and namespace metadata for the Kubernetes filter plugin:
# fluent-bit-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluent-bit
namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: fluent-bit-read
rules:
- apiGroups: [""]
resources:
- namespaces
- pods
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: fluent-bit-read
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: fluent-bit-read
subjects:
- kind: ServiceAccount
name: fluent-bit
namespace: logging
kubectl apply -f fluent-bit-rbac.yaml
Step 6: Create the Elasticsearch credentials Secret
Store the credentials in a Secret, not a ConfigMap: Secrets support encryption at rest and are typically guarded by tighter RBAC rules than ConfigMaps.
kubectl create secret generic elasticsearch-credentials \
-n logging \
--from-literal=ES_USERNAME=elastic \
--from-literal=ES_PASSWORD=<paste-password-from-step-4>
Step 7: Copy the ECK CA certificate
ECK stores the CA certificate in the elasticsearch-es-http-certs-public Secret. Secrets are namespace-scoped, but Fluent Bit can mount this one directly because both live in the logging namespace. If your Fluent Bit pods run in a different namespace, copy the Secret, stripping the instance-specific metadata so the apply succeeds:
kubectl get secret elasticsearch-es-http-certs-public -n logging -o yaml \
  | sed -e '/namespace:/d' -e '/resourceVersion:/d' -e '/creationTimestamp:/d' -e '/uid:/d' \
  | kubectl apply -n <fluent-bit-namespace> -f -
Step 8: Create the Fluent Bit configuration
This ConfigMap uses the YAML configuration format available since Fluent Bit 3.2. YAML simplifies Kubernetes integration because parser definitions can live in the same file, eliminating separate parser ConfigMaps.
# fluent-bit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: logging
data:
fluent-bit.yaml: |
service:
flush: 5 # flush interval in seconds
log_level: info
http_server: true # enables health and metrics endpoints
http_listen: 0.0.0.0
http_port: 2020
health_check: true
storage.metrics: true
pipeline:
inputs:
- name: tail
path: /var/log/containers/*.log
multiline.parser: docker, cri # handles both formats; containerd uses CRI
tag: kube.*
db: /var/log/flb_kube.db # tracks file position across restarts
mem_buf_limit: 5MB
skip_long_lines: true
refresh_interval: 10
filters:
- name: kubernetes
match: kube.*
kube_url: https://kubernetes.default.svc.cluster.local:443
kube_ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
kube_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
merge_log: true # parse JSON log fields and flatten into record
keep_log: false # drop raw log field after merge (saves storage)
labels: true
annotations: false # annotations tend to be large; enable selectively
outputs:
- name: es
match: kube.*
host: elasticsearch-es-http.logging.svc.cluster.local
port: 9200
http_user: ${ES_USERNAME}
http_passwd: ${ES_PASSWORD}
tls: true
tls.verify: true
tls.ca_file: /etc/ssl/elasticsearch/ca.crt
logstash_format: true # date-stamped indices for ILM compatibility
logstash_prefix: kubernetes # index name becomes kubernetes-2026.04.09
logstash_dateformat: "%Y.%m.%d"
replace_dots: true # ES rejects dots in field names
suppress_type_name: true # required for ES 8.x (mapping types removed)
retry_limit: 5 # finite retries prevent unbounded memory growth
time_key: "@timestamp"
trace_error: true # logs ES error responses for debugging
kubectl apply -f fluent-bit-config.yaml
The db parameter on the Tail input is critical. Without it, Fluent Bit loses its read position on restart and either re-reads all logs (duplicates) or misses logs written during the restart window.
The multiline.parser: docker, cri setting lets Fluent Bit detect the log format automatically. Modern clusters using containerd produce CRI-format logs with a P/F flag indicating partial or full lines. Older Docker-based clusters use JSON-wrapped log lines.
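For reference, illustrative (not captured) examples of the two on-disk formats:

```
# CRI (containerd): <RFC3339Nano timestamp> <stream> <P|F> <message>
2026-04-09T08:15:30.123456789Z stdout F {"level":"info","msg":"request served"}

# Legacy Docker: JSON-wrapped lines
{"log":"request served\n","stream":"stdout","time":"2026-04-09T08:15:30.123456789Z"}
```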
Step 9: Deploy the DaemonSet
# fluent-bit-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: logging
spec:
selector:
matchLabels:
app: fluent-bit
template:
metadata:
labels:
app: fluent-bit
spec:
serviceAccountName: fluent-bit
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule # collect control-plane logs too
containers:
- name: fluent-bit
        image: cr.fluentbit.io/fluent/fluent-bit:5.0
        args: ["--config=/fluent-bit/etc/fluent-bit.yaml"] # load the mounted YAML config, not the default classic .conf
ports:
- containerPort: 2020
name: http
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 256Mi
livenessProbe:
httpGet:
path: /api/v1/health
port: 2020
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /api/v1/health
port: 2020
initialDelaySeconds: 10
periodSeconds: 5
envFrom:
- secretRef:
name: elasticsearch-credentials
volumeMounts:
- name: config
mountPath: /fluent-bit/etc/fluent-bit.yaml
subPath: fluent-bit.yaml
        - name: varlog
          mountPath: /var/log # read-write so Fluent Bit can write flb_kube.db (and later the filesystem buffer)
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true # only relevant on legacy Docker nodes; harmless under containerd
- name: elasticsearch-ca
mountPath: /etc/ssl/elasticsearch
readOnly: true
volumes:
- name: config
configMap:
name: fluent-bit-config
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: elasticsearch-ca
secret:
secretName: elasticsearch-es-http-certs-public
kubectl apply -f fluent-bit-daemonset.yaml
Verify one pod per node:
kubectl get daemonset fluent-bit -n logging
Expected output:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
fluent-bit 3 3 3 3 3 <none> 30s
Check the Fluent Bit logs for errors:
kubectl logs -n logging daemonset/fluent-bit --tail=20
If you see [output:es:es.0] ...connected and no [error] lines, the pipeline is healthy.
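You can also query the HTTP server enabled in the service section of the config; a quick sketch:

```sh
# Forward the HTTP port of one Fluent Bit pod
POD=$(kubectl get pods -n logging -l app=fluent-bit \
  -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n logging "$POD" 2020:2020 &

curl -s http://localhost:2020/api/v1/health                    # ok when healthy
curl -s http://localhost:2020/api/v1/metrics/prometheus | head # scrape-ready metrics
```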
Checkpoint: Fluent Bit DaemonSet running on every node, no errors in logs, connected to Elasticsearch.
Verify logs reach Kibana
Step 10: Create a Kibana data view
Port-forward Kibana:
kubectl port-forward svc/kibana-kb-http 5601:5601 -n logging
Open https://localhost:5601 in your browser (accept the self-signed certificate). Log in with username elastic and the password from Step 4.
Navigate to Stack Management -> Data Views -> Create data view:
- Index pattern: kubernetes-*
- Timestamp field: @timestamp
- Name: Kubernetes Logs
Save the data view. In Kibana 8.x, "Data Views" replaced the older "Index Patterns" concept.
Go to Discover, select the "Kubernetes Logs" data view, and you should see log records with Kubernetes metadata fields: kubernetes.pod_name, kubernetes.namespace_name, kubernetes.container_name, and kubernetes.labels.*.
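The metadata fields make targeted queries straightforward. For example, this KQL filter in the Discover search bar isolates a single container in one namespace:

```
kubernetes.namespace_name : "kube-system" and kubernetes.container_name : "coredns"
```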
Checkpoint: Logs visible in Kibana Discover with Kubernetes metadata attached.
Harden for production
The setup so far works but lacks production durability. Three areas need attention.
Index lifecycle management
A 50-node cluster generating 50 GB/day accumulates 4.5 TB over 90 days. Without Index Lifecycle Management (ILM), storage grows until Elasticsearch runs out of disk.
A sensible default policy:
| Phase | Age | Action |
|---|---|---|
| Hot | 0-7 days | Active write index on fast storage. Rollover at 50 GB or 7 days. |
| Warm | 7-30 days | Read-only, force-merge to 1 segment, compressed. |
| Cold | 30-90 days | Frozen on cheapest storage tier. |
| Delete | 90+ days | Remove. |
ILM fits this pipeline because Fluent Bit's logstash_format: true creates date-stamped indices (kubernetes-2026.04.09) whose age is explicit, so the age-based warm, cold, and delete phases apply cleanly. Size-based rollover in the hot phase additionally requires a write alias or data stream in front of the indices.
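A minimal sketch of such a policy via the ILM API. The policy name is illustrative, and attaching it to new indices also requires an index template that sets index.lifecycle.name, which we leave to the ES docs:

```sh
# Port-forward Elasticsearch, then create an age-based policy
kubectl port-forward svc/elasticsearch-es-http 9200:9200 -n logging &

curl -sk -u "elastic:<password-from-step-4>" -X PUT \
  "https://localhost:9200/_ilm/policy/kubernetes-logs" \
  -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "warm":   { "min_age": "7d",  "actions": { "readonly": {}, "forcemerge": { "max_num_segments": 1 } } },
      "cold":   { "min_age": "30d", "actions": { "set_priority": { "priority": 0 } } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}'
```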
Filesystem buffering
If Elasticsearch goes down, Fluent Bit's in-memory buffer fills. Without filesystem buffering, it either drops logs or applies backpressure to the Tail input, which stops log reads entirely.
Add to the service section of the ConfigMap:
service:
storage.path: /var/log/flb-storage/
storage.sync: normal
storage.checksum: false
storage.max_chunks_up: 128
storage.backlog.mem_limit: 5M
And set storage.type: filesystem on the Tail input:
inputs:
- name: tail
storage.type: filesystem # spills to disk when memory is full
# ... rest of input config
When storage.type: filesystem is set, the storage.max_chunks_up parameter controls how many chunks stay in memory. The rest spills to the path defined in storage.path.
Elasticsearch resource hardening
For production workloads, set requests equal to limits on the Elasticsearch pods. This gives them Guaranteed QoS class, which prevents kubelet eviction under memory pressure. Add a PodDisruptionBudget with maxUnavailable: 1 to prevent draining more than one ES node at a time.
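A sketch of that budget; the selector matches the elasticsearch.k8s.elastic.co/cluster-name label ECK applies to Elasticsearch pods. Note that ECK also manages a default budget of its own, which you can customise through the Elasticsearch resource's spec.podDisruptionBudget field instead:

```yaml
# elasticsearch-pdb.yaml (sketch)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-pdb
  namespace: logging
spec:
  maxUnavailable: 1 # never drain more than one ES node at a time
  selector:
    matchLabels:
      elasticsearch.k8s.elastic.co/cluster-name: elasticsearch
```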
Common troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| No logs in Kibana | Fluent Bit not reaching ES | Check kubectl logs daemonset/fluent-bit -n logging for failed to flush chunk errors. Verify TLS CA mount and credentials. |
| _type unknown error in Fluent Bit logs | ES 8.x removed mapping types | Add suppress_type_name: true to the ES output. |
| Logs missing after Fluent Bit restart | No position tracking | Add db: /var/log/flb_kube.db to the Tail input. |
| High memory on Fluent Bit pods | Infinite retries during ES outage | Set a finite retry_limit such as 5; the Helm chart default of false means retry forever, so buffered chunks accumulate. |
| No Kubernetes metadata on logs | RBAC missing | Verify the ClusterRole grants get, list, watch on pods and namespaces. |
| Kibana data view shows no results | Index pattern mismatch | The logstash_prefix in Fluent Bit must match the data view pattern. If prefix is kubernetes, the pattern is kubernetes-*. |
For deeper isolation, temporarily add a stdout output to the Fluent Bit config (name: stdout, match: kube.*). If logs appear in the Fluent Bit pod's own output but not in Elasticsearch, the problem is connectivity or authentication, not collection.
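In the YAML config format that is one extra entry under outputs, something like:

```yaml
    outputs:
      - name: stdout      # temporary debug output; remove when done
        match: kube.*
        format: json_lines
```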
What you learned
You deployed a complete centralised logging pipeline: Fluent Bit reads container logs from the node filesystem via the Tail input, enriches records with Kubernetes metadata (pod name, namespace, labels), ships them over TLS to Elasticsearch 8.x, and Kibana provides the query interface. You know why the db file matters for restart safety, why suppress_type_name is required for ES 8.x, and where to start hardening for production (ILM, filesystem buffering, resource guarantees).
Where to go next
- If you do not yet have metrics monitoring alongside logs, the Prometheus and kube-prometheus-stack tutorial covers setting up metric collection and alerting on the same cluster.
- For understanding why Fluent Bit pods (or any pod) might show CPU throttling despite low average utilisation, see Kubernetes CPU throttling: why pods stall at low utilisation.
- The Fluent Bit project also supports Loki as an output destination. If you are already running Grafana for metrics via kube-prometheus-stack, Loki plus Grafana can replace Elasticsearch plus Kibana with a lower storage footprint, at the cost of less powerful full-text search.
- If you also want distributed traces and a unified wire format alongside logs, the OpenTelemetry on Kubernetes tutorial covers deploying the OpenTelemetry Collector. Keeping Fluent Bit for logs and using OpenTelemetry only for traces and metrics is a valid split when your log volume is high.