Kubernetes cluster logging with Fluent Bit and the EFK stack

Container logs disappear the moment a pod is deleted. kubectl logs shows only the latest 10 MiB rotation file for a single pod. For anything beyond local debugging, you need a centralised logging pipeline. This tutorial walks through deploying Fluent Bit as a DaemonSet, shipping logs to Elasticsearch 8.x via TLS, and querying them in Kibana.

What you will learn

By the end of this tutorial you will have a working centralised logging pipeline: Fluent Bit collecting container logs from every node, enriching them with Kubernetes metadata, shipping them over TLS to Elasticsearch 8.x, and a Kibana data view ready for querying. You will understand why each configuration choice exists and what to change when your cluster grows.

Prerequisites

Before starting, make sure you have:

  • A running Kubernetes cluster, version 1.25 or later, using containerd as the container runtime. Managed clusters (GKE, EKS, AKS) and local clusters (kind, minikube) both work. This tutorial targets Fluent Bit 5.0 and Elasticsearch 8.17.
  • Helm 3 installed and configured to talk to your cluster.
  • kubectl configured with cluster-admin privileges. The ECK operator installs CRDs and cluster-wide resources.
  • Enough cluster capacity for the Elasticsearch StatefulSet: at least 3 nodes with 8 Gi memory available per ES pod. For a quick test on kind/minikube, a single-node ES setup works but is not production-grade.
  • If you already have Prometheus running, Fluent Bit exposes a /api/v1/metrics/prometheus endpoint you can scrape. Not required for this tutorial, but worth knowing.

How Kubernetes logging works (and where it breaks)

Containers write to stdout and stderr. The container runtime (containerd) captures that output and writes it to log files on the node at /var/log/pods/<namespace>_<pod>_<uid>/<container>/0.log. Symlinks at /var/log/containers/ provide a flat directory of all containers on the node.

The kubelet rotates these files. The defaults are 10 MiB per file (containerLogMaxSize) and 5 rotated files (containerLogMaxFiles). kubectl logs reads only the latest file, so it returns at most 10 MiB of the most recent output for one pod at a time.
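
Both limits are tunable through the kubelet configuration if you need more local headroom, though raising them only delays the problem rather than solving it. A KubeletConfiguration fragment (values are illustrative, not recommendations) looks like this:

# KubeletConfiguration fragment: per-container log rotation limits (illustrative values)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 50Mi
containerLogMaxFiles: 3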

That is fine for debugging a single pod in development. In production, it breaks down fast:

  • Pod deletion removes logs. When a pod is deleted or evicted, the kubelet removes its log files from the node. There is no post-mortem analysis possible.
  • No aggregation. A Deployment with 10 replicas means 10 separate kubectl logs calls. Correlation across services requires manual effort.
  • API server load. Every kubectl logs call routes through the API server to the kubelet. At scale (hundreds of pods, multiple teams), this becomes a bottleneck.
  • Volume. A 50-node cluster with 500 pods can generate 10-50 GB of logs per day. Manual retrieval is not realistic.

The Kubernetes documentation describes three cluster-level logging architectures. The most common is the node logging agent: a DaemonSet that reads container logs from the node filesystem and forwards them to a central store. That is the EFK pattern.

Why Fluent Bit, not Fluentd

Fluent Bit and Fluentd are both CNCF graduated projects under the Fluent umbrella, which causes confusion. They are distinct tools built for different roles.

                       Fluentd                                    Fluent Bit
Language               Ruby + C                                   Pure C
Memory footprint       ~40 MB base                                Sub-30 MB working memory
Runtime dependencies   Ruby runtime, gem management               Zero external dependencies
Plugin ecosystem       1,000+ Ruby gems                           ~100 built-in plugins
CNCF status            Graduated                                  Graduated
Design role            Central aggregation, complex transforms    Node-level collection agent

For a Kubernetes DaemonSet (one pod per node), the sub-30 MB footprint and zero-dependency binary make Fluent Bit the better fit. It handles log collection, Kubernetes metadata enrichment, and Elasticsearch forwarding in a single binary without needing Fluentd as an intermediary.

Deploy Elasticsearch and Kibana with ECK

The Elastic Cloud on Kubernetes (ECK) operator automates TLS certificate provisioning, rolling upgrades, and password management. Raw manifests work, but ECK removes a large surface area of manual certificate and lifecycle management.

Step 1: Install the ECK operator

helm repo add elastic https://helm.elastic.co
helm repo update
helm install elastic-operator elastic/eck-operator \
  -n elastic-system --create-namespace

Verify the operator is running:

kubectl get pods -n elastic-system

Expected output:

NAME                                READY   STATUS    RESTARTS   AGE
elastic-operator-<hash>             1/1     Running   0          45s

Step 2: Create the logging namespace and deploy Elasticsearch

# elasticsearch.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: logging
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: logging
spec:
  version: 8.17.0
  nodeSets:
    - name: default
      count: 3                         # minimum for production HA
      config:
        node.store.allow_mmap: false   # avoids raising vm.max_map_count on every node
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 4Gi
                  cpu: 500m
                limits:
                  memory: 8Gi
                  cpu: "2"
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms4g -Xmx4g" # heap = 50% of memory limit, identical min/max

kubectl apply -f elasticsearch.yaml

The JVM heap rule matters: set -Xms and -Xmx to the same value, at no more than 50% of the pod memory limit, and never above 31 GB; beyond that threshold the JVM can no longer use compressed object pointers, which wastes memory.
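
As a worked example under the same rule, a hypothetical node set with a 16 Gi memory limit would pair with an 8 g heap, still comfortably below the 31 GB ceiling:

# hypothetical podTemplate fragment for larger nodes; heap stays at half the limit
resources:
  limits:
    memory: 16Gi
env:
  - name: ES_JAVA_OPTS
    value: "-Xms8g -Xmx8g"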

Wait for the cluster to go green:

kubectl get elasticsearch -n logging

Expected output (may take 2-3 minutes):

NAME            HEALTH   NODES   VERSION   PHASE   AGE
elasticsearch   green    3       8.17.0    Ready   3m

Step 3: Deploy Kibana

# kibana.yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  namespace: logging
spec:
  version: 8.17.0
  count: 1
  elasticsearchRef:
    name: elasticsearch

kubectl apply -f kibana.yaml
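
As with Elasticsearch, the operator reports Kibana health; the HEALTH column should turn green once the pod is ready:

kubectl get kibana -n logging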

Step 4: Retrieve the Elasticsearch password

ECK generates a password for the elastic superuser and stores it in a Secret:

kubectl get secret elasticsearch-es-elastic-user -n logging \
  -o go-template='{{.data.elastic | base64decode}}'

Save this password. You will need it for both Fluent Bit configuration and Kibana login.
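
If you prefer not to copy it around manually, you can capture it in a shell variable for the later steps (optional):

ES_PASSWORD=$(kubectl get secret elasticsearch-es-elastic-user -n logging \
  -o go-template='{{.data.elastic | base64decode}}')
echo "$ES_PASSWORD"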

Checkpoint: Elasticsearch is green with 3 nodes, Kibana pod is Running, and you have the elastic password.

Deploy Fluent Bit as a DaemonSet

Step 5: Create the RBAC resources

Fluent Bit needs read access to pod and namespace metadata for the Kubernetes filter plugin:

# fluent-bit-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging

kubectl apply -f fluent-bit-rbac.yaml

Step 6: Create the Elasticsearch credentials Secret

Store the credentials in a Secret, not a ConfigMap: Secrets can be encrypted at rest and are typically guarded by stricter RBAC, while ConfigMaps are meant for non-sensitive configuration:

kubectl create secret generic elasticsearch-credentials \
  -n logging \
  --from-literal=ES_USERNAME=elastic \
  --from-literal=ES_PASSWORD=<paste-password-from-step-4>
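
A quick sanity check that the keys landed as expected (this decodes the username only):

kubectl get secret elasticsearch-credentials -n logging \
  -o jsonpath='{.data.ES_USERNAME}' | base64 --decode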

Step 7: Copy the ECK CA certificate

ECK stores the CA certificate in the elasticsearch-es-http-certs-public Secret. Fluent Bit runs in the same logging namespace in this tutorial, so it can mount that Secret directly. Secrets are namespace-scoped, though, so if your Fluent Bit pods run in a different namespace, copy the Secret across, stripping the server-managed metadata so kubectl apply accepts it in the new namespace (this variant uses jq):

kubectl get secret elasticsearch-es-http-certs-public -n logging -o json \
  | jq 'del(.metadata.namespace, .metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp)' \
  | kubectl apply -n <fluent-bit-namespace> -f -

Step 8: Create the Fluent Bit configuration

This ConfigMap uses the YAML configuration format available since Fluent Bit 3.2. YAML simplifies Kubernetes integration because parser definitions can live in the same file, eliminating separate parser ConfigMaps.

# fluent-bit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.yaml: |
    service:
      flush: 5                    # flush interval in seconds
      log_level: info
      http_server: true           # enables health and metrics endpoints
      http_listen: 0.0.0.0
      http_port: 2020
      health_check: true
      storage.metrics: true

    pipeline:
      inputs:
        - name: tail
          path: /var/log/containers/*.log
          multiline.parser: docker, cri  # handles both formats; containerd uses CRI
          tag: kube.*
          db: /var/log/flb_kube.db       # tracks file position across restarts
          mem_buf_limit: 5MB
          skip_long_lines: true
          refresh_interval: 10

      filters:
        - name: kubernetes
          match: kube.*
          kube_url: https://kubernetes.default.svc.cluster.local:443
          kube_ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          kube_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          merge_log: true          # parse JSON log fields and flatten into record
          keep_log: false          # drop raw log field after merge (saves storage)
          labels: true
          annotations: false       # annotations tend to be large; enable selectively

      outputs:
        - name: es
          match: kube.*
          host: elasticsearch-es-http.logging.svc.cluster.local
          port: 9200
          http_user: ${ES_USERNAME}
          http_passwd: ${ES_PASSWORD}
          tls: true
          tls.verify: true
          tls.ca_file: /etc/ssl/elasticsearch/ca.crt
          logstash_format: true           # date-stamped indices for ILM compatibility
          logstash_prefix: kubernetes     # index name becomes kubernetes-2026.04.09
          logstash_dateformat: "%Y.%m.%d"
          replace_dots: true              # dotted keys (e.g. Kubernetes labels) otherwise collide with ES object mappings
          suppress_type_name: true        # required for ES 8.x (mapping types removed)
          retry_limit: 5                  # finite retries prevent unbounded memory growth
          time_key: "@timestamp"
          trace_error: true               # logs ES error responses for debugging

kubectl apply -f fluent-bit-config.yaml

The db parameter on the Tail input is critical. Without it, Fluent Bit loses its read position on restart and either re-reads all logs (duplicates) or misses logs written during the restart window.

The multiline.parser: docker, cri setting lets Fluent Bit detect the log format automatically. Modern clusters using containerd produce CRI-format logs with a P/F flag indicating partial or full lines. Older Docker-based clusters use JSON-wrapped log lines.
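
For reference, a CRI-format line on disk looks roughly like this: an RFC 3339 timestamp, the stream name, a partial/full flag, and then the message.

2026-04-09T12:34:56.789012345Z stdout F {"level":"info","msg":"request handled"}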

Step 9: Deploy the DaemonSet

# fluent-bit-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule         # collect control-plane logs too
      containers:
        - name: fluent-bit
          image: cr.fluentbit.io/fluent/fluent-bit:5.0
          args: ["-c", "/fluent-bit/etc/fluent-bit.yaml"]   # load the mounted YAML config explicitly
          ports:
            - containerPort: 2020
              name: http
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 2020
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/v1/health
              port: 2020
            initialDelaySeconds: 10
            periodSeconds: 5
          envFrom:
            - secretRef:
                name: elasticsearch-credentials
          volumeMounts:
            - name: config
              mountPath: /fluent-bit/etc/fluent-bit.yaml
              subPath: fluent-bit.yaml
            - name: varlog
              mountPath: /var/log
              readOnly: false    # Fluent Bit writes its position db (flb_kube.db) under /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: elasticsearch-ca
              mountPath: /etc/ssl/elasticsearch
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: fluent-bit-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: elasticsearch-ca
          secret:
            secretName: elasticsearch-es-http-certs-public

kubectl apply -f fluent-bit-daemonset.yaml

Verify one pod per node:

kubectl get daemonset fluent-bit -n logging

Expected output:

NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluent-bit   3         3         3       3             3           <none>          30s

Check the Fluent Bit logs for errors:

kubectl logs -n logging daemonset/fluent-bit --tail=20

If you see [output:es:es.0] ...connected and no [error] lines, the pipeline is healthy.
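
You can also spot-check that the daily index exists by querying Elasticsearch directly. Port-forward the HTTP Service and hit the _cat API (substitute the elastic password from Step 4 if you did not export it as ES_PASSWORD earlier):

kubectl port-forward svc/elasticsearch-es-http 9200:9200 -n logging &
curl -sk -u "elastic:$ES_PASSWORD" "https://localhost:9200/_cat/indices/kubernetes-*?v"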

Checkpoint: Fluent Bit DaemonSet running on every node, no errors in logs, connected to Elasticsearch.

Verify logs reach Kibana

Step 10: Create a Kibana data view

Port-forward Kibana:

kubectl port-forward svc/kibana-kb-http 5601:5601 -n logging

Open https://localhost:5601 in your browser (accept the self-signed certificate). Log in with username elastic and the password from Step 4.

Navigate to Stack Management -> Data Views -> Create data view:

  • Index pattern: kubernetes-*
  • Timestamp field: @timestamp
  • Name: Kubernetes Logs

Save the data view. In Kibana 8.x, "Data Views" replaced the older "Index Patterns" concept.

Go to Discover, select the "Kubernetes Logs" data view, and you should see log records with Kubernetes metadata fields: kubernetes.pod_name, kubernetes.namespace_name, kubernetes.container_name, and kubernetes.labels.*.
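
From there, KQL filters work against those fields. For example, to narrow Discover to this pipeline's own logs:

kubernetes.namespace_name : "logging" and kubernetes.container_name : "fluent-bit"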

Checkpoint: Logs visible in Kibana Discover with Kubernetes metadata attached.

Harden for production

The setup so far works but lacks production durability. Three areas need attention.

Index lifecycle management

A 50-node cluster generating 50 GB/day accumulates 4.5 TB over 90 days. Without Index Lifecycle Management (ILM), storage grows until Elasticsearch runs out of disk.

A sensible default policy:

Phase    Age          Action
Hot      0-7 days     Active write index on fast storage. Rollover at 50 GB or 7 days.
Warm     7-30 days    Read-only, force-merge to 1 segment, compressed.
Cold     30-90 days   Frozen on the cheapest storage tier.
Delete   90+ days     Remove.

ILM slots in cleanly because Fluent Bit's logstash_format: true creates date-stamped indices (kubernetes-2026.04.09), so each day's index has a well-defined age that the warm, cold, and delete phases can act on.
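
A minimal sketch of an age-based policy along those lines, entered in Kibana Dev Tools. The policy name and exact timings are illustrative, attaching it to the kubernetes-* indices via an index template is a separate step, and the cold phase here only lowers recovery priority because true frozen-tier searchable snapshots need a snapshot repository:

PUT _ilm/policy/kubernetes-logs
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "7d",
        "actions": {
          "readonly": {},
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}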

Filesystem buffering

If Elasticsearch goes down, Fluent Bit's in-memory buffer fills. Without filesystem buffering, it either drops logs or applies backpressure to the Tail input, which stops log reads entirely.

Add to the service section of the ConfigMap:

service:
  storage.path: /var/log/flb-storage/
  storage.sync: normal
  storage.checksum: false
  storage.max_chunks_up: 128
  storage.backlog.mem_limit: 5M

And set storage.type: filesystem on the Tail input:

inputs:
  - name: tail
    storage.type: filesystem    # spills to disk when memory is full
    # ... rest of input config

When storage.type: filesystem is set, the storage.max_chunks_up parameter controls how many chunks stay in memory. The rest spills to the path defined in storage.path.
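
Because storage.path points under /var/log, buffered chunks land on the node through the existing varlog hostPath mount and survive pod restarts. If you would rather keep the buffer separate, a dedicated hostPath volume works just as well; a minimal sketch with illustrative names and paths:

# DaemonSet fragment: dedicated hostPath for the Fluent Bit buffer (illustrative)
volumeMounts:
  - name: flb-storage
    mountPath: /var/log/flb-storage
volumes:
  - name: flb-storage
    hostPath:
      path: /var/lib/fluent-bit/storage
      type: DirectoryOrCreate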

Elasticsearch resource hardening

For production workloads, set requests equal to limits on the Elasticsearch pods. This gives them Guaranteed QoS class, which prevents kubelet eviction under memory pressure. Add a PodDisruptionBudget with maxUnavailable: 1 to prevent draining more than one ES node at a time.
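
A minimal sketch of that PodDisruptionBudget, selecting the Elasticsearch pods via the cluster-name label ECK puts on them (check kubectl get pdb -n logging first, since the operator may already manage a default budget):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-pdb
  namespace: logging
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      elasticsearch.k8s.elastic.co/cluster-name: elasticsearch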

Common troubleshooting

  • No logs in Kibana. Likely cause: Fluent Bit is not reaching Elasticsearch. Fix: check kubectl logs daemonset/fluent-bit -n logging for failed to flush chunk errors, and verify the TLS CA mount and credentials.
  • _type unknown errors in the Fluent Bit logs. Likely cause: ES 8.x removed mapping types. Fix: add suppress_type_name: true to the es output.
  • Logs missing after a Fluent Bit restart. Likely cause: no position tracking. Fix: add db: /var/log/flb_kube.db to the Tail input.
  • High memory on Fluent Bit pods. Likely cause: infinite retries during an ES outage. Fix: set retry_limit: 5 instead of false (the Helm chart default).
  • No Kubernetes metadata on logs. Likely cause: missing RBAC. Fix: verify the ClusterRole grants get, list, and watch on pods and namespaces.
  • Kibana data view shows no results. Likely cause: index pattern mismatch. Fix: the logstash_prefix in Fluent Bit must match the data view pattern; if the prefix is kubernetes, the pattern is kubernetes-*.

For deeper isolation, temporarily add a stdout output to the Fluent Bit config (name: stdout, match: kube.*). If logs appear in the Fluent Bit pod's own output but not in Elasticsearch, the problem is connectivity or authentication, not collection.
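
That temporary output sits alongside the es output in the ConfigMap; remove it once you are done, because echoing every record is noisy:

outputs:
  - name: stdout
    match: kube.*
    format: json_lines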

What you learned

You deployed a complete centralised logging pipeline: Fluent Bit reads container logs from the node filesystem via the Tail input, enriches records with Kubernetes metadata (pod name, namespace, labels), ships them over TLS to Elasticsearch 8.x, and Kibana provides the query interface. You know why the db file matters for restart safety, why suppress_type_name is required for ES 8.x, and where to start hardening for production (ILM, filesystem buffering, resource guarantees).

Where to go next

  • If you do not yet have metrics monitoring alongside logs, the Prometheus and kube-prometheus-stack tutorial covers setting up metric collection and alerting on the same cluster.
  • For understanding why Fluent Bit pods (or any pod) might show CPU throttling despite low average utilisation, see Kubernetes CPU throttling: why pods stall at low utilisation.
  • The Fluent Bit project also supports Loki as an output destination. If you are already running Grafana for metrics via kube-prometheus-stack, Loki plus Grafana can replace Elasticsearch plus Kibana with a lower storage footprint, at the cost of less powerful full-text search.
  • If you also want distributed traces and a unified wire format alongside logs, the OpenTelemetry on Kubernetes tutorial covers deploying the OpenTelemetry Collector. Keeping Fluent Bit for logs and using OpenTelemetry only for traces and metrics is a valid split when your log volume is high.
