Kubernetes monitoring with Prometheus and kube-prometheus-stack

A production Kubernetes cluster without observability is a cluster you are guessing about. This tutorial walks through installing kube-prometheus-stack via Helm, understanding what each component does, scraping your own application metrics with ServiceMonitor, writing alerting rules, routing alerts, and knowing when to add remote storage.

What you will learn

By the end of this tutorial you will have a working Prometheus, Alertmanager, and Grafana stack running on your cluster. You will know how to scrape metrics from your own applications, write alerting rules that fire when things go wrong, route those alerts to Slack or PagerDuty, and understand when local Prometheus storage is no longer enough.

Prerequisites

Before starting, make sure you have:

  • A running Kubernetes cluster, version 1.25 or later. Anything from a local kind/minikube cluster to a managed GKE/EKS/AKS cluster works. This tutorial was written against chart version 83.4.0, which bundles Prometheus Operator v0.90.1 and Prometheus 3.x.
  • Helm 3 installed and configured to talk to your cluster.
  • kubectl configured with cluster-admin privileges (the chart installs CRDs and cluster-wide resources).
  • Familiarity with Kubernetes Deployments, Services, and namespaces. If you need to brush up on Services, the Kubernetes Services guide covers the Service types relevant for metric endpoints.

Install kube-prometheus-stack with Helm

The kube-prometheus-stack Helm chart is the dominant installation path for Prometheus on Kubernetes. It bundles Prometheus Operator, Prometheus server, Alertmanager, Grafana, kube-state-metrics, and node-exporter into a single chart.

A distinction worth knowing: the name kube-prometheus (without "stack") refers to a separate jsonnet-based project under the Prometheus Operator GitHub organization. The Helm chart is a community-maintained artifact under prometheus-community. This tutorial covers the Helm chart.

Minimal installation

Two commands get you a running stack:

# Add the prometheus-community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install into a dedicated monitoring namespace
helm install kube-prom-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

This installs everything with default settings: a single Prometheus replica, a single Alertmanager replica, Grafana with an ephemeral SQLite database, and no persistent storage. Good enough to explore. Not good enough for production.

Production values

For a cluster you care about, create a values.yaml that addresses the three gaps the defaults leave open: persistence, high availability, and resource limits.

# values.yaml for kube-prometheus-stack 83.4.0
prometheus:
  prometheusSpec:
    replicas: 2                    # HA: two Prometheus replicas
    podAntiAffinity: "hard"        # spread across nodes
    retention: "15d"               # chart default is 10d; set it explicitly
    externalLabels:
      cluster: prod-eu1            # identifies this cluster in multi-cluster setups
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi        # size depends on series count; see storage section
    resources:
      requests:
        memory: 2Gi
        cpu: "1"
      limits:
        memory: 4Gi
        cpu: "2"

alertmanager:
  alertmanagerSpec:
    replicas: 3                    # HA: three replicas for quorum
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: "standard"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

grafana:
  enabled: true
  persistence:
    enabled: true
    storageClassName: "standard"
    size: 10Gi
  # Do not commit plaintext passwords. Use an existing Secret instead:
  # admin:
  #   existingSecret: grafana-admin-credentials
  #   userKey: admin-user
  #   passwordKey: admin-password

Install with the values file:

helm install kube-prom-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --values values.yaml

Checkpoint. After a minute or two, verify the pods are running:

kubectl get pods -n monitoring

You should see pods for prometheus-kube-prom-stack-*, alertmanager-kube-prom-stack-*, kube-prom-stack-grafana-*, kube-prom-stack-kube-state-metrics-*, and kube-prom-stack-prometheus-node-exporter-* in Running state. If any are stuck in Pending, check whether your cluster has a StorageClass named standard and enough node capacity. The resource requests and limits guide explains how requests affect scheduling.

What the stack deploys

The chart installs six components. Understanding what each one does prevents a lot of confusion later.

Prometheus Operator (v0.90.1). A controller that watches Custom Resource Definitions (CRDs) like ServiceMonitor, PodMonitor, and PrometheusRule. When you create a ServiceMonitor, the Operator automatically generates the corresponding Prometheus scrape configuration. You never edit scrape_configs by hand.

Prometheus (3.x). The time-series database and scraper. It pulls metrics from /metrics endpoints at a configurable scrape interval (the Operator defaults to 30 seconds), stores them in a local TSDB, and evaluates alerting rules. Queries use PromQL.

Alertmanager. Receives alerts from Prometheus, deduplicates them, groups them, and routes them to notification channels (Slack, PagerDuty, email, webhooks). Runs its own gossip protocol on port 9094 for state sync between replicas.

Grafana (11.x). Dashboards and visualization. The chart ships with pre-built dashboards for cluster overview, node health, pod resources, and namespace workloads.

kube-state-metrics (v2.18.0). A single pod that watches the Kubernetes API server and exposes the state of Kubernetes objects as Prometheus metrics. Deployment replica counts, pod phases, job completion status, node conditions. It does not measure resource usage; it measures object state.

node-exporter. A DaemonSet (one pod per node) that exposes host-level hardware and OS metrics: CPU utilization, memory usage, disk I/O, filesystem capacity, network throughput. Metrics use the node_ prefix and are scraped on port 9100.
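To see node-exporter data in action, a standard PromQL query computes per-node CPU utilization from its counters (run it in the Prometheus UI's query box):

```promql
# Per-node CPU utilization in percent: 100 minus the idle-time rate.
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```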

What about metrics-server?

A common misconception: metrics-server and Prometheus serve the same purpose. They do not. metrics-server provides ephemeral, in-memory CPU and memory snapshots consumed by kubectl top, the Horizontal Pod Autoscaler (HPA), and the Vertical Pod Autoscaler (VPA). It has no query language, no alerting, and no persistent storage. Prometheus provides the full observability stack. A production cluster needs both.

| Aspect | metrics-server | kube-state-metrics | node-exporter |
| --- | --- | --- | --- |
| Data source | kubelet Summary API | Kubernetes API server | Linux kernel / procfs |
| What it measures | Live CPU + memory utilization | Kubernetes object state | Host OS hardware metrics |
| Storage | In-memory, ephemeral | In-memory (Prometheus stores the scraped data) | None (Prometheus stores the scraped data) |
| Primary consumer | HPA, VPA, kubectl top | Prometheus | Prometheus |

Access the dashboards

With the stack running, use kubectl port-forward to reach the UIs from your workstation.

Grafana

kubectl port-forward -n monitoring svc/kube-prom-stack-grafana 3000:80

Open http://localhost:3000. The default credentials are admin / prom-operator. Change the password immediately on any non-throwaway cluster, or better, mount credentials from a Kubernetes Secret.

The chart ships with dozens of pre-built dashboards. Start with "Kubernetes / Compute Resources / Cluster" for a top-level view.

Prometheus UI

kubectl port-forward -n monitoring svc/kube-prom-stack-kube-prom-prometheus 9090:9090

Open http://localhost:9090. The Status > Targets page shows every scrape target and its health. This is the first place to look when a metric seems missing.

Alertmanager

kubectl port-forward -n monitoring svc/kube-prom-stack-kube-prom-alertmanager 9093:9093

Open http://localhost:9093. Shows active alerts, silences, and routing tree.

Checkpoint. In the Prometheus UI, go to Status > Targets. You should see targets for serviceMonitor/monitoring/kube-prom-stack-* entries, all showing state "UP". If any target shows "DOWN", note the error. The most common cause is a network policy or firewall blocking scrape traffic.

Monitor your own applications with ServiceMonitor

Prometheus does not automatically scrape your application workloads. Each application needs a ServiceMonitor (or PodMonitor) CRD to tell the Operator what to scrape.

Step 1: expose a metrics endpoint

Your application needs to serve Prometheus-format metrics on an HTTP path (typically /metrics). Most languages have client libraries: prometheus/client_golang, prometheus/client_python, prometheus/client_java.
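To make this concrete, here is a minimal sketch of an instrumented application using prometheus_client (the Python library mentioned above). The metric and function names are illustrative, not from the order-api used later:

```python
# Minimal sketch: expose Prometheus-format metrics from a Python app.
# Metric names and the handle_request helper are illustrative.
from prometheus_client import Counter, Histogram, generate_latest, start_http_server

REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["status"])
LATENCY = Histogram("http_request_duration_seconds", "Request latency in seconds")

@LATENCY.time()                      # records duration of each call
def handle_request(status: str = "200") -> None:
    REQUESTS.labels(status=status).inc()

def serve_metrics(port: int = 9090) -> None:
    # Serves /metrics on the given port in a background daemon thread.
    start_http_server(port)
```

Calling `serve_metrics()` at startup is all the HTTP plumbing you need; the client library renders the exposition format for you via `generate_latest()`.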

Step 2: create a Service with a named port

The ServiceMonitor references a named port on a Kubernetes Service. Make sure your Service exposes the metrics port:

apiVersion: v1
kind: Service
metadata:
  name: order-api
  namespace: production
  labels:
    app: order-api
spec:
  selector:
    app: order-api
  ports:
    - name: metrics         # ServiceMonitor references this name
      port: 9090
      targetPort: 9090
    - name: http
      port: 8080
      targetPort: 8080

Step 3: create the ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: order-api
  namespace: production
  labels:
    release: kube-prom-stack    # must match Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: order-api
  endpoints:
    - port: metrics             # matches the named port on the Service
      interval: 30s
      path: /metrics

Apply it:

kubectl apply -f servicemonitor-order-api.yaml

Step 4: verify the target appears

In the Prometheus UI (Status > Targets), a new target group should appear within two minutes. If it does not, check the label selector (see "Common gotchas" below).

When to use PodMonitor instead

PodMonitor scrapes pods directly, bypassing the Service layer. Use it for workloads that do not have a Kubernetes Service: CronJobs, batch Jobs, DaemonSet sidecars, or one-off pods. For anything with a Service, ServiceMonitor is the standard choice.
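A PodMonitor looks nearly identical to a ServiceMonitor, except it selects pods and references a named container port. A sketch for a hypothetical CronJob workload (the name report-generator is illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: report-generator          # hypothetical CronJob workload
  namespace: production
  labels:
    release: kube-prom-stack      # must match Prometheus podMonitorSelector
spec:
  selector:
    matchLabels:
      app: report-generator
  podMetricsEndpoints:
    - port: metrics               # named container port, not a Service port
      interval: 30s
      path: /metrics
```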

Create alerting rules with PrometheusRule

The PrometheusRule CRD defines alerting and recording rules as Kubernetes resources. The Operator loads them into Prometheus automatically.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: order-api-alerts
  namespace: monitoring
  labels:
    release: kube-prom-stack     # must match Prometheus ruleSelector
spec:
  groups:
    - name: order-api.rules
      rules:
        - alert: OrderApiDown
          expr: up{job="order-api"} == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "order-api instance {{ $labels.instance }} is unreachable"

        - alert: OrderApiHighErrorRate
          expr: |
            (
              sum(rate(http_requests_total{job="order-api", status=~"5.."}[5m]))
              /
              sum(rate(http_requests_total{job="order-api"}[5m]))
            ) > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "order-api 5xx error rate above 5% for 5 minutes"

        - alert: OrderApiHighMemory
          expr: |
            (container_memory_working_set_bytes{container="order-api"}
            / container_spec_memory_limit_bytes{container="order-api"}) > 0.9
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "order-api pod {{ $labels.pod }} memory above 90% of limit"

Apply it and check the Prometheus UI under Status > Rules. The rules should appear within a minute.

The for field is important. It defines how long the condition must be true before the alert fires. Without for, a single bad scrape triggers an alert. With for: 5m, the condition must persist for five consecutive minutes.
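The same CRD also carries recording rules, which precompute expensive expressions so dashboards and alerts can query the result cheaply. A minimal sketch reusing the hypothetical order-api metric from above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: order-api-recording
  namespace: monitoring
  labels:
    release: kube-prom-stack
spec:
  groups:
    - name: order-api.recording
      rules:
        # Convention: level:metric:operations
        - record: job:http_requests:rate5m
          expr: sum by (job) (rate(http_requests_total{job="order-api"}[5m]))
```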

Route alerts with Alertmanager

Alertmanager receives fired alerts from Prometheus and routes them to notification channels. You can configure routing with a Kubernetes Secret containing a raw Alertmanager config, or with the AlertmanagerConfig CRD.

The Secret approach is simpler for a single team. Create an alertmanager.yaml:

global:
  resolve_timeout: 5m

route:
  receiver: "slack-default"
  group_by: ["alertname", "namespace"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:                  # "matchers" supersedes the deprecated "match"
        - severity = "critical"
      receiver: "pagerduty-critical"
    - matchers:
        - severity = "warning"
      receiver: "slack-warnings"

receivers:
  - name: "slack-default"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T00/B00/XXXX"
        channel: "#alerts"

  - name: "slack-warnings"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T00/B00/XXXX"
        channel: "#alerts-warnings"

  - name: "pagerduty-critical"
    pagerduty_configs:
      - service_key: "your-pagerduty-integration-key"

Store it as a Kubernetes Secret (Alertmanager expects the key alertmanager.yaml):

kubectl create secret generic alertmanager-config \
  --namespace monitoring \
  --from-file=alertmanager.yaml=alertmanager.yaml

Then reference it in your Helm values:

alertmanager:
  alertmanagerSpec:
    configSecret: alertmanager-config   # alertmanagerConfiguration is for the AlertmanagerConfig CRD, not Secrets

Upgrade the release:

helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --values values.yaml

Checkpoint. Open the Alertmanager UI at http://localhost:9093. Navigate to Status > Config. Your routing tree should be visible. Trigger a test alert by temporarily lowering a threshold in your PrometheusRule and verify the notification arrives.

When local Prometheus is not enough

Prometheus stores data in a local TSDB with a default retention of 15 days. The TSDB handles up to roughly 10 million active time series before memory and query latency degrade. For a 100-pod cluster generating around 10,000 series, 15 days of retention fits comfortably in 50 Gi of disk. The formula: retention_seconds x ingested_samples_per_second x bytes_per_sample (Prometheus averages 1 to 2 bytes per sample after compression).
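That formula is easy to sanity-check with shell arithmetic. The numbers below are the illustrative figures from the paragraph above, not measurements from a real cluster:

```shell
# Rough TSDB disk estimate: retention_seconds * samples_per_second * bytes_per_sample
series=10000          # active time series (illustrative)
scrape_interval=15    # seconds between scrapes
retention_days=15
bytes_per_sample=2    # upper end of the 1-2 bytes/sample after compression

samples_per_second=$((series / scrape_interval))
retention_seconds=$((retention_days * 24 * 3600))
disk_bytes=$((retention_seconds * samples_per_second * bytes_per_sample))
disk_mib=$((disk_bytes / 1024 / 1024))

echo "~${disk_mib} MiB of TSDB data"   # well under the 50Gi requested earlier
```

The estimate comes out around 1.6 GiB, which is why 50 Gi leaves generous headroom for series growth, the WAL, and compaction overhead.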

Beyond roughly two weeks of retention or 100 GB of data, operational burden rises sharply. That is when you add remote storage.

| Solution | Architecture | Best for |
| --- | --- | --- |
| Thanos | Sidecar uploads 2-hour TSDB blocks to object storage; Store Gateway serves historical queries | Smoothest migration from existing Prometheus; multi-cluster federation |
| Grafana Mimir | Centralized remote-write; Grafana's successor to Cortex | Enterprise multi-tenancy, high cardinality at scale |
| VictoriaMetrics | Drop-in Prometheus-compatible remote-write target | Best performance-to-simplicity ratio for most organizations |

Thanos is the most common first step because it attaches as a sidecar to your existing Prometheus pods with roughly 10% CPU overhead. The sidecar uploads completed TSDB blocks every two hours to an S3-compatible bucket. Once uploaded, you can reduce local retention.

One breaking change in Prometheus 3.x to be aware of: enable_http2 in remote_write now defaults to false. If you relied on HTTP/2 for remote write in Prometheus 2.x, you need to set enable_http2: true explicitly after upgrading.
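The fix is a one-line addition per remote-write endpoint. As a raw prometheus.yml fragment (the URL is illustrative):

```yaml
remote_write:
  - url: "https://metrics.example.com/api/v1/write"   # illustrative endpoint
    enable_http2: true   # Prometheus 3.x defaults this to false
```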

Common gotchas

ServiceMonitor label mismatch. By default, the Prometheus instance installed by kube-prometheus-stack only discovers ServiceMonitors with a release: <release-name> label. If your ServiceMonitor is missing that label or has the wrong value, Prometheus silently ignores it. Check Status > Targets in the Prometheus UI. To allow discovery regardless of labels, add this to your values.yaml:

prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

CRDs not upgraded by Helm. helm upgrade does not update Custom Resource Definitions. Before major chart version upgrades, manually apply the new CRD manifests from the chart repository.

Missing persistence. Without a PersistentVolumeClaim, Prometheus data is lost on every pod restart. Always set storageSpec.volumeClaimTemplate in production.

GKE private cluster firewall. On private GKE clusters, the control plane firewall blocks the Prometheus Operator admission webhook on port 8443. Either open the firewall rule or set prometheusOperator.admissionWebhooks.enabled: false in values.

kube-proxy metrics unreachable. kube-proxy's default bind address is 127.0.0.1:10249, which is unreachable from Prometheus. To collect kube-proxy metrics, change the bind address to 0.0.0.0:10249 in the kube-system ConfigMap.

High-cardinality metrics. Unique label combinations (high cardinality) increase memory usage and slow down PromQL queries. Monitor the prometheus_tsdb_head_series metric. If it climbs toward 10 million, review your metrics for unbounded labels like user IDs or request paths.
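Two PromQL queries help track this down; both can be run directly in the Prometheus UI:

```promql
# Current number of active series in the TSDB head block
prometheus_tsdb_head_series

# Top 10 metric names by series count -- find unbounded-label offenders
topk(10, count by (__name__) ({__name__=~".+"}))
```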

What you learned

This tutorial covered the full path from an empty cluster to a working observability stack:

  • kube-prometheus-stack installs Prometheus, Alertmanager, Grafana, kube-state-metrics, and node-exporter in one Helm release. The Prometheus Operator uses CRDs to manage configuration as Kubernetes resources.
  • Application metrics are not scraped automatically. Each application needs a ServiceMonitor (or PodMonitor) with matching labels.
  • Alerting rules are defined as PrometheusRule resources. Alertmanager handles routing, deduplication, and notification.
  • Local Prometheus TSDB is designed for short-term retention (default 15 days). For longer history or multi-cluster views, add Thanos, Mimir, or VictoriaMetrics as a remote storage backend.

Where to go next

  • The resource requests and limits guide explains how to size your Prometheus pods and how the metrics exposed by node-exporter and kube-state-metrics relate to scheduling and eviction.
  • The health probes guide covers how to configure liveness and readiness probes, which are a natural complement to metric-based alerting.

