CoreDNS in Kubernetes: architecture and configuration

CoreDNS is the DNS server behind every service-to-service call in a Kubernetes cluster. It resolves cluster-internal names from an in-memory API watch cache and forwards everything else to upstream resolvers. This article explains how the Corefile drives configuration, how plugins execute, what DNS records Kubernetes services produce, and how to tune CoreDNS for performance at scale.

What CoreDNS does in a Kubernetes cluster

CoreDNS replaced kube-dns as the default cluster DNS server starting with Kubernetes 1.11. It runs as a Deployment in kube-system, typically with 2 replicas on separate nodes for availability. A Service named kube-dns (kept for backward compatibility) exposes it behind a stable ClusterIP, commonly 10.96.0.10, though the exact address depends on your service CIDR.

Every pod's /etc/resolv.conf is populated by the kubelet at startup:

nameserver 10.96.0.10
search <namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

All DNS queries from pods flow through CoreDNS. Its responsibilities: resolving Kubernetes service names to cluster IPs, serving reverse DNS for cluster resources, and forwarding non-cluster queries to upstream resolvers.

CoreDNS stays synchronized with the cluster by watching the API server for Service, EndpointSlice, Namespace, and (optionally) Pod objects. When a new Service is created, CoreDNS sees it within seconds through these watches and begins serving DNS records for it without any restart.

How a DNS query flows

Pod (/etc/resolv.conf → nameserver 10.96.0.10)
  └─ iptables DNAT (kube-proxy)
       └─ CoreDNS pod
            ├── cluster.local? → kubernetes plugin (API watch cache)
            └── external?      → forward plugin → upstream resolver

The kube-dns Service's ClusterIP is DNATed by kube-proxy rules to one of the CoreDNS pod IPs. CoreDNS checks the query against its zone configuration. Cluster-internal names are answered from the in-memory cache built by the kubernetes plugin. Everything else passes to the forward plugin.
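
To watch both paths from inside the cluster, you can run a throwaway client pod and resolve one name from each category. The image and pod names below are only examples; any image that ships nslookup or dig will do.

# Cluster-internal name: answered by the kubernetes plugin from its watch cache
kubectl run dns-test-1 --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local

# External name: handed to the forward plugin and resolved upstream
kubectl run dns-test-2 --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup example.com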

Corefile anatomy

The Corefile is CoreDNS's configuration file. In Kubernetes, it lives in the coredns ConfigMap in kube-system:

kubectl get configmap coredns -n kube-system -o yaml

Server blocks and zones

The Corefile uses a server-block model. Each block declares one or more DNS zones, an optional port, and a set of plugin directives inside braces:

[SCHEME://]ZONE [:PORT] {
    PLUGIN [ARGUMENTS]
}

The default Kubernetes Corefile:

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

The .:53 block handles all zones (. is the DNS root) on port 53. Multiple server blocks are allowed; when a query arrives, the block with the longest matching zone wins.

Two features worth knowing: the reload plugin checks the SHA512 checksum of the mounted Corefile roughly every 30 seconds and hot-reloads on changes (no pod restart required, though ConfigMap propagation into the pod plus the reload interval can add up to a couple of minutes). And {$ENV_VAR} syntax allows environment variable substitution at parse time.
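
One quick way to confirm a hot reload took effect, assuming the standard k8s-app=kube-dns label most distributions put on the CoreDNS pods:

# Edit the Corefile, then check the CoreDNS logs for the reload messages
kubectl -n kube-system edit configmap coredns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50 | grep -i reload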

Plugin chain execution order

This is the single most important thing to understand about the Corefile. The order of plugin names in the Corefile does not determine their execution order. CoreDNS has a compile-time plugin.cfg that defines a fixed processing sequence. You can write cache before forward or after it; the execution order is identical.

For the default Kubernetes build, the relevant sequence is:

  1. errors (captures error responses)
  2. log (request logging, if enabled)
  3. loadbalance (shuffles A/AAAA records in responses on their way back out)
  4. cache (answers from cache, or stores responses returning up the chain)
  5. rewrite (transforms queries)
  6. autopath (search-path optimization, if enabled)
  7. hosts (static host-table lookups)
  8. kubernetes (authoritative for cluster zones)
  9. loop (forwarding loop detection)
  10. forward (upstream proxy)

Each plugin either answers the query (ending the chain) or hands it to the next plugin; plugins that sit early in the chain, such as loadbalance and cache, also act on responses as they travel back up. The fallthrough directive is a narrower mechanism: an authoritative plugin like kubernetes uses it to pass names it owns but cannot answer on to later plugins instead of returning NXDOMAIN. If no plugin handles the query at all, CoreDNS returns SERVFAIL.
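
To see exactly which plugins your build was compiled with, the binary accepts a -plugins flag. This sketch assumes the standard image layout, where the binary lives at /coredns:

# List the plugins compiled into the running CoreDNS binary
kubectl -n kube-system exec deploy/coredns -- /coredns -plugins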

Key plugins

Plugin        What it does
kubernetes    Authoritative for cluster.local, in-addr.arpa, ip6.arpa. Watches the API server for Services, EndpointSlices, Pods.
forward       Proxies non-cluster queries to upstream resolvers. Supports UDP, TCP, and DNS-over-TLS. Up to 15 upstreams.
cache         In-memory response cache. Default capacity: 9984 entries. Separate TTLs for successful and NXDOMAIN responses.
errors        Logs error responses to stdout.
health        HTTP /health on port 8080. The lameduck 5s directive delays shutdown to let in-flight queries finish.
ready         HTTP /ready on port 8181. Returns 200 only when all plugins report ready. Used by the readiness probe.
prometheus    Exposes Prometheus metrics on port 9153 (/metrics). Enabled by default.
loop          Detects DNS forwarding loops (e.g., CoreDNS → systemd-resolved → back to CoreDNS) and halts the process.
reload        Hot-reloads the Corefile when the ConfigMap changes.
loadbalance   Randomizes record order in A/AAAA/MX responses for round-robin distribution.

Service DNS records

CoreDNS generates different record types depending on the Service configuration. The Kubernetes DNS specification defines these patterns.

Normal services (ClusterIP)

A query for my-api.production.svc.cluster.local returns a single A record pointing to the Service's ClusterIP. The client talks to the VIP; kube-proxy handles the pod-level routing.
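
From any pod with dig installed, the lookup returns exactly one A record holding the ClusterIP (my-api and production are the placeholder names used throughout this article):

# Run inside a pod with dig available; expect a single A record with the ClusterIP
dig +short my-api.production.svc.cluster.local A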

Headless services

A headless Service (clusterIP: None) has no VIP. The same DNS name returns multiple A records, one per Ready pod IP. Clients receive all pod IPs and must handle selection themselves.

For StatefulSets, each pod gets an individual record:

postgres-0.postgres.db.svc.cluster.local → pod-0's IP
postgres-1.postgres.db.svc.cluster.local → pod-1's IP
postgres.db.svc.cluster.local            → all pod IPs

The pattern is <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>. These per-pod records only exist when a headless Service matches the pod's spec.subdomain.
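
Using the example names above, the difference is easy to see with dig from inside the cluster:

# All Ready pod IPs behind the headless Service
dig +short postgres.db.svc.cluster.local A

# The stable per-pod record for the first StatefulSet replica
dig +short postgres-0.postgres.db.svc.cluster.local A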

SRV records

Created for named ports on both normal and headless Services:

_http._tcp.my-api.production.svc.cluster.local → 8080 my-api.production.svc.cluster.local
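
Queried explicitly, an SRV answer carries priority, weight, port, and target; the exact priority and weight values vary, so treat the sample output line as illustrative:

# Run inside a pod with dig available
dig +short _http._tcp.my-api.production.svc.cluster.local SRV
# e.g. 0 100 8080 my-api.production.svc.cluster.local.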

ExternalName services

A Service with type: ExternalName produces a CNAME record:

legacy-db.production.svc.cluster.local CNAME rds.us-east-1.amazonaws.com

No VIP, no kube-proxy rules. The client resolves the CNAME through normal DNS recursion.
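
A lookup from inside the cluster shows the CNAME plus whatever the external name resolves to (legacy-db and the RDS hostname are the placeholders from the example above):

# The answer section contains the CNAME and the records resolved behind it
dig legacy-db.production.svc.cluster.local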

Custom DNS configuration

Stub domains

A stub domain routes queries for a specific DNS suffix to a dedicated nameserver instead of the upstream. In CoreDNS, this is an additional server block in the Corefile. Edit the ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
    consul.local:53 {
        errors
        cache 30
        forward . 10.150.0.1
    }
    internal.corp:53 {
        errors
        cache 30
        forward . 192.168.1.53
    }

One limitation: the forward plugin requires IP addresses for upstream nameservers. FQDNs like ns.corp.example.com do not work.
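
Once the ConfigMap change propagates and CoreDNS reloads, a name under the stub domain should be answered by the dedicated nameserver. The hostname below is hypothetical; use a record that actually exists on that server:

# Run inside a pod with dig available; this query is forwarded to 10.150.0.1
dig db01.consul.local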

Custom upstream resolvers

Replace the default forward . /etc/resolv.conf with explicit IPs to decouple CoreDNS from node-level resolver configuration:

forward . 8.8.8.8 8.8.4.4 {
    max_fails 3
    health_check 5s
    policy random
}

For DNS-over-TLS forwarding:

forward . tls://9.9.9.9 tls://149.112.112.112 {
    tls_servername dns.quad9.net
    health_check 5s
    force_tcp
}

Static host overrides

The hosts plugin maps fixed IPs to hostnames, useful for on-premises databases or legacy services that do not have DNS records:

hosts {
    10.0.1.100 mydb.internal.example.com
    192.168.5.5 legacy-api.corp.local
    fallthrough
}

The fallthrough directive passes unmatched names to the next plugin.
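
After CoreDNS reloads with the block above, a lookup for one of the static names returns the fixed IP without ever reaching the upstream:

# Run inside a pod with dig available; expect 10.0.1.100 from the hosts plugin
dig +short mydb.internal.example.com A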

NodeLocal DNSCache

NodeLocal DNSCache is a Kubernetes add-on (DaemonSet) that runs a CoreDNS cache on every node. GA since Kubernetes 1.18.

The problem it solves

Without NodeLocal DNSCache, pod DNS queries travel through iptables DNAT to a CoreDNS pod that may be on a different node. Under high DNS load, this causes three problems:

  1. Conntrack table pressure. UDP entries lack connection-close events and must time out (30 seconds by default). At high query rates, the conntrack table fills and packets drop.
  2. DNAT races. Simultaneous DNS queries with the same connection tuple race in the conntrack table, a known Linux kernel issue that causes intermittent 5-second timeouts.
  3. Cross-node latency. Every cache miss requires a network round trip to a CoreDNS pod on another node.

For diagnosing conntrack-related DNS failures, see Kubernetes DNS troubleshooting.

How it works

Without NodeLocal DNSCache:
  Pod → iptables DNAT → kube-dns VIP → CoreDNS pod (possibly another node)

With NodeLocal DNSCache:
  Pod → 169.254.20.10 (link-local, node-local) → node-local-dns DaemonSet pod
          ├── cluster.local cache miss → TCP → kube-dns VIP → CoreDNS
          └── external cache miss     → upstream nameservers

The DaemonSet's init container adds a link-local IP (169.254.20.10 by default) to the node's lo interface. This non-routable address ensures pods reach the local cache without crossing the network. Cache misses for cluster.local are forwarded over TCP (not UDP), which produces explicit conntrack close events and eliminates the conntrack race condition entirely.

In IPVS kube-proxy mode, the kubelet must be configured with --cluster-dns=169.254.20.10 so pods are born with the local cache as their nameserver. In iptables mode (the default), no kubelet changes are needed; the DaemonSet intercepts traffic destined for the kube-dns VIP at the node level.
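
To confirm the cache is running, check the DaemonSet pods and the link-local listener on a node. The k8s-app=node-local-dns label matches the standard add-on manifest; adjust if your install differs:

# One node-local-dns pod per node
kubectl -n kube-system get pods -l k8s-app=node-local-dns -o wide

# On a node: the link-local address the init container added to lo
ip addr show dev lo | grep 169.254.20.10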

Memory considerations

If the node-local-dns pod gets OOMKilled, its custom iptables rules remain active and queries are directed to a now-absent process. DNS fails on that node until the DaemonSet controller restarts the pod. Set memory limits generously or use VPA in recommender mode. The default cache holds approximately 10,000 entries (~30 MB).

Performance tuning

The ndots:5 overhead

With ndots:5, a pod querying api.github.com (2 dots) generates four DNS queries before resolving:

  1. api.github.com.app.svc.cluster.local (NXDOMAIN)
  2. api.github.com.svc.cluster.local (NXDOMAIN)
  3. api.github.com.cluster.local (NXDOMAIN)
  4. api.github.com (success)

For external names like this, three out of every four lookups are guaranteed NXDOMAIN failures; in high-QPS environments that wasted traffic can dominate DNS load. Three mitigation options:

Reduce ndots per pod (best for workloads with many external calls):

spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"

Trailing dots in application config force absolute resolution, skipping search-domain expansion entirely:

api_endpoint: "api.github.com."

The autopath plugin (server-side mitigation) intercepts the first search-expanded query, detects the pod's namespace, and returns the absolute answer in a single round trip. It requires the pods verified mode in the kubernetes plugin, which roughly quadruples CoreDNS memory usage per Pod/Service object:

  • Without autopath: (Pods + Services) / 1000 + 54 MB
  • With autopath: (Pods + Services) / 250 + 56 MB

Cache tuning

The default cache 30 sets a maximum TTL of 30 seconds with 9984 entries. For more control:

cache {
    success 9984 3600 5     # capacity, max TTL, min TTL
    denial  9984 30   5     # NXDOMAIN responses: shorter TTL
    prefetch 10 1m 10%      # refresh popular entries before expiry
    serve_stale 1h immediate  # serve expired entries if upstream is down
}

The prefetch directive proactively refreshes entries that have been queried at least 10 times in any 1-minute window when their TTL drops below 10%. serve_stale provides resilience during upstream outages by serving expired cache entries for up to the specified duration.
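
Cache behavior is easy to inspect by hand through the prometheus plugin's endpoint; this sketch assumes the standard coredns Deployment name:

# Port-forward the metrics endpoint and look at the cache counters
kubectl -n kube-system port-forward deploy/coredns 9153:9153 &
curl -s http://localhost:9153/metrics | grep -E 'coredns_cache_(entries|hits_total|misses_total)'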

Scaling replicas

CoreDNS is CPU-bound. A CPU-based HPA is the recommended scaling approach:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Always run at least 2 replicas with pod anti-affinity to survive single-node failures. The Cluster Proportional Autoscaler (scaling by node/core count) is an alternative, but CPU-based HPA is more appropriate for CoreDNS since its bottleneck is actual query load (CPU), not cluster size.

One caution about resource limits: fixed CPU limits cause throttling that manifests as DNS latency spikes. Some operators remove CPU limits entirely and rely on CPU requests for scheduling. Monitor container_cpu_cfs_throttled_seconds_total for the CoreDNS pods.

Key metrics to monitor

CoreDNS exposes Prometheus metrics on port 9153 via the prometheus plugin.

Query rate:
  rate(coredns_dns_requests_total[5m])

NXDOMAIN rate (ndots overhead signal):
  rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m])

P99 latency:
  histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m]))

Cache hit rate:
  sum(rate(coredns_cache_hits_total[5m])) / (sum(rate(coredns_cache_hits_total[5m])) + sum(rate(coredns_cache_misses_total[5m])))

Upstream health failures:
  rate(coredns_forward_healthcheck_failures_total[5m])

A high NXDOMAIN rate relative to total queries is a strong signal that ndots:5 overhead is dominating your DNS traffic. See Kubernetes DNS troubleshooting for step-by-step diagnosis when these metrics indicate a problem.

What CoreDNS is not

CoreDNS is not a general-purpose recursive resolver. It does not perform DNSSEC validation of upstream answers (the dnssec plugin provides on-the-fly signing of served zones, not validation). It does not serve as an authoritative nameserver for external zones without additional configuration. And it does not replace your organization's DNS infrastructure; it sits in front of it as a cluster-scoped caching forwarder for everything outside cluster.local.

CoreDNS is also not kube-dns, despite the kube-dns Service name. kube-dns was a three-container architecture (dnsmasq + kubedns + sidecar) with configuration split across multiple flags and ConfigMap keys. CoreDNS is a single binary with a unified Corefile configuration. The Service name is a backward-compatibility artifact.

Where to go next

  • Kubernetes DNS troubleshooting covers diagnosing DNS failures step by step: CrashLoopBackOff, ndots query explosion, conntrack races, dnsPolicy misconfiguration, and upstream resolver issues.
  • Kubernetes Services explained covers Service types, kube-proxy modes, and how traffic flows from a Service VIP to pod endpoints.
