What CoreDNS does in a Kubernetes cluster
CoreDNS replaced kube-dns as the default cluster DNS server starting with Kubernetes 1.11. It runs as a Deployment in kube-system, typically with 2 replicas on separate nodes for availability. A Service named kube-dns (kept for backward compatibility) exposes it behind a stable ClusterIP, commonly 10.96.0.10 depending on your service CIDR.
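A quick way to see these pieces in your own cluster (object names and the ClusterIP can differ by distribution, so treat this as a sketch):
kubectl -n kube-system get deployment coredns
kubectl -n kube-system get service kube-dns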
Every pod's /etc/resolv.conf is populated by the kubelet at startup:
nameserver 10.96.0.10
search <namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
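You can confirm what the kubelet injected from any running pod; the pod and namespace names here are placeholders:
kubectl exec -n production my-api-6d5f7c9b8-abcde -- cat /etc/resolv.conf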
All DNS queries from pods flow through CoreDNS. Its responsibilities: resolving Kubernetes service names to cluster IPs, serving reverse DNS for cluster resources, and forwarding non-cluster queries to upstream resolvers.
CoreDNS stays synchronized with the cluster by watching the API server for Service, EndpointSlice, Namespace, and (optionally) Pod objects. When a new Service is created, CoreDNS sees it within seconds through these watches and begins serving DNS records for it without any restart.
How a DNS query flows
Pod (/etc/resolv.conf → nameserver 10.96.0.10)
└─ iptables DNAT (kube-proxy)
└─ CoreDNS pod
├── cluster.local? → kubernetes plugin (API watch cache)
└── external? → forward plugin → upstream resolver
The kube-dns Service's ClusterIP is DNATed by kube-proxy rules to one of the CoreDNS pod IPs. CoreDNS checks the query against its zone configuration. Cluster-internal names are answered from the in-memory cache built by the kubernetes plugin. Everything else passes to the forward plugin.
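To see the CoreDNS pod IPs sitting behind that VIP, list the EndpointSlices for the kube-dns Service:
kubectl -n kube-system get endpointslices -l kubernetes.io/service-name=kube-dns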
Corefile anatomy
The Corefile is CoreDNS's configuration file. In Kubernetes, it lives in the coredns ConfigMap in kube-system:
kubectl get configmap coredns -n kube-system -o yaml
Server blocks and zones
The Corefile uses a server-block model. Each block declares one or more DNS zones, an optional port, and a set of plugin directives inside braces:
[SCHEME://]ZONE [:PORT] {
    PLUGIN [ARGUMENTS]
}
The default Kubernetes Corefile:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
The .:53 block handles all zones (. is the DNS root) on port 53. Multiple server blocks are allowed; when a query arrives, the block with the longest matching zone wins.
Two features worth knowing: the reload plugin polls the ConfigMap's SHA512 checksum every ~30 seconds and hot-reloads on changes (no pod restart required, though propagation may take up to 2 minutes). And {$ENV_VAR} syntax allows environment variable substitution at parse time.
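In practice that means you can edit the ConfigMap and watch the logs for the change to be picked up, without touching the pods (the k8s-app=kube-dns label matches the standard deployment):
kubectl -n kube-system edit configmap coredns
kubectl -n kube-system logs -l k8s-app=kube-dns -f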
Plugin chain execution order
This is the single most important thing to understand about the Corefile. The order of plugin names in the Corefile does not determine their execution order. CoreDNS has a compile-time plugin.cfg that defines a fixed processing sequence. You can write cache before forward or after it; the execution order is identical.
For the default Kubernetes build, the relevant sequence is:
1. errors (captures error responses)
2. log (request logging, if enabled)
3. loadbalance (round-robins A/AAAA records)
4. cache (response caching)
5. rewrite (transforms queries)
6. autopath (search-path optimization, if enabled)
7. hosts (static host-table lookups)
8. kubernetes (authoritative for cluster zones)
9. loop (forwarding loop detection)
10. forward (upstream proxy)
Each plugin either serves a response (stopping the chain) or hands the query to the next plugin. Plugins that are authoritative for a zone, such as kubernetes and hosts, only pass non-matching names onward when fallthrough is configured. If no plugin serves the query, CoreDNS returns SERVFAIL.
Key plugins
| Plugin | What it does |
|---|---|
| kubernetes | Authoritative for cluster.local, in-addr.arpa, ip6.arpa. Watches the API server for Services, EndpointSlices, Pods. |
| forward | Proxies non-cluster queries to upstream resolvers. Supports UDP, TCP, and DNS-over-TLS. Up to 15 upstreams. |
| cache | In-memory response cache. Default capacity: 9984 entries. Separate TTLs for successful and NXDOMAIN responses. |
| errors | Logs error responses to stdout. |
| health | HTTP /health on port 8080. The lameduck 5s directive delays shutdown to let in-flight queries finish. |
| ready | HTTP /ready on port 8181. Returns 200 only when all plugins report ready. Used by the readiness probe. |
| prometheus | Exposes Prometheus metrics on port 9153 (/metrics). Enabled by default. |
| loop | Detects DNS forwarding loops (e.g., CoreDNS → systemd-resolved → back to CoreDNS) and halts the process. |
| reload | Hot-reloads the Corefile when the ConfigMap changes. |
| loadbalance | Randomizes record order in A/AAAA/MX responses for round-robin distribution. |
Service DNS records
CoreDNS generates different record types depending on the Service configuration. The Kubernetes DNS specification defines these patterns.
Normal services (ClusterIP)
A query for my-api.production.svc.cluster.local returns a single A record pointing to the Service's ClusterIP. The client talks to the VIP; kube-proxy handles the pod-level routing.
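A throwaway debug pod is enough to see this; the service name reuses the example above, and the image choice is arbitrary:
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup my-api.production.svc.cluster.local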
Headless services
A headless Service (clusterIP: None) has no VIP. The same DNS name returns multiple A records, one per Ready pod IP. Clients receive all pod IPs and must handle selection themselves.
For StatefulSets, each pod gets an individual record:
postgres-0.postgres.db.svc.cluster.local → pod-0's IP
postgres-1.postgres.db.svc.cluster.local → pod-1's IP
postgres.db.svc.cluster.local → all pod IPs
The pattern is <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>. These per-pod records only exist when a headless Service matches the pod's spec.subdomain.
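Querying both the headless Service name and a per-pod name from a debug pod shows the difference (same placeholder names as above):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- sh -c \
  'nslookup postgres.db.svc.cluster.local; nslookup postgres-0.postgres.db.svc.cluster.local'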
SRV records
Created for named ports on both normal and headless Services:
_http._tcp.my-api.production.svc.cluster.local → 8080 my-api.production.svc.cluster.local
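Inspecting SRV records requires a client that can query arbitrary record types, for example dig from an image that ships it (nicolaka/netshoot is one common choice, not a requirement):
kubectl run dns-test --rm -it --restart=Never --image=nicolaka/netshoot -- \
  dig +short SRV _http._tcp.my-api.production.svc.cluster.local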
ExternalName services
A Service with type: ExternalName produces a CNAME record:
legacy-db.production.svc.cluster.local CNAME rds.us-east-1.amazonaws.com
No VIP, no kube-proxy rules. The client resolves the CNAME through normal DNS recursion.
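You can watch the CNAME being handed back with dig from the same kind of debug pod (again, the service name is the example from above):
kubectl run dns-test --rm -it --restart=Never --image=nicolaka/netshoot -- \
  dig legacy-db.production.svc.cluster.local
# Expect a CNAME to rds.us-east-1.amazonaws.com in the answer section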
Custom DNS configuration
Stub domains
A stub domain routes queries for a specific DNS suffix to a dedicated nameserver instead of the upstream. In CoreDNS, this is an additional server block in the Corefile. Edit the ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
    consul.local:53 {
        errors
        cache 30
        forward . 10.150.0.1
    }
    internal.corp:53 {
        errors
        cache 30
        forward . 192.168.1.53
    }
One limitation: the forward plugin requires IP addresses for upstream nameservers. FQDNs like ns.corp.example.com do not work.
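To confirm a stub domain is actually being routed, query a name under it from inside the cluster (the hostname here is hypothetical):
kubectl run dns-test --rm -it --restart=Never --image=nicolaka/netshoot -- \
  dig vault.service.consul.local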
Custom upstream resolvers
Replace the default forward . /etc/resolv.conf with explicit IPs to decouple CoreDNS from node-level resolver configuration:
forward . 8.8.8.8 8.8.4.4 {
    max_fails 3
    health_check 5s
    policy random
}
For DNS-over-TLS forwarding:
forward . tls://9.9.9.9 tls://149.112.112.112 {
    tls_servername dns.quad9.net
    health_check 5s
    force_tcp
}
Static host overrides
The hosts plugin maps fixed IPs to hostnames, useful for on-premises databases or legacy services that do not have DNS records:
hosts {
    10.0.1.100 mydb.internal.example.com
    192.168.5.5 legacy-api.corp.local
    fallthrough
}
The fallthrough directive passes unmatched names to the next plugin.
NodeLocal DNSCache
NodeLocal DNSCache is a Kubernetes add-on (DaemonSet) that runs a CoreDNS cache on every node. GA since Kubernetes 1.18.
The problem it solves
Without NodeLocal DNSCache, pod DNS queries travel through iptables DNAT to a CoreDNS pod that may be on a different node. Under high DNS load, this causes three problems:
- Conntrack table pressure. UDP entries lack connection-close events and must time out (30 seconds by default). At high query rates, the conntrack table fills and packets drop.
- DNAT races. Simultaneous DNS queries with the same connection tuple race in the conntrack table, a known Linux kernel issue that causes intermittent 5-second timeouts.
- Cross-node latency. Every cache miss requires a network round trip to a CoreDNS pod on another node.
For diagnosing conntrack-related DNS failures, see Kubernetes DNS troubleshooting.
How it works
Without NodeLocal DNSCache:
Pod → iptables DNAT → kube-dns VIP → CoreDNS pod (possibly another node)
With NodeLocal DNSCache:
Pod → 169.254.20.10 (link-local, node-local) → node-local-dns DaemonSet pod
├── cluster.local cache miss → TCP → kube-dns VIP → CoreDNS
└── external cache miss → upstream nameservers
The node-local-dns pod creates a dedicated dummy interface on the node (named nodelocaldns in the upstream manifests) and binds the link-local IP 169.254.20.10 (the default) to it. This non-routable address ensures pods reach the local cache without crossing the network. Cache misses for cluster.local are forwarded to CoreDNS over TCP (not UDP), which produces explicit conntrack close events and sidesteps the conntrack race condition.
In IPVS kube-proxy mode, the kubelet must be configured with --cluster-dns=169.254.20.10 so pods are born with the local cache as their nameserver. In iptables mode (the default), no kubelet changes are needed; the DaemonSet intercepts traffic destined for the kube-dns VIP at the node level.
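To verify the add-on is running and the link-local address is bound, check the DaemonSet and, on a node, the interface it creates (names follow the upstream manifests and may differ in managed distributions):
kubectl -n kube-system get daemonset node-local-dns
# on a node:
ip addr show nodelocaldns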
Memory considerations
If the node-local-dns pod gets OOMKilled, its custom iptables rules remain active and queries are directed to a now-absent process. DNS fails on that node until the DaemonSet controller restarts the pod. Set memory limits generously or use VPA in recommender mode. The default cache holds approximately 10,000 entries (~30 MB).
Performance tuning
The ndots:5 overhead
With ndots:5, a pod querying api.github.com (2 dots) generates four DNS queries before resolving:
1. api.github.com.app.svc.cluster.local (NXDOMAIN)
2. api.github.com.svc.cluster.local (NXDOMAIN)
3. api.github.com.cluster.local (NXDOMAIN)
4. api.github.com (success)
In high-QPS environments, 75% of DNS queries are guaranteed failures. Three mitigation options:
Reduce ndots per pod (best for workloads with many external calls):
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
Trailing dots in application config force absolute resolution, skipping search-domain expansion entirely:
api_endpoint: "api.github.com."
The autopath plugin (server-side mitigation) intercepts the first search-expanded query, infers the pod's search path from its namespace, and returns the final answer in a single round trip. It requires pods verified mode in the kubernetes plugin, which means CoreDNS must watch all Pod objects and roughly quadruples the per-object memory cost of the watch cache:
- Without autopath: memory ≈ (Pods + Services) / 1000 + 54 MB
- With autopath: memory ≈ (Pods + Services) / 250 + 56 MB
Cache tuning
The default cache 30 sets a maximum TTL of 30 seconds with 9984 entries. For more control:
cache {
    success 9984 3600 5      # capacity, max TTL, min TTL
    denial 9984 30 5         # NXDOMAIN responses: shorter TTL
    prefetch 10 1m 10%       # refresh popular entries before expiry
    serve_stale 1h immediate # serve expired entries if upstream is down
}
The prefetch directive proactively refreshes entries that have been queried at least 10 times in any 1-minute window when their TTL drops below 10%. serve_stale provides resilience during upstream outages by serving expired cache entries for up to the specified duration.
Scaling replicas
CoreDNS is CPU-bound. A CPU-based HPA is the recommended scaling approach:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Always run at least 2 replicas with pod anti-affinity to survive single-node failures. The Cluster Proportional Autoscaler (scaling by node/core count) is an alternative, but CPU-based HPA is more appropriate for CoreDNS since its bottleneck is CPU, not connection count.
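A quick check that the replicas actually landed on different nodes:
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide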
One caution about resource limits: fixed CPU limits cause throttling that manifests as DNS latency spikes. Some operators remove CPU limits entirely and rely on CPU requests for scheduling. Monitor container_cpu_cfs_throttled_seconds_total for the CoreDNS pods.
Key metrics to monitor
CoreDNS exposes Prometheus metrics on port 9153 via the prometheus plugin.
| What to watch | PromQL |
|---|---|
| Query rate | rate(coredns_dns_requests_total[5m]) |
| NXDOMAIN rate (ndots overhead signal) | rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m]) |
| P99 latency | histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m])) |
| Cache hit rate | sum(rate(coredns_cache_hits_total[5m])) / (sum(rate(coredns_cache_hits_total[5m])) + sum(rate(coredns_cache_misses_total[5m]))) |
| Upstream health failures | rate(coredns_forward_healthcheck_failures_total[5m]) |
A high NXDOMAIN rate relative to total queries is a strong signal that ndots:5 overhead is dominating your DNS traffic. See Kubernetes DNS troubleshooting for step-by-step diagnosis when these metrics indicate a problem.
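If you don't have Prometheus scraping yet, you can spot-check the metrics endpoint directly (port-forwarding to the Deployment picks an arbitrary pod):
kubectl -n kube-system port-forward deployment/coredns 9153:9153 &
curl -s http://localhost:9153/metrics | grep coredns_dns_requests_total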
What CoreDNS is not
CoreDNS is not a general-purpose recursive resolver. It does not perform full DNSSEC validation by default (though a dnssec plugin exists). It does not serve as an authoritative nameserver for external zones without additional configuration. And it does not replace your organization's DNS infrastructure; it sits in front of it as a cluster-scoped caching forwarder for everything outside cluster.local.
CoreDNS is also not kube-dns, despite the kube-dns Service name. kube-dns was a three-container architecture (dnsmasq + kubedns + sidecar) with configuration split across multiple flags and ConfigMap keys. CoreDNS is a single binary with a unified Corefile configuration. The Service name is a backward-compatibility artifact.
Where to go next
- Kubernetes DNS troubleshooting covers diagnosing DNS failures step by step: CrashLoopBackOff, ndots query explosion, conntrack races, dnsPolicy misconfiguration, and upstream resolver issues.
- Kubernetes Services explained covers Service types, kube-proxy modes, and how traffic flows from a Service VIP to pod endpoints.