How Kubernetes DNS works
Before debugging, it helps to understand the chain a DNS query travels through. When a container calls `getaddrinfo("my-service.default.svc.cluster.local")`, the following happens:
- The container's libc resolver reads `/etc/resolv.conf`, injected by the kubelet at pod creation.
- It sends a UDP query to the nameserver listed there, which is the ClusterIP of the `kube-dns` Service in `kube-system` (commonly `10.96.0.10` on kubeadm clusters, though this varies by distribution). kube-proxy iptables/IPVS rules route that traffic to one of the CoreDNS pods.
- CoreDNS checks the query against the cluster domain (`cluster.local`). Cluster-internal names are answered from its in-memory cache of Kubernetes API objects. Everything else is forwarded to upstream resolvers via the `forward` plugin.
- The answer returns to the pod.
A typical pod's /etc/resolv.conf looks like this:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
The search list and `ndots:5` are set by the kubelet via `--cluster-dns` and `--cluster-domain` (or their `KubeletConfiguration` equivalents). The Service is named `kube-dns` for backward compatibility, even though CoreDNS replaced kube-dns as the default DNS server in Kubernetes 1.13.
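The corresponding kubelet settings look like this in a `KubeletConfiguration` file (the values shown are common kubeadm defaults, not guaranteed for every cluster):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 10.96.0.10                  # written into each pod's nameserver line
clusterDomain: cluster.local    # base of the generated search list
```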
Testing DNS from inside a pod
Start every DNS investigation from inside the cluster, not from your laptop. Spin up a debug pod with DNS tools:
kubectl run dns-test --rm -it --restart=Never \
--image=nicolaka/netshoot -- bash
For the official Kubernetes test image:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -it dnsutils -- sh
Then run through this sequence.
Step 1: confirm what the pod uses as its DNS server.
cat /etc/resolv.conf
Expected output shows the kube-dns ClusterIP as nameserver. If you see 127.0.0.53 or a node IP instead, the pod has an unexpected dnsPolicy (see the dnsPolicy section below).
Step 2: test internal name resolution.
nslookup kubernetes.default
Expected:
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
Step 3: test external name resolution.
nslookup google.com
Now interpret the results:
| Internal works | External works | Likely problem |
|---|---|---|
| No | No | CoreDNS is down or unreachable (network policy, pod crash, wrong dnsPolicy) |
| No | Yes | CoreDNS kubernetes plugin issue or API server connectivity problem |
| Yes | No | Upstream resolver failure (see upstream resolver failures) |
| Yes | Yes | DNS is functional; the problem is elsewhere |
Step 4: verify CoreDNS pod health.
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get endpointslice -l kubernetes.io/service-name=kube-dns -n kube-system
An empty endpoints list means no CoreDNS pod is passing its readiness probe. Check the CoreDNS not running section.
Step 5: read CoreDNS logs.
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
For deeper query-level visibility, temporarily add the `log` plugin to the Corefile's `.:53` block and restart CoreDNS. Remove it when done; high-QPS clusters generate enormous log volumes with it enabled.
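For reference, a `.:53` block with logging enabled might look like this (the plugin list mirrors common kubeadm defaults; your Corefile may differ):

```
.:53 {
    log        # temporary: logs every query and its response code
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```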
CoreDNS pod not running
Check the state of the CoreDNS pods:
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
Each non-Running state points to a different root cause:
| Pod state | Likely cause |
|---|---|
| `Pending` | No node has capacity, or tolerations do not match taints |
| `CrashLoopBackOff` | Loop detection, CNI not installed, SELinux conflict, or application error |
| `OOMKilled` (exit 137) | Memory limit too low for cluster size |
| `ContainerCreating` | Image pull failure or CNI not yet ready |
| Running but `0/1` Ready | Readiness probe failing on port 8181 |
DNS forwarding loop (CrashLoopBackOff)
This is the most common CoreDNS crash cause on nodes running systemd-resolved. The loop plugin detects when a forwarded query returns to CoreDNS itself. When the node's /etc/resolv.conf points to 127.0.0.53 (systemd-resolved's stub listener), CoreDNS forwards there, systemd-resolved forwards back to CoreDNS through the kube-dns ClusterIP, and CoreDNS detects the loop and exits.
The log line is unmistakable:
[FATAL] plugin/loop: Loop (127.0.0.1:37293 -> :53) detected for zone "."
The loop detection is self-protecting behavior. CoreDNS halts itself to prevent unbounded memory growth. Do not disable the loop plugin; fix the underlying resolution chain instead.
Fix option A: tell the kubelet to use the real resolv.conf (bypassing the systemd-resolved stub):
# KubeletConfiguration (or --resolv-conf flag)
resolvConf: /run/systemd/resolve/resolv.conf
Fix option B: hardcode upstream IPs in the Corefile instead of inheriting from the node:
forward . 8.8.8.8 8.8.4.4
Edit the ConfigMap:
kubectl -n kube-system edit configmap coredns
Then restart CoreDNS:
kubectl rollout restart deployment coredns -n kube-system
OOMKilled (exit code 137)
CoreDNS memory consumption scales with the number of Kubernetes objects it caches, not with query throughput. The default 170Mi limit in many distributions is insufficient for large clusters. The CoreDNS scaling documentation provides this formula:
MB required = (Pods + Services) / 1000 + 54
With the autopath plugin enabled (trades CPU for fewer client-side search queries):
MB required = (Pods + Services) / 250 + 56
A cluster with 5,000 pods and 500 services needs approximately 60 MB without autopath, but clusters at 10,000+ objects should budget 256 MB or more. See OOMKilled: Kubernetes out of memory errors explained for deeper diagnosis of exit code 137.
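As a quick sanity check, both formulas can be evaluated with shell arithmetic (the counts below match the 5,000-pod / 500-service example; shell integer division truncates, so results round down slightly):

```shell
pods=5000
services=500
# Baseline: MB = (Pods + Services) / 1000 + 54
echo "without autopath: $(( (pods + services) / 1000 + 54 )) MB"
# With the autopath plugin: MB = (Pods + Services) / 250 + 56
echo "with autopath:    $(( (pods + services) / 250 + 56 )) MB"
```

Treat the result as a floor, not a limit: set the container's memory limit well above it to absorb object-count growth and cache churn.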
Patch the Deployment:
kubectl -n kube-system patch deployment coredns \
--patch '{"spec":{"template":{"spec":{"containers":[{"name":"coredns","resources":{"limits":{"memory":"512Mi"},"requests":{"memory":"128Mi","cpu":"100m"}}}]}}}}'
CNI not installed
Immediately after kubeadm init, CoreDNS pods stay Pending or ContainerCreating until you install a CNI plugin (Calico, Flannel, Cilium, etc.). CoreDNS requires pod networking to function. This is expected behavior, not a bug.
Readiness probe failure (Running but not Ready)
CoreDNS exposes a readiness endpoint at http://localhost:8181/ready. A 0/1 READY state typically means the kubernetes plugin cannot reach the API server. Check API server health and RBAC permissions for the CoreDNS ServiceAccount.
Scaling for high availability
The CoreDNS autoscaler uses this default formula for replica count:
replicas = max(ceil(cores / 256), ceil(nodes / 16))
For a 50-node cluster that calculates to at least 4 replicas. Spread them across nodes with pod anti-affinity to avoid a single node failure taking all DNS offline. When a node enters NotReady state, CoreDNS pods on that node are not evicted for up to 5 minutes by default. During that window, some DNS queries may still route to the unreachable pod.
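The replica math for that 50-node example works out as follows (the 200-core total is an assumed figure for illustration; substitute your cluster's real core count):

```shell
cores=200   # assumed total schedulable cores
nodes=50
# ceil(x/y) expressed as (x + y - 1) / y in integer arithmetic
by_cores=$(( (cores + 255) / 256 ))
by_nodes=$(( (nodes + 15) / 16 ))
echo "replicas: $(( by_cores > by_nodes ? by_cores : by_nodes ))"
```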
The ndots:5 query explosion
The ndots option in /etc/resolv.conf defines a threshold: if a queried hostname has fewer than ndots dots, the resolver tries all search domains before treating the name as fully qualified.
Kubernetes sets ndots to 5 for a specific reason. SRV records look like _http._tcp.my-service.default.svc (four dots). With ndots:4, the resolver would treat that as absolute and skip search domain expansion, breaking SRV lookups.
The side effect: an external hostname like api.github.com has only 2 dots. With ndots:5, the resolver generates up to 5 queries before succeeding:
1. `api.github.com.default.svc.cluster.local` (NXDOMAIN)
2. `api.github.com.svc.cluster.local` (NXDOMAIN)
3. `api.github.com.cluster.local` (NXDOMAIN)
4. `api.github.com.<cloud-zone>.internal` (NXDOMAIN, on cloud providers)
5. `api.github.com.` (success)
An application making 100 external API calls per second produces 400+ wasted DNS queries per second. That adds 1-5 ms per NXDOMAIN round trip and can overwhelm CoreDNS under load.
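The expansion sequence can be reproduced mechanically from the pod's search list (the three domains below are the defaults for a pod in the `default` namespace; the provider-specific cloud-zone entry is omitted):

```shell
name="api.github.com"
# With ndots:5, a 2-dot name is tried against every search domain first
for domain in default.svc.cluster.local svc.cluster.local cluster.local; do
  echo "$name.$domain"
done
# Only after those attempts fail does the resolver try the absolute name
echo "$name."
```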
Fix: trailing dots for static endpoints
In application config files, append a trailing dot to force immediate FQDN resolution:
api_endpoint: "api.github.com." # trailing dot = absolute name, skips search
Fix: lower ndots per pod
Override ndots in the pod spec via dnsConfig:
spec:
dnsPolicy: ClusterFirst
dnsConfig:
options:
- name: ndots
value: "2"
With ndots:2, any hostname with 2+ dots (like api.github.com) is treated as absolute on the first attempt. Internal short names (my-service) still go through the search list. Trade-off: reducing ndots below 5 can break SRV record lookups that depend on search domain expansion. Test with your actual service names before deploying cluster-wide.
The four dnsPolicy values
The spec.dnsPolicy field controls what gets written into a pod's /etc/resolv.conf.
| dnsPolicy | Uses CoreDNS | Resolves cluster services | Uses node DNS |
|---|---|---|---|
| `ClusterFirst` (the actual default) | Yes | Yes | No |
| `Default` (not the default) | No | No | Yes |
| `None` | Only if configured | Only if configured | Only if configured |
| `ClusterFirstWithHostNet` | Yes | Yes | No |
ClusterFirst is the implicit default when dnsPolicy is omitted. The pod uses the kube-dns ClusterIP. This is correct for 99% of workloads.
Default inherits the node's /etc/resolv.conf. The naming is genuinely confusing: "Default" is not the default. A pod with this policy cannot resolve my-service.default.svc.cluster.local because it never talks to CoreDNS.
None provides no DNS configuration at all. You must supply spec.dnsConfig with at least one nameserver. A common bug: setting dnsPolicy: None and forgetting dnsConfig, leaving the pod with an empty /etc/resolv.conf.
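A sketch of a valid `None` configuration (the nameserver and search values here are illustrative, not required values):

```yaml
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 10.96.0.10              # e.g. point back at kube-dns, or at a custom resolver
    searches:
      - default.svc.cluster.local
    options:
      - name: ndots
        value: "2"
```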
ClusterFirstWithHostNet exists for pods with hostNetwork: true. Without it, a host-network pod's ClusterFirst policy silently degrades to behave like Default. DaemonSets for node-level monitoring (Prometheus node-exporter, Datadog agent, CNI components) that need cluster DNS should set this explicitly:
spec:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
Upstream resolver failures
When internal DNS works but external lookups fail, the problem is in the CoreDNS forward plugin or the upstream servers it points to.
The default Corefile line:
forward . /etc/resolv.conf
This inherits nameservers from the node. Common failure modes:
- The node's `/etc/resolv.conf` points to an unreachable IP (stale DHCP, decommissioned resolver)
- The upstream is temporarily unavailable (cloud provider resolver throttling, network outage)
- Firewall or security group rules block outbound UDP/TCP port 53 from the node
- Split-horizon DNS returning SERVFAIL for certain domains
Diagnosis
Identify the upstream IPs from the CoreDNS pod's perspective:
kubectl get configmap coredns -n kube-system -o yaml | grep forward
Test the upstream directly:
# from a debug pod
dig @<upstream-ip> google.com
Check CoreDNS logs for upstream errors:
kubectl logs -n kube-system -l k8s-app=kube-dns | grep -i "timeout\|servfail\|refused"
Cloud-specific limits
AWS EKS: the VPC resolver at 169.254.169.253 enforces a 1,024 packets-per-second limit per ENI. High DNS volumes (amplified by ndots:5) hit this limit, causing SERVFAIL. Solution: deploy NodeLocal DNSCache or reduce ndots.
Azure AKS: the DNS server at 168.63.129.16 becomes unreachable if UDR or firewall rules block it. Verify UDP/TCP port 53 is not filtered.
GKE: the metadata server at 169.254.169.254 serves as the upstream resolver. Network policies must not block access to this range.
Hardcoding upstreams for stability
Replace the node-inherited resolvers with explicit IPs in the Corefile:
forward . 8.8.8.8 8.8.4.4 {
policy sequential
health_check 5s
}
The sequential policy tries upstreams in order, which matters when mixing a primary corporate resolver with a public fallback. The default random policy splits traffic evenly, and approximately 50% of queries fail if only one upstream can resolve a given domain.
Intermittent 5-second DNS timeouts (conntrack race)
If DNS resolution intermittently takes exactly 5 seconds (the /etc/resolv.conf timeout), you are likely hitting the Linux conntrack UDP race condition.
Root cause
glibc sends A and AAAA queries simultaneously from the same UDP source port. Both packets hit the same iptables NAT rule at the same instant. Two conntrack entries race for the same 5-tuple. One wins; the other packet is silently dropped. The application waits the full 5-second timeout before retrying. This has been documented in multiple production outages.
Symptoms:
- Random 5-second DNS delays, not reproducible on demand
- Worse under high pod density or query rate
- Node `dmesg` may show `nf_conntrack: table full` under extreme load
Fix: single-request-reopen
For containers using glibc (Debian, Ubuntu, RHEL-based images), add this to the pod spec:
spec:
dnsConfig:
options:
- name: single-request-reopen
This forces glibc to send A and AAAA queries sequentially on separate sockets, eliminating the source-port collision. Note: musl libc (Alpine Linux) does not support this option. Alpine users need NodeLocal DNSCache instead.
Fix: NodeLocal DNSCache
NodeLocal DNSCache is the production-grade solution. It runs a node-local-dns DaemonSet on every node, listening on a link-local IP (169.254.20.10 by default). Pods send queries to this local cache instead of through kube-proxy iptables rules. Because the cache is local, DNAT is bypassed entirely, and the conntrack race condition disappears.
Additional benefits:
- Lower latency (local cache hit vs. round-trip through kube-proxy)
- Reduced CoreDNS load
- Per-node DNS metrics
- Resilience against CoreDNS pod failures on other nodes
NodeLocal DNSCache has been GA since Kubernetes 1.18 and is the recommended approach for clusters with more than 100 nodes or high DNS query rates.
Network policies blocking DNS
When a namespace has a default-deny NetworkPolicy, all egress is blocked, including DNS queries to CoreDNS in kube-system.
The symptom: DNS fails entirely and immediately (not intermittent). Every pod in the namespace is affected.
Check for network policies:
kubectl get networkpolicy -n <namespace>
Fix by adding an explicit egress rule allowing DNS:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-dns-egress
namespace: my-app
spec:
podSelector: {}
policyTypes:
- Egress
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
Both UDP and TCP port 53 must be allowed. TCP is used for responses larger than 512 bytes, which is common with DNSSEC or records with many entries.
When to escalate
If you have worked through the diagnostic sequence above and DNS is still failing, collect this information before asking for help:
- Output of `kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide`
- CoreDNS logs: `kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200`
- The Corefile: `kubectl get configmap coredns -n kube-system -o yaml`
- Output of `cat /etc/resolv.conf` from inside an affected pod
- Results of `nslookup kubernetes.default` and `nslookup google.com` from a debug pod
- Network policies in the affected namespace: `kubectl get networkpolicy -n <namespace> -o yaml`
- Node-level DNS config: `cat /etc/resolv.conf` on the node hosting the CoreDNS pod
- Kubernetes version (`kubectl version`) and CoreDNS version (`kubectl -n kube-system describe deployment coredns | grep Image`)
- Whether the failure is consistent or intermittent, and if intermittent, whether it takes exactly 5 seconds
Preventing DNS issues
- Run at least 3 CoreDNS replicas with pod anti-affinity across nodes. The autoscaler formula `max(ceil(cores/256), ceil(nodes/16))` is a reasonable starting point.
- Size memory limits to your cluster. Use `(Pods + Services) / 1000 + 54` MB as a baseline and monitor actual usage with the Prometheus metrics CoreDNS exposes on port 9153.
- Deploy NodeLocal DNSCache on clusters with more than 100 nodes or high DNS query rates. It solves conntrack races, reduces CoreDNS load, and improves latency.
- Lower ndots for workloads that call external APIs heavily. Setting `ndots:2` via `dnsConfig` eliminates the query explosion for most external hostnames.
- Include a DNS egress rule in every default-deny NetworkPolicy. Make it part of your namespace provisioning template.
- Monitor CoreDNS. The `coredns_dns_requests_total`, `coredns_dns_responses_total`, and `coredns_forward_responses_total` metrics (exposed on port 9153) surface problems before users report them. See Prometheus monitoring on Kubernetes for how to collect these.