1. Introduction
In Kubernetes, CoreDNS is a critical system component that provides DNS-based service discovery. Nearly all internal service-to-service communication relies on it, because workloads usually address each other by service name rather than by IP. When CoreDNS fails, in-cluster DNS resolution breaks, which can lead to cascading application failures.
2. What is CoreDNS?
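CoreDNS is a DNS server that runs as a Deployment (usually named coredns) in the kube-system namespace, exposed through a Service that keeps the legacy name kube-dns. It answers DNS queries for Services and Pods in the cluster, so one pod can reach another by name instead of by IP address.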
When a pod issues a DNS query (e.g., trying to reach my-service.my-namespace.svc.cluster.local), the request goes to the nameserver listed in the pod's /etc/resolv.conf, which is normally the kube-dns Service IP. CoreDNS then answers with the Service's ClusterIP.
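Short names such as my-service resolve because of the search list and ndots option that the kubelet writes into each pod's /etc/resolv.conf. The sketch below lists, in order, the names a glibc-style resolver tries for one lookup. It assumes the common defaults (search <namespace>.svc.cluster.local svc.cluster.local cluster.local, options ndots:5) and the cluster domain cluster.local; clusters can change both, and musl-based images such as Alpine follow slightly different rules. The function name candidate_names is hypothetical.

```sh
# Sketch: names a glibc-style resolver tries, in order, for one lookup.
# Assumptions: cluster domain "cluster.local" and the usual pod resolv.conf
#   search <namespace>.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5
CLUSTER_DOMAIN="${CLUSTER_DOMAIN:-cluster.local}"
NDOTS="${NDOTS:-5}"

candidate_names() {
  name="$1"
  namespace="$2"
  # A trailing dot marks a fully qualified name: the search list is skipped.
  case "$name" in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac
  # Count the dots in the name (tr keeps only the dots, wc counts them).
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  # Enough dots: the name is tried as-is first.
  if [ "$dots" -ge "$NDOTS" ]; then
    printf '%s.\n' "$name"
  fi
  # Then each search domain is appended in turn.
  for domain in "$namespace.svc.$CLUSTER_DOMAIN" "svc.$CLUSTER_DOMAIN" "$CLUSTER_DOMAIN"; do
    printf '%s.%s.\n' "$name" "$domain"
  done
  # Too few dots: the as-is name is only tried last.
  if [ "$dots" -lt "$NDOTS" ]; then
    printf '%s.\n' "$name"
  fi
}
```

For example, candidate_names my-service.my-namespace.svc.cluster.local default shows that a name with only four dots is first tried with every search suffix appended. When CoreDNS is down, each of those candidates can time out in turn, which is why a single lookup may hang for a long time.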
3. What Happens When CoreDNS Fails?
When CoreDNS fails, internal DNS resolution stops working, meaning:
- Pods cannot resolve other services by DNS name.
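- Add-ons and tools that reach other Services by name (e.g., monitoring or logging stacks that address their backends by Service name) may fail. Components that use IP addresses or the host network, such as kube-proxy, keep working.
- Applications may crash or hang if they wait for a dependent service by DNS name.
- Init containers or readiness probes that depend on DNS will fail, stalling rollouts.
- ExternalName Services won’t resolve, because they are implemented purely as DNS CNAME records.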
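A hang like the one described above usually comes from a startup step that polls DNS with no upper bound. A bounded wait makes the failure visible instead. This is a minimal sketch in POSIX sh; wait_for_dns and the RESOLVE_CMD override are hypothetical names, and it assumes the resolver command (nslookup by default) exits non-zero when a lookup fails.

```sh
# Sketch: wait for a DNS name to resolve, but give up after a timeout.
# RESOLVE_CMD is a hypothetical override; it defaults to nslookup and is
# assumed to exit non-zero when the lookup fails.
RESOLVE_CMD="${RESOLVE_CMD:-nslookup}"

wait_for_dns() {
  target="$1"
  timeout="${2:-30}"   # number of one-second retries, not exact wall time
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if $RESOLVE_CMD "$target" >/dev/null 2>&1; then
      echo "resolved $target after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "gave up on $target after ${timeout}s" >&2
  return 1
}
```

In an init container this might run as wait_for_dns my-service.my-namespace.svc.cluster.local 60, so a CoreDNS outage shows up as an init container that exits with a clear message instead of one that waits indefinitely.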
4. Common Symptoms
Some obvious signs that CoreDNS has failed:
- Applications inside pods report “Name resolution errors”.
- ping or curl using service names fails, e.g.: ping: unknown host
- nslookup or dig inside pods returns a DNS failure.
- Logs show: lookup my-service on 10.96.0.10:53: no such host
- kubectl get pods -n kube-system shows CoreDNS pods in CrashLoopBackOff, Error, or Pending.
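The kubectl get pods check can be scripted. The function below is a hypothetical helper that reads the output of kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers on stdin and prints every pod that is not Running with all containers ready. It assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

```sh
# Sketch: flag CoreDNS pods that are not Running with all containers ready.
# Input: `kubectl get pods ... --no-headers` output on stdin.
# Exit status: 1 if any unhealthy pod was found, 0 otherwise.
unhealthy_dns_pods() {
  awk '
    NF == 0 { next }                  # skip blank lines
    {
      split($2, ready, "/")           # READY column, e.g. "1/1" or "0/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1 " " $3 " ready=" $2
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }'
}
```

Typical use: kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers | unhealthy_dns_pods. Only the first three columns are read, so the multi-word RESTARTS value newer kubectl versions print (e.g. "7 (2m ago)") does not disturb the parsing.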
5. Root Causes of CoreDNS Failure
Several issues may lead to CoreDNS failure:
- Resource Constraints: CPU or memory starvation causes CoreDNS pods to be evicted, OOMKilled, or crash.
- Configuration Errors: an incorrect Corefile (CoreDNS configuration) leads to syntax or runtime errors.
- Network Issues: a Flannel, Calico, or Cilium misconfiguration blocks DNS traffic on port 53 (UDP and TCP).
- Node Problems: CoreDNS pods are scheduled onto nodes that are NotReady or tainted.
- Service/Endpoint Missing: the kube-dns Service or its endpoints are accidentally deleted or misconfigured.
- Pod Scheduling Issues: taints or affinity rules prevent CoreDNS pods from being scheduled.
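When triaging, the observed pod status or last termination reason often points toward one of these causes. The hypothetical likely_cause helper below encodes that as a rough heuristic. The mapping is an assumption for illustration only; a CrashLoopBackOff in particular can have causes other than a bad Corefile.

```sh
# Sketch: map a CoreDNS pod status or termination reason to a likely
# root-cause category. Heuristic only, not an exhaustive diagnosis.
likely_cause() {
  case "$1" in
    OOMKilled|Evicted)
      echo "Resource Constraints: raise memory limits or free node capacity" ;;
    CrashLoopBackOff|Error)
      echo "Configuration Errors: check the Corefile and the CoreDNS logs" ;;
    Pending)
      echo "Pod Scheduling Issues or Node Problems: check taints, affinity, and node status" ;;
    Running)
      echo "Pods look healthy: check the kube-dns Service, its endpoints, and port 53 in the CNI" ;;
    *)
      echo "Unknown status '$1': run kubectl describe pod for events" ;;
  esac
}
```

One way to feed it is the last termination reason reported by kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'.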
6. How Kubernetes Reacts Internally
Here’s what happens under the hood when CoreDNS fails:
- Pods can still run, but they cannot discover or talk to other services by name.
- Service discovery fails, even though kubectl get svc may show everything as normal.
- Kubelet and control plane components don’t depend on CoreDNS, so the cluster appears “healthy” from the outside.
- Any service using environment-variable-based discovery may still work, but this method is less flexible than DNS and limited: the variables are only injected for Services that already existed when the pod started.
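Environment-variable discovery works without DNS because the kubelet injects variables such as MY_SERVICE_SERVICE_HOST and MY_SERVICE_SERVICE_PORT for a Service named my-service: the name is upper-cased and dashes become underscores. The sketch below reads those variables as a DNS-free fallback. svc_env_prefix and svc_addr_from_env are hypothetical helper names, and the lookup fails for Services created after the pod started.

```sh
# Sketch: read a Service address from the kubelet-injected env vars.
svc_env_prefix() {
  # Upper-case the Service name and turn dashes into underscores.
  printf '%s' "$1" | tr 'a-z-' 'A-Z_'
}

svc_addr_from_env() {
  prefix=$(svc_env_prefix "$1")
  # Indirect lookup of <PREFIX>_SERVICE_HOST / <PREFIX>_SERVICE_PORT.
  # Service names are DNS labels, so the prefix is safe to eval.
  eval "host=\${${prefix}_SERVICE_HOST:-}"
  eval "port=\${${prefix}_SERVICE_PORT:-}"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "no env vars for service $1 (created after this pod started?)" >&2
    return 1
  fi
  printf '%s:%s\n' "$host" "$port"
}
```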
7. Diagnosing CoreDNS Issues
1. Check CoreDNS Pod Status
kubectl get pods -n kube-system -l k8s-app=kube-dns
2. Inspect Logs
kubectl logs -n kube-system -l k8s-app=kube-dns
3. Test DNS Inside a Pod
kubectl run -it --rm test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
4. Check Corefile Configuration
kubectl -n kube-system get configmap coredns -o yaml
5. Ensure kube-dns Service Exists
kubectl get svc kube-dns -n kube-system
kubectl get endpoints kube-dns -n kube-system
6. Describe CoreDNS Pods for Events
kubectl describe pod <pod-name> -n kube-system
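The read-only checks above can be run in one pass. This is a sketch under a few assumptions: KUBECTL can be overridden (for example KUBECTL="kubectl --context staging"), the dns-test pod name and busybox:1.28 image are illustrative choices, and the Corefile is printed rather than opened in an editor so nothing is changed. diagnose_coredns is a hypothetical name.

```sh
# Sketch: run the CoreDNS diagnostic checks in order.
# No `set -e`: one failing check should not hide the output of the others.
KUBECTL="${KUBECTL:-kubectl}"
NS=kube-system
SELECTOR=k8s-app=kube-dns

diagnose_coredns() {
  echo "== 1. CoreDNS pod status"
  $KUBECTL get pods -n "$NS" -l "$SELECTOR" -o wide
  echo "== 2. CoreDNS logs (last 50 lines per pod)"
  $KUBECTL logs -n "$NS" -l "$SELECTOR" --tail=50
  echo "== 3. In-cluster lookup from a throwaway pod"
  $KUBECTL run dns-test -n default --rm -i --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
  echo "== 4. Corefile (printed, not edited)"
  $KUBECTL get configmap coredns -n "$NS" -o jsonpath='{.data.Corefile}'
  echo
  echo "== 5. kube-dns Service and endpoints"
  $KUBECTL get svc kube-dns -n "$NS"
  $KUBECTL get endpoints kube-dns -n "$NS"
  echo "== 6. Pod events"
  $KUBECTL describe pods -n "$NS" -l "$SELECTOR"
}
```

Redirecting the output to a file (diagnose_coredns > coredns-report.txt 2>&1) gives a single artifact to attach to an incident ticket.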
8. Recovery Steps
Step 1: Restart CoreDNS
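kubectl -n kube-system rollout restart deployment coredns
Alternatively, delete the pods and let the Deployment recreate them (this removes all replicas at once):
kubectl delete pod -n kube-system -l k8s-app=kube-dns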
Step 2: Fix Resource Limits (if pods are getting OOMKilled)
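# Container resources in the coredns Deployment (kubeadm defaults shown);
# raise limits.memory if the pods are OOMKilled.
resources:
  limits:
    memory: "170Mi"
  requests:
    cpu: "100m"
    memory: "70Mi"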
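Rather than editing the Deployment by hand, the memory limit can be raised with kubectl set resources. The wrapper below is a sketch: bump_coredns_memory is a hypothetical name, it assumes the container inside the coredns Deployment is also named coredns (the kubeadm default), and it accepts only Mi or Gi values to catch typos such as a bare "256", which Kubernetes would read as 256 bytes.

```sh
# Sketch: raise the CoreDNS memory limit via `kubectl set resources`.
KUBECTL="${KUBECTL:-kubectl}"

bump_coredns_memory() {
  limit="$1"
  # Accept only Mi/Gi quantities to catch obvious typos.
  case "$limit" in
    *[0-9]Mi|*[0-9]Gi) ;;
    *) echo "usage: bump_coredns_memory <N>Mi|<N>Gi" >&2; return 2 ;;
  esac
  # Assumes the container in the coredns Deployment is named "coredns".
  $KUBECTL -n kube-system set resources deployment coredns \
    --containers=coredns --limits=memory="$limit"
}
```

Changing the pod template this way triggers a rolling restart of CoreDNS.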
Step 3: Roll Back Misconfigured Corefile
kubectl -n kube-system edit configmap coredns
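Before saving an edited Corefile, a quick structural check can catch unbalanced braces, one common source of the syntax errors listed among the root causes. The awk-based check below is a hypothetical pre-flight step, not a CoreDNS validator: it strips # comments and counts braces, and does not check plugin names or arguments.

```sh
# Sketch: verify that braces in a Corefile (read on stdin) balance.
# Exit status 0 if balanced, 1 otherwise (with a short message).
corefile_braces_balanced() {
  awk '
    {
      sub(/#.*/, "")                  # drop comments
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "{") {
          depth++
        } else if (c == "}") {
          depth--
          if (depth < 0) {
            print "unexpected } on line " NR
            bad = 1
            exit                      # jumps to END
          }
        }
      }
    }
    END {
      if (bad) exit 1
      if (depth != 0) {
        print "unclosed { (" depth " still open)"
        exit 1
      }
    }'
}
```

To check the live configuration: kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' | corefile_braces_balanced. A draft file can be checked with corefile_braces_balanced < Corefile.new.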
Step 4: Scale Up CoreDNS
kubectl scale deployment coredns -n kube-system --replicas=3
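A fixed count of three replicas is a reasonable starting point, but larger clusters often size CoreDNS from the cluster size. The sketch below computes the linear formula used by cluster-proportional-autoscaler (commonly deployed as dns-autoscaler): the larger of ceil(cores / coresPerReplica) and ceil(nodes / nodesPerReplica), with at least two replicas on multi-node clusters. The defaults of 256 cores and 16 nodes per replica are assumptions taken from its commonly used example parameters; linear_replicas is a hypothetical name.

```sh
# Sketch of the cluster-proportional-autoscaler "linear" formula.
# Usage: linear_replicas <cores> <nodes> [cores_per_replica] [nodes_per_replica]
linear_replicas() {
  cores="$1"
  nodes="$2"
  cores_per_replica="${3:-256}"
  nodes_per_replica="${4:-16}"
  # Integer ceiling division: ceil(a / b) = (a + b - 1) / b
  by_cores=$(( (cores + cores_per_replica - 1) / cores_per_replica ))
  by_nodes=$(( (nodes + nodes_per_replica - 1) / nodes_per_replica ))
  replicas=$by_cores
  if [ "$by_nodes" -gt "$replicas" ]; then replicas=$by_nodes; fi
  # preventSinglePointFailure: at least 2 replicas when there is >1 node.
  if [ "$nodes" -gt 1 ] && [ "$replicas" -lt 2 ]; then replicas=2; fi
  if [ "$replicas" -lt 1 ]; then replicas=1; fi
  echo "$replicas"
}
```

The result can be passed straight to kubectl scale deployment coredns -n kube-system --replicas="$(linear_replicas 64 12)". If an autoscaler already manages the Deployment, it will overwrite a manual scale.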
Step 5: Check Node and Network Health
- Ensure nodes are Ready.
- Ensure Calico/Cilium/Flannel pods are working.
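The node check can be automated with a filter over kubectl get nodes --no-headers. The helper below is a hypothetical sketch that prints nodes whose STATUS column is not Ready. A cordoned node (Ready,SchedulingDisabled) is treated as Ready, because its kubelet is still reporting healthy.

```sh
# Sketch: list nodes that are not Ready.
# Input: `kubectl get nodes --no-headers` output on stdin.
# Exit status: 1 if any node is not Ready, 0 otherwise.
not_ready_nodes() {
  awk '
    NF == 0 { next }                                  # skip blank lines
    $2 != "Ready" && $2 !~ /^Ready,/ {
      print $1 " " $2
      bad = 1
    }
    END { exit (bad ? 1 : 0) }'
}
```

Typical use: kubectl get nodes --no-headers | not_ready_nodes. A node listed here is also a candidate for the "Node Problems" root cause if CoreDNS pods are scheduled on it.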
