What a Service is
A Service is a Kubernetes API object (apiVersion: v1, kind: Service) that provides a stable network endpoint for a set of pods selected by label. The control plane watches for pods matching the Service's spec.selector, records their IPs in EndpointSlice objects, and programs the data plane so that packets addressed to the Service's virtual IP (VIP) reach a healthy pod.
The VIP does not change for the lifetime of the Service object. CoreDNS maintains a DNS record (<svc>.<namespace>.svc.cluster.local) pointing to it, so clients can reference services by name.
One detail that trips people up: kube-proxy, which runs on every node, does not proxy connections in the traditional sense. In iptables and nftables mode, it programs kernel NAT rules. Traffic flows directly from source to pod through the kernel; kube-proxy only sets the rules up. It is a control-plane component for the data plane, not a data-plane intermediary.
As of Kubernetes v1.33, the older Endpoints API is deprecated in favor of EndpointSlice. New tooling should use discovery.k8s.io/v1 EndpointSlice objects rather than the v1 Endpoints API.
ClusterIP: internal traffic only
ClusterIP is the default. If you omit spec.type, you get a ClusterIP Service.
Kubernetes assigns a stable virtual IP from the service CIDR, reachable only from within the cluster. kube-proxy programs iptables/IPVS/nftables rules so that any pod in the cluster can reach this VIP on the declared port. Packets are DNAT'd to one of the backing pod IPs.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-api
  namespace: production
spec:
  selector:
    app: backend-api
  ports:
    - port: 80          # port clients connect to
      targetPort: 8080  # port the container listens on
```
DNS resolves backend-api.production.svc.cluster.local to the ClusterIP address. Session affinity is available via sessionAffinity: ClientIP with a configurable timeoutSeconds (default 10800), which pins a client IP to a single pod for the duration.
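Enabling affinity is a two-field change on the Service spec. A minimal sketch, reusing the manifest above (the one-hour timeout is an arbitrary illustration, not a recommendation):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-api
  namespace: production
spec:
  selector:
    app: backend-api
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600  # pin each client IP to one pod for an hour (default 10800)
  ports:
    - port: 80
      targetPort: 8080
```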
ClusterIP is the right choice for anything that does not need external exposure: databases, caches, internal APIs, monitoring scrape targets. It has the smallest attack surface of any Service type. You can still reach it from outside the cluster via kubectl port-forward or the API proxy for debugging, but that is an explicit operator action, not a routing path.
NodePort: external access via node ports
NodePort is layered on top of ClusterIP. When you create a NodePort Service, Kubernetes also provisions a ClusterIP. On top of that, it opens a port in the 30000–32767 range on every node in the cluster (configurable via --service-node-port-range on kube-apiserver).
```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-app
  namespace: staging
spec:
  type: NodePort
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 31080  # optional; auto-assigned if omitted
```
Any traffic hitting <any-node-ip>:31080 gets forwarded by kube-proxy to a matching pod. But nothing balances client traffic across nodes for you: the node the client happens to reach handles the forwarding, and kube-proxy may send the packet to a pod on a different node, causing a second network hop and SNAT that hides the real client IP.
Setting externalTrafficPolicy: Local avoids the extra hop and preserves the source IP, but traffic is dropped entirely if the node has no local pod for that Service. This is a hard trade-off, not a default you should flip without understanding the implications.
For production workloads, NodePort is strongly discouraged as a direct exposure mechanism. Non-standard ports, no automatic cross-node distribution, and every node in the cluster being a potential entry point create both operational complexity and a wide attack surface. NodePort is useful for development, on-prem clusters behind a separate load balancer, or as the building block that LoadBalancer uses internally.
LoadBalancer: cloud provider integration
LoadBalancer is layered on NodePort, which is layered on ClusterIP. Creating one provisions all three: a ClusterIP, a NodePort, and an external cloud load balancer (AWS ELB/NLB, GCP LB, Azure LB, or whatever your cloud controller supports).
The cloud load balancer distributes traffic across the NodePorts on your cluster nodes. kube-proxy on each node then forwards to pods.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: public-api
  namespace: production
spec:
  type: LoadBalancer
  selector:
    app: public-api
  ports:
    - port: 443
      targetPort: 8443
```
The layering matters for cost. Each LoadBalancer Service provisions its own cloud load balancer. On AWS, an ELB costs roughly $16/month before data transfer. Ten services means ten load balancers. For HTTP/HTTPS workloads, Ingress or Gateway API routes multiple services behind a single load balancer, which is far more economical.
Traffic policies and source IP
externalTrafficPolicy: Cluster (the default) distributes traffic across all nodes. The cloud load balancer health-checks every node and forwards broadly. The downside: kube-proxy applies SNAT, replacing the client IP with the node IP before the packet reaches the pod. Your application sees node IPs, not real client IPs.
externalTrafficPolicy: Local tells kube-proxy to only forward to pods running on the local node. No SNAT, so the pod sees the real client IP. The cloud load balancer must use healthCheckNodePort to probe which nodes actually have local pods and stop sending traffic to empty nodes. If your pods are not well-distributed across nodes, traffic distribution becomes uneven.
Since Kubernetes v1.26, internalTrafficPolicy: Local does the same for cluster-internal traffic: restrict forwarding to node-local pods only.
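A sketch of the LoadBalancer Service from earlier with both policies set explicitly (the two policy fields are the point; everything else repeats the example above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: public-api
  namespace: production
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local    # external traffic stays on the arrival node; client IP preserved
  internalTrafficPolicy: Cluster  # in-cluster clients can still reach any pod
  selector:
    app: public-api
  ports:
    - port: 443
      targetPort: 8443
```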
Bypassing the NodePort layer
Since Kubernetes 1.24 (GA), setting allocateLoadBalancerNodePorts: false skips the NodePort allocation entirely. This works with CNIs that support direct pod routing (AWS VPC CNI, Azure CNI), where the cloud load balancer sends traffic straight to routable pod IPs. No NodePort, no extra NAT hop, and source IP is preserved without needing externalTrafficPolicy: Local.
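A sketch, assuming a CNI with routable pod IPs (the field itself is standard API; whether the load balancer actually targets pods directly depends on your cloud controller):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: public-api
  namespace: production
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: false  # no node ports are opened for this Service
  selector:
    app: public-api
  ports:
    - port: 443
      targetPort: 8443
```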
On bare-metal or on-prem clusters without a cloud controller, a LoadBalancer Service stays in Pending state indefinitely. Projects like MetalLB fill this gap by providing a software-based load balancer implementation.
ExternalName: DNS CNAME alias
ExternalName does not create a VIP or program any kube-proxy rules. It is a pure DNS mechanism. When a pod resolves the Service name, CoreDNS returns a CNAME record pointing to whatever hostname you specified in spec.externalName.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: billing-db
  namespace: production
spec:
  type: ExternalName
  externalName: billing.c9xk3q.eu-west-1.rds.amazonaws.com
```
Pods can now connect to billing-db.production.svc.cluster.local and reach the RDS instance. Useful for migrating external dependencies into the cluster DNS namespace, or for staging a transition from an external database to an in-cluster one without changing application config.
The TLS trap. The official Kubernetes documentation explicitly warns: protocols that use hostnames (especially TLS) will break. The client sends SNI based on the internal service name (billing-db.production.svc.cluster.local), but the server's certificate is issued for the external hostname (billing.c9xk3q.eu-west-1.rds.amazonaws.com). The TLS handshake fails with a certificate mismatch. The workaround is to configure your client to use the external hostname for TLS validation, which partly defeats the purpose of the abstraction.
One more gotcha: the default ndots:5 in pod DNS configuration means that any name with fewer than five dots triggers search-domain expansion before direct resolution. An external name like api.example.com has only two dots, so the resolver tries appending each search domain first (three wasted lookups) before resolving it directly.
Headless Services: direct pod addressing
A headless Service is a ClusterIP Service with clusterIP: None. No VIP is allocated. No kube-proxy rules are created. Instead, CoreDNS returns A/AAAA records for every Ready pod IP when the Service name is queried.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: data
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
    - port: 6379
```
Combined with a StatefulSet, each pod gets a stable, predictable DNS entry: redis-0.redis.data.svc.cluster.local, redis-1.redis.data.svc.cluster.local, and so on. This is what distributed systems like Cassandra, etcd, Kafka, and MySQL clusters need for peer discovery and leader election. Each member needs to address every other member directly; a VIP that randomly load-balances defeats the purpose.
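The pairing requires the StatefulSet to name the headless Service in spec.serviceName. A minimal sketch (replica count and image are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: data
spec:
  serviceName: redis  # must match the headless Service for per-pod DNS to work
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7  # illustrative image
          ports:
            - containerPort: 6379
```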
Headless Services are not limited to StatefulSets. Some gRPC clients and database drivers (MongoDB, for instance) do their own DNS-based endpoint selection. For those, a headless Service gives them the raw pod IPs and lets the client-side logic handle routing.
SRV records are also published for named ports, which enables protocols that use SRV-based discovery.
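For example, the same Service with its port named would publish an SRV record alongside the per-pod A records (the port name client is an assumption added for illustration, not part of the earlier example):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: data
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
    - name: client  # named port -> SRV record _client._tcp.redis.data.svc.cluster.local
      port: 6379
      protocol: TCP
```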
How kube-proxy routes traffic
kube-proxy runs on every node and watches the API server for Service and EndpointSlice changes. It programs the Linux kernel's networking stack so that packets addressed to a Service VIP are rewritten to a pod IP. The implementation depends on the mode.
iptables mode (current default)
kube-proxy creates chains in the kernel's NAT table: PREROUTING rules intercept packets, KUBE-SERVICES chains match on destination IP and port, and per-endpoint rules use --probability for roughly equal random distribution across pods.
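The shape of those chains, heavily simplified for one Service with three endpoints (chain names and addresses are illustrative, not literal kube-proxy output; real chains carry hashed suffixes):

```
-A KUBE-SERVICES -d 10.96.45.10/32 -p tcp --dport 80 -j KUBE-SVC-EXAMPLE
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333 -j KUBE-SEP-A
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000 -j KUBE-SEP-B
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-C
-A KUBE-SEP-A -p tcp -j DNAT --to-destination 10.244.1.12:8080
```

The probabilities cascade: the first endpoint gets a 1/3 chance, the second gets 1/2 of what remains, and the last takes whatever falls through, which works out to roughly uniform distribution.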
The problem: rule traversal is O(n) in the number of services. At 10,000 services, the kernel walks through tens of thousands of iptables rules for every new connection. With long-lived keepalive connections (the norm for microservices), the impact is small because the cost is paid once per connection. Without keepalive, CPU overhead can reach ~35% more than IPVS at that scale.
iptables remains the default even now that nftables has reached GA.
IPVS mode
Uses the kernel's IP Virtual Server module, purpose-built for load balancing. Lookup is O(1) via hash tables. Offers several load-balancing algorithms instead of just random: round-robin, least connections, shortest expected delay, destination hashing, source hashing, never-queue, and others.
At 10,000 services without keepalive, IPVS adds roughly 8% CPU overhead vs. iptables' 35%. With keepalive (100 requests per connection), the gap shrinks to about 2%. For most clusters under 1,000 services, the difference is negligible.
IPVS packets follow different paths through iptables filter hooks. If you rely on iptables-based network policies or other tooling that inspects iptables chains, test compatibility before switching.
nftables mode (beta in 1.31, GA in 1.33)
nftables replaces sequential iptables rules with verdict maps, which are kernel-level hash tables. Lookup is O(1). At 5,000–10,000 Services, the p50 latency for nftables matches the p01 (near-best-case) latency for iptables.
Control-plane performance is also better: nftables supports incremental ruleset updates, where iptables rewrites the full rule table on every sync. Requires Linux kernel 5.13 or newer.
Service discovery via DNS
CoreDNS is the standard cluster DNS provider. The kubelet injects a resolv.conf into every pod:
```
search staging.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
```
The ndots:5 setting means any name with fewer than five dots triggers search-domain expansion before direct resolution. Resolving backend-api (zero dots) hits backend-api.staging.svc.cluster.local on the first try and succeeds in one lookup. Resolving api.example.com (two dots) tries three search-domain permutations before the direct query succeeds. For pods that make many external DNS calls, setting dnsConfig.options with ndots: 2 reduces unnecessary lookups.
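A sketch of lowering ndots on a pod that mostly talks to external hosts (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: external-client
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # illustrative image
  dnsConfig:
    options:
      - name: ndots
        value: "2"  # names with two or more dots now resolve directly
```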
| Query pattern | Record type | Returns |
|---|---|---|
| `<svc>.<ns>.svc.cluster.local` (ClusterIP) | A / AAAA | ClusterIP address |
| `<svc>.<ns>.svc.cluster.local` (headless) | A / AAAA | All Ready pod IPs |
| `_<port>._<proto>.<svc>.<ns>.svc.cluster.local` | SRV | Port number + hostname |
| `<svc>.<ns>.svc.cluster.local` (ExternalName) | CNAME | External hostname |
| `<pod-name>.<svc>.<ns>.svc.cluster.local` (StatefulSet + headless) | A | Specific pod IP |
Kubernetes also injects environment variables ({SVCNAME}_SERVICE_HOST, {SVCNAME}_SERVICE_PORT, with the service name uppercased and dashes converted to underscores) into pods, but only for Services that existed before the pod started. DNS is the recommended discovery mechanism.
What a Service is not
A Service is not an Ingress or a Gateway. Services operate at L4 (TCP/UDP). They forward packets based on IP and port. They do not inspect HTTP headers, route by hostname or path, terminate TLS, or retry failed requests. Ingress and Gateway API operate at L7 and sit in front of ClusterIP Services to provide HTTP-aware routing.
A Service is not a service mesh. A service mesh (Istio, Linkerd) adds per-request telemetry, mutual TLS, circuit breaking, and retry policies. These are sidecar-level concerns that operate independently of the Kubernetes Service object, even though the mesh uses Service DNS names as routing targets.
A Service does not rebalance long-lived connections. kube-proxy distributes new connections. Once a TCP connection is established (HTTP/2 stream, gRPC channel, WebSocket, database pool), all requests on that connection go to the same pod. If you scale up the backend, the new pods receive none of the existing connections. Strategies for this include shorter keepalive timeouts, client-side load balancing with headless Services, or a service mesh that can split multiplexed streams.
A Service is not a firewall. A ClusterIP Service is cluster-internal by default, but it does not enforce access control. NetworkPolicy is the Kubernetes mechanism for restricting which pods can communicate with each other.
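A minimal sketch of such a policy, restricting the backend to ingress from frontend pods only (the labels are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-from-frontend-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # illustrative label
      ports:
        - protocol: TCP
          port: 8080  # policies match the pod's port, not the Service port
```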
When to use which type
| Scenario | Type | Reason |
|---|---|---|
| Internal microservice, database, cache | ClusterIP | Smallest surface area; no external exposure needed |
| Quick external access in development | NodePort | Simple; no cloud LB dependency |
| TCP/UDP service needing a public IP | LoadBalancer | Cloud LB handles HA and health checks |
| Multiple HTTP/HTTPS services behind one IP | Ingress or Gateway API | One cloud LB shared across services; L7 routing; cost-effective |
| DNS alias for an external database or API | ExternalName | No proxy overhead; mind the TLS caveat |
| Distributed stateful system (Kafka, etcd, Cassandra) | Headless + StatefulSet | Per-pod stable DNS; peer discovery; no random LB |
For most production clusters, the pattern is: ClusterIP for everything internal, one or two LoadBalancer Services for non-HTTP TCP endpoints, and Gateway API or Ingress for HTTP/HTTPS traffic routed through a shared load balancer. NodePort and ExternalName fill niche roles. Headless Services are the standard for any stateful workload that needs deterministic pod addressing.