What Happens if a Node Runs Out of Memory in Kubernetes?

1. Introduction

In a Kubernetes cluster, a node is a worker machine (virtual or physical) that runs containerized workloads. Each node has finite resources—CPU, memory, disk, and network bandwidth. If a node runs out of memory, Kubernetes and the underlying OS (typically Linux) must respond immediately to prevent system crashes.

2. What is Memory Pressure?

Memory pressure occurs when the available RAM on a node is low or exhausted. Unlike CPU, which can be throttled, memory is an incompressible resource: it cannot be scaled back once allocated. When memory is exhausted, either the Linux kernel or Kubernetes has to reclaim memory, often by terminating processes.

3. Key Components Involved

  • Kubelet – The agent on each node that manages containers and resources.
  • cgroups – Linux control groups that track and limit resource usage per container.
  • OOM Killer – A Linux kernel mechanism that kills processes to reclaim memory.

4. What Happens When a Node Runs Out of Memory?

The behavior depends on how severe the shortage is and on whether the kubelet's soft or hard eviction thresholds have been crossed.

Phase 1: Early Warning

  • Kubelet notices memory pressure and starts reporting MemoryPressure: True in node conditions.
  • The eviction manager kicks in, which may evict pods to reduce memory usage.

You can check this via:

kubectl describe node <node-name>

Look for:

Conditions:
  Type              Status
  MemoryPressure    True
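
The thresholds that trigger this behavior are configurable in the kubelet configuration. A minimal sketch, assuming the kubelet reads a KubeletConfiguration file (the values are illustrative, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"    # evict immediately below this
evictionSoft:
  memory.available: "300Mi"    # evict after the grace period below
evictionSoftGracePeriod:
  memory.available: "1m30s"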

Phase 2: Pod Eviction

Kubernetes uses QoS (Quality of Service) to decide which pods to evict:

  • BestEffort pods (no requests or limits set) are evicted first
  • Then Burstable pods, starting with those using more memory than they requested
  • Guaranteed pods (requests equal to limits) are evicted last

Eviction looks like this in pod events:

The node had condition: [MemoryPressure]
The pod was evicted.

The evicted pod is terminated; if it is managed by a controller such as a Deployment, a replacement pod is scheduled on another node (if one has capacity).
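
To see which QoS class a pod has been assigned (and therefore where it sits in the eviction order), you can query its status:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'

The output is BestEffort, Burstable, or Guaranteed; the same value appears under QoS Class in the kubectl describe pod output.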

Phase 3: OOM Killer Triggered (Out of Memory Killer)

If memory pressure is severe and eviction is too slow, the Linux kernel intervenes.

  • The OOM Killer selects and kills processes (including container processes) that are using excessive memory.
  • Selection is based on heuristics such as memory usage and oom_score_adj, which the kubelet sets according to each pod's QoS class.
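
On the node itself, the kernel logs each OOM kill. As a rough check (the exact wording varies by kernel version), search the kernel log:

dmesg -T | grep -i -E "out of memory|oom-killer"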

In the container runtime's inspect output (for example, docker inspect), you might see:

"OOMKilled": true

Or in describe output:

kubectl describe pod <pod-name>

State:     Terminated
Reason:    OOMKilled

Note:

  • OOMKilled containers are not shut down gracefully; the kernel kills the process outright.
  • The container is restarted according to the pod's restartPolicy, and controllers such as a Deployment or DaemonSet recreate the pod if needed.
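
To confirm why a container was last terminated, you can read its recorded last state; a quick check for a single-container pod (the [0] index assumes one container):

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'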

5. Impact on Applications and Cluster

  1. Pods crash or are evicted.
  2. High-priority pods may be spared, but even they can be OOMKilled if limits are exceeded.
  3. Performance degrades due to disk swapping (if enabled).
  4. Node may become NotReady if system daemons (like kubelet) are killed or hang.
  5. Unschedulable (Pending) pods accumulate because the node has no free memory capacity left.

6. How to Prevent a Node from Running Out of Memory

1. Set Memory Requests and Limits

Always set both resources.requests.memory and resources.limits.memory in your pod specs.

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

2. Use QoS Class Effectively

Use Guaranteed or Burstable QoS for critical workloads.
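
A pod is placed in the Guaranteed class only when every container sets memory and CPU requests equal to its limits. A minimal sketch (names and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-app
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "500m"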

3. Monitor Resource Usage

Set up Prometheus + Grafana dashboards or use cloud monitoring tools to alert on memory usage per node.
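
For example, a minimal Prometheus alerting rule, assuming node_exporter metrics are scraped (the 10% threshold and names are illustrative):

groups:
- name: node-memory
  rules:
  - alert: NodeMemoryAvailableLow
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Less than 10% memory available on {{ $labels.instance }}"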

4. Enable Resource Autoscaling

Use:

  • Cluster Autoscaler to add nodes when pods cannot be scheduled because memory is exhausted.
  • Vertical Pod Autoscaler (VPA) to recommend or apply pod memory adjustments (see the sketch below).
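
A minimal sketch of a VPA object in recommendation-only mode, assuming the VPA components are installed in the cluster (my-app and my-app-vpa are placeholder names):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"    # "Off" only records recommendations; "Auto" applies them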

5. Reserve Node Resources for System

Configure the kubeReserved and systemReserved settings in the kubelet configuration file (or the equivalent --kube-reserved and --system-reserved flags):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  memory: "512Mi"
systemReserved:
  memory: "512Mi"

This helps protect the kubelet and OS from being starved.

6. Use Taints and Pod Priority

Taint critical nodes and assign priority classes to important pods to protect them from eviction.
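
For example, you can taint a node with kubectl taint nodes <node-name> dedicated=critical:NoSchedule, then give important pods both a toleration and a priority class. A minimal sketch (the names, priority value, and taint key are illustrative):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-priority
value: 1000000
globalDefault: false
description: "Workloads that should be evicted last."
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: critical-priority
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "critical"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "256Mi"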