Overview
Single-node Minikube cannot simulate distributed Kubernetes behavior. When there is only one node, every scheduling decision is trivial — affinity rules become no-ops, topology constraints have nothing to spread across, and node failure is total cluster failure. Multi-node clusters let you test the behaviors that matter in production: pod placement, rolling updates across nodes, zone-aware scheduling, and graceful node failure recovery.
Create a Multi-Node Cluster
# 3-node cluster with Docker driver (recommended)
minikube start --nodes 3 --driver=docker --cpus=2 --memory=4096
# Verify all nodes are ready
kubectl get nodes -o wide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP
# minikube Ready control-plane 60s v1.30.0 192.168.49.2
# minikube-m02 Ready <none> 30s v1.30.0 192.168.49.3
# minikube-m03 Ready <none> 15s v1.30.0 192.168.49.4
The --cpus and --memory flags apply per node. A 3-node cluster with --cpus=2 --memory=4096 uses 6 CPUs and 12GB RAM total on your host. Plan accordingly.
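Because the flags are per node, the host footprint is a simple multiplication. The sketch below makes that arithmetic explicit before you start a cluster; cluster_footprint is a hypothetical helper written for this example, not a minikube command.

```shell
# Estimate the total host resources a multi-node minikube cluster will claim.
# cluster_footprint is an illustrative helper, not part of minikube.
cluster_footprint() {
  local nodes=$1 cpus=$2 memory_mb=$3
  # --cpus and --memory apply to every node, so multiply by the node count
  echo "cpus=$(( nodes * cpus )) memory_mb=$(( nodes * memory_mb ))"
}

cluster_footprint 3 2 4096   # prints: cpus=6 memory_mb=12288
```

Compare the result with what the host can spare before choosing --nodes, --cpus, and --memory.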
Driver Selection
# Docker (recommended — fastest, least overhead)
minikube start --nodes 3 --driver=docker
# Hyperkit (macOS — better network isolation)
minikube start --nodes 3 --driver=hyperkit
# KVM2 (Linux — full VM isolation)
minikube start --nodes 3 --driver=kvm2
# Hyper-V (Windows)
minikube start --nodes 3 --driver=hyperv
Docker driver creates container-based nodes (fastest startup, lowest overhead). VM-based drivers provide better isolation but consume more resources and take longer to start.
Label Nodes for Zone Simulation
Production clusters span availability zones. Simulate this by labeling your minikube nodes:
# Simulate availability zones
kubectl label node minikube topology.kubernetes.io/zone=zone-a
kubectl label node minikube-m02 topology.kubernetes.io/zone=zone-b
kubectl label node minikube-m03 topology.kubernetes.io/zone=zone-c
# Simulate node roles
kubectl label node minikube-m02 workload=compute
kubectl label node minikube-m03 workload=compute
# Simulate instance types
kubectl label node minikube-m02 node.kubernetes.io/instance-type=m5.large
kubectl label node minikube-m03 node.kubernetes.io/instance-type=c5.xlarge
# Verify labels
kubectl get nodes --show-labels
These are the same well-known label keys that cloud providers set automatically; only the values differ. Manifests that select on these keys in minikube carry over to EKS, GKE, or AKS once the zone and instance-type values match the real cluster.
Topology Spread Constraints
Topology spread constraints distribute pods evenly across topology domains (nodes, zones, regions). This is the modern replacement for pod anti-affinity for most use cases:
# spread-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        # Spread evenly across nodes
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
        # Also spread across zones
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: nginx
          image: nginx:1.25-alpine
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
# Apply and verify spread
kubectl apply -f spread-deployment.yaml
kubectl get pods -o wide
# Each pod should be on a different node
The maxSkew: 1 with DoNotSchedule means the scheduler will not place a pod if it would create more than 1 pod difference between any two nodes. With 3 replicas and 3 nodes, you get exactly one pod per node.
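The skew rule can be checked by hand. The sketch below is a simplification of what the scheduler does: it takes the pod count on each node after a proposed placement and prints the difference between the fullest and emptiest node. The skew function is a hypothetical helper for illustration only.

```shell
# Print the skew (max - min) for a list of per-node pod counts.
# Simplified model of the scheduler's check; skew is an illustrative helper.
skew() {
  local min=$1 max=$1 count
  for count in "$@"; do
    if [ "$count" -lt "$min" ]; then min=$count; fi
    if [ "$count" -gt "$max" ]; then max=$count; fi
  done
  echo $(( max - min ))
}

skew 1 1 1   # one pod per node: skew 0, allowed
skew 2 1 1   # fourth replica on any node: skew 1, still allowed with maxSkew: 1
skew 2 1 0   # second pod on a node while another is empty: skew 2, rejected
```

With whenUnsatisfiable: DoNotSchedule, a placement whose skew exceeds maxSkew leaves the pod Pending; with ScheduleAnyway the scheduler only deprioritizes that node.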
Pod Anti-Affinity
For more complex scheduling rules, use pod anti-affinity:
# anti-affinity-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      affinity:
        # Hard rule: never two redis pods on the same node
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - redis
              topologyKey: kubernetes.io/hostname
        # Soft rule: prefer zones that already run web pods
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - web
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: redis
          image: redis:7-alpine
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
Node Selectors and Taints
# Taint a node (only pods that tolerate the taint can schedule)
kubectl taint nodes minikube-m03 dedicated=gpu:NoSchedule
# Verify taint
kubectl describe node minikube-m03 | grep -A5 Taints
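# gpu-workload.yaml: tolerates the dedicated=gpu taint, so it may run on minikube-m03.
# The toleration does not attract the pod; the nodeSelector only limits it to
# workload=compute nodes, so it can land on minikube-m02 or minikube-m03.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  # Run once; without this the pod restarts after the command exits
  restartPolicy: Never
  nodeSelector:
    workload: compute
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: compute
      image: python:3.12-slim
      command: ["python", "-c", "print('Running on dedicated node')"]
Simulate Node Failures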
Testing resilience requires breaking things intentionally:
# Stop a worker node (simulates node failure)
minikube node stop minikube-m03
# Watch pods reschedule to remaining nodes (eviction starts only after the default 5-minute toleration expires)
kubectl get pods -o wide --watch
# Check node status — should show NotReady
kubectl get nodes
# Restart the failed node
minikube node start minikube-m03
# Watch pod placement; Kubernetes does not move running pods back, so only new or recreated pods land on the restarted node
kubectl get pods -o wide --watch
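Kubernetes gives every pod default tolerations of 300 seconds for the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints, so pods on a stopped node can sit there for about five minutes before they are evicted. For faster test cycles, one option is to shorten those tolerations on the workload. The sketch below writes a patch for the web Deployment; the file name and the 30-second value are assumptions chosen for illustration.

```shell
# Write a patch that shortens how long pods tolerate a NotReady or
# unreachable node. File name and the 30s value are illustrative choices.
cat > fast-evict-patch.yaml <<'EOF'
spec:
  template:
    spec:
      tolerations:
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 30
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 30
EOF

# Apply it to the Deployment (needs a running cluster):
# kubectl patch deployment web --patch-file fast-evict-patch.yaml
```

Patching the pod template triggers a rollout, so wait for it to finish before stopping a node.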
Controlled Drain Test
# Gracefully drain a node (respects PodDisruptionBudgets)
kubectl drain minikube-m02 --ignore-daemonsets --delete-emptydir-data
# Verify pods moved to other nodes
kubectl get pods -o wide
# Uncordon the node to allow scheduling again
kubectl uncordon minikube-m02
The difference between minikube node stop and kubectl drain is important: stop simulates a sudden failure (crash, network partition), while drain simulates a planned maintenance event. Test both.
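A drain only exercises PodDisruptionBudgets if one exists. The following is a minimal sketch of a budget for the web Deployment, assuming its three replicas and the app: web label used earlier; the name web-pdb and the minAvailable value are illustrative.

```shell
# Write a PodDisruptionBudget that keeps at least 2 web pods running
# during voluntary disruptions such as kubectl drain.
# The name web-pdb and minAvailable: 2 are illustrative choices.
cat > web-pdb.yaml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF

# Apply it before draining (needs a running cluster):
# kubectl apply -f web-pdb.yaml
```

With this budget in place, kubectl drain evicts a web pod only while at least two others remain available. A sudden minikube node stop is not a voluntary disruption, so the budget does not apply there.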
Dynamic Node Management
# Add a new worker node to a running cluster
minikube node add
# Add a worker node explicitly (--worker is the default; node names are assigned automatically)
minikube node add --worker
# Remove a node
minikube node delete minikube-m04
# List all nodes
minikube node list
High Availability Control Plane
Minikube supports multi-control-plane clusters for testing HA configurations:
# Start with 3 control plane nodes and 2 workers
minikube start --ha --nodes 5 --driver=docker --cpus=2 --memory=4096
# Verify HA setup
kubectl get nodes
# 3 nodes with role control-plane, 2 with <none> (workers)
HA clusters require more resources but let you test control plane failure scenarios — what happens when a control plane node goes down, API server failover, etcd quorum loss.
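Quorum loss is easier to plan for with the numbers in front of you. The sketch below assumes each control-plane node runs one etcd member (an assumption about the HA layout, not something stated above) and prints how many members must stay up and how many can fail. Both function names are hypothetical helpers.

```shell
# etcd needs a majority of members to accept writes.
# etcd_quorum and etcd_tolerated_failures are illustrative helpers.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }
etcd_tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for members in 1 3 5; do
  echo "members=$members quorum=$(etcd_quorum "$members") tolerated_failures=$(etcd_tolerated_failures "$members")"
done
# members=1 quorum=1 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

Under that assumption, a three-control-plane cluster should keep serving the API with one control-plane node stopped and lose quorum when a second one goes down.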
Resource Management
Multi-node clusters consume significant host resources. Manage them carefully:
# Check resource usage per node (requires the metrics-server addon: minikube addons enable metrics-server)
kubectl top nodes
# Check resource allocation
kubectl describe node minikube | grep -A10 "Allocated resources"
# Pause the cluster (reduces CPU use and preserves state; memory stays allocated)
minikube pause
# Resume the cluster
minikube unpause
# Stop the cluster entirely (frees CPU and memory; cluster state stays on disk)
minikube stop
# Delete the cluster and all data
minikube delete
Profile-Based Cluster Management
Run multiple named clusters for different testing scenarios:
# Create a profile for multi-node testing
minikube start -p multi-node --nodes 3 --driver=docker
# Create a separate profile for HA testing
minikube start -p ha-test --ha --nodes 5 --driver=docker
# Switch between profiles
minikube profile multi-node
minikube profile ha-test
# List all profiles
minikube profile list
# Delete a specific profile
minikube delete -p ha-test
Testing Rolling Updates Across Nodes
# Deploy across all nodes, then pin the image to 1.24 as the v1 baseline
kubectl apply -f spread-deployment.yaml
kubectl set image deployment/web nginx=nginx:1.24-alpine
# Trigger a rolling update and watch node-by-node progress
kubectl set image deployment/web nginx=nginx:1.25-alpine
kubectl rollout status deployment/web
kubectl get pods -o wide --watch
The Deployment's rolling update strategy (maxSurge and maxUnavailable) controls how many pods are replaced at once. The topology spread constraints control where each replacement lands, so the pods stay spread across nodes throughout the update.
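How many pods change at once comes from the Deployment's maxSurge and maxUnavailable settings, which both default to 25%. Kubernetes rounds the surge up and the unavailable count down. The sketch below reproduces that rounding; rollout_bounds is a hypothetical helper for this example.

```shell
# Compute the absolute maxSurge / maxUnavailable for a replica count.
# Percentage defaults to 25, matching the Deployment defaults.
# rollout_bounds is an illustrative helper, not a kubectl command.
rollout_bounds() {
  local replicas=$1 percent=${2:-25}
  local surge=$(( (replicas * percent + 99) / 100 ))   # rounds up
  local unavailable=$(( replicas * percent / 100 ))    # rounds down
  echo "maxSurge=$surge maxUnavailable=$unavailable"
}

rollout_bounds 3    # prints: maxSurge=1 maxUnavailable=0
rollout_bounds 10   # prints: maxSurge=3 maxUnavailable=2
```

For the three-replica web Deployment this means one extra pod is started at a time and no existing pod is removed until its replacement is ready.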
Best Practices
- Allocate realistic resources per node, because starved nodes mask scheduling issues. Use at least 2 CPUs and 2GB RAM per node.
- Use node labels that match your cloud provider's conventions (topology.kubernetes.io/zone, node.kubernetes.io/instance-type) so manifests are portable.
- Test both sudden failure (minikube node stop) and graceful drain (kubectl drain); they exercise different code paths.
- Use topology spread constraints over pod anti-affinity for most use cases; they are more flexible and easier to reason about.
- Clean up multi-node clusters when done; three Docker containers consuming 12GB RAM will drain a laptop battery fast.
- Use profiles to maintain separate cluster configurations for different test scenarios.
- Set resource requests on all pods; without requests, the scheduler cannot make informed placement decisions.
Common Pitfalls
- Allocating too much per node and running out of host resources; a swapping host gives misleading performance results.
- Forgetting that the control-plane node runs workloads by default in minikube; taint it if you want dedicated worker testing.
- Not setting resource requests on pods; without requests, topology spread constraints and affinity rules still work, but scheduling quality degrades.
- Testing scheduling on a single-node cluster, where affinity, anti-affinity, and topology spread do nothing. The syntax validates, but the behavior is never exercised.
- Leaving multi-node clusters running in the background; use minikube pause when not actively testing.