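Kubernetes Taints and Tolerations

Overview

In Kubernetes, taints and tolerations work together to control which nodes a Pod can be scheduled onto. A taint is set on a node and repels Pods that do not tolerate it; a toleration is set on a Pod and allows (but does not require) the Pod to be scheduled onto a node with a matching taint.

This differs from Node Affinity: Node Affinity attracts Pods to specific nodes, whereas taints keep Pods away from specific nodes. The two can be combined for finer-grained scheduling control.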

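Taint Effects

Kubernetes supports three taint effects:

Effect	Description
`NoSchedule`	New Pods that do not tolerate the taint are not scheduled onto the node; Pods already running on it are unaffected
`PreferNoSchedule`	A soft constraint: the scheduler tries to avoid placing Pods that do not tolerate the taint on the node, but this is not guaranteed
`NoExecute`	New Pods that do not tolerate the taint are not scheduled, and Pods already running on the node that do not tolerate it are evicted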
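
To make the three effects concrete, the fragment below sketches how taints might appear under spec.taints on a Node object. The node name, keys, and values are illustrative assumptions and do not come from a real cluster.

```yaml
# Excerpt of a Node object, roughly as `kubectl get node node1 -o yaml` might show it
# (node name, keys, and values are illustrative assumptions)
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
  # Hard constraint that applies at scheduling time only
  - key: "gpu"
    value: "true"
    effect: "NoSchedule"
  # Soft constraint: the scheduler avoids this node when it can
  - key: "spot"
    value: "true"
    effect: "PreferNoSchedule"
  # Hard constraint that also evicts running Pods lacking a matching toleration
  - key: "maintenance"
    value: "true"
    effect: "NoExecute"
```

A node can carry several taints at once; a Pod must tolerate every NoSchedule and NoExecute taint on the node to be placed there.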

Adding a Taint to a Node

Use the kubectl taint command to add a taint to a node:

# Add a taint
kubectl taint nodes node1 key=value:NoSchedule

# Example: mark a node as a dedicated GPU node
kubectl taint nodes gpu-node-01 gpu=true:NoSchedule

# Example: mark a node as under maintenance
kubectl taint nodes node1 maintenance=true:NoExecute

# Remove a taint (append a minus sign)
kubectl taint nodes node1 key=value:NoSchedule-

# View a node's taints
kubectl describe node node1 | grep -A 5 Taints

Configuring Pod Tolerations

Add a tolerations field to the Pod spec so that the Pod tolerates a specific taint:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.24
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

Tolerations in a Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - name: gpu-container
        image: nvidia/cuda:12.0-base
        resources:
          limits:
            nvidia.com/gpu: 1
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      - key: "dedicated"
        operator: "Equal"
        value: "gpu-workloads"
        effect: "NoSchedule"

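Operators

Tolerations support two operators, Equal and Exists:

Equal Operator

With Equal (the default when operator is omitted), the toleration's key, value, and effect must all match those of the taint: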

tolerations:
- key: "environment"
  operator: "Equal"
  value: "production"
  effect: "NoSchedule"

Exists Operator

Exists only checks that the key is present on the taint; no value should be specified:

tolerations:
# Tolerate every taint with key "dedicated" and effect NoSchedule, whatever its value
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"

# Tolerate all taints (wildcard)
- operator: "Exists"
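
The effect field is optional as well. As a small sketch reusing the "dedicated" key from the example above, a toleration that omits effect matches taints with that key under any effect:

```yaml
tolerations:
# No effect given: matches "dedicated" taints with NoSchedule,
# PreferNoSchedule, and NoExecute alike
- key: "dedicated"
  operator: "Exists"
```

This is broader than it may look: because it also covers NoExecute, a Pod with this toleration would not be evicted by a "dedicated" taint of that effect.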

Common Use Cases

1. Dedicated Nodes

Reserve nodes for a specific workload:

# Mark the dedicated node (taint it and add a matching label)
kubectl taint nodes dedicated-node dedicated=special-user:NoSchedule
kubectl label nodes dedicated-node dedicated=special-user

apiVersion: v1
kind: Pod
metadata:
  name: special-workload
spec:
  containers:
  - name: app
    image: myapp:latest
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special-user"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - special-user

2. Node Maintenance

Before performing maintenance on a node, evict the Pods running on it:

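# Put the node into maintenance mode (evicts Pods that do not tolerate the taint)
kubectl taint nodes node1 maintenance=true:NoExecute

# Remove the taint once maintenance is complete
kubectl taint nodes node1 maintenance=true:NoExecute-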
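
Pods that should keep running during maintenance, such as a logging or monitoring agent, need a matching toleration. The following is a minimal sketch: the Pod name and image are hypothetical placeholders, and the key and value mirror the maintenance taint used above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-agent                      # hypothetical name
spec:
  containers:
  - name: agent
    image: example.com/node-agent:1.0   # hypothetical image
  tolerations:
  # Matches maintenance=true:NoExecute; with no tolerationSeconds set,
  # the Pod stays bound to the node for as long as the taint remains
  - key: "maintenance"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
```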

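3. Eviction Grace Period

Use tolerationSeconds to set how long a Pod stays bound to a node after a matching NoExecute taint is added, before it is evicted: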

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
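
Kubernetes normally adds tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with a tolerationSeconds of 300 to Pods that do not declare their own. A workload that should fail over sooner can declare shorter values explicitly. The 30-second figure in this sketch is an arbitrary assumption for illustration, not a recommendation:

```yaml
tolerations:
# Evict after 30s instead of the usual 300s when the node is unreachable
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
# Same shortened grace period when the node reports NotReady
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
```

Shorter values speed up rescheduling but make Pods more sensitive to brief network interruptions, so the right number depends on the workload.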

Combining with Node Affinity

Combining taints and tolerations with Node Affinity allows more precise scheduling:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-priority-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: high-priority
  template:
    metadata:
      labels:
        app: high-priority
    spec:
      containers:
      - name: app
        image: myapp:latest
      tolerations:
      - key: "tier"
        operator: "Equal"
        value: "critical"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: tier
                operator: In
                values:
                - critical
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: high-priority
              topologyKey: kubernetes.io/hostname

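Built-in Taints

Kubernetes automatically adds certain built-in taints to nodes:

Taint Key	Description
`node.kubernetes.io/not-ready`	The node is not ready
`node.kubernetes.io/unreachable`	The node is unreachable
`node.kubernetes.io/memory-pressure`	The node is under memory pressure
`node.kubernetes.io/disk-pressure`	The node is under disk pressure
`node.kubernetes.io/pid-pressure`	The node is running low on process IDs
`node.kubernetes.io/network-unavailable`	The node's network is unavailable
`node.kubernetes.io/unschedulable`	The node has been marked unschedulable

By default, the DaemonSet controller automatically adds tolerations for these built-in taints to DaemonSet Pods.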
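
For reference, the tolerations added to DaemonSet Pods look roughly like the following. This is a sketch based on the upstream DaemonSet documentation; the exact set may vary between Kubernetes versions, so it should not be read as an authoritative list.

```yaml
tolerations:
# NoExecute tolerations: DaemonSet Pods are not evicted from unhealthy nodes
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
# NoSchedule tolerations: DaemonSet Pods can still be placed on constrained nodes
- key: "node.kubernetes.io/disk-pressure"
  operator: "Exists"
  effect: "NoSchedule"
- key: "node.kubernetes.io/memory-pressure"
  operator: "Exists"
  effect: "NoSchedule"
- key: "node.kubernetes.io/pid-pressure"
  operator: "Exists"
  effect: "NoSchedule"
- key: "node.kubernetes.io/unschedulable"
  operator: "Exists"
  effect: "NoSchedule"
# Added only for DaemonSet Pods that use host networking
- key: "node.kubernetes.io/network-unavailable"
  operator: "Exists"
  effect: "NoSchedule"
```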

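Best Practices

Use clear naming conventions: choose meaningful keys and values, for example dedicated=gpu-workloads rather than d=g
Pair taints with labels: taints should usually be combined with matching node labels and Node Affinity, so that a Pod is not merely allowed onto a node but required to run there
Use NoExecute with care: this effect evicts running Pods, so assess the impact carefully in production
Set tolerationSeconds: for NoExecute taints, consider configuring a reasonable grace period
Document your taints: record the taints used in the cluster so that team members can look them up
Review regularly: check the taints on your nodes periodically and remove those that are no longer needed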

# List the taints on every node
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints