Kubernetes Cluster Autoscaler Node Scaling Configuration

In day-to-day Kubernetes cluster operations, a key question is how to adjust the number of nodes dynamically to match the workload. Cluster Autoscaler scales the cluster's nodes up or down automatically, making sure applications have enough resources to run while avoiding waste. This article walks through how Cluster Autoscaler works, how to configure it, and the best practices around it.

How Cluster Autoscaler Works

Core Concepts

Cluster Autoscaler is a standalone component that monitors Pod scheduling status and node resource usage across the cluster. Its two main functions, which you can observe directly with the commands shown after this list, are:

  1. Scale up: when a Pod cannot be scheduled due to insufficient resources, add a node automatically
  2. Scale down: when a node's utilization is low and the Pods on it can be moved elsewhere, remove the node
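
A quick way to see both paths in action is to look at the events Cluster Autoscaler records on unschedulable Pods and at its status ConfigMap. This is a rough sketch; the exact event reasons can vary between versions:

# Scale-up decisions show up as events on the affected Pods
kubectl get events --all-namespaces --field-selector involvedObject.kind=Pod \
    | grep -Ei "TriggeredScaleUp|NotTriggerScaleUp"

# The status ConfigMap summarizes per-node-group scale-up/scale-down state
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml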

Scale-Up Trigger Flow

1. A Pod sits in the Pending state
2. Cluster Autoscaler detects the unschedulable Pod
3. It evaluates which Node Group can accommodate the Pod
4. It calls the cloud provider API to add a node
5. The new node joins the cluster and becomes Ready
6. The Pod is scheduled onto the new node
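
For illustration, a hypothetical Deployment like the one below (name, image, and resource figures are made up) goes Pending as soon as the existing nodes cannot fit its requests, which is exactly the condition in step 1:

# Hypothetical workload whose requests exceed the spare capacity of current nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: scale-up-demo
  template:
    metadata:
      labels:
        app: scale-up-demo
    spec:
      containers:
      - name: app
        image: nginx:stable
        resources:
          requests:
            cpu: "2"        # large enough that not all replicas fit on existing nodes
            memory: 4Gi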

Scale-Down Trigger Conditions

Cluster Autoscaler periodically checks whether each node qualifies for scale-down:

# A node qualifies for scale-down when:
# 1. Its resource utilization is below the threshold (default 50%)
# 2. Every Pod on it can be moved to another node
# 3. It has no Pods that cannot be evicted (e.g. Pods using local storage or a restrictive node selector)
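
For a rough manual check of condition 1, compare a node's requested resources against its allocatable capacity (the node name below is a placeholder); note that Cluster Autoscaler bases this decision on resource requests, not live usage:

# Requested vs. allocatable resources on one node
kubectl describe node ip-10-0-1-23.ap-northeast-1.compute.internal | grep -A 8 "Allocated resources"

# Live usage for comparison (needs metrics-server); CA itself ignores this
kubectl top nodes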

Pod Types That Block Scale-Down

# The following kinds of Pod prevent their node from being scaled down:
# 1. Pods whose PodDisruptionBudget would be violated by eviction
# 2. Pods using local storage
# 3. Pods not managed by a controller (i.e. not created by a Deployment, ReplicaSet, etc.)
# 4. Pods annotated with "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
# 5. kube-system Pods without a PDB, or with a PDB that allows zero disruptions

Relationship to HPA and VPA

Comparing the Three Autoscalers

| Aspect | HPA | VPA | Cluster Autoscaler |
| --- | --- | --- | --- |
| What it adjusts | Pod replica count | Pod resource requests | Node count |
| Trigger | CPU / memory / custom metrics | Historical resource usage | Unschedulable Pods |
| Scope | Deployment / ReplicaSet | Pod | Node Group |

How They Work Together

1. Application load increases
2. HPA detects CPU usage above the target value
3. HPA increases the Pod replica count
4. Some Pods go Pending (insufficient resources)
5. Cluster Autoscaler adds nodes
6. The Pending Pods are scheduled onto the new nodes

Recommended Configuration

# Example HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15

Integrating VPA with Cluster Autoscaler

# Example VPA configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # or "Off" to only publish recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]

AWS EKS Integration

IAM Permissions

First, create the IAM policy that Cluster Autoscaler needs:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:DescribeTags",
                "ec2:DescribeImages",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:GetInstanceTypesFromInstanceRequirements",
                "eks:DescribeNodegroup"
            ],
            "Resource": ["*"]
        },
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": ["*"],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/k8s.io/cluster-autoscaler/enabled": "true",
                    "aws:ResourceTag/k8s.io/cluster-autoscaler/<cluster-name>": "owned"
                }
            }
        }
    ]
}

Creating an IAM Role for Service Accounts (IRSA)

# Create the OIDC provider (if it does not exist yet)
eksctl utils associate-iam-oidc-provider \
    --cluster my-cluster \
    --approve

# Create the IAM role and bind it to a service account
eksctl create iamserviceaccount \
    --cluster=my-cluster \
    --namespace=kube-system \
    --name=cluster-autoscaler \
    --attach-policy-arn=arn:aws:iam::ACCOUNT_ID:policy/ClusterAutoscalerPolicy \
    --override-existing-serviceaccounts \
    --approve

Deploying Cluster Autoscaler with Helm

# Add the Helm repository
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Install Cluster Autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
    --namespace kube-system \
    --set autoDiscovery.clusterName=my-cluster \
    --set awsRegion=ap-northeast-1 \
    --set rbac.serviceAccount.create=false \
    --set rbac.serviceAccount.name=cluster-autoscaler \
    --set extraArgs.balance-similar-node-groups=true \
    --set extraArgs.skip-nodes-with-system-pods=false

Manual Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      priorityClassName: system-cluster-critical
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        imagePullPolicy: Always
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        resources:
          limits:
            cpu: 100m
            memory: 600Mi
          requests:
            cpu: 100m
            memory: 600Mi
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs/ca-bundle.crt

Auto Scaling Group Tags

Make sure the ASG carries the required tags:

# Required tags
aws autoscaling create-or-update-tags --tags \
    ResourceId=my-asg-name \
    ResourceType=auto-scaling-group \
    Key=k8s.io/cluster-autoscaler/enabled \
    Value=true \
    PropagateAtLaunch=true

aws autoscaling create-or-update-tags --tags \
    ResourceId=my-asg-name \
    ResourceType=auto-scaling-group \
    Key=k8s.io/cluster-autoscaler/my-cluster \
    Value=owned \
    PropagateAtLaunch=true
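
Before relying on auto-discovery, it is worth confirming that both tags actually landed on the ASG (the ASG name is a placeholder):

# Verify the cluster-autoscaler discovery tags on the ASG
aws autoscaling describe-tags \
    --filters "Name=auto-scaling-group,Values=my-asg-name" \
    --query 'Tags[?contains(Key, `cluster-autoscaler`)].[Key,Value]' \
    --output table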

Node Group Strategy

Multiple Node Group Architecture

# eksctl ClusterConfig defining multiple managed Node Groups
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: ap-northeast-1

managedNodeGroups:
  # General-purpose workloads
  - name: general-workers
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 20
    desiredCapacity: 3
    volumeSize: 100
    labels:
      workload-type: general
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
    iam:
      withAddonPolicies:
        autoScaler: true

  # Memory-intensive workloads
  - name: memory-optimized
    instanceType: r5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 0
    labels:
      workload-type: memory-intensive
    taints:
      - key: workload-type
        value: memory-intensive
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

  # GPU workloads
  - name: gpu-workers
    instanceType: p3.2xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    labels:
      workload-type: gpu
      nvidia.com/gpu: "true"
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

  # Spot Instances (cost optimization)
  - name: spot-workers
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]
    spot: true
    minSize: 0
    maxSize: 50
    desiredCapacity: 0
    labels:
      lifecycle: spot
    taints:
      - key: lifecycle
        value: spot
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

Node Affinity Configuration

# Pin Pods to a specific Node Group
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-intensive-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: memory-app
  template:
    metadata:
      labels:
        app: memory-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: workload-type
                operator: In
                values:
                - memory-intensive
      tolerations:
      - key: workload-type
        operator: Equal
        value: memory-intensive
        effect: NoSchedule
      containers:
      - name: app
        image: my-app:latest
        resources:
          requests:
            memory: "8Gi"
            cpu: "2"
          limits:
            memory: "16Gi"
            cpu: "4"

Tuning Scale-Up and Scale-Down Parameters

Key Parameters

# Main Cluster Autoscaler parameters
command:
- ./cluster-autoscaler
# Log verbosity
- --v=4

# Cloud provider
- --cloud-provider=aws

# Node group discovery
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster

# Scale-up parameters
- --scale-up-from-zero=true              # allow scaling up from zero nodes
- --max-node-provision-time=15m          # how long to wait for a new node to become ready
- --max-nodes-total=100                  # maximum total number of nodes in the cluster

# Scale-down parameters
- --scale-down-enabled=true              # enable scale-down
- --scale-down-delay-after-add=10m       # wait this long after a scale-up before scaling down
- --scale-down-delay-after-delete=0s     # wait this long after a node deletion before the next scale-down
- --scale-down-delay-after-failure=3m    # wait this long after a failed scale-down
- --scale-down-unneeded-time=10m         # how long a node must be unneeded before it is removed
- --scale-down-utilization-threshold=0.5 # utilization below this value counts as unneeded

# Balancing
- --balance-similar-node-groups=true     # keep similar Node Groups balanced in size

# Expander selection
- --expander=least-waste                 # pick the Node Group that wastes the least resources

Adjusting Scaling Behavior

# Fast scaling (suited to bursty traffic)
- --max-node-provision-time=5m
- --scale-down-delay-after-add=5m
- --scan-interval=10s

# Conservative scaling (suited to steady workloads)
- --max-node-provision-time=15m
- --scale-down-delay-after-add=20m
- --scan-interval=30s
- --scale-down-unneeded-time=20m

Scale-Down Protection

# Annotate a Pod so that its node is not scaled down
apiVersion: v1
kind: Pod
metadata:
  name: critical-pod
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: critical-app:latest

# Protect with a PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-pdb
spec:
  minAvailable: 2  # or use maxUnavailable
  selector:
    matchLabels:
      app: critical-app
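
Protection can also be applied at the node level: annotating a node with cluster-autoscaler.kubernetes.io/scale-down-disabled excludes it from scale-down entirely, which is handy during migrations (the node name below is a placeholder):

# Exclude one node from scale-down
kubectl annotate node ip-10-0-1-23.ap-northeast-1.compute.internal \
    cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Remove the protection later
kubectl annotate node ip-10-0-1-23.ap-northeast-1.compute.internal \
    cluster-autoscaler.kubernetes.io/scale-down-disabled-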

Priority Expander

Expander Types

Cluster Autoscaler supports several expander strategies for choosing which Node Group to scale:

| Expander | Description |
| --- | --- |
| random | Picks a Node Group at random |
| most-pods | Picks the group that can schedule the most Pods |
| least-waste | Picks the group that wastes the least resources |
| price | Picks the cheapest group (requires extra configuration) |
| priority | Picks by user-defined priority |
| grpc | Delegates the decision to an external gRPC service |

Priority Expander Configuration

# Create the priority ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .*spot.*           # Spot instances get the highest priority
    50:
      - .*general.*        # general nodes come next
    10:
      - .*on-demand.*      # On-Demand nodes have the lowest priority

Combining Expanders

# Use multiple expanders, evaluated in order
command:
- ./cluster-autoscaler
- --expander=priority,least-waste
# Filter by priority first, then pick the least wasteful of the remaining groups

Mixing Spot and On-Demand

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .*spot-workers.*
    50:
      - .*general-workers.*
    10:
      - .*on-demand-workers.*    
---
# Cluster Autoscaler configuration
command:
- ./cluster-autoscaler
- --expander=priority
- --balance-similar-node-groups=false  # no need to balance Spot and On-Demand groups against each other

Monitoring and Troubleshooting

Prometheus Metrics

Cluster Autoscaler exposes several useful metrics:

# Enable the metrics endpoint
command:
- ./cluster-autoscaler
- --address=:8085

---
# ServiceMonitor configuration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  endpoints:
  - port: metrics
    interval: 30s
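
The ServiceMonitor selects a Service by label and refers to a named port, so it assumes a Service in front of the Cluster Autoscaler Pods; a minimal sketch of that Service (port 8085 matching the --address flag above) might look like:

apiVersion: v1
kind: Service
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  selector:
    app: cluster-autoscaler
  ports:
  - name: metrics        # must match the port name referenced by the ServiceMonitor
    port: 8085
    targetPort: 8085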

Key Metrics to Watch

# Total number of nodes in the cluster
cluster_autoscaler_nodes_count

# Number of unschedulable Pods
cluster_autoscaler_unschedulable_pods_count

# Scale-up / scale-down operation counters
cluster_autoscaler_scaled_up_nodes_total
cluster_autoscaler_scaled_down_nodes_total

# Number of Node Groups
cluster_autoscaler_node_groups_count

# Failed scale-up count
cluster_autoscaler_failed_scale_ups_total

# Whether scale-down is in cooldown
cluster_autoscaler_scale_down_in_cooldown

# Timestamp of the last autoscaler activity
cluster_autoscaler_last_activity

Example Grafana Dashboard Queries

# Scale-up trend
rate(cluster_autoscaler_scaled_up_nodes_total[5m])

# Node count per Node Group
cluster_autoscaler_nodes_count{node_group=~".*"}

# Autoscaler latency (p99 of function duration)
histogram_quantile(0.99,
  rate(cluster_autoscaler_function_duration_seconds_bucket[5m])
)

Log Analysis

# Tail the Cluster Autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler -f

# Filter for scaling-related messages
kubectl logs -n kube-system -l app=cluster-autoscaler | grep -E "Scale|scale"

# Watch node-related messages
kubectl logs -n kube-system -l app=cluster-autoscaler | grep -E "node|Node"

Common Troubleshooting Steps

# 1. Check the Cluster Autoscaler Pod status
kubectl get pods -n kube-system -l app=cluster-autoscaler
kubectl describe pod -n kube-system -l app=cluster-autoscaler

# 2. Inspect the status ConfigMap (includes scaling decisions)
kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml

# 3. Check the Node Group labels on the nodes
kubectl get nodes --show-labels | grep -E "node-group|node.kubernetes.io"

# 4. List Pending Pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# 5. Find out why a Pod cannot be scheduled
kubectl describe pod <pending-pod-name>

# 6. Check the ASG configuration
aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names my-asg-name \
    --query 'AutoScalingGroups[*].[MinSize,MaxSize,DesiredCapacity]'

Common Causes of Scale-Up Failures

# 1. The ASG has reached its maximum size
# Fix: increase the ASG maxSize or --max-nodes-total

# 2. Insufficient EC2 quota
# Fix: request an EC2 service quota increase

# 3. No capacity in the Availability Zone
# Fix: use more instance types or Availability Zones

# 4. Insufficient IAM permissions
# Fix: verify the IRSA configuration

# 5. Incorrect Node Group tags
# Fix: check the k8s.io/cluster-autoscaler tags on the ASG
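
When a scale-up fails, the ASG's own activity history usually spells out which of these causes applies (quota, capacity, launch errors); the ASG name below is a placeholder:

# Recent scaling activities and their status messages
aws autoscaling describe-scaling-activities \
    --auto-scaling-group-name my-asg-name \
    --max-items 5 \
    --query 'Activities[*].[StartTime,StatusCode,StatusMessage]' \
    --output table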

Best Practices and Cost Optimization

Optimizing Resource Requests

# Make sure Pods declare explicit resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
      - name: app
        resources:
          requests:
            cpu: "500m"      # 明確設定 request
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"

Multi-AZ Configuration

# Ensure high availability
managedNodeGroups:
  - name: workers-az-a
    availabilityZones: ["ap-northeast-1a"]
    minSize: 1
    maxSize: 10

  - name: workers-az-c
    availabilityZones: ["ap-northeast-1c"]
    minSize: 1
    maxSize: 10

  - name: workers-az-d
    availabilityZones: ["ap-northeast-1d"]
    minSize: 1
    maxSize: 10

# Combine with Pod Topology Spread Constraints
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: my-app

Spot Instance Strategy

# Mix Spot and On-Demand capacity
managedNodeGroups:
  # On-Demand baseline capacity
  - name: on-demand-base
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 5
    labels:
      lifecycle: on-demand

  # Spot burst capacity
  - name: spot-workers
    instanceTypes:
      - m5.xlarge
      - m5a.xlarge
      - m5n.xlarge
      - m4.xlarge
    spot: true
    minSize: 0
    maxSize: 20
    labels:
      lifecycle: spot
    taints:
      - key: lifecycle
        value: spot
        effect: NoSchedule

# Spot-friendly Pod configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-app
spec:
  template:
    spec:
      tolerations:
      - key: lifecycle
        operator: Equal
        value: spot
        effect: NoSchedule
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: lifecycle
                operator: In
                values:
                - spot
      terminationGracePeriodSeconds: 30
      containers:
      - name: app
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

Cost Optimization Checklist

1. **Right-size Node Groups**
   - Use appropriately sized instance types
   - Avoid a few very large nodes (smaller nodes improve bin-packing efficiency)

2. **Use Spot Instances**
   - Prefer Spot for stateless workloads
   - Configure multiple instance types to improve availability

3. **Tune scale-down parameters**
   - Lower scale-down-unneeded-time where appropriate
   - Adjust the utilization threshold to match the workload profile

4. **Optimize resource requests**
   - Use VPA recommendations to adjust requests (see the example after this list)
   - Avoid over-provisioning that wastes resources

5. **Mixed Node Group strategy**
   - Use On-Demand for baseline capacity
   - Use Spot for elastic capacity
   - Control the ordering with the Priority Expander
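
For item 4, a VPA in "Off" mode still publishes recommendations that you can read and fold back into your manifests; for example, using the app-vpa object from the earlier example:

# Read the current recommendation from the VPA
kubectl get vpa app-vpa \
    -o jsonpath='{.status.recommendation.containerRecommendations}'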

Recommended Production Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --cloud-provider=aws
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster

        # Scale-up settings
        - --scale-up-from-zero=true
        - --max-node-provision-time=10m
        - --max-nodes-total=200
        - --cores-total=0:1000
        - --memory-total=0:5000

        # Scale-down settings
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false

        # Balancing and expander selection
        - --balance-similar-node-groups=true
        - --expander=priority

        # Safety and termination settings
        - --new-pod-scale-up-delay=0s
        - --max-graceful-termination-sec=600

        resources:
          limits:
            cpu: 200m
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 500Mi

Monitoring Alerts

# Prometheus alerting rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-autoscaler-alerts
  namespace: kube-system
spec:
  groups:
  - name: cluster-autoscaler
    rules:
    - alert: ClusterAutoscalerUnschedulablePods
      expr: cluster_autoscaler_unschedulable_pods_count > 0
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "有 Pod 無法排程超過 15 分鐘"

    - alert: ClusterAutoscalerNodeGroupAtMax
      expr: cluster_autoscaler_nodes_count >= cluster_autoscaler_max_nodes_count
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node Group 已達最大節點數"

    - alert: ClusterAutoscalerScaleUpFailure
      expr: increase(cluster_autoscaler_failed_scale_ups_total[1h]) > 5
      labels:
        severity: warning
      annotations:
        summary: "過去一小時擴展失敗超過 5 次"

Summary

Kubernetes Cluster Autoscaler is the key component for elastic cluster scaling. Configured correctly, it can:

  1. Automatically adjust the node count as workloads change
  2. Work in concert with HPA/VPA to form a complete autoscaling solution
  3. Implement cost-optimization strategies via the Priority Expander
  4. Integrate Spot Instances to cut compute costs significantly

In production, it is recommended to:

  • Set appropriate resource requests so scaling decisions are based on accurate data
  • Separate different workload types onto multiple Node Groups
  • Put thorough monitoring and alerting in place
  • Review and tune the scaling parameters regularly

With the configuration examples and best practices above, you can build a Kubernetes cluster that absorbs traffic spikes while keeping costs under control when idle.
