When operating a Kubernetes cluster, dynamically adjusting the number of nodes to match the workload is a key concern. The Cluster Autoscaler automatically scales the cluster's nodes up or down, ensuring applications have enough resources to run while avoiding waste. This article takes a close look at how the Cluster Autoscaler works, how to configure it, and best practices for running it.
How Cluster Autoscaler Works
Core Concepts
The Cluster Autoscaler is a standalone component that watches Pod scheduling status and node resource usage across the cluster. Its main functions are:
- Scale up: when Pods cannot be scheduled because resources are insufficient, it automatically adds nodes
- Scale down: when a node is underutilized and all of its Pods can be moved elsewhere, it removes the node
Scale-Up Trigger Conditions
```text
Pod is stuck in Pending
        ↓
Cluster Autoscaler detects the unschedulable Pod
        ↓
It evaluates which node group can accommodate the Pod
        ↓
It calls the cloud provider API to add a node
        ↓
The new node joins the cluster and becomes Ready
        ↓
The Pod is scheduled onto the new node
```
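To see this flow end to end, you can deploy a workload whose total requests exceed the cluster's current capacity; the surplus replicas sit in Pending until new nodes join. A minimal sketch (the replica count and request sizes are illustrative, chosen to overflow a small cluster):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up-demo
spec:
  replicas: 10                        # enough replicas to exceed current capacity
  selector:
    matchLabels:
      app: scale-up-demo
  template:
    metadata:
      labels:
        app: scale-up-demo
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "2"                  # large request so some Pods go Pending on a full cluster
            memory: "2Gi"
```
Watching `kubectl get pods -w` alongside `kubectl get nodes -w` should show the Pending Pods being scheduled as the new nodes become Ready.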
Scale-Down Trigger Conditions
The Cluster Autoscaler periodically checks whether each node meets the scale-down criteria:
```text
# A node is a scale-down candidate when:
# 1. Its resource utilization is below the threshold (default 50%)
# 2. All Pods on the node can be moved to other nodes
# 3. None of its Pods are un-evictable (e.g. Pods using local storage or pinned by a node selector)
```
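As a rough way to judge whether a node falls under the 50% threshold, compare the requests already placed on it with its allocatable capacity, e.g.:
```bash
# Requested CPU/memory vs. allocatable capacity on one node (replace <node-name>)
kubectl describe node <node-name> | grep -A 8 "Allocated resources"
```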
Pod Types That Block Scale-Down
```text
# The following kinds of Pods prevent a node from being scaled down:
# 1. Pods whose PodDisruptionBudget would be violated by eviction
# 2. Pods that use local storage
# 3. Pods not managed by a controller (not created by a Deployment, ReplicaSet, etc.)
# 4. Pods annotated with "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
# 5. Pods in the kube-system namespace that have no PDB, or whose PDB allows zero disruptions
```
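Point 5 is a common source of surprise: a single kube-system Pod without a PDB can keep an otherwise empty node alive. One mitigation is to give those workloads an explicit PDB. A minimal sketch, assuming the standard `k8s-app: kube-dns` label used by CoreDNS (verify the labels in your own cluster first):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coredns-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1           # allow one CoreDNS replica to be evicted at a time
  selector:
    matchLabels:
      k8s-app: kube-dns       # assumed label; check with `kubectl get pods -n kube-system --show-labels`
```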
Relationship with HPA and VPA
Comparing the Three Autoscalers
| Feature | HPA | VPA | Cluster Autoscaler |
|---|---|---|---|
| What it adjusts | Pod replica count | Pod resource requests | Number of nodes |
| Trigger | CPU/memory/custom metrics | Historical resource usage | Unschedulable Pods |
| Scope | Deployment/ReplicaSet | Pod | Node group |
How They Work Together
```text
Application load increases
        ↓
HPA sees CPU above the target
        ↓
HPA raises the Pod replica count
        ↓
Some Pods stay Pending (not enough resources)
        ↓
Cluster Autoscaler adds nodes
        ↓
The Pending Pods are scheduled onto the new nodes
```
Recommended Configuration
```yaml
# Example HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
```
Integrating VPA with Cluster Autoscaler
```yaml
# Example VPA configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # or "Off" to only produce recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
```
AWS EKS Integration
IAM Permissions
First, create the IAM policy that the Cluster Autoscaler needs:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": ["*"],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/k8s.io/cluster-autoscaler/enabled": "true",
          "aws:ResourceTag/k8s.io/cluster-autoscaler/<cluster-name>": "owned"
        }
      }
    }
  ]
}
```
Creating an IAM Role for the Service Account (IRSA)
```bash
# Create the OIDC provider (if it does not exist yet)
eksctl utils associate-iam-oidc-provider \
  --cluster my-cluster \
  --approve

# Create the IAM role and bind it to a Service Account
eksctl create iamserviceaccount \
  --cluster=my-cluster \
  --namespace=kube-system \
  --name=cluster-autoscaler \
  --attach-policy-arn=arn:aws:iam::ACCOUNT_ID:policy/ClusterAutoscalerPolicy \
  --override-existing-serviceaccounts \
  --approve
```
Deploying Cluster Autoscaler with Helm
```bash
# Add the Helm repository
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Install Cluster Autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-northeast-1 \
  --set rbac.serviceAccount.create=false \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false
```
Manual Deployment YAML
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      priorityClassName: system-cluster-critical
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        imagePullPolicy: Always
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        resources:
          limits:
            cpu: 100m
            memory: 600Mi
          requests:
            cpu: 100m
            memory: 600Mi
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs/ca-bundle.crt
```
Make sure the ASG carries the tags the autoscaler uses for auto-discovery:
```bash
# Required tags (the tag fields are passed as one comma-separated shorthand string)
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-asg-name,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true"

aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-asg-name,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true"
```
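To confirm the tags actually landed on the ASG, a read-only check such as the following helps (the `--query` filter is just one way to narrow the output):
```bash
# List the cluster-autoscaler tags on the ASG
aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=my-asg-name" \
  --query "Tags[?contains(Key, 'cluster-autoscaler')]"
```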
Node Group Design Strategy
Multiple Node Group Architecture
```yaml
# eksctl config: node groups for general, memory-intensive, GPU, and Spot workloads
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: ap-northeast-1

managedNodeGroups:
  # General-purpose workloads
  - name: general-workers
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 20
    desiredCapacity: 3
    volumeSize: 100
    labels:
      workload-type: general
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
    iam:
      withAddonPolicies:
        autoScaler: true

  # Memory-intensive workloads
  - name: memory-optimized
    instanceType: r5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 0
    labels:
      workload-type: memory-intensive
    taints:
      - key: workload-type
        value: memory-intensive
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

  # GPU workloads
  - name: gpu-workers
    instanceType: p3.2xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    labels:
      workload-type: gpu
      nvidia.com/gpu: "true"
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"

  # Spot instances (cost optimization)
  - name: spot-workers
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]
    spot: true
    minSize: 0
    maxSize: 50
    desiredCapacity: 0
    labels:
      lifecycle: spot
    taints:
      - key: lifecycle
        value: spot
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
```
Node Affinity Configuration
```yaml
# Pin a Pod to a specific node group
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-intensive-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: memory-app
  template:
    metadata:
      labels:
        app: memory-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: workload-type
                operator: In
                values:
                - memory-intensive
      tolerations:
      - key: workload-type
        operator: Equal
        value: memory-intensive
        effect: NoSchedule
      containers:
      - name: app
        image: my-app:latest
        resources:
          requests:
            memory: "8Gi"
            cpu: "2"
          limits:
            memory: "16Gi"
            cpu: "4"
```
Tuning Scale-Up and Scale-Down Parameters
Key Parameters
```yaml
# Main Cluster Autoscaler flags
command:
- ./cluster-autoscaler
# Log verbosity
- --v=4
# Cloud provider
- --cloud-provider=aws
# Node group discovery
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
# Scale-up parameters
- --scale-up-from-zero=true                 # allow scaling up from zero nodes
- --max-node-provision-time=15m             # how long to wait for a new node to become ready
- --max-nodes-total=100                     # maximum total number of nodes in the cluster
# Scale-down parameters
- --scale-down-enabled=true                 # enable scale-down
- --scale-down-delay-after-add=10m          # wait this long after a scale-up before scaling down
- --scale-down-delay-after-delete=0s        # wait this long after a node deletion before scaling down
- --scale-down-delay-after-failure=3m       # wait this long after a failed scale-down
- --scale-down-unneeded-time=10m            # how long a node must be unneeded before removal
- --scale-down-utilization-threshold=0.5    # below this utilization a node counts as unneeded
# Balancing
- --balance-similar-node-groups=true        # keep similar node groups at similar sizes
# Expander selection
- --expander=least-waste                    # pick the node group that wastes the least resources
```
Adjusting Scaling Behavior
```yaml
# Fast scaling profile (suits bursty traffic)
- --max-node-provision-time=5m
- --scale-down-delay-after-add=5m
- --scan-interval=10s

# Conservative profile (suits steady workloads)
- --max-node-provision-time=15m
- --scale-down-delay-after-add=20m
- --scan-interval=30s
- --scale-down-unneeded-time=20m
```
Protecting Workloads from Scale-Down
```yaml
# Annotate a Pod so its node will not be scaled down
apiVersion: v1
kind: Pod
metadata:
  name: critical-pod
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: app
    image: critical-app:latest
```
```yaml
# Protect with a PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-pdb
spec:
  minAvailable: 2   # or use maxUnavailable
  selector:
    matchLabels:
      app: critical-app
```
Priority Expander
Expander Types
The Cluster Autoscaler supports several expander strategies for deciding which node group to scale up:
| Expander | Description |
|---|---|
| random | Picks a node group at random |
| most-pods | Picks the group that could schedule the most Pods |
| least-waste | Picks the group that wastes the least CPU and memory |
| price | Picks the cheapest group (requires extra configuration) |
| priority | Picks by user-defined priorities |
| grpc | Delegates the decision to an external gRPC service |
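With the Helm chart shown earlier, the expander can be switched like any other flag via the same `extraArgs` pattern; a sketch:
```bash
# Switch the running release to the priority expander
helm upgrade cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --reuse-values \
  --set extraArgs.expander=priority
```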
Priority Expander Configuration
```yaml
# Priority ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .*spot.*        # Spot instances get the highest priority
    50:
      - .*general.*     # general-purpose nodes next
    10:
      - .*on-demand.*   # On-Demand nodes last
```
Chaining Expanders
```yaml
# Use multiple expanders, evaluated in order
command:
- ./cluster-autoscaler
- --expander=priority,least-waste
# Filter by priority first, then pick the least wasteful group among the candidates
```
Mixing Spot and On-Demand
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    100:
      - .*spot-workers.*
    50:
      - .*general-workers.*
    10:
      - .*on-demand-workers.*
---
# Cluster Autoscaler flags
command:
- ./cluster-autoscaler
- --expander=priority
- --balance-similar-node-groups=false   # do not balance Spot against On-Demand groups
```
Monitoring and Troubleshooting
Prometheus Metrics
The Cluster Autoscaler exposes a number of useful metrics:
```yaml
# Enable the metrics endpoint
command:
- ./cluster-autoscaler
- --address=:8085
---
# ServiceMonitor configuration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  endpoints:
  - port: metrics
    interval: 30s
```
Key Metrics to Watch
```text
# Total number of nodes in the cluster
cluster_autoscaler_nodes_count

# Number of unschedulable Pods
cluster_autoscaler_unschedulable_pods_count

# Scale-up and scale-down operations
cluster_autoscaler_scaled_up_nodes_total
cluster_autoscaler_scaled_down_nodes_total

# Node group status
cluster_autoscaler_node_groups_count

# Failed scale-ups
cluster_autoscaler_failed_scale_ups_total

# Scale-down cooldown state
cluster_autoscaler_scale_down_in_cooldown

# Timestamp of the last activity
cluster_autoscaler_last_activity
```
Example Grafana Dashboard Queries
```promql
# Scale-up rate
rate(cluster_autoscaler_scaled_up_nodes_total[5m])

# Node count per node group
cluster_autoscaler_nodes_count{node_group=~".*"}

# Scaling latency (p99)
histogram_quantile(0.99,
  rate(cluster_autoscaler_function_duration_seconds_bucket[5m])
)
```
Log Analysis
```bash
# Tail the Cluster Autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler -f

# Filter for scaling-related entries
kubectl logs -n kube-system -l app=cluster-autoscaler | grep -E "Scale|scale"

# Watch node state changes
kubectl logs -n kube-system -l app=cluster-autoscaler | grep -E "node|Node"
```
Common Troubleshooting Steps
```bash
# 1. Check the Cluster Autoscaler Pod
kubectl get pods -n kube-system -l app=cluster-autoscaler
kubectl describe pod -n kube-system -l app=cluster-autoscaler

# 2. Inspect the status ConfigMap (includes scaling decisions)
kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml

# 3. Check node group labels
kubectl get nodes --show-labels | grep -E "node-group|node.kubernetes.io"

# 4. List Pending Pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# 5. Find out why a Pod cannot be scheduled
kubectl describe pod <pending-pod-name>

# 6. Check the ASG
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names my-asg-name \
  --query 'AutoScalingGroups[*].[MinSize,MaxSize,DesiredCapacity]'
```
Common Causes of Scale-Up Failures
```text
# 1. The ASG has reached its maximum size
#    Fix: raise the ASG maxSize or --max-nodes-total
# 2. Insufficient EC2 quota
#    Fix: request an EC2 Service Quota increase
# 3. No capacity in the Availability Zone
#    Fix: use multiple instance types or Availability Zones
# 4. Missing IAM permissions
#    Fix: verify the IRSA configuration
# 5. Wrong node group tags
#    Fix: check the k8s.io/cluster-autoscaler tags on the ASG
```
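For cause 2, the region's On-Demand vCPU quotas can be inspected directly from the CLI; a sketch (the `--query` filter is illustrative and may need adjusting to the quota names you care about):
```bash
# List EC2 quotas whose name mentions "On-Demand" in the current region
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --query "Quotas[?contains(QuotaName, 'On-Demand')].[QuotaName,Value]" \
  --output table
```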
Best Practices and Cost Optimization
Optimizing Resource Requests
```yaml
# Make sure Pods declare explicit resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
      - name: app
        resources:
          requests:
            cpu: "500m"       # explicit requests drive scheduling and scaling decisions
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"
```
Multi-AZ Configuration
```yaml
# Spread node groups across zones for high availability
managedNodeGroups:
  - name: workers-az-a
    availabilityZones: ["ap-northeast-1a"]
    minSize: 1
    maxSize: 10
  - name: workers-az-c
    availabilityZones: ["ap-northeast-1c"]
    minSize: 1
    maxSize: 10
  - name: workers-az-d
    availabilityZones: ["ap-northeast-1d"]
    minSize: 1
    maxSize: 10
```
```yaml
# Pair with Pod topology spread constraints
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: my-app
```
Spot Instance Strategy
```yaml
# Mix Spot and On-Demand capacity
managedNodeGroups:
  # On-Demand base capacity
  - name: on-demand-base
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 5
    labels:
      lifecycle: on-demand

  # Elastic Spot capacity
  - name: spot-workers
    instanceTypes:
      - m5.xlarge
      - m5a.xlarge
      - m5n.xlarge
      - m4.xlarge
    spot: true
    minSize: 0
    maxSize: 20
    labels:
      lifecycle: spot
    taints:
      - key: lifecycle
        value: spot
        effect: NoSchedule
```
```yaml
# Spot-friendly Pod configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-app
spec:
  template:
    spec:
      tolerations:
      - key: lifecycle
        operator: Equal
        value: spot
        effect: NoSchedule
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: lifecycle
                operator: In
                values:
                - spot
      terminationGracePeriodSeconds: 30
      containers:
      - name: app
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
```
Cost Optimization Checklist
1. **Right-size node groups**
   - Use appropriately sized instance types
   - Avoid relying on a single very large node type (better bin-packing efficiency)
2. **Use Spot instances**
   - Prefer Spot for stateless workloads
   - Configure several instance types to improve availability
3. **Tune scale-down parameters**
   - Lower scale-down-unneeded-time where it makes sense
   - Adjust the utilization threshold to match your workload profile
4. **Optimize resource requests**
   - Use VPA recommendations to right-size requests (see the sketch after this list)
   - Avoid over-provisioned requests that waste capacity
5. **Mix node group types**
   - Run base capacity on On-Demand
   - Run elastic capacity on Spot
   - Control the ordering with the Priority Expander
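For item 4, a VPA running in `"Off"` mode still publishes recommendations that you can read back; a sketch assuming the `app-vpa` object from the earlier section (in the default namespace):
```bash
# Inspect the recommended requests produced by the VPA
kubectl get vpa app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```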
Recommended Production Configuration
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --cloud-provider=aws
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        # Scale-up settings
        - --scale-up-from-zero=true
        - --max-node-provision-time=10m
        - --max-nodes-total=200
        - --cores-total=0:1000
        - --memory-total=0:5000
        # Scale-down settings
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        # Balancing and node group selection
        - --balance-similar-node-groups=true
        - --expander=priority
        # Safety settings
        - --new-pod-scale-up-delay=0s
        - --max-graceful-termination-sec=600
        resources:
          limits:
            cpu: 200m
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 500Mi
```
Alerting Rules
```yaml
# Prometheus AlertManager rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-autoscaler-alerts
  namespace: kube-system
spec:
  groups:
  - name: cluster-autoscaler
    rules:
    - alert: ClusterAutoscalerUnschedulablePods
      expr: cluster_autoscaler_unschedulable_pods_count > 0
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "Pods have been unschedulable for more than 15 minutes"
    - alert: ClusterAutoscalerNodeGroupAtMax
      expr: cluster_autoscaler_nodes_count >= cluster_autoscaler_max_nodes_count
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "A node group has reached its maximum node count"
    - alert: ClusterAutoscalerScaleUpFailure
      expr: increase(cluster_autoscaler_failed_scale_ups_total[1h]) > 5
      labels:
        severity: warning
      annotations:
        summary: "More than 5 failed scale-ups in the past hour"
```
Summary
The Kubernetes Cluster Autoscaler is the key component for elastic cluster capacity. Properly configured, it can:
- Adjust the number of nodes automatically as workloads change
- Work together with HPA and VPA to form a complete autoscaling stack
- Implement cost-aware node group selection through the Priority Expander
- Integrate Spot instances to cut compute costs substantially
For production environments, we recommend that you:
- Set accurate resource requests so scaling decisions reflect real needs
- Separate different workloads onto multiple node groups
- Put thorough monitoring and alerting in place
- Review and tune the scaling parameters regularly
With the configuration examples and best practices in this article, you can build a Kubernetes cluster that absorbs traffic peaks while keeping costs under control during idle periods.