Route53 健康檢查概述
Amazon Route53 健康檢查是一項監控服務,可以持續檢測您的端點(如網站、應用程式或其他資源)的可用性與效能。當端點發生故障時,Route53 可以自動將流量路由到健康的備用端點,確保服務的高可用性。
健康檢查的主要功能包括:
- 監控端點的健康狀態
- 與 DNS 故障轉移路由整合
- 發送 CloudWatch 告警通知
- 支援多種協定(HTTP、HTTPS、TCP)
健康檢查類型
Route53 提供三種類型的健康檢查:
1. 端點健康檢查(Endpoint Health Checks)
直接監控指定的 IP 位址或網域名稱,透過 HTTP、HTTPS 或 TCP 協定檢測端點是否正常運作。
2. 計算型健康檢查(Calculated Health Checks)
基於多個子健康檢查的狀態來計算整體健康狀態,可以設定閾值來決定需要多少個子檢查健康才算整體健康。
3. CloudWatch 告警健康檢查
根據 CloudWatch 告警的狀態來判斷健康檢查的結果,適用於監控內部資源或自定義指標。
建立端點健康檢查
使用 AWS CLI 建立一個基本的 HTTP 端點健康檢查:
1
2
3
4
5
6
7
8
9
10
11
12
13
| # 建立 HTTP 端點健康檢查
aws route53 create-health-check \
--caller-reference "my-health-check-$(date +%s)" \
--health-check-config '{
"IPAddress": "203.0.113.10",
"Port": 80,
"Type": "HTTP",
"ResourcePath": "/health",
"FullyQualifiedDomainName": "example.com",
"RequestInterval": 30,
"FailureThreshold": 3,
"EnableSNI": false
}'
|
建立 HTTPS 端點健康檢查:
1
2
3
4
5
6
7
8
9
10
11
12
13
| # 建立 HTTPS 端點健康檢查
aws route53 create-health-check \
--caller-reference "https-health-check-$(date +%s)" \
--health-check-config '{
"IPAddress": "203.0.113.10",
"Port": 443,
"Type": "HTTPS",
"ResourcePath": "/api/health",
"FullyQualifiedDomainName": "api.example.com",
"RequestInterval": 10,
"FailureThreshold": 2,
"EnableSNI": true
}'
|
設定健康檢查字串匹配
您可以設定健康檢查驗證回應內容是否包含特定字串:
1
2
3
4
5
6
7
8
9
10
11
12
13
| # 建立帶有字串匹配的健康檢查
aws route53 create-health-check \
--caller-reference "string-match-check-$(date +%s)" \
--health-check-config '{
"IPAddress": "203.0.113.10",
"Port": 80,
"Type": "HTTP_STR_MATCH",
"ResourcePath": "/status",
"FullyQualifiedDomainName": "example.com",
"SearchString": "OK",
"RequestInterval": 30,
"FailureThreshold": 3
}'
|
建立計算型健康檢查
計算型健康檢查可以聚合多個子健康檢查的結果:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| # 取得現有健康檢查的 ID
aws route53 list-health-checks --query 'HealthChecks[*].Id' --output text
# 建立計算型健康檢查
aws route53 create-health-check \
--caller-reference "calculated-check-$(date +%s)" \
--health-check-config '{
"Type": "CALCULATED",
"ChildHealthChecks": [
"health-check-id-1",
"health-check-id-2",
"health-check-id-3"
],
"HealthThreshold": 2
}'
|
上述設定表示當至少 2 個子健康檢查為健康狀態時,整體才算健康。
故障轉移路由政策
故障轉移路由政策可讓您設定主要和備用資源,當主要資源健康檢查失敗時,自動將流量導向備用資源。
建立主要記錄
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| # 取得託管區域 ID
HOSTED_ZONE_ID=$(aws route53 list-hosted-zones \
--query "HostedZones[?Name=='example.com.'].Id" \
--output text | cut -d'/' -f3)
# 建立主要故障轉移記錄
aws route53 change-resource-record-sets \
--hosted-zone-id $HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "www.example.com",
"Type": "A",
"SetIdentifier": "Primary",
"Failover": "PRIMARY",
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.10"}],
"HealthCheckId": "your-health-check-id"
}
}]
}'
|
建立備用記錄
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| # 建立備用故障轉移記錄
aws route53 change-resource-record-sets \
--hosted-zone-id $HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "www.example.com",
"Type": "A",
"SetIdentifier": "Secondary",
"Failover": "SECONDARY",
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.20"}]
}
}]
}'
|
加權故障轉移
結合加權路由和健康檢查,可以實現更靈活的流量分配:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
| # 建立加權記錄 - 伺服器 A (權重 70%)
aws route53 change-resource-record-sets \
--hosted-zone-id $HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "Server-A",
"Weight": 70,
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.10"}],
"HealthCheckId": "health-check-id-a"
}
}]
}'
# 建立加權記錄 - 伺服器 B (權重 30%)
aws route53 change-resource-record-sets \
--hosted-zone-id $HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "api.example.com",
"Type": "A",
"SetIdentifier": "Server-B",
"Weight": 30,
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.20"}],
"HealthCheckId": "health-check-id-b"
}
}]
}'
|
CloudWatch 告警整合
建立 CloudWatch 告警
1
2
3
4
5
6
7
8
9
10
11
12
13
| # 建立健康檢查失敗的告警
aws cloudwatch put-metric-alarm \
--alarm-name "Route53-HealthCheck-Failed" \
--alarm-description "Route53 健康檢查失敗告警" \
--metric-name "HealthCheckStatus" \
--namespace "AWS/Route53" \
--statistic "Minimum" \
--period 60 \
--threshold 1 \
--comparison-operator "LessThanThreshold" \
--dimensions Name=HealthCheckId,Value=your-health-check-id \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:ap-northeast-1:123456789012:alerts
|
建立基於 CloudWatch 告警的健康檢查
1
2
3
4
5
6
7
8
9
10
11
| # 建立 CloudWatch 告警健康檢查
aws route53 create-health-check \
--caller-reference "cloudwatch-check-$(date +%s)" \
--health-check-config '{
"Type": "CLOUDWATCH_METRIC",
"AlarmIdentifier": {
"Region": "ap-northeast-1",
"Name": "my-custom-alarm"
},
"InsufficientDataHealthStatus": "LastKnownStatus"
}'
|
多區域故障轉移
實現跨區域的高可用架構:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
| # 建立延遲路由記錄 - 東京區域
aws route53 change-resource-record-sets \
--hosted-zone-id $HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "global.example.com",
"Type": "A",
"SetIdentifier": "Tokyo",
"Region": "ap-northeast-1",
"TTL": 60,
"ResourceRecords": [{"Value": "13.112.0.10"}],
"HealthCheckId": "tokyo-health-check-id"
}
}]
}'
# 建立延遲路由記錄 - 新加坡區域
aws route53 change-resource-record-sets \
--hosted-zone-id $HOSTED_ZONE_ID \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "global.example.com",
"Type": "A",
"SetIdentifier": "Singapore",
"Region": "ap-southeast-1",
"TTL": 60,
"ResourceRecords": [{"Value": "52.74.0.10"}],
"HealthCheckId": "singapore-health-check-id"
}
}]
}'
|
最佳實踐
1. 健康檢查設定建議
- 請求間隔:一般使用 30 秒,重要服務可縮短至 10 秒
- 失敗閾值:建議設定 2-3 次,避免因暫時性問題觸發故障轉移
- TTL 設定:故障轉移場景建議使用較短的 TTL(60 秒以內)
2. 監控端點設計
1
2
3
4
5
6
7
8
9
10
| # 健康檢查端點範例(回傳 200 OK)
curl -X GET https://example.com/health
# 回應範例
{
"status": "healthy",
"database": "connected",
"cache": "connected",
"timestamp": "2024-11-24T10:00:00Z"
}
|
3. 成本最佳化
- 使用計算型健康檢查減少端點檢查數量
- 非關鍵服務使用較長的請求間隔
- 善用 CloudWatch 告警健康檢查監控內部資源
4. 安全性考量
- 限制健康檢查端點只允許 Route53 健康檢查 IP 範圍
- 健康檢查端點不應洩漏敏感資訊
- 考慮使用 HTTPS 並啟用 SNI
1
2
3
| # 取得 Route53 健康檢查的 IP 範圍
aws ec2 describe-managed-prefix-lists \
--filters Name=prefix-list-name,Values=com.amazonaws.global.route53-healthchecks
|
參考資料