Docker Compose Health Check Configuration

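In modern containerized applications, health checks are a key mechanism for keeping services running reliably. This article looks at how to implement health checks in Docker and Docker Compose so that you can build a more dependable container environment.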

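Health Check Concepts and Why They Matter

What Is a Health Check?

A health check is a mechanism that periodically verifies whether the application inside a container is working properly. Docker runs the configured check command at a fixed interval and derives the container's health status from the command's exit code.

Why Do You Need Health Checks?

  1. Early detection: catch anomalies before they affect users
  2. Automated recovery: let an orchestration tool restart or replace failed containers
  3. Dependency management: start downstream services only after the services they depend on are actually ready
  4. Load balancer integration: have the load balancer send traffic only to healthy instances
  5. Monitoring and alerting: get real-time visibility into service health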

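Container Health States

Docker defines three health states:

| State | Description |
| --- | --- |
| starting | The container is starting and no health check has passed yet |
| healthy | The most recent health check succeeded |
| unhealthy | The health check has failed the configured number of consecutive times |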
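
The transitions between these three states can be sketched as a small state machine. This is a simplified model written for illustration, not Docker's implementation; the class name `HealthTracker` and its fields are hypothetical, and the rules it encodes are: one success makes the container healthy, failures inside the start period are ignored until the first success, and `retries` consecutive failures make it unhealthy.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Simplified model of how a container's health status evolves."""
    retries: int = 3            # consecutive failures needed for "unhealthy"
    start_period: float = 0.0   # seconds during which failures are ignored
    status: str = "starting"
    failing_streak: int = 0

    def record(self, passed: bool, elapsed: float) -> str:
        """Record one probe result taken `elapsed` seconds after start."""
        if passed:
            # A single success is enough to become (or return to) healthy.
            self.failing_streak = 0
            self.status = "healthy"
            return self.status
        if self.status == "starting" and elapsed < self.start_period:
            # Failures inside the start period do not count toward retries.
            return self.status
        self.failing_streak += 1
        if self.failing_streak >= self.retries:
            self.status = "unhealthy"
        return self.status


tracker = HealthTracker(retries=3, start_period=10)
history = [
    tracker.record(False, elapsed=5),    # ignored: still in the start period
    tracker.record(True, elapsed=35),    # first success
    tracker.record(False, elapsed=65),
    tracker.record(False, elapsed=95),
    tracker.record(False, elapsed=125),  # third consecutive failure
]
print(history)
# ['starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

Note that the container stays healthy through the first two failures; only the third consecutive failure flips the status.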

Dockerfile HEALTHCHECK Instruction

Basic Syntax

In a Dockerfile, use the HEALTHCHECK instruction to define a health check:

HEALTHCHECK [OPTIONS] CMD command

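Available Options

| Option | Default | Description |
| --- | --- | --- |
| --interval | 30s | Time between checks |
| --timeout | 30s | Maximum time a single check may run before it counts as failed |
| --start-period | 0s | Grace period after the container starts; failures during it do not count toward retries |
| --start-interval | 5s | Time between checks during the start period (Docker 25.0+) |
| --retries | 3 | Consecutive failures required before the container is marked unhealthy |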
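
To get a feel for how these options interact, the following sketch estimates how long a failure can go unnoticed. It is a rough upper bound under stated assumptions (the failure starts right after a passing check, every later check runs until its timeout, and the next check starts `interval` seconds after the previous one finishes); the function name is hypothetical.

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int) -> float:
    """Rough upper bound on the time from a failure to the unhealthy status.

    Each of the `retries` failing checks may wait up to `interval` seconds
    before it starts and then run for up to `timeout` seconds.
    """
    return retries * (interval + timeout)


# Docker's defaults: --interval=30s --timeout=30s --retries=3
print(worst_case_detection_seconds(30, 30, 3))   # 180
# Tighter settings, similar to the database examples in this article
print(worst_case_detection_seconds(10, 5, 5))    # 75
```

With the defaults, a hung service may therefore take around three minutes to be flagged, which is one reason to tune interval and timeout per service.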

Practical Examples

Node.js Application

FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY . .

EXPOSE 3000

# Exit 0 only on HTTP 200; a connection error also exits 1
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

CMD ["node", "server.js"]

Python Flask Application

FROM python:3.11-slim

WORKDIR /app
# The slim image does not ship curl, so install it for the health check
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

EXPOSE 5000

HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD curl --fail http://localhost:5000/health || exit 1

CMD ["python", "app.py"]

Using wget as an Alternative

If the container does not have curl, you can use wget instead:

HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1

Disabling the Health Check

If the base image already defines a health check and you want to turn it off:

HEALTHCHECK NONE

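Docker Compose healthcheck Configuration

Basic Structure

In docker-compose.yml, the health check is configured under the service definition:

# Compose V2 treats the top-level version key as obsolete and ignores it
version: "3.9"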

services:
  web:
    image: nginx:alpine
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

test Command Formats

The test field accepts several forms:

# Form 1: plain string (run through the shell, same as CMD-SHELL)
healthcheck:
  test: curl -f http://localhost/health || exit 1

# Form 2: list starting with CMD (executed directly, recommended)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/health"]

# Form 3: list starting with CMD-SHELL (run through the shell)
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost/health || exit 1"]

# Disable the health check
healthcheck:
  test: ["NONE"]
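
The difference between these forms comes down to whether a shell is involved. The sketch below maps each form to the argument vector it stands for. It illustrates the semantics only and is not Docker's code: the function name is hypothetical, and `/bin/sh -c` is assumed as the shell, which is the usual default in Linux containers.

```python
from typing import List, Optional, Union


def healthcheck_argv(test: Union[str, List[str]]) -> Optional[List[str]]:
    """Return the argv a `test` value stands for, or None when disabled."""
    if isinstance(test, str):
        # A plain string behaves like CMD-SHELL.
        return ["/bin/sh", "-c", test]
    kind, *args = test
    if kind == "NONE":
        return None
    if kind == "CMD":
        # Executed directly: no shell, so no `||`, pipes or $VAR expansion.
        return args
    if kind == "CMD-SHELL":
        return ["/bin/sh", "-c", args[0]]
    raise ValueError(f"unsupported test form: {kind!r}")


print(healthcheck_argv("curl -f http://localhost/health || exit 1"))
print(healthcheck_argv(["CMD", "curl", "-f", "http://localhost/health"]))
print(healthcheck_argv(["CMD-SHELL", "curl -f http://localhost/health || exit 1"]))
print(healthcheck_argv(["NONE"]))
```

This is why shell operators such as `|| exit 1` or pipes only work in the string and CMD-SHELL forms.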

Complete Configuration Example

version: "3.9"

services:
  api:
    build: ./api
    ports:
      - "3000:3000"
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    environment:
      - NODE_ENV=production

Health Check Examples by Service Type

Database Services

PostgreSQL

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

MySQL / MariaDB

services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: myapp
      MYSQL_USER: admin
      MYSQL_PASSWORD: secret
    healthcheck:
      # $$ defers expansion to the container's shell; a bare ${...} would be
      # interpolated by Compose from the host environment instead
      test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p\"$$MYSQL_ROOT_PASSWORD\""]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 60s
    volumes:
      - mysql_data:/var/lib/mysql

volumes:
  mysql_data:

MongoDB

services:
  mongodb:
    image: mongo:7
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: secret
    healthcheck:
      test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    volumes:
      - mongo_data:/data/db

volumes:
  mongo_data:

Cache Services

Redis

services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s
    volumes:
      - redis_data:/data

volumes:
  redis_data:

Memcached

services:
  memcached:
    image: memcached:1.6-alpine
    healthcheck:
      test: ["CMD-SHELL", "echo stats | nc localhost 11211 | grep -q 'STAT pid'"]
      interval: 10s
      timeout: 5s
      retries: 3

Message Queues

RabbitMQ

services:
  rabbitmq:
    image: rabbitmq:3-management-alpine
    environment:
      RABBITMQ_DEFAULT_USER: admin
      RABBITMQ_DEFAULT_PASS: secret
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    ports:
      - "5672:5672"
      - "15672:15672"

Apache Kafka

services:
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
    healthcheck:
      test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:9092"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    depends_on:
      # zookeeper must be defined in the same file with its own healthcheck
      zookeeper:
        condition: service_healthy

Web Servers

Nginx

services:
  nginx:
    image: nginx:alpine
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    volumes:
      # Mounted into conf.d because the file holds only a server block
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro

Matching Nginx configuration (adds the /health endpoint):

server {
    listen 80;

    location /health {
        access_log off;
        default_type text/plain;
        return 200 "healthy\n";
    }

    location / {
        # Other settings...
    }
}

Elasticsearch

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -q '\"status\":\"green\"\\|\"status\":\"yellow\"'"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    volumes:
      - es_data:/usr/share/elasticsearch/data

volumes:
  es_data:

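Combining depends_on with condition

Controlling Service Startup Order

The Compose Specification (Docker Compose 1.27+ and Compose V2) lets depends_on carry a condition that controls startup order. The legacy 3.x file format did not support this syntax, so the version key is not what enables it:

version: "3.9"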

services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

  api:
    build: ./api
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://postgres:secret@db:5432/postgres
      REDIS_URL: redis://cache:6379

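Available condition Values

| Condition | Description |
| --- | --- |
| service_started | The dependency has been started (default behavior) |
| service_healthy | The dependency's health check has passed |
| service_completed_successfully | The dependency ran to completion successfully (for one-off tasks) |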
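
The three conditions can be read as predicates over the state of the dependency. The following sketch models that reading; `ServiceState` and `dependency_satisfied` are hypothetical names introduced for illustration, and the logic is a simplification rather than Compose's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceState:
    running: bool = False
    health: Optional[str] = None      # "starting", "healthy", "unhealthy" or None
    exit_code: Optional[int] = None   # set once the container has exited


def dependency_satisfied(condition: str, dep: ServiceState) -> bool:
    """Decide whether a dependent service may start, given one dependency."""
    if condition == "service_started":
        return dep.running
    if condition == "service_healthy":
        return dep.running and dep.health == "healthy"
    if condition == "service_completed_successfully":
        return dep.exit_code == 0
    raise ValueError(f"unknown condition: {condition!r}")


db = ServiceState(running=True, health="starting")
print(dependency_satisfied("service_started", db))   # True
print(dependency_satisfied("service_healthy", db))   # False
```

A database that is running but still initializing satisfies service_started and not service_healthy, which is exactly the gap the health check closes.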

Complex Dependency Example

version: "3.9"

services:
  # Database migration job
  db-migration:
    build: ./migration
    command: npm run migrate
    depends_on:
      db:
        condition: service_healthy
    restart: "no"

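  # Main API service
  # Note: worker waits for api with condition: service_healthy, so the api
  # image must define a HEALTHCHECK (or a healthcheck block must be added here)
  api:
    build: ./api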
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy
      db-migration:
        condition: service_completed_successfully
    ports:
      - "3000:3000"

  # Background worker
  worker:
    build: ./worker
    depends_on:
      api:
        condition: service_healthy
      queue:
        condition: service_healthy

  # Infrastructure services
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

  queue:
    image: rabbitmq:3-alpine
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 30s

Monitoring Health Status

Checking Container Health

# List containers (the STATUS column includes the health state)
docker ps

# Sample output:
# CONTAINER ID   IMAGE          STATUS                   NAMES
# abc123         myapp:latest   Up 5 minutes (healthy)   myapp_api_1
# def456         postgres:16    Up 5 minutes (healthy)   myapp_db_1

Inspecting Detailed Health Information

# Show the detailed health check record of a container
docker inspect --format='{{json .State.Health}}' container_name | jq

# Sample output:
# {
#   "Status": "healthy",
#   "FailingStreak": 0,
#   "Log": [
#     {
#       "Start": "2025-09-08T10:00:00.000000000Z",
#       "End": "2025-09-08T10:00:01.000000000Z",
#       "ExitCode": 0,
#       "Output": "OK"
#     }
#   ]
# }
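
For alerting or dashboards it is often enough to condense this JSON into a few fields. The sketch below parses the sample output shown above; the function name and the choice of summary fields are illustrative assumptions, while the JSON keys are the ones that appear in the sample.

```python
import json

# Sample copied from the docker inspect output shown above
raw = """
{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2025-09-08T10:00:00.000000000Z",
      "End": "2025-09-08T10:00:01.000000000Z",
      "ExitCode": 0,
      "Output": "OK"
    }
  ]
}
"""


def summarize_health(health_json: str) -> dict:
    """Condense `.State.Health` into the fields most useful for alerting."""
    health = json.loads(health_json)
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health["Status"],
        "failing_streak": health["FailingStreak"],
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }


print(summarize_health(raw))
# {'status': 'healthy', 'failing_streak': 0, 'last_exit_code': 0, 'last_output': 'OK'}
```

In a real script the JSON string would come from running the docker inspect command above, for example through `subprocess.run`.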

Using docker compose Commands

# Show service status
docker compose ps

# Follow the logs of a specific service
docker compose logs -f api

# Show health check output
docker inspect $(docker compose ps -q api) --format='{{range .State.Health.Log}}{{.Output}}{{end}}'

Writing a Monitoring Script

#!/bin/bash
# health-monitor.sh - report the health status of all containers

echo "=== Container Health Status ==="
echo ""

docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

echo ""
echo "=== Unhealthy Containers ==="

unhealthy=$(docker ps --filter "health=unhealthy" --format "{{.Names}}")
if [ -z "$unhealthy" ]; then
    echo "No unhealthy containers found."
else
    echo "$unhealthy"

    # Print the health check log of each unhealthy container
    for container in $unhealthy; do
        echo ""
        echo "--- $container ---"
        docker inspect --format='{{range .State.Health.Log}}Exit: {{.ExitCode}} | Output: {{.Output}}{{"\n"}}{{end}}' "$container"
    done
fi

Listening for Docker Events

# Watch health status change events
docker events --filter event=health_status

# Sample output:
# 2025-09-08T10:00:00.000000000Z container health_status: healthy abc123 (name=myapp_api_1)
# 2025-09-08T10:05:00.000000000Z container health_status: unhealthy def456 (name=myapp_worker_1)
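
Event lines in this shape are easy to turn into structured data. The sketch below parses the sample line format shown above with a regular expression; the function name is hypothetical, and the pattern assumes exactly this layout. Real event lines usually carry more attributes inside the parentheses, and `docker events --format '{{json .}}'` is likely a more robust source for production use.

```python
import re

EVENT_RE = re.compile(
    r"^(?P<time>\S+) container health_status: (?P<status>\w+) "
    r"(?P<id>\S+) \((?P<attrs>.*)\)$"
)


def parse_health_event(line: str):
    """Return (time, status, container name), or None for other lines."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    # Attributes look like "key=value, key=value"
    attrs = dict(
        pair.split("=", 1) for pair in match["attrs"].split(", ") if "=" in pair
    )
    return match["time"], match["status"], attrs.get("name")


sample = (
    "2025-09-08T10:05:00.000000000Z container health_status: unhealthy "
    "def456 (name=myapp_worker_1)"
)
print(parse_health_event(sample))
# ('2025-09-08T10:05:00.000000000Z', 'unhealthy', 'myapp_worker_1')
```

A monitoring process could read the output of docker events line by line, feed each line to this function, and alert whenever the status is unhealthy.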

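Automatic Restart and Recovery Strategies

restart Policies

Docker Compose supports several restart policies. They apply when a container exits; outside Swarm mode, Docker does not restart a container merely because it is reported as unhealthy:

services:
  api:
    image: myapp:latest
    restart: always  # or "on-failure", "unless-stopped", "no"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

| Policy | Description |
| --- | --- |
| no | Never restart automatically (default) |
| always | Always restart after exit; a manually stopped container starts again when the Docker daemon restarts |
| on-failure | Restart only when the container exits with a non-zero code |
| unless-stopped | Like always, but a manually stopped container stays stopped after a daemon restart |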
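
The policies can be summarized as a decision taken when a container exits. The function below is a simplified model for illustration (its name and parameters are hypothetical, and it ignores retry limits and daemon restarts). It deliberately has no health parameter: a container that is unhealthy but still running never reaches this decision.

```python
def should_restart(policy: str, exit_code: int, stopped_by_user: bool = False) -> bool:
    """Simplified restart decision for a container that has just exited."""
    if stopped_by_user:
        # A manual stop is respected until the daemon itself restarts.
        return False
    if policy in ("always", "unless-stopped"):
        return True
    if policy == "on-failure":
        return exit_code != 0
    if policy == "no":
        return False
    raise ValueError(f"unknown restart policy: {policy!r}")


print(should_restart("on-failure", exit_code=0))   # False
print(should_restart("on-failure", exit_code=1))   # True
print(should_restart("always", exit_code=0))       # True
```

This gap between "exited" and "unhealthy" is what the tools in the following sections are meant to cover.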

Advanced Strategies with Docker Swarm

In Swarm mode you can use more advanced deployment settings:

version: "3.9"

services:
  api:
    image: myapp:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      rollback_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

Using the autoheal Container

autoheal is a tool that automatically restarts unhealthy containers:

version: "3.9"

services:
  autoheal:
    image: willfarrell/autoheal:latest
    environment:
      AUTOHEAL_CONTAINER_LABEL: all  # watch every container; the autoheal label on api is then optional
      AUTOHEAL_INTERVAL: 5
      AUTOHEAL_START_PERIOD: 60
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: always

  api:
    image: myapp:latest
    labels:
      autoheal: "true"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

Custom Health Check and Restart Script

#!/bin/bash
# auto-restart-unhealthy.sh

COMPOSE_FILE="docker-compose.yml"
MAX_RESTART_ATTEMPTS=3
RESTART_DELAY=30

declare -A restart_counts

while true; do
    # `docker compose ps --filter` only supports status filters, so read the
    # health state of each project container with docker inspect instead
    for id in $(docker compose -f "$COMPOSE_FILE" ps -q); do
        health=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{end}}' "$id")
        [ "$health" = "unhealthy" ] || continue

        service=$(docker inspect --format '{{index .Config.Labels "com.docker.compose.service"}}' "$id")
        current_count=${restart_counts[$service]:-0}

        if [ "$current_count" -lt "$MAX_RESTART_ATTEMPTS" ]; then
            echo "$(date): Restarting unhealthy service: $service (attempt $((current_count + 1)))"
            docker compose -f "$COMPOSE_FILE" restart "$service"
            restart_counts[$service]=$((current_count + 1))
        else
            echo "$(date): Service $service exceeded max restart attempts, alerting..."
            # Add alerting logic here (for example, send a Slack notification)
        fi
    done

    sleep "$RESTART_DELAY"
done

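Troubleshooting and Best Practices

Common Problems

1. The health check keeps failing

# Run the health check command by hand to confirm it works
docker exec -it container_name sh -c "curl -f http://localhost:3000/health"

# Show the five most recent health check log entries
docker inspect container_name --format='{{json .State.Health.Log}}' | jq '.[-5:]'

# Open a shell inside the container to debug
docker exec -it container_name sh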

2. start_period is set incorrectly

If the service takes a long time to start, increase start_period accordingly:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 120s  # allow enough time for startup

3. Network issues cause the check to fail

# The check runs inside the container, so target localhost rather than the service name
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]  # correct
  # test: ["CMD", "curl", "-f", "http://api:3000/health"]      # wrong

4. The tool used by the check command is missing

# Option 1: install curl
FROM node:20-alpine
RUN apk add --no-cache curl

# Option 2: use wget (provided by BusyBox on Alpine)
HEALTHCHECK CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Option 3: use the language runtime itself (global fetch requires Node.js 18+)
HEALTHCHECK CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
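
The same "use the runtime itself" idea applies to Python images, which often lack both curl and wget. The sketch below is one possible probe built only on the standard library; the function name `probe` and the `opener` parameter (there to make the function testable without a network) are assumptions made for this example.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> int:
    """Return 0 when `url` answers with a 2xx status, otherwise 1."""
    try:
        with opener(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx or a malformed URL.
        return 1
```

Assuming the file is saved as healthcheck.py in the image's working directory, a Dockerfile could call it with something like `HEALTHCHECK CMD python -c "import sys, healthcheck; sys.exit(healthcheck.probe('http://localhost:5000/health'))"`. The return values follow Docker's convention of 0 for healthy and 1 for unhealthy.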

Best Practices

1. Design a dedicated health check endpoint

// Node.js Express example
app.get('/health', async (req, res) => {
  try {
    // Check the database connection
    await db.query('SELECT 1');

    // Check the cache connection
    await redis.ping();

    res.status(200).json({
      status: 'healthy',
      timestamp: new Date().toISOString(),
      checks: {
        database: 'ok',
        cache: 'ok'
      }
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message
    });
  }
});

// Lightweight liveness check
app.get('/health/live', (req, res) => {
  res.status(200).send('OK');
});

// Readiness check
app.get('/health/ready', async (req, res) => {
  // Check all dependencies...
});
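
The same pattern carries over to other stacks. The sketch below shows the aggregation step in Python, independent of any web framework: it runs a set of named checks and produces the status code and body that a /health handler could return. All names here are hypothetical, and the two check functions are placeholders rather than real database or cache clients.

```python
from typing import Callable, Dict, Tuple


def run_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[int, dict]:
    """Run every check; return (HTTP status, JSON-ready body).

    A check passes when it returns without raising.
    """
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # report the failure instead of crashing
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "healthy" if healthy else "unhealthy", "checks": results}
    return (200 if healthy else 503), body


def ping_database() -> None:
    """Placeholder: a real check would run something like SELECT 1."""


def ping_cache() -> None:
    """Placeholder that simulates an unreachable cache."""
    raise ConnectionError("cache unreachable")


print(run_checks({"database": ping_database}))
# (200, {'status': 'healthy', 'checks': {'database': 'ok'}})
print(run_checks({"database": ping_database, "cache": ping_cache}))
# (503, {'status': 'unhealthy', 'checks': {'database': 'ok', 'cache': 'error: cache unreachable'}})
```

Reporting each dependency separately makes the health log shown by docker inspect far more useful than a bare failure.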

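2. Choose appropriate timing settings

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s      # not too frequent, to avoid extra load
  timeout: 10s       # allow enough time to respond
  retries: 3         # avoid marking the container unhealthy on a single failure
  start_period: 30s  # adjust to the application's startup time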

3. Separate liveness and readiness checks

services:
  api:
    image: myapp:latest
    healthcheck:
      # Use the lightweight liveness check
      test: ["CMD", "curl", "-f", "http://localhost:3000/health/live"]
      interval: 10s
      timeout: 5s
      retries: 3

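4. Avoid side effects in health checks

A health check should be:

  • Read-only
  • Fast to respond (< 1 second)
  • Idempotent
  • Quiet, without producing large amounts of log output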

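5. Log health check information

services:
  api:
    image: myapp:latest
    healthcheck:
      # Capture the response first so that a curl failure still fails the check;
      # piping curl straight into tee would report tee's exit status instead
      test: ["CMD-SHELL", "out=$$(curl -sf http://localhost:3000/health) || exit 1; echo \"$$out\" | tee /proc/1/fd/1"]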

6. Complete Production Example

version: "3.9"

services:
  # Reverse proxy
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    healthcheck:
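      # nginx -t only validates the configuration; probe an HTTP endpoint to verify that requests are served
      test: ["CMD", "nginx", "-t"]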
      interval: 30s
      timeout: 10s
      retries: 3
    depends_on:
      api:
        condition: service_healthy
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    restart: unless-stopped

  # API service
  api:
    build: ./api
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 15s
      timeout: 10s
      retries: 3
      start_period: 30s
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      NODE_ENV: production
      DATABASE_URL: postgres://admin:secret@db:5432/myapp
      REDIS_URL: redis://redis:6379
    restart: unless-stopped

  # Background worker
  worker:
    build: ./worker
    healthcheck:
      # [n]ode keeps pgrep -f from matching the sh -c wrapper that runs this check
      test: ["CMD-SHELL", "pgrep -f '[n]ode worker.js' || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
    depends_on:
      api:
        condition: service_healthy
    environment:
      NODE_ENV: production
    restart: unless-stopped

  # PostgreSQL
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  # Redis
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    volumes:
      - redis_data:/data
    restart: unless-stopped

  # Automatically restart unhealthy containers
  autoheal:
    image: willfarrell/autoheal:latest
    environment:
      AUTOHEAL_CONTAINER_LABEL: all
      AUTOHEAL_INTERVAL: 5
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: always

volumes:
  postgres_data:
  redis_data:

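Summary

Docker Compose health checks are an important building block for reliable containerized applications. With the right configuration you can:

  1. Make sure services are truly ready: use depends_on with condition: service_healthy
  2. Detect problems quickly: periodic health checks surface anomalies early
  3. Automate recovery: use restart policies or the autoheal tool to handle failures automatically
  4. Improve observability: monitor health status to understand the overall state of the system

Remember that health checks are not a one-time setup; they need ongoing tuning based on how the system actually behaves. Start with simple checks, widen their coverage step by step, and review their effectiveness regularly.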
