Kubernetes Velero Backup and Restore

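Introduction

In a Kubernetes environment, data backup and disaster recovery are concerns every operations team has to address. Velero (formerly Heptio Ark) is an open-source tool for backing up, restoring, and migrating Kubernetes clusters. This article covers Velero's architecture, installation and configuration, backup strategies, and hands-on examples.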

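1. Velero Overview and Architecture

What Is Velero?

Velero is an open-source project maintained by VMware. Its core capabilities are:

  • Backing up Kubernetes cluster resources and Persistent Volumes
  • Restoring cluster resources to the same cluster or a different one
  • Migrating cluster resources to other clusters
  • Disaster recovery to support business continuity

Architecture Components

Velero's architecture consists of the following components: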

┌─────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                       │
│  ┌─────────────────┐    ┌─────────────────────────────────┐ │
│  │  Velero Server  │    │         Custom Resources        │ │
│  │   (Deployment)  │    │  - Backup                       │ │
│  │                 │◄───│  - Restore                      │ │
│  │  - Controller   │    │  - Schedule                     │ │
│  │  - Plugins      │    │  - BackupStorageLocation        │ │
│  └────────┬────────┘    │  - VolumeSnapshotLocation       │ │
│           │             └─────────────────────────────────┘ │
└───────────┼─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                    Object Storage                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   AWS S3    │  │   MinIO     │  │   Azure Blob/GCS    │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

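Key components:

Component                 Description
Velero Server             Controller running inside the cluster that carries out backup and restore operations
BackupStorageLocation     Defines where backup files are stored (for example, an S3 bucket)
VolumeSnapshotLocation    Defines where PV snapshots are stored
Backup                    Custom Resource representing a backup job
Restore                   Custom Resource representing a restore job
Schedule                  Custom Resource representing a scheduled backup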

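2. Installation and Configuration (AWS S3)

Prerequisites

  • A Kubernetes cluster (1.16+)
  • kubectl configured with access to the cluster
  • An AWS account with appropriate permissions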

Step 1: Create the S3 Bucket

# Set variables
BUCKET_NAME=velero-backup-$(date +%s)
REGION=ap-northeast-1

# Create the S3 bucket
aws s3api create-bucket \
    --bucket "$BUCKET_NAME" \
    --region "$REGION" \
    --create-bucket-configuration LocationConstraint="$REGION"

# Enable versioning (recommended)
aws s3api put-bucket-versioning \
    --bucket "$BUCKET_NAME" \
    --versioning-configuration Status=Enabled
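
The create-bucket call above passes a LocationConstraint, which suits ap-northeast-1. To my knowledge, us-east-1 is the exception: S3 rejects an explicit LocationConstraint there, so the flag has to be omitted. A minimal sketch of a wrapper that covers both cases follows; bucket_config_args is a hypothetical helper name, not part of the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extra create-bucket arguments for a region.
# us-east-1 must not receive a LocationConstraint; other regions need one.
bucket_config_args() {
    local region="$1"
    if [ "$region" != "us-east-1" ]; then
        printf '%s %s\n' "--create-bucket-configuration" "LocationConstraint=${region}"
    fi
}

bucket_config_args ap-northeast-1
# prints --create-bucket-configuration LocationConstraint=ap-northeast-1

# Usage (not run here); the substitution is intentionally unquoted so that
# it splits into two separate arguments:
#   aws s3api create-bucket --bucket "$BUCKET_NAME" --region "$REGION" \
#       $(bucket_config_args "$REGION")
```

For us-east-1 the helper prints nothing, so the command falls back to a plain create-bucket call.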

Step 2: Create the IAM User and Policy

# Create the IAM user
aws iam create-user --user-name velero

# Write the IAM policy document (the unquoted heredoc expands ${BUCKET_NAME})
cat > velero-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET_NAME}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET_NAME}"
            ]
        }
    ]
}
EOF

# Attach the policy to the user
aws iam put-user-policy \
    --user-name velero \
    --policy-name velero \
    --policy-document file://velero-policy.json

# Create an access key; note the key ID and secret in the output
aws iam create-access-key --user-name velero

Step 3: Install the Velero CLI

# Download the Velero CLI (Linux shown here)
VELERO_VERSION=v1.13.0
wget "https://github.com/vmware-tanzu/velero/releases/download/${VELERO_VERSION}/velero-${VELERO_VERSION}-linux-amd64.tar.gz"

# Extract and install
tar -xvf "velero-${VELERO_VERSION}-linux-amd64.tar.gz"
sudo mv "velero-${VELERO_VERSION}-linux-amd64/velero" /usr/local/bin/

# Verify the installation
velero version --client-only

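Step 4: Create the Credentials File

Replace the placeholders with the access key ID and secret returned by create-access-key in Step 2:

cat > credentials-velero <<EOF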
[default]
aws_access_key_id=<YOUR_ACCESS_KEY_ID>
aws_secret_access_key=<YOUR_SECRET_ACCESS_KEY>
EOF
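
Rather than pasting the keys by hand, the file can be generated from the create-access-key output. The sketch below is one way to do that; write_velero_credentials is a hypothetical helper, and a newly created file gets owner-only permissions because it holds a long-lived secret.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an AWS-style credentials file for Velero.
#   $1 = access key ID, $2 = secret access key, $3 = output path (optional)
write_velero_credentials() {
    local key_id="$1" secret="$2" file="${3:-credentials-velero}"
    # The subshell keeps the umask change local. It only affects files that
    # are newly created; an existing file keeps its current permissions.
    ( umask 077
      printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
          "$key_id" "$secret" > "$file" )
}

# Usage (not run here; this creates a NEW access key for the velero user):
#   read -r key_id secret <<KEYS
#   $(aws iam create-access-key --user-name velero \
#       --query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text)
#   KEYS
#   write_velero_credentials "$key_id" "$secret"
```

The --query and --output text options make the AWS CLI print the two values on one whitespace-separated line, which is what read expects here.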

Step 5: Install Velero into Kubernetes

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.9.0 \
    --bucket $BUCKET_NAME \
    --backup-location-config region=$REGION \
    --snapshot-location-config region=$REGION \
    --secret-file ./credentials-velero

# Check the installation status
kubectl get pods -n velero
kubectl get backupstoragelocations -n velero

Installing with Helm (Alternative)

# Add the Helm repository
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update

# Create the values file
cat > velero-values.yaml <<EOF
configuration:
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: ${BUCKET_NAME}
      config:
        region: ${REGION}
  volumeSnapshotLocation:
    - name: default
      provider: aws
      config:
        region: ${REGION}
credentials:
  useSecret: true
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=<YOUR_ACCESS_KEY_ID>
      aws_secret_access_key=<YOUR_SECRET_ACCESS_KEY>
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.9.0
    volumeMounts:
      - mountPath: /target
        name: plugins
EOF

# Install the chart
helm install velero vmware-tanzu/velero \
    --namespace velero \
    --create-namespace \
    -f velero-values.yaml

3. Backup Strategies and Scheduling

Manual Backups

# Back up the entire cluster
velero backup create full-cluster-backup

# Check the backup status
velero backup describe full-cluster-backup

# View the detailed backup logs
velero backup logs full-cluster-backup

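Scheduled Backups

Velero supports scheduled backups defined with cron expressions:

# Daily backup at 02:00 (no --ttl given, so Velero's default of 720h applies)
velero schedule create daily-backup \
    --schedule="0 2 * * *"

# Weekly backup on Sundays at 03:00, retained for 30 days (720h)
velero schedule create weekly-backup \
    --schedule="0 3 * * 0" \
    --ttl 720h

# Backup every 6 hours, retained for 7 days (168h)
velero schedule create hourly-backup \
    --schedule="0 */6 * * *" \
    --ttl 168h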
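
The --ttl values above are durations expressed in hours, which is easy to get wrong when a retention policy is stated in days. As a small sketch, a helper like the following converts days to the hours form used above; days_to_ttl is a hypothetical name, not a Velero command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a retention period in days to an hours-based
# duration string such as 720h, suitable as a --ttl value.
days_to_ttl() {
    local days="$1"
    case "$days" in
        # Reject empty input, non-digits, and leading zeros (e.g. 08).
        ''|*[!0-9]*|0?*)
            echo "days must be a non-negative integer" >&2
            return 1 ;;
    esac
    echo "$(( days * 24 ))h"
}

days_to_ttl 30   # prints 720h
days_to_ttl 7    # prints 168h

# Usage (not run here):
#   velero schedule create weekly-backup \
#       --schedule="0 3 * * 0" --ttl "$(days_to_ttl 30)"
```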

Viewing and Managing Schedules

# List all schedules
velero schedule get

# Show schedule details
velero schedule describe daily-backup

# Pause a schedule
velero schedule pause daily-backup

# Resume a schedule
velero schedule unpause daily-backup

# Delete a schedule
velero schedule delete daily-backup

Defining a Schedule in YAML

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - production
      - staging
    excludedResources:
      - events
      - events.events.k8s.io
    snapshotVolumes: true
    ttl: 720h0m0s
    storageLocation: default
    volumeSnapshotLocations:
      - default

# Apply the manifest above (saved as production-schedule.yaml)
kubectl apply -f production-schedule.yaml

4. Selective Backups (Namespaces, Labels)

Backing Up by Namespace

# Back up a single namespace
velero backup create app-backup \
    --include-namespaces production

# Back up multiple namespaces
velero backup create multi-ns-backup \
    --include-namespaces production,staging,development

# Exclude specific namespaces
velero backup create exclude-system-backup \
    --exclude-namespaces kube-system,kube-public,velero

Backing Up by Label

# Back up resources that carry a specific label
velero backup create labeled-backup \
    --selector app=nginx

# Combine several labels (all of them must match)
velero backup create complex-label-backup \
    --selector "app=myapp,environment=production"

Backing Up by Resource Type

# Back up only Deployments and Services
velero backup create resources-backup \
    --include-resources deployments,services

# Exclude Secrets and ConfigMaps
velero backup create exclude-secrets-backup \
    --exclude-resources secrets,configmaps

# Include cluster-scoped resources in the backup
velero backup create cluster-resources-backup \
    --include-cluster-resources=true

Combining Filters

# Back up Deployments, Services, and ConfigMaps labeled app=web in the production namespace
velero backup create selective-backup \
    --include-namespaces production \
    --selector app=web \
    --include-resources deployments,services,configmaps

Defining a Complex Backup in YAML

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: selective-backup
  namespace: velero
spec:
  includedNamespaces:
    - production
    - staging
  excludedNamespaces:
    - kube-system
  includedResources:
    - deployments
    - services
    - configmaps
    - secrets
    - persistentvolumeclaims
  excludedResources:
    - events
  labelSelector:
    matchLabels:
      app: myapp
  includeClusterResources: true
  snapshotVolumes: true
  ttl: 720h0m0s
  storageLocation: default
  volumeSnapshotLocations:
    - default

5. Restore Operations and Verification

Basic Restore

# Restore from a backup (into the original namespaces)
velero restore create --from-backup full-cluster-backup

# Give the restore an explicit name
velero restore create my-restore --from-backup full-cluster-backup

# Check the restore status
velero restore describe my-restore

# View the restore logs
velero restore logs my-restore

Selective Restore

# Restore only a specific namespace
velero restore create ns-restore \
    --from-backup full-cluster-backup \
    --include-namespaces production

# Restore into a different namespace
velero restore create mapped-restore \
    --from-backup full-cluster-backup \
    --namespace-mappings old-namespace:new-namespace

# Exclude specific resources
velero restore create selective-restore \
    --from-backup full-cluster-backup \
    --exclude-resources secrets

Verifying a Restore

#!/bin/bash
# restore-verify.sh - restore verification script
# Usage: ./restore-verify.sh <backup-name>

BACKUP_NAME=$1
if [ -z "${BACKUP_NAME}" ]; then
    echo "Usage: $0 <backup-name>" >&2
    exit 1
fi
RESTORE_NAME="restore-${BACKUP_NAME}-$(date +%s)"

echo "Starting restore from backup: ${BACKUP_NAME}"

# Run the restore
velero restore create "${RESTORE_NAME}" --from-backup "${BACKUP_NAME}" || exit 1

# Wait for the restore to finish.
# The velero CLI's -o flag only accepts table, json, or yaml, so the phase is
# read from the Restore custom resource with kubectl instead.
echo "Waiting for restore to complete..."
while true; do
    STATUS=$(kubectl get restores.velero.io "${RESTORE_NAME}" -n velero -o jsonpath='{.status.phase}')
    if [ "$STATUS" == "Completed" ]; then
        echo "Restore completed successfully!"
        break
    elif [ "$STATUS" == "Failed" ] || [ "$STATUS" == "PartiallyFailed" ] || [ "$STATUS" == "FailedValidation" ]; then
        echo "Restore failed with status: ${STATUS}"
        velero restore logs "${RESTORE_NAME}"
        exit 1
    fi
    echo "Current status: ${STATUS}"
    sleep 10
done

# Review what was restored
echo "Verifying restored resources..."
velero restore describe "${RESTORE_NAME}"

# List Pods that are neither Running nor Completed
echo "Checking Pod status..."
kubectl get pods --all-namespaces | grep -v Running | grep -v Completed

echo "Restore verification complete!"
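
The polling loop in the script above runs until the restore reaches a terminal phase, so a restore that never leaves InProgress would block forever. One possible variation, sketched under the assumption that the phase can be read by some command, adds a deadline. wait_for_phase and its arguments are hypothetical names; the usage comment assumes Velero is installed in the velero namespace.

```shell
#!/usr/bin/env bash
# Sketch only: poll a command that prints a phase until the phase is terminal
# or a deadline passes. wait_for_phase is a hypothetical helper.
#   $1 = timeout in seconds, $2 = poll interval in seconds,
#   remaining arguments = command that prints the current phase.
# Returns 0 on Completed, 1 on a failed phase, 2 on timeout.
wait_for_phase() {
    local timeout="$1" interval="$2" deadline phase
    shift 2
    deadline=$(( $(date +%s) + timeout ))
    while :; do
        phase="$("$@")"
        case "$phase" in
            Completed) return 0 ;;
            Failed|PartiallyFailed|FailedValidation)
                echo "restore ended in phase: $phase" >&2
                return 1 ;;
        esac
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out after ${timeout}s (last phase: ${phase:-unknown})" >&2
            return 2
        fi
        sleep "$interval"
    done
}

# Usage (not run here): wait up to 30 minutes, polling every 10 seconds.
#   wait_for_phase 1800 10 kubectl get restores.velero.io "$RESTORE_NAME" \
#       -n velero -o jsonpath='{.status.phase}'
```

Passing the phase command as arguments keeps the function independent of kubectl, which also makes it easy to exercise with a stub.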

Handling Restore Conflicts

# Update resources that already exist (the default behavior is to skip them)
velero restore create update-restore \
    --from-backup full-cluster-backup \
    --existing-resource-policy update

# Default: leave existing resources untouched
velero restore create preserve-restore \
    --from-backup full-cluster-backup \
    --existing-resource-policy none

6. Cross-Cluster Migration

Before You Migrate

  1. Make sure Velero is installed on the target cluster
  2. Configure the same BackupStorageLocation
  3. Verify network connectivity

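Creating the Backup on the Source Cluster

# Source cluster: back up the production namespace, including volume snapshots
velero backup create migration-backup \
    --include-namespaces production \
    --snapshot-volumes \
    --wait

# Confirm that the backup completed
velero backup describe migration-backup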

Configuring Access on the Target Cluster

# Target cluster: configure the same BackupStorageLocation
cat > backup-location.yaml <<EOF
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: ${BUCKET_NAME}
    prefix: ""
  config:
    region: ${REGION}
  accessMode: ReadWrite
EOF

kubectl apply -f backup-location.yaml

# List the backups; Velero syncs them from the bucket automatically, which can take a minute or so
velero backup get

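Restoring on the Target Cluster

# Target cluster: run the restore.
# The left side of the mapping must be a namespace contained in the backup
# (production here); target-ns is an example destination name.
velero restore create migration-restore \
    --from-backup migration-backup \
    --namespace-mappings production:target-ns

# Monitor restore progress
watch velero restore get

# Verify the restored resources
kubectl get all -n target-ns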

Migration Best Practices

#!/bin/bash
# migration-checklist.sh - pre-migration checklist

echo "=== Pre-Migration Checklist ==="

# 1. Verify the backup taken on the source cluster
echo "1. Verifying source cluster backup..."
velero backup describe migration-backup --details

# 2. Check connectivity to the target cluster
echo "2. Checking target cluster connectivity..."
kubectl cluster-info

# 3. Check that the required storage classes exist
echo "3. Checking Storage Classes..."
kubectl get storageclasses

# 4. Verify PV snapshots (section titles in the describe output vary by Velero version)
echo "4. Verifying volume snapshots..."
velero backup describe migration-backup --details | grep -i -A 20 "snapshots"

# 5. Check the CRDs the workloads depend on
echo "5. Checking required CRDs..."
kubectl get crds | grep -E "(velero|cert-manager|istio)"

echo "=== Checklist Complete ==="
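
Step 3 of the checklist only lists the target cluster's storage classes; comparing them with what the source workloads request still has to be done by eye. A rough sketch of automating that comparison is shown below. missing_storage_classes is a hypothetical helper, the class names in the demo call are illustrative, and the kubectl context names source and target in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print storage classes that appear in the "needed" list but not in
# the "available" list. Both arguments are newline-separated names.
missing_storage_classes() {
    local needed="$1" available="$2" sc
    printf '%s\n' "$needed" | sort -u | while IFS= read -r sc; do
        # Skip blank entries (e.g. PVCs without an explicit storage class).
        [ -n "$sc" ] || continue
        # -F fixed string, -x whole-line match, -q quiet
        if ! printf '%s\n' "$available" | grep -Fxq -- "$sc"; then
            printf '%s\n' "$sc"
        fi
    done
}

missing_storage_classes "$(printf 'gp3\nio2')" "$(printf 'gp2\ngp3')"
# prints io2

# Usage (not run here):
#   needed="$(kubectl --context source get pvc -n production \
#       -o jsonpath='{range .items[*]}{.spec.storageClassName}{"\n"}{end}')"
#   available="$(kubectl --context target get storageclass \
#       -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')"
#   missing_storage_classes "$needed" "$available"
```

An empty result suggests every storage class requested on the source has a same-named counterpart on the target; it says nothing about whether the two behave alike.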

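7. Backup Hooks

Velero hooks run custom commands during a backup or restore. Typical uses:

  • Consistent database backups (for example, MySQL or PostgreSQL)
  • Handling application state
  • Clearing caches

Pre-Backup Hook

Runs a command before the backup (for example, to freeze database writes). Note that FLUSH TABLES WITH READ LOCK holds its lock only while the client session that issued it stays connected, so the one-shot mysql -e call below releases the lock as soon as it exits. Treat the manifest as an illustration of the annotation syntax rather than a complete consistency mechanism.

apiVersion: v1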
kind: Pod
metadata:
  name: mysql-pod
  annotations:
    pre.hook.backup.velero.io/container: mysql
    pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "mysql -u root -p$MYSQL_ROOT_PASSWORD -e \"FLUSH TABLES WITH READ LOCK;\""]'
    pre.hook.backup.velero.io/timeout: 30s
    pre.hook.backup.velero.io/on-error: Fail
spec:
  containers:
    - name: mysql
      image: mysql:8.0

Post-Backup Hook

Runs a command after the backup (for example, to release the database lock):

apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod
  annotations:
    pre.hook.backup.velero.io/container: mysql
    pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "mysql -u root -p$MYSQL_ROOT_PASSWORD -e \"FLUSH TABLES WITH READ LOCK;\""]'
    post.hook.backup.velero.io/container: mysql
    post.hook.backup.velero.io/command: '["/bin/bash", "-c", "mysql -u root -p$MYSQL_ROOT_PASSWORD -e \"UNLOCK TABLES;\""]'
    post.hook.backup.velero.io/timeout: 30s
spec:
  containers:
    - name: mysql
      image: mysql:8.0
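
The command annotations in the manifests above must hold a JSON array encoded as a string, which is awkward to quote by hand. As an illustration, the hypothetical helper below (hook_command_json is not part of Velero) builds that string from ordinary shell arguments. It escapes only backslashes and double quotes, so arguments containing newlines or other control characters are out of scope.

```shell
#!/usr/bin/env bash
# Hypothetical helper: encode the arguments as a JSON array of strings,
# the format used by the hook "command" annotations.
hook_command_json() {
    local out="[" sep="" arg
    for arg in "$@"; do
        # Escape backslashes first, then double quotes.
        arg="$(printf '%s' "$arg" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
        out="${out}${sep}\"${arg}\""
        sep=", "
    done
    printf '%s]\n' "$out"
}

hook_command_json /bin/bash -c 'echo "done"'
# prints ["/bin/bash", "-c", "echo \"done\""]

# Usage (not run here; assumes a Pod named mysql-pod as in the manifest above):
#   kubectl annotate pod mysql-pod \
#       pre.hook.backup.velero.io/command="$(hook_command_json /bin/bash -c 'echo ready')"
```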

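Restore Hooks

Runs commands during a restore: an init hook adds an init container to the restored Pod, and a post hook executes a command in a container once it is running:

apiVersion: v1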
kind: Pod
metadata:
  name: app-pod
  annotations:
    init.hook.restore.velero.io/container-name: restore-init
    init.hook.restore.velero.io/container-image: busybox:latest
    init.hook.restore.velero.io/command: '["/bin/sh", "-c", "echo Initializing restore..."]'
    post.hook.restore.velero.io/container: app
    post.hook.restore.velero.io/command: '["/bin/bash", "-c", "/scripts/post-restore.sh"]'
    post.hook.restore.velero.io/wait-timeout: 5m
    post.hook.restore.velero.io/exec-timeout: 2m
spec:
  containers:
    - name: app
      image: myapp:latest

PostgreSQL Backup Hook Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgresql
  namespace: database
spec:
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
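      annotations:
        # Before the backup: enter backup mode.
        # pg_start_backup/pg_stop_backup exist only up to PostgreSQL 14
        # (PostgreSQL 15 renamed them and removed exclusive mode), and only
        # exclusive mode carries over between the two separate psql sessions
        # used here, hence exclusive=true and the postgres:14 image.
        pre.hook.backup.velero.io/container: postgresql
        pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "psql -U postgres -c \"SELECT pg_start_backup(''velero-backup'', false, true);\""]'
        pre.hook.backup.velero.io/timeout: 60s
        pre.hook.backup.velero.io/on-error: Fail
        # After the backup: leave backup mode
        post.hook.backup.velero.io/container: postgresql
        post.hook.backup.velero.io/command: '["/bin/bash", "-c", "psql -U postgres -c \"SELECT pg_stop_backup(true);\""]'
        post.hook.backup.velero.io/timeout: 60s
    spec:
      containers:
        - name: postgresql
          image: postgres:14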
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgresql-secret
                  key: password

Defining Hooks in the Backup Spec

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: database-backup
  namespace: velero
spec:
  includedNamespaces:
    - database
  hooks:
    resources:
      - name: postgresql-hook
        includedNamespaces:
          - database
        labelSelector:
          matchLabels:
            app: postgresql
        pre:
          - exec:
              container: postgresql
              command:
                - /bin/bash
                - -c
                # Assumes /backup is a mounted volume that is included in the backup
                - "pg_dump -U postgres mydb > /backup/mydb.sql"
              onError: Fail
              timeout: 5m
        post:
          - exec:
              container: postgresql
              command:
                - /bin/bash
                - -c
                - "echo 'Backup completed successfully'"
              timeout: 30s

8. Monitoring and Troubleshooting

Basic Monitoring Commands

# Check the status of the Velero components
kubectl get pods -n velero
kubectl logs deployment/velero -n velero

# Check the backup storage location status
velero backup-location get

# List backups together with their labels
velero backup get --show-labels

# Show backup details
velero backup describe <backup-name> --details

# Check restore status
velero restore get
velero restore describe <restore-name> --details

Setting Up Prometheus Monitoring

Velero exposes a built-in Prometheus metrics endpoint:

# velero-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: velero
  namespace: velero
  labels:
    app: velero
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: velero
  endpoints:
    - port: http-monitoring
      interval: 30s

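Common Metrics

Metric                            Description
velero_backup_total               Number of backups that currently exist
velero_backup_attempt_total       Total number of backups attempted
velero_backup_success_total       Total number of successful backups
velero_backup_failure_total       Total number of failed backups
velero_backup_duration_seconds    Time taken by backups (histogram)
velero_restore_total              Number of restores that currently exist
velero_restore_success_total      Total number of successful restores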
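
To sanity-check these counters without a dashboard, the metrics endpoint can be read directly. The sketch below sums velero_backup_success_total and velero_backup_attempt_total (which, to my knowledge, Velero exposes as the number of attempted backups) across all label sets and prints a percentage. backup_success_rate is a hypothetical helper, and the sample input is invented purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compute a backup success percentage from Prometheus text-format
# metrics read on stdin. Assumes each sample line ends with its value
# (no trailing timestamp).
backup_success_rate() {
    awk '
        /^velero_backup_success_total[{ ]/ { ok  += $NF }
        /^velero_backup_attempt_total[{ ]/ { all += $NF }
        END {
            if (all == 0) { print "n/a"; exit }
            printf "%.1f\n", ok / all * 100
        }'
}

# Illustrative input only; these numbers are made up for the example.
backup_success_rate <<'EOF'
# TYPE velero_backup_attempt_total counter
velero_backup_attempt_total{schedule="daily-backup"} 30
velero_backup_attempt_total{schedule="weekly-backup"} 10
velero_backup_success_total{schedule="daily-backup"} 29
velero_backup_success_total{schedule="weekly-backup"} 9
EOF
# prints 95.0
```

Against a live cluster, one option is to run `kubectl port-forward -n velero deployment/velero 8085:8085` in one terminal and `curl -s localhost:8085/metrics | backup_success_rate` in another, assuming the metrics server listens on its default port 8085.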

Grafana Dashboard

{
  "dashboard": {
    "title": "Velero Backup Dashboard",
    "panels": [
      {
        "title": "Backup Success Rate",
        "type": "gauge",
        "targets": [
          {
            "expr": "sum(velero_backup_success_total) / sum(velero_backup_attempt_total) * 100"
          }
        ]
      },
      {
        "title": "Backup Duration",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(velero_backup_duration_seconds_sum) / sum(velero_backup_duration_seconds_count)"
          }
        ]
      }
    ]
  }
}

Common Troubleshooting

Issue 1: Backup Stuck in InProgress

# Follow the Velero logs
kubectl logs deployment/velero -n velero -f

# Inspect the Backup resource
kubectl describe backup <backup-name> -n velero

# Restart Velero
kubectl rollout restart deployment/velero -n velero

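Issue 2: Backup Storage Location Is Unavailable

# Check the BackupStorageLocation status
velero backup-location get

# Verify the credentials (the output contains the base64-encoded secret; treat it as sensitive)
kubectl get secret -n velero cloud-credentials -o yaml

# Test the S3 connection
aws s3 ls s3://${BUCKET_NAME}/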

Issue 3: Volume Snapshots Fail

# Check the VolumeSnapshotLocation
velero snapshot-location get

# Check the CSI drivers
kubectl get csidrivers

# Check snapshot status
kubectl get volumesnapshots --all-namespaces
kubectl get volumesnapshotcontents

Issue 4: Pods Fail to Start After a Restore

# Check the restore logs
velero restore logs <restore-name>

# Check the Pod events
kubectl describe pod <pod-name> -n <namespace>

# Check the PVC status
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

Troubleshooting Script

#!/bin/bash
# velero-troubleshoot.sh

echo "=== Velero Troubleshooting Report ==="
echo ""

echo "1. Velero Version:"
velero version
echo ""

echo "2. Velero Pod Status:"
kubectl get pods -n velero
echo ""

echo "3. Backup Storage Locations:"
velero backup-location get
echo ""

echo "4. Volume Snapshot Locations:"
velero snapshot-location get
echo ""

echo "5. Recent Backups (last 5):"
velero backup get | head -6
echo ""

echo "6. Failed Backups:"
velero backup get | grep -E "Failed|PartiallyFailed"
echo ""

echo "7. Recent Restores (last 5):"
velero restore get | head -6
echo ""

echo "8. Velero Logs (last 50 lines):"
kubectl logs deployment/velero -n velero --tail=50
echo ""

echo "9. Velero Resource Usage:"
kubectl top pod -n velero
echo ""

echo "=== End of Report ==="

Collecting Logs

# Collect a full diagnostic bundle (written as a tarball)
velero debug \
    --backup <backup-name> \
    --restore <restore-name> \
    --output ./velero-debug.tar.gz

# List the contents of the bundle
tar -tzf ./velero-debug.tar.gz

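Best Practices Summary

  1. Test restores regularly: run restore drills on a schedule to confirm that the backups are usable
  2. Follow the 3-2-1 backup rule: 3 copies, 2 types of media, 1 copy off-site
  3. Set an appropriate TTL: choose backup retention periods that meet your compliance requirements
  4. Monitor backup status: set up alerts so that failed backups are noticed promptly
  5. Use hooks for consistency: stateful applications such as databases should always use hooks
  6. Encrypt backup data: enable S3 server-side encryption
  7. Enable versioning: turn on S3 bucket versioning to guard against accidental deletion
  8. Document the process: record the backup and restore SOPs

Conclusion

Velero is an essential backup tool for Kubernetes environments, offering complete backup, restore, and migration capabilities. After reading this article, you should be able to:

  • Understand Velero's architecture and how it works
  • Install and configure Velero in an AWS environment
  • Design and implement a backup strategy that fits your requirements
  • Perform selective backups and restores
  • Migrate resources across clusters
  • Use hooks to back up stateful applications
  • Monitor Velero and troubleshoot failures

Before deploying to production, validate the backup and restore workflow thoroughly in a test environment so that services can be recovered smoothly when it really matters.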
