Certificate Validation Service High Availability Architecture

In modern enterprise environments, PKI (Public Key Infrastructure) plays a critical role, and the stability of certificate validation services directly affects the security operations of the entire organization. This article looks at how to build a highly available certificate validation architecture, covering the two main validation mechanisms: OCSP and CRL.

1. Overview of Certificate Validation Mechanisms

OCSP (Online Certificate Status Protocol)

OCSP is a protocol for querying certificate status in real time, defined in RFC 6960. Compared with a traditional CRL, OCSP provides more timely validation results.

OCSP request flow:

Client                    OCSP Responder                    CA Database
   |                            |                               |
   |----OCSP Request----------->|                               |
   |                            |----Query Certificate Status-->|
   |                            |<---Return Status--------------|
   |<---OCSP Response-----------|                               |
   |                            |                               |

Example OCSP request:

# Query a certificate's OCSP status with OpenSSL
openssl ocsp -issuer issuer.pem -cert server.pem \
    -url http://ocsp.example.com \
    -CAfile ca-chain.pem

OCSP response status types:

| Status  | Description                    |
|---------|--------------------------------|
| good    | Certificate is valid           |
| revoked | Certificate has been revoked   |
| unknown | Certificate status is unknown  |

CRL (Certificate Revocation List)

A CRL is a list of the serial numbers of all revoked certificates, published and signed by the CA at regular intervals.

CRL structure:

CRL Entry Structure:
├── Version
├── Signature Algorithm
├── Issuer
├── This Update
├── Next Update
├── Revoked Certificates
│   ├── Serial Number
│   ├── Revocation Date
│   └── CRL Entry Extensions
└── CRL Extensions

Download and parse a CRL:

# Download the CRL
curl -O http://crl.example.com/ca.crl

# Inspect the CRL contents
openssl crl -in ca.crl -text -noout

# Verify the CRL signature
openssl crl -in ca.crl -CAfile ca-chain.pem -verify
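
To see how a relying party actually consumes the downloaded list, the following sketch (reusing the file names above, plus a hypothetical server.pem leaf certificate) checks a certificate's revocation status against the CRL during chain verification:

# Verify a certificate and check it against the downloaded CRL
# (server.pem is an assumed leaf certificate issued by this CA)
openssl verify -crl_check \
    -CAfile ca-chain.pem \
    -CRLfile ca.crl \
    server.pem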

OCSP vs CRL Comparison

| Feature          | OCSP                           | CRL                        |
|------------------|--------------------------------|----------------------------|
| Timeliness       | Real-time query                | Periodic updates           |
| Bandwidth usage  | Low (single-certificate query) | High (full list download)  |
| Privacy          | May reveal browsing behavior   | No privacy concerns        |
| Offline support  | Not supported                  | Supported (once cached)    |
| Server load      | Higher                         | Lower                      |

2. OCSP Responder High Availability Design

Architecture Design Principles

A highly available OCSP responder architecture should include the following components:

                    ┌─────────────────┐
                    │   Load Balancer │
                    │   (HAProxy/F5)  │
                    └────────┬────────┘
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼──────┐ ┌─────▼──────┐ ┌─────▼──────┐
       │   OCSP      │ │   OCSP     │ │   OCSP     │
       │ Responder 1 │ │ Responder 2│ │ Responder 3│
       └──────┬──────┘ └─────┬──────┘ └─────┬──────┘
              │              │              │
              └──────────────┼──────────────┘
                    ┌────────▼────────┐
                    │  Database/      │
                    │  Backend Store  │
                    └─────────────────┘

Multi-Region Deployment Architecture

                        ┌─────────────────┐
                        │   Global DNS    │
                        │  (Route 53/     │
                        │   CloudFlare)   │
                        └────────┬────────┘
            ┌────────────────────┼────────────────────┐
            │                    │                    │
            ▼                    ▼                    ▼
    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
    │  Region A     │    │  Region B     │    │  Region C     │
    │  Load Balancer│    │  Load Balancer│    │  Load Balancer│
    └───────┬───────┘    └───────┬───────┘    └───────┬───────┘
            │                    │                    │
       ┌────┴────┐          ┌────┴────┐          ┌────┴────┐
       │         │          │         │          │         │
       ▼         ▼          ▼         ▼          ▼         ▼
   ┌──────┐  ┌──────┐   ┌──────┐  ┌──────┐   ┌──────┐  ┌──────┐
   │OCSP-1│  │OCSP-2│   │OCSP-1│  │OCSP-2│   │OCSP-1│  │OCSP-2│
   └──────┘  └──────┘   └──────┘  └──────┘   └──────┘  └──────┘
       │         │          │         │          │         │
       └────┬────┘          └────┬────┘          └────┬────┘
            │                    │                    │
            ▼                    ▼                    ▼
    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
    │  Redis Cache  │◄──►│  Redis Cache  │◄──►│  Redis Cache  │
    │   (Primary)   │    │   (Replica)   │    │   (Replica)   │
    └───────────────┘    └───────────────┘    └───────────────┘

OpenSSL OCSP Responder Configuration

Create the OCSP signing key:

# Create the OCSP responder working directories
mkdir -p /etc/pki/ocsp/{certs,private,db}
chmod 700 /etc/pki/ocsp/private

# Generate the OCSP signing key
openssl genrsa -aes256 -out /etc/pki/ocsp/private/ocsp.key 4096

# Create the OCSP signing certificate request
openssl req -new -key /etc/pki/ocsp/private/ocsp.key \
    -out /etc/pki/ocsp/certs/ocsp.csr \
    -subj "/CN=OCSP Responder/O=Example Corp/C=TW"

# Issue the OCSP signing certificate (with the OCSP Signing extension)
openssl x509 -req -in /etc/pki/ocsp/certs/ocsp.csr \
    -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial \
    -out /etc/pki/ocsp/certs/ocsp.crt \
    -days 365 \
    -extfile <(echo "extendedKeyUsage = OCSPSigning")
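
A quick sanity check (a sketch using the paths above) confirms that the issued certificate actually carries the OCSP Signing extended key usage before it is put into service:

# Confirm the Extended Key Usage of the freshly issued OCSP signing certificate
openssl x509 -in /etc/pki/ocsp/certs/ocsp.crt -noout -text | grep -A1 "Extended Key Usage"
# Expected output includes: OCSP Signing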

OCSP extension settings (openssl.cnf):

[ ocsp_ext ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature
extendedKeyUsage = critical, OCSPSigning
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer

Start the OCSP Responder:

# Run the OpenSSL built-in OCSP responder
openssl ocsp -index /etc/pki/CA/index.txt \
    -port 8080 \
    -rsigner /etc/pki/ocsp/certs/ocsp.crt \
    -rkey /etc/pki/ocsp/private/ocsp.key \
    -CA /etc/pki/CA/certs/ca.crt \
    -text \
    -nrequest 100

Systemd Service Configuration

# /etc/systemd/system/ocsp-responder.service
[Unit]
Description=OCSP Responder Service
After=network.target

[Service]
Type=simple
User=ocsp
Group=ocsp
ExecStart=/usr/bin/openssl ocsp \
    -index /etc/pki/CA/index.txt \
    -port 8080 \
    -rsigner /etc/pki/ocsp/certs/ocsp.crt \
    -rkey /etc/pki/ocsp/private/ocsp.key \
    -CA /etc/pki/CA/certs/ca.crt \
    -nrequest 1000 \
    -multi 4
Restart=always
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
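
After installing the unit file, reload systemd and enable the service (standard systemctl usage):

# Register and start the OCSP responder service
systemctl daemon-reload
systemctl enable --now ocsp-responder
systemctl status ocsp-responder --no-pager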

Building an Enterprise-Grade OCSP Responder with EJBCA

# docker-compose.yml for EJBCA OCSP
version: '3.8'
services:
  ocsp-responder:
    image: keyfactor/ejbca-ce:latest
    environment:
      - DATABASE_JDBC_URL=jdbc:mysql://db:3306/ejbca
      - DATABASE_USER=ejbca
      - DATABASE_PASSWORD=${DB_PASSWORD}
      - TLS_SETUP_ENABLED=true
    ports:
      - "8080:8080"
      - "8443:8443"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
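
Note that the deploy settings (replicas, update_config) in this Compose file are honored by Docker Swarm rather than plain `docker compose up`; a minimal deployment sketch:

# Deploy the stack under Swarm so the replica and rolling-update settings take effect
docker swarm init                               # once per cluster, if not already initialized
docker stack deploy -c docker-compose.yml ocsp
docker service ls                               # expect 3/3 replicas of the ocsp-responder service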

3. CRL Distribution Point Architecture

Distributed CRL Publishing Architecture

                    ┌─────────────────┐
                    │       CA        │
                    │  (Offline/HSM)  │
                    └────────┬────────┘
                             │ CRL Generation
                    ┌─────────────────┐
                    │  CRL Publisher  │
                    └────────┬────────┘
         ┌───────────────────┼───────────────────┐
         │                   │                   │
    ┌────▼────┐         ┌────▼────┐         ┌────▼────┐
    │  CDN    │         │  CDN    │         │  CDN    │
    │ Node 1  │         │ Node 2  │         │ Node 3  │
    │ (Asia)  │         │(Europe) │         │(America)│
    └─────────┘         └─────────┘         └─────────┘

Nginx CRL Distribution Point Configuration

# /etc/nginx/conf.d/crl-distribution.conf
upstream crl_backend {
    least_conn;
    server 10.0.1.10:80 weight=5 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:80 weight=5 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:80 backup;

    keepalive 32;
}

server {
    listen 80;
    server_name crl.example.com;

    location /crl/ {
        proxy_pass http://crl_backend;
        proxy_cache crl_cache;
        proxy_cache_valid 200 1h;
        proxy_cache_use_stale error timeout updating;

        # Content-Type for CRL files
        add_header Content-Type application/pkix-crl;

        # Cache control
        add_header Cache-Control "public, max-age=3600";
        add_header X-Cache-Status $upstream_cache_status;
    }

    location /health {
        return 200 "OK";
        add_header Content-Type text/plain;
    }
}

# Cache zone configuration
proxy_cache_path /var/cache/nginx/crl levels=1:2
    keys_zone=crl_cache:10m
    max_size=100m
    inactive=24h
    use_temp_path=off;
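
A quick client-side check (assuming the hostname above) verifies that the distribution point serves the CRL with the expected headers and that the proxy cache is being used:

# Fetch the published CRL and show the response headers (GET, body discarded)
curl -s -o /dev/null -D - http://crl.example.com/crl/ca.crl | grep -Ei 'HTTP/|content-type|cache-control|x-cache-status'
# Repeating the request should report "X-Cache-Status: HIT" once the cache is warm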

Automated CRL Publishing Script

#!/bin/bash
# /usr/local/bin/generate-crl.sh

set -euo pipefail

# Configuration
CA_DIR="/etc/pki/CA"
CRL_DIR="/var/www/crl"
S3_BUCKET="s3://example-crl-bucket"
LOG_FILE="/var/log/crl-generation.log"
LOCK_FILE="/var/run/crl-generation.lock"

# Logging helper
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

# Ensure only one instance runs at a time
exec 200>"$LOCK_FILE"
flock -n 200 || { log "ERROR: Another instance is running"; exit 1; }

# Start
log "Starting CRL generation..."

# Generate a new CRL (run inside `if` so the failure branch is reachable under `set -e`)
if openssl ca -config "$CA_DIR/openssl.cnf" \
    -gencrl \
    -out "$CRL_DIR/ca.crl.new" \
    -crldays 7 \
    2>> "$LOG_FILE"; then

    # Validate the new CRL before publishing it
    if openssl crl -in "$CRL_DIR/ca.crl.new" -noout -text > /dev/null 2>&1; then
        # Atomically replace the published CRL
        mv "$CRL_DIR/ca.crl.new" "$CRL_DIR/ca.crl"

        # Produce a DER-encoded copy
        openssl crl -in "$CRL_DIR/ca.crl" \
            -outform DER \
            -out "$CRL_DIR/ca.crl.der"

        # Sync to S3
        aws s3 cp "$CRL_DIR/ca.crl" "$S3_BUCKET/ca.crl" \
            --cache-control "max-age=3600" \
            --content-type "application/pkix-crl"

        aws s3 cp "$CRL_DIR/ca.crl.der" "$S3_BUCKET/ca.crl.der" \
            --cache-control "max-age=3600" \
            --content-type "application/pkix-crl"

        # Invalidate the CDN cache
        aws cloudfront create-invalidation \
            --distribution-id E1234567890ABC \
            --paths "/ca.crl" "/ca.crl.der"

        log "CRL generation and distribution completed successfully"
    else
        log "ERROR: CRL validation failed"
        rm -f "$CRL_DIR/ca.crl.new"
        exit 1
    fi
else
    log "ERROR: CRL generation failed"
    exit 1
fi

Cron schedule:

# /etc/cron.d/crl-generation
# Generate a new CRL every 6 hours
0 */6 * * * root /usr/local/bin/generate-crl.sh

# Check CRL validity every hour
0 * * * * root /usr/local/bin/check-crl-validity.sh
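
The validity-check script referenced above is not part of the original listing; a minimal sketch (paths and warning threshold are assumptions) could compare the CRL's Next Update field against the current time:

#!/bin/bash
# /usr/local/bin/check-crl-validity.sh -- warn when the published CRL is close to expiry
set -euo pipefail

CRL_FILE="/var/www/crl/ca.crl"
WARN_SECONDS=$((6 * 3600))   # warn if the CRL expires within 6 hours

# Extract the nextUpdate timestamp from the CRL and convert it to epoch seconds
next_update=$(openssl crl -in "$CRL_FILE" -noout -nextupdate | cut -d= -f2)
next_epoch=$(date -d "$next_update" +%s)
now_epoch=$(date +%s)

if [ $((next_epoch - now_epoch)) -lt "$WARN_SECONDS" ]; then
    echo "WARNING: CRL nextUpdate is ${next_update}; regenerate the CRL" >&2
    exit 1
fi
echo "CRL is valid until ${next_update}"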

Delta CRL Implementation

A delta CRL reduces bandwidth consumption by carrying only the changes made since the last full (base) CRL.

Delta CRL settings (openssl.cnf):

[ ca ]
default_ca = CA_default

[ CA_default ]
# CRL settings
crl_dir = $dir/crl
crlnumber = $dir/crlnumber
crl = $dir/crl/ca.crl
default_crl_days = 7
default_crl_hours = 24

# Delta CRL settings
crl_extensions = crl_ext
delta_crl_hours = 1

[ crl_ext ]
authorityKeyIdentifier = keyid:always
# Delta CRL Indicator
deltaCRLIndicator = critical, deltabase

4. Load Balancing and Failover

HAProxy Load Balancer Configuration

# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    maxconn 50000
    tune.ssl.default-dh-param 2048

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000ms
    timeout client  50000ms
    timeout server  50000ms
    retries 3

# OCSP Responder Frontend
frontend ocsp_frontend
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs/ocsp.pem

    # ACL definitions
    acl is_ocsp_request hdr(Content-Type) -i application/ocsp-request
    acl is_health path /health

    # Health check endpoint
    use_backend health_backend if is_health

    default_backend ocsp_backend

# OCSP Backend Pool
backend ocsp_backend
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ localhost
    http-check expect status 200

    # Primary nodes
    server ocsp1 10.0.1.10:8080 check inter 5s fall 3 rise 2 weight 100
    server ocsp2 10.0.1.11:8080 check inter 5s fall 3 rise 2 weight 100
    server ocsp3 10.0.1.12:8080 check inter 5s fall 3 rise 2 weight 100

    # Backup nodes
    server ocsp-dr1 10.1.1.10:8080 check inter 5s fall 3 rise 2 backup
    server ocsp-dr2 10.1.1.11:8080 check inter 5s fall 3 rise 2 backup

    # Connection retries
    retry-on all-retryable-errors
    retries 3

backend health_backend
    server local 127.0.0.1:8081

# CRL Distribution Frontend
frontend crl_frontend
    bind *:8080
    default_backend crl_backend

backend crl_backend
    balance uri
    hash-type consistent

    server crl1 10.0.2.10:80 check
    server crl2 10.0.2.11:80 check
    server crl3 10.0.2.12:80 check

# Statistics page
listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /stats
    stats auth admin:${HAPROXY_STATS_PASSWORD}
    stats refresh 30s
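
Before rolling the configuration out, it can be syntax-checked and then reloaded without dropping existing connections:

# Validate the configuration, then reload HAProxy gracefully
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl reload haproxy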

Keepalived Failover Configuration

# /etc/keepalived/keepalived.conf
global_defs {
    router_id OCSP_LB_01
    script_user root
    enable_script_security
}

vrrp_script check_haproxy {
    script "/usr/bin/systemctl is-active haproxy"
    interval 2
    weight -20
    fall 3
    rise 2
}

vrrp_instance VI_OCSP {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass ${KEEPALIVED_AUTH_PASS}
    }

    virtual_ipaddress {
        10.0.0.100/24
    }

    track_script {
        check_haproxy
    }

    notify_master "/usr/local/bin/keepalived-notify.sh master"
    notify_backup "/usr/local/bin/keepalived-notify.sh backup"
    notify_fault "/usr/local/bin/keepalived-notify.sh fault"
}
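
The notify script referenced above is not shown in the original; a minimal sketch (the webhook URL is a placeholder) that records each VRRP state transition and pings an alerting endpoint:

#!/bin/bash
# /usr/local/bin/keepalived-notify.sh -- invoked by keepalived with master|backup|fault
STATE="${1:-unknown}"
WEBHOOK="https://alerts.example.com/webhook"   # hypothetical alerting endpoint

# Record the transition in syslog and notify the on-call channel
logger -t keepalived "VRRP state changed to ${STATE} on $(hostname)"
curl -s -X POST "$WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "{\"host\": \"$(hostname)\", \"vrrp_state\": \"${STATE}\"}" || true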

AWS Load Balancer Setup (Terraform)

# ALB resource definition
resource "aws_lb" "ocsp" {
  name               = "ocsp-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.ocsp_alb.id]
  subnets            = var.public_subnet_ids

  enable_deletion_protection = true
  enable_http2              = true

  access_logs {
    bucket  = aws_s3_bucket.alb_logs.bucket
    prefix  = "ocsp-alb"
    enabled = true
  }

  tags = {
    Name        = "ocsp-alb"
    Environment = var.environment
  }
}

# Target Group
resource "aws_lb_target_group" "ocsp" {
  name                 = "ocsp-tg"
  port                 = 8080
  protocol             = "HTTP"
  vpc_id               = var.vpc_id
  target_type          = "instance"
  deregistration_delay = 30

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 10
    path                = "/health"
    protocol            = "HTTP"
    matcher             = "200"
  }

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 86400
    enabled         = false
  }

  tags = {
    Name = "ocsp-tg"
  }
}

# HTTPS Listener
resource "aws_lb_listener" "ocsp_https" {
  load_balancer_arn = aws_lb.ocsp.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ocsp.arn
  }
}

# Auto Scaling Group
resource "aws_autoscaling_group" "ocsp" {
  name                = "ocsp-asg"
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [aws_lb_target_group.ocsp.arn]
  health_check_type   = "ELB"

  min_size         = 2
  max_size         = 10
  desired_capacity = 3

  launch_template {
    id      = aws_launch_template.ocsp.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "ocsp-instance"
    propagate_at_launch = true
  }
}

# Scaling Policy
resource "aws_autoscaling_policy" "ocsp_cpu" {
  name                   = "ocsp-cpu-scaling"
  autoscaling_group_name = aws_autoscaling_group.ocsp.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 70.0
  }
}

Route 53 Health Checks and DNS Failover

# Health checks
resource "aws_route53_health_check" "ocsp_primary" {
  fqdn              = "ocsp-primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = "3"
  request_interval  = "10"

  tags = {
    Name = "ocsp-primary-health-check"
  }
}

resource "aws_route53_health_check" "ocsp_secondary" {
  fqdn              = "ocsp-secondary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = "3"
  request_interval  = "10"

  tags = {
    Name = "ocsp-secondary-health-check"
  }
}

# Primary Record
resource "aws_route53_record" "ocsp_primary" {
  zone_id = var.route53_zone_id
  name    = "ocsp.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.ocsp_primary.dns_name
    zone_id                = aws_lb.ocsp_primary.zone_id
    evaluate_target_health = true
  }

  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.ocsp_primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Secondary Record
resource "aws_route53_record" "ocsp_secondary" {
  zone_id = var.route53_zone_id
  name    = "ocsp.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.ocsp_secondary.dns_name
    zone_id                = aws_lb.ocsp_secondary.zone_id
    evaluate_target_health = true
  }

  set_identifier  = "secondary"
  health_check_id = aws_route53_health_check.ocsp_secondary.id

  failover_routing_policy {
    type = "SECONDARY"
  }
}
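
Once the records are in place, failover behavior can be spot-checked from a client by resolving the service name before and after the primary health check goes down (hostname as above):

# With the primary healthy, this returns the primary ALB's addresses
dig +short ocsp.example.com

# After the primary health check fails, the same query should return the secondary ALB's addresses
dig +short ocsp.example.com @8.8.8.8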

5. Caching Strategy and Performance Optimization

OCSP Stapling Configuration

OCSP stapling lets the web server fetch the OCSP response ahead of time and attach it to the TLS handshake, sparing clients an extra request of their own (a client-side verification sketch follows the Apache example below).

Nginx OCSP Stapling configuration:

# /etc/nginx/conf.d/ssl.conf
server {
    listen 443 ssl http2;
    server_name www.example.com;

    ssl_certificate /etc/nginx/ssl/server.pem;
    ssl_certificate_key /etc/nginx/ssl/server.key;

    # OCSP Stapling
    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/nginx/ssl/ca-chain.pem;

    # OCSP responder URL (optional; overrides the AIA in the certificate)
    # ssl_stapling_responder http://ocsp.example.com;

    # Cached OCSP response file
    ssl_stapling_file /etc/nginx/ssl/ocsp-response.der;

    # DNS resolver
    resolver 8.8.8.8 8.8.4.4 valid=300s;
    resolver_timeout 5s;
}

Apache OCSP Stapling configuration:

<VirtualHost *:443>
    ServerName www.example.com

    SSLEngine on
    SSLCertificateFile /etc/apache2/ssl/www.example.com.crt
    SSLCertificateKeyFile /etc/apache2/ssl/www.example.com.key
    SSLCertificateChainFile /etc/apache2/ssl/ca-chain.crt

    # OCSP Stapling
    SSLUseStapling on
    # SSLStaplingCache must be defined at global scope (see below), not inside <VirtualHost>
    SSLStaplingResponderTimeout 5
    SSLStaplingReturnResponderErrors off
    SSLStaplingStandardCacheTimeout 3600
    SSLStaplingErrorCacheTimeout 600
</VirtualHost>

# Global OCSP stapling cache (server-config scope only)
SSLStaplingCache shmcb:/var/run/apache2/ssl_stapling(32768)
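
Whether stapling is actually working can be confirmed from any client with OpenSSL's s_client (hostname as in the examples above):

# Request the stapled OCSP response during the TLS handshake
echo | openssl s_client -connect www.example.com:443 -status 2>/dev/null | grep -iA10 "OCSP response"
# A working setup prints "OCSP Response Status: successful" and "Cert Status: good"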

Redis Cache Layer

# ocsp_cache.py
import redis
import hashlib
import time
from datetime import timedelta
from typing import Optional
from dataclasses import dataclass

@dataclass
class OCSPCacheConfig:
    host: str = "localhost"
    port: int = 6379
    db: int = 0
    password: Optional[str] = None
    default_ttl: int = 300  # 5 minutes
    max_ttl: int = 3600     # 1 hour

class OCSPCache:
    def __init__(self, config: OCSPCacheConfig):
        self.config = config
        self.redis = redis.Redis(
            host=config.host,
            port=config.port,
            db=config.db,
            password=config.password,
            decode_responses=False
        )

    def _generate_key(self, issuer_key_hash: bytes, issuer_name_hash: bytes,
                      serial_number: int) -> str:
        """產生快取鍵值"""
        key_data = issuer_key_hash + issuer_name_hash + str(serial_number).encode()
        return f"ocsp:{hashlib.sha256(key_data).hexdigest()}"

    def get(self, issuer_key_hash: bytes, issuer_name_hash: bytes,
            serial_number: int) -> Optional[bytes]:
        """從快取取得 OCSP Response"""
        key = self._generate_key(issuer_key_hash, issuer_name_hash, serial_number)

        cached = self.redis.get(key)
        if cached:
            # Update hit statistics
            self.redis.hincrby("ocsp:stats", "cache_hits", 1)
            return cached

        self.redis.hincrby("ocsp:stats", "cache_misses", 1)
        return None

    def set(self, issuer_key_hash: bytes, issuer_name_hash: bytes,
            serial_number: int, response: bytes,
            next_update: Optional[int] = None) -> None:
        """儲存 OCSP Response 至快取"""
        key = self._generate_key(issuer_key_hash, issuer_name_hash, serial_number)

        # Work out the TTL
        if next_update:
            ttl = min(next_update - int(time.time()), self.config.max_ttl)
            ttl = max(ttl, 60)  # cache for at least 60 seconds
        else:
            ttl = self.config.default_ttl

        self.redis.setex(key, ttl, response)

    def invalidate(self, issuer_key_hash: bytes, issuer_name_hash: bytes,
                   serial_number: int) -> None:
        """使快取失效"""
        key = self._generate_key(issuer_key_hash, issuer_name_hash, serial_number)
        self.redis.delete(key)

    def get_stats(self) -> dict:
        """取得快取統計資料"""
        stats = self.redis.hgetall("ocsp:stats")
        hits = int(stats.get(b"cache_hits", 0))
        misses = int(stats.get(b"cache_misses", 0))
        total = hits + misses

        return {
            "hits": hits,
            "misses": misses,
            "hit_rate": hits / total if total > 0 else 0,
            "total_requests": total
        }
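
The hit/miss counters kept by this class live in a single Redis hash, so they can also be inspected directly from the shell:

# Inspect the cache statistics maintained by OCSPCache (key names as defined in ocsp_cache.py)
redis-cli hgetall ocsp:stats
# Count cached OCSP responses (SCAN avoids blocking Redis the way KEYS would)
redis-cli --scan --pattern 'ocsp:*' | grep -cv '^ocsp:stats$'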

Varnish Cache Configuration

# /etc/varnish/default.vcl
vcl 4.1;

backend ocsp_backend {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/health";
        .timeout = 2s;
        .interval = 5s;
        .window = 5;
        .threshold = 3;
    }
}

sub vcl_recv {
    # Handle OCSP requests
    if (req.url ~ "^/ocsp") {
        # Cache keyed on the POST body
        if (req.method == "POST") {
            return (hash);
        }
        return (pass);
    }
}

sub vcl_hash {
    # Use the request body as part of the cache key for POST requests
    if (req.method == "POST") {
        hash_data(req.body);
    }
    hash_data(req.url);
    return (lookup);
}

sub vcl_backend_response {
    # Cache lifetime for OCSP responses
    if (bereq.url ~ "^/ocsp") {
        set beresp.ttl = 1h;
        set beresp.grace = 6h;

        # Do not cache error responses
        if (beresp.status != 200) {
            set beresp.ttl = 0s;
            set beresp.uncacheable = true;
        }
    }
}

sub vcl_deliver {
    # Debug headers
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
        set resp.http.X-Cache-Hits = obj.hits;
    } else {
        set resp.http.X-Cache = "MISS";
    }
}

System-Level Performance Tuning

# /etc/sysctl.d/99-ocsp-tuning.conf

# Network performance
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# TCP connection tuning
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15

# Connection tracking
net.netfilter.nf_conntrack_max = 1000000
net.netfilter.nf_conntrack_tcp_timeout_established = 600

# Memory tuning
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Raise the file descriptor limit
fs.file-max = 2097152
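
The settings take effect after reloading sysctl, and individual values can be spot-checked afterwards:

# Load all drop-in files under /etc/sysctl.d, then verify one of the values
sysctl --system
sysctl net.core.somaxconn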

6. Monitoring and Alerting

Prometheus Monitoring Configuration

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

rule_files:
  - "ocsp_rules.yml"
  - "crl_rules.yml"

scrape_configs:
  - job_name: 'ocsp-responder'
    static_configs:
      - targets:
        - 'ocsp1:8080'
        - 'ocsp2:8080'
        - 'ocsp3:8080'
    metrics_path: /metrics

  - job_name: 'haproxy'
    static_configs:
      - targets: ['haproxy:9000']

  - job_name: 'crl-distribution'
    static_configs:
      - targets:
        - 'crl1:80'
        - 'crl2:80'
        - 'crl3:80'
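
Prometheus ships with promtool, which can validate both this configuration and the rule files it references before a reload:

# Validate the Prometheus configuration and the alerting rule files
promtool check config prometheus.yml
promtool check rules ocsp_rules.yml crl_rules.yml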

OCSP Exporter Implementation (Python)

#!/usr/bin/env python3
# ocsp_exporter.py

from prometheus_client import start_http_server, Gauge, Counter, Histogram
import time
import subprocess
from datetime import datetime

# Metric definitions
OCSP_REQUEST_TOTAL = Counter(
    'ocsp_requests_total',
    'Total number of OCSP requests',
    ['status', 'responder']
)

OCSP_RESPONSE_TIME = Histogram(
    'ocsp_response_time_seconds',
    'OCSP response time in seconds',
    ['responder'],
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)

OCSP_RESPONDER_UP = Gauge(
    'ocsp_responder_up',
    'OCSP responder availability',
    ['responder']
)

CRL_LAST_UPDATE = Gauge(
    'crl_last_update_timestamp',
    'CRL last update timestamp',
    ['crl_url']
)

CRL_NEXT_UPDATE = Gauge(
    'crl_next_update_timestamp',
    'CRL next update timestamp',
    ['crl_url']
)

CRL_ENTRIES_COUNT = Gauge(
    'crl_entries_total',
    'Total number of entries in CRL',
    ['crl_url']
)

class OCSPMonitor:
    def __init__(self, responders: list, crls: list, check_interval: int = 60):
        self.responders = responders
        self.crls = crls
        self.check_interval = check_interval

    def check_ocsp_responder(self, responder_url: str) -> dict:
        """檢查 OCSP Responder 可用性"""
        try:
            start_time = time.time()

            # Send an OCSP request using the OpenSSL CLI
            result = subprocess.run([
                'openssl', 'ocsp',
                '-issuer', '/etc/pki/CA/certs/ca.crt',
                '-cert', '/etc/pki/test/test.crt',
                '-url', responder_url,
                '-resp_text'
            ], capture_output=True, timeout=10)

            response_time = time.time() - start_time

            if result.returncode == 0:
                OCSP_RESPONDER_UP.labels(responder=responder_url).set(1)
                OCSP_REQUEST_TOTAL.labels(
                    status='success',
                    responder=responder_url
                ).inc()
                OCSP_RESPONSE_TIME.labels(
                    responder=responder_url
                ).observe(response_time)
                return {'status': 'up', 'response_time': response_time}
            else:
                OCSP_RESPONDER_UP.labels(responder=responder_url).set(0)
                OCSP_REQUEST_TOTAL.labels(
                    status='error',
                    responder=responder_url
                ).inc()
                return {'status': 'down', 'error': result.stderr.decode()}

        except subprocess.TimeoutExpired:
            OCSP_RESPONDER_UP.labels(responder=responder_url).set(0)
            OCSP_REQUEST_TOTAL.labels(
                status='timeout',
                responder=responder_url
            ).inc()
            return {'status': 'timeout'}
        except Exception as e:
            OCSP_RESPONDER_UP.labels(responder=responder_url).set(0)
            return {'status': 'error', 'error': str(e)}

    def run(self):
        """執行監控迴圈"""
        while True:
            for responder in self.responders:
                self.check_ocsp_responder(responder)

            time.sleep(self.check_interval)

if __name__ == '__main__':
    # Start the Prometheus metrics endpoint
    start_http_server(9100)

    # Configure the monitoring targets
    monitor = OCSPMonitor(
        responders=[
            'http://ocsp.example.com',
            'http://ocsp-backup.example.com'
        ],
        crls=[
            'http://crl.example.com/ca.crl'
        ]
    )

    monitor.run()

Alert Rules

# ocsp_rules.yml
groups:
  - name: ocsp_alerts
    rules:
      - alert: OCSPResponderDown
        expr: ocsp_responder_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "OCSP Responder {{ $labels.responder }} is down"
          description: "OCSP Responder has been unreachable for more than 1 minute."

      - alert: OCSPResponseLatencyHigh
        expr: histogram_quantile(0.95, rate(ocsp_response_time_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "OCSP response latency is high"
          description: "95th percentile response time is above 500ms"

      - alert: OCSPCacheHitRateLow
        expr: rate(ocsp_cache_hits_total[5m]) / (rate(ocsp_cache_hits_total[5m]) + rate(ocsp_cache_misses_total[5m])) < 0.7
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "OCSP cache hit rate is below 70%"

      - alert: CRLUpdateOverdue
        expr: time() - crl_last_update_timestamp > 86400
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "CRL has not been updated in over 24 hours"
          description: "The CRL file may be stale and should be refreshed"

      - alert: CRLExpirationSoon
        expr: (crl_next_update_timestamp - time()) < 7200
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "CRL will expire in less than 2 hours"

Grafana Dashboard JSON

{
  "dashboard": {
    "title": "Certificate Validation Services",
    "panels": [
      {
        "title": "OCSP Responder Availability",
        "type": "stat",
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
        "targets": [
          {
            "expr": "ocsp_responder_up",
            "legendFormat": "{{responder}}"
          }
        ],
        "options": {
          "colorMode": "background",
          "graphMode": "none"
        },
        "fieldConfig": {
          "defaults": {
            "mappings": [
              {"type": "value", "options": {"0": {"text": "DOWN", "color": "red"}}},
              {"type": "value", "options": {"1": {"text": "UP", "color": "green"}}}
            ]
          }
        }
      },
      {
        "title": "OCSP Response Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(ocsp_requests_total[5m])",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "title": "OCSP Response Latency",
        "type": "heatmap",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(ocsp_response_duration_seconds_bucket[5m]))"
          }
        ]
      },
      {
        "title": "CRL Download Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(crl_downloads_total[5m])"
          }
        ]
      },
      {
        "title": "Active Backend Servers",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(haproxy_backend_active_servers{backend='ocsp_backend'})"
          }
        ]
      }
    ]
  }
}

Health Check Script

#!/bin/bash
# /usr/local/bin/ocsp-health-check.sh

set -euo pipefail

OCSP_ENDPOINTS=("http://ocsp1:8080" "http://ocsp2:8080" "http://ocsp3:8080")
TEST_CERT="/etc/pki/test/server.pem"
ISSUER_CERT="/etc/pki/ca/issuer.pem"
ALERT_WEBHOOK="https://alerts.example.com/webhook"

check_ocsp_endpoint() {
    local endpoint=$1
    local response

    response=$(openssl ocsp -issuer "$ISSUER_CERT" \
        -cert "$TEST_CERT" \
        -url "$endpoint" \
        -resp_text 2>&1) || return 1

    if echo "$response" | grep -q "good"; then
        return 0
    else
        return 1
    fi
}

main() {
    local failed_count=0
    local results=()

    for endpoint in "${OCSP_ENDPOINTS[@]}"; do
        if check_ocsp_endpoint "$endpoint"; then
            results+=("$endpoint: OK")
        else
            results+=("$endpoint: FAILED")
            failed_count=$((failed_count + 1))
        fi
    done

    # Print results
    printf '%s\n' "${results[@]}"

    # If half or more of the endpoints failed, send an alert
    if [ $failed_count -ge 2 ]; then
        curl -X POST "$ALERT_WEBHOOK" \
            -H "Content-Type: application/json" \
            -d "{\"severity\": \"critical\", \"message\": \"Multiple OCSP endpoints failed: ${results[*]}\"}"
        exit 1
    fi

    exit 0
}

main "$@"

7. Disaster Recovery Planning

Disaster Recovery Architecture

Primary Site (Active)              DR Site (Standby)
┌─────────────────────┐           ┌─────────────────────┐
│   Load Balancer     │           │   Load Balancer     │
│   (Active)          │           │   (Standby)         │
└──────────┬──────────┘           └──────────┬──────────┘
           │                                  │
     ┌─────┴─────┐                     ┌─────┴─────┐
     │           │                     │           │
 ┌───┴───┐ ┌─────┴──┐             ┌───┴───┐ ┌─────┴──┐
 │OCSP 1 │ │ OCSP 2 │             │OCSP 1 │ │ OCSP 2 │
 └───┬───┘ └───┬────┘             └───┬───┘ └───┬────┘
     │         │                      │         │
     └────┬────┘                      └────┬────┘
          │                                │
     ┌────┴────┐                      ┌────┴────┐
     │   DB    │ ═══════════════════> │   DB    │
     │(Primary)│    Replication       │(Replica)│
     └─────────┘                      └─────────┘

RTO/RPO Targets

| Metric                         | Target          | Description                                 |
|--------------------------------|-----------------|---------------------------------------------|
| RTO (Recovery Time Objective)  | 15 minutes      | Time required to restore service            |
| RPO (Recovery Point Objective) | 5 minutes       | Acceptable window of data loss              |
| Sync frequency                 | Every 5 minutes | How often data is replicated to the DR site |
| Failure detection time         | 30 seconds      | Time needed to detect a service failure     |

Backup Strategy

#!/bin/bash
# /usr/local/bin/backup-pki.sh

set -euo pipefail

# Configuration
BACKUP_DIR="/var/backup/pki"
S3_BUCKET="s3://example-pki-backup"
RETENTION_DAYS=90
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_NAME="pki-backup-${DATE}"

# Create the backup directory
mkdir -p "${BACKUP_DIR}/${BACKUP_NAME}"

# Back up the CA database
echo "Backing up CA database..."
cp -r /etc/pki/CA/index.txt* "${BACKUP_DIR}/${BACKUP_NAME}/"
cp -r /etc/pki/CA/serial* "${BACKUP_DIR}/${BACKUP_NAME}/"
cp -r /etc/pki/CA/crlnumber* "${BACKUP_DIR}/${BACKUP_NAME}/"

# Back up issued certificates
echo "Backing up certificates..."
cp -r /etc/pki/CA/newcerts "${BACKUP_DIR}/${BACKUP_NAME}/"
cp -r /etc/pki/CA/certs "${BACKUP_DIR}/${BACKUP_NAME}/"

# Back up the CRL
echo "Backing up CRL..."
cp -r /etc/pki/CA/crl "${BACKUP_DIR}/${BACKUP_NAME}/"

# Back up configuration files
echo "Backing up configuration..."
cp /etc/pki/CA/openssl.cnf "${BACKUP_DIR}/${BACKUP_NAME}/"

# Back up the OCSP responder configuration
echo "Backing up OCSP configuration..."
cp -r /etc/pki/ocsp "${BACKUP_DIR}/${BACKUP_NAME}/"

# Compress the backup
echo "Compressing backup..."
cd "${BACKUP_DIR}"
tar -czf "${BACKUP_NAME}.tar.gz" "${BACKUP_NAME}"
rm -rf "${BACKUP_NAME}"

# Compute the checksum
sha256sum "${BACKUP_NAME}.tar.gz" > "${BACKUP_NAME}.tar.gz.sha256"

# Upload to S3
echo "Uploading to S3..."
aws s3 cp "${BACKUP_NAME}.tar.gz" "${S3_BUCKET}/${BACKUP_NAME}.tar.gz" \
    --storage-class STANDARD_IA
aws s3 cp "${BACKUP_NAME}.tar.gz.sha256" "${S3_BUCKET}/${BACKUP_NAME}.tar.gz.sha256"

# Prune old local backups
echo "Cleaning up old backups..."
find "${BACKUP_DIR}" -name "pki-backup-*.tar.gz*" -mtime +${RETENTION_DAYS} -delete

echo "Backup completed: ${BACKUP_NAME}"

Data Synchronization Script

#!/bin/bash
# /usr/local/bin/dr-sync.sh

set -euo pipefail

PRIMARY_HOST="primary-db.example.com"
DR_HOST="dr-db.example.com"
SYNC_PATHS=(
    "/var/lib/ca/index.txt"
    "/var/lib/ca/crl/"
    "/etc/pki/ocsp/"
)

sync_data() {
    # rsync cannot copy directly between two remote hosts, so pull each path
    # from the primary into a local staging area, then push it to the DR host
    for path in "${SYNC_PATHS[@]}"; do
        mkdir -p "$(dirname "/tmp/dr-stage${path}")"
        rsync -avz --delete \
            -e "ssh -o StrictHostKeyChecking=no" \
            "${PRIMARY_HOST}:${path}" \
            "/tmp/dr-stage${path}"
        rsync -avz --delete \
            -e "ssh -o StrictHostKeyChecking=no" \
            "/tmp/dr-stage${path}" \
            "${DR_HOST}:${path}"
    done
}

verify_sync() {
    local primary_hash
    local dr_hash

    for path in "${SYNC_PATHS[@]}"; do
        primary_hash=$(ssh "$PRIMARY_HOST" "find $path -type f -exec md5sum {} \; | sort | md5sum")
        dr_hash=$(ssh "$DR_HOST" "find $path -type f -exec md5sum {} \; | sort | md5sum")

        if [ "$primary_hash" != "$dr_hash" ]; then
            echo "Sync verification failed for $path"
            return 1
        fi
    done

    echo "Sync verification successful"
    return 0
}

main() {
    echo "Starting DR sync at $(date)"
    sync_data
    verify_sync
    echo "DR sync completed at $(date)"
}

main "$@"
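
To meet the 5-minute sync frequency from the RTO/RPO table, the script can be scheduled via cron (a sketch; the log path is an assumption):

# /etc/cron.d/dr-sync
*/5 * * * * root /usr/local/bin/dr-sync.sh >> /var/log/dr-sync.log 2>&1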

Failover Procedure

# failover-playbook.yml
---
- name: Certificate Validation Service Failover
  hosts: dr_site
  become: yes
  vars:
    primary_vip: 10.0.0.100
    dr_vip: 10.1.0.100

  tasks:
    - name: Verify DR site health
      uri:
        url: "http://{{ dr_vip }}:8080/health"
        method: GET
        status_code: 200
      register: health_check

    - name: Update DNS records
      community.general.nsupdate:
        key_name: "update-key"
        key_secret: "{{ dns_key_secret }}"
        server: "dns.example.com"
        zone: "example.com"
        record: "ocsp"
        type: "A"
        value: "{{ dr_vip }}"
        ttl: 60
      when: health_check.status == 200

    - name: Activate DR load balancer
      command: /usr/local/bin/activate-lb.sh
      when: health_check.status == 200

    - name: Send notification
      community.general.slack:
        token: "{{ slack_token }}"
        channel: "#pki-alerts"
        msg: "Certificate validation service failover to DR site completed"

Restore Script

#!/bin/bash
# /usr/local/bin/restore-pki.sh

set -euo pipefail

# Check arguments
if [ $# -ne 1 ]; then
    echo "Usage: $0 <backup-name>"
    exit 1
fi

BACKUP_NAME=$1
S3_BUCKET="s3://example-pki-backup"
RESTORE_DIR="/tmp/pki-restore"
TARGET_DIR="/etc/pki"

# Create the restore directory
mkdir -p "${RESTORE_DIR}"
cd "${RESTORE_DIR}"

# Download the backup
echo "Downloading backup from S3..."
aws s3 cp "${S3_BUCKET}/${BACKUP_NAME}.tar.gz" .
aws s3 cp "${S3_BUCKET}/${BACKUP_NAME}.tar.gz.sha256" .

# Verify the checksum
echo "Verifying checksum..."
sha256sum -c "${BACKUP_NAME}.tar.gz.sha256"

# Extract the backup
echo "Extracting backup..."
tar -xzf "${BACKUP_NAME}.tar.gz"

# Stop dependent services
echo "Stopping services..."
systemctl stop ocsp-responder || true
systemctl stop nginx || true

# Preserve the current data
echo "Backing up current data..."
if [ -d "${TARGET_DIR}/CA" ]; then
    mv "${TARGET_DIR}/CA" "${TARGET_DIR}/CA.old.$(date +%Y%m%d_%H%M%S)"
fi

# Restore the data
echo "Restoring data..."
mkdir -p "${TARGET_DIR}/CA"
cp -r "${BACKUP_NAME}"/* "${TARGET_DIR}/CA/"

# Fix ownership and permissions
echo "Fixing permissions..."
chown -R root:root "${TARGET_DIR}/CA"
if [ -d "${TARGET_DIR}/CA/private" ]; then
    chmod 700 "${TARGET_DIR}/CA/private"
    chmod 600 "${TARGET_DIR}/CA/private"/*
fi

# Regenerate the CRL
echo "Regenerating CRL..."
openssl ca -config "${TARGET_DIR}/CA/openssl.cnf" \
    -gencrl \
    -out "${TARGET_DIR}/CA/crl/ca.crl"

# Restart services
echo "Starting services..."
systemctl start ocsp-responder
systemctl start nginx

# Verify the services
echo "Verifying services..."
sleep 5
curl -s http://localhost/health || echo "Warning: Health check failed"

# Clean up
echo "Cleaning up..."
rm -rf "${RESTORE_DIR}"

echo "Restore completed successfully"

8. Best Practices and Compliance Considerations

Security Best Practices

1. HSM Integration

# Protect the OCSP signing key with an HSM
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
    --login --pin ${HSM_PIN} \
    --keypairgen --key-type rsa:4096 \
    --id 01 --label "OCSP Signing Key"

# OpenSSL engine configuration
# openssl.cnf
[engine_section]
pkcs11 = pkcs11_section

[pkcs11_section]
engine_id = pkcs11
dynamic_path = /usr/lib/engines/libpkcs11.so
MODULE_PATH = /usr/lib/softhsm/libsofthsm2.so
init = 0
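
Once the key pair has been generated, its presence on the token can be confirmed (same SoftHSM module and PIN variable as above):

# List the objects on the token to confirm the OCSP signing key pair exists
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
    --login --pin ${HSM_PIN} \
    --list-objects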

2. Network Isolation

# Security group definition (Terraform)
resource "aws_security_group" "ocsp_sg" {
  name        = "ocsp-security-group"
  description = "Security group for OCSP Responder"
  vpc_id      = var.vpc_id

  # Allow HTTPS traffic
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow HTTP (used for OCSP; some clients do not support HTTPS)
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Restrict management traffic
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "ocsp-sg"
  }
}

3. Audit Logging

# audit_logger.py
import logging
import json
from datetime import datetime
import boto3

class OCSPAuditLogger:
    def __init__(self, log_group: str):
        self.logs = boto3.client('logs')
        self.log_group = log_group
        self.log_stream = f"ocsp-audit-{datetime.now().strftime('%Y-%m-%d')}"

    def log_request(self, request_data: dict, response_data: dict):
        """記錄 OCSP 請求與回應"""
        audit_entry = {
            'timestamp': datetime.utcnow().isoformat(),
            'event_type': 'OCSP_REQUEST',
            'request': {
                'serial_number': request_data.get('serial_number'),
                'issuer_key_hash': request_data.get('issuer_key_hash'),
                'source_ip': request_data.get('source_ip'),
            },
            'response': {
                'status': response_data.get('status'),
                'this_update': response_data.get('this_update'),
                'next_update': response_data.get('next_update'),
            }
        }

        self.logs.put_log_events(
            logGroupName=self.log_group,
            logStreamName=self.log_stream,
            logEvents=[{
                'timestamp': int(datetime.utcnow().timestamp() * 1000),
                'message': json.dumps(audit_entry)
            }]
        )

    def log_crl_generation(self, crl_data: dict):
        """記錄 CRL 產生事件"""
        audit_entry = {
            'timestamp': datetime.utcnow().isoformat(),
            'event_type': 'CRL_GENERATION',
            'crl': {
                'this_update': crl_data.get('this_update'),
                'next_update': crl_data.get('next_update'),
                'entries_count': crl_data.get('entries_count'),
                'crl_number': crl_data.get('crl_number'),
            }
        }

        self.logs.put_log_events(
            logGroupName=self.log_group,
            logStreamName=self.log_stream,
            logEvents=[{
                'timestamp': int(datetime.utcnow().timestamp() * 1000),
                'message': json.dumps(audit_entry)
            }]
        )

Compliance Requirements

CA/Browser Forum Baseline Requirements

| Requirement              | Description                        | Compliance Measure                |
|--------------------------|------------------------------------|-----------------------------------|
| OCSP response time       | Must respond within 10 seconds     | Caching and load balancing        |
| CRL update frequency     | Updated at least every 7 days      | Automated CRL publishing schedule |
| OCSP signing certificate | Validity period of at most 3 years | Certificate lifecycle management  |
| High availability        | 99.9% availability                 | Multi-site deployment             |

Audit Checklist

# compliance-checklist.yml
compliance_checks:
  - category: "Availability"
    items:
      - check: "Multiple OCSP responders deployed"
        requirement: "CA/B Forum BR 4.10.2"
        status: "compliant"

      - check: "Geographic redundancy"
        requirement: "Internal Policy"
        status: "compliant"

  - category: "Security"
    items:
      - check: "OCSP signing key protected by HSM"
        requirement: "CA/B Forum BR 6.2.7"
        status: "compliant"

      - check: "TLS 1.2+ for all connections"
        requirement: "PCI DSS 4.1"
        status: "compliant"

  - category: "Logging"
    items:
      - check: "All OCSP requests logged"
        requirement: "CA/B Forum BR 5.4.1"
        status: "compliant"

      - check: "Logs retained for 7 years"
        requirement: "CA/B Forum BR 5.4.3"
        status: "compliant"

Performance Benchmarks

Load test script:

#!/bin/bash
# OCSP load test

OCSP_URL="http://ocsp.example.com"
CERT_FILE="/tmp/test.crt"
ISSUER_FILE="/tmp/issuer.crt"
CONCURRENT=100
REQUESTS=10000

# Load test using hey
echo "Starting OCSP load test..."

# Generate an OCSP request
openssl ocsp -issuer "$ISSUER_FILE" -cert "$CERT_FILE" \
    -reqout /tmp/ocsp-request.der

# Run the load test
hey -n $REQUESTS -c $CONCURRENT \
    -m POST \
    -H "Content-Type: application/ocsp-request" \
    -D /tmp/ocsp-request.der \
    "$OCSP_URL"

Performance targets:

| Metric            | Target   | Alert Threshold |
|-------------------|----------|-----------------|
| P50 response time | < 50 ms  | > 100 ms        |
| P95 response time | < 200 ms | > 500 ms        |
| P99 response time | < 500 ms | > 1 s           |
| Error rate        | < 0.1%   | > 1%            |
| Availability      | > 99.95% | < 99.5%         |

Summary

Building a highly available certificate validation service requires attention across several layers:

  1. Architecture: use multi-tier load balancing and geographically distributed deployment to eliminate single points of failure
  2. Caching: use OCSP stapling and multi-level caches to reduce latency
  3. Monitoring and alerting: build a complete observability stack so problems are detected promptly
  4. Disaster recovery: define clear RTO/RPO targets and failover procedures
  5. Security and compliance: follow industry standards and regulatory requirements such as the CA/Browser Forum Baseline Requirements

With the architectures and configurations described in this article, you can build a stable, efficient, and compliant certificate validation infrastructure. Remember to run disaster recovery drills regularly so that service can be restored quickly when a real failure occurs.
