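SPIFFE and SPIRE Service Identity Framework

In modern cloud-native architectures, secure communication and authentication between services are critical. SPIFFE (Secure Production Identity Framework For Everyone) and SPIRE (SPIFFE Runtime Environment) provide a standardized way to establish and manage service identities in dynamic, heterogeneous environments. This article takes a close look at both open-source projects.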

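SPIFFE Standard Overview

What Is SPIFFE?

SPIFFE is an open standard hosted by the CNCF (Cloud Native Computing Foundation) that gives workloads in distributed systems a secure identity. It defines a standard way to:

  • Identify workloads: give each service a unique identity
  • Verify identity: prove a service's authenticity cryptographically
  • Establish trust: build a uniform trust relationship across platforms and environments

Core Advantages of SPIFFE

  1. Platform independence: runs on Kubernetes, virtual machines, bare metal, and other environments
  2. Automation: identities are issued and rotated without manual intervention
  3. Short-lived credentials: short validity periods limit the damage of a leaked credential
  4. Zero-trust foundation: meets the core requirements of the zero-trust security model

SPIFFE Trust Domain

The trust domain is a core SPIFFE concept. It represents the identity-issuing authority within one administrative boundary: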

Trust domain: example.org
    ┌─────────────────┐
    │  SPIRE Server   │
    │  (Trust Root)   │
    └────────┬────────┘
    ┌────────┴────────┐
    │                 │
 Service A        Service B

SPIFFE ID and SVID Formats

SPIFFE ID

A SPIFFE ID is the unique identifier of a workload, expressed as a URI:

spiffe://<trust-domain>/<workload-path>

Examples:

spiffe://example.org/ns/production/sa/web-frontend
spiffe://example.org/region/us-west/host/database-01
spiffe://example.org/cluster/k8s-prod/ns/default/pod/api-server-abc123

Components of a SPIFFE ID

Component      Description          Example
Scheme         Always spiffe        spiffe
Trust Domain   Trust domain name    example.org
Path           Workload path        /ns/production/sa/web-frontend
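
The three components above can be pulled apart with an ordinary URI parser. The following is a minimal sketch using only the Python standard library; the names `SpiffeID` and `parse_spiffe_id` are hypothetical helpers for illustration, and the checks cover only the most obvious rules (scheme, lowercase trust domain without port or user info, no query or fragment), not the full SPIFFE ID specification.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SpiffeID(NamedTuple):
    """Hypothetical container for the parts of a SPIFFE ID."""
    trust_domain: str
    path: str


def parse_spiffe_id(value: str) -> SpiffeID:
    """Split a SPIFFE ID into trust domain and path, rejecting obvious errors."""
    parts = urlsplit(value)
    if parts.scheme != "spiffe":
        raise ValueError(f"scheme must be 'spiffe': {value!r}")
    # hostname is lowercased and stripped of port/user info, so any difference
    # from netloc means the trust domain carries something it should not.
    if not parts.hostname or parts.netloc != parts.hostname:
        raise ValueError(f"invalid trust domain: {value!r}")
    if parts.query or parts.fragment:
        raise ValueError(f"query and fragment are not allowed: {value!r}")
    return SpiffeID(trust_domain=parts.netloc, path=parts.path)
```

Calling `parse_spiffe_id("spiffe://example.org/ns/production/sa/web-frontend")` yields the trust domain `example.org` and the path `/ns/production/sa/web-frontend`. Production code should rely on a SPIFFE library's own parser rather than this sketch.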

SVID (SPIFFE Verifiable Identity Document)

An SVID is the verifiable representation of a SPIFFE ID. Two formats are currently supported:

X.509-SVID

The most widely used SVID format. It encodes the SPIFFE ID in the URI field of the X.509 certificate's SAN (Subject Alternative Name) extension:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1234567890
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=SPIRE Server CA
        Validity
            Not Before: Jun 28 00:00:00 2025 GMT
            Not After : Jun 28 01:00:00 2025 GMT
        Subject: CN=web-frontend
        Subject Public Key Info:
            ...
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                URI:spiffe://example.org/ns/production/sa/web-frontend

JWT-SVID

A JWT-based format suited to HTTP and gRPC traffic. The decoded header and payload look like this:

{
  "alg": "RS256",
  "kid": "key-id-12345",
  "typ": "JWT"
}
{
  "aud": ["spiffe://example.org/backend-service"],
  "exp": 1719619200,
  "iat": 1719615600,
  "sub": "spiffe://example.org/ns/production/sa/web-frontend"
}
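
To make the header and payload above concrete, here is a minimal sketch that splits a JWT-SVID into its segments and checks the `sub`, `aud`, and `exp` claims. It assumes a compact-serialized token and uses only the standard library; `peek_jwt_svid` and `claims_look_valid` are hypothetical helper names. It does not verify the signature, so it is only suitable for inspection and debugging. Real validation must check the signature against the trust bundle, for example through a SPIFFE library.

```python
import base64
import json
import time


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def peek_jwt_svid(token: str):
    """Return (header, claims) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


def claims_look_valid(claims: dict, expected_audience: str, now=None) -> bool:
    """Check subject, audience, and expiry; `now` is injectable for testing."""
    now = time.time() if now is None else now
    audiences = claims.get("aud", [])
    if isinstance(audiences, str):
        audiences = [audiences]
    return (
        str(claims.get("sub", "")).startswith("spiffe://")
        and expected_audience in audiences
        and claims.get("exp", 0) > now
    )
```

For the payload shown above, `claims_look_valid(claims, "spiffe://example.org/backend-service")` is true only while the current time is before the `exp` value.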

SPIRE Architecture and Components

SPIRE Overview

SPIRE is the reference implementation of the SPIFFE standard and provides a complete identity management solution.

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                        SPIRE Server                         │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│  │ CA Manager  │  │ Registration │  │   Node Attestor   │   │
│  │             │  │     API      │  │                   │   │
│  └─────────────┘  └──────────────┘  └───────────────────┘   │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│  │  DataStore  │  │ Key Manager  │  │ Upstream Authority│   │
│  │    (SQL)    │  │              │  │                   │   │
│  └─────────────┘  └──────────────┘  └───────────────────┘   │
└─────────────────────────────┬───────────────────────────────┘
                              │ Node API
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  SPIRE Agent  │     │  SPIRE Agent  │     │  SPIRE Agent  │
│  ┌─────────┐  │     │  ┌─────────┐  │     │  ┌─────────┐  │
│  │Workload │  │     │  │Workload │  │     │  │Workload │  │
│  │Attestor │  │     │  │Attestor │  │     │  │Attestor │  │
│  └─────────┘  │     │  └─────────┘  │     │  └─────────┘  │
└───────┬───────┘     └───────┬───────┘     └───────┬───────┘
        │                     │                     │
        ▼                     ▼                     ▼
   ┌─────────┐           ┌─────────┐           ┌─────────┐
   │Workload │           │Workload │           │Workload │
   │    A    │           │    B    │           │    C    │
   └─────────┘           └─────────┘           └─────────┘

Core Components

SPIRE Server

The SPIRE Server is the control plane of the system. It is responsible for:

  1. CA management: signing and managing X.509 certificates
  2. Registration management: maintaining workload identity registration entries
  3. Node attestation: verifying the identity of each SPIRE Agent

# Example SPIRE Server configuration
server {
    bind_address = "0.0.0.0"
    bind_port = "8081"
    trust_domain = "example.org"
    data_dir = "/run/spire/data"
    log_level = "INFO"

    ca_key_type = "ec-p256"
    ca_ttl = "24h"
    default_x509_svid_ttl = "1h"
}

plugins {
    DataStore "sql" {
        plugin_data {
            database_type = "postgres"
            connection_string = "dbname=spire host=postgres user=spire password=secret"
        }
    }

    NodeAttestor "k8s_psat" {
        plugin_data {
            clusters = {
                "k8s-cluster" = {
                    service_account_allow_list = ["spire:spire-agent"]
                }
            }
        }
    }

    KeyManager "disk" {
        plugin_data {
            keys_path = "/run/spire/data/keys.json"
        }
    }
}

SPIRE Agent

The SPIRE Agent runs on every node. It is responsible for:

  1. Workload attestation: identifying and verifying local workloads
  2. SVID delivery: handing identity credentials to workloads
  3. Credential caching: managing the local SVID cache

# Example SPIRE Agent configuration
agent {
    data_dir = "/run/spire/agent"
    log_level = "INFO"
    server_address = "spire-server"
    server_port = "8081"
    trust_domain = "example.org"
    socket_path = "/run/spire/sockets/agent.sock"
}

plugins {
    NodeAttestor "k8s_psat" {
        plugin_data {
            cluster = "k8s-cluster"
        }
    }

    KeyManager "memory" {
        plugin_data {}
    }

    WorkloadAttestor "k8s" {
        plugin_data {
            # Disables kubelet certificate verification; acceptable for a demo, avoid in production
            skip_kubelet_verification = true
        }
    }

    WorkloadAttestor "unix" {
        plugin_data {}
    }
}

Deploying the SPIRE Server and Agent

Deploying SPIRE on Kubernetes

Step 1: Create the Namespace and RBAC

# Create the SPIRE namespace
kubectl create namespace spire

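# Apply the RBAC configuration
# (the spire-agent ServiceAccount is referenced by the Agent DaemonSet in step 3)
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-server
  namespace: spire
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-agent
  namespace: spire
---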
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spire-server-cluster-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["authentication.k8s.io"]
  resources: ["tokenreviews"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spire-server-cluster-role-binding
subjects:
- kind: ServiceAccount
  name: spire-server
  namespace: spire
roleRef:
  kind: ClusterRole
  name: spire-server-cluster-role
  apiGroup: rbac.authorization.k8s.io
EOF

Step 2: Deploy the SPIRE Server

# spire-server-deployment.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spire-server
  serviceName: spire-server
  template:
    metadata:
      labels:
        app: spire-server
    spec:
      serviceAccountName: spire-server
      containers:
      - name: spire-server
        image: ghcr.io/spiffe/spire-server:1.9.0
        args:
        - -config
        - /run/spire/config/server.conf
        ports:
        - name: grpc
          containerPort: 8081
        volumeMounts:
        - name: spire-config
          mountPath: /run/spire/config
          readOnly: true
        - name: spire-data
          mountPath: /run/spire/data
        livenessProbe:
          exec:
            command:
            - /opt/spire/bin/spire-server
            - healthcheck
          initialDelaySeconds: 15
          periodSeconds: 60
        readinessProbe:
          exec:
            command:
            - /opt/spire/bin/spire-server
            - healthcheck
          initialDelaySeconds: 5
          periodSeconds: 10
      volumes:
      - name: spire-config
        configMap:
          name: spire-server-config
  volumeClaimTemplates:
  - metadata:
      name: spire-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spire-server
  namespace: spire
spec:
  type: ClusterIP
  ports:
  - name: grpc
    port: 8081
    targetPort: 8081
  selector:
    app: spire-server

# Apply the SPIRE Server manifest
kubectl apply -f spire-server-deployment.yaml

Step 3: Deploy the SPIRE Agent

# spire-agent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent
    spec:
      serviceAccountName: spire-agent
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      initContainers:
      - name: init
        image: ghcr.io/spiffe/spire-agent:1.9.0
        args:
        - -c
        - |
          /opt/spire/bin/spire-agent healthcheck -socketPath /run/spire/sockets/agent.sock || true          
        command:
        - /bin/sh
        volumeMounts:
        - name: spire-agent-socket
          mountPath: /run/spire/sockets
      containers:
      - name: spire-agent
        image: ghcr.io/spiffe/spire-agent:1.9.0
        args:
        - -config
        - /run/spire/config/agent.conf
        volumeMounts:
        - name: spire-config
          mountPath: /run/spire/config
          readOnly: true
        - name: spire-agent-socket
          mountPath: /run/spire/sockets
        - name: spire-token
          mountPath: /var/run/secrets/tokens
        livenessProbe:
          exec:
            command:
            - /opt/spire/bin/spire-agent
            - healthcheck
            - -socketPath
            - /run/spire/sockets/agent.sock
          initialDelaySeconds: 15
          periodSeconds: 60
        readinessProbe:
          exec:
            command:
            - /opt/spire/bin/spire-agent
            - healthcheck
            - -socketPath
            - /run/spire/sockets/agent.sock
          initialDelaySeconds: 5
          periodSeconds: 10
      volumes:
      - name: spire-config
        configMap:
          name: spire-agent-config
      - name: spire-agent-socket
        hostPath:
          path: /run/spire/sockets
          type: DirectoryOrCreate
      - name: spire-token
        projected:
          sources:
          - serviceAccountToken:
              path: spire-agent
              expirationSeconds: 7200
              audience: spire-server

# Apply the SPIRE Agent manifest
kubectl apply -f spire-agent-daemonset.yaml

Step 4: Verify the Deployment

# Check the SPIRE Server status
kubectl -n spire get pods -l app=spire-server

# Check the SPIRE Agent status
kubectl -n spire get pods -l app=spire-agent

# View the SPIRE Server logs
kubectl -n spire logs -l app=spire-server

# Run a health check
kubectl -n spire exec -it spire-server-0 -- \
    /opt/spire/bin/spire-server healthcheck

Deploying with Helm

# Add the SPIRE Helm repository
helm repo add spiffe https://spiffe.github.io/helm-charts/
helm repo update

# Install SPIRE
helm install spire spiffe/spire \
    --namespace spire \
    --create-namespace \
    --set global.spire.trustDomain=example.org \
    --set global.spire.clusterName=k8s-cluster

Workload Attestation

What Is Workload Attestation?

Workload attestation is the mechanism SPIRE uses to verify a workload's identity. The agent collects attributes of the calling workload and uses them to decide which identity the workload is entitled to.

Attestation Flow

┌─────────────────────────────────────────────────────────────┐
│                      Attestation Flow                       │
└─────────────────────────────────────────────────────────────┘

  Workload               SPIRE Agent              SPIRE Server
      │                       │                         │
      │  1. Request SVID      │                         │
      │──────────────────────>│                         │
      │                       │                         │
      │  2. Collect workload  │                         │
      │     attributes        │                         │
      │   (PID, UID, K8s Pod) │                         │
      │<──────────────────────│                         │
      │                       │                         │
      │                       │  3. Look up matching    │
      │                       │     registration entries│
      │                       │────────────────────────>│
      │                       │                         │
      │                       │  4. Return SPIFFE ID    │
      │                       │<────────────────────────│
      │                       │                         │
      │                       │  5. Request SVID signing│
      │                       │────────────────────────>│
      │                       │                         │
      │                       │  6. Return signed SVID  │
      │                       │<────────────────────────│
      │                       │                         │
      │  7. Return SVID       │                         │
      │<──────────────────────│                         │
      │                       │                         │

In practice the agent usually syncs registration entries and fetches SVIDs ahead of time, so steps 3 to 6 are often answered from its local cache.

Common Workload Attestor Types

1. Kubernetes Workload Attestor

Identifies attributes of a Kubernetes Pod:

# Available selector types
k8s:ns:namespace                    # Namespace
k8s:sa:service-account              # Service account
k8s:pod-label:key:value             # Pod label
k8s:pod-name:name                   # Pod name
k8s:container-name:name             # Container name
k8s:container-image:image           # Container image
k8s:node-name:name                  # Node name

2. Unix Workload Attestor

Identifies attributes of a Unix process:

# Available selector types
unix:uid:1000                       # User ID
unix:gid:1000                       # Group ID
unix:user:username                  # User name
unix:group:groupname                # Group name
unix:path:/usr/bin/myapp            # Executable path
unix:sha256:abc123...               # SHA-256 digest of the executable

3. Docker Workload Attestor

Identifies attributes of a Docker container:

# Available selector types
docker:label:key:value              # Container label
docker:image_id:id                  # Image ID
docker:env:KEY=value                # Environment variable
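
The selectors listed above drive the matching step of attestation: a registration entry applies to a workload only when every selector on the entry is among the selectors the agent observed for that workload. The following is a minimal sketch of that subset rule, not SPIRE's actual implementation; `Entry` and `matching_ids` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a registration entry."""
    spiffe_id: str
    selectors: frozenset


def matching_ids(entries, workload_selectors):
    """Return the SPIFFE IDs of entries whose selectors are all observed."""
    observed = set(workload_selectors)
    # An entry matches when its selector set is a subset of the observed set.
    return [entry.spiffe_id for entry in entries if entry.selectors <= observed]
```

A workload observed with `k8s:ns:production`, `k8s:sa:web-frontend`, and `k8s:pod-name:web-1` would match an entry that requires only the first two selectors, while an entry that also requires `k8s:pod-label:security-level:high` would not match. This is why adding selectors to an entry narrows the set of workloads that can obtain its identity.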

Creating Workload Registrations

# Create a registration entry for a Kubernetes workload
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://example.org/ns/production/sa/web-frontend \
    -parentID spiffe://example.org/spire/agent/k8s_psat/k8s-cluster/$(kubectl get node -o jsonpath='{.items[0].metadata.uid}') \
    -selector k8s:ns:production \
    -selector k8s:sa:web-frontend

# Create a registration entry keyed on a Pod label
# (-parentID must name an attested agent or a node alias entry created with -node;
# the cluster-level ID below assumes such an alias has been registered)
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://example.org/app/api-server \
    -parentID spiffe://example.org/spire/agent/k8s_psat/k8s-cluster \
    -selector k8s:pod-label:app:api-server \
    -selector k8s:ns:default

# List all registration entries
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry show

Using the SPIRE Controller Manager

Manage registration entries through CRDs:

# ClusterSPIFFEID example
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: web-frontend
spec:
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      app: web-frontend
  namespaceSelector:
    matchLabels:
      environment: production

Kubernetes Integration

Using the SPIFFE CSI Driver

The SPIFFE CSI Driver lets a Pod mount the SPIRE Agent's Workload API socket directly, without a hostPath volume:

# Install the SPIFFE CSI Driver
helm install spiffe-csi-driver spiffe/spiffe-csi-driver \
    --namespace spire

# Example Pod that uses the CSI driver
apiVersion: v1
kind: Pod
metadata:
  name: my-workload
  namespace: production
spec:
  serviceAccountName: web-frontend
  containers:
  - name: app
    image: my-app:latest
    volumeMounts:
    - name: spiffe-workload-api
      mountPath: /spiffe-workload-api
      readOnly: true
    env:
    - name: SPIFFE_ENDPOINT_SOCKET
      value: unix:///spiffe-workload-api/spire-agent.sock
  volumes:
  - name: spiffe-workload-api
    csi:
      driver: "csi.spiffe.io"
      readOnly: true

Using the Workload API

Use the SPIFFE Workload API from application code:

Go Example

package main

import (
    "context"
    "log"
    "net/http"
    "time"

    "github.com/spiffe/go-spiffe/v2/spiffeid"
    "github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    // Create the Workload API client
    client, err := workloadapi.New(ctx, workloadapi.WithAddr("unix:///spiffe-workload-api/spire-agent.sock"))
    if err != nil {
        log.Fatalf("unable to create workload API client: %v", err)
    }
    defer client.Close()

    // Fetch the X.509 context (SVIDs plus trust bundles) in one call
    x509Ctx, err := client.FetchX509Context(ctx)
    if err != nil {
        log.Fatalf("unable to fetch X.509 context: %v", err)
    }
    svid := x509Ctx.DefaultSVID()

    log.Printf("SPIFFE ID: %s", svid.ID)
    log.Printf("certificate valid until: %s", svid.Certificates[0].NotAfter)

    // Build an mTLS server config that only accepts this client identity
    clientID := spiffeid.RequireFromString("spiffe://example.org/ns/production/sa/web-frontend")

    tlsConfig := tlsconfig.MTLSServerConfig(
        svid,
        x509Ctx.Bundles,
        tlsconfig.AuthorizeID(clientID),
    )

    // Serve HTTPS with the SVID. This one-shot fetch does not follow
    // rotation; a long-running server should use workloadapi.X509Source.
    server := &http.Server{Addr: ":8443", TLSConfig: tlsConfig}
    log.Fatal(server.ListenAndServeTLS("", ""))
}

Python Example

import logging
from pyspiffe.spiffe_id.spiffe_id import SpiffeId
from pyspiffe.workloadapi.default_workload_api_client import DefaultWorkloadApiClient
from pyspiffe.workloadapi.x509_context import X509Context

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def main():
    # Create the Workload API client
    socket_path = "unix:///spiffe-workload-api/spire-agent.sock"

    with DefaultWorkloadApiClient(socket_path) as client:
        # Fetch the X.509 context
        x509_context: X509Context = client.fetch_x509_context()

        # Get the default SVID
        svid = x509_context.default_svid()

        logger.info(f"SPIFFE ID: {svid.spiffe_id}")
        logger.info(f"Certificate: {svid.cert_chain[0].subject}")

        # Get the trust bundles
        bundles = x509_context.x509_bundle_set()
        logger.info(f"Number of trust domains: {len(bundles)}")

if __name__ == "__main__":
    main()

Mapping Kubernetes Service Accounts to SPIFFE IDs

# Configuration that generates SPIFFE IDs automatically
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: default-spiffe-id
spec:
  className: "spire-spiffe-class"
  spiffeIDTemplate: >-
    spiffe://{{ .TrustDomain }}/k8s/{{ .PodMeta.Namespace }}/{{ .PodSpec.ServiceAccountName }}    
  podSelector: {}
  namespaceSelector: {}
  dnsNameTemplates:
    - "{{ .PodMeta.Name }}.{{ .PodMeta.Namespace }}.svc.cluster.local"
  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"

Envoy/Istio Integration

SPIFFE and Envoy

Envoy natively supports SDS (Secret Discovery Service) and can use it to fetch SPIFFE credentials from the SPIRE Agent:

# Example Envoy configuration
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8443
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend_cluster
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificate_sds_secret_configs:
            - name: "spiffe://example.org/ns/production/sa/frontend"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent
            validation_context_sds_secret_config:
              name: "spiffe://example.org"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent
            tls_params:
              tls_minimum_protocol_version: TLSv1_3

  clusters:
  - name: spire_agent
    connect_timeout: 1s
    type: STATIC
    lb_policy: ROUND_ROBIN
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: spire_agent
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              pipe:
                path: /run/spire/sockets/agent.sock

  - name: backend_cluster
    connect_timeout: 5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: backend_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: backend-service
                port_value: 8080
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificate_sds_secret_configs:
          - name: "spiffe://example.org/ns/production/sa/frontend"
            sds_config:
              resource_api_version: V3
              api_config_source:
                api_type: GRPC
                transport_api_version: V3
                grpc_services:
                - envoy_grpc:
                    cluster_name: spire_agent
          validation_context_sds_secret_config:
            name: "spiffe://example.org"
            sds_config:
              resource_api_version: V3
              api_config_source:
                api_type: GRPC
                transport_api_version: V3
                grpc_services:
                - envoy_grpc:
                    cluster_name: spire_agent

Istio Integration

Istio can use SPIRE as its identity provider. The exact settings depend on the Istio version, so treat the configuration below as a sketch and verify it against the Istio SPIRE integration documentation:

# Configure Istio to use SPIRE
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-spire
spec:
  profile: default
  meshConfig:
    trustDomain: example.org
  values:
    global:
      caAddress: spire-server.spire:8081
    pilot:
      env:
        ENABLE_SPIFFE_WORKLOAD_API: "true"
        SPIFFE_ENDPOINT_SOCKET: "unix:///run/spire/sockets/agent.sock"
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        overlays:
        - kind: Deployment
          name: istio-ingressgateway
          patches:
          - path: spec.template.spec.volumes[100]
            value:
              name: spire-agent-socket
              hostPath:
                path: /run/spire/sockets
                type: Directory
          - path: spec.template.spec.containers[0].volumeMounts[100]
            value:
              name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: true

Cross-Cluster Trust Federation

Establish SPIFFE trust federation across multiple clusters:

# In cluster A, export the trust bundle
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server bundle show -format spiffe > cluster-a-bundle.json

# In cluster B, trust cluster A (the local bundle file is passed on stdin)
kubectl exec -i -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server bundle set \
    -format spiffe \
    -id spiffe://cluster-a.example.org < cluster-a-bundle.json

# Create the federation relationship
# (-endpointSpiffeID is required with the https_spiffe profile; the value
# below assumes cluster A's server uses the default SPIRE Server ID)
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server federation create \
    -bundleEndpointURL https://spire-server.cluster-a.example.org:8443 \
    -bundleEndpointProfile https_spiffe \
    -endpointSpiffeID spiffe://cluster-a.example.org/spire/server \
    -trustDomain cluster-a.example.org \
    -trustDomainBundleFormat spiffe
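
Before importing a bundle exported in the `spiffe` format, it can help to check what it contains. The SPIFFE bundle format is a JWKS document whose keys carry a `use` of `x509-svid` or `jwt-svid`, plus the optional `spiffe_sequence` and `spiffe_refresh_hint` fields. The sketch below summarizes such a document; `summarize_bundle` is a hypothetical helper.

```python
import json


def summarize_bundle(bundle_json: str) -> dict:
    """Count the keys in a SPIFFE bundle (JWKS) by their declared use."""
    document = json.loads(bundle_json)
    keys_by_use = {}
    for key in document.get("keys", []):
        use = key.get("use", "unknown")
        keys_by_use[use] = keys_by_use.get(use, 0) + 1
    return {
        "sequence": document.get("spiffe_sequence"),
        "refresh_hint": document.get("spiffe_refresh_hint"),
        "keys_by_use": keys_by_use,
    }
```

Running it over the contents of `cluster-a-bundle.json` shows at a glance whether the file carries X.509 authorities, JWT authorities, or both. A bundle with no `x509-svid` keys cannot be used to validate X.509-SVIDs from the federated trust domain.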

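Best Practices and Security Considerations

Certificate Lifecycle Management

1. Set Appropriate TTLs

# SPIRE Server configuration
server {
    # CA certificate lifetime (24 to 72 hours recommended)
    ca_ttl = "24h"

    # Default X.509-SVID lifetime (1 hour or less recommended)
    default_x509_svid_ttl = "1h"

    # Issuer (the `iss` claim) placed in JWT-SVIDs
    jwt_issuer = "spire-server"
}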
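
The two TTLs above trade exposure against load: a shorter SVID lifetime limits how long a leaked credential stays usable, but it increases how often credentials must be renewed. The sketch below works through that arithmetic. It assumes renewal happens once a fixed fraction of the SVID lifetime has elapsed; the default of one half is an illustrative assumption, not a documented SPIRE guarantee, and `parse_ttl` and `rotation_plan` are hypothetical helpers.

```python
from datetime import timedelta

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_ttl(value: str) -> timedelta:
    """Parse a simple duration such as "24h", "30m", or "45s"."""
    number, unit = value[:-1], value[-1:]
    if unit not in _UNITS or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{_UNITS[unit]: int(number)})


def rotation_plan(ca_ttl: str, svid_ttl: str, renew_fraction: float = 0.5) -> dict:
    """Estimate renewal timing under the assumed renewal fraction."""
    ca, svid = parse_ttl(ca_ttl), parse_ttl(svid_ttl)
    if svid > ca:
        raise ValueError("an SVID should not outlive the CA that signs it")
    renew_after = svid * renew_fraction
    return {
        "renew_after": renew_after,                  # when a workload would renew
        "max_exposure": svid,                        # worst case for a leaked SVID
        "renewals_per_ca": int(ca / renew_after),    # renewals within one CA lifetime
    }
```

With `ca_ttl = "24h"` and a one-hour SVID lifetime, the sketch gives a renewal after 30 minutes and 48 renewals per workload within one CA lifetime.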

2. Certificate Rotation Strategy

// Watch for certificate updates in the application
package main

import (
    "context"
    "log"

    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
    ctx := context.Background()

    // Create a long-lived client; with no address option, the socket
    // is taken from the SPIFFE_ENDPOINT_SOCKET environment variable
    client, err := workloadapi.New(ctx)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Watch for X.509 context updates; this call blocks until ctx is cancelled
    err = client.WatchX509Context(ctx, &x509Watcher{})
    if err != nil {
        log.Fatal(err)
    }
}

type x509Watcher struct{}

func (w *x509Watcher) OnX509ContextUpdate(x509Context *workloadapi.X509Context) {
    svid := x509Context.DefaultSVID()
    log.Printf("received new SVID: %s", svid.ID)
    log.Printf("new certificate valid until: %s", svid.Certificates[0].NotAfter)

    // Update the application's TLS configuration
    // updateTLSConfig(svid)
}

func (w *x509Watcher) OnX509ContextWatchError(err error) {
    log.Printf("watch error: %v", err)
}

Security Hardening Recommendations

1. Node Attestation Best Practices

# Use Projected Service Account Tokens (PSAT)
plugins {
    NodeAttestor "k8s_psat" {
        plugin_data {
            clusters = {
                "production" = {
                    # Service accounts that are explicitly allowed
                    service_account_allow_list = ["spire:spire-agent"]
                    # Require a specific audience
                    audience = ["spire-server"]
                }
            }
        }
    }
}

2. Hardening Workload Selectors

# Combine several selectors to tighten the match
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://example.org/critical-service \
    -parentID spiffe://example.org/spire/agent/k8s_psat/production \
    -selector k8s:ns:production \
    -selector k8s:sa:critical-service \
    -selector k8s:pod-label:security-level:high \
    -selector k8s:container-image:registry.example.org/critical-service@sha256:abc123

3. Trust Domain Isolation

Production trust domain: prod.example.org
    ├── spiffe://prod.example.org/payment-service
    ├── spiffe://prod.example.org/user-service
    └── spiffe://prod.example.org/inventory-service

Staging trust domain: staging.example.org
    ├── spiffe://staging.example.org/payment-service
    ├── spiffe://staging.example.org/user-service
    └── spiffe://staging.example.org/inventory-service
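
Separate trust domains only help if services actually check which domain a peer belongs to. The sketch below shows the authorization decision for the layout above: after the peer's SVID has been cryptographically verified (not shown here), accept it only if its trust domain is on an allow list. `ALLOWED_TRUST_DOMAINS` and `is_authorized` are hypothetical names, and the allow list is an assumption for illustration.

```python
from urllib.parse import urlsplit

# Assumed policy: production services accept production identities only.
ALLOWED_TRUST_DOMAINS = frozenset({"prod.example.org"})


def is_authorized(peer_spiffe_id: str, allowed=ALLOWED_TRUST_DOMAINS) -> bool:
    """Accept a verified peer only if its trust domain is explicitly allowed."""
    parts = urlsplit(peer_spiffe_id)
    # Exact match on the whole authority, so look-alike domains such as
    # prod.example.org.evil.test are rejected.
    return parts.scheme == "spiffe" and parts.netloc in allowed
```

Under this policy `spiffe://prod.example.org/payment-service` is accepted and `spiffe://staging.example.org/payment-service` is rejected, even though both paths name the same service.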

Monitoring and Observability

1. Monitoring SPIRE Metrics

# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spire-server
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-server
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

2. Key Metrics

# Metric names depend on the SPIRE version and telemetry sink;
# confirm them against your own /metrics output

# SPIRE Server metrics
spire_server_ca_manager_x509_ca_rotate_total           # CA rotations
spire_server_datastore_bundle_count                     # Trust bundles
spire_server_svid_issued_total                          # Total SVIDs issued
spire_server_node_attestation_duration_seconds          # Node attestation latency

# SPIRE Agent metrics
spire_agent_svid_rotations_total                        # SVID rotations
spire_agent_workload_attestation_duration_seconds       # Workload attestation latency
spire_agent_delegated_identity_api_connection_count     # API connections

3. Logging Configuration

# Enable structured logging
server {
    log_level = "INFO"
    log_format = "json"
}

# Collect logs in Kubernetes
kubectl logs -n spire -l app=spire-server -f | \
    jq 'select(.level == "error" or .level == "warning")'
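
Where `jq` is not available, the same filter can be written in a few lines of Python. The following is a minimal sketch that assumes each log line is a JSON object with a `level` field, as the JSON log format produces; `alert_lines` is a hypothetical helper, and it accepts both `warning` and `warn` because the exact spelling is an assumption about the logger.

```python
import json

ALERT_LEVELS = {"error", "warning", "warn"}


def alert_lines(lines):
    """Yield parsed log records whose level is error or warning."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if isinstance(record, dict) and record.get("level") in ALERT_LEVELS:
            yield record
```

To use it on live logs, pipe the output of `kubectl logs` into a small script that passes `sys.stdin` to `alert_lines` and prints each record.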

Disaster Recovery

1. Back Up SPIRE Server Data

# Back up the DataStore (PostgreSQL shown here)
pg_dump -h postgres -U spire -d spire > spire_backup_$(date +%Y%m%d).sql

# Back up the CA keys
kubectl exec -n spire spire-server-0 -- \
    tar czf /tmp/keys_backup.tar.gz /run/spire/data/keys.json

kubectl cp spire/spire-server-0:/tmp/keys_backup.tar.gz ./keys_backup.tar.gz

2. High-Availability Deployment

# SPIRE Server high-availability configuration
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
spec:
  replicas: 3  # Run several replicas; they must share one DataStore (such as the PostgreSQL database above)
  selector:
    matchLabels:
      app: spire-server
  serviceName: spire-server
  template:
    metadata:
      labels:
        app: spire-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: spire-server
            topologyKey: "kubernetes.io/hostname"
      # ... other settings

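Troubleshooting

1. A Workload Cannot Obtain an SVID

# Check that the SPIRE Agent is healthy
# (kubectl exec does not accept a label selector, so look up a Pod name first)
AGENT_POD=$(kubectl get pods -n spire -l app=spire-agent \
    -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n spire "$AGENT_POD" -- \
    /opt/spire/bin/spire-agent healthcheck \
    -socketPath /run/spire/sockets/agent.sock

# Check the workload's registration entries
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry show \
    -selector k8s:ns:YOUR_NAMESPACE

# Check the Agent logs
kubectl logs -n spire -l app=spire-agent | grep -i error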

2. Certificate Verification Fails

# Inspect the trust bundle
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server bundle show

# Test the mTLS connection
openssl s_client -connect service:443 \
    -cert svid.pem -key key.pem -CAfile bundle.pem

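Summary

SPIFFE and SPIRE give modern cloud-native architectures a robust service identity solution. This article covered:

  1. The SPIFFE standard: a uniform framework for identifying services
  2. SVID formats: when to use X.509-SVID and JWT-SVID
  3. SPIRE architecture: the roles of the Server and Agent and how they interact
  4. Deployment: how to run SPIRE on Kubernetes
  5. Workload attestation: automated identity verification
  6. Service mesh integration: working with Envoy and Istio
  7. Security best practices: keeping production environments secure

Adopting SPIFFE/SPIRE can significantly strengthen the security of a microservice architecture and provides a foundation for zero-trust networking. Start with a small pilot, then extend service identity management step by step across the organization.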
