Terraform Stacks: Large-Scale Deployment Management

Terraform Stacks is a feature HashiCorp introduced in 2024, designed for managing infrastructure deployments at scale. It provides a declarative way to manage multiple Terraform configurations and to deploy them consistently across regions and environments. This article walks through the core concepts of Terraform Stacks, its configuration syntax, and best practices.

Terraform Stacks Concepts

What Are Terraform Stacks?

Terraform Stacks is a feature of HCP Terraform (formerly Terraform Cloud) that lets you organize multiple Terraform configurations into a coordinated deployment unit. Each Stack can contain multiple Components and be deployed across multiple environments or regions.

┌─────────────────────────────────────────────────────────────────┐
│                      Terraform Stack                             │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                     Components                               ││
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐               ││
│  │  │    VPC    │  │    EKS    │  │    RDS    │               ││
│  │  │ Component │  │ Component │  │ Component │               ││
│  │  └───────────┘  └───────────┘  └───────────┘               ││
│  └─────────────────────────────────────────────────────────────┘│
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                    Deployments                               ││
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         ││
│  │  │ us-east-1   │  │ us-west-2   │  │ eu-west-1   │         ││
│  │  │ production  │  │ production  │  │ production  │         ││
│  │  └─────────────┘  └─────────────┘  └─────────────┘         ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘

Core Elements

Element            | Description                              | Purpose
Stack              | Top-level organizational unit            | Defines the overall infrastructure architecture
Component          | Reusable infrastructure building block   | Encapsulates a specific Terraform configuration
Deployment         | A specific instance of a Stack           | Defines a deployment for a particular environment or region
Orchestration Rule | Deployment coordination rule             | Controls deployment order and dependencies
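
To make these elements concrete, here is a minimal sketch of how they map onto the two Stack configuration files. The component source path and input names are illustrative placeholders, not taken from a real project:

# components.tfstack.hcl — a Component wraps a Terraform module
component "network" {
  source = "./components/networking"   # hypothetical local module

  inputs = {
    vpc_cidr = var.vpc_cidr
  }

  providers = {
    aws = provider.aws.this
  }
}

# deployments.tfdeploy.hcl — each Deployment instantiates the Stack with its own inputs
deployment "production" {
  inputs = {
    vpc_cidr = "10.0.0.0/16"
  }
}

Orchestration Rules live alongside the Deployments in the .tfdeploy.hcl file and are covered in detail later in this article.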

Stack Project Structure

my-infrastructure-stack/
├── components/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── compute/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── database/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── deployments.tfdeploy.hcl
└── components.tfstack.hcl

Comparison with Traditional Workspaces

The Traditional Workspace Model

In a traditional Terraform workflow, Workspaces are used to manage different environments:

# Traditional Workspace approach
terraform {
  cloud {
    organization = "my-organization"

    workspaces {
      name = "production-vpc"
    }
  }
}

# A separate Workspace is needed for every environment/region
# production-vpc-us-east-1
# production-vpc-us-west-2
# production-vpc-eu-west-1
# staging-vpc-us-east-1
# ... and many more Workspaces

The Terraform Stacks Model

# Stacks approach - define once, deploy many times
# deployments.tfdeploy.hcl

deployment "production-us-east-1" {
  inputs = {
    region      = "us-east-1"
    environment = "production"
    vpc_cidr    = "10.0.0.0/16"
  }
}

deployment "production-us-west-2" {
  inputs = {
    region      = "us-west-2"
    environment = "production"
    vpc_cidr    = "10.1.0.0/16"
  }
}

deployment "production-eu-west-1" {
  inputs = {
    region      = "eu-west-1"
    environment = "production"
    vpc_cidr    = "10.2.0.0/16"
  }
}

Feature Comparison

Feature                      | Workspace                     | Stacks
Multi-environment management | One Workspace per environment | Single Stack with multiple Deployments
Cross-region deployment      | Manual coordination           | Built-in support
Component reuse              | Modules                       | Components
Dependency management        | Run Triggers                  | Orchestration Rules
State management             | Independent per Workspace     | Managed centrally
Deployment coordination      | Manual or scripted            | Declarative and automated
Version control              | Managed per Workspace         | Single versioned configuration

When to Choose Stacks

Terraform Stacks is a good fit when you:

  • Need to deploy the same infrastructure to multiple regions
  • Want consistent deployments across environments (development, staging, production)
  • Have complex dependencies that need coordination
  • Manage large-scale, enterprise-grade infrastructure
  • Need a unified deployment strategy and change-management process

Stack Configuration Syntax

The tfstack.hcl File Format

Terraform Stacks uses the .tfstack.hcl file extension to define the Stack configuration. The file is written in HCL (HashiCorp Configuration Language).

# components.tfstack.hcl

# Declare the required providers
required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.0"
  }
}

# Provider configuration
provider "aws" "this" {
  config {
    region = var.region

    assume_role_with_web_identity {
      role_arn           = var.role_arn
      web_identity_token = var.identity_token
    }
  }
}

# Input variables
variable "region" {
  type        = string
  description = "AWS region for deployment"
}

variable "environment" {
  type        = string
  description = "Environment name (dev, staging, production)"
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for VPC"
}

variable "role_arn" {
  type        = string
  description = "IAM role ARN for authentication"
}

variable "identity_token" {
  type      = string
  ephemeral = true
}

Provider Configuration

Provider configuration in Stacks differs slightly from traditional Terraform:

# Multi-region provider configuration
provider "aws" "primary" {
  config {
    region = var.primary_region

    assume_role_with_web_identity {
      role_arn           = var.role_arn
      web_identity_token = var.identity_token
    }
  }
}

provider "aws" "secondary" {
  config {
    region = var.secondary_region

    assume_role_with_web_identity {
      role_arn           = var.role_arn
      web_identity_token = var.identity_token
    }
  }
}

# Use for_each to create multiple providers dynamically
provider "aws" "regional" {
  for_each = var.regions

  config {
    region = each.value

    assume_role_with_web_identity {
      role_arn           = var.role_arn
      web_identity_token = var.identity_token
    }
  }
}

Identity Token Configuration

# In deployments.tfdeploy.hcl: define the identity token source
identity_token "aws" {
  audience = ["aws.workload.identity"]
}

# In components.tfstack.hcl: the token is passed in as an input variable
# (see the deployment examples below) and used by the provider
provider "aws" "this" {
  config {
    region = var.region

    assume_role_with_web_identity {
      role_arn           = var.role_arn
      web_identity_token = var.identity_token
    }
  }
}

Component Definitions

What Is a Component?

A Component is the basic building block of a Stack; each Component corresponds to a Terraform module (or root module).

# components.tfstack.hcl

# VPC component
component "vpc" {
  source = "./components/networking"

  inputs = {
    vpc_name   = "main-${var.environment}"
    vpc_cidr   = var.vpc_cidr
    azs        = ["${var.region}a", "${var.region}b", "${var.region}c"]

    enable_nat_gateway = true
    single_nat_gateway = var.environment != "production"

    tags = {
      Environment = var.environment
      ManagedBy   = "terraform-stacks"
    }
  }

  providers = {
    aws = provider.aws.this
  }
}

# EKS component
component "eks" {
  source = "./components/compute"

  inputs = {
    cluster_name    = "main-${var.environment}"
    cluster_version = "1.29"

    vpc_id          = component.vpc.vpc_id
    private_subnets = component.vpc.private_subnet_ids

    node_groups = {
      general = {
        instance_types = ["t3.medium", "t3.large"]
        min_size       = 2
        max_size       = 10
        desired_size   = 3
      }
    }
  }

  providers = {
    aws = provider.aws.this
  }
}

# RDS component
component "database" {
  source = "./components/database"

  inputs = {
    identifier        = "main-${var.environment}"
    engine            = "postgres"
    engine_version    = "15.4"
    instance_class    = var.environment == "production" ? "db.r6g.large" : "db.t3.medium"

    vpc_id            = component.vpc.vpc_id
    subnet_ids        = component.vpc.private_subnet_ids
    security_group_id = component.vpc.database_security_group_id

    multi_az          = var.environment == "production"
  }

  providers = {
    aws = provider.aws.this
  }
}

Dependencies Between Components

Components establish dependencies by referencing the outputs of other Components:

# Dependencies are inferred automatically from references
component "application" {
  source = "./components/application"

  inputs = {
    # Reference outputs from the VPC component
    vpc_id     = component.vpc.vpc_id
    subnet_ids = component.vpc.private_subnet_ids

    # Reference outputs from the EKS component
    cluster_endpoint = component.eks.cluster_endpoint
    cluster_ca_data  = component.eks.cluster_ca_certificate

    # Reference outputs from the database component
    database_endpoint = component.database.endpoint
    database_port     = component.database.port
  }

  providers = {
    aws        = provider.aws.this
    kubernetes = provider.kubernetes.this
  }
}

Using Remote Modules as Components

# Use a module from the public Terraform Registry
component "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  inputs = {
    name = "main-${var.environment}"
    cidr = var.vpc_cidr

    azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
    private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
    public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

    enable_nat_gateway = true
    enable_vpn_gateway = false
  }

  providers = {
    aws = provider.aws.this
  }
}

# Use a module from a private registry
component "eks" {
  source  = "app.terraform.io/my-org/eks-cluster/aws"
  version = "2.1.0"

  inputs = {
    cluster_name = "main-${var.environment}"
    vpc_id       = component.vpc.vpc_id
  }

  providers = {
    aws = provider.aws.this
  }
}

Dynamic Components with for_each

# Create multiple components dynamically
variable "microservices" {
  type = map(object({
    port     = number
    replicas = number
    memory   = string
  }))
}

component "microservice" {
  source   = "./components/microservice"
  for_each = var.microservices

  inputs = {
    name          = each.key
    port          = each.value.port
    replicas      = each.value.replicas
    memory_limit  = each.value.memory

    cluster_name  = component.eks.cluster_name
    namespace     = "applications"
  }

  providers = {
    kubernetes = provider.kubernetes.this
  }
}

Deployment Management

The tfdeploy.hcl File Format

Deployments are defined in a .tfdeploy.hcl file, which specifies the concrete instances of a Stack:

# deployments.tfdeploy.hcl

# Identity token configuration
identity_token "aws" {
  audience = ["aws.workload.identity"]
}

# Development deployment
deployment "development" {
  inputs = {
    region         = "us-east-1"
    environment    = "development"
    vpc_cidr       = "10.10.0.0/16"
    role_arn       = "arn:aws:iam::111111111111:role/terraform-stacks-dev"
    identity_token = identity_token.aws.jwt
  }
}

# Staging deployment
deployment "staging" {
  inputs = {
    region         = "us-east-1"
    environment    = "staging"
    vpc_cidr       = "10.20.0.0/16"
    role_arn       = "arn:aws:iam::222222222222:role/terraform-stacks-staging"
    identity_token = identity_token.aws.jwt
  }
}

# Production - multi-region deployments
deployment "production-us-east-1" {
  inputs = {
    region         = "us-east-1"
    environment    = "production"
    vpc_cidr       = "10.0.0.0/16"
    role_arn       = "arn:aws:iam::333333333333:role/terraform-stacks-prod"
    identity_token = identity_token.aws.jwt
  }
}

deployment "production-us-west-2" {
  inputs = {
    region         = "us-west-2"
    environment    = "production"
    vpc_cidr       = "10.1.0.0/16"
    role_arn       = "arn:aws:iam::333333333333:role/terraform-stacks-prod"
    identity_token = identity_token.aws.jwt
  }
}

deployment "production-eu-west-1" {
  inputs = {
    region         = "eu-west-1"
    environment    = "production"
    vpc_cidr       = "10.2.0.0/16"
    role_arn       = "arn:aws:iam::333333333333:role/terraform-stacks-prod-eu"
    identity_token = identity_token.aws.jwt
  }
}

Orchestration Rules

Orchestration Rules define the ordering and dependencies between Deployments:

# deployments.tfdeploy.hcl

# Deployment ordering rules
orchestrate "auto_approve" "development" {
  check {
    # Auto-approve the development environment
    condition = context.plan.deployment == deployment.development
    reason    = "Auto-approve development deployments"
  }
}

orchestrate "auto_approve" "staging_after_dev" {
  check {
    # Staging is applied automatically after development succeeds
    condition = context.plan.deployment == deployment.staging
    reason    = "Auto-approve staging after development success"
  }

  depends_on = [deployment.development]
}

# Production requires manual approval
orchestrate "manual_approval" "production" {
  check {
    condition = (
      context.plan.deployment == deployment.production-us-east-1 ||
      context.plan.deployment == deployment.production-us-west-2 ||
      context.plan.deployment == deployment.production-eu-west-1
    )
    reason = "Production deployments require manual approval"
  }

  # All production regions depend on staging
  depends_on = [deployment.staging]
}

# Rolling deployment strategy
orchestrate "rolling" "production_regions" {
  check {
    condition = (
      context.plan.deployment == deployment.production-us-west-2 ||
      context.plan.deployment == deployment.production-eu-west-1
    )
    reason = "Rolling deployment for secondary regions"
  }

  # us-west-2 and eu-west-1 deploy in sequence after us-east-1 succeeds
  depends_on = [deployment.production-us-east-1]
}

Deployment Planning and Execution

# Review the plan for all deployments
tfstacks plan

# Plan only a specific deployment
tfstacks plan -deployment=development

# Apply all deployments
tfstacks apply

# Apply a specific deployment
tfstacks apply -deployment=development

# Destroy a specific deployment
tfstacks destroy -deployment=development

HCP Terraform Integration

Managing Stacks in HCP Terraform:

# Trigger Stack operations through the API or the UI
# POST /api/v2/stacks/{stack_id}/plans

# Example API request body
{
  "data": {
    "type": "stack-plans",
    "attributes": {
      "operation": "plan"
    },
    "relationships": {
      "deployments": {
        "data": [
          { "type": "stack-deployments", "id": "deployment-abc123" }
        ]
      }
    }
  }
}

Variables and Outputs

Input Variable Definitions

# components.tfstack.hcl

# Basic variable types
variable "region" {
  type        = string
  description = "AWS region for deployment"
}

variable "environment" {
  type        = string
  description = "Environment name"

  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "Environment must be development, staging, or production."
  }
}

# Complex variable types
variable "node_pools" {
  type = map(object({
    instance_type = string
    min_size      = number
    max_size      = number
    desired_size  = number
    labels        = map(string)
    taints = list(object({
      key    = string
      value  = string
      effect = string
    }))
  }))

  default = {
    general = {
      instance_type = "t3.medium"
      min_size      = 1
      max_size      = 5
      desired_size  = 2
      labels        = {}
      taints        = []
    }
  }
}

# Ephemeral variables (never stored in state)
variable "identity_token" {
  type      = string
  ephemeral = true
}

variable "database_password" {
  type      = string
  sensitive = true
  ephemeral = true
}

Output Value Definitions

# components.tfstack.hcl

# Stack outputs
output "vpc_id" {
  type        = string
  description = "The ID of the VPC"
  value       = component.vpc.vpc_id
}

output "eks_cluster_endpoint" {
  type        = string
  description = "EKS cluster API endpoint"
  value       = component.eks.cluster_endpoint
}

output "database_endpoint" {
  type        = string
  description = "RDS database endpoint"
  value       = component.database.endpoint
  sensitive   = true
}

# Output a complex structure
output "subnet_ids" {
  type = object({
    public  = list(string)
    private = list(string)
  })
  value = {
    public  = component.vpc.public_subnet_ids
    private = component.vpc.private_subnet_ids
  }
}

# Outputs from a for_each component
output "microservice_endpoints" {
  type = map(string)
  value = {
    for name, svc in component.microservice : name => svc.endpoint
  }
}

Passing Variables in Deployments

# deployments.tfdeploy.hcl

locals {
  common_tags = {
    Project   = "main-infrastructure"
    ManagedBy = "terraform-stacks"
  }

  # Environment-specific configuration
  env_config = {
    development = {
      instance_class = "db.t3.medium"
      multi_az       = false
      backup_retention = 7
    }
    staging = {
      instance_class = "db.t3.large"
      multi_az       = false
      backup_retention = 14
    }
    production = {
      instance_class = "db.r6g.large"
      multi_az       = true
      backup_retention = 30
    }
  }
}

deployment "production-us-east-1" {
  inputs = {
    region         = "us-east-1"
    environment    = "production"
    vpc_cidr       = "10.0.0.0/16"

    # Use local values
    tags           = merge(local.common_tags, {
      Environment = "production"
      Region      = "us-east-1"
    })

    # Pull values from the environment configuration
    db_instance_class    = local.env_config["production"].instance_class
    db_multi_az          = local.env_config["production"].multi_az
    db_backup_retention  = local.env_config["production"].backup_retention

    # Authentication
    role_arn       = "arn:aws:iam::333333333333:role/terraform-stacks-prod"
    identity_token = identity_token.aws.jwt
  }
}

State Management and Collaboration

State Storage Architecture

Terraform Stacks state is managed automatically by HCP Terraform:

┌─────────────────────────────────────────────────────────────────┐
│                        HCP Terraform                             │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                        Stack                                 ││
│  │  ┌───────────────────────────────────────────────────────┐  ││
│  │  │              State Management                          │  ││
│  │  │  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │  ││
│  │  │  │ development │ │   staging   │ │ production  │     │  ││
│  │  │  │    state    │ │    state    │ │   states    │     │  ││
│  │  │  └─────────────┘ └─────────────┘ └─────────────┘     │  ││
│  │  │                                   ┌──────┬──────┐    │  ││
│  │  │                                   │us-e-1│us-w-2│    │  ││
│  │  │                                   └──────┴──────┘    │  ││
│  │  └───────────────────────────────────────────────────────┘  ││
│  └─────────────────────────────────────────────────────────────┘│
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                   Features                                   ││
│  │  • Automatic state locking                                  ││
│  │  • State versioning and history                             ││
│  │  • Encryption at rest                                       ││
│  │  • Access control per deployment                            ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘

State Locking

# Stacks handles state locking automatically
# While a deployment run is in progress, other operations on the same deployment are blocked

# Inspect the lock status
# GET /api/v2/stacks/{stack_id}/deployments/{deployment_id}

{
  "data": {
    "id": "deployment-abc123",
    "type": "stack-deployments",
    "attributes": {
      "name": "production-us-east-1",
      "locked": true,
      "locked-by": {
        "run-id": "run-xyz789",
        "user": "user@example.com"
      }
    }
  }
}

Team Collaboration Settings

# Manage Stack permissions with Terraform
resource "tfe_team" "infrastructure" {
  name         = "infrastructure-team"
  organization = "my-organization"
}

resource "tfe_team" "developers" {
  name         = "developers"
  organization = "my-organization"
}

# Stack-level permissions
resource "tfe_stack_team_access" "infra_admin" {
  stack_id = tfe_stack.main.id
  team_id  = tfe_team.infrastructure.id

  access = "admin"
}

resource "tfe_stack_team_access" "dev_read" {
  stack_id = tfe_stack.main.id
  team_id  = tfe_team.developers.id

  access = "read"

  # Developers may only run plans against specific deployments
  deployment_permissions = {
    "development" = "plan"
    "staging"     = "read"
    "production"  = "read"
  }
}

Version Control Integration

# .github/workflows/terraform-stacks.yml

name: Terraform Stacks

on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'
  pull_request:
    branches: [main]
    paths:
      - 'infrastructure/**'

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}

      - name: Terraform Stacks Plan
        working-directory: infrastructure
        run: |
          terraform init
          tfstacks plan -out=plan.json          

      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: terraform-plan
          path: infrastructure/plan.json

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: terraform-plan
          path: infrastructure

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}

      - name: Terraform Stacks Apply
        working-directory: infrastructure
        run: |
          terraform init
          tfstacks apply plan.json          

Best Practices and Case Study

Directory Structure Best Practices

infrastructure/
├── README.md
├── components.tfstack.hcl       # Stack definition
├── deployments.tfdeploy.hcl     # Deployment definitions
│
├── components/                   # Component modules
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── versions.tf
│   │
│   ├── compute/
│   │   ├── eks/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   └── outputs.tf
│   │   └── ecs/
│   │       ├── main.tf
│   │       ├── variables.tf
│   │       └── outputs.tf
│   │
│   ├── database/
│   │   ├── rds/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   └── outputs.tf
│   │   └── elasticache/
│   │       ├── main.tf
│   │       ├── variables.tf
│   │       └── outputs.tf
│   │
│   └── observability/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
│
├── configs/                      # Per-environment configuration
│   ├── development.tfvars
│   ├── staging.tfvars
│   └── production.tfvars
│
└── tests/                        # Test files
    ├── unit/
    │   └── networking_test.go
    └── integration/
        └── stack_test.go

Naming Conventions

# Adopt consistent naming conventions

# Component names: use descriptive, functional names
component "primary_vpc" { }
component "eks_cluster" { }
component "aurora_database" { }

# Deployment names: environment-region format
deployment "development-us-east-1" { }
deployment "staging-us-east-1" { }
deployment "production-us-east-1" { }
deployment "production-us-west-2" { }

# Variable names: use snake_case
variable "cluster_name" { }
variable "node_instance_type" { }
variable "enable_encryption" { }

# Output names: consistent with variable naming
output "cluster_endpoint" { }
output "database_connection_string" { }

Case Study: A Multi-Region SaaS Platform

# components.tfstack.hcl

required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.0"
  }
  kubernetes = {
    source  = "hashicorp/kubernetes"
    version = "~> 2.25"
  }
}

# Variable definitions
# (inputs supplied by each deployment in deployments.tfdeploy.hcl)
variable "region" {
  type = string
}

variable "environment" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "domain" {
  type = string
}

variable "role_arn" {
  type = string
}

variable "identity_token" {
  type      = string
  ephemeral = true
}

variable "services" {
  type = map(object({
    replicas = number
    cpu      = string
    memory   = string
    port     = number
  }))
}

# Provider configuration
provider "aws" "this" {
  config {
    region = var.region

    assume_role_with_web_identity {
      role_arn           = var.role_arn
      web_identity_token = var.identity_token
    }
  }
}

provider "kubernetes" "this" {
  config {
    host                   = component.eks.cluster_endpoint
    cluster_ca_certificate = component.eks.cluster_ca_certificate
    token                  = component.eks.auth_token
  }
}

# Base networking layer
component "vpc" {
  source = "./components/networking"

  inputs = {
    name       = "${var.environment}-main"
    cidr       = var.vpc_cidr
    azs        = ["${var.region}a", "${var.region}b", "${var.region}c"]

    enable_nat_gateway     = true
    enable_vpn_gateway     = false
    enable_dns_hostnames   = true

    tags = {
      Environment = var.environment
      Region      = var.region
    }
  }

  providers = {
    aws = provider.aws.this
  }
}

# EKS cluster
component "eks" {
  source = "./components/compute/eks"

  inputs = {
    cluster_name    = "${var.environment}-platform"
    cluster_version = "1.29"

    vpc_id          = component.vpc.vpc_id
    subnet_ids      = component.vpc.private_subnet_ids

    node_groups = {
      general = {
        instance_types = ["t3.large", "t3.xlarge"]
        min_size       = 3
        max_size       = 20
        desired_size   = 5

        labels = {
          workload = "general"
        }
      }

      memory_optimized = {
        instance_types = ["r6g.large", "r6g.xlarge"]
        min_size       = 0
        max_size       = 10
        desired_size   = 2

        labels = {
          workload = "memory"
        }

        taints = [{
          key    = "workload"
          value  = "memory"
          effect = "NO_SCHEDULE"
        }]
      }
    }

    enable_cluster_autoscaler = true
    enable_metrics_server     = true
  }

  providers = {
    aws = provider.aws.this
  }
}

# Aurora database
component "database" {
  source = "./components/database/aurora"

  inputs = {
    cluster_identifier = "${var.environment}-platform"
    engine             = "aurora-postgresql"
    engine_version     = "15.4"

    instance_class = var.environment == "production" ? "db.r6g.large" : "db.t4g.medium"
    instances      = var.environment == "production" ? 3 : 1

    vpc_id             = component.vpc.vpc_id
    subnet_ids         = component.vpc.database_subnet_ids
    security_group_ids = [component.vpc.database_security_group_id]

    backup_retention_period = var.environment == "production" ? 30 : 7
    deletion_protection     = var.environment == "production"

    performance_insights_enabled = true
  }

  providers = {
    aws = provider.aws.this
  }
}

# Redis cache
component "cache" {
  source = "./components/database/elasticache"

  inputs = {
    cluster_id     = "${var.environment}-platform"
    engine         = "redis"
    engine_version = "7.0"

    node_type       = var.environment == "production" ? "cache.r6g.large" : "cache.t4g.medium"
    num_cache_nodes = var.environment == "production" ? 3 : 1

    vpc_id             = component.vpc.vpc_id
    subnet_ids         = component.vpc.elasticache_subnet_ids
    security_group_ids = [component.vpc.cache_security_group_id]

    automatic_failover_enabled = var.environment == "production"
    multi_az_enabled           = var.environment == "production"
  }

  providers = {
    aws = provider.aws.this
  }
}

# Microservice deployments
component "services" {
  source   = "./components/microservices"
  for_each = var.services

  inputs = {
    name      = each.key
    namespace = "applications"

    replicas = each.value.replicas
    cpu      = each.value.cpu
    memory   = each.value.memory
    port     = each.value.port

    database_url = component.database.connection_string
    redis_url    = component.cache.connection_string

    domain          = var.domain
    certificate_arn = component.certificate.arn
  }

  providers = {
    kubernetes = provider.kubernetes.this
  }
}

# Outputs
output "eks_endpoint" {
  value = component.eks.cluster_endpoint
}

output "database_endpoint" {
  value     = component.database.endpoint
  sensitive = true
}

output "service_endpoints" {
  value = {
    for name, svc in component.services : name => svc.endpoint
  }
}

Deployment Configuration

# deployments.tfdeploy.hcl

identity_token "aws" {
  audience = ["aws.workload.identity"]
}

locals {
  services = {
    api = {
      replicas = 3
      cpu      = "500m"
      memory   = "1Gi"
      port     = 8080
    }
    worker = {
      replicas = 2
      cpu      = "1000m"
      memory   = "2Gi"
      port     = 9090
    }
    scheduler = {
      replicas = 1
      cpu      = "250m"
      memory   = "512Mi"
      port     = 9091
    }
  }
}

deployment "development" {
  inputs = {
    region         = "us-east-1"
    environment    = "development"
    vpc_cidr       = "10.10.0.0/16"
    domain         = "dev.example.com"
    services       = local.services
    role_arn       = "arn:aws:iam::111111111111:role/stacks-dev"
    identity_token = identity_token.aws.jwt
  }
}

deployment "staging" {
  inputs = {
    region         = "us-east-1"
    environment    = "staging"
    vpc_cidr       = "10.20.0.0/16"
    domain         = "staging.example.com"
    services       = local.services
    role_arn       = "arn:aws:iam::222222222222:role/stacks-staging"
    identity_token = identity_token.aws.jwt
  }
}

deployment "production-us-east-1" {
  inputs = {
    region         = "us-east-1"
    environment    = "production"
    vpc_cidr       = "10.0.0.0/16"
    domain         = "api.example.com"
    services = {
      api = {
        replicas = 10
        cpu      = "1000m"
        memory   = "2Gi"
        port     = 8080
      }
      worker = {
        replicas = 5
        cpu      = "2000m"
        memory   = "4Gi"
        port     = 9090
      }
      scheduler = {
        replicas = 2
        cpu      = "500m"
        memory   = "1Gi"
        port     = 9091
      }
    }
    role_arn       = "arn:aws:iam::333333333333:role/stacks-prod"
    identity_token = identity_token.aws.jwt
  }
}

deployment "production-eu-west-1" {
  inputs = {
    region         = "eu-west-1"
    environment    = "production"
    vpc_cidr       = "10.1.0.0/16"
    domain         = "eu.api.example.com"
    services = {
      api = {
        replicas = 5
        cpu      = "1000m"
        memory   = "2Gi"
        port     = 8080
      }
      worker = {
        replicas = 3
        cpu      = "2000m"
        memory   = "4Gi"
        port     = 9090
      }
      scheduler = {
        replicas = 1
        cpu      = "500m"
        memory   = "1Gi"
        port     = 9091
      }
    }
    role_arn       = "arn:aws:iam::333333333333:role/stacks-prod-eu"
    identity_token = identity_token.aws.jwt
  }
}

# Orchestration rules
orchestrate "auto_approve" "dev" {
  check {
    condition = context.plan.deployment == deployment.development
    reason    = "Auto-approve development"
  }
}

orchestrate "auto_approve" "staging" {
  check {
    condition = context.plan.deployment == deployment.staging
    reason    = "Auto-approve staging after development"
  }
  depends_on = [deployment.development]
}

orchestrate "manual_approval" "production" {
  check {
    condition = (
      context.plan.deployment == deployment.production-us-east-1 ||
      context.plan.deployment == deployment.production-eu-west-1
    )
    reason = "Production requires manual approval"
  }
  depends_on = [deployment.staging]
}

orchestrate "sequential" "production_regions" {
  check {
    condition = context.plan.deployment == deployment.production-eu-west-1
    reason    = "Deploy EU after US"
  }
  depends_on = [deployment.production-us-east-1]
}

Common Problems and Solutions

Problem                                    | Cause                              | Solution
Circular dependencies between Components   | Incorrect cross-references         | Redesign Component boundaries and extract shared resources
Deployment cannot recover after a failure  | Inconsistent state                 | Use tfstacks refresh to re-synchronize state
Provider authentication failures           | Expired identity token             | Verify the OIDC configuration
Sharing data across Deployments            | Outputs not configured             | Use Stack outputs and remote state (see the sketch below)
Incorrect deployment order                 | Misconfigured Orchestration Rules  | Review the depends_on settings
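
For the cross-Deployment data-sharing case, linked Stacks can pass published outputs from one Stack to another. The sketch below is only a rough illustration, assuming the publish_output and upstream_input blocks available in recent HCP Terraform Stacks releases; the Stack source address and output names are placeholders:

# deployments.tfdeploy.hcl of the upstream (network) Stack:
# expose a deployment output so other Stacks can consume it
publish_output "network_vpc_id" {
  value = deployment.production.vpc_id
}

# deployments.tfdeploy.hcl of the downstream (application) Stack:
# declare the upstream Stack and read its published output
upstream_input "network_stack" {
  type   = "stack"
  source = "app.terraform.io/my-organization/my-project/network-stack"   # placeholder address
}

deployment "production" {
  inputs = {
    vpc_id = upstream_input.network_stack.network_vpc_id
  }
}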

Summary

Terraform Stacks brings major improvements to large-scale infrastructure management:

  • Unified management: define once, deploy everywhere, with less duplicated configuration
  • Automated coordination: Orchestration Rules manage complex deployment dependencies
  • Security: integrated OIDC authentication avoids exposing long-lived credentials
  • Observability: centralized state management and run history
  • Team collaboration: fine-grained access control and approval workflows

For organizations that manage multi-environment, multi-region infrastructure, Terraform Stacks offers a powerful and flexible solution that can significantly improve DevOps team efficiency and infrastructure consistency.
