AWS ECS Blue-Green Deployment and Rolling Update Strategies

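In modern application delivery, choosing the right deployment strategy is critical to keeping a service available and reducing release risk. This article examines the two main deployment strategies in AWS ECS (Elastic Container Service), rolling update and blue-green deployment, and provides a complete implementation guide.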

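1. Deployment Strategy Overview

Rolling Update

Rolling update is the default ECS deployment strategy. It gradually replaces tasks running the old version with tasks running the new version:

  • Starts new tasks and stops old tasks in batches
  • Keeps the service running throughout the deployment
  • Suits cost-sensitive workloads that can tolerate a short period of mixed versions

Advantages:

  • Efficient resource usage
  • Simple configuration
  • Lower cost

Disadvantages:

  • Slower rollback, because rolling back is itself another rolling deployment
  • Old and new versions may serve traffic at the same time during a deployment
  • Hard to test the new version fully before it receives production traffic

Blue-Green Deployment

Blue-green deployment maintains two fully independent environments (blue and green), only one of which serves production traffic at any time:

  • The new version is deployed to the inactive environment
  • Traffic is switched at the load balancer, so the cutover is near-instant
  • Suits workloads with very high stability requirements

Advantages:

  • Zero downtime
  • Fast rollback
  • Full pre-release testing before the cutover

Disadvantages:

  • Requires roughly double the compute resources while both environments are running
  • More complex configuration
  • Higher cost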

2. Rolling Update Configuration

ECS Service Rolling Update Configuration

A rolling update on an ECS service is controlled mainly by two parameters:

{
  "deploymentConfiguration": {
    "minimumHealthyPercent": 50,
    "maximumPercent": 200
  }
}

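Parameters:

  • minimumHealthyPercent: the lower limit on the number of tasks that must remain running during a deployment, as a percentage of the desired count (rounded up)
  • maximumPercent: the upper limit on the number of running or pending tasks allowed during a deployment, as a percentage of the desired count (rounded down)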
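
To make the two percentages concrete, the following minimal sketch converts them into task counts for a given desired count. The function name rolling_update_bounds is hypothetical and only illustrates the arithmetic; it assumes the lower bound is rounded up and the upper bound is rounded down, and it does not call any AWS API.

```python
def rolling_update_bounds(desired_count: int,
                          minimum_healthy_percent: int,
                          maximum_percent: int) -> tuple[int, int]:
    """Return (min_running, max_total) task counts during a rolling update."""
    product_min = desired_count * minimum_healthy_percent
    # Lower bound: rounded up (integer ceiling division)
    min_running = -(-product_min // 100)
    # Upper bound: rounded down (integer floor division)
    max_total = (desired_count * maximum_percent) // 100
    return min_running, max_total


if __name__ == "__main__":
    low, high = rolling_update_bounds(4, 50, 200)
    print(f"desired=4: at least {low} running, at most {high} running or pending")
```

With the values from the configuration above (desired count 4, 50/200), ECS may stop up to two old tasks at once and may run up to eight tasks in total while the deployment is in progress.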

Creating a Rolling Update Service with the AWS CLI

# Create an ECS service with a rolling update configuration
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:1 \
  --desired-count 4 \
  --launch-type FARGATE \
  --deployment-configuration "minimumHealthyPercent=50,maximumPercent=200" \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678],securityGroups=[sg-12345678],assignPublicIp=ENABLED}"

Updating the Service to Trigger a Rolling Deployment

# Trigger a rolling deployment with the new task definition revision
aws ecs update-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:2 \
  --force-new-deployment

Monitoring Rolling Update Status

# View the service's deployment status
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].deployments'

# View the 10 most recent service events
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].events[:10]'
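
The deployments list returned by describe-services can be summarized with a few lines of code. The sketch below is only an illustration: rollout_summary is a hypothetical helper, and the sample list is made-up data shaped like the status, desiredCount, runningCount, and rolloutState fields of an ECS deployment object, not real output.

```python
def rollout_summary(deployments: list[dict]) -> str:
    """Summarize the PRIMARY deployment from a describe-services deployments list."""
    primary = next((d for d in deployments if d.get("status") == "PRIMARY"), None)
    if primary is None:
        return "no primary deployment"
    state = primary.get("rolloutState", "UNKNOWN")
    running = primary.get("runningCount", 0)
    desired = primary.get("desiredCount", 0)
    return f"{state}: {running}/{desired} tasks running"


# Hypothetical sample data for illustration only
sample = [
    {"status": "PRIMARY", "taskDefinition": "my-task:2",
     "desiredCount": 4, "runningCount": 2, "rolloutState": "IN_PROGRESS"},
    {"status": "ACTIVE", "taskDefinition": "my-task:1",
     "desiredCount": 4, "runningCount": 2},
]

if __name__ == "__main__":
    print(rollout_summary(sample))
```

A rolling update can be treated as finished once the PRIMARY deployment reports COMPLETED and no ACTIVE deployment remains in the list.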

3. Blue-Green Deployment Architecture

Architecture Components

A blue-green deployment architecture consists of the following core components:

                    ┌─────────────────┐
                    │  Application    │
                    │  Load Balancer  │
                    └────────┬────────┘
              ┌──────────────┴──────────────┐
              │                             │
    ┌─────────┴─────────┐         ┌─────────┴─────────┐
    │  Target Group 1   │         │  Target Group 2   │
    │     (Blue)        │         │     (Green)       │
    └─────────┬─────────┘         └─────────┬─────────┘
              │                             │
    ┌─────────┴─────────┐         ┌─────────┴─────────┐
    │    ECS Service    │         │    ECS Service    │
    │    (Blue - v1)    │         │    (Green - v2)   │
    └───────────────────┘         └───────────────────┘

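ECS Blue-Green Deployment Types

ECS services can use the following deployment controller types for blue-green deployments:

  1. CODE_DEPLOY: AWS CodeDeploy manages the blue-green deployment
  2. EXTERNAL: a third-party (external) deployment controller manages the deployment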

Creating an ECS Service for Blue-Green Deployment

# Create an ECS service that uses the CodeDeploy deployment controller
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-bg-service \
  --task-definition my-task:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --deployment-controller type=CODE_DEPLOY \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/blue-tg/xxx,containerName=my-container,containerPort=80" \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678],securityGroups=[sg-12345678],assignPublicIp=ENABLED}"

4. CodeDeploy Integration

Creating the CodeDeploy Application

# Create the CodeDeploy application
aws deploy create-application \
  --application-name my-ecs-app \
  --compute-platform ECS

# Create the deployment group
aws deploy create-deployment-group \
  --application-name my-ecs-app \
  --deployment-group-name my-deployment-group \
  --deployment-config-name CodeDeployDefault.ECSLinear10PercentEvery1Minutes \
  --service-role-arn arn:aws:iam::123456789012:role/CodeDeployServiceRole \
  --ecs-services clusterName=my-cluster,serviceName=my-bg-service \
  --load-balancer-info "targetGroupPairInfoList=[{targetGroups=[{name=blue-tg},{name=green-tg}],prodTrafficRoute={listenerArns=[arn:aws:elasticloadbalancing:region:account:listener/app/my-alb/xxx/yyy]}}]" \
  --deployment-style deploymentType=BLUE_GREEN,deploymentOption=WITH_TRAFFIC_CONTROL \
  --blue-green-deployment-configuration "terminateBlueInstancesOnDeploymentSuccess={action=TERMINATE,terminationWaitTimeInMinutes=5},deploymentReadyOption={actionOnTimeout=CONTINUE_DEPLOYMENT,waitTimeInMinutes=0}"

AppSpec File Configuration

Create an appspec.yaml file:

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:region:account:task-definition/my-task:2"
        LoadBalancerInfo:
          ContainerName: "my-container"
          ContainerPort: 80
        PlatformVersion: "LATEST"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - "subnet-12345678"
              - "subnet-87654321"
            SecurityGroups:
              - "sg-12345678"
            AssignPublicIp: "ENABLED"

Hooks:
  - BeforeInstall: "LambdaFunctionToValidateBeforeInstall"
  - AfterInstall: "LambdaFunctionToValidateAfterInstall"
  - AfterAllowTestTraffic: "LambdaFunctionToValidateAfterTestTrafficStarts"
  - BeforeAllowTraffic: "LambdaFunctionToValidateBeforeAllowingProductionTraffic"
  - AfterAllowTraffic: "LambdaFunctionToValidateAfterAllowingProductionTraffic"

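Deployment Configuration Options

CodeDeploy provides several predefined deployment configurations for ECS:

Configuration name                                  Description
CodeDeployDefault.ECSAllAtOnce                      Shifts all traffic at once
CodeDeployDefault.ECSLinear10PercentEvery1Minutes   Shifts 10% of traffic every minute
CodeDeployDefault.ECSLinear10PercentEvery3Minutes   Shifts 10% of traffic every 3 minutes
CodeDeployDefault.ECSCanary10Percent5Minutes        Shifts 10% first, then the remaining 90% after 5 minutes
CodeDeployDefault.ECSCanary10Percent15Minutes       Shifts 10% first, then the remaining 90% after 15 minutes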
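
The difference between the linear and canary configurations is easier to see as a timeline. The sketch below computes the share of traffic on the new (green) version over time; linear_schedule and canary_schedule are hypothetical helpers, and the model assumes the first increment is applied at minute 0, which is a simplification rather than a statement of CodeDeploy's exact timing.

```python
def linear_schedule(step_percent: int, interval_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, green_percent) pairs for a linear traffic shift."""
    schedule = []
    minute, shifted = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


def canary_schedule(first_percent: int, wait_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, green_percent) pairs for a two-step canary shift."""
    return [(0, first_percent), (wait_minutes, 100)]


if __name__ == "__main__":
    print("Linear10PercentEvery1Minutes:", linear_schedule(10, 1))
    print("Canary10Percent5Minutes:", canary_schedule(10, 5))
```

Under this model, Linear10PercentEvery1Minutes finishes shifting at minute 9 in ten equal steps, while Canary10Percent5Minutes exposes 10% of traffic for five minutes and then moves everything at once.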

Triggering a Deployment

# Create a deployment
aws deploy create-deployment \
  --application-name my-ecs-app \
  --deployment-group-name my-deployment-group \
  --revision revisionType=AppSpecContent,appSpecContent="{content='$(cat appspec.yaml)'}"

# Check the deployment status
aws deploy get-deployment \
  --deployment-id d-XXXXXXXXX \
  --query 'deploymentInfo.status'

5. Health Checks and Rollback

ALB Health Check Configuration

# Create a target group and configure its health check
aws elbv2 create-target-group \
  --name my-target-group \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-12345678 \
  --target-type ip \
  --health-check-protocol HTTP \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --matcher HttpCode=200
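
The interval and threshold settings above determine roughly how long the ALB takes to change its view of a target. The following sketch works through that arithmetic; seconds_to_state_change is a hypothetical helper, and the result is an approximation that ignores the check timeout and where in the interval the first check happens to fall.

```python
def seconds_to_state_change(interval_seconds: int, threshold_count: int) -> int:
    """Approximate time for a target to flip state: consecutive checks times interval."""
    return interval_seconds * threshold_count


if __name__ == "__main__":
    # Values taken from the target group settings above
    healthy_after = seconds_to_state_change(30, 2)
    unhealthy_after = seconds_to_state_change(30, 3)
    print(f"New target marked healthy after about {healthy_after}s")
    print(f"Failing target marked unhealthy after about {unhealthy_after}s")
```

With a 30-second interval, a new task needs about a minute of passing checks before it receives traffic, which adds directly to the duration of every deployment. Shortening the interval speeds up deployments at the cost of more health check requests.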

Health Check in the ECS Task Definition

The container health check command runs inside the container, so the image must include curl:

{
  "containerDefinitions": [
    {
      "name": "my-container",
      "image": "my-app:latest",
      "healthCheck": {
        "command": [
          "CMD-SHELL",
          "curl -f http://localhost:80/health || exit 1"
        ],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "portMappings": [
        {
          "containerPort": 80,
          "protocol": "tcp"
        }
      ]
    }
  ]
}

Automatic Rollback Configuration

# Update the deployment group to enable automatic rollback
aws deploy update-deployment-group \
  --application-name my-ecs-app \
  --current-deployment-group-name my-deployment-group \
  --auto-rollback-configuration "enabled=true,events=[DEPLOYMENT_FAILURE,DEPLOYMENT_STOP_ON_REQUEST]"

Manual Rollback

# Stop the current deployment and roll back
aws deploy stop-deployment \
  --deployment-id d-XXXXXXXXX \
  --auto-rollback-enabled

# Or redeploy the previous version
aws deploy create-deployment \
  --application-name my-ecs-app \
  --deployment-group-name my-deployment-group \
  --revision revisionType=AppSpecContent,appSpecContent="{content='$(cat appspec-rollback.yaml)'}"

CloudWatch Alarm for Rollback Monitoring

# Create a CloudWatch alarm to monitor deployments
aws cloudwatch put-metric-alarm \
  --alarm-name ecs-deployment-failures \
  --alarm-description "Alarm when ECS deployment has failures" \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ApplicationELB \
  --statistic Sum \
  --period 60 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=LoadBalancer,Value=app/my-alb/xxx Name=TargetGroup,Value=targetgroup/my-tg/yyy \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:region:account:my-topic
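
The alarm above fires when the 5xx count exceeds 10 in each of 2 consecutive 60-second periods. The sketch below mimics that evaluation on a list of per-period sums. alarm_state is a hypothetical helper that simplifies CloudWatch's behavior: it ignores missing-data handling and assumes every evaluation period must breach the threshold.

```python
def alarm_state(datapoints: list[float],
                threshold: float = 10,
                evaluation_periods: int = 2) -> str:
    """Return the alarm state based on the most recent datapoints."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # GreaterThanThreshold is a strict comparison
    return "ALARM" if all(value > threshold for value in recent) else "OK"


if __name__ == "__main__":
    # Hypothetical per-minute 5xx counts during a deployment
    print(alarm_state([0, 12, 15]))
    print(alarm_state([12, 3]))
```

Requiring two consecutive breaching periods avoids alarming on a single short burst of errors, at the cost of reacting about one minute later.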

6. ALB Target Group Switching

Creating Two Target Groups

# Create the blue target group
aws elbv2 create-target-group \
  --name blue-target-group \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-12345678 \
  --target-type ip \
  --health-check-path /health

# Create the green target group
aws elbv2 create-target-group \
  --name green-target-group \
  --protocol HTTP \
  --port 80 \
  --vpc-id vpc-12345678 \
  --target-type ip \
  --health-check-path /health

Configuring Listener Rules

# Create the production listener (forwards to blue)
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xxx \
  --protocol HTTP \
  --port 80 \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/blue-target-group/xxx

# Create the test listener (forwards to green)
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xxx \
  --protocol HTTP \
  --port 8080 \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/green-target-group/xxx

Switching Traffic Manually

# Point the production listener at the green target group
aws elbv2 modify-listener \
  --listener-arn arn:aws:elasticloadbalancing:region:account:listener/app/my-alb/xxx/yyy \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/green-target-group/xxx

Weighted Traffic Shifting (Canary Deployment)

# Configure weighted forwarding: 90% blue, 10% green
aws elbv2 modify-listener \
  --listener-arn arn:aws:elasticloadbalancing:region:account:listener/app/my-alb/xxx/yyy \
  --default-actions '[
    {
      "Type": "forward",
      "ForwardConfig": {
        "TargetGroups": [
          {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/blue-target-group/xxx",
            "Weight": 90
          },
          {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/green-target-group/xxx",
            "Weight": 10
          }
        ]
      }
    }
  ]'
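
When traffic is shifted in several steps, it can help to generate the default-actions JSON instead of editing it by hand. The sketch below builds the same structure as the command above for any green share. weighted_forward_action is a hypothetical helper, the ARNs are placeholders, and keeping the two weights summing to 100 is a convention chosen here so that they read as percentages.

```python
import json


def weighted_forward_action(blue_arn: str, green_arn: str, green_weight: int) -> list[dict]:
    """Build a forward action that splits traffic between two target groups."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_arn, "Weight": green_weight},
            ]
        },
    }]


if __name__ == "__main__":
    # Placeholder ARNs; substitute the real target group ARNs
    for percent in (10, 50, 100):
        action = weighted_forward_action("<blue-tg-arn>", "<green-tg-arn>", percent)
        print(json.dumps(action))
```

Each printed line can be passed as the value of --default-actions in aws elbv2 modify-listener to move to the next step of the canary.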

7. Terraform Deployment Example

Complete Terraform Configuration for Blue-Green Deployment

# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "ap-northeast-1"
}

# variables.tf
variable "cluster_name" {
  description = "ECS Cluster name"
  type        = string
  default     = "my-ecs-cluster"
}

variable "service_name" {
  description = "ECS Service name"
  type        = string
  default     = "my-app-service"
}

variable "container_port" {
  description = "Container port"
  type        = number
  default     = 80
}

# vpc.tf
data "aws_vpc" "main" {
  default = true
}

data "aws_subnets" "public" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
}

# security_groups.tf
resource "aws_security_group" "alb" {
  name        = "${var.service_name}-alb-sg"
  description = "Security group for ALB"
  vpc_id      = data.aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "ecs" {
  name        = "${var.service_name}-ecs-sg"
  description = "Security group for ECS tasks"
  vpc_id      = data.aws_vpc.main.id

  ingress {
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# alb.tf
resource "aws_lb" "main" {
  name               = "${var.service_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = data.aws_subnets.public.ids

  enable_deletion_protection = false

  tags = {
    Name = "${var.service_name}-alb"
  }
}

# Blue Target Group
resource "aws_lb_target_group" "blue" {
  name        = "${var.service_name}-blue-tg"
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = data.aws_vpc.main.id
  target_type = "ip"

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 3
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Green Target Group
resource "aws_lb_target_group" "green" {
  name        = "${var.service_name}-green-tg"
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = data.aws_vpc.main.id
  target_type = "ip"

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/health"
    port                = "traffic-port"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 3
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Production Listener
resource "aws_lb_listener" "production" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }

  lifecycle {
    ignore_changes = [default_action]
  }
}

# Test Listener
resource "aws_lb_listener" "test" {
  load_balancer_arn = aws_lb.main.arn
  port              = 8080
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.green.arn
  }

  lifecycle {
    ignore_changes = [default_action]
  }
}

# ecs.tf
resource "aws_ecs_cluster" "main" {
  name = var.cluster_name

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name = aws_ecs_cluster.main.name

  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    base              = 1
    weight            = 100
    capacity_provider = "FARGATE"
  }
}

# Task Execution Role
resource "aws_iam_role" "ecs_task_execution" {
  name = "${var.service_name}-task-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Task Definition
resource "aws_ecs_task_definition" "main" {
  family                   = var.service_name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn

  container_definitions = jsonencode([
    {
      name  = "app"
      image = "nginx:latest" # placeholder; the image must serve /health and include curl for the health checks to pass
      portMappings = [
        {
          containerPort = var.container_port
          protocol      = "tcp"
        }
      ]
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:${var.container_port}/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/${var.service_name}"
          "awslogs-region"        = "ap-northeast-1"
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "ecs" {
  name              = "/ecs/${var.service_name}"
  retention_in_days = 30
}

# ECS Service with Blue-Green Deployment
resource "aws_ecs_service" "main" {
  name            = var.service_name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.main.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  deployment_controller {
    type = "CODE_DEPLOY"
  }

  network_configuration {
    subnets          = data.aws_subnets.public.ids
    security_groups  = [aws_security_group.ecs.id]
    assign_public_ip = true
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "app"
    container_port   = var.container_port
  }

  lifecycle {
    ignore_changes = [
      task_definition,
      load_balancer
    ]
  }

  depends_on = [aws_lb_listener.production]
}

# codedeploy.tf
resource "aws_iam_role" "codedeploy" {
  name = "${var.service_name}-codedeploy-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "codedeploy.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "codedeploy" {
  role       = aws_iam_role.codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}

resource "aws_codedeploy_app" "main" {
  compute_platform = "ECS"
  name             = "${var.service_name}-app"
}

resource "aws_codedeploy_deployment_group" "main" {
  app_name               = aws_codedeploy_app.main.name
  deployment_config_name = "CodeDeployDefault.ECSLinear10PercentEvery1Minutes"
  deployment_group_name  = "${var.service_name}-deployment-group"
  service_role_arn       = aws_iam_role.codedeploy.arn

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_REQUEST"]
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.main.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.production.arn]
      }

      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }

      target_group {
        name = aws_lb_target_group.blue.name
      }

      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }
}

# outputs.tf
output "alb_dns_name" {
  description = "ALB DNS name"
  value       = aws_lb.main.dns_name
}

output "production_url" {
  description = "Production URL"
  value       = "http://${aws_lb.main.dns_name}"
}

output "test_url" {
  description = "Test URL"
  value       = "http://${aws_lb.main.dns_name}:8080"
}

output "codedeploy_app_name" {
  description = "CodeDeploy Application name"
  value       = aws_codedeploy_app.main.name
}

output "codedeploy_deployment_group" {
  description = "CodeDeploy Deployment Group name"
  value       = aws_codedeploy_deployment_group.main.deployment_group_name
}

Deploying with Terraform

# Initialize Terraform
terraform init

# Preview the changes
terraform plan

# Apply the changes
terraform apply

# Show the outputs
terraform output

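8. Best Practices and Troubleshooting

Deployment Best Practices

Health Check Configuration

  1. Set an appropriate startPeriod: give the application enough time to start
  2. Use a dedicated health check endpoint: avoid using the home page as the health check
  3. Implement deep health checks: verify dependencies such as database connections and external services
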
{
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost/health/ready || exit 1"],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 120
  }
}

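Deployment Configuration Recommendations

Scenario            minimumHealthyPercent   maximumPercent   Notes
High availability   100                     200              Ensures zero downtime
Cost-sensitive      50                      100              Reduces extra resource usage
Fast deployment     0                       200              Suitable for development environments
Balanced            50                      150              Typical production environment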

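Blue-Green Deployment Considerations

  1. Database compatibility: make sure the old and new versions can share the same database schema
  2. Session handling: use an external session store (such as Redis)
  3. Cache invalidation: consider clearing CDN and application caches after a deployment
  4. Feature toggles: use feature flags to control the release of new functionality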

Troubleshooting Common Issues

Deployment Stuck or Failed

# Check the service events
aws ecs describe-services \
  --cluster my-cluster \
  --services my-service \
  --query 'services[0].events[:10]'

# List the service's tasks
aws ecs list-tasks \
  --cluster my-cluster \
  --service-name my-service

# Show why a task stopped
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-arn> \
  --query 'tasks[0].stoppedReason'

Health Check Failures

# Check target health in the target group
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/xxx

# Check the container logs
aws logs get-log-events \
  --log-group-name /ecs/my-service \
  --log-stream-name ecs/app/<task-id>

CodeDeploy Deployment Issues

# View deployment details
aws deploy get-deployment \
  --deployment-id d-XXXXXXXXX

# View a deployment target, including its lifecycle events
aws deploy get-deployment-target \
  --deployment-id d-XXXXXXXXX \
  --target-id my-cluster:my-service

# List the deployment's target IDs
aws deploy list-deployment-targets \
  --deployment-id d-XXXXXXXXX

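Handling a Failed Rollback

# Manually update the service back to the previous version
# (rolling-update services only; a service that uses the CODE_DEPLOY controller must be redeployed through CodeDeploy)
aws ecs update-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:1 \
  --force-new-deployment

# Or modify the listener to switch traffic directly
aws elbv2 modify-listener \
  --listener-arn <listener-arn> \
  --default-actions Type=forward,TargetGroupArn=<previous-target-group-arn>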

Monitoring and Alerting Setup

# Create a deployment monitoring dashboard
# (RunningTaskCount is a Container Insights metric, so Container Insights must be enabled on the cluster)
aws cloudwatch put-dashboard \
  --dashboard-name ECS-Deployment-Monitor \
  --dashboard-body '{
    "widgets": [
      {
        "type": "metric",
        "properties": {
          "title": "ECS Running Tasks",
          "region": "ap-northeast-1",
          "metrics": [
            ["ECS/ContainerInsights", "RunningTaskCount", "ClusterName", "my-cluster", "ServiceName", "my-service"]
          ]
        }
      },
      {
        "type": "metric",
        "properties": {
          "title": "Target Response Time",
          "region": "ap-northeast-1",
          "metrics": [
            ["AWS/ApplicationELB", "TargetResponseTime", "LoadBalancer", "app/my-alb/xxx", "TargetGroup", "targetgroup/my-tg/yyy"]
          ]
        }
      },
      {
        "type": "metric",
        "properties": {
          "title": "HTTP 5xx Errors",
          "region": "ap-northeast-1",
          "metrics": [
            ["AWS/ApplicationELB", "HTTPCode_Target_5XX_Count", "LoadBalancer", "app/my-alb/xxx"]
          ]
        }
      }
    ]
  }'

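Security Considerations

  1. Least privilege: grant only the IAM permissions that are required
  2. Network isolation: run ECS tasks in private subnets
  3. Secret management: use AWS Secrets Manager or Parameter Store
  4. Image scanning: enable ECR image scanning

# Enable ECR scan on push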
aws ecr put-image-scanning-configuration \
  --repository-name my-repo \
  --image-scanning-configuration scanOnPush=true

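Summary

The right deployment strategy depends on your application's requirements, your team's capabilities, and your budget:

  • Rolling update: fits most cases, simple to configure, lower cost
  • Blue-green deployment: fits critical applications that need fast rollback and zero downtime

Whichever strategy you choose, make sure you have:

  1. Thorough health checks
  2. Appropriate monitoring and alerting
  3. A clear rollback procedure
  4. Regular deployment drills

With this guide, you should be able to implement both rolling updates and blue-green deployments in AWS ECS and give your application a reliable, safe deployment process.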
