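AWS ECS Service Connect Service Mesh

Service Connect Overview

AWS ECS Service Connect is a native service-mesh capability that Amazon ECS introduced in 2022 to simplify communication between microservices. It provides service discovery, load balancing, and observability without requiring you to run additional infrastructure.

Core Features of Service Connect

Simplified service discovery: service registration and DNS resolution are handled automatically, with no manual configuration
Built-in load balancing: client-side load balancing, with no additional load balancer required
Unified observability: connection metrics are collected automatically and published to CloudWatch
No code changes: applications can use the mesh features without modification
Deep ECS integration: native support for ECS services, with simple configuration

How It Works

Service Connect injects an Envoy proxy sidecar container into each ECS task; the proxy handles the task's inbound and outbound Service Connect traffic: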

┌─────────────────────────────────────────────────────────────┐
│                        ECS Task                             │
│  ┌──────────────────┐    ┌─────────────────────────────┐   │
│  │                  │    │                             │   │
│  │   Application    │◄───│      Envoy Proxy            │   │
│  │   Container      │    │      (Service Connect)      │   │
│  │                  │───►│                             │   │
│  └──────────────────┘    └─────────────────────────────┘   │
│                                      │                      │
└──────────────────────────────────────│──────────────────────┘
                                       │
                          ┌────────────▼────────────┐
                          │   Cloud Map Namespace   │
                          │   (Service Discovery)   │
                          └─────────────────────────┘

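Comparison with App Mesh

Before Service Connect was introduced, AWS App Mesh was the primary way to build a service mesh on AWS. The two are compared in detail below.

Feature Comparison

Feature	Service Connect	App Mesh
Setup complexity	Low (native ECS integration)	High (requires additional resource configuration)
Service discovery	AWS Cloud Map	AWS Cloud Map
Proxy	Envoy (managed automatically)	Envoy (configured manually)
Traffic routing	Basic load balancing	Advanced routing rules
Retry policy	Built-in defaults	Fully customizable
Circuit breaking	Built in	Fully customizable
mTLS	Not supported (as of this writing)	Supported
Cross-account / cross-cluster	Not supported (as of this writing)	Supported
Observability	CloudWatch integration	X-Ray, CloudWatch

Recommendations

When to choose Service Connect:

You want service-mesh functionality quickly
Your main requirements are service discovery and basic load balancing
All services run in the same ECS cluster
The team has little service-mesh experience

When to choose App Mesh:

You need advanced traffic management (canary deployments, traffic splitting)
You need mTLS to encrypt service-to-service traffic
Services communicate across clusters or accounts
You need to integrate with services running on EKS or EC2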

Migration Considerations

# List existing App Mesh resources
aws appmesh list-meshes

# List the virtual services in a specific mesh
aws appmesh list-virtual-services --mesh-name my-mesh

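When migrating from App Mesh to Service Connect, keep the following in mind:

Service Connect does not currently support mTLS; if you need mutual authentication between services, handle it separately
Advanced routing rules have to be implemented at the application layer
You may need to maintain two sets of configuration during the migration

Namespace Configuration

A Cloud Map namespace is the core component of Service Connect: every service registers into a namespace so that other services can discover it.

Creating a Cloud Map Namespace

# Create an HTTP namespace (recommended for Service Connect)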
aws servicediscovery create-http-namespace \
  --name production \
  --description "Production services namespace"

# Show the namespace details
aws servicediscovery get-namespace \
  --id ns-xxxxxxxxxxxxxxxxx

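Namespace Type Comparison

Type	Purpose	Service Connect support
HTTP Namespace	Service discovery through API calls only	Fully supported
DNS Private Namespace	DNS resolution within a VPC	Partially supported
DNS Public Namespace	Public DNS resolution	Not supported

Enabling Service Connect on an ECS Cluster

# Update the cluster to set a default Service Connect namespace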
aws ecs update-cluster \
  --cluster my-cluster \
  --service-connect-defaults namespace=arn:aws:servicediscovery:ap-northeast-1:123456789012:namespace/ns-xxxxxxxxx

# Verify the cluster configuration
aws ecs describe-clusters \
  --clusters my-cluster \
  --include SETTINGS

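Creating a Namespace in the AWS Console

Open the AWS Cloud Map console
Click "Create namespace"
Select "API calls" as the instance discovery method
Enter a namespace name (for example, production)
Once the namespace is created, enable it in the ECS cluster settings

Server and Client Configuration

Service Connect distinguishes two roles: the server, which provides a service, and the client, which calls it. A single service can play both roles at once.

Server Configuration (Server/Producer)

A server must give its portMappings entry a name in the task definition and reference that name in its Service Connect configuration: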

{
  "family": "backend-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/backend-api:latest",
      "essential": true,
      "portMappings": [
        {
          "name": "api-port",
          "containerPort": 8080,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/backend-api",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

Create the server-side ECS service:

aws ecs create-service \
  --cluster my-cluster \
  --service-name backend-api \
  --task-definition backend-api:1 \
  --desired-count 3 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-private-1,subnet-private-2],
    securityGroups=[sg-ecs-tasks],
    assignPublicIp=DISABLED
  }" \
  --service-connect-configuration '{
    "enabled": true,
    "namespace": "production",
    "services": [
      {
        "portName": "api-port",
        "discoveryName": "backend-api",
        "clientAliases": [
          {
            "port": 8080,
            "dnsName": "backend-api"
          }
        ]
      }
    ]
  }'
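
The portName in the Service Connect configuration has to match a named port mapping in the task definition. The following is a minimal Python sketch of that check; build_service_connect_config is a hypothetical helper written for this article, not part of any AWS SDK, and it only builds the dictionary that would be passed as the service's serviceConnectConfiguration.

```python
import json


def build_service_connect_config(task_definition, namespace, port_name,
                                 discovery_name, client_port, dns_name=None):
    """Build a serviceConnectConfiguration dict for a server-side service.

    Raises ValueError if port_name is not a named port mapping in the
    task definition (hypothetical pre-deployment check).
    """
    named_ports = {
        mapping["name"]
        for container in task_definition.get("containerDefinitions", [])
        for mapping in container.get("portMappings", [])
        if "name" in mapping
    }
    if port_name not in named_ports:
        raise ValueError(
            f"portName {port_name!r} not found; named ports: {sorted(named_ports)}"
        )

    alias = {"port": client_port}
    if dns_name is not None:
        alias["dnsName"] = dns_name

    return {
        "enabled": True,
        "namespace": namespace,
        "services": [
            {
                "portName": port_name,
                "discoveryName": discovery_name,
                "clientAliases": [alias],
            }
        ],
    }


# Trimmed-down version of the backend-api task definition shown earlier
task_definition = {
    "family": "backend-api",
    "containerDefinitions": [
        {
            "name": "api",
            "portMappings": [
                {"name": "api-port", "containerPort": 8080, "appProtocol": "http"}
            ],
        }
    ],
}

config = build_service_connect_config(
    task_definition, "production", "api-port", "backend-api", 8080, "backend-api"
)
print(json.dumps(config, indent=2))
```

Running a check like this before calling create-service can surface a mismatched port name early instead of during deployment.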

Client Configuration (Client/Consumer)

A client only needs Service Connect enabled to reach other services by name:

aws ecs create-service \
  --cluster my-cluster \
  --service-name frontend-web \
  --task-definition frontend-web:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-private-1,subnet-private-2],
    securityGroups=[sg-ecs-tasks],
    assignPublicIp=DISABLED
  }" \
  --service-connect-configuration '{
    "enabled": true,
    "namespace": "production"
  }'

Service-to-Service Communication

With Service Connect enabled, clients can call other services directly by name:

# Python example: calling backend-api from frontend-web
import requests

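# Use the Service Connect DNS name (the dnsName set in clientAliases)
response = requests.get("http://backend-api:8080/api/users", timeout=5)

# If clientAliases omits dnsName, the alias defaults to <discoveryName>.<namespace>;
# this form resolves only in that case, not when dnsName is set to "backend-api"
response = requests.get("http://backend-api.production:8080/api/users", timeout=5)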

// Node.js example
const axios = require('axios');

async function fetchUsers() {
  // Service Connect resolves the service name automatically
  const response = await axios.get('http://backend-api:8080/api/users');
  return response.data;
}
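
To keep the Service Connect name out of application code, the base URL can be read from configuration. The sketch below uses only the Python standard library and assumes a BACKEND_URL environment variable (the same name the Terraform example later sets on frontend-web); backend_url is a hypothetical helper.

```python
import os
from urllib.parse import urljoin


def backend_url(path, env=os.environ):
    """Return the full URL for a backend-api path.

    Falls back to the Service Connect alias (dnsName and port) configured
    for backend-api when BACKEND_URL is not set.
    """
    base = env.get("BACKEND_URL", "http://backend-api:8080")
    # Normalize slashes so the path is appended rather than replacing the base
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


print(backend_url("/api/users"))
```

Passing env explicitly makes the helper easy to exercise locally, for example with a BACKEND_URL that points at a development server.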

A Service Acting as Both Client and Server

Many microservices both expose an endpoint and call other services:

aws ecs create-service \
  --cluster my-cluster \
  --service-name order-service \
  --task-definition order-service:1 \
  --desired-count 3 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-private-1,subnet-private-2],
    securityGroups=[sg-ecs-tasks]
  }" \
  --service-connect-configuration '{
    "enabled": true,
    "namespace": "production",
    "services": [
      {
        "portName": "order-api",
        "discoveryName": "order-service",
        "clientAliases": [
          {
            "port": 8080,
            "dnsName": "order-service"
          }
        ]
      }
    ]
  }'

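Traffic Management and Load Balancing

Service Connect provides built-in client-side load balancing, implemented by the Envoy proxy.

Load Balancing Strategy

By default, Service Connect uses round-robin load balancing, spreading requests evenly across all healthy backend instances: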

                                    ┌──────────────┐
                                ┌──►│  Task 1      │
                                │   │  10.0.1.10   │
┌──────────────┐   ┌─────────┐  │   └──────────────┘
│   Client     │──►│  Envoy  │──┤
│   Service    │   │  Proxy  │  │   ┌──────────────┐
└──────────────┘   └─────────┘  ├──►│  Task 2      │
                                │   │  10.0.1.11   │
                                │   └──────────────┘
                                │
                                │   ┌──────────────┐
                                └──►│  Task 3      │
                                    │  10.0.1.12   │
                                    └──────────────┘
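
The rotation in the diagram can be modelled in a few lines of Python. This is a simplified illustration of round-robin selection over healthy endpoints, not Envoy's actual implementation; the IP addresses are the ones from the diagram.

```python
from itertools import cycle


def round_robin(endpoints, healthy, request_count):
    """Assign requests to healthy endpoints in round-robin order."""
    candidates = [e for e in endpoints if e in healthy]
    if not candidates:
        raise RuntimeError("no healthy endpoints available")
    rotation = cycle(candidates)
    return [next(rotation) for _ in range(request_count)]


tasks = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]

# All three tasks healthy: requests rotate evenly across them
print(round_robin(tasks, set(tasks), 6))

# Task 2 marked unhealthy: traffic shifts to the remaining two
print(round_robin(tasks, {"10.0.1.10", "10.0.1.12"}, 4))
```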

Health Checks and Automatic Failover

Service Connect automatically shifts traffic away from unhealthy instances:

{
  "containerDefinitions": [
    {
      "name": "api",
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 10,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      }
    }
  ]
}
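
As a rough guide to how these values interact, the sketch below estimates an upper bound on the time a failing container takes to be marked unhealthy. It assumes every failed check runs to its full timeout and that the next check starts one interval after the previous one ends; actual failover also depends on how quickly ECS replaces the task.

```python
def worst_case_unhealthy_seconds(interval, timeout, retries):
    """Rough upper bound, in seconds, before a failing container is marked unhealthy."""
    return retries * (interval + timeout)


# Values from the health check above: interval=10, timeout=5, retries=3
print(worst_case_unhealthy_seconds(interval=10, timeout=5, retries=3))  # 45
```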

Timeout Settings

Connection timeouts are set in the Service Connect configuration of the ECS service (not in the task definition):

{
  "serviceConnectConfiguration": {
    "enabled": true,
    "namespace": "production",
    "services": [
      {
        "portName": "api-port",
        "discoveryName": "backend-api",
        "timeout": {
          "idleTimeoutSeconds": 300,
          "perRequestTimeoutSeconds": 30
        },
        "clientAliases": [
          {
            "port": 8080,
            "dnsName": "backend-api"
          }
        ]
      }
    ]
  }
}

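Retry Mechanism

Service Connect applies built-in retry defaults to handle transient failures. The command below shows the Service Connect configuration currently applied to a service:

# Inspect the Service Connect configuration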
aws ecs describe-services \
  --cluster my-cluster \
  --services backend-api \
  --query 'services[0].deployments[0].serviceConnectConfiguration'
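
Because the built-in retry defaults are not tunable, an application that needs different behavior can add its own retry loop on top. The following is a minimal sketch of exponential backoff using only the standard library; call_with_retries and flaky_call are hypothetical names, and flaky_call stands in for a real request to backend-api.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Stand-in for a call to backend-api that fails twice, then succeeds
calls = {"count": 0}


def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record the backoff delays instead of actually sleeping
result = call_with_retries(flaky_call, sleep=delays.append)
print(result, delays)
```

Retries should only wrap idempotent requests; stacking application retries on top of proxy retries multiplies the load on a struggling backend, so keep the attempt count small.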

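Connection Pool Management

The Envoy proxy manages connection pools automatically to optimize service-to-service performance:

HTTP/1.1: keeps connections persistent to reduce connection setup overhead
HTTP/2: supports multiplexing, so one connection carries many requests
Automatic reconnection: re-establishes connections when they drop

Monitoring and Observability

Service Connect automatically collects a rich set of metrics and publishes them to Amazon CloudWatch.

Automatically Collected Metrics

Service Connect emits the following CloudWatch metrics:

Metric	Description	Dimensions
`RequestCount`	Total number of requests	ServiceName, TargetService
`RequestCountPerTarget`	Number of requests per target	ServiceName, TargetService, TargetIP
`ActiveConnectionCount`	Number of active connections	ServiceName
`NewConnectionCount`	Number of new connections	ServiceName
`ProcessedBytes`	Number of bytes processed	ServiceName
`TargetResponseTime`	Target response time	ServiceName, TargetService
`HTTPCode_Target_2XX_Count`	Number of 2XX responses	ServiceName
`HTTPCode_Target_4XX_Count`	Number of 4XX responses	ServiceName
`HTTPCode_Target_5XX_Count`	Number of 5XX responses	ServiceName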
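
The count metrics are most useful when combined. As a small worked example, the sketch below derives a 5XX error rate from RequestCount and HTTPCode_Target_5XX_Count sums over the same period; the numbers are made up for illustration.

```python
def error_rate_percent(request_count, error_5xx_count):
    """Return the 5XX error rate as a percentage of all requests."""
    if request_count == 0:
        return 0.0  # no traffic in the period, so no error rate to report
    return 100.0 * error_5xx_count / request_count


# Illustrative sums for one period: 1200 requests, 18 of them 5XX
print(error_rate_percent(request_count=1200, error_5xx_count=18))  # 1.5
```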

Creating a CloudWatch Dashboard

# Create a dashboard with Service Connect metrics
aws cloudwatch put-dashboard \
  --dashboard-name "ECS-Service-Connect-Dashboard" \
  --dashboard-body '{
    "widgets": [
      {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": 12,
        "height": 6,
        "properties": {
          "title": "Request Count by Service",
          "metrics": [
            ["AWS/ECS", "RequestCount", "ClusterName", "my-cluster", "ServiceName", "backend-api"],
            ["...", "frontend-web"],
            ["...", "order-service"]
          ],
          "period": 60,
          "stat": "Sum"
        }
      },
      {
        "type": "metric",
        "x": 12,
        "y": 0,
        "width": 12,
        "height": 6,
        "properties": {
          "title": "Response Time",
          "metrics": [
            ["AWS/ECS", "TargetResponseTime", "ClusterName", "my-cluster", "ServiceName", "backend-api", {"stat": "p99"}],
            ["...", {"stat": "p50"}]
          ],
          "period": 60
        }
      },
      {
        "type": "metric",
        "x": 0,
        "y": 6,
        "width": 12,
        "height": 6,
        "properties": {
          "title": "Error Rate",
          "metrics": [
            ["AWS/ECS", "HTTPCode_Target_5XX_Count", "ClusterName", "my-cluster", "ServiceName", "backend-api"],
            [".", "HTTPCode_Target_4XX_Count", ".", ".", ".", "."]
          ],
          "period": 60,
          "stat": "Sum"
        }
      }
    ]
  }'

Configuring CloudWatch Alarms

# Create a high error rate alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "ServiceConnect-HighErrorRate" \
  --alarm-description "Service Connect 5XX error rate exceeded threshold" \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ECS \
  --dimensions Name=ClusterName,Value=my-cluster Name=ServiceName,Value=backend-api \
  --statistic Sum \
  --period 60 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:ap-northeast-1:123456789012:alerts

# Create a latency alarm
aws cloudwatch put-metric-alarm \
  --alarm-name "ServiceConnect-HighLatency" \
  --alarm-description "Service Connect response time exceeded threshold" \
  --metric-name TargetResponseTime \
  --namespace AWS/ECS \
  --dimensions Name=ClusterName,Value=my-cluster Name=ServiceName,Value=backend-api \
  --extended-statistic p99 \
  --period 60 \
  --threshold 1000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:ap-northeast-1:123456789012:alerts
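
Both alarms use --evaluation-periods 3, so a single bad minute does not trigger them. The sketch below is a simplified model of that rule (it ignores missing-data handling and the separate datapoints-to-alarm setting); alarm_fires is a hypothetical helper and the datapoints are invented.

```python
def alarm_fires(datapoints, threshold, evaluation_periods):
    """True if the most recent evaluation_periods datapoints all exceed the threshold."""
    if len(datapoints) < evaluation_periods:
        return False
    return all(value > threshold for value in datapoints[-evaluation_periods:])


# Per-minute 5XX sums, checked against the threshold of 10 used above
print(alarm_fires([2, 14, 3, 12], threshold=10, evaluation_periods=3))   # False: spikes are not consecutive
print(alarm_fires([2, 14, 11, 12], threshold=10, evaluation_periods=3))  # True: three periods in a row
```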

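Integrating AWS X-Ray

Service Connect does not support X-Ray directly, but you can add the X-Ray SDK to your application and run the X-Ray daemon as a sidecar container to get distributed tracing: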

{
  "containerDefinitions": [
    {
      "name": "api",
      "image": "my-app:latest",
      "environment": [
        {
          "name": "AWS_XRAY_DAEMON_ADDRESS",
          "value": "127.0.0.1:2000"
        }
      ]
    },
    {
      "name": "xray-daemon",
      "image": "public.ecr.aws/xray/aws-xray-daemon:latest",
      "portMappings": [
        {
          "containerPort": 2000,
          "protocol": "udp"
        }
      ]
    }
  ]
}

Log Analysis

# Query Service Connect related logs with CloudWatch Logs Insights
aws logs start-query \
  --log-group-name /ecs/backend-api \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, @message
    | filter @message like /connection|upstream|downstream/
    | sort @timestamp desc
    | limit 100
  '

# Fetch the query results
aws logs get-query-results --query-id <query-id>

Terraform Deployment Example

The following example deploys a complete Service Connect architecture with Terraform.

Project Structure

terraform/
├── main.tf
├── variables.tf
├── outputs.tf
├── vpc.tf
├── ecs.tf
├── service-connect.tf
└── services/
    ├── backend-api.tf
    └── frontend-web.tf

Infrastructure Configuration (main.tf)

terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

# Variable definitions
variable "aws_region" {
  description = "AWS Region"
  default     = "ap-northeast-1"
}

variable "environment" {
  description = "Environment name"
  default     = "production"
}

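variable "project_name" {
  description = "Project name"
  default     = "myapp"
}

# Referenced by the service task definitions as var.ecr_repository_url
variable "ecr_repository_url" {
  description = "Base URL of the ECR registry that hosts the service images"
  type        = string
}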

VPC and Network Configuration (vpc.tf)

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

# Private subnets
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "${var.project_name}-private-${count.index + 1}"
  }
}

# Public subnets
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 101}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${count.index + 1}"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw"
  }
}

# NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"

  tags = {
    Name = "${var.project_name}-nat-eip"
  }
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "${var.project_name}-nat"
  }
}

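# Route tables: the public subnets need a default route to the Internet Gateway,
# otherwise the NAT Gateway placed in them has no path to the internet
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private route table (default route through the NAT Gateway)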
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-private-rt"
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}

data "aws_availability_zones" "available" {
  state = "available"
}

Cloud Map Namespace and ECS Cluster (ecs.tf)

# Cloud Map Namespace
resource "aws_service_discovery_http_namespace" "main" {
  name        = var.environment
  description = "${var.environment} services namespace for Service Connect"

  tags = {
    Environment = var.environment
  }
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  # Set the default Service Connect namespace
  service_connect_defaults {
    namespace = aws_service_discovery_http_namespace.main.arn
  }

  tags = {
    Environment = var.environment
  }
}

# ECS cluster capacity providers
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name = aws_ecs_cluster.main.name

  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    base              = 1
    weight            = 100
    capacity_provider = "FARGATE"
  }
}

# ECS task execution role
resource "aws_iam_role" "ecs_task_execution" {
  name = "${var.project_name}-ecs-task-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# ECS task role
resource "aws_iam_role" "ecs_task" {
  name = "${var.project_name}-ecs-task"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

# Security group
resource "aws_security_group" "ecs_tasks" {
  name        = "${var.project_name}-ecs-tasks-sg"
  description = "Security group for ECS tasks"
  vpc_id      = aws_vpc.main.id

  # Allow traffic between tasks
  ingress {
    from_port = 0
    to_port   = 65535
    protocol  = "tcp"
    self      = true
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-ecs-tasks-sg"
  }
}

Backend API Service (services/backend-api.tf)

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "backend_api" {
  name              = "/ecs/${var.project_name}/backend-api"
  retention_in_days = 30

  tags = {
    Service = "backend-api"
  }
}

# Task Definition
resource "aws_ecs_task_definition" "backend_api" {
  family                   = "${var.project_name}-backend-api"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name      = "api"
      image     = "${var.ecr_repository_url}/backend-api:latest"
      essential = true

      portMappings = [
        {
          name          = "api-port"
          containerPort = 8080
          protocol      = "tcp"
          appProtocol   = "http"
        }
      ]

      environment = [
        {
          name  = "NODE_ENV"
          value = var.environment
        },
        {
          name  = "PORT"
          value = "8080"
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.backend_api.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])

  tags = {
    Service = "backend-api"
  }
}

# ECS Service with Service Connect
resource "aws_ecs_service" "backend_api" {
  name            = "backend-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.backend_api.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  # Service Connect configuration: acting as a server
  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_http_namespace.main.arn

    service {
      port_name      = "api-port"
      discovery_name = "backend-api"

      client_alias {
        port     = 8080
        dns_name = "backend-api"
      }

      timeout {
        idle_timeout_seconds        = 300
        per_request_timeout_seconds = 30
      }
    }

    log_configuration {
      log_driver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.backend_api.name
        "awslogs-region"        = var.aws_region
        "awslogs-stream-prefix" = "service-connect"
      }
    }
  }

  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  tags = {
    Service = "backend-api"
  }
}

Frontend Web Service (services/frontend-web.tf)

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "frontend_web" {
  name              = "/ecs/${var.project_name}/frontend-web"
  retention_in_days = 30

  tags = {
    Service = "frontend-web"
  }
}

# Task Definition
resource "aws_ecs_task_definition" "frontend_web" {
  family                   = "${var.project_name}-frontend-web"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name      = "web"
      image     = "${var.ecr_repository_url}/frontend-web:latest"
      essential = true

      portMappings = [
        {
          name          = "web-port"
          containerPort = 3000
          protocol      = "tcp"
          appProtocol   = "http"
        }
      ]

      environment = [
        {
          name  = "BACKEND_URL"
          value = "http://backend-api:8080"  # Service Connect DNS name
        }
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 30
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.frontend_web.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])

  tags = {
    Service = "frontend-web"
  }
}

# ECS Service with Service Connect (client only)
resource "aws_ecs_service" "frontend_web" {
  name            = "frontend-web"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.frontend_web.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  # Service Connect configuration: acting as a client
  service_connect_configuration {
    enabled   = true
    namespace = aws_service_discovery_http_namespace.main.arn
    # No service block is needed because this service is only a client

    log_configuration {
      log_driver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.frontend_web.name
        "awslogs-region"        = var.aws_region
        "awslogs-stream-prefix" = "service-connect"
      }
    }
  }

  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # Optional ALB integration; aws_lb_target_group.frontend is assumed to be defined elsewhere
  load_balancer {
    target_group_arn = aws_lb_target_group.frontend.arn
    container_name   = "web"
    container_port   = 3000
  }

  tags = {
    Service = "frontend-web"
  }

  depends_on = [aws_ecs_service.backend_api]
}

Outputs (outputs.tf)

output "cluster_name" {
  description = "ECS Cluster name"
  value       = aws_ecs_cluster.main.name
}

output "namespace_arn" {
  description = "Cloud Map namespace ARN"
  value       = aws_service_discovery_http_namespace.main.arn
}

output "namespace_name" {
  description = "Cloud Map namespace name"
  value       = aws_service_discovery_http_namespace.main.name
}

output "backend_api_service_name" {
  description = "Backend API service name"
  value       = aws_ecs_service.backend_api.name
}

output "frontend_web_service_name" {
  description = "Frontend Web service name"
  value       = aws_ecs_service.frontend_web.name
}

Deployment Commands

# Initialize Terraform
terraform init

# Review the execution plan
terraform plan -out=tfplan

# Apply the changes
terraform apply tfplan

# Verify the deployment
aws ecs describe-services \
  --cluster myapp-cluster \
  --services backend-api frontend-web \
  --query 'services[*].{Name:serviceName,Status:status,Running:runningCount}'

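Troubleshooting and Best Practices

Common Problems and Solutions

1. Services Cannot Connect to Each Other

Symptom: a client service cannot reach other services by service name

Diagnostic steps:

# Check that the Service Connect configuration is correct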
aws ecs describe-services \
  --cluster my-cluster \
  --services backend-api \
  --query 'services[0].deployments[0].serviceConnectConfiguration'

# Confirm that the services are registered in the namespace
aws servicediscovery list-services \
  --filters Name=NAMESPACE_ID,Values=ns-xxxxxxxxx

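# Check the state of the Envoy proxy container in the task
# (matches any container whose name starts with "ecs-service-connect")
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-id> \
  --query "tasks[0].containers[?starts_with(name, 'ecs-service-connect')]"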

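Solutions:

Confirm that both services are in the same namespace
Verify that the name attribute in portMappings matches portName in the Service Connect configuration
Check that the security groups allow traffic between the services

2. Envoy Proxy Sidecar Fails to Start

Symptom: the task starts but fails quickly, and the container logs show Envoy-related errors

Diagnostic steps:

# Show why the task stopped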
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-id> \
  --query 'tasks[0].stoppedReason'

# View the Envoy proxy logs
aws logs filter-log-events \
  --log-group-name /ecs/my-service \
  --log-stream-name-prefix "service-connect" \
  --start-time $(date -d '1 hour ago' +%s)000

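Solutions:

Confirm that portMappings in the task definition use the correct appProtocol (http, http2, or grpc)
Check that the ECS task execution role has sufficient permissions
Confirm that the container's health check endpoint is reachable

3. High Latency

Symptom: service-to-service latency is noticeably higher than expected

Diagnostic steps:

# Query the CloudWatch latency metric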
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name TargetResponseTime \
  --dimensions Name=ClusterName,Value=my-cluster Name=ServiceName,Value=backend-api \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --extended-statistics p99

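Solutions:

Check resource usage (CPU, memory) on the target service
Consider increasing the task count of the target service
Look for performance problems in the application itself
Adjust the timeout settings to match actual needs

4. Slow Service Discovery Updates

Symptom: newly deployed tasks take a long time to start receiving traffic

Solution: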

{
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
    "interval": 10,
    "timeout": 5,
    "retries": 2,
    "startPeriod": 30
  }
}

Shorten the health check interval
Reduce startPeriod so that health is confirmed sooner
Make sure the health check endpoint responds quickly
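
To see why the interval matters, the sketch below estimates when the first passing health check completes. It is a simplified model that assumes checks fire at interval, 2 × interval, and so on after the container starts, and that a check passes as soon as the application is ready; the 12-second startup time is an invented figure.

```python
import math


def first_healthy_seconds(app_ready_seconds, interval):
    """Estimate the time of the first passing health check after container start."""
    checks_needed = max(1, math.ceil(app_ready_seconds / interval))
    return checks_needed * interval


# An application that is ready 12 seconds after start
for interval in (30, 10):
    print(interval, first_healthy_seconds(app_ready_seconds=12, interval=interval))
```

Under these assumptions, dropping the interval from 30 to 10 seconds moves the first passing check from 30 seconds to 20 seconds after start.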

Best Practices

1. Naming Conventions

# Use a consistent naming convention
# Namespace: use the environment name
production
staging
development

# Service names: use kebab-case
user-service
order-service
payment-gateway

# DNS aliases: keep them short and meaningful
users
orders
payments
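
A convention like this is easy to enforce in a pre-deployment script. The sketch below checks names against a kebab-case pattern; the regular expression is this article's own convention and is stricter than what Cloud Map or ECS actually accept.

```python
import re

# Lowercase letters and digits, separated by single hyphens, starting with a letter
KEBAB_CASE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")


def is_kebab_case(name):
    """Return True if name follows the kebab-case convention above."""
    return bool(KEBAB_CASE.match(name))


for name in ["user-service", "payment-gateway", "UserService", "order_service"]:
    print(name, is_kebab_case(name))
```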

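2. Resource Sizing Recommendations

# Reserve extra resources for the Envoy proxy
# (AWS documentation recommends adding 256 CPU units and at least 64 MiB of memory for the proxy)
resource "aws_ecs_task_definition" "api" {
  cpu    = 512  # roughly 64-128 of this goes to Envoy (author's estimate)
  memory = 1024 # roughly 128-256 of this goes to Envoy (author's estimate)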

  # Use an explicit appProtocol
  container_definitions = jsonencode([
    {
      portMappings = [
        {
          name          = "api-http"
          containerPort = 8080
          appProtocol   = "http"  # or "http2" / "grpc"
        }
      ]
    }
  ])
}

3. Security Configuration

# Least-privilege security group
resource "aws_security_group" "ecs_tasks" {
  # Allow only the ports required for service-to-service traffic
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.allowed_services.id]
  }

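  # Restrict outbound traffic
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS for AWS services"
  }

  # Outbound calls to other services; without this rule the tasks could not
  # reach their peers on the service port
  egress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.allowed_services.id]
    description     = "Service-to-service traffic"
  }
}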

4. Observability Configuration

# Enable logging for the Service Connect proxy
service_connect_configuration {
  enabled = true

  log_configuration {
    log_driver = "awslogs"
    options = {
      "awslogs-group"         = "/ecs/service-connect"
      "awslogs-region"        = var.aws_region
      "awslogs-stream-prefix" = "envoy"
    }
  }
}

5. Deployment Strategy

# Rolling deployment with the deployment circuit breaker and automatic rollback
# (arguments of the aws_ecs_service resource)
deployment_maximum_percent         = 200
deployment_minimum_healthy_percent = 100

deployment_circuit_breaker {
  enable   = true
  rollback = true
}

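Performance Tuning Recommendations

Connection pooling: the Service Connect Envoy proxy manages its connection pools automatically, but make sure the application also reuses HTTP connections correctly
Health check interval: tune the frequency to the service's characteristics; overly frequent checks waste resources
Timeouts: set timeout values based on the service's actual response times
Resource monitoring: review CloudWatch metrics regularly to identify bottlenecks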
