AWS ECS

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service. Run containers on AWS Fargate (serverless) or EC2 instances.

Core Concepts

Cluster

Logical grouping of tasks or services. Can contain Fargate tasks, EC2 instances, or both.

Task Definition

Blueprint for your application. Defines containers, resources, networking, and IAM roles. Each registration creates a new numbered revision within the family (for example, web-app:1, then web-app:2).

Task

Running instance of a task definition. Can run standalone or as part of a service.

Service

Maintains the desired count of tasks and replaces tasks that stop. Handles rolling deployments and load balancer registration, and scales through Application Auto Scaling.

Launch Types

| Type | Description | Use Case |
|---|---|---|
| Fargate | Serverless; billed for the vCPU and memory each task requests | Most workloads |
| EC2 | Self-managed instances | GPU, Windows, or specific instance requirements |

Common Patterns

Create a Fargate Cluster

AWS CLI:
```bash
# Create cluster
aws ecs create-cluster --cluster-name my-cluster

# With capacity providers
aws ecs create-cluster \
  --cluster-name my-cluster \
  --capacity-providers FARGATE FARGATE_SPOT \
  --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1 \
    capacityProvider=FARGATE_SPOT,weight=1
```
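With weight 1 on each provider and no base, ECS aims to place about half of the tasks on FARGATE_SPOT. The sketch below is a rough model for estimating a strategy's split. It assumes `base` tasks go to FARGATE first and the remainder is divided by weight. ECS's real placement and rounding may differ. `split_tasks` is a hypothetical helper, not part of the AWS CLI.

```bash
#!/usr/bin/env bash
# Rough model of a two-provider capacity provider strategy.
# Assumption: `base` tasks go to FARGATE first, the rest are split by weight.
# ECS's real placement and rounding may differ; use this for estimates only.
split_tasks() {
  local total=$1 base=$2 w_fargate=$3 w_spot=$4
  if (( base > total )); then
    base=$total
  fi
  local rest=$(( total - base ))
  # Integer share for FARGATE_SPOT; any remainder stays on FARGATE.
  local spot=$(( rest * w_spot / (w_fargate + w_spot) ))
  local fargate=$(( base + rest - spot ))
  echo "FARGATE=$fargate FARGATE_SPOT=$spot"
}

split_tasks 10 0 1 1   # the 1:1 strategy above -> FARGATE=5 FARGATE_SPOT=5
split_tasks 10 2 1 3   # hypothetical base=2, then 1:3 -> FARGATE=4 FARGATE_SPOT=6
```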

Register Task Definition

```bash
cat > task-definition.json << 'EOF'
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://task-definition.json
```
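Fargate accepts only specific task-level cpu/memory pairs, so a mismatch such as cpu 256 with memory 4096 is rejected by ECS. Here is a minimal pre-check you could run before registering. It assumes only the 0.25 to 4 vCPU sizes matter; larger sizes exist and are deliberately left out. `valid_fargate_size` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Pre-check a Fargate task-level cpu/memory pair (CPU units and MiB).
# Covers 0.25-4 vCPU only; the 8 and 16 vCPU sizes are omitted from this sketch.
valid_fargate_size() {
  local cpu=$1 mem=$2
  case "$cpu" in
    256)  [[ $mem == 512 || $mem == 1024 || $mem == 2048 ]] ;;
    512)  (( mem >= 1024 && mem <= 4096 && mem % 1024 == 0 )) ;;
    1024) (( mem >= 2048 && mem <= 8192 && mem % 1024 == 0 )) ;;
    2048) (( mem >= 4096 && mem <= 16384 && mem % 1024 == 0 )) ;;
    4096) (( mem >= 8192 && mem <= 30720 && mem % 1024 == 0 )) ;;
    *)    return 1 ;;
  esac
}

valid_fargate_size 256 512  && echo "256/512: valid"
valid_fargate_size 256 4096 || echo "256/4096: not a Fargate size"
```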

Create Service with Load Balancer

```bash
aws ecs create-service \
  --cluster my-cluster \
  --service-name web-service \
  --task-definition web-app:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678,subnet-87654321],securityGroups=[sg-12345678],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/1234567890123456,containerName=web,containerPort=8080" \
  --health-check-grace-period-seconds 60
```
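`create-service` and `run-task` take the same awsvpcConfiguration shorthand. When the subnet and security group lists live in variables, a small builder keeps that string consistent across commands. `awsvpc_config` is a hypothetical helper, and the IDs are the placeholders used above.

```bash
#!/usr/bin/env bash
# Build the --network-configuration shorthand used by create-service and run-task.
# Arguments: comma-separated subnets, comma-separated security groups,
# and ENABLED or DISABLED for assignPublicIp.
awsvpc_config() {
  local subnets=$1 security_groups=$2 public_ip=$3
  if [[ $public_ip != ENABLED && $public_ip != DISABLED ]]; then
    echo "assignPublicIp must be ENABLED or DISABLED" >&2
    return 1
  fi
  printf 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=%s}\n' \
    "$subnets" "$security_groups" "$public_ip"
}

awsvpc_config subnet-12345678,subnet-87654321 sg-12345678 DISABLED
```

The result can be passed as `--network-configuration "$(awsvpc_config subnet-12345678,subnet-87654321 sg-12345678 DISABLED)"`.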

Run Standalone Task

```bash
aws ecs run-task \
  --cluster my-cluster \
  --task-definition my-batch-job:1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-12345678],securityGroups=[sg-12345678],assignPublicIp=ENABLED}"
```
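`run-task` returns the new task's ARN. With the awslogs driver, each container's log stream is named prefix/container-name/task-id, so the ARN is enough to locate the logs. Below is a minimal sketch. `log_stream_for_task` is a hypothetical helper. The `ecs` prefix and `web` container name come from the web-app task definition above; a batch task definition would use its own values.

```bash
#!/usr/bin/env bash
# Derive the awslogs stream name (prefix/container-name/task-id) from a task ARN.
# Works for both ARN formats:
#   arn:aws:ecs:region:account:task/cluster-name/task-id   (current)
#   arn:aws:ecs:region:account:task/task-id                (older)
log_stream_for_task() {
  local prefix=$1 container=$2 task_arn=$3
  local task_id=${task_arn##*/}   # everything after the last slash
  echo "${prefix}/${container}/${task_id}"
}

log_stream_for_task ecs web arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123
# ecs/web/abc123
```

To capture the ARN, add `--query 'tasks[0].taskArn' --output text` to the `run-task` call.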

Update Service (Deploy New Image)

```bash
# Register a new task definition revision with the updated image
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Update the service to use the new revision
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --task-definition web-app:2 \
  --force-new-deployment
```
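The register step assumes task-definition.json already points at the new image. One way to bump the tag is a sed edit, sketched below. `set_image_tag` is a hypothetical helper, and its regex assumes a single `"image"` field in repository:tag form (no digest), like the example file.

```bash
#!/usr/bin/env bash
# Point task-definition.json at a new image tag before re-registering it.
# Assumes one "image" field shaped like the example (repository:tag, no digest).
# Keeps a .bak copy of the original file.
set_image_tag() {
  local file=$1 tag=$2
  sed -E -i.bak \
    "s#(\"image\":[[:space:]]*\"[^\"]*:)[^\"]*\"#\1${tag}\"#" "$file"
}

# Example flow (requires AWS credentials):
# set_image_tag task-definition.json v1.2.3
# aws ecs register-task-definition --cli-input-json file://task-definition.json
```

Passing only the family name (`--task-definition web-app`) to `update-service` uses the latest ACTIVE revision, which avoids hard-coding the revision number.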

Auto Scaling

```bash
# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 10

# Target tracking policy: keep average service CPU near 70%
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'
```
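Target tracking adds tasks when average CPU stays above 70% and removes them when it falls below, within the 2 to 10 bounds registered above. A simplified proportional model helps predict the direction and rough size of a scaling action. It is not Application Auto Scaling's exact algorithm, which also applies cooldowns and scales in more conservatively. `estimate_desired` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Simplified target-tracking estimate: desired = ceil(current * metric / target),
# clamped to [min, max]. Metric and target are whole percentages.
estimate_desired() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5
  local want=$(( (current * metric + target - 1) / target ))  # integer ceiling
  if (( want < min )); then want=$min; fi
  if (( want > max )); then want=$max; fi
  echo "$want"
}

estimate_desired 2 90 70 2 10    # 3: two tasks at 90% CPU need about three at 70%
estimate_desired 8 100 70 2 10   # 10: capped by --max-capacity
```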

CLI Reference

Cluster Management

| Command | Description |
|---|---|
| aws ecs create-cluster | Create cluster |
| aws ecs describe-clusters | Get cluster details |
| aws ecs list-clusters | List clusters |
| aws ecs delete-cluster | Delete cluster |

Task Definitions

| Command | Description |
|---|---|
| aws ecs register-task-definition | Create task definition |
| aws ecs describe-task-definition | Get task definition |
| aws ecs list-task-definitions | List task definitions |
| aws ecs deregister-task-definition | Deregister version |

Services

| Command | Description |
|---|---|
| aws ecs create-service | Create service |
| aws ecs update-service | Update service |
| aws ecs describe-services | Get service details |
| aws ecs delete-service | Delete service |

Tasks

| Command | Description |
|---|---|
| aws ecs run-task | Run standalone task |
| aws ecs stop-task | Stop running task |
| aws ecs describe-tasks | Get task details |
| aws ecs list-tasks | List tasks |

Best Practices

Security

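  • Use task roles for the application's AWS API access (not long-lived access keys)
  • Use the execution role for what ECS does on the task's behalf: pulling from ECR, fetching secrets, writing logs
  • Store secrets in Secrets Manager or Parameter Store and inject them through the task definition's secrets field
  • Run tasks in private subnets with outbound access through a NAT gateway or VPC endpoints
  • Enable CloudTrail for API auditing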
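Both the task role and the execution role need a trust policy that lets the ECS tasks service principal, `ecs-tasks.amazonaws.com`, assume them. The sketch below writes that trust policy. `write_ecs_trust_policy` and the file name are hypothetical. The follow-up IAM commands are left as comments because they change your account.

```bash
#!/usr/bin/env bash
# Write the trust policy that lets ECS tasks assume a role.
write_ecs_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

write_ecs_trust_policy ecs-trust-policy.json

# Then, for example (requires AWS credentials):
# aws iam create-role --role-name ecsTaskRole \
#   --assume-role-policy-document file://ecs-trust-policy.json
# aws iam attach-role-policy --role-name ecsTaskExecutionRole \
#   --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
```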

Performance

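  • Right-size CPU and memory: monitor utilization and adjust
  • Use Fargate Spot for fault-tolerant workloads (up to 70% cheaper than on-demand Fargate; tasks can be interrupted with a two-minute warning)
  • Enable CloudWatch Container Insights for monitoring
  • Use service discovery for internal communication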

Reliability

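  • Deploy across multiple AZs (give the service subnets in at least two Availability Zones)
  • Configure health checks consistently: container health check, target group health check, and the service's health check grace period
  • Set the target group deregistration delay to match how long in-flight requests need to drain (the default is 300 seconds)
  • Enable the deployment circuit breaker with rollback so a failing deployment reverts automatically
```bash
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }'
```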
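The circuit breaker sits in the same deployment configuration as the rolling-update limits, minimumHealthyPercent and maximumPercent. The sketch below shows the task counts those limits allow. It assumes the documented rounding (lower bound rounded up, upper bound rounded down) and the common 100/200 rolling-update defaults. `deployment_bounds` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Task counts allowed during a rolling update:
#   lower = ceil(desired * minimumHealthyPercent / 100)   tasks kept RUNNING
#   upper = floor(desired * maximumPercent / 100)         tasks in total
deployment_bounds() {
  local desired=$1 min_pct=${2:-100} max_pct=${3:-200}
  local lower=$(( (desired * min_pct + 99) / 100 ))
  local upper=$(( desired * max_pct / 100 ))
  echo "running>=$lower total<=$upper"
}

deployment_bounds 2          # running>=2 total<=4 (web-service with 100/200)
deployment_bounds 3 50 150   # running>=2 total<=4
```

With 100/200, ECS starts replacement tasks before stopping old ones. A lower minimum lets it stop old tasks first when spare capacity is limited.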

Cost Optimization

  • Use Fargate Spot for batch workloads
  • Right-size task resources
  • Scale services to zero when not needed (for example, with scheduled scaling actions)
  • Use capacity providers to mix Fargate and Fargate Spot

Troubleshooting

Task Fails to Start

Check:
```bash
# View a stopped task (the first one returned by list-tasks)
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks $(aws ecs list-tasks --cluster my-cluster --desired-status STOPPED --query 'taskArns[0]' --output text)
```

**Common causes:**
- Image not found or not pullable (wrong URI or tag, ECR permissions)
- Secrets access denied (execution role cannot read the secret)
- Network configuration (subnets, security groups)
- Resource limits exceeded
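The task's stoppedReason field usually names the failure. The rough triage helper below maps common reason strings to the causes listed above. `classify_stop_reason` is hypothetical, and its patterns are not exhaustive.

```bash
#!/usr/bin/env bash
# Map a task's stoppedReason to a likely cause (rough triage only).
classify_stop_reason() {
  case "$1" in
    *CannotPullContainerError*)
      echo "image pull failed: check image URI, ECR permissions, network path" ;;
    *ResourceInitializationError*)
      echo "task setup failed: check secrets/log permissions and subnet routing" ;;
    *"failed ELB health checks"*)
      echo "load balancer health check failed: check target group path and port" ;;
    *"Essential container in task exited"*)
      echo "container exited: check logs and container exit code" ;;
    *)
      echo "unrecognized: read the full stoppedReason and container reasons" ;;
  esac
}

# Usage with the stopped-task lookup above (requires AWS credentials):
# reason=$(aws ecs describe-tasks --cluster my-cluster --tasks "$task_arn" \
#   --query 'tasks[0].stoppedReason' --output text)
# classify_stop_reason "$reason"
```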

Container Keeps Restarting

Debug:
```bash
# Check CloudWatch logs
aws logs get-log-events \
  --log-group-name /ecs/web-app \
  --log-stream-name "ecs/web/abc123"

# Check task details (replace task-arn with the task's ARN)
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks task-arn \
  --query 'tasks[0].containers[0].{reason:reason,exitCode:exitCode}'
```

**Causes:**
- Health check failing
- Application crashing
- Out of memory
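The exitCode from the query above narrows the cause. Codes above 128 mean the process was killed by signal (code minus 128). Code 137 (SIGKILL) is how an out-of-memory kill typically appears, but it can also mean the process ignored SIGTERM and was killed after the stop timeout. Code 143 (SIGTERM) means the container was asked to stop. `explain_exit_code` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Translate a container exit code into a likely explanation.
explain_exit_code() {
  local code=$1
  case "$code" in
    0)   echo "exited normally: check the process is meant to be long-running" ;;
    137) echo "SIGKILL: likely out of memory, or SIGTERM ignored until stop timeout" ;;
    139) echo "SIGSEGV: the application crashed" ;;
    143) echo "SIGTERM: the container was asked to stop (e.g. failed health check)" ;;
    *)
      if (( code > 128 )); then
        echo "killed by signal $(( code - 128 ))"
      else
        echo "application error (exit $code): check CloudWatch logs"
      fi ;;
  esac
}
```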

Service Stuck Deploying

```bash
# Check deployment status
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].deployments'

# Check recent events
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].events[:5]'
```

**Causes:**
- Health check failing on new tasks
- Not enough capacity
- Target group health checks failing
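To tell a slow deployment from a stuck one, you can poll the primary deployment's rolloutState (IN_PROGRESS, COMPLETED, or FAILED for rolling-update services) with a timeout. The sketch below does that. `wait_until` and `deployment_completed` are hypothetical helpers. The built-in `aws ecs wait services-stable` is a simpler alternative when you only need to block until the service settles.

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or a timeout (in seconds) passes.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  until "$@"; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Succeeds once the PRIMARY deployment reports rolloutState COMPLETED.
deployment_completed() {
  local state
  state=$(aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].deployments[?status==`PRIMARY`].rolloutState' \
    --output text)
  [[ $state == COMPLETED ]]
}

# Usage (requires AWS credentials):
# wait_until 900 15 deployment_completed my-cluster web-service \
#   || echo "not complete after 15 minutes; check service events"
```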

Cannot Pull Image from ECR

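Check that the execution role has:
```json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
```
These actions are included in the AWS managed policy AmazonECSTaskExecutionRolePolicy.

Also check:
  • Private subnets: a NAT gateway or VPC endpoints for ECR
  • Public subnets: assignPublicIp=ENABLED, otherwise the task has no route to ECR
  • Security group allows HTTPS (port 443) outbound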
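Without a NAT gateway, private tasks reach ECR only through VPC endpoints. ECR needs both the ecr.api and ecr.dkr interface endpoints plus an S3 gateway endpoint, because image layers are served from S3. The awslogs driver and Secrets Manager secrets need their own endpoints. The helper below lists those service names for a region. `ecr_endpoint_services` is hypothetical.

```bash
#!/usr/bin/env bash
# List the VPC endpoint service names (and endpoint types) a private Fargate
# task typically needs to pull from ECR, write awslogs output, and read secrets.
ecr_endpoint_services() {
  local region=$1
  echo "com.amazonaws.${region}.ecr.api Interface"
  echo "com.amazonaws.${region}.ecr.dkr Interface"
  echo "com.amazonaws.${region}.s3 Gateway"
  echo "com.amazonaws.${region}.logs Interface"
  echo "com.amazonaws.${region}.secretsmanager Interface"
}

ecr_endpoint_services us-east-1
```

Each line corresponds to one `aws ec2 create-vpc-endpoint` call with `--service-name` and `--vpc-endpoint-type`. Interface endpoints also take `--subnet-ids` and `--security-group-ids` (allowing HTTPS from the tasks). The S3 gateway endpoint takes `--route-table-ids` instead.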

References
