AWS ECS
Amazon Elastic Container Service (ECS) is a fully managed container orchestration service. Run containers on AWS Fargate (serverless) or EC2 instances.
Core Concepts
Cluster
Logical grouping of tasks or services. Can contain Fargate tasks, EC2 instances, or both.
Task Definition
Blueprint for your application. Defines containers, resources, networking, and IAM roles.
Task
Running instance of a task definition. Can run standalone or as part of a service.
Service
Maintains desired count of tasks. Handles deployments, load balancing, and auto scaling.
Launch Types
| Type | Description | Use Case |
|---|---|---|
| Fargate | Serverless, pay per task | Most workloads |
| EC2 | Self-managed instances | GPU, Windows, specific requirements |
Common Patterns
Create a Fargate Cluster
AWS CLI:
bash
# Create cluster
aws ecs create-cluster --cluster-name my-cluster
# With capacity providers
aws ecs create-cluster \
  --cluster-name my-cluster \
  --capacity-providers FARGATE FARGATE_SPOT \
  --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1 \
    capacityProvider=FARGATE_SPOT,weight=1
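The weights in a capacity provider strategy set the relative share of tasks each provider receives. As a rough illustration, the sketch below estimates that split for a given task count. `split_by_weight` is a hypothetical helper, not an AWS CLI feature, and it is only a proportional approximation: it ignores the optional `base` setting and does not claim to reproduce the exact ECS placement order.

```bash
# Hypothetical helper (not part of the AWS CLI): estimate how a number of
# tasks divides between FARGATE and FARGATE_SPOT for two integer weights.
# Assumption: a plain proportional split; ECS itself may round differently.
split_by_weight() {
  local total="$1" w_fargate="$2" w_spot="$3" sum n_fargate n_spot
  sum=$((w_fargate + w_spot))
  if [ "$sum" -le 0 ]; then
    echo "weights must sum to a positive number" >&2
    return 1
  fi
  # Integer division rounds down; the remainder is assigned to FARGATE_SPOT.
  n_fargate=$((total * w_fargate / sum))
  n_spot=$((total - n_fargate))
  echo "FARGATE=$n_fargate FARGATE_SPOT=$n_spot"
}
```

With the equal weights used above, `split_by_weight 10 1 1` prints `FARGATE=5 FARGATE_SPOT=5`.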
Register Task Definition
bash
cat > task-definition.json << 'EOF'
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
EOF
aws ecs register-task-definition --cli-input-json file://task-definition.json
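Fargate accepts only certain pairings of the task-level `cpu` and `memory` values, so it can help to check them before registering. The sketch below is a hypothetical pre-flight check (`fargate_size_ok` is not an AWS command); the table of combinations reflects the Fargate task sizes as I understand them at the time of writing and should be verified against the current AWS documentation.

```bash
# Hypothetical pre-flight check: succeed if cpu (CPU units) and memory (MiB)
# form a task size that Fargate accepts. Values are an assumption based on
# the published Fargate task size table; confirm against current AWS docs.
fargate_size_ok() {
  local cpu="$1" mem="$2" min max step
  case "$cpu" in
    256)
      # 0.25 vCPU allows only three fixed memory sizes.
      case "$mem" in 512|1024|2048) return 0 ;; *) return 1 ;; esac
      ;;
    512)   min=1024;  max=4096;   step=1024 ;;
    1024)  min=2048;  max=8192;   step=1024 ;;
    2048)  min=4096;  max=16384;  step=1024 ;;
    4096)  min=8192;  max=30720;  step=1024 ;;
    8192)  min=16384; max=61440;  step=4096 ;;
    16384) min=32768; max=122880; step=8192 ;;
    *)     return 1 ;;
  esac
  # Memory must sit inside the range and on an increment boundary.
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] \
    && [ $(( (mem - min) % step )) -eq 0 ]
}
```

The task definition above uses `cpu` 256 with `memory` 512, which this check accepts; `fargate_size_ok 256 4096` fails.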
Create Service with Load Balancer
bash
aws ecs create-service \
  --cluster my-cluster \
  --service-name web-service \
  --task-definition web-app:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-12345678,subnet-87654321],
    securityGroups=[sg-12345678],
    assignPublicIp=DISABLED
  }" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/1234567890123456,containerName=web,containerPort=8080" \
  --health-check-grace-period-seconds 60
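The `--network-configuration` value is easy to mistype when written by hand. As a minimal sketch, the hypothetical helper below (`awsvpc_config` is an illustrative name, not an AWS CLI feature) assembles the same shorthand string from variables on a single line, so it can be reused for both `create-service` and `run-task`.

```bash
# Hypothetical helper: build the shorthand value for --network-configuration.
# Arguments: comma-separated subnet IDs, comma-separated security group IDs,
# and ENABLED or DISABLED for public IP assignment (defaults to DISABLED).
awsvpc_config() {
  local subnets="$1" sgs="$2" public_ip="${3:-DISABLED}"
  case "$public_ip" in
    ENABLED|DISABLED) ;;
    *) echo "assignPublicIp must be ENABLED or DISABLED" >&2; return 1 ;;
  esac
  printf 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=%s}\n' \
    "$subnets" "$sgs" "$public_ip"
}
```

It could then be passed as `--network-configuration "$(awsvpc_config subnet-12345678,subnet-87654321 sg-12345678 DISABLED)"`.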
Run Standalone Task
bash
aws ecs run-task \
  --cluster my-cluster \
  --task-definition my-batch-job:1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-12345678],
    securityGroups=[sg-12345678],
    assignPublicIp=ENABLED
  }"
Update Service (Deploy New Image)
bash
# Register new task definition with updated image
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Update service to use new version
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --task-definition web-app:2 \
  --force-new-deployment
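`update-service` returns as soon as the deployment is requested, not when it finishes. A minimal sketch of a wrapper that also waits for the rollout, assuming the `aws ecs wait services-stable` waiter is acceptable for your timeout needs (`deploy_and_wait` is a hypothetical name):

```bash
# Hypothetical wrapper: point a service at a task definition revision, then
# block until ECS reports the service as stable or the waiter times out.
deploy_and_wait() {
  local cluster="$1" service="$2" task_def="$3"
  # Changing the task definition starts a new deployment on its own.
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --task-definition "$task_def" >/dev/null || return 1
  # The waiter polls describe-services and exits non-zero if it gives up.
  aws ecs wait services-stable \
    --cluster "$cluster" \
    --services "$service"
}
```

For the example above this would be called as `deploy_and_wait my-cluster web-service web-app:2`; a non-zero exit status means either the update call or the waiter failed.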
Auto Scaling
bash
# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 10
# Target tracking policy
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'
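To build intuition for what a target of 70% CPU means, the sketch below estimates the task count a target tracking policy would aim for. This is a simplified proportional model and an assumption on my part, not AWS's documented algorithm: it ignores cooldowns, alarm evaluation periods, and metric smoothing. `target_tracking_estimate` is a hypothetical helper.

```bash
# Hypothetical estimate: ceil(current * metric / target), clamped to the
# registered min and max capacity. Inputs are whole-number percentages.
target_tracking_estimate() {
  local current="$1" metric="$2" target="$3" min="$4" max="$5" desired
  # Adding (target - 1) before dividing rounds the integer division up.
  desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}
```

For example, with 4 tasks averaging 90% CPU against the 70% target and the 2 to 10 range registered above, `target_tracking_estimate 4 90 70 2 10` prints `6`.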
CLI Reference
Cluster Management
| Command | Description |
|---|---|
| `aws ecs create-cluster` | Create cluster |
| `aws ecs describe-clusters` | Get cluster details |
| `aws ecs list-clusters` | List clusters |
| `aws ecs delete-cluster` | Delete cluster |
Task Definitions
| Command | Description |
|---|---|
| `aws ecs register-task-definition` | Create task definition |
| `aws ecs describe-task-definition` | Get task definition |
| `aws ecs list-task-definitions` | List task definitions |
| `aws ecs deregister-task-definition` | Deregister version |
Services
| Command | Description |
|---|---|
| `aws ecs create-service` | Create service |
| `aws ecs update-service` | Update service |
| `aws ecs describe-services` | Get service details |
| `aws ecs delete-service` | Delete service |
Tasks
| Command | Description |
|---|---|
| `aws ecs run-task` | Run standalone task |
| `aws ecs stop-task` | Stop running task |
| `aws ecs describe-tasks` | Get task details |
| `aws ecs list-tasks` | List tasks |
Best Practices
Security
- Use task roles for AWS API access (not access keys)
- Use execution roles for ECR/Secrets access
- Store secrets in Secrets Manager or Parameter Store
- Use private subnets with NAT gateway
- Enable CloudTrail for API auditing
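Both the task role and the execution role must be assumable by ECS tasks. As a minimal sketch, the hypothetical helper below (`write_ecs_trust_policy` and the default file name are illustrative) writes the trust policy that allows the `ecs-tasks.amazonaws.com` service principal to assume a role.

```bash
# Hypothetical helper: write a trust policy that lets ECS tasks assume a role.
# The same trust relationship is used for task roles and execution roles;
# what differs is the permissions policy attached afterwards.
write_ecs_trust_policy() {
  local out="${1:-ecs-trust-policy.json}"
  cat > "$out" << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}
```

The resulting file can be passed to `aws iam create-role` through `--assume-role-policy-document file://ecs-trust-policy.json`. For an execution role, the AWS managed policy `AmazonECSTaskExecutionRolePolicy` is a common starting point; a task role should receive only the permissions the application itself needs.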
Performance
- Right-size CPU/memory: monitor utilization and adjust
- Use Fargate Spot for fault-tolerant workloads (up to 70% savings)
- Enable Container Insights for monitoring
- Use service discovery for internal communication
Reliability
- Deploy across multiple AZs
- Configure health checks properly
- Set appropriate deregistration delay
- Use circuit breaker for deployments
bash
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }'
Cost Optimization
- Use Fargate Spot for batch workloads
- Right-size task resources
- Scale to zero when not needed
- Use capacity providers for mixed Fargate/Spot
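Scaling to zero can be done by setting the service's desired count to 0. The sketch below wraps that call; `scale_service` is a hypothetical helper name. Note the assumption that no scaling policy overrides the change: a service registered with Application Auto Scaling and a nonzero minimum capacity may have its count raised again.

```bash
# Hypothetical helper: set a service's desired task count (0 stops all tasks).
scale_service() {
  local cluster="$1" service="$2" count="$3"
  # Reject empty or non-numeric counts before calling the API.
  case "$count" in
    ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 1 ;;
  esac
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --desired-count "$count"
}
```

For example, `scale_service my-cluster web-service 0` stops the service's tasks outside working hours, and `scale_service my-cluster web-service 2` brings them back.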
Troubleshooting
Task Fails to Start
Check:
bash
# View stopped tasks
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks $(aws ecs list-tasks --cluster my-cluster --desired-status STOPPED --query 'taskArns[0]' --output text)
**Common causes:**
- Image not found (ECR permissions)
- Secrets access denied
- Network configuration (subnets, security groups)
- Resource limits exceeded
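To tell these causes apart, it often helps to pull only the stop-related fields out of the `describe-tasks` response. The sketch below is one way to do that; `task_stop_summary` is a hypothetical helper, and the query assumes the `stopCode`, `stoppedReason`, and per-container `exitCode` and `reason` fields of the `describe-tasks` output.

```bash
# Hypothetical helper: print why a task stopped, plus each container's
# name, exit code, and reason, using a JMESPath query on describe-tasks.
task_stop_summary() {
  local cluster="$1" task_arn="$2"
  aws ecs describe-tasks \
    --cluster "$cluster" \
    --tasks "$task_arn" \
    --query 'tasks[0].{stopCode:stopCode,stoppedReason:stoppedReason,containers:containers[].{name:name,exitCode:exitCode,reason:reason}}' \
    --output json
}
```

A `stoppedReason` that mentions pulling the image or retrieving secrets usually points at the execution role or networking rather than at the application.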
Container Keeps Restarting
Debug:
bash
# Check CloudWatch logs
aws logs get-log-events \
  --log-group-name /ecs/web-app \
  --log-stream-name "ecs/web/abc123"
# Check task details
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks task-arn \
  --query 'tasks[0].containers[0].{reason:reason,exitCode:exitCode}'
**Causes:**
- Health check failing
- Application crashing
- Out of memory
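The `exitCode` returned by the query above often narrows down which of these causes applies. The sketch below maps a few common codes to likely explanations, following the usual shell convention that codes above 128 mean 128 plus a signal number. `explain_exit_code` is a hypothetical helper, and the hints are heuristics rather than definitive diagnoses.

```bash
# Hypothetical helper: translate a container exit code into a likely cause.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "exited normally" ;;
    137) echo "SIGKILL (128+9): often the container exceeded its memory limit" ;;
    139) echo "SIGSEGV (128+11): the process crashed with a segmentation fault" ;;
    143) echo "SIGTERM (128+15): the container was asked to stop" ;;
    *)
      # Any other code above 128 is reported as its signal number.
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application error (exit code $code)"
      fi
      ;;
  esac
}
```

An exit code of 137 together with a `reason` that mentions memory is a strong sign that the task's `memory` setting is too low.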
Service Stuck Deploying
bash
# Check deployment status
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].deployments'
# Check events
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].events[:5]'
**Causes:**
- Health check failing on new tasks
- Not enough capacity
- Target group health checks failing
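The full `deployments` output is verbose. As a sketch, the hypothetical helper below (`deployment_rollout` is an illustrative name) narrows it to the counters that show whether new tasks are starting and failing; it assumes the `rolloutState`, `rolloutStateReason`, and `failedTasks` fields that `describe-services` reports for each deployment.

```bash
# Hypothetical helper: show one row per deployment with its rollout state
# and task counters, which makes a stalled rollout easier to spot.
deployment_rollout() {
  local cluster="$1" service="$2"
  aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[0].deployments[].{status:status,rollout:rolloutState,reason:rolloutStateReason,desired:desiredCount,running:runningCount,pending:pendingCount,failed:failedTasks}' \
    --output table
}
```

A growing `failed` count with `running` stuck below `desired` suggests the new tasks are failing health checks or cannot start at all.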
Cannot Pull Image from ECR
Check execution role has:
json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
Also check:
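- VPC endpoints for ECR (`ecr.api` and `ecr.dkr`) plus an S3 gateway endpoint, if tasks run in a private subnet without a NAT gateway
- NAT gateway route, if tasks in a private subnet rely on it for outbound access
- Security group allows HTTPS (port 443) outbound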
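Pull failures are sometimes caused by an image URI that points at the wrong account or region. The sketch below splits a private ECR image URI into its parts so they can be compared with the task's account and region. `parse_ecr_image` is a hypothetical helper; it assumes the standard `<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>` form and does not handle digest references.

```bash
# Hypothetical helper: split a private ECR image URI into account, region,
# repository, and tag. Assumes a tag-style reference (no @sha256 digest).
parse_ecr_image() {
  local uri="$1" host rest repo tag account region
  host=${uri%%/*}   # registry host, before the first slash
  rest=${uri#*/}    # repository[:tag], after the first slash
  case "$host" in
    *.dkr.ecr.*.amazonaws.com) ;;
    *) echo "not a private ECR image URI: $uri" >&2; return 1 ;;
  esac
  account=${host%%.*}
  region=${host#*.dkr.ecr.}
  region=${region%%.*}
  case "$rest" in
    *:*) repo=${rest%:*}; tag=${rest##*:} ;;
    *)   repo=$rest; tag=latest ;;
  esac
  echo "$account $region $repo $tag"
}
```

For the image in the task definition above, `parse_ecr_image 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest` prints `123456789012 us-east-1 my-app latest`.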