aws-cost-operations

AWS Cost & Operations

This skill provides comprehensive guidance for AWS cost optimization, monitoring, observability, and operational excellence with integrated MCP servers.

AWS Documentation Requirement

CRITICAL: This skill requires AWS MCP tools for accurate, up-to-date AWS information.

Before Answering AWS Questions

  1. Always verify using AWS MCP tools (if available):
    • mcp__aws-mcp__aws___search_documentation or mcp__*awsdocs*__aws___search_documentation - Search AWS docs
    • mcp__aws-mcp__aws___read_documentation or mcp__*awsdocs*__aws___read_documentation - Read specific pages
    • mcp__aws-mcp__aws___get_regional_availability - Check service availability
  2. If AWS MCP tools are unavailable:
    • Guide user to configure AWS MCP using the aws-mcp-setup skill (auto-loaded as dependency)
    • Help determine which option fits their environment:
      • Has uvx + AWS credentials → Full AWS MCP Server
      • No Python/credentials → AWS Documentation MCP (no auth)
    • If cannot determine → Ask user which option to use
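The fallback decision above can be sketched as a small function. This is only an illustration of the three branches in the list; the function name, the constants, and the use of `None` for "unknown" are hypothetical and not part of the skill.

```python
from typing import Optional

# Labels taken from the option list above.
FULL_SERVER = "Full AWS MCP Server"
DOCS_ONLY = "AWS Documentation MCP (no auth)"
ASK_USER = "Ask user which option to use"


def choose_mcp_option(has_uvx: Optional[bool],
                      has_credentials: Optional[bool]) -> str:
    """Pick a setup option; None means the fact could not be determined."""
    # Either prerequisite known to be missing: fall back to the docs-only server.
    if has_uvx is False or has_credentials is False:
        return DOCS_ONLY
    # Both prerequisites confirmed: the full server is an option.
    if has_uvx and has_credentials:
        return FULL_SERVER
    # Anything still unknown: ask rather than guess.
    return ASK_USER
```

For example, `choose_mcp_option(True, None)` returns the "ask" branch because the credential status is unknown.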

Integrated MCP Servers

This skill includes 8 MCP servers automatically configured with the plugin:

Cost Management Servers

1. AWS Billing and Cost Management MCP Server

Purpose: Real-time billing and cost management
  • View current AWS spending and trends
  • Analyze billing details across services
  • Track budget utilization
  • Monitor cost allocation tags
  • Review consolidated billing for organizations

2. AWS Pricing MCP Server

Purpose: Pre-deployment cost estimation and optimization
  • Estimate costs before deploying resources
  • Compare pricing across regions
  • Calculate Total Cost of Ownership (TCO)
  • Evaluate different service options for cost efficiency
  • Get current pricing information for AWS services

3. AWS Cost Explorer MCP Server

Purpose: Detailed cost analysis and reporting
  • Analyze historical spending patterns
  • Create custom cost reports
  • Identify cost anomalies and trends
  • Forecast future costs
  • Analyze cost by service, region, or tag
  • Generate cost optimization recommendations

Monitoring & Observability Servers

4. Amazon CloudWatch MCP Server

Purpose: Metrics, alarms, and logs analysis
  • Query CloudWatch metrics and logs
  • Create and manage CloudWatch alarms
  • Analyze application performance metrics
  • Troubleshoot operational issues
  • Set up custom dashboards
  • Monitor resource utilization

5. Amazon CloudWatch Application Signals MCP Server

Purpose: Application monitoring and performance insights
  • Monitor application health and performance
  • Analyze service-level objectives (SLOs)
  • Track application dependencies
  • Identify performance bottlenecks
  • Monitor service map and traces

6. AWS Managed Prometheus MCP Server

Purpose: Prometheus-compatible monitoring
  • Query Prometheus metrics
  • Monitor containerized applications
  • Analyze Kubernetes workload metrics
  • Create PromQL queries
  • Track custom application metrics

Audit & Security Servers

7. AWS CloudTrail MCP Server

Purpose: AWS API activity and audit analysis
  • Analyze AWS API calls and user activity
  • Track resource changes and modifications
  • Investigate security incidents
  • Audit compliance requirements
  • Identify unusual access patterns
  • Review who made what changes when

8. AWS Well-Architected Security Assessment Tool MCP Server

Purpose: Security assessment against Well-Architected Framework
  • Assess security posture against AWS best practices
  • Identify security gaps and vulnerabilities
  • Get security improvement recommendations
  • Review security pillar compliance
  • Generate security assessment reports

When to Use This Skill

Use this skill when:
  • Optimizing AWS costs and reducing spending
  • Estimating costs before deployment
  • Monitoring application and infrastructure performance
  • Setting up observability and alerting
  • Analyzing spending patterns and trends
  • Investigating operational issues
  • Auditing AWS activity and changes
  • Assessing security posture
  • Implementing operational excellence

Cost Optimization Best Practices

Pre-Deployment Cost Estimation

Always estimate costs before deploying:
  1. Use AWS Pricing MCP to estimate resource costs
  2. Compare pricing across different regions
  3. Evaluate alternative service options
  4. Calculate expected monthly costs
  5. Plan for scaling and growth
Example workflow:
"Estimate the monthly cost of running a Lambda function with
1 million invocations, 512MB memory, 3-second duration in us-east-1"
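As a rough check on the kind of answer the example prompt should produce, the arithmetic can be sketched in a few lines. The two default rates are assumed placeholders for x86 Lambda in us-east-1 and the free tier is ignored; current prices should be confirmed with the Pricing MCP before relying on the result. The function name is hypothetical.

```python
def lambda_monthly_cost(invocations: int,
                        memory_mb: int,
                        duration_s: float,
                        price_per_gb_second: float = 0.0000166667,  # assumed rate
                        price_per_million_requests: float = 0.20):  # assumed rate
    """Estimate monthly Lambda cost as compute charge plus request charge."""
    # Compute is billed in GB-seconds: invocations x duration x memory in GB.
    gb_seconds = invocations * duration_s * (memory_mb / 1024)
    compute_usd = gb_seconds * price_per_gb_second
    request_usd = invocations / 1_000_000 * price_per_million_requests
    return {
        "gb_seconds": gb_seconds,
        "compute_usd": round(compute_usd, 2),
        "request_usd": round(request_usd, 2),
        "total_usd": round(compute_usd + request_usd, 2),
    }


# The scenario from the example prompt: 1M invocations, 512 MB, 3 seconds.
estimate = lambda_monthly_cost(1_000_000, 512, 3)
```

Under these assumed rates the scenario works out to 1.5 million GB-seconds, so compute dominates the bill and the request charge is a small fraction of the total.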

Cost Analysis and Optimization

Regular cost reviews:
  1. Use Cost Explorer MCP to analyze spending trends
  2. Identify cost anomalies and unexpected charges
  3. Review costs by service, region, and environment
  4. Compare actual vs. budgeted costs
  5. Generate cost optimization recommendations
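The review steps above can be partly automated once cost data is in hand. The sketch below assumes a response shaped like the Cost Explorer `GetCostAndUsage` result grouped by service (`ResultsByTime` / `Groups` / `Metrics`); the helper names, the 20% threshold, and the sample figures are hypothetical.

```python
from collections import defaultdict


def cost_by_service(response: dict) -> dict:
    """Sum UnblendedCost per service across all returned time periods."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            totals[service] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(totals)


def flag_increases(previous: dict, current: dict, threshold_pct: float = 20.0) -> dict:
    """Return services whose cost grew by more than threshold_pct."""
    flagged = {}
    for service, cost in current.items():
        baseline = previous.get(service, 0.0)
        if baseline == 0:
            continue  # no baseline to compare against; review new services separately
        change_pct = (cost - baseline) / baseline * 100
        if change_pct > threshold_pct:
            flagged[service] = round(change_pct, 1)
    return flagged


def _sample(ec2: str, s3: str) -> dict:
    """Build a made-up response for illustration only."""
    return {"ResultsByTime": [{"Groups": [
        {"Keys": ["Amazon EC2"], "Metrics": {"UnblendedCost": {"Amount": ec2, "Unit": "USD"}}},
        {"Keys": ["Amazon S3"], "Metrics": {"UnblendedCost": {"Amount": s3, "Unit": "USD"}}},
    ]}]}


last_month = cost_by_service(_sample("100.0", "20.0"))
this_month = cost_by_service(_sample("150.0", "21.0"))
spikes = flag_increases(last_month, this_month)
```

With the sample figures, only the EC2 line crosses the threshold; the S3 change is too small to be flagged.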
Cost optimization strategies:
  • Right-size over-provisioned resources
  • Use appropriate storage classes (S3, EBS)
  • Implement auto-scaling for dynamic workloads
  • Leverage Savings Plans and Reserved Instances
  • Delete unused resources and snapshots
  • Use cost allocation tags effectively

Budget Monitoring

Track spending against budgets:
  1. Use Billing and Cost Management MCP to monitor budgets
  2. Set up budget alerts for threshold breaches
  3. Review budget utilization regularly
  4. Adjust budgets based on trends
  5. Implement cost controls and governance
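A minimal sketch of the threshold check behind budget alerts, assuming alert levels of 50%, 80%, and 100% and a simple linear run-rate projection. Both the levels and the function names are illustrative choices, not something the skill prescribes.

```python
def breached_thresholds(budget_usd: float,
                        actual_usd: float,
                        thresholds_pct=(50, 80, 100)) -> list:
    """Return the alert thresholds (in percent) that current spend has reached."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    used_pct = actual_usd / budget_usd * 100
    return [t for t in thresholds_pct if used_pct >= t]


def projected_month_end(actual_usd: float, day_of_month: int, days_in_month: int) -> float:
    """Project month-end spend by extending the average daily spend so far."""
    return actual_usd / day_of_month * days_in_month


# $850 spent against a $1,000 budget on day 17 of a 30-day month.
alerts = breached_thresholds(1000, 850)
forecast = projected_month_end(850, 17, 30)
```

In this example the 50% and 80% levels have been reached, and the run-rate projection already exceeds the budget, which is the sort of signal step 4 (adjusting budgets based on trends) is meant to act on.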

Monitoring and Observability Best Practices

CloudWatch Metrics and Alarms

Implement comprehensive monitoring:
  1. Use CloudWatch MCP to query metrics and logs
  2. Set up alarms for critical metrics:
    • CPU and memory utilization
    • Error rates and latency
    • Queue depths and processing times
    • API Gateway throttling
    • Lambda errors and timeouts
  3. Create CloudWatch dashboards for visualization
  4. Use CloudWatch Logs Insights for troubleshooting
Example alarm scenarios:
  • Lambda error rate > 1%
  • EC2 CPU utilization > 80%
  • API Gateway 4xx/5xx error spike
  • DynamoDB throttled requests
  • ECS task failures
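Two of the scenarios above can be written as parameter dictionaries in the shape accepted by the CloudWatch `PutMetricAlarm` API (for example via boto3's `put_metric_alarm(**params)`). This is a sketch only: the alarm names, the 5-minute period, and the three evaluation periods are assumptions, and nothing here calls AWS.

```python
def ec2_cpu_alarm(instance_id: str, threshold_pct: float = 80.0) -> dict:
    """Alarm when average EC2 CPU utilization stays above the threshold."""
    return {
        "AlarmName": f"ec2-cpu-high-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # assumed: 5-minute datapoints
        "EvaluationPeriods": 3,   # assumed: 3 consecutive breaches
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }


def lambda_error_rate_alarm(function_name: str, threshold_pct: float = 1.0) -> dict:
    """Alarm on Lambda error rate, computed with metric math as Errors / Invocations."""
    def stat(metric_id: str, metric_name: str) -> dict:
        return {
            "Id": metric_id,
            "ReturnData": False,  # inputs to the expression, not alarmed on directly
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
        }

    return {
        "AlarmName": f"lambda-error-rate-{function_name}",  # hypothetical naming scheme
        "Metrics": [
            stat("errors", "Errors"),
            stat("invocations", "Invocations"),
            {"Id": "error_rate", "Expression": "100 * errors / invocations",
             "Label": "Error rate (%)", "ReturnData": True},
        ],
        "EvaluationPeriods": 3,
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no errors
    }
```

Expressing the Lambda case as a rate rather than a raw error count keeps the threshold meaningful as traffic grows or shrinks.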

Application Performance Monitoring

Monitor application health:
  1. Use CloudWatch Application Signals MCP for APM
  2. Track service-level objectives (SLOs)
  3. Monitor application dependencies
  4. Identify performance bottlenecks
  5. Set up distributed tracing
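Tracking an SLO usually comes down to comparing observed failures with the failures the target allows (the error budget). The sketch below shows that arithmetic for a request-based availability SLO; the function name and the sample numbers are hypothetical, and Application Signals may compute attainment differently.

```python
def error_budget(slo_target_pct: float, total_requests: int, failed_requests: int) -> dict:
    """Compare observed failures with the failures the SLO target permits."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Failures allowed before the SLO is violated, e.g. 0.1% of requests at 99.9%.
    allowed_failures = total_requests * (100 - slo_target_pct) / 100
    attainment_pct = (total_requests - failed_requests) / total_requests * 100
    if allowed_failures:
        consumed_pct = failed_requests / allowed_failures * 100
    else:
        consumed_pct = float("inf")  # a 100% target leaves no budget at all
    return {
        "attainment_pct": round(attainment_pct, 3),
        "allowed_failures": round(allowed_failures, 2),
        "budget_consumed_pct": round(consumed_pct, 1),
        "slo_met": attainment_pct >= slo_target_pct,
    }


# 99.9% target, 1,000,000 requests in the window, 400 of them failed.
status = error_budget(99.9, 1_000_000, 400)
```

Here the service is inside its SLO with part of the error budget still unspent; alerting on budget consumption rather than on single failures tends to produce fewer, more actionable pages.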

Container and Kubernetes Monitoring

For containerized workloads:
  1. Use AWS Managed Prometheus MCP for metrics
  2. Monitor container resource utilization
  3. Track pod and node health
  4. Create PromQL queries for custom metrics
  5. Set up alerts for container anomalies
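Steps 2 to 5 typically translate into a handful of PromQL expressions. The helpers below build two such queries as strings. They assume the workspace receives the cAdvisor metric `container_cpu_usage_seconds_total` and the kube-state-metrics metric `kube_pod_container_status_restarts_total`; the function names, windows, and restart limit are illustrative.

```python
def cpu_by_pod_query(namespace: str, window: str = "5m") -> str:
    """Per-pod CPU usage (cores) averaged over the window."""
    return (
        "sum by (pod) (rate(container_cpu_usage_seconds_total"
        f'{{namespace="{namespace}"}}[{window}]))'
    )


def restart_alert_query(namespace: str, window: str = "15m", max_restarts: int = 3) -> str:
    """Containers that restarted more than max_restarts times within the window."""
    return (
        "increase(kube_pod_container_status_restarts_total"
        f'{{namespace="{namespace}"}}[{window}]) > {max_restarts}'
    )


cpu_query = cpu_by_pod_query("prod")
restart_query = restart_alert_query("prod")
```

The resulting strings can be passed to whatever query interface is in use; the second expression is written so that it returns series only when the restart limit is exceeded, which is the form alerting rules generally expect.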

Audit and Security Best Practices

CloudTrail Activity Analysis

Audit AWS activity:
  1. Use CloudTrail MCP to analyze API activity
  2. Track who made changes to resources
  3. Investigate security incidents
  4. Monitor for suspicious activity patterns
  5. Audit compliance with policies
Common audit scenarios:
  • "Who deleted this S3 bucket?"
  • "Show all IAM role changes in the last 24 hours"
  • "List failed login attempts"
  • "Find all actions by a specific user"
  • "Track modifications to security groups"
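A question such as "Who deleted this S3 bucket?" maps onto a CloudTrail event lookup filtered by event name. The sketch below builds parameters in the shape of the CloudTrail `LookupEvents` API (for example boto3's `lookup_events(**params)`) and summarizes a response of that shape. The helper names are hypothetical, the sample event is made up, and no AWS call is made.

```python
from datetime import datetime, timedelta, timezone


def lookup_by_event_name(event_name: str, hours: int = 24, now: datetime = None) -> dict:
    """Build LookupEvents parameters for one event name over a recent time window."""
    end = now or datetime.now(timezone.utc)
    return {
        # LookupEvents accepts a single lookup attribute per request.
        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": event_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }


def summarize(events: list) -> list:
    """Reduce returned events to (who, what, when) tuples."""
    return [(e.get("Username", "unknown"), e["EventName"], e["EventTime"]) for e in events]


fixed_now = datetime(2024, 1, 2, tzinfo=timezone.utc)   # fixed time for a repeatable example
params = lookup_by_event_name("DeleteBucket", now=fixed_now)

sample_events = [  # made-up event in the response's shape
    {"Username": "alice", "EventName": "DeleteBucket", "EventTime": fixed_now},
]
who = summarize(sample_events)
```

Other scenarios in the list follow the same pattern with a different attribute, for example filtering on `Username` to find all actions by a specific user.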

Security Assessment

Regular security reviews:
  1. Use Well-Architected Security Assessment MCP
  2. Assess security posture against best practices
  3. Identify security gaps and vulnerabilities
  4. Implement recommended security improvements
  5. Document security compliance
Security assessment areas:
  • Identity and Access Management (IAM)
  • Detective controls and monitoring
  • Infrastructure protection
  • Data protection and encryption
  • Incident response preparedness

Using MCP Servers Effectively

Cost Analysis Workflow

  1. Pre-deployment: Use Pricing MCP to estimate costs
  2. Post-deployment: Use Billing MCP to track actual spending
  3. Analysis: Use Cost Explorer MCP for detailed cost analysis
  4. Optimization: Implement recommendations from Cost Explorer

Monitoring Workflow

  1. Setup: Configure CloudWatch metrics and alarms
  2. Monitor: Use CloudWatch MCP to track key metrics
  3. Analyze: Use Application Signals for APM insights
  4. Troubleshoot: Query CloudWatch Logs for issue resolution

Security Workflow

  1. Audit: Use CloudTrail MCP to review activity
  2. Assess: Use Well-Architected Security Assessment
  3. Remediate: Implement security recommendations
  4. Monitor: Track security events via CloudWatch

MCP Usage Best Practices

  1. Cost Awareness: Check pricing before deploying resources
  2. Proactive Monitoring: Set up alarms for critical metrics
  3. Regular Reviews: Analyze costs and performance weekly
  4. Audit Trails: Review CloudTrail logs for compliance
  5. Security First: Run security assessments regularly
  6. Optimize Continuously: Act on cost and performance recommendations

Operational Excellence Guidelines

Cost Optimization

  • Tag Everything: Use consistent cost allocation tags
  • Review Monthly: Analyze spending trends and anomalies
  • Right-size: Match resources to actual usage
  • Automate: Use auto-scaling and scheduling
  • Monitor Budgets: Set alerts for cost overruns
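"Tag Everything" is easier to enforce with a simple compliance check. The sketch below reports which resources lack required cost allocation tags; the required tag keys, the resource record layout (`id` plus a `tags` mapping), and the sample inventory are all hypothetical.

```python
REQUIRED_TAGS = ("Environment", "Owner", "CostCenter")  # hypothetical tagging policy


def missing_tags(resources: list, required=REQUIRED_TAGS) -> dict:
    """Map each non-compliant resource id to the required tags it lacks."""
    report = {}
    for resource in resources:
        tags = resource.get("tags", {})
        # A tag counts as missing if the key is absent or its value is empty.
        missing = [key for key in required if not tags.get(key)]
        if missing:
            report[resource["id"]] = missing
    return report


inventory = [  # made-up inventory for illustration
    {"id": "i-001", "tags": {"Environment": "prod", "Owner": "team-a", "CostCenter": "42"}},
    {"id": "i-002", "tags": {"Environment": "dev"}},
    {"id": "vol-003", "tags": {}},
]
untagged = missing_tags(inventory)
```

Running a check like this on a schedule keeps untagged spend from accumulating, since costs on untagged resources cannot be attributed in later reviews.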

Monitoring and Alerting

  • Critical Metrics: Alert on business-critical metrics
  • Noise Reduction: Fine-tune thresholds to reduce false positives
  • Actionable Alerts: Ensure alerts have clear remediation steps
  • Dashboard Visibility: Create dashboards for key stakeholders
  • Log Retention: Balance cost and compliance needs

Security and Compliance

  • Least Privilege: Grant minimum required permissions
  • Audit Regularly: Review CloudTrail logs for anomalies
  • Encrypt Data: Use encryption at rest and in transit
  • Assess Continuously: Run security assessments frequently
  • Incident Response: Have procedures for security events

Additional Resources

For detailed operational patterns and best practices, refer to the comprehensive reference:
File: references/operations-patterns.md
This reference includes:
  • Cost optimization strategies
  • Monitoring and alerting patterns
  • Observability best practices
  • Security and compliance guidelines
  • Troubleshooting workflows

CloudWatch Alarms Reference

File: references/cloudwatch-alarms.md
Common alarm configurations for:
  • Lambda functions
  • EC2 instances
  • RDS databases
  • DynamoDB tables
  • API Gateway
  • ECS services
  • Application Load Balancers