aws-cost-operations
AWS Cost & Operations
This skill provides comprehensive guidance for AWS cost optimization, monitoring, observability, and operational excellence through integrated Model Context Protocol (MCP) servers.
AWS Documentation Requirement
CRITICAL: This skill requires AWS MCP tools for accurate, up-to-date AWS information.
Before Answering AWS Questions
- Always verify using AWS MCP tools (if available):
  - mcp__aws-mcp__aws___search_documentation or mcp__*awsdocs*__aws___search_documentation: Search AWS docs
  - mcp__aws-mcp__aws___read_documentation or mcp__*awsdocs*__aws___read_documentation: Read specific pages
  - mcp__aws-mcp__aws___get_regional_availability: Check service availability
- If AWS MCP tools are unavailable:
  - Guide the user to configure AWS MCP using the aws-mcp-setup skill (auto-loaded as a dependency)
  - Help determine which option fits their environment:
    - Has uvx + AWS credentials → Full AWS MCP Server
    - No Python/credentials → AWS Documentation MCP (no auth)
    - If you cannot determine which applies → Ask the user which option to use
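The fallback choice above can be written as a small decision helper. This is a hypothetical sketch. The function names and the credential heuristics (environment variables, files under ~/.aws) are assumptions for illustration and are not part of aws-mcp-setup. Missing hints are treated as "unknown" rather than "no credentials", because credentials can also come from SSO sessions or instance roles. That unknown case is the one where you ask the user.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Labels mirror the options listed above; the helper names are hypothetical.
FULL_SERVER = "Full AWS MCP Server"
DOCS_SERVER = "AWS Documentation MCP (no auth)"
ASK_USER = "Ask user which option to use"


def detect_uvx() -> bool:
    """True if a uvx executable is on PATH."""
    return shutil.which("uvx") is not None


def detect_aws_credentials() -> Optional[bool]:
    """Heuristic check; returns None when it cannot tell."""
    if os.environ.get("AWS_ACCESS_KEY_ID") or os.environ.get("AWS_PROFILE"):
        return True
    aws_dir = Path.home() / ".aws"
    if (aws_dir / "credentials").exists() or (aws_dir / "config").exists():
        return True
    # SSO sessions, instance roles, etc. leave no such hints, so don't say False.
    return None


def choose_mcp_option(has_uvx: Optional[bool], has_credentials: Optional[bool]) -> str:
    """Map environment facts to one of the three options."""
    if has_uvx is False or has_credentials is False:
        return DOCS_SERVER
    if has_uvx and has_credentials:
        return FULL_SERVER
    return ASK_USER


if __name__ == "__main__":
    print(choose_mcp_option(detect_uvx(), detect_aws_credentials()))
```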
Integrated MCP Servers
This skill includes 8 MCP servers automatically configured with the plugin:
Cost Management Servers
1. AWS Billing and Cost Management MCP Server
Purpose: Real-time billing and cost management
- View current AWS spending and trends
- Analyze billing details across services
- Track budget utilization
- Monitor cost allocation tags
- Review consolidated billing for organizations
2. AWS Pricing MCP Server
Purpose: Pre-deployment cost estimation and optimization
- Estimate costs before deploying resources
- Compare pricing across regions
- Calculate Total Cost of Ownership (TCO)
- Evaluate different service options for cost efficiency
- Get current pricing information for AWS services
3. AWS Cost Explorer MCP Server
Purpose: Detailed cost analysis and reporting
- Analyze historical spending patterns
- Create custom cost reports
- Identify cost anomalies and trends
- Forecast future costs
- Analyze cost by service, region, or tag
- Generate cost optimization recommendations
Monitoring & Observability Servers
4. Amazon CloudWatch MCP Server
Purpose: Metrics, alarms, and logs analysis
- Query CloudWatch metrics and logs
- Create and manage CloudWatch alarms
- Analyze application performance metrics
- Troubleshoot operational issues
- Set up custom dashboards
- Monitor resource utilization
5. Amazon CloudWatch Application Signals MCP Server
Purpose: Application monitoring and performance insights
- Monitor application health and performance
- Analyze service-level objectives (SLOs)
- Track application dependencies
- Identify performance bottlenecks
- Monitor service map and traces
6. AWS Managed Prometheus MCP Server
Purpose: Prometheus-compatible monitoring
- Query Prometheus metrics
- Monitor containerized applications
- Analyze Kubernetes workload metrics
- Create PromQL queries
- Track custom application metrics
Audit & Security Servers
7. AWS CloudTrail MCP Server
Purpose: AWS API activity and audit analysis
- Analyze AWS API calls and user activity
- Track resource changes and modifications
- Investigate security incidents
- Audit compliance requirements
- Identify unusual access patterns
- Review who made what changes when
8. AWS Well-Architected Security Assessment Tool MCP Server
Purpose: Security assessment against the AWS Well-Architected Framework
- Assess security posture against AWS best practices
- Identify security gaps and vulnerabilities
- Get security improvement recommendations
- Review security pillar compliance
- Generate security assessment reports
When to Use This Skill
Use this skill when:
- Optimizing AWS costs and reducing spending
- Estimating costs before deployment
- Monitoring application and infrastructure performance
- Setting up observability and alerting
- Analyzing spending patterns and trends
- Investigating operational issues
- Auditing AWS activity and changes
- Assessing security posture
- Implementing operational excellence
Cost Optimization Best Practices
Pre-Deployment Cost Estimation
Always estimate costs before deploying:
- Use AWS Pricing MCP to estimate resource costs
- Compare pricing across different regions
- Evaluate alternative service options
- Calculate expected monthly costs
- Plan for scaling and growth
Example workflow:
"Estimate the monthly cost of running a Lambda function with 1 million invocations, 512MB memory, 3-second duration in us-east-1"
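The arithmetic behind that example request can be sketched as follows. The unit prices are assumptions: the published x86 on-demand rates for us-east-1, $0.20 per million requests and $0.0000166667 per GB-second. Confirm current figures with the AWS Pricing MCP before relying on the result. The sketch ignores the free tier, ephemeral storage, and data transfer.

```python
# Assumed unit prices (us-east-1, x86, on-demand); confirm with the Pricing MCP.
PRICE_PER_MILLION_REQUESTS = 0.20   # USD
PRICE_PER_GB_SECOND = 0.0000166667  # USD, first pricing tier


def estimate_lambda_monthly_cost(invocations: int, memory_mb: int, duration_s: float) -> dict:
    """Rough monthly cost; ignores the free tier, ephemeral storage, and data transfer."""
    # Lambda bills compute as GB-seconds, with 1 GB = 1024 MB of configured memory.
    gb_seconds = invocations * (memory_mb / 1024) * duration_s
    compute_usd = gb_seconds * PRICE_PER_GB_SECOND
    request_usd = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return {
        "gb_seconds": gb_seconds,
        "compute_usd": round(compute_usd, 2),
        "request_usd": round(request_usd, 2),
        "total_usd": round(compute_usd + request_usd, 2),
    }


print(estimate_lambda_monthly_cost(1_000_000, 512, 3.0))
```

For the example request, this gives 1,500,000 GB-seconds and roughly $25.20 per month before the free tier. Compute accounts for almost all of it, so memory size and duration are the levers worth tuning.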
Cost Analysis and Optimization
Regular cost reviews:
- Use Cost Explorer MCP to analyze spending trends
- Identify cost anomalies and unexpected charges
- Review costs by service, region, and environment
- Compare actual vs. budgeted costs
- Generate cost optimization recommendations
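A review like this maps onto the Cost Explorer GetCostAndUsage API. The sketch below assumes boto3's ce client and a hypothetical Environment cost allocation tag. The helpers only build the request and post-process a response, so they run without AWS access. The month-over-month flag is a simple illustrative heuristic with arbitrary thresholds. It is not AWS Cost Anomaly Detection.

```python
from datetime import date
from typing import Dict, List, Optional


def monthly_cost_by_service_request(start: date, end: date,
                                    environment: Optional[str] = None) -> dict:
    """Parameters for Cost Explorer GetCostAndUsage, grouped by service.

    Cost Explorer treats End as exclusive, so pass the first day after the range.
    """
    params = {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }
    if environment is not None:
        # "Environment" is a hypothetical tag key; a tag must be activated for
        # cost allocation before Cost Explorer can filter on it.
        params["Filter"] = {"Tags": {"Key": "Environment", "Values": [environment]}}
    return params


def service_costs(response: dict) -> List[Dict[str, float]]:
    """One {service: cost} map per period in a GetCostAndUsage response."""
    return [
        {g["Keys"][0]: float(g["Metrics"]["UnblendedCost"]["Amount"])
         for g in period["Groups"]}
        for period in response["ResultsByTime"]
    ]


def flag_increases(previous: Dict[str, float], current: Dict[str, float],
                   pct: float = 20.0, min_usd: float = 10.0) -> Dict[str, float]:
    """Services whose cost rose by at least min_usd and at least pct percent."""
    flagged = {}
    for service, cost in current.items():
        before = previous.get(service, 0.0)
        delta = cost - before
        if delta >= min_usd and (before == 0 or delta / before * 100 >= pct):
            flagged[service] = round(delta, 2)
    return flagged


# Live usage (boto3 + credentials):
#   import boto3
#   resp = boto3.client("ce").get_cost_and_usage(
#       **monthly_cost_by_service_request(date(2024, 1, 1), date(2024, 3, 1)))
#   previous, current = service_costs(resp)[-2:]
#   print(flag_increases(previous, current))
```

Compare complete months only. The current month is partial and will look artificially cheap.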
Cost optimization strategies:
- Right-size over-provisioned resources
- Use appropriate S3 storage classes and EBS volume types
- Implement auto-scaling for dynamic workloads
- Leverage Savings Plans and Reserved Instances
- Delete unused resources and snapshots
- Use cost allocation tags effectively
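Deleting unused resources can start with EBS volumes that are not attached to any instance (state "available"). This sketch assumes the input has the shape returned by EC2 DescribeVolumes (boto3 ec2.describe_volumes). The per-GB-month prices are assumed us-east-1 rates (gp3 $0.08, gp2 $0.10), and the fallback price for other types is a placeholder, used only to rank findings. Snapshot anything you might still need before deleting.

```python
from typing import Iterable, List

# Assumed us-east-1 storage prices in USD per GB-month; verify with the Pricing MCP.
ASSUMED_GB_MONTH_PRICE = {"gp3": 0.08, "gp2": 0.10}
DEFAULT_GB_MONTH_PRICE = 0.10  # placeholder for volume types not listed above


def unattached_volumes(volumes: Iterable[dict]) -> List[dict]:
    """Unattached volumes with a rough monthly storage cost, most expensive first."""
    findings = []
    for vol in volumes:
        if vol.get("State") != "available":  # "available" means not attached
            continue
        price = ASSUMED_GB_MONTH_PRICE.get(vol.get("VolumeType"), DEFAULT_GB_MONTH_PRICE)
        findings.append({
            "VolumeId": vol["VolumeId"],
            "SizeGiB": vol["Size"],
            "EstMonthlyUsd": round(vol["Size"] * price, 2),
        })
    return sorted(findings, key=lambda f: f["EstMonthlyUsd"], reverse=True)


# Live usage (boto3 + credentials); a paginator handles large accounts:
#   import boto3
#   pages = boto3.client("ec2").get_paginator("describe_volumes").paginate(
#       Filters=[{"Name": "status", "Values": ["available"]}])
#   for page in pages:
#       print(unattached_volumes(page["Volumes"]))
```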
Budget Monitoring
Track spending against budgets:
- Use Billing and Cost Management MCP to monitor budgets
- Set up budget alerts for threshold breaches
- Review budget utilization regularly
- Adjust budgets based on trends
- Implement cost controls and governance
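AWS Budgets can alert on both actual and forecasted spend. The sketch below applies the same idea locally: given a monthly limit and percentage thresholds, it reports which thresholds each figure has crossed. The 50/80/100 thresholds are an assumed convention, not an AWS default. The second helper builds one entry in the shape that boto3's budgets.create_budget accepts in NotificationsWithSubscribers. The email address is a placeholder.

```python
from typing import Dict, List, Sequence


def crossed_thresholds(limit_usd: float, actual_usd: float, forecast_usd: float,
                       thresholds_pct: Sequence[float] = (50, 80, 100)) -> Dict[str, List[float]]:
    """Thresholds (percent of limit) that actual and forecasted spend exceed."""
    def crossed(amount: float) -> List[float]:
        used_pct = amount / limit_usd * 100
        return [t for t in thresholds_pct if used_pct > t]  # strict, like GREATER_THAN
    return {"ACTUAL": crossed(actual_usd), "FORECASTED": crossed(forecast_usd)}


def budget_notification(notification_type: str, threshold_pct: float, email: str) -> dict:
    """One NotificationsWithSubscribers entry for budgets.create_budget."""
    return {
        "Notification": {
            "NotificationType": notification_type,  # "ACTUAL" or "FORECASTED"
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": float(threshold_pct),
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
    }


print(crossed_thresholds(1000, 620, 1100))
# {'ACTUAL': [50], 'FORECASTED': [50, 80, 100]}
```

A forecast crossing 100% while actual spend sits at 62% is exactly the early warning a FORECASTED notification is meant to give.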
Monitoring and Observability Best Practices
CloudWatch Metrics and Alarms
Implement comprehensive monitoring:
- Use CloudWatch MCP to query metrics and logs
- Set up alarms for critical metrics:
  - CPU and memory utilization
  - Error rates and latency
  - Queue depths and processing times
  - API Gateway throttling
  - Lambda errors and timeouts
- Create CloudWatch dashboards for visualization
- Use CloudWatch Logs Insights for troubleshooting
Example alarm scenarios:
- Lambda error rate > 1%
- EC2 CPU utilization > 80%
- API Gateway 4xx/5xx error spike
- DynamoDB throttled requests
- ECS task failures
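The first scenario, a Lambda error rate above 1%, needs metric math. CloudWatch publishes Errors and Invocations as separate counts in the AWS/Lambda namespace. This sketch builds parameters for boto3's cloudwatch.put_metric_alarm. The function name, period, 2-of-3 evaluation, and optional SNS topic are assumptions to tune.

```python
from typing import Optional


def lambda_error_rate_alarm(function_name: str, threshold_pct: float = 1.0,
                            period_s: int = 300,
                            sns_topic_arn: Optional[str] = None) -> dict:
    """put_metric_alarm parameters: alarm when 100 * Errors / Invocations > threshold."""
    def metric(query_id: str, metric_name: str) -> dict:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": period_s,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs to the expression, not alarmed on directly
        }

    params = {
        "AlarmName": f"{function_name}-error-rate",
        "Metrics": [
            metric("errors", "Errors"),
            metric("invocations", "Invocations"),
            {
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,  # the alarm evaluates this series
            },
        ],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_pct,
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 2,  # 2 breaching periods out of the last 3
        "TreatMissingData": "notBreaching",  # periods without data count as OK
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params
```

Apply it with boto3.client("cloudwatch").put_metric_alarm(**lambda_error_rate_alarm("my-function")). Alarming on the ratio rather than the raw Errors count keeps the threshold meaningful as traffic grows.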
Application Performance Monitoring
Monitor application health:
- Use CloudWatch Application Signals MCP for APM
- Track service-level objectives (SLOs)
- Monitor application dependencies
- Identify performance bottlenecks
- Set up distributed tracing
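Tracking an SLO usually comes down to error-budget arithmetic. The sketch below is generic SRE math, not an Application Signals API. A 99.9% availability target over 30 days leaves 43.2 minutes of budget. The burn rate divides the observed error rate by the error rate the SLO allows.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO such as 0.999."""
    return window_days * 24 * 60 * (1 - slo)


def burn_rate(bad_requests: int, total_requests: int, slo: float) -> float:
    """Observed error rate divided by the error rate the SLO permits.

    1.0 spends the budget exactly over the window; above 1.0 exhausts it early.
    """
    if total_requests == 0:
        return 0.0
    return (bad_requests / total_requests) / (1 - slo)


print(round(error_budget_minutes(0.999), 1))    # 43.2
print(round(burn_rate(50, 10_000, 0.999), 2))   # 5.0
```

A sustained burn rate of 5 would use up a 30-day budget in about six days, which is usually worth paging on.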
Container and Kubernetes Monitoring
For containerized workloads:
- Use AWS Managed Prometheus MCP for metrics
- Monitor container resource utilization
- Track pod and node health
- Create PromQL queries for custom metrics
- Set up alerts for container anomalies
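The PromQL below sketches the kinds of queries this covers. The metric names assume that the common cAdvisor (container_*) and kube-state-metrics (kube_*) exporters are scraped into the workspace, and the "payments" namespace is hypothetical. Adjust both to what your collectors actually emit. The helper only assembles query strings. Sending them to an Amazon Managed Service for Prometheus workspace requires SigV4-signed HTTP requests, which this sketch leaves out.

```python
from typing import Sequence


def summed_rate(metric: str, matchers: str, window: str, by: Sequence[str]) -> str:
    """sum by (<by>) (rate(<metric>{<matchers>}[<window>])) for a counter metric."""
    selector = f"{metric}{{{matchers}}}" if matchers else metric
    return f"sum by ({', '.join(by)}) (rate({selector}[{window}]))"


QUERIES = {
    # CPU cores used per pod; container!="" drops cAdvisor's pod-level aggregate series.
    "pod_cpu": summed_rate("container_cpu_usage_seconds_total",
                           'namespace="payments", container!=""', "5m", ["pod"]),
    # Containers that restarted more than 3 times in the last hour.
    "restarts": "increase(kube_pod_container_status_restarts_total[1h]) > 3",
    # Nodes whose Ready condition is not currently true.
    "nodes_not_ready": 'kube_node_status_condition{condition="Ready", status="true"} == 0',
}

for name, query in QUERIES.items():
    print(f"{name}: {query}")
```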
Audit and Security Best Practices
CloudTrail Activity Analysis
Audit AWS activity:
- Use CloudTrail MCP to analyze API activity
- Track who made changes to resources
- Investigate security incidents
- Monitor for suspicious activity patterns
- Audit compliance with policies
Common audit scenarios:
- "Who deleted this S3 bucket?"
- "Show all IAM role changes in the last 24 hours"
- "List failed login attempts"
- "Find all actions by a specific user"
- "Track modifications to security groups"
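The second scenario ("Show all IAM role changes in the last 24 hours") can be sketched with boto3's cloudtrail.lookup_events. That API accepts only one lookup attribute per call, so the query narrows by event source (iam.amazonaws.com) and filters role-related event names client-side. The event-name set is an illustrative subset, not an exhaustive list. IAM is a global service whose events are recorded in us-east-1. LookupEvents covers only the last 90 days of management events.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Illustrative subset of role-related IAM API calls (not exhaustive).
ROLE_CHANGE_EVENTS = {
    "CreateRole", "DeleteRole", "UpdateRole", "UpdateAssumeRolePolicy",
    "AttachRolePolicy", "DetachRolePolicy", "PutRolePolicy", "DeleteRolePolicy",
}


def iam_lookup_params(hours: int = 24, now: Optional[datetime] = None) -> dict:
    """lookup_events parameters for IAM activity in the last `hours` hours."""
    end = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventSource", "AttributeValue": "iam.amazonaws.com"}
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }


def role_changes(events: Iterable[dict]) -> List[Tuple[datetime, str, str]]:
    """(EventTime, Username, EventName) for role-related events only."""
    return [
        (e["EventTime"], e.get("Username", "unknown"), e["EventName"])
        for e in events
        if e.get("EventName") in ROLE_CHANGE_EVENTS
    ]


# Live usage (boto3 + credentials):
#   import boto3
#   ct = boto3.client("cloudtrail", region_name="us-east-1")
#   for page in ct.get_paginator("lookup_events").paginate(**iam_lookup_params()):
#       for row in role_changes(page["Events"]):
#           print(row)
```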
Security Assessment
Regular security reviews:
- Use Well-Architected Security Assessment MCP
- Assess security posture against best practices
- Identify security gaps and vulnerabilities
- Implement recommended security improvements
- Document security compliance
Security assessment areas:
- Identity and Access Management (IAM)
- Detective controls and monitoring
- Infrastructure protection
- Data protection and encryption
- Incident response preparedness
Using MCP Servers Effectively
Cost Analysis Workflow
- Pre-deployment: Use Pricing MCP to estimate costs
- Post-deployment: Use Billing MCP to track actual spending
- Analysis: Use Cost Explorer MCP for detailed cost analysis
- Optimization: Implement recommendations from Cost Explorer
Monitoring Workflow
- Setup: Configure CloudWatch metrics and alarms
- Monitor: Use CloudWatch MCP to track key metrics
- Analyze: Use Application Signals for APM insights
- Troubleshoot: Query CloudWatch Logs for issue resolution
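For the troubleshooting step, a CloudWatch Logs Insights query is often the quickest route. This sketch assumes boto3's logs client. start_query takes epoch-second start and end times and returns a queryId, which you poll with get_query_results until the status is Complete. The log group name is hypothetical. Matching on ERROR assumes your application writes that string.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Optional

ERRORS_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 50"
)


def start_query_params(log_group: str, minutes: int = 60,
                       now: Optional[datetime] = None) -> dict:
    """start_query parameters covering the last `minutes` minutes."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(end.timestamp()),
        "queryString": ERRORS_QUERY,
    }


def run_query(logs_client, params: dict, poll_s: float = 1.0, timeout_s: float = 60.0) -> list:
    """Start a Logs Insights query and poll until it completes."""
    query_id = logs_client.start_query(**params)["queryId"]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = logs_client.get_query_results(queryId=query_id)
        status = resp["status"]
        if status == "Complete":
            return resp["results"]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"Logs Insights query {query_id} ended with status {status}")
        time.sleep(poll_s)  # still Scheduled or Running
    raise TimeoutError(f"Logs Insights query {query_id} did not finish in {timeout_s}s")


# Live usage (boto3 + credentials):
#   import boto3
#   rows = run_query(boto3.client("logs"), start_query_params("/aws/lambda/my-function"))
```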
Security Workflow
- Audit: Use CloudTrail MCP to review activity
- Assess: Use Well-Architected Security Assessment
- Remediate: Implement security recommendations
- Monitor: Track security events via CloudWatch
MCP Usage Best Practices
- Cost Awareness: Check pricing before deploying resources
- Proactive Monitoring: Set up alarms for critical metrics
- Regular Reviews: Analyze costs and performance weekly
- Audit Trails: Review CloudTrail logs for compliance
- Security First: Run security assessments regularly
- Optimize Continuously: Act on cost and performance recommendations
Operational Excellence Guidelines
Cost Optimization
- Tag Everything: Use consistent cost allocation tags
- Review Monthly: Analyze spending trends and anomalies
- Right-size: Match resources to actual usage
- Automate: Use auto-scaling and scheduling
- Monitor Budgets: Set alerts for cost overruns
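Consistent tagging is easier to keep when it is checked automatically. This sketch assumes a hypothetical required-tag set (CostCenter, Environment, Owner). It also assumes resource records in the shape returned by the Resource Groups Tagging API's get_resources: a ResourceARN plus a Tags list of Key/Value pairs. For each resource, it reports which required tags are missing or empty. get_resources only returns resources that are or were tagged, so never-tagged resources need a separate inventory source.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical tagging policy; replace with your organization's required keys.
REQUIRED_TAGS: Set[str] = {"CostCenter", "Environment", "Owner"}


def missing_tags(resources: Iterable[dict],
                 required: Set[str] = REQUIRED_TAGS) -> Dict[str, List[str]]:
    """Map ResourceARN to the sorted required tag keys that are absent or empty."""
    report = {}
    for res in resources:
        # Tag keys are case-sensitive, so "owner" does not satisfy "Owner".
        present = {t["Key"] for t in res.get("Tags", []) if t.get("Value", "").strip()}
        missing = sorted(required - present)
        if missing:
            report[res["ResourceARN"]] = missing
    return report


# Live usage (boto3 + credentials):
#   import boto3
#   client = boto3.client("resourcegroupstaggingapi")
#   for page in client.get_paginator("get_resources").paginate():
#       print(missing_tags(page["ResourceTagMappingList"]))
```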
Monitoring and Alerting
- Critical Metrics: Alert on business-critical metrics
- Noise Reduction: Fine-tune thresholds to reduce false positives
- Actionable Alerts: Ensure alerts have clear remediation steps
- Dashboard Visibility: Create dashboards for key stakeholders
- Log Retention: Balance cost and compliance needs
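One lever for noise reduction is CloudWatch's "M out of N" evaluation (DatapointsToAlarm out of EvaluationPeriods), under which a single spike no longer triggers the alarm. The function below simulates that rule over a series of datapoints, so you can try thresholds against historical data before changing a real alarm. It is a simplification: it ignores missing-data handling and how CloudWatch aligns evaluation windows. The CPU series is made-up sample data.

```python
from typing import List, Sequence


def alarm_states(values: Sequence[float], threshold: float,
                 evaluation_periods: int, datapoints_to_alarm: int) -> List[str]:
    """ALARM/OK for each point once a full window of N datapoints exists.

    A datapoint breaches when value > threshold (GreaterThanThreshold); the
    alarm fires when at least M of the last N datapoints breach.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    states = []
    for end in range(evaluation_periods, len(values) + 1):
        window = values[end - evaluation_periods:end]
        breaches = sum(1 for v in window if v > threshold)
        states.append("ALARM" if breaches >= datapoints_to_alarm else "OK")
    return states


cpu = [45, 92, 50, 48, 85, 88, 91, 60]
print(alarm_states(cpu, 80, 1, 1))  # every spike alarms
print(alarm_states(cpu, 80, 3, 2))  # only the sustained run alarms
```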
Security and Compliance
- Least Privilege: Grant minimum required permissions
- Audit Regularly: Review CloudTrail logs for anomalies
- Encrypt Data: Use encryption at rest and in transit
- Assess Continuously: Run security assessments frequently
- Incident Response: Have procedures for security events
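A quick least-privilege check is to look for Allow statements that grant wildcard actions or apply to every resource. This is a deliberately narrow heuristic over an IAM policy document (JSON), and the finding messages are illustrative. It does not flag partial wildcards such as s3:Get*, and it is no substitute for IAM Access Analyzer policy validation.

```python
import json
from typing import List, Union


def _as_list(value: Union[str, list, None]) -> list:
    """IAM allows a single string or a list for Statement, Action, and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy_json: str) -> List[str]:
    """Flag Allow statements with "*" or "service:*" actions, NotAction, or "*" resources."""
    policy = json.loads(policy_json)
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        sid = stmt.get("Sid", f"statement {i}")
        for action in _as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{sid}: wildcard action {action!r}")
        if "NotAction" in stmt:
            findings.append(f"{sid}: Allow with NotAction grants everything not listed")
        if "*" in _as_list(stmt.get("Resource")):
            findings.append(f"{sid}: applies to all resources ('*')")
    return findings
```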
Additional Resources
For detailed operational patterns and best practices, refer to the comprehensive reference:
File: references/operations-patterns.md
This reference includes:
- Cost optimization strategies
- Monitoring and alerting patterns
- Observability best practices
- Security and compliance guidelines
- Troubleshooting workflows
CloudWatch Alarms Reference
File: references/cloudwatch-alarms.md
Common alarm configurations for:
- Lambda functions
- EC2 instances
- RDS databases
- DynamoDB tables
- API Gateway
- ECS services
- Application Load Balancers