aws-cost-finops
AWS Cost Optimization & FinOps
Systematic workflows for AWS cost optimization and financial operations management.
When to Use This Skill
Use this skill when you need to:
- Find cost savings: Identify unused resources, rightsizing opportunities, or commitment discounts
- Analyze spending: Understand cost trends, detect anomalies, or break down costs
- Optimize architecture: Choose cost-effective services, storage tiers, or instance types
- Implement FinOps: Set up governance, tagging, budgets, or monthly reviews
- Make purchase decisions: Evaluate Reserved Instances, Savings Plans, or Spot instances
- Troubleshoot costs: Investigate unexpected bills or cost spikes
- Plan budgets: Forecast costs or evaluate impact of new projects
Cost Optimization Workflow
Follow this systematic approach for AWS cost optimization:
┌─────────────────────────────────────────────┐
│ 1. DISCOVER │
│ What are we spending money on? │
│ Run: find_unused_resources.py │
│ Run: cost_anomaly_detector.py │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 2. ANALYZE │
│ Where are the optimization opportunities?│
│ Run: rightsizing_analyzer.py │
│ Run: detect_old_generations.py │
│ Run: spot_recommendations.py │
│ Run: analyze_ri_recommendations.py │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 3. PRIORITIZE │
│ What should we optimize first? │
│ - Quick wins (low risk, high savings) │
│ - Low-hanging fruit (easy to implement) │
│ - Strategic improvements │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 4. IMPLEMENT │
│ Execute optimization actions │
│ - Delete unused resources │
│ - Rightsize instances │
│ - Purchase commitments │
│ - Migrate to new generations │
└─────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────┐
│ 5. MONITOR │
│ Verify savings and track metrics │
│ - Monthly cost reviews │
│ - Tag compliance monitoring │
│ - Budget variance tracking │
└─────────────────────────────────────────────┘
Core Workflows
Workflow 1: Monthly Cost Optimization Review
Frequency: Run monthly (first week of each month)
**Step 1: Find Unused Resources**
```bash
# Scan for waste across all resources
python3 scripts/find_unused_resources.py

# Expected output:
# - Unattached EBS volumes
# - Old snapshots
# - Unused Elastic IPs
# - Idle NAT Gateways
# - Idle EC2 instances
# - Unused load balancers
# - Estimated monthly savings
```
**Step 2: Analyze Cost Anomalies**
```bash
# Detect unusual spending patterns
python3 scripts/cost_anomaly_detector.py --days 30

# Expected output:
# - Cost spikes and anomalies
# - Top cost drivers
# - Period-over-period comparison
# - 30-day forecast
```
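As a rough illustration of what spike detection can look like, the sketch below flags days whose cost sits well above the recent mean. It is a simple z-score approach chosen for illustration; `cost_anomaly_detector.py` may work differently, and `find_cost_spikes` and its `threshold` parameter are hypothetical names.

```python
from statistics import mean, stdev


def find_cost_spikes(daily_costs, threshold=2.0):
    """Return (day_index, cost) pairs more than `threshold` sample
    standard deviations above the mean of the series."""
    if len(daily_costs) < 2:
        return []
    mu = mean(daily_costs)
    sigma = stdev(daily_costs)
    if sigma == 0:
        return []  # perfectly flat spend, nothing to flag
    return [(i, c) for i, c in enumerate(daily_costs)
            if (c - mu) / sigma > threshold]


# Nine flat days followed by one day at double the usual spend.
print(find_cost_spikes([100.0] * 9 + [200.0]))
```

A z-score is sensitive to the spike itself inflating the standard deviation, so a median-based threshold may be a better fit for short or noisy windows.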
**Step 3: Identify Rightsizing Opportunities**
```bash
# Find oversized instances
python3 scripts/rightsizing_analyzer.py --days 30

# Expected output:
# - EC2 instances with low utilization
# - RDS instances with low utilization
# - Recommended smaller instance types
# - Estimated savings
```
**Step 4: Generate Monthly Report**
```bash
# Use the template to compile findings
cp assets/templates/monthly_cost_report.md reports/$(date +%Y-%m)-cost-report.md

# Fill in:
# - Findings from scripts
# - Action items
# - Team cost breakdowns
# - Optimization wins
```
**Step 5: Team Review Meeting**
- Present findings to engineering teams
- Assign optimization tasks
- Track action items to completion
---
Workflow 2: Commitment Purchase Analysis (RI/Savings Plans)
When: Quarterly or when usage patterns stabilize
**Step 1: Analyze Current Usage**
```bash
# Identify workloads suitable for commitments
python3 scripts/analyze_ri_recommendations.py --days 60

# Looks for:
# - EC2 instances running consistently for 60+ days
# - RDS instances with stable usage
# - Calculates ROI for 1yr vs 3yr commitments
```
**Step 2: Review Recommendations**
Evaluate each recommendation:
✅ Good candidate if:
- Running 24/7 for 60+ days
- Workload is stable and predictable
- No plans to change architecture
- Savings > 30%
❌ Poor candidate if:
- Workload is variable or experimental
- Architecture changes planned
- Instance type may change
- Dev/test environment
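The checklist above can be applied as a simple screen over a list of workloads. The sketch below is one way to encode it; the `Workload` fields and the function name are hypothetical and not part of the repository's scripts.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    days_running: int               # days of consistent 24/7 usage
    is_stable: bool                 # stable and predictable load
    architecture_change_planned: bool
    instance_type_may_change: bool
    environment: str                # e.g. "prod", "dev", "test"
    estimated_savings_pct: float


def is_good_commitment_candidate(w: Workload) -> bool:
    """Apply the good/poor candidate criteria from Step 2."""
    return (
        w.days_running >= 60
        and w.is_stable
        and not w.architecture_change_planned
        and not w.instance_type_may_change
        and w.environment not in {"dev", "test"}
        and w.estimated_savings_pct > 30
    )


api = Workload("api", 90, True, False, False, "prod", 35.0)
ci = Workload("ci-runner", 90, True, False, False, "dev", 35.0)
print(is_good_commitment_candidate(api), is_good_commitment_candidate(ci))
```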
**Step 3: Choose Commitment Type**
Discount figures below are approximate and vary by term, payment option, region, and instance type.
**Reserved Instances**:
- Standard RI: Highest RI discount (about 63%), least flexibility
- Convertible RI: Moderate discount (about 54%), can change instance type
- Best for: Specific instance types, stable workloads
**Savings Plans**:
- Compute SP: Flexible across instance types and regions (up to 66% savings)
- EC2 Instance SP: Flexible across sizes in the same family (up to 72% savings)
- Best for: Variable workloads within constraints
**Decision Matrix**:
Known instance type, won't change → Standard RI
May need to change types → Convertible RI or Compute SP
Variable workloads → Compute Savings Plan
Maximum flexibility → Compute Savings Plan
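The decision matrix reads naturally as a small lookup function. This is only a sketch of the four rows above; the function and parameter names are hypothetical.

```python
def choose_commitment(instance_type_fixed: bool,
                      workload_variable: bool = False,
                      need_max_flexibility: bool = False) -> str:
    """Map the decision matrix rows to a commitment type."""
    if workload_variable or need_max_flexibility:
        return "Compute Savings Plan"
    if instance_type_fixed:
        return "Standard RI"
    return "Convertible RI or Compute Savings Plan"


print(choose_commitment(instance_type_fixed=True))
print(choose_commitment(instance_type_fixed=False))
print(choose_commitment(instance_type_fixed=True, workload_variable=True))
```

Checking variability first means a variable workload never lands on a Standard RI, even if its current instance type happens to be fixed.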
**Step 4: Purchase and Track**
- Purchase through AWS Console or CLI
- Tag commitments with purchase date and owner
- Monitor utilization monthly
- Aim for >90% utilization
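Utilization here is taken to mean hours actually covered by the commitment divided by hours purchased. A minimal sketch of the monthly check under that assumption, with hypothetical function names:

```python
def commitment_utilization(used_hours: float, purchased_hours: float) -> float:
    """Percentage of purchased commitment hours that were actually used."""
    if purchased_hours <= 0:
        raise ValueError("purchased_hours must be positive")
    return used_hours / purchased_hours * 100


def needs_attention(used_hours: float, purchased_hours: float,
                    target_pct: float = 90.0) -> bool:
    """True when utilization does not exceed the target (the >90% goal)."""
    return commitment_utilization(used_hours, purchased_hours) <= target_pct


# One commitment over a 730-hour month, used for 600 of those hours.
print(round(commitment_utilization(600, 730), 1), needs_attention(600, 730))
```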
**Reference**: See `references/best_practices.md` for detailed commitment strategies
---
Workflow 3: Instance Generation Migration
When: During architecture reviews or optimization sprints
**Step 1: Detect Old Instances**
```bash
# Find outdated instance generations
python3 scripts/detect_old_generations.py

# Identifies:
# - t2 → t3 migrations (10% savings)
# - m4 → m5 → m6i migrations
# - Intel → Graviton opportunities (20% savings)
```
**Step 2: Prioritize Migrations**
**Quick Wins (Low Risk)**:
- t2 → t3: Drop-in replacement, 10% savings
- m4 → m5: Better performance, 5% savings
- gp2 → gp3: No downtime, 20% savings
**Medium Effort (Test Required)**:
- x86 → Graviton (ARM64): 20% savings
  - Requires ARM64 compatibility testing
  - Most modern frameworks support ARM64
  - Test in staging first
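To turn those percentages into dollar figures for prioritization, a back-of-the-envelope estimate is usually enough. The sketch below uses only the rates quoted above; the dictionary keys and function name are hypothetical, and real savings depend on region and pricing at the time of migration.

```python
# Approximate savings rates taken from the lists above.
SAVINGS_RATE = {
    "t2->t3": 0.10,
    "m4->m5": 0.05,
    "gp2->gp3": 0.20,
    "x86->graviton": 0.20,
}


def estimate_monthly_savings(current_monthly_cost: float, migration: str) -> float:
    """Estimated monthly savings in the same currency as the input cost."""
    return round(current_monthly_cost * SAVINGS_RATE[migration], 2)


# $1,000/month of gp2 volumes and $500/month of m4 instances.
print(estimate_monthly_savings(1000.0, "gp2->gp3"))
print(estimate_monthly_savings(500.0, "m4->m5"))
```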
**Step 3: Execute Migration**
**For EC2 (x86 to x86)**:
1. Stop instance
2. Change instance type
3. Start instance
4. Verify application
**For Graviton Migration**:
1. Create ARM64 AMI or Docker image
2. Launch new Graviton instance
3. Test thoroughly
4. Cut over traffic
5. Terminate old instance
**Step 4: Validate Savings**
- Monitor new costs in Cost Explorer
- Verify performance is acceptable
- Document migration for other teams
**Reference**: See `references/best_practices.md` → Compute Optimization
---
Workflow 4: Spot Instance Evaluation
When: For fault-tolerant workloads or Auto Scaling Groups
**Step 1: Identify Candidates**
```bash
# Analyze workloads for Spot suitability
python3 scripts/spot_recommendations.py

# Evaluates:
# - Instances in Auto Scaling Groups (good candidates)
# - Dev/test/staging environments
# - Batch processing workloads
# - CI/CD and build servers
```
**Step 2: Assess Suitability**
**Excellent for Spot**:
- Stateless applications
- Batch jobs
- CI/CD pipelines
- Data processing
- Auto Scaling Groups
**NOT suitable for Spot**:
- Databases (without replicas)
- Stateful applications
- Real-time services
- Mission-critical workloads
**Step 3: Implementation Strategy**
**Option 1: Fargate Spot (Easiest)**
```yaml
# ECS task definition
requiresCompatibilities:
  - FARGATE

# ECS service (capacityProviderStrategy is set on the service, not in the task definition)
capacityProviderStrategy:
  - capacityProvider: FARGATE_SPOT
    weight: 70  # 70% Spot
  - capacityProvider: FARGATE
    weight: 30  # 30% On-Demand
```
**Option 2: EC2 Auto Scaling with Spot**
```yaml
# Mixed instances policy
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 2
    OnDemandPercentageAboveBaseCapacity: 30
    SpotAllocationStrategy: capacity-optimized
  LaunchTemplate:
    Overrides:
      - InstanceType: m5.large
      - InstanceType: m5a.large
      - InstanceType: m5n.large
```
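To see what the mixed instances policy means for a given group size, the split can be worked out by hand. The sketch below assumes the first `OnDemandBaseCapacity` instances are On-Demand and that the percentage above the base is rounded up in favor of On-Demand; `split_capacity` is a hypothetical helper, not an AWS API.

```python
import math


def split_capacity(desired: int, base: int = 2, od_pct_above_base: int = 30):
    """Return (on_demand, spot) instance counts for a desired capacity."""
    on_demand_base = min(desired, base)
    above_base = desired - on_demand_base
    # Assumed rounding: fractional On-Demand instances round up.
    on_demand_above = math.ceil(above_base * od_pct_above_base / 100)
    on_demand = on_demand_base + on_demand_above
    return on_demand, desired - on_demand


for size in (1, 5, 12):
    print(size, split_capacity(size))
```

With the values from the policy above, a group of 12 works out to 5 On-Demand and 7 Spot instances, so the effective Spot share is lower than the 70% the percentage alone suggests.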
**Option 3: EC2 Spot Fleet**
```bash
# Create Spot Fleet with diverse instance types
aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet.json
```
**Step 4: Implement Interruption Handling**
```bash
# Handle 2-minute termination notice
# Instance metadata: /latest/meta-data/spot/instance-action

# In application:
# - Poll for termination notice
# - Shut down gracefully (save state)
# - Drain connections
# - Exit
```
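The polling loop described above might look like the following sketch. It assumes IMDSv2 (a session token fetched with a PUT request) and that the `spot/instance-action` path returns HTTP 404 until an interruption is scheduled; the function names and the `on_interrupt` callback are hypothetical, and the real shutdown logic is left to the application.

```python
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254"


def parse_instance_action(body: str):
    """Return (action, time) from the instance-action JSON document."""
    doc = json.loads(body)
    return doc["action"], doc["time"]


def fetch_instance_action(timeout: float = 2.0):
    """Return the raw instance-action document, or None if nothing is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS}/latest/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
    req = urllib.request.Request(
        f"{IMDS}/latest/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        return urllib.request.urlopen(req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption notice yet
            return None
        raise


def wait_for_interruption(on_interrupt, poll_seconds: float = 5,
                          fetch=fetch_instance_action):
    """Poll until a notice appears, then call on_interrupt(action, time)."""
    while True:
        body = fetch()
        if body is not None:
            on_interrupt(*parse_instance_action(body))
            return
        time.sleep(poll_seconds)
```

On an instance, `wait_for_interruption(handler)` would typically run in a background thread, with `handler` saving state, draining connections, and exiting. The `fetch` parameter exists so the loop can be exercised off-instance with a stub.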
**Reference**: See `references/best_practices.md` → Compute Optimization → Spot Instances
---
Quick Reference: Cost Optimization Scripts
All Scripts Location
```bash
ls scripts/
# find_unused_resources.py
# analyze_ri_recommendations.py
# detect_old_generations.py
# spot_recommendations.py
# rightsizing_analyzer.py
# cost_anomaly_detector.py
```
Script Usage Patterns
Monthly Review (Run all):
```bash
python3 scripts/find_unused_resources.py
python3 scripts/cost_anomaly_detector.py --days 30
python3 scripts/rightsizing_analyzer.py --days 30
```
Quarterly Optimization:
```bash
python3 scripts/analyze_ri_recommendations.py --days 60
python3 scripts/detect_old_generations.py
python3 scripts/spot_recommendations.py
```
Specific Region Only:
```bash
python3 scripts/find_unused_resources.py --region us-east-1
python3 scripts/rightsizing_analyzer.py --region us-west-2
```
Named AWS Profile:
```bash
python3 scripts/find_unused_resources.py --profile production
python3 scripts/cost_anomaly_detector.py --profile production --days 60
```
Script Requirements
```bash
# Install dependencies
pip install boto3 tabulate

# AWS credentials required
# Configure via: aws configure
# Or use: --profile PROFILE_NAME
```
---
Service-Specific Optimization
Compute Optimization
Key Actions:
- Migrate to Graviton (20% savings)
- Use Spot for fault-tolerant workloads (70% savings)
- Purchase RIs for stable workloads (40-65% savings)
- Right-size oversized instances
Reference: `references/best_practices.md` → Compute Optimization
Storage Optimization
Key Actions:
- Convert gp2 → gp3 (20% savings)
- Implement S3 lifecycle policies (50-95% savings)
- Delete old snapshots
- Use S3 Intelligent-Tiering
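A lifecycle policy is a set of rules that move objects to cheaper storage classes as they age and eventually expire them. The sketch below builds one such rule as a plain dictionary in the shape expected by boto3's `put_bucket_lifecycle_configuration`; the prefix, day counts, and function name are illustrative assumptions, not recommendations from this guide.

```python
def build_lifecycle_config(prefix: str = "logs/", ia_days: int = 30,
                           glacier_days: int = 90, expire_days: int = 365) -> dict:
    """Build a lifecycle configuration with two transitions and an expiration."""
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "Rules": [{
            "ID": f"tiering-{prefix.strip('/') or 'all'}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_days, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": expire_days},
        }]
    }


config = build_lifecycle_config()
print(config["Rules"][0]["ID"])
# To apply it (requires boto3 and credentials; the bucket name is a placeholder):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```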
Reference: `references/best_practices.md` → Storage Optimization
Network Optimization
Key Actions:
- Replace NAT Gateways with VPC Endpoints (save $25-30/month each)
- Use CloudFront to reduce data transfer costs
- Colocate resources in same AZ when possible
Reference: `references/best_practices.md` → Network Optimization
Database Optimization
Key Actions:
- Right-size RDS instances
- Use gp3 storage (20% cheaper than gp2)
- Evaluate Aurora Serverless for variable workloads
- Purchase RDS Reserved Instances
Reference: `references/best_practices.md` → Database Optimization
Service Alternatives Decision Guide
Need help choosing between services?
Question: "Should I use EC2, Lambda, or Fargate?"
Answer: See `references/service_alternatives.md` → Compute Alternatives
Question: "Which S3 storage class should I use?"
Answer: See `references/service_alternatives.md` → Storage Alternatives
Question: "Should I use RDS or Aurora?"
Answer: See `references/service_alternatives.md` → Database Alternatives
Question: "NAT Gateway vs VPC Endpoint vs NAT Instance?"
Answer: See `references/service_alternatives.md` → Networking Alternatives
FinOps Governance & Process
Setting Up FinOps
Phase 1: Foundation (Month 1)
- Enable Cost Explorer
- Set up AWS Budgets
- Define tagging strategy
- Activate cost allocation tags
Phase 2: Visibility (Months 2-3)
- Implement tagging enforcement
- Run optimization scripts
- Set up monthly reviews
- Create team cost reports
Phase 3: Culture (Ongoing)
- Cost metrics in engineering KPIs
- Cost review in architecture decisions
- Regular optimization sprints
- FinOps champions in each team
Full Guide: See `references/finops_governance.md`
Monthly Review Process
Week 1: Data Collection
- Run all optimization scripts
- Export Cost & Usage Reports
- Compile findings
Week 2: Analysis
- Identify trends
- Find opportunities
- Prioritize actions
Week 3: Team Reviews
- Present to engineering teams
- Discuss optimizations
- Assign action items
Week 4: Executive Reporting
- Create executive summary
- Forecast next quarter
- Report optimization wins
Template: See `assets/templates/monthly_cost_report.md`
Detailed Process: See `references/finops_governance.md` → Monthly Review Process
Cost Optimization Checklist
Quick Wins (Do First)
- Delete unattached EBS volumes
- Delete old EBS snapshots (>90 days)
- Release unused Elastic IPs
- Convert gp2 → gp3 volumes
- Stop/terminate idle EC2 instances
- Enable S3 Intelligent-Tiering
- Set up AWS Budgets and alerts
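For the snapshot item, the age check itself is simple date arithmetic. The sketch below filters a list of snapshot records by start time; the `SnapshotId` and `StartTime` keys mirror what boto3's `describe_snapshots` returns, while the function name and the injectable `now` parameter are assumptions for illustration. It only lists candidates and deletes nothing.

```python
from datetime import datetime, timedelta, timezone


def old_snapshots(snapshots, max_age_days: int = 90, now=None):
    """Return IDs of snapshots started more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


snapshots = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
print(old_snapshots(snapshots, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))
```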
Medium Effort (This Quarter)
- Right-size oversized instances
- Migrate to newer instance generations
- Purchase Reserved Instances for stable workloads
- Implement S3 lifecycle policies
- Replace NAT Gateways with VPC Endpoints (where applicable)
- Enable automated resource scheduling (dev/test)
- Implement tagging strategy and enforcement
Strategic Initiatives (Ongoing)
- Migrate to Graviton instances
- Implement Spot for fault-tolerant workloads
- Establish monthly cost review process
- Set up cost allocation by team
- Implement chargeback/showback model
- Create FinOps culture and practices
Troubleshooting Cost Issues
"My bill suddenly increased"
1. Run cost anomaly detection:
   ```bash
   python3 scripts/cost_anomaly_detector.py --days 30
   ```
2. Check Cost Explorer for service breakdown
3. Review CloudTrail for resource creation events
4. Check for Auto Scaling events
5. Verify that no Reserved Instances have expired
"I need to reduce costs by X%"
Follow the optimization workflow:
- Run all discovery scripts
- Calculate total potential savings
- Prioritize by: Savings Amount × (1 / Effort)
- Focus on quick wins first
- Implement strategic changes for long-term
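The prioritization formula, Savings Amount × (1 / Effort), can be applied as a sort key. In the sketch below the effort score is assumed to be a relative 1 to 5 estimate, and the opportunity names and dollar amounts are invented for illustration.

```python
def prioritize(opportunities):
    """Sort opportunities by savings-per-effort, highest first."""
    return sorted(
        opportunities,
        key=lambda o: o["monthly_savings"] * (1 / o["effort"]),
        reverse=True,
    )


opportunities = [
    {"name": "Graviton migration", "monthly_savings": 500.0, "effort": 5},
    {"name": "Delete unattached volumes", "monthly_savings": 120.0, "effort": 1},
    {"name": "gp2 to gp3 conversion", "monthly_savings": 300.0, "effort": 2},
]
for o in prioritize(opportunities):
    print(o["name"], o["monthly_savings"] / o["effort"])
```

The largest absolute saving ranks last here because its effort is high, which matches the advice to take quick wins first and schedule strategic changes for later.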
"How do I know if Reserved Instances make sense?"
Run RI analysis:
```bash
python3 scripts/analyze_ri_recommendations.py --days 60
```
Look for:
- Instances running 60+ days consistently
- Workloads that won't change
- Savings > 30%
"Which resources can I safely delete?"
Run unused resource finder:
```bash
python3 scripts/find_unused_resources.py
```
Safe to delete (usually):
- Unattached EBS volumes (after verifying)
- Snapshots > 90 days (if backups exist elsewhere)
- Unused Elastic IPs (after verifying not in DNS)
- Stopped EC2 instances > 30 days (after confirming abandoned)
Always verify with the resource owner before deletion!
Best Practices Summary
- Tag Everything: Consistent tagging enables cost allocation and accountability
- Monitor Continuously: Weekly script runs catch waste early
- Review Monthly: Regular reviews prevent cost drift
- Right-size Proactively: Don't wait for cost issues to optimize
- Use Commitments Wisely: RIs/SPs for stable workloads only
- Test Before Migrating: Especially for Graviton or Spot
- Automate Cleanup: Scheduled shutdown of dev/test resources
- Share Wins: Celebrate cost savings to build FinOps culture
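Tag compliance is easy to check mechanically once a required set is agreed on. The sketch below reports which required keys are missing or empty on a resource; the tag list uses the `Key`/`Value` shape that many AWS APIs return, and the required keys shown are a hypothetical policy, not one defined by this guide.

```python
# Hypothetical tagging policy; replace with your organization's required keys.
REQUIRED_TAGS = frozenset({"Owner", "Team", "Environment", "CostCenter"})


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return required tag keys that are absent or have an empty value."""
    present = {t["Key"] for t in resource_tags if t.get("Value")}
    return sorted(set(required) - present)


tags = [
    {"Key": "Owner", "Value": "alice"},
    {"Key": "Team", "Value": ""},  # present but empty counts as missing
]
print(missing_tags(tags))
```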
Additional Resources
Detailed References:
- `references/best_practices.md`: Comprehensive optimization strategies
- `references/service_alternatives.md`: Cost-effective service selection
- `references/finops_governance.md`: Organizational FinOps practices
Templates:
- `assets/templates/monthly_cost_report.md`: Monthly reporting template
Scripts:
- All scripts are in the `scripts/` directory; run any of them with `--help` for usage
AWS Documentation:
- AWS Cost Explorer: https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
- AWS Budgets: https://aws.amazon.com/aws-cost-management/aws-budgets/
- FinOps Foundation: https://www.finops.org