compute-management
OCI Compute Management - Expert Knowledge
🏗️ Use OCI Landing Zone Terraform Modules
Don't reinvent the wheel. Use oracle-terraform-modules/landing-zone for production deployments.
Landing Zone solves:
- ❌ Bad Practice #5: Internet-wide open ports (0.0.0.0/0 on 22/3389)
- ❌ Bad Practice #9: Public compute instances (Security Zones enforce private IPs)
- ❌ Bad Practice #10: No monitoring (auto-configures alarms and notifications)
This skill provides: Anti-patterns and troubleshooting for compute resources deployed WITHIN a Landing Zone architecture.
⚠️ OCI CLI/API Knowledge Gap
You don't know OCI CLI commands or OCI API structure.
Your training data has limited and outdated knowledge of:
- OCI CLI syntax and parameters (updates monthly)
- OCI API endpoints and request/response formats
- Compute service CLI operations (oci compute instance)
- OCI service-specific commands and flags
- Latest OCI features and API changes
When OCI operations are needed:
- Use exact CLI commands from this skill's references
- Do NOT guess OCI CLI syntax or parameters
- Do NOT assume API endpoint structures
- Load reference files for detailed CLI operations
What you DO know:
- General cloud compute concepts
- Instance sizing and capacity planning principles
- Linux/Windows system administration
This skill bridges the gap by providing current OCI CLI/API commands for compute operations.
You are an OCI compute expert. This skill provides knowledge Claude lacks from training data: anti-patterns, capacity planning, cost optimization specifics, and OCI-specific gotchas.
NEVER Do This
❌ NEVER launch instances without checking service limits first
bash
oci limits resource-availability get \
  --service-name compute \
  --limit-name "standard-e4-core-count" \
  --compartment-id <ocid> \
  --availability-domain <ad>
87% of "out of capacity" errors are actually quota limits, not infrastructure capacity. Check limits BEFORE launching to get accurate error messages.
❌ NEVER use console serial connection as primary access
- Creates security audit findings (bypasses SSH key controls)
- Use only for boot troubleshooting when SSH fails
- Delete connection immediately after troubleshooting
❌ NEVER mix regional and AD-specific resources in templates
- Breaks portability when moving between regions
- Use AD-agnostic designs: spread via fault domains, not hardcoded ADs
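The fault-domain advice above can be sketched as a small placement planner. This is a minimal illustration, not a deployment tool: the function name `spread_across_fault_domains` and the instance names are hypothetical, and the availability domain is passed in as a parameter so nothing region-specific is hardcoded.

```python
from itertools import cycle

# OCI exposes three fault domains per availability domain.
FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def spread_across_fault_domains(instance_names, availability_domain):
    """Return a placement plan that rotates instances through the fault domains.

    The availability domain is supplied by the caller (hypothetical helper),
    so the same plan logic works in any region.
    """
    domains = cycle(FAULT_DOMAINS)
    return [
        {
            "name": name,
            "availability_domain": availability_domain,
            "fault_domain": next(domains),
        }
        for name in instance_names
    ]


# Hypothetical instance names; "<ad>" stands for a name queried at deploy time.
plan = spread_across_fault_domains(["web-1", "web-2", "web-3", "web-4"], "<ad>")
for placement in plan:
    print(placement["name"], placement["fault_domain"])
```

The fourth instance wraps around to the first fault domain, so any single fault-domain outage affects at most two of the four instances.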
❌ NEVER use default security lists in production
- Default allows SSH (TCP 22) ingress from 0.0.0.0/0 and all egress traffic
- Fails security audits, creates compliance violations
- Always create custom security lists or NSGs
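A quick audit for this anti-pattern might look like the sketch below. It assumes ingress rules have already been exported into plain dictionaries; the keys `source`, `port_min`, and `port_max` are a hypothetical shape chosen for illustration, not the OCI API response format.

```python
SENSITIVE_PORTS = (22, 3389)  # SSH and RDP


def find_open_admin_rules(ingress_rules):
    """Return the rules that expose SSH or RDP to the whole internet.

    Each rule is a dict with hypothetical keys: source, port_min, port_max.
    """
    flagged = []
    for rule in ingress_rules:
        if rule["source"] != "0.0.0.0/0":
            continue
        low, high = rule.get("port_min"), rule.get("port_max")
        if low is None or high is None:
            # A rule without a port range applies to every port.
            flagged.append(rule)
        elif any(low <= port <= high for port in SENSITIVE_PORTS):
            flagged.append(rule)
    return flagged


# Hypothetical sample rules.
rules = [
    {"source": "0.0.0.0/0", "port_min": 22, "port_max": 22},
    {"source": "10.0.0.0/16", "port_min": 22, "port_max": 22},
    {"source": "0.0.0.0/0", "port_min": 443, "port_max": 443},
]
print(find_open_admin_rules(rules))
```

Only the first sample rule is flagged: the second is limited to a private CIDR, and the third is internet-facing but not on an administrative port.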
❌ NEVER leave boot volumes preserved (orphaned) when terminating dev/test instances
bash
# When terminating test instances, add:
oci compute instance terminate --instance-id <id> --preserve-boot-volume false
If the boot volume is preserved instead: $50+/month per deleted instance (orphaned boot volumes)
❌ NEVER enable public IP on production instances
- Use bastion service or private endpoints for access
- Cost impact: $500-5000+ per security incident from exposed instances
- Landing Zone Security Zones automatically block this pattern
Capacity Error Decision Tree
"Out of host capacity for shape X"?
│
├─ Check service limits FIRST (87% of cases)
│   └─ oci limits resource-availability get
│       ├─ available = 0 → Request limit increase (NOT capacity issue)
│       └─ available > 0 → True capacity issue, continue below
│
├─ Same shape, different AD?
│   └─ Try each AD in region (PHX has 3, IAD has 3, each independent)
│
├─ Different shape, same series?
│   ├─ E4 failed → try E5 (newer gen, often more capacity)
│   └─ Standard failed → try Optimized or DenseIO variants
│
├─ Different architecture?
│   └─ AMD → ARM (A1.Flex often has capacity when Intel/AMD full)
│
└─ All ADs exhausted?
    └─ Create capacity reservation (guarantees future launches)
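The tree above can also be read as an ordered series of checks. The following is a minimal sketch of that ordering; the function name `next_capacity_step`, its parameters, and the returned strings are all hypothetical, and `available` stands for the value reported by the limits check.

```python
def next_capacity_step(available, ads_tried, ads_in_region,
                       alt_shapes_tried=False, alt_arch_tried=False):
    """Return the next action, following the decision tree top to bottom."""
    if available == 0:
        # Quota problem, not infrastructure capacity.
        return "request limit increase"
    if ads_tried < ads_in_region:
        return "retry in another AD"
    if not alt_shapes_tried:
        return "try another shape in the series"
    if not alt_arch_tried:
        return "try another architecture"
    return "create capacity reservation"


print(next_capacity_step(available=0, ads_tried=1, ads_in_region=3))
print(next_capacity_step(available=5, ads_tried=1, ads_in_region=3))
print(next_capacity_step(available=5, ads_tried=3, ads_in_region=3,
                         alt_shapes_tried=True, alt_arch_tried=True))
```

Keeping the limit check first means a quota problem is never misdiagnosed as a capacity shortage, which is the main point of the tree.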
Shape Selection: Cost vs Performance
Budget-Critical (save 50%):
- VM.Standard.A1.Flex (ARM) if app supports: $0.01/OCPU/hr vs $0.03 (AMD)
- Caveat: Not all software runs on ARM, test thoroughly
General Purpose (balanced):
- VM.Standard.E4.Flex: 2:16 CPU:RAM ratio, $0.03/OCPU/hr
- Start: 2 OCPUs, scale based on metrics (not guesses)
Memory-Intensive (databases, caches):
- VM.Standard.E4.Flex with custom ratio: up to 1:64 CPU:RAM
- Cost: $0.03/OCPU + $0.0015/GB RAM
Cost Trap: Fixed shapes (e.g., VM.Standard2.1) often MORE expensive than Flex with same resources. Always compare Flex pricing first.
Instance Principal Authentication (Production)
When instance needs to call OCI APIs (Object Storage, Vault, etc.):
WRONG (user credentials on instance):
```bash
# Don't do this - credential management nightmare
export OCI_USER_OCID="ocid1.user..."
```
RIGHT (instance principal):
```bash
# 1. Create dynamic group (--description is required; the text is a placeholder)
oci iam dynamic-group create \
  --name "app-instances" \
  --description "Application instances" \
  --matching-rule "instance.compartment.id = '<compartment-ocid>'"

# 2. Grant permissions (IAM policy statement)
# "Allow dynamic-group app-instances to read object-family in compartment X"
```
```python
# 3. Code uses instance principal (no credentials needed):
import oci

signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
```
Benefits: No credential rotation, no secrets to manage, automatic token refresh.
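The matching rule and the policy statement in steps 1 and 2 are plain strings, so they can be generated instead of hand-typed. A minimal sketch, assuming the same rule and statement formats shown above; the helper names `matching_rule` and `policy_statement` are hypothetical.

```python
def matching_rule(compartment_ocid):
    """Build a dynamic-group rule matching every instance in one compartment."""
    return f"instance.compartment.id = '{compartment_ocid}'"


def policy_statement(group, verb, resource, compartment):
    """Build a policy statement granting a dynamic group access in a compartment."""
    return f"Allow dynamic-group {group} to {verb} {resource} in compartment {compartment}"


print(matching_rule("<compartment-ocid>"))
print(policy_statement("app-instances", "read", "object-family", "X"))
```

Generating both strings from the same variables keeps the group name in the policy from drifting away from the group that was actually created.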
OCI-Specific Gotchas
Availability Domain Names Are Tenant-Specific
- Your AD: "fMgC:US-ASHBURN-AD-1"
- Another tenant: "ErKW:US-ASHBURN-AD-1"
- MUST query your tenant:
oci iam availability-domain list
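Because the prefix differs per tenant, templates can carry only the logical AD number and resolve the full name at deploy time. The sketch below assumes the names have already been fetched with the list command above; the helper `resolve_ad` is hypothetical, and the AD-2 and AD-3 sample names are extrapolated from the example for illustration only.

```python
def resolve_ad(ad_names, logical_number):
    """Return the tenant-specific AD name that ends in -AD-<logical_number>."""
    suffix = f"-AD-{logical_number}"
    matches = [name for name in ad_names if name.endswith(suffix)]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one AD ending in {suffix}, found {len(matches)}")
    return matches[0]


# Sample names as they might be returned for one tenant (illustrative values).
names = ["fMgC:US-ASHBURN-AD-1", "fMgC:US-ASHBURN-AD-2", "fMgC:US-ASHBURN-AD-3"]
print(resolve_ad(names, 2))
```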
Boot Volume Backups Don't Include Instance Config
- Backup captures disk only, NOT shape/networking/metadata
- For DR: Use custom images (captures everything) or Terraform for infrastructure
Instance Metadata Service Has 2 Versions
- v1: http://169.254.169.254/opc/v1/ (legacy)
- v2: http://169.254.169.254/opc/v2/ (current, requires the Authorization: Bearer Oracle request header)
- Always use v2 for security (the required header helps mitigate SSRF attacks)
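A v2 metadata request can be sketched with the standard library alone. This example only builds the request and does not send it, since the endpoint is reachable only from inside an instance. It assumes the v2 endpoint expects an `Authorization: Bearer Oracle` header; the helper name `build_imds_request` is hypothetical.

```python
import urllib.request

IMDS_V2_BASE = "http://169.254.169.254/opc/v2/"


def build_imds_request(path="instance/"):
    """Build (but do not send) a request against the v2 metadata endpoint."""
    request = urllib.request.Request(IMDS_V2_BASE + path)
    # Assumption: v2 rejects requests that lack this header.
    request.add_header("Authorization", "Bearer Oracle")
    return request


req = build_imds_request()
print(req.full_url, req.get_header("Authorization"))
# On an instance, the request could then be sent with:
#   urllib.request.urlopen(req, timeout=5).read()
```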
Quick Cost Reference
| Shape Family | $/OCPU/hr | $/GB RAM/hr | Best For |
|---|---|---|---|
| A1.Flex (ARM) | $0.01 | $0.0015 | Cost-critical, ARM-compatible |
| E4.Flex (AMD) | $0.03 | $0.0015 | General purpose |
| E5.Flex (AMD) | $0.035 | $0.0015 | Latest gen, premium perf |
| Optimized3.Flex | $0.025 | $0.0015 | Network-intensive |
Free Tier: 2x AMD VM (1/8 OCPU, 1GB) + 4 ARM cores (24GB total) - always free
Calculation: (OCPUs × $0.03 + GB × $0.0015) × 730 hours/month
Example: 2 OCPU, 16GB = (2×$0.03 + 16×$0.0015) × 730 = $61.32/month
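The calculation above generalizes to the other rows of the table. A minimal sketch, using only the rates listed in this section (real prices may differ by region and change over time); the names `RATES` and `monthly_cost` are hypothetical.

```python
HOURS_PER_MONTH = 730

# (USD per OCPU-hour, USD per GB-RAM-hour), copied from the table in this section.
RATES = {
    "A1.Flex": (0.01, 0.0015),
    "E4.Flex": (0.03, 0.0015),
    "E5.Flex": (0.035, 0.0015),
    "Optimized3.Flex": (0.025, 0.0015),
}


def monthly_cost(shape, ocpus, memory_gb):
    """Estimate the monthly cost in USD for a Flex shape, rounded to cents."""
    ocpu_rate, gb_rate = RATES[shape]
    return round((ocpus * ocpu_rate + memory_gb * gb_rate) * HOURS_PER_MONTH, 2)


print(monthly_cost("E4.Flex", 2, 16))  # 61.32, matching the worked example
print(monthly_cost("A1.Flex", 2, 16))  # 32.12
```

For the same 2 OCPU, 16GB configuration, A1.Flex comes out at roughly half the E4.Flex cost, because the memory charge is identical and only the OCPU rate differs.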
Progressive Loading References
OCI Compute Shapes Reference (Official Oracle Documentation)
WHEN TO LOAD oci-compute-shapes-reference.md:
- Need detailed specifications for specific shapes (memory limits, OCPU counts, network bandwidth)
- Comparing flexible shapes (VM.Standard3.Flex vs E4.Flex vs E5.Flex vs E6.Flex vs A1/A2/A4.Flex)
- Understanding extended memory VM instances
- Researching bare metal shapes (BM.Standard3, BM.Standard.E4/E5/E6, BM.Standard.A1/A4)
- Checking GPU shapes, Dense I/O shapes, or HPC-optimized shapes
- Need official Oracle specifications for shape families
Do NOT load for:
- Quick cost comparisons (use Quick Cost Reference table in this skill)
- "Out of capacity" troubleshooting (decision tree in this skill covers it)
- Shape selection guidance (anti-patterns and recommendations in this skill)
When to Use This Skill
- Launching instances: shape selection, capacity planning
- "Out of capacity" errors: decision tree, limit checking
- Cost optimization: shape comparison, right-sizing
- Security: instance principal setup, console connection proper use
- Troubleshooting: boot failures, connectivity issues
- Production: anti-patterns, operational gotchas