model-deployment
Model Deployment
Identifies the correct deployment pathway based on model characteristics and generates deployment code.
Scope
This skill supports deploying Nova and OSS models that were fine-tuned through SageMaker Serverless Model Customization only.
Not supported:
- Base models (not fine-tuned)
- Models fine-tuned through other processes
- Full Fine-Tuning (FFT) — only LoRA fine-tuned models are supported
Principles
- One thing at a time. Each response advances exactly one decision.
- Confirm before proceeding. Wait for the user to agree before moving on. But don't re-ask questions already answered in the conversation — use what you know.
- Don't read files until you need them. Only read pathway references after the pathway is confirmed.
- Use what you know. If conversation history or artifacts already answer a question, confirm your understanding instead of asking again.
Workflow
Step 1: Identify the Training Job
You need the training job name or ARN. Check the conversation history first — the user may have already mentioned it, or it may be available from earlier steps in the workflow (e.g., fine-tuning). If not, ask the user.
Once you have the training job name or ARN, use the AWS MCP tool to look it up:
- Use the AWS MCP tool `describe-training-job` and extract:
  - S3 output path (from `ModelArtifacts.S3ModelArtifacts` or `OutputDataConfig.S3OutputPath`)
  - IAM role ARN (from `RoleArn`)
  - Region
- Use the AWS MCP tool `list-tags` on the training job ARN and extract:
  - Model ID from the `sagemaker-studio:jumpstart-model-id` tag
- Determine the model type from the model ID:
- Contains "nova" (nova-micro, nova-lite, nova-pro) → Nova
- Llama, Mistral, Qwen, GPT-OSS, DeepSeek, etc. → OSS
Unsupported models: This skill only supports OSS and Nova models that were LoRA fine-tuned through SageMaker Serverless Model Customization. If the model doesn't match, tell the user this skill can't help and suggest the finetuning skill.
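The Step 1 lookups and the model-type decision can be sketched as follows. This is a minimal sketch, not the skill's actual implementation: the sample response dict mirrors the shape of a boto3 `DescribeTrainingJob` response, and the model IDs used below are illustrative placeholders.

```python
# Sketch of Step 1: pull deployment details from a DescribeTrainingJob-style
# response and classify the model ID as Nova or OSS. The sample response is
# illustrative only, not a real training job.

def extract_job_details(response: dict) -> dict:
    """Extract the fields Step 1 needs, preferring ModelArtifacts
    and falling back to OutputDataConfig for the S3 output path."""
    artifacts = response.get("ModelArtifacts", {}).get("S3ModelArtifacts")
    fallback = response.get("OutputDataConfig", {}).get("S3OutputPath")
    return {
        "s3_output": artifacts or fallback,
        "role_arn": response["RoleArn"],
    }

def classify_model(model_id: str) -> str:
    """Map a JumpStart model ID to a model type; raise if unsupported."""
    mid = model_id.lower()
    if "nova" in mid:
        return "Nova"
    oss_families = ("llama", "mistral", "qwen", "gpt-oss", "deepseek")
    if any(family in mid for family in oss_families):
        return "OSS"
    raise ValueError(f"Unsupported model for this skill: {model_id}")

sample = {
    "ModelArtifacts": {"S3ModelArtifacts": "s3://bucket/job/output/model.tar.gz"},
    "OutputDataConfig": {"S3OutputPath": "s3://bucket/job/output"},
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleRole",
}
print(extract_job_details(sample)["s3_output"])  # s3://bucket/job/output/model.tar.gz
print(classify_model("meta-textgeneration-llama-3-8b"))  # OSS
```

Raising on an unrecognized model ID matches the "Unsupported models" rule above: the agent should stop and point the user at the finetuning skill rather than guess.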
Step 2: Determine Eligible Deployment Targets
Use the following table:
| Model Type | Eligible Targets |
|---|---|
| OSS | SageMaker, Bedrock |
| Nova | SageMaker, Bedrock |
If only one target is eligible, confirm it with the user. Use details from Step 5.
If multiple targets are eligible, help the user decide. Use details from Step 5.
If no targets are eligible, tell the user and explain why.
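The eligibility table above can be kept as a single lookup. A minimal sketch (both model types currently map to the same two targets, but a one-place table makes it easy for the rows to diverge later):

```python
# Eligibility table from Step 2 as a lookup; an unknown model type
# yields an empty list, i.e. "no targets are eligible".
ELIGIBLE_TARGETS = {
    "Nova": ["SageMaker", "Bedrock"],
    "OSS": ["SageMaker", "Bedrock"],
}

def eligible_targets(model_type: str) -> list:
    return ELIGIBLE_TARGETS.get(model_type, [])

print(eligible_targets("Nova"))  # ['SageMaker', 'Bedrock']
print(eligible_targets("Base"))  # []
```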
Step 3: Let the User Choose a Deployment Target
Present the eligible options to the user. Present these details to help them decide between SageMaker and Bedrock, if both are available options:
SageMaker Endpoint:
- Dedicated compute resources for consistent performance
- Control instance types and scaling
- Best for predictable workloads with specific latency requirements
Bedrock:
- Fully managed serverless inference
- Auto-scales instantly with no capacity planning
- Pay per request
- Best for variable workloads with fluctuating demand
Do NOT make a recommendation. Let the user choose.
Do NOT mention technical details like merged/unmerged weights, reference files, or APIs, unless the user asks.
⏸ Wait for user to select a deployment option.
Step 4: Display License Agreement
Before proceeding to deployment, display the model's license or service terms to the user.
- Read `references/model-licenses.md` and look up the model by its model ID (determined in Step 1).
- Follow the instructions in the Notes column — use the exact phrasing provided.
- If the model ID is not found in the table, warn the user that you could not find license information for their model and recommend they verify the license independently before proceeding.
⏸ Wait for the user to confirm before proceeding.
Step 5: Follow Pathway Workflow
Read the reference file for the selected pathway and follow its instructions.
| Model Type | Deployment Target | Reference |
|---|---|---|
| OSS | SageMaker | |
| OSS | Bedrock | |
| Nova | SageMaker | |
| Nova | Bedrock | |
Step 6: Post-Deployment Summary
After deployment completes, provide the user with a summary. Cover these topics, using details from the pathway reference doc you followed in Step 5:
- What was deployed — endpoint or model name, ARN, status
- How to use it — sample invoke code for the specific deployment target
- Cost — billing model (instance-based vs. pay-per-request) and what to expect
- Cleanup — how to delete the endpoint or model when done
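The "how to use it" portion of the summary comes from the Step 5 pathway doc, but its general shape can be sketched. Assumptions in this sketch: the endpoint name, model ARN, and the `{"inputs": ...}` payload schema are placeholders — the real payload format depends on the deployed model. The kwargs mirror the boto3 `sagemaker-runtime` `invoke_endpoint` and `bedrock-runtime` `invoke_model` APIs.

```python
import json

# Build the kwargs for a sample invoke, per deployment target. Names and
# payload schema are illustrative; consult the pathway reference doc.

def sagemaker_invoke_kwargs(endpoint_name: str, prompt: str) -> dict:
    # Passed to boto3's sagemaker-runtime client: invoke_endpoint(**kwargs)
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": prompt}),
    }

def bedrock_invoke_kwargs(model_arn: str, prompt: str) -> dict:
    # Passed to boto3's bedrock-runtime client: invoke_model(**kwargs)
    return {
        "modelId": model_arn,
        "contentType": "application/json",
        "body": json.dumps({"inputs": prompt}),
    }

print(sagemaker_invoke_kwargs("my-endpoint", "Hello")["EndpointName"])  # my-endpoint
```

Note the casing difference: SageMaker runtime parameters are PascalCase (`EndpointName`, `Body`) while Bedrock runtime parameters are camelCase (`modelId`, `body`).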
Troubleshooting
How to check if a model was LoRA or FFT fine-tuned
If deployment fails unexpectedly, the model may have been full fine-tuned (FFT) rather than LoRA fine-tuned. To check, download the training job's hydra config from its S3 output path at `.hydra/config.yaml`:
- `peft_config` populated (r, alpha, dropout, etc.) → LoRA (supported)
- `peft_config: null` → FFT (not supported by this skill)
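The check above reduces to inspecting one key of the parsed config. A minimal sketch, assuming `.hydra/config.yaml` has already been downloaded and parsed into a dict (e.g. with `yaml.safe_load`); the sample configs are illustrative:

```python
# Decide LoRA vs FFT from a parsed .hydra/config.yaml: a populated
# peft_config means LoRA; null (None) means full fine-tuning.

def finetune_method(config: dict) -> str:
    peft = config.get("peft_config")
    return "LoRA" if peft else "FFT"

lora_config = {"peft_config": {"r": 16, "alpha": 32, "dropout": 0.05}}
fft_config = {"peft_config": None}

print(finetune_method(lora_config))  # LoRA
print(finetune_method(fft_config))   # FFT
```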