model-deployment
Model Deployment
Identifies the correct deployment pathway based on model characteristics and generates deployment code.
Scope
This skill supports deploying Nova and OSS models that were fine-tuned through SageMaker Serverless Model Customization only.
Not supported:
- Base models (not fine-tuned)
- Models fine-tuned through other processes
- Full Fine-Tuning (FFT) — only LoRA fine-tuned models are supported
Principles
- One thing at a time. Each response advances exactly one decision.
- Confirm before proceeding. Wait for the user to agree before moving on. But don't re-ask questions already answered in the conversation — use what you know.
- Don't read files until you need them. Only read pathway references after the pathway is confirmed.
- Use what you know. If conversation history or artifacts already answer a question, confirm your understanding instead of asking again.
Workflow
Step 1: Identify the Training Job
You need the training job name or ARN. Check the conversation history first — the user may have already mentioned it, or it may be available from earlier steps in the workflow (e.g., fine-tuning). If not, ask the user.
Once you have the training job name or ARN, use the AWS MCP tool to look it up:
- Use the AWS MCP tool `describe-training-job` and extract:
  - S3 output path (from `ModelArtifacts.S3ModelArtifacts` or `OutputDataConfig.S3OutputPath`)
  - IAM role ARN (from `RoleArn`)
  - Region
- Use the AWS MCP tool `list-tags` on the training job ARN and extract:
  - Model ID from the `sagemaker-studio:jumpstart-model-id` tag
- Determine the model type from the model ID:
- Contains "nova" (nova-micro, nova-lite, nova-pro) → Nova
- Llama, Mistral, Qwen, GPT-OSS, DeepSeek, etc. → OSS
Unsupported models: This skill only supports OSS and Nova models that were LoRA fine-tuned through SageMaker Serverless Model Customization. If the model doesn't match, tell the user this skill can't help and suggest the finetuning skill.
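The Step 1 lookups and the model-type decision can be sketched as follows. This is a minimal sketch, not the skill's actual implementation: the sample response dict mirrors the shape of a boto3 `DescribeTrainingJob` response, and the model IDs used below are illustrative placeholders.

```python
# Sketch of Step 1: pull deployment details from a DescribeTrainingJob-style
# response and classify the model ID as Nova or OSS. The sample response is
# illustrative only, not a real training job.

def extract_job_details(response: dict) -> dict:
    """Extract the fields Step 1 needs, preferring ModelArtifacts
    and falling back to OutputDataConfig for the S3 output path."""
    artifacts = response.get("ModelArtifacts", {}).get("S3ModelArtifacts")
    fallback = response.get("OutputDataConfig", {}).get("S3OutputPath")
    return {
        "s3_output": artifacts or fallback,
        "role_arn": response["RoleArn"],
    }

def classify_model(model_id: str) -> str:
    """Map a JumpStart model ID to a model type; raise if unsupported."""
    mid = model_id.lower()
    if "nova" in mid:
        return "Nova"
    oss_families = ("llama", "mistral", "qwen", "gpt-oss", "deepseek")
    if any(family in mid for family in oss_families):
        return "OSS"
    raise ValueError(f"Unsupported model for this skill: {model_id}")

sample = {
    "ModelArtifacts": {"S3ModelArtifacts": "s3://bucket/job/output/model.tar.gz"},
    "OutputDataConfig": {"S3OutputPath": "s3://bucket/job/output"},
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleRole",
}
print(extract_job_details(sample)["s3_output"])  # s3://bucket/job/output/model.tar.gz
print(classify_model("meta-textgeneration-llama-3-8b"))  # OSS
```

Raising on an unrecognized model ID matches the "Unsupported models" rule above: the agent should stop and point the user at the finetuning skill rather than guess.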
Step 2: Determine Eligible Deployment Targets
Use the following table:
| Model Type | Eligible Targets |
|---|---|
| OSS | SageMaker, Bedrock |
| Nova | SageMaker, Bedrock |
If only one target is eligible, confirm it with the user. Use details from Step 5.
If multiple targets are eligible, help the user decide. Use details from Step 5.
If no targets are eligible, tell the user and explain why.
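The eligibility table above can be kept as a single lookup. A minimal sketch (both model types currently map to the same two targets, but a one-place table makes it easy for the rows to diverge later):

```python
# Eligibility table from Step 2 as a lookup; an unknown model type
# yields an empty list, i.e. "no targets are eligible".
ELIGIBLE_TARGETS = {
    "Nova": ["SageMaker", "Bedrock"],
    "OSS": ["SageMaker", "Bedrock"],
}

def eligible_targets(model_type: str) -> list:
    return ELIGIBLE_TARGETS.get(model_type, [])

print(eligible_targets("Nova"))  # ['SageMaker', 'Bedrock']
print(eligible_targets("Base"))  # []
```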
Step 3: Let the User Choose a Deployment Target
Present the eligible options to the user. Present these details to help them decide between SageMaker and Bedrock, if both are available options:
SageMaker Endpoint:
- Dedicated compute resources for consistent performance
- Control instance types and scaling
- Best for predictable workloads with specific latency requirements
Bedrock:
- Fully managed serverless inference
- Auto-scales instantly with no capacity planning
- Pay per request
- Best for variable workloads with fluctuating demand
Do NOT make a recommendation. Let the user choose.
Do NOT mention technical details like merged/unmerged weights, reference files, or APIs, unless the user asks.
⏸ Wait for user to select a deployment option.
Step 4: Display License Agreement
Before proceeding to deployment, display the model's license or service terms to the user.
- Read `references/model-licenses.md` and look up the model by its model ID (determined in Step 1).
- Follow the instructions in the Notes column — use the exact phrasing provided.
- If the model ID is not found in the table, warn the user that you could not find license information for their model and recommend they verify the license independently before proceeding.
⏸ Wait for the user to confirm before proceeding.
Step 5: Follow Pathway Workflow
Read the reference file for the selected pathway and follow its instructions.
| Model Type | Deployment Target | Reference |
|---|---|---|
| OSS | SageMaker | |
| OSS | Bedrock | |
| Nova | SageMaker | |
| Nova | Bedrock | |
Step 6: Post-Deployment Summary
After deployment completes, provide the user with a summary. Cover these topics, using details from the pathway reference doc you followed in Step 5:
- What was deployed — endpoint or model name, ARN, status
- How to use it — sample invoke code for the specific deployment target
- Cost — billing model (instance-based vs. pay-per-request) and what to expect
- Cleanup — how to delete the endpoint or model when done
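The "how to use it" portion of the summary comes from the Step 5 pathway doc, but its general shape can be sketched. Assumptions in this sketch: the endpoint name, model ARN, and the `{"inputs": ...}` payload schema are placeholders — the real payload format depends on the deployed model. The kwargs mirror the boto3 `sagemaker-runtime` `invoke_endpoint` and `bedrock-runtime` `invoke_model` APIs.

```python
import json

# Build the kwargs for a sample invoke, per deployment target. Names and
# payload schema are illustrative; consult the pathway reference doc.

def sagemaker_invoke_kwargs(endpoint_name: str, prompt: str) -> dict:
    # Passed to boto3's sagemaker-runtime client: invoke_endpoint(**kwargs)
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": prompt}),
    }

def bedrock_invoke_kwargs(model_arn: str, prompt: str) -> dict:
    # Passed to boto3's bedrock-runtime client: invoke_model(**kwargs)
    return {
        "modelId": model_arn,
        "contentType": "application/json",
        "body": json.dumps({"inputs": prompt}),
    }

print(sagemaker_invoke_kwargs("my-endpoint", "Hello")["EndpointName"])  # my-endpoint
```

Note the casing difference: SageMaker runtime parameters are PascalCase (`EndpointName`, `Body`) while Bedrock runtime parameters are camelCase (`modelId`, `body`).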
Troubleshooting
How to check if a model was LoRA or FFT fine-tuned
If deployment fails unexpectedly, the model may have been full fine-tuned (FFT) rather than LoRA fine-tuned. To check, download the training job's hydra config from its S3 output path at `.hydra/config.yaml`:
- `peft_config` populated (r, alpha, dropout, etc.) → LoRA (supported)
- `peft_config: null` → FFT (not supported by this skill)
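The check above reduces to inspecting one key of the parsed config. A minimal sketch, assuming `.hydra/config.yaml` has already been downloaded and parsed into a dict (e.g. with `yaml.safe_load`); the sample configs are illustrative:

```python
# Decide LoRA vs FFT from a parsed .hydra/config.yaml: a populated
# peft_config means LoRA; null (None) means full fine-tuning.

def finetune_method(config: dict) -> str:
    peft = config.get("peft_config")
    return "LoRA" if peft else "FFT"

lora_config = {"peft_config": {"r": 16, "alpha": 32, "dropout": 0.05}}
fft_config = {"peft_config": None}

print(finetune_method(lora_config))  # LoRA
print(finetune_method(fft_config))   # FFT
```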