finetuning
Prerequisites
Before starting this workflow, verify:
- A use_case_spec.md file exists
  - If missing: Activate the use-case-specification skill first, then resume
  - DON'T EVER offer to create a use case spec without activating the use-case-specification skill.
- A fine-tuning technique (SFT, DPO, or RLVR) and base model have already been selected
  - If missing: Activate the finetuning-setup skill to collect what's missing, then resume
  - Don't make recommendations on the spot. You MUST activate the finetuning-setup skill.
- A base model name available on SageMakerHub has been identified
  - If missing: Activate the finetuning-setup skill to get it
  - Important: Only use the model name that finetuning-setup retrieves, as it may differ from other commonly used names for the same model
Critical Rules
Code Generation Rules
- ✅ Use EXACTLY the imports shown in each cell template
- ❌ Do NOT add additional imports even if they seem helpful
- ❌ Do NOT create variables before they're needed in that cell
- 📋 Copy the code structure precisely - no improvisation
- 🎯 Follow the minimal code principle strictly
- ✅ When writing a notebook cell, make sure the indentation and f-strings are correct
User Communication Rules
- ❌ NEVER offer to run the notebook for the user (you don't have the tools)
- ❌ NEVER offer to move on to a downstream skill while training is in progress (logically impossible)
- ❌ NEVER set ACCEPT_EULA to True yourself (user must read and agree)
- ✅ Always mention both the number AND title of cells you reference
- ✅ If user asks how to run: Tell them to run cells one by one, mention ipykernel requirement
Workflow
1. Notebook Setup
1.1 Directory Setup
- Identify project directory from conversation context
- If unclear (multiple relevant directories exist) → Ask user which folder to use
- Create Jupyter notebook: [title]_finetuning.ipynb
  - [title] = snake_case name derived from use case
  - Save under the identified directory
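The naming rule above can be sketched as follows. This is a minimal illustration, not part of the skill itself: the helper name notebook_path and the exact snake_case normalization (lowercase, runs of non-alphanumerics collapsed to one underscore) are assumptions.

```python
import re
from pathlib import Path


def notebook_path(project_dir: str, use_case: str) -> Path:
    """Build <project_dir>/<title>_finetuning.ipynb (hypothetical helper).

    Assumption: snake_case means lowercase with every run of
    non-alphanumeric characters replaced by a single underscore.
    """
    title = re.sub(r"[^a-z0-9]+", "_", use_case.lower()).strip("_")
    if not title:
        raise ValueError("use case must contain at least one letter or digit")
    return Path(project_dir) / f"{title}_finetuning.ipynb"


print(notebook_path("my_project", "Customer Support Chatbot").name)
```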
1.2 Select Reference Template
Read the example notebook matching the finetuning strategy:
- SFT → references/sft_example.md
- DPO → references/dpo_example.md
- RLVR → references/rlvr_example.md
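The strategy-to-template mapping above amounts to a small lookup. In this sketch the dictionary and function names are hypothetical, and only the three paths come from the list above.

```python
# Paths are taken from the list above; the names below are illustrative only.
REFERENCE_TEMPLATES = {
    "SFT": "references/sft_example.md",
    "DPO": "references/dpo_example.md",
    "RLVR": "references/rlvr_example.md",
}


def reference_template(technique: str) -> str:
    """Return the example notebook path for a fine-tuning technique."""
    key = technique.strip().upper()
    if key not in REFERENCE_TEMPLATES:
        raise ValueError(f"unsupported technique: {technique!r}")
    return REFERENCE_TEMPLATES[key]


print(reference_template("dpo"))
```

Failing loudly on an unknown technique mirrors the prerequisite that the technique is selected before this workflow starts.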
1.3 Copy Notebook Structure
- Write the exact cells from the example to [title]_finetuning.ipynb
- Use same order, dependencies, and imports as the example
- DO NOT improvise or add extra code
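For orientation, an .ipynb file is JSON in the Jupyter notebook format, so writing cells in order can be sketched as below. This is an illustration only: it assumes the cells have already been extracted from the example as (cell_type, source) pairs, and the function names are hypothetical. How the reference .md files delimit cells is not specified here, so that step is not shown.

```python
import json
from pathlib import Path


def build_notebook(cells):
    """Assemble a minimal nbformat 4 notebook dict from ordered cells.

    cells: list of (cell_type, source) pairs, cell_type "markdown" or "code".
    The sources are kept verbatim and in the given order.
    """
    nb_cells = []
    for cell_type, source in cells:
        if cell_type not in ("markdown", "code"):
            raise ValueError(f"unknown cell type: {cell_type!r}")
        cell = {
            "cell_type": cell_type,
            "metadata": {},
            "source": source.splitlines(keepends=True),
        }
        if cell_type == "code":
            # Code cells carry execution state; a fresh notebook has none.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(path, cells):
    """Serialize the notebook to disk as UTF-8 JSON."""
    Path(path).write_text(json.dumps(build_notebook(cells), indent=1), encoding="utf-8")
```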
1.4 Auto-Generate Configuration Values
In the 'Setup & Credentials' cell, populate:
- BASE_MODEL
  - Use the exact SageMakerHub model name from context
- MODEL_PACKAGE_GROUP_NAME
  - Generate from use case (read use_case_spec.md if needed)
  - Format rules:
    - Lowercase, alphanumeric with hyphens only
    - 1-63 characters
    - Pattern: [a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
    - Example: "Customer Support Chatbot" → customer-support-chatbot-v1
- Save notebook
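The format rules for MODEL_PACKAGE_GROUP_NAME can be checked mechanically. The sketch below is one possible way to do it, under two assumptions: the helper name is hypothetical, and the "-v1" suffix is inferred only from the single example above.

```python
import re

# Pattern copied from the format rules above.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_LENGTH = 63


def to_model_package_group_name(use_case: str, suffix: str = "v1") -> str:
    """Derive a lowercase, hyphenated name of at most 63 characters."""
    slug = re.sub(r"[^a-z0-9]+", "-", use_case.lower()).strip("-")
    if not slug:
        raise ValueError("use case must contain at least one letter or digit")
    suffix_part = f"-{suffix}" if suffix else ""
    # Truncate the slug, not the suffix, so the version marker survives.
    name = slug[: MAX_LENGTH - len(suffix_part)].rstrip("-") + suffix_part
    if len(name) > MAX_LENGTH or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"generated name violates the format rules: {name!r}")
    return name


print(to_model_package_group_name("Customer Support Chatbot"))
```

Validating against the pattern after generation catches edge cases such as an invalid suffix, rather than silently emitting a name the service might reject.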
2. RLVR Reward Function (for RLVR only, skip this section if technique is SFT or DPO)
2.1 Check Reward Function Status
- Ask if user has a reward function already, or would like help creating one.
- If user says they have one → Ask for the SageMaker Hub Evaluator ARN. Only proceed to Section 2.3 once the user provides a valid Evaluator ARN. If they don't have it registered as a SageMaker Hub Evaluator, continue to 2.2.
- If user says they do not have one → Continue to 2.2
2.2 Generate Reward Function From Template
- Follow the workflow in the section "Helping Users Create Lambda Functions" of references/rlvr_reward_function.md
2.3 Set CUSTOM_REWARD_FUNCTION value
- Set the value for CUSTOM_REWARD_FUNCTION in the Notebook with the ARN of the reward function (either given directly by the user, or from the function generation code as evaluator.arn).
3. EULA review and acceptance
- Look up the official EULA link for the selected base model from references/eula_links.md
- Display the EULA link(s) to the user in your message as clickable markdown links
- Tell the user they must read and agree to the EULA before using this model (one sentence)
- Ask them to manually change ACCEPT_EULA to True in the notebook after reviewing the license
- NEVER set ACCEPT_EULA to True yourself
4. Notebook Execution
- Display the following to the user:
  A Jupyter notebook has now been generated which will help you finetune your model. You are free to run it now. Please let me know once the training is complete.
- Wait for the user's confirmation that training is complete. Once the user has confirmed this, you are free to move to the next step of the plan.
CRITICAL:
- DON'T suggest moving to next steps before training completes
- DON'T elaborate on the next steps unless the user specifically asks you about them.
References
- rlvr_reward_function.md - Lambda reward function creation guide (RLVR only)
- templates/rlvr_reward_function_source_template.py - Lambda reward function source template for open-weights models (RLVR only)
- templates/nova_rlvr_reward_function_source_template.py - Lambda reward function source template for Nova 2.0 Lite (RLVR only)
- sft_example.md - Complete notebook template for Supervised Fine-Tuning
- dpo_example.md - Complete notebook template for Direct Preference Optimization
- rlvr_example.md - Complete notebook template for Reinforcement Learning from Verifiable Rewards