finetuning

Prerequisites

Before starting this workflow, verify:
  1. A use_case_spec.md file exists
    • If missing: Activate the use-case-specification skill first, then resume
    • NEVER offer to create a use case spec without first activating the use-case-specification skill.
  2. A fine-tuning technique (SFT, DPO, or RLVR) and a base model have already been selected
    • If missing: Activate the finetuning-setup skill to collect what's missing, then resume
    • Don't make recommendations on the spot. You MUST activate the finetuning-setup skill.
  3. A base model name available on SageMaker Hub has been identified
    • If missing: Activate the finetuning-setup skill to get it
    • Important: Only use the model name that finetuning-setup retrieves, as it may differ from other commonly used names for the same model

Critical Rules

Code Generation Rules

  • ✅ Use EXACTLY the imports shown in each cell template
  • ❌ Do NOT add additional imports, even if they seem helpful
  • ❌ Do NOT create a variable before the cell where it is needed
  • 📋 Copy the code structure precisely: no improvisation
  • 🎯 Follow the minimal code principle strictly
  • ✅ When writing a notebook cell, make sure the indentation and f-strings are correct

User Communication Rules

  • ❌ NEVER offer to run the notebook for the user (you don't have the tools)
  • ❌ NEVER offer to move on to a downstream skill while training is in progress (logically impossible)
  • ❌ NEVER set ACCEPT_EULA to True yourself (user must read and agree)
  • ✅ Always mention both the number AND title of cells you reference
  • ✅ If the user asks how to run the notebook: tell them to run the cells one by one, and mention that an ipykernel is required

Workflow

1. Notebook Setup

1.1 Directory Setup

  1. Identify the project directory from the conversation context
    • If unclear (multiple relevant directories exist) → Ask the user which folder to use
  2. Create a Jupyter notebook named [title]_finetuning.ipynb
    • [title] = snake_case name derived from the use case
    • Save it under the identified directory
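The following is a minimal sketch of one deterministic way to derive [title] and the notebook path. It lowercases the use case name, keeps runs of ASCII letters and digits, and joins them with underscores. The helper names are hypothetical. The tokenization rule is also an assumption and is not part of this workflow: it does not split camelCase, and it drops non-ASCII characters.

```python
import re
from pathlib import Path


def snake_case_title(use_case: str) -> str:
    """Turn a use case name into a snake_case [title] (hypothetical helper)."""
    # Keep runs of ASCII letters/digits; everything else acts as a separator.
    words = re.findall(r"[a-z0-9]+", use_case.lower())
    if not words:
        raise ValueError(f"Cannot derive a title from {use_case!r}")
    return "_".join(words)


def notebook_path(project_dir: str, use_case: str) -> Path:
    """Build <project_dir>/[title]_finetuning.ipynb."""
    return Path(project_dir) / f"{snake_case_title(use_case)}_finetuning.ipynb"


print(notebook_path("my_project", "Customer Support Chatbot"))
```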

1.2 Select Reference Template

Read the example notebook matching the finetuning strategy:
  • SFT → references/sft_example.md
  • DPO → references/dpo_example.md
  • RLVR → references/rlvr_example.md
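The mapping above can be expressed as a lookup table. The sketch below also makes explicit that only SFT, DPO, and RLVR are in scope. The function name is hypothetical. Case-insensitive matching is an assumption made for convenience.

```python
REFERENCE_BY_TECHNIQUE = {
    "SFT": "references/sft_example.md",
    "DPO": "references/dpo_example.md",
    "RLVR": "references/rlvr_example.md",
}


def reference_for(technique: str) -> str:
    """Return the example notebook path for SFT, DPO, or RLVR (hypothetical helper)."""
    key = technique.strip().upper()
    if key not in REFERENCE_BY_TECHNIQUE:
        # Anything else is outside this workflow; the technique comes from finetuning-setup.
        raise ValueError(f"Unsupported technique {technique!r}; expected SFT, DPO, or RLVR")
    return REFERENCE_BY_TECHNIQUE[key]
```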

1.3 Copy Notebook Structure

  1. Write the exact cells from the example to [title]_finetuning.ipynb
  2. Use the same order, dependencies, and imports as the example
  3. DO NOT improvise or add extra code
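If the notebook file is written programmatically, the sketch below shows one way to do it using only the standard library. It assumes the example's cells have already been extracted, in order, as (cell_type, source) pairs. The function names and the python3 kernelspec are assumptions. Each source string is stored verbatim, so indentation and f-strings inside cells are preserved exactly.

```python
import json
from pathlib import Path
from typing import List, Tuple

VALID_CELL_TYPES = {"code", "markdown", "raw"}


def build_notebook(cells: List[Tuple[str, str]]) -> dict:
    """Build an nbformat v4 notebook dict from (cell_type, source) pairs, in order."""
    nb_cells = []
    for i, (cell_type, source) in enumerate(cells):
        if cell_type not in VALID_CELL_TYPES:
            raise ValueError(f"Unknown cell type {cell_type!r}")
        # Source is copied verbatim, so indentation and f-strings are untouched.
        cell = {"cell_type": cell_type, "id": f"cell-{i}", "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells start unexecuted, with no outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {
        "cells": nb_cells,
        # Assumed kernel: the default ipykernel-backed Python 3 kernel.
        "metadata": {
            "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}
        },
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def write_notebook(path: Path, nb: dict) -> None:
    """Serialize the notebook as JSON, e.g. to [title]_finetuning.ipynb."""
    path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
```

If the nbformat package is available, its nbformat.v4.new_notebook, new_code_cell, and new_markdown_cell helpers, together with nbformat.write, build and validate the same structure.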

1.4 Auto-Generate Configuration Values

In the 'Setup & Credentials' cell, populate:
  1. BASE_MODEL
    • Use the exact SageMaker Hub model name from context (the one retrieved by finetuning-setup)
  2. MODEL_PACKAGE_GROUP_NAME
    • Generate from the use case (read use_case_spec.md if needed)
    • Format rules:
      • Lowercase letters, digits, and hyphens only (stricter than the pattern below, which also accepts uppercase)
      • 1-63 characters
      • Must start and end with a letter or digit, matching the pattern: [a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
      • Example: "Customer Support Chatbot" → customer-support-chatbot-v1
  3. Save the notebook
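The format rules above can be checked mechanically. The sketch below generates a name and validates it. It assumes the naming style of the example: lowercase words joined by hyphens, plus a -v<N> suffix. The helper names and the truncation strategy are assumptions. The validator applies the length limit, the pattern, and the lowercase convention together.

```python
import re

# Pattern from the format rules; fullmatch anchors it to the whole string.
GROUP_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_LEN = 63


def is_valid_group_name(name: str) -> bool:
    """Check length, the pattern, and the lowercase-only convention."""
    return (
        len(name) <= MAX_LEN
        and GROUP_NAME_PATTERN.fullmatch(name) is not None
        and name == name.lower()
    )


def group_name_from_use_case(use_case: str, version: int = 1) -> str:
    """Slugify a use case name and append a -v<N> suffix (hypothetical helper)."""
    slug = "-".join(re.findall(r"[a-z0-9]+", use_case.lower()))
    suffix = f"-v{version}"
    # Leave room for the suffix, and never end the slug on a hyphen.
    slug = slug[: MAX_LEN - len(suffix)].rstrip("-")
    name = f"{slug}{suffix}"
    if not is_valid_group_name(name):
        raise ValueError(f"Could not derive a valid name from {use_case!r}")
    return name
```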

2. RLVR Reward Function (RLVR only; skip this section if the technique is SFT or DPO)

2.1 Check Reward Function Status

  • Ask whether the user already has a reward function or would like help creating one.
    • If the user says they have one → Ask for the SageMaker Hub Evaluator ARN. Only proceed to Section 2.3 once the user provides a valid Evaluator ARN. If their reward function is not registered as a SageMaker Hub Evaluator, continue to 2.2.
    • If the user says they do not have one → Continue to 2.2
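The ARN string alone only supports a syntactic check. The sketch below assumes the standard arn:partition:service:region:account-id:resource layout. It also assumes that a SageMaker Hub Evaluator ARN uses the sagemaker service namespace. Under those assumptions, it catches common mix-ups such as pasting a Lambda function ARN instead. It does not confirm that the evaluator exists. The helper name and the example ARNs are hypothetical.

```python
def looks_like_evaluator_arn(arn: str) -> bool:
    """Syntactic check: arn:partition:service:region:account-id:resource.

    Assumes Evaluator ARNs live in the "sagemaker" service namespace;
    the resource part is only checked for being non-empty.
    """
    parts = arn.strip().split(":", 5)
    if len(parts) != 6:
        return False
    prefix, partition, service, region, account_id, resource = parts
    return (
        prefix == "arn"
        and partition.startswith("aws")
        and service == "sagemaker"
        and region != ""
        and len(account_id) == 12
        and account_id.isdigit()
        and resource != ""
    )
```

If the check fails, ask the user again, or continue to 2.2, rather than proceeding to 2.3.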

2.2 Generate Reward Function From Template

  1. Follow the workflow in the "Helping Users Create Lambda Functions" section of references/rlvr_reward_function.md
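The Lambda handler's event and response format comes from the templates listed under References (templates/rlvr_reward_function_source_template.py, or the Nova variant) and is not reproduced here. A reward is "verifiable" when it is computed deterministically by checking the model output against a known reference. The sketch below shows such a scoring function. It assumes exact answer matching after whitespace and case normalization. The function names and the matching rule are illustrative assumptions, not the template's logic.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting noise doesn't affect the score."""
    return re.sub(r"\s+", " ", text).strip().lower()


def exact_match_reward(completion: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 if the answer matches the reference, else 0.0."""
    return 1.0 if normalize(completion) == normalize(reference) else 0.0
```

Real tasks usually need task-specific parsing before the comparison, such as extracting a final numeric answer.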

2.3 Set CUSTOM_REWARD_FUNCTION value

  1. Set CUSTOM_REWARD_FUNCTION in the notebook to the ARN of the reward function (either given directly by the user, or produced by the function generation code as evaluator.arn).
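The sketch below illustrates this step. It assumes evaluator is the object returned by the generation code in 2.2 and that it exposes an arn attribute, as described above. It also assumes CUSTOM_REWARD_FUNCTION holds the ARN as a Python string. The helper names are hypothetical. Rendering the assignment with repr quotes the ARN correctly, so the edited cell line stays valid Python.

```python
from types import SimpleNamespace
from typing import Optional


def resolve_reward_function_arn(user_arn: Optional[str], evaluator: Optional[object] = None) -> str:
    """Use the ARN the user supplied in 2.1 if any; otherwise evaluator.arn from 2.2."""
    if user_arn:
        return user_arn.strip()
    if evaluator is not None:
        return evaluator.arn
    raise ValueError("No reward function ARN yet: ask the user (2.1) or generate one (2.2)")


def reward_function_line(arn: str) -> str:
    """Notebook line that sets CUSTOM_REWARD_FUNCTION to the chosen ARN."""
    return f"CUSTOM_REWARD_FUNCTION = {arn!r}"


# Stand-in for the evaluator object produced in 2.2 (hypothetical values).
evaluator = SimpleNamespace(arn="arn:aws:sagemaker:us-west-2:123456789012:hub-content/example")
print(reward_function_line(resolve_reward_function_arn(None, evaluator)))
```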

3. EULA review and acceptance

  1. Look up the official EULA link for the selected base model in references/eula_links.md
  2. Display the EULA link(s) to the user in your message as clickable markdown links
  3. Tell the user, in one sentence, that they must read and agree to the EULA before using this model
  4. Ask them to manually change ACCEPT_EULA to True in the notebook after reviewing the license
  5. NEVER set ACCEPT_EULA to True yourself
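Only the user may ever change ACCEPT_EULA to True. Before handing the notebook over, a quick self-check can therefore confirm that no code cell assigns True to it. The sketch below works over the notebook's JSON. It assumes the template sets the flag with a plain ACCEPT_EULA = ... assignment at the start of a line. The helper names are hypothetical.

```python
import json
import re

# Matches a line such as "ACCEPT_EULA = True"; trailing text like a comment
# does not prevent a match, but a commented-out line ("# ACCEPT_EULA = True") does not match.
EULA_ACCEPTED = re.compile(r"^\s*ACCEPT_EULA\s*=\s*True\b", re.MULTILINE)


def eula_left_for_user(nb: dict) -> bool:
    """Return True if no code cell in the notebook JSON sets ACCEPT_EULA to True."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # .ipynb files may store source as a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if EULA_ACCEPTED.search(source):
            return False
    return True


def load_notebook(path: str) -> dict:
    """Read a .ipynb file as a JSON dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```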

4. Notebook Execution

  1. Display the following to the user:
    A Jupyter notebook has now been generated which will help you finetune your model. You are free to run it now. Please let me know once the training is complete.
  2. Wait for the user to confirm that training is complete. Once they confirm, you may move on to the next step of the plan.
CRITICAL:
  • DON'T suggest moving to the next steps before training completes
  • DON'T elaborate on the next steps unless the user specifically asks about them.

References

  • rlvr_reward_function.md - Lambda reward function creation guide (RLVR only)
  • templates/rlvr_reward_function_source_template.py - Lambda reward function source template for open-weights models (RLVR only)
  • templates/nova_rlvr_reward_function_source_template.py - Lambda reward function source template for Nova 2.0 Lite (RLVR only)
  • sft_example.md - Complete notebook template for Supervised Fine-Tuning
  • dpo_example.md - Complete notebook template for Direct Preference Optimization
  • rlvr_example.md - Complete notebook template for Reinforcement Learning from Verifiable Rewards