llm-finetuning

LLM Fine-Tuning Expert

A deep learning specialist with hands-on expertise in fine-tuning large language models using parameter-efficient methods, dataset curation, and training optimization. This skill provides guidance for adapting foundation models to specific domains and tasks using LoRA, QLoRA, and the Hugging Face PEFT ecosystem, covering dataset preparation, hyperparameter selection, evaluation strategies, and adapter deployment.

Key Principles

  • Fine-tuning is about teaching a model your task format and domain knowledge, not about teaching it language; start with the strongest base model you can afford to run
  • Dataset quality matters far more than quantity; 1,000 carefully curated, diverse, high-quality examples often outperform 100,000 noisy ones
  • Use parameter-efficient fine-tuning (LoRA/QLoRA) to reduce memory requirements by orders of magnitude while achieving performance comparable to full fine-tuning
  • Evaluate with task-specific metrics and human review, not just perplexity; a model with lower perplexity may still produce worse outputs for your specific use case
  • Track every experiment with exact hyperparameters, dataset versions, and base model checkpoints so that results are reproducible and comparable

Techniques

  • Configure LoRA with appropriate rank (r=8 to 64), alpha (typically 2x rank), and target modules (q_proj, v_proj for attention, or all linear layers for broader adaptation)
  • Use QLoRA for memory-constrained setups: load the base model in 4-bit NormalFloat quantization, attach LoRA adapters in fp16/bf16, and train with paged optimizers to handle memory spikes
  • Format datasets as instruction-response pairs with consistent templates; include a system field for persona or context, an instruction field for the task, and a response field for the expected output
  • Apply the PEFT library workflow: load base model, create LoRA config, get_peft_model(), train with the Hugging Face Trainer or a custom loop, then save and load adapters independently
  • Set training hyperparameters carefully: learning rate between 1e-5 and 2e-4 with cosine schedule, 1-5 epochs (watch for overfitting), warmup ratio of 0.03-0.1, and gradient accumulation to simulate larger batch sizes
  • Evaluate with multiple signals: validation loss for overfitting detection, task-specific metrics (ROUGE for summarization, exact match for QA), and structured human evaluation on a held-out set
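The dataset-formatting technique above can be sketched in a few lines. The template layout and field names here are illustrative choices, not a required standard; any consistent template works as long as it is applied identically at training and inference time:

```python
# A minimal sketch of instruction-response formatting with a consistent
# template. The template text and field names are illustrative assumptions.
TEMPLATE = (
    "### System:\n{system}\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_example(example: dict) -> str:
    """Render one dataset record into the training template."""
    return TEMPLATE.format(
        system=example.get("system", "You are a helpful assistant."),
        instruction=example["instruction"],
        response=example["response"],
    )

record = {
    "system": "You are a contract-law assistant.",
    "instruction": "Summarize the termination clause below.",
    "response": "Either party may terminate with 30 days' written notice.",
}
print(format_example(record))
```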
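The QLoRA, LoRA-configuration, and hyperparameter steps above can be sketched as one configuration flow in the PEFT ecosystem. The model name and every numeric value below are illustrative assumptions within the ranges given, not recommended defaults:

```python
# Sketch of the QLoRA + PEFT workflow: 4-bit NF4 base model, bf16 LoRA
# adapters, paged optimizer. Model name and hyperparameters are
# illustrative assumptions; tune them for your hardware and task.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # any causal LM you can afford to run

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb_config)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # rank in the 8-64 range
    lora_alpha=32,                         # typically 2x the rank
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # or all linear layers for broader adaptation
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # a small fraction of total parameters

args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-4,                    # within the 1e-5 to 2e-4 range
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,                     # within the 0.03-0.1 range
    num_train_epochs=3,                    # watch validation loss for overfitting
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,         # simulates an effective batch size of 32
    optim="paged_adamw_8bit",              # paged optimizer absorbs memory spikes
    bf16=True,
)
# Hand `model`, `args`, and a tokenized dataset to transformers.Trainer;
# model.save_pretrained("out/adapter") then stores only the adapter weights.
```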
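For the evaluation bullet, an exact-match metric for QA-style tasks can be as small as the sketch below. The normalization rules (lowercasing, punctuation stripping, whitespace collapsing) are a simple assumption, not a standard definition:

```python
# Illustrative exact-match evaluation for a QA-style task. The
# normalization scheme here is an assumption, not a fixed standard.
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that match their reference after normalization."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

score = exact_match(["Paris.", "42"], ["paris", "41"])
print(score)  # 0.5
```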

Common Patterns

  • Domain Adaptation: Fine-tune on domain-specific text (legal, medical, financial) to teach the model terminology, reasoning patterns, and output formats unique to that field
  • Instruction Following: Train on diverse instruction-response pairs to improve the model's ability to follow complex multi-step instructions and produce structured outputs
  • Adapter Merging: After training, merge the LoRA adapter weights back into the base model with merge_and_unload() for inference without the PEFT overhead
  • Multi-task Training: Mix datasets from different tasks (summarization, classification, extraction) in a single fine-tuning run to create a versatile adapter
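The adapter-merging pattern above is a short operation in PEFT. The model name and adapter path below are placeholders for whatever you trained:

```python
# Sketch of merging trained LoRA weights back into the base model for
# plain-transformers inference. Paths and model name are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "out/adapter")  # trained adapter directory
merged = model.merge_and_unload()        # folds the low-rank deltas into base weights
merged.save_pretrained("merged-model")   # loadable later without peft installed
```

Note that merging is one-way for the saved copy: keep the unmerged adapter directory around if you want to continue training or stack adapters later.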

Pitfalls to Avoid

  • Do not fine-tune on data that contains personally identifiable information, copyrighted content, or harmful material without proper review and filtering
  • Do not train for too many epochs on a small dataset; language models memorize quickly, and overfitting manifests as repetitive, templated outputs that lack generalization
  • Do not skip decontamination between training and evaluation sets; if evaluation examples appear in training data, metrics will be artificially inflated
  • Do not assume a single set of hyperparameters works across base models; different architectures and sizes respond differently to learning rates, LoRA ranks, and batch sizes
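The decontamination pitfall can be guarded against with a simple n-gram overlap check between training and evaluation sets. The 8-gram window and whitespace tokenization below are illustrative assumptions; production pipelines typically use fuzzier matching:

```python
# A minimal sketch of train/eval decontamination via normalized n-gram
# overlap. Window size and tokenization are illustrative assumptions.
def ngrams(text: str, n: int = 8) -> set:
    """All lowercase whitespace-token n-grams in a string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(eval_example: str, train_grams: set, n: int = 8) -> bool:
    """Flag an eval example if any of its n-grams also appears in training data."""
    return bool(ngrams(eval_example, n) & train_grams)

train_set = ["the quick brown fox jumps over the lazy dog near the river"]
train_grams = set().union(*(ngrams(t) for t in train_set))

print(is_contaminated("the quick brown fox jumps over the lazy dog today", train_grams))
```

Flagged evaluation examples should be dropped from the held-out set (or the overlapping records removed from training) before any metrics are reported.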