LLM Fine-Tuning Guide


Master the art of fine-tuning large language models to create specialized models optimized for your specific use cases, domains, and performance requirements.

Overview


Fine-tuning adapts pre-trained LLMs to specific tasks, domains, or styles by training them on curated datasets. This improves accuracy, reduces hallucinations, and optimizes costs.

When to Fine-Tune


  • Domain Specialization: Legal documents, medical records, financial reports
  • Task-Specific Performance: Better results on specific tasks than base model
  • Cost Optimization: Smaller fine-tuned model replaces expensive large model
  • Style Adaptation: Match specific writing styles or tones
  • Compliance Requirements: Keep sensitive data within your infrastructure
  • Latency Requirements: Smaller models deploy faster

When NOT to Fine-Tune


  • One-off queries (use prompting instead)
  • Rapidly changing information (use RAG instead)
  • Limited training data (< 100 examples typically insufficient)
  • General knowledge questions (base model sufficient)

Quick Start


Full Fine-Tuning:
bash
python examples/full_fine_tuning.py
LoRA (Recommended for most cases):
bash
python examples/lora_fine_tuning.py
QLoRA (Single GPU):
bash
python examples/qlora_fine_tuning.py
Data Preparation:
bash
python scripts/data_preparation.py

Fine-Tuning Approaches


1. Full Fine-Tuning


Update all model parameters during training.
Pros:
  • Maximum performance improvement
  • Can completely rewrite model behavior
  • Best for significant domain shifts
Cons:
  • High computational cost
  • Requires large dataset (1000+ examples)
  • Risk of catastrophic forgetting
  • Long training time
python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

training_args = TrainingArguments(
    output_dir="./fine-tuned-llama",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=10,
    save_steps=100,
    eval_strategy="steps",
    eval_steps=50,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
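With the settings above, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs (the single-GPU count below is an assumption for illustration):

```python
# Effective batch size implied by the TrainingArguments above
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1  # assumed for illustration

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)
print(effective_batch_size)  # 16
```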

2. Parameter-Efficient Fine-Tuning (PEFT)


Train only a small fraction of parameters.

LoRA (Low-Rank Adaptation)


Adds trainable low-rank matrices to existing weights.
Pros:
  • 99% fewer trainable parameters
  • Maintains base model knowledge
  • Fast training (10-20x faster)
  • Easy to switch between adapters
Cons:
  • Slightly lower performance than full fine-tuning
  • Requires base model at inference
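The mechanism can be sketched without any framework: the adapted layer computes h = W·x + (α/r)·B(A·x), where A and B are the small trainable matrices. The toy dimensions below are illustrative only:

```python
def matvec(M, v):
    # plain-Python matrix-vector product
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

hidden, r, alpha = 4, 2, 4
W = [[1.0 if i == j else 0.0 for j in range(hidden)] for i in range(hidden)]  # frozen weight (identity here)
A = [[0.1] * hidden for _ in range(r)]   # trainable, shape (r, hidden)
B = [[0.0] * r for _ in range(hidden)]   # trainable, shape (hidden, r), initialized to zero

x = [1.0, 2.0, 3.0, 4.0]
h = [w + (alpha / r) * d for w, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]
print(h)  # [1.0, 2.0, 3.0, 4.0]
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly; training then moves only A and B.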
python
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

base_model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Configure LoRA
lora_config = LoraConfig(
    r=8,                                  # Rank of low-rank matrices
    lora_alpha=16,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Wrap model with LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 4,194,304 || all params: 6,738,415,616 || trainable%: 0.06

# Train as normal
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()

# Save only LoRA weights
model.save_pretrained("./llama-lora-adapter")
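The trainable-parameter count printed by print_trainable_parameters() can be sanity-checked by hand: for a Llama-2-7B-sized model (32 layers, hidden size 4096), adapting q_proj and v_proj with r=8 adds two rank-8 matrices per projection:

```python
# LoRA parameter count for Llama-2-7B with r=8 on q_proj and v_proj
num_layers = 32
hidden = 4096
r = 8
modules_per_layer = 2  # q_proj and v_proj

# each adapted Linear(hidden, hidden) gains A: (r, hidden) and B: (hidden, r)
params_per_module = r * hidden + hidden * r
total = num_layers * modules_per_layer * params_per_module
print(f"{total:,}")  # 4,194,304 — matches the output above
```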

QLoRA (Quantized LoRA)


Combines LoRA with quantization for extreme efficiency.
python
import torch
from peft import prepare_model_for_kbit_training, get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments

# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# Apply LoRA
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)

# Train on a single GPU
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./qlora-output",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        learning_rate=5e-4,
        num_train_epochs=3,
    ),
    train_dataset=train_dataset,
)
trainer.train()

Prefix Tuning


Prepends trainable tokens to input.
python
from peft import get_peft_model, PrefixTuningConfig

config = PrefixTuningConfig(
    num_virtual_tokens=20,
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(model, config)
Only the virtual-token prefixes are trained; in the usual per-layer key/value formulation that is roughly num_layers × 2 × num_virtual_tokens × hidden_dim parameters.
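Under the common per-layer key/value-prefix formulation, the trainable budget for the 20 virtual tokens above on a Llama-2-7B-sized model (32 layers, hidden size 4096) works out to:

```python
num_layers, hidden = 32, 4096
num_virtual_tokens = 20
kv = 2  # one key prefix and one value prefix per layer

trainable = num_layers * kv * num_virtual_tokens * hidden
print(f"{trainable:,}")  # 5,242,880 — still well under 0.1% of a 7B model
```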

3. Instruction Fine-Tuning


Train model to follow instructions with examples.
python
# Training data format
training_data = [
    {
        "instruction": "Translate to French",
        "input": "Hello, how are you?",
        "output": "Bonjour, comment allez-vous?"
    },
    {
        "instruction": "Summarize this text",
        "input": "Long document...",
        "output": "Summary..."
    }
]

# Template for training
template = """Below is an instruction that describes a task, paired with an input that provides further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

# Create formatted dataset
formatted_data = [template.format(**example) for example in training_data]

4. Domain-Specific Fine-Tuning


Tailor models for specific industries or fields.

Legal Domain Example


python
legal_training_data = [
    {
        "prompt": "What are the key clauses in an NDA?",
        "completion": """Key clauses typically include:
1. Definition of Confidential Information
2. Non-Disclosure Obligations
3. Permitted Disclosures
4. Term and Termination
5. Return of Information
6. Remedies"""
    },
    # ... more legal examples
]

# Train on the legal domain
# (fine_tune_on_domain is a placeholder for your training routine)
model = fine_tune_on_domain(
    base_model="gpt-3.5-turbo",
    training_data=legal_training_data,
    epochs=3,
    learning_rate=0.0002,
)

Data Preparation


1. Dataset Quality


python
class DatasetValidator:
    def validate_dataset(self, data):
        issues = {
            "empty_samples": 0,
            "duplicates": 0,
            "outliers": 0,
            "imbalance": {}
        }

        # Check for empty samples
        for sample in data:
            if not sample.get("text"):
                issues["empty_samples"] += 1

        # Check for duplicates
        texts = [s.get("text") for s in data]
        issues["duplicates"] = len(texts) - len(set(texts))

        # Check for length outliers (skip empty entries)
        lengths = [len(t.split()) for t in texts if t]
        mean_length = sum(lengths) / max(len(lengths), 1)
        issues["outliers"] = sum(1 for l in lengths if l > mean_length * 3)

        return issues
# Validate before training
validator = DatasetValidator()
issues = validator.validate_dataset(training_data)
print(f"Dataset Issues: {issues}")

2. Data Augmentation


python
from nlpaug.augmenter.word import SynonymAug, RandomWordAug
import nlpaug.flow as naf

text = "The quick brown fox jumps over the lazy dog"

# Synonym replacement
aug_syn = SynonymAug(aug_p=0.3)
augmented_syn = aug_syn.augment(text)

# Random word deletion
# (RandomWordAug supports substitute/swap/delete/crop, not insert)
aug_delete = RandomWordAug(action="delete", aug_p=0.3)
augmented_delete = aug_delete.augment(text)

# Combine augmentations
flow = naf.Sequential([
    SynonymAug(aug_p=0.2),
    RandomWordAug(action="swap", aug_p=0.2),
])
augmented = flow.augment(text)

3. Train/Validation Split


python
from sklearn.model_selection import train_test_split

# Create splits: 80% train, 10% eval, 10% test
train_data, eval_data = train_test_split(data, test_size=0.2, random_state=42)
eval_data, test_data = train_test_split(eval_data, test_size=0.5, random_state=42)

print(f"Train: {len(train_data)}, Eval: {len(eval_data)}, Test: {len(test_data)}")
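If scikit-learn is unavailable, the same 80/10/10 split can be done with the standard library; the toy integer data below stands in for real samples:

```python
import random

data = list(range(100))   # stand-in for real samples
rng = random.Random(42)   # fixed seed for reproducibility
rng.shuffle(data)

n = len(data)
train_data = data[: int(0.8 * n)]
eval_data = data[int(0.8 * n): int(0.9 * n)]
test_data = data[int(0.9 * n):]
print(len(train_data), len(eval_data), len(test_data))  # 80 10 10
```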

Training Techniques


1. Learning Rate Scheduling


python
from transformers import TrainingArguments, get_linear_schedule_with_warmup

# Linear warmup + linear decay, built manually
def get_scheduler(optimizer, num_steps):
    return get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=500,
        num_training_steps=num_steps,
    )

# Or let the Trainer build it: linear warmup + cosine annealing
# (warmup_steps takes precedence over warmup_ratio if both are set)
training_args = TrainingArguments(
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=500,
)

2. Gradient Accumulation


python
training_args = TrainingArguments(
    gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps
    per_device_train_batch_size=1,   # Effective batch size: 1 * 4 = 4
)

Simulates a larger batch size on limited GPU memory.

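Conceptually, the accumulation loop scales each micro-batch gradient and only steps the optimizer every N micro-batches; a framework-free sketch with toy gradient values:

```python
accumulation_steps = 4
micro_batch_grads = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1]  # toy values

accumulated = 0.0
optimizer_updates = 0
for step, g in enumerate(micro_batch_grads, start=1):
    accumulated += g / accumulation_steps   # scale so the sum matches one large-batch mean
    if step % accumulation_steps == 0:
        optimizer_updates += 1              # optimizer.step()
        accumulated = 0.0                   # optimizer.zero_grad()

print(optimizer_updates)  # 2 optimizer updates for 8 micro-batches
```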

3. Mixed Precision Training


python
training_args = TrainingArguments(
    fp16=True,   # Use 16-bit floats
    bf16=False,  # prefer bf16=True on Ampere or newer GPUs
)

Roughly halves memory usage and speeds up training.


4. Multi-GPU Training


python
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    dataloader_pin_memory=True,
    dataloader_num_workers=4,
)

python
# Automatically uses all available GPUs
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

Popular Models for Fine-Tuning


Open Source Models


Llama 3.2 (Meta)


python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

# Fine-tune on custom data
# ... training code

**Characteristics**:
- 1B and 3B text models, plus 11B and 90B vision models
- Strong instruction-following
- Excellent for domain adaptation
- Released under the Llama community license


Gemma 3 (Google)


python
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# Gemma 3 sizes: 1B, 4B, 12B, 27B
# Very efficient, great for fine-tuning

**Characteristics**:
- Small, medium, large sizes
- Efficient architecture
- Good for edge deployment
- Built on cutting-edge research


Mistral 7B


python
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Strong performance, efficient architecture

**Characteristics**:
- Sliding window attention
- Efficient inference
- Strong performance-to-size ratio


Commercial Models


OpenAI Fine-Tuning API


python
from openai import OpenAI

client = OpenAI()

# Prepare training data
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Create fine-tuning job
fine_tune_job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={
        "n_epochs": 3,
        "learning_rate_multiplier": 0.1,
    },
)

# Check status (poll until the job succeeds)
job = client.fine_tuning.jobs.retrieve(fine_tune_job.id)
print(f"Status: {job.status}")

# Use fine-tuned model
response = client.chat.completions.create(
    model=job.fine_tuned_model,
    messages=[{"role": "user", "content": "Hello"}],
)
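The training_data.jsonl file above holds one chat-formatted example per line. A minimal stdlib sketch for producing it from instruction-style records (the field names follow the instruction-tuning example earlier; joining instruction and input with a blank line is a choice of this sketch, not an OpenAI requirement):

```python
import json

instruction_examples = [
    {"instruction": "Translate to French",
     "input": "Hello, how are you?",
     "output": "Bonjour, comment allez-vous?"},
]

def to_chat_jsonl(examples):
    lines = []
    for ex in examples:
        record = {"messages": [
            {"role": "user", "content": f"{ex['instruction']}\n\n{ex['input']}"},
            {"role": "assistant", "content": ex["output"]},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl_text = to_chat_jsonl(instruction_examples)
# each line is a standalone JSON object with a "messages" list
```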

Evaluation and Metrics


1. Perplexity


python
import torch
from math import exp

def calculate_perplexity(model, eval_dataset):
    model.eval()
    total_loss = 0.0
    total_tokens = 0

    with torch.no_grad():
        for batch in eval_dataset:
            outputs = model(**batch)
            # outputs.loss is the mean loss per token, so weight by token count
            n_tokens = batch["input_ids"].numel()
            total_loss += outputs.loss.item() * n_tokens
            total_tokens += n_tokens

    return exp(total_loss / total_tokens)

perplexity = calculate_perplexity(model, eval_dataset)
print(f"Perplexity: {perplexity:.2f}")
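As a numeric sanity check: perplexity is just the exponentiated mean token loss, so uniform per-token losses of 2.0 nats give exp(2.0):

```python
from math import exp, isclose

token_losses = [2.0, 2.0, 2.0, 2.0]  # toy per-token cross-entropy values (nats)
perplexity = exp(sum(token_losses) / len(token_losses))
print(round(perplexity, 2))  # 7.39
```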

2. Task-Specific Metrics


python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def evaluate_task(predictions, ground_truth):
    return {
        "accuracy": accuracy_score(ground_truth, predictions),
        "precision": precision_score(ground_truth, predictions, average='weighted'),
        "recall": recall_score(ground_truth, predictions, average='weighted'),
        "f1": f1_score(ground_truth, predictions, average='weighted'),
    }
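For intuition, the binary case of these metrics can be computed by hand (toy labels below are illustrative):

```python
predictions = [1, 0, 1, 1]
ground_truth = [1, 0, 0, 1]

tp = sum(p == 1 and g == 1 for p, g in zip(predictions, ground_truth))  # 2
fp = sum(p == 1 and g == 0 for p, g in zip(predictions, ground_truth))  # 1
fn = sum(p == 0 and g == 1 for p, g in zip(predictions, ground_truth))  # 0

precision = tp / (tp + fp)   # 2/3
recall = tp / (tp + fn)      # 1.0
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.8
```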

# Evaluate on the task
predictions = [model.predict(x) for x in test_data]
metrics = evaluate_task(predictions, test_labels)
print(f"Metrics: {metrics}")

3. Human Evaluation


python
class HumanEvaluator:
    def evaluate_response(self, prompt, response):
        criteria = {
            "relevance": self._score_relevance(prompt, response),
            "coherence": self._score_coherence(response),
            "factuality": self._score_factuality(response),
            "helpfulness": self._score_helpfulness(response),
        }
        return sum(criteria.values()) / len(criteria)

    # Each rubric below returns a human-assigned score from 1-5
    def _score_relevance(self, prompt, response):
        ...

    def _score_coherence(self, response):
        ...

    def _score_factuality(self, response):
        ...

    def _score_helpfulness(self, response):
        ...

Common Challenges & Solutions


Challenge: Catastrophic Forgetting


Model forgets pre-trained knowledge while adapting to new domain.
Solutions:
  • Use lower learning rates (2e-5 to 5e-5)
  • Smaller training epochs (1-3)
  • Regularization techniques
  • Continual learning approaches
python
# Conservative training settings
training_args = TrainingArguments(
    learning_rate=2e-5,          # Lower learning rate
    num_train_epochs=2,          # Few epochs
    weight_decay=0.01,           # L2 regularization
    warmup_steps=500,
    save_total_limit=3,
    load_best_model_at_end=True,
)

Challenge: Overfitting


Model performs well on training data but poorly on new data.
Solutions:
  • Use more training data
  • Implement dropout
  • Early stopping
  • Validation monitoring
python
from transformers import EarlyStoppingCallback

training_args = TrainingArguments(
    eval_strategy="steps",
    eval_steps=50,
    save_steps=50,  # saves must line up with evals for load_best_model_at_end
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

# Early stopping is a Trainer callback, not a TrainingArguments option
trainer = Trainer(
    model=model, args=training_args,
    train_dataset=train_dataset, eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)

Challenge: Insufficient Training Data


Few examples for fine-tuning.
Solutions:
  • Data augmentation
  • Use PEFT (LoRA) instead of full fine-tuning
  • Few-shot learning with prompting
  • Transfer learning
python
# Use LoRA when data is limited
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

Best Practices


Before Fine-Tuning


  • ✓ Start with a strong base model
  • ✓ Prepare high-quality training data (100+ examples recommended)
  • ✓ Define clear evaluation metrics
  • ✓ Set up proper train/validation splits
  • ✓ Document your objectives

During Fine-Tuning


  • ✓ Monitor training/validation loss
  • ✓ Use appropriate learning rates
  • ✓ Save checkpoints regularly
  • ✓ Validate on held-out data
  • ✓ Watch for overfitting/underfitting

After Fine-Tuning


  • ✓ Evaluate on test set
  • ✓ Compare against baseline
  • ✓ Perform qualitative analysis
  • ✓ Document configuration and results
  • ✓ Version your fine-tuned models

Implementation Checklist


  • Determine fine-tuning approach (full, LoRA, QLoRA, instruction)
  • Prepare and validate training dataset (100+ examples)
  • Choose base model (Llama 3.2, Gemma 3, Mistral, etc.)
  • Set up PEFT if using parameter-efficient methods
  • Configure training arguments and hyperparameters
  • Implement data loading and preprocessing
  • Set up evaluation metrics
  • Train model with monitoring
  • Evaluate on test set
  • Save and version fine-tuned model
  • Test in production environment
  • Document process and results

Resources


Papers


  • "LoRA: Low-Rank Adaptation of Large Language Models" (Hu et al.)
  • "QLoRA: Efficient Finetuning of Quantized LLMs" (Dettmers et al.)