LLM Fine-Tuning Guide
Master the art of fine-tuning large language models to create specialized models optimized for your specific use cases, domains, and performance requirements.
Overview
Fine-tuning adapts pre-trained LLMs to specific tasks, domains, or styles by training them on curated datasets. This improves accuracy, reduces hallucinations, and optimizes costs.
When to Fine-Tune
- Domain Specialization: Legal documents, medical records, financial reports
- Task-Specific Performance: Better results on specific tasks than the base model
- Cost Optimization: Smaller fine-tuned model replaces expensive large model
- Style Adaptation: Match specific writing styles or tones
- Compliance Requirements: Keep sensitive data within your infrastructure
- Latency Requirements: Smaller models deploy faster
When NOT to Fine-Tune
- One-off queries (use prompting instead)
- Rapidly changing information (use RAG instead)
- Limited training data (< 100 examples typically insufficient)
- General knowledge questions (base model sufficient)
Quick Start
Full Fine-Tuning:

```bash
python examples/full_fine_tuning.py
```

LoRA (Recommended for most cases):

```bash
python examples/lora_fine_tuning.py
```

QLoRA (Single GPU):

```bash
python examples/qlora_fine_tuning.py
```

Data Preparation:

```bash
python scripts/data_preparation.py
```

Fine-Tuning Approaches
1. Full Fine-Tuning
Update all model parameters during training.
Pros:
- Maximum performance improvement
- Can completely rewrite model behavior
- Best for significant domain shifts
Cons:
- High computational cost
- Requires large dataset (1000+ examples)
- Risk of catastrophic forgetting
- Long training time
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

model_id = "meta-llama/Llama-2-7b"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

training_args = TrainingArguments(
    output_dir="./fine-tuned-llama",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    logging_steps=10,
    save_steps=100,
    eval_strategy="steps",
    eval_steps=50,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```

2. Parameter-Efficient Fine-Tuning (PEFT)
Train only a small fraction of parameters.
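As a rough sanity check of how small that fraction is (assuming a Llama-2-7B-style shape: 32 layers, hidden size 4096, LoRA rank 8 applied to q_proj and v_proj), the trainable-parameter count works out as:

```python
# LoRA parameter count for a hypothetical Llama-2-7B-style model:
# each adapted projection adds two low-rank matrices, A (r x d) and B (d x r).
d = 4096        # hidden size
r = 8           # LoRA rank
layers = 32     # transformer layers
targets = 2     # adapted projections per layer (q_proj, v_proj)

trainable = layers * targets * 2 * d * r
print(trainable)  # 4194304 — about 0.06% of the ~6.7B total
```

This matches the trainable-parameter count printed by `print_trainable_parameters()` in the LoRA example below.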
LoRA (Low-Rank Adaptation)
Adds trainable low-rank matrices to existing weights.
Pros:
- 99% fewer trainable parameters
- Maintains base model knowledge
- Fast training (10-20x faster)
- Easy to switch between adapters
Cons:
- Slightly lower performance than full fine-tuning
- Requires base model at inference
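The low-rank update itself can be sketched in a few lines of plain PyTorch (shapes are illustrative, not taken from any real model):

```python
import torch

# LoRA forward pass sketch: frozen weight W plus a trainable low-rank
# update B @ A, scaled by alpha / r.
d, r, alpha = 16, 4, 8
W = torch.randn(d, d)           # frozen pre-trained weight
A = torch.randn(r, d) * 0.01    # trainable, small random init
B = torch.zeros(d, r)           # trainable, zero init: update starts at zero

x = torch.randn(d)
y = W @ x + (alpha / r) * (B @ (A @ x))
# Because B starts at zero, y == W @ x at initialization, so training
# begins from the unmodified base model.
assert torch.allclose(y, W @ x)
```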
```python
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

base_model_id = "meta-llama/Llama-2-7b"
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Configure LoRA
lora_config = LoraConfig(
    r=8,                                  # Rank of low-rank matrices
    lora_alpha=16,                        # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Wrap model with LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output: trainable params: 4,194,304 || all params: 6,738,415,616 || trainable%: 0.06

# Train as normal (training_args as defined above)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()

# Save only the LoRA weights
model.save_pretrained("./llama-lora-adapter")
```

QLoRA (Quantized LoRA)
Combines LoRA with quantization for extreme efficiency.
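The memory savings are easy to estimate (weights only; activations, gradients, and optimizer state come on top):

```python
# Back-of-envelope weight memory for a 7B-parameter model.
params = 7e9
fp16_gb = params * 2 / 1e9    # 2 bytes per weight -> 14.0 GB
nf4_gb = params * 0.5 / 1e9   # 4 bits per weight  -> 3.5 GB
print(f"fp16: {fp16_gb} GB, nf4: {nf4_gb} GB")
```

This is why a 4-bit-quantized 7B base model plus LoRA adapters fits comfortably on a single consumer GPU.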
```python
from peft import prepare_model_for_kbit_training, get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments

# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True
)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b",
    quantization_config=bnb_config,
    device_map="auto"
)

# Prepare for training
model = prepare_model_for_kbit_training(model)

# Apply LoRA
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, lora_config)

# Train on a single GPU
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./qlora-output",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        learning_rate=5e-4,
        num_train_epochs=3,
    ),
    train_dataset=train_dataset,
)
trainer.train()
```

Prefix Tuning
Prepends trainable virtual tokens to the input.
```python
from peft import get_peft_model, PrefixTuningConfig, TaskType

config = PrefixTuningConfig(
    num_virtual_tokens=20,
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, config)
# Only 20 * embedding_dim parameters are trained
```

3. Instruction Fine-Tuning
Train the model to follow instructions using paired instruction/response examples.
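One detail the data format hides: loss is typically computed only on the response tokens, not the prompt. A minimal label-masking sketch with made-up token IDs (not from a real tokenizer):

```python
# Prompt positions are masked with -100 so the cross-entropy loss ignores
# them; only the response tokens contribute to the training signal.
prompt_ids = [101, 2023, 2003]   # tokenized instruction + input (made-up IDs)
response_ids = [2190, 102]       # tokenized response (made-up IDs)

input_ids = prompt_ids + response_ids
labels = [-100] * len(prompt_ids) + response_ids
print(labels)  # [-100, -100, -100, 2190, 102]
```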
```python
# Training data format
training_data = [
    {
        "instruction": "Translate to French",
        "input": "Hello, how are you?",
        "output": "Bonjour, comment allez-vous?"
    },
    {
        "instruction": "Summarize this text",
        "input": "Long document...",
        "output": "Summary..."
    }
]

# Template for training
template = """Below is an instruction that describes a task, paired with an input that provides further context.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

# Create formatted dataset
formatted_data = [
    template.format(**example) for example in training_data
]
```

4. Domain-Specific Fine-Tuning
Tailor models for specific industries or fields.
Legal Domain Example
```python
legal_training_data = [
    {
        "prompt": "What are the key clauses in an NDA?",
        "completion": """Key clauses typically include:
1. Definition of Confidential Information
2. Non-Disclosure Obligations
3. Permitted Disclosures
4. Term and Termination
5. Return of Information
6. Remedies"""
    },
    # ... more legal examples
]

# Train on the legal domain (fine_tune_on_domain is a project-specific
# helper, not a library function)
model = fine_tune_on_domain(
    base_model="gpt-3.5-turbo",
    training_data=legal_training_data,
    epochs=3,
    learning_rate=0.0002,
)
```

Data Preparation
1. Dataset Quality
```python
class DatasetValidator:
    def validate_dataset(self, data):
        issues = {
            "empty_samples": 0,
            "duplicates": 0,
            "outliers": 0,
            "imbalance": {}
        }

        # Check for empty samples
        for sample in data:
            if not sample.get("text"):
                issues["empty_samples"] += 1

        # Check for duplicates
        texts = [s.get("text") for s in data]
        issues["duplicates"] = len(texts) - len(set(texts))

        # Check for length outliers (more than 3x the mean word count),
        # skipping empty entries already counted above
        lengths = [len(t.split()) for t in texts if t]
        mean_length = sum(lengths) / len(lengths)
        issues["outliers"] = sum(1 for l in lengths if l > mean_length * 3)

        return issues

# Validate before training
validator = DatasetValidator()
issues = validator.validate_dataset(training_data)
print(f"Dataset Issues: {issues}")
```

2. Data Augmentation
```python
from nlpaug.augmenter.word import SynonymAug, RandomWordAug
import nlpaug.flow as naf

# Text to augment
text = "The quick brown fox jumps over the lazy dog"

# Synonym replacement
aug_syn = SynonymAug(aug_p=0.3)
augmented_syn = aug_syn.augment(text)

# Random word insertion
aug_insert = RandomWordAug(action="insert", aug_p=0.3)
augmented_insert = aug_insert.augment(text)

# Combine augmentations into a pipeline
flow = naf.Sequential([
    SynonymAug(aug_p=0.2),
    RandomWordAug(action="swap", aug_p=0.2)
])
augmented = flow.augment(text)
```

3. Train/Validation Split
```python
from sklearn.model_selection import train_test_split

# Create an 80/10/10 train/eval/test split
train_data, eval_data = train_test_split(
    data,
    test_size=0.2,
    random_state=42
)
eval_data, test_data = train_test_split(
    eval_data,
    test_size=0.5,
    random_state=42
)
print(f"Train: {len(train_data)}, Eval: {len(eval_data)}, Test: {len(test_data)}")
```

Training Techniques
1. Learning Rate Scheduling
```python
from transformers import TrainingArguments, get_linear_schedule_with_warmup

# Manual scheduler: linear warmup, then linear decay
def get_scheduler(optimizer, num_steps):
    lr_scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=500,
        num_training_steps=num_steps
    )
    return lr_scheduler

# Or let the Trainer handle it: linear warmup + cosine annealing
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=500,  # takes precedence over warmup_ratio if both are set
)
```

2. Gradient Accumulation
```python
training_args = TrainingArguments(
    gradient_accumulation_steps=4,   # Accumulate gradients over 4 steps
    per_device_train_batch_size=1,   # Effective batch size: 1 * 4 = 4
)
# Simulates a larger batch size on limited GPU memory
```

3. Mixed Precision Training
```python
training_args = TrainingArguments(
    fp16=True,   # Use 16-bit floats (prefer bf16=True on Ampere or newer GPUs)
    bf16=False,
)
# Roughly halves memory usage and speeds up training
```

4. Multi-GPU Training
```python
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    dataloader_pin_memory=True,
    dataloader_num_workers=4,
)

# The Trainer automatically uses all available GPUs
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
```

Popular Models for Fine-Tuning
Open Source Models
Llama 3.2 (Meta)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

# Fine-tune on custom data
# ... training code
```

**Characteristics**:
- 1B and 3B text models (plus 11B and 90B vision variants)
- Strong instruction-following
- Excellent for domain adaptation
- Llama 3.2 Community License

Gemma 3 (Google)
```python
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-pt")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-pt")

# Gemma 3 sizes: 1B, 4B, 12B, 27B
# Very efficient, great for fine-tuning
```

**Characteristics**:
- Small, medium, and large sizes
- Efficient architecture
- Good for edge deployment
- Built on cutting-edge research

Mistral 7B
```python
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Strong performance, efficient architecture
```

**Characteristics**:
- Sliding window attention
- Efficient inference
- Strong performance-to-size ratio

Commercial Models
OpenAI Fine-Tuning API
```python
from openai import OpenAI

client = OpenAI()

# Prepare training data
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune"
)

# Create fine-tuning job
fine_tune_job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={
        "n_epochs": 3,
        "learning_rate_multiplier": 0.1,
    }
)

# Check status (poll until the job succeeds)
fine_tune_job = client.fine_tuning.jobs.retrieve(fine_tune_job.id)
print(f"Status: {fine_tune_job.status}")

# Use the fine-tuned model
response = client.chat.completions.create(
    model=fine_tune_job.fine_tuned_model,
    messages=[{"role": "user", "content": "Hello"}]
)
```

Evaluation and Metrics
1. Perplexity
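Perplexity is just the exponential of the average per-token cross-entropy loss, so a falling eval loss maps directly to a falling perplexity:

```python
from math import exp

# A mean token loss of 2.0 nats corresponds to perplexity exp(2.0).
mean_loss = 2.0
perplexity = exp(mean_loss)
print(round(perplexity, 2))  # 7.39
```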
```python
import torch
from math import exp

def calculate_perplexity(model, eval_dataloader):
    model.eval()
    total_loss = 0
    total_tokens = 0
    with torch.no_grad():
        for batch in eval_dataloader:
            outputs = model(**batch)
            # outputs.loss is the mean loss per token in this batch
            num_tokens = batch["input_ids"].numel()
            total_loss += outputs.loss.item() * num_tokens
            total_tokens += num_tokens
    # Token-weighted average loss (ignores padding for simplicity)
    return exp(total_loss / total_tokens)

perplexity = calculate_perplexity(model, eval_dataloader)
print(f"Perplexity: {perplexity:.2f}")
```

2. Task-Specific Metrics
```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def evaluate_task(predictions, ground_truth):
    return {
        "accuracy": accuracy_score(ground_truth, predictions),
        "precision": precision_score(ground_truth, predictions, average='weighted'),
        "recall": recall_score(ground_truth, predictions, average='weighted'),
        "f1": f1_score(ground_truth, predictions, average='weighted'),
    }

# Evaluate on the task
predictions = [model.predict(x) for x in test_data]
metrics = evaluate_task(predictions, test_labels)
print(f"Metrics: {metrics}")
```

3. Human Evaluation
```python
class HumanEvaluator:
    def evaluate_response(self, prompt, response):
        criteria = {
            "relevance": self._score_relevance(prompt, response),
            "coherence": self._score_coherence(response),
            "factuality": self._score_factuality(response),
            "helpfulness": self._score_helpfulness(response),
        }
        return sum(criteria.values()) / len(criteria)

    def _score_relevance(self, prompt, response):
        # Score 1-5, assigned by a human rater
        pass

    def _score_coherence(self, response):
        # Score 1-5, assigned by a human rater
        pass
```

Common Challenges & Solutions
Challenge: Catastrophic Forgetting
The model forgets pre-trained knowledge while adapting to a new domain.
Solutions:
- Use lower learning rates (2e-5 to 5e-5)
- Smaller training epochs (1-3)
- Regularization techniques
- Continual learning approaches
```python
# Conservative training settings
training_args = TrainingArguments(
    learning_rate=2e-5,          # Lower learning rate
    num_train_epochs=2,          # Few epochs
    weight_decay=0.01,           # L2 regularization
    warmup_steps=500,
    save_total_limit=3,
    load_best_model_at_end=True,
)
```

Challenge: Overfitting
The model performs well on training data but poorly on new data.
Solutions:
- Use more training data
- Implement dropout
- Early stopping
- Validation monitoring
```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="steps",
    eval_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

# Early stopping is a Trainer callback, not a TrainingArguments option
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```

Challenge: Insufficient Training Data
Too few examples are available for fine-tuning.
Solutions:
- Data augmentation
- Use PEFT (LoRA) instead of full fine-tuning
- Few-shot learning with prompting
- Transfer learning
```python
# Use LoRA when data is limited
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
```

Best Practices
Before Fine-Tuning
- ✓ Start with a strong base model
- ✓ Prepare high-quality training data (100+ examples recommended)
- ✓ Define clear evaluation metrics
- ✓ Set up proper train/validation splits
- ✓ Document your objectives
During Fine-Tuning
- ✓ Monitor training/validation loss
- ✓ Use appropriate learning rates
- ✓ Save checkpoints regularly
- ✓ Validate on held-out data
- ✓ Watch for overfitting/underfitting
After Fine-Tuning
- ✓ Evaluate on test set
- ✓ Compare against baseline
- ✓ Perform qualitative analysis
- ✓ Document configuration and results
- ✓ Version your fine-tuned models
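"Compare against baseline" can be as simple as diffing the same metrics computed for the base and fine-tuned models on the held-out test set (the metric values below are hypothetical):

```python
# Diff held-out metrics for base vs fine-tuned model.
def compare_models(base_metrics, ft_metrics):
    return {k: round(ft_metrics[k] - base_metrics[k], 3) for k in base_metrics}

delta = compare_models(
    {"accuracy": 0.71, "f1": 0.68},   # base model (hypothetical)
    {"accuracy": 0.83, "f1": 0.81},   # fine-tuned model (hypothetical)
)
print(delta)  # positive deltas mean the fine-tune helped
```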
Implementation Checklist
- Determine fine-tuning approach (full, LoRA, QLoRA, instruction)
- Prepare and validate training dataset (100+ examples)
- Choose base model (Llama 3.2, Gemma 3, Mistral, etc.)
- Set up PEFT if using parameter-efficient methods
- Configure training arguments and hyperparameters
- Implement data loading and preprocessing
- Set up evaluation metrics
- Train model with monitoring
- Evaluate on test set
- Save and version fine-tuned model
- Test in production environment
- Document process and results
Resources
Frameworks
- Hugging Face Transformers: https://huggingface.co/transformers/
- PEFT (Parameter-Efficient Fine-Tuning): https://github.com/huggingface/peft
- Hugging Face Datasets: https://huggingface.co/datasets
Models
- Llama 3.2: https://www.meta.com/llama/
- Gemma 3: https://deepmind.google/technologies/gemma/
- Mistral: https://mistral.ai/
Papers
- "LoRA: Low-Rank Adaptation of Large Language Models" (Hu et al.)
- "QLoRA: Efficient Finetuning of Quantized LLMs" (Dettmers et al.)
- "LoRA: Low-Rank Adaptation of Large Language Models" (Hu et al.)
- "QLoRA: Efficient Finetuning of Quantized LLMs" (Dettmers et al.)