HuggingFace Transformers
Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.
When to Use
- Quick inference with pipelines
- Text generation, classification, QA, NER
- Image classification, object detection
- Fine-tuning on custom datasets
- Loading pre-trained models from HuggingFace Hub
Pipeline Tasks
NLP Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Text Generation | text-generation | Completed text |
| Classification | text-classification | Label + confidence |
| Question Answering | question-answering | Answer span |
| Summarization | summarization | Shorter text |
| Translation | translation | Translated text |
| NER | ner | Entity spans + types |
| Fill Mask | fill-mask | Predicted tokens |
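A minimal sketch of two of the NLP tasks above. Omitting the model argument falls back to each task's default checkpoint; the example strings are placeholders.

```python
from transformers import pipeline

def classify_sentiment(texts):
    """Text classification; downloads the task's default checkpoint on first call."""
    clf = pipeline("text-classification")
    return clf(texts)

def answer_question(question, context):
    """Extractive QA: the answer is a span copied out of the context."""
    qa = pipeline("question-answering")
    return qa(question=question, context=context)

# Example calls (network needed on first run):
# classify_sentiment(["Transformers makes NLP easy."])
#   → [{'label': ..., 'score': ...}]
# answer_question("Who maintains Transformers?",
#                 "Transformers is maintained by Hugging Face.")
#   → {'answer': ..., 'score': ..., 'start': ..., 'end': ...}
```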
Vision Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Image Classification | image-classification | Label + confidence |
| Object Detection | object-detection | Bounding boxes |
| Image Segmentation | image-segmentation | Pixel masks |
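A sketch of the object-detection pipeline; the image path is a placeholder, and the input may also be a URL or a PIL image.

```python
from transformers import pipeline

def detect_objects(image):
    """Object detection with the task's default checkpoint.
    Returns labels, scores, and pixel-coordinate bounding boxes."""
    detector = pipeline("object-detection")
    return detector(image)

# detect_objects("street.jpg")
#   → [{'label': ..., 'score': ...,
#       'box': {'xmin': ..., 'ymin': ..., 'xmax': ..., 'ymax': ...}}, ...]
```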
Audio Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Speech Recognition | automatic-speech-recognition | Transcribed text |
| Audio Classification | audio-classification | Label + confidence |
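A sketch of speech recognition; the audio path is a placeholder, and the pipeline also accepts URLs and raw waveform arrays.

```python
from transformers import pipeline

def transcribe(audio):
    """Speech-to-text with the task's default checkpoint."""
    asr = pipeline("automatic-speech-recognition")
    return asr(audio)["text"]

# transcribe("meeting.wav")
```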
Model Loading Patterns
Auto Classes
| Class | Use Case |
|---|---|
| AutoModel | Base model (embeddings) |
| AutoModelForCausalLM | Text generation (GPT-style) |
| AutoModelForSeq2SeqLM | Encoder-decoder (T5, BART) |
| AutoModelForSequenceClassification | Classification head |
| AutoModelForTokenClassification | NER, POS tagging |
| AutoModelForQuestionAnswering | Extractive QA |
Key concept: Use Auto classes unless you need a specific architecture; they infer the correct model class from the checkpoint's config automatically.
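A minimal loading sketch; gpt2 stands in for any causal-LM checkpoint on the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_causal_lm(model_name="gpt2"):
    """The Auto classes read the checkpoint's config.json and instantiate
    the matching architecture (GPT2LMHeadModel for gpt2)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return tokenizer, model

# tokenizer, model = load_causal_lm()  # downloads weights on first run
```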
Generation Parameters
| Parameter | Effect | Typical Values |
|---|---|---|
| max_new_tokens | Output length | 50-500 |
| temperature | Randomness (0=deterministic) | 0.1-1.0 |
| top_p | Nucleus sampling threshold | 0.9-0.95 |
| top_k | Limit vocabulary per step | 50 |
| num_beams | Beam search (disable sampling) | 4-8 |
| repetition_penalty | Discourage repetition | 1.1-1.3 |
Key concept: Higher temperature = more creative but less coherent. For factual tasks, use low temperature (0.1-0.3).
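A sketch wiring the parameters above into model.generate. Note that temperature, top_p, and top_k only take effect with do_sample=True (the table omits this); the chosen values are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt, model_name="gpt2"):
    """Sampled generation using the parameters from the table above."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,     # output length
        do_sample=True,         # enables temperature/top_p/top_k
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.2,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# generate("The key idea behind attention is")
```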
Memory Management
Device Placement Options
| Option | When to Use |
|---|---|
| device_map="auto" | Let library decide GPU allocation |
| device_map="cuda:0" | Specific GPU |
| device_map="cpu" | CPU only |
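A sketch of automatic placement; device_map="auto" requires the accelerate package and shards weights across available GPUs, spilling to CPU RAM if they don't fit.

```python
from transformers import AutoModelForCausalLM

def load_on_best_device(model_name):
    """Let the library decide where each layer lives."""
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",   # automatic GPU/CPU allocation
        torch_dtype="auto",  # keep the checkpoint's native precision
    )
```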
Quantization Options
| Method | Memory Reduction | Quality Impact |
|---|---|---|
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ | ~75% | Requires calibration |
| AWQ | ~75% | Activation-aware |
Key concept: Use torch_dtype="auto" to load the model in its native precision (often bfloat16).
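A 4-bit loading sketch using bitsandbytes (a separate package that must be installed); the NF4 settings shown are common defaults, not requirements.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_4bit(model_name):
    """4-bit NF4 quantization: roughly 75% memory reduction vs fp16."""
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype="bfloat16",  # dtype used for matmuls
    )
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )
```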
Fine-Tuning Concepts
Trainer Arguments
| Argument | Purpose | Typical Value |
|---|---|---|
| num_train_epochs | Training passes | 3-5 |
| per_device_train_batch_size | Samples per GPU | 8-32 |
| learning_rate | Step size | 2e-5 for fine-tuning |
| weight_decay | Regularization | 0.01 |
| warmup_ratio | LR warmup | 0.1 |
| evaluation_strategy | When to eval | "epoch" or "steps" |
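A sketch mapping the table onto TrainingArguments; output_dir is a placeholder, and note that evaluation_strategy was renamed eval_strategy in newer transformers releases.

```python
from transformers import Trainer, TrainingArguments

def make_training_args(output_dir="./results"):
    """Typical fine-tuning values from the table above."""
    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
        weight_decay=0.01,
        warmup_ratio=0.1,
        evaluation_strategy="epoch",  # eval_strategy in newer releases
    )

# trainer = Trainer(model=model, args=make_training_args(),
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```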
Fine-Tuning Strategies
微调策略
| Strategy | Memory | Quality | Use Case |
|---|---|---|---|
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very Low | Good | 7B+ models on consumer GPU |
| Prefix tuning | Low | Moderate | When you can't modify weights |
Tokenization Concepts
| Parameter | Purpose |
|---|---|
| padding | Make sequences same length |
| truncation | Cut sequences to max_length |
| max_length | Maximum tokens (model-specific) |
| return_tensors | Output format ("pt", "tf", "np") |
Key concept: Always use the tokenizer that matches the model—different models use different vocabularies.
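A sketch combining the tokenizer parameters above; bert-base-uncased is a placeholder checkpoint, and 512 is that model's context limit.

```python
from transformers import AutoTokenizer

def encode_batch(texts, model_name="bert-base-uncased"):
    """Pad to the longest sequence in the batch, truncate to the model's
    limit, and return PyTorch tensors ready for the model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer(
        texts,
        padding=True,       # pad shorter sequences in the batch
        truncation=True,    # cut at max_length
        max_length=512,     # model-specific limit
        return_tensors="pt",
    )
```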
Best Practices
| Practice | Why |
|---|---|
| Use pipelines for inference | Handles preprocessing automatically |
| Use device_map="auto" | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use Trainer for fine-tuning | Built-in best practices |