HuggingFace Transformers

Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.

When to Use

  • Quick inference with pipelines
  • Text generation, classification, QA, NER
  • Image classification, object detection
  • Fine-tuning on custom datasets
  • Loading pre-trained models from HuggingFace Hub


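Most of these use cases come down to a few lines with `pipeline()`. A minimal sketch; with no model argument the library falls back to a default checkpoint, which is downloaded on first run:

```python
from transformers import pipeline

# Create a sentiment-analysis pipeline; without an explicit model,
# a default checkpoint is fetched from the Hub on first use.
classifier = pipeline("sentiment-analysis")

# The pipeline handles tokenization, inference, and postprocessing.
result = classifier("Transformers makes NLP easy.")
# result is a list of dicts like [{"label": "POSITIVE", "score": ...}]
print(result)
```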
Pipeline Tasks


NLP Tasks


| Task | Pipeline Name | Output |
| --- | --- | --- |
| Text Generation | `text-generation` | Completed text |
| Classification | `text-classification` | Label + confidence |
| Question Answering | `question-answering` | Answer span |
| Summarization | `summarization` | Shorter text |
| Translation | `translation_en_to_fr` | Translated text |
| NER | `ner` | Entity spans + types |
| Fill Mask | `fill-mask` | Predicted tokens |

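As a sketch of the text-generation task above; `gpt2` stands in as a small example checkpoint, and any causal LM on the Hub works the same way:

```python
from transformers import pipeline

# "text-generation" wraps a causal LM; gpt2 is a small example checkpoint.
generator = pipeline("text-generation", model="gpt2")

outputs = generator("The meaning of life is", max_new_tokens=30)
# Each output dict holds the prompt plus completion under "generated_text".
print(outputs[0]["generated_text"])
```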
Vision Tasks


| Task | Pipeline Name | Output |
| --- | --- | --- |
| Image Classification | `image-classification` | Label + confidence |
| Object Detection | `object-detection` | Bounding boxes |
| Image Segmentation | `image-segmentation` | Pixel masks |

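A vision sketch along the same lines; the ViT checkpoint is one common example, and `cat.jpg` is a hypothetical local file (URLs and PIL images also work):

```python
from transformers import pipeline

# google/vit-base-patch16-224 is an example image-classification checkpoint.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

preds = classifier("cat.jpg")  # hypothetical local image path
# preds is a list of {"label": ..., "score": ...} dicts, highest score first.
print(preds[:3])
```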
Audio Tasks


| Task | Pipeline Name | Output |
| --- | --- | --- |
| Speech Recognition | `automatic-speech-recognition` | Transcribed text |
| Audio Classification | `audio-classification` | Label + confidence |


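A speech-recognition sketch; the Whisper checkpoint and `speech.wav` path are example placeholders:

```python
from transformers import pipeline

# Whisper checkpoints are a common choice for automatic-speech-recognition.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

result = asr("speech.wav")  # hypothetical audio file path
# The transcription comes back under the "text" key.
print(result["text"])
```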
Model Loading Patterns


Auto Classes


| Class | Use Case |
| --- | --- |
| `AutoModel` | Base model (embeddings) |
| `AutoModelForCausalLM` | Text generation (GPT-style) |
| `AutoModelForSeq2SeqLM` | Encoder-decoder (T5, BART) |
| `AutoModelForSequenceClassification` | Classification head |
| `AutoModelForTokenClassification` | NER, POS tagging |
| `AutoModelForQuestionAnswering` | Extractive QA |

Key concept: Always use Auto classes unless you need a specific architecture; they handle model detection automatically.


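A sketch of loading through the Auto classes; `gpt2` is just an example model id, and the Auto classes read the checkpoint's config to pick the right architecture:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Auto classes inspect the checkpoint config and instantiate the
# matching architecture; gpt2 is an example model id.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```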
Generation Parameters


| Parameter | Effect | Typical Values |
| --- | --- | --- |
| `max_new_tokens` | Output length | 50-500 |
| `temperature` | Randomness (0 = deterministic) | 0.1-1.0 |
| `top_p` | Nucleus sampling threshold | 0.9-0.95 |
| `top_k` | Limit vocabulary per step | 50 |
| `num_beams` | Beam search (disables sampling) | 4-8 |
| `repetition_penalty` | Discourage repetition | 1.1-1.3 |

Key concept: Higher temperature = more creative but less coherent. For factual tasks, use a low temperature (0.1-0.3).


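These parameters are passed straight through to `generate()`. A sketch, again using `gpt2` as a stand-in checkpoint; note that the sampling parameters only take effect with `do_sample=True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Write a haiku about autumn:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,       # cap the completion length
    do_sample=True,           # sampling must be on for temperature/top_p/top_k
    temperature=0.7,          # lower = closer to greedy decoding
    top_p=0.9,                # nucleus sampling threshold
    repetition_penalty=1.2,   # discourage repeated phrases
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```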
Memory Management


Device Placement Options


| Option | When to Use |
| --- | --- |
| `device_map="auto"` | Let library decide GPU allocation |
| `device_map="cuda:0"` | Specific GPU |
| `device_map="cpu"` | CPU only |

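A loading sketch with `device_map`; note that `"auto"` placement relies on the accelerate package being installed, and `gpt2` is just an example model id:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" shards the model across available GPUs (spilling to
# CPU if needed); requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",             # example model id
    device_map="auto",
)
```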
Quantization Options


| Method | Memory Reduction | Quality Impact |
| --- | --- | --- |
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ | ~75% | Requires calibration |
| AWQ | ~75% | Activation-aware |

Key concept: Use `torch_dtype="auto"` to automatically use the model's native precision (often bfloat16).


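A 4-bit loading sketch using `BitsAndBytesConfig`; this path needs a CUDA GPU plus the bitsandbytes package, and the 7B model id is only an example:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization via bitsandbytes; applied on the fly at load time.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # example 7B checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```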
Fine-Tuning Concepts


Trainer Arguments


| Argument | Purpose | Typical Value |
| --- | --- | --- |
| `num_train_epochs` | Training passes | 3-5 |
| `per_device_train_batch_size` | Samples per GPU | 8-32 |
| `learning_rate` | Step size | 2e-5 for fine-tuning |
| `weight_decay` | Regularization | 0.01 |
| `warmup_ratio` | LR warmup | 0.1 |
| `evaluation_strategy` | When to eval | `"epoch"` or `"steps"` |

Fine-Tuning Strategies


| Strategy | Memory | Quality | Use Case |
| --- | --- | --- | --- |
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very Low | Good | 7B+ models on consumer GPU |
| Prefix tuning | Low | Moderate | When you can't modify weights |


Tokenization Concepts


| Parameter | Purpose |
| --- | --- |
| `padding` | Make sequences same length |
| `truncation` | Cut sequences to `max_length` |
| `max_length` | Maximum tokens (model-specific) |
| `return_tensors` | Output format (`"pt"`, `"tf"`, `"np"`) |

Key concept: Always use the tokenizer that matches the model; different models use different vocabularies.


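A tokenization sketch combining these parameters; `bert-base-uncased` is an example checkpoint:

```python
from transformers import AutoTokenizer

# The tokenizer must come from the same checkpoint as the model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["A short sentence.", "A somewhat longer second sentence."],
    padding=True,         # pad to the longest sequence in the batch
    truncation=True,      # cut anything beyond max_length
    max_length=32,
    return_tensors="pt",  # PyTorch tensors
)
# input_ids and attention_mask now share one rectangular shape.
print(batch["input_ids"].shape)
```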
Best Practices


| Practice | Why |
| --- | --- |
| Use pipelines for inference | Handles preprocessing automatically |
| Use `device_map="auto"` | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use `Trainer` for fine-tuning | Built-in best practices |

Resources
