HuggingFace Transformers
Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.
When to Use
- Quick inference with pipelines
- Text generation, classification, QA, NER
- Image classification, object detection
- Fine-tuning on custom datasets
- Loading pre-trained models from HuggingFace Hub
Pipeline Tasks
NLP Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Text Generation | text-generation | Completed text |
| Classification | text-classification | Label + confidence |
| Question Answering | question-answering | Answer span |
| Summarization | summarization | Shorter text |
| Translation | translation | Translated text |
| NER | ner | Entity spans + types |
| Fill Mask | fill-mask | Predicted tokens |
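A minimal sketch of two of the NLP tasks above. Omitting the model argument falls back to each task's default checkpoint; the example strings are placeholders.

```python
from transformers import pipeline

def classify_sentiment(texts):
    """Text classification; downloads the task's default checkpoint on first call."""
    clf = pipeline("text-classification")
    return clf(texts)

def answer_question(question, context):
    """Extractive QA: the answer is a span copied out of the context."""
    qa = pipeline("question-answering")
    return qa(question=question, context=context)

# Example calls (network needed on first run):
# classify_sentiment(["Transformers makes NLP easy."])
#   → [{'label': ..., 'score': ...}]
# answer_question("Who maintains Transformers?",
#                 "Transformers is maintained by Hugging Face.")
#   → {'answer': ..., 'score': ..., 'start': ..., 'end': ...}
```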
Vision Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Image Classification | image-classification | Label + confidence |
| Object Detection | object-detection | Bounding boxes |
| Image Segmentation | image-segmentation | Pixel masks |
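A sketch of the object-detection pipeline; the image path is a placeholder, and the input may also be a URL or a PIL image.

```python
from transformers import pipeline

def detect_objects(image):
    """Object detection with the task's default checkpoint.
    Returns labels, scores, and pixel-coordinate bounding boxes."""
    detector = pipeline("object-detection")
    return detector(image)

# detect_objects("street.jpg")
#   → [{'label': ..., 'score': ...,
#       'box': {'xmin': ..., 'ymin': ..., 'xmax': ..., 'ymax': ...}}, ...]
```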
Audio Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Speech Recognition | automatic-speech-recognition | Transcribed text |
| Audio Classification | audio-classification | Label + confidence |
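A sketch of speech recognition; the audio path is a placeholder, and the pipeline also accepts URLs and raw waveform arrays.

```python
from transformers import pipeline

def transcribe(audio):
    """Speech-to-text with the task's default checkpoint."""
    asr = pipeline("automatic-speech-recognition")
    return asr(audio)["text"]

# transcribe("meeting.wav")
```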
Model Loading Patterns
Auto Classes
| Class | Use Case |
|---|---|
| AutoModel | Base model (embeddings) |
| AutoModelForCausalLM | Text generation (GPT-style) |
| AutoModelForSeq2SeqLM | Encoder-decoder (T5, BART) |
| AutoModelForSequenceClassification | Classification head |
| AutoModelForTokenClassification | NER, POS tagging |
| AutoModelForQuestionAnswering | Extractive QA |
Key concept: Use Auto classes unless you need a specific architecture; they infer the correct model class from the checkpoint's config automatically.
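A minimal loading sketch; gpt2 stands in for any causal-LM checkpoint on the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_causal_lm(model_name="gpt2"):
    """The Auto classes read the checkpoint's config.json and instantiate
    the matching architecture (GPT2LMHeadModel for gpt2)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return tokenizer, model

# tokenizer, model = load_causal_lm()  # downloads weights on first run
```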
Generation Parameters
| Parameter | Effect | Typical Values |
|---|---|---|
| max_new_tokens | Output length | 50-500 |
| temperature | Randomness (0=deterministic) | 0.1-1.0 |
| top_p | Nucleus sampling threshold | 0.9-0.95 |
| top_k | Limit vocabulary per step | 50 |
| num_beams | Beam search (disable sampling) | 4-8 |
| repetition_penalty | Discourage repetition | 1.1-1.3 |
Key concept: Higher temperature = more creative but less coherent. For factual tasks, use low temperature (0.1-0.3).
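A sketch wiring the parameters above into model.generate. Note that temperature, top_p, and top_k only take effect with do_sample=True (the table omits this); the chosen values are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt, model_name="gpt2"):
    """Sampled generation using the parameters from the table above."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,     # output length
        do_sample=True,         # enables temperature/top_p/top_k
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.2,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# generate("The key idea behind attention is")
```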
Memory Management
Device Placement Options
| Option | When to Use |
|---|---|
| device_map="auto" | Let library decide GPU allocation |
| device_map="cuda:0" | Specific GPU |
| device_map="cpu" | CPU only |
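A sketch of automatic placement; device_map="auto" requires the accelerate package and shards weights across available GPUs, spilling to CPU RAM if they don't fit.

```python
from transformers import AutoModelForCausalLM

def load_on_best_device(model_name):
    """Let the library decide where each layer lives."""
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",   # automatic GPU/CPU allocation
        torch_dtype="auto",  # keep the checkpoint's native precision
    )
```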
Quantization Options
| Method | Memory Reduction | Quality Impact |
|---|---|---|
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ | ~75% | Requires calibration |
| AWQ | ~75% | Activation-aware |
Key concept: Use torch_dtype="auto" to load the model in its native precision (often bfloat16).
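A 4-bit loading sketch using bitsandbytes (a separate package that must be installed); the NF4 settings shown are common defaults, not requirements.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_4bit(model_name):
    """4-bit NF4 quantization: roughly 75% memory reduction vs fp16."""
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype="bfloat16",  # dtype used for matmuls
    )
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )
```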
Fine-Tuning Concepts
Trainer Arguments
| Argument | Purpose | Typical Value |
|---|---|---|
| num_train_epochs | Training passes | 3-5 |
| per_device_train_batch_size | Samples per GPU | 8-32 |
| learning_rate | Step size | 2e-5 for fine-tuning |
| weight_decay | Regularization | 0.01 |
| warmup_ratio | LR warmup | 0.1 |
| evaluation_strategy | When to eval | "epoch" or "steps" |
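A sketch mapping the table onto TrainingArguments; output_dir is a placeholder, and note that evaluation_strategy was renamed eval_strategy in newer transformers releases.

```python
from transformers import Trainer, TrainingArguments

def make_training_args(output_dir="./results"):
    """Typical fine-tuning values from the table above."""
    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
        weight_decay=0.01,
        warmup_ratio=0.1,
        evaluation_strategy="epoch",  # eval_strategy in newer releases
    )

# trainer = Trainer(model=model, args=make_training_args(),
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```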
Fine-Tuning Strategies
微调策略
| Strategy | Memory | Quality | Use Case |
|---|---|---|---|
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very Low | Good | 7B+ models on consumer GPU |
| Prefix tuning | Low | Moderate | When you can't modify weights |
Tokenization Concepts
| Parameter | Purpose |
|---|---|
| padding | Make sequences same length |
| truncation | Cut sequences to max_length |
| max_length | Maximum tokens (model-specific) |
| return_tensors | Output format ("pt", "tf", "np") |
Key concept: Always use the tokenizer that matches the model—different models use different vocabularies.
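A sketch combining the tokenizer parameters above; bert-base-uncased is a placeholder checkpoint, and 512 is that model's context limit.

```python
from transformers import AutoTokenizer

def encode_batch(texts, model_name="bert-base-uncased"):
    """Pad to the longest sequence in the batch, truncate to the model's
    limit, and return PyTorch tensors ready for the model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return tokenizer(
        texts,
        padding=True,       # pad shorter sequences in the batch
        truncation=True,    # cut at max_length
        max_length=512,     # model-specific limit
        return_tensors="pt",
    )
```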
Best Practices
| Practice | Why |
|---|---|
| Use pipelines for inference | Handles preprocessing automatically |
| Use device_map="auto" | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use Trainer for fine-tuning | Built-in best practices |