transformers


Use when "HuggingFace Transformers", "pre-trained models", "pipeline API", or asking about "text generation", "text classification", "question answering", "NER", "fine-tuning transformers", "AutoModel", "Trainer API"


NPX Install

npx skill4agent add eyadsibai/ltk transformers

HuggingFace Transformers

Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.

When to Use

  • Quick inference with pipelines
  • Text generation, classification, QA, NER
  • Image classification, object detection
  • Fine-tuning on custom datasets
  • Loading pre-trained models from HuggingFace Hub
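
The pipeline API covers all of these with one call pattern. A minimal sketch (the first run downloads the task's default checkpoint, here a distilbert sentiment model of a few hundred MB):

```python
from transformers import pipeline

# "text-classification" resolves to a default sentiment checkpoint
# (distilbert-base-uncased-finetuned-sst-2-english) on first use.
classifier = pipeline("text-classification")
result = classifier("Transformers makes state-of-the-art NLP accessible.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

The same `pipeline(task_name)` call works for every task listed below; only the task string and the input type change.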

Pipeline Tasks

NLP Tasks

| Task | Pipeline Name | Output |
| --- | --- | --- |
| Text Generation | `text-generation` | Completed text |
| Classification | `text-classification` | Label + confidence |
| Question Answering | `question-answering` | Answer span |
| Summarization | `summarization` | Shorter text |
| Translation | `translation_en_to_fr` | Translated text |
| NER | `ner` | Entity spans + types |
| Fill Mask | `fill-mask` | Predicted tokens |
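
Each name in the table maps to the same one-liner; for example, text generation. The sketch below uses a deliberately tiny test checkpoint to keep the download small, so its output is gibberish; swap in a real model such as `gpt2` for usable text:

```python
from transformers import pipeline

# sshleifer/tiny-gpt2 is a few-MB test checkpoint; the calling
# pattern is identical for full-size models like "gpt2".
generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")
out = generator("Hello, world", max_new_tokens=10, do_sample=False)
print(out[0]["generated_text"])  # prompt + (here, random) continuation
```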

Vision Tasks

| Task | Pipeline Name | Output |
| --- | --- | --- |
| Image Classification | `image-classification` | Label + confidence |
| Object Detection | `object-detection` | Bounding boxes |
| Image Segmentation | `image-segmentation` | Pixel masks |

Audio Tasks

| Task | Pipeline Name | Output |
| --- | --- | --- |
| Speech Recognition | `automatic-speech-recognition` | Transcribed text |
| Audio Classification | `audio-classification` | Label + confidence |

Model Loading Patterns

Auto Classes

| Class | Use Case |
| --- | --- |
| `AutoModel` | Base model (embeddings) |
| `AutoModelForCausalLM` | Text generation (GPT-style) |
| `AutoModelForSeq2SeqLM` | Encoder-decoder (T5, BART) |
| `AutoModelForSequenceClassification` | Classification head |
| `AutoModelForTokenClassification` | NER, POS tagging |
| `AutoModelForQuestionAnswering` | Extractive QA |
Key concept: Always use Auto classes unless you need a specific architecture; they infer the correct model class from the checkpoint's config automatically.
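
A sketch of the Auto-class pattern, again with a tiny checkpoint as a stand-in:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The Auto classes read the checkpoint's config.json and instantiate the
# matching architecture (here a GPT-2 variant) without naming it explicitly.
name = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
print(type(model).__name__)  # GPT2LMHeadModel
```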

Generation Parameters

| Parameter | Effect | Typical Values |
| --- | --- | --- |
| `max_new_tokens` | Output length | 50-500 |
| `temperature` | Randomness (lower = more deterministic) | 0.1-1.0 |
| `top_p` | Nucleus sampling threshold | 0.9-0.95 |
| `top_k` | Limit vocabulary per step | 50 |
| `num_beams` | Beam search (set `do_sample=False`) | 4-8 |
| `repetition_penalty` | Discourage repetition | 1.1-1.3 |
Key concept: Higher temperature = more creative but less coherent. For factual tasks, use low temperature (0.1-0.3).
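
These parameters are passed straight to `generate()` (or to a text-generation pipeline call). A sketch with the same tiny stand-in checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "sshleifer/tiny-gpt2"  # tiny stand-in; use a real model in practice
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("The weather today is", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,       # bound the output length
    do_sample=True,          # enable sampling (off = greedy/beam search)
    temperature=0.7,         # < 1.0 sharpens the distribution
    top_p=0.9,               # nucleus sampling threshold
    repetition_penalty=1.2,  # discourage loops
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```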

Memory Management

Device Placement Options

| Option | When to Use |
| --- | --- |
| `device_map="auto"` | Let the library decide GPU allocation |
| `device_map="cuda:0"` | Pin to a specific GPU |
| `device_map="cpu"` | CPU only |

Quantization Options

| Method | Memory Reduction | Quality Impact |
| --- | --- | --- |
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ | ~75% | Requires calibration |
| AWQ | ~75% | Activation-aware |

Key concept: Use `torch_dtype="auto"` to automatically use the model's native precision (often bfloat16).

Fine-Tuning Concepts

Trainer Arguments

| Argument | Purpose | Typical Value |
| --- | --- | --- |
| `num_train_epochs` | Training passes | 3-5 |
| `per_device_train_batch_size` | Samples per GPU | 8-32 |
| `learning_rate` | Step size | 2e-5 for fine-tuning |
| `weight_decay` | Regularization | 0.01 |
| `warmup_ratio` | LR warmup | 0.1 |
| `evaluation_strategy` | When to eval | `"epoch"` or `"steps"` |

Fine-Tuning Strategies

| Strategy | Memory | Quality | Use Case |
| --- | --- | --- | --- |
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very low | Good | 7B+ models on consumer GPUs |
| Prefix tuning | Low | Moderate | When you can't modify weights |

Tokenization Concepts

| Parameter | Purpose |
| --- | --- |
| `padding` | Make sequences the same length |
| `truncation` | Cut sequences to `max_length` |
| `max_length` | Maximum tokens (model-specific) |
| `return_tensors` | Output format (`"pt"`, `"tf"`, `"np"`) |
Key concept: Always use the tokenizer that matches the model; different models use different vocabularies.
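
A sketch of batch tokenization with these parameters (the tokenizer download for `bert-base-uncased` is only the vocabulary files):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["A short sentence.",
     "A somewhat longer sentence that needs more tokens."],
    padding=True,        # pad the shorter sequence up to the longer one
    truncation=True,     # cut anything past max_length
    max_length=32,
    return_tensors="pt", # PyTorch tensors
)
print(batch["input_ids"].shape)  # (2, length of the longer sequence)
```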

Best Practices

| Practice | Why |
| --- | --- |
| Use pipelines for inference | Handles preprocessing automatically |
| Use `device_map="auto"` | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use `Trainer` for fine-tuning | Built-in best practices |

Resources