Search Results: qlora

Found 8 Skills

AI & Machine Learning | davila7/claude-code-templ...

gptq

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use it to deploy large models (70B, 405B) on consumer GPUs, when you need a 4× memory reduction with <2% perplexity degradation, or when you want faster inference (3-4× speedup over FP16). Integrates with transformers and PEFT for QLoRA fine-tuning.

100

AI & Machine Learning | davila7/claude-code-templ...

llama-factory

Expert guidance for fine-tuning LLMs with LLaMA-Factory: no-code WebUI, 100+ models, 2/3/4/5/6/8-bit QLoRA, and multimodal support.

AI & Machine Learning | davila7/claude-code-templ...

peft-fine-tuning

Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ other methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library, integrated with the transformers ecosystem.

AI & Machine Learning | jg-chalk-io/nora-livekit

moai-ml-llm-fine-tuning

Enterprise LLM Fine-Tuning with LoRA, QLoRA, and PEFT techniques

AI & Machine Learning | orchestra-research/ai-res...

quantizing-models-bitsandbytes

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4, and FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.

AI & Machine Learning | pluginagentmarketplace/cu...

fine-tuning

LLM fine-tuning with LoRA, QLoRA, and instruction tuning for domain adaptation.

AI & Machine Learning | rightnow-ai/openfang

llm-finetuning

LLM fine-tuning expert for LoRA, QLoRA, dataset preparation, and training optimization.

AI & Machine Learning | itsmostafa/llm-engineerin...

lora

Parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA). Use when fine-tuning large language models with limited GPU memory, creating task-specific adapters, or when you need to train multiple specialized models from a single base.
