Run evaluations for Hugging Face Hub models using inspect-ai and lighteval on local hardware. Use for backend selection, local GPU evals, and choosing between vLLM / Transformers / accelerate. Not for HF Jobs orchestration, model-card PRs, .eval_results publication, or community-evals automation.
Install the skill:

```shell
npx skill4agent add huggingface/skills huggingface-community-evals
```

This skill runs `inspect-ai` and `lighteval` locally via `vllm` or `accelerate`. For `model-index` / `.eval_results` publication and job orchestration, use the `hugging-face-jobs` skill; community-evals automation lives in `~/code/community-evals`. All paths below are relative to the directory containing this `SKILL.md`.
| Use case | Script |
|---|---|
| Local eval with `inspect-ai` (Transformers) | `scripts/inspect_eval_uv.py` |
| Local GPU eval with `inspect-ai` + `vllm` | `scripts/inspect_vllm_uv.py` |
| Local GPU eval with `lighteval` | `scripts/lighteval_vllm_uv.py` |
| Extra command patterns | `examples/USAGE_EXAMPLES.md` |
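The two stacks take different task identifiers (bare names for `inspect-ai`, `suite|task|num_fewshot` specs for `lighteval`), so the script choice can be routed mechanically. The helper below is an illustrative sketch under that assumption, not something shipped with the skill:

```python
def pick_script(task_spec: str) -> str:
    """Route a task spec to an eval script (illustrative helper, not part of the skill).

    lighteval specs look like 'suite|task|num_fewshot'; inspect-ai tasks are bare names.
    """
    if "|" in task_spec:
        return "scripts/lighteval_vllm_uv.py"
    return "scripts/inspect_vllm_uv.py"

print(pick_script("leaderboard|mmlu|5"))  # scripts/lighteval_vllm_uv.py
print(pick_script("gsm8k"))               # scripts/inspect_vllm_uv.py
```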
## Prerequisites

Every script is self-contained and runs with `uv run` (dependencies are declared inline). Verify the environment first:

```shell
uv --version
printenv HF_TOKEN >/dev/null || echo "set HF_TOKEN for gated models"
nvidia-smi
```

If `nvidia-smi` reports no GPU, use `scripts/inspect_eval_uv.py` or hand the run off to the `hugging-face-jobs` skill. Both `inspect-ai` and `lighteval` support `vllm` for fast GPU inference, with `--backend hf` (inspect-ai) and `--backend accelerate` (lighteval) as fallbacks. Smoke-test first — `--limit 10` for `inspect-ai`, `--max-samples 10` for `lighteval` — and send anything heavier to `hugging-face-jobs`.

## Quick start: inspect-ai

```shell
uv run scripts/inspect_eval_uv.py \
  --model meta-llama/Llama-3.2-1B \
  --task mmlu \
  --limit 20
```

Tasks are resolved from the `inspect-evals` package.

## inspect-ai with vllm

```shell
uv run scripts/inspect_vllm_uv.py \
  --model meta-llama/Llama-3.2-1B \
  --task gsm8k \
  --limit 20
```

Transformers fallback when vLLM is unavailable:

```shell
uv run scripts/inspect_vllm_uv.py \
  --model microsoft/phi-2 \
  --task mmlu \
  --backend hf \
  --trust-remote-code \
  --limit 20
```

## lighteval

```shell
uv run scripts/lighteval_vllm_uv.py \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --tasks "leaderboard|mmlu|5,leaderboard|gsm8k|5" \
  --max-samples 20 \
  --use-chat-template
```

`accelerate` fallback:

```shell
uv run scripts/lighteval_vllm_uv.py \
  --model microsoft/phi-2 \
  --tasks "leaderboard|mmlu|5" \
  --backend accelerate \
  --trust-remote-code \
  --max-samples 20
```

Hand large models off to the `hugging-face-jobs` skill instead of running them locally.

## Task names

- `inspect-ai` takes bare task names: `mmlu`, `gsm8k`, `hellaswag`, `arc_challenge`, `truthfulqa`, `winogrande`, `humaneval`.
- `lighteval` takes `suite|task|num_fewshot` specs, e.g. `leaderboard|mmlu|5`, `leaderboard|gsm8k|5`, `leaderboard|arc_challenge|25`, `lighteval|hellaswag|0`. Pass several to `--tasks` as a comma-separated list.

## Backends

| Invocation | Backend |
|---|---|
| `inspect_vllm_uv.py --backend vllm` | vLLM |
| `inspect_vllm_uv.py --backend hf` | Transformers |
| `lighteval_vllm_uv.py --backend vllm` | vLLM |
| `lighteval_vllm_uv.py --backend accelerate` | accelerate |
| `inspect_eval_uv.py` | Transformers |

## Hardware guidance

| Model size | Suggested local hardware |
|---|---|
| Small | consumer GPU / Apple Silicon / small dev GPU |
| Medium | stronger local GPU |
| Large | high-memory local GPU, or hand off to `hugging-face-jobs` |
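The sizing guidance above can be expressed as a simple rule of thumb. The parameter-count thresholds in this sketch are illustrative assumptions, not values defined by the skill:

```python
def suggest_hardware(num_params_b: float) -> str:
    """Map model size (billions of parameters) to a suggested hardware tier.

    Thresholds are illustrative assumptions, not part of the skill.
    """
    if num_params_b <= 3:
        return "consumer GPU / Apple Silicon / small dev GPU"
    if num_params_b <= 8:
        return "stronger local GPU"
    return "high-memory local GPU, or hand off to hugging-face-jobs"

print(suggest_hardware(1.0))   # a Llama-3.2-1B-class model fits a consumer GPU
print(suggest_hardware(70.0))  # large models belong on big GPUs or hugging-face-jobs
```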
## Tips

- Start with small `--limit` / `--max-samples` values and scale up once a run succeeds.
- Tune `--batch-size` and `--gpu-memory-utilization` to fit your GPU memory.
- Hand long or large runs off to the `hugging-face-jobs` skill.
- If `vllm` fails, fall back to `--backend hf` (`inspect-ai`) or `--backend accelerate` (`lighteval`).
- Set `HF_TOKEN` for gated models and pass `--trust-remote-code` for models that ship custom code.

## Files

- `examples/USAGE_EXAMPLES.md`
- `scripts/inspect_eval_uv.py`
- `scripts/inspect_vllm_uv.py`
- `scripts/lighteval_vllm_uv.py`
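Malformed `--tasks` strings are a common way for a lighteval run to die after model load, so it can be worth validating the `suite|task|num_fewshot` specs up front. This is a hypothetical pre-flight helper assuming the three-field form shown in this document, not part of the shipped scripts:

```python
def parse_lighteval_tasks(tasks: str) -> list[tuple[str, str, int]]:
    """Split a comma-separated lighteval --tasks string into (suite, task, num_fewshot).

    Raises ValueError on malformed specs. Hypothetical helper, not shipped with the skill.
    """
    parsed = []
    for spec in tasks.split(","):
        parts = spec.strip().split("|")
        if len(parts) != 3 or not parts[2].isdigit():
            raise ValueError(f"bad task spec: {spec!r} (expected 'suite|task|num_fewshot')")
        parsed.append((parts[0], parts[1], int(parts[2])))
    return parsed

print(parse_lighteval_tasks("leaderboard|mmlu|5,leaderboard|gsm8k|5"))
# → [('leaderboard', 'mmlu', 5), ('leaderboard', 'gsm8k', 5)]
```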