Search Results: llm

Found 1,564 Skills

AI & Machine Learning | d-o-hub/rust-self-learnin...

web-doc-resolver

Resolve queries or URLs into compact, LLM-ready markdown using a low-cost cascade. Prioritizes llms.txt for structured docs, then uses web fetch/search tools for extraction. Use when you need to fetch documentation, resolve web URLs to markdown, search for technical content, or build context from web sources.

4 scripts | Checked

AI & Machine Learning | huggingface/skills

huggingface-community-evals

Run evaluations for Hugging Face Hub models using inspect-ai and lighteval on local hardware. Use for backend selection, local GPU evals, and choosing between vLLM / Transformers / accelerate. Not for HF Jobs orchestration, model-card PRs, .eval_results publication, or community-evals automation.

3 scripts | Checked

Automation | everyinc/compound-enginee...

ce-optimize

Run metric-driven iterative optimization loops. Define a measurable goal, build measurement scaffolding, then run parallel experiments that try many approaches, measure each against hard gates and/or LLM-as-judge quality scores, keep improvements, and converge toward the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation. Inspired by Karpathy's autoresearch, generalized for multi-file code changes and non-ML domains.

3 scripts | Attention

AI & Machine Learning | huggingface/skills

huggingface-best

Use when the user asks about finding the best, top, or recommended model for a task, wants to know what AI model to use, or wants to compare models by benchmark scores. Triggers on: "best model for X", "what model should I use for", "top models for [task]", "which model runs on my laptop/machine/device", "recommend a model for", "what LLM should I use for", "compare models for", "what's state of the art for", or any question about choosing an AI model for a specific use case. Always use this skill when the user wants model recommendations or comparisons, even if they don't explicitly mention HuggingFace or benchmarks.

Security & Compliance | arize-ai/arize-skills

arize-compliance-audit

INVOKE THIS SKILL when auditing an AI agent or LLM app for regulatory compliance. Covers EU AI Act, GPAI Code of Practice, GDPR, NIST AI RMF, Colorado AI Act, HIPAA, and ISO 42001. Scans the codebase for compliance gaps, cross-references Arize instrumentation for audit trail coverage, and produces an actionable remediation checklist tailored to the selected frameworks.

AI & Machine Learning | narevai/skills

narev

Start Here. Use when the user asks about Narev Cloud, the Pricing API, model pricing (API reference skill vs applied workflows on top of that API), live LLM pricing, token costs, cost calculation, pinning or snapshotting model rates, Narev SDK, @ai-billing/core, provider middleware packages, Vercel AI SDK billing, Next.js App Router route handlers, framework-specific billing patterns, usage-based billing, billing integrations (Polar, Stripe, Lago, OpenMeter), FOCUS format, Narev Self-Hosted (ThinOps), deployment, COGS, customer tagging, FinOps for AI, or this documentation site. Guides you to the right skill or documentation path based on their task.

Testing & QA | davila7/claude-code-templ...

agent-evaluation

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring. Even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

AI & Machine Learning | davila7/claude-code-templ...

llama-factory

Expert guidance for fine-tuning LLMs with LLaMA-Factory: no-code WebUI, 100+ models, 2/3/4/5/6/8-bit QLoRA, and multimodal support.

AI & Machine Learning | refoundai/lenny-skills

ai-evals

Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure AI output quality.

AI & Machine Learning | davila7/claude-code-templ...

nowait-reasoning-optimizer

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

1 script | Checked

Tools & Utilities | intellectronica/agent-ski...

tavily

Use this skill for web search, extraction, mapping, crawling, and research via Tavily’s REST API when web searches are needed and no built-in tool is available, or when Tavily’s LLM-friendly format is beneficial.

AI & Machine Learning | triggerdotdev/skills

trigger-agents

AI agent patterns with Trigger.dev - orchestration, parallelization, routing, evaluator-optimizer, and human-in-the-loop. Use when building LLM-powered tasks that need parallel workers, approval gates, tool calling, or multi-step agent workflows.
