Found 1,564 Skills
Produce an LLM Build Pack (prompt+tool contract, data/eval plan, architecture+safety, launch checklist). Use for building with LLMs, GPT/Claude apps, prompt engineering, RAG, and tool-using agents.
Strategies for managing LLM context windows effectively in AI agents. Use when building agents that handle long conversations, multi-step tasks, tool orchestration, or need to maintain coherence across extended interactions.
Use when setting up, deploying, or operating vLLM Studio (env keys, controller/frontend startup, Docker services, branch workflow, and release checklists).
Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for production deployment.
Use when you want rubric-based LLM quality scoring on generated outputs; pair with addon-deterministic-eval-suite.
Fact-checks LLM responses by extracting verifiable claims, verifying each via web search, producing an audit report with verdicts, and optionally revising inaccurate responses. Use when the user asks to audit, fact-check, double-check, or verify a response.
Train your own GPT-2 level LLM for under $100 using nanochat, Karpathy's minimal hackable harness covering tokenization, pretraining, finetuning, evaluation, inference, and chat UI.
Apply when implementing fulfillment, invoice, or tracking logic for VTEX marketplace seller connectors. Covers the Order Invoice Notification API, invoice payload structure, tracking updates, partial invoicing for split shipments, and the authorize fulfillment flow. Use for building seller-side order fulfillment that integrates with VTEX marketplace order management, including the 2.5s simulation timeout.
Deploys ML and LLM models on TrueFoundry with GPU inference servers (vLLM, TGI, NVIDIA NIM). Uses YAML manifests with `tfy apply`. Use when serving language models, deploying Hugging Face models, or hosting GPU-accelerated inference endpoints.
Use when validating subjective quality criteria that cannot be deterministically tested — applies LLM-based evaluation with structured rubrics for tone, aesthetics, UX feel, documentation quality, and code readability. Triggers: documentation quality check, error message tone review, UX copy evaluation, code readability assessment, design aesthetic review.
Build and maintain a personal knowledge base using Karpathy's llm-wiki methodology across Claude Code, Codex, and OpenClaw agents.
System prompt toolkit that removes AI slop and makes any LLM respond like a normal person — concise, direct, no filler.