Loading...
Loading...
Found 1,066 Skills
Build, validate, and deploy LLM-as-Judge evaluators for automated quality assessment of LLM pipeline outputs. Use this skill whenever the user wants to: create an automated evaluator for subjective or nuanced failure modes, write a judge prompt for Pass/Fail assessment, split labeled data for judge development, measure judge alignment (TPR/TNR), estimate true success rates with bias correction, or set up CI evaluation pipelines. Also trigger when the user mentions "judge prompt", "automated eval", "LLM evaluator", "grading prompt", "alignment metrics", "true positive rate", or wants to move from manual trace review to automated evaluation. This skill covers the full lifecycle: prompt design → data splitting → iterative refinement → success rate estimation.
Use this skill when crafting, reviewing, or improving prompts for LLM pipelines — including task prompts, system prompts, and LLM-as-Judge prompts. Triggers include: requests to write or refine a prompt, diagnose why an LLM produces inconsistent or incorrect outputs, bridge the gap between intent and model behavior, reduce ambiguity in instructions, add few-shot examples, structure complex prompts, or improve output formatting. Also use when the user needs help distinguishing specification failures (unclear instructions) from generalization failures (model limitations), or when iterating on prompts based on observed failure modes. Do NOT use for general coding tasks, document creation, or non-LLM writing.
Initialize and configure LangGraph projects with proper structure, langgraph.json configuration, environment variables, and dependency management. Use when users want to (1) create a new LangGraph project, (2) set up langgraph.json for deployment, (3) configure environment variables for LLM providers, (4) initialize project structure for agents, (5) set up local development with LangGraph Studio, (6) configure dependencies (pyproject.toml, requirements.txt, package.json), or (7) troubleshoot project configuration issues.
Convert documents (PDF, Word, Excel, PowerPoint, images, HTML) to Markdown using microsoft/markitdown. Use for document analysis, content extraction, preprocessing for LLMs, or batch document conversion. Supports images with OCR/LLM descriptions, audio transcription, and ZIP archives.
Build AI agents with persistent threads, tool calling, and streaming on Convex. Use when implementing chat interfaces, AI assistants, multi-agent workflows, RAG systems, or any LLM-powered features with message history.
AI and machine learning development with PyTorch, TensorFlow, and LLM integration. Use when building ML models, training pipelines, fine-tuning LLMs, or implementing AI features.
Set up Langfuse local development workflow with hot reload and debugging. Use when developing LLM applications locally, debugging traces, or setting up a fast iteration loop with Langfuse. Trigger with phrases like "langfuse local dev", "langfuse development", "debug langfuse traces", "langfuse hot reload", "langfuse dev workflow".
Comprehensive LLM audit. Model currency, prompt quality, evals, observability, CI/CD. Ensures all LLM-powered features follow best practices and are properly instrumented. Auto-invoke when: model names/versions mentioned, AI provider config, prompt changes, .env with AI keys, aiProviders.ts or prompts.ts modified, AI-related PRs. CRITICAL: Training data lags months. ALWAYS web search before LLM decisions.
Full changelog infrastructure from scratch. Greenfield workflow. Installs semantic-release, commitlint, GitHub Actions, LLM synthesis, public page.
Query Langfuse traces for debugging LLM calls, analyzing token usage, and investigating workflow executions. Use when debugging AI/LLM behavior, checking trace data, or analyzing observability metrics.
Write reliable prompts for Agentica/REPL agents that avoid LLM instruction ambiguity
Expert prompt engineering for creating effective prompts for Claude, GPT, and other LLMs. Use when writing system prompts, user prompts, few-shot examples, or optimizing existing prompts for better performance.