Found 2,747 Skills
Design LLM-as-Judge evaluators for subjective criteria that code-based checks cannot handle. Use when a failure mode requires interpretation (tone, faithfulness, relevance, completeness). Do NOT use when the failure mode can be checked with code (regex, schema validation, execution tests). Do NOT use when you need to validate or calibrate the judge — use validate-evaluator instead.
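A minimal sketch of such a judge, assuming the OpenAI Python SDK; the faithfulness criterion, prompt wording, and model name are placeholders rather than this skill's actual template:

```python
# Hypothetical LLM-as-Judge for one subjective criterion (faithfulness).
# Assumes the OpenAI Python SDK; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an answer for faithfulness to the provided context.
Context:
{context}

Answer:
{answer}

Reply with exactly one word: PASS if every claim in the answer is supported
by the context, FAIL otherwise."""

def judge_faithfulness(context: str, answer: str) -> bool:
    """Return True if the judge labels the answer faithful."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        temperature=0,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(context=context, answer=answer)}],
    )
    return response.choices[0].message.content.strip().upper().startswith("PASS")
```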
Apply UX content writing principles to review existing UI copy or write new UI copy from scratch. Use this skill whenever someone asks you to: review, audit, critique, or improve UI text, error messages, button labels, tooltips, empty states, onboarding copy, form helper text, or any software interface copy. Also trigger when someone asks you to write new UI copy, label a button, draft an error message, write a modal, or create any in-product text. If the request involves words that appear inside software — use this skill.
Build a custom browser-based annotation interface tailored to your data for reviewing LLM traces and collecting structured feedback. Use when you need to build an annotation tool, review traces, or collect human labels.
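As a rough illustration, a single-file annotation tool can be quite small. The sketch below assumes Flask and a traces.jsonl file with illustrative field names; a real tool would add HTML escaping and storage suited to your data:

```python
# Minimal sketch of a browser-based trace annotation tool.
# Assumes Flask and a traces.jsonl file; field names are illustrative.
import json
from flask import Flask, request, redirect

app = Flask(__name__)
traces = [json.loads(line) for line in open("traces.jsonl")]
labels = {}  # trace index -> {"verdict": ..., "note": ...}

FORM = """<h3>Trace {i} of {n}</h3>
<pre>{trace}</pre>
<form method="post">
  <label><input type="radio" name="verdict" value="pass"> Pass</label>
  <label><input type="radio" name="verdict" value="fail"> Fail</label><br>
  <textarea name="note" placeholder="Failure mode / notes"></textarea><br>
  <button type="submit">Save &amp; next</button>
</form>"""

@app.route("/<int:i>", methods=["GET", "POST"])
def annotate(i):
    if request.method == "POST":
        labels[i] = {"verdict": request.form.get("verdict"),
                     "note": request.form.get("note")}
        with open("labels.json", "w") as f:
            json.dump(labels, f, indent=2)
        return redirect(f"/{i + 1}")
    return FORM.format(i=i + 1, n=len(traces),
                       trace=json.dumps(traces[i], indent=2))

# Run with: flask --app annotate run   (then open http://localhost:5000/0)
```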
Guide evaluation of RAG pipeline retrieval and generation quality. Use when evaluating a retrieval-augmented generation system, measuring retrieval quality, assessing generation faithfulness or relevance, generating synthetic QA pairs for retrieval testing, or optimizing chunking strategies.
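For the retrieval side, a sketch of two common metrics, recall@k and MRR, computed over QA pairs; the retrieve callable and field names are assumptions standing in for your pipeline:

```python
# Sketch of retrieval-quality metrics for a RAG pipeline: recall@k and MRR.
# Each example maps a question to the id of the chunk that answers it (illustrative structure).
def recall_at_k(examples, retrieve, k=5):
    """Fraction of questions whose gold chunk appears in the top-k retrieved ids."""
    hits = 0
    for ex in examples:
        retrieved = retrieve(ex["question"], k=k)  # your retriever, returns chunk ids
        hits += ex["gold_chunk_id"] in retrieved
    return hits / len(examples)

def mean_reciprocal_rank(examples, retrieve, k=20):
    """Average of 1/rank of the gold chunk (0 when it is not retrieved at all)."""
    total = 0.0
    for ex in examples:
        retrieved = retrieve(ex["question"], k=k)
        if ex["gold_chunk_id"] in retrieved:
            total += 1.0 / (retrieved.index(ex["gold_chunk_id"]) + 1)
    return total / len(examples)
```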
Calibrate an LLM judge against human labels using data splits, TPR/TNR, and bias correction. Use after writing a judge prompt (write-judge-prompt) when you need to verify alignment before trusting its outputs. Do NOT use for code-based evaluators (those are deterministic; test with standard unit tests).
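A sketch of the core arithmetic, assuming boolean pass/fail labels: measure TPR/TNR on the human-labeled split, then apply the standard prevalence correction to the judge's observed pass rate:

```python
# Sketch: measure judge TPR/TNR on a human-labeled holdout, then correct
# the observed pass rate on unlabeled traces for judge error.
def tpr_tnr(human_labels, judge_labels):
    """human_labels / judge_labels: parallel lists of booleans (True = pass)."""
    tp = sum(h and j for h, j in zip(human_labels, judge_labels))
    tn = sum((not h) and (not j) for h, j in zip(human_labels, judge_labels))
    pos = sum(human_labels)
    neg = len(human_labels) - pos
    return tp / pos, tn / neg

def corrected_pass_rate(observed_rate, tpr, tnr):
    """Estimate the true pass rate from the judge's observed pass rate."""
    return (observed_rate + tnr - 1) / (tpr + tnr - 1)

# Example: judge reports 70% pass with TPR=0.9, TNR=0.8:
# corrected_pass_rate(0.70, 0.90, 0.80) == 0.50 / 0.70 ≈ 0.714
```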
Create diverse synthetic test inputs for LLM pipeline evaluation using dimension-based tuple generation. Use when bootstrapping an eval dataset, when real user data is sparse, or when stress-testing specific failure hypotheses. Do NOT use when you already have 100+ representative real traces (use stratified sampling instead), or when the task is collecting production logs.
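A sketch of the tuple-generation step, with made-up dimensions; each sampled tuple would then be expanded into a natural-language test input, typically by prompting an LLM with the tuple as structured context:

```python
# Sketch of dimension-based tuple generation for synthetic eval inputs.
# Dimensions and values are illustrative; sample tuples rather than enumerating the full grid.
import itertools
import random

DIMENSIONS = {
    "persona": ["new user", "power user", "frustrated customer"],
    "query_type": ["billing question", "bug report", "feature request"],
    "complexity": ["single issue", "multiple intertwined issues"],
}

def sample_tuples(n=20, seed=0):
    random.seed(seed)
    grid = list(itertools.product(*DIMENSIONS.values()))
    return random.sample(grid, min(n, len(grid)))

for combo in sample_tuples(5):
    print(dict(zip(DIMENSIONS.keys(), combo)))
```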
Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).
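One way to make such an audit concrete is a checklist evaluated against whatever metadata the eval setup exposes; the check names and repo structure below are illustrative, not the skill's actual rubric:

```python
# Illustrative audit checklist; each check inspects the eval setup and returns True/False.
AUDIT_CHECKS = {
    "error_analysis_exists": lambda repo: bool(repo.get("failure_taxonomy")),
    "judges_validated": lambda repo: all(j.get("tpr") and j.get("tnr")
                                         for j in repo.get("judges", [])),
    "metrics_tied_to_failures": lambda repo: bool(repo.get("metric_to_failure_map")),
    "human_labels_sampled": lambda repo: repo.get("human_labeled_count", 0) > 0,
}

def audit(repo: dict) -> list[str]:
    """Return the names of checks that fail for this eval setup."""
    return [name for name, check in AUDIT_CHECKS.items() if not check(repo)]
```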
Help the user systematically identify and categorize failure modes in an LLM pipeline by reading traces. Use when starting a new eval project, after significant pipeline changes (new features, model switches, prompt rewrites), when production metrics drop, or after incidents.
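Once traces have been read and annotated, tallying the open codes shows where to focus. A sketch, assuming each reviewed trace carries a free-text failure_mode field (an illustrative name):

```python
# Sketch: tally annotated failure modes across reviewed traces to find the dominant ones.
import json
from collections import Counter

def failure_mode_counts(path="reviewed_traces.jsonl"):
    counts = Counter()
    for line in open(path):
        trace = json.loads(line)
        mode = trace.get("failure_mode")
        if mode:
            counts[mode.strip().lower()] += 1
    return counts.most_common()
```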
Personal memory layer — save bookmarks and conversation summaries using the Supacortex CLI. Use when the user says "save to cortex", "save to supacortex", "save this session", or asks to recall past conversations.
Query and analyze Datadog logs, metrics, APM traces, and monitors using the Datadog API. Use when debugging production issues, monitoring application performance, or investigating alerts.
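As an illustration, a hedged sketch of a log search against Datadog's v2 Logs Search endpoint over plain HTTP; the query, time range, and environment variable names are placeholders:

```python
# Sketch: search recent error logs via the Datadog Logs Search API (v2).
# Query, site, and time range are placeholders; keys come from the environment.
import os
import requests

def search_error_logs(query="status:error", limit=25):
    resp = requests.post(
        "https://api.datadoghq.com/api/v2/logs/events/search",
        headers={
            "DD-API-KEY": os.environ["DD_API_KEY"],
            "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        },
        json={
            "filter": {"query": query, "from": "now-15m", "to": "now"},
            "page": {"limit": limit},
            "sort": "-timestamp",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]
```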
Execute the PostHog production deployment checklist and rollback procedures. Use when deploying PostHog integrations to production, preparing for launch, or implementing go-live procedures. Trigger with phrases like "posthog production", "deploy posthog", "posthog go-live", "posthog launch checklist".
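One small piece of such a checklist, sketched with the posthog Python library: fire an identifiable test event against the production key and confirm it lands before cutting traffic over. The host, key source, and event name are placeholders, and the capture call uses the classic positional form:

```python
# Sketch: pre-launch smoke test that the production PostHog key actually receives events.
import os
from posthog import Posthog

posthog = Posthog(os.environ["POSTHOG_PROJECT_API_KEY"],
                  host="https://us.i.posthog.com")  # placeholder host

# Send one identifiable test event, then confirm it appears in the PostHog UI
# before flipping real traffic over.
posthog.capture("go-live-smoke-test", "deployment_smoke_test",
                {"environment": "production"})
posthog.flush()
```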
Process, query, and analyze blockchain data, including blocks, transactions, and smart contracts. Use when querying blockchain data or transactions. Trigger with phrases like "explore blockchain", "query transactions", or "check on-chain data".
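A minimal sketch with web3.py, assuming an HTTP RPC endpoint (the URL is a placeholder): fetch the latest block, then inspect one of its transactions:

```python
# Sketch: inspect a block and a transaction with web3.py; the RPC URL is a placeholder.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://YOUR-RPC-ENDPOINT"))

latest = w3.eth.get_block("latest")
print("block number:", latest.number, "tx count:", len(latest.transactions))

if latest.transactions:
    tx_hash = latest.transactions[0]
    tx = w3.eth.get_transaction(tx_hash)
    receipt = w3.eth.get_transaction_receipt(tx_hash)
    print("from:", tx["from"], "to:", tx["to"], "status:", receipt["status"])
```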