Total 50,034 skills
Calibrate an LLM judge against human labels using data splits, TPR/TNR, and bias correction. Use after writing a judge prompt (write-judge-prompt) when you need to verify alignment before trusting its outputs. Do NOT use for code-based evaluators (those are deterministic; test with standard unit tests).
Create diverse synthetic test inputs for LLM pipeline evaluation using dimension-based tuple generation. Use when bootstrapping an eval dataset, when real user data is sparse, or when stress-testing specific failure hypotheses. Do NOT use when you already have 100+ representative real traces (use stratified sampling instead), or when the task is collecting production logs.
Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).
Micronaut framework guardrails, patterns, and best practices for AI-assisted development. Use when working with Micronaut projects, or when the user mentions Micronaut. Provides guidelines for compile-time DI, HTTP server/client, data access, and cloud-native development.
Open a new context session at the start of a leader agent workflow. Records agentName, storyId, and phase in wint.contextSessions, emitting a structured SESSION CREATED block for downstream workers to inherit.
Audit your Octave library for gaps, stale content, duplicates, and inconsistencies. Use when the user says "audit my library", "check for gaps", "library health check", "find duplicates", or asks about library quality and completeness.
Create new applications for ryOS following established patterns and conventions. Use when building a new app, adding an application to the desktop, creating app components, or scaffolding app structures.
Automates release management with changelog generation, semantic versioning, and release readiness checks. Use when preparing releases, generating changelogs, bumping versions, or validating release candidates.
Create custom lint rules using the existing linter ecosystems of various programming languages. The linter is designed for AI agents rather than humans: its error messages function as correction prompts for the AI. Create custom rules in the `lints/` directory using each language's standard tooling, including Rust (dylint), TypeScript/JavaScript (ESLint), Python (pylint), and Go (golangci-lint). Use this skill in the following scenarios: (1) when you want AI to enforce project-specific coding rules; (2) when you want to create lint rules that output AI-readable correction instructions when violations occur; (3) when you want to enforce naming conventions, structural patterns, and consistency rules through AI-driven linting. Triggers: "Create a linter rule", "Add a lint rule", "Enforce this pattern", "AI linter", "Custom lint", "Code rules", "Naming rules", "Structural rules", "create a linter rule", "add a lint rule", "enforce this pattern".
Expert blueprint for romance games and dating sims (Tokimeki Memorial, Monster Prom, Persona social links) focusing on affection systems, multi-stat relationships, dated events, and route branching. Use when building relationship-centric games, social simulations, or otome games. Keywords: romance, dating sim, affection system, relationship stats, date events, character routes, love interest.
Guide for exposing PostHog product endpoints as MCP tools. Use when creating new or updating API endpoints, adding MCP tool definitions, scaffolding YAML configs, or writing serializers with good descriptions. Covers the full pipeline from Django serializer to generated TypeScript tool handler.
Generate Python RSS feed scrapers from blog websites, integrated with hourly GitHub Actions