Found 2,519 Skills
Security Benchmark Runner - Auto-activating skill for Security Advanced. Triggers on: security benchmark runner. Part of the Security Advanced skill category.
Senior AI Product Manager. Expert in Probabilistic Strategy, Rapid Agentic Prototyping, and Hypothesis Generation for 2026.
Test, commit, and push in one atomic workflow. Runs Go and Python tests, commits with a conventional commit message, and pushes to the current branch.
Calibrate an LLM judge against human labels using data splits, TPR/TNR, and bias correction. Use after writing a judge prompt (write-judge-prompt) when you need to verify alignment before trusting its outputs. Do NOT use for code-based evaluators (those are deterministic; test with standard unit tests).
Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).
Debug OpenWork sidecars, config, and audit trail
SQL injection defense expert. Use for parameterized queries, input validation, and database security.
Use when the user wants to validate that implemented code matches its specifications, generate integration tests from feature files, or check if code still satisfies existing scenarios. Trigger after implementation completes a feature. Also use when the user asks "does the code do what we specified?" or "generate tests from the feature files".
Test-Driven Development workflow enforcement with RED-GREEN-REFACTOR cycle. Use when implementing features test-first or improving test coverage.
Generate Playwright tests. Use when user says "write tests", "generate tests", "add tests for", "test this component", "e2e test", "create test for", "test this page", or "test this feature".
Fix failing or flaky Playwright tests. Use when user says "fix test", "flaky test", "test failing", "debug test", "test broken", "test passes sometimes", or "intermittent failure".
Review Playwright tests for quality. Use when user says "review tests", "check test quality", "audit tests", "improve tests", "test code review", or "playwright best practices check".