Found 126 Skills
Use when planning A/B tests in LaunchDarkly, Optimizely, or similar platforms. Sizes the experiment (sample size, MDE, runtime), drafts hypothesis + success metrics + guardrails, and produces a launch checklist + rollback plan.
Bootstrap evaluators from production traces — emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with ml_app and optional RCA report or failure hypothesis.
Automated hypothesis generation and testing using large language models. Use this skill when generating scientific hypotheses from datasets, combining literature insights with empirical data, testing hypotheses against observational data, or conducting systematic hypothesis exploration for research discovery in domains like deception detection, AI content detection, mental health analysis, or other empirical research tasks.
Comprehensive debugging methodology for finding and fixing bugs (formerly debugging). This skill should be used when debugging code, investigating errors, troubleshooting issues, performing root cause analysis, or responding to incidents. Covers systematic reproduction, hypothesis-driven investigation, and root cause analysis techniques. Use when encountering exceptions, stack traces, crashes, segfaults, undefined behavior, or when bug reports need investigation.
Transform Claude Code into an AI Scientist that orchestrates research workflows using tree-based hypothesis exploration. Triggers on "research project", "scientific experiment", "run experiments", "AI scientist", "tree search experimentation", "systematic study".
Distribution analysis expert. Use for statistical distributions, data analysis, and hypothesis testing.
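Use when diagnosing unexpected behavior, failed workflows, bugs, browser or Node.js runtime issues, logs, traces, or when preparing a root-cause hypothesis. Use when diagnosing anomalies, locating bugs, or deciding the direction of a fix: first build an evidence table, separate runtime facts from code inferences, and avoid multi-layer guessing; when evidence is insufficient, add copy-friendly browser logs or local Node.js JSONL logs.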
Use when a BizOps lead, COO, or process-improvement owner needs to document an end-to-end business process (procurement, employee onboarding, incident handoff, customer onboarding, claims adjudication) in BPMN-style notation, measure cycle times by stage, surface where work spends most of its time waiting vs. being worked, and quantify the gap between processing time and total elapsed time. Pairs Lean / Six Sigma / Theory-of-Constraints canon with deterministic stdlib-only Python tools to produce a process map, a ranked bottleneck list (with severity + root-cause hypothesis), and a cycle-time analysis (P50, P90, value-add ratio, Little's-Law throughput). Distinct from sales-pipeline, system-reliability (SLO), and strategic-OKR work; this is tactical process documentation for internal operations.
Autonomous NeMo-RL research agent workflow for directed hypothesis testing and open-ended discovery. Guides agents through the full experiment lifecycle: understanding recipes and environments, wiring RL or NeMo-gym runs, launching reproducible baselines and iterations, analyzing results, preserving human oversight, and using git plus TSV logs as the research ledger.
When the user wants to design, prioritize, or analyze growth experiments -- including A/B tests, hypothesis frameworks, ICE/RICE scoring, or growth sprints. Also use when the user says "A/B test," "experiment design," "growth sprint," "experiment prioritization," or "statistical significance." For analytics setup, see product-analytics. For growth modeling, see growth-modeling.
Systematic debugging methodology with root cause analysis. Phases: investigate, hypothesize, validate, verify. Capabilities: backward call stack tracing, multi-layer validation, verification protocols, symptom analysis, regression prevention. Actions: debug, investigate, trace, analyze, validate, verify bugs. Keywords: debugging, root cause, bug fix, stack trace, error investigation, test failure, exception handling, breakpoint, logging, reproduce, isolate, regression, call stack, symptom vs cause, hypothesis testing, validation, verification protocol. Use when: encountering bugs, analyzing test failures, tracing unexpected behavior, investigating performance issues, preventing regressions, validating fixes before completion claims.
Construct well-structured arguments using the hypothesis-argument-example triad. Covers formulating falsifiable hypotheses, building logical arguments (deductive, inductive, analogical, evidential), providing concrete examples, and steelmanning counterarguments. Use when writing or reviewing PR descriptions that propose technical changes, justifying design decisions in ADRs, constructing substantive code review feedback, or building a research argument or technical proposal.