Found 3 Skills
Use when experiments complete to judge which claims the results support, which they do not, and what evidence is still missing. Codex MCP evaluates results against the intended claims and routes to the next action (pivot, supplement, or confirm). Use after experiments finish, before writing the paper or running ablations.
Curated repository of experiment hypotheses, assumptions, and historical learnings.
Resolves experiment references from natural language to concrete experiment IDs. Handles name lookups, fuzzy descriptions ('the signup experiment', 'my latest experiment'), status filtering, and disambiguation when multiple experiments match. TRIGGER when: user refers to an experiment by name, description, or relative reference ('latest', 'most recent', 'the one I created yesterday') and you don't already have the experiment ID. DO NOT TRIGGER when: user provides an experiment ID directly, or you already resolved the experiment earlier in the conversation.
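The resolution behavior described above (name lookup, fuzzy descriptions, relative references such as 'latest', status filtering, and disambiguation) could look roughly like the following. This is only a minimal sketch, not the skill's actual implementation: the `Experiment` record, the `resolve_experiment` function, the filler-word list, and the 0.6 similarity threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical filler words that carry no identifying information.
FILLER = {"the", "my", "a", "an", "one", "experiment", "test"}


@dataclass
class Experiment:
    """Hypothetical experiment record; field names are assumptions."""
    id: str
    name: str
    status: str
    created_at: str  # ISO 8601 timestamp, so string order matches time order


def _score(query: str, name: str) -> float:
    """Average, over query tokens, of the best similarity to any name token."""
    q_tokens = [t for t in query.lower().split() if t not in FILLER]
    n_tokens = name.lower().split()
    if not q_tokens or not n_tokens:
        return 0.0
    best = [
        max(SequenceMatcher(None, qt, nt).ratio() for nt in n_tokens)
        for qt in q_tokens
    ]
    return sum(best) / len(best)


def resolve_experiment(
    query: str,
    experiments: List[Experiment],
    status: Optional[str] = None,
    threshold: float = 0.6,
) -> List[Experiment]:
    """Return candidate experiments for a natural-language reference.

    One result means the reference resolved; several results mean the
    caller should ask the user to disambiguate; none means no match.
    """
    # Optional status filter narrows the pool before any matching.
    pool = [e for e in experiments if status is None or e.status == status]
    q = query.lower().strip()

    # Relative references resolve to the newest experiment in the pool.
    if "latest" in q or "most recent" in q:
        return sorted(pool, key=lambda e: e.created_at, reverse=True)[:1]

    # Otherwise rank by fuzzy similarity and keep matches above the threshold.
    scored = [(_score(q, e.name), e) for e in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for s, e in scored if s >= threshold]
```

In this sketch a caller would use the returned ID directly when exactly one experiment comes back, and would list the candidates for the user when several do. Skipping the lookup when the user already supplied an ID is left to the caller.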