Loading...
Loading...
Found 6 Skills
Run metric-driven iterative optimization loops. Define a measurable goal, build measurement scaffolding, then run parallel experiments that try many approaches, measure each against hard gates and/or LLM-as-judge quality scores, keep improvements, and converge toward the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation. Inspired by Karpathy's autoresearch, generalized for multi-file code changes and non-ML domains.
Start an autonomous experiment loop with user-selected interval (10min, 1h, daily, weekly, monthly). Uses CronCreate for scheduling.
Use ARIS (Auto-Research-In-Sleep) for autonomous ML research — idea generation, paper review, experiment automation, and cross-model collaboration with Claude Code, Codex, or any LLM agent.
Autonomous NeMo-RL research agent workflow for directed hypothesis testing and open-ended discovery. Guides agents through the full experiment lifecycle: understanding recipes and environments, wiring RL or NeMo-gym runs, launching reproducible baselines and iterations, analyzing results, preserving human oversight, and using git plus TSV logs as the research ledger.
Runs ML experiments reproducibly — single runs or autonomous BFS batches. Single mode: isolated venv, time-budgeted, failure-handled, logs to RESEARCH.md. BFS mode (opt-in): designs N hypotheses, runs each for a fixed budget, compares via a single verifiable metric, keeps improvements and git-resets failures — fully autonomous until done. Respects the RESEARCH.md supervision policy for notifications, approvals, and stop limits. Trigger phrases: "run experiment", "train model", "explore design space", "find best config", "autoresearch".
Set up and run an autonomous experiment loop for any optimization target. Use when asked to start autoresearch or run experiments.