ARIS — Autonomous ML Research In Sleep


Skill by ara.so — Daily 2026 Skills collection

ARIS (Auto-Research-In-Sleep) turns Claude Code into an autonomous ML research engine. It chains idea discovery → cross-model review loops → paper writing → compiled PDF into hands-off overnight pipelines. Claude Code drives execution while an external model (Codex/GPT-5.4, GLM, DeepSeek, Kimi, etc.) acts as adversarial reviewer — breaking self-play blind spots that single-model review cannot escape.

What It Does


| Workflow | Trigger | What Runs |
|---|---|---|
| Idea Discovery | `/idea-discovery` | Literature survey → 8–12 ideas → novelty check → pilot GPU runs → ranked report |
| Auto Review Loop | `/auto-review-loop` | 4-round review/fix cycle, score tracked per round (e.g. 5/10 → 7.5/10) |
| Paper Writing | `/paper-writing` | Narrative → outline → figures → LaTeX → PDF → 2-round auto-improvement |
| Full Pipeline | `/research-pipeline` | Chains all three end-to-end from a single prompt |

Installation


1. Clone and install skills

```bash
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
```

2. Install Codex MCP (cross-model reviewer)

```bash
npm install -g @openai/codex
codex setup            # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server
```

3. Verify MCP is connected

```bash
claude mcp list        # should show "codex" in the list
```

Codex Model Configuration


The reviewer model is read from `~/.codex/config.toml`, not from skill files. Edit it directly if needed:

```toml
# ~/.codex/config.toml
model = "gpt-5.4"          # recommended — most rigorous reviewer

# Alternatives:
# model = "gpt-5.3-codex"
# model = "gpt-5.2-codex"
# model = "o3"
```

Core Workflows


Workflow 1 — Idea Discovery


/idea-discovery "factorized gap in discrete diffusion language models"

Be specific — "NLP" produces weak ideas; "factorized gap in discrete diffusion LMs" targets a real research gap.

What runs:
  1. Multi-source literature search (arXiv, Scholar, Zotero, Obsidian, local PDFs)
  2. Claude brainstorms 8–12 candidate ideas
  3. Codex reviewer cross-checks novelty against the literature
  4. Pilot GPU experiments on top candidates
  5. Ranked idea report saved to idea_discovery_report.md

Workflow 2 — Auto Review Loop


/auto-review-loop

Run from a directory containing your paper draft or experiment results.

What runs:
  1. Claude submits the current work to the Codex reviewer
  2. Codex returns a structured critique with a score out of 10
  3. Claude implements fixes (experiments, writing, ablations)
  4. Repeat for up to 4 rounds or until the score threshold is met
  5. Score curve saved to docs/auto_review_score_curve.png
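
The per-round scores behind that curve can also be inspected programmatically. A minimal sketch, assuming a hypothetical `docs/review_scores.json` sidecar with a `{"rounds": [{"round": 1, "score": 5.0}, ...]}` layout (the filename and schema are assumptions, not documented by ARIS):

```python
import json
from pathlib import Path


def score_progression(curve_path: str = "docs/review_scores.json") -> list[float]:
    """Return per-round review scores in order.

    Assumes a {"rounds": [{"round": 1, "score": 5.0}, ...]} layout; the
    schema is an assumption, not documented by ARIS.
    """
    data = json.loads(Path(curve_path).read_text())
    return [r["score"] for r in data.get("rounds", [])]


def improved(scores: list[float]) -> bool:
    """True when the final round beat the first."""
    return len(scores) >= 2 and scores[-1] > scores[0]
```

A quick check of whether a night's run actually climbed, without opening the PNG.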

Workflow 3 — Paper Writing


/paper-writing "NARRATIVE_REPORT.md"

Point it at a narrative markdown file describing your findings.

What runs:
  1. Outline generation (sections, figures, tables)
  2. Figure generation from experiment results
  3. LaTeX source assembly
  4. pdflatex compilation
  5. 2-round auto-review-and-improve cycle
  6. Final PDF + anti-hallucination BibTeX (fetched from DBLP/CrossRef)

Full Pipeline


/research-pipeline "your research direction"

Chains Workflows 1 → 2 → 3 from a single prompt. Wake up to a scored, compiled paper.

Inline Configuration Overrides


Append `— key: value` to any command:

/research-pipeline "topic" — AUTO_PROCEED: false
/research-pipeline "topic" — human checkpoint: true
/research-pipeline "topic" — arxiv download: true
/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true

| Parameter | Default | Effect |
|---|---|---|
| `AUTO_PROCEED` | `true` | `false` = pause at the idea selection gate before committing GPU time |
| `human checkpoint` | `false` | `true` = pause after each review round for manual feedback |
| `arxiv download` | `false` | `true` = download full PDFs during the literature survey (vs metadata only) |
| `DBLP_BIBTEX` | `true` | `false` = use LLM-generated BibTeX (not recommended — hallucination risk) |
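
The `— key: value` suffix is plain text, so splitting it is straightforward. An illustrative sketch of such a parser (this is not ARIS's actual implementation, which may differ):

```python
def parse_overrides(command: str) -> tuple[str, dict[str, str]]:
    """Split a slash command into its base and any `— key: value` overrides.

    Illustrative only; ARIS's real parser may handle more edge cases.
    """
    base, sep, tail = command.partition(" — ")
    if not sep:
        return command, {}
    overrides = {}
    for pair in tail.split(","):
        key, _, value = pair.partition(":")
        overrides[key.strip()] = value.strip()
    return base, overrides
```

Multiple overrides are comma-separated, so `AUTO_PROCEED: false, human checkpoint: true` yields two entries.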

Alternative Model Combinations


No Claude or OpenAI API required — swap in any OpenAI-compatible endpoint via the `llm-chat` MCP server.

Install the bundled llm-chat MCP server:

```bash
cd Auto-claude-code-research-in-sleep/mcp-servers/llm-chat
pip install -r requirements.txt
```

Configure your provider:

```bash
export LLM_CHAT_BASE_URL="https://open.bigmodel.cn/api/paas/v4"   # GLM-4
export LLM_CHAT_API_KEY="your-key"
export LLM_CHAT_MODEL="glm-4-plus"
```

Add to Claude Code:

```bash
claude mcp add llm-chat -s user -- python server.py
```

**Tested reviewer models:**

| Provider | Model | Notes |
|---|---|---|
| OpenAI | `gpt-5.4` | Recommended — most rigorous |
| Zhipu AI | `glm-4-plus` | Strong Chinese-language papers |
| MiniMax | `abab6.5s-chat` | Fast, cost-effective |
| Moonshot | `moonshot-v1-128k` | Kimi — long-context papers |
| DeepSeek | `deepseek-chat` | Code-heavy experiments |
| 01.ai | `yi-large` | Yi — long context |
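
All of these providers speak the same OpenAI-compatible wire format the `llm-chat` server targets. A sketch of that request shape using only the stdlib and the `LLM_CHAT_*` variables above (the bundled server may add retries, streaming, or system prompts on top of this):

```python
import json
import os
import urllib.request


def chat_completions_url(base_url: str) -> str:
    """Normalise a provider base URL into its OpenAI-compatible chat endpoint."""
    return base_url.rstrip("/") + "/chat/completions"


def review_via_llm_chat(prompt: str) -> str:
    """POST a single-turn review request to the configured endpoint.

    Sketch of the wire format only; not the llm-chat server's actual code.
    """
    req = urllib.request.Request(
        chat_completions_url(os.environ["LLM_CHAT_BASE_URL"]),
        data=json.dumps({
            "model": os.environ["LLM_CHAT_MODEL"],
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['LLM_CHAT_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swapping providers is then just a matter of re-exporting the three environment variables.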

Anti-Hallucination Citations


BibTeX is fetched from real databases by default — no manual flag needed:

The skills/paper-writing/citation_fetcher.py pattern used internally:

```python
import requests


def fetch_bibtex_dblp(title: str) -> str | None:
    """Fetch real BibTeX from DBLP by paper title."""
    resp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": title, "format": "json", "h": 1},
    )
    hits = resp.json().get("result", {}).get("hits", {}).get("hit", [])
    if not hits:
        return None
    key = hits[0]["info"].get("key", "")
    bib_resp = requests.get(f"https://dblp.org/rec/{key}.bib")
    return bib_resp.text if bib_resp.ok else None


def fetch_bibtex_crossref(doi: str) -> str | None:
    """Fallback: fetch BibTeX from CrossRef by DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}/transform/application/x-bibtex"
    )
    return resp.text if resp.ok else None
```

Disable with `— DBLP_BIBTEX: false` if working fully offline.
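
A caller chains the two fetchers so CrossRef is only hit when DBLP misses. A sketch with the fetchers injected as callables, which keeps the fallback order testable offline (`fetch_bibtex` is a hypothetical wrapper, not part of citation_fetcher.py):

```python
from typing import Callable, Optional


def fetch_bibtex(
    title: str,
    doi: Optional[str],
    dblp_fetcher: Callable[[str], Optional[str]],
    crossref_fetcher: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try DBLP by title first; fall back to CrossRef by DOI if one is known.

    Hypothetical helper: pass the two fetchers above in real use, or stubs
    when testing the fallback logic without network access.
    """
    bib = dblp_fetcher(title)
    if bib:
        return bib
    if doi:
        return crossref_fetcher(doi)
    return None
```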

Optional Integrations


Zotero

Install the Zotero Better BibTeX plugin, then:

```bash
export ZOTERO_API_KEY="your-zotero-web-api-key"
export ZOTERO_LIBRARY_ID="your-library-id"
export ZOTERO_LIBRARY_TYPE="user"   # or "group"
```

Literature search will query your Zotero library before hitting arXiv.
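
For reference, a sketch of the Zotero Web API v3 request shape that such a query maps onto (GET `/users/{id}/items?q=...`, authenticated with a `Zotero-API-Key` header set to `ZOTERO_API_KEY`); the skill's own query logic may differ:

```python
import os
from urllib.parse import quote


def zotero_search_url(query: str) -> str:
    """Build a Zotero Web API v3 item-search URL from the env vars above.

    URL construction only; send the request with a Zotero-API-Key header.
    """
    lib_type = os.environ.get("ZOTERO_LIBRARY_TYPE", "user") + "s"
    lib_id = os.environ["ZOTERO_LIBRARY_ID"]
    return f"https://api.zotero.org/{lib_type}/{lib_id}/items?q={quote(query)}&format=json"
```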

Obsidian

```bash
export OBSIDIAN_VAULT_PATH="/path/to/your/vault"
```

The skill will search markdown notes in the vault for related work before external queries.

Feishu / Lark Notifications

```bash
export FEISHU_WEBHOOK_URL="https://open.feishu.cn/open-apis/bot/v2/hook/your-token"
export FEISHU_MODE="push"   # off | push | interactive
```

| Mode | Behaviour |
|---|---|
| off | No notifications |
| push | One-way alerts: review scores, experiment completions, checkpoints |
| interactive | Mobile approval buttons at `AUTO_PROCEED: false` gates |
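
A push-mode alert is just a JSON POST to the webhook. A minimal sketch using the Feishu custom-bot text payload (ARIS's own notifier may send richer interactive cards):

```python
import json
import os
import urllib.request


def feishu_payload(text: str) -> dict:
    """Custom-bot text payload shape for a Feishu webhook."""
    return {"msg_type": "text", "content": {"text": text}}


def feishu_push(text: str) -> None:
    """Send a one-way alert; a no-op when FEISHU_MODE is "off".

    Sketch only, assuming the plain text msg_type.
    """
    if os.environ.get("FEISHU_MODE", "off") == "off":
        return None
    req = urllib.request.Request(
        os.environ["FEISHU_WEBHOOK_URL"],
        data=json.dumps(feishu_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
    return None
```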

Directory Layout After a Pipeline Run


your-project/
├── idea_discovery_report.md       # ranked ideas with novelty scores
├── NARRATIVE_REPORT.md            # auto-generated findings narrative
├── paper/
│   ├── main.tex                   # assembled LaTeX
│   ├── main.pdf                   # compiled output
│   ├── figures/                   # auto-generated plots
│   └── references.bib             # real BibTeX from DBLP/CrossRef
├── experiments/
│   ├── pilot_runs/                # idea-discovery GPU pilots
│   └── review_round_*/            # per-round experiment results
└── docs/
    └── auto_review_score_curve.png

Python Integration Pattern


Trigger ARIS workflows programmatically from a Python script (e.g. a cron job or CI step):
```python
import subprocess
import json
from pathlib import Path

def run_aris_pipeline(
    research_direction: str,
    output_dir: str = ".",
    auto_proceed: bool = True,
    human_checkpoint: bool = False,
    arxiv_download: bool = False,
) -> dict:
    """
    Launch ARIS full pipeline via Claude Code CLI.
    Returns parsed score progression from the review curve JSON.
    """
    overrides = ", ".join([
        f"AUTO_PROCEED: {str(auto_proceed).lower()}",
        f"human checkpoint: {str(human_checkpoint).lower()}",
        f"arxiv download: {str(arxiv_download).lower()}",
    ])

    command = f'/research-pipeline "{research_direction}" — {overrides}'

    result = subprocess.run(
        ["claude", "--print", command],
        cwd=output_dir,
        capture_output=True,
        text=True,
    )

    if result.returncode != 0:
        raise RuntimeError(f"ARIS pipeline failed:\n{result.stderr}")

    # Parse score progression if available
    score_json = Path(output_dir) / "docs" / "review_scores.json"
    if score_json.exists():
        return json.loads(score_json.read_text())

    return {"stdout": result.stdout}
```

Example: nightly research job


```python
if __name__ == "__main__":
    scores = run_aris_pipeline(
        research_direction="token-level uncertainty calibration in autoregressive LMs",
        output_dir="./nightly_research",
        auto_proceed=True,
        human_checkpoint=False,
    )
    print(f"Final review score: {scores.get('rounds', [{}])[-1].get('score')}/10")
```

Skill Composition

ARIS ships 20 composable sub-skills. Chain them manually for custom workflows:

Literature only:

/literature-survey "topic"

Brainstorm without pilot experiments:

/idea-brainstorm "topic" — pilot experiments: false

Single review round (no loop):

/single-review "path/to/draft.md"

Proof-writing (community skill):

/proof-writer "theorem statement"

Write a paper from an existing narrative, skipping review:

/paper-writing "NARRATIVE.md" — auto-review: false

Troubleshooting


**Codex MCP not found**

```bash
claude mcp list                          # verify "codex" appears
codex setup                              # re-run setup if missing
claude mcp remove codex && \
  claude mcp add codex -s user -- codex mcp-server   # re-add
```

**Skills not loading in Claude Code**
Each skill must be a directory with SKILL.md inside:

```bash
ls ~/.claude/skills/                     # verify files copied
ls ~/.claude/skills/auto-review-loop/SKILL.md
```

**`pdflatex` not found during paper writing**

```bash
# macOS
brew install --cask mactex-no-gui

# Ubuntu/Debian
sudo apt install texlive-full
```

Then retry — the skill auto-detects `pdflatex` on `PATH`.


**Reviewer returns empty critique**
Check `~/.codex/config.toml` — ensure `model` is set and your API key is valid:

```bash
codex "say hello"    # quick smoke test outside Claude Code
```

**GLM/DeepSeek reviewer not triggering**
Verify the `llm-chat` MCP server is listed:

```bash
claude mcp list      # should show "llm-chat"
echo $LLM_CHAT_BASE_URL   # must be set in the shell that launches claude
```

**Score not improving after 4 rounds**
  • Add `— human checkpoint: true` and inspect each round's critique file in `experiments/review_round_*/`
  • Consider switching the reviewer model — a different architecture surfaces different weaknesses
  • Lower-level issues (bad data, a flawed baseline) need manual intervention before another loop

Community Skills


| Skill | Description |
|---|---|
| `proof-writer` | Rigorous theorem proof drafting with anti-hallucination citations |

Add your own skill: create `skills/your-skill-name/SKILL.md` and open a PR.

Cross-Model Review — Why It Works


Claude Code (executor)          Codex / external LLM (reviewer)
─────────────────────          ───────────────────────────────
Fast, fluid code execution  ←→  Deliberate, rigorous critique
Broad context retention         Adversarial probing of blind spots
Narrative generation            Structural weakness detection
Single-model self-review falls into local minima — the same pattern-matching that generated the work also evaluates it. Cross-model review is adversarial: the reviewer actively probes weaknesses the executor didn't anticipate. The 1→2 model jump produces the largest quality gain; adding more reviewers yields diminishing returns.
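
The executor/reviewer split above boils down to a short loop. A schematic sketch with both roles stubbed as callables; the 8.0 stop threshold is an assumed default, while the 4-round cap matches the auto-review-loop workflow:

```python
from typing import Callable


def review_loop(
    execute: Callable[[str], str],
    review: Callable[[str], tuple],
    max_rounds: int = 4,
    threshold: float = 8.0,
) -> list:
    """Run the adversarial executor/reviewer cycle; return per-round scores.

    execute(critique) returns revised work; review(work) returns
    (score out of 10, critique). The 8.0 threshold is an assumption.
    """
    scores: list = []
    critique = ""
    for _ in range(max_rounds):
        work = execute(critique)            # executor revises against the last critique
        score, critique = review(work)      # reviewer probes and scores the result
        scores.append(score)
        if score >= threshold:
            break
    return scores
```

With a second model wired in as `review`, each round's critique comes from a different set of inductive biases than the one that produced the work, which is exactly the blind-spot break described above.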