ARIS — Autonomous ML Research In Sleep


Skill by ara.so — Daily 2026 Skills collection

ARIS (Auto-Research-In-Sleep) turns Claude Code into an autonomous ML research engine. It chains idea discovery → cross-model review loops → paper writing → compiled PDF into hands-off overnight pipelines. Claude Code drives execution while an external model (Codex/GPT-5.4, GLM, DeepSeek, Kimi, etc.) acts as adversarial reviewer — breaking self-play blind spots that single-model review cannot escape.

What It Does


| Workflow | Trigger | What Runs |
|---|---|---|
| Idea Discovery | `/idea-discovery` | Literature survey → 8–12 ideas → novelty check → pilot GPU runs → ranked report |
| Auto Review Loop | `/auto-review-loop` | 4-round review/fix cycle, score tracked per round (e.g. 5/10 → 7.5/10) |
| Paper Writing | `/paper-writing` | Narrative → outline → figures → LaTeX → PDF → 2-round auto-improvement |
| Full Pipeline | `/research-pipeline` | Chains all three end-to-end from a single prompt |

Installation


1. Clone and install skills

```bash
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
```

2. Install Codex MCP (cross-model reviewer)

```bash
npm install -g @openai/codex
codex setup            # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server
```

3. Verify MCP is connected

```bash
claude mcp list        # should show "codex" in the list
```

Codex Model Configuration


The reviewer model is read from `~/.codex/config.toml`, not from skill files. Edit it directly if needed:

```toml
# ~/.codex/config.toml
model = "gpt-5.4"          # recommended — most rigorous reviewer

# Alternatives:
# model = "gpt-5.3-codex"
# model = "gpt-5.2-codex"
# model = "o3"
```

Core Workflows


Workflow 1 — Idea Discovery


/idea-discovery "factorized gap in discrete diffusion language models"

Be specific — "NLP" produces weak ideas; "factorized gap in discrete diffusion LMs" targets a real research gap.

What runs:
  1. Multi-source literature search (arXiv, Scholar, Zotero, Obsidian, local PDFs)
  2. Claude brainstorms 8–12 candidate ideas
  3. Codex reviewer cross-checks novelty against the literature
  4. Pilot GPU experiments on top candidates
  5. Ranked idea report saved to idea_discovery_report.md

Workflow 2 — Auto Review Loop


/auto-review-loop

Run from a directory containing your paper draft or experiment results.

What runs:
  1. Claude submits the current work to the Codex reviewer
  2. Codex returns a structured critique with a score out of 10
  3. Claude implements fixes (experiments, writing, ablations)
  4. Repeat for up to 4 rounds or until the score threshold is met
  5. Score curve saved to docs/auto_review_score_curve.png
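
The per-round scores behind that curve can also be inspected programmatically. A minimal sketch, assuming a hypothetical `docs/review_scores.json` sidecar with a `{"rounds": [{"round": 1, "score": 5.0}, ...]}` layout (the filename and schema are assumptions, not documented by ARIS):

```python
import json
from pathlib import Path


def score_progression(curve_path: str = "docs/review_scores.json") -> list[float]:
    """Return per-round review scores in order.

    Assumes a {"rounds": [{"round": 1, "score": 5.0}, ...]} layout; the
    schema is an assumption, not documented by ARIS.
    """
    data = json.loads(Path(curve_path).read_text())
    return [r["score"] for r in data.get("rounds", [])]


def improved(scores: list[float]) -> bool:
    """True when the final round beat the first."""
    return len(scores) >= 2 and scores[-1] > scores[0]
```

A quick check of whether a night's run actually climbed, without opening the PNG.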

Workflow 3 — Paper Writing


/paper-writing "NARRATIVE_REPORT.md"

Point it at a narrative markdown file describing your findings.

What runs:
  1. Outline generation (sections, figures, tables)
  2. Figure generation from experiment results
  3. LaTeX source assembly
  4. pdflatex compilation
  5. 2-round auto-review-and-improve cycle
  6. Final PDF + anti-hallucination BibTeX (fetched from DBLP/CrossRef)

Full Pipeline


/research-pipeline "your research direction"

Chains Workflows 1 → 2 → 3 from a single prompt. Wake up to a scored, compiled paper.

Inline Configuration Overrides


Append `— key: value` to any command:

/research-pipeline "topic" — AUTO_PROCEED: false
/research-pipeline "topic" — human checkpoint: true
/research-pipeline "topic" — arxiv download: true
/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true

| Parameter | Default | Effect |
|---|---|---|
| `AUTO_PROCEED` | `true` | `false` = pause at the idea selection gate before committing GPU time |
| `human checkpoint` | `false` | `true` = pause after each review round for manual feedback |
| `arxiv download` | `false` | `true` = download full PDFs during the literature survey (vs metadata only) |
| `DBLP_BIBTEX` | `true` | `false` = use LLM-generated BibTeX (not recommended — hallucination risk) |
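
The `— key: value` suffix is plain text, so splitting it is straightforward. An illustrative sketch of such a parser (this is not ARIS's actual implementation, which may differ):

```python
def parse_overrides(command: str) -> tuple[str, dict[str, str]]:
    """Split a slash command into its base and any `— key: value` overrides.

    Illustrative only; ARIS's real parser may handle more edge cases.
    """
    base, sep, tail = command.partition(" — ")
    if not sep:
        return command, {}
    overrides = {}
    for pair in tail.split(","):
        key, _, value = pair.partition(":")
        overrides[key.strip()] = value.strip()
    return base, overrides
```

Multiple overrides are comma-separated, so `AUTO_PROCEED: false, human checkpoint: true` yields two entries.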

Alternative Model Combinations


No Claude or OpenAI API required — swap in any OpenAI-compatible endpoint via the `llm-chat` MCP server.

Install the bundled llm-chat MCP server:

```bash
cd Auto-claude-code-research-in-sleep/mcp-servers/llm-chat
pip install -r requirements.txt
```

Configure your provider:

```bash
export LLM_CHAT_BASE_URL="https://open.bigmodel.cn/api/paas/v4"   # GLM-4
export LLM_CHAT_API_KEY="your-key"
export LLM_CHAT_MODEL="glm-4-plus"
```

Add to Claude Code:

```bash
claude mcp add llm-chat -s user -- python server.py
```

**Tested reviewer models:**

| Provider | Model | Notes |
|---|---|---|
| OpenAI | `gpt-5.4` | Recommended — most rigorous |
| Zhipu AI | `glm-4-plus` | Strong Chinese-language papers |
| MiniMax | `abab6.5s-chat` | Fast, cost-effective |
| Moonshot | `moonshot-v1-128k` | Kimi — long-context papers |
| DeepSeek | `deepseek-chat` | Code-heavy experiments |
| 01.ai | `yi-large` | Yi — long context |
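
All of these providers speak the same OpenAI-compatible wire format the `llm-chat` server targets. A sketch of that request shape using only the stdlib and the `LLM_CHAT_*` variables above (the bundled server may add retries, streaming, or system prompts on top of this):

```python
import json
import os
import urllib.request


def chat_completions_url(base_url: str) -> str:
    """Normalise a provider base URL into its OpenAI-compatible chat endpoint."""
    return base_url.rstrip("/") + "/chat/completions"


def review_via_llm_chat(prompt: str) -> str:
    """POST a single-turn review request to the configured endpoint.

    Sketch of the wire format only; not the llm-chat server's actual code.
    """
    req = urllib.request.Request(
        chat_completions_url(os.environ["LLM_CHAT_BASE_URL"]),
        data=json.dumps({
            "model": os.environ["LLM_CHAT_MODEL"],
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['LLM_CHAT_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swapping providers is then just a matter of re-exporting the three environment variables.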

Anti-Hallucination Citations


BibTeX is fetched from real databases by default — no manual flag needed:

The skills/paper-writing/citation_fetcher.py pattern used internally:

```python
import requests


def fetch_bibtex_dblp(title: str) -> str | None:
    """Fetch real BibTeX from DBLP by paper title."""
    resp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": title, "format": "json", "h": 1},
    )
    hits = resp.json().get("result", {}).get("hits", {}).get("hit", [])
    if not hits:
        return None
    key = hits[0]["info"].get("key", "")
    bib_resp = requests.get(f"https://dblp.org/rec/{key}.bib")
    return bib_resp.text if bib_resp.ok else None


def fetch_bibtex_crossref(doi: str) -> str | None:
    """Fallback: fetch BibTeX from CrossRef by DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}/transform/application/x-bibtex"
    )
    return resp.text if resp.ok else None
```

Disable with `— DBLP_BIBTEX: false` if working fully offline.
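
A caller chains the two fetchers so CrossRef is only hit when DBLP misses. A sketch with the fetchers injected as callables, which keeps the fallback order testable offline (`fetch_bibtex` is a hypothetical wrapper, not part of citation_fetcher.py):

```python
from typing import Callable, Optional


def fetch_bibtex(
    title: str,
    doi: Optional[str],
    dblp_fetcher: Callable[[str], Optional[str]],
    crossref_fetcher: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try DBLP by title first; fall back to CrossRef by DOI if one is known.

    Hypothetical helper: pass the two fetchers above in real use, or stubs
    when testing the fallback logic without network access.
    """
    bib = dblp_fetcher(title)
    if bib:
        return bib
    if doi:
        return crossref_fetcher(doi)
    return None
```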

Optional Integrations


Zotero

Install the Zotero Better BibTeX plugin, then:

```bash
export ZOTERO_API_KEY="your-zotero-web-api-key"
export ZOTERO_LIBRARY_ID="your-library-id"
export ZOTERO_LIBRARY_TYPE="user"   # or "group"
```

Literature search will query your Zotero library before hitting arXiv.
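
For reference, a sketch of the Zotero Web API v3 request shape that such a query maps onto (GET `/users/{id}/items?q=...`, authenticated with a `Zotero-API-Key` header set to `ZOTERO_API_KEY`); the skill's own query logic may differ:

```python
import os
from urllib.parse import quote


def zotero_search_url(query: str) -> str:
    """Build a Zotero Web API v3 item-search URL from the env vars above.

    URL construction only; send the request with a Zotero-API-Key header.
    """
    lib_type = os.environ.get("ZOTERO_LIBRARY_TYPE", "user") + "s"
    lib_id = os.environ["ZOTERO_LIBRARY_ID"]
    return f"https://api.zotero.org/{lib_type}/{lib_id}/items?q={quote(query)}&format=json"
```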

Obsidian

```bash
export OBSIDIAN_VAULT_PATH="/path/to/your/vault"
```

The skill will search markdown notes in the vault for related work before external queries.

Feishu / Lark Notifications

```bash
export FEISHU_WEBHOOK_URL="https://open.feishu.cn/open-apis/bot/v2/hook/your-token"
export FEISHU_MODE="push"   # off | push | interactive
```

| Mode | Behaviour |
|---|---|
| off | No notifications |
| push | One-way alerts: review scores, experiment completions, checkpoints |
| interactive | Mobile approval buttons at `AUTO_PROCEED: false` gates |
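
A push-mode alert is just a JSON POST to the webhook. A minimal sketch using the Feishu custom-bot text payload (ARIS's own notifier may send richer interactive cards):

```python
import json
import os
import urllib.request


def feishu_payload(text: str) -> dict:
    """Custom-bot text payload shape for a Feishu webhook."""
    return {"msg_type": "text", "content": {"text": text}}


def feishu_push(text: str) -> None:
    """Send a one-way alert; a no-op when FEISHU_MODE is "off".

    Sketch only, assuming the plain text msg_type.
    """
    if os.environ.get("FEISHU_MODE", "off") == "off":
        return None
    req = urllib.request.Request(
        os.environ["FEISHU_WEBHOOK_URL"],
        data=json.dumps(feishu_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
    return None
```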

Directory Layout After a Pipeline Run


your-project/
├── idea_discovery_report.md       # ranked ideas with novelty scores
├── NARRATIVE_REPORT.md            # auto-generated findings narrative
├── paper/
│   ├── main.tex                   # assembled LaTeX
│   ├── main.pdf                   # compiled output
│   ├── figures/                   # auto-generated plots
│   └── references.bib             # real BibTeX from DBLP/CrossRef
├── experiments/
│   ├── pilot_runs/                # idea-discovery GPU pilots
│   └── review_round_*/            # per-round experiment results
└── docs/
    └── auto_review_score_curve.png

Python Integration Pattern


Trigger ARIS workflows programmatically from a Python script (e.g. a cron job or CI step):
```python
import subprocess
import json
from pathlib import Path

def run_aris_pipeline(
    research_direction: str,
    output_dir: str = ".",
    auto_proceed: bool = True,
    human_checkpoint: bool = False,
    arxiv_download: bool = False,
) -> dict:
    """
    Launch ARIS full pipeline via Claude Code CLI.
    Returns parsed score progression from the review curve JSON.
    """
    overrides = ", ".join([
        f"AUTO_PROCEED: {str(auto_proceed).lower()}",
        f"human checkpoint: {str(human_checkpoint).lower()}",
        f"arxiv download: {str(arxiv_download).lower()}",
    ])

    command = f'/research-pipeline "{research_direction}" — {overrides}'

    result = subprocess.run(
        ["claude", "--print", command],
        cwd=output_dir,
        capture_output=True,
        text=True,
    )

    if result.returncode != 0:
        raise RuntimeError(f"ARIS pipeline failed:\n{result.stderr}")

    # Parse score progression if available
    score_json = Path(output_dir) / "docs" / "review_scores.json"
    if score_json.exists():
        return json.loads(score_json.read_text())

    return {"stdout": result.stdout}
```

Example: nightly research job


```python
if __name__ == "__main__":
    scores = run_aris_pipeline(
        research_direction="token-level uncertainty calibration in autoregressive LMs",
        output_dir="./nightly_research",
        auto_proceed=True,
        human_checkpoint=False,
    )
    print(f"Final review score: {scores.get('rounds', [{}])[-1].get('score')}/10")
```

Skill Composition

ARIS ships 20 composable sub-skills. Chain them manually for custom workflows:

Literature only:

/literature-survey "topic"

Brainstorm without pilot experiments:

/idea-brainstorm "topic" — pilot experiments: false

Single review round (no loop):

/single-review "path/to/draft.md"

Proof-writing (community skill):

/proof-writer "theorem statement"

Write a paper from an existing narrative, skipping review:

/paper-writing "NARRATIVE.md" — auto-review: false

Troubleshooting


**Codex MCP not found**

```bash
claude mcp list                          # verify "codex" appears
codex setup                              # re-run setup if missing
claude mcp remove codex && \
  claude mcp add codex -s user -- codex mcp-server   # re-add
```

**Skills not loading in Claude Code**
Each skill must be a directory with SKILL.md inside:

```bash
ls ~/.claude/skills/                     # verify files copied
ls ~/.claude/skills/auto-review-loop/SKILL.md
```

**`pdflatex` not found during paper writing**

```bash
# macOS
brew install --cask mactex-no-gui

# Ubuntu/Debian
sudo apt install texlive-full
```

Then retry — the skill auto-detects `pdflatex` on `PATH`.


**Reviewer returns empty critique**
Check `~/.codex/config.toml` — ensure `model` is set and your API key is valid:

```bash
codex "say hello"    # quick smoke test outside Claude Code
```

**GLM/DeepSeek reviewer not triggering**
Verify the `llm-chat` MCP server is listed:

```bash
claude mcp list      # should show "llm-chat"
echo $LLM_CHAT_BASE_URL   # must be set in the shell that launches claude
```

**Score not improving after 4 rounds**
  • Add `— human checkpoint: true` and inspect each round's critique file in `experiments/review_round_*/`
  • Consider switching the reviewer model — a different architecture surfaces different weaknesses
  • Lower-level issues (bad data, a flawed baseline) need manual intervention before another loop

Community Skills


| Skill | Description |
|---|---|
| `proof-writer` | Rigorous theorem proof drafting with anti-hallucination citations |

Add your own skill: create `skills/your-skill-name/SKILL.md` and open a PR.

Cross-Model Review — Why It Works


Claude Code (executor)          Codex / external LLM (reviewer)
─────────────────────          ───────────────────────────────
Fast, fluid code execution  ←→  Deliberate, rigorous critique
Broad context retention         Adversarial probing of blind spots
Narrative generation            Structural weakness detection
Single-model self-review falls into local minima — the same pattern-matching that generated the work also evaluates it. Cross-model review is adversarial: the reviewer actively probes weaknesses the executor didn't anticipate. The 1→2 model jump produces the largest quality gain; adding more reviewers yields diminishing returns.
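
The executor/reviewer split above boils down to a short loop. A schematic sketch with both roles stubbed as callables; the 8.0 stop threshold is an assumed default, while the 4-round cap matches the auto-review-loop workflow:

```python
from typing import Callable


def review_loop(
    execute: Callable[[str], str],
    review: Callable[[str], tuple],
    max_rounds: int = 4,
    threshold: float = 8.0,
) -> list:
    """Run the adversarial executor/reviewer cycle; return per-round scores.

    execute(critique) returns revised work; review(work) returns
    (score out of 10, critique). The 8.0 threshold is an assumption.
    """
    scores: list = []
    critique = ""
    for _ in range(max_rounds):
        work = execute(critique)            # executor revises against the last critique
        score, critique = review(work)      # reviewer probes and scores the result
        scores.append(score)
        if score >= threshold:
            break
    return scores
```

With a second model wired in as `review`, each round's critique comes from a different set of inductive biases than the one that produced the work, which is exactly the blind-spot break described above.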