aris-autonomous-ml-research


# ARIS Autonomous ML Research

Skill by ara.so, from the Claude Code Skills collection.

ARIS (Auto-Research-In-Sleep) is a lightweight, Markdown-based system for autonomous ML research that orchestrates collaboration across models. It enables AI agents to discover ideas, review papers, run experiments, and write rebuttals autonomously. It works with Claude Code, Codex CLI, Cursor, Trae, Antigravity, or any LLM agent.

## What ARIS Does

- Idea Generation: Automatically discovers research ideas from arXiv papers, GitHub repos, or research directions
- Cross-Model Review: Uses different LLMs for execution and review to break self-play blind spots (e.g., Claude Code executes, GPT-5.4 reviews)
- Experiment Automation: Clones codebases, runs experiments, and analyzes results
- Paper Writing: Generates drafts and improves them through auto-review loops
- Rebuttal Generation: Parses reviews, builds a strategy, and drafts rebuttals within character limits
- Research Wiki: Persistent knowledge base tracking papers, ideas, experiments, and claims

## Installation

### As Claude Code Skills (In-Editor)

```bash
# Clone skills into your Claude Code skills directory
cd ~/claude-code-skills  # or your skills path
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git aris
cd aris/skills
# Skills are now available in Claude Code
```

### As Standalone CLI (ARIS-Code)

```bash
# Download latest release

# Or install from source
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep/aris-code
cargo build --release

# Run setup
./target/release/aris-code
# Follow interactive setup to configure API keys
```

## Environment Setup

ARIS requires API keys for LLM providers:

```bash
# For Claude (primary executor)
export ANTHROPIC_API_KEY=your_key_here

# For OpenAI (reviewer/alternative)
export OPENAI_API_KEY=your_key_here

# For alternative Chinese models (optional)
export MOONSHOT_API_KEY=your_key_here  # Kimi
export MINIMAX_API_KEY=your_key_here   # MiniMax
export GLM_API_KEY=your_key_here       # GLM
export DEEPSEEK_API_KEY=your_key_here  # DeepSeek
export DOUBAO_API_KEY=your_key_here    # Doubao
```
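
Before a long run, it can help to confirm that the keys are actually exported in the shell that launches ARIS. This is a minimal Python sketch, not part of ARIS: the `REQUIRED_KEYS` mapping is an assumption that simply pairs the Claude executor and OpenAI reviewer listed above, so adjust it to the providers you configure.

```python
import os
from typing import Mapping, Optional

# Assumed executor/reviewer pairing; edit to match your /setup choices.
REQUIRED_KEYS = {
    "executor": "ANTHROPIC_API_KEY",  # Claude (primary executor)
    "reviewer": "OPENAI_API_KEY",     # OpenAI (reviewer)
}


def missing_keys(required: dict[str, str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the variable names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required.values() if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys(REQUIRED_KEYS)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing `env` explicitly keeps the check testable without touching the real environment.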

## Key Commands

### Standalone CLI Commands

```bash
# Interactive setup
/setup

# Run full research pipeline
/research-pipeline "your research direction"

# Run with reference paper and base repo
/research-pipeline "improve method X" -- ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/project

# Generate rebuttal from reviews
/rebuttal "paper/ + reviews" -- venue: ICML, character limit: 5000

# Plan mode (step-by-step breakdown)
/plan "implement transformer variant"

# Research Wiki operations
/wiki add paper <arxiv_url>
/wiki add idea "your idea description"
/wiki query "search term"
/wiki export

# Meta-optimization (self-improvement)
/meta-optimize

# Task management
/tasks
/tasks add "task description"
/tasks complete <id>

# Help and info
/help
/models  # List available models
```

### Claude Code Skill Commands

When using ARIS as Claude Code skills, trigger workflows with natural language:

- "Use ARIS to review this paper: https://arxiv.org/abs/2406.04329"
- "Generate research ideas about discrete diffusion models"
- "Run the research pipeline for improving attention mechanisms"
- "Help me write a rebuttal for these ICML reviews"

## Core Workflows

### 1. Full Research Pipeline

End-to-end autonomous research:

```bash
/research-pipeline "factorized gap in discrete diffusion LMs"
```

What happens:

1. Discovers related papers from arXiv
2. Generates novel research ideas
3. Reviews ideas with an external LLM (cross-model)
4. Runs experiments on the top ideas
5. Writes a paper draft
6. Auto-reviews and improves the draft
7. Outputs the final paper + code

With reference paper + codebase:

```bash
/research-pipeline "improve attention efficiency" -- ref paper: https://arxiv.org/abs/2305.xxxx, base repo: https://github.com/org/attention-impl
```

ARIS reads the paper → finds weaknesses → uses that specific codebase → generates targeted improvements.

### 2. Targeted Idea Discovery

Generate ideas from specific sources:

```python
# In Claude Code, reference the skill
"""
Use Workflow 1: DiscoverPaper skill to find ideas from:
<arXiv papers, GitHub repos, or a research direction>
"""
```

**ARIS will:**
- Parse papers/code
- Extract key insights
- Generate 5-10 novel ideas
- Route them to an external reviewer
- Return scored, critiqued ideas

### 3. Paper Review Loop

Multi-round review with automated improvements:

```bash
# Standalone CLI
/review-paper paper_draft.md --rounds 3

# In Claude Code
"Review this draft and improve it through 3 rounds: ./draft.md"
```

**Review process:**
1. An external LLM critiques the draft (e.g., GPT-5.4)
2. Claude Code addresses the weaknesses
3. Repeat until the score plateaus or the maximum number of rounds is reached
4. Final output includes a score progression graph
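
The review process above is a loop with two stopping rules. As a sketch under stated assumptions, with `review` and `revise` as plain callables standing in for the reviewer and executor models and a hypothetical `min_gain` threshold defining a "plateau", the control flow looks like this:

```python
from typing import Callable


def review_loop(
    draft: str,
    review: Callable[[str], tuple[float, str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
    min_gain: float = 0.1,
) -> tuple[str, list[float]]:
    """Alternate reviewer critiques and executor revisions.

    Stops after `max_rounds` reviews, or earlier when a review improves on
    the previous score by less than `min_gain` (a hypothetical plateau rule).
    Returns the latest draft and the score recorded in each round.
    """
    scores: list[float] = []
    for round_idx in range(max_rounds):
        score, critique = review(draft)  # reviewer model, e.g. GPT-5.4
        plateaued = bool(scores) and score - scores[-1] < min_gain
        scores.append(score)
        if plateaued or round_idx == max_rounds - 1:
            break  # plateau reached or review budget used up
        draft = revise(draft, critique)  # executor model, e.g. Claude
    return draft, scores
```

With the scores from Example 3 (6.5, 8.1, 8.8) and `max_rounds=3`, the loop runs all three reviews and revises twice; a round that gains less than `min_gain` ends it early.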

### 4. Rebuttal Generation

Parse reviews and draft a rebuttal:

```bash
/rebuttal "paper_dir/" -- venue: ICML, character limit: 5000
```

Phases:

1. Parse reviews: Extract all reviewer concerns
2. Build strategy: Map concerns → responses
3. Draft rebuttal: Generate a structured response
4. Format check: Ensure the draft is under the character limit

Quick mode (stop before drafting):

```bash
/rebuttal "paper_dir/" -- venue: NeurIPS, character limit: 8000, quick mode: true
```
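
The format check in phase 4 is described only as keeping the draft under the limit. One simple way to enforce a hard cap, sketched here as an assumption rather than ARIS's actual compression logic, is to keep the highest-priority responses that fit and drop the rest; the `Response` type and its `priority` field are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    reviewer: str
    text: str
    priority: int  # hypothetical: higher means more important to keep


def fit_to_limit(responses: list[Response], limit: int, sep: str = "\n\n") -> str:
    """Greedily keep the highest-priority responses that fit within `limit` characters.

    Kept responses are emitted in their original order, so the rebuttal
    still reads reviewer by reviewer.
    """
    by_priority = sorted(range(len(responses)), key=lambda i: -responses[i].priority)
    kept: set[int] = set()
    total = 0
    for i in by_priority:
        # Every response after the first also costs one separator.
        extra = len(responses[i].text) + (len(sep) if kept else 0)
        if total + extra <= limit:
            kept.add(i)
            total += extra
    return sep.join(r.text for i, r in enumerate(responses) if i in kept)
```

Greedy selection can skip a long high-priority response in favor of shorter ones, which is why reviewing the strategy in quick mode before drafting is still worthwhile.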

## Configuration

### Model Selection

ARIS supports multiple executor + reviewer combinations.

In the standalone CLI:

```bash
# Interactive setup
/setup

# Choose:
# 1. Claude (Anthropic)
# 2. OpenAI (GPT-4/5)
# 3. Kimi (Moonshot)
# 4. MiniMax
# 5. GLM (Zhipu)
# 6. DeepSeek
# 7. Doubao
# 8. LM Studio (local)
# ... and more
```

**In skill files (YAML frontmatter):**

```yaml
---
executor: claude-opus-4.7  # Primary LLM
reviewer: gpt-5.5          # Review LLM
---
```

Available executors:

- `claude-opus-4.7`, `claude-sonnet-4.5`
- `gpt-5.5`, `gpt-5.4`, `o1`, `o3`, `o4`
- `kimi-k2.5`, `kimi-k3`
- `minimax-m2.7`, `minimax-pro`
- `glm-5`, `glm-5-plus`
- `deepseek-v3`, `deepseek-r1`
- `doubao-lite`, `doubao-pro`

Reviewer routing (set `reviewer` to one of these values):

```yaml
reviewer: oracle-pro     # GPT-5.4 Pro via Oracle MCP (strongest)
# reviewer: gpt-5.4      # GPT-5.4 standard
# reviewer: claude-opus  # Claude for review
# reviewer: auto         # Smart routing based on executor
```
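
To check a skill file before running it, the sketch below reads the flat `executor`/`reviewer` frontmatter shown above and flags pairs that come from the same provider, which would undercut the point of cross-model review. The prefix-to-provider table and the minimal parser are illustrative assumptions, not ARIS's routing logic (in particular, it does not resolve `auto`).

```python
# Illustrative mapping from model-name prefix to provider family (assumption).
FAMILIES = {
    "claude": "anthropic",
    "gpt": "openai",
    "o1": "openai",
    "o3": "openai",
    "o4": "openai",
    "oracle": "openai",  # oracle-pro routes to GPT-5.4 Pro via Oracle MCP
    "kimi": "moonshot",
    "minimax": "minimax",
    "glm": "zhipu",
    "deepseek": "deepseek",
    "doubao": "doubao",
}


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` fences.

    Deliberately minimal (no nesting, lists, or quoting); use a real YAML
    parser for anything richer.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.split("#", 1)[0].strip()  # drop inline comment
    return meta


def family(model: str) -> str:
    """Provider family for a model name, or 'unknown' (e.g. for `auto`)."""
    return FAMILIES.get(model.split("-", 1)[0].lower(), "unknown")


def is_cross_model(meta: dict[str, str]) -> bool:
    """True when executor and reviewer come from different provider families."""
    return family(meta["executor"]) != family(meta["reviewer"])
```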

### Research Wiki Configuration

Enable persistent memory across sessions:

```bash
# In CLI
/setup
# Enable "Research Wiki" option
```

Or set it in the config:

```yaml
wiki:
  enabled: true
  path: ~/.aris/wiki/
  auto_commit: true  # Git commit after each change
```

**Wiki structure:**

```
~/.aris/wiki/
├── papers/       # Tracked papers
├── ideas/        # Research ideas
├── experiments/  # Experiment results
├── claims/       # Key claims and evidence
└── graph.json    # Relationship graph
```

### Proxy and Custom Endpoints

HTTP/HTTPS proxy:

```bash
export HTTP_PROXY=http://localhost:8080
export HTTPS_PROXY=http://localhost:8080

# Or in /setup
# Select provider → Configure proxy URL
```

**Custom API endpoints:**

```bash
# For Anthropic-compatible proxies (Bedrock, etc.)
/setup
# Provider: Anthropic
```

**Local models (LM Studio/Ollama):**

```bash
/setup
# Select "LM Studio / Ollama (Local)"
# Model: local-model-name
```

## Code Examples

### Example 1: Idea Discovery from Paper

```python
# skill: workflow-1-discover-paper.md
"""
I want to discover ideas from this paper:
https://arxiv.org/abs/1901.02860 (Transformer-XL)
Focus on: attention mechanism improvements
"""

# ARIS will:
# 1. Download and parse the paper
# 2. Extract key insights about XL attention
# 3. Generate 5-10 novel ideas
# 4. Send them to GPT-5.4 for review
# 5. Return scored ideas with critiques
```

**Output structure:**

```markdown
## Discovered Ideas (Reviewed)

### Idea 1: Factorized Relative Position Embeddings (Score: 8.5/10)

Core insight: XL uses a dense relative position matrix; factorize it.

Reviewer critique (GPT-5.4):
- ✅ Novelty: High (not explored in XL paper)
- ✅ Feasibility: Doable (standard tensor decomposition)
- ⚠️ Impact: Need to verify on long sequences
- ⚠️ Risk: May hurt performance if rank is too low

Next steps: Implement SVD-based factorization, benchmark on PG-19

### Idea 2: Learnable Decay for Relative Attention (Score: 7.2/10)

...
```
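
Idea 1's next step proposes an SVD-based factorization. The NumPy sketch below illustrates it on a random stand-in matrix (not Transformer-XL's actual embeddings; the 2048 × 512 size is illustrative): truncated SVD gives the best rank-`r` approximation `E ≈ U @ V.T` in the Frobenius norm, with `U` shaped `(L, r)` and `V` shaped `(d, r)` as in the model patch of Example 2, and the factors store `r·(L + d)` numbers instead of `L·d`.

```python
import numpy as np


def low_rank_factorize(E: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Return U (L x r) and V (d x r) such that U @ V.T is the best rank-r approximation of E."""
    u, s, vt = np.linalg.svd(E, full_matrices=False)
    U = u[:, :rank] * s[:rank]  # fold the singular values into U
    V = vt[:rank].T
    return U, V


def param_counts(seq_len: int, d_model: int, rank: int) -> tuple[int, int]:
    """Parameters in a dense (L x d) table vs. its rank-r factors."""
    return seq_len * d_model, rank * (seq_len + d_model)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E = rng.standard_normal((2048, 512))  # stand-in for a relative-position table
    U, V = low_rank_factorize(E, rank=64)
    rel_err = np.linalg.norm(E - U @ V.T) / np.linalg.norm(E)
    dense, factored = param_counts(2048, 512, 64)
    print(f"relative error {rel_err:.3f}, params {dense} -> {factored}")
```

A random Gaussian matrix has no low-rank structure, so the error printed here is large; how well real position embeddings compress is exactly what the proposed PG-19 benchmark would have to measure.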

### Example 2: Experiment Automation

```python
# skill: workflow-2-run-experiment.md
"""
Clone https://github.com/kimiyoung/transformer-xl
Implement Idea 1 (factorized position embeddings)
Run on enwik8 benchmark
Compare with baseline
"""

# ARIS will:
# 1. Clone repo
# 2. Create experiment branch
# 3. Modify model code (e.g., pytorch_modules/rel_multihead_attn.py)
# 4. Set up training config
# 5. Run experiment
# 6. Parse results
# 7. Generate comparison report
```

**Generated experiment code:**

```python
# aris_experiments/factorized_rel_pos/model_patch.py
import torch
import torch.nn as nn


class FactorizedRelativeAttention(nn.Module):
    def __init__(self, d_model, n_heads, rank=64):
        super().__init__()
        self.d_model = d_model
        self.n_heads = n_heads
        self.rank = rank

        # Factorized position embeddings: (seq_len, d_model) ≈ U @ V^T
        self.U = nn.Parameter(torch.randn(2048, rank))  # max_seq_len
        self.V = nn.Parameter(torch.randn(d_model, rank))

    def forward(self, q, k, v, pos_emb):
        # Compute relative position embeddings on the fly
        rel_pos = self.U @ self.V.t()  # (2048, d_model)
        # ... rest of attention logic
```

**Experiment results:**

```markdown
## Experiment Results: Factorized Relative Position

| Metric | Baseline (XL) | Ours (Factorized) | Δ |
|---|---|---|---|
| PPL (enwik8) | 1.06 | 1.08 | +0.02 ↓ |
| Speed (tok/s) | 12.3k | 18.7k | +52% ↑ |
| Memory (GB) | 11.2 | 7.8 | -30% ↑ |
| Params (M) | 277 | 261 | -5.8% ↑ |

Success: 52% faster and 30% less memory; the minor PPL degradation is acceptable for long-context tasks.
```
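
The Δ column is the relative change from baseline to the factorized model. Recomputing it from the table's own numbers (nothing else is assumed) confirms the speed, memory, and parameter rows; the PPL row is reported as an absolute difference instead.

```python
def relative_change(baseline: float, ours: float) -> float:
    """Percent change from baseline to ours (negative means a reduction)."""
    return (ours - baseline) / baseline * 100


rows = {
    "Speed (tok/s)": (12.3e3, 18.7e3),
    "Memory (GB)": (11.2, 7.8),
    "Params (M)": (277, 261),
}

for name, (base, ours) in rows.items():
    print(f"{name}: {relative_change(base, ours):+.1f}%")
# Speed (tok/s): +52.0%
# Memory (GB): -30.4%
# Params (M): -5.8%
```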

### Example 3: Paper Writing with Auto-Review

```python
# skill: workflow-3-write-paper.md
"""
Write a paper about the factorized position embeddings experiment.
Title: "Efficient Transformers via Factorized Relative Attention"
Target venue: ICML 2026
Sections: Abstract, Introduction, Method, Experiments, Conclusion
"""

# ARIS will:
# 1. Generate initial draft
# 2. Send to GPT-5.4 for review
# 3. Address critiques (e.g., "add ablation study")
# 4. Re-review
# 5. Repeat for N rounds or until score plateau
```

**Review loop:**

```markdown
## Round 1 Review (GPT-5.4, Score: 6.5/10)

Strengths:
- Clear motivation (long-context efficiency)
- Solid experimental results

Weaknesses:
- Missing ablation on rank hyperparameter
- No comparison with Linear Attention baseline
- Introduction lacks related work on efficient Transformers

Suggestions:
1. Add Table 2: Rank ablation (r=16,32,64,128)
2. Cite Performer, Linformer in related work
3. Add wall-clock time comparison

## Round 2 Review (GPT-5.4, Score: 8.1/10)

Improvements:
- ✅ Added rank ablation (Table 2)
- ✅ Expanded related work
- ✅ Wall-clock benchmarks included

Remaining issues:
- Figure 3 caption unclear
- Conclusion should mention future work

## Round 3 Review (GPT-5.4, Score: 8.8/10)

Near-ready: Minor edits only. Ready for submission.
```

### Example 4: Rebuttal Generation

```bash
# reviews.txt contains reviewer comments
# paper/ directory has the submitted paper
/rebuttal "paper/ reviews.txt" -- venue: ICML, character limit: 5000
```

**Generated rebuttal structure:**

```markdown
# Rebuttal to ICML Reviews

Character count: 4847 / 5000

## Response to Reviewer 1 (Score: 6 → ?)

Q1: "Rank ablation missing: how does r affect performance?"

We thank the reviewer for this suggestion. We have added Table 2 (Appendix) showing a rank ablation (r=16, 32, 64, 128). Key findings: r=64 is optimal (PPL 1.08, speed +52%); r=128 matches baseline quality (PPL 1.06) but is slower; r=32 degrades PPL to 1.15. We will include this in the camera-ready version.

Q2: "No comparison with Linear Attention methods."

Valid point. We added Performer and FNet baselines (Table 3). Our method outperforms Performer by 0.04 PPL while being 20% faster due to factorization locality. Updated draft attached.

## Response to Reviewer 2 (Score: 7 → ?)

Q1: "Scalability to 100k+ sequences unclear."

We have run additional experiments on PG-19 (100k context). Results: our method maintains a +45% speedup with PPL degradation <0.05. Memory scales as O(r·L) vs. O(L²) for standard XL. We will add this to Section 4.3.

Common concern (R1, R3): "Related work incomplete."

We have expanded Section 2 to cite Linformer, Performer, Longformer, and BigBird. Table 1 now includes a complexity comparison.

Summary: We address all major concerns with new experiments and expanded analysis. We believe these changes strengthen the paper significantly.
```
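
The complexity claim in the reply to Reviewer 2 can be made concrete by counting stored entries. Assuming float32 storage and the illustrative values L = 100,000 and r = 64 (the rank the rebuttal calls optimal), a dense L × L matrix compares with rank-r factors of size r × L as follows:

```python
def entries(seq_len: int, rank: int) -> tuple[int, int]:
    """Entry counts for a dense L x L matrix vs. rank-r factors of size r x L."""
    return seq_len * seq_len, rank * seq_len


dense, factored = entries(seq_len=100_000, rank=64)
bytes_per_entry = 4  # float32 (assumption)
print(f"dense:    {dense:,} entries, {dense * bytes_per_entry / 1e9:.1f} GB")
print(f"factored: {factored:,} entries, {factored * bytes_per_entry / 1e6:.1f} MB")
print(f"ratio:    {dense / factored:.1f}x")
# dense:    10,000,000,000 entries, 40.0 GB
# factored: 6,400,000 entries, 25.6 MB
# ratio:    1562.5x
```

This counts only the position term; activations, KV memory, and other buffers are outside the estimate.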

## Common Patterns

The snippets in this section are conceptual pseudocode; the function names are illustrative, not an ARIS Python API.

### Pattern 1: Iterative Improvement Loop

```python
# Discover → Review → Refine → Repeat
ideas = discover_papers("diffusion models")
reviewed_ideas = cross_model_review(ideas)
top_idea = select_highest_score(reviewed_ideas)
results = run_experiment(top_idea)
paper = write_paper(results)
final_paper = auto_review_loop(paper, rounds=3)
```

### Pattern 2: Multi-Source Idea Generation

```python
# Combine arXiv + GitHub + local papers
sources = [
    "arxiv:diffusion+models",
    "github:CompVis/stable-diffusion",
    "local:./papers/ddpm_analysis.pdf",
]
ideas = discover_from_sources(sources)
ideas = deduplicate_ideas(ideas)
ideas = cross_review(ideas)
```
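
`deduplicate_ideas` is not spelled out here. A minimal stand-in, assuming each idea is a dict with a `title` field (a hypothetical shape), treats titles that match after lowercasing and stripping punctuation as duplicates and keeps the first occurrence:

```python
import re


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def deduplicate_ideas(ideas: list[dict]) -> list[dict]:
    """Keep the first idea for each normalized title, preserving order."""
    seen: set[str] = set()
    unique = []
    for idea in ideas:
        key = normalize(idea["title"])
        if key not in seen:
            seen.add(key)
            unique.append(idea)
    return unique
```

Exact-title matching only catches near-verbatim repeats across sources; ideas that say the same thing in different words would need embedding similarity or an LLM judge.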

### Pattern 3: Targeted Paper Improvement

```python
# Read existing paper → find gaps → generate fixes

# ARIS extracts weaknesses from paper
weaknesses = extract_weaknesses(paper_url)

# Generate ideas that specifically address those weaknesses
ideas = generate_targeted_ideas(weaknesses, base_repo)

# Run experiments with the exact codebase from the paper
results = run_experiments(ideas, base_repo)
```

### Pattern 4: Cross-Model Review Chain

```python
# Use different models for different stages
executor = "claude-opus-4.7"  # Fast, creative execution
reviewer_1 = "gpt-5.4"        # Rigorous critique
reviewer_2 = "oracle-pro"     # Final stress test

draft = generate_draft(executor)
critique_1 = review(draft, reviewer_1)
improved = revise(draft, critique_1, executor)
critique_2 = review(improved, reviewer_2)
final = revise(improved, critique_2, executor)
```

## Troubleshooting

### API Key Issues

```bash
# Error: "ANTHROPIC_API_KEY not found"
export ANTHROPIC_API_KEY=your_key_here

# Verify in CLI
/setup
# Check "Current configuration" section
```

### Model Availability

```bash
# List available models for your API keys
/models

# Error: "Model not available"
# Solution: Check /setup → verify API key → select available model
```

### Rate Limits

ARIS auto-retries on 429/5xx errors; the default is 3 retries with exponential backoff. To adjust this, set the following in the skill YAML:

```yaml
retry:
  max_attempts: 5
  backoff_multiplier: 2.0
  initial_delay: 1.0  # seconds
```
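
The retry settings above map onto a standard exponential-backoff loop. This sketch assumes `max_attempts` counts the first call and that the delay before the n-th retry is `initial_delay * backoff_multiplier ** (n - 1)`; ARIS's exact semantics (including any jitter) are not documented here, and `RetryableError` is a hypothetical stand-in for a 429/5xx response. The defaults are the example YAML values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for an HTTP 429 or 5xx response."""


def backoff_delays(max_attempts: int, initial_delay: float, multiplier: float) -> list[float]:
    """Delays slept between attempts (one fewer than the number of attempts)."""
    return [initial_delay * multiplier**i for i in range(max_attempts - 1)]


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on RetryableError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for delay in backoff_delays(max_attempts, initial_delay, multiplier):
        try:
            return fn()
        except RetryableError:
            sleep(delay)  # wait longer after each consecutive failure
    return fn()  # final attempt: any error propagates to the caller
```

Injecting `sleep` lets the schedule be checked without actually waiting: with the YAML values, the delays are 1, 2, 4, and 8 seconds.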

### Cross-Model Review Not Working

```bash
# Error: "Reviewer returned empty response"
# Common cause: reviewer API key missing

# Check:
echo $OPENAI_API_KEY     # For gpt-5.4 reviewer
echo $ANTHROPIC_API_KEY  # For claude reviewer

# Fix:
/setup
# Configure reviewer provider separately
```

### Experiment Failures

```yaml
# Error: "Git clone failed"
# Solution: Check repo URL, network, auth

# Error: "Experiment timeout"
# Solution: Increase timeout in skill YAML:
experiment:
  timeout: 7200  # seconds (default: 3600)

# Error: "CUDA out of memory"
# Solution: Reduce the batch size in the experiment config:
training:
  batch_size: 16  # reduce from default
  gradient_accumulation: 4  # 4 steps per update: effective batch of 16 × 4 = 64
```
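
If you drive experiments from your own scripts rather than through ARIS, the same wall-clock cap can be enforced with Python's `subprocess.run(..., timeout=...)`, which kills the child and raises `subprocess.TimeoutExpired` when the limit is hit. This is a generic sketch, not how ARIS implements its `experiment.timeout` setting; the `run_with_timeout` helper is hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout_s: float) -> str:
    """Run `cmd` and return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        return "timeout"  # the child was killed after timeout_s seconds
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # A 1-second stand-in "experiment" under the 7200 s budget from the YAML above.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(1)"], 7200))
```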

### Wiki Corruption

```bash
# Error: "Wiki index corrupted"
# Solution: Rebuild index
cd ~/.aris/wiki
rm graph.json
/wiki rebuild  # ARIS command: run it in the ARIS session, not the shell

# Or reset wiki entirely
rm -rf ~/.aris/wiki
/setup  # ARIS command: re-enable wiki
```

### Character Limit in Rebuttal

```bash
# Error: "Rebuttal exceeds character limit"
# ARIS auto-compresses, but if still over:

# 1. Use quick mode to see strategy first
/rebuttal "reviews" -- character limit: 5000, quick mode: true

# 2. Manually edit strategy.md to remove low-priority responses

# 3. Re-run with edited strategy
/rebuttal "reviews" -- character limit: 5000, resume: strategy.md
```

### Local Model Connection

```bash
# Error: "Cannot connect to LM Studio"
# Solution:
# 1. Start LM Studio server
# 2. Check port (default: 1234)
curl http://localhost:1234/v1/models  # Should return model list

# 3. In /setup:
# Provider: LM Studio
# Model: <model-name-from-curl>
```
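
The `curl` check above can also be scripted with only the Python standard library. The sketch assumes the server speaks the OpenAI-compatible `/v1/models` format, returning `{"data": [{"id": ...}, ...]}`, which is what LM Studio's local server exposes; the IDs it prints are the names to enter in `/setup`.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]


def list_local_models(base_url: str = "http://localhost:1234", timeout: float = 5.0) -> list[str]:
    """Query `<base_url>/v1/models`; raises URLError if nothing is listening."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        for model_id in list_local_models():
            print(model_id)
    except OSError as exc:  # urllib.error.URLError subclasses OSError
        print(f"Cannot reach LM Studio: {exc}")
```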

### Permission Issues (Standalone CLI)

```bash
# Error: "Permission denied" when running experiments
# ARIS prompts before executing tools, so check which permission mode is active:
/setup
# Tool Permission: Prompt (recommended) | Auto-allow | Deny
```

Or set it in the config:

```yaml
security:
  tool_permission: prompt  # always ask before running code
```
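
To make the three permission modes concrete, here is a hypothetical gate in Python. The mode strings mirror the `/setup` options above, but the exact spellings (`prompt`, `auto-allow`, `deny`), the function, and its `ask` callback are illustrative assumptions, not ARIS internals.

```python
from typing import Callable


def allow_tool(mode: str, tool: str, ask: Callable[[str], bool]) -> bool:
    """Decide whether a tool call may run under the given permission mode."""
    if mode == "auto-allow":
        return True
    if mode == "deny":
        return False
    if mode == "prompt":
        return ask(f"Allow ARIS to run {tool!r}?")
    # Fail closed: an unrecognized mode should never silently allow execution.
    raise ValueError(f"unknown permission mode: {mode!r}")


def ask_on_terminal(question: str) -> bool:
    """Interactive `ask` callback: only an explicit 'y' grants permission."""
    return input(f"{question} (y/n): ").strip().lower() == "y"
```

Raising on an unknown mode, rather than defaulting to allow, is the safer choice for anything that executes generated code.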

## Advanced Features

### Research Wiki (Persistent Memory)

Track papers, ideas, and experiments across sessions:

```bash
# Add paper
/wiki add paper <arxiv_url>

# Add idea
/wiki add idea "Factorized relative position embeddings for Transformer-XL"

# Link idea to paper
/wiki link idea:1 paper:1 "inspired_by"

# Query relationships
/wiki query "ideas from Transformer-XL paper"

# Export to Markdown
/wiki export ./research_notes/
```

**Relationship types:**
- `inspired_by`: idea → paper
- `improves_on`: idea → idea
- `validated_by`: idea → experiment
- `contradicts`: claim → claim
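
The relationship types above form a typed graph. The sketch below assumes one possible in-memory shape for `graph.json` (node IDs like `idea:1`, edges stored as `[source, relation, target]` triples) and rejects links whose endpoint kinds do not match the list; the real file format is not documented here, and the `WikiGraph` class is hypothetical.

```python
import json

# Allowed (source kind, target kind) for each relation, from the list above.
RELATIONS = {
    "inspired_by": ("idea", "paper"),
    "improves_on": ("idea", "idea"),
    "validated_by": ("idea", "experiment"),
    "contradicts": ("claim", "claim"),
}


class WikiGraph:
    def __init__(self) -> None:
        self.edges: list[tuple[str, str, str]] = []

    def link(self, source: str, target: str, relation: str) -> None:
        """Add a typed edge, mirroring `/wiki link idea:1 paper:1 "inspired_by"`."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation {relation!r}")
        expected = RELATIONS[relation]
        actual = (source.split(":", 1)[0], target.split(":", 1)[0])
        if actual != expected:
            raise ValueError(
                f"{relation} expects {expected[0]} -> {expected[1]}, got {actual[0]} -> {actual[1]}"
            )
        self.edges.append((source, relation, target))

    def query(self, relation: str, target: str) -> list[str]:
        """Sources linked to `target` by `relation`, e.g. ideas inspired by a paper."""
        return [s for s, r, t in self.edges if r == relation and t == target]

    def to_json(self) -> str:
        """Serialize edges as JSON triples (assumed format)."""
        return json.dumps({"edges": [list(e) for e in self.edges]}, indent=2)
```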

### Meta-Optimization (Self-Improvement)

ARIS analyzes its own logs and proposes skill improvements:

```bash
/meta-optimize

# ARIS will:
# 1. Parse conversation logs
# 2. Identify failure patterns
# 3. Generate SKILL.md patches
# 4. Apply patches (with confirmation)
```

**Example meta-optimization:**

````markdown
## Detected Pattern: Review scores often plateau at 8.0

Root cause: Reviewer prompt lacks a specific grading rubric.

Proposed fix:

```diff
--- skills/workflow-3-write-paper.md
+++ skills/workflow-3-write-paper.md
@@ -15,5 +15,11 @@
 
 Review this paper draft and provide:
 - Score (1-10)
 - Strengths (3-5 points)
 - Weaknesses (3-5 points)
+
+**Grading rubric:**
+- 9-10: Publication-ready, minor edits only
+- 7-8: Strong work, needs revisions
+- 5-6: Major issues, needs rework
+- 1-4: Fundamental flaws
```

Apply this patch? (y/n): y
````

### Plan Mode

Break down complex tasks before execution:

```bash
/plan "implement a vision transformer from scratch and train on CIFAR-10"

# ARIS generates a step-by-step plan:
# 1. Set up environment (PyTorch, datasets)
# 2. Implement patch embedding layer
# 3. Implement transformer encoder blocks
# 4. Implement classification head
# 5. Write training loop
# 6. Run experiments with different configs
# 7. Analyze results

# Execute plan:
/execute-plan
```

## Integration with Other Tools

### Cursor Integration

In Cursor `settings.json`:

```json
{
  "cursor.aiRules": [
    "Use ARIS skills from ~/claude-code-skills/aris/skills/",
    "For research tasks, trigger ARIS workflows",
    "For paper review, use workflow-3-write-paper.md"
  ]
}
```

### Trae Integration

See `TRAE_ARIS_RUNBOOK_EN.md` for detailed setup.

### Codex CLI Integration

```bash
# ARIS has native Codex CLI skills
cd skills/skills-codex/

# Use with OpenAI Codex
codex chat --system-file workflow-1-discover-paper.md
```

### Oracle MCP (GPT-5.4 Pro Reviewer)

```bash
# Install Oracle MCP
npm install -g @steipete/oracle

# Configure in /setup
# Reviewer: oracle-pro
```

Or in skill YAML:

```yaml
reviewer: oracle-pro
```

## Best Practices

1. Start small: Use targeted workflows (Workflows 1-4) before the full pipeline
2. Validate reviews: Cross-model review is powerful but not infallible; verify critiques
3. Use Research Wiki: Enable persistent memory for long-term projects
4. Monitor costs: GPT-5.4 Pro (oracle) is expensive; use it for final review only
5. Version control: ARIS creates Git branches for experiments; commit frequently
6. Iterate: Auto-review loops improve quality; aim for 3+ rounds
7. Read logs: ARIS logs all LLM interactions in `~/.aris/logs/`, which is useful for debugging
8. Customize skills: Fork and modify SKILL.md files for your domain

## Resources

- License: MIT
- Contributing: PRs welcome; see CONTRIBUTING.md