aris-autonomous-ml-research
ARIS Autonomous ML Research
Skill by ara.so — Claude Code Skills collection.
ARIS (Auto-Research-In-Sleep) is a lightweight Markdown-based system for autonomous ML research that orchestrates cross-model collaboration. It enables AI agents to discover ideas, review papers, run experiments, and write rebuttals — all autonomously. Works with Claude Code, Codex CLI, Cursor, Trae, Antigravity, or any LLM agent.
What ARIS Does
- Idea Generation: Automatically discovers research ideas from arXiv papers, GitHub repos, or research directions
- Cross-Model Review: Uses different LLMs for execution vs. review to break self-play blind spots (e.g., Claude Code executes, GPT-5.4 reviews)
- Experiment Automation: Clones codebases, runs experiments, analyzes results
- Paper Writing: Generates drafts with auto-review loops to improve quality
- Rebuttal Generation: Parses reviews, builds strategy, drafts rebuttals under character limits
- Research Wiki: Persistent knowledge base tracking papers, ideas, experiments, and claims
Installation
As Claude Code Skills (In-Editor)
```bash
# Clone skills into your Claude Code skills directory
cd ~/claude-code-skills  # or your skills path
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git aris
cd aris/skills
# Skills are now available in Claude Code
```
As Standalone CLI (ARIS-Code)
```bash
# Download the latest release, or install from source:
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep/aris-code
cargo build --release

# Run setup
./target/release/aris-code
# Follow the interactive setup to configure API keys
```
Environment Setup
ARIS requires API keys for LLM providers:
```bash
# For Claude (primary executor)
export ANTHROPIC_API_KEY=your_key_here

# For OpenAI (reviewer/alternative)
export OPENAI_API_KEY=your_key_here

# For alternative Chinese models (optional)
export MOONSHOT_API_KEY=your_key_here  # Kimi
export MINIMAX_API_KEY=your_key_here   # MiniMax
export GLM_API_KEY=your_key_here       # GLM
export DEEPSEEK_API_KEY=your_key_here  # DeepSeek
export DOUBAO_API_KEY=your_key_here    # Doubao
```
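A small Python check can confirm which of these keys your current shell actually exposes. This is a sketch and not part of ARIS. The provider labels come from the list above, and treating the placeholder `your_key_here` as "unset" is an assumption of this example.

```python
import os
from typing import Dict, List, Mapping, Optional

# Provider label -> environment variable, as listed in the setup section above.
PROVIDER_KEYS: Dict[str, str] = {
    "Claude (Anthropic)": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Kimi (Moonshot)": "MOONSHOT_API_KEY",
    "MiniMax": "MINIMAX_API_KEY",
    "GLM": "GLM_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "Doubao": "DOUBAO_API_KEY",
}

PLACEHOLDER = "your_key_here"  # value copied from the examples; treated as unset


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return providers whose key is set to a non-empty, non-placeholder value."""
    env = os.environ if env is None else env
    found = []
    for provider, var in PROVIDER_KEYS.items():
        value = env.get(var, "").strip()
        if value and value != PLACEHOLDER:
            found.append(provider)
    return found


if __name__ == "__main__":
    providers = configured_providers()
    print("Configured providers:", ", ".join(providers) if providers else "none")
```

Passing an explicit mapping instead of `os.environ` makes the check easy to test without touching your real environment.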
Key Commands
Standalone CLI Commands
```bash
# Interactive setup
/setup

# Run full research pipeline
/research-pipeline "your research direction"

# Run with reference paper and base repo
/research-pipeline "improve method X" -- ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/project

# Generate rebuttal from reviews
/rebuttal "paper/ + reviews" -- venue: ICML, character limit: 5000

# Plan mode (step-by-step breakdown)
/plan "implement transformer variant"

# Research Wiki operations
/wiki add paper <arxiv_url>
/wiki add idea "your idea description"
/wiki query "search term"
/wiki export

# Meta-optimization (self-improvement)
/meta-optimize

# Task management
/tasks
/tasks add "task description"
/tasks complete <id>

# Help and info
/help
/models  # List available models
```
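Several commands above take a quoted argument followed by `-- key: value, key: value` options. The Python sketch below shows one way to split such a line. The helper name `parse_aris_command` is hypothetical. The rules that options start after a literal ` -- ` and are separated by `, ` are inferred from the examples, not taken from a documented grammar.

```python
import shlex
from typing import Dict, List, Tuple, Union

Value = Union[str, int, bool]


def _coerce(value: str) -> Value:
    """Turn '5000' into 5000 and 'true'/'false' into booleans; keep other text as-is."""
    if value.isdigit():
        return int(value)
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    return value


def parse_aris_command(line: str) -> Tuple[str, List[str], Dict[str, Value]]:
    """Split '/cmd "arg" -- key: value, key: value' into (command, args, options)."""
    head, sep, tail = line.partition(" -- ")
    tokens = shlex.split(head)  # honours the double-quoted argument
    command, args = tokens[0], tokens[1:]
    options: Dict[str, Value] = {}
    if sep:
        for pair in tail.split(", "):
            key, _, value = pair.partition(": ")  # URLs contain ':' but not ': '
            options[key.strip()] = _coerce(value.strip())
    return command, args, options


cmd, args, opts = parse_aris_command(
    '/rebuttal "paper_dir/" -- venue: NeurIPS, character limit: 8000, quick mode: true'
)
print(cmd, args, opts)
```

Splitting on `": "` rather than `":"` keeps URLs such as `https://arxiv.org/...` intact as option values.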
Claude Code Skill Commands
When using ARIS as Claude Code skills, trigger workflows with natural language:
- "Use ARIS to review this paper: https://arxiv.org/abs/2406.04329"
- "Generate research ideas about discrete diffusion models"
- "Run the research pipeline for improving attention mechanisms"
- "Help me write a rebuttal for these ICML reviews"

Core Workflows
1. Full Research Pipeline
End-to-end autonomous research:
```bash
/research-pipeline "factorized gap in discrete diffusion LMs"
```
What happens:
- Discovers related papers from arXiv
- Generates novel research ideas
- Reviews ideas with an external LLM (cross-model)
- Runs experiments on the top ideas
- Writes a paper draft
- Auto-reviews and improves the draft
- Outputs the final paper + code

With reference paper + codebase:
```bash
/research-pipeline "improve attention efficiency" -- ref paper: https://arxiv.org/abs/2305.xxxx, base repo: https://github.com/org/attention-impl
```
ARIS reads the paper → finds weaknesses → uses that specific codebase → generates targeted improvements.

2. Targeted Idea Discovery
Generate ideas from specific sources:
```python
# In Claude Code, reference the skill
"""
Use Workflow 1: DiscoverPaper skill to find ideas from:
- arXiv search: "vision transformers"
- GitHub repo: https://github.com/google-research/vision_transformer
- Local paper: ./papers/vit_analysis.pdf
"""
```
**ARIS will:**
- Parse papers/code
- Extract key insights
- Generate 5-10 novel ideas
- Route them to the external reviewer
- Return scored, critiqued ideas

3. Paper Review Loop
Multi-round review with automated improvements:
```bash
# Standalone CLI
/review-paper paper_draft.md --rounds 3

# In Claude Code
"Review this draft and improve it through 3 rounds: ./draft.md"
```
**Review process:**
1. An external LLM critiques the draft (e.g., GPT-5.4)
2. Claude Code addresses the weaknesses
3. Repeat until the score plateaus or the maximum number of rounds is reached
4. Final output with a score progression graph
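The stopping rule in step 3 can be sketched in Python as follows. This is an illustrative loop, not ARIS's implementation. `review` and `revise` are hypothetical callables standing in for the reviewer and executor models. Defining a plateau as a gain below `min_gain` points between rounds is an assumption.

```python
from typing import Callable, List, Tuple


def auto_review_loop(
    draft: str,
    review: Callable[[str], Tuple[float, str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
    min_gain: float = 0.2,
) -> Tuple[str, List[float]]:
    """Alternate reviewer critiques and executor revisions.

    Stops when the score gains less than `min_gain` over the previous round
    (a plateau) or after `max_rounds` reviews. Returns the last reviewed
    draft and the score history (one score per review round).
    """
    scores: List[float] = []
    for round_idx in range(max_rounds):
        score, critique = review(draft)  # reviewer model, e.g. GPT-5.4
        scores.append(score)
        plateaued = len(scores) >= 2 and scores[-1] - scores[-2] < min_gain
        if plateaued or round_idx == max_rounds - 1:
            break
        draft = revise(draft, critique)  # executor model, e.g. Claude
    return draft, scores


# Usage with scripted scores standing in for real model calls:
scripted = iter([6.5, 8.1, 8.8])
final_draft, history = auto_review_loop(
    "draft v1",
    review=lambda d: (next(scripted), "add rank ablation"),
    revise=lambda d, c: d + " (revised)",
)
print(history)
```

Breaking before revising on the final round means the returned draft is always one that was actually scored.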
4. Rebuttal Generation
Parse reviews and draft rebuttal:
```bash
/rebuttal "paper_dir/" -- venue: ICML, character limit: 5000
```
Phases:
- Parse reviews: Extract all reviewer concerns
- Build strategy: Map concerns → responses
- Draft rebuttal: Generate structured response
- Format check: Ensure under character limit

Quick mode (stop before drafting):
```bash
/rebuttal "paper_dir/" -- venue: NeurIPS, character limit: 8000, quick mode: true
```
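The document does not describe how the format check shortens an over-long rebuttal. Under that caveat, one simple strategy mirrors the manual fix given under Troubleshooting (removing low-priority responses): repeatedly drop the lowest-priority response until the rendered text fits. The `Response` type and its `priority` field are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    reviewer: str  # e.g. "Reviewer 1"
    concern: str   # e.g. "Q1"
    text: str
    priority: int  # higher = more important to keep (hypothetical field)


def render(responses: List[Response]) -> str:
    """Join responses, in their original order, into the rebuttal body."""
    return "\n\n".join(f"{r.reviewer} - {r.concern}\n{r.text}" for r in responses)


def fit_to_limit(responses: List[Response], limit: int) -> List[Response]:
    """Drop the lowest-priority response until the rendered text fits within `limit` characters."""
    kept = list(responses)
    while kept and len(render(kept)) > limit:
        lowest = min(range(len(kept)), key=lambda i: kept[i].priority)
        del kept[lowest]  # removes the first response with the lowest priority
    return kept
```

Dropping whole responses keeps every surviving answer intact. A real tool might prefer to shorten text before cutting a concern entirely.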
Configuration
Model Selection
ARIS supports multiple executor + reviewer combinations:
In standalone CLI:
```bash
# Interactive setup
/setup
# Choose:
# 1. Claude (Anthropic)
# 2. OpenAI (GPT-4/5)
# 3. Kimi (Moonshot)
# 4. MiniMax
# 5. GLM (Zhipu)
# 6. DeepSeek
# 7. Doubao
# 8. LM Studio (local)
# ... and more
```
**In skill files (YAML frontmatter):**
```yaml
---
executor: claude-opus-4.7  # Primary LLM
reviewer: gpt-5.5          # Review LLM
---
```
Available executors:
- `claude-opus-4.7`, `claude-sonnet-4.5`
- `gpt-5.5`, `gpt-5.4`, `o1`, `o3`, `o4`
- `kimi-k2.5`, `kimi-k3`
- `minimax-m2.7`, `minimax-pro`
- `glm-5`, `glm-5-plus`
- `deepseek-v3`, `deepseek-r1`
- `doubao-lite`, `doubao-pro`

Reviewer routing (set one value):
```yaml
reviewer: oracle-pro     # GPT-5.4 Pro via Oracle MCP (strongest)
# reviewer: gpt-5.4      # GPT-5.4 standard
# reviewer: claude-opus  # Claude for review
# reviewer: auto         # Smart routing based on executor
```
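The Python sketch below shows how a tool might read these two keys: it parses flat `key: value` frontmatter between `---` lines. It is not ARIS's loader. It ignores nesting and treats any `#` as the start of a comment. For full skill files, a real YAML parser such as PyYAML's `yaml.safe_load` is the safer choice.

```python
from typing import Dict


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read flat `key: value` pairs between a leading '---' line and the next '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        content = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" in content:
            key, value = content.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta


skill = """---
executor: claude-opus-4.7  # Primary LLM
reviewer: gpt-5.5          # Review LLM
---
Skill instructions follow here.
"""
print(parse_frontmatter(skill))
```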
Research Wiki Configuration
Enable persistent memory across sessions:
```bash
# In CLI
/setup
# Enable the "Research Wiki" option
```
Or set it in config:
```yaml
wiki:
  enabled: true
  path: ~/.aris/wiki/
  auto_commit: true  # Git commit after each change
```
**Wiki structure:**
```text
~/.aris/wiki/
├── papers/        # Tracked papers
├── ideas/         # Research ideas
├── experiments/   # Experiment results
├── claims/        # Key claims and evidence
└── graph.json     # Relationship graph
```
Proxy and Custom Endpoints
HTTP/HTTPS proxy:
```bash
export HTTP_PROXY=http://localhost:8080
export HTTPS_PROXY=http://localhost:8080
# Or in /setup: select provider → configure proxy URL
```
**Custom API endpoints:**
```bash
# For Anthropic-compatible proxies (Bedrock, etc.)
/setup
# Provider: Anthropic
# Custom base URL: https://bedrock.amazonaws.com/anthropic/
```
**Local models (LM Studio/Ollama):**
```bash
/setup
# Select "LM Studio / Ollama (Local)"
# Base URL: http://localhost:1234/v1  (LM Studio's default port; Ollama listens on 11434 by default)
# Model: local-model-name
```
Code Examples
Example 1: Idea Discovery from Paper
```python
# skill: workflow-1-discover-paper.md
"""
I want to discover ideas from this paper:
https://arxiv.org/abs/1901.02860 (Transformer-XL)
Focus on: attention mechanism improvements
"""
```
ARIS will:
1. Download and parse the paper
2. Extract key insights about XL attention
3. Generate 5-10 novel ideas
4. Send them to GPT-5.4 for review
5. Return scored ideas with critiques

**Output structure:**
```markdown
Discovered Ideas (Reviewed)

Idea 1: Factorized Relative Position Embeddings (Score: 8.5/10)
Core insight: XL uses a dense relative position matrix; factorize it.
Reviewer critique (GPT-5.4):
- ✅ Novelty: High (not explored in XL paper)
- ✅ Feasibility: Doable (standard tensor decomposition)
- ⚠️ Impact: Need to verify on long sequences
- ⚠️ Risk: May hurt performance if rank too low
Next steps: Implement SVD-based factorization, benchmark on PG-19

Idea 2: Learnable Decay for Relative Attention (Score: 7.2/10)
...
```
Example 2: Experiment Automation
```python
# skill: workflow-2-run-experiment.md
"""
Clone https://github.com/kimiyoung/transformer-xl
Implement Idea 1 (factorized position embeddings)
Run on enwik8 benchmark
Compare with baseline
"""
```
ARIS will:
1. Clone repo
2. Create experiment branch
3. Modify model code (e.g., pytorch_modules/rel_multihead_attn.py)
4. Set up training config
5. Run experiment
6. Parse results
7. Generate comparison report

**Generated experiment code:**
```python
# aris_experiments/factorized_rel_pos/model_patch.py
import torch
import torch.nn as nn


class FactorizedRelativeAttention(nn.Module):
    def __init__(self, d_model, n_heads, rank=64, max_seq_len=2048):
        super().__init__()
        self.d_model = d_model
        self.n_heads = n_heads
        self.rank = rank
        # Factorized position embeddings: (max_seq_len, d_model) ≈ U @ V^T
        self.U = nn.Parameter(torch.randn(max_seq_len, rank))
        self.V = nn.Parameter(torch.randn(d_model, rank))

    def forward(self, q, k, v, pos_emb):
        # Rebuild the relative position table on the fly from the low-rank factors
        rel_pos = self.U @ self.V.t()  # (max_seq_len, d_model)
        # ... rest of attention logic
```
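A little arithmetic shows the saving this factorization targets. A dense position table of shape (max_seq_len, d_model) holds L·d parameters. `U` and `V` in the patch above together hold r·(L + d). The factorization therefore only saves parameters when r < L·d / (L + d). In the snippet below, `d_model = 1024` is an illustrative assumption, not a value from the experiment. The printed ratios cover the embedding table only; they are not the measured results in the table.

```python
def dense_params(max_seq_len: int, d_model: int) -> int:
    """Parameters in a dense (max_seq_len x d_model) position table."""
    return max_seq_len * d_model


def factorized_params(max_seq_len: int, d_model: int, rank: int) -> int:
    """Parameters in U (max_seq_len x rank) plus V (d_model x rank)."""
    return rank * (max_seq_len + d_model)


def break_even_rank(max_seq_len: int, d_model: int) -> float:
    """Rank above which the factorized form stops saving parameters."""
    return max_seq_len * d_model / (max_seq_len + d_model)


L, d = 2048, 1024  # d_model = 1024 is an illustrative assumption
for r in (16, 32, 64, 128):
    share = factorized_params(L, d, r) / dense_params(L, d)
    print(f"rank={r:>3}: {factorized_params(L, d, r):,} params ({share:.1%} of dense)")
print(f"break-even rank: {break_even_rank(L, d):.1f}")
```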
**Experiment results:**
```markdown
Experiment Results: Factorized Relative Position

| Metric | Baseline (XL) | Ours (Factorized) | Δ (↑ better, ↓ worse) |
|---|---|---|---|
| PPL (enwik8) | 1.06 | 1.08 | +0.02 ↓ |
| Speed (tok/s) | 12.3k | 18.7k | +52% ↑ |
| Memory (GB) | 11.2 | 7.8 | -30% ↑ |
| Params (M) | 277 | 261 | -5.8% ↑ |

✅ Success: 52% faster, 30% less memory; the minor PPL degradation is acceptable for long-context tasks.
```
Example 3: Paper Writing with Auto-Review
```python
# skill: workflow-3-write-paper.md
"""
Write a paper about the factorized position embeddings experiment.
Title: "Efficient Transformers via Factorized Relative Attention"
Target venue: ICML 2026
Sections: Abstract, Introduction, Method, Experiments, Conclusion
"""
```
ARIS will:
1. Generate initial draft
2. Send to GPT-5.4 for review
3. Address critiques (e.g., "add ablation study")
4. Re-review
5. Repeat for N rounds or until the score plateaus

**Review loop:**
```markdown
Round 1 Review (GPT-5.4, Score: 6.5/10)

Strengths:
- Clear motivation (long-context efficiency)
- Solid experimental results

Weaknesses:
- Missing ablation on rank hyperparameter
- No comparison with Linear Attention baseline
- Introduction lacks related work on efficient Transformers

Suggestions:
- Add Table 2: Rank ablation (r=16, 32, 64, 128)
- Cite Performer and Linformer in related work
- Add wall-clock time comparison

Round 2 Review (GPT-5.4, Score: 8.1/10)

Improvements:
✅ Added rank ablation (Table 2)
✅ Expanded related work
✅ Wall-clock benchmarks included

Remaining issues:
- Figure 3 caption unclear
- Conclusion should mention future work

Round 3 Review (GPT-5.4, Score: 8.8/10)

Near-ready: Minor edits only. Ready for submission.
```
Example 4: Rebuttal Generation
```bash
# reviews.txt contains reviewer comments
# paper/ directory has the submitted paper
/rebuttal "paper/ reviews.txt" -- venue: ICML, character limit: 5000
```
**Generated rebuttal structure:**
```markdown
Rebuttal to ICML Reviews
Character count: 4847 / 5000

Response to Reviewer 1 (Score: 6 → ?)

Q1: "Rank ablation missing: how does r affect performance?"
We thank the reviewer for this suggestion. We have added Table 2 (Appendix) showing rank ablation (r=16, 32, 64, 128). Key findings: r=64 is optimal (PPL 1.08, speed +52%), r=128 matches baseline quality (PPL 1.06) but is slower, and r=32 degrades PPL to 1.15. We will include this in the camera-ready.

Q2: "No comparison with Linear Attention methods."
Valid point. We added Performer and FNet baselines (Table 3). Our method outperforms Performer by 0.04 PPL while being 20% faster due to factorization locality. Updated draft attached.

Response to Reviewer 2 (Score: 7 → ?)

Q1: "Scalability to 100k+ sequences unclear."
We have run additional experiments on PG-19 (100k context). Results: our method maintains a +45% speedup with PPL degradation <0.05. Memory scales O(r·L) vs O(L²) for standard XL. Will add to Section 4.3.

Common concern (R1, R3): "Related work incomplete."
We have expanded Section 2 to cite Linformer, Performer, Longformer, and BigBird. Table 1 now includes a complexity comparison.

Summary: We address all major concerns with new experiments and expanded analysis. We believe these changes strengthen the paper significantly.
```
Common Patterns
Pattern 1: Iterative Improvement Loop
```python
# Discover → Review → Refine → Repeat (pseudocode: each call stands for one ARIS stage)
ideas = discover_papers("diffusion models")
reviewed_ideas = cross_model_review(ideas)
top_idea = select_highest_score(reviewed_ideas)
results = run_experiment(top_idea)
paper = write_paper(results)
final_paper = auto_review_loop(paper, rounds=3)
```
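Most steps in this loop are LLM-driven, but the selection step is plain bookkeeping. Below is a minimal sketch of `select_highest_score`. It assumes each reviewed idea carries a title and a 1-10 reviewer score; the `ReviewedIdea` type and the `min_score` cutoff are hypothetical. The sample scores reuse the ones from Example 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewedIdea:
    title: str
    score: float  # reviewer score on a 1-10 scale
    critiques: List[str] = field(default_factory=list)


def select_highest_score(ideas: List[ReviewedIdea], min_score: float = 0.0) -> ReviewedIdea:
    """Return the best-scoring idea at or above min_score; ties keep the earlier idea."""
    eligible = [idea for idea in ideas if idea.score >= min_score]
    if not eligible:
        raise ValueError(f"no idea scored at least {min_score}")
    return max(eligible, key=lambda idea: idea.score)  # max() returns the first maximum


ideas = [
    ReviewedIdea("Factorized Relative Position Embeddings", 8.5),
    ReviewedIdea("Learnable Decay for Relative Attention", 7.2),
]
print(select_highest_score(ideas, min_score=7.0).title)
```

Raising an error when nothing clears the bar makes the loop stop explicitly, rather than spending compute on experiments for a weak idea.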
Pattern 2: Multi-Source Idea Generation
```python
# Combine arXiv + GitHub + local papers
sources = [
    "arxiv:diffusion+models",
    "github:CompVis/stable-diffusion",
    "local:./papers/ddpm_analysis.pdf",
]
ideas = discover_from_sources(sources)
ideas = deduplicate_ideas(ideas)
ideas = cross_review(ideas)
```
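When the same idea surfaces from several sources, `deduplicate_ideas` has to merge near-identical entries. The sketch below substitutes a simple string-similarity check for whatever ARIS actually does, which may well be semantic comparison by an LLM. It uses `difflib.SequenceMatcher` from the standard library, and the 0.85 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import List


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def deduplicate_ideas(ideas: List[str], threshold: float = 0.85) -> List[str]:
    """Keep the first of any group of near-identical idea descriptions (order preserved)."""
    kept: List[str] = []
    for idea in ideas:
        norm = _normalize(idea)
        is_duplicate = any(
            SequenceMatcher(None, norm, _normalize(seen)).ratio() >= threshold for seen in kept
        )
        if not is_duplicate:
            kept.append(idea)
    return kept


ideas = [
    "Factorized relative position embeddings",
    "factorized  Relative position embeddings",  # same idea from a second source
    "Learnable decay for relative attention",
]
print(deduplicate_ideas(ideas))
```

String similarity catches rewordings only at the surface level. Two differently phrased descriptions of the same method would still both survive.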
Pattern 3: Targeted Paper Improvement
```python
# Read existing paper → find gaps → generate fixes
paper_url = "https://arxiv.org/abs/2406.04329"
base_repo = "https://github.com/org/paper-code"

# ARIS extracts weaknesses from the paper
weaknesses = extract_weaknesses(paper_url)

# Generate ideas that specifically address those weaknesses
ideas = generate_targeted_ideas(weaknesses, base_repo)

# Run experiments with the exact codebase from the paper
results = run_experiments(ideas, base_repo)
```
Pattern 4: Cross-Model Review Chain
```python
# Use different models for different stages
executor = "claude-opus-4.7"  # Fast, creative execution
reviewer_1 = "gpt-5.4"        # Rigorous critique
reviewer_2 = "oracle-pro"     # Final stress test

draft = generate_draft(executor)
critique_1 = review(draft, reviewer_1)
improved = revise(draft, critique_1, executor)
critique_2 = review(improved, reviewer_2)
final = revise(improved, critique_2, executor)
```
Troubleshooting
API Key Issues
```bash
# Error: "ANTHROPIC_API_KEY not found"
export ANTHROPIC_API_KEY=your_key_here

# Verify in CLI
/setup
# Check the "Current configuration" section
```
Model Availability
```bash
# List available models for your API keys
/models

# Error: "Model not available"
# Solution: check /setup → verify API key → select an available model
```
Rate Limits
ARIS auto-retries on 429/5xx errors (default: 3 retries with exponential backoff). To adjust, set the following in the skill YAML:
```yaml
retry:
  max_attempts: 5
  backoff_multiplier: 2.0
  initial_delay: 1.0  # seconds
```
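The documented behavior (retry on 429 and 5xx with exponential backoff) corresponds to a loop like the Python sketch below. The parameter names mirror the YAML keys above. However, the document does not say how ARIS counts attempts or whether it adds jitter, so treat `call_with_retry` and `RetryableError` as hypothetical. Production clients commonly add random jitter to each delay to avoid synchronized retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical API error carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    """Rate limits (429) and server errors (5xx) are worth retrying."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    backoff_multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping initial_delay, then initial_delay * multiplier, ... between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError as err:
            if not is_retryable(err.status) or attempt == max_attempts:
                raise  # non-retryable status, or out of attempts
            sleep(delay)
            delay *= backoff_multiplier
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable so the schedule can be checked without actually waiting.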
Cross-Model Review Not Working
```bash
# Error: "Reviewer returned empty response"
# Common cause: reviewer API key missing

# Check:
echo $OPENAI_API_KEY     # For gpt-5.4 reviewer
echo $ANTHROPIC_API_KEY  # For claude reviewer

# Fix:
/setup
# Configure the reviewer provider separately
```
Experiment Failures
```yaml
# Error: "Git clone failed"
# Solution: check the repo URL, network access, and authentication

# Error: "Experiment timeout"
# Solution: increase the timeout in skill YAML
experiment:
  timeout: 7200  # seconds (default: 3600)

# Error: "CUDA out of memory"
# Solution: reduce the batch size in the experiment config
training:
  batch_size: 16  # reduce from the default
  gradient_accumulation: 4
```
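Shrinking `batch_size` while raising `gradient_accumulation` keeps the number of samples per optimizer update roughly constant. This assumes `gradient_accumulation` has its usual meaning: gradients are summed over that many micro-batches before each update. The key is not otherwise documented here. With the values above, each update sees 16 × 4 = 64 samples. The helpers below pick the accumulation factor. The "originally 64 per step" scenario in the usage line is hypothetical.

```python
import math


def effective_batch(batch_size: int, accumulation_steps: int) -> int:
    """Samples contributing to each optimizer update."""
    return batch_size * accumulation_steps


def accumulation_for(target_batch: int, batch_size: int) -> int:
    """Smallest accumulation factor with batch_size * factor >= target_batch."""
    if target_batch < 1 or batch_size < 1:
        raise ValueError("batch sizes must be positive")
    return math.ceil(target_batch / batch_size)


# Hypothetical: a run that used 64 samples per step now fits only 16 in GPU memory
steps = accumulation_for(64, 16)
print(steps, effective_batch(16, steps))
```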
Wiki Corruption
```bash
# Error: "Wiki index corrupted"
# Solution: rebuild the index
cd ~/.aris/wiki
rm graph.json
/wiki rebuild

# Or reset the wiki entirely (deletes all wiki data)
rm -rf ~/.aris/wiki
/setup  # Re-enable wiki
```
Character Limit in Rebuttal
```bash
# Error: "Rebuttal exceeds character limit"
# ARIS auto-compresses, but if the rebuttal is still over the limit:

# 1. Use quick mode to see the strategy first
/rebuttal "reviews" -- character limit: 5000, quick mode: true

# 2. Manually edit strategy.md to remove low-priority responses

# 3. Re-run with the edited strategy
/rebuttal "reviews" -- character limit: 5000, resume: strategy.md
```
Local Model Connection
```bash
# Error: "Cannot connect to LM Studio"
# Solution:
# 1. Start the LM Studio server
# 2. Check the port (default: 1234)
curl http://localhost:1234/v1/models  # Should return the model list
# 3. In /setup:
#    Provider: LM Studio
#    Base URL: http://localhost:1234/v1
#    Model: <model-name-from-curl>
```
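The `curl` check above can also be done from Python with only the standard library, which is handy inside a setup script. This sketch assumes the server returns the OpenAI-style list format (`{"data": [{"id": ...}, ...]}`), which OpenAI-compatible servers such as LM Studio generally follow. `list_local_models` is a hypothetical helper, not part of ARIS.

```python
import json
import urllib.request
from typing import List


def model_ids(payload: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style GET /v1/models response body."""
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def list_local_models(base_url: str = "http://localhost:1234/v1", timeout: float = 5.0) -> List[str]:
    """Query a local OpenAI-compatible server for its available models."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print("\n".join(list_local_models()) or "server returned no models")
    except OSError as err:  # URLError and timeouts are OSError subclasses
        print(f"Cannot reach the local server: {err}")
```

Any ID it prints is a candidate for the `Model:` field in `/setup`.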
Permission Issues (Standalone CLI)
```bash
# Error: "Permission denied" when running experiments
# ARIS prompts before executing tools, so check the permission mode:
/setup
# Tool Permission: Prompt (recommended) | Auto-allow | Deny
```
Or set it in config:
```yaml
security:
  tool_permission: prompt  # always ask before running code
```
Advanced Features
Research Wiki (Persistent Memory)
Track papers, ideas, and experiments across sessions:
```bash
# Add paper
/wiki add paper https://arxiv.org/abs/1901.02860

# Add idea
/wiki add idea "Factorized relative position embeddings for Transformer-XL"

# Link idea to paper
/wiki link idea:1 paper:1 "inspired_by"

# Query relationships
/wiki query "ideas from Transformer-XL paper"

# Export to Markdown
/wiki export ./research_notes/
```
**Relationship types:**
- `inspired_by`: idea → paper
- `improves_on`: idea → idea
- `validated_by`: idea → experiment
- `contradicts`: claim → claim
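The document does not describe how `graph.json` is laid out. Still, the relationship types above amount to a small typed graph, and the sketch below models it in memory under that reading. Node IDs follow the `kind:number` form used by `/wiki link idea:1 paper:1`. Each relation is checked against the source and target kinds listed above. The JSON layout produced by `to_json` is hypothetical, not ARIS's actual schema.

```python
import json
from typing import Dict, List, Tuple

# relation -> (required source kind, required target kind), from the list above
RELATIONS: Dict[str, Tuple[str, str]] = {
    "inspired_by": ("idea", "paper"),
    "improves_on": ("idea", "idea"),
    "validated_by": ("idea", "experiment"),
    "contradicts": ("claim", "claim"),
}


class WikiGraph:
    """Hypothetical in-memory relationship graph; not ARIS's graph.json schema."""

    def __init__(self) -> None:
        self.edges: List[Tuple[str, str, str]] = []  # (source, relation, target)

    def link(self, source: str, target: str, relation: str) -> None:
        """Add an edge after checking node kinds, e.g. link("idea:1", "paper:1", "inspired_by")."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        src_kind, dst_kind = RELATIONS[relation]
        if source.split(":", 1)[0] != src_kind or target.split(":", 1)[0] != dst_kind:
            raise ValueError(f"{relation} must link {src_kind} -> {dst_kind}")
        self.edges.append((source, relation, target))

    def sources_of(self, target: str, relation: str) -> List[str]:
        """Nodes pointing at target via relation, e.g. ideas inspired by a paper."""
        return [s for s, r, t in self.edges if t == target and r == relation]

    def to_json(self) -> str:
        """Serialize edges as a list of {source, relation, target} objects."""
        return json.dumps(
            {"edges": [{"source": s, "relation": r, "target": t} for s, r, t in self.edges]},
            indent=2,
        )


graph = WikiGraph()
graph.link("idea:1", "paper:1", "inspired_by")
print(graph.sources_of("paper:1", "inspired_by"))
```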
Meta-Optimization (Self-Improvement)
ARIS analyzes its own logs and proposes skill improvements:
```bash
/meta-optimize
```
ARIS will:
1. Parse conversation logs
2. Identify failure patterns
3. Generate SKILL.md patches
4. Apply patches (with confirmation)

**Example meta-optimization:**
```markdown
Detected Pattern: Review scores often plateau at 8.0

Root cause: Reviewer prompt lacks a specific grading rubric.

Proposed fix:
--- skills/workflow-3-write-paper.md
+++ skills/workflow-3-write-paper.md
@@ -15,4 +15,10 @@
 Review this paper draft and provide:
 - Score (1-10)
 - Strengths (3-5 points)
 - Weaknesses (3-5 points)
+
+**Grading rubric:**
+- 9-10: Publication-ready, minor edits only
+- 7-8: Strong work, needs revisions
+- 5-6: Major issues, needs rework
+- 1-4: Fundamental flaws

Apply this patch? (y/n): y
```
Plan Mode
Break down complex tasks before execution:
```bash
/plan "implement a vision transformer from scratch and train on CIFAR-10"
```
ARIS generates a step-by-step plan:
1. Set up environment (PyTorch, datasets)
2. Implement patch embedding layer
3. Implement transformer encoder blocks
4. Implement classification head
5. Write training loop
6. Run experiments with different configs
7. Analyze results

Execute the plan:
```bash
/execute-plan
```
Integration with Other Tools
Cursor Integration
In Cursor `settings.json`:
```json
{
  "cursor.aiRules": [
    "Use ARIS skills from ~/claude-code-skills/aris/skills/",
    "For research tasks, trigger ARIS workflows",
    "For paper review, use workflow-3-write-paper.md"
  ]
}
```
Trae Integration
See TRAE_ARIS_RUNBOOK_EN.md for detailed setup.
Codex CLI Integration
```bash
# ARIS has native Codex CLI skills
cd skills/skills-codex/

# Use with OpenAI Codex
codex chat --system-file workflow-1-discover-paper.md
```
Oracle MCP (GPT-5.4 Pro Reviewer)
```bash
# Install Oracle MCP
npm install -g @steipete/oracle

# Configure in /setup
# Reviewer: oracle-pro
```
Or in skill YAML:
```yaml
reviewer: oracle-pro
```
Best Practices
- Start small: Use targeted workflows (Workflow 1-4) before full pipeline
- Validate reviews: Cross-model review is powerful but not infallible — verify critiques
- Use Research Wiki: Enable persistent memory for long-term projects
- Monitor costs: GPT-5.4 Pro (oracle) is expensive; use for final review only
- Version control: ARIS creates Git branches for experiments — commit frequently
- Iterate: Auto-review loops improve quality — aim for 3+ rounds
- Read logs: ARIS logs all LLM interactions in `~/.aris/logs/`, which is useful for debugging
- Customize skills: Fork and modify SKILL.md files for your domain
Resources
- Project: https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep
- Technical Report: https://huggingface.co/papers/2605.03042
- Agent Guide: AGENT_GUIDE.md
- Alternative Models: Model Configuration Guide
- Community: Discord, GitHub Discussions
License: MIT
Contributing: PRs welcome — see CONTRIBUTING.md