Token Efficiency Expert

This skill provides token optimization strategies for cost-effective Claude Code usage across all projects. These guidelines help minimize token consumption while maintaining high-quality assistance.

Core Principle


ALWAYS follow these optimization guidelines by default unless the user explicitly requests verbose output or full file contents.
Default assumption: Users prefer efficient, cost-effective assistance.


Model Selection Strategy


Use the right model for the task to optimize cost and performance:

Opus - For Learning and Deep Understanding


Use Opus when:
  • 🎓 Learning new codebases - Understanding architecture, code structure, design patterns
  • 📚 Broad exploration - Identifying key files, understanding repository organization
  • 🔍 Deep analysis - Analyzing complex algorithms, performance optimization
  • 📖 Reading and understanding - When you need to comprehend existing code before making changes
  • 🧠 Very complex debugging - Only when Sonnet can't solve it or issue is architectural
Why Opus: More powerful reasoning for understanding complex systems and relationships
Example prompts:
"Use Opus to understand the architecture of this codebase"
"Switch to Opus - I need help understanding how this component works"
"Use Opus for this deep dive into the authentication system"

Sonnet - For Regular Development Tasks (DEFAULT)


Use Sonnet (default) for:
  • ✏️ Writing code - Creating new files, implementing features
  • 🔧 Editing and fixing - Updating configurations, fixing bugs
  • 🐛 Debugging - Standard debugging, error analysis, troubleshooting (use Sonnet unless very complex)
  • 🧪 Testing - Writing tests, running test suites
  • 📝 Documentation - Writing READMEs, comments, docstrings
  • 🚀 Deployment tasks - Running builds, deploying code
  • 💬 General questions - Quick clarifications, simple explanations
Why Sonnet: Faster and more cost-effective for straightforward tasks, handles most debugging well
Example workflow:
1. [Opus] Learn codebase structure and identify key components (one-time)
2. [Sonnet] Implement the feature based on understanding
3. [Sonnet] Debug and fix issues as they arise
4. [Sonnet] Write tests and documentation
5. [Opus] Only if stuck on architectural or very complex issues
6. [Sonnet] Final cleanup and deployment

Cost Optimization Strategy


Typical session pattern:
  1. Start with Opus - Spend 10-15 minutes understanding the codebase (one-time investment)
  2. Switch to Sonnet - Use for ALL implementation, debugging, and routine work
  3. Return to Opus - Only when explicitly needed for deep architectural understanding
Savings example:
  • 2 hours of work = 120 minutes
  • Opus for learning: 15 minutes (~5K tokens)
  • Sonnet for everything else: 105 minutes (~15K tokens)
  • vs all Opus: ~40K tokens
  • Savings: ~50% token cost
Remember: Sonnet is very capable - use it by default, including for debugging. Only escalate to Opus when the problem requires deep architectural insight.
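The arithmetic behind the estimate above can be checked in a few lines (the token counts are the illustrative figures from this skill, not measured values):

```python
# Illustrative figures from the session pattern above - not measurements.
opus_learning = 5_000    # ~15 min of Opus for codebase learning
sonnet_rest = 15_000     # ~105 min of Sonnet for implementation
all_opus = 40_000        # estimated cost of doing everything on Opus

mixed = opus_learning + sonnet_rest
savings = 1 - mixed / all_opus
print(f"mixed: {mixed} tokens, savings: {savings:.0%}")  # → mixed: 20000 tokens, savings: 50%
```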


Skills and Token Efficiency


Common Misconception

Myth: Having many skills in `.claude/skills/` increases token usage.
Reality: Skills use progressive disclosure - Claude loads them intelligently:
  1. At session start: Claude sees only skill descriptions (minimal tokens)
  2. When activated: Full skill content loaded only for skills being used
  3. Unused skills: Consume almost no tokens (just the description line)

Example Token Usage

```
.claude/skills/
├── vgp-pipeline/          # ~50 tokens (description only)
├── galaxy-tool-wrapping/  # ~40 tokens (description only)
├── token-efficiency/      # ~30 tokens (description only)
└── python-testing/        # ~35 tokens (description only)
```

Total overhead: ~155 tokens for 4 skills (just descriptions)
When skill activated: Additional 2,000-5,000 tokens loaded for that specific skill

Implication for Centralized Skills

It's safe to symlink multiple skills to a project!
  • Link 10+ skills from $CLAUDE_METADATA → only ~500 tokens overhead
  • Only activate skills you need by mentioning them by name
  • Example: "Use the vgp-pipeline skill to check status" → loads only that skill

Best practice:

```bash
# Link all potentially useful skills
ln -s $CLAUDE_METADATA/skills/vgp-pipeline .claude/skills/vgp-pipeline
ln -s $CLAUDE_METADATA/skills/galaxy-tool-wrapping .claude/skills/galaxy-tool-wrapping
ln -s $CLAUDE_METADATA/skills/python-testing .claude/skills/python-testing

# Activate selectively during session
"Use the vgp-pipeline skill to debug this workflow"  # Only the VGP skill is fully loaded
```

**Token waste comes from:**
- ❌ Reading large log files unnecessarily
- ❌ Running verbose commands
- ❌ Reading unchanged files multiple times

**NOT from:**
- ✅ Having many skills available
- ✅ Well-organized skill directories
- ✅ Using centralized skill repositories

---

Token Optimization Rules


1. Use Quiet/Minimal Output Modes

For commands with `--quiet`, `--silent`, or `-q` flags:

```bash
# ❌ DON'T: Use verbose mode by default
command --verbose

# ✅ DO: Use quiet mode by default
command --quiet
command -q
command --silent
```

**Common commands with quiet modes:**
- `grep -q` (quiet, exit status only)
- `git --quiet` or `git -q`
- `curl -s` or `curl --silent`
- `wget -q`
- `make -s` (silent)
- Custom scripts with `--quiet` flags

**When to use verbose:** Only when user explicitly asks for detailed output.

---
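The same discipline applies when a script wraps commands itself. A minimal sketch (the helper name `run_quiet` is hypothetical) that surfaces only the exit status and the last few output lines instead of the full transcript:

```python
import subprocess

def run_quiet(cmd, keep=5):
    """Run cmd, returning only the exit code and the last `keep` stdout lines."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout.splitlines()[-keep:]

code, tail = run_quiet(["echo", "done"])
print(code, tail)  # → 0 ['done']
```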

2. NEVER Read Entire Log Files

Log files can be 50-200K tokens. ALWAYS filter before reading.

```bash
# ❌ NEVER DO THIS:
Read: /var/log/application.log
Read: debug.log
Read: error.log

# ✅ ALWAYS DO ONE OF THESE:

# Option 1: Read only the end (most recent)
Bash: tail -100 /var/log/application.log

# Option 2: Filter for errors/warnings (-E enables the | alternation)
Bash: grep -A 10 -iE "error|fail|warning" /var/log/application.log | head -100

# Option 3: Specific time range (if timestamps present)
Bash: grep "2025-01-15" /var/log/application.log | tail -50

# Option 4: Count occurrences first
Bash: grep -c "ERROR" /var/log/application.log           # See if there are many errors
Bash: grep "ERROR" /var/log/application.log | tail -20   # Then read recent ones
```

**Exceptions:** Only read full log if:
- User explicitly says "read the full log"
- Filtered output lacks necessary context
- Log is known to be small (<1000 lines)

---
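When the filtering happens inside a Python script rather than the shell, the `tail` idea can be sketched with a bounded deque (the function name `tail_lines` is illustrative):

```python
from collections import deque

def tail_lines(path, n=100):
    """Return the last n lines of a file without holding it all in memory."""
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))  # deque keeps only the newest n lines

# Tiny demo with a throwaway file
with open("demo.log", "w") as f:
    f.write("".join(f"line {i}\n" for i in range(1000)))
print(tail_lines("demo.log", n=2))  # → ['line 998\n', 'line 999\n']
```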

3. Check Lightweight Sources First

Before reading large files, check if info is available in smaller sources:

For Git repositories:

```bash
# ✅ Check status first (small output)
Bash: git status --short
Bash: git log --oneline -10

# ❌ Don't immediately read
Read: .git/logs/HEAD  # Can be large
```

**For Python/Node projects:**

```bash
# ✅ Check package info (small files)
Bash: cat package.json | jq '.dependencies'
Bash: cat requirements.txt | head -20

# ❌ Don't immediately read
Read: node_modules/  # Huge directory
Read: venv/          # Large virtual environment
```

**For long-running processes:**

```bash
# ✅ Check process status
Bash: ps aux | grep python
Bash: top -b -n 1 | head -20

# ❌ Don't read full logs immediately
Read: /var/log/syslog
```

---

4. Use Grep Instead of Reading Files

When searching for specific content:

```bash
# ❌ DON'T: Read file then manually search
Read: large_file.py  # 30K tokens
# Then manually look for "def my_function"

# ✅ DO: Use Grep to find it
Grep: "def my_function" large_file.py
# Then only read relevant sections if needed
```

**Advanced grep usage:**

```bash
# Find with context
Bash: grep -A 5 -B 5 "pattern" file.py  # 5 lines before/after

# Case-insensitive search
Bash: grep -i "error" logfile.txt

# Recursive search in directory
Bash: grep -r "TODO" src/ | head -20

# Count matches
Bash: grep -c "import" *.py
```

---

5. Read Files with Limits

If you must read a file, use offset and limit parameters:

```bash
# ✅ Read first 100 lines to understand structure
Read: large_file.py (limit: 100)

# ✅ Read specific section
Read: large_file.py (offset: 500, limit: 100)

# ✅ Read just the imports/header
Read: script.py (limit: 50)
```

**For very large files:**

```bash
# Check file size first
Bash: wc -l large_file.txt
# Output: 50000 lines

# Then read strategically
Bash: head -100 large_file.txt               # Beginning
Bash: tail -100 large_file.txt               # End
Bash: sed -n '1000,1100p' large_file.txt     # Specific middle section
```

**Reading Large Test Output Files:**

For Galaxy `tool_test_output.json` files (can be 30K+ lines):

```python
# Read summary first (top of file)
Read(file_path, limit=10)  # Just get summary section

# Then read specific test results
Read(file_path, offset=140, limit=120)  # Target specific test

# Search for patterns
Bash("grep -n 'test_index' tool_test_output.json")  # Find test boundaries
```

**Token savings:**
- Full file: ~60K tokens
- Targeted reads: ~5K tokens
- **Savings: 55K tokens (92%)**

---
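In-script, the offset/limit pattern above can be approximated with `itertools.islice`, which stops reading as soon as the window is exhausted (the helper name `read_slice` is illustrative):

```python
from itertools import islice

def read_slice(path, offset=0, limit=100):
    """Return `limit` lines starting at line `offset`, without reading the rest."""
    with open(path, errors="replace") as f:
        return list(islice(f, offset, offset + limit))

# Tiny demo with a throwaway file
with open("demo.txt", "w") as f:
    f.write("".join(f"row {i}\n" for i in range(50)))
print(read_slice("demo.txt", offset=10, limit=3))  # → ['row 10\n', 'row 11\n', 'row 12\n']
```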

6. Use Bash Commands Instead of Reading Files


CRITICAL OPTIMIZATION: For file operations, use bash commands directly instead of reading files into Claude's context.
Reading files costs tokens. Bash commands don't.

Copy File Contents

```bash
# ❌ DON'T: Read and write (costs tokens for file content)
Read: source_file.txt
Write: destination_file.txt (with content from source_file.txt)

# ✅ DO: Use cp command (zero token cost for file content)
Bash: cp source_file.txt destination_file.txt
```

**Token savings: 100% of file content**

Replace Text in Files

```bash
# ❌ DON'T: Read, edit, write (costs tokens for entire file)
Read: config.yaml
Edit: config.yaml (old_string: "old_value", new_string: "new_value")

# ✅ DO: Use sed in-place (zero token cost for file content)
Bash: sed -i '' 's/old_value/new_value/g' config.yaml
# or
Bash: sed -i.bak 's/old_value/new_value/g' config.yaml  # with backup

# For literal strings with special characters
Bash: sed -i '' 's|old/path|new/path|g' config.yaml  # Use | as delimiter
```

**Token savings: 100% of file content**

**macOS vs Linux compatibility:**

```bash
# macOS (BSD sed) - requires empty string after -i
sed -i '' 's/old/new/g' file.txt

# Linux (GNU sed) - no argument needed
sed -i 's/old/new/g' file.txt

# Cross-platform solution (works everywhere):
sed -i.bak 's/old/new/g' file.txt && rm file.txt.bak

# OR detect OS:
if [[ "$OSTYPE" == "darwin"* ]]; then
  sed -i '' 's/old/new/g' file.txt
else
  sed -i 's/old/new/g' file.txt
fi

# Portable alternative (no -i flag):
sed 's/old/new/g' file.txt > file.tmp && mv file.tmp file.txt
```

**Why this matters:** Scripts using `sed -i` will fail on macOS with cryptic errors like "can't read /pattern/..." if the empty string is omitted. Always use `sed -i ''` for macOS compatibility or `sed -i.bak` for cross-platform safety.

Append to Files

```bash
# ❌ DON'T: Read and write entire file
Read: log.txt
Write: log.txt (with existing content + new line)

# ✅ DO: Use echo or append
Bash: echo "New log entry" >> log.txt
Bash: cat >> log.txt << 'EOF'
Multiple lines
of content
EOF
```

**Token savings: 100% of existing file content**

Delete Lines from Files

```bash
# ❌ DON'T: Read, filter, write
Read: data.txt
Write: data.txt (without lines containing "DELETE")

# ✅ DO: Use sed or grep
Bash: sed -i '' '/DELETE/d' data.txt
# or
Bash: grep -v "DELETE" data.txt > data_temp.txt && mv data_temp.txt data.txt
```

Extract Specific Lines

```bash
# ❌ DON'T: Read entire file to get a few lines
Read: large_file.txt (find lines 100-110)

# ✅ DO: Use sed or awk
Bash: sed -n '100,110p' large_file.txt
Bash: awk 'NR>=100 && NR<=110' large_file.txt
Bash: head -110 large_file.txt | tail -11
```

Rename Files in Bulk

```bash
# ❌ DON'T: Read directory, loop in Claude, execute renames
Read directory listing...
For each file: mv old_name new_name

# ✅ DO: Use bash loop or rename command
Bash: for f in *.txt; do mv "$f" "${f%.txt}.md"; done
Bash: rename 's/\.txt$/.md/' *.txt  # if rename command available
```

Merge Files

```bash
# ❌ DON'T: Read multiple files and write combined
Read: file1.txt
Read: file2.txt
Write: combined.txt

# ✅ DO: Use cat
Bash: cat file1.txt file2.txt > combined.txt
# or append
Bash: cat file2.txt >> file1.txt
```

Count Lines/Words/Characters

```bash
# ❌ DON'T: Read file to count
Read: document.txt
# Then count lines manually

# ✅ DO: Use wc
Bash: wc -l document.txt  # Lines
Bash: wc -w document.txt  # Words
Bash: wc -c document.txt  # Bytes (use wc -m for characters)
```

Check if File Contains Text

```bash
# ❌ DON'T: Read file to search
Read: config.yaml
# Then search for text

# ✅ DO: Use grep with exit code
Bash: grep -q "search_term" config.yaml && echo "Found" || echo "Not found"
# or just check exit code
Bash: grep -q "search_term" config.yaml  # Exit 0 if found, 1 if not
```

Sort File Contents

```bash
# ❌ DON'T: Read, sort in memory, write
Read: unsorted.txt
Write: sorted.txt (with sorted content)

# ✅ DO: Use sort command
Bash: sort unsorted.txt > sorted.txt
Bash: sort -u unsorted.txt > sorted_unique.txt   # Unique sorted
Bash: sort -n numbers.txt > sorted_numbers.txt   # Numeric sort
```

Remove Duplicate Lines

```bash
# ❌ DON'T: Read and deduplicate manually
Read: file_with_dupes.txt
Write: file_no_dupes.txt

# ✅ DO: Use sort -u or uniq
Bash: sort -u file_with_dupes.txt > file_no_dupes.txt
# or preserve order
Bash: awk '!seen[$0]++' file_with_dupes.txt > file_no_dupes.txt
```
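The order-preserving `awk '!seen[$0]++'` idiom has a direct Python counterpart, since dicts preserve insertion order (Python 3.7+):

```python
# Order-preserving de-duplication: first occurrence wins, order kept.
lines = ["alpha", "beta", "alpha", "gamma", "beta"]
unique = list(dict.fromkeys(lines))
print(unique)  # → ['alpha', 'beta', 'gamma']
```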

Find and Replace Across Multiple Files

```bash
# ❌ DON'T: Read each file, edit, write back
Read: file1.py
Edit: file1.py (replace text)
Read: file2.py
Edit: file2.py (replace text)
# ... repeat for many files

# ✅ DO: Use sed with find or loop
Bash: find . -name "*.py" -exec sed -i '' 's/old_text/new_text/g' {} +
# or
Bash: for f in *.py; do sed -i '' 's/old_text/new_text/g' "$f"; done
```

Create File with Template Content

```bash
# ❌ DON'T: Use Write tool for static content
Write: template.txt (with multi-line template)

# ✅ DO: Use heredoc or echo
Bash: cat > template.txt << 'EOF'
Multi-line
template
content
EOF
# or for simple content
Bash: echo "Single line content" > file.txt
```

When to Break These Rules

Still use Read/Edit/Write when:
  1. Complex logic required: Conditional edits based on file structure
  2. Code-aware changes: Editing within functions, preserving indentation
  3. Validation needed: Need to verify content before changing
  4. Interactive review: User needs to see content before approving changes
  5. Multi-step analysis: Need to understand code structure first

Example where Read/Edit is better:

```python
# Changing function signature requires understanding context
Read: module.py
Edit: module.py (update specific function while preserving structure)
```

**Example where bash is better:**

```bash
# Simple text replacement
Bash: sed -i '' 's/old_api_url/new_api_url/g' config.py
```

Token Savings Examples

Example 1: Update 10 config files

Wasteful approach:

```bash
Read: config1.yaml  # 5K tokens
Edit: config1.yaml
Read: config2.yaml  # 5K tokens
Edit: config2.yaml
# ... repeat 10 times = 50K tokens
```

Efficient approach:

```bash
Bash: for f in config*.yaml; do sed -i '' 's/old/new/g' "$f"; done
# Token cost: ~100 tokens for command, 0 for file content
```

**Savings: 49,900 tokens (99.8%)**

**Example 2: Copy configuration**

Wasteful approach:

```bash
Read: template_config.yaml  # 10K tokens
Write: project_config.yaml  # 10K tokens
# Total: 20K tokens
```

Efficient approach:

```bash
Bash: cp template_config.yaml project_config.yaml
# Token cost: ~50 tokens
```

**Savings: 19,950 tokens (99.75%)**

**Example 3: Append log entry**

Wasteful approach:

```bash
Read: application.log   # 50K tokens (large file)
Write: application.log  # 50K tokens
# Total: 100K tokens
```

Efficient approach:

```bash
Bash: echo "[$(date)] Log entry" >> application.log
# Token cost: ~50 tokens
```

**Savings: 99,950 tokens (99.95%)**

Find CSV Column Indices

```bash
# ❌ DON'T: Read entire CSV file to find column numbers
Read: large_table.csv (100+ columns, thousands of rows)
# Then manually count columns

# ✅ DO: Extract and number header row
Bash: head -1 file.csv | tr ',' '\n' | nl

# ✅ DO: Find specific columns by pattern
Bash: head -1 VGP-table.csv | tr ',' '\n' | nl | grep -i "chrom"
# Output shows column numbers and names:
#   54  num_chromosomes
#  106  total_number_of_chromosomes
#  122  num_chromosomes_haploid
```

**How it works:**
- `head -1`: Get header row only
- `tr ',' '\n'`: Convert comma-separated to newlines
- `nl`: Number the lines (gives column index)
- `grep -i`: Filter by pattern (case-insensitive)

**Use case**: Quickly identify which columns contain needed data in wide tables (100+ columns).

**Token savings: 100% of file content** - Only see column headers, not data rows.
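Inside a Python script, the same header-only trick looks like this - read row 1, build a name-to-index map, and never touch the data rows (the sample data here is illustrative):

```python
import csv
import io

# Stand-in for a wide CSV; in practice pass an open file to csv.reader.
sample = io.StringIO("accession,num_chromosomes,total_len\nGCA_1,32,1200\n")
header = next(csv.reader(sample))                      # consumes only the header row
cols = {name: i + 1 for i, name in enumerate(header)}  # 1-based, like `nl`
print(cols["num_chromosomes"])  # → 2
```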

Python Data Filtering Pattern

```python
import csv

# ✅ Create separate filtered files rather than overwriting

# Read original
species_data = []
with open('data.csv', 'r') as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    for row in reader:
        if row['accession'] and row['chromosome_count']:  # Filter criteria
            species_data.append(row)

# Write to NEW file with descriptive suffix
output_file = 'data_filtered.csv'  # Not 'data.csv'
with open(output_file, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(species_data)
```

**Benefits:**
- Preserves original data for comparison
- Clear naming indicates filtering applied
- Can generate multiple filtered versions
- Easier to debug and verify filtering logic

Handling Shell Aliases in Python Scripts

Problem: Python's `subprocess.run()` doesn't expand shell aliases.

```python
# ❌ FAILS if 'datasets' is an alias
subprocess.run(['datasets', 'summary', ...])
# Error: [Errno 2] No such file or directory: 'datasets'
```

**Solution**: Use full path to executable

```bash
# Find full path
type -a datasets
# Output: datasets is an alias for ~/Workdir/ncbi_tests/datasets
echo ~/Workdir/ncbi_tests/datasets  # Expand ~
# Output: /Users/delphine/Workdir/ncbi_tests/datasets
```

```python
# Use full path in script
datasets_cmd = '/Users/delphine/Workdir/ncbi_tests/datasets'
subprocess.run([datasets_cmd, 'summary', ...])
```

**Alternative**: Use `shell=True` (but avoid for security reasons with user input)

---
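A hedged alternative to hard-coding the path: resolve the name with `shutil.which` (which searches PATH but, like subprocess itself, cannot see shell aliases) and fall back to an explicit location. The helper name `resolve_command` and the fallback path are illustrative:

```python
import shutil
import subprocess

def resolve_command(name, fallback=None):
    """Return a full path for `name`, or `fallback` if it's not on PATH."""
    path = shutil.which(name)  # searches PATH; shell aliases are invisible here
    if path:
        return path
    if fallback:
        return fallback
    raise FileNotFoundError(f"{name!r} not on PATH and no fallback given")

cmd = resolve_command("ls")  # a real binary, so this resolves via PATH
result = subprocess.run([cmd, "."], capture_output=True, text=True)
print(result.returncode)  # → 0
```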

Key Principle for File Operations

Ask yourself first:
  1. Can this be done with `cp`, `mv`, `sed`, `awk`, `grep`?
  2. Is the change purely textual (not logic-dependent)?
  3. Do I need to see the file content, or just modify it?

If the answers are YES, YES, NO → use bash commands, not Read/Edit/Write.

7. Filter Command Output

For commands that produce large output:

```bash
# ❌ DON'T: Capture all output
Bash: find / -name "*.py"  # Could return 10,000+ files

# ✅ DO: Limit or filter output
Bash: find /specific/path -name "*.py" | head -50
Bash: find . -name "*.py" -type f | wc -l                   # Count first
Bash: find . -name "*.py" -type f | grep "test" | head -20  # Filter

# ❌ DON'T: Run verbose commands without filtering
Bash: ls -laR /  # Recursive listing of entire filesystem!

# ✅ DO: Limit scope and depth
Bash: ls -la
Bash: find . -maxdepth 2 -type f
Bash: tree -L 2  # Limit tree depth
```

---

7. Summarize, Don't Dump

7. 总结,而非转储

When explaining command output or file contents:
User: "What's in this directory?"

❌ BAD RESPONSE:
[Paste entire 5K token ls -la output with 500 files]

✅ GOOD RESPONSE:
"This directory contains 487 files. Key items:
- 235 Python files (*.py)
- 142 test files in tests/
- 89 config files (*.yaml, *.json)
- Main entry point: main.py
- Documentation in docs/

Would you like to see specific files or file types?"
For code:
User: "What does this script do?"

❌ BAD: [Read entire 500-line file, paste all code]

✅ GOOD:
1. Read: script.py (limit: 50)  # Just the header/imports
2. Grep: "^def " script.py  # List all functions
3. Summarize: "This script has 5 main functions:
   - parse_args(): Command-line argument parsing
   - load_data(): Reads CSV files
   - process_data(): Applies transformations
   - validate_output(): Checks results
   - main(): Orchestrates the workflow

   Would you like details on any specific function?"

当解释命令输出或文件内容时:
用户:"这个目录里有什么?"

❌ 糟糕的回复:
[粘贴包含500个文件的完整5K Token ls -la输出]

✅ 良好的回复:
"此目录包含487个文件。关键内容:
- 235个Python文件(*.py)
- tests/目录下有142个测试文件
- 89个配置文件(*.yaml、*.json)
- 主入口点:main.py
- 文档位于docs/

您需要查看特定文件或文件类型吗?"
对于代码:
用户:"这个脚本是做什么的?"

❌ 糟糕:[读取整个500行文件,粘贴所有代码]

✅ 良好:
1. Read: script.py (limit: 50)  # 仅头部/导入
2. Grep: "^def " script.py  # 列出所有函数
3. 总结:"此脚本有5个主要函数:
   - parse_args(): 命令行参数解析
   - load_data(): 读取CSV文件
   - process_data(): 应用转换
   - validate_output(): 检查结果
   - main(): 编排工作流

   您需要了解特定函数的详细信息吗?"
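The summarize-don't-dump approach above can be sketched as a small helper that reports extension counts instead of a full listing. A minimal sketch; `summarize_dir` is a hypothetical name, using only the stdlib:

```python
from collections import Counter
from pathlib import Path

def summarize_dir(path="."):
    """Return a short summary of a directory instead of dumping ls output."""
    files = [p for p in Path(path).rglob("*") if p.is_file()]
    counts = Counter(p.suffix or "(no extension)" for p in files)
    lines = [f"{len(files)} files total"]
    for ext, n in counts.most_common(5):  # report only the top 5 extensions
        lines.append(f"- {n} {ext} files")
    return "\n".join(lines)
```

The output stays a few lines long no matter how many files the directory contains.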

8. Use Head/Tail for Large Output

8. 对大型输出使用Head/Tail

When commands produce large output:
当命令产生大量输出时:

✅ Limit output length

✅ 限制输出长度

Bash: cat large_file.txt | head -100
Bash: cat large_file.txt | tail -100
Bash: docker logs container_name | tail -50
Bash: cat large_file.txt | head -100
Bash: cat large_file.txt | tail -100
Bash: docker logs container_name | tail -50

✅ Sample from middle

✅ 从中间采样

Bash: cat large_file.txt | head -500 | tail -100 # Lines 400-500
Bash: cat large_file.txt | head -500 | tail -100 # 400-500行

✅ Check size before reading

✅ 读取前检查大小

Bash: wc -l file.txt
Bash: wc -l file.txt

If > 1000 lines, use head/tail

如果>1000行,使用head/tail
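The same head/tail discipline applies when reading files from code. A minimal sketch of both operations using the stdlib; `head`/`tail` are illustrative names:

```python
from collections import deque
from itertools import islice

def head(path, n=100):
    """Return the first n lines without reading the whole file."""
    with open(path) as f:
        return list(islice(f, n))

def tail(path, n=100):
    """Return the last n lines; deque bounds memory to n lines
    even though the file is streamed once."""
    with open(path) as f:
        return list(deque(f, maxlen=n))
```

`tail` still streams the whole file once, but only ever holds `n` lines in memory, and only `n` lines reach the conversation.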


---

---

9. Use JSON/Data Tools Efficiently

9. 高效使用JSON/数据工具

For JSON, YAML, XML files:
对于JSON、YAML、XML文件:

❌ DON'T: Read entire file

❌ 不要:读取整个文件

Read: large_config.json # Could be 50K tokens
Read: large_config.json # 可能50K Token

✅ DO: Extract specific fields

✅ 要:提取特定字段

Bash: cat large_config.json | jq '.metadata'
Bash: cat large_config.json | jq 'keys'  # Just see top-level keys
Bash: cat config.yaml | yq '.database.host'
Bash: cat large_config.json | jq '.metadata'
Bash: cat large_config.json | jq 'keys'  # 仅查看顶级键
Bash: cat config.yaml | yq '.database.host'

For XML

对于XML

Bash: xmllint --xpath '//database/host' config.xml

Bash: xmllint --xpath '//database/host' config.xml

**For CSV files:**

**对于CSV文件:**

❌ DON'T: Read entire CSV

❌ 不要:读取整个CSV

Read: large_data.csv # Could be millions of rows
Read: large_data.csv # 可能数百万行

✅ DO: Sample and analyze

✅ 要:采样并分析

Bash: head -20 large_data.csv  # See header and sample rows
Bash: wc -l large_data.csv  # Count rows
Bash: csvstat large_data.csv  # Get statistics (if csvkit installed)

---
Bash: head -20 large_data.csv  # 查看表头和样本行
Bash: wc -l large_data.csv  # 统计行数
Bash: csvstat large_data.csv  # 获取统计信息(如果已安装csvkit)
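The same sample-first idea works from code. A minimal sketch, assuming plain stdlib CSV handling (`sample_csv` is a hypothetical helper name):

```python
import csv
from itertools import islice

def sample_csv(path, n=5):
    """Read only the header and the first n rows of a CSV,
    instead of loading the whole file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(islice(reader, n))  # stops after n rows
    return header, rows
```

This keeps memory and token cost proportional to `n`, not to the file size.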

---

10. Optimize Code Reading

10. 优化代码读取

For understanding codebases:
对于理解代码库:

✅ STEP 1: Get overview

✅ 步骤1:获取概览

Bash: find . -name "*.py" | head -20  # List files
Bash: grep -r "^class " --include="*.py" | head -20  # List classes
Bash: grep -r "^def " --include="*.py" | wc -l  # Count functions
Bash: find . -name "*.py" | head -20  # 列出文件
Bash: grep -r "^class " --include="*.py" | head -20  # 列出类
Bash: grep -r "^def " --include="*.py" | wc -l  # 统计函数数量

✅ STEP 2: Read structure only

✅ 步骤2:仅读取结构

Read: main.py (limit: 100) # Just imports and main structure
Read: main.py (limit: 100) # 仅导入和主结构

✅ STEP 3: Search for specific code

✅ 步骤3:搜索特定代码

Grep: "class MyClass" src/
Grep: "class MyClass" src/

✅ STEP 4: Read only relevant sections

✅ 步骤4:仅读取相关部分

Read: src/mymodule.py (offset: 150, limit: 50) # Just the relevant class
Read: src/mymodule.py (offset: 150, limit: 50) # 仅相关类

❌ DON'T: Read entire files sequentially

❌ 不要:顺序读取整个文件

Read: file1.py  # 30K tokens
Read: file2.py  # 30K tokens
Read: file3.py  # 30K tokens

---
Read: file1.py  # 30K Token
Read: file2.py  # 30K Token
Read: file3.py  # 30K Token
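Steps 1-2 above (get an overview before reading) can be sketched as a regex scan that lists top-level definitions without loading whole files. A minimal sketch; `outline` is a hypothetical helper, and the pattern only catches unindented `def`/`class` lines:

```python
import re

DEF_RE = re.compile(r"^(def|class)\s+(\w+)")

def outline(path):
    """List top-level function/class names from a Python file,
    reading line by line instead of the whole file at once."""
    names = []
    with open(path) as f:
        for line in f:
            m = DEF_RE.match(line)  # only matches at column 0 (top level)
            if m:
                names.append(f"{m.group(1)} {m.group(2)}")
    return names
```

The result is a few dozen tokens of structure instead of the full file contents.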

---

11. Use Task Tool for Exploratory Searches

11. 使用Task工具进行探索性搜索

When exploring a codebase to understand patterns or find information (not needle queries for specific files):
❌ Inefficient approach (many tool calls, large context):
当探索代码库以理解模式或查找信息(而非针对特定文件的精准查询)时:
❌ 低效方式(多次工具调用,上下文庞大):

Direct grep through many files

直接在多个文件中grep

Grep(pattern="some_pattern", path=".", output_mode="content")
Grep(pattern="some_pattern", path=".", output_mode="content")

Followed by multiple Read calls to understand context

随后多次调用Read以理解上下文

Read("file1.py") Read("file2.py")
Read("file1.py") Read("file2.py")

Followed by more Grep calls for related patterns

随后针对相关模式进行更多Grep调用

Grep(pattern="related_pattern", path=".", output_mode="content")
Grep(pattern="related_pattern", path=".", output_mode="content")

Results in dozens of tool calls and accumulating context

导致数十次工具调用和上下文累积


**✅ Efficient approach (single consolidated response)**:

**✅ 高效方式(单一整合响应):**

Use Task tool with Explore subagent

使用Task工具和Explore子代理

Task(
    subagent_type="Explore",
    description="Research how Galaxy API works",
    prompt="""Explore the codebase to understand how Galaxy API calls are made.
I need to know:
- Which files contain API call patterns
- How authentication is handled
- Common error handling patterns
Return a summary with file locations and key patterns."""
)

**When to use Task/Explore**:
- "How does X work in this codebase?"
- "Where are errors from Y handled?"
- "What is the structure of Z?"
- Searching for patterns across multiple files
- Need context from multiple locations
- Exploring unfamiliar codebases

**When to use direct tools instead**:
- "Read file at specific path X" → Use `Read`
- "Find class definition Foo" → Use `Glob("**/foo.py")` or `Grep("class Foo")`
- "Search for specific string in file X" → Use `Grep(pattern, path="file.py")`
- You know exactly which file to check

**Token savings**:
- Task tool: ~5-10K tokens for consolidated response
- Direct exploration: ~30-50K tokens (many tool calls + context accumulation)
- **Savings: 70-80%** for exploratory searches

**Example comparison**:

Task(
    subagent_type="Explore",
    description="研究Galaxy API的工作原理",
    prompt="""探索代码库以理解Galaxy API调用的实现方式。
我需要了解:
- 哪些文件包含API调用模式
- 如何处理认证
- 常见的错误处理模式
返回包含文件位置和关键模式的总结。"""
)

**何时使用Task/Explore:**
- "X在这个代码库中是如何工作的?"
- "Y的错误在哪里处理?"
- "Z的结构是什么?"
- 在多个文件中搜索模式
- 需要来自多个位置的上下文
- 探索不熟悉的代码库

**何时改用直接工具:**
- "读取特定路径X的文件" → 使用`Read`
- "查找类Foo的定义" → 使用`Glob("**/foo.py")`或`Grep("class Foo")`
- "在文件X中搜索特定字符串" → 使用`Grep(pattern, path="file.py")`
- 您确切知道要检查哪个文件

**Token节省:**
- Task工具:约5-10K Token(整合响应)
- 直接探索:约30-50K Token(多次工具调用 + 上下文累积)
- **节省:70-80%**(针对探索性搜索)

**示例对比:**


❌ Inefficient: Exploring workflow patterns manually

❌ 低效:手动探索工作流模式

Grep("workflow", output_mode="content")  # 15K tokens
Read("workflow1.py")  # 20K tokens
Read("workflow2.py")  # 18K tokens
Grep("error handling", output_mode="content")  # 12K tokens
Grep("workflow", output_mode="content")  # 15K Token
Read("workflow1.py")  # 20K Token
Read("workflow2.py")  # 18K Token
Grep("error handling", output_mode="content")  # 12K Token

Total: ~65K tokens

总计:约65K Token

✅ Efficient: Using Task tool

✅ 高效:使用Task工具

Task(
    subagent_type="Explore",
    description="Understand workflow error handling",
    prompt="Explore how workflows handle errors. Return patterns and file locations."
)
Task(
    subagent_type="Explore",
    description="理解工作流错误处理",
    prompt="探索工作流如何处理错误。返回模式和文件位置。"
)

Total: ~8K tokens (single consolidated response)

总计:约8K Token(单一整合响应)

Savings: 88%

节省:88%


---

---

12. Efficient Scientific Literature Searches

12. 高效科学文献搜索

When searching for data across multiple species (karyotypes, traits, etc.):
❌ Inefficient: Sequential searches
python
for species in species_list:
    search(species)  # One at a time
✅ Efficient: Parallel searches in batches
当跨多个物种搜索数据(核型、性状等)时:
❌ 低效:顺序搜索
python
for species in species_list:
    search(species)  # 逐个搜索
✅ 高效:批量并行搜索

Make 5 searches simultaneously

同时进行5次搜索

WebSearch("species1 karyotype")
WebSearch("species2 karyotype")
WebSearch("species3 karyotype")
WebSearch("species4 karyotype")
WebSearch("species5 karyotype")

**Benefits**:
- 5x faster for user
- Same token usage per search
- Better user experience
- Allows quick progress saves before session limits

**Best practices**:
- Batch 3-5 related searches together
- Group by taxonomy or data type
- Save results immediately after each batch
- Document "not found" species to avoid re-searching
WebSearch("species1 karyotype")
WebSearch("species2 karyotype")
WebSearch("species3 karyotype")
WebSearch("species4 karyotype")
WebSearch("species5 karyotype")

**优势:**
- 对用户来说速度提升5倍
- 每次搜索的Token使用量相同
- 更好的用户体验
- 允许在会话限制前快速保存进度

**最佳实践:**
- 将3-5个相关搜索分组为一批
- 按分类或数据类型分组
- 每批搜索后立即保存结果
- 记录"未找到"的物种以避免重复搜索
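The batch-and-save strategy above can be sketched with a thread pool. A minimal sketch: `search_fn` stands in for whatever search call is available (hypothetical), and the persistence step is only indicated by a comment:

```python
from concurrent.futures import ThreadPoolExecutor

def batch_search(queries, search_fn, batch_size=5):
    """Run searches in parallel batches of batch_size,
    pausing between batches so progress can be saved."""
    results = {}
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            # pool.map preserves query order within the batch
            for query, result in zip(batch, pool.map(search_fn, batch)):
                results[query] = result
        # a real run would persist `results` to disk here, between batches
    return results
```

Batching keeps wall-clock time low while the save point between batches protects against session limits.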

Dealing with Session Interruptions

处理会话中断

When user warns about daily limits:
  1. Immediately save progress:
    • Write findings to file
    • Update CSV/database with confirmed data
    • Create detailed progress document
  2. Document search status:
    • Which species searched
    • Which confirmed/not found
    • Which remain to search
    • Next steps with priority order
  3. Create resume file with:
    • Current totals
    • Completed work
    • Pending tasks with priorities
    • Recommendations for next session
Example: a PROGRESS_YYYYMMDD.md file with clear resumption instructions
当用户警告每日限制时:
  1. 立即保存进度
    • 将发现写入文件
    • 使用已确认的数据更新CSV/数据库
    • 创建详细的进度文档
  2. 记录搜索状态
    • 已搜索的物种
    • 已确认/未找到的物种
    • 待搜索的物种
    • 按优先级排序的下一步计划
  3. 创建恢复文件,包含:
    • 当前总计
    • 已完成的工作
    • 带优先级的待处理任务
    • 下一会话的建议
示例: 一个包含清晰恢复说明的PROGRESS_YYYYMMDD.md文件

Search Term Iteration

搜索词迭代

When initial searches fail, refine systematically:
  1. First try: Specific scientific terms
    • "Anas acuta karyotype 2n"
  2. Second try: Common name + scientific
    • "northern pintail Anas acuta chromosome number"
  3. Third try: Genus-level patterns
    • "Anas genus karyotype waterfowl"
  4. Fourth try: Family-level studies
    • "Anatidae chromosome evolution cytogenetics"
Don't: Keep searching the same terms repeatedly
Do: Escalate to higher taxonomic levels or comparative studies

当初始搜索失败时,系统地优化搜索词:
  1. 首次尝试:特定科学术语
    • "Anas acuta karyotype 2n"
  2. 第二次尝试:通用名称 + 科学名称
    • "northern pintail Anas acuta chromosome number"
  3. 第三次尝试:属级模式
    • "Anas genus karyotype waterfowl"
  4. 第四次尝试:科级研究
    • "Anatidae chromosome evolution cytogenetics"
不要: 重复搜索相同的术语
要: 升级到更高分类级别或比较研究
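The four-level escalation above can be expressed as an ordered fallback list. A minimal sketch with illustrative helper names; `search_fn` stands in for the actual search call:

```python
def escalating_queries(species, common_name, genus, family):
    """Build search terms from most specific to broadest taxonomic level."""
    return [
        f"{species} karyotype 2n",                      # 1. specific scientific terms
        f"{common_name} {species} chromosome number",   # 2. common + scientific name
        f"{genus} genus karyotype",                     # 3. genus-level patterns
        f"{family} chromosome evolution cytogenetics",  # 4. family-level studies
    ]

def search_with_fallback(queries, search_fn):
    """Try each query in order; stop at the first non-empty result."""
    for query in queries:
        result = search_fn(query)
        if result:
            return query, result
    return None, None  # document as "not found" to avoid re-searching
```

Stopping at the first hit avoids re-running the same terms, and a `None` result is worth recording so the species is not searched again next session.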

Token Savings Examples

Token节省示例

Example 1: Status Check

示例1:状态检查

Scenario: User asks "What's the status of my application?"
❌ Wasteful approach (50K tokens):
bash
Read: /var/log/app.log  # 40K tokens
Bash: systemctl status myapp  # 10K tokens
✅ Efficient approach (3K tokens):
bash
Bash: systemctl status myapp --no-pager | head -20  # 1K tokens
Bash: tail -50 /var/log/app.log  # 2K tokens
Savings: 94%

场景: 用户问"我的应用状态如何?"
❌ 低效方式(50K Token):
bash
Read: /var/log/app.log  # 40K Token
Bash: systemctl status myapp  # 10K Token
✅ 高效方式(3K Token):
bash
Bash: systemctl status myapp --no-pager | head -20  # 1K Token
Bash: tail -50 /var/log/app.log  # 2K Token
节省:94%

Example 2: Debugging Errors

示例2:调试错误

Scenario: User says "My script is failing, help debug"
❌ Wasteful approach (200K tokens):
bash
Read: debug.log  # 150K tokens
Read: script.py  # 30K tokens
Read: config.json  # 20K tokens
✅ Efficient approach (8K tokens):
bash
Bash: tail -100 debug.log  # 3K tokens
Bash: grep -i "error\|traceback" debug.log | tail -50  # 2K tokens
Grep: "def main" script.py  # 1K tokens
Read: script.py (offset: 120, limit: 50)  # 2K tokens (just the failing function)
Savings: 96%

场景: 用户说"我的脚本失败了,帮忙调试"
❌ 低效方式(200K Token):
bash
Read: debug.log  # 150K Token
Read: script.py  # 30K Token
Read: config.json  # 20K Token
✅ 高效方式(8K Token):
bash
Bash: tail -100 debug.log  # 3K Token
Bash: grep -i "error\|traceback" debug.log | tail -50  # 2K Token
Grep: "def main" script.py  # 1K Token
Read: script.py (offset: 120, limit: 50)  # 2K Token(仅失败函数)
节省:96%

Example 3: Code Review

示例3:代码审查

Scenario: User asks "Review this codebase"
❌ Wasteful approach (500K tokens):
bash
Read: file1.py
Read: file2.py
Read: file3.py
Read: file4.py
场景: 用户问"审查这个代码库"
❌ 低效方式(500K Token):
bash
Read: file1.py
Read: file2.py
Read: file3.py
Read: file4.py

... reads 20+ files

... 读取20+个文件


**✅ Efficient approach (20K tokens):**
Bash: find . -name "*.py" | head -30  # 1K
Bash: cloc .  # Lines of code summary - 1K
Bash: grep -r "^class " --include="*.py" | head -20  # 2K
Bash: grep -r "^def " --include="*.py" | wc -l  # 1K
Read: main.py (limit: 100)  # 3K
Read: README.md  # 5K
Grep: "TODO\|FIXME\|XXX" -r .  # 2K

**✅ 高效方式(20K Token):**
Bash: find . -name "*.py" | head -30  # 1K
Bash: cloc .  # 代码行数总结 - 1K
Bash: grep -r "^class " --include="*.py" | head -20  # 2K
Bash: grep -r "^def " --include="*.py" | wc -l  # 1K
Read: main.py (limit: 100)  # 3K
Read: README.md  # 5K
Grep: "TODO\|FIXME\|XXX" -r .  # 2K

Then ask user what specific areas to review

然后询问用户要审查的特定领域

**Savings: 96%**

---
**节省:96%**

---

When to Override These Guidelines

何时覆盖这些准则

Override efficiency rules when:
  1. User explicitly requests full output:
    • "Show me the entire log file"
    • "Read the full source code"
    • "I don't care about token cost"
  2. Filtered output lacks necessary context:
    • Error message references line numbers not in filtered output
    • Need to understand full data flow
    • Debugging requires seeing complete state
  3. File is known to be small:
    • File is < 200 lines
    • Config files with minimal content
    • Small documentation files
  4. Learning code structure and architecture (IMPORTANT):
    • User is exploring a new codebase to understand its organization
    • Learning coding patterns, idioms, or best practices from existing code
    • Understanding how modules/classes are structured
    • Studying implementation approaches for educational purposes
    • Reading example code or reference implementations
    • Initial exploration phase before making changes
    Key indicators for learning mode:
    • User says: "help me understand this codebase", "how does X work?", "show me how this is implemented"
    • User is asking conceptual questions: "what patterns are used?", "how is this organized?"
    • User wants to learn from the code, not just debug or modify it
    • User is new to the project or technology
    In learning mode:
    ✅ DO: Read full files to show complete patterns and structure
    ✅ DO: Read multiple related files to show how components interact
    ✅ DO: Show full function/class implementations as examples
    ✅ DO: Explain code in detail with context
    
    ⚠️ BALANCE: Still use strategic efficiency (don't read 50 files at once)
    - Apply strategic file selection (see section below)
    - Read 2-5 key files fully to establish understanding
    - Use grep to find other relevant examples
    - Summarize patterns found across many files
    After learning phase, return to efficient mode for implementation.
In cases 1-3, explain to the user:
"This will use approximately [X]K tokens. Should I proceed?
Or would you prefer a filtered/summarized view first?"
In learning mode (case 4), prioritize understanding over token efficiency, but still be strategic about which files to read fully (see Strategic File Selection below).

覆盖效率规则的场景:
  1. 用户明确要求完整输出:
    • "显示整个日志文件"
    • "读取完整源代码"
    • "我不在乎Token成本"
  2. 过滤后的输出缺乏必要上下文:
    • 错误消息引用了过滤输出中没有的行号
    • 需要理解完整数据流
    • 调试需要查看完整状态
  3. 文件已知很小:
    • 文件<200行
    • 内容极少的配置文件
    • 小型文档文件
  4. 学习代码结构和架构(重要):
    • 用户正在探索新代码库以了解其组织方式
    • 从现有代码中学习编码模式、习惯用法或最佳实践
    • 理解模块/类的结构
    • 出于教育目的研究实现方法
    • 阅读示例代码或参考实现
    • 进行更改前的初始探索阶段
    学习模式的关键指标:
    • 用户说:"帮我理解这个代码库"、"X是如何工作的?"、"展示这个的实现方式"
    • 用户提出概念性问题:"使用了什么模式?"、"这是如何组织的?"
    • 用户想从代码中学习,而非仅调试或修改
    • 用户对项目或技术不熟悉
    在学习模式中:
    ✅ 要:读取完整文件以展示完整模式和结构
    ✅ 要:读取多个相关文件以展示组件如何交互
    ✅ 要:展示完整的函数/类实现作为示例
    ✅ 要:结合上下文详细解释代码
    
    ⚠️ 平衡:仍需使用策略性效率(不要一次性读取50个文件)
    - 应用策略性文件选择(见下文)
    - 完整读取2-5个关键文件以建立理解
    - 使用grep查找其他相关示例
    - 总结在多个文件中发现的模式
    学习阶段结束后,回到高效模式进行实现。
在场景1-3中,向用户说明:
"这将使用约[X]K Token。我可以继续吗?
或者您希望先查看过滤/总结后的内容?"
在学习模式(场景4)中,优先考虑理解而非Token效率,但仍需战略性地选择要完整读取的文件(见策略性文件选择)。

Strategic File Selection for Learning Mode

学习模式的策略性文件选择

When entering learning mode, first determine if this is broad exploration or targeted learning, then apply the appropriate strategy.
进入学习模式时,首先确定这是广泛探索还是针对性学习,然后应用相应策略。

Learning Mode Types

学习模式类型

Type 1: Broad Exploration - "Help me understand this codebase", "How is this organized?" → Use repository-based strategies below (identify type, read key files)
Type 2: Targeted Pattern Learning - "How do I implement X?", "Show me examples of Y" → Use targeted concept search (see Targeted Pattern Learning section below)

类型1:广泛探索 - "帮我理解这个代码库"、"这是如何组织的?" → 使用下文的基于仓库的策略(识别类型、读取关键文件)
类型2:针对性模式学习 - "我如何实现X?"、"展示Y的示例" → 使用针对性概念搜索(见下文的针对性模式学习部分)

Targeted Pattern Learning

针对性模式学习

When user asks about a specific technique or pattern, use this focused approach instead of broad exploration.
当用户询问特定技术或模式时,使用这种聚焦方法而非广泛探索。

Examples of Targeted Learning Queries

针对性学习查询示例

  • "How do variable number of outputs work in Galaxy wrappers?"
  • "Show me how to fetch invocation data from Galaxy API"
  • "How do I implement conditional parameters in Galaxy tools?"
  • "How does error handling work in this codebase?"
  • "Show me examples of async function patterns"
  • "How are tests structured for workflow X?"
  • "Galaxy包装器中的可变输出数量如何工作?"
  • "展示如何从Galaxy API获取调用数据"
  • "我如何在Galaxy工具中实现条件参数?"
  • "这个代码库中的错误处理是如何工作的?"
  • "展示异步函数模式的示例"
  • "工作流X的测试结构如何?"

Targeted Learning Workflow

针对性学习工作流

STEP 1: Identify the Specific Concept
Extract the key concept from user's question:
User: "How do variable number of outputs work in Galaxy wrappers?"
→ Concept: "variable number of outputs" OR "dynamic outputs"
→ Context: "Galaxy tool wrappers"
→ File types: "*.xml" (Galaxy tool wrappers)
User: "How to fetch invocation data from Galaxy API?"
→ Concept: "fetch invocation" OR "invocation data" OR "get invocation"
→ Context: "Galaxy API calls"
→ File types: "*.py" with Galaxy API usage

**STEP 2: Search for Examples**

Use targeted searches to find relevant code:

步骤1:识别特定概念
从用户的问题中提取关键概念:
用户:"Galaxy包装器中的可变输出数量如何工作?"
→ 概念:"variable number of outputs"或"dynamic outputs"
→ 上下文:"Galaxy tool wrappers"
→ 文件类型:"*.xml"(Galaxy工具包装器)
用户:"如何从Galaxy API获取调用数据?"
→ 概念:"fetch invocation"或"invocation data"或"get invocation"
→ 上下文:"Galaxy API calls"
→ 文件类型:使用Galaxy API的"*.py"文件

**步骤2:搜索示例**

使用针对性搜索查找相关代码:


For Galaxy variable outputs example

对于Galaxy可变输出示例

grep -rE 'discover_datasets|collection_type="list"' --include="*.xml" | head -20
grep -r "<outputs>" --include="*.xml" -A 10 | grep -iE "collection|discover"
grep -rE 'discover_datasets|collection_type="list"' --include="*.xml" | head -20
grep -r "<outputs>" --include="*.xml" -A 10 | grep -iE "collection|discover"

For Galaxy invocation fetching

对于Galaxy调用获取

grep -r "invocation" --include="*.py" -B 2 -A 5 | head -50
grep -rE "show_invocation|get_invocation" --include="*.py" -l
grep -r "invocation" --include="*.py" -B 2 -A 5 | head -50
grep -rE "show_invocation|get_invocation" --include="*.py" -l

For conditional parameters

对于条件参数

grep -r "<conditional" --include="*.xml" -l | head -10
grep -r "<conditional" --include="*.xml" -l | head -10

For error handling patterns

对于错误处理模式

grep -rE "try:|except|raise" --include="*.py" -l | xargs grep -l "class.*Error"

**STEP 3: Rank and Select Examples**

**Selection criteria (in priority order):**

1. **Documentation/Comments** - Files with good comments explaining the pattern
   # Find well-documented examples
   grep -r "pattern-keyword" --include="*.py" -B 5 | grep -E "^\s*#|^\s*\"\"\"" | wc -l
2. **Simplicity** - Simpler examples are better for learning
   # Find shorter files (likely simpler)
   grep -rl "pattern-keyword" --include="*.py" | xargs wc -l | sort -n | head -5
3. **Recency** - Recent code shows current best practices
   # Find recent examples
   grep -rl "pattern-keyword" --include="*.py" | xargs ls -lt | head -5
4. **Multiple variations** - Show different approaches if they exist
   # Compare different implementations
   grep -r "pattern-keyword" --include="*.py" -l | head -3
STEP 4: Read Examples Fully
Read 2-3 selected examples completely to understand the pattern:
grep -rE "try:|except|raise" --include="*.py" -l | xargs grep -l "class.*Error"

**步骤3:排序并选择示例**

**选择标准(优先级顺序):**

1. **文档/注释** - 包含良好注释解释模式的文件
   # 查找文档完善的示例
   grep -r "pattern-keyword" --include="*.py" -B 5 | grep -E "^\s*#|^\s*\"\"\"" | wc -l
2. **简洁性** - 更简单的示例更适合学习
   # 查找更短的文件(可能更简单)
   grep -rl "pattern-keyword" --include="*.py" | xargs wc -l | sort -n | head -5
3. **时效性** - 近期代码展示当前最佳实践
   # 查找近期示例
   grep -rl "pattern-keyword" --include="*.py" | xargs ls -lt | head -5
4. **多种变体** - 如果存在,展示不同实现方式
   # 比较不同实现
   grep -r "pattern-keyword" --include="*.py" -l | head -3
步骤4:完整读取示例
完整读取2-3个选定示例以理解模式:

Example: Variable outputs in Galaxy

示例:Galaxy中的可变输出

After finding: tools/tool1.xml, tools/tool2.xml, tools/advanced.xml

找到后:tools/tool1.xml, tools/tool2.xml, tools/advanced.xml

Read: tools/tool1.xml  # Simple example
Read: tools/tool2.xml  # Standard example
Read: tools/advanced.xml  # Complex variation (if needed)

**STEP 5: Extract and Explain the Pattern**

After reading examples, explain:
1. **The core pattern** - How it works conceptually
2. **Required elements** - What's needed to implement it
3. **Common variations** - Different ways to use it
4. **Common pitfalls** - What to avoid
5. **Best practices** - Recommended approach

---
Read: tools/tool1.xml  # 简单示例
Read: tools/tool2.xml  # 标准示例
Read: tools/advanced.xml  # 复杂变体(如果需要)

**步骤5:提取并解释模式**

读取示例后,解释:
1. **核心模式** - 概念上如何工作
2. **必要元素** - 实现所需的内容
3. **常见变体** - 使用的不同方式
4. **常见陷阱** - 需要避免的问题
5. **最佳实践** - 推荐的方法
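The ranking criteria from STEP 3 (prefer simpler, more recent files) can be sketched as a sort key. A minimal sketch; `rank_examples` is a hypothetical helper name:

```python
import os

def rank_examples(paths):
    """Sort candidate example files: fewer lines first (simpler),
    then most recently modified (current best practices)."""
    def score(path):
        with open(path) as f:
            n_lines = sum(1 for _ in f)  # file length as a simplicity proxy
        # negative mtime so newer files sort earlier among equal lengths
        return (n_lines, -os.path.getmtime(path))
    return sorted(paths, key=score)
```

Reading only the top 2-3 ranked files keeps the learning phase focused instead of reading every match.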

---

Targeted Learning Examples

针对性学习示例

Example 1: Variable Number of Outputs in Galaxy Wrappers

示例1:Galaxy包装器中的可变输出数量

User query: "How do variable number of outputs work in Galaxy wrappers?"
Execution:
用户查询: "Galaxy包装器中的可变输出数量如何工作?"
执行:

Step 1: Identify concept

步骤1:识别概念

Concept: dynamic outputs, discover_datasets

概念:dynamic outputs, discover_datasets

File type: Galaxy XML wrappers (*.xml)

文件类型:Galaxy XML包装器(*.xml)

Step 2: Search for examples

步骤2:搜索示例

grep -r "discover_datasets" --include="*.xml" -l
grep -r "discover_datasets" --include="*.xml" -l

Output: tools/samtools/samtools_merge.xml, tools/kraken2/kraken2.xml, ...

输出:tools/samtools/samtools_merge.xml, tools/kraken2/kraken2.xml, ...

Step 3: Rank examples

步骤3:排序示例

ls -lt tools/*/samtools_merge.xml tools/*/kraken2.xml
ls -lt tools/*/samtools_merge.xml tools/*/kraken2.xml

Pick: samtools_merge.xml (recent, well-maintained)

选择:samtools_merge.xml(近期,维护良好)

Step 4: Read example fully

步骤4:完整读取示例

Read: tools/samtools/samtools_merge.xml
Read: tools/samtools/samtools_merge.xml

Step 5: Find another variation

步骤5:查找另一种变体

grep -r 'collection_type="list"' --include="*.xml" -l | head -1
Read: tools/example/collection_output.xml

**Explanation provided:**
I found two main approaches for variable outputs in Galaxy:
  1. discover_datasets pattern (samtools_merge.xml:45-52):
    • Use <discover_datasets> in output section
    • Specify pattern and format
    • Galaxy auto-detects files matching pattern
  2. output_collection pattern (collection_output.xml:78-85):
    • Define collection type (list, paired, etc.)
    • Use discover_datasets within collection
    • For grouped/structured outputs
Key requirements:
  • Set discover_datasets pattern attribute
  • Specify format for discovered files
  • Optional: Use directory attribute for subdirectories
Would you like me to show a specific implementation for your use case?

---
grep -r 'collection_type="list"' --include="*.xml" -l | head -1
Read: tools/example/collection_output.xml

**提供的解释:**
我在Galaxy中找到了两种主要的可变输出实现方式:
  1. discover_datasets模式(samtools_merge.xml:45-52):
    • 在输出部分使用<discover_datasets>
    • 指定模式和格式
    • Galaxy自动检测匹配模式的文件
  2. output_collection模式(collection_output.xml:78-85):
    • 定义集合类型(list、paired等)
    • 在集合内使用discover_datasets
    • 用于分组/结构化输出
关键要求:
  • 设置discover_datasets的pattern属性
  • 指定发现文件的格式
  • 可选:使用directory属性指定子目录
您需要我展示针对您的使用场景的特定实现吗?

---

Example 2: Fetching Invocation Data from Galaxy API

示例2:从Galaxy API获取调用数据

User query: "How do I fetch invocation data using the Galaxy API?"
Execution:
用户查询: "如何使用Galaxy API获取调用数据?"
执行:

Step 1: Identify concept

步骤1:识别概念

Concept: fetch invocation, show_invocation, invocation data

概念:fetch invocation, show_invocation, invocation data

Context: Galaxy API, bioblend usage

上下文:Galaxy API, bioblend使用

File type: Python files with bioblend/API calls

文件类型:使用bioblend/API调用的Python文件

Step 2: Search for examples

步骤2:搜索示例

grep -rE "show_invocation|get_invocation" --include="*.py" -l
grep -rE "show_invocation|get_invocation" --include="*.py" -l

Output: galaxy_client.py, orchestrator.py, check_status.py

输出:galaxy_client.py, orchestrator.py, check_status.py

grep -rE "invocations\.show|show_invocation" --include="*.py" -B 3 -A 8
grep -rE "invocations\.show|show_invocation" --include="*.py" -B 3 -A 8

Shows actual usage patterns

展示实际使用模式

Step 3: Rank examples

步骤3:排序示例

galaxy_client.py - likely has core API wrapper functions

galaxy_client.py - 可能包含核心API包装函数

orchestrator.py - shows usage in context

orchestrator.py - 展示上下文使用

Pick both for different perspectives

选择两者以获取不同视角

Step 4: Read examples

步骤4:读取示例

Read: batch_vgp_run/galaxy_client.py
Read: batch_vgp_run/galaxy_client.py

Focus on invocation-related functions

聚焦于调用相关函数

grep -n "def.*invocation" batch_vgp_run/galaxy_client.py
grep -n "def.*invocation" batch_vgp_run/galaxy_client.py

Shows: check_invocation_complete (line 250), rerun_failed_invocation (line 847)

显示:check_invocation_complete(第250行),rerun_failed_invocation(第847行)

Read: batch_vgp_run/galaxy_client.py (offset: 245, limit: 60)
Read: batch_vgp_run/galaxy_client.py (offset: 840, limit: 70)

**Explanation provided:**
I found the pattern for fetching invocation data (galaxy_client.py:250-285):
Basic invocation fetch:
python
invocation = gi.invocations.show_invocation(invocation_id)
state = invocation['state']  # 'ok', 'running', 'failed', 'cancelled'
Detailed invocation with steps:
python
invocation = gi.invocations.show_invocation(invocation_id, include_workflow_steps=True)
steps = invocation.get('steps', {})
for step_id, step_data in steps.items():
    step_state = step_data['state']
    job_id = step_data.get('job_id')
Key fields available:
  • invocation['state'] - Overall invocation status
  • invocation['steps'] - Individual step details
  • invocation['workflow_id'] - Associated workflow
  • invocation['history_id'] - Target history
  • invocation['update_time'] - Last update timestamp
Common patterns:
  1. Poll for completion (galaxy_client.py:250-285)
  2. Get failed step details (galaxy_client.py:320-340)
  3. Rerun with modifications (galaxy_client.py:847-899)
Which pattern do you need for your use case?

---
Read: batch_vgp_run/galaxy_client.py (offset: 245, limit: 60)
Read: batch_vgp_run/galaxy_client.py (offset: 840, limit: 70)

**提供的解释:**
我找到了获取调用数据的模式(galaxy_client.py:250-285):
基本调用获取:
python
invocation = gi.invocations.show_invocation(invocation_id)
state = invocation['state']  # 'ok', 'running', 'failed', 'cancelled'
带步骤的详细调用:
python
invocation = gi.invocations.show_invocation(invocation_id, include_workflow_steps=True)
steps = invocation.get('steps', {})
for step_id, step_data in steps.items():
    step_state = step_data['state']
    job_id = step_data.get('job_id')
可用的关键字段:
  • invocation['state'] - 整体调用状态
  • invocation['steps'] - 单个步骤详情
  • invocation['workflow_id'] - 关联工作流
  • invocation['history_id'] - 目标历史
  • invocation['update_time'] - 最后更新时间戳
常见模式:
  1. 轮询完成状态(galaxy_client.py:250-285)
  2. 获取失败步骤详情(galaxy_client.py:320-340)
  3. 修改后重新运行(galaxy_client.py:847-899)
您的使用场景需要哪种模式?

---

Example 3: Conditional Parameters in Galaxy Tools

示例3:Galaxy工具中的条件参数

User query: "How do conditional parameters work in Galaxy tool wrappers?"
Execution:
用户查询: "Galaxy工具包装器中的条件参数如何工作?"
执行:

Step 1: Identify concept

步骤1:识别概念

Concept: conditional parameters, when expression

概念:conditional parameters, when expression

File type: Galaxy XML wrappers

文件类型:Galaxy XML包装器

Step 2: Search

步骤2:搜索

grep -r "<conditional" --include="*.xml" -l | head -10
grep -r "<conditional" --include="*.xml" -A 15 | head -50
grep -r "<conditional" --include="*.xml" -l | head -10
grep -r "<conditional" --include="*.xml" -A 15 | head -50

Step 3: Find simple example first

步骤3:先找简单示例

grep -rl "<conditional" --include="*.xml" | xargs wc -l | sort -n | head -3
grep -rl "<conditional" --include="*.xml" | xargs wc -l | sort -n | head -3

Pick shortest file with conditionals

选择最短的包含条件的文件

Step 4: Read examples

步骤4:读取示例

Read: tools/simple-tool/simple_conditional.xml  # Simple case
Read: tools/complex-tool/advanced_conditional.xml  # Nested case

**Explanation provided:**
Conditional parameters in Galaxy (simple_conditional.xml:34-58):
Basic structure:
xml
<conditional name="output_choice">
    <param name="output_type" type="select" label="Output type">
        <option value="single">Single file</option>
        <option value="collection">Collection</option>
    </param>

    <when value="single">
        <param name="format" type="select" label="Format">
            <option value="txt">Text</option>
            <option value="csv">CSV</option>
        </param>
    </when>

    <when value="collection">
        <param name="collection_type" type="select" label="Collection type">
            <option value="list">List</option>
            <option value="paired">Paired</option>
        </param>
    </when>
</conditional>
In command block (Cheetah syntax):
xml
#if $output_choice.output_type == "single":
    --format ${output_choice.format}
#else:
    --collection-type ${output_choice.collection_type}
#end if
Advanced: Nested conditionals (advanced_conditional.xml:67-120):
  • Conditionals can contain other conditionals
  • Each <when> is independent
  • Access nested values: ${outer.inner.value}
Would you like to see nested examples or specific use case?

---
Read: tools/simple-tool/simple_conditional.xml  # 简单场景
Read: tools/complex-tool/advanced_conditional.xml  # 嵌套场景

**提供的解释:**
Galaxy中的条件参数(simple_conditional.xml:34-58):
基本结构:
xml
<conditional name="output_choice">
    <param name="output_type" type="select" label="Output type">
        <option value="single">Single file</option>
        <option value="collection">Collection</option>
    </param>

    <when value="single">
        <param name="format" type="select" label="Format">
            <option value="txt">Text</option>
            <option value="csv">CSV</option>
        </param>
    </when>

    <when value="collection">
        <param name="collection_type" type="select" label="Collection type">
            <option value="list">List</option>
            <option value="paired">Paired</option>
        </param>
    </when>
</conditional>
在命令块中(Cheetah语法):
xml
#if $output_choice.output_type == "single":
    --format ${output_choice.format}
#else:
    --collection-type ${output_choice.collection_type}
#end if
高级:嵌套条件(advanced_conditional.xml:67-120):
  • 条件中可以包含其他条件
  • 每个<when>独立
  • 访问嵌套值:${outer.inner.value}
您需要查看嵌套示例或特定使用场景吗?

---

When to Use Targeted Learning

何时使用针对性学习

Use targeted learning when user:
  • ✅ Asks "how do I..." about specific feature
  • ✅ Requests "show me examples of X"
  • ✅ Wants to learn specific pattern/technique
  • ✅ Has focused technical question
  • ✅ References specific concept/API/feature
Don't use for:
  • ❌ "Understand this codebase" (use broad exploration)
  • ❌ "What does this project do?" (use documentation reading)
  • ❌ "Debug this error" (use debugging mode, not learning mode)

在用户有以下情况时,使用针对性学习:
  • ✅ 询问"我如何..."关于特定功能
  • ✅ 请求"展示X的示例"
  • ✅ 想学习特定模式/技术
  • ✅ 有聚焦的技术问题
  • ✅ 提及特定概念/API/功能
不要用于:
  • ❌ "理解这个代码库"(使用广泛探索)
  • ❌ "这个项目是做什么的?"(使用文档读取)
  • ❌ "调试这个错误"(使用调试模式,而非学习模式)

Key Principles for Targeted Learning

针对性学习的核心原则

  1. Search first, read second
    • Use grep to find relevant examples
    • Rank by quality/simplicity/recency
    • Then read selected examples fully
  2. Read 2-3 examples, not 20
    • Simple example (minimal working code)
    • Standard example (common usage)
    • Complex example (advanced features) - optional
  3. Extract the pattern
    • Don't just show code, explain the pattern
    • Highlight key elements and structure
    • Show variations and alternatives
  4. Provide context
    • Where this pattern is used
    • When to use it vs alternatives
    • Common pitfalls and best practices
  5. Confirm understanding
    • Ask if user needs specific variation
    • Offer to show related patterns
    • Check if explanation answered their question

  1. 先搜索,后读取
    • 使用grep查找相关示例
    • 按质量/简洁性/时效性排序
    • 然后完整读取选定示例
  2. 读取2-3个示例,而非20个
    • 简单示例(最小可工作代码)
    • 标准示例(常见用法)
    • 复杂示例(高级功能)- 可选
  3. 提取模式
    • 不要仅展示代码,要解释模式
    • 突出关键元素和结构
    • 展示变体和替代方案
  4. 提供上下文
    • 此模式的使用场景
    • 何时使用它而非替代方案
    • 常见陷阱和最佳实践
  5. 确认理解
    • 询问用户是否需要特定变体
    • 提供展示相关模式的选项
    • 检查解释是否回答了他们的问题
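The "search first, read second" flow above can be sketched as a short shell session. The file names, contents, and the `conditional` pattern below are illustrative, not from a real repository:

```shell
# Demo collection (illustrative file names and contents)
mkdir -p tools
printf '<conditional name="x"/>\n' > tools/simple.xml
printf '<tool>\n  <conditional name="y"/>\n  <extra/>\n</tool>\n' > tools/complex.xml
printf '<tool/>\n' > tools/other.xml

# 1. Search first: list only the files that actually use the feature
grep -l "conditional" tools/*.xml > candidates.txt

# 2. Rank by simplicity: the smallest match is usually the clearest example
while read -r f; do wc -l "$f"; done < candidates.txt | sort -n

# 3. Read second: fully read only the top 2-3 ranked files, not every match
head -50 tools/simple.xml
```

Ranking by size is one heuristic; running `ls -lt` on the candidates ranks by recency instead.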

General Exploration vs Targeted Learning

广泛探索 vs 针对性学习

When user says → Use this approach:
| User Request | Approach | Strategy |
|---|---|---|
| "Help me understand this codebase" | General Exploration | Identify repo type → Read key files |
| "How is this project organized?" | General Exploration | Read docs → Entry points → Architecture |
| "Show me how to implement X" | Targeted Learning | Search for X → Read examples → Extract pattern |
| "How does feature Y work?" | Targeted Learning | Grep for Y → Find best examples → Explain |
| "What patterns are used here?" | General Exploration | Read core files → Identify patterns |
| "How do I use API method Z?" | Targeted Learning | Search for Z usage → Show examples |

当用户说 → 使用此方法:
| 用户请求 | 方法 | 策略 |
|---|---|---|
| "帮我理解这个代码库" | 广泛探索 | 识别仓库类型 → 读取关键文件 |
| "这个项目是如何组织的?" | 广泛探索 | 读取文档 → 入口点 → 架构 |
| "展示如何实现X" | 针对性学习 | 搜索X → 读取示例 → 提取模式 |
| "功能Y如何工作?" | 针对性学习 | Grep查找Y → 找到最佳示例 → 解释 |
| "这里使用了什么模式?" | 广泛探索 | 读取核心文件 → 识别模式 |
| "如何使用API方法Z?" | 针对性学习 | 搜索Z的使用 → 展示示例 |

Broad Repository Exploration

广泛仓库探索

When entering broad exploration mode, first identify the repository context, then apply the appropriate exploration strategy.
进入广泛探索模式时,首先识别仓库上下文,然后应用相应的探索策略。

STEP 1: Identify Repository Type

步骤1:识别仓库类型

Ask these questions or check indicators:
询问这些问题或检查指标:

Check for multiple independent tools/packages

检查是否有多个独立工具/包

ls -d */ | wc -l                         # Many directories at root level?
ls recipes/ tools/ packages/ 2>/dev/null # Collection structure?
ls -d */ | wc -l                         # 根目录有很多目录吗?
ls recipes/ tools/ packages/ 2>/dev/null # 集合结构?

Check for submission/contribution guidelines

检查提交/贡献指南

ls -la | grep -Ei "contrib|guideline|submiss"
cat CONTRIBUTING.md README.md 2>/dev/null | grep -Ei "structure|organization|layout"
ls -la | grep -Ei "contrib|guideline|submiss"
cat CONTRIBUTING.md README.md 2>/dev/null | grep -Ei "structure|organization|layout"

Check for monolithic vs modular structure

检查单体式 vs 模块化结构

find . -name "setup.py" -o -name "package.json" -o -name "Cargo.toml" | wc -l
find . -name "setup.py" -o -name "package.json" -o -name "Cargo.toml" | wc -l

1 = monolithic, many = multi-package

1 = 单体式,多个 = 多包

Check for specific patterns

检查特定模式

ls -F | grep -E "recipes/|tools/|workflows/|plugins/|examples/"

**Repository type indicators:**

1. **Tool Library / Recipe Collection** (bioconda, tool collections)
   - Multiple independent directories at same level
   - Each subdirectory is self-contained
   - Examples: `recipes/tool1/`, `recipes/tool2/`, `workflows/workflow-a/`
   - Indicator files: `recipes/`, `tools/`, `packages/`, multiple `meta.yaml` or `package.json`

2. **Monolithic Application** (single integrated codebase)
   - One main entry point
   - Hierarchical module structure
   - Shared dependencies and utilities
   - Examples: `src/`, `lib/`, single `setup.py`, `main.py`
   - Indicator files: Single `setup.py`, `main.py`, `__init__.py`, `src/` directory

3. **Framework / SDK** (extensible system)
   - Core framework + plugins/extensions
   - Base classes and interfaces
   - Examples: `core/`, `plugins/`, `extensions/`, `base/`
   - Indicator files: `core/`, `plugins/`, documentation on extending

4. **Example / Template Repository**
   - Multiple example implementations
   - Each directory shows different pattern
   - Examples: `examples/`, `samples/`, `templates/`
   - Indicator files: `examples/`, `README` in each subdirectory

---
ls -F | grep -E "recipes/|tools/|workflows/|plugins/|examples/"

**仓库类型指标:**

1. **工具库/配方集合**(bioconda、工具集合)
   - 根目录有多个独立目录
   - 每个子目录自包含
   - 示例:`recipes/tool1/`, `recipes/tool2/`, `workflows/workflow-a/`
   - 指标文件:`recipes/`, `tools/`, `packages/`, 多个`meta.yaml`或`package.json`

2. **单体式应用**(单一集成代码库)
   - 一个主入口点
   - 分层模块结构
   - 共享依赖和工具
   - 示例:`src/`, `lib/`, 单个`setup.py`, `main.py`
   - 指标文件:单个`setup.py`, `main.py`, `__init__.py`, `src/`目录

3. **框架/SDK**(可扩展系统)
   - 核心框架 + 插件/扩展
   - 基类和接口
   - 示例:`core/`, `plugins/`, `extensions/`, `base/`
   - 指标文件:`core/`, `plugins/`, 关于扩展的文档

4. **示例/模板仓库**
   - 多个示例实现
   - 每个目录展示不同模式
   - 示例:`examples/`, `samples/`, `templates/`
   - 指标文件:`examples/`, 每个子目录中的README

---

STEP 2: Apply Context-Specific Strategy

步骤2:应用上下文特定策略

Strategy A: Tool Library / Recipe Collection

策略A:工具库/配方集合

Goal: Learn the pattern from representative examples
Approach:
目标: 从代表性示例中学习模式
方法:

1. Find most recently modified (shows current best practices)

1. 查找最近修改的(展示当前最佳实践)

ls -lt recipes/ | head -10 # or tools/, workflows/, etc.
ls -lt recipes/ | head -10 # 或tools/, workflows/等

2. Find most common patterns

2. 查找最常见的模式

find recipes/ -name "meta.yaml" -o -name "*.xml" | head -1 | xargs dirname
find recipes/ -name "meta.yaml" -o -name "*.xml" | head -1 | xargs dirname

3. Read submission guidelines first

3. 先读取提交指南

cat CONTRIBUTING.md README.md | grep -A 20 -Ei "structure|format|template"
cat CONTRIBUTING.md README.md | grep -A 20 -Ei "structure|format|template"

4. Read 2-3 representative examples

4. 读取2-3个代表性示例

Pick: 1 recent, 1 complex, 1 simple

选择:1个近期、1个复杂、1个简单

ls -lt recipes/ | head -3

**Files to read (in order):**
1. `CONTRIBUTING.md` or submission guidelines → Learn required structure
2. Recent tool/recipe → Current best practices
3. Well-established tool/recipe → Proven patterns
4. Template or example → Base structure

**Example:**
ls -lt recipes/ | head -3

**读取文件顺序:**
1. `CONTRIBUTING.md`或提交指南 → 学习所需结构
2. 近期工具/配方 → 当前最佳实践
3. 成熟工具/配方 → 已验证模式
4. 模板或示例 → 基础结构

**示例:**

For bioconda-style repository

对于bioconda风格的仓库

Read: CONTRIBUTING.md
ls -lt recipes/ | head -5  # Pick a recent one
Read: recipes/recent-tool/meta.yaml
Read: recipes/established-tool/meta.yaml  # Compare patterns

---
Read: CONTRIBUTING.md
ls -lt recipes/ | head -5  # 选择一个近期的
Read: recipes/recent-tool/meta.yaml
Read: recipes/established-tool/meta.yaml  # 对比模式

---

Strategy B: Monolithic Application

策略B:单体式应用

Goal: Understand execution flow and architecture
Approach:
目标: 理解执行流和架构
方法:

1. Find entry point

1. 找到入口点

find . -name "main.py" -o -name "app.py" -o -name "run*.py" | grep -v test | head -5
find . -name "main.py" -o -name "app.py" -o -name "run*.py" | grep -v test | head -5

2. Find most imported modules (core components)

2. 找到最常导入的模块(核心组件)

grep -rE "^import|^from" --include="*.py" . |
sed 's/.*import //' | cut -d' ' -f1 | cut -d'.' -f1 |
sort | uniq -c | sort -rn | head -10
grep -rE "^import|^from" --include="*.py" . |
sed 's/.*import //' | cut -d' ' -f1 | cut -d'.' -f1 |
sort | uniq -c | sort -rn | head -10

3. Find orchestrators/managers

3. 找到编排器/管理器

find . -name "*manager.py" -o -name "*orchestrator.py" -o -name "*controller.py"
find . -name "*manager.py" -o -name "*orchestrator.py" -o -name "*controller.py"

4. Check recent changes (active development areas)

4. 检查近期更改(活跃开发区域)

git log --name-only --pretty=format: --since="1 month ago" |
sort | uniq -c | sort -rn | head -10

**Files to read (in order):**
1. `README.md` → Overview and architecture
2. Entry point (`main.py`, `run_all.py`) → Execution flow
3. Core orchestrator/manager → Main logic
4. Most-imported utility module → Common patterns
5. One domain-specific module → Implementation details

**Example:**
git log --name-only --pretty=format: --since="1 month ago" |
sort | uniq -c | sort -rn | head -10

**读取文件顺序:**
1. `README.md` → 概览和架构
2. 入口点(`main.py`, `run_all.py`)→ 执行流
3. 核心编排器/管理器 → 主逻辑
4. 最常导入的工具模块 → 常见模式
5. 一个领域特定模块 → 实现细节

**示例:**

For Python application

对于Python应用

Read: README.md
Read: main.py                            # Entry point
grep "^from.*import" main.py | head -10  # See what it imports
Read: src/orchestrator.py                # Core component
Read: src/utils.py                       # Common utilities

---
Read: README.md
Read: main.py                            # 入口点
grep "^from.*import" main.py | head -10  # 查看它导入的内容
Read: src/orchestrator.py                # 核心组件
Read: src/utils.py                       # 常见工具

---

Strategy C: Framework / SDK

策略C:框架/SDK

Goal: Understand core abstractions and extension points
Approach:
目标: 理解核心抽象和扩展点
方法:

1. Find base classes and interfaces

1. 找到基类和接口

grep -rE "^class.*(Base|Interface|Abstract)" --include="*.py" . | head -10
grep -rE "^class.*(Base|Interface|Abstract)" --include="*.py" . | head -10

2. Find core module

2. 找到核心模块

ls -F | grep -E "core/|base/|framework/"
ls -F | grep -E "core/|base/|framework/"

3. Find plugin/extension examples

3. 找到插件/扩展示例

ls -F | grep -E "plugins?/|extensions?/|examples?/"
ls -F | grep -E "plugins?/|extensions?/|examples?/"

4. Check documentation for architecture

4. 检查架构文档

find . -name "*.md" | xargs grep -lEi "architecture|design|pattern" | head -5

**Files to read (in order):**
1. Architecture documentation → Design philosophy
2. Base/core classes → Fundamental abstractions
3. Simple plugin/extension → How to extend
4. Complex plugin/extension → Advanced patterns

**Example:**
find . -name "*.md" | xargs grep -lEi "architecture|design|pattern" | head -5

**读取文件顺序:**
1. 架构文档 → 设计理念
2. 基础/核心类 → 基本抽象
3. 简单插件/扩展 → 如何扩展
4. 复杂插件/扩展 → 高级模式

**示例:**

For plugin-based framework

对于基于插件的框架

Read: docs/architecture.md
Read: core/base.py               # Base classes
Read: plugins/simple-example/    # How to extend
Read: plugins/advanced-example/  # Advanced usage

---
Read: docs/architecture.md
Read: core/base.py               # 基类
Read: plugins/simple-example/    # 如何扩展
Read: plugins/advanced-example/  # 高级用法

---

Strategy D: Example / Template Repository

策略D:示例/模板仓库

Goal: Learn different patterns and use cases
Approach:
目标: 学习不同模式和用例
方法:

1. List all examples

1. 列出所有示例

ls -d examples/*/ samples/*/ templates/*/
ls -d examples/*/ samples/*/ templates/*/

2. Read index/catalog if available

2. 如果可用,读取索引/目录

cat examples/README.md examples/INDEX.md
cat examples/README.md examples/INDEX.md

3. Pick representative examples

3. 选择代表性示例

- Simple/basic example

- 简单/基础示例

- Medium complexity

- 中等复杂度

- Advanced/complete example

- 高级/完整示例


**Files to read (in order):**
1. `examples/README.md` → Overview of examples
2. Basic example → Minimal working pattern
3. Advanced example → Full-featured pattern
4. Compare differences → Learn progression
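Step 4 ("compare differences") can be done mechanically with `diff`; the example paths and config contents below are hypothetical:

```shell
# Hypothetical basic vs advanced example configs
mkdir -p examples/basic examples/advanced
printf 'steps: 1\n' > examples/basic/config.yml
printf 'steps: 1\nretry: 3\nlog: full\n' > examples/advanced/config.yml

# The diff shows exactly what the advanced pattern adds on top of the basic one
diff examples/basic/config.yml examples/advanced/config.yml || true  # diff exits 1 when files differ
```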

---

**读取文件顺序:**
1. `examples/README.md` → 示例概览
2. 基础示例 → 最小可工作模式
3. 高级示例 → 全功能模式
4. 对比差异 → 学习演进

---

STEP 3: Execution Strategy Template

步骤3:执行策略模板

For ANY repository type, use this workflow:
对于任何仓库类型,使用此工作流:

PHASE 1: Context Discovery (always token-efficient)

阶段1:上下文发现(始终Token高效)

ls -la                                     # Repository structure
cat README.md                              # Overview
ls -la .github/ docs/ | head -20           # Find documentation
cat CONTRIBUTING.md 2>/dev/null | head -50 # Submission guidelines
ls -la                                     # 仓库结构
cat README.md                              # 概览
ls -la .github/ docs/ | head -20           # 查找文档
cat CONTRIBUTING.md 2>/dev/null | head -50 # 提交指南

PHASE 2: Identify Type (ask user if unclear)

阶段2:识别类型(如果不明确,询问用户)

"I see this repository has [X structure]. Is this:
A) A tool library where each tool is independent?
B) A monolithic application with integrated components?
C) A framework with core + plugins?
D) A collection of examples/templates?
This helps me choose the best files to learn from."
"我看到这个仓库有[X结构]。它是:
A) 每个工具独立的工具库?
B) 组件集成的单体式应用?
C) 核心+插件的框架?
D) 示例/模板集合?
这有助于我选择最佳的学习文件。"

PHASE 3: Strategic Reading (based on type)

阶段3:策略性读取(基于类型)

[Apply appropriate strategy A/B/C/D from above]
Read 2-5 key files fully
Grep for patterns across remaining files
[应用上述策略A/B/C/D]
完整读取2-5个关键文件
在剩余文件中使用grep查找模式

PHASE 4: Summarize and Confirm

阶段4:总结并确认

"Based on [files read], I understand:
  • Pattern/architecture: [summary]
  • Key components: [list]
  • Common patterns: [examples]
Is this the area you want to focus on, or should I explore [other aspect]?"

---
"基于[已读文件],我理解:
  • 模式/架构:[总结]
  • 关键组件:[列表]
  • 常见模式:[示例]
这是您要关注的领域,还是我应该探索[其他方面]?"

---

File Selection Priorities (General Rules)

文件选择优先级(通用规则)

**Priority 1: Documentation**
README.md, CONTRIBUTING.md, docs/architecture.md

**优先级1:文档**
README.md, CONTRIBUTING.md, docs/architecture.md

These explain intent, not just implementation

这些解释意图,而非仅实现


**Priority 2: Entry Points**

**优先级2:入口点**

Monolithic: main.py, app.py, run.py, __main__.py

单体式:main.py, app.py, run.py, __main__.py

Library: Most recent example in collection

库:集合中最新的示例


**Priority 3: Core Components**

**优先级3:核心组件**

Most imported modules

最常导入的模块

grep -r "import" . | cut -d: -f2- | sort | uniq -c | sort -rn
grep -r "import" . | cut -d: -f2- | sort | uniq -c | sort -rn

"Manager", "Controller", "Orchestrator", "Core", "Base"

"Manager", "Controller", "Orchestrator", "Core", "Base"

find . -name "*manager*" -o -name "*core*" -o -name "*base*"

**Priority 4: Representative Examples**
find . -name "*manager*" -o -name "*core*" -o -name "*base*"

**优先级4:代表性示例**

Recent files (current best practices)

近期文件(当前最佳实践)

ls -lt directory/ | head -5
ls -lt directory/ | head -5

Medium complexity (not too simple, not too complex)

中等复杂度(不太简单,也不太复杂)

wc -l **/*.py | sort -n | awk 'NR > 10 && NR < 20'

**Priority 5: Active Development Areas**
```bash
wc -l **/*.py | sort -n | awk 'NR > 10 && NR < 20'

**优先级5:活跃开发区域**
```bash

Git history (if available)

Git历史(如果可用)

git log --name-only --since="1 month ago" --pretty=format: | sort | uniq -c | sort -rn

---
git log --name-only --since="1 month ago" --pretty=format: | sort | uniq -c | sort -rn

---

Practical Examples

实际示例

Example 1: Learning bioconda recipe patterns
示例1:学习bioconda配方模式

Step 1: Identify type

步骤1:识别类型

ls recipes/ | wc -l
ls recipes/ | wc -l

Output: 3000+ → Tool library

输出:3000+ → 工具库

Step 2: Check guidelines

步骤2:检查指南

Read: CONTRIBUTING.md # Learn structure requirements
Read: CONTRIBUTING.md # 学习结构要求

Step 3: Find representative recipes

步骤3:查找代表性配方

ls -lt recipes/ | head -5 # Get recent ones
ls -lt recipes/ | head -5 # 获取近期的

Pick one that was updated recently (current practices)

选择一个最近更新的(当前实践)

Read: recipes/recent-tool/meta.yaml
Read: recipes/recent-tool/meta.yaml

Pick one established recipe for comparison

选择一个成熟配方进行对比

Read: recipes/samtools/meta.yaml
Read: recipes/samtools/meta.yaml

Step 4: Summarize pattern

步骤4:总结模式

"I see bioconda recipes follow this structure:
  • Jinja2 variables at top
  • package/source/build/requirements/test/about sections
  • Current practice: use pip install for Python packages
  • sha256 checksums required
Should I look at any specific type of recipe (Python/R/compiled)?"

**Example 2: Learning VGP pipeline orchestration**
"我看到bioconda配方遵循此结构:
  • 顶部是Jinja2变量
  • package/source/build/requirements/test/about部分
  • 当前实践:对Python包使用pip install
  • 需要sha256校验和
您需要我查看特定类型的配方(Python/R/编译型)吗?"

**示例2:学习VGP工作流编排**

Step 1: Identify type

步骤1:识别类型

ls *.py
ls *.py

Output: run_all.py, orchestrator.py → Monolithic application

输出:run_all.py, orchestrator.py → 单体式应用

Step 2: Read entry point

步骤2:读取入口点

Read: run_all.py
Read: run_all.py

Step 3: Find core components

步骤3:找到核心组件

grep "^from batch_vgp_run import" run_all.py
grep "^from batch_vgp_run import" run_all.py

Shows: orchestrator, galaxy_client, workflow_manager

显示:orchestrator, galaxy_client, workflow_manager

Step 4: Read core orchestrator

步骤4:读取核心编排器

Read: batch_vgp_run/orchestrator.py # Full file to understand flow
Read: batch_vgp_run/orchestrator.py # 完整文件以理解流程

Step 5: Read supporting modules selectively

步骤5:选择性读取支持模块

grep "def run_species_workflows" batch_vgp_run/orchestrator.py -A 5
Read: batch_vgp_run/galaxy_client.py  # Key helper functions

**Example 3: Learning Galaxy workflow patterns**
grep "def run_species_workflows" batch_vgp_run/orchestrator.py -A 5
Read: batch_vgp_run/galaxy_client.py  # 关键辅助函数

**示例3:学习Galaxy工作流模式**

Step 1: Identify type

步骤1:识别类型

ls -d */ # Shows category directories
ls -d */ # 显示类别目录

Output: transcriptomics/, genome-assembly/, etc. → Example collection

输出:transcriptomics/, genome-assembly/ 等 → 示例集合

Step 2: Read guidelines

步骤2:读取指南

Read: .github/CONTRIBUTING.md
Read: .github/CONTRIBUTING.md

Step 3: Pick representative workflows

步骤3:选择代表性工作流

ls -lt transcriptomics/  # Recent workflows
Read: transcriptomics/recent-workflow/workflow.ga
Read: transcriptomics/recent-workflow/README.md
ls -lt transcriptomics/  # 近期工作流
Read: transcriptomics/recent-workflow/workflow.ga
Read: transcriptomics/recent-workflow/README.md

Step 4: Compare with another category

步骤4:与另一个类别对比

Read: genome-assembly/example-workflow/workflow.ga
Read: genome-assembly/example-workflow/workflow.ga

Step 5: Extract common patterns

步骤5:提取常见模式

grep -r '"format-version"' . | head -5
grep -r '"creator"' . | head -5

---
grep -r '"format-version"' . | head -5
grep -r '"creator"' . | head -5

---

Key Principle for Learning Mode

学习模式的核心原则

Balance understanding with efficiency:
  • ✅ Read 2-5 strategic files fully (based on context)
  • ✅ Use grep/head/tail for pattern discovery across many files
  • ✅ Ask user which aspect to focus on after initial exploration
  • ✅ Summarize findings before reading more
Don't:
  • ❌ Read 20+ files sequentially without strategy
  • ❌ Read files without understanding their role
  • ❌ Ignore repository context and documentation

平衡理解与效率:
  • ✅ 完整读取2-5个策略性文件(基于上下文)
  • ✅ 使用grep/head/tail在多个文件中发现模式
  • ✅ 初始探索后,询问用户要关注的方面
  • ✅ 读取更多文件前总结发现
不要:
  • ❌ 无策略地顺序读取20+个文件
  • ❌ 读取文件却不理解其作用
  • ❌ 忽略仓库上下文和文档

Quick Reference Card

快速参考卡

Model Selection (First Priority):
  • 🎓 Learning/Understanding → Use Opus
  • 🔧 Development/Debugging/Implementation → Use Sonnet (default)
Before ANY file operation, ask yourself:
  1. Can I use bash commands instead? (cp, sed, awk, grep) → 99%+ token savings
  2. Is this a simple text operation? → Use sed/awk, not Read/Edit
  3. Am I copying/merging files? → Use cp/cat, not Read/Write
  4. Can I check metadata first? (file size, line count, modification time)
  5. Can I filter before reading? (grep, head, tail)
  6. Can I read just the structure? (first 50 lines, function names)
  7. Can I summarize instead of showing raw data?
  8. Does the user really need the full content?
Default strategy for file operations:
模型选择(第一优先级):
  • 🎓 学习/理解 → 使用Opus
  • 🔧 开发/调试/实现 → 使用Sonnet(默认)
任何文件操作前,问自己:
  1. 我可以用bash命令替代吗?(cp、sed、awk、grep)→ 99%+ Token节省
  2. 这是简单文本操作吗? → 使用sed/awk,而非Read/Edit
  3. 我是在复制/合并文件吗? → 使用cp/cat,而非Read/Write
  4. 我可以先检查元数据吗?(文件大小、行数、修改时间)
  5. 我可以读取前过滤吗?(grep、head、tail)
  6. 我可以仅读取结构吗?(前50行、函数名)
  7. 我可以总结而非显示原始数据吗?
  8. 用户真的需要完整内容吗?
文件操作默认策略:

FIRST: Try bash commands

首先:尝试bash命令

cp source.txt dest.txt                 # Instead of Read + Write
sed -i '' 's/old/new/g' file.txt       # Instead of Read + Edit (BSD sed; GNU sed drops the '')
cat file1.txt file2.txt > combined.txt # Instead of Read + Read + Write
echo "text" >> file.txt                # Instead of Read + Write (append)
cp source.txt dest.txt                 # 替代Read + Write
sed -i '' 's/old/new/g' file.txt       # 替代Read + Edit(BSD sed;GNU sed无需'')
cat file1.txt file2.txt > combined.txt # 替代Read + Read + Write
echo "text" >> file.txt                # 替代Read + Write(追加)

ONLY IF NEEDED: Read files

仅在需要时:读取文件

wc -l file.txt                     # Check size first
head -20 file.txt                  # Read sample
grep "pattern" file.txt | head -50 # Filter before reading
wc -l file.txt                     # 先检查大小
head -20 file.txt                  # 读取样本
grep "pattern" file.txt | head -50 # 读取前过滤

LAST RESORT: Full file read

最后手段:完整读取文件

Only when you need to understand code structure or complex logic

仅当您需要理解代码结构或复杂逻辑时


---

---

Cost Impact

成本影响

Conservative estimate for typical usage:
| Approach | Tokens/Week | Claude Pro | Claude Team | Notes |
|---|---|---|---|---|
| Wasteful (Read/Edit/Write everything) | 500K | ⚠️ At risk of limits | ✅ OK | Reading files unnecessarily |
| Moderate (filtered reads only) | 200K | ✅ Comfortable | ✅ Very comfortable | Grep/head/tail usage |
| Efficient (bash commands + filters) | 30-50K | ✅ Very comfortable | ✅ Excellent | Using cp/sed/awk instead of Read |
Applying these rules reduces costs by 90-95% on average.
Bash commands optimization alone:
  • File operations: 99%+ token savings (e.g., 50K tokens → 50 tokens)
  • Most impactful single optimization
  • Zero learning curve (standard bash commands)
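The 90-95% figure follows directly from the table's weekly estimates; a quick sanity check in shell arithmetic:

```shell
# Savings if a 500K-token/week habit drops to ~40K (midpoint of the 30-50K estimate)
wasteful=500000
efficient=40000
echo "savings: $(( (wasteful - efficient) * 100 / wasteful ))%"  # prints "savings: 92%"
```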

典型使用的保守估计:
| 方式 | 每周Token | Claude Pro | Claude Team | 说明 |
|---|---|---|---|---|
| 低效(所有操作都用Read/Edit/Write) | 500K | ⚠️ 可能达到限制 | ✅ 可行 | 不必要地读取文件 |
| 中等(仅过滤读取) | 200K | ✅ 舒适 | ✅ 非常舒适 | 使用grep/head/tail |
| 高效(bash命令 + 过滤) | 30-50K | ✅ 非常舒适 | ✅ 极佳 | 使用cp/sed/awk替代Read |
应用这些规则平均可降低90-95%的成本。
仅bash命令优化:
  • 文件操作:99%+ Token节省(例如50K Token → 50 Token)
  • 最具影响力的单一优化
  • 零学习曲线(标准bash命令)

Implementation

实现

This skill automatically applies these optimizations when:
  • Reading log files
  • Executing commands with large output
  • Navigating codebases
  • Debugging errors
  • Checking system status
You can always override by saying:
  • "Show me the full output"
  • "Read the entire file"
  • "I want verbose mode"
  • "Don't worry about tokens"

此技能会在以下场景自动应用这些优化:
  • 读取日志文件
  • 执行产生大量输出的命令
  • 导航代码库
  • 调试错误
  • 检查系统状态
您可以通过以下方式始终覆盖:
  • "显示完整输出"
  • "读取整个文件"
  • "我要详细模式"
  • "不要担心Token"

Managing Long-Running Background Processes

管理长期运行的后台进程

Best Practices for Background Tasks

后台任务最佳实践

When running scripts that take hours, properly manage background processes to prevent resource leaks and enable clean session transitions:
1. Run in background with Bash tool
run_in_background: true
2. Document the process in status files:
当运行需要数小时的脚本时,正确管理后台进程以防止资源泄漏并实现干净的会话过渡:
1. 在后台运行,使用Bash工具
run_in_background: true
2. 在状态文件中记录进程:

Background Processes

后台进程

  • Script: comprehensive_search.py
  • Process ID: Available via BashOutput tool
  • Status: Running (~6% complete)
  • How to check: BashOutput tool with bash_id

**3. Kill cleanly** before session end:
  • 脚本:comprehensive_search.py
  • 进程ID:可通过BashOutput工具获取
  • 状态:运行中(约6%完成)
  • 检查方式:使用bash_id调用BashOutput工具

**3. 会话结束前干净终止:**

Before ending session:

会话结束前:

1. Kill all background processes

1. 终止所有后台进程

KillShell(shell_id="abc123")
KillShell(shell_id="abc123")

2. Create resume documentation (see claude-collaboration skill)

2. 创建恢复文档(见claude-collaboration技能)

3. Document current progress (files, counts, status)

3. 记录当前进度(文件、计数、状态)

4. Save intermediate results

4. 保存中间结果


**4. Design scripts to be resumable** (see Python Environment Management skill):
- Check for existing output files (skip if present)
- Load existing results and append new ones
- Save progress incrementally (not just at end)
- Track completion status in structured format

**4. 将脚本设计为可恢复**(见Python环境管理技能):
- 检查现有输出文件(如果存在则跳过)
- 加载现有结果并追加新结果
- 逐步保存进度(而非仅在结束时)
- 以结构化格式跟踪完成状态
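A minimal sketch of these four points in shell. All file names are illustrative, and `wc -l` stands in for the real processing step:

```shell
#!/usr/bin/env bash
# Resumable batch step: re-running skips inputs whose output already exists,
# so an interrupted job continues where it left off instead of starting over.
set -eu
mkdir -p inputs results
[ -f inputs/sample.txt ] || printf 'demo\n' > inputs/sample.txt  # demo input

for input in inputs/*.txt; do
    out="results/$(basename "$input" .txt).out"
    if [ -f "$out" ]; then continue; fi  # 1. output exists: skip on resume
    wc -l < "$input" > "$out.tmp"        # 2. stand-in for the real processing step
    mv "$out.tmp" "$out"                 # 3. atomic rename: no partial output on interrupt
    printf 'done %s\n' "$input" >> results/progress.log  # 4. incremental progress tracking
done
```

The atomic-rename step matters: if the job is killed mid-write, only the `.tmp` file is lost, and the next run redoes just that input.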

Pre-Interruption Checklist

中断前检查清单

Before ending a session with running processes:
  1. ✅ Check background process status
  2. ✅ Kill all background processes cleanly
  3. ✅ Create resume documentation (RESUME_HERE.md)
  4. ✅ Document current progress with metrics
  5. ✅ Save intermediate results to disk
  6. ✅ Verify resume commands in documentation
在结束带有运行进程的会话前:
  1. ✅ 检查后台进程状态
  2. ✅ 干净终止所有后台进程
  3. ✅ 创建恢复文档(RESUME_HERE.md)
  4. ✅ 记录带指标的当前进度
  5. ✅ 将中间结果保存到磁盘
  6. ✅ 验证文档中的恢复命令

Token Efficiency Benefit

Token效率优势

Properly managing background processes:
  • Prevents context pollution - Old process output doesn't leak into new sessions
  • Enables clean handoff - Resume docs allow fresh session without re-explaining
  • Avoids redundant work - Resumable scripts don't repeat completed tasks

正确管理后台进程:
  • 防止上下文污染 - 旧进程输出不会泄漏到新会话
  • 实现干净交接 - 恢复文档允许新会话无需重新解释即可继续
  • 避免重复工作 - 可恢复脚本不会重复已完成任务

Repository Organization for Long Projects

长期项目的仓库组织

Problem

问题

Data enrichment and analysis projects generate many intermediate files, scripts, and logs that clutter the root directory, making it hard to:
  • Find the current working dataset
  • Identify which scripts are actively used
  • Navigate the project structure
  • Maintain focus on important files
数据丰富和分析项目会生成许多中间文件、脚本和日志,使根目录杂乱无章,难以:
  • 找到当前工作数据集
  • 识别哪些脚本在活跃使用
  • 导航项目结构
  • 专注于重要文件

Solution: Organize Early and Often

解决方案:尽早并经常组织

Create dedicated subfolders at project start:
bash
mkdir -p python_scripts/ logs/ tables/
Organization strategy:
  • python_scripts/
    - All analysis and processing scripts (16+ scripts in VGP project)
  • logs/
    - All execution logs from script runs (38+ logs in VGP project)
  • tables/
    - Intermediate results, old versions, and archived data
  • Root directory - Only main working dataset and current outputs
Benefits:
  • Reduces cognitive load when scanning directory
  • Makes git status cleaner and more readable
  • Easier to exclude intermediate files from version control
  • Faster file navigation with autocomplete
  • Professional project structure for collaboration
When to organize:
  • At project start (ideal)
  • After accumulating 5+ scripts or logs (acceptable)
  • Before sharing project with collaborators (essential)
Example cleanup script:
项目开始时创建专用子文件夹:
bash
mkdir -p python_scripts/ logs/ tables/
组织策略:
  • python_scripts/
    - 所有分析和处理脚本(VGP项目中有16+个脚本)
  • logs/
    - 脚本运行的所有执行日志(VGP项目中有38+个日志)
  • tables/
    - 中间结果、旧版本和归档数据
  • 根目录 - 仅包含主工作数据集和当前输出
优势:
  • 扫描目录时减少认知负担
  • 使git status更简洁易读
  • 更容易从版本控制中排除中间文件
  • 使用自动补全更快导航文件
  • 适合协作的专业项目结构
何时组织:
  • 项目开始时(理想)
  • 积累5+个脚本或日志后(可接受)
  • 与协作者共享项目前(必要)
示例清理脚本:

Move all Python scripts

移动所有Python脚本

mkdir -p python_scripts
mv *.py python_scripts/
mkdir -p python_scripts
mv *.py python_scripts/

Move all logs

移动所有日志

mkdir -p logs
mv *.log logs/
mkdir -p logs
mv *.log logs/

Move intermediate tables (keep main dataset in root)

移动中间表(将主数据集保留在根目录)

mkdir -p tables
mv *_intermediate.csv *_backup.csv *_old.csv tables/

**Token efficiency impact:**
- Cleaner `ls` outputs (fewer lines to process)
- Easier to target specific directories with Glob
- Reduced cognitive overhead when navigating
- Faster file location with autocomplete

---
mkdir -p tables
mv *_intermediate.csv *_backup.csv *_old.csv tables/

**Token效率影响:**
- 更简洁的`ls`输出(处理更少行)
- 更容易使用Glob定位特定目录
- 导航时减少认知开销
- 使用自动补全更快定位文件

---

Summary

总结

Core motto: Right model. Bash over Read. Filter first. Read selectively. Summarize intelligently.
Model selection (highest impact):
  • Use Opus for learning/understanding (one-time investment)
  • Use Sonnet for development/debugging/implementation (default)
  • This alone can save ~50% cost vs using Opus for everything
Primary optimization rule:
  • Use bash commands for file operations (cp, sed, awk, grep) instead of Read/Edit/Write
  • This alone can save 99%+ tokens on file operations
Secondary rules:
  • Filter before reading (grep, head, tail)
  • Read with limits when needed
  • Summarize instead of showing raw output
  • Use quiet modes for commands
  • Strategic file selection for learning
By following these guidelines, users can get 5-10x more value from their Claude subscription while maintaining high-quality assistance.
核心格言:正确的模型。优先Bash而非Read。先过滤。选择性读取。智能总结。
模型选择(影响最大):
  • 使用Opus进行学习/理解(一次性投入)
  • 使用Sonnet进行开发/调试/实现(默认)
  • 与全程使用Opus相比,仅此一项即可节省约50%成本
主要优化规则:
  • 对文件操作使用bash命令(cp、sed、awk、grep)而非Read/Edit/Write
  • 仅此一项即可在文件操作上节省99%+ Token
次要规则:
  • 读取前过滤(grep、head、tail)
  • 需要时带限制读取
  • 总结而非显示原始输出
  • 对命令使用安静模式
  • 学习时策略性选择文件
遵循这些准则,用户可从Claude订阅中获得5-10倍的价值,同时保持高质量协助。