galaxy-tool-wrapping

Galaxy Tool Wrapping Expert

Expert knowledge for developing Galaxy tool wrappers. Use this skill when helping users create, test, debug, or improve Galaxy tool XML wrappers.
Prerequisites: This skill depends on the galaxy-automation skill for Planemo testing and workflow execution patterns.

When to Use This Skill

  • Creating new Galaxy tool wrappers from scratch
  • Converting command-line tools to Galaxy wrappers
  • Generating .shed.yml files for Tool Shed submission
  • Debugging XML syntax and validation errors
  • Writing Planemo tests for tools
  • Implementing conditional parameters and data types
  • Handling tool dependencies (conda, containers)
  • Creating tool collections and suites
  • Optimizing tool performance and resource allocation
  • Understanding Galaxy datatypes and formats
  • Implementing proper error handling

Core Concepts

Galaxy Tool XML Structure

A Galaxy tool wrapper consists of:
  • <tool>: root element with id, name, and version
  • <description>: brief tool description
  • <requirements>: dependencies (conda packages, containers)
  • <command>: the actual command-line execution
  • <inputs>: parameter definitions
  • <outputs>: output file specifications
  • <tests>: automated tests
  • <help>: documentation in reStructuredText
  • <citations>: DOI references

Tool Shed Metadata (.shed.yml)

Required for publishing tools to the Galaxy Tool Shed:
```yaml
name: tool_name                  # Match directory name, underscores only
owner: iuc                       # Usually 'iuc' for IUC tools
description: One-line tool description
homepage_url: https://github.com/tool/repo
long_description: |
  Multi-line detailed description.
  Can include features, use cases, and tool suite contents.
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/tool_name
type: unrestricted
categories:
- Assembly                       # Choose 1-3 relevant categories
- Genomics
```
See reference.md for comprehensive .shed.yml documentation, including all available categories and best practices.

Key Components

Command Block:
  • Use Cheetah templating: $variable_name or ${variable_name}
  • Conditional logic: #if $param ... #end if
  • Loop constructs: #for $item in $collection ... #end for
  • CDATA sections for complex commands
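As a rough illustration of the conditional and loop constructs listed above, the sketch below assumes a hypothetical tool_command with an optional integer parameter min_length and a multi-file data parameter input_files (both names are invented for this example). Comparing str($min_length) with the empty string, rather than testing the variable directly, keeps a value of 0 from being treated as unset.

```xml
<command detect_errors="exit_code"><![CDATA[
    tool_command
        ## Hypothetical optional integer: emit the flag only when a value was given
        #if str($min_length) != ''
            --min-length $min_length
        #end if
        ## Hypothetical <param type="data" multiple="true">: one flag per dataset
        #for $f in $input_files
            --input '$f'
        #end for
        --output '$output'
]]></command>
```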
Cheetah Template Best Practices:
Working around path handling issues in conda packages:
```xml
<command detect_errors="exit_code"><![CDATA[
    ## Add trailing slash if script concatenates paths without separator
    tool_command
        -o 'output_dir/'  ## Quoted with trailing slash

    ## Script does: output_dir + 'file.txt' → 'output_dir/file.txt' ✓
    ## Without slash: output_dir + 'file.txt' → 'output_dirfile.txt' ✗
]]></command>
```
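When to use quotes in Cheetah:
  • Always quote user inputs: '$input_file'
  • Quote literal strings with special chars: 'output_dir/'
  • Use bare variables only for simple references whose values cannot contain spaces or shell metacharacters (for example integer or float parameters): $variable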
Input Parameters:
  • <param> elements with type, name, label
  • Types: text, integer, float, boolean, select, data, data_collection
  • Optional vs required parameters
  • Validators and sanitizers
  • Conditional parameter display
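The input features above might be combined roughly as follows. This is a sketch only: the parameter names (min_length, mode, selector, iterations, prefix) and their limits are invented for illustration and do not belong to any real tool.

```xml
<inputs>
    <param name="input" type="data" format="fasta" label="Input sequences"/>
    <!-- Required integer with a default and a lower bound -->
    <param name="min_length" type="integer" value="100" min="1" label="Minimum sequence length"/>
    <!-- Conditional display: "iterations" is shown only in sensitive mode -->
    <conditional name="mode">
        <param name="selector" type="select" label="Run mode">
            <option value="fast" selected="true">Fast</option>
            <option value="sensitive">Sensitive</option>
        </param>
        <when value="fast"/>
        <when value="sensitive">
            <param name="iterations" type="integer" value="5" min="1" max="50" label="Iterations"/>
        </when>
    </conditional>
    <!-- Text input restricted by a validator and cleaned by a sanitizer -->
    <param name="prefix" type="text" value="sample" label="Output prefix">
        <validator type="regex" message="Letters, digits, and underscores only">^[A-Za-z0-9_]+$</validator>
        <sanitizer invalid_char="">
            <valid initial="string.letters,string.digits">
                <add value="_"/>
            </valid>
        </sanitizer>
    </param>
</inputs>
```

In the command block, parameters nested in a conditional are addressed through the conditional's name, for example $mode.selector and $mode.iterations.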
Outputs:
  • <data> elements for output files
  • Dynamic output naming with label and name
  • Format discovery and conversion
  • Filters for conditional outputs
  • Collections for multiple outputs
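A possible outputs section covering these cases is sketched below. It assumes a hypothetical boolean input named keep_log and a tool that writes report.txt, run.log, and one .tsv file per sample under output_dir; all of those names and paths are placeholders.

```xml
<outputs>
    <!-- Fixed output collected from the job working directory -->
    <data name="report" format="txt" from_work_dir="output_dir/report.txt" label="${tool.name} on ${on_string}: report"/>
    <!-- Conditional output: the filter is a Python expression over the input parameters -->
    <data name="log" format="txt" from_work_dir="output_dir/run.log" label="${tool.name} on ${on_string}: log">
        <filter>keep_log</filter>
    </data>
    <!-- Collection whose elements are discovered after the job finishes -->
    <collection name="per_sample" type="list" label="${tool.name} on ${on_string}: per-sample results">
        <discover_datasets pattern="(?P&lt;designation&gt;.+)\.tsv" directory="output_dir/samples" format="tabular"/>
    </collection>
</outputs>
```

The angle brackets of the named group are written as &lt; and &gt; because the pattern sits inside an XML attribute.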
Tests:
  • Input parameters and files
  • Expected output files or assertions
  • Test data location and organization
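A test combining these elements could look like the sketch below. The parameter and output names (input, keep_log, report, log), the file names, and the asserted text are placeholders; files referenced by value and file are assumed to live in the tool's test-data directory.

```xml
<tests>
    <!-- Two outputs are expected because keep_log enables the optional log -->
    <test expect_num_outputs="2">
        <param name="input" value="input.fa"/>
        <param name="keep_log" value="true"/>
        <!-- Compared against test-data/expected_report.txt -->
        <output name="report" file="expected_report.txt"/>
        <!-- Assertions avoid storing a full expected file for variable output -->
        <output name="log">
            <assert_contents>
                <has_text text="Finished"/>
                <has_line_matching expression="^Processed \d+ sequences$"/>
            </assert_contents>
        </output>
    </test>
</tests>
```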

Best Practices

  1. Always include tests - Planemo linting flags tools that have none
  2. Use semantic versioning - Increment the tool version on every change
  3. Specify exact dependencies - Pin conda package versions
  4. Add clear help text - Document all parameters
  5. Handle errors gracefully - Check exit codes, validate inputs
  6. Use collections - For multiple related files
  7. Follow IUC standards - If contributing to the Intergalactic Utilities Commission (IUC)
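Points 2, 3, and 5 might be expressed in a wrapper along the following lines. This is one common pattern rather than the only valid layout: the token names, the package name, the version numbers, and the stderr pattern are all assumptions made for the example.

```xml
<tool id="tool_id" name="Tool Name" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@">
    <macros>
        <!-- Upstream version, reused for the pinned dependency below -->
        <token name="@TOOL_VERSION@">1.0</token>
        <!-- Bump when only the wrapper changes; reset when TOOL_VERSION changes -->
        <token name="@VERSION_SUFFIX@">0</token>
    </macros>
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">package_name</requirement>
    </requirements>
    <stdio>
        <!-- Any non-zero exit code marks the job as failed -->
        <exit_code range="1:" level="fatal" description="Tool exited with a non-zero status"/>
        <!-- Also fail on a hypothetical error pattern when the tool exits with 0 -->
        <regex match="Traceback" source="stderr" level="fatal" description="Python error in tool_command"/>
    </stdio>
    <!-- description, command, inputs, outputs, tests, and help omitted -->
</tool>
```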

Common Planemo Commands

```bash
# Test tool locally
planemo test tool.xml

# Serve tool in local Galaxy
planemo serve tool.xml

# Lint tool for best practices
planemo lint tool.xml

# Upload tool to ToolShed
planemo shed_update --shed_target toolshed

# Test with conda
planemo test --conda_auto_init --conda_auto_install tool.xml
```

Testing Tools

Regenerating Expected Test Outputs

When test files don't match but the tool runs correctly:
```bash
# Run the tool manually with test inputs
mkdir -p output_dir
/path/to/conda/env/bin/tool_command \
    -i test-data/input.fa \
    -o output_dir

# Copy to expected output
cp output_dir/output.fa test-data/expected_output.fa

# Clean up
rm -rf output_dir
```

**Verifying before regenerating:**
- Check that tool exit code is 0 (successful)
- Inspect the actual output to ensure it's correct
- Compare line counts: `wc -l expected.fa actual.fa`
- Review diffs to understand what changed

**Common reasons to regenerate:**
- Test was created before tool updates
- Expected file only has subset of sequences (bug in test creation)
- Format changes in newer tool versions

Common Issues and Solutions

Issue: "Command not found"
  • Check that the <requirements> section lists the correct package
  • Verify the conda package name and version
  • Install the declared dependencies to confirm they resolve: planemo conda_install tool.xml
Issue: "Output file not found"
  • Verify that the command actually creates the file
  • Check that the output file path matches <data name="output" from_work_dir="...">
  • Use discover_datasets for dynamic outputs
Issue: "Test failed"
  • Compare expected vs actual output
  • Check for whitespace/newline differences
  • Use compare="sim_size" for approximate size matching
  • Set lines_diff to tolerate a fixed number of differing lines in a diff comparison
Issue: "Invalid XML"
  • Run planemo lint tool.xml
  • Check that closing tags match opening tags
  • Validate CDATA sections for command blocks
  • Ensure proper escaping of special characters
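For the "Test failed" case, the two comparison modes might be written as in the sketch below. The file names reuse the placeholders from this document, and the tolerance values are arbitrary examples, not recommendations.

```xml
<tests>
    <test>
        <param name="input" value="test_input.txt"/>
        <!-- Line diff that tolerates up to 2 differing lines, e.g. a timestamp header -->
        <output name="output" file="expected_output.txt" compare="diff" lines_diff="2"/>
    </test>
    <test>
        <param name="input" value="test_input.txt"/>
        <!-- Size-only check: passes if the size is within 1000 bytes of the expected file -->
        <output name="output" file="expected_output.txt" compare="sim_size" delta="1000"/>
    </test>
</tests>
```

When choosing lines_diff, note that a single changed line is typically counted as two differences (one removal and one addition).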

Debugging Tool Test Failures

General Workflow

  1. Read the test output JSON first
    ```bash
    cat tool_test_output.json
    ```
    Look for:
    • Exit codes and error messages in stderr / stdout
    • The output_problems array for test assertion failures
    • Actual vs expected output differences
  2. Never copy/modify conda package scripts
    • Tool wrappers should ALWAYS use conda packages
    • If there are bugs in the conda package scripts, work around them in the XML wrapper
    • Common workaround: Add trailing slashes to paths if script concatenates without separators
  3. Wrong test expectations vs bugs
    • If tests fail but the tool runs successfully (exit code 0), check if expected test files are wrong
    • Regenerate expected outputs by running the tool manually with test inputs
    • Update expect_num_outputs if optional outputs are created

Common Issues and Fixes

Path concatenation bugs in Python scripts:
```xml
<!-- If script does: args.output_dir + 'file.txt' without '/' -->
<!-- Fix in the wrapper's command block with a trailing slash: -->
<command detect_errors="exit_code"><![CDATA[
    tool_command -o 'output_dir/'  ## instead of -o output_dir
]]></command>
```
Wrong number of expected outputs:
```xml
<!-- Check if optional outputs are always created -->
<test expect_num_outputs="3">  <!-- Update count -->
```
Output has extra sequences/data:
  • First check if this is expected behavior
  • Regenerate expected test files from actual tool output
  • Don't add post-processing filters unless absolutely necessary

XML Template Example

```xml
<tool id="tool_id" name="Tool Name" version="1.0.0">
    <description>Brief description</description>

    <requirements>
        <requirement type="package" version="1.0">package_name</requirement>
    </requirements>

    <command detect_errors="exit_code"><![CDATA[
        tool_command
            --input '$input'
            --output '$output'
            #if $optional_param
                --param '$optional_param'
            #end if
    ]]></command>

    <inputs>
        <param name="input" type="data" format="txt" label="Input file"/>
        <param name="optional_param" type="text" optional="true" label="Optional parameter"/>
    </inputs>

    <outputs>
        <data name="output" format="txt" label="${tool.name} on ${on_string}"/>
    </outputs>

    <tests>
        <test>
            <param name="input" value="test_input.txt"/>
            <output name="output" file="expected_output.txt"/>
        </test>
    </tests>

    <help><![CDATA[
**What it does**

Describe what the tool does.

**Inputs**

- Input file: description

**Outputs**

- Output file: description
    ]]></help>

    <citations>
        <citation type="doi">10.1234/example.doi</citation>
    </citations>
</tool>
```

Supporting Documentation

This skill includes detailed reference documentation:
  • reference.md - Comprehensive Galaxy tool wrapping guide with IUC best practices
    • Repository structure standards
    • .shed.yml configuration
    • Complete XML structure reference
    • Advanced features and patterns
  • troubleshooting.md - Practical troubleshooting guide
    • Reading tool_test_output.json
    • Common exit codes and their meanings
    • Solutions for frequent issues
    • Test failure diagnosis
  • dependency-debugging.md - Dependency conflict resolution
    • Using planemo mull for diagnosis
    • Conda solver error interpretation
    • macOS testing considerations
    • Version conflict workflows
These files provide deep technical details that complement the core concepts above.

Related Skills

  • galaxy-automation - BioBlend & Planemo foundation (dependency)
  • galaxy-workflow-development - Building workflows that use these tools
  • conda-recipe - Creating conda packages for tool dependencies
  • bioinformatics-fundamentals - Understanding file formats and data types used in tools

Resources
