dnanexus-integration
DNAnexus Integration
Overview
DNAnexus is a cloud platform for biomedical data analysis and genomics. Build and deploy apps/applets, manage data objects, run workflows, and use the dxpy Python SDK for genomics pipeline development and execution.
When to Use This Skill
This skill should be used when:
- Creating, building, or modifying DNAnexus apps/applets
- Uploading, downloading, searching, or organizing files and records
- Running analyses, monitoring jobs, creating workflows
- Writing scripts using dxpy to interact with the platform
- Setting up dxapp.json, managing dependencies, using Docker
- Processing FASTQ, BAM, VCF, or other bioinformatics files
- Managing projects, permissions, or platform resources
Core Capabilities
The skill is organized into five main areas, each with detailed reference documentation:
1. App Development
Purpose: Create executable programs (apps/applets) that run on the DNAnexus platform.
Key Operations:
- Generate an app skeleton with `dx-app-wizard`
- Write Python or Bash apps with proper entry points
- Handle input/output data objects
- Deploy with `dx build` or `dx build --app`
- Test apps on the platform
Common Use Cases:
- Bioinformatics pipelines (alignment, variant calling)
- Data processing workflows
- Quality control and filtering
- Format conversion tools
Reference: See `references/app-development.md` for:
- Complete app structure and patterns
- Python entry point decorators
- Input/output handling with dxpy
- Development best practices
- Common issues and solutions
2. Data Operations
Purpose: Manage files, records, and other data objects on the platform.
Key Operations:
- Upload/download files with `dxpy.upload_local_file()` and `dxpy.download_dxfile()`
- Create and manage records with metadata
- Search for data objects by name, properties, or type
- Clone data between projects
- Manage project folders and permissions
Common Use Cases:
- Uploading sequencing data (FASTQ files)
- Organizing analysis results
- Searching for specific samples or experiments
- Backing up data across projects
- Managing reference genomes and annotations
Reference: See `references/data-operations.md` for:
- Complete file and record operations
- Data object lifecycle (open/closed states)
- Search and discovery patterns
- Project management
- Batch operations
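Searching by name is a common stumbling point, so here is a minimal sketch of assembling the search arguments before calling the SDK. The helper `build_file_query` is hypothetical (it is not part of dxpy); it only builds a dictionary, assuming that `dxpy.find_data_objects` matches `name` exactly unless `name_mode="glob"` is passed.

```python
def build_file_query(project, pattern, properties=None, folder="/", recurse=True):
    """Return keyword arguments for a glob-style file search (hypothetical helper)."""
    query = {
        "classname": "file",
        "name": pattern,
        "name_mode": "glob",  # assumption: without this, the name is matched exactly
        "project": project,
        "folder": folder,
        "recurse": recurse,
    }
    if properties:
        # Copy so later changes to the caller's dict do not affect the query
        query["properties"] = dict(properties)
    return query


query = build_file_query("project-xxxx", "*.fastq", {"experiment": "exp001"})
print(query)
```

With dxpy installed and a logged-in session, the result would typically be passed on as `dxpy.find_data_objects(**query)`.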
3. Job Execution
Purpose: Run analyses, monitor execution, and orchestrate workflows.
Key Operations:
- Launch jobs with `applet.run()` or `app.run()`
- Monitor job status and logs
- Create subjobs for parallel processing
- Build and run multi-step workflows
- Chain jobs with output references
Common Use Cases:
- Running genomics analyses on sequencing data
- Parallel processing of multiple samples
- Multi-step analysis pipelines
- Monitoring long-running computations
- Debugging failed jobs
Reference: See `references/job-execution.md` for:
- Complete job lifecycle and states
- Workflow creation and orchestration
- Parallel execution patterns
- Job monitoring and debugging
- Resource management
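To make "chaining jobs with output references" concrete, the sketch below builds the kind of value that is passed as a downstream input. The helper `output_ref` is hypothetical; it is assumed to mirror the job-based object reference that `job.get_output_ref(field)` returns, and the job ID and field names are placeholders.

```python
def output_ref(job_id, field):
    """Build a job-based object reference (hypothetical helper, no API call)."""
    return {"$dnanexus_link": {"job": job_id, "field": field}}


# Input for a downstream alignment step that consumes the QC job's output.
# "job-xxxx", "filtered_reads" and "reads" are placeholder names.
align_input = {"reads": output_ref("job-xxxx", "filtered_reads")}
print(align_input)
```

Because the reference names a job and an output field rather than a finished file, the downstream job can be launched before the upstream job completes.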
4. Python SDK (dxpy)
Purpose: Programmatic access to the DNAnexus platform through Python.
Key Operations:
- Work with data object handlers (DXFile, DXRecord, DXApplet, etc.)
- Use high-level functions for common tasks
- Make direct API calls for advanced operations
- Create links and references between objects
- Search and discover platform resources
Common Use Cases:
- Automation scripts for data management
- Custom analysis pipelines
- Batch processing workflows
- Integration with external tools
- Data migration and organization
Reference: See `references/python-sdk.md` for:
- Complete dxpy class reference
- High-level utility functions
- API method documentation
- Error handling patterns
- Common code patterns
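Links between objects are plain dictionaries, which the following sketch illustrates without calling the platform. Both helpers are hypothetical; `make_link` is assumed to mirror the two shapes that `dxpy.dxlink` produces (a bare ID, or an ID qualified by a project), and `link_to_id` shows one way to read an ID back from either shape.

```python
def make_link(object_id, project_id=None):
    """Build a link dictionary (hypothetical stand-in for dxpy.dxlink)."""
    if project_id is None:
        return {"$dnanexus_link": object_id}
    return {"$dnanexus_link": {"project": project_id, "id": object_id}}


def link_to_id(link):
    """Extract the object ID from either link shape."""
    target = link["$dnanexus_link"]
    if isinstance(target, dict):
        if "id" not in target:
            # For example a job-based reference, which has no object ID yet
            raise ValueError("link does not point at a concrete object")
        return target["id"]
    return target


print(link_to_id(make_link("file-xxxx")))
print(link_to_id(make_link("file-xxxx", "project-xxxx")))
```

Handling both shapes avoids assuming that a job output always holds a bare string under `$dnanexus_link`.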
5. Configuration and Dependencies
Purpose: Configure app metadata and manage dependencies.
Key Operations:
- Write dxapp.json with inputs, outputs, and run specs
- Install system packages (execDepends)
- Bundle custom tools and resources
- Use assets for shared dependencies
- Integrate Docker containers
- Configure instance types and timeouts
Common Use Cases:
- Defining app input/output specifications
- Installing bioinformatics tools (samtools, bwa, etc.)
- Managing Python package dependencies
- Using Docker images for complex environments
- Selecting computational resources
Reference: See `references/configuration.md` for:
- Complete dxapp.json specification
- Dependency management strategies
- Docker integration patterns
- Regional and resource configuration
- Example configurations
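As a rough illustration of the pieces listed above, the sketch below assembles a small dxapp.json as a Python dictionary and renders it as JSON. Every value is an assumption chosen for illustration: the app name, the Ubuntu release, the instance type, the region, and the timeout should all be checked against references/configuration.md and the current platform documentation. The input and output names match the "Create Simple App" example.

```python
import json


def render_dxapp():
    """Return an illustrative dxapp.json document as a JSON string."""
    dxapp = {
        "name": "quality_filter",  # hypothetical app name
        "title": "Quality Filter",
        "summary": "Filter reads below a quality threshold",
        "dxapi": "1.0.0",
        "version": "0.1.0",
        "inputSpec": [
            {"name": "input_file", "class": "file", "patterns": ["*.fastq"]},
            {"name": "quality_threshold", "class": "int",
             "optional": True, "default": 30},
        ],
        "outputSpec": [{"name": "filtered_reads", "class": "file"}],
        "runSpec": {
            "interpreter": "python3",
            "file": "src/my-app.py",
            "distribution": "Ubuntu",
            "release": "20.04",  # assumption: verify supported releases
            "version": "0",
            "execDepends": [{"name": "samtools"}],  # system package to install
            "timeoutPolicy": {"*": {"hours": 12}},
        },
        "regionalOptions": {
            "aws:us-east-1": {  # assumption: example region
                "systemRequirements": {
                    "*": {"instanceType": "mem1_ssd1_v2_x4"}
                }
            }
        },
    }
    return json.dumps(dxapp, indent=2)


print(render_dxapp())
```

Generating the file from code is optional; the same content can be written by hand or produced by `dx-app-wizard`.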
Quick Start Examples
Upload and Analyze Data
```python
import dxpy

# Upload input file
input_file = dxpy.upload_local_file("sample.fastq", project="project-xxxx")

# Run analysis
job = dxpy.DXApplet("applet-xxxx").run({
    "reads": dxpy.dxlink(input_file.get_id())
})

# Wait for completion
job.wait_on_done()

# Download results
output_id = job.describe()["output"]["aligned_reads"]["$dnanexus_link"]
dxpy.download_dxfile(output_id, "aligned.bam")
```
Search and Download Files
```python
import dxpy

# Find BAM files from a specific experiment
# (name_mode="glob" is needed for the wildcard; the default is an exact match)
files = dxpy.find_data_objects(
    classname="file",
    name="*.bam",
    name_mode="glob",
    properties={"experiment": "exp001"},
    project="project-xxxx"
)

# Download each file
for file_result in files:
    file_obj = dxpy.DXFile(file_result["id"])
    filename = file_obj.describe()["name"]
    dxpy.download_dxfile(file_result["id"], filename)
```
Create Simple App
```python
# src/my-app.py
import dxpy
import subprocess

@dxpy.entry_point('main')
def main(input_file, quality_threshold=30):
    # Download input
    dxpy.download_dxfile(input_file["$dnanexus_link"], "input.fastq")

    # Process (quality_filter stands in for your own command-line tool)
    subprocess.check_call([
        "quality_filter",
        "--input", "input.fastq",
        "--output", "filtered.fastq",
        "--threshold", str(quality_threshold)
    ])

    # Upload output
    output_file = dxpy.upload_local_file("filtered.fastq")

    return {
        "filtered_reads": dxpy.dxlink(output_file)
    }

dxpy.run()
```
Workflow Decision Tree
When working with DNAnexus, follow this decision tree:
1. Need to create a new executable?
   - Yes → Use App Development (references/app-development.md)
   - No → Continue to step 2
2. Need to manage files or data?
   - Yes → Use Data Operations (references/data-operations.md)
   - No → Continue to step 3
3. Need to run an analysis or workflow?
   - Yes → Use Job Execution (references/job-execution.md)
   - No → Continue to step 4
4. Writing Python scripts for automation?
   - Yes → Use Python SDK (references/python-sdk.md)
   - No → Continue to step 5
5. Configuring app settings or dependencies?
   - Yes → Use Configuration (references/configuration.md)
Often you'll need multiple capabilities together (e.g., app development + configuration, or data operations + job execution).
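Since several capabilities often apply at once, the decision tree can also be read as a lookup that returns every matching reference rather than only the first. The sketch below is one hypothetical way to express that; the need keys such as `manage_data` are invented for illustration, while the file paths are the ones listed in this document.

```python
# Ordered to match the decision tree; the keys are hypothetical labels
REFERENCES = [
    ("create_executable", "references/app-development.md"),
    ("manage_data", "references/data-operations.md"),
    ("run_analysis", "references/job-execution.md"),
    ("automate_with_python", "references/python-sdk.md"),
    ("configure_app", "references/configuration.md"),
]


def references_for(needs):
    """Return all reference files relevant to the given set of needs."""
    return [path for key, path in REFERENCES if key in needs]


print(references_for({"manage_data", "run_analysis"}))
```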
Installation and Authentication
Install dxpy
```bash
uv pip install dxpy
```
Login to DNAnexus
```bash
dx login
```
This authenticates your session and sets up access to projects and data.
Verify Installation
```bash
dx --version
dx whoami
```
Common Patterns
Pattern 1: Batch Processing
Process multiple files with the same analysis:
```python
import dxpy

# Find all FASTQ files (name_mode="glob" enables the wildcard match)
files = dxpy.find_data_objects(
    classname="file",
    name="*.fastq",
    name_mode="glob",
    project="project-xxxx"
)

# Launch parallel jobs
jobs = []
for file_result in files:
    job = dxpy.DXApplet("applet-xxxx").run({
        "input": dxpy.dxlink(file_result["id"])
    })
    jobs.append(job)

# Wait for all completions
for job in jobs:
    job.wait_on_done()
```
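When waiting on a batch, one failed job should usually not hide the status of the rest. The sketch below is a hypothetical helper that waits on each job and records failures instead of stopping at the first one. It relies only on `wait_on_done()` and `get_id()`, and it catches `Exception` broadly as an assumption; in real code you would likely narrow this to the job failure exception that dxpy raises. `StubJob` exists only so the example runs without platform access.

```python
def wait_for_jobs(jobs):
    """Wait on every job; return (succeeded_ids, {failed_id: message})."""
    succeeded, failed = [], {}
    for job in jobs:
        try:
            job.wait_on_done()
        except Exception as exc:  # assumption: narrow this in production code
            failed[job.get_id()] = str(exc)
        else:
            succeeded.append(job.get_id())
    return succeeded, failed


class StubJob:
    """Offline stand-in for a job handler, used for demonstration only."""

    def __init__(self, job_id, fail=False):
        self._id = job_id
        self._fail = fail

    def get_id(self):
        return self._id

    def wait_on_done(self):
        if self._fail:
            raise RuntimeError("job failed")


ok, bad = wait_for_jobs([StubJob("job-aaaa"), StubJob("job-bbbb", fail=True)])
print(ok, bad)
```

In the batch pattern above, the final loop could be replaced by `wait_for_jobs(jobs)` so that failed samples can be reported or resubmitted together.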
Pattern 2: Multi-Step Pipeline
Chain multiple analyses together:
```python
import dxpy

# Placeholder applet and file IDs; replace with your own
qc_applet = dxpy.DXApplet("applet-xxxx")
align_applet = dxpy.DXApplet("applet-yyyy")
variant_applet = dxpy.DXApplet("applet-zzzz")
input_file = dxpy.dxlink("file-xxxx")

# Step 1: Quality control
qc_job = qc_applet.run({"reads": input_file})

# Step 2: Alignment (uses QC output)
align_job = align_applet.run({
    "reads": qc_job.get_output_ref("filtered_reads")
})

# Step 3: Variant calling (uses alignment output)
variant_job = variant_applet.run({
    "bam": align_job.get_output_ref("aligned_bam")
})
```
Pattern 3: Data Organization
Organize analysis results systematically:
```python
import dxpy

# Create organized folder structure
dxpy.api.project_new_folder(
    "project-xxxx",
    {"folder": "/experiments/exp001/results", "parents": True}
)

# Upload with metadata
result_file = dxpy.upload_local_file(
    "results.txt",
    project="project-xxxx",
    folder="/experiments/exp001/results",
    properties={
        "experiment": "exp001",
        "sample": "sample1",
        "analysis_date": "2025-10-20"
    },
    tags=["validated", "published"]
)
```
Best Practices
- Error Handling: Always wrap API calls in try-except blocks
- Resource Management: Choose appropriate instance types for workloads
- Data Organization: Use consistent folder structures and metadata
- Cost Optimization: Archive old data, use appropriate storage classes
- Documentation: Include clear descriptions in dxapp.json
- Testing: Test apps with various input types before production use
- Version Control: Use semantic versioning for apps
- Security: Never hardcode credentials in source code
- Logging: Include informative log messages for debugging
- Cleanup: Remove temporary files and failed jobs
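The error-handling advice can be sketched as a small retry wrapper around any call that may fail transiently. This is a generic, hypothetical helper rather than part of dxpy; which exception types are worth retrying is an assumption you must decide per call, and permanent errors such as permission failures should not be retried.

```python
import time


def call_with_retries(fn, retry_on=(Exception,), attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a listed exception, back off exponentially and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))


# Offline demonstration with a function that fails twice, then succeeds
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky, retry_on=(ConnectionError,), sleep=lambda s: None))
```

With dxpy, `fn` could be a lambda wrapping a single API call, with `retry_on` set to the specific exception classes you consider transient.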
Resources
This skill includes detailed reference documentation:
references/
- app-development.md - Complete guide to building and deploying apps/applets
- data-operations.md - File management, records, search, and project operations
- job-execution.md - Running jobs, workflows, monitoring, and parallel processing
- python-sdk.md - Comprehensive dxpy library reference with all classes and functions
- configuration.md - dxapp.json specification and dependency management
Load these references when you need detailed information about specific operations or when working on complex tasks.
Getting Help
- Official documentation: https://documentation.dnanexus.com/
- API reference: http://autodoc.dnanexus.com/
- GitHub repository: https://github.com/dnanexus/dx-toolkit
- Support: support@dnanexus.com