LatchBio Integration

Overview

Latch is a Python framework for building and deploying bioinformatics workflows as serverless pipelines. It is built on Flyte: you create workflows with the @workflow and @task decorators, manage cloud data with LatchFile and LatchDir, configure compute resources, and integrate Nextflow or Snakemake pipelines.

Core Capabilities

The Latch platform provides four main areas of functionality:

1. Workflow Creation and Deployment

  • Define serverless workflows using Python decorators
  • Support for native Python, Nextflow, and Snakemake pipelines
  • Automatic containerization with Docker
  • Auto-generated no-code user interfaces
  • Version control and reproducibility

2. Data Management

  • Cloud storage abstractions (LatchFile, LatchDir)
  • Structured data organization with Registry (Projects → Tables → Records)
  • Type-safe data operations with links and enums
  • Automatic file transfer between local and cloud
  • Glob pattern matching for file selection
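
The latch:/// path format and glob-style selection listed above can be pictured without the SDK. The following is a minimal sketch, assuming paths are plain strings of the form latch:///dir/file. The helpers `latch_path_to_key` and `select_paths` are hypothetical names for illustration and are not part of the Latch API, which resolves paths against a workspace on the platform.

```python
from fnmatch import fnmatch
from typing import List
from urllib.parse import urlparse


def latch_path_to_key(path: str) -> str:
    """Return the workspace-relative part of a latch:/// path (hypothetical helper)."""
    parsed = urlparse(path)
    # latch:/// means scheme "latch" with an empty host part
    if parsed.scheme != "latch" or parsed.netloc:
        raise ValueError(f"expected a latch:/// path, got {path!r}")
    return parsed.path.lstrip("/")


def select_paths(paths: List[str], pattern: str) -> List[str]:
    """Keep the paths whose workspace-relative part matches a glob pattern.

    Note: fnmatch's "*" also matches "/" characters.
    """
    return [p for p in paths if fnmatch(latch_path_to_key(p), pattern)]


paths = [
    "latch:///runs/2024/sample_A_R1.fastq.gz",
    "latch:///runs/2024/sample_A_R2.fastq.gz",
    "latch:///runs/2024/notes.txt",
]
fastqs = select_paths(paths, "runs/*/*.fastq.gz")
print(fastqs)
```

Rejecting anything that is not a latch:/// path up front gives an early, readable error instead of a failed lookup later in the pipeline.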

3. Resource Configuration

  • Pre-configured task decorators (@small_task, @large_task, @small_gpu_task, @large_gpu_task)
  • Custom resource specifications (CPU, memory, GPU, storage)
  • GPU support (K80, V100, A100)
  • Timeout and storage configuration
  • Cost optimization strategies

4. Verified Workflows

  • Production-ready pre-built pipelines
  • Bulk RNA-seq, DESeq2, pathway analysis
  • AlphaFold and ColabFold for protein structure prediction
  • Single-cell tools (ArchR, scVelo, emptyDropsR)
  • CRISPR analysis, phylogenetics, and more

Quick Start

Installation and Setup

bash
# Install Latch SDK
python3 -m uv pip install latch

# Login to Latch
latch login

# Initialize a new workflow
latch init my-workflow

# Register workflow to platform
latch register my-workflow

Prerequisites:
  • Docker installed and running
  • Latch account credentials
  • Python 3.8+

Basic Workflow Example

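python
from pathlib import Path

from latch import workflow, small_task
from latch.types import LatchFile

@small_task
def process_file(input_file: LatchFile) -> LatchFile:
    """Process a single file"""
    # Placeholder processing logic: copy the input unchanged.
    # Reading local_path makes the file available inside the task container.
    output_path = Path("/root/output.txt")
    output_path.write_bytes(Path(input_file.local_path).read_bytes())
    # Illustrative destination: the second argument is the remote latch:/// path.
    output_file = LatchFile(str(output_path), "latch:///output.txt")
    return output_file

@workflow
def my_workflow(input_file: LatchFile) -> LatchFile:
    """
    My bioinformatics workflow

    Args:
        input_file: Input data file
    """
    return process_file(input_file=input_file)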

When to Use This Skill

This skill should be used when encountering any of the following scenarios:
Workflow Development:
  • "Create a Latch workflow for RNA-seq analysis"
  • "Deploy my pipeline to Latch"
  • "Convert my Nextflow pipeline to Latch"
  • "Add GPU support to my workflow"
  • Working with @workflow, @task decorators
Data Management:
  • "Organize my sequencing data in Latch Registry"
  • "How do I use LatchFile and LatchDir?"
  • "Set up sample tracking in Latch"
  • Working with latch:/// paths
Resource Configuration:
  • "Configure GPU for AlphaFold on Latch"
  • "My task is running out of memory"
  • "How do I optimize workflow costs?"
  • Working with task decorators
Verified Workflows:
  • "Run AlphaFold on Latch"
  • "Use DESeq2 for differential expression"
  • "Available pre-built workflows"
  • Using latch.verified module

Detailed Documentation

This skill includes comprehensive reference documentation organized by capability:

references/workflow-creation.md

Read this for:
  • Creating and registering workflows
  • Task definition and decorators
  • Supporting Python, Nextflow, Snakemake
  • Launch plans and conditional sections
  • Workflow execution (CLI and programmatic)
  • Multi-step and parallel pipelines
  • Troubleshooting registration issues
Key topics:
  • latch init and latch register commands
  • @workflow and @task decorators
  • LatchFile and LatchDir basics
  • Type annotations and docstrings
  • Launch plans with preset parameters
  • Conditional UI sections
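
To picture why type annotations and docstrings matter for a generated interface, the sketch below derives form-field descriptions from a plain function's signature and its Args: section. It is an illustration of the general idea only, not how Latch builds its UI; `describe_parameters` is a hypothetical helper and the `threads` parameter is added just for the example.

```python
import inspect
from typing import Any, Callable, Dict, List, get_type_hints


def describe_parameters(func: Callable[..., Any]) -> List[Dict[str, str]]:
    """Build one field description per parameter from type hints and the Args: section."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    help_text: Dict[str, str] = {}
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif in_args and ":" in stripped:
            # Lines look like "name: description"
            name, _, text = stripped.partition(":")
            help_text[name.strip()] = text.strip()
    return [
        {
            "name": name,
            "type": getattr(hints.get(name), "__name__", "untyped"),
            "help": help_text.get(name, ""),
        }
        for name in inspect.signature(func).parameters
    ]


def my_workflow(input_file: str, threads: int = 4) -> str:
    """
    My bioinformatics workflow

    Args:
        input_file: Input data file
        threads: Number of CPU threads
    """
    return input_file


fields = describe_parameters(my_workflow)
for field in fields:
    print(field)
```

A parameter without an annotation shows up as "untyped" here, which is the kind of gap that a platform relying on annotations cannot fill in on its own.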

references/data-management.md

Read this for:
  • Cloud storage with LatchFile and LatchDir
  • Registry system (Projects, Tables, Records)
  • Linked records and relationships
  • Enum and typed columns
  • Bulk operations and transactions
  • Integration with workflows
  • Account and workspace management
Key topics:
  • latch:/// path format
  • File transfer and glob patterns
  • Creating and querying Registry tables
  • Column types (string, number, file, link, enum)
  • Record CRUD operations
  • Workflow-Registry integration
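
The idea of typed columns and enum values can be sketched with plain Python before touching the platform. The example below is an assumption-laden illustration: `SketchTable`, `Status`, and the column names are hypothetical and do not mirror the Latch Registry API. It only shows why defining the schema first lets bad values fail early.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Type


class Status(Enum):
    """Hypothetical enum column: only these values are accepted."""
    PENDING = "pending"
    COMPLETED = "completed"


@dataclass
class SketchTable:
    """An in-memory stand-in for a table: column name -> expected Python type."""
    name: str
    columns: Dict[str, Type]
    records: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def upsert(self, record_name: str, **values: Any) -> None:
        """Insert or update a record after checking each value against its column type."""
        for column, value in values.items():
            if column not in self.columns:
                raise KeyError(f"unknown column {column!r}")
            if not isinstance(value, self.columns[column]):
                expected = self.columns[column].__name__
                raise TypeError(f"column {column!r} expects {expected}, got {value!r}")
        self.records.setdefault(record_name, {}).update(values)


samples = SketchTable("samples", {"read_count": int, "status": Status})
samples.upsert("sample_A", read_count=1200, status=Status.PENDING)
samples.upsert("sample_A", status=Status.COMPLETED)  # second call updates in place
print(samples.records["sample_A"])
```

Passing a free-text status such as "done" raises TypeError in this sketch, which is the behavior an enum column is meant to enforce.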

references/resource-configuration.md

Read this for:
  • Task resource decorators
  • Custom CPU, memory, GPU configuration
  • GPU types (K80, V100, A100)
  • Timeout and storage settings
  • Resource optimization strategies
  • Cost-effective workflow design
  • Monitoring and debugging
Key topics:
  • @small_task, @large_task, @small_gpu_task, @large_gpu_task
  • @custom_task with precise specifications
  • Multi-GPU configuration
  • Resource selection by workload type
  • Platform limits and quotas

references/verified-workflows.md

Read this for:
  • Pre-built production workflows
  • Bulk RNA-seq and DESeq2
  • AlphaFold and ColabFold
  • Single-cell analysis (ArchR, scVelo)
  • CRISPR editing analysis
  • Pathway enrichment
  • Integration with custom workflows
Key topics:
  • latch.verified module imports
  • Available verified workflows
  • Workflow parameters and options
  • Combining verified and custom steps
  • Version management

Common Workflow Patterns

Complete RNA-seq Pipeline

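python
from latch import workflow, small_task, large_task
from latch.types import LatchFile, LatchDir

# Task bodies are skeletons: qc_output, bam_output, and counts stand for the
# LatchFile objects each tool would produce and must be built by your own code.

@small_task
def quality_control(fastq: LatchFile) -> LatchFile:
    """Run FastQC"""
    return qc_output  # placeholder: FastQC report

@large_task
def alignment(fastq: LatchFile, genome: str) -> LatchFile:
    """STAR alignment"""
    return bam_output  # placeholder: aligned BAM

@small_task
def quantification(bam: LatchFile) -> LatchFile:
    """featureCounts"""
    return counts  # placeholder: gene count table

@workflow
def rnaseq_pipeline(
    input_fastq: LatchFile,
    genome: str,
    output_dir: LatchDir
) -> LatchFile:
    """RNA-seq analysis pipeline"""
    # FastQC produces a report rather than reads, so alignment takes the original FASTQ
    qc = quality_control(fastq=input_fastq)
    aligned = alignment(fastq=input_fastq, genome=genome)
    return quantification(bam=aligned)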

GPU-Accelerated Workflow

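python
from latch import workflow, small_task, large_gpu_task
from latch.types import LatchFile

# Task bodies are skeletons: processed and results stand for LatchFile
# objects that your own code must create.

@small_task
def preprocess(input_file: LatchFile) -> LatchFile:
    """Prepare data"""
    return processed  # placeholder: preprocessed data file

@large_gpu_task
def gpu_computation(data: LatchFile) -> LatchFile:
    """GPU-accelerated analysis"""
    return results  # placeholder: analysis output file

@workflow
def gpu_pipeline(input_file: LatchFile) -> LatchFile:
    """Pipeline with GPU tasks"""
    # Only the GPU-bound step runs on a GPU machine; preprocessing stays on a small CPU task
    preprocessed = preprocess(input_file=input_file)
    return gpu_computation(data=preprocessed)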

Registry-Integrated Workflow

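python
from latch import workflow, small_task
from latch.registry.table import Table
from latch.types import LatchFile

# Registry calls below (Table(id=...), list_records, update/upsert_record) follow the
# Latch SDK documentation as best understood; confirm against your installed version.

def process(input_file: LatchFile) -> LatchFile:
    """Placeholder for the actual analysis step"""
    raise NotImplementedError

@small_task
def process_and_track(sample_id: str, table_id: str) -> str:
    """Process sample and update Registry"""
    # Get sample from registry: list_records() yields pages of {record_id: Record}
    table = Table(id=table_id)
    sample = None
    for page in table.list_records():
        for record in page.values():
            if record.get_values().get("sample_id") == sample_id:
                sample = record
    if sample is None:
        raise ValueError(f"No record with sample_id {sample_id!r} in table {table_id}")

    # Process
    input_file = sample.get_values()["fastq_file"]
    output = process(input_file)

    # Update registry: changes are sent when the update() block exits
    with table.update() as updater:
        updater.upsert_record(sample.get_name(), status="completed", result=output)
    return "Success"

@workflow
def registry_workflow(sample_id: str, table_id: str) -> str:
    """Workflow integrated with Registry"""
    return process_and_track(sample_id=sample_id, table_id=table_id)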

Best Practices

Workflow Design

  1. Use type annotations for all parameters
  2. Write clear docstrings (appear in UI)
  3. Start with standard task decorators, scale up if needed
  4. Break complex workflows into modular tasks
  5. Implement proper error handling
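
For the error-handling point, one common pattern inside a task is to wrap each external tool call so that a failure surfaces with the tool's own stderr. This is a minimal sketch, assuming tools are invoked as subprocesses; `run_tool` and `ToolError` are hypothetical names, and the Python interpreter stands in for a bioinformatics tool so the example runs anywhere.

```python
import subprocess
import sys
from typing import List


class ToolError(RuntimeError):
    """Raised when an external tool exits with a non-zero status."""


def run_tool(cmd: List[str]) -> str:
    """Run a command and return its stdout; on failure, raise with the exit code and stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise ToolError(
            f"{cmd[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Successful call
print(run_tool([sys.executable, "-c", "print('ok')"]).strip())

# Failing call: the message written to stderr ends up in the exception
try:
    run_tool([sys.executable, "-c", "import sys; sys.exit('bad input')"])
except ToolError as err:
    print(f"task failed: {err}")
```

Raising a dedicated exception type keeps the failing command and its message together in the task logs instead of leaving only a bare exit status.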

Data Management

  1. Use consistent folder structures
  2. Define Registry schemas before bulk entry
  3. Use linked records for relationships
  4. Store metadata in Registry for traceability

Resource Configuration

  1. Right-size resources (don't over-allocate)
  2. Use GPU only when algorithms support it
  3. Monitor execution metrics and optimize
  4. Design for parallel execution when possible

Development Workflow

  1. Test locally with Docker before registration
  2. Use version control for workflow code
  3. Document resource requirements
  4. Profile workflows to determine actual needs
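
Profiling a step locally before choosing a task size can be as simple as timing it and recording peak memory. The sketch below uses only the standard library; `profile_step` and `build_table` are hypothetical names. Note the stated limitation: tracemalloc sees Python allocations only, so memory used by external tools launched as subprocesses is not counted.

```python
import time
import tracemalloc
from typing import Any, Callable, List, Tuple


def profile_step(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, float]:
    """Run func and return (result, elapsed seconds, peak Python heap in MiB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always collect the measurements and stop tracing, even if func raises
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak_bytes / 2**20


def build_table(n: int) -> List[int]:
    """Stand-in for a real processing step."""
    return [i * i for i in range(n)]


result, seconds, peak_mib = profile_step(build_table, 100_000)
print(f"{len(result)} rows in {seconds:.3f} s, peak {peak_mib:.1f} MiB")
```

Measurements taken on representative inputs give a concrete basis for picking a task decorator rather than guessing and over-allocating.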

Troubleshooting

Common Issues

Registration Failures:
  • Ensure Docker is running
  • Check authentication with latch login
  • Verify all dependencies in Dockerfile
  • Use --verbose flag for detailed logs
Resource Problems:
  • Out of memory: Increase memory in task decorator
  • Timeouts: Increase timeout parameter
  • Storage issues: Increase ephemeral storage (storage_gib)
Data Access:
  • Use correct latch:/// path format
  • Verify file exists in workspace
  • Check permissions for shared workspaces
Type Errors:
  • Add type annotations to all parameters
  • Use LatchFile/LatchDir for file/directory parameters
  • Ensure workflow return type matches actual return
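
Missing annotations can be caught before registration with a small check on the plain function. This is a minimal sketch using the standard inspect module; `missing_annotations` is a hypothetical helper, and it is meant to be run on functions before a Latch decorator is applied, since how decorated task objects expose their signature is not assumed here.

```python
import inspect
from typing import Any, Callable, List


def missing_annotations(func: Callable[..., Any]) -> List[str]:
    """List parameter names (plus 'return') that lack a type annotation."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


def good(sample: str, threads: int) -> str:
    return sample


def bad(sample, threads: int):
    return sample


print(missing_annotations(good))  # []
print(missing_annotations(bad))   # ['sample', 'return']
```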

Additional Resources

Support

For issues or questions:
  1. Check documentation links above
  2. Search GitHub issues
  3. Ask in Slack community
  4. Contact support@latch.bio