latchbio-integration

LatchBio Integration

Overview

Latch is a Python framework for building and deploying bioinformatics workflows as serverless pipelines. It is built on Flyte: you define workflows with the @workflow and @task decorators, manage cloud data with LatchFile and LatchDir, configure compute resources, and integrate existing Nextflow or Snakemake pipelines.

Core Capabilities

The Latch platform provides four main areas of functionality:

1. Workflow Creation and Deployment

Define serverless workflows using Python decorators
Support for native Python, Nextflow, and Snakemake pipelines
Automatic containerization with Docker
Auto-generated no-code user interfaces
Version control and reproducibility

2. Data Management

Cloud storage abstractions (LatchFile, LatchDir)
Structured data organization with Registry (Projects → Tables → Records)
Type-safe data operations with links and enums
Automatic file transfer between local and cloud
Glob pattern matching for file selection
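
To make the glob-based file selection concrete, here is a minimal sketch of shell-style pattern matching over remote paths. It uses only the standard library's `fnmatch` and does not call the Latch SDK; the `select_paths` helper and the example paths are hypothetical.

```python
from fnmatch import fnmatch


def select_paths(paths, pattern):
    """Return the paths whose final component matches a shell-style pattern."""
    return [p for p in paths if fnmatch(p.rsplit("/", 1)[-1], pattern)]


# Hypothetical remote paths, used only for illustration
remote_paths = [
    "latch:///runs/sample_A_R1.fastq.gz",
    "latch:///runs/sample_A_R2.fastq.gz",
    "latch:///runs/sample_A.bam",
]

forward_reads = select_paths(remote_paths, "*_R1.fastq.gz")
print(forward_reads)  # → ['latch:///runs/sample_A_R1.fastq.gz']
```

The sketch only shows how a pattern narrows a file listing; inside a real task the SDK's own file-selection utilities would be the natural choice.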

3. Resource Configuration

Pre-configured task decorators (@small_task, @large_task, @small_gpu_task, @large_gpu_task)
Custom resource specifications (CPU, memory, GPU, storage)
GPU support (K80, V100, A100)
Timeout and storage configuration
Cost optimization strategies

4. Verified Workflows

Production-ready pre-built pipelines
Bulk RNA-seq, DESeq2, pathway analysis
AlphaFold and ColabFold for protein structure prediction
Single-cell tools (ArchR, scVelo, emptyDropsR)
CRISPR analysis, phylogenetics, and more

Quick Start

Installation and Setup

bash

# Install Latch SDK
python3 -m uv pip install latch

# Login to Latch
latch login

# Initialize a new workflow
latch init my-workflow

# Register workflow to platform
latch register my-workflow


**Prerequisites:**
- Docker installed and running
- Latch account credentials
- Python 3.8+
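
The prerequisites above can be checked before running `latch register`. The following is a small sketch using only the standard library; the `preflight_checks` function is hypothetical and is not part of the Latch SDK. It confirms that binaries are on `PATH` but cannot tell whether the Docker daemon is running or whether credentials are valid.

```python
import shutil
import sys


def preflight_checks(min_python=(3, 8)):
    """Return a dict mapping each prerequisite to True or False."""
    return {
        # Python 3.8+ is the stated minimum
        "python": sys.version_info >= min_python,
        # Only confirms the docker binary is on PATH, not that the daemon is up
        "docker_on_path": shutil.which("docker") is not None,
        # The latch CLI is installed together with the SDK
        "latch_on_path": shutil.which("latch") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight_checks().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```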

Basic Workflow Example

python

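from pathlib import Path
import shutil

from latch import workflow, small_task
from latch.types import LatchFile

@small_task
def process_file(input_file: LatchFile) -> LatchFile:
    """Process a single file"""
    # Reading local_path fetches the file into the task's container
    local_input = Path(input_file.local_path)

    # Processing logic goes here; this placeholder copies the input unchanged
    local_output = Path("/root") / f"processed_{local_input.name}"
    shutil.copy(local_input, local_output)

    # The remote destination below is an illustrative placeholder
    return LatchFile(str(local_output), f"latch:///outputs/{local_output.name}")

@workflow
def my_workflow(input_file: LatchFile) -> LatchFile:
    """
    My bioinformatics workflow

    Args:
        input_file: Input data file
    """
    return process_file(input_file=input_file)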

When to Use This Skill

This skill should be used when encountering any of the following scenarios:

Workflow Development:

"Create a Latch workflow for RNA-seq analysis"
"Deploy my pipeline to Latch"
"Convert my Nextflow pipeline to Latch"
"Add GPU support to my workflow"
Working with `@workflow` and `@task` decorators

Data Management:

"Organize my sequencing data in Latch Registry"
"How do I use LatchFile and LatchDir?"
"Set up sample tracking in Latch"
Working with `latch:///` paths

Resource Configuration:

"Configure GPU for AlphaFold on Latch"
"My task is running out of memory"
"How do I optimize workflow costs?"
Working with task decorators

Verified Workflows:

"Run AlphaFold on Latch"
"Use DESeq2 for differential expression"
"Available pre-built workflows"
Using the `latch.verified` module

Detailed Documentation

This skill includes comprehensive reference documentation organized by capability:

references/workflow-creation.md

Read this for:

Creating and registering workflows
Task definition and decorators
Supporting Python, Nextflow, Snakemake
Launch plans and conditional sections
Workflow execution (CLI and programmatic)
Multi-step and parallel pipelines
Troubleshooting registration issues

Key topics:

`latch init` and `latch register` commands
`@workflow` and `@task` decorators
LatchFile and LatchDir basics
Type annotations and docstrings
Launch plans with preset parameters
Conditional UI sections

references/data-management.md

Read this for:

Cloud storage with LatchFile and LatchDir
Registry system (Projects, Tables, Records)
Linked records and relationships
Enum and typed columns
Bulk operations and transactions
Integration with workflows
Account and workspace management

Key topics:

`latch:///` path format
File transfer and glob patterns
Creating and querying Registry tables
Column types (string, number, file, link, enum)
Record CRUD operations
Workflow-Registry integration

references/resource-configuration.md

Read this for:

Task resource decorators
Custom CPU, memory, GPU configuration
GPU types (K80, V100, A100)
Timeout and storage settings
Resource optimization strategies
Cost-effective workflow design
Monitoring and debugging

Key topics:

`@small_task`, `@large_task`, `@small_gpu_task`, `@large_gpu_task`
`@custom_task` with precise specifications
Multi-GPU configuration
Resource selection by workload type
Platform limits and quotas

references/verified-workflows.md

Read this for:

Pre-built production workflows
Bulk RNA-seq and DESeq2
AlphaFold and ColabFold
Single-cell analysis (ArchR, scVelo)
CRISPR editing analysis
Pathway enrichment
Integration with custom workflows

Key topics:

`latch.verified` module imports
Available verified workflows
Workflow parameters and options
Combining verified and custom steps
Version management

Common Workflow Patterns

Complete RNA-seq Pipeline

python

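from latch import workflow, small_task, large_task
from latch.types import LatchFile, LatchDir

@small_task
def quality_control(fastq: LatchFile) -> LatchFile:
    """Run FastQC"""
    # Run FastQC on fastq.local_path here; the paths below are placeholders
    qc_output = LatchFile("/root/qc_report.html", "latch:///rnaseq/qc_report.html")
    return qc_output

@large_task
def alignment(fastq: LatchFile, genome: str) -> LatchFile:
    """STAR alignment"""
    # Run STAR against the chosen genome here; the paths below are placeholders
    bam_output = LatchFile("/root/aligned.bam", "latch:///rnaseq/aligned.bam")
    return bam_output

@small_task
def quantification(bam: LatchFile) -> LatchFile:
    """featureCounts"""
    # Run featureCounts on bam.local_path here; the paths below are placeholders
    counts = LatchFile("/root/counts.txt", "latch:///rnaseq/counts.txt")
    return counts

@workflow
def rnaseq_pipeline(
    input_fastq: LatchFile,
    genome: str,
    output_dir: LatchDir
) -> LatchFile:
    """RNA-seq analysis pipeline"""
    # FastQC only reports on read quality, so alignment consumes the original reads
    qc = quality_control(fastq=input_fastq)
    aligned = alignment(fastq=input_fastq, genome=genome)
    return quantification(bam=aligned)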

GPU-Accelerated Workflow

python

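from latch import workflow, small_task, large_gpu_task
from latch.types import LatchFile

@small_task
def preprocess(input_file: LatchFile) -> LatchFile:
    """Prepare data"""
    # Preparation logic goes here; the paths below are illustrative placeholders
    processed = LatchFile("/root/processed.dat", "latch:///gpu_pipeline/processed.dat")
    return processed

@large_gpu_task
def gpu_computation(data: LatchFile) -> LatchFile:
    """GPU-accelerated analysis"""
    # GPU logic goes here; the paths below are illustrative placeholders
    results = LatchFile("/root/results.dat", "latch:///gpu_pipeline/results.dat")
    return results

@workflow
def gpu_pipeline(input_file: LatchFile) -> LatchFile:
    """Pipeline with GPU tasks"""
    preprocessed = preprocess(input_file=input_file)
    return gpu_computation(data=preprocessed)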

Registry-Integrated Workflow

python

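from latch import workflow, small_task
from latch.registry.table import Table
from latch.types import LatchFile

def process(input_file: LatchFile) -> LatchFile:
    """Placeholder for the real processing step; returns its input unchanged."""
    return input_file

@small_task
def process_and_track(sample_id: str, table_id: str) -> str:
    """Process sample and update Registry"""
    # Get sample from registry. This assumes the SDK's Table.list_records(),
    # which yields pages mapping record IDs to records; check your SDK version.
    table = Table(id=table_id)
    matches = [
        record
        for page in table.list_records()
        for record in page.values()
        if record.get_values().get("sample_id") == sample_id
    ]
    if not matches:
        raise ValueError(f"No record with sample_id {sample_id!r} in table {table_id}")
    sample = matches[0]

    # Process
    input_file = sample.get_values()["fastq_file"]
    output = process(input_file)

    # Update registry through a batched update, keyed by record name
    with table.update() as updater:
        updater.upsert_record(sample.get_name(), status="completed", result=output)
    return "Success"

@workflow
def registry_workflow(sample_id: str, table_id: str) -> str:
    """Workflow integrated with Registry"""
    return process_and_track(sample_id=sample_id, table_id=table_id)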

Best Practices

Workflow Design

Use type annotations for all parameters
Write clear docstrings (appear in UI)
Start with standard task decorators, scale up if needed
Break complex workflows into modular tasks
Implement proper error handling
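
One way to approach the error-handling point is to wrap every external tool call so that a non-zero exit status surfaces as a clear exception instead of a silent failure. The sketch below is an assumption about how a task body might do this; `ToolError` and `run_tool` are hypothetical names, and the Python interpreter stands in for a real bioinformatics tool.

```python
import subprocess
import sys


class ToolError(RuntimeError):
    """Raised when an external tool exits with a non-zero status."""


def run_tool(cmd):
    """Run a command and return its stdout, raising ToolError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise ToolError(
            f"{cmd[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# The Python interpreter stands in for a command-line tool here
print(run_tool([sys.executable, "-c", "print('ok')"]).strip())  # → ok
```

Including the captured stderr in the exception message means the failure reason shows up directly in the task logs.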

Data Management

Use consistent folder structures
Define Registry schemas before bulk entry
Use linked records for relationships
Store metadata in Registry for traceability
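
Defining a schema before bulk entry can be as simple as validating each row locally before anything is written. The following sketch uses plain Python types and an `Enum`; the `SCHEMA` mapping, the `Status` enum, and the column names are hypothetical and do not reflect the Registry API.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    COMPLETED = "completed"


# Hypothetical schema: column name -> expected Python type
SCHEMA = {"sample_id": str, "read_count": int, "status": Status}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record is valid."""
    problems = [f"missing column: {c}" for c in schema if c not in record]
    problems += [f"unknown column: {c}" for c in record if c not in schema]
    for column, expected in schema.items():
        if column in record and not isinstance(record[column], expected):
            problems.append(f"{column}: expected {expected.__name__}")
    return problems


rows = [
    {"sample_id": "S1", "read_count": 1200, "status": Status.PENDING},
    {"sample_id": "S2", "read_count": "many", "status": "done"},
]
for row in rows:
    print(row["sample_id"], validate_record(row))
# → S1 []
# → S2 ['read_count: expected int', 'status: expected Status']
```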

Resource Configuration

Right-size resources (don't over-allocate)
Use GPU only when algorithms support it
Monitor execution metrics and optimize
Design for parallel execution when possible
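
Right-sizing amounts to picking the smallest tier that still satisfies a task's requirements. The sketch below illustrates that selection rule only: the capacities in `TIERS` are placeholder numbers chosen for the example, not the platform's actual specifications, and `pick_tier` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    decorator: str
    cpus: int
    memory_gib: int


# Placeholder capacities for illustration only; check the platform docs for real values
TIERS = [
    Tier("@small_task", cpus=2, memory_gib=4),
    Tier("@large_task", cpus=64, memory_gib=256),
]


def pick_tier(cpus_needed, memory_gib_needed, tiers=TIERS):
    """Return the smallest tier that satisfies both requirements, or None."""
    for tier in sorted(tiers, key=lambda t: (t.cpus, t.memory_gib)):
        if tier.cpus >= cpus_needed and tier.memory_gib >= memory_gib_needed:
            return tier
    return None  # nothing fits: fall back to a custom specification


print(pick_tier(1, 2).decorator)    # → @small_task
print(pick_tier(16, 64).decorator)  # → @large_task
print(pick_tier(128, 512))          # → None
```

A `None` result corresponds to the case where a custom resource specification is needed instead of a pre-configured decorator.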

Development Workflow

Test locally with Docker before registration
Use version control for workflow code
Document resource requirements
Profile workflows to determine actual needs
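
As a starting point for profiling, peak memory of a pure-Python step can be measured locally with the standard library's `tracemalloc`. This is a rough sketch: `peak_memory_mib` and `build_table` are hypothetical names, and `tracemalloc` only sees allocations made by the Python interpreter, so memory used by external tools launched as subprocesses is not counted.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Call func and return (result, peak Python heap usage in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def build_table(n):
    # Stand-in for a real processing step
    return [i * i for i in range(n)]


result, peak = peak_memory_mib(build_table, 200_000)
print(f"peak: {peak:.1f} MiB")
```

Measurements like this on representative inputs give a baseline for documenting resource requirements before choosing a task decorator.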

Troubleshooting

Common Issues

Registration Failures:

Ensure Docker is running
Check authentication with `latch login`
Verify all dependencies in Dockerfile
Use the `--verbose` flag for detailed logs

Resource Problems:

Out of memory: Increase memory in task decorator
Timeouts: Increase timeout parameter
Storage issues: Increase ephemeral storage (storage_gib)

Data Access:

Use the correct `latch:///` path format
Verify file exists in workspace
Check permissions for shared workspaces
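
A quick local check can catch malformed paths before a workflow is launched. The sketch below validates only the triple-slash `latch:///` form used in this document; `is_workspace_path` is a hypothetical helper, and other path forms the platform may accept are deliberately out of scope.

```python
from urllib.parse import urlparse


def is_workspace_path(path):
    """Return True for paths of the form latch:///<non-empty path>."""
    parsed = urlparse(path)
    return (
        parsed.scheme == "latch"
        and parsed.netloc == ""      # triple slash means an empty authority part
        and len(parsed.path) > 1     # more than just the leading "/"
    )


print(is_workspace_path("latch:///data/sample.fastq"))  # → True
print(is_workspace_path("latch://data/sample.fastq"))   # → False (only two slashes)
print(is_workspace_path("/data/sample.fastq"))          # → False (no scheme)
```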

Type Errors:

Add type annotations to all parameters
Use LatchFile/LatchDir for file/directory parameters
Ensure workflow return type matches actual return
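
Missing annotations can be detected before registration with the standard library's `inspect` module. In this sketch, `unannotated_parameters` is a hypothetical helper meant to be applied to plain functions before they are decorated; it is not part of the Latch SDK.

```python
import inspect


def unannotated_parameters(func):
    """Return names of parameters (and 'return') that lack a type annotation."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


def good(sample: str, threads: int) -> str:
    return sample


def bad(sample, threads: int):
    return sample


print(unannotated_parameters(good))  # → []
print(unannotated_parameters(bad))   # → ['sample', 'return']
```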

Additional Resources

Official Documentation: https://docs.latch.bio
GitHub Repository: https://github.com/latchbio/latch
Slack Community: Join the Latch SDK workspace
API Reference: https://docs.latch.bio/api/latch.html
Blog: https://blog.latch.bio

Support

For issues or questions:

Check documentation links above
Search GitHub issues
Ask in Slack community
Contact support@latch.bio
