tiledbvcf

TileDB-VCF

Overview

TileDB-VCF is a high-performance C++ library with Python and CLI interfaces for efficient storage and retrieval of genomic variant-call data. Built on TileDB's sparse array technology, it enables scalable ingestion of VCF/BCF files, incremental sample addition without expensive merging operations, and efficient parallel queries of variant data stored locally or in the cloud.

When to Use This Skill

This skill should be used when:
  • Learning TileDB-VCF concepts and workflows
  • Prototyping genomics analyses and pipelines
  • Working with small-to-medium datasets (< 1000 samples)
  • Adding new samples incrementally to existing datasets
  • Querying specific genomic regions efficiently across many samples
  • Working with cloud-stored variant data (S3, Azure, GCS)
  • Exporting subsets of large VCF datasets
  • Building variant databases for cohort studies
  • Running educational projects and developing methods
  • Running variant data operations where performance is critical

Quick Start

Installation

**Preferred Method: Conda/Mamba**
```bash
# Enter the following two lines if you are on an M1 Mac
CONDA_SUBDIR=osx-64
conda config --env --set subdir osx-64

# Create the conda environment
conda create -n tiledb-vcf "python<3.10"
conda activate tiledb-vcf

# Mamba is a faster and more reliable alternative to conda
conda install -c conda-forge mamba

# Install TileDB-Py and TileDB-VCF, along with other useful libraries
mamba install -y -c conda-forge -c bioconda -c tiledb tiledb-py tiledbvcf-py pandas pyarrow numpy
```

**Alternative: Docker Images**
```bash
docker pull tiledb/tiledbvcf-py     # Python interface
docker pull tiledb/tiledbvcf-cli    # Command-line interface
```

Basic Examples

**Create and populate a dataset:**
```python
import tiledbvcf

# Create a new dataset
ds = tiledbvcf.Dataset(uri="my_dataset", mode="w", cfg=tiledbvcf.ReadConfig(memory_budget_mb=1024))
ds.create_dataset()  # Create the empty dataset before ingesting

# Ingest VCF files (must be single-sample with indexes)
# Requirements:
# - VCFs must be single-sample (not multi-sample)
# - Must have indexes: .csi (bcftools) or .tbi (tabix)
ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
```

**Query variant data:**
```python
# Open existing dataset for reading
ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")

# Query specific regions and samples
df = ds.read(
    attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
    regions=["chr1:1000000-2000000", "chr2:500000-1500000"],
    samples=["sample1", "sample2", "sample3"],
)
print(df.head())
```

**Export to VCF:**
```python
import os

# Export two VCF samples
ds.export(
    regions=["chr21:8220186-8405573"],
    samples=["HG00101", "HG00097"],
    output_format="v",
    output_dir=os.path.expanduser("~"),
)
```

Core Capabilities

1. Dataset Creation and Ingestion

Create TileDB-VCF datasets and incrementally ingest variant data from multiple VCF/BCF files. This is appropriate for building population genomics databases and cohort studies.
Requirements:
  • Single-sample VCFs only: Multi-sample VCFs are not supported
  • Index files required: VCF/BCF files must have indexes (.csi or .tbi)
Common operations:
  • Create new datasets with optimized array schemas
  • Ingest single or multiple VCF/BCF files in parallel
  • Add new samples incrementally without re-processing existing data
  • Configure memory usage and compression settings
  • Handle various VCF formats and INFO/FORMAT fields
  • Resume interrupted ingestion processes
  • Validate data integrity during ingestion
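The index requirement above can be checked before ingestion starts. The following is a minimal sketch, assuming local file paths and an index stored next to each file with `.csi` or `.tbi` appended to the file name. The helper `find_missing_indexes` is hypothetical and not part of TileDB-VCF, and it does not check whether a file is single-sample.

```python
from pathlib import Path


def find_missing_indexes(vcf_paths):
    """Return the VCF/BCF paths that have no .csi or .tbi index beside them."""
    missing = []
    for vcf in map(Path, vcf_paths):
        # An index is expected at <file>.csi or <file>.tbi (e.g. sample1.vcf.gz.tbi)
        candidates = [Path(str(vcf) + suffix) for suffix in (".csi", ".tbi")]
        if not any(c.exists() for c in candidates):
            missing.append(str(vcf))
    return missing
```

Running this over the list of files first gives one clear report of every unindexed file, instead of a failure partway through a long ingestion.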

2. Efficient Querying and Filtering

Query variant data with high performance across genomic regions, samples, and variant attributes. This is appropriate for association studies, variant discovery, and population analysis.
Common operations:
  • Query specific genomic regions (single or multiple)
  • Filter by sample names or sample groups
  • Extract specific variant attributes (position, alleles, genotypes, quality)
  • Access INFO and FORMAT fields efficiently
  • Combine spatial and attribute-based filtering
  • Stream large query results
  • Perform aggregations across samples or regions
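Streaming a large result can be sketched as a generator over partial reads. This assumes the `read_completed()` and `continue_read()` methods of `tiledbvcf.Dataset` behave as in the open-source Python API (worth confirming against your installed version). `iter_batches` is a hypothetical helper name.

```python
def iter_batches(ds, **read_kwargs):
    """Yield each partial result of a read until the dataset reports completion."""
    # The first call starts the query and returns the first batch
    yield ds.read(**read_kwargs)
    # Keep fetching until the read is reported complete
    while not ds.read_completed():
        yield ds.continue_read()
```

Each yielded batch can be processed and discarded in turn, so only one batch needs to be held in memory at a time.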

3. Data Export and Interoperability

Export data in various formats for downstream analysis or integration with other genomics tools. This is appropriate for sharing datasets, creating analysis subsets, or feeding other pipelines.
Common operations:
  • Export to standard VCF/BCF formats
  • Generate TSV files with selected fields
  • Create sample/region-specific subsets
  • Maintain data provenance and metadata
  • Lossless data export preserving all annotations
  • Compressed output formats
  • Streaming exports for large datasets

4. Population Genomics Workflows

TileDB-VCF excels at large-scale population genomics analyses requiring efficient access to variant data across many samples and genomic regions.
Common workflows:
  • Genome-wide association studies (GWAS) data preparation
  • Rare variant burden testing
  • Population stratification analysis
  • Allele frequency calculations across populations
  • Quality control across large cohorts
  • Variant annotation and filtering
  • Cross-population comparative analysis
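As an illustration of the allele frequency workflow, the alternate-allele frequency at one site can be computed from per-sample genotype calls. This is a minimal sketch in plain Python. It assumes each genotype is a sequence of allele indexes (0 = reference, 1 or higher = alternate) with -1 marking a missing call, which is a common representation of `fmt_GT` values. The function name is hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference; None if nothing was called."""
    called = [a for gt in genotypes for a in gt if a >= 0]  # drop missing (-1) alleles
    if not called:
        return None
    return sum(1 for a in called if a > 0) / len(called)


# Three diploid samples: 0/1, 1/1, and ./. (missing)
print(alt_allele_frequency([[0, 1], [1, 1], [-1, -1]]))  # 0.75
```

Missing calls are excluded from the denominator here. Whether that is appropriate depends on the analysis.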

Key Concepts

Array Schema and Data Model

TileDB-VCF Data Model:
  • Variants stored as sparse arrays with genomic coordinates as dimensions
  • Samples stored as attributes allowing efficient sample-specific queries
  • INFO and FORMAT fields preserved with original data types
  • Automatic compression and chunking for optimal storage
Read Configuration:
```python
# Read configuration: memory budget and query partitioning
config = tiledbvcf.ReadConfig(
    memory_budget_mb=2048,      # MB
    region_partition=(0, 1),    # (partition index, number of partitions) over regions
    sample_partition=(0, 1),    # (partition index, number of partitions) over samples
)
```

Coordinate Systems and Regions

Critical: TileDB-VCF uses 1-based genomic coordinates following VCF standard:
  • Positions are 1-based (first base is position 1)
  • Ranges are inclusive on both ends
  • Region "chr1:1000-2000" includes positions 1000-2000 (1001 bases total)
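To make the inclusive-range arithmetic concrete, here is a minimal sketch that parses a region string and counts its bases. `region_length` is a hypothetical helper, not a TileDB-VCF function, and it assumes the `contig:start-end` form used in the example above.

```python
def region_length(region):
    """Number of bases in a 1-based, inclusive region string like 'chr1:1000-2000'."""
    contig, _, span = region.rpartition(":")
    start, end = (int(x) for x in span.split("-"))
    if start < 1 or end < start:
        raise ValueError(f"invalid region: {region!r}")
    # Both ends are included, hence the +1
    return end - start + 1


print(region_length("chr1:1000-2000"))  # 1001
```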
Region specification formats:
```python
# Single region
regions = ["chr1:1000000-2000000"]

# Multiple regions
regions = ["chr1:1000000-2000000", "chr2:500000-1500000"]

# Whole chromosome
regions = ["chr1"]

# Region strings are always 1-based and inclusive. A BED interval (0-based, half-open)
# must be converted first: BED "chr1 999999 2000000" corresponds to the region below.
regions = ["chr1:1000000-2000000"]
```
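BED intervals are 0-based and half-open, so the start needs a one-base shift before the interval can be used as a 1-based, inclusive region string. A minimal sketch of that conversion follows. `bed_to_region` is a hypothetical helper name.

```python
def bed_to_region(contig, bed_start, bed_end):
    """Convert a 0-based, half-open BED interval to a 1-based, inclusive region string."""
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError("BED interval must satisfy 0 <= start < end")
    # The start shifts by +1; the exclusive BED end equals the inclusive 1-based end
    return f"{contig}:{bed_start + 1}-{bed_end}"


print(bed_to_region("chr1", 999999, 2000000))  # chr1:1000000-2000000
```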

Memory Management

Performance considerations:
  1. Set appropriate memory budget based on available system memory
  2. Use streaming queries for very large result sets
  3. Partition large ingestions to avoid memory exhaustion
  4. Configure tile cache for repeated region access
  5. Use parallel ingestion for multiple files
  6. Optimize region queries by combining nearby regions
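Combining nearby regions (item 6) can be done before the query is issued. The sketch below merges regions on the same contig that overlap or sit within a chosen gap. The function name and the `max_gap` parameter are illustrative assumptions, and inputs are `(contig, start, end)` tuples in 1-based, inclusive coordinates.

```python
def merge_nearby_regions(regions, max_gap=0):
    """Merge same-contig regions that overlap or lie within max_gap bases of each other."""
    merged = []
    for contig, start, end in sorted(regions):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + max_gap + 1:
            # Extend the previous region instead of starting a new one
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([contig, start, end])
    return [f"{c}:{s}-{e}" for c, s, e in merged]


print(merge_nearby_regions(
    [("chr1", 1000, 2000), ("chr1", 2500, 3000), ("chr2", 100, 200)],
    max_gap=1000,
))  # ['chr1:1000-3000', 'chr2:100-200']
```

A merged region also returns variants that fall in the gaps between the original regions, so results may need filtering afterwards.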

Cloud Storage Integration

TileDB-VCF works seamlessly with cloud storage:
```python
# S3 dataset
ds = tiledbvcf.Dataset(uri="s3://bucket/dataset", mode="r")

# Azure Blob Storage
ds = tiledbvcf.Dataset(uri="azure://container/dataset", mode="r")

# Google Cloud Storage
ds = tiledbvcf.Dataset(uri="gcs://bucket/dataset", mode="r")
```

Common Pitfalls

  1. Memory exhaustion during ingestion: Use appropriate memory budget and batch processing for large VCF files
  2. Inefficient region queries: Combine nearby regions instead of many separate queries
  3. Missing sample names: Ensure sample names in VCF headers match query sample specifications
  4. Coordinate system confusion: Remember TileDB-VCF uses 1-based coordinates like VCF standard
  5. Large result sets: Use streaming or pagination for queries returning millions of variants
  6. Cloud permissions: Ensure proper authentication for cloud storage access
  7. Concurrent access: Multiple writers to the same dataset can cause corruption—use appropriate locking
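The batch processing suggested in the first pitfall can be sketched as follows. `batched` is a hypothetical helper and the batch size of 2 is arbitrary. In practice, the size would be chosen to fit the available memory.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


sample_uris = [f"sample{i}.vcf.gz" for i in range(1, 6)]
for batch in batched(sample_uris, 2):
    # In a real run, each batch would be passed to ds.ingest_samples(batch)
    print(batch)
```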

CLI Usage

TileDB-VCF provides a command-line interface with the following subcommands:
Available Subcommands:
  • create
    - Creates an empty TileDB-VCF dataset
  • store
    - Ingests samples into a TileDB-VCF dataset
  • export
    - Exports data from a TileDB-VCF dataset
  • list
    - Lists all sample names present in a TileDB-VCF dataset
  • stat
    - Prints high-level statistics about a TileDB-VCF dataset
  • utils
    - Utils for working with a TileDB-VCF dataset
  • version
    - Print the version information and exit
```bash
# Create empty dataset
tiledbvcf create --uri my_dataset

# Ingest samples (requires single-sample VCFs with indexes)
tiledbvcf store --uri my_dataset --samples sample1.vcf.gz,sample2.vcf.gz

# Export data
tiledbvcf export --uri my_dataset \
  --regions "chr1:1000000-2000000" \
  --sample-names "sample1,sample2"

# List all samples
tiledbvcf list --uri my_dataset

# Show dataset statistics
tiledbvcf stat --uri my_dataset
```

Advanced Features

Allele Frequency Analysis

```python
# Calculate allele frequencies
af_df = tiledbvcf.read_allele_frequency(
    uri="my_dataset",
    regions=["chr1:1000000-2000000"],
    samples=["sample1", "sample2", "sample3"],
)
```

Sample Quality Control

```python
# Perform sample QC
qc_results = tiledbvcf.sample_qc(
    uri="my_dataset",
    samples=["sample1", "sample2"],
)
```

Custom Configurations

```python
# Advanced configuration
config = tiledbvcf.ReadConfig(
    memory_budget_mb=4096,
    tiledb_config={
        "sm.tile_cache_size": "1000000000",
        "vfs.s3.region": "us-east-1",
    },
)
```

Resources

Getting Help

Open Source TileDB-VCF Resources

TileDB-Cloud Resources

For Large-Scale/Production Genomics:
Getting Started:

Scaling to TileDB-Cloud

When your genomics workloads outgrow single-node processing, TileDB-Cloud provides enterprise-scale capabilities for production genomics pipelines.
Note: This section covers TileDB-Cloud capabilities based on available documentation. For complete API details and current functionality, consult the official TileDB-Cloud documentation and API reference.

Setting Up TileDB-Cloud

**1. Create Account and Get API Token**
Generate an API token in your account settings.

**2. Install TileDB-Cloud Python Client**
```bash
# Base installation
pip install tiledb-cloud

# With genomics-specific functionality
pip install "tiledb-cloud[life-sciences]"
```

**3. Configure Authentication**
```bash
# Set environment variable with your API token
export TILEDB_REST_TOKEN="your_api_token"
```

```python
import tiledb.cloud

# Authentication is automatic via TILEDB_REST_TOKEN
# No explicit login required in code
```
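Because authentication relies on an environment variable, a script can check for it up front and fail with a clear message. The sketch below uses only the standard library. `require_rest_token` is a hypothetical helper and not part of the TileDB-Cloud client.

```python
import os


def require_rest_token(env=os.environ):
    """Return the API token from TILEDB_REST_TOKEN, or raise a clear error if it is unset."""
    token = env.get("TILEDB_REST_TOKEN", "").strip()
    if not token:
        raise RuntimeError("TILEDB_REST_TOKEN is not set; export it before using TileDB-Cloud")
    return token
```

Calling this at the start of a pipeline surfaces a missing token immediately, before any remote request is attempted.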

Migrating from Open Source to TileDB-Cloud

**Large-Scale Ingestion**
```python
# TileDB-Cloud: Distributed VCF ingestion
import tiledb.cloud.vcf

# Use specialized VCF ingestion module
# Note: Exact API requires TileDB-Cloud documentation
# This represents the available functionality structure
tiledb.cloud.vcf.ingestion.ingest_vcf_dataset(
    source="s3://my-bucket/vcf-files/",
    output="tiledb://my-namespace/large-dataset",
    namespace="my-namespace",
    acn="my-s3-credentials",
    ingest_resources={"cpu": "16", "memory": "64Gi"},
)
```

**Distributed Query Processing**
```python
# TileDB-Cloud: VCF querying across distributed storage
import tiledb.cloud.vcf
import tiledbvcf

# Define the dataset URI
dataset_uri = "tiledb://TileDB-Inc/gvcf-1kg-dragen-v376"

# Get all samples from the dataset
# `cfg` must be defined beforehand (assumed here: a dict of TileDB settings)
ds = tiledbvcf.Dataset(dataset_uri, tiledb_config=cfg)
samples = ds.samples()

# Define attributes and ranges to query on
attrs = ["sample_name", "fmt_GT", "fmt_AD", "fmt_DP"]
regions = ["chr13:32396898-32397044", "chr13:32398162-32400268"]

# Perform the read, which is executed in a distributed fashion
df = tiledb.cloud.vcf.read(
    dataset_uri=dataset_uri,
    regions=regions,
    samples=samples,
    attrs=attrs,
    namespace="my-namespace",  # specifies which account to charge
)
df.to_pandas()
```

Enterprise Features

**Data Sharing and Collaboration**
```python
# TileDB-Cloud provides enterprise data sharing capabilities
# through namespace-based permissions and group management

# Access shared datasets via TileDB-Cloud URIs
dataset_uri = "tiledb://shared-namespace/population-study"

# Collaborate through shared notebooks and compute resources
# (Specific API requires TileDB-Cloud documentation)
```

**Cost Optimization**
- **Serverless Compute**: Pay only for actual compute time
- **Auto-scaling**: Automatically scale up/down based on workload
- **Spot Instances**: Use cost-optimized compute for batch jobs
- **Data Tiering**: Automatic hot/cold storage management

**Security and Compliance**
- **End-to-end Encryption**: Data encrypted in transit and at rest
- **Access Controls**: Fine-grained permissions and audit logs
- **HIPAA/SOC2 Compliance**: Enterprise security standards
- **VPC Support**: Deploy in private cloud environments

When to Migrate Checklist

Migrate to TileDB-Cloud if you:
  • Have datasets with more than 1000 samples
  • Need to process more than 100 GB of VCF data
  • Require distributed computing
  • Have multiple team members who need access
  • Need enterprise security/compliance
  • Want cost-optimized serverless compute
  • Require 24/7 production uptime

Getting Started with TileDB-Cloud

  1. Start Free: TileDB-Cloud offers a free tier for evaluation
  2. Migration Support: The TileDB team provides migration assistance
  3. Training: Access to genomics-specific tutorials and examples
  4. Professional Services: Custom deployment and optimization
Next Steps: