gget


Overview


gget is a command-line bioinformatics tool and Python package providing unified access to 20+ genomic databases and analysis methods. Query gene information, sequence analysis, protein structures, expression data, and disease associations through a consistent interface. All gget modules work both as command-line tools and as Python functions.
Important: The databases queried by gget are continuously updated, which sometimes changes their structure. gget modules are tested automatically on a biweekly basis and updated to match new database structures when necessary.

Installation

Install gget in a clean virtual environment to avoid conflicts:

```bash
# Using uv (recommended)
uv pip install gget

# Or using pip
pip install --upgrade gget
```

```python
# In Python/Jupyter
import gget
```

Quick Start

Basic usage pattern for all modules:

```bash
# Command-line
gget <module> [arguments] [options]
```

```python
# Python
gget.module(arguments, options)
```

Most modules return:
- **Command-line**: JSON (default) or CSV with `-csv` flag
- **Python**: DataFrame or dictionary

Common flags across modules:
- `-o/--out`: Save results to file
- `-q/--quiet`: Suppress progress information
- `-csv`: Return CSV format (command-line only)
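
The command-line pattern and common flags above can be wrapped in a small helper when gget is driven from a script. This is a minimal sketch, not part of gget: `build_gget_command` and `run_gget_json` are hypothetical names, and `run_gget_json` assumes the `gget` executable is on the PATH and that the chosen module prints JSON to standard output (most modules do by default, but not all).

```python
import json
import subprocess


def build_gget_command(module, *args, out=None, quiet=False, csv=False):
    """Assemble an argument list for `gget <module> [arguments] [options]`."""
    cmd = ["gget", module, *map(str, args)]
    if out is not None:
        cmd += ["-o", str(out)]  # -o/--out: save results to file
    if quiet:
        cmd.append("-q")         # -q/--quiet: suppress progress information
    if csv:
        cmd.append("-csv")       # -csv: CSV instead of the default JSON
    return cmd


def run_gget_json(module, *args):
    """Run gget and parse its JSON output (assumes gget is installed)."""
    cmd = build_gget_command(module, *args, quiet=True)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Building the command does not require gget to be installed.
print(build_gget_command("search", "gaba", "-s", "human", csv=True))
```

Passing the command as a list (rather than a shell string) avoids quoting problems with multi-word arguments such as cell type names.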

Module Categories


1. Reference & Gene Information


gget ref - Reference Genome Downloads

Retrieve download links and metadata for Ensembl reference genomes.

**Parameters**:
- `species`: Genus_species format (e.g., 'homo_sapiens', 'mus_musculus'). Shortcuts: 'human', 'mouse'
- `-w/--which`: Specify return types (gtf, cdna, dna, cds, cdrna, pep). Default: all
- `-r/--release`: Ensembl release number (default: latest)
- `-l/--list_species`: List available vertebrate species
- `-liv/--list_iv_species`: List available invertebrate species
- `-ftp`: Return only FTP links
- `-d/--download`: Download files (requires curl)

**Examples**:
```bash
# List available species
gget ref --list_species

# Get all reference files for human
gget ref homo_sapiens

# Download only GTF annotation for mouse
gget ref -w gtf -d mouse
```

```python
# Python
gget.ref("homo_sapiens")
gget.ref("mus_musculus", which="gtf", download=True)
```
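
When species names come from user input or spreadsheets, it can help to normalize them to the Genus_species format before calling gget. The sketch below is illustrative only: `normalize_species` is a hypothetical helper, and its shortcut table contains just the two shortcuts named above.

```python
# Shortcuts listed in the parameter description above.
SPECIES_SHORTCUTS = {"human": "homo_sapiens", "mouse": "mus_musculus"}


def normalize_species(name):
    """Convert e.g. 'Homo sapiens' or 'human' to 'homo_sapiens'."""
    # Lowercase, trim, and join whitespace-separated words with underscores.
    key = "_".join(name.strip().lower().split())
    return SPECIES_SHORTCUTS.get(key, key)


print(normalize_species("Homo sapiens"))
print(normalize_species("mouse"))
```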

gget search - Gene Search

Locate genes by name or description across species.

**Parameters**:
- `searchwords`: One or more search terms (case-insensitive)
- `-s/--species`: Target species (e.g., 'homo_sapiens', 'mouse')
- `-r/--release`: Ensembl release number
- `-t/--id_type`: Return 'gene' (default) or 'transcript'
- `-ao/--andor`: 'or' (default) finds ANY searchword; 'and' requires ALL
- `-l/--limit`: Maximum results to return

**Returns**: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL

**Examples**:
```bash
# Search for GABA-related genes in human
gget search -s human gaba gamma-aminobutyric

# Find specific gene, require all terms
gget search -s mouse -ao and pax7 transcription
```

```python
# Python
gget.search(["gaba", "gamma-aminobutyric"], species="homo_sapiens")
```
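
The difference between the 'or' and 'and' modes can be illustrated with plain string matching. This is a simplified local sketch of the semantics described above, not gget's actual query logic (gget searches the Ensembl database); `matches` is a hypothetical helper.

```python
def matches(description, searchwords, andor="or"):
    """Case-insensitive check of searchwords against one description."""
    text = description.lower()
    hits = [word.lower() in text for word in searchwords]
    if andor == "or":
        return any(hits)   # ANY searchword is enough
    if andor == "and":
        return all(hits)   # ALL searchwords are required
    raise ValueError("andor must be 'or' or 'and'")


description = "paired box 7 transcription factor"
print(matches(description, ["pax7", "transcription"], andor="or"))
print(matches(description, ["pax7", "transcription"], andor="and"))
```

In this example the 'or' query matches because "transcription" appears in the description, while the 'and' query does not because "pax7" is absent from it.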

gget info - Gene/Transcript Information

Retrieve comprehensive gene and transcript metadata from Ensembl, UniProt, and NCBI.

**Parameters**:
- `ens_ids`: One or more Ensembl IDs (also supports WormBase, Flybase IDs). Limit: ~1000 IDs
- `-n/--ncbi`: Disable NCBI data retrieval
- `-u/--uniprot`: Disable UniProt data retrieval
- `-pdb`: Include PDB identifiers (increases runtime)

**Returns**: UniProt ID, NCBI gene ID, primary gene name, synonyms, protein names, descriptions, biotype, canonical transcript

**Examples**:
```bash
# Get info for multiple genes
gget info ENSG00000034713 ENSG00000104853 ENSG00000170296

# Include PDB IDs
gget info ENSG00000034713 -pdb
```

```python
# Python
gget.info(["ENSG00000034713", "ENSG00000104853"], pdb=True)
```
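
Because of the limit of roughly 1000 IDs per call, longer ID lists need to be split into batches. A minimal sketch, assuming a batch size of 1000 is acceptable; `chunk_ids` and `info_in_batches` are hypothetical helpers, and gget is imported lazily so the chunking logic can be tried without it installed.

```python
def chunk_ids(ens_ids, size=1000):
    """Split a list of IDs into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [ens_ids[i:i + size] for i in range(0, len(ens_ids), size)]


def info_in_batches(ens_ids, size=1000):
    """Call gget.info once per batch and return the list of results."""
    import gget  # lazy import: only needed when the query is actually run
    return [gget.info(batch) for batch in chunk_ids(ens_ids, size)]


ids = [f"ENSG{i:011d}" for i in range(2500)]  # placeholder IDs
print([len(batch) for batch in chunk_ids(ids)])
```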

gget seq - Sequence Retrieval

Fetch nucleotide or amino acid sequences for genes and transcripts.

**Parameters**:
- `ens_ids`: One or more Ensembl identifiers
- `-t/--translate`: Fetch amino acid sequences instead of nucleotide
- `-iso/--isoforms`: Return all transcript variants (gene IDs only)

**Returns**: FASTA format sequences

**Examples**:
```bash
# Get nucleotide sequences
gget seq ENSG00000034713 ENSG00000104853

# Get all protein isoforms
gget seq -t -iso ENSG00000034713
```

```python
# Python
gget.seq(["ENSG00000034713"], translate=True, isoforms=True)
```
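
FASTA output saved with `-o` can be read back with a few lines of standard-library Python. The parser below is a generic sketch for FASTA text and makes no assumption about how gget formats its headers; `parse_fasta` is a hypothetical helper and the example records are made up.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequences may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 example\nATCG\nGGTA\n>seq2\nMKW\n"
print(parse_fasta(example))
```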

2. Sequence Analysis & Alignment


gget blast - BLAST Searches

BLAST nucleotide or amino acid sequences against standard databases.

**Parameters**:
- `sequence`: Sequence string or path to FASTA/.txt file
- `-p/--program`: blastn, blastp, blastx, tblastn, tblastx (auto-detected)
- `-db/--database`:
  - Nucleotide: nt, refseq_rna, pdbnt
  - Protein: nr, swissprot, pdbaa, refseq_protein
- `-l/--limit`: Max hits (default: 50)
- `-e/--expect`: E-value cutoff (default: 10.0)
- `-lcf/--low_comp_filt`: Enable low complexity filtering
- `-mbo/--megablast_off`: Disable MegaBLAST (blastn only)

**Examples**:
```bash
# BLAST protein sequence
gget blast MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

# BLAST from file with specific database
gget blast sequence.fasta -db swissprot -l 10
```

```python
# Python
gget.blast("MKWMFK...", database="swissprot", limit=10)
```
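
To give an idea of what program auto-detection involves, the sketch below guesses between `blastn` and `blastp` from the residue alphabet. This is an illustrative heuristic only and is not gget's actual detection code; `guess_program` is a hypothetical helper, and a protein made up entirely of A, C, G, T, U, or N residues would be misclassified.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_program(sequence):
    """Guess 'blastn' for nucleotide input, otherwise 'blastp'."""
    # Ignore whitespace so multi-line sequences are handled.
    residues = set("".join(sequence.split()).upper())
    if not residues:
        raise ValueError("empty sequence")
    return "blastn" if residues <= NUCLEOTIDE_LETTERS else "blastp"


print(guess_program("ATCGATCGATCGATCG"))
print(guess_program("MKWMFKEDHSLEHRCVESAKIRAKYPD"))
```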

gget blat - BLAT Searches

Locate genomic positions of sequences using UCSC BLAT.

**Parameters**:
- `sequence`: Sequence string or path to FASTA/.txt file
- `-st/--seqtype`: 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' (auto-detected)
- `-a/--assembly`: Target assembly (default: 'human'/hg38; options: 'mouse'/mm39, 'zebrafinch'/taeGut2, etc.)

**Returns**: genome, query size, alignment positions, matches, mismatches, alignment percentage

**Examples**:
```bash
# Find genomic location in human
gget blat ATCGATCGATCGATCG

# Search in different assembly
gget blat -a mm39 ATCGATCGATCGATCG
```

```python
# Python
gget.blat("ATCGATCGATCGATCG", assembly="mouse")
```

gget muscle - Multiple Sequence Alignment

Align multiple nucleotide or amino acid sequences using Muscle5.

**Parameters**:
- `fasta`: Sequences or path to FASTA/.txt file
- `-s5/--super5`: Use Super5 algorithm for faster processing (large datasets)

**Returns**: Aligned sequences in ClustalW format or aligned FASTA (.afa)

**Examples**:
```bash
# Align sequences from file
gget muscle sequences.fasta -o aligned.afa

# Use Super5 for large dataset
gget muscle large_dataset.fasta -s5
```

```python
# Python
gget.muscle("sequences.fasta", save=True)
```

gget diamond - Local Sequence Alignment

Perform fast local protein or translated DNA alignment using DIAMOND.

**Parameters**:
- Query: Sequences (string/list) or FASTA file path
- `--reference`: Reference sequences (string/list) or FASTA file path (required)
- `--sensitivity`: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive (default), ultra-sensitive
- `--threads`: CPU threads (default: 1)
- `--diamond_db`: Save database for reuse
- `--translated`: Enable nucleotide-to-amino acid alignment

**Returns**: Identity percentage, sequence lengths, match positions, gap openings, E-values, bit scores

**Examples**:
```bash
# Align against reference
gget diamond GGETISAWESQME -ref reference.fasta --threads 4

# Save database for reuse
gget diamond query.fasta -ref ref.fasta --diamond_db my_db.dmnd
```

```python
# Python
gget.diamond("GGETISAWESQME", reference="reference.fasta", threads=4)
```

3. Structural & Protein Analysis


gget pdb - Protein Structures

Query RCSB Protein Data Bank for structure and metadata.

**Parameters**:
- `pdb_id`: PDB identifier (e.g., '7S7U')
- `-r/--resource`: Data type (pdb, entry, pubmed, assembly, entity types)
- `-i/--identifier`: Assembly, entity, or chain ID

**Returns**: PDB format (structures) or JSON (metadata)

**Examples**:
```bash
# Download PDB structure
gget pdb 7S7U -o 7S7U.pdb

# Get metadata
gget pdb 7S7U -r entry
```

```python
# Python
gget.pdb("7S7U", save=True)
```

gget alphafold - Protein Structure Prediction

Predict 3D protein structures using simplified AlphaFold2.

**Setup Required**:
```bash
# Install OpenMM first
uv pip install openmm

# Then setup AlphaFold
gget setup alphafold
```

**Parameters**:
- `sequence`: Amino acid sequence (string), multiple sequences (list), or FASTA file. Multiple sequences trigger multimer modeling
- `-mr/--multimer_recycles`: Recycling iterations (default: 3; recommend 20 for accuracy)
- `-mfm/--multimer_for_monomer`: Apply multimer model to single proteins
- `-r/--relax`: AMBER relaxation for top-ranked model
- `plot`: Python-only; generate interactive 3D visualization (default: True)
- `show_sidechains`: Python-only; include side chains (default: True)

**Returns**: PDB structure file, JSON alignment error data, optional 3D visualization

**Examples**:
```bash
# Predict single protein structure
gget alphafold MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

# Predict multimer with higher accuracy
gget alphafold sequence1.fasta -mr 20 -r
```

```python
# Python with visualization
gget.alphafold("MKWMFK...", plot=True, show_sidechains=True)

# Multimer prediction
gget.alphafold(["sequence1", "sequence2"], multimer_recycles=20)
```

gget elm - Eukaryotic Linear Motifs

Predict Eukaryotic Linear Motifs in protein sequences.

**Setup Required**:
```bash
gget setup elm
```

**Parameters**:
- `sequence`: Amino acid sequence or UniProt Acc
- `-u/--uniprot`: Indicates sequence is UniProt Acc
- `-e/--expand`: Include protein names, organisms, references
- `-s/--sensitivity`: DIAMOND alignment sensitivity (default: "very-sensitive")
- `-t/--threads`: Number of threads (default: 1)

**Returns**: Two outputs:
1. `ortholog_df`: Linear motifs from orthologous proteins
2. `regex_df`: Motifs directly matched in input sequence

**Examples**:
```bash
# Predict motifs from sequence
gget elm LIAQSIGQASFV -o results

# Use UniProt accession with expanded info
gget elm --uniprot Q02410 -e
```

```python
# Python
ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")
```

4. Expression & Disease Data


gget archs4 - Gene Correlation & Tissue Expression

Query ARCHS4 database for correlated genes or tissue expression data.

**Parameters**:
- `gene`: Gene symbol or Ensembl ID (with `--ensembl` flag)
- `-w/--which`: 'correlation' (default, returns 100 most correlated genes) or 'tissue' (expression atlas)
- `-s/--species`: 'human' (default) or 'mouse' (tissue data only)
- `-e/--ensembl`: Input is Ensembl ID

**Returns**:
- Correlation mode: Gene symbols, Pearson correlation coefficients
- Tissue mode: Tissue identifiers, min/Q1/median/Q3/max expression values

**Examples**:
```bash
# Get correlated genes
gget archs4 ACE2

# Get tissue expression
gget archs4 -w tissue ACE2
```

```python
# Python
gget.archs4("ACE2", which="tissue")
```

gget cellxgene - Single-Cell RNA-seq Data

Query CZ CELLxGENE Discover Census for single-cell data.

**Setup Required**:
```bash
gget setup cellxgene
```

**Parameters**:
- `--gene` (`-g`): Gene names or Ensembl IDs (case-sensitive! 'PAX7' for human, 'Pax7' for mouse)
- `--tissue`: Tissue type(s)
- `--cell_type`: Specific cell type(s)
- `--species` (`-s`): 'homo_sapiens' (default) or 'mus_musculus'
- `--census_version` (`-cv`): Version ("stable", "latest", or dated)
- `--ensembl` (`-e`): Use Ensembl IDs
- `--meta_only` (`-mo`): Return metadata only
- Additional filters: disease, development_stage, sex, assay, dataset_id, donor_id, ethnicity, suspension_type

**Returns**: AnnData object with count matrices and metadata (or metadata-only dataframes)

**Examples**:
```bash
# Get single-cell data for specific genes and cell types
gget cellxgene --gene ACE2 ABCA1 --tissue lung --cell_type "mucus secreting cell" -o lung_data.h5ad

# Metadata only
gget cellxgene --gene PAX7 --tissue muscle --meta_only -o metadata.csv
```

```python
# Python
adata = gget.cellxgene(gene=["ACE2", "ABCA1"], tissue="lung", cell_type="mucus secreting cell")
```
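
Since gene names are case-sensitive here, a small formatting step before the query can prevent empty results. The sketch below applies the common convention reflected in the 'PAX7'/'Pax7' example (all caps for human, first letter capitalized for mouse). `format_gene_symbol` is a hypothetical helper, and some gene symbols do not follow this convention, so treat the result as a starting point rather than a guarantee.

```python
def format_gene_symbol(symbol, species="homo_sapiens"):
    """Apply the usual capitalization convention for a species."""
    symbol = symbol.strip()
    if species == "homo_sapiens":
        return symbol.upper()        # e.g. 'PAX7'
    if species == "mus_musculus":
        return symbol.capitalize()   # e.g. 'Pax7'
    raise ValueError(f"unsupported species: {species}")


print(format_gene_symbol("pax7"))
print(format_gene_symbol("PAX7", species="mus_musculus"))
```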

gget enrichr - Enrichment Analysis

Perform ontology enrichment analysis on gene lists using Enrichr.

**Parameters**:
- `genes`: Gene symbols or Ensembl IDs
- `-db/--database`: Reference database (supports shortcuts: 'pathway', 'transcription', 'ontology', 'diseases_drugs', 'celltypes')
- `-s/--species`: human (default), mouse, fly, yeast, worm, fish
- `-bkg_l/--background_list`: Background genes for comparison
- `-ko/--kegg_out`: Save KEGG pathway images with highlighted genes
- `plot`: Python-only; generate graphical results

**Database Shortcuts**:
- 'pathway' → KEGG_2021_Human
- 'transcription' → ChEA_2016
- 'ontology' → GO_Biological_Process_2021
- 'diseases_drugs' → GWAS_Catalog_2019
- 'celltypes' → PanglaoDB_Augmented_2021

**Examples**:
```bash
# Enrichment analysis for ontology
gget enrichr -db ontology ACE2 AGT AGTR1

# Save KEGG pathways
gget enrichr -db pathway ACE2 AGT AGTR1 -ko ./kegg_images/
```

```python
# Python with plot
gget.enrichr(["ACE2", "AGT", "AGTR1"], database="ontology", plot=True)
```
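
The shortcut table above can be mirrored in a script to record which Enrichr library a given run actually used. This is a bookkeeping sketch only: `resolve_database` is a hypothetical helper, the mapping is copied from the list above, and gget resolves the shortcuts itself, so the lookup is not needed to run a query.

```python
# Shortcut -> Enrichr library, as listed in this section.
DATABASE_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
}


def resolve_database(database):
    """Return the full library name for a shortcut, or the input unchanged."""
    return DATABASE_SHORTCUTS.get(database, database)


print(resolve_database("pathway"))
print(resolve_database("GO_Molecular_Function_2021"))
```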

gget bgee - Orthology & Expression

Retrieve orthology and gene expression data from Bgee database.

**Parameters**:
- `ens_id`: Ensembl gene ID or NCBI gene ID (for non-Ensembl species). Multiple IDs supported when `type=expression`
- `-t/--type`: 'orthologs' (default) or 'expression'

**Returns**:
- Orthologs mode: Matching genes across species with IDs, names, taxonomic info
- Expression mode: Anatomical entities, confidence scores, expression status

**Examples**:
```bash
# Get orthologs
gget bgee ENSG00000169194

# Get expression data
gget bgee ENSG00000169194 -t expression

# Multiple genes
gget bgee ENSBTAG00000047356 ENSBTAG00000018317 -t expression
```

```python
# Python
gget.bgee("ENSG00000169194", type="orthologs")
```

gget opentargets - Disease & Drug Associations

Retrieve disease and drug associations from OpenTargets.

**Parameters**:
- Ensembl gene ID (required)
- `-r/--resource`: diseases (default), drugs, tractability, pharmacogenetics, expression, depmap, interactions
- `-l/--limit`: Cap results count
- Filter arguments (vary by resource):
  - drugs: `--filter_disease`
  - pharmacogenetics: `--filter_drug`
  - expression/depmap: `--filter_tissue`, `--filter_anat_sys`, `--filter_organ`
  - interactions: `--filter_protein_a`, `--filter_protein_b`, `--filter_gene_b`

**Examples**:
```bash
# Get associated diseases
gget opentargets ENSG00000169194 -r diseases -l 5

# Get associated drugs
gget opentargets ENSG00000169194 -r drugs -l 10

# Get tissue expression
gget opentargets ENSG00000169194 -r expression --filter_tissue brain
```

```python
# Python
gget.opentargets("ENSG00000169194", resource="diseases", limit=5)
```
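
Because the valid filter arguments depend on the resource, a script can check the combination before sending a query. The sketch below encodes the resource-to-filter table exactly as listed above (expression and depmap are treated as sharing the same filters, following that list); `check_filters` is a hypothetical helper and is not part of gget.

```python
TISSUE_FILTERS = {"filter_tissue", "filter_anat_sys", "filter_organ"}

# Allowed filter arguments per resource, as listed in this section.
ALLOWED_FILTERS = {
    "diseases": set(),
    "drugs": {"filter_disease"},
    "tractability": set(),
    "pharmacogenetics": {"filter_drug"},
    "expression": TISSUE_FILTERS,
    "depmap": TISSUE_FILTERS,
    "interactions": {"filter_protein_a", "filter_protein_b", "filter_gene_b"},
}


def check_filters(resource, **filters):
    """Return the filters unchanged if they are valid for the resource."""
    if resource not in ALLOWED_FILTERS:
        raise ValueError(f"unknown resource: {resource}")
    unsupported = sorted(set(filters) - ALLOWED_FILTERS[resource])
    if unsupported:
        raise ValueError(f"{resource} does not support: {', '.join(unsupported)}")
    return filters


print(check_filters("expression", filter_tissue="brain"))
```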

gget cbio - cBioPortal Cancer Genomics

Plot cancer genomics heatmaps using cBioPortal data.

**Two subcommands**:

**search** - Find study IDs:
```bash
gget cbio search breast lung
```

**plot** - Generate heatmaps:

**Parameters**:
- `-s/--study_ids`: Space-separated cBioPortal study IDs (required)
- `-g/--genes`: Space-separated gene names or Ensembl IDs (required)
- `-st/--stratification`: Column to organize data (tissue, cancer_type, cancer_type_detailed, study_id, sample)
- `-vt/--variation_type`: Data type (mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence)
- `-f/--filter`: Filter by column value (e.g., 'study_id:msk_impact_2017')
- `-dd/--data_dir`: Cache directory (default: ./gget_cbio_cache)
- `-fd/--figure_dir`: Output directory (default: ./gget_cbio_figures)
- `-dpi`: Resolution (default: 100)
- `-sh/--show`: Display plot in window
- `-nc/--no_confirm`: Skip download confirmations

**Examples**:
```bash
# Search for studies
gget cbio search esophag ovary

# Create heatmap
gget cbio plot -s msk_impact_2017 -g AKT1 ALK BRAF -st tissue -vt mutation_occurrences
```

```python
# Python
gget.cbio_search(["esophag", "ovary"])
gget.cbio_plot(["msk_impact_2017"], ["AKT1", "ALK"], stratification="tissue")
```

gget cosmic - COSMIC Database

Search COSMIC (Catalogue Of Somatic Mutations In Cancer) database.

**Important**: License fees apply for commercial use. Requires COSMIC account credentials.

**Parameters**:
- `searchterm`: Gene name, Ensembl ID, mutation notation, or sample ID
- `-ctp/--cosmic_tsv_path`: Path to downloaded COSMIC TSV file (required for querying)
- `-l/--limit`: Maximum results (default: 100)

**Database download flags**:
- `-d/--download_cosmic`: Activate download mode
- `-gm/--gget_mutate`: Create version for gget mutate
- `-cp/--cosmic_project`: Database type (cancer, census, cell_line, resistance, genome_screen, targeted_screen)
- `-cv/--cosmic_version`: COSMIC version
- `-gv/--grch_version`: Human reference genome (37 or 38)
- `--email`, `--password`: COSMIC credentials

**Examples**:
```bash
# First download database
gget cosmic -d --email user@example.com --password xxx -cp cancer

# Then query
gget cosmic EGFR -ctp cosmic_data.tsv -l 10
```

```python
# Python
gget.cosmic("EGFR", cosmic_tsv_path="cosmic_data.tsv", limit=10)
```

5. Additional Tools


gget mutate - Generate Mutated Sequences

Generate mutated nucleotide sequences from mutation annotations.

**Parameters**:
- `sequences`: FASTA file path or direct sequence input (string/list)
- `-m/--mutations`: CSV/TSV file or DataFrame with mutation data (required)
- `-mc/--mut_column`: Mutation column name (default: 'mutation')
- `-sic/--seq_id_column`: Sequence ID column (default: 'seq_ID')
- `-mic/--mut_id_column`: Mutation ID column
- `-k/--k`: Length of flanking sequences (default: 30 nucleotides)

**Returns**: Mutated sequences in FASTA format

**Examples**:
```bash
# Single mutation
gget mutate ATCGCTAAGCT -m "c.4G>T"

# Multiple sequences with mutations from file
gget mutate sequences.fasta -m mutations.csv -o mutated.fasta
```

```python
# Python
import pandas as pd
mutations_df = pd.DataFrame({"seq_ID": ["seq1"], "mutation": ["c.4G>T"]})
gget.mutate(["ATCGCTAAGCT"], mutations=mutations_df)
```
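
To make the mutation notation concrete, the sketch below applies a single substitution such as `c.4G>T` (1-based position, reference base, alternate base) and keeps up to `k` flanking nucleotides on each side. It is a simplified illustration, not gget's implementation: `apply_substitution` is a hypothetical helper that handles substitutions only and ignores insertions, deletions, and other mutation types.

```python
import re

# Matches substitutions of the form c.<position><ref>><alt>, e.g. c.4G>T
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(sequence, mutation, k=30):
    """Apply one substitution and return it with up to k flanking bases."""
    match = SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unsupported mutation notation: {mutation}")
    position, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    index = position - 1  # notation is 1-based, Python is 0-based
    if sequence[index].upper() != ref:
        raise ValueError("reference base does not match the sequence")
    mutated = sequence[:index] + alt + sequence[index + 1:]
    return mutated[max(0, index - k):index + k + 1]


print(apply_substitution("ATCGCTAAGCT", "c.4G>T"))
```

The reference-base check catches the common mistake of pairing a mutation with the wrong sequence or an off-by-one position.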

gget gpt - OpenAI Text Generation

Generate natural language text using OpenAI's API.
Setup Required:
```bash
gget setup gpt
```
Important: OpenAI's free tier is limited to the first 3 months after account creation. Set a monthly billing limit on your OpenAI account.
Parameters:
  • `prompt`: Text input for generation (required)
  • `api_key`: OpenAI API key used for authentication (required)
  • Model configuration: `temperature`, `top_p`, `max_tokens`, `frequency_penalty`, `presence_penalty`
  • Default model: `gpt-3.5-turbo` (configurable)
Examples:
```bash
# The API key is passed as the second positional argument
gget gpt "Explain CRISPR" your_key_here
```

```python
# Python
import gget

gget.gpt("Explain CRISPR", api_key="your_key_here")
```
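Because `api_key` is required, scripts often read it from the environment instead of hard-coding it. The sketch below shows one way to do that; the helper name `resolve_api_key` and the variable name `OPENAI_API_KEY` are assumptions chosen for illustration, not something gget defines or reads on its own.

```python
import os


def resolve_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    Hypothetical helper; the variable name is an assumption, not a gget setting.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before calling gget.gpt"
        )
    return key
```

With the key exported in the shell, the call from the example above becomes `gget.gpt("Explain CRISPR", api_key=resolve_api_key())`, which keeps the key out of notebooks and version control.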

gget setup - Install Dependencies

Install/download third-party dependencies for specific modules.
Parameters:
  • `module`: Name of the module whose dependencies should be installed
  • `-o/--out`: Output folder path (elm module only)
Modules requiring setup:
  • `alphafold` - Downloads ~4GB of model parameters
  • `cellxgene` - Installs cellxgene-census (may not support the latest Python version)
  • `elm` - Downloads local ELM database
  • `gpt` - Configures OpenAI integration
Examples:
```bash
# Setup AlphaFold
gget setup alphafold

# Setup ELM with custom directory
gget setup elm -o /path/to/elm_data
```

```python
# Python
import gget

gget.setup("alphafold")
```

Common Workflows

Workflow 1: Gene Discovery to Sequence Analysis

Find and analyze genes of interest:
```python
import gget

# 1. Search for genes
results = gget.search(["GABA", "receptor"], species="homo_sapiens")

# 2. Get detailed information
gene_ids = results["ensembl_id"].tolist()
info = gget.info(gene_ids[:5])

# 3. Retrieve sequences
sequences = gget.seq(gene_ids[:5], translate=True)
```

Workflow 2: Sequence Alignment and Structure

Align sequences and predict structures:
```python
import gget

# my_sequence is assumed to hold an amino acid sequence string defined earlier

# 1. Align multiple sequences
alignment = gget.muscle("sequences.fasta")

# 2. Find similar sequences
blast_results = gget.blast(my_sequence, database="swissprot", limit=10)

# 3. Predict structure
structure = gget.alphafold(my_sequence, plot=True)

# 4. Find linear motifs
ortholog_df, regex_df = gget.elm(my_sequence)
```

Workflow 3: Gene Expression and Enrichment

Analyze expression patterns and functional enrichment:
```python
import gget

# 1. Get tissue expression
tissue_expr = gget.archs4("ACE2", which="tissue")

# 2. Find correlated genes
correlated = gget.archs4("ACE2", which="correlation")

# 3. Get single-cell data
adata = gget.cellxgene(gene=["ACE2"], tissue="lung", cell_type="epithelial cell")

# 4. Perform enrichment analysis
gene_list = correlated["gene_symbol"].tolist()[:50]
enrichment = gget.enrichr(gene_list, database="ontology", plot=True)
```

Workflow 4: Disease and Drug Analysis

Investigate disease associations and therapeutic targets:
```python
import gget

# 1. Search for genes
genes = gget.search(["breast cancer"], species="homo_sapiens")

# 2. Get disease associations
diseases = gget.opentargets("ENSG00000169194", resource="diseases")

# 3. Get drug associations
drugs = gget.opentargets("ENSG00000169194", resource="drugs")

# 4. Query cancer genomics data
study_ids = gget.cbio_search(["breast"])
gget.cbio_plot(study_ids[:2], ["BRCA1", "BRCA2"], stratification="cancer_type")

# 5. Search COSMIC for mutations
cosmic_results = gget.cosmic("BRCA1", cosmic_tsv_path="cosmic.tsv")
```

Workflow 5: Comparative Genomics

Compare proteins across species:
```python
import gget

# 1. Get orthologs
orthologs = gget.bgee("ENSG00000169194", type="orthologs")

# 2. Get sequences for comparison
# gget.seq returns a FASTA-style list [header, sequence, ...]; keep the sequence only
human_seq = gget.seq("ENSG00000169194", translate=True)[1]
mouse_seq = gget.seq("ENSMUSG00000026091", translate=True)[1]

# 3. Align sequences
alignment = gget.muscle([human_seq, mouse_seq])

# 4. Compare structures
human_structure = gget.pdb("7S7U")
mouse_structure = gget.alphafold(mouse_seq)
```

Workflow 6: Building Reference Indices

Prepare reference data for downstream analysis (e.g., kallisto|bustools):
```bash
# 1. List available species
gget ref --list_species

# 2. Download reference files (comma-separated list of file types)
gget ref -w gtf,cdna -d homo_sapiens

# 3. Build kallisto index
# transcriptome.fasta is a placeholder for the cDNA FASTA downloaded in step 2
kallisto index -i transcriptome.idx transcriptome.fasta

# 4. Download genome for alignment
gget ref -w dna -d homo_sapiens
```

Best Practices

Data Retrieval

  • Use `--limit` to control result sizes for large queries
  • Save results with `-o/--out` for reproducibility
  • Check database versions/releases for consistency across analyses
  • Use `--quiet` in production scripts to reduce output
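Saving results is more useful for reproducibility when the file carries a note of how it was produced. The sketch below builds a small provenance record that could be written next to a saved results file. The function name, the field names, and the `.meta.json` naming are all assumptions for illustration; gget does not create such a file itself.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_provenance_record(results_file, query, tool_version, database_release=None):
    """Describe how a results file was produced (hypothetical record layout)."""
    return {
        "results_file": Path(results_file).name,
        "query": query,
        "tool_version": tool_version,
        "database_release": database_release,
        # Timestamp in UTC so records from different machines are comparable.
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }


def write_provenance(results_file, record):
    """Write the record as JSON next to the results file and return its path."""
    results_path = Path(results_file)
    sidecar = results_path.with_name(results_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

The installed version can be looked up with `importlib.metadata.version("gget")` from the standard library and passed in as `tool_version`.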

Sequence Analysis

  • For BLAST/BLAT, start with default parameters, then adjust sensitivity
  • Use `gget diamond` with `--threads` for faster local alignment
  • Save DIAMOND databases with `--diamond_db` for repeated queries
  • For multiple sequence alignment, use `-s5/--super5` for large datasets

Expression and Disease Data

  • Gene symbols are case-sensitive in cellxgene (e.g., 'PAX7' vs 'Pax7')
  • Run `gget setup` before first use of alphafold, cellxgene, elm, gpt
  • For enrichment analysis, use database shortcuts for convenience
  • Cache cBioPortal data with `-dd` to avoid repeated downloads
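Since gene symbols are case-sensitive in cellxgene, it can help to normalize them before querying. The sketch below follows the common naming convention (human symbols in upper case, mouse symbols with only the first letter capitalized). The helper name is hypothetical, the species strings are assumed to match the ones passed to the query, and some real symbols do not follow the convention, so treat the output as a starting point.

```python
def format_gene_symbol(symbol, species="homo_sapiens"):
    """Apply the usual capitalization convention for a species.

    Hypothetical helper: human -> 'PAX7', mouse -> 'Pax7'.
    Other species are returned unchanged apart from whitespace trimming.
    """
    symbol = symbol.strip()
    if species == "homo_sapiens":
        return symbol.upper()
    if species == "mus_musculus":
        return symbol[:1].upper() + symbol[1:].lower()
    return symbol


print(format_gene_symbol("pax7", "homo_sapiens"))
print(format_gene_symbol("PAX7", "mus_musculus"))
```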

Structure Prediction

  • AlphaFold multimer predictions: use `-mr 20` for higher accuracy
  • Use the `-r` flag for AMBER relaxation of final structures
  • Visualize results in Python with `plot=True`
  • Check the PDB database first before running AlphaFold predictions

Error Handling

  • Database structures change; update gget regularly: `uv pip install --upgrade gget`
  • Process max ~1000 Ensembl IDs at once with gget info
  • For large-scale analyses, implement rate limiting for API queries
  • Use virtual environments to avoid dependency conflicts
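The batch-size and rate-limiting advice above can be combined in one small helper. This is a minimal sketch using a fixed delay between requests; the names `chunked` and `query_in_batches` are hypothetical, and the one-second pause is an arbitrary assumption rather than a documented limit of any database.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(ids, fetch, batch_size=1000, pause_seconds=1.0, sleep=time.sleep):
    """Call `fetch` once per batch of IDs, pausing between consecutive calls.

    `fetch` is any callable that accepts a list of IDs, for example gget.info.
    Returns the list of per-batch results in order.
    """
    batches = list(chunked(ids, batch_size))
    results = []
    for number, batch in enumerate(batches):
        results.append(fetch(batch))
        if number < len(batches) - 1:
            sleep(pause_seconds)  # fixed delay; skipped after the final batch
    return results
```

For example, `frames = query_in_batches(gene_ids, gget.info)` followed by `pandas.concat(frames)` would collect the results for a long ID list into a single DataFrame. The `sleep` argument exists only so the delay can be replaced in tests.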

Output Formats

Command-line

  • Default: JSON
  • CSV: Add `-csv` flag
  • FASTA: gget seq, gget mutate
  • PDB: gget pdb, gget alphafold
  • PNG: gget cbio plot

Python

  • Default: DataFrame or dictionary
  • JSON: Add `json=True` parameter
  • Save to file: Add `save=True` or specify `out="filename"`
  • AnnData: gget cellxgene
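If results were already saved as JSON and a CSV is needed later, the conversion can be done without re-running the query. The sketch below uses only the standard library and assumes the JSON is an array of flat objects (one object per row); that layout is an assumption, so check the actual file before relying on it. The function name is hypothetical.

```python
import csv
import io
import json


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("Expected a JSON array of objects")

    # Collect column names in first-seen order across all records.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue()
```

Nested values (lists or dictionaries inside a record) are written as their Python string representation, so flatten them first if they matter.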

Resources

This skill includes reference documentation with detailed module information:

references/

  • `module_reference.md` - Comprehensive parameter reference for all modules
  • `database_info.md` - Information about queried databases and their update frequencies
  • `workflows.md` - Extended workflow examples and use cases
For additional help: