database-lookup
Database Lookup
You have access to 78 public databases through their REST APIs. Your job is to figure out which database(s) are relevant to the user's question, query them, and return the raw JSON results along with which databases you used.
Core Workflow
- Understand the query: What is the user looking for? A compound? A gene? A pathway? A patent? Expression data? An economic indicator? This determines which database(s) to hit.
- Select database(s): Use the database selection guide below. When in doubt, search multiple databases; it's better to cast a wide net than to miss relevant data.
- Read the reference file: Each database has a reference file in `references/` with endpoint details, query formats, and example calls. Read the relevant file(s) before making API calls.
- Make the API call(s): See the Making API Calls section below for which HTTP fetch tool to use on your platform.
- Return results: Always return:
  - The raw JSON response from each database
  - A list of databases queried with the specific endpoints used
  - If a query returned no results, say so explicitly rather than omitting it
Database Selection Guide
Match the user's intent to the right database(s). Many queries benefit from hitting multiple databases.
Physics & Astronomy
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Near-Earth objects, asteroids | NASA (NeoWs) | — |
| Mars rover images | NASA (Mars Rover Photos) | — |
| Exoplanets, orbital parameters | NASA Exoplanet Archive | — |
| Astronomical objects by name/coordinates | SIMBAD | SDSS |
| Galaxy/star spectra, photometry | SDSS | SIMBAD |
| Physical constants | NIST | — |
| Atomic spectra, spectral lines | NIST (ASD) | — |
Earth & Environmental Sciences
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Earthquakes, seismic events | USGS Earthquakes | — |
| Water data, streamflow, groundwater | USGS Water Services | — |
| Weather (current, forecast, historical) | OpenWeatherMap | NOAA |
| Climate data, historical weather stations | NOAA (CDO) | — |
| Air quality, toxic releases | EPA (Envirofacts) | — |
Chemistry & Drugs
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Chemical compounds, molecules | PubChem | ChEMBL |
| Molecular properties (weight, formula, SMILES) | PubChem | — |
| Drug synonyms, CAS numbers | PubChem (synonyms) | DrugBank |
| Bioactivity data, IC50, binding assays | ChEMBL | BindingDB, PubChem |
| Drug binding affinities (Ki, IC50, Kd) | ChEMBL, BindingDB | PubChem |
| Drug-target interactions | ChEMBL, DrugBank | BindingDB, Open Targets |
| Ligands for a protein target (by UniProt) | BindingDB | ChEMBL |
| Target identification from compound structure | BindingDB (SMILES similarity) | ChEMBL |
| Drug labels, adverse events, recalls | FDA (OpenFDA) | DailyMed |
| Drug labels (structured product labels) | DailyMed | FDA (OpenFDA) |
| Drug pharmacology, indications | DrugBank | FDA |
| Chemical cross-referencing | PubChem (xrefs) | ChEMBL |
| Commercially available compounds for screening | ZINC | PubChem |
| Similarity/substructure search (purchasable) | ZINC | PubChem, ChEMBL |
| Drug-like compound libraries, building blocks | ZINC | — |
| FDA-approved drug structures | ZINC (fda subset) | PubChem, FDA |
| Compound purchasability, vendor catalogs | ZINC | — |
Materials Science & Crystallography
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Materials by formula or elements | Materials Project | COD |
| Band gap, electronic structure | Materials Project | — |
| Crystal structures, CIF files | COD | Materials Project |
| Elastic/mechanical properties | Materials Project | — |
| Formation energy, thermodynamics | Materials Project | — |
| Cell parameters, space groups | COD | Materials Project |
Biology & Genomics
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Biological pathways | Reactome, KEGG | — |
| What pathways a gene/protein is in | Reactome (mapping), KEGG | — |
| Enzyme kinetics, catalytic activity | BRENDA | KEGG |
| Metabolomics studies, metabolite profiles | Metabolomics Workbench | PubChem |
| m/z or exact mass lookup | Metabolomics Workbench (moverz/exactmass) | PubChem |
| Protein sequence, function, annotation | UniProt | Ensembl |
| Protein-protein interactions | STRING | BioGRID |
| Gene information, genomic location | NCBI Gene | Ensembl |
| Genome sequences, variants, transcripts | Ensembl | NCBI Gene |
| Gene expression datasets | GEO (NCBI E-utilities) | — |
| Gene expression across tissues | GTEx | Human Protein Atlas |
| Gene expression signatures (CMap/L1000) | LINCS L1000 | GEO |
| Gene set enrichment vs GEO | RummaGEO | GEO |
| Protein sequences (NCBI) | NCBI Protein | UniProt |
| Taxonomic classification | NCBI Taxonomy | — |
| SNP/variant data (dbSNP) | dbSNP | ClinVar, gnomAD |
| Population variant frequencies | gnomAD | dbSNP |
| Sequencing run metadata | SRA | ENA, GEO |
| Nucleotide sequences (European archive) | ENA | SRA, NCBI Gene |
| Genome assemblies, raw reads (European) | ENA | SRA, Ensembl |
| Cross-references from sequence accessions | ENA (xref) | NCBI Gene, UniProt |
| Genome annotations, tracks | UCSC Genome Browser | Ensembl |
| 3D protein structures (experimental) | PDB (RCSB) | EMDB |
| 3D protein structures (predicted) | AlphaFold DB | PDB |
| EM maps, cryo-EM structures | EMDB | PDB |
| Protein families, domains | InterPro | UniProt |
| Chemical entities (biological) | ChEBI | PubChem |
| Protein/genetic interactions | BioGRID | STRING |
| Gene function annotations (GO terms) | QuickGO | Gene Ontology |
| Regulatory elements, ChIP-seq, ATAC-seq | ENCODE | — |
| TF binding profiles/motifs | JASPAR | ENCODE |
| Protein expression across tissues | Human Protein Atlas | UniProt |
| Single-cell atlas projects | Human Cell Atlas | — |
| Proteomics datasets | PRIDE | — |
| Mouse gene data | MouseMine | NCBI Gene |
| Plasmid repository | Addgene | — |
Organism/species matters. Most biology databases cover multiple organisms. If the user's query is about a specific organism, pass it explicitly; don't assume human. Common patterns:
- Ensembl uses `{species}` in the URL path (e.g. `homo_sapiens`).
- STRING, BioGRID and QuickGO use NCBI taxon IDs (`species=9606` for human, `10090` for mouse).
- UniProt uses `organism_id:9606` in search queries.
- KEGG uses organism codes (`hsa`, `mmu`).

GTEx and Human Protein Atlas are human-only. Check the reference file for each database's specific parameter.
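To avoid silently defaulting to human, you can keep the per-database organism values in one lookup. This is a hypothetical bash sketch: the function name `species_param` and its keys are illustrative, and only human and mouse are covered. The values are the ones listed above, plus Ensembl's `mus_musculus` species name for mouse.

```bash
# Hypothetical helper: print the organism value a database family expects.
# Usage: species_param <ensembl|taxon|kegg> <human|mouse>
species_param() {
  local db="$1" organism="$2"
  case "$db:$organism" in
    ensembl:human) echo "homo_sapiens" ;;   # goes in the Ensembl URL path
    ensembl:mouse) echo "mus_musculus" ;;
    taxon:human)   echo "9606" ;;           # NCBI taxon ID (STRING, BioGRID, QuickGO, UniProt organism_id)
    taxon:mouse)   echo "10090" ;;
    kegg:human)    echo "hsa" ;;            # KEGG organism code
    kegg:mouse)    echo "mmu" ;;
    *) echo "unknown database/organism: $db/$organism" >&2; return 1 ;;
  esac
}
```

For example, a mouse Ensembl lookup could be built as `"https://rest.ensembl.org/xrefs/symbol/$(species_param ensembl mouse)/Trp53?content-type=application/json"`. A non-zero return signals an organism you haven't mapped yet. Check the reference file in that case rather than guessing.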
Disease & Clinical
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Somatic mutations in cancer | COSMIC | Open Targets, cBioPortal |
| Cancer genomics (TCGA) | GDC (TCGA) | COSMIC, cBioPortal |
| Cancer study mutations, CNA, expression | cBioPortal | GDC (TCGA), COSMIC |
| Tumor clinical data (survival, staging) | cBioPortal | GDC (TCGA) |
| Drug-target-disease associations | Open Targets | ChEMBL |
| Gene-disease associations | DisGeNET | Open Targets, Monarch |
| Mendelian disease-gene relationships | OMIM | NCBI Gene |
| Variant clinical significance | ClinVar (NCBI) | OMIM |
| GWAS SNP-trait associations | GWAS Catalog | — |
| Disease-phenotype-gene links | Monarch Initiative | HPO |
| Phenotype ontology, HPO terms | HPO | Monarch |
| Pharmacogenomics, drug-gene interactions | ClinPGx (PharmGKB) | DrugBank |
| Clinical trials for a drug/disease | ClinicalTrials.gov | FDA |
| Disease-related expression data | GEO | Open Targets |
Patents & Regulatory
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Patents by keyword or technology | USPTO (PatentsView) | — |
| Patents by inventor or assignee | USPTO (PatentsView) | — |
| Patent prosecution status | USPTO (PEDS) | — |
| Trademark lookup | USPTO (TSDR) | — |
| SEC company filings, 10-K, 10-Q | SEC EDGAR | — |
Economics & Finance
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| US economic time series (GDP, CPI, rates) | FRED | BEA |
| Employment, wages, labor statistics | BLS | FRED |
| GDP, national accounts | BEA | FRED, World Bank |
| International development indicators | World Bank | FRED |
| Interest rates, money supply | Federal Reserve | FRED |
| Euro exchange rates, ECB monetary stats | ECB | — |
| US debt, yield curves, fiscal data | US Treasury | FRED |
| Stock prices, forex, crypto | Alpha Vantage | — |
| Statistical data across many topics | Data Commons | — |
Social Sciences & Demographics
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| US population, housing, income data | US Census | Data Commons |
| EU statistics (economy, trade, health) | Eurostat | World Bank |
| Global health indicators (mortality, disease) | WHO GHO | World Bank |
Cross-domain queries
| User is asking about... | Primary database(s) | Also consider |
|---|---|---|
| Everything about a compound | PubChem + ChEMBL + DrugBank | BindingDB, ZINC, Reactome, FDA |
| Everything about a gene | NCBI Gene + UniProt + Ensembl | Reactome, STRING, COSMIC, cBioPortal, ENA |
| Everything about a variant | dbSNP + ClinVar + gnomAD | GWAS Catalog, COSMIC, cBioPortal |
| Drug target pathways | ChEMBL + Reactome | Open Targets, GEO |
| Prior art for a chemical invention | USPTO + PubChem | ChEMBL |
| Everything about a material | Materials Project + COD | — |
| US economic overview | FRED + BLS + BEA | Federal Reserve |
When the user's query spans multiple domains (e.g. "what do we know about aspirin" or "find everything about BRCA1"), query all relevant databases in parallel.
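One way to run those parallel queries from a shell is to background one `curl` per database and `wait` on each job. This is a minimal bash sketch. The function name `parallel_fetch` and the `name=url` argument convention are hypothetical, and the real endpoint URLs come from each database's reference file.

```bash
# Fetch several endpoints concurrently, one output file per database.
# Usage: parallel_fetch OUTDIR name=url [name=url ...]
parallel_fetch() {
  local outdir="$1"; shift
  mkdir -p "$outdir"
  local pair name url
  local pids=() names=()
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first '='
    url="${pair#*=}"     # everything after it (query strings may contain '=')
    curl -s -H "Accept: application/json" -o "$outdir/$name.json" "$url" &
    pids+=("$!"); names+=("$name")
  done
  # Wait on every job and report failures instead of dropping them silently.
  # Without --fail, curl only exits non-zero on transport errors, not HTTP 4xx/5xx.
  local i status=0
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
      echo "failed: ${names[$i]}" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `parallel_fetch out pubchem="$PUBCHEM_URL" reactome="$REACTOME_URL"` (both variables hypothetical) leaves `out/pubchem.json` and `out/reactome.json` ready for the Results section. An HTTP error page still produces a file, so check each body as well as the exit status.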
Common Identifier Formats
Different databases use different identifier systems. If a query fails, the identifier format may be wrong. Here's a quick reference:
| Identifier | Format | Example | Used by |
|---|---|---|---|
| UniProt accession | | | UniProt, STRING, AlphaFold, Reactome mapping |
| Ensembl gene ID | | | Ensembl, Open Targets, GTEx |
| NCBI Gene ID | Integer | | NCBI Gene, GEO, DisGeNET, HPO |
| HGNC ID | | | Monarch |
| PubChem CID | Integer | | PubChem |
| ZINC ID | | | ZINC |
| ENA Project | | | ENA |
| ENA Run | | | ENA |
| ENA Experiment | | | ENA |
| ENA Sample | | | ENA |
| ChEMBL ID | | | ChEMBL |
| Reactome stable ID | | | Reactome |
| HP term | | | HPO (URL-encode colon as %3A) |
| MONDO disease | | | Monarch |
| GO term | | | QuickGO, Gene Ontology |
| dbSNP rsID | | | dbSNP, GWAS Catalog, gnomAD |
| GENCODE ID | | | GTEx (requires version suffix) |
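When a query fails, a quick pattern check can suggest which identifier system a string most likely belongs to before you convert it. This is a rough bash sketch, and its patterns are simplified assumptions rather than official format specifications:
- Only human `ENSG` gene IDs are covered.
- A bare integer is ambiguous between an NCBI Gene ID and a PubChem CID.
- The UniProt pattern follows UniProt's published accession format.

```bash
# Rough guess at an identifier's system; patterns are simplified assumptions.
id_type() {
  local id="$1"
  # UniProt accession: 6-character or 10-character forms.
  local uniprot_re='^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$'
  # Versioned Ensembl gene ID, as GTEx expects (e.g. ENSG...<digits>.<version>).
  local gencode_re='^ENSG[0-9]{11}\.[0-9]+$'
  if   [[ $id =~ ^rs[0-9]+$ ]];       then echo "dbsnp_rsid"
  elif [[ $id =~ ^HP:[0-9]{7}$ ]];    then echo "hpo_term"
  elif [[ $id =~ ^GO:[0-9]{7}$ ]];    then echo "go_term"
  elif [[ $id =~ ^MONDO:[0-9]{7}$ ]]; then echo "mondo_disease"
  elif [[ $id =~ ^HGNC:[0-9]+$ ]];    then echo "hgnc_id"
  elif [[ $id =~ ^CHEMBL[0-9]+$ ]];   then echo "chembl_id"
  elif [[ $id =~ $gencode_re ]];      then echo "gencode_id"
  elif [[ $id =~ ^ENSG[0-9]{11}$ ]];  then echo "ensembl_gene_id"
  elif [[ $id =~ ^ZINC[0-9]+$ ]];     then echo "zinc_id"
  elif [[ $id =~ $uniprot_re ]];      then echo "uniprot_accession"
  elif [[ $id =~ ^[0-9]+$ ]];         then echo "integer (NCBI Gene ID or PubChem CID)"
  else echo "unknown"   # e.g. a gene symbol: resolve it as described below
  fi
}
```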
Identifier Resolution
When a database doesn't recognize an identifier, convert it using these workflows:
Genes: Symbol (e.g. "TP53") → look up in NCBI Gene (esearch by symbol) → get NCBI Gene ID. To get an Ensembl ID, query Ensembl `/xrefs/symbol/homo_sapiens/{symbol}`. For a UniProt accession, use UniProt search (`gene_exact:{symbol} AND organism_id:9606`). Both examples are human-specific.
Compounds: Name → PubChem `/compound/name/{name}/cids/JSON` → get CID → convert to ChEMBL ID via UniChem or ChEMBL molecule search. If name lookup fails, try SMILES, InChIKey, or CAS number.
Variants: rsID (e.g. "rs334") works directly in dbSNP, ClinVar, GWAS Catalog, gnomAD. For genomic coordinates, use Ensembl VEP to get consequence annotations and linked rsIDs.
Diseases: Name → Open Targets or Monarch search → get EFO or MONDO ID → use in downstream queries.
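The gene and compound conversions above can be sketched as URL builders that you then pass to `curl`. This assumes the public REST roots `https://rest.ensembl.org`, `https://rest.uniprot.org` and `https://pubchem.ncbi.nlm.nih.gov/rest/pug`. The function names are hypothetical, and each database's reference file takes precedence over anything here. The functions only print URLs, so you can inspect a URL before fetching it.

```bash
# Hypothetical URL builders for identifier resolution; they print, not fetch.

# Gene symbol -> Ensembl gene ID(s). Species defaults to human.
ensembl_symbol_url() {
  local symbol="$1" species="${2:-homo_sapiens}"
  printf 'https://rest.ensembl.org/xrefs/symbol/%s/%s?content-type=application/json\n' \
    "$species" "$symbol"
}

# Gene symbol -> UniProt accession via an exact gene-name search.
# '+' encodes the spaces around AND in the query string; taxon defaults to human.
uniprot_gene_url() {
  local symbol="$1" taxon="${2:-9606}"
  printf 'https://rest.uniprot.org/uniprotkb/search?query=gene_exact:%s+AND+organism_id:%s&format=json\n' \
    "$symbol" "$taxon"
}

# Compound name -> PubChem CID(s). Names with spaces or parentheses
# must be percent-encoded before they go into the path.
pubchem_cids_url() {
  printf 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/%s/cids/JSON\n' "$1"
}
```

Typical use: `curl -s -H "Accept: application/json" "$(ensembl_symbol_url TP53)"`.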
POST-Only APIs
These databases need requests that WebFetch (GET-only) cannot make: HTTP POST bodies or, in SEC EDGAR's case, a custom User-Agent header. Call them with `curl` via your platform's shell tool instead:
| Database | Why curl is needed | Example |
|---|---|---|
| Open Targets | GraphQL endpoint | |
| gnomAD | GraphQL endpoint | |
| RummaGEO | POST-only enrichment | |
| GDC/TCGA | Complex filter queries | |
| SEC EDGAR | Requires User-Agent header | |
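Here is a minimal sketch of a GraphQL POST to Open Targets with `curl`. The endpoint `https://api.platform.opentargets.org/api/v4/graphql` and the `target(ensemblId:)` / `approvedSymbol` fields reflect the public Open Targets Platform API as we understand it. Confirm them against the Open Targets reference file before relying on this. The function names are hypothetical.

```bash
# Hypothetical helpers for a single Open Targets GraphQL lookup.
opentargets_target_body() {
  local ensembl_id="$1"
  # Accept only plain IDs, so no JSON escaping beyond the inner quotes is needed.
  if [[ ! $ensembl_id =~ ^[A-Za-z0-9.]+$ ]]; then
    echo "unexpected Ensembl ID: $ensembl_id" >&2
    return 1
  fi
  # GraphQL query wrapped in a JSON body; \\" makes printf emit an escaped quote.
  printf '{"query": "query { target(ensemblId: \\"%s\\") { id approvedSymbol } }"}\n' "$ensembl_id"
}

opentargets_query() {
  local body
  body=$(opentargets_target_body "$1") || return 1
  # --data makes curl send a POST with this body, which WebFetch cannot do.
  curl -s \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    --data "$body" \
    "https://api.platform.opentargets.org/api/v4/graphql"
}
```

`opentargets_query ENSG00000141510` would then print the raw JSON response. For SEC EDGAR, the fix is a header rather than a method: send an ordinary GET with `-H "User-Agent: Your Name you@example.com"`.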
API Keys and Access Restrictions
Some databases require API keys or have access restrictions. When an API key is needed:
- Check the current environment first: the key may already be exported as a shell environment variable (e.g. `$FRED_API_KEY`). Read it directly from the environment.
- Fall back to `.env`: if the variable isn't in the environment, check the `.env` file in the current working directory.
- If neither has it: proceed without the key (most APIs still work at lower rate limits) and tell the user which key is missing and how to get one.
Databases requiring API keys (free registration)
| Database | Env Variable | Registration URL |
|---|---|---|
| FRED | `FRED_API_KEY` | https://fred.stlouisfed.org/docs/api/api_key.html |
| BEA | `BEA_API_KEY` | https://apps.bea.gov/API/signup/ |
| BLS | | https://data.bls.gov/registrationEngine/ |
| NCBI (GEO, Gene) | | https://www.ncbi.nlm.nih.gov/account/settings/ |
| OpenFDA | | https://open.fda.gov/apis/authentication/ |
| USPTO (PatentsView) | | https://patentsview.org/apis/keyrequest |
| Data Commons | | Google Cloud Console |
| Materials Project | | https://materialsproject.org (free account) |
| NASA | | https://api.nasa.gov (free, DEMO_KEY available) |
| NOAA (CDO) | | https://www.ncdc.noaa.gov/cdo-web/token |
| OpenWeatherMap | | https://openweathermap.org/appid |
| OMIM | | https://omim.org/api (free academic) |
| BioGRID | | https://webservice.thebiogrid.org (free) |
| Alpha Vantage | | https://www.alphavantage.co/support/#api-key |
| US Census | | https://api.census.gov/data/key_signup.html |
| DisGeNET | | https://www.disgenet.org (free academic) |
| Addgene | | https://www.addgene.org (free account) |
| LINCS L1000 (CLUE) | | https://clue.io (free academic) |
These are all free to obtain. Most of these APIs also work without a key, at lower rate limits. Always try with a key first. If the key is in neither the environment nor `.env`, proceed without it and note in your response that rate limits may be lower.
Databases with paid or restricted access
| Database | Restriction | Free alternative |
|---|---|---|
| DrugBank | Paid API license required | Use ChEMBL + PubChem + OpenFDA instead |
| COSMIC | Free academic registration required (JWT auth) | Use Open Targets for cancer mutation data |
| BRENDA | Free registration required (SOAP, not REST) | Use KEGG for enzyme/pathway data |
When a database requires paid access or registration the user hasn't set up:
- Fall back to a free alternative that can answer the same question
- Tell the user which database you couldn't access, why, and what you used instead
- If the user specifically requests a restricted database, explain the access requirements so they can set it up
Loading API keys
Step 1: Check the current environment. The key may already be exported as a shell variable. For example, in Claude Code you can check with Bash: `echo $FRED_API_KEY`. If the variable is set and non-empty, use it.
Step 2: Check the `.env` file. If the environment variable isn't set, read `.env` from the current working directory. Format:
```
FRED_API_KEY=your_key_here
BEA_API_KEY=your_key_here
```
Step 3: Proceed without. If neither source has the key, proceed without it (most APIs still work at lower rate limits) and mention this to the user.
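The three steps can be wrapped in one bash function. This sketch rests on three assumptions:
- The name `load_api_key` is hypothetical.
- `.env` lines are plain `NAME=value`, optionally double-quoted.
- `export NAME=value` lines are not recognized.

```bash
# Sketch of the key lookup: environment, then ./.env, then none.
# Prints the key on stdout; returns 1 with a note on stderr if not found.
load_api_key() {
  local name="$1" line value
  # Step 1: a non-empty environment variable wins (bash indirect expansion).
  if [[ -n "${!name:-}" ]]; then
    printf '%s\n' "${!name}"
    return 0
  fi
  # Step 2: look for NAME=value in .env in the current working directory.
  if [[ -f .env ]]; then
    while IFS= read -r line || [[ -n "$line" ]]; do
      if [[ "$line" == "$name="* ]]; then
        value="${line#*=}"
        value="${value%\"}"; value="${value#\"}"   # drop optional double quotes
        if [[ -n "$value" ]]; then
          printf '%s\n' "$value"
          return 0
        fi
      fi
    done < .env
  fi
  # Step 3: proceed without a key, but say so.
  echo "note: $name not set; continuing without it (lower rate limits)" >&2
  return 1
}
```

For example, `FRED_KEY=$(load_api_key FRED_API_KEY) || FRED_KEY=""` lets you add the key to the request (for FRED, via its `api_key` parameter) only when `FRED_KEY` is non-empty.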
Making API Calls
Use your environment's HTTP fetch tool to call REST endpoints. The tool name varies by platform:
| Platform | HTTP Fetch Tool | Fallback |
|---|---|---|
| Claude Code | WebFetch | `curl` via Bash |
| Gemini CLI | | `curl` via shell |
| Windsurf | | `curl` via terminal |
| Cursor | No dedicated fetch tool | |
| Codex CLI | No dedicated fetch tool | |
| Cline | No dedicated fetch tool | |
If you don't recognize your platform or the fetch tool fails, fall back to `curl` via whatever shell/terminal tool is available. Example:
```bash
curl -s -H "Accept: application/json" "https://api.example.com/endpoint"
```
Request guidelines
- Set the `Accept: application/json` header where supported.
- URL-encode special characters in query parameters. SMILES strings (`/`, `#`, `=`, `@`), compound names with parentheses, and ontology terms with colons (`HP:0001250` → `HP%3A0001250`) are common sources of failures. With `curl`, use `--data-urlencode` for safety. Combine it with `-G` so the encoded values are appended to the URL of a GET request instead of being sent as a POST body.
- Parallel OK: When querying different databases (e.g., PubChem + ChEMBL + Reactome), run them in parallel; most APIs have generous rate limits.
- Serialize requests to rate-limited APIs: NCBI APIs (Gene, GEO, Protein, Taxonomy, dbSNP, SRA) allow 3 req/sec without a key and 10 req/sec with one. Also watch: Ensembl (15 req/sec), BLS v1 (25 req/day without key), SEC EDGAR (10 req/sec), NOAA (5 req/sec with token).
- If you get a rate-limit error (HTTP 429 or 503), wait briefly and retry once.
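`--data-urlencode` only helps with query-string parameters. When an identifier has to go into the URL path (for instance PubChem's `/compound/name/{name}/...` style), it needs percent-encoding first. Here is a minimal pure-bash sketch, assuming ASCII input (SMILES strings and ontology IDs usually are). It encodes everything except the RFC 3986 unreserved characters. The function name `urlencode` is hypothetical.

```bash
# Percent-encode every byte except RFC 3986 unreserved characters
# (A-Z a-z 0-9 - . _ ~). Assumes ASCII input.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # "'$c" makes printf print the character's numeric code, here as %XX hex.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example: `curl -s "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/$(urlencode 'acetylsalicylic acid')/cids/JSON"`. For query-string parameters, `curl -G --data-urlencode` remains the simpler choice.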
Error recovery
If an API returns an error or empty results:
- Check the identifier format — use the Common Identifier Formats table above. A gene symbol may need to be converted to NCBI Gene ID or Ensembl ID first.
- Try alternative identifiers — if a compound name fails in PubChem, try SMILES, InChIKey, or CID. If a gene symbol fails, try the NCBI Gene ID.
- Try a different database — if one database is down or returns nothing, check the "Also consider" column in the selection guide for alternatives.
- Report the failure — tell the user which database failed, the error, and what you tried instead.
Pagination
Many APIs return paginated results — if you only read the first page, you may miss data. Common patterns:
- Offset/Limit: `offset=0&limit=100` → increment offset by limit for the next page (ChEMBL, FRED, NOAA, USGS, NCBI E-utilities, ENA, GDC, FDA)
- Cursor-based: Response includes a `nextPageToken` or `cursor` value; pass it in the next request (ClinicalTrials.gov, UniProt)
- Page number: `page=1&per_page=50` → increment page (World Bank, cBioPortal, ZINC)

Check the reference file for each database's specific pagination parameters. If a response includes `total`, `totalCount`, or `next` and the number of returned results is less than the total, there are more pages.
For targeted lookups (single gene, single compound), the first page is usually sufficient. Paginate when the user needs comprehensive results (e.g., "all clinical trials for X" or "all known variants in gene Y").
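The offset/limit pattern can be sketched in bash. `page_offsets` is pure arithmetic. `fetch_all_pages` is a hypothetical wrapper that makes three assumptions:
- `jq` is installed.
- The base URL already carries a `?` query string.
- You pass the JSON path of the total count for that API. For ChEMBL we assume `.page_meta.total_count`.

```bash
# Offsets covering TOTAL results at LIMIT per page: 0, LIMIT, 2*LIMIT, ...
page_offsets() {
  local total="$1" limit="$2" offset
  (( limit > 0 )) || return 1   # guard against an endless loop
  for (( offset = 0; offset < total; offset += limit )); do
    echo "$offset"
  done
}

# Hypothetical wrapper: print every page of an offset/limit API.
fetch_all_pages() {
  local base_url="$1" limit="$2" total_path="$3" first total offset
  first=$(curl -s "${base_url}&limit=${limit}&offset=0") || return 1
  printf '%s\n' "$first"
  total=$(jq -r "$total_path" <<< "$first")
  for offset in $(page_offsets "$total" "$limit"); do
    (( offset == 0 )) && continue   # first page was already printed
    sleep 0.4                       # spread requests out for rate-limited APIs
    curl -s "${base_url}&limit=${limit}&offset=${offset}"
    echo
  done
}
```

Under those assumptions, `fetch_all_pages "https://www.ebi.ac.uk/chembl/api/data/activity.json?molecule_chembl_id=CHEMBL25" 100 '.page_meta.total_count'` would print each page in turn. Cursor-based APIs need a different loop that follows the returned token instead of computing offsets.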
Output Format
Structure your response like this:
```
Databases Queried
- PubChem: /compound/name/aspirin/property/...
- Reactome: /search/query?query=aspirin

Results

PubChem
[raw JSON response]

Reactome
[raw JSON response]
```
If results are very large, present the most relevant portion and note that additional data is available. By default, though, show the full raw JSON: the user asked for it.
Adding New Databases
This skill is designed to grow. Each database is a self-contained reference file in `references/`. To add a new database:
- Create `references/<database-name>.md` following the same format as existing files
- Add an entry to the database selection guide above
- The reference file should include: base URL, key endpoints, query parameter formats, example calls, rate limits, and response structure
Available Databases
Read the relevant reference file before making any API call.
Physics & Astronomy
| Database | Reference File | What it covers |
|---|---|---|
| NASA | | NEO asteroids, Mars rover, APOD |
| NASA Exoplanet Archive | | Exoplanets, orbital parameters |
| NIST | | Physical constants, atomic spectra |
| SDSS | | Galaxy/star spectra, photometry |
| SIMBAD | | Astronomical object catalog |
Earth & Environmental Sciences
| Database | Reference File | What it covers |
|---|---|---|
| USGS | | Earthquakes, water data |
| NOAA | | Climate, weather station data |
| EPA | | Air quality, toxic releases |
| OpenWeatherMap | | Weather current/forecast |
Chemistry & Drugs
| Database | Reference File | What it covers |
|---|---|---|
| PubChem | | Compounds, properties, synonyms |
| ChEMBL | | Bioactivity, drug discovery |
| DrugBank | | Drug data, interactions (paid) |
| FDA (OpenFDA) | | Drug labels, adverse events, recalls |
| DailyMed | | Drug labels (NIH/NLM) |
| KEGG | | Pathways, genes, compounds |
| ChEBI | | Chemical entities of biological interest |
| ZINC | | Commercially available compounds, virtual screening |
| BindingDB | | Experimentally measured binding affinities |
Materials Science
| Database | Reference File | What it covers |
|---|---|---|
| Materials Project | | Band gaps, elastic properties, crystal structures |
| COD | | Crystal structures, CIF files |
Biology & Genomics
| Database | Reference File | What it covers |
|---|---|---|
| Reactome | | Biological pathways, reactions |
| BRENDA | | Enzyme kinetics, catalysis (SOAP) |
| UniProt | | Protein sequences, function |
| STRING | | Protein-protein interactions |
| Ensembl | | Genomes, variants, sequences |
| NCBI Gene | | Gene information, links |
| NCBI Protein | | Protein sequences, records |
| NCBI Taxonomy | | Taxonomic classification |
| GEO (NCBI) | | Gene expression datasets |
| GTEx | | Gene expression across tissues |
| PDB | | Protein 3D structures |
| AlphaFold DB | | Predicted protein structures |
| EMDB | | Electron microscopy maps |
| InterPro | | Protein families, domains |
| BioGRID | | Protein/genetic interactions |
| Gene Ontology | | GO terms, gene annotations |
| QuickGO | | GO annotations (EBI, recommended) |
| dbSNP | | SNP/variant data |
| SRA | | Sequencing run metadata |
| gnomAD | | Population variant frequencies (POST) |
| UCSC Genome Browser | | Genome annotations, tracks |
| ENCODE | | DNA elements, ChIP-seq, ATAC-seq |
| JASPAR | | TF binding profiles/motifs |
| Human Protein Atlas | | Protein expression across tissues |
| Human Cell Atlas | | Single-cell atlas data |
| LINCS L1000 | | Gene expression signatures (CMap) |
| RummaGEO | | GEO gene set enrichment (POST) |
| PRIDE | | Proteomics data repository |
| Metabolomics Workbench | | Metabolomics studies, metabolites |
| MouseMine | | Mouse genome informatics |
| ENA | | Nucleotide sequences, reads, assemblies, taxonomy (EMBL-EBI) |
| Addgene | | Plasmid repository |
Disease & Clinical
| Database | Reference File | What it covers |
|---|---|---|
| Open Targets | | Target-disease associations (POST) |
| COSMIC | | Somatic mutations in cancer |
| ClinPGx (PharmGKB) | | Pharmacogenomics |
| ClinicalTrials.gov | | Clinical trial registry |
| OMIM | | Mendelian disease-gene data |
| ClinVar | | Variant clinical significance |
| GDC (TCGA) | | Cancer genomics, mutations (POST) |
| cBioPortal | | Cancer study mutations, CNA, expression, clinical data |
| DisGeNET | | Gene-disease associations |
| GWAS Catalog | | GWAS SNP-trait associations |
| Monarch Initiative | | Disease-phenotype-gene links |
| HPO | | Human Phenotype Ontology |
Patents & Regulatory
| Database | Reference File | What it covers |
|---|---|---|
| USPTO | | Patents, trademarks |
| SEC EDGAR | | Company filings (needs User-Agent header) |
Economics & Finance
| Database | Reference File | What it covers |
|---|---|---|
| FRED | | US economic time series |
| Federal Reserve | | Monetary/financial data |
| BEA | | GDP, national accounts |
| BLS | | Employment, wages, CPI |
| World Bank | | Development indicators |
| ECB | | Euro exchange rates, monetary stats |
| US Treasury | | Debt, yield curves, fiscal data |
| Alpha Vantage | | Stocks, forex, crypto |
| Data Commons | | Statistical knowledge graph |
Social Sciences & Demographics
| Database | Reference File | What it covers |
|---|---|---|
| US Census | | Population, housing, economic surveys |
| Eurostat | | EU statistics |
| WHO GHO | | Global health indicators |