primekg

PrimeKG Knowledge Graph Skill

Overview

PrimeKG is a precision medicine knowledge graph that integrates over 20 primary databases and high-quality scientific literature into a single resource. It contains roughly 129,000 nodes and about 4 million edges across 29 relationship types, including drug-target, disease-gene, and phenotype-disease associations.
Key capabilities:
  • Search for nodes (genes, proteins, drugs, diseases, phenotypes)
  • Retrieve direct neighbors (associated entities and clinical evidence)
  • Analyze local disease context (related genes, drugs, phenotypes)
  • Identify drug-disease paths (potential repurposing opportunities)
Data access: Programmatic access via `query_primekg.py`. Data is stored at `C:\Users\eamon\Documents\Data\PrimeKG\kg.csv`.
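
As a rough illustration of what programmatic access to the edge list involves, the sketch below reads a CSV in the kg.csv layout using only the standard library. The column names (`relation`, `x_id`, `x_type`, `x_name`, `y_id`, `y_type`, `y_name`) are an assumed subset of the PrimeKG header and should be checked against the local file. `load_edges` is a hypothetical helper, not a function from `query_primekg.py`, and the rows are toy data.

```python
import csv
import io

# Toy rows in an assumed subset of the kg.csv header; verify against the local file.
SAMPLE_CSV = """relation,x_id,x_type,x_name,y_id,y_type,y_name
drug_protein,toy:drug1,drug,Donepezil,toy:gene2,gene/protein,ACHE
disease_gene,EFO_0000249,disease,Alzheimer's disease,toy:gene1,gene/protein,APOE
"""


def load_edges(handle):
    """Read edges from an open CSV file object into a list of dicts."""
    return list(csv.DictReader(handle))


# In real use, pass open(path_to_kg_csv, newline="", encoding="utf-8") instead.
edges = load_edges(io.StringIO(SAMPLE_CSV))
print(len(edges), edges[0]["relation"])
```

With about 4 million edges, loading every row as a dict is memory-hungry; a pandas DataFrame is likely a better fit for the full file.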

When to Use This Skill

Use this skill for:
  • Knowledge-based drug discovery: Identifying targets and mechanisms for diseases.
  • Drug repurposing: Finding existing drugs that might have evidence for new indications.
  • Phenotype analysis: Understanding how symptoms/phenotypes relate to diseases and genes.
  • Multiscale biology: Bridging the gap between molecular targets (genes) and clinical outcomes (diseases).
  • Network pharmacology: Investigating the broader network effects of drug-target interactions.

Core Workflow

1. Search for Entities

Find identifiers for genes, drugs, or diseases.

```python
from scripts.query_primekg import search_nodes

# Search for Alzheimer's disease nodes
results = search_nodes("Alzheimer", node_type="disease")
# Returns: [{"id": "EFO_0000249", "type": "disease", "name": "Alzheimer's disease", ...}]
```
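
How `search_nodes` matches its query is not documented here. One plausible behavior, sketched below, is a case-insensitive substring match on node names with an optional type filter. `search_nodes_sketch` and the toy node table are hypothetical; only the `id`/`type`/`name` keys and the `node_type` parameter come from the example above.

```python
# Toy node table; IDs other than EFO_0000249 are placeholders.
NODES = [
    {"id": "EFO_0000249", "type": "disease", "name": "Alzheimer's disease"},
    {"id": "toy:gene1", "type": "gene/protein", "name": "APOE"},
    {"id": "toy:drug1", "type": "drug", "name": "Donepezil"},
]


def search_nodes_sketch(nodes, query, node_type=None):
    """Case-insensitive substring match on names, optionally filtered by type."""
    q = query.lower()
    return [
        n for n in nodes
        if q in n["name"].lower() and (node_type is None or n["type"] == node_type)
    ]


hits = search_nodes_sketch(NODES, "alzheimer", node_type="disease")
print(hits)
```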

2. Get Neighbors (Direct Associations)

Retrieve all connected nodes and relationship types.

```python
from scripts.query_primekg import get_neighbors

# Get all neighbors of a specific disease ID
neighbors = get_neighbors("EFO_0000249")
# Returns: List of neighbors like {"neighbor_name": "APOE", "relation": "disease_gene", ...}
```
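
The neighbor lookup can be pictured as a scan over the edge list that treats each edge as undirected. The sketch below is an assumption about the general approach, not the code in `query_primekg.py`. The `neighbor_name` and `relation` keys follow the example above, while `neighbor_id`, the `x_*`/`y_*` edge fields, and the toy edges are illustrative.

```python
# Toy edges; IDs other than EFO_0000249 are placeholders.
EDGES = [
    {"relation": "disease_gene", "x_id": "EFO_0000249", "x_name": "Alzheimer's disease",
     "y_id": "toy:gene1", "y_name": "APOE"},
    {"relation": "drug_disease", "x_id": "toy:drug1", "x_name": "Donepezil",
     "y_id": "EFO_0000249", "y_name": "Alzheimer's disease"},
]


def get_neighbors_sketch(edges, node_id, relation_type=None):
    """Return neighbors of node_id, matching it on either end of an edge."""
    out = []
    for e in edges:
        if relation_type is not None and e["relation"] != relation_type:
            continue
        if e["x_id"] == node_id:
            other_id, other_name = e["y_id"], e["y_name"]
        elif e["y_id"] == node_id:
            other_id, other_name = e["x_id"], e["x_name"]
        else:
            continue
        out.append({"neighbor_id": other_id, "neighbor_name": other_name,
                    "relation": e["relation"]})
    return out


all_neighbors = get_neighbors_sketch(EDGES, "EFO_0000249")
drugs_only = get_neighbors_sketch(EDGES, "EFO_0000249", relation_type="drug_disease")
print([n["neighbor_name"] for n in all_neighbors])
```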

3. Analyze Disease Context

Use this high-level function to summarize the associations for a disease. Note that the example passes a disease name rather than an ID.

```python
from scripts.query_primekg import get_disease_context

# Comprehensive summary for a disease
context = get_disease_context("Alzheimer's disease")
# Access: context['associated_genes'], context['associated_drugs'], context['phenotypes']
```
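
Conceptually, a disease summary of this kind groups a disease's neighbors by relation type. The sketch below shows one way to do that, assuming the three result keys named above map onto the `disease_gene`, `drug_disease`, and `disease_phenotype` relations. That mapping, the `summarize_context` helper, and the toy neighbor list are assumptions for illustration.

```python
# Assumed mapping from relation type to the summary keys shown above.
RELATION_TO_KEY = {
    "disease_gene": "associated_genes",
    "drug_disease": "associated_drugs",
    "disease_phenotype": "phenotypes",
}


def summarize_context(neighbors):
    """Group neighbor names under the summary key for their relation type."""
    context = {key: [] for key in RELATION_TO_KEY.values()}
    for n in neighbors:
        key = RELATION_TO_KEY.get(n["relation"])
        if key is not None:  # relations outside the mapping are ignored
            context[key].append(n["neighbor_name"])
    return context


# Toy neighbor list for illustration only.
toy_neighbors = [
    {"neighbor_name": "APOE", "relation": "disease_gene"},
    {"neighbor_name": "Donepezil", "relation": "drug_disease"},
    {"neighbor_name": "Memory impairment", "relation": "disease_phenotype"},
    {"neighbor_name": "toy neighbor", "relation": "protein_protein"},
]
context = summarize_context(toy_neighbors)
print(context["associated_genes"])
```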

Relationship Types in PrimeKG

The graph contains several key relationship types, including:
  • `protein_protein`: Physical protein-protein interactions (PPIs)
  • `drug_protein`: Drug target/mechanism associations
  • `disease_gene`: Genetic associations
  • `drug_disease`: Indications and contraindications
  • `disease_phenotype`: Clinical signs and symptoms
  • `gwas`: Evidence from genome-wide association studies
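
Relation names can differ between releases and wrappers, so it may be worth tallying the `relation` column of the local file before filtering on a name. A minimal sketch, assuming edges are available as dicts with a `relation` key (`relation_counts` is a hypothetical helper and the edges are toy data):

```python
from collections import Counter


def relation_counts(edges):
    """Count how many edges carry each relation name."""
    return Counter(e["relation"] for e in edges)


# Toy edges for illustration only.
toy_edges = [
    {"relation": "drug_protein"},
    {"relation": "disease_gene"},
    {"relation": "drug_protein"},
]
counts = relation_counts(toy_edges)
print(counts.most_common())
```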

Best Practices

  1. Use specific IDs: When using `get_neighbors`, ensure you have the correct ID from `search_nodes`.
  2. Context first: Use `get_disease_context` for a broad overview before diving into specific genes or drugs.
  3. Filter relationships: Use the `relation_type` filter in `get_neighbors` to focus on specific evidence (e.g., only `drug_protein`).
  4. Multiscale integration: Combine with OpenTargets for deeper genetic evidence or Semantic Scholar for the latest literature context.
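
The Overview lists drug-disease path finding among the capabilities, but the workflow does not show it. One simple form is a two-hop path, drug to protein to disease, through a shared gene/protein node. The sketch below assumes edges are `(relation, x_id, y_id)` tuples with the drug on the x side of `drug_protein` edges and the disease on the x side of `disease_gene` edges. That orientation, the `drug_disease_paths` helper, and the toy IDs are all assumptions.

```python
# Toy edges as (relation, x_id, y_id); IDs other than EFO_0000249 are placeholders.
EDGES = [
    ("drug_protein", "toy:drugA", "toy:gene1"),
    ("drug_protein", "toy:drugB", "toy:gene2"),
    ("disease_gene", "EFO_0000249", "toy:gene1"),
]


def drug_disease_paths(edges, disease_id):
    """Find drug -> protein -> disease paths through shared gene/protein nodes."""
    # Genes/proteins associated with the disease.
    disease_genes = {y for rel, x, y in edges
                     if rel == "disease_gene" and x == disease_id}
    # Drugs that target any of those genes/proteins.
    return [
        (drug, gene, disease_id)
        for rel, drug, gene in edges
        if rel == "drug_protein" and gene in disease_genes
    ]


paths = drug_disease_paths(EDGES, "EFO_0000249")
print(paths)
```

A path of this kind is only a hypothesis for follow-up; it says nothing about the direction or strength of the drug's effect on the target.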

Resources

Scripts

  • `scripts/query_primekg.py`: Core functions for searching and querying the knowledge graph.

Data Path

  • Data: `/mnt/c/Users/eamon/Documents/Data/PrimeKG/kg.csv` (the WSL mount of the Windows path given in the Overview)
  • Total nodes: ~129,000
  • Total edges: ~4,000,000
  • Format: a single CSV file, intended to be queried with pandas.