TorchDrug
Overview
TorchDrug is a PyTorch-based machine learning toolbox for drug discovery and molecular science. It applies graph neural networks, pre-trained models, and task definitions to molecules, proteins, and biological knowledge graphs. Supported tasks include molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, and retrosynthesis planning, backed by 40+ curated datasets and 20+ model architectures.
When to Use This Skill
This skill should be used when working with:
Data Types:
- SMILES strings or molecular structures
- Protein sequences or 3D structures (PDB files)
- Chemical reactions and retrosynthesis
- Biomedical knowledge graphs
- Drug discovery datasets
Tasks:
- Predicting molecular properties (solubility, toxicity, activity)
- Protein function or structure prediction
- Drug-target binding prediction
- Generating new molecular structures
- Planning chemical synthesis routes
- Link prediction in biomedical knowledge bases
- Training graph neural networks on scientific data
Libraries and Integration:
- TorchDrug is the primary library
- Often used with RDKit for cheminformatics
- Compatible with PyTorch and PyTorch Lightning
- Integrates with AlphaFold and ESM for proteins
Getting Started
Installation
```bash
uv pip install torchdrug
# Or with optional dependencies
uv pip install torchdrug[full]
```
Quick Example
```python
import torch
from torchdrug import data, datasets, models, tasks

# Load molecular dataset
dataset = datasets.BBBP("~/molecule-datasets/")
# BBBP ships without a predefined split, so split randomly 80/10/10
lengths = [int(0.8 * len(dataset)), int(0.1 * len(dataset))]
lengths += [len(dataset) - sum(lengths)]
train_set, valid_set, test_set = torch.utils.data.random_split(dataset, lengths)

# Define GNN model
model = models.GIN(
    input_dim=dataset.node_feature_dim,
    hidden_dims=[256, 256, 256],
    edge_input_dim=dataset.edge_feature_dim,
    batch_norm=True,
    readout="mean"
)

# Create property prediction task
task = tasks.PropertyPrediction(
    model,
    task=dataset.tasks,
    criterion="bce",
    metric=["auroc", "auprc"]
)
# Build the prediction head and task statistics (core.Engine normally does this)
task.preprocess(train_set, valid_set, test_set)

# Train with PyTorch
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
# TorchDrug's DataLoader collates molecules into batched graphs
train_loader = data.DataLoader(train_set, batch_size=32, shuffle=True)
for epoch in range(100):
    for batch in train_loader:
        loss, metric = task(batch)  # forward returns (loss, metric dict)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
Core Capabilities
1. Molecular Property Prediction
Predict chemical, physical, and biological properties of molecules from structure.
Use Cases:
- Drug-likeness and ADMET properties
- Toxicity screening
- Quantum chemistry properties
- Binding affinity prediction
Key Components:
- 20+ molecular datasets (BBBP, HIV, Tox21, QM9, etc.)
- GNN models (GIN, GAT, SchNet)
- PropertyPrediction and MultipleBinaryClassification tasks
Reference: See references/molecular_property_prediction.md for:
- Complete dataset catalog
- Model selection guide
- Training workflows and best practices
- Feature engineering details
2. Protein Modeling
Work with protein sequences, structures, and properties.
Use Cases:
- Enzyme function prediction
- Protein stability and solubility
- Subcellular localization
- Protein-protein interactions
- Structure prediction
Key Components:
- 15+ protein datasets (EnzymeCommission, GeneOntology, PDBBind, etc.)
- Sequence models (ESM, ProteinBERT, ProteinLSTM)
- Structure models (GearNet, SchNet)
- Multiple task types for different prediction levels
Reference: See references/protein_modeling.md for:
- Protein-specific datasets
- Sequence vs structure models
- Pre-training strategies
- Integration with AlphaFold and ESM
3. Knowledge Graph Reasoning
Predict missing links and relationships in biological knowledge graphs.
Use Cases:
- Drug repurposing
- Disease mechanism discovery
- Gene-disease associations
- Multi-hop biomedical reasoning
Key Components:
- General KGs (FB15k, WN18) and biomedical (Hetionet)
- Embedding models (TransE, RotatE, ComplEx)
- KnowledgeGraphCompletion task
Reference: See references/knowledge_graphs.md for:
- Knowledge graph datasets (including Hetionet with 45k biomedical entities)
- Embedding model comparison
- Evaluation metrics and protocols
- Biomedical applications
4. Molecular Generation
Generate novel molecular structures with desired properties.
Use Cases:
- De novo drug design
- Lead optimization
- Chemical space exploration
- Property-guided generation
Key Components:
- Autoregressive generation
- GCPN (policy-based generation)
- GraphAutoregressiveFlow
- Property optimization workflows
Reference: See references/molecular_generation.md for:
- Generation strategies (unconditional, conditional, scaffold-based)
- Multi-objective optimization
- Validation and filtering
- Integration with property prediction
5. Retrosynthesis
Predict synthetic routes from target molecules to starting materials.
Use Cases:
- Synthesis planning
- Route optimization
- Synthetic accessibility assessment
- Multi-step planning
Key Components:
- USPTO-50k reaction dataset
- CenterIdentification (reaction center prediction)
- SynthonCompletion (reactant prediction)
- End-to-end Retrosynthesis pipeline
Reference: See references/retrosynthesis.md for:
- Task decomposition (center ID → synthon completion)
- Multi-step synthesis planning
- Commercial availability checking
- Integration with other retrosynthesis tools
6. Graph Neural Network Models
Comprehensive catalog of GNN architectures for different data types and tasks.
Available Models:
- General GNNs: GCN, GAT, GIN, RGCN, MPNN
- 3D-aware: SchNet, GearNet
- Protein-specific: ESM, ProteinBERT, GearNet
- Knowledge graph: TransE, RotatE, ComplEx, SimplE
- Generative: GraphAutoregressiveFlow
Reference: See references/models_architectures.md for:
- Detailed model descriptions
- Model selection guide by task and dataset
- Architecture comparisons
- Implementation tips
7. Datasets
40+ curated datasets spanning chemistry, biology, and knowledge graphs.
Categories:
- Molecular properties (drug discovery, quantum chemistry)
- Protein properties (function, structure, interactions)
- Knowledge graphs (general and biomedical)
- Retrosynthesis reactions
Reference: See references/datasets.md for:
- Complete dataset catalog with sizes and tasks
- Dataset selection guide
- Loading and preprocessing
- Splitting strategies (random, scaffold)
Common Workflows
Workflow 1: Molecular Property Prediction
Scenario: Predict blood-brain barrier penetration for drug candidates.
Steps:
- Load dataset: datasets.BBBP()
- Choose model: GIN for molecular graphs
- Define task: PropertyPrediction with binary classification
- Train with scaffold split for realistic evaluation
- Evaluate using AUROC and AUPRC
Navigation: references/molecular_property_prediction.md → Dataset selection → Model selection → Training
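The scaffold split mentioned in this workflow keeps every molecule that shares a core scaffold in the same subset, so validation and test molecules are structurally unseen during training. The sketch below shows only the grouping logic in plain Python. It assumes the scaffold keys have already been computed (for example as Murcko scaffold SMILES with RDKit); the function name `scaffold_split` and the toy data are illustrative and are not TorchDrug's own splitter.

```python
from collections import defaultdict


def scaffold_split(scaffolds, fractions=(0.8, 0.1, 0.1)):
    """Split sample indices so that no scaffold is shared between subsets.

    `scaffolds` holds one precomputed scaffold key per sample (assumed input).
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Largest scaffold groups first, so rare scaffolds end up in valid/test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    total = len(scaffolds)
    train_cap = fractions[0] * total
    valid_cap = (fractions[0] + fractions[1]) * total

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy scaffold keys for ten molecules (made up for illustration).
scaffolds = ["c1ccccc1"] * 6 + ["c1ccncc1"] * 2 + ["C1CCCCC1", "c1ccoc1"]
train, valid, test = scaffold_split(scaffolds)
```

Because whole groups are assigned at once, the resulting subset sizes only approximate the requested fractions when a few scaffolds dominate the dataset.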
Workflow 2: Protein Function Prediction
Scenario: Predict enzyme function from sequence.
Steps:
- Load dataset: datasets.EnzymeCommission()
- Choose model: ESM (pre-trained) or GearNet (with structure)
- Define task: PropertyPrediction with multi-class classification
- Fine-tune pre-trained model or train from scratch
- Evaluate using accuracy and per-class metrics
Navigation: references/protein_modeling.md → Model selection (sequence vs structure) → Pre-training strategies
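The last step asks for accuracy plus per-class metrics, which matters when some enzyme classes are rare and overall accuracy hides poor performance on them. A minimal plain-Python sketch of both numbers follows. The helper name `per_class_recall` and the label values are made up for illustration and are not part of TorchDrug.

```python
from collections import Counter


def per_class_recall(targets, predictions):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(targets)
    hits = Counter(t for t, p in zip(targets, predictions) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels standing in for enzyme classes.
targets = [1, 1, 1, 2, 2, 3]
predictions = [1, 1, 2, 2, 2, 1]

accuracy = sum(t == p for t, p in zip(targets, predictions)) / len(targets)
recall = per_class_recall(targets, predictions)
```

Here the overall accuracy is 4 out of 6, while class 3 has a recall of zero, which is the kind of gap a per-class report is meant to expose.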
Workflow 3: Drug Repurposing via Knowledge Graphs
Scenario: Find new disease treatments in Hetionet.
Steps:
- Load dataset: datasets.Hetionet()
- Choose model: RotatE or ComplEx
- Define task: KnowledgeGraphCompletion
- Train with negative sampling
- Query for "Compound-treats-Disease" predictions
- Filter by plausibility and mechanism
Navigation: references/knowledge_graphs.md → Hetionet dataset → Model selection → Biomedical applications
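The query step amounts to scoring every candidate disease for a fixed compound and relation, then ranking the candidates while skipping pairs that are already known. The sketch below illustrates this with a TransE-style score (negative distance between head + relation and tail) in plain Python. TransE is swapped in here only because it is the simplest of the embedding models listed earlier; the embeddings, entity names, and the helper `rank_tails` are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, entity_embeddings, known_tails=()):
    """Rank candidate tails for (head, relation, ?), skipping already-known tails."""
    scored = [
        (name, transe_score(head, relation, vector))
        for name, vector in entity_embeddings.items()
        if name not in known_tails
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-d embeddings; all names and numbers are made up for illustration.
compound = [0.0, 0.0]
treats = [1.0, 0.0]
diseases = {
    "disease_a": [1.0, 0.0],  # exactly compound + treats
    "disease_b": [0.0, 1.0],
    "disease_c": [1.0, 0.5],
}
ranking = rank_tails(compound, treats, diseases, known_tails={"disease_a"})
```

Filtering out known tails mirrors the usual filtered evaluation setting and keeps existing treatments from crowding out new repurposing candidates.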
Workflow 4: De Novo Molecule Generation
Scenario: Generate drug-like molecules optimized for target binding.
Steps:
- Train property predictor on activity data
- Choose generation approach: GCPN for RL-based optimization
- Define reward function combining affinity, drug-likeness, synthesizability
- Generate candidates with property constraints
- Validate chemistry and filter by drug-likeness
- Rank by multi-objective scoring
Navigation: references/molecular_generation.md → Conditional generation → Multi-objective optimization
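One simple way to combine affinity, drug-likeness, and synthesizability into a single reward is a weighted sum of rescaled scores, which can also serve as the multi-objective ranking in the last step. The sketch below assumes affinity and QED are already in [0, 1] and that synthesizability is given as an SA score from 1 (easy) to 10 (hard). The weights, the function name `combined_reward`, and the candidate values are illustrative assumptions, not values from TorchDrug.

```python
def combined_reward(affinity, qed, sa_score, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of three objectives, each rescaled to [0, 1].

    affinity: predicted activity in [0, 1] (hypothetical predictor output)
    qed:      drug-likeness in [0, 1]
    sa_score: synthetic accessibility, 1 (easy) to 10 (hard)
    """
    synthesizability = (10.0 - sa_score) / 9.0  # map 1..10 onto 1..0
    w_aff, w_qed, w_syn = weights
    return w_aff * affinity + w_qed * qed + w_syn * synthesizability


# Made-up candidates: (SMILES, affinity, QED, SA score).
candidates = [
    ("CCO", 0.20, 0.40, 1.0),
    ("c1ccccc1O", 0.70, 0.60, 1.0),
    ("CC(=O)Nc1ccc(O)cc1", 0.60, 0.80, 5.5),
]
ranked = sorted(candidates, key=lambda c: combined_reward(*c[1:]), reverse=True)
```

A weighted sum is easy to tune but lets a strong objective compensate for a weak one; a product of scores or hard thresholds are common alternatives when every objective must be satisfied.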
Workflow 5: Retrosynthesis Planning
Scenario: Plan synthesis route for target molecule.
Steps:
- Load dataset: datasets.USPTO50k()
- Train center identification model (RGCN)
- Train synthon completion model (GIN)
- Combine into end-to-end retrosynthesis pipeline
- Apply recursively for multi-step planning
- Check commercial availability of building blocks
Navigation: references/retrosynthesis.md → Task types → Multi-step planning
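The recursive application and the availability check in the last two steps can be sketched as a depth-limited search: expand the target with a one-step model, stop at molecules that are in stock, and give up on a branch when no expansion exists. In the sketch below the one-step model is replaced by a hypothetical lookup table, and `plan_route` is an illustrative name rather than a TorchDrug function.

```python
def plan_route(target, single_step, in_stock, max_depth=5):
    """Recursively expand `target` until every leaf is a purchasable building block.

    single_step: callable mapping a molecule to a list of reactants, or None
                 (stands in for a trained one-step retrosynthesis model)
    in_stock:    set of commercially available molecules
    Returns a nested dict {molecule: [sub-routes]} or None if no route is found.
    """
    if target in in_stock:
        return {target: []}
    if max_depth == 0:
        return None
    reactants = single_step(target)
    if not reactants:
        return None
    children = []
    for reactant in reactants:
        sub_route = plan_route(reactant, single_step, in_stock, max_depth - 1)
        if sub_route is None:
            return None
        children.append(sub_route)
    return {target: children}


# Hypothetical lookup table playing the role of the one-step model.
toy_reactions = {"D": ["B", "C"], "C": ["A", "B"]}
route = plan_route("D", toy_reactions.get, in_stock={"A", "B"})
```

This version follows only the single prediction returned for each molecule; a practical planner would usually explore several ranked predictions per step and keep the best complete route.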
Integration Patterns
With RDKit
Convert between TorchDrug molecules and RDKit:
```python
from torchdrug import data
from rdkit import Chem

# SMILES → TorchDrug molecule
smiles = "CCO"
mol = data.Molecule.from_smiles(smiles)

# TorchDrug → RDKit
rdkit_mol = mol.to_molecule()

# RDKit → TorchDrug
rdkit_mol = Chem.MolFromSmiles(smiles)
mol = data.Molecule.from_molecule(rdkit_mol)
```
With AlphaFold/ESM
Use predicted structures:
```python
from torchdrug import data, layers
from torchdrug.layers import geometry

# Load AlphaFold predicted structure
protein = data.Protein.from_pdb("AF-P12345-F1-model_v4.pdb")

# Build a residue graph on alpha carbons with sequential and spatial edges
graph_construction = layers.GraphConstruction(
    node_layers=[geometry.AlphaCarbonNode()],
    edge_layers=[
        geometry.SequentialEdge(max_distance=2),
        geometry.SpatialEdge(radius=10.0),
    ],
)
# Graph construction layers operate on a packed batch of proteins
graph = graph_construction(data.Protein.pack([protein]))
```
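The two edge types used above, sequential and radius-based, can be illustrated without any library. The sketch below builds both from one C-alpha coordinate per residue in plain Python. The function name `residue_edges` and the coordinates are made up, and this is an illustration of the idea rather than TorchDrug's implementation.

```python
import math


def residue_edges(ca_coords, radius_cutoff=10.0):
    """Return (i, j, kind) edges over residues given one C-alpha position each.

    "sequential" links neighbours in the chain; "radius" links any other pair
    whose C-alpha atoms are closer than `radius_cutoff` (in angstroms).
    """
    edges = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1:
                edges.append((i, j, "sequential"))
            elif math.dist(ca_coords[i], ca_coords[j]) < radius_cutoff:
                edges.append((i, j, "radius"))
    return edges


# Made-up straight chain with 3.8 angstrom spacing between consecutive C-alphas.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
edges = residue_edges(coords)
```

The double loop is quadratic in the number of residues, which is fine for a sketch; for long chains a spatial index or a library routine is the usual choice.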
With PyTorch Lightning
Wrap tasks for Lightning training:
```python
import torch
import pytorch_lightning as pl


class LightningTask(pl.LightningModule):
    def __init__(self, torchdrug_task):
        super().__init__()
        self.task = torchdrug_task

    def training_step(self, batch, batch_idx):
        # A TorchDrug task's forward returns (loss, metric dict)
        loss, metric = self.task(batch)
        return loss

    def validation_step(self, batch, batch_idx):
        pred = self.task.predict(batch)
        target = self.task.target(batch)
        return {"pred": pred, "target": target}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```
Technical Details
For deep dives into TorchDrug's architecture:
Core Concepts: See references/core_concepts.md for:
- Architecture philosophy (modular, configurable)
- Data structures (Graph, Molecule, Protein, PackedGraph)
- Model interface and forward function signature
- Task interface (predict, target, forward, evaluate)
- Training workflows and best practices
- Loss functions and metrics
- Common pitfalls and debugging
Quick Reference Cheat Sheet
Choose Dataset:
- Molecular property → references/datasets.md → Molecular section
- Protein task → references/datasets.md → Protein section
- Knowledge graph → references/datasets.md → Knowledge graph section
Choose Model:
- Molecules → references/models_architectures.md → GNN section → GIN/GAT/SchNet
- Proteins (sequence) → references/models_architectures.md → Protein section → ESM
- Proteins (structure) → references/models_architectures.md → Protein section → GearNet
- Knowledge graph → references/models_architectures.md → KG section → RotatE/ComplEx
Common Tasks:
- Property prediction → references/molecular_property_prediction.md or references/protein_modeling.md
- Generation → references/molecular_generation.md
- Retrosynthesis → references/retrosynthesis.md
- KG reasoning → references/knowledge_graphs.md
Understand Architecture:
- Data structures → references/core_concepts.md → Data Structures
- Model design → references/core_concepts.md → Model Interface
- Task design → references/core_concepts.md → Task Interface
Troubleshooting Common Issues
Issue: Dimension mismatch errors
→ Check model.input_dim matches dataset.node_feature_dim
→ See references/core_concepts.md → Essential Attributes
Issue: Poor performance on molecular tasks
→ Use scaffold splitting, not random
→ Try GIN instead of GCN
→ See references/molecular_property_prediction.md → Best Practices
Issue: Protein model not learning
→ Use pre-trained ESM for sequence tasks
→ Check edge construction for structure models
→ See references/protein_modeling.md → Training Workflows
Issue: Memory errors with large graphs
→ Reduce batch size
→ Use gradient accumulation
→ See references/core_concepts.md → Memory Efficiency
Issue: Generated molecules are invalid
→ Add validity constraints
→ Post-process with RDKit validation
→ See references/molecular_generation.md → Validation and Filtering
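For the memory issue above, gradient accumulation trades time for memory: several small batches are processed one after another, their gradients are summed with appropriate scaling, and the optimizer steps once. The plain-Python sketch below uses a toy one-parameter model to show that the accumulated update equals the full-batch update when the micro-batches are the same size. The model, data, and function names are made up for illustration.

```python
def grad_mse(weight, batch):
    """Gradient of mean squared error for the model y = weight * x."""
    return sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(weight, micro_batches, lr=0.01):
    """One optimizer step built from several equally sized micro-batches."""
    accumulated = 0.0
    for batch in micro_batches:
        # Scale each micro-batch gradient so the sum matches the full-batch mean.
        accumulated += grad_mse(weight, batch) / len(micro_batches)
    return weight - lr * accumulated


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
full_batch_weight = 0.0 - 0.01 * grad_mse(0.0, samples)
accumulated_weight = accumulated_step(0.0, [samples[:2], samples[2:]])
```

In a PyTorch training loop the same idea is usually written by dividing each micro-batch loss by the number of accumulation steps, calling backward on every micro-batch, and calling the optimizer step and zero_grad only once per group.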
Resources
Official Documentation: https://torchdrug.ai/docs/
GitHub: https://github.com/DeepGraphLearning/torchdrug
Paper: TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery
Summary
Navigate to the appropriate reference file based on your task:
- Molecular property prediction → molecular_property_prediction.md
- Protein modeling → protein_modeling.md
- Knowledge graphs → knowledge_graphs.md
- Molecular generation → molecular_generation.md
- Retrosynthesis → retrosynthesis.md
- Model selection → models_architectures.md
- Dataset selection → datasets.md
- Technical details → core_concepts.md
Each reference provides comprehensive coverage of its domain with examples, best practices, and common use cases.