tooluniverse-protein-therapeutic-design
Therapeutic Protein Designer
AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.
KEY PRINCIPLES:
- Structure-first design - Generate backbone geometry before sequence
- Target-guided - Design binders with target structure in mind
- Iterative validation - Predict structure to validate designs
- Developability-aware - Consider aggregation, immunogenicity, expression
- Evidence-graded - Grade designs by confidence metrics
- Actionable output - Provide sequences ready for experimental testing
- English-first queries - Always use English terms in tool calls (protein names, target names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language
When to Use
Apply when user asks:
- "Design a protein binder for [target]"
- "Create a therapeutic protein against [protein/epitope]"
- "Design a protein scaffold with [property]"
- "Optimize this protein sequence for [function]"
- "Design a de novo enzyme for [reaction]"
- "Generate protein variants for [target binding]"
Critical Workflow Requirements
1. Report-First Approach (MANDATORY)
- Create the report file FIRST:
  - File name: [TARGET]_protein_design_report.md
  - Initialize with section headers
  - Add placeholder: [Designing...]
- Progressively update as designs are generated
- Output separate files:
  - [TARGET]_designed_sequences.fasta: All designed sequences
  - [TARGET]_top_candidates.csv: Ranked candidates with metrics
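The report-first steps above can be sketched as follows. This is a minimal sketch, not part of ToolUniverse: the function names `init_report` and `update_section`, the `out_dir` argument, and the `#`/`##` heading markers are assumptions made for illustration. The section titles are the ones listed in this document's report template.

```python
from pathlib import Path

# Section titles copied from the report template in this document.
SECTIONS = [
    "Executive Summary",
    "1. Target Characterization",
    "2. Backbone Generation",
    "3. Sequence Design",
    "4. Structure Validation",
    "5. Developability Assessment",
    "6. Final Candidates",
    "7. Experimental Recommendations",
    "8. Data Sources",
]
PLACEHOLDER = "[Designing...]"


def init_report(target: str, out_dir: str = ".") -> Path:
    """Create [TARGET]_protein_design_report.md with headers and placeholders."""
    path = Path(out_dir) / f"{target}_protein_design_report.md"
    lines = [f"# Therapeutic Protein Design Report: {target}", ""]
    for title in SECTIONS:
        lines += [f"## {title}", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def update_section(path: Path, title: str, content: str) -> None:
    """Replace the placeholder under one section header with real content."""
    text = path.read_text(encoding="utf-8")
    marker = f"## {title}\n{PLACEHOLDER}"
    if marker not in text:
        raise ValueError(f"No pending placeholder under section: {title}")
    updated = text.replace(marker, f"## {title}\n{content}", 1)
    path.write_text(updated, encoding="utf-8")
```

Calling `init_report("PD-L1")` before any tool call, then `update_section(...)` after each phase, keeps the report file current even if a later phase fails.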
2. Design Documentation (MANDATORY)
Every design MUST include:
Design: Binder_001
Sequence: MVLSPADKTN...
Length: 85 amino acids
Target: PD-L1 (UniProt: Q9NZQ7)
Method: RFdiffusion → ProteinMPNN → ESMFold validation
Quality Metrics:
| Metric | Value | Interpretation |
|---|---|---|
| pLDDT | 88.5 | High confidence |
| pTM | 0.82 | Good fold |
| ProteinMPNN score | -2.3 | Favorable |
| Predicted binding | Strong | Based on interface pLDDT |
Source: NVIDIA NIM via NvidiaNIM_rfdiffusion, NvidiaNIM_proteinmpnn, NvidiaNIM_esmfold
---
Phase 0: Tool Verification
NVIDIA NIM Tools Required
| Tool | Purpose | API Key Required |
|---|---|---|
| NvidiaNIM_rfdiffusion | Backbone generation | Yes |
| NvidiaNIM_proteinmpnn | Sequence design | Yes |
| NvidiaNIM_esmfold | Fast structure validation | Yes |
| NvidiaNIM_alphafold2 | High-accuracy validation | Yes |
| | Sequence embeddings | Yes |
Parameter Verification
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| NvidiaNIM_rfdiffusion | | diffusion_steps |
| NvidiaNIM_proteinmpnn | | pdb_string |
| NvidiaNIM_esmfold | | sequence |
Workflow Overview
Phase 1: Target Characterization
├── Get target structure (PDB, EMDB cryo-EM, or AlphaFold)
├── Identify binding epitope
├── Analyze existing binders
├── Check EMDB for membrane protein structures (NEW)
└── OUTPUT: Target profile
↓
Phase 2: Backbone Generation (RFdiffusion)
├── Define design constraints
├── Generate multiple backbones
├── Filter by geometry quality
└── OUTPUT: Candidate backbones
↓
Phase 3: Sequence Design (ProteinMPNN)
├── Design sequences for each backbone
├── Sample multiple sequences per backbone
├── Score by ProteinMPNN likelihood
└── OUTPUT: Designed sequences
↓
Phase 4: Structure Validation
├── Predict structure (ESMFold/AlphaFold2)
├── Compare to designed backbone
├── Assess fold quality (pLDDT, pTM)
└── OUTPUT: Validated designs
↓
Phase 5: Developability Assessment
├── Aggregation propensity
├── Expression likelihood
├── Immunogenicity prediction
└── OUTPUT: Developability scores
↓
Phase 6: Report Synthesis
├── Ranked candidate list
├── Experimental recommendations
├── Next steps
└── OUTPUT: Final report
Phase 1: Target Characterization
1.1 Get Target Structure
python
    def get_target_structure(tu, target_id):
        """Get target structure from PDB, EMDB, or predict."""
        # Try PDB first (X-ray/NMR)
        pdb_results = tu.tools.PDB_search_by_uniprot(uniprot_id=target_id)
        if pdb_results:
            # Get highest resolution structure (smallest value in Å);
            # entries with no resolution, such as NMR models, sort last
            best_pdb = sorted(pdb_results, key=lambda x: x.get('resolution') or 99)[0]
            structure = tu.tools.PDB_get_structure(pdb_id=best_pdb['pdb_id'])
            return {'source': 'PDB', 'pdb_id': best_pdb['pdb_id'],
                    'resolution': best_pdb.get('resolution'), 'structure': structure}

        # Try EMDB for cryo-EM structures (valuable for membrane proteins)
        protein_info = tu.tools.UniProt_get_protein_by_accession(accession=target_id)
        emdb_results = tu.tools.emdb_search(
            query=protein_info['proteinDescription']['recommendedName']['fullName']['value']
        )
        if emdb_results:
            # Get highest resolution cryo-EM entry
            best_emdb = sorted(emdb_results, key=lambda x: x.get('resolution') or 99)[0]
            # Get associated PDB model if available
            emdb_details = tu.tools.emdb_get_entry(entry_id=best_emdb['emdb_id'])
            if emdb_details.get('pdb_ids'):
                structure = tu.tools.PDB_get_structure(pdb_id=emdb_details['pdb_ids'][0])
                return {'source': 'EMDB cryo-EM', 'emdb_id': best_emdb['emdb_id'],
                        'pdb_id': emdb_details['pdb_ids'][0],
                        'resolution': best_emdb.get('resolution'), 'structure': structure}

        # Fallback to AlphaFold prediction
        sequence = tu.tools.UniProt_get_protein_sequence(accession=target_id)
        structure = tu.tools.NvidiaNIM_alphafold2(
            sequence=sequence['sequence'],
            algorithm="mmseqs2"
        )
        return {'source': 'AlphaFold2 (predicted)', 'structure': structure}
1.1b EMDB for Membrane Proteins (NEW)
When to prioritize EMDB: Membrane proteins, large complexes, and targets where conformational states matter.
python
    def get_cryoem_structures(tu, target_name):
        """Get cryo-EM structures for membrane proteins/complexes."""
        # Search EMDB
        emdb_results = tu.tools.emdb_search(
            query=f"{target_name} membrane OR receptor"
        )
        structures = []
        # Keep the first five hits and fetch the details of each entry
        for entry in emdb_results[:5]:
            details = tu.tools.emdb_get_entry(entry_id=entry['emdb_id'])
            structures.append({
                'emdb_id': entry['emdb_id'],
                'resolution': entry.get('resolution', 'N/A'),
                'title': entry.get('title', 'N/A'),
                'conformational_state': details.get('state', 'Unknown'),
                'pdb_models': details.get('pdb_ids', [])
            })
        return structures
Output for Report:
1.1b Cryo-EM Structures (EMDB)
| EMDB ID | Resolution | PDB Model | Conformation |
|---|---|---|---|
| EMD-12345 | 2.8 Å | 7ABC | Active state |
| EMD-23456 | 3.1 Å | 8DEF | Inactive state |
Note: Cryo-EM structures can capture distinct conformational states (for example active and inactive) of membrane protein targets.
Source: EMDB
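The table above can be produced directly from the list returned by `get_cryoem_structures`. The helper below is a sketch under that assumption; the name `cryoem_table` is hypothetical, and it only relies on the dictionary keys that function builds.

```python
def cryoem_table(structures):
    """Render get_cryoem_structures()-style dicts as a markdown table."""
    rows = [
        "| EMDB ID | Resolution | PDB Model | Conformation |",
        "|---|---|---|---|",
    ]
    for s in structures:
        res = s.get("resolution", "N/A")
        # Append the unit only when the resolution is numeric
        res_text = f"{res} Å" if isinstance(res, (int, float)) else str(res)
        pdb = ", ".join(s.get("pdb_models") or []) or "N/A"
        rows.append(
            f"| {s['emdb_id']} | {res_text} | {pdb} | "
            f"{s.get('conformational_state', 'Unknown')} |"
        )
    return "\n".join(rows)
```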
1.2 Identify Binding Epitope
python
    def identify_epitope(tu, target_structure, epitope_residues=None):
        """Identify or validate binding epitope."""
        if epitope_residues:
            # User-specified epitope
            return {'residues': epitope_residues, 'source': 'user-defined'}
        # Find surface-exposed regions
        # Use structural analysis to identify potential epitopes
        # (analyze_surface is a placeholder helper that is not defined here)
        return analyze_surface(target_structure)
1.3 Output for Report
1. Target Characterization
1.1 Target Information
| Property | Value |
|---|---|
| Target | PD-L1 (Programmed death-ligand 1) |
| UniProt | Q9NZQ7 |
| Structure source | PDB: 4ZQK (2.0 Å resolution) |
| Binding epitope | IgV domain, residues 19-127 |
| Known binders | Atezolizumab, durvalumab, avelumab |
1.2 Epitope Analysis
| Residue Range | Type | Surface Area | Druggability |
|---|---|---|---|
| 54-68 | Loop | 850 Ų | High |
| 115-125 | Beta strand | 420 Ų | Medium |
| 19-30 | N-terminus | 380 Ų | Medium |
Selected Epitope: Residues 54-68 (PD-1 binding interface)
Source: PDB 4ZQK, surface analysis
---
Phase 2: Backbone Generation
2.1 RFdiffusion Design
python
    def generate_backbones(tu, design_params):
        """Generate de novo backbones using RFdiffusion."""
        backbones = tu.tools.NvidiaNIM_rfdiffusion(
            diffusion_steps=design_params.get('steps', 50),
            # Additional parameters depending on design type
        )
        return backbones
2.2 Design Modes
| Mode | Use Case | Key Parameters |
|---|---|---|
| Unconditional | De novo scaffold | diffusion_steps only |
| Binder design | Target-guided binder | |
| Motif scaffolding | Functional motif embedding | |
2.3 Output for Report
2. Backbone Generation
2.1 Design Parameters
| Parameter | Value |
|---|---|
| Method | RFdiffusion via NVIDIA NIM |
| Design mode | Unconditional scaffold generation |
| Diffusion steps | 50 |
| Number generated | 10 backbones |
2.2 Generated Backbones
| Backbone | Length | Topology | Quality |
|---|---|---|---|
| BB_001 | 85 aa | 3-helix bundle | Good |
| BB_002 | 92 aa | Beta sandwich | Good |
| BB_003 | 78 aa | Alpha-beta | Good |
| BB_004 | 88 aa | All-alpha | Moderate |
| BB_005 | 95 aa | Mixed | Good |
Selected for sequence design: BB_001, BB_002, BB_003, BB_005 (top 4)
Source: NVIDIA NIM via NvidiaNIM_rfdiffusion
---
Phase 3: Sequence Design
3.1 ProteinMPNN Design
python
    def design_sequences(tu, backbone_pdb, num_sequences=8):
        """Design sequences for backbone using ProteinMPNN."""
        sequences = tu.tools.NvidiaNIM_proteinmpnn(
            pdb_string=backbone_pdb,
            num_sequences=num_sequences,
            temperature=0.1  # Lower = more conservative
        )
        return sequences
3.2 Sampling Parameters
| Parameter | Conservative | Moderate | Diverse |
|---|---|---|---|
| Temperature | 0.1 | 0.2 | 0.5 |
| Sequences per backbone | 4 | 8 | 16 |
| Use case | Validated scaffold | Exploration | Diversity |
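The three sampling regimes in the table can be kept as named presets so that a run records which one it used. This is a sketch only: `SAMPLING_PRESETS`, `sampling_kwargs`, and `total_sequences` are hypothetical names, and the keys mirror the `temperature` and `num_sequences` arguments used in `design_sequences` above.

```python
# Values copied from the sampling parameters table.
SAMPLING_PRESETS = {
    "conservative": {"temperature": 0.1, "num_sequences": 4},
    "moderate": {"temperature": 0.2, "num_sequences": 8},
    "diverse": {"temperature": 0.5, "num_sequences": 16},
}


def sampling_kwargs(mode="moderate"):
    """Return a copy of the preset for the given mode (case-insensitive)."""
    try:
        return dict(SAMPLING_PRESETS[mode.lower()])
    except KeyError:
        valid = ", ".join(SAMPLING_PRESETS)
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {valid}") from None


def total_sequences(n_backbones, mode="moderate"):
    """Number of sequences a run will produce across all backbones."""
    return n_backbones * SAMPLING_PRESETS[mode.lower()]["num_sequences"]
```

For example, four selected backbones at 8 sequences each give `total_sequences(4, "moderate") == 32`, the total shown in the example report.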
3.3 Output for Report
3. Sequence Design
3.1 Design Parameters
| Parameter | Value |
|---|---|
| Method | ProteinMPNN via NVIDIA NIM |
| Temperature | 0.1 (conservative) |
| Sequences per backbone | 8 |
| Total sequences | 32 |
3.2 Designed Sequences (Top 10 by Score)
| Rank | Backbone | Sequence ID | Length | MPNN Score | Predicted pI |
|---|---|---|---|---|---|
| 1 | BB_001 | Seq_001_A | 85 | -1.89 | 6.2 |
| 2 | BB_002 | Seq_002_C | 92 | -1.95 | 5.8 |
| 3 | BB_001 | Seq_001_B | 85 | -2.01 | 7.1 |
| 4 | BB_003 | Seq_003_A | 78 | -2.08 | 6.5 |
| 5 | BB_005 | Seq_005_B | 95 | -2.12 | 5.4 |
3.3 Top Sequence: Seq_001_A
>Seq_001_A (85 aa, MPNN score: -1.89)
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
Source: NVIDIA NIM via NvidiaNIM_proteinmpnn
---
Phase 4: Structure Validation
4.1 ESMFold Validation
python
    import numpy as np

    def validate_structure(tu, sequence):
        """Validate designed sequence by structure prediction."""
        # Fast validation with ESMFold
        predicted = tu.tools.NvidiaNIM_esmfold(sequence=sequence)
        # Extract quality metrics
        # (extract_plddt and extract_ptm are placeholder helpers, not defined here)
        mean_plddt = float(np.mean(extract_plddt(predicted)))
        ptm = extract_ptm(predicted)
        return {
            'structure': predicted,
            'mean_plddt': mean_plddt,
            'ptm': ptm,
            'passes': mean_plddt > 70 and ptm > 0.7
        }
4.2 Validation Criteria
| Metric | Threshold | Interpretation |
|---|---|---|
| Mean pLDDT | >70 | Confident fold |
| pTM | >0.7 | Good global topology |
| RMSD to backbone | <2 Å | Design recapitulated |
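The three thresholds in the table can be applied together once the metrics are available. The sketch below assumes the RMSD to the designed backbone has already been computed elsewhere (it is optional here because `validate_structure` does not produce it); the function name `passes_validation` is hypothetical.

```python
def passes_validation(mean_plddt, ptm, rmsd=None):
    """Apply the validation thresholds; return (overall, per-check results)."""
    checks = {
        "plddt": mean_plddt > 70,  # confident fold
        "ptm": ptm > 0.7,          # good global topology
    }
    if rmsd is not None:
        checks["rmsd"] = rmsd < 2.0  # design recapitulated (Å)
    return all(checks.values()), checks
```

Returning the per-check dictionary alongside the overall verdict makes it easy to report which criterion a design failed.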
4.3 Output for Report
4. Structure Validation
4.1 Validation Results
| Sequence | pLDDT | pTM | RMSD to Design | Status |
|---|---|---|---|---|
| Seq_001_A | 88.5 | 0.85 | 1.2 Å | ✓ PASS |
| Seq_002_C | 82.3 | 0.79 | 1.5 Å | ✓ PASS |
| Seq_001_B | 85.1 | 0.82 | 1.3 Å | ✓ PASS |
| Seq_003_A | 79.8 | 0.76 | 1.8 Å | ✓ PASS |
| Seq_005_B | 68.2 | 0.65 | 2.8 Å | ✗ FAIL |
4.2 Top Validated Design: Seq_001_A
| Region | Residues | pLDDT | Interpretation |
|---|---|---|---|
| Helix 1 | 1-28 | 92.3 | Very high confidence |
| Loop 1 | 29-35 | 78.4 | Moderate confidence |
| Helix 2 | 36-58 | 91.8 | Very high confidence |
| Loop 2 | 59-65 | 75.2 | Moderate confidence |
| Helix 3 | 66-85 | 90.1 | Very high confidence |
Overall: Well-folded 3-helix bundle with high confidence core
Source: NVIDIA NIM via NvidiaNIM_esmfold
---
Phase 5: Developability Assessment
5.1 Aggregation Propensity
python
    def assess_aggregation(sequence):
        """Assess aggregation propensity."""
        # Calculate hydrophobic patches
        # Calculate isoelectric point
        # Identify aggregation-prone motifs
        # (find_hydrophobic_patches and aggregation_score are placeholder
        # helpers standing in for the steps above; they are not defined here)
        patches = find_hydrophobic_patches(sequence)
        score = aggregation_score(sequence, patches)
        return {
            'aggregation_score': score,
            'hydrophobic_patches': patches,
            'risk_level': 'Low' if score < 0.5 else 'Medium' if score < 0.7 else 'High'
        }
5.2 Developability Metrics
| Metric | Favorable | Marginal | Unfavorable |
|---|---|---|---|
| Aggregation score | <0.5 | 0.5-0.7 | >0.7 |
| Isoelectric point | 5-9 | 4-5 or 9-10 | <4 or >10 |
| Hydrophobic patches | <3 | 3-5 | >5 |
| Cysteine count | 0 or even | Odd | Multiple unpaired |
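The bands in the table can be turned into small classifiers once the underlying numbers exist. This is a sketch with hypothetical function names. Two assumptions are worth flagging: boundary values follow the cut-offs in `assess_aggregation` (0.5 counts as marginal, 0.7 as unfavorable), and cysteine pairing cannot be read from sequence alone, so the "Multiple unpaired" band is not evaluated.

```python
def classify_aggregation(score):
    """Band an aggregation score using the same cut-offs as assess_aggregation."""
    if score < 0.5:
        return "Favorable"
    return "Marginal" if score < 0.7 else "Unfavorable"


def classify_pi(pi):
    """Band an isoelectric point: 5-9 favorable, 4-5 or 9-10 marginal."""
    if 5 <= pi <= 9:
        return "Favorable"
    return "Marginal" if 4 <= pi <= 10 else "Unfavorable"


def classify_patches(count):
    """Band a hydrophobic patch count: <3 favorable, 3-5 marginal."""
    if count < 3:
        return "Favorable"
    return "Marginal" if count <= 5 else "Unfavorable"


def classify_cysteines(sequence):
    """Zero or an even number of cysteines is favorable; an odd count is marginal."""
    n = sequence.upper().count("C")
    return "Favorable" if n % 2 == 0 else "Marginal"


def developability_summary(sequence, aggregation_score, pi, patch_count):
    """Collect all four bands for one design."""
    return {
        "aggregation": classify_aggregation(aggregation_score),
        "pI": classify_pi(pi),
        "hydrophobic_patches": classify_patches(patch_count),
        "cysteines": classify_cysteines(sequence),
    }
```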
5.3 Output for Report
5. Developability Assessment
5.1 Developability Scores
| Design | Aggregation | pI | Cysteines | Expression | Overall |
|---|---|---|---|---|---|
| Seq_001_A | 0.32 (Low) | 6.2 | 0 | High | ★★★ |
| Seq_002_C | 0.45 (Low) | 5.8 | 2 (paired) | Medium | ★★☆ |
| Seq_001_B | 0.38 (Low) | 7.1 | 0 | High | ★★★ |
| Seq_003_A | 0.58 (Med) | 6.5 | 0 | Medium | ★★☆ |
5.2 Recommendations
Best candidate for expression: Seq_001_A
- Low aggregation propensity
- Near-neutral pI of 6.2 (simplifies purification)
- No cysteines (no risk of disulfide mispairing)
- Predicted high E. coli expression
Source: Sequence analysis
---
Report Template
Therapeutic Protein Design Report: [TARGET]
Generated: [Date] | Query: [Original query] | Status: In Progress
Executive Summary
[Designing...]
1. Target Characterization
1.1 Target Information
[Designing...]
1.2 Binding Epitope
[Designing...]
2. Backbone Generation
2.1 Design Parameters
[Designing...]
2.2 Generated Backbones
[Designing...]
3. Sequence Design
3.1 ProteinMPNN Results
[Designing...]
3.2 Top Sequences
[Designing...]
4. Structure Validation
4.1 ESMFold Validation
[Designing...]
4.2 Quality Metrics
[Designing...]
5. Developability Assessment
5.1 Scores
[Designing...]
5.2 Recommendations
[Designing...]
6. Final Candidates
6.1 Ranked List
[Designing...]
6.2 Sequences for Testing
[Designing...]
7. Experimental Recommendations
[Designing...]
8. Data Sources
[Will be populated...]
---
Evidence Grading
| Tier | Symbol | Criteria |
|---|---|---|
| T1 | ★★★ | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| T2 | ★★☆ | pLDDT >75, pTM >0.7, acceptable developability |
| T3 | ★☆☆ | pLDDT >70, pTM >0.65, developability concerns |
| T4 | ☆☆☆ | Failed validation or major developability issues |
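The tier table can be applied mechanically once validation metrics and a developability verdict are in hand. The sketch below is one possible reading of the table, not a definitive rule: the four developability labels (`good`, `acceptable`, `concerns`, `major_issues`) and the function name `grade_design` are assumptions introduced for illustration.

```python
# Lower rank = better developability. The labels are illustrative.
DEV_RANK = {"good": 0, "acceptable": 1, "concerns": 2, "major_issues": 3}
SYMBOLS = {"T1": "★★★", "T2": "★★☆", "T3": "★☆☆", "T4": "☆☆☆"}


def grade_design(plddt, ptm, developability):
    """Return the evidence tier (T1 to T4) for one design."""
    rank = DEV_RANK[developability]
    if rank == DEV_RANK["major_issues"]:
        return "T4"
    if plddt > 85 and ptm > 0.8 and rank == 0:
        return "T1"
    if plddt > 75 and ptm > 0.7 and rank <= 1:
        return "T2"
    if plddt > 70 and ptm > 0.65:
        return "T3"
    return "T4"  # failed validation
```

With the example values from the validation and developability tables, Seq_001_A (88.5, 0.85, good) grades as T1 and Seq_002_C (82.3, 0.79, acceptable) as T2, matching the star ratings shown there.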
Completeness Checklist
Phase 1: Target
- Target structure obtained (PDB or predicted)
- Binding epitope identified
- Existing binders noted
Phase 2: Backbones
- ≥5 backbones generated
- Top 3-5 selected for sequence design
- Selection criteria documented
Phase 3: Sequences
- ≥8 sequences per backbone designed
- MPNN scores reported
- Top 10 sequences listed
Phase 4: Validation
- All sequences validated by ESMFold
- pLDDT and pTM reported
- Pass/fail criteria applied
- ≥3 passing designs
Phase 5: Developability
- Aggregation assessed
- pI calculated
- Expression prediction
- Final ranking
Phase 6: Deliverables
- Ranked candidate list
- FASTA file with sequences
- Experimental recommendations
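The two deliverable files from Phase 6 (the FASTA file and the ranked candidate CSV) can be written with the standard library alone. This is a minimal sketch: the function names, the 60-character line width, and the shape of the input records are assumptions, not a prescribed format.

```python
import csv


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    with open(path, "w", encoding="utf-8") as fh:
        for header, seq in records:
            fh.write(f">{header}\n")
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")


def write_candidates_csv(candidates, path):
    """Write already-ranked candidate dicts (all sharing the same keys) as CSV."""
    if not candidates:
        raise ValueError("no candidates to write")
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(candidates[0].keys()))
        writer.writeheader()
        writer.writerows(candidates)
```

A call such as `write_fasta([("Seq_001_A", seq)], "PD-L1_designed_sequences.fasta")` follows the file naming given in the report-first requirements.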
Fallback Chains
| Primary Tool | Fallback 1 | Fallback 2 |
|---|---|---|
| NvidiaNIM_rfdiffusion | Manual backbone design | Scaffold from PDB |
| NvidiaNIM_proteinmpnn | Rosetta ProteinMPNN | Manual sequence design |
| | | AlphaFold DB |
| PDB structure | | AlphaFold DB |
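Each row of the table describes an ordered chain: try the primary tool, then each fallback in turn. A generic runner for such a chain might look like the sketch below. The name `run_with_fallbacks` is hypothetical, and treating an empty result as a failure is an assumption about how the tools signal "nothing found".

```python
def run_with_fallbacks(steps):
    """Try (label, zero-argument callable) pairs in order.

    Returns (label, result) for the first step that neither raises nor
    returns an empty result; raises RuntimeError if every step fails.
    """
    errors = []
    for label, call in steps:
        try:
            result = call()
        except Exception as exc:  # any tool failure moves on to the next step
            errors.append(f"{label}: {exc}")
            continue
        if result:
            return label, result
        errors.append(f"{label}: empty result")
    raise RuntimeError("All fallbacks failed: " + "; ".join(errors))
```

Returning the label of the step that succeeded lets the report cite the actual source of each result.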
Tool Reference
See TOOLS_REFERENCE.md for complete tool documentation.