esm
ESM: Evolutionary Scale Modeling
Overview
ESM provides state-of-the-art protein language models for understanding, generating, and designing proteins. This skill enables working with two model families: ESM3 for generative protein design across sequence, structure, and function, and ESM C for efficient protein representation learning and embeddings.
Core Capabilities
1. Protein Sequence Generation with ESM3
Generate novel protein sequences with desired properties using multimodal generative modeling.

When to use:
- Designing proteins with specific functional properties
- Completing partial protein sequences
- Generating variants of existing proteins
- Creating proteins with desired structural characteristics

Basic usage:
```python
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Load model locally
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-sm-open-v1").to("cuda")

# Create protein prompt
protein = ESMProtein(sequence="MPRT___KEND")  # '_' represents masked positions

# Generate completion
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
print(protein.sequence)
```

**For remote/cloud usage via Forge API:**
```python
from esm.sdk.forge import ESM3ForgeInferenceClient
from esm.sdk.api import ESMProtein, GenerationConfig

# Connect to Forge
model = ESM3ForgeInferenceClient(model="esm3-medium-2024-08", url="https://forge.evolutionaryscale.ai", token="<token>")

# Generate (same prompt as in the local example)
protein = ESMProtein(sequence="MPRT___KEND")
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
```

See `references/esm3-api.md` for detailed ESM3 model specifications, advanced generation configurations, and multimodal prompting examples.
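The masked-prompt convention above (`_` marks positions for the model to fill) can be scripted. The following is a minimal sketch, assuming two hypothetical helpers, `mask_span` and `count_masked`, that are not part of the esm SDK. They only build the prompt string and count masked positions, which can be a useful reference point when choosing `num_steps`. The `AAA` residues in the example are placeholders.

```python
MASK = "_"  # ESM3 prompt character for a masked position


def mask_span(sequence: str, start: int, end: int) -> str:
    """Return `sequence` with residues in [start, end) replaced by mask characters."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("span must lie within the sequence")
    return sequence[:start] + MASK * (end - start) + sequence[end:]


def count_masked(prompt: str) -> int:
    """Number of positions the model is being asked to fill in."""
    return prompt.count(MASK)


prompt = mask_span("MPRTAAAKEND", 4, 7)
print(prompt)                # MPRT___KEND
print(count_masked(prompt))  # 3
```

The resulting string can be passed as the `sequence` argument of `ESMProtein`.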
2. Structure Prediction and Inverse Folding
Use ESM3's structure track for structure prediction from sequence or inverse folding (sequence design from structure).

Structure prediction:
```python
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Predict structure from sequence (`model` is an ESM3 client, loaded locally or via Forge)
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
protein_with_structure = model.generate(
    protein,
    # A fully specified sequence contains no '_' masks, so a fixed step count is used here
    GenerationConfig(track="structure", num_steps=8)
)

# Access predicted structure
coordinates = protein_with_structure.coordinates  # 3D coordinates
pdb_string = protein_with_structure.to_pdb_string()
```

**Inverse folding (sequence from structure):**
```python
# Design sequence for a target structure
protein_with_structure = ESMProtein.from_pdb("target_structure.pdb")
protein_with_structure.sequence = None  # Remove sequence

# Generate sequence that folds to this structure
designed_protein = model.generate(
    protein_with_structure,
    GenerationConfig(track="sequence", num_steps=50, temperature=0.7)
)
```
3. Protein Embeddings with ESM C
Generate high-quality embeddings for downstream tasks like function prediction, classification, or similarity analysis.

When to use:
- Extracting protein representations for machine learning
- Computing sequence similarities
- Feature extraction for protein classification
- Transfer learning for protein-related tasks

Basic usage:
```python
from esm.models.esmc import ESMC
from esm.sdk.api import ESMProtein, LogitsConfig

# Load ESM C model
model = ESMC.from_pretrained("esmc-300m").to("cuda")

# Encode the sequence into a token tensor
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
protein_tensor = model.encode(protein)

# Generate embeddings: logits() runs the forward pass; return_embeddings=True keeps the hidden states
logits_output = model.logits(protein_tensor, LogitsConfig(sequence=True, return_embeddings=True))
embeddings = logits_output.embeddings
```

**Batch processing:**
```python
# Encode multiple proteins
proteins = [
    ESMProtein(sequence="MPRTKEIND..."),
    ESMProtein(sequence="AGLIVHSPQ..."),
    ESMProtein(sequence="KTEFLNDGR...")
]
config = LogitsConfig(sequence=True, return_embeddings=True)
embeddings_list = [model.logits(model.encode(p), config).embeddings for p in proteins]
```

See `references/esm-c-api.md` for ESM C model details, efficiency comparisons, and advanced embedding strategies.
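For downstream tasks that need one vector per protein, per-residue embeddings are commonly averaged over the sequence dimension. The following is a minimal sketch in plain Python, assuming the embeddings have already been converted to a nested list of shape (length, dim). The function name `mean_pool` and the toy values are illustrative and not part of the esm SDK.

```python
from typing import List, Sequence


def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-size vector."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    dim = len(per_residue[0])
    if any(len(row) != dim for row in per_residue):
        raise ValueError("all residue embeddings must have the same dimension")
    n = len(per_residue)
    return [sum(row[j] for row in per_residue) / n for j in range(dim)]


toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 residues, 2 dimensions
pooled = mean_pool(toy)
print(pooled)  # [3.0, 4.0]
```

Whether to include special tokens (such as start and end markers) in the average is a design choice that depends on the downstream task.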
4. Function Conditioning and Annotation
Use ESM3's function track to generate proteins with specific functional annotations or predict function from sequence.

Function-conditioned generation:
```python
from esm.sdk.api import ESMProtein, FunctionAnnotation, GenerationConfig

# Create protein with desired function
protein = ESMProtein(
    sequence="_" * 200,  # Generate 200 residue protein
    function_annotations=[
        FunctionAnnotation(label="fluorescent_protein", start=50, end=150)
    ]
)

# Generate sequence with specified function
functional_protein = model.generate(
    protein,
    GenerationConfig(track="sequence", num_steps=200)
)
```
5. Chain-of-Thought Generation
Iteratively refine protein designs using ESM3's chain-of-thought generation approach.

```python
from esm.sdk.api import ESMProtein, GenerationConfig

# Multi-step refinement
protein = ESMProtein(sequence="MPRT" + "_" * 100 + "KEND")

# Step 1: Generate initial structure
config = GenerationConfig(track="structure", num_steps=50)
protein = model.generate(protein, config)

# Step 2: Refine sequence based on structure
config = GenerationConfig(track="sequence", num_steps=50, temperature=0.5)
protein = model.generate(protein, config)

# Step 3: Predict function
config = GenerationConfig(track="function", num_steps=20)
protein = model.generate(protein, config)
```
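The three steps above follow one pattern: call `generate` once per track and feed each result into the next call. The following is a minimal sketch of that loop. It uses a stand-in `FakeModel` and plain dictionaries in place of the real client and `GenerationConfig`, so it runs without the esm package. Both stand-ins are assumptions for illustration only.

```python
from typing import Any, Dict, List


class FakeModel:
    """Stand-in for an ESM3 client: records which tracks were generated."""

    def generate(self, protein: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
        updated = dict(protein)
        updated["history"] = protein.get("history", []) + [config["track"]]
        return updated


def run_pipeline(model: Any, protein: Dict[str, Any], steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply each generation step in order, threading the protein through."""
    for config in steps:
        protein = model.generate(protein, config)
    return protein


steps = [
    {"track": "structure", "num_steps": 50},
    {"track": "sequence", "num_steps": 50, "temperature": 0.5},
    {"track": "function", "num_steps": 20},
]
result = run_pipeline(FakeModel(), {"sequence": "MPRT" + "_" * 100 + "KEND"}, steps)
print(result["history"])  # ['structure', 'sequence', 'function']
```

Keeping the steps in a list makes it straightforward to reorder tracks or adjust step counts without touching the loop.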
6. Batch Processing with Forge API
Process multiple proteins efficiently using Forge's async executor.

```python
from esm.sdk.forge import ESM3ForgeInferenceClient
from esm.sdk.api import ESMProtein, GenerationConfig
import asyncio

client = ESM3ForgeInferenceClient(model="esm3-medium-2024-08", token="<token>")

# Async batch processing
async def batch_generate(proteins_list):
    tasks = [
        client.async_generate(protein, GenerationConfig(track="sequence"))
        for protein in proteins_list
    ]
    return await asyncio.gather(*tasks)

# Execute
proteins = [ESMProtein(sequence=f"MPRT{'_' * 50}KEND") for _ in range(10)]
results = asyncio.run(batch_generate(proteins))
```

See `references/forge-api.md` for detailed Forge API documentation, authentication, rate limits, and batch processing patterns.
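`asyncio.gather` launches every request at once, which may exceed rate limits for large batches. The following is a minimal sketch of bounding concurrency with `asyncio.Semaphore`. Here `fake_generate` is a stand-in coroutine (an assumption, not the Forge client), and the limit of 3 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fake_generate(prompt: str) -> str:
    """Stand-in for a remote call: fills masked positions with 'A'."""
    await asyncio.sleep(0.01)
    return prompt.replace("_", "A")


async def bounded_gather(
    worker: Callable[[str], Awaitable[str]], prompts: List[str], limit: int = 3
) -> List[str]:
    """Run `worker` over all prompts with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with semaphore:
            return await worker(prompt)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(p) for p in prompts))


prompts = [f"MPRT{'_' * 5}KEND" for _ in range(10)]
filled = asyncio.run(bounded_gather(fake_generate, prompts))
print(len(filled), filled[0])  # 10 MPRTAAAAAKEND
```

With a real client, the worker would wrap the asynchronous generation call, and the limit would be chosen to match the account's rate limits.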
Model Selection Guide
ESM3 Models (Generative):
- `esm3-sm-open-v1` (1.4B) - Open weights, local usage, good for experimentation
- `esm3-medium-2024-08` (7B) - Best balance of quality and speed (Forge only)
- `esm3-large-2024-03` (98B) - Highest quality, slower (Forge only)

ESM C Models (Embeddings):
- `esmc-300m` (30 layers) - Lightweight, fast inference
- `esmc-600m` (36 layers) - Balanced performance
- `esmc-6b` (80 layers) - Maximum representation quality

Selection criteria:
- Local development/testing: Use `esm3-sm-open-v1` or `esmc-300m`
- Production quality: Use `esm3-medium-2024-08` via Forge
- Maximum accuracy: Use `esm3-large-2024-03` or `esmc-6b`
- High throughput: Use Forge API with batch executor
- Cost optimization: Use smaller models, implement caching strategies
Installation
Basic installation:
```bash
uv pip install esm
```

With Flash Attention (recommended for faster inference):
```bash
uv pip install esm
uv pip install flash-attn --no-build-isolation
```

For Forge API access:
```bash
uv pip install esm  # SDK includes Forge client
```

No additional dependencies needed. Obtain a Forge API token at https://forge.evolutionaryscale.ai
Common Workflows
For detailed examples and complete workflows, see `references/workflows.md`, which includes:
- Novel GFP design with chain-of-thought
- Protein variant generation and screening
- Structure-based sequence optimization
- Function prediction pipelines
- Embedding-based clustering and analysis
References
This skill includes comprehensive reference documentation:
- `references/esm3-api.md` - ESM3 model architecture, API reference, generation parameters, and multimodal prompting
- `references/esm-c-api.md` - ESM C model details, embedding strategies, and performance optimization
- `references/forge-api.md` - Forge platform documentation, authentication, batch processing, and deployment
- `references/workflows.md` - Complete examples and common workflow patterns
These references contain detailed API specifications, parameter descriptions, and advanced usage patterns. Load them as needed for specific tasks.
Best Practices
For generation tasks:
- Start with smaller models for prototyping (`esm3-sm-open-v1`)
- Use temperature parameter to control diversity (0.0 = deterministic, 1.0 = diverse)
- Implement iterative refinement with chain-of-thought for complex designs
- Validate generated sequences with structure prediction or wet-lab experiments
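To see why temperature controls diversity, it helps to look at how it reshapes a sampling distribution. The following is a minimal sketch of temperature-scaled softmax over toy logits in plain Python. It illustrates the general technique and is not taken from the ESM3 implementation. A temperature of 0.0 is handled as an argmax here to avoid dividing by zero.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Deterministic limit: all probability on the largest logit
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # toy scores for three candidate residues
for t in (0.0, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises from 0.0 to 1.0, probability mass moves from the top-scoring candidate toward the alternatives, so repeated sampling yields more varied outputs.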
For embedding tasks:
- Batch process sequences when possible for efficiency
- Cache embeddings for repeated analyses
- Normalize embeddings when computing similarities
- Use appropriate model size based on downstream task requirements
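Normalizing and caching can be combined in a small helper. In the following minimal sketch, `embed` is a hypothetical stand-in for a real embedding call (here it just counts two residue types), `functools.lru_cache` avoids recomputing it for repeated sequences, and similarity is the dot product of L2-normalized vectors, which is cosine similarity.

```python
import math
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=None)
def embed(sequence: str) -> Tuple[float, ...]:
    """Hypothetical stand-in for a model call; results are cached per sequence."""
    return (float(sequence.count("A")), float(sequence.count("K")))


def l2_normalize(vector: Tuple[float, ...]) -> Tuple[float, ...]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return vector if norm == 0 else tuple(x / norm for x in vector)


def cosine_similarity(seq_a: str, seq_b: str) -> float:
    """Cosine similarity between the cached embeddings of two sequences."""
    a, b = l2_normalize(embed(seq_a)), l2_normalize(embed(seq_b))
    return sum(x * y for x, y in zip(a, b))


print(round(cosine_similarity("AAK", "AAAAKK"), 6))  # 1.0
print(round(cosine_similarity("AAA", "KKK"), 6))     # 0.0
```

Without normalization, longer sequences with larger-magnitude vectors would dominate a raw dot product even when their composition is identical.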
For production deployment:
- Use Forge API for scalability and latest models
- Implement error handling and retry logic for API calls
- Monitor token usage and implement rate limiting
- Consider AWS SageMaker deployment for dedicated infrastructure
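Retry logic for remote calls is often written as exponential backoff. The following is a minimal sketch in plain Python. `flaky_call` simulates a transient failure and is not a Forge API function, and the delay values are placeholders to tune against real rate limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T], max_attempts: int = 4, base_delay: float = 0.01
) -> T:
    """Call `call`, retrying on exception with delays base, 2*base, 4*base, ..."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky_call() -> str:
    """Simulated request that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry_with_backoff(flaky_call), attempts["count"])  # ok 3
```

In production code, catching a narrower exception type than `Exception` is usually preferable, so that programming errors are not retried.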
Resources and Documentation
- GitHub Repository: https://github.com/evolutionaryscale/esm
- Forge Platform: https://forge.evolutionaryscale.ai
- Scientific Paper: Hayes et al., Science (2025) - https://www.science.org/doi/10.1126/science.ads0018
- Blog Posts:
  - ESM3 Release: https://www.evolutionaryscale.ai/blog/esm3-release
  - ESM C Launch: https://www.evolutionaryscale.ai/blog/esm-cambrian
- Community: Slack community at https://bit.ly/3FKwcWd
- Model Weights: HuggingFace EvolutionaryScale organization
Responsible Use
ESM is designed for beneficial applications in protein engineering, drug discovery, and scientific research. Follow the Responsible Biodesign Framework (https://responsiblebiodesign.ai/) when designing novel proteins. Consider biosafety and ethical implications of protein designs before experimental validation.