ESM: Evolutionary Scale Modeling

Overview

ESM provides state-of-the-art protein language models for understanding, generating, and designing proteins. This skill enables working with two model families: ESM3 for generative protein design across sequence, structure, and function, and ESM C for efficient protein representation learning and embeddings.

Core Capabilities

1. Protein Sequence Generation with ESM3

Generate novel protein sequences with desired properties using multimodal generative modeling.

**When to use:**

- Designing proteins with specific functional properties
- Completing partial protein sequences
- Generating variants of existing proteins
- Creating proteins with desired structural characteristics

**Basic usage:**

```python
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Load model locally
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-sm-open-v1").to("cuda")

# Create protein prompt
protein = ESMProtein(sequence="MPRT___KEND")  # '_' represents masked positions

# Generate completion
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
print(protein.sequence)
```

**For remote/cloud usage via Forge API:**

```python
from esm.sdk.forge import ESM3ForgeInferenceClient
from esm.sdk.api import ESMProtein, GenerationConfig

# Connect to Forge
model = ESM3ForgeInferenceClient(model="esm3-medium-2024-08", url="https://forge.evolutionaryscale.ai", token="<token>")

# Build a prompt ('_' marks masked positions)
protein = ESMProtein(sequence="MPRT___KEND")

# Generate
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
```

See `references/esm3-api.md` for detailed ESM3 model specifications, advanced generation configurations, and multimodal prompting examples.
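
As in the basic example, `_` marks the positions ESM3 should fill in. A small helper can build such prompts from an existing sequence by masking a span, which fits the "completing partial sequences" and "generating variants" use cases. `mask_span` and `count_masked` are hypothetical helpers, not part of the `esm` SDK.

```python
MASK = "_"  # mask character used in ESM3 sequence prompts


def mask_span(sequence: str, start: int, end: int) -> str:
    """Return `sequence` with residues [start, end) replaced by the mask character.

    Hypothetical helper for building ESM3 prompts; not part of the esm SDK.
    """
    if not 0 <= start <= end <= len(sequence):
        raise ValueError(f"invalid span [{start}, {end}) for length {len(sequence)}")
    return sequence[:start] + MASK * (end - start) + sequence[end:]


def count_masked(prompt: str) -> int:
    """Number of positions the model will need to generate."""
    return prompt.count(MASK)


prompt = mask_span("MPRTAGLKEND", 4, 7)
print(prompt)  # MPRT___KEND
print(count_masked(prompt))  # 3
```

The returned string can be wrapped as `ESMProtein(sequence=prompt)` and passed to `model.generate` exactly like the hand-written prompt above.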

2. Structure Prediction and Inverse Folding

Use ESM3's structure track for structure prediction from sequence or inverse folding (sequence design from structure).

**Structure prediction:**

```python
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Predict structure from sequence (`model` is the ESM3 client loaded earlier)
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
protein_with_structure = model.generate(
    protein,
    # The structure track is fully masked, so use a fixed number of decoding steps
    GenerationConfig(track="structure", num_steps=8),
)

# Access predicted structure
coordinates = protein_with_structure.coordinates  # 3D coordinates
pdb_string = protein_with_structure.to_pdb_string()  # to_pdb(path) writes a file instead
```

**Inverse folding (sequence from structure):**

```python
from esm.sdk.api import ESMProtein, GenerationConfig

# Design sequence for a target structure
protein_with_structure = ESMProtein.from_pdb("target_structure.pdb")
protein_with_structure.sequence = None  # Remove sequence

# Generate sequence that folds to this structure
designed_protein = model.generate(
    protein_with_structure,
    GenerationConfig(track="sequence", num_steps=50, temperature=0.7),
)
```
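
Inverse-folding designs are commonly scored by sequence recovery: the fraction of positions where the designed residue matches the native one. The sketch below assumes both sequences have the same length and are aligned position by position, which holds when a fixed backbone is redesigned. `sequence_recovery` is an illustrative helper, not an `esm` API.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of aligned positions where `designed` matches `native`.

    Assumes equal-length, position-aligned sequences (as produced when
    inverse folding a fixed backbone).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


print(sequence_recovery("MPRTKEND", "MPKTKEVD"))  # 0.75
```

To use it with the inverse-folding example, save a copy of the native sequence before setting `protein_with_structure.sequence = None`, then compare it with `designed_protein.sequence`.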

3. Protein Embeddings with ESM C

Generate high-quality embeddings for downstream tasks like function prediction, classification, or similarity analysis.

**When to use:**

- Extracting protein representations for machine learning
- Computing sequence similarities
- Feature extraction for protein classification
- Transfer learning for protein-related tasks

**Basic usage:**

```python
from esm.models.esmc import ESMC
from esm.sdk.api import ESMProtein, LogitsConfig

# Load ESM C model ("esmc_300m" is the local checkpoint name)
model = ESMC.from_pretrained("esmc_300m").to("cuda")  # or "cpu"

# Tokenize the protein
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
protein_tensor = model.encode(protein)

# Generate embeddings (per-token vectors returned alongside the sequence logits)
output = model.logits(protein_tensor, LogitsConfig(sequence=True, return_embeddings=True))
embeddings = output.embeddings
```

**Batch processing:**

```python
# Encode multiple proteins
proteins = [
    ESMProtein(sequence="MPRTKEIND..."),
    ESMProtein(sequence="AGLIVHSPQ..."),
    ESMProtein(sequence="KTEFLNDGR..."),
]

config = LogitsConfig(sequence=True, return_embeddings=True)
embeddings_list = [model.logits(model.encode(p), config).embeddings for p in proteins]
```

See `references/esm-c-api.md` for ESM C model details, efficiency comparisons, and advanced embedding strategies.
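
ESM C returns one embedding vector per token. For similarity analysis, a common approach is to mean-pool the residue vectors into one vector per protein and compare proteins by cosine similarity, which normalizes away vector length. This sketch uses plain Python lists so it runs without a GPU. It assumes that the first and last rows are special begin/end tokens to be dropped, and that a single protein's embedding has already been converted to nested lists (for example with `output.embeddings[0].tolist()`). Check both assumptions against your model's output. `mean_pool` and `cosine_similarity` are illustrative names.

```python
import math
from typing import Sequence

Vector = list[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]], drop_special: bool = True) -> Vector:
    """Average per-token embeddings into one per-protein vector.

    If `drop_special` is True, the first and last rows are assumed to be
    begin/end special tokens and are excluded (an assumption; check your model).
    """
    rows = token_embeddings[1:-1] if drop_special else token_embeddings
    if not rows:
        raise ValueError("no residue embeddings to pool")
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (dot product of L2-normalized vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare zero vectors")
    return dot / (norm_a * norm_b)


# Toy 2-D "embeddings" laid out as [BOS, residue, residue, EOS]
protein_a = [[9.0, 9.0], [1.0, 0.0], [3.0, 0.0], [9.0, 9.0]]
protein_b = [[9.0, 9.0], [0.0, 2.0], [0.0, 4.0], [9.0, 9.0]]

vec_a = mean_pool(protein_a)  # [2.0, 0.0]
vec_b = mean_pool(protein_b)  # [0.0, 3.0]
print(cosine_similarity(vec_a, vec_a))  # 1.0
print(cosine_similarity(vec_a, vec_b))  # 0.0
```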

4. Function Conditioning and Annotation

Use ESM3's function track to generate proteins with specific functional annotations or predict function from sequence.

**Function-conditioned generation:**

```python
from esm.sdk.api import ESMProtein, FunctionAnnotation, GenerationConfig

# Create protein with desired function
protein = ESMProtein(
    sequence="_" * 200,  # Generate 200 residue protein
    function_annotations=[
        FunctionAnnotation(label="fluorescent_protein", start=50, end=150)
    ],
)

# Generate sequence with specified function
functional_protein = model.generate(
    protein,
    GenerationConfig(track="sequence", num_steps=200),
)
```

5. Chain-of-Thought Generation

Iteratively refine protein designs using ESM3's chain-of-thought generation approach.

```python
from esm.sdk.api import ESMProtein, GenerationConfig

# Multi-step refinement
protein = ESMProtein(sequence="MPRT" + "_" * 100 + "KEND")

# Step 1: Generate initial structure
config = GenerationConfig(track="structure", num_steps=50)
protein = model.generate(protein, config)

# Step 2: Refine sequence based on structure
config = GenerationConfig(track="sequence", num_steps=50, temperature=0.5)
protein = model.generate(protein, config)

# Step 3: Predict function
config = GenerationConfig(track="function", num_steps=20)
protein = model.generate(protein, config)
```
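
Each step above applies one `GenerationConfig` to the protein returned by the previous step, so the refinement can be expressed as a data-driven schedule. In this sketch `run_schedule` and `SCHEDULE` are hypothetical names. With the real model, `generate` would be `lambda p, cfg: model.generate(p, GenerationConfig(**cfg))`. A stub generator is used so the example runs without ESM3.

```python
from typing import Any, Callable, TypeVar

P = TypeVar("P")

# The three refinement steps from the example above, written as keyword
# arguments for GenerationConfig.
SCHEDULE: list[dict[str, Any]] = [
    {"track": "structure", "num_steps": 50},
    {"track": "sequence", "num_steps": 50, "temperature": 0.5},
    {"track": "function", "num_steps": 20},
]


def run_schedule(
    protein: P,
    generate: Callable[[P, dict[str, Any]], P],
    schedule: list[dict[str, Any]],
) -> P:
    """Apply each step in order, feeding the output of one step into the next."""
    for step in schedule:
        protein = generate(protein, step)
    return protein


# Stub standing in for the model: records which tracks were generated.
def fake_generate(protein: list[str], step: dict[str, Any]) -> list[str]:
    return protein + [step["track"]]


history = run_schedule([], fake_generate, SCHEDULE)
print(history)  # ['structure', 'sequence', 'function']
```

Keeping the schedule as data makes it easy to try other track orders, step counts, or temperatures without changing the loop.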

6. Batch Processing with Forge API

Process multiple proteins efficiently using Forge's async executor.

```python
import asyncio

from esm.sdk.api import ESMProtein, GenerationConfig
from esm.sdk.forge import ESM3ForgeInferenceClient

client = ESM3ForgeInferenceClient(
    model="esm3-medium-2024-08", url="https://forge.evolutionaryscale.ai", token="<token>"
)


# Async batch processing
async def batch_generate(proteins_list):
    tasks = [
        client.async_generate(protein, GenerationConfig(track="sequence"))
        for protein in proteins_list
    ]
    return await asyncio.gather(*tasks)


# Execute
proteins = [ESMProtein(sequence=f"MPRT{'_' * 50}KEND") for _ in range(10)]
results = asyncio.run(batch_generate(proteins))
```

See `references/forge-api.md` for detailed Forge API documentation, authentication, rate limits, and batch processing patterns.
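
`asyncio.gather` launches every request at once, which can run into Forge rate limits on large batches. One common pattern is to cap the number of in-flight requests with an `asyncio.Semaphore`. This is sketched here as a general asyncio technique, not as documented Forge behavior. `bounded_gather` is a hypothetical helper and `fake_async_generate` is an offline stand-in. In practice the worker would call `client.async_generate(...)` as in the batch example.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def bounded_gather(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 4,
) -> list[R]:
    """Run `worker` on every item with at most `max_concurrency` calls in flight.

    Results are returned in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))


# Offline stand-in for client.async_generate that tracks peak concurrency.
in_flight = 0
peak = 0


async def fake_async_generate(prompt: str) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return prompt.replace("_", "A")


prompts = [f"MPRT{'_' * 5}KEND" for _ in range(10)]
results = asyncio.run(bounded_gather(prompts, fake_async_generate, max_concurrency=3))
print(results[0])  # MPRTAAAAAKEND
print(peak <= 3)  # True
```

A suitable `max_concurrency` depends on your Forge rate limits.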

Model Selection Guide

**ESM3 Models (Generative):**

- `esm3-sm-open-v1` (1.4B) - Open weights, local usage, good for experimentation
- `esm3-medium-2024-08` (7B) - Best balance of quality and speed (Forge only)
- `esm3-large-2024-03` (98B) - Highest quality, slower (Forge only)

**ESM C Models (Embeddings):**

- `esmc-300m` (30 layers) - Lightweight, fast inference
- `esmc-600m` (36 layers) - Balanced performance
- `esmc-6b` (80 layers) - Maximum representation quality

**Selection criteria:**

- **Local development/testing:** Use `esm3-sm-open-v1` or `esmc-300m`
- **Production quality:** Use `esm3-medium-2024-08` via Forge
- **Maximum accuracy:** Use `esm3-large-2024-03` or `esmc-6b`
- **High throughput:** Use the Forge API with a batch executor
- **Cost optimization:** Use smaller models and implement caching strategies

Installation

**Basic installation:**

```bash
uv pip install esm
```

**With Flash Attention (recommended for faster inference):**

```bash
uv pip install esm
uv pip install flash-attn --no-build-isolation
```

**For Forge API access:**

```bash
uv pip install esm  # SDK includes Forge client
```

No additional dependencies are needed. Obtain a Forge API token at https://forge.evolutionaryscale.ai.

Common Workflows

For detailed examples and complete workflows, see `references/workflows.md`, which includes:

- Novel GFP design with chain-of-thought
- Protein variant generation and screening
- Structure-based sequence optimization
- Function prediction pipelines
- Embedding-based clustering and analysis

References

This skill includes comprehensive reference documentation:

- `references/esm3-api.md` - ESM3 model architecture, API reference, generation parameters, and multimodal prompting
- `references/esm-c-api.md` - ESM C model details, embedding strategies, and performance optimization
- `references/forge-api.md` - Forge platform documentation, authentication, batch processing, and deployment
- `references/workflows.md` - Complete examples and common workflow patterns

These references contain detailed API specifications, parameter descriptions, and advanced usage patterns. Load them as needed for specific tasks.

Best Practices

**For generation tasks:**

- Start with smaller models for prototyping (`esm3-sm-open-v1`)
- Use the `temperature` parameter to control diversity (0.0 = deterministic, 1.0 = more diverse)
- Use iterative chain-of-thought refinement for complex designs
- Validate generated sequences with structure prediction or wet-lab experiments
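
The temperature parameter rescales logits before sampling. A low temperature sharpens the distribution toward the highest-scoring token, and a high temperature flattens it. The sketch shows the standard temperature-scaled softmax as a general illustration, not ESM3's exact sampling code. Treating temperature 0 as greedy argmax is an assumption made here, because dividing by zero is undefined.

```python
import math


def temperature_softmax(logits: list[float], temperature: float) -> list[float]:
    """Probabilities from logits scaled by 1/temperature.

    temperature == 0 is treated as greedy decoding (all mass on the argmax).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.0, 0.5, 1.0, 2.0):
    probs = temperature_softmax(logits, t)
    print(t, [round(p, 3) for p in probs])
```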

**For embedding tasks:**

- Batch process sequences when possible for efficiency
- Cache embeddings for repeated analyses
- Normalize embeddings when computing similarities
- Choose the model size based on downstream task requirements
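
One way to cache embeddings for repeated analyses is an on-disk store keyed by a hash of the model name and sequence, so results from different models never collide. This is a minimal sketch. `cached_embedding` and `embed_fn` are hypothetical names, embeddings are assumed to be plain lists of floats (convert tensors with `.tolist()` first), and JSON files are used for simplicity rather than speed. In practice `embed_fn` would wrap the ESM C `model.logits(...)` call plus any pooling.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

Embedding = list[float]


def cached_embedding(
    sequence: str,
    model_name: str,
    embed_fn: Callable[[str], Embedding],
    cache_dir: Path,
) -> Embedding:
    """Return the embedding for `sequence`, computing it only on a cache miss.

    The cache key hashes the model name together with the sequence so that
    embeddings from different models never collide.
    """
    key = hashlib.sha256(f"{model_name}:{sequence}".encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    embedding = embed_fn(sequence)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(embedding))
    return embedding


# Demo with a stand-in embedder that counts how often it is called.
calls = 0


def toy_embed(sequence: str) -> Embedding:
    global calls
    calls += 1
    return [float(len(sequence)), float(sequence.count("K"))]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_embedding("MPRTKEND", "esmc-300m", toy_embed, cache)
    second = cached_embedding("MPRTKEND", "esmc-300m", toy_embed, cache)
    print(first, second, calls)  # [8.0, 1.0] [8.0, 1.0] 1
```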

**For production deployment:**

- Use the Forge API for scalability and access to the latest models
- Implement error handling and retry logic for API calls
- Monitor token usage and implement rate limiting
- Consider AWS SageMaker deployment for dedicated infrastructure
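
For the retry logic recommended above, a common approach is exponential backoff with jitter. The first retry waits briefly, each later retry doubles the wait, and a little randomness is added so that many clients do not retry in lockstep. This is a generic sketch, not an `esm` SDK feature. `call_with_retries` is a hypothetical helper, and which exceptions are worth retrying depends on the client you use.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` exceptions with exponential backoff.

    The delay before retry n (1-based) is base_delay * 2**(n - 1), plus up to
    10% random jitter.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")


# Demo: a flaky call that fails twice, then succeeds. `sleep` is replaced
# with a recorder so the example runs instantly.
attempts = 0
delays: list[float] = []


def flaky() -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(call_with_retries(flaky, sleep=delays.append), attempts)  # ok 3
```

With Forge, `fn` could be `lambda: client.generate(protein, config)`, with `retry_on` narrowed to the transient errors you actually observe.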

Resources and Documentation

- GitHub Repository: https://github.com/evolutionaryscale/esm
- Forge Platform: https://forge.evolutionaryscale.ai
- Scientific Paper: Hayes et al., Science (2025) - https://www.science.org/doi/10.1126/science.ads0018
- Blog Posts:
  - ESM3 Release: https://www.evolutionaryscale.ai/blog/esm3-release
  - ESM C Launch: https://www.evolutionaryscale.ai/blog/esm-cambrian
- Community: Slack community at https://bit.ly/3FKwcWd
- Model Weights: HuggingFace EvolutionaryScale organization

Responsible Use

ESM is designed for beneficial applications in protein engineering, drug discovery, and scientific research. Follow the Responsible Biodesign Framework (https://responsiblebiodesign.ai/) when designing novel proteins. Consider biosafety and ethical implications of protein designs before experimental validation.
