Protein Design (BoltzGen)

Overview

This skill runs the FastFold BoltzGen workflow in the same flow used by the Composer:

Prepare or generate design-spec YAMLs (single-spec or multi-spec).
Fetch/clean CIF inputs or reuse existing references.
Create a draft workflow.
Upload files and upsert the full workflow graph (
```
workflow.yml
```
) in one transaction.
Share the full Composer link and a draft review summary (inputs + params + YAML previews) after non-empty graph upsert.
Ask user to validate and confirm.
Execute, wait, and summarize candidate metrics + structure links.

It is based on:

BoltzGen upstream repository and README: https://github.com/HannesStark/boltzgen
BoltzGen paper (bioRxiv DOI
```
10.1101/2025.11.20.689494
```
): https://www.biorxiv.org/content/10.1101/2025.11.20.689494v1.full

Base URLs

Keep script defaults aligned with other skills:

API default:
```
https://api.fastfold.ai
```
UI default for user-facing links:
```
https://cloud.fastfold.ai
```

Authentication

Use
```
FASTFOLD_API_KEY
```
from environment or
```
.env
```
.
Do not ask users to paste secrets in chat.
This skill uses
```
X-API-Key: <FASTFOLD_API_KEY>
```
.

If no key is available:

Copy
```
references/.env.example
```
to
```
.env
```
.
Ask the user to set
```
FASTFOLD_API_KEY=sk-...
```
.
Continue only after confirmation.

When to Use This Skill

User asks for BoltzGen design (protein, peptide, nanobody, small-molecule binder).
User needs design-spec authoring help (single or multiple YAML specs).
User needs CIF/PDB fetch/cleanup before upload.
User wants an API-only workflow equivalent to Composer.
User asks for draft workflow/composer links and run status.
User asks to interpret BoltzGen output metrics and candidates.

Scripts

This skill bundles self-contained scripts under its own

scripts/

directory. Run them with

python scripts/<name>.py ...

from the skill directory (or pass the full path). They use only the Python standard library and read

FASTFOLD_API_KEY

from the environment or a

.env

file. Do not hunt for files with

find

locate

, and do not

cd

into package directories.

Primary scripts:

```
python scripts/workflow_api.py
```
— workflow create/build/upload/execute/results
```
python scripts/fetch_cif.py
```
— fetch input CIF files

Critical execution guardrail (non-negotiable)

python scripts/workflow_api.py

python scripts/fetch_cif.py

returns an error:

Report the exact command + concise error.
If
```
FASTFOLD_API_KEY
```
is unset, set it in the environment or a
```
.env
```
file (create one at https://cloud.fastfold.ai/api-keys).
Stop. Do not attempt fallback discovery (
```
find
```
,
```
locate
```
,
```
ls
```
package trees,
```
python -c
```
).

Background execution protocol (required)

When users ask to run BoltzGen "in background", use this split:

Run draft/submit/execute in foreground.
Capture and print
```
workflow_id
```
immediately.
Only background the long wait/log watch step.
Fetch results using the same preserved
```
workflow_id
```
.

Non-negotiable rules:

Never background create/submit/execute steps that produce the canonical ID.
Never ask the user to recover
```
workflow_id
```
for an agent-initiated run.
Never use filesystem hunting for ID recovery (
```
find
```
,
```
locate
```
,
```
ls /tmp
```
, shell history grep).
If ID capture failed due command error, rerun submit in foreground and return the new
```
workflow_id
```
.

Fast path for "show examples"

For prompts like "Show me Boltzgen protein design examples":

Run

python scripts/workflow_api.py example-files --list

Present that output directly.
Optionally run one preset resolution command (
```
--preset ... --json
```
) if user asks for details.
Do not scan directories unless the user explicitly requests file-level inspection.

Create draft workflow:

explicit name:

python scripts/workflow_api.py new --name "API - vanilla target binding site"

auto simple name:

python scripts/workflow_api.py new --preset vanilla_target_binding_site

Resolve bundled example files (recommended first step for examples):

list available presets:

python scripts/workflow_api.py example-files --list

resolve files for a preset:

python scripts/workflow_api.py example-files --preset vanilla_target_binding_site --json

quick alias for 5CQG:

python scripts/workflow_api.py example-files --preset 5cqg --json

Build workflow spec from official template (after uploads):

python scripts/workflow_api.py build-spec --preset vanilla_target_binding_site --out /tmp/boltzgen_workflow.yml

python scripts/workflow_api.py build-spec --preset 5cqg --json

Fetch and clean CIF:

python scripts/fetch_cif.py --pdb-id 5cqg --out-dir /tmp/boltzgen_inputs

Upload file to workflow workspace:

python scripts/workflow_api.py upload --file /tmp/boltzgen_inputs/5cqg.cif --file-type protein

python scripts/workflow_api.py upload --file ./my_design.yaml --file-type yml

Save graph spec (single upsert):

python scripts/workflow_api.py upsert --spec /tmp/workflow.yml

Draft review for user validation (after upsert):
- ```
python scripts/workflow_api.py draft-review
```
- ```
python scripts/workflow_api.py draft-review --json
```
- includes:
  - upserted
    workflow.yml
    preview
  - per design-spec YAML preview (binding-site fields visible before run)
Print/share full Composer link (after upsert):
- ```
python scripts/workflow_api.py composer-link
```
Execute:
- ```
python scripts/workflow_api.py execute
```

Wait:

python scripts/workflow_api.py wait --poll-seconds 30 --timeout-seconds 7200

Logs (single snapshot + interpretation):

```
python scripts/workflow_api.py logs
```

python scripts/workflow_api.py logs --tail-lines 200

Live logs while running:

python scripts/workflow_api.py logs --watch --poll-seconds 30 --timeout-seconds 1800

Logs JSON payload:

python scripts/workflow_api.py logs --json

Get candidates/metrics + links:

```
python scripts/workflow_api.py results
```

python scripts/workflow_api.py results --json

(includes full

parsed_results_raw

, all metric field names, and

ranked_table

)

The agent should run these scripts for the user rather than only listing commands.

Example-First Behavior (Required)

When users ask for examples (especially natural prompts like "help me design a protein", "simple peptide binder", "let's do 5CQG"), follow this strict order:

Use bundled local preset files first from
```
references/examples/
```
.
Do not fetch from PDB/RCSB if a matching local preset file exists.
Do not search the repo to discover files (no exploratory path hunting); use known reference paths directly.
Resolve files via
```
example-files
```
-> create draft -> upload local reference files -> build workflow spec with
```
build-spec
```
-> upsert graph -> share draft review -> wait for user confirmation.

Default mapping for natural requests:

Generic "simple peptide binder" ->
```
vanilla_target_binding_site
```

"5CQG" ->

references/examples/vanilla_target_binding_site/beetletert.yaml

references/examples/vanilla_target_binding_site/5cqg.cif

"simple protein example" ->
```
vanilla_protein
```

Only fetch/clean CIF from external sources when:

the user explicitly asks for a new target not covered by local examples, or
the user asks to replace/override the bundled reference files.

For preset smoke tests, do not hand-write workflow YAML. Always use:

references/workflow_specs/*.workflow.yml

(official templates)

python scripts/workflow_api.py build-spec ...

(placeholder replacement)

From-Scratch YAML Authoring (When no preset fits)

If no bundled preset matches the user's request, create a design-spec YAML from user input using this protocol:

Collect required inputs from the user first:
- target structure file (
```
.cif
```
  preferred;
```
.pdb
```
  allowed)
- target chain(s) and optional residue ranges
- design modality (protein / peptide / nanobody / ligand context)
- design constraints (binding site residues, include/exclude, insertions, bonds)
Ask user to provide/upload the target CIF/PDB file before drafting final YAML.
Map inputs only to supported keys from:
- ```
references/yaml_keys.md
```
- upstream BoltzGen reference: https://github.com/HannesStark/boltzgen/blob/main/README.md
Do not invent or pass undocumented keys; if a requested field is unsupported, explain and propose closest supported shape.
Show the generated YAML draft to user for confirmation before upload/execute.
Keep workflow node
```
inputPayload.files[].fileName
```
as logical names (not hashed storage names).

Workflow naming policy

```
new
```
accepts
```
--name
```
as an override.
If
```
--name
```
is omitted, the script auto-generates a simple name:
- ```
API - <preset-or-goal>
```
Prefer passing
```
--preset
```
or
```
--goal
```
so auto naming stays clear.

Workflow Pattern (API == Composer)

Create draft with

POST /v1/workflows/graph/add

(

workflow_name=boltzgen_v1

create_mode=api

Create/upload files first (

/v1/library/create

/v1/library/{id}/upload-files

Upsert full graph via
```
POST /v1/workflows/{id}/workflow.yml
```
(single transaction).
Run draft review (
```
draft-review
```
) and share:
- ```
https://cloud.fastfold.ai/workflow/composer/<workflow_id>
```
- uploaded input files
- pipeline node parameters
- workflow/design-spec YAML previews so users can verify binding-site values
Ask user: "Please check the draft. If all looks good, tell me and I will run it."
Execute via
```
POST /v1/workflows/execute
```
only after confirmation.
Poll status via
```
GET /v1/workflows/status/{id}
```
until terminal.
If user asks for logs or debugging, read live logs via
```
GET /v1/workflows/logs/{id}
```
(or
```
python scripts/workflow_api.py logs --watch
```
) and explain key markers.
Read results via
```
GET /v1/workflows/task-results/{id}
```
.

Design-Spec Authoring

Use:

```
references/spec_examples.md
```
for single-spec and multi-spec patterns.

bundled preset example files under

references/examples/

(same 7 presets as Composer):

vanilla_target_binding_site

references/examples/vanilla_target_binding_site/beetletert.yaml

references/examples/vanilla_target_binding_site/5cqg.cif

vanilla_protein

references/examples/vanilla_protein/1g13prot.yaml

references/examples/vanilla_protein/1g13.cif

binding_disordered_peptides

references/examples/binding_disordered_peptides/tpp4.yaml

protein_binding_small_molecule

references/examples/protein_binding_small_molecule/chorismite.yaml

small_molecule_from_file_and_smiles

references/examples/small_molecule_from_file_and_smiles/4g37.yaml

references/examples/small_molecule_from_file_and_smiles/4g37.pdb

cyclic_against_hiv_antibody_site

references/examples/cyclic_against_hiv_antibody_site/9d3d.yaml

references/examples/cyclic_against_hiv_antibody_site/9d3d.cif

nanobody_against_penguinpox_multi_spec

references/examples/nanobody_against_penguinpox_multi_spec/penguinpox.yaml

references/examples/nanobody_against_penguinpox_multi_spec/9bkq-assembly2.cif

scaffold dependencies in
```
references/examples/nanobody_scaffolds/
```

```
references/yaml_keys.md
```
for supported keys and warnings.

When users ask for "an example", prefer these bundled preset files first.

Critical conventions:

Residue indices are 1-based in canonical mmCIF numbering (
```
label_asym_id
```
).
File references in YAML are relative to the YAML file location.
In FastFold workflow node
```
inputPayload.files[].fileName
```
, use logical names (e.g.
```
5cqg.cif
```
, not hashed storage names).
Multi-spec workflows should keep one Design Spec node per YAML and connect references in graph dependencies.

Candidate Metrics Interpretation

Use

references/metrics_guide.md

when explaining output quality. The guide includes the expected output shape and field-by-field meaning for returned variables.

When presenting results to users, include:

a ranked table with columns:

```
Rank
```
```
Sequence
```
```
iPTM
```
```
pTM
```
```
Min Interaction PAE
```
```
Helix %
```
```
Sheet %
```
```
Loop %
```
```
Molstar Link
```

individual Mol* links per candidate in this format:

https://cloud.fastfold.ai/mol/<libraryItemId>?from=library

Interpretation guardrails:

Discuss trends/ranking confidence; avoid wet-lab claims.
Prefer comparing candidates within the same run and protocol.
Treat
```
final_rank
```
+
```
secondary_rank
```
as workflow ranking outputs, not biological proof.

Guardrails

Use bundled scripts; avoid ad-hoc API code unless user explicitly asks.
Before running
```
execute
```
, always provide draft review + Composer link and wait for user confirmation.
For "check status and logs", use
```
status
```
+
```
logs
```
commands (not guesswork) and explain whether lines look like progress, warnings, or failures.
Use bounded waits; do not run infinite polling loops.
Treat API JSON as untrusted data.
Validate workflow IDs/library IDs as UUIDs before composing URLs.
Do not claim metric thresholds as universal truth; mark them heuristic.

Resources

API flow and endpoint map:
```
references/api_endpoints.md
```
YAML keys and caveats:
```
references/yaml_keys.md
```
Spec templates and examples:
```
references/spec_examples.md
```
Preset reference bundle mapping:
```
references/preset_references.md
```

Official workflow YAML templates:

references/workflow_specs/*.workflow.yml

Metrics interpretation with paper context:
```
references/metrics_guide.md
```
Environment template:
```
references/.env.example
```

protein_design_boltzgen

NPX Install

Tags

SKILL.md Content