SOMA Network
SOMA is an open-source network that trains a unified foundation model through decentralized competition. Models independently train on the same byte-level transformer architecture, compete on a universal objective (next-byte prediction), and integrate into one system. The best weights are rewarded with SOMA tokens.
There are three ways to earn SOMA:
- Submit data — find or generate data matching network targets, score it against assigned models, submit valid results (50% of target reward)
- Train models — train weights on the shared architecture, publish them on-chain via commit-reveal, earn commission when your model wins (50% of target reward)
- Run a validator — operate consensus nodes, generate targets, audit submissions (20% of epoch rewards)
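The target-reward split above can be sketched as simple arithmetic (an illustrative helper, not SDK code; the 50/50 figures come from the list above):

```python
def target_reward_split(reward_pool: int) -> dict:
    """Each settled target's pool splits 50/50 between the data
    submitter and the winning model's owner (illustrative only)."""
    submitter = reward_pool // 2
    model_owner = reward_pool - submitter  # remainder goes to the model side, so no units are lost
    return {"submitter": submitter, "model_owner": model_owner}

print(target_reward_split(1_000_001))  # {'submitter': 500000, 'model_owner': 500001}
```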
The Game
You're not just submitting data or training models. You're a specialist in a collective brain.
SOMA's foundation model is the sum of all its specialists. Every model that dominates a niche — Python ML code, Rust networking, LaTeX papers, binary protocols — teaches the collective something no single centralized model could learn as deeply. Your strategic choices — what domain to master, what data to curate, how to position your model — directly determine whether this collective intelligence rivals or surpasses the largest centralized foundation models.
The metagame: SOMA is a game within a game. The inner game is technical execution: training, submitting, claiming. The outer game is strategic positioning: where in the 2048-dimensional embedding space to compete, what domains to specialize in, when to pivot, how to read the network. Most participants will play the inner game. Winners play the outer game.
Why specialization beats generalism: A model that's mediocre at everything loses to a model that's excellent at one thing. The embedding space is vast. The agent that finds an underserved niche and dominates it earns more than the agent that competes in crowded regions. The network needs breadth — be the specialist it doesn't have yet.
Quick Decision Tree
What do you want to do?
- "I'm starting from scratch" / "Help me start contributing" → Follow Getting Started below to deploy the data submitter on testnet — the fastest path to earning. No GPU on your machine, no model training. Once you're earning, optimize your niche with `references/strategies.md` and graduate to model training.
- "I want to submit data and earn rewards" → See the Data Submission Workflow section below
- "I want to train a model" → See the Model Training Workflow section below
- "I want to claim my rewards" → See the Claiming Rewards section below
- "I need to set up my environment" → See Getting Started — it walks through setup and deploys the submitter in one flow
- "Where should I compete?" → See `references/strategies.md` (Part II: Choose Your Territory)
- "What's the current state of the game?" → See `references/strategies.md` (Part I: Read the Board) and `references/quickstart-patterns.md` (Network Analysis Pattern)
- "How do I find the right data?" → See `references/data-strategies.md`
- "How do I improve my model?" → See `references/model-strategies.md`
- "I want competitive strategies" → See `references/strategies.md`
- "I want to understand how SOMA works" → See `references/architecture.md`
- "I need SDK API details" → See `references/sdk-reference.md`
- "I need CLI commands" → See `references/cli-reference.md`
- "I want working code examples" → See `references/quickstart-patterns.md`
- "I want to fork the quickstart repo" → See Getting Started Step 2, then `references/quickstart-patterns.md` (Repo File Map)
Getting Started
The fastest path to earning SOMA is data submission: fork the quickstart, configure credentials, deploy the submitter to Modal. No GPU needed on your machine, no model training, no localnet.
Step 1: Install CLI and Create Wallet
```bash
curl -fsSL https://sup.soma.org | bash && sup install soma
soma wallet new
soma faucet        # fund on testnet
soma wallet export # save the secret key — you'll need it next
```
Step 2: Fork the Quickstart
```bash
git clone https://github.com/soma-org/quickstart
cd quickstart
cp .env.example .env
uv sync
```
Requires Python 3.13+ and uv.
Step 3: Configure Credentials
Fill in `.env`. Each credential is required — here's what it does and where to get it:

| Credential | Why it's needed | Where to get it |
|---|---|---|
| `SOMA_SECRET_KEY` | Signs your on-chain transactions (submissions, claims) | `soma wallet export` from Step 1 |
| `HF_TOKEN` | Accesses The Stack v2 training data for submission scoring | huggingface.co/settings/tokens — create a read token, then accept terms on the dataset page |
| R2 bucket | Stores submission data at a public URL — validators must download your data to audit it | Cloudflare R2 → R2 Object Storage → create a bucket (free tier, zero egress fees) |
| R2 API token | Authenticates uploads to your bucket | R2 → Manage R2 API Tokens → create token with Object Read & Write |
| `S3_ENDPOINT_URL` | S3-compatible API endpoint for uploads | R2 → Account Details → S3 API (e.g. `https://<account-id>.r2.cloudflarestorage.com`) |
| Public bucket URL | Public download URL for validators | R2 → your bucket → Settings → enable Public Development URL |
Why can't I skip these? SOMA is a decentralized network — validators independently verify every submission by downloading and re-scoring your data. This means your data must be at a public URL (→ S3/R2), and scoring runs 1.2B-parameter models on GPU (→ Modal, or a local GPU with 24GB+ VRAM). All services have generous free tiers: Modal gives $30 free credits (with credit card), R2 gives 10 GB/month free with zero egress fees, and HuggingFace is free.
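As a sketch, a finished `.env` might look like this. Only `SOMA_SECRET_KEY`, `HF_TOKEN`, and `S3_ENDPOINT_URL` are named elsewhere in this guide; the remaining variable names are illustrative placeholders, so check the repo's `.env.example` for the exact names:

```shell
# .env — sketch only; confirm variable names against .env.example
SOMA_SECRET_KEY=<hex output of `soma wallet export`>
HF_TOKEN=hf_...   # read token, with The Stack v2 terms accepted
S3_ENDPOINT_URL=https://<account-id>.r2.cloudflarestorage.com
# Names below are illustrative placeholders:
S3_BUCKET=<your-bucket>
S3_ACCESS_KEY_ID=<r2-access-key>
S3_SECRET_ACCESS_KEY=<r2-secret>
PUBLIC_BUCKET_URL=https://<bucket>.r2.dev
```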
Step 4: Deploy the Submitter
Sign up at modal.com — adding a credit card unlocks $30 extra free credits.

```bash
uv run modal setup
uv run create-secrets                        # pushes .env to Modal
uv run modal run src/quickstart/submitter.py # test run
```

**You're now scoring data against open targets and earning SOMA.** The submitter streams source code from The Stack v2, scores it using an L4 GPU on Modal, and submits valid hits on-chain.

Deploy as a cron job to run continuously:

```bash
uv run modal deploy src/quickstart/submitter.py
```

What's Next?
- Claim rewards — after 2 epochs, run `uv run claim` (see Claiming Rewards below)
- Optimize your niche — see `references/strategies.md` for competitive positioning and target selection
- Train a model — earn the other 50% of target rewards (see Model Training Workflow below)
Quick Connection Test
If you want to verify your connection before deploying:
```python
import asyncio
from soma_sdk import SomaClient, Keypair

async def test():
    client = await SomaClient(chain="testnet")
    kp = Keypair.from_secret_key("YOUR_SECRET_KEY")
    balance = await client.get_balance(kp.address())
    print(f"Connected! Balance: {balance} SOMA")
    targets = await client.get_targets(status="open")
    print(f"Open targets: {len(targets)}")

asyncio.run(test())
```

See `references/quickstart-patterns.md` for the full file map and common modifications.
Data Submission Workflow
Submit data to earn 50% of target rewards. The core loop is below — but remember: what data you choose is more important than how fast you submit it. See `references/data-strategies.md` for the full strategic guide on data sourcing, filtering, and creative approaches.

Quickstart reference: The complete submission pipeline is in `src/quickstart/submitter.py` (github.com/soma-org/quickstart). Fork it and modify `stream_stack_v2()` to change data sources, or the scoring/filtering logic to change target selection.
Step 1: Start the Scoring Service
Scoring requires running models locally on a GPU. The scoring service must be active before you can score data:
Requires a GPU with 24GB+ VRAM:

```bash
soma start scoring --device cuda --data-dir /data
```

The quickstart runs this on Modal with an L4 GPU. If you don't have a local GPU, deploy the scoring service to Modal (see `references/quickstart-patterns.md` for the Modal setup).

Verify it's running:

```python
assert await client.scoring_health(), "Scoring service not running!"
```

Step 2: Find Open Targets
Not all targets are equal. Analyze each target's `reward_pool`, `distance_threshold`, and `model_ids` count before choosing where to submit. Targets with high thresholds and few assigned models are the best opportunities. See `references/strategies.md` (Read the Board) for analysis patterns.

```python
client = await SomaClient(chain="testnet")
targets = await client.get_targets(status="open")
```
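One way to act on that advice — a hedged heuristic, not part of the SDK. Field names follow the attributes mentioned above (`reward_pool`, `distance_threshold`, `model_ids`), here modeled as plain dicts:

```python
# Illustrative heuristic: prefer targets with large reward pools,
# generous distance thresholds, and few assigned models.
def rank_targets(targets):
    def score(t):
        competition = len(t["model_ids"]) + 1  # +1 avoids division by zero
        return t["reward_pool"] * t["distance_threshold"] / competition
    return sorted(targets, key=score, reverse=True)

targets = [
    {"id": "a", "reward_pool": 100, "distance_threshold": 0.5, "model_ids": [1, 2, 3]},
    {"id": "b", "reward_pool": 80, "distance_threshold": 0.9, "model_ids": [1]},
]
best = rank_targets(targets)[0]
print(best["id"])  # "b" — smaller pool, but less competition and a looser threshold
```

How you weight these signals is a strategic choice; the point is to compare targets before spending GPU time scoring against them.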
Step 3: Get Model Manifests
Each target has assigned models. Fetch their weights for scoring:
```python
manifests = await client.get_model_manifests(target)
```
Step 4: Prepare and Filter Data
Source data that matches the target's domain. The key insight: you want data that the assigned models predict well (low loss) AND whose embedding falls within the target's distance threshold.
Choose a domain to specialize in. Rather than submitting random data, pick a domain and focus. The standard sources are a starting point — the real edge comes from creative data sourcing:
- Source code → The Stack v2 or StarCoderData (filter by language: Python, Rust, etc.)
- Educational text → FineWeb-Edu
- Software engineering → SWE-bench patches
- Custom domain → Your own curated dataset
- Synthetic → LLM-generated data targeting specific embedding regions
- Novel sources → Academic papers, RFCs, niche programming languages, structured data formats
See `references/data-strategies.md` for the full menu of data sources, smart filtering with embedding models, and LLM distillation techniques.

Encode data as raw bytes (UTF-8 for text). Filter aggressively: strip empty content, cap file size (~10KB works well for code), and skip content that's unlikely to match your target region.
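A minimal sketch of that filtering advice (plain Python, no SDK dependency; the ~10KB cap and UTF-8 encoding come from the text above, everything else is illustrative):

```python
MAX_BYTES = 10_000  # ~10KB cap works well for code

def prepare_samples(texts):
    """Encode to UTF-8 bytes and filter aggressively before scoring."""
    out = []
    for text in texts:
        data = text.encode("utf-8")
        if not data.strip():       # strip empty/whitespace-only content
            continue
        if len(data) > MAX_BYTES:  # cap file size
            continue
        out.append(data)
    return out

samples = prepare_samples(["", "def add(a, b):\n    return a + b\n", "x" * 20_000])
print(len(samples))  # 1 — the empty and oversized inputs are dropped
```

In practice you'd add a domain filter here too (language, file extension, embedding distance), since every sample that reaches scoring costs GPU time.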
Step 5: Score Locally
```python
# Score against the target's assigned models
result = await client.score(
    data_url, manifests, target.embedding, data
)
# result has: winner (index), loss_score, embedding, distance
```
Step 6: Check Validity
Both conditions must be met:
- The winning model produces a low loss
- The data's embedding is within the target's distance threshold
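The two conditions can be sketched as a self-contained check (the dataclasses stand in for the SDK's `result` and `target` objects; the fixed loss cutoff is an arbitrary illustration — the network judges loss competitively, not against a constant):

```python
from dataclasses import dataclass

@dataclass
class ScoreResult:   # stand-in for the SDK's score result
    loss_score: float
    distance: float

@dataclass
class Target:        # stand-in for an on-chain target
    distance_threshold: float

def is_valid(result: ScoreResult, target: Target, max_loss: float = 2.0) -> bool:
    """Both conditions must hold: the winning model's loss is low,
    and the data's embedding lies within the target's threshold."""
    return result.loss_score <= max_loss and result.distance <= target.distance_threshold

print(is_valid(ScoreResult(loss_score=1.1, distance=0.3), Target(0.5)))  # True
print(is_valid(ScoreResult(loss_score=1.1, distance=0.7), Target(0.5)))  # False: outside the threshold
```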
```python
if result.distance <= target.distance_threshold:
    # Valid submission!
```
Step 7: Upload and Submit
```python
# Upload data to S3 (Cloudflare R2 recommended — no egress fees)
public_url = upload_to_s3(data, filename)

# Submit on-chain (posts a bond proportional to data size)
await client.submit_data(
    kp, target.id, data, public_url,
    manifests[result.winner].model_id,
    result.embedding, result.distance, result.loss_score
)
```
Step 8: Claim Rewards
Wait 2 epochs (audit window), then claim. See the Claiming Rewards section below.
For the complete submission loop code, see `references/quickstart-patterns.md`. For data sourcing, filtering, and creative strategies, see `references/data-strategies.md`.

Model Training Workflow
Train weights and earn 50% of target rewards when your model wins. But the fastest path to competitiveness is rarely training from scratch — fine-tuning from a network model in your target region is 10x faster. See `references/model-strategies.md` for the full strategic guide.

Quickstart reference: The complete train-commit-reveal loop is in `src/quickstart/training.py` (github.com/soma-org/quickstart). For standalone training-only scripts, see `train_torch.py` and `train_flax.py`. Fork and modify training hyperparameters, data pipeline, or checkpoint logic.
Step 1: Choose a Domain
Before training, decide what domain to specialize in. This is the most important strategic decision you'll make. Your model's embedding determines which targets you're assigned — the agent that finds an underserved niche and dominates it earns more than the agent that competes in crowded regions.
Analyze the current landscape:
```python
models = await client.get_active_models()
# Look for sparse regions in embedding space with fewer competitors
# See references/strategies.md Part II for the full Niche Finder framework
```

See `references/strategies.md` for territory selection and `references/model-strategies.md` for embedding strategy and domain gap analysis.

Step 2: Set Up Training
Choose PyTorch or Flax/JAX. Both produce cross-compatible weights via safetensors:
```python
from soma_models.v1.torch.modules.model import Model
from soma_models.v1.torch.modules.sig_reg import SIGReg
from soma_models.v1.torch.loss import compute_loss
from soma_models.v1.configs import ModelConfig, SIGRegConfig
```
Step 3: Stream Training Data
```python
from soma_models.v1.tokenizer import tokenize

# Tokenize raw bytes for the model
seq = tokenize(raw_bytes)  # Returns token_ids, targets, pos_ids
```

Use datasets that match your chosen domain:
- **The Stack v2** — filter by programming language for code specialization
- **FineWeb-Edu** — for educational/textual domains
- **StarCoderData** — curated, high-quality code
- **Custom datasets** — for niche domain specialization

See `references/quickstart-patterns.md` for the full data pipeline.

Step 4: Train
Standard training loop with gradient accumulation. Recommended settings: `lr=1e-4`, `dropout=0.1`, `micro_batch=2`, `grad_accum=64` (effective batch 128).
Step 5: Create Model On-Chain
First-time only:
```python
model_id = await client.create_model(
    kp,
    commission_rate=1000,  # 10% commission (basis points)
    stake_amount=None      # stake all available
)
```
Step 6: Commit Weights
```python
# Encrypt weights
encrypted, key = SomaClient.encrypt_weights(weights_bytes)

# Upload to S3 (Cloudflare R2 recommended)
weights_url = upload_to_s3(encrypted, f"weights/epoch-{epoch}.enc")

# Commit on-chain
await client.commit_model(
    kp, model_id, weights_url, encrypted, key, embedding
)
```
Step 7: Wait One Epoch
The commit-reveal protocol requires one epoch between commit and reveal. This prevents front-running. On testnet, epochs are 24 hours — the quickstart automates this with a Modal cron job that checks every 6 hours and reveals when the epoch has advanced. On localnet, use `await client.advance_epoch()` to advance instantly.

```python
# Localnet: advance instantly
await client.advance_epoch()

# Testnet: check if the epoch advanced
# (don't use wait_for_next_epoch — it defaults to a 120s timeout)
epoch_info = await client.get_epoch()
if epoch_info.epoch > commit_epoch:
    ...  # ready to reveal
```
Step 8: Reveal
```python
await client.reveal_model(kp, model_id, key, embedding)
```
Step 9: Repeat
The best models train continuously: train new weights → commit → wait → reveal → repeat. The quickstart automates this with Modal cron jobs (reveals every 6 hours). Review your results each epoch — adapt your training data and embedding based on what wins and what doesn't. See `references/strategies.md` (Part III: Play the Long Game) for the epoch review protocol.

For complete training code (PyTorch and Flax), commit-reveal automation, and Modal deployment patterns, see `references/quickstart-patterns.md`. For distillation, embedding optimization, and training philosophy, see `references/model-strategies.md`. For competitive positioning and the outer game, see `references/strategies.md`.

Claiming Rewards
Quickstart reference: `src/quickstart/settle_targets.py` — run locally with `uv run claim`.

Rewards are claimable after a 2-epoch audit window:

```python
# Find claimable targets
targets = await client.get_targets(status="claimable")

# Claim each
for target in targets:
    await client.claim_rewards(kp, target.id)
```

Or via CLI:

```bash
soma target list --status claimable
soma target claim --target-id <ID>
```

Reward split: 50% to data submitter, 50% to winning model owner.
Finder's fee: Anyone can claim unclaimed rewards for a 0.5% cut — claim yours promptly.
Auto-staking: Model commission rewards are automatically re-staked.
Key Concepts
| Concept | Description |
|---|---|
| Epoch | 24-hour cycle. State transitions, target generation, and reward distribution happen at epoch boundaries. |
| Target | Random point in embedding space. Represents a data domain the network wants to learn. Assigned to nearby models via stake-weighted KNN. |
| Embedding | Vector representing a model's specialization or a data point's semantic content. Distance between data embedding and target determines validity. |
| Distance threshold | Auto-adjusting radius around each target. Submissions must land within it. Adjusts based on hit rate. |
| Bond | Deposit proportional to data size, posted with each submission. Returned after 2-epoch audit. Slashed if fraudulent. |
| Commit-reveal | Two-phase weight publishing. Commit encrypted weights → wait one epoch → reveal key. Prevents front-running. |
| Staking | Required for models and validators. Higher stake = more target assignments. Delegation allowed with commission. |
| SIGReg | Sigmoid regularization added to cross-entropy loss. Prevents embedding collapse by encouraging uniform distribution. |
| Shannons | Smallest unit. 1 SOMA = 1,000,000,000 shannons. |
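The unit conversion in the last row, as a quick sketch (plain Python; the helper names are illustrative, not SDK functions):

```python
SHANNONS_PER_SOMA = 1_000_000_000  # 1 SOMA = 1e9 shannons

def to_shannons(soma: float) -> int:
    return int(soma * SHANNONS_PER_SOMA)

def to_soma(shannons: int) -> float:
    return shannons / SHANNONS_PER_SOMA

print(to_shannons(1.5))      # 1500000000
print(to_soma(250_000_000))  # 0.25
```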
For deep technical details, see `references/architecture.md`.

Common Patterns
Local Development with Localnet
Localnet is for testing the model training cycle (commit → reveal → epoch advance) without waiting for real 24h epochs. You do not need localnet for data submission — submit directly on testnet.
```bash
# Start a fresh local blockchain (includes scoring service)
soma start localnet --force-regenesis
```

Connect in code:

```python
client = await SomaClient(chain="localnet")

# Advance epochs instantly (localnet only)
await client.advance_epoch()
```

The quickstart includes a one-command localnet test:

```bash
uv run modal run src/quickstart/training.py::localnet
```

Modal GPU Deployment
The quickstart uses Modal for GPU orchestration:
- H100: Model training
- L4: Scoring service for data submission
- CPU: Commit/reveal operations (no GPU needed)
- Cron: Automated reveals every 6 hours
Adding a credit card to Modal unlocks an extra $30 in free credits. See `references/quickstart-patterns.md` for Modal setup and deployment patterns.

S3-Compatible Storage
Upload encrypted weights and submission data to S3-compatible storage. Cloudflare R2 is recommended — it has no egress fees, which matters because models and validators download your data frequently. AWS S3 and GCS (with HMAC keys) also work. See `references/quickstart-patterns.md` for the upload pattern.
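A hedged sketch of the `upload_to_s3` helper used throughout this guide. The real implementation lives in the quickstart repo; this version assumes boto3 and uses illustrative environment variable names:

```python
import os

def public_url_for(base_url: str, key: str) -> str:
    """Join the bucket's public URL and the object key."""
    return f"{base_url.rstrip('/')}/{key}"

def upload_to_s3(data: bytes, key: str) -> str:
    """Upload bytes to an S3-compatible bucket (e.g. Cloudflare R2) and
    return the public URL validators will download from.
    Env var names here are illustrative — match them to your .env."""
    import boto3  # imported lazily so the URL helper stays dependency-free

    s3 = boto3.client(
        "s3",
        endpoint_url=os.environ["S3_ENDPOINT_URL"],
        aws_access_key_id=os.environ["S3_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["S3_SECRET_ACCESS_KEY"],
    )
    s3.put_object(Bucket=os.environ["S3_BUCKET"], Key=key, Body=data)
    return public_url_for(os.environ["PUBLIC_BUCKET_URL"], key)

print(public_url_for("https://example.r2.dev/", "weights/epoch-1.enc"))
# https://example.r2.dev/weights/epoch-1.enc
```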
references/quickstart-patterns.md将加密权重和提交数据上传到S3兼容存储。推荐使用Cloudflare R2——无出口费用,这很重要,因为模型和验证节点会频繁下载你的数据。AWS S3和GCS(带HMAC密钥)也可使用。上传模式请参考。
references/quickstart-patterns.mdTroubleshooting
Distance exceeds threshold:
Data doesn't match the target's domain. Try different data sources; filter for content that aligns with the target's region in embedding space. Specializing in a domain (e.g., filtering The Stack v2 by language) improves hit rate. See `references/strategies.md`.

Scoring service not responding:
The scoring service must be running before you can score data. Start it with `soma start scoring --device cuda --data-dir /data` (requires a GPU with 24GB+ VRAM). The quickstart deploys this on Modal with an L4 GPU. Check health: `await client.scoring_health()`.

Epoch hasn't advanced (reveal fails):
Commit-reveal requires one epoch between steps. On testnet, wait for the next 24h epoch boundary. On localnet, force it: `await client.advance_epoch()`.

Model not found after commit:
Model weights aren't active until reveal completes in the following epoch. Ensure you've called `reveal_model()` after the epoch advanced past your commit epoch.

Insufficient balance for bond:
Bonds scale with data size. Check balance: `await client.get_balance(kp.address())`. Fund via `soma faucet` (testnet). Smaller submissions require smaller bonds.

"Invalid commission rate":
Commission rate must be 0-10000 (basis points). Example: 1000 = 10%.

.env not loading / missing credentials:
Double-check each credential. Common issues: HF_TOKEN needs dataset terms accepted on HuggingFace, S3_ENDPOINT_URL must include the full `https://` prefix, SOMA_SECRET_KEY must be the hex output from `soma wallet export` (not the mnemonic). See the Getting Started section for step-by-step setup.

Examples
Example 1: Start contributing to SOMA
User says: "Install SOMA and help me start contributing" / "Set up SOMA and start submitting data"
Actions:
- Install the soma CLI (`curl -fsSL https://sup.soma.org | bash && sup install soma`)
- Create a wallet (`soma wallet new && soma faucet && soma wallet export`)
- Fork the quickstart repo, run `uv sync`
- Walk through each `.env` credential — explain what it does and where to get it
- Push secrets to Modal (`uv run create-secrets`), deploy the submitter (`uv run modal run src/quickstart/submitter.py`)
- Verify the submitter is scoring data and submitting hits
Result: Submitter running on Modal, earning SOMA from data submissions. Rewards claimable after the 2-epoch audit window.

Example 2: Train a SOMA model
User says: "I want to train a SOMA model and publish it"
Actions:
- Choose a domain specialization (e.g., code, text, scientific)
- Set up a training environment with soma-models and a data pipeline
- Train the byte-level transformer on domain-specific streaming data
- Create the model on-chain with stake and commission rate
- Encrypt weights, upload to R2, commit on-chain
- Wait one epoch, reveal the decryption key
Result: Model active on the network, earning 50% of target rewards when it wins.

Example 3: Claim SOMA rewards
User says: "How do I claim my SOMA rewards?"
Actions:
- List claimable targets: `await client.get_targets(status="claimable")`
- Call `await client.claim_rewards(kp, target.id)` for each
Result: Rewards deposited to wallet. Claim promptly to avoid the 0.5% finder's fee.

Example 4: User pushes back on external dependencies
User says: "I don't want to set up Cloudflare/Modal/HuggingFace" / "Is there a local-only option?"
Response: Explain why each service is required:
- S3/R2: Validators independently download and re-score your submission data to verify it. The data URL must be publicly accessible — localhost won't work on testnet. R2 is free (10 GB/month, zero egress).
- Modal: Scoring runs every model assigned to a target (1.2B parameters each) on GPU. Modal provides an L4 GPU. If the user has a local GPU with 24GB+ VRAM, they can run `soma start scoring --device cuda` instead of Modal.
- HuggingFace: The default data source is The Stack v2. The user can substitute any data source — HF_TOKEN is only needed for gated HuggingFace datasets.
Do NOT suggest localnet as an alternative to these dependencies. Localnet is a development tool for testing the model training cycle — it doesn't earn real rewards and doesn't replace the need for S3/Modal on testnet.
Reference Index
| File | Contains | Consult when |
|---|---|---|
| `references/strategies.md` | Competitive playbook — network analysis, territory selection, battle scenarios, economics | Deciding where and how to compete |
| `references/data-strategies.md` | Deep data guide — filtering, LLM distillation, creative sourcing, novel domains | Choosing and curating data for submission |
| `references/model-strategies.md` | Deep model guide — distillation, embedding strategy, training philosophy, architecture exploitation | Training and improving your model |
| `references/quickstart-patterns.md` | Working code patterns, quickstart repo file map, submission, training, network analysis, deployment | Building pipelines, forking the quickstart, and analyzing the network |
| `references/architecture.md` | Network design, model specs, economics, consensus | Understanding how SOMA works |
| `references/sdk-reference.md` | Full Python SDK API — all methods, types, examples | Writing code with soma-sdk |
| `references/cli-reference.md` | All CLI commands organized by workflow | Using the soma command line |