agent-swarm-deployer
Agent Swarm Deployer
A high-throughput parallel data processing framework that deploys swarms of sub-agents to handle massive data tasks. While Agent Army is built for code changes (edit files, respect dependency graphs, verify builds), Agent Swarm is built for data operations where individual units of work are independent and results need aggregation.
Agent Army vs Agent Swarm
| Dimension | Agent Army | Agent Swarm |
|---|---|---|
| Purpose | Code changes across files | Data processing at scale |
| Units of work | Files in a codebase | Documents, records, rows, items |
| Dependencies | Import graph matters | Items are independent |
| Output | Modified source files | Aggregated results (reports, datasets, content) |
| Verification | Build check, pattern scan | Result validation, completeness check |
| Error handling | Fix and re-verify code | Retry failed items, collect partial results |
| Typical scale | 10-200 files | 100-10,000+ items |
| Key risk | Breaking the build | Data loss, incomplete processing |
Use Cases
Document Processing
- Analyze 500 customer support tickets for sentiment and categorization
- Extract key information from 200 legal contracts
- Summarize 1000 research paper abstracts
- Parse 300 resumes for qualification matching
Dataset Analysis
- Score and rank 2000 leads by ICP fit
- Classify 5000 product reviews by topic and sentiment
- Audit 1000 blog posts for SEO compliance
- Grade 500 sales call transcripts against a methodology
Bulk Content Generation
- Generate personalized email first lines for 500 prospects
- Create product descriptions for 1000 SKUs
- Write social media posts for 200 blog articles
- Generate meta descriptions for 800 web pages
Data Transformation
- Convert 1000 CSV rows into structured JSON objects
- Normalize 500 address records
- Translate 300 support articles into 5 languages
- Reformat 2000 database records from schema A to schema B
Architecture
```
You (Commander)
|
|-- Phase 1: Intake & Inventory
|   |-- Count total items
|   |-- Sample items for schema detection
|   |-- Estimate token budget per item
|
|-- Phase 2: Swarm Design
|   |-- Calculate optimal batch size
|   |-- Determine number of swarm agents
|   |-- Define result schema
|
|-- Phase 3: Deploy Swarm (parallel)
|   |-- Agent S1: items 1-50    ---\
|   |-- Agent S2: items 51-100  ---|
|   |-- Agent S3: items 101-150 ---|-- All run in parallel
|   |-- Agent S4: items 151-200 ---|
|   |-- ...                     ---/
|
|-- Phase 4: Collect & Aggregate
|   |-- Gather all agent results
|   |-- Merge into unified output
|   |-- Identify failures
|
|-- Phase 5: Recovery (if needed)
|   |-- Retry failed items
|   |-- Fill gaps
|
|-- Phase 6: Deliver
|   |-- Write final output
|   |-- Generate summary report
```
Execution Protocol
Step 0: Understand the Task
Before deploying any agents, fully understand what the user needs:
- Data source -- Where is the data? Files on disk? A CSV? A directory of documents? Inline in the conversation?
- Operation -- What should be done to each item? Summarize? Classify? Extract? Transform? Generate?
- Output format -- What should the result look like? JSON? CSV? Markdown? Individual files?
- Output destination -- Where should results go? Single file? Directory of files? Returned in conversation?
- Quality requirements -- Are there validation rules? Schemas? Scoring criteria?
If any of these are ambiguous, ask the user before proceeding. Getting the spec wrong wastes all agent compute.
Step 1: Intake and Inventory
1a. Discover and Count Items
Locate all data items and count them:
Glob/Bash: Find all files matching the pattern
Read: Sample first 3-5 items to understand structure
Bash: Count total items (wc -l for CSVs, file count for directories, etc.)

Report:

Intake Report
- Data source: /path/to/data/ (or inline, or CSV)
- Total items found: 1,247
- Item format: JSON files, avg 2KB each
- Sample item structure: { name, email, company, title, linkedin_url }
- Estimated tokens per item: ~500
- Total estimated tokens: ~623,500
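The intake pass lends itself to a short script. A minimal sketch for the directory-of-JSON-files case; the helper name and the rough ~4-characters-per-token heuristic are illustrative assumptions, not part of the framework:

```python
import json
from pathlib import Path

def intake_report(data_dir: str) -> dict:
    """Count items, sample one for structure, and roughly estimate token budget."""
    files = sorted(Path(data_dir).glob("*.json"))
    sample = json.loads(files[0].read_text()) if files else {}
    sampled = files[:5]
    avg_chars = sum(len(f.read_text()) for f in sampled) // max(len(sampled), 1)
    per_item = avg_chars // 4  # crude ~4 chars/token heuristic; tune per model
    return {
        "total_items": len(files),
        "sample_fields": sorted(sample),
        "est_tokens_per_item": per_item,
        "est_total_tokens": per_item * len(files),
    }
```

Running this against the data directory yields the numbers for the Intake Report above.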
1b. Detect Schema
From the sample items, define the input schema:
```json
{
  "inputSchema": {
    "type": "object",
    "fields": {
      "name": "string",
      "email": "string",
      "company": "string",
      "title": "string",
      "linkedin_url": "string"
    }
  }
}
```
1c. Define Output Schema
Based on the task, define what each processed item should look like:
```json
{
  "outputSchema": {
    "type": "object",
    "fields": {
      "name": "string (from input)",
      "company": "string (from input)",
      "personalized_first_line": "string (generated, 1-2 sentences)",
      "pain_point_guess": "string (inferred from title + company)",
      "confidence": "number (0-1)",
      "processing_status": "enum: success | failed | skipped"
    }
  }
}
```
Step 2: Swarm Design
2a. Calculate Batch Size
The batch size determines how many items each agent processes. Factors:
| Factor | Guideline |
|---|---|
| Token budget per item | Input tokens + expected output tokens + instruction overhead |
| Agent context limit | ~200K usable tokens per agent (conservative, leaves room for instructions) |
| Instruction overhead | ~2K tokens for the agent's brief |
| Safety margin | Use 70% of theoretical capacity |
Formula:

```
usable_tokens   = 200,000 * 0.70 = 140,000
tokens_per_item = input_tokens + output_tokens + 100 (overhead)
batch_size      = floor(usable_tokens / tokens_per_item)
```

Practical limits:
- Minimum batch size: 5 items (agent overhead is not worth it for fewer)
- Maximum batch size: 200 items (beyond this, agent context gets cluttered)
- Sweet spot: 20-80 items per agent for most text processing tasks
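The formula and the clamps combine into one helper. A minimal sketch, with defaults mirroring the table above (200K context, 70% safety margin, ~100 tokens of per-item overhead):

```python
def plan_batch_size(input_tokens: int, output_tokens: int,
                    context_limit: int = 200_000,
                    safety: float = 0.70,
                    per_item_overhead: int = 100) -> int:
    """Items per agent, clamped to the practical 5-200 range."""
    usable = int(context_limit * safety)            # 140,000 with defaults
    per_item = input_tokens + output_tokens + per_item_overhead
    return max(5, min(200, usable // per_item))

print(plan_batch_size(500, 200))  # ~500 in + ~200 out per item -> 175
```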
2b. Determine Swarm Size
```
total_items = 1,247
batch_size  = 50
swarm_size  = ceil(1,247 / 50) = 25 agents
```

Practical limits:
- Maximum parallel agents: 20 per wave (prevents system overload)
- If swarm_size > 20: Deploy in waves. Wave 1: agents 1-20. Wave 2: agents 21-25.
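The batch ranges and waves can be derived mechanically. A minimal sketch (the helper name is illustrative), using ceiling division and the 20-agents-per-wave cap:

```python
from math import ceil

MAX_PARALLEL = 20  # maximum agents per wave

def plan_swarm(total_items: int, batch_size: int):
    """Return (batch ranges, waves); ranges are 1-based inclusive item spans."""
    swarm_size = ceil(total_items / batch_size)
    ranges = [(i * batch_size + 1, min((i + 1) * batch_size, total_items))
              for i in range(swarm_size)]
    waves = [ranges[i:i + MAX_PARALLEL] for i in range(0, swarm_size, MAX_PARALLEL)]
    return ranges, waves

ranges, waves = plan_swarm(1247, 50)
# 25 agents in 2 waves; the last batch is partial (items 1201-1247)
```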
2c. Present Swarm Plan
Swarm Plan
- Total items: 1,247
- Batch size: 50 items per agent
- Swarm size: 25 agents
- Waves: 2 (Wave 1: 20 agents, Wave 2: 5 agents)
- Estimated processing: All items covered
- Output format: Single CSV file with all results
Agent Assignments
| Agent | Items | Range | Notes |
|---|---|---|---|
| swarm-01 | 50 | items 1-50 | -- |
| swarm-02 | 50 | items 51-100 | -- |
| ... | ... | ... | -- |
| swarm-25 | 47 | items 1201-1247 | Partial batch |
Proceed? (Y to deploy / N to adjust batch size)
Step 3: Prepare Agent Briefs
Each swarm agent receives a self-contained brief. The brief must include:
- Role: "You are a data processing agent in a swarm of 25. Your job is to process items 51-100."
- Task description: Exactly what to do with each item (the operation).
- Input data: The actual data items for this batch (embedded in the brief or read from a file range).
- Output schema: The exact format for results. Include a concrete example.
- Quality rules: Validation criteria, edge case handling, what to do with malformed items.
- Error protocol: "If an item cannot be processed, mark it as failed with a reason. Do not skip items silently."
- Output format: "Return your results as a JSON array. Each element must match the output schema."
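Briefs with this structure are mechanical to assemble. A minimal sketch using Python's string.Template; the field names mirror the checklist above and are otherwise illustrative:

```python
from string import Template

# Skeleton brief; extend with schema, quality rules, and examples as needed
BRIEF = Template("""\
You are swarm-agent-$n, processing batch $n of $total_batches in a data processing swarm.

Your Task
$task

Input Data
You will process the following $batch_size items:
$items

Output Format
Return your results as a JSON array. Each element must match the output schema.
If an item cannot be processed, mark it as failed with a reason. Do not skip items silently.""")

brief = BRIEF.substitute(n=2, total_batches=25, task="Classify sentiment.",
                         batch_size=50, items="[items 51-100 here]")
```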
Agent Brief Template
You are swarm-agent-{N}, processing batch {N} of {TOTAL_BATCHES} in a data processing swarm.

Your Task

{TASK_DESCRIPTION}
Input Data
You will process the following {BATCH_SIZE} items:
{ITEMS_AS_STRUCTURED_DATA}
Output Schema
For each item, produce a result matching this schema:
{OUTPUT_SCHEMA_WITH_EXAMPLE}
Example
Input:
{EXAMPLE_INPUT}
Expected output:
{EXAMPLE_OUTPUT}
Quality Rules
- {RULE_1}
- {RULE_2}
- {RULE_3}
- Every item MUST appear in your output, even if processing failed.
- For failed items, set processing_status to "failed" and include an error_reason field.
Error Handling
- Malformed input: Mark as "failed", reason: "malformed input: {description}"
- Ambiguous data: Make your best judgment, set confidence to < 0.5
- Missing required field: Mark as "skipped", reason: "missing {field}"
Output Format
Return a JSON object with this structure:
```json
{
  "agentId": "swarm-agent-{N}",
  "batchRange": "{START}-{END}",
  "totalProcessed": {NUMBER},
  "totalSuccess": {NUMBER},
  "totalFailed": {NUMBER},
  "totalSkipped": {NUMBER},
  "results": [
    {OUTPUT_SCHEMA_ITEM_1},
    {OUTPUT_SCHEMA_ITEM_2},
    ...
  ],
  "errors": [
    {"itemIndex": N, "reason": "description"}
  ],
  "notes": "Any observations about the data quality or patterns"
}
```
Step 4: Deploy Swarm
4a. Wave Deployment
Deploy agents in waves to manage system load:
Wave 1: Launch up to 20 agents in parallel using the Agent tool with run_in_background: true. Send ALL agent calls in a single message.

Launch swarm-agent-01 through swarm-agent-20 in parallel.
Each receives its batch of items and the standardized brief.

Wave 2+ (if needed): After Wave 1 completes, launch remaining agents.
4b. Data Distribution
How to get data to each agent depends on the data source:
| Source Type | Distribution Method |
|---|---|
| Directory of files | Tell each agent which file paths to Read |
| Single large CSV | Pre-split into batch files using Bash, tell each agent its file |
| Single JSON array | Pre-split into batch files using Bash |
| Inline data | Embed directly in the agent brief (for small datasets) |
| Database export | Export to CSV first, then split |
For CSVs and large files, pre-split before deploying:
```bash
# Split a CSV into batches of 50 rows (preserving header)
head -1 data.csv > header.csv
tail -n +2 data.csv | split -l 50 - batch_
mkdir -p batches
for f in batch_*; do cat header.csv "$f" > "batches/${f}.csv" && rm "$f"; done
rm header.csv
```
4c. Progress Tracking
As agents complete, track progress:
Swarm Progress
Wave 1 (20 agents):
[################----] 16/20 complete
| Agent | Status | Processed | Success | Failed | Duration |
|---|---|---|---|---|---|
| swarm-01 | DONE | 50 | 48 | 2 | 45s |
| swarm-02 | DONE | 50 | 50 | 0 | 38s |
| swarm-03 | RUNNING | -- | -- | -- | -- |
| ... | ... | ... | ... | ... | ... |
Cumulative: 800/1247 items processed (64%)
Step 5: Collect and Aggregate Results
5a. Gather Agent Outputs
As each agent completes, collect its JSON output. Parse and validate:
- Schema validation -- Does each result match the output schema?
- Completeness check -- Does the agent's result count match its batch size?
- Duplicate detection -- Check for duplicate item IDs across agents.
- Error extraction -- Pull out all failed/skipped items for the retry queue.
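A sketch of these four checks, assuming each agent returns the Step 3 result envelope and each result row carries a unique item_id (the id field name is an assumption):

```python
def validate_outputs(agent_outputs, expected_total):
    """Schema-count, completeness, duplicate, and error-extraction checks."""
    seen, duplicates, retry_queue = set(), [], []
    for out in agent_outputs:
        # Completeness: the agent's result count must match what it claims
        assert len(out["results"]) == out["totalProcessed"], \
            f'{out["agentId"]}: result count != totalProcessed'
        for r in out["results"]:
            if r["item_id"] in seen:          # duplicate detection across agents
                duplicates.append(r["item_id"])
            seen.add(r["item_id"])
            if r["processing_status"] in ("failed", "skipped"):
                retry_queue.append(r)         # error extraction for retries
    return {"missing": expected_total - len(seen),
            "duplicates": duplicates,
            "retry_queue": retry_queue}
```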
5b. Merge Results
Combine all agent results into a single unified output:
```python
# Conceptual merge logic
merged_results = []
failed_items = []
skipped_items = []

for agent_output in all_agent_outputs:
    for result in agent_output["results"]:
        if result["processing_status"] == "success":
            merged_results.append(result)
        elif result["processing_status"] == "failed":
            failed_items.append(result)
        elif result["processing_status"] == "skipped":
            skipped_items.append(result)

# Sort by original item order
merged_results.sort(key=lambda x: x["original_index"])
```
5c. Validate Completeness
Aggregation Report
- Total items in input: 1,247
- Total results received: 1,247
- Successful: 1,198 (96.1%)
- Failed: 34 (2.7%)
- Skipped: 15 (1.2%)
Coverage Check
- Items with results: 1,247 / 1,247 (100% coverage)
- Missing items: 0
- Duplicate results: 0
Failure Analysis
| Failure Reason | Count |
|---|---|
| Malformed input | 12 |
| Missing required field | 8 |
| Ambiguous data | 14 |
Proceed to retry failed items? (Y / skip / manual review)
Step 6: Error Recovery
6a. Retry Queue
Collect all failed and skipped items into a retry batch:
Retry batch: 49 items (34 failed + 15 skipped)
Retry strategy: Single agent with enhanced instructions

6b. Enhanced Retry Brief
The retry agent gets special instructions:
You are the retry agent. These items failed or were skipped in the first pass.
For each item, I am providing the original item AND the failure reason from the first attempt.
Your job:
1. Try harder -- use more creative interpretation for ambiguous items
2. For truly malformed items, extract whatever you can and note what is missing
3. For items that failed due to missing fields, infer the field if possible or mark as "unrecoverable"
The bar is lower for retries: partial results are better than no results.

6c. Retry Limits
- Maximum retries: 2 (original attempt + 2 retries = 3 total attempts)
- After max retries: Mark item as "unrecoverable" and include in the final report
- Unrecoverable threshold: If > 10% of items are unrecoverable, flag to the user for manual review
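These limits imply a simple control loop. A minimal sketch, where process_batch stands in for a swarm or retry agent invocation (a hypothetical callable returning successes and failures):

```python
MAX_ATTEMPTS = 3           # original attempt + 2 retries
UNRECOVERABLE_FLAG = 0.10  # flag for manual review above 10%

def run_with_retries(items, process_batch):
    """process_batch(items) -> (successes, failures); retry failures up to the cap."""
    successes, pending = [], list(items)
    for attempt in range(MAX_ATTEMPTS):
        if not pending:
            break
        ok, pending = process_batch(pending)
        successes.extend(ok)
    unrecoverable = pending  # whatever is left after the final attempt
    if items and len(unrecoverable) / len(items) > UNRECOVERABLE_FLAG:
        print(f"WARNING: {len(unrecoverable)} unrecoverable items -- manual review advised")
    return successes, unrecoverable
```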
Step 7: Write Final Output
Based on the user's requested output format:
CSV Output
```bash
# Write header + all successful results as CSV
# Include a separate failures.csv for failed items
```
JSON Output
```json
{
  "metadata": {
    "task": "description of what was processed",
    "totalItems": 1247,
    "successfulItems": 1210,
    "failedItems": 22,
    "unrecoverableItems": 15,
    "processingDate": "2026-04-10T12:00:00Z",
    "swarmSize": 25,
    "waves": 2
  },
  "results": [...],
  "failures": [...],
  "summary": {
    "key_patterns": "...",
    "notable_findings": "...",
    "data_quality_notes": "..."
  }
}
```
Markdown Report
```markdown
# Processing Results: [Task Name]

## Summary
- Processed: 1,247 items
- Success rate: 97%
- Key findings: [aggregated insights]

## Results Table
| Item | Result Field 1 | Result Field 2 | Status |
|---|---|---|---|
| ... | ... | ... | ... |

## Failures
| Item | Reason | Attempted Retries |
|---|---|---|
| ... | ... | ... |
```
Individual Files
For tasks where each item produces a standalone document (e.g., generating 500 blog post outlines):
```
output/
  001-item-name.md
  002-item-name.md
  ...
  500-item-name.md
  _summary.md
  _failures.md
```
Step 8: Summary Report
Always end with a comprehensive summary:
Swarm Processing Complete
Execution Summary
- Task: [description]
- Data source: [source]
- Total items: 1,247
- Swarm size: 25 agents across 2 waves
- Total processing time: ~3 minutes
Results
- Successful: 1,210 (97.0%)
- Failed (recovered on retry): 22 (1.8%)
- Unrecoverable: 15 (1.2%)
- Output written to: [path]
Quality Metrics
- Schema compliance: 100% of successful results match output schema
- Average confidence: 0.82
- Items flagged for review: 37 (low confidence < 0.5)
Patterns Observed
- [Any interesting patterns the swarm noticed across the data]
- [Data quality issues found]
- [Recommendations for future processing]
Cost
- Agents deployed: 25 (Waves 1-2) + 1 (retry) = 26
- Estimated tokens consumed: ~1.2M input + ~400K output
Swarm Configurations for Common Tasks
Sentiment Analysis (1000 reviews)
```yaml
task: Classify each review as positive/negative/neutral with confidence score
batch_size: 100   # Reviews are short, pack more per agent
swarm_size: 10
output_schema:
  review_id: string
  sentiment: enum(positive, negative, neutral, mixed)
  confidence: float(0-1)
  key_phrases: string[]
  summary: string(1 sentence)
```
Lead Scoring (2000 contacts)
```yaml
task: Score each lead 1-100 based on ICP fit criteria
batch_size: 40    # Each lead needs more analysis context
swarm_size: 50
waves: 3
output_schema:
  lead_id: string
  score: int(1-100)
  icp_match: object
    company_size: bool
    industry: bool
    tech_stack: bool
    title_seniority: bool
  buying_signals: string[]
  recommended_action: enum(hot, warm, nurture, disqualify)
```
Content Generation (500 product descriptions)
```yaml
task: Write a 100-word product description from product data
batch_size: 25    # Output is longer, needs more generation tokens
swarm_size: 20
output_schema:
  product_id: string
  title: string
  description: string(100 words)
  key_features: string[3]
  seo_keywords: string[5]
```
Document Summarization (300 papers)
```yaml
task: Summarize each paper in 3 bullet points with key findings
batch_size: 15    # Papers are long, fewer per agent
swarm_size: 20
output_schema:
  paper_id: string
  title: string
  summary_bullets: string[3]
  key_finding: string
  methodology: string
  relevance_score: float(0-1)
```
Error Handling
| Error | Cause | Response |
|---|---|---|
| Agent returns no output | Agent timeout or crash | Re-deploy that batch with a fresh agent |
| Agent returns partial results | Context overflow or mid-processing failure | Identify processed items, re-deploy unprocessed items |
| Agent returns malformed JSON | Output parsing failure | Attempt to extract results from raw text, re-deploy if impossible |
| Duplicate results across agents | Batch overlap miscalculation | Deduplicate by item ID, keep first occurrence |
| All agents fail | Systemic issue (bad brief, impossible task) | Abort, report to user, suggest task reformulation |
| Retry agent also fails | Item is truly unprocessable | Mark as unrecoverable, include raw input in failure report |
| Data source unavailable | File missing, permission denied | Abort before deploying swarm, report to user |
| Output file write fails | Disk space, permissions | Attempt alternative location, or return results in conversation |
Scaling Guidelines
| Total Items | Recommended Batch Size | Swarm Size | Waves | Notes |
|---|---|---|---|---|
| 10-50 | 10-25 | 2-5 | 1 | Small job, minimal overhead |
| 50-200 | 25-50 | 4-10 | 1 | Standard processing |
| 200-500 | 40-60 | 5-15 | 1 | Moderate scale |
| 500-1000 | 50-80 | 10-20 | 1-2 | Large scale, may need waves |
| 1000-5000 | 50-100 | 20+ | 2-5 | Multi-wave deployment |
| 5000+ | 100-200 | 50+ | 5+ | Enterprise scale, consider chunking |
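The table can be encoded as a lookup. A sketch with thresholds copied from the rows above; the specific batch-size pick within each recommended range is my own simplification:

```python
from math import ceil

# (max_items, batch_size, notes) rows distilled from the scaling table
SCALING = [
    (50,           15,  "Small job, minimal overhead"),
    (200,          40,  "Standard processing"),
    (500,          50,  "Moderate scale"),
    (1000,         50,  "Large scale, may need waves"),
    (5000,         50,  "Multi-wave deployment"),
    (float("inf"), 150, "Enterprise scale, consider chunking"),
]

def recommend(total_items: int) -> dict:
    for max_items, batch, notes in SCALING:
        if total_items <= max_items:
            swarm = ceil(total_items / batch)
            return {"batch_size": batch, "swarm_size": swarm,
                    "waves": ceil(swarm / 20), "notes": notes}

# The running example: 1,247 items -> batch 50, 25 agents, 2 waves
```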
Anti-Patterns to Avoid
- Do not use swarm for sequential tasks -- If item N depends on the result of item N-1, a swarm is the wrong tool. Use a chain instead.
- Do not deploy 100 agents for 100 items -- One item per agent wastes overhead. Batch them.
- Do not skip the schema definition -- Without a schema, merging results from 25 agents becomes a nightmare.
- Do not ignore failures -- Even at 99% success rate, 1% of 10,000 items is 100 failures. Always run retries.
- Do not deploy without a sample run -- Process 5 items manually first to validate the task definition and output quality before scaling.