agent-swarm-deployer


Agent Swarm Deployer


A high-throughput parallel data processing framework that deploys swarms of sub-agents to handle massive data tasks. While Agent Army is built for code changes (edit files, respect dependency graphs, verify builds), Agent Swarm is built for data operations where individual units of work are independent and results need aggregation.

Agent Army vs Agent Swarm


| Dimension | Agent Army | Agent Swarm |
| --- | --- | --- |
| Purpose | Code changes across files | Data processing at scale |
| Units of work | Files in a codebase | Documents, records, rows, items |
| Dependencies | Import graph matters | Items are independent |
| Output | Modified source files | Aggregated results (reports, datasets, content) |
| Verification | Build check, pattern scan | Result validation, completeness check |
| Error handling | Fix and re-verify code | Retry failed items, collect partial results |
| Typical scale | 10-200 files | 100-10,000+ items |
| Key risk | Breaking the build | Data loss, incomplete processing |

Use Cases


Document Processing


  • Analyze 500 customer support tickets for sentiment and categorization
  • Extract key information from 200 legal contracts
  • Summarize 1000 research paper abstracts
  • Parse 300 resumes for qualification matching

Dataset Analysis


  • Score and rank 2000 leads by ICP fit
  • Classify 5000 product reviews by topic and sentiment
  • Audit 1000 blog posts for SEO compliance
  • Grade 500 sales call transcripts against a methodology

Bulk Content Generation


  • Generate personalized email first lines for 500 prospects
  • Create product descriptions for 1000 SKUs
  • Write social media posts for 200 blog articles
  • Generate meta descriptions for 800 web pages

Data Transformation


  • Convert 1000 CSV rows into structured JSON objects
  • Normalize 500 address records
  • Translate 300 support articles into 5 languages
  • Reformat 2000 database records from schema A to schema B

Architecture


```
You (Commander)
 |
 |-- Phase 1: Intake & Inventory
 |    |-- Count total items
 |    |-- Sample items for schema detection
 |    |-- Estimate token budget per item
 |
 |-- Phase 2: Swarm Design
 |    |-- Calculate optimal batch size
 |    |-- Determine number of swarm agents
 |    |-- Define result schema
 |
 |-- Phase 3: Deploy Swarm (parallel)
 |    |-- Agent S1: items 1-50      ---\
 |    |-- Agent S2: items 51-100     ---|
 |    |-- Agent S3: items 101-150    ---|-- All run in parallel
 |    |-- Agent S4: items 151-200    ---|
 |    |-- ...                       ---/
 |
 |-- Phase 4: Collect & Aggregate
 |    |-- Gather all agent results
 |    |-- Merge into unified output
 |    |-- Identify failures
 |
 |-- Phase 5: Recovery (if needed)
 |    |-- Retry failed items
 |    |-- Fill gaps
 |
 |-- Phase 6: Deliver
 |    |-- Write final output
 |    |-- Generate summary report
```

Execution Protocol


Step 0: Understand the Task


Before deploying any agents, fully understand what the user needs:
  1. Data source -- Where is the data? Files on disk? A CSV? A directory of documents? Inline in the conversation?
  2. Operation -- What should be done to each item? Summarize? Classify? Extract? Transform? Generate?
  3. Output format -- What should the result look like? JSON? CSV? Markdown? Individual files?
  4. Output destination -- Where should results go? Single file? Directory of files? Returned in conversation?
  5. Quality requirements -- Are there validation rules? Schemas? Scoring criteria?
If any of these are ambiguous, ask the user before proceeding. Getting the spec wrong wastes all agent compute.

Step 1: Intake and Inventory


1a. Discover and Count Items


Locate all data items and count them:
  • Glob/Bash: Find all files matching the pattern
  • Read: Sample first 3-5 items to understand structure
  • Bash: Count total items (wc -l for CSVs, file count for directories, etc.)
Report:

Intake Report


  • Data source: /path/to/data/ (or inline, or CSV)
  • Total items found: 1,247
  • Item format: JSON files, avg 2KB each
  • Sample item structure: { name, email, company, title, linkedin_url }
  • Estimated tokens per item: ~500
  • Total estimated tokens: ~623,500
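The intake step can be sketched as a small helper. This is a minimal sketch assuming a directory of one-JSON-object-per-file items; the function name and the `tokens_per_char` heuristic (~4 characters per token for English text) are illustrative, not part of the framework:

```python
import json
from pathlib import Path

def intake(data_dir: str, sample_size: int = 3, tokens_per_char: float = 0.25) -> dict:
    """Inventory a directory of JSON items: count, sample, estimate token budget."""
    paths = sorted(Path(data_dir).glob("*.json"))
    # Sample the first few items to understand their structure
    samples = [json.loads(p.read_text()) for p in paths[:sample_size]]
    # Rough token estimate from file size
    sizes = [p.stat().st_size for p in paths]
    avg_tokens = int(sum(sizes) / len(sizes) * tokens_per_char) if sizes else 0
    return {
        "total_items": len(paths),
        "sample_fields": sorted(samples[0].keys()) if samples else [],
        "est_tokens_per_item": avg_tokens,
        "est_total_tokens": avg_tokens * len(paths),
    }
```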

1b. Detect Schema


From the sample items, define the input schema:
```json
{
  "inputSchema": {
    "type": "object",
    "fields": {
      "name": "string",
      "email": "string",
      "company": "string",
      "title": "string",
      "linkedin_url": "string"
    }
  }
}
```

1c. Define Output Schema


Based on the task, define what each processed item should look like:
```json
{
  "outputSchema": {
    "type": "object",
    "fields": {
      "name": "string (from input)",
      "company": "string (from input)",
      "personalized_first_line": "string (generated, 1-2 sentences)",
      "pain_point_guess": "string (inferred from title + company)",
      "confidence": "number (0-1)",
      "processing_status": "enum: success | failed | skipped"
    }
  }
}
```

Step 2: Swarm Design


2a. Calculate Batch Size


The batch size determines how many items each agent processes. Factors:
| Factor | Guideline |
| --- | --- |
| Token budget per item | Input tokens + expected output tokens + instruction overhead |
| Agent context limit | ~200K usable tokens per agent (conservative, leaves room for instructions) |
| Instruction overhead | ~2K tokens for the agent's brief |
| Safety margin | Use 70% of theoretical capacity |

Formula:

```
usable_tokens = 200,000 * 0.70 = 140,000
tokens_per_item = input_tokens + output_tokens + 100 (overhead)
batch_size = floor(usable_tokens / tokens_per_item)
```
Practical limits:
  • Minimum batch size: 5 items (agent overhead is not worth it for fewer)
  • Maximum batch size: 200 items (beyond this, agent context gets cluttered)
  • Sweet spot: 20-80 items per agent for most text processing tasks
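The batch-size formula and the practical clamps can be expressed directly. A minimal sketch; the function and parameter names are illustrative:

```python
import math

def batch_size(input_tokens: int, output_tokens: int,
               context_limit: int = 200_000, safety: float = 0.70,
               overhead_per_item: int = 100,
               min_batch: int = 5, max_batch: int = 200) -> int:
    """Batch size from the formula above, clamped to the practical limits."""
    usable = context_limit * safety                       # 140,000 by default
    per_item = input_tokens + output_tokens + overhead_per_item
    return max(min_batch, min(max_batch, math.floor(usable / per_item)))
```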

2b. Determine Swarm Size


```
total_items = 1,247
batch_size = 50
swarm_size = ceil(1,247 / 50) = 25 agents
```
Practical limits:
  • Maximum parallel agents: 20 per wave (prevents system overload)
  • If swarm_size > 20: Deploy in waves. Wave 1: agents 1-20. Wave 2: agents 21-25.
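The range assignment and wave grouping can be sketched as one helper (hypothetical function; the 20-agent wave cap follows the limit above):

```python
import math

def plan_waves(total_items: int, batch_size: int, max_parallel: int = 20):
    """Assign 1-based item ranges to agents, then group agents into waves."""
    swarm_size = math.ceil(total_items / batch_size)
    agents = []
    for n in range(swarm_size):
        start = n * batch_size + 1
        end = min((n + 1) * batch_size, total_items)   # last batch may be partial
        agents.append({"agent": f"swarm-{n + 1:02d}", "range": (start, end)})
    # Slice the agent list into waves of at most max_parallel
    return [agents[i:i + max_parallel] for i in range(0, swarm_size, max_parallel)]
```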

2c. Present Swarm Plan



Swarm Plan


  • Total items: 1,247
  • Batch size: 50 items per agent
  • Swarm size: 25 agents
  • Waves: 2 (Wave 1: 20 agents, Wave 2: 5 agents)
  • Estimated processing: All items covered
  • Output format: Single CSV file with all results

Agent Assignments


| Agent | Items | Range | Notes |
| --- | --- | --- | --- |
| swarm-01 | 50 | items 1-50 | -- |
| swarm-02 | 50 | items 51-100 | -- |
| ... | ... | ... | -- |
| swarm-25 | 47 | items 1201-1247 | Partial batch |

Proceed? (Y to deploy / N to adjust batch size)

Step 3: Prepare Agent Briefs


Each swarm agent receives a self-contained brief. The brief must include:
  1. Role: "You are a data processing agent in a swarm of 25. Your job is to process items 51-100."
  2. Task description: Exactly what to do with each item (the operation).
  3. Input data: The actual data items for this batch (embedded in the brief or read from a file range).
  4. Output schema: The exact format for results. Include a concrete example.
  5. Quality rules: Validation criteria, edge case handling, what to do with malformed items.
  6. Error protocol: "If an item cannot be processed, mark it as failed with a reason. Do not skip items silently."
  7. Output format: "Return your results as a JSON array. Each element must match the output schema."

Agent Brief Template


You are swarm-agent-{N}, processing batch {N} of {TOTAL_BATCHES} in a data processing swarm.

Your Task


{TASK_DESCRIPTION}

Input Data


You will process the following {BATCH_SIZE} items:
{ITEMS_AS_STRUCTURED_DATA}

Output Schema


For each item, produce a result matching this schema: {OUTPUT_SCHEMA_WITH_EXAMPLE}

Example


Input: {EXAMPLE_INPUT}
Expected output: {EXAMPLE_OUTPUT}

Quality Rules


  • {RULE_1}
  • {RULE_2}
  • {RULE_3}
  • Every item MUST appear in your output, even if processing failed.
  • For failed items, set processing_status to "failed" and include an error_reason field.

Error Handling


  • Malformed input: Mark as "failed", reason: "malformed input: {description}"
  • Ambiguous data: Make your best judgment, set confidence to < 0.5
  • Missing required field: Mark as "skipped", reason: "missing {field}"

Output Format


Return a JSON object with this structure:

```json
{
  "agentId": "swarm-agent-{N}",
  "batchRange": "{START}-{END}",
  "totalProcessed": {NUMBER},
  "totalSuccess": {NUMBER},
  "totalFailed": {NUMBER},
  "totalSkipped": {NUMBER},
  "results": [ {OUTPUT_SCHEMA_ITEM_1}, {OUTPUT_SCHEMA_ITEM_2}, ... ],
  "errors": [ {"itemIndex": N, "reason": "description"} ],
  "notes": "Any observations about the data quality or patterns"
}
```
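Filling the template is plain string assembly. A minimal sketch; the function signature and the condensed template text are illustrative, not the full brief above:

```python
def build_brief(n, total_batches, task, items, schema_example):
    """Fill a condensed version of the brief template for one swarm agent."""
    items_block = "\n".join(f"- {item}" for item in items)
    return (
        f"You are swarm-agent-{n:02d}, processing batch {n} of {total_batches}.\n\n"
        f"Your Task\n{task}\n\n"
        f"Input Data\nYou will process the following {len(items)} items:\n{items_block}\n\n"
        f"Output Schema\nFor each item, produce a result matching: {schema_example}\n\n"
        "Every item MUST appear in your output, even if processing failed."
    )
```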

Step 4: Deploy Swarm


4a. Wave Deployment


Deploy agents in waves to manage system load:

Wave 1: Launch up to 20 agents in parallel using the Agent tool with `run_in_background: true`. Send ALL agent calls in a single message. Launch swarm-agent-01 through swarm-agent-20 in parallel; each receives its batch of items and the standardized brief.

Wave 2+ (if needed): After Wave 1 completes, launch the remaining agents.

4b. Data Distribution


How to get data to each agent depends on the data source:
| Source Type | Distribution Method |
| --- | --- |
| Directory of files | Tell each agent which file paths to Read |
| Single large CSV | Pre-split into batch files using Bash, tell each agent its file |
| Single JSON array | Pre-split into batch files using Bash |
| Inline data | Embed directly in the agent brief (for small datasets) |
| Database export | Export to CSV first, then split |
For CSVs and large files, pre-split before deploying:

```bash
# Split a CSV into batches of 50 rows (preserving header)
mkdir -p batches   # ensure the output directory exists
head -1 data.csv > header.csv
tail -n +2 data.csv | split -l 50 - batch_
for f in batch_*; do
  cat header.csv "$f" > "batches/${f}.csv" && rm "$f"
done
rm header.csv
```

4c. Progress Tracking


As agents complete, track progress:

Swarm Progress


Wave 1 (20 agents): [################----] 16/20 complete

| Agent | Status | Processed | Success | Failed | Duration |
| --- | --- | --- | --- | --- | --- |
| swarm-01 | DONE | 50 | 48 | 2 | 45s |
| swarm-02 | DONE | 50 | 50 | 0 | 38s |
| swarm-03 | RUNNING | -- | -- | -- | -- |
| ... | ... | ... | ... | ... | ... |

Cumulative: 800/1247 items processed (64%)

Step 5: Collect and Aggregate Results


5a. Gather Agent Outputs


As each agent completes, collect its JSON output. Parse and validate:
  1. Schema validation -- Does each result match the output schema?
  2. Completeness check -- Does the agent's result count match its batch size?
  3. Duplicate detection -- Check for duplicate item IDs across agents.
  4. Error extraction -- Pull out all failed/skipped items for the retry queue.
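Those checks can be sketched per agent output (a hypothetical validator; the `item_id` field name is illustrative and depends on your output schema):

```python
def validate_output(agent_output: dict, expected_range: tuple, seen_ids: set) -> list:
    """Check one agent's result: completeness, valid statuses, cross-agent duplicates."""
    problems = []
    start, end = expected_range
    results = agent_output.get("results", [])
    # Completeness: result count must match the batch size
    if len(results) != end - start + 1:
        problems.append(f"expected {end - start + 1} results, got {len(results)}")
    # Schema: every result needs a valid processing_status
    if any(r.get("processing_status") not in ("success", "failed", "skipped")
           for r in results):
        problems.append("result with invalid processing_status")
    # Duplicates: item IDs must be unique across all agents
    for r in results:
        item_id = r.get("item_id")   # hypothetical ID field
        if item_id in seen_ids:
            problems.append(f"duplicate item_id {item_id}")
        seen_ids.add(item_id)
    return problems
```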

5b. Merge Results


Combine all agent results into a single unified output:

```python
# Conceptual merge logic
merged_results = []
failed_items = []
skipped_items = []

for agent_output in all_agent_outputs:
    for result in agent_output["results"]:
        if result["processing_status"] == "success":
            merged_results.append(result)
        elif result["processing_status"] == "failed":
            failed_items.append(result)
        elif result["processing_status"] == "skipped":
            skipped_items.append(result)

# Sort by original item order
merged_results.sort(key=lambda x: x["original_index"])
```

5c. Validate Completeness



Aggregation Report


  • Total items in input: 1,247
  • Total results received: 1,247
  • Successful: 1,198 (96.1%)
  • Failed: 34 (2.7%)
  • Skipped: 15 (1.2%)

Coverage Check


  • Items with results: 1,247 / 1,247 (100% coverage)
  • Missing items: 0
  • Duplicate results: 0

Failure Analysis


| Failure Reason | Count |
| --- | --- |
| Malformed input | 12 |
| Missing required field | 8 |
| Ambiguous data | 14 |

Proceed to retry failed items? (Y / skip / manual review)

Step 6: Error Recovery


6a. Retry Queue


Collect all failed and skipped items into a retry batch:
Retry batch: 49 items (34 failed + 15 skipped)
Retry strategy: Single agent with enhanced instructions

6b. Enhanced Retry Brief


The retry agent gets special instructions:
You are the retry agent. These items failed or were skipped in the first pass.
For each item, I am providing the original item AND the failure reason from the first attempt.

Your job:
1. Try harder -- use more creative interpretation for ambiguous items
2. For truly malformed items, extract whatever you can and note what is missing
3. For items that failed due to missing fields, infer the field if possible or mark as "unrecoverable"

The bar is lower for retries: partial results are better than no results.

6c. Retry Limits


  • Maximum retries: 2 (original attempt + 2 retries = 3 total attempts)
  • After max retries: Mark item as "unrecoverable" and include in the final report
  • Unrecoverable threshold: If > 10% of items are unrecoverable, flag to the user for manual review
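The attempt/retry/unrecoverable lifecycle can be sketched as a loop (a minimal sketch; `process` is a hypothetical callable that attempts one item and returns a dict with a `processing_status` of "success" or "failed"):

```python
def process_with_retries(items, process, max_retries=2):
    """Attempt each item up to 1 + max_retries times; mark leftovers unrecoverable."""
    results, queue = [], list(items)
    for attempt in range(1 + max_retries):
        still_failing = []
        for item in queue:
            result = process(item, attempt)
            if result["processing_status"] == "success":
                results.append(result)
            else:
                still_failing.append(item)   # goes to the next retry pass
        queue = still_failing
        if not queue:
            break
    # Anything still failing after the final retry is unrecoverable
    unrecoverable = [{"item": i, "processing_status": "unrecoverable"} for i in queue]
    return results, unrecoverable
```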

Step 7: Write Final Output


Based on the user's requested output format:

CSV Output


  • Write header + all successful results as CSV
  • Include a separate failures.csv for failed items
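The CSV output step can be sketched with the standard library (a minimal sketch; the file paths are illustrative, and it assumes every result dict in a list shares the same keys):

```python
import csv

def write_outputs(results, failures, out_path="results.csv", fail_path="failures.csv"):
    """Write successful rows to one CSV and failed rows (with reasons) to another."""
    for rows, path in ((results, out_path), (failures, fail_path)):
        if not rows:
            continue
        with open(path, "w", newline="") as f:
            # Header comes from the keys of the first row
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
```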

JSON Output


```json
{
  "metadata": {
    "task": "description of what was processed",
    "totalItems": 1247,
    "successfulItems": 1210,
    "failedItems": 22,
    "unrecoverableItems": 15,
    "processingDate": "2026-04-10T12:00:00Z",
    "swarmSize": 25,
    "waves": 2
  },
  "results": [...],
  "failures": [...],
  "summary": {
    "key_patterns": "...",
    "notable_findings": "...",
    "data_quality_notes": "..."
  }
}
```

Markdown Report


```markdown
# Processing Results: [Task Name]

## Summary

- Processed: 1,247 items
- Success rate: 97%
- Key findings: [aggregated insights]

## Results Table

| Item | Result Field 1 | Result Field 2 | Status |
| --- | --- | --- | --- |
| ... | ... | ... | ... |

## Failures

| Item | Reason | Attempted Retries |
| --- | --- | --- |
| ... | ... | ... |
```

Individual Files


For tasks where each item produces a standalone document (e.g., generating 500 blog post outlines):
```
output/
  001-item-name.md
  002-item-name.md
  ...
  500-item-name.md
  _summary.md
  _failures.md
```

Step 8: Summary Report


Always end with a comprehensive summary:

Swarm Processing Complete


Execution Summary


  • Task: [description]
  • Data source: [source]
  • Total items: 1,247
  • Swarm size: 25 agents across 2 waves
  • Total processing time: ~3 minutes

Results


  • Successful: 1,210 (97.0%)
  • Failed (recovered on retry): 22 (1.8%)
  • Unrecoverable: 15 (1.2%)
  • Output written to: [path]

Quality Metrics


  • Schema compliance: 100% of successful results match output schema
  • Average confidence: 0.82
  • Items flagged for review: 37 (low confidence < 0.5)

Patterns Observed


  • [Any interesting patterns the swarm noticed across the data]
  • [Data quality issues found]
  • [Recommendations for future processing]

Cost


  • Agents deployed: 25 (Wave 1) + 1 (retry) = 26
  • Estimated tokens consumed: ~1.2M input + ~400K output

Swarm Configurations for Common Tasks


Sentiment Analysis (1000 reviews)


```yaml
task: Classify each review as positive/negative/neutral with confidence score
batch_size: 100  # Reviews are short, pack more per agent
swarm_size: 10
output_schema:
  review_id: string
  sentiment: enum(positive, negative, neutral, mixed)
  confidence: float(0-1)
  key_phrases: string[]
  summary: string(1 sentence)
```

Lead Scoring (2000 contacts)


```yaml
task: Score each lead 1-100 based on ICP fit criteria
batch_size: 40  # Each lead needs more analysis context
swarm_size: 50
waves: 3
output_schema:
  lead_id: string
  score: int(1-100)
  icp_match: object
    company_size: bool
    industry: bool
    tech_stack: bool
    title_seniority: bool
  buying_signals: string[]
  recommended_action: enum(hot, warm, nurture, disqualify)
```

Content Generation (500 product descriptions)


```yaml
task: Write a 100-word product description from product data
batch_size: 25  # Output is longer, needs more generation tokens
swarm_size: 20
output_schema:
  product_id: string
  title: string
  description: string(100 words)
  key_features: string[3]
  seo_keywords: string[5]
```

Document Summarization (300 papers)


```yaml
task: Summarize each paper in 3 bullet points with key findings
batch_size: 15  # Papers are long, fewer per agent
swarm_size: 20
output_schema:
  paper_id: string
  title: string
  summary_bullets: string[3]
  key_finding: string
  methodology: string
  relevance_score: float(0-1)
```

Error Handling


| Error | Cause | Response |
| --- | --- | --- |
| Agent returns no output | Agent timeout or crash | Re-deploy that batch with a fresh agent |
| Agent returns partial results | Context overflow or mid-processing failure | Identify processed items, re-deploy unprocessed items |
| Agent returns malformed JSON | Output parsing failure | Attempt to extract results from raw text, re-deploy if impossible |
| Duplicate results across agents | Batch overlap miscalculation | Deduplicate by item ID, keep first occurrence |
| All agents fail | Systemic issue (bad brief, impossible task) | Abort, report to user, suggest task reformulation |
| Retry agent also fails | Item is truly unprocessable | Mark as unrecoverable, include raw input in failure report |
| Data source unavailable | File missing, permission denied | Abort before deploying swarm, report to user |
| Output file write fails | Disk space, permissions | Attempt alternative location, or return results in conversation |

Scaling Guidelines


| Total Items | Recommended Batch Size | Swarm Size | Waves | Notes |
| --- | --- | --- | --- | --- |
| 10-50 | 10-25 | 2-5 | 1 | Small job, minimal overhead |
| 50-200 | 25-50 | 4-10 | 1 | Standard processing |
| 200-500 | 40-60 | 5-15 | 1 | Moderate scale |
| 500-1000 | 50-80 | 10-20 | 1-2 | Large scale, may need waves |
| 1000-5000 | 50-100 | 20+ | 2-5 | Multi-wave deployment |
| 5000+ | 100-200 | 50+ | 5+ | Enterprise scale, consider chunking |
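The scaling tiers can be encoded as a simple lookup (a sketch only; the function name and dict keys are illustrative, and the tier boundaries follow the table's rows):

```python
def recommend_config(total_items: int) -> dict:
    """Return the scaling-table row matching a given item count."""
    tiers = [
        (50,   {"batch": (10, 25),  "swarm": "2-5",   "waves": "1"}),
        (200,  {"batch": (25, 50),  "swarm": "4-10",  "waves": "1"}),
        (500,  {"batch": (40, 60),  "swarm": "5-15",  "waves": "1"}),
        (1000, {"batch": (50, 80),  "swarm": "10-20", "waves": "1-2"}),
        (5000, {"batch": (50, 100), "swarm": "20+",   "waves": "2-5"}),
    ]
    for upper, config in tiers:
        if total_items <= upper:
            return config
    # Enterprise scale: consider chunking the input into separate runs
    return {"batch": (100, 200), "swarm": "50+", "waves": "5+"}
```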

Anti-Patterns to Avoid


  1. Do not use swarm for sequential tasks -- If item N depends on the result of item N-1, a swarm is the wrong tool. Use a chain instead.
  2. Do not deploy 100 agents for 100 items -- One item per agent wastes overhead. Batch them.
  3. Do not skip the schema definition -- Without a schema, merging results from 25 agents becomes a nightmare.
  4. Do not ignore failures -- Even at 99% success rate, 1% of 10,000 items is 100 failures. Always run retries.
  5. Do not deploy without a sample run -- Process 5 items manually first to validate the task definition and output quality before scaling.