agent-swarm-deployer
Agent Swarm Deployer
A high-throughput parallel data processing framework that deploys swarms of sub-agents to handle massive data tasks. While Agent Army is built for code changes (edit files, respect dependency graphs, verify builds), Agent Swarm is built for data operations where individual units of work are independent and results need aggregation.
Agent Army vs Agent Swarm
| Dimension | Agent Army | Agent Swarm |
|---|---|---|
| Purpose | Code changes across files | Data processing at scale |
| Units of work | Files in a codebase | Documents, records, rows, items |
| Dependencies | Import graph matters | Items are independent |
| Output | Modified source files | Aggregated results (reports, datasets, content) |
| Verification | Build check, pattern scan | Result validation, completeness check |
| Error handling | Fix and re-verify code | Retry failed items, collect partial results |
| Typical scale | 10-200 files | 100-10,000+ items |
| Key risk | Breaking the build | Data loss, incomplete processing |
Use Cases
Document Processing
- Analyze 500 customer support tickets for sentiment and categorization
- Extract key information from 200 legal contracts
- Summarize 1000 research paper abstracts
- Parse 300 resumes for qualification matching
Dataset Analysis
- Score and rank 2000 leads by ICP fit
- Classify 5000 product reviews by topic and sentiment
- Audit 1000 blog posts for SEO compliance
- Grade 500 sales call transcripts against a methodology
Bulk Content Generation
- Generate personalized email first lines for 500 prospects
- Create product descriptions for 1000 SKUs
- Write social media posts for 200 blog articles
- Generate meta descriptions for 800 web pages
Data Transformation
- Convert 1000 CSV rows into structured JSON objects
- Normalize 500 address records
- Translate 300 support articles into 5 languages
- Reformat 2000 database records from schema A to schema B
Architecture
```
You (Commander)
|
|-- Phase 1: Intake & Inventory
|   |-- Count total items
|   |-- Sample items for schema detection
|   |-- Estimate token budget per item
|
|-- Phase 2: Swarm Design
|   |-- Calculate optimal batch size
|   |-- Determine number of swarm agents
|   |-- Define result schema
|
|-- Phase 3: Deploy Swarm (parallel)
|   |-- Agent S1: items 1-50    ---\
|   |-- Agent S2: items 51-100  ---|
|   |-- Agent S3: items 101-150 ---|-- All run in parallel
|   |-- Agent S4: items 151-200 ---|
|   |-- ...                     ---/
|
|-- Phase 4: Collect & Aggregate
|   |-- Gather all agent results
|   |-- Merge into unified output
|   |-- Identify failures
|
|-- Phase 5: Recovery (if needed)
|   |-- Retry failed items
|   |-- Fill gaps
|
|-- Phase 6: Deliver
|   |-- Write final output
|   |-- Generate summary report
```
Execution Protocol
Step 0: Understand the Task
Before deploying any agents, fully understand what the user needs:
- Data source -- Where is the data? Files on disk? A CSV? A directory of documents? Inline in the conversation?
- Operation -- What should be done to each item? Summarize? Classify? Extract? Transform? Generate?
- Output format -- What should the result look like? JSON? CSV? Markdown? Individual files?
- Output destination -- Where should results go? Single file? Directory of files? Returned in conversation?
- Quality requirements -- Are there validation rules? Schemas? Scoring criteria?
If any of these are ambiguous, ask the user before proceeding. Getting the spec wrong wastes all agent compute.
Step 1: Intake and Inventory
1a. Discover and Count Items
Locate all data items and count them:
Glob/Bash: Find all files matching the pattern
Read: Sample first 3-5 items to understand structure
Bash: Count total items (wc -l for CSVs, file count for directories, etc.)

Report:

Intake Report
- Data source: /path/to/data/ (or inline, or CSV)
- Total items found: 1,247
- Item format: JSON files, avg 2KB each
- Sample item structure: { name, email, company, title, linkedin_url }
- Estimated tokens per item: ~500
- Total estimated tokens: ~623,500
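The intake pass lends itself to a short script. A minimal sketch for the directory-of-JSON-files case; the helper name and the rough ~4-characters-per-token heuristic are illustrative assumptions, not part of the framework:

```python
import json
from pathlib import Path

def intake_report(data_dir: str) -> dict:
    """Count items, sample one for structure, and roughly estimate token budget."""
    files = sorted(Path(data_dir).glob("*.json"))
    sample = json.loads(files[0].read_text()) if files else {}
    sampled = files[:5]
    avg_chars = sum(len(f.read_text()) for f in sampled) // max(len(sampled), 1)
    per_item = avg_chars // 4  # crude ~4 chars/token heuristic; tune per model
    return {
        "total_items": len(files),
        "sample_fields": sorted(sample),
        "est_tokens_per_item": per_item,
        "est_total_tokens": per_item * len(files),
    }
```

Running this against the data directory yields the numbers for the Intake Report above.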
1b. Detect Schema
From the sample items, define the input schema:
```json
{
  "inputSchema": {
    "type": "object",
    "fields": {
      "name": "string",
      "email": "string",
      "company": "string",
      "title": "string",
      "linkedin_url": "string"
    }
  }
}
```
1c. Define Output Schema
Based on the task, define what each processed item should look like:
```json
{
  "outputSchema": {
    "type": "object",
    "fields": {
      "name": "string (from input)",
      "company": "string (from input)",
      "personalized_first_line": "string (generated, 1-2 sentences)",
      "pain_point_guess": "string (inferred from title + company)",
      "confidence": "number (0-1)",
      "processing_status": "enum: success | failed | skipped"
    }
  }
}
```
Step 2: Swarm Design
2a. Calculate Batch Size
The batch size determines how many items each agent processes. Factors:
| Factor | Guideline |
|---|---|
| Token budget per item | Input tokens + expected output tokens + instruction overhead |
| Agent context limit | ~200K usable tokens per agent (conservative, leaves room for instructions) |
| Instruction overhead | ~2K tokens for the agent's brief |
| Safety margin | Use 70% of theoretical capacity |
Formula:

```
usable_tokens   = 200,000 * 0.70 = 140,000
tokens_per_item = input_tokens + output_tokens + 100 (overhead)
batch_size      = floor(usable_tokens / tokens_per_item)
```

Practical limits:
- Minimum batch size: 5 items (agent overhead is not worth it for fewer)
- Maximum batch size: 200 items (beyond this, agent context gets cluttered)
- Sweet spot: 20-80 items per agent for most text processing tasks
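The formula and the clamps combine into one helper. A minimal sketch, with defaults mirroring the table above (200K context, 70% safety margin, ~100 tokens of per-item overhead):

```python
def plan_batch_size(input_tokens: int, output_tokens: int,
                    context_limit: int = 200_000,
                    safety: float = 0.70,
                    per_item_overhead: int = 100) -> int:
    """Items per agent, clamped to the practical 5-200 range."""
    usable = int(context_limit * safety)            # 140,000 with defaults
    per_item = input_tokens + output_tokens + per_item_overhead
    return max(5, min(200, usable // per_item))

print(plan_batch_size(500, 200))  # ~500 in + ~200 out per item -> 175
```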
2b. Determine Swarm Size
```
total_items = 1,247
batch_size  = 50
swarm_size  = ceil(1,247 / 50) = 25 agents
```

Practical limits:
- Maximum parallel agents: 20 per wave (prevents system overload)
- If swarm_size > 20: Deploy in waves. Wave 1: agents 1-20. Wave 2: agents 21-25.
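The batch ranges and waves can be derived mechanically. A minimal sketch (the helper name is illustrative), using ceiling division and the 20-agents-per-wave cap:

```python
from math import ceil

MAX_PARALLEL = 20  # maximum agents per wave

def plan_swarm(total_items: int, batch_size: int):
    """Return (batch ranges, waves); ranges are 1-based inclusive item spans."""
    swarm_size = ceil(total_items / batch_size)
    ranges = [(i * batch_size + 1, min((i + 1) * batch_size, total_items))
              for i in range(swarm_size)]
    waves = [ranges[i:i + MAX_PARALLEL] for i in range(0, swarm_size, MAX_PARALLEL)]
    return ranges, waves

ranges, waves = plan_swarm(1247, 50)
# 25 agents in 2 waves; the last batch is partial (items 1201-1247)
```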
2c. Present Swarm Plan
Swarm Plan
- Total items: 1,247
- Batch size: 50 items per agent
- Swarm size: 25 agents
- Waves: 2 (Wave 1: 20 agents, Wave 2: 5 agents)
- Estimated processing: All items covered
- Output format: Single CSV file with all results
Agent Assignments
| Agent | Items | Range | Notes |
|---|---|---|---|
| swarm-01 | 50 | items 1-50 | -- |
| swarm-02 | 50 | items 51-100 | -- |
| ... | ... | ... | -- |
| swarm-25 | 47 | items 1201-1247 | Partial batch |
Proceed? (Y to deploy / N to adjust batch size)
Step 3: Prepare Agent Briefs
Each swarm agent receives a self-contained brief. The brief must include:
- Role: "You are a data processing agent in a swarm of 25. Your job is to process items 51-100."
- Task description: Exactly what to do with each item (the operation).
- Input data: The actual data items for this batch (embedded in the brief or read from a file range).
- Output schema: The exact format for results. Include a concrete example.
- Quality rules: Validation criteria, edge case handling, what to do with malformed items.
- Error protocol: "If an item cannot be processed, mark it as failed with a reason. Do not skip items silently."
- Output format: "Return your results as a JSON array. Each element must match the output schema."
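Briefs with this structure are mechanical to assemble. A minimal sketch using Python's string.Template; the field names mirror the checklist above and are otherwise illustrative:

```python
from string import Template

# Skeleton brief; extend with schema, quality rules, and examples as needed
BRIEF = Template("""\
You are swarm-agent-$n, processing batch $n of $total_batches in a data processing swarm.

Your Task
$task

Input Data
You will process the following $batch_size items:
$items

Output Format
Return your results as a JSON array. Each element must match the output schema.
If an item cannot be processed, mark it as failed with a reason. Do not skip items silently.""")

brief = BRIEF.substitute(n=2, total_batches=25, task="Classify sentiment.",
                         batch_size=50, items="[items 51-100 here]")
```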
Agent Brief Template
You are swarm-agent-{N}, processing batch {N} of {TOTAL_BATCHES} in a data processing swarm.

Your Task

{TASK_DESCRIPTION}
Input Data
You will process the following {BATCH_SIZE} items:
{ITEMS_AS_STRUCTURED_DATA}
Output Schema
For each item, produce a result matching this schema:
{OUTPUT_SCHEMA_WITH_EXAMPLE}
Example
Input:
{EXAMPLE_INPUT}
Expected output:
{EXAMPLE_OUTPUT}
Quality Rules
- {RULE_1}
- {RULE_2}
- {RULE_3}
- Every item MUST appear in your output, even if processing failed.
- For failed items, set processing_status to "failed" and include an error_reason field.
Error Handling
- Malformed input: Mark as "failed", reason: "malformed input: {description}"
- Ambiguous data: Make your best judgment, set confidence to < 0.5
- Missing required field: Mark as "skipped", reason: "missing {field}"
Output Format
Return a JSON object with this structure:
```json
{
  "agentId": "swarm-agent-{N}",
  "batchRange": "{START}-{END}",
  "totalProcessed": {NUMBER},
  "totalSuccess": {NUMBER},
  "totalFailed": {NUMBER},
  "totalSkipped": {NUMBER},
  "results": [
    {OUTPUT_SCHEMA_ITEM_1},
    {OUTPUT_SCHEMA_ITEM_2},
    ...
  ],
  "errors": [
    {"itemIndex": N, "reason": "description"}
  ],
  "notes": "Any observations about the data quality or patterns"
}
```
Step 4: Deploy Swarm
4a. Wave Deployment
Deploy agents in waves to manage system load:
Wave 1: Launch up to 20 agents in parallel using the Agent tool with run_in_background: true. Send ALL agent calls in a single message.

Launch swarm-agent-01 through swarm-agent-20 in parallel.
Each receives its batch of items and the standardized brief.

Wave 2+ (if needed): After Wave 1 completes, launch remaining agents.
4b. Data Distribution
How to get data to each agent depends on the data source:
| Source Type | Distribution Method |
|---|---|
| Directory of files | Tell each agent which file paths to Read |
| Single large CSV | Pre-split into batch files using Bash, tell each agent its file |
| Single JSON array | Pre-split into batch files using Bash |
| Inline data | Embed directly in the agent brief (for small datasets) |
| Database export | Export to CSV first, then split |
For CSVs and large files, pre-split before deploying:
```bash
# Split a CSV into batches of 50 rows (preserving header)
head -1 data.csv > header.csv
tail -n +2 data.csv | split -l 50 - batch_
mkdir -p batches
for f in batch_*; do cat header.csv "$f" > "batches/${f}.csv" && rm "$f"; done
rm header.csv
```
4c. Progress Tracking
As agents complete, track progress:
Swarm Progress
Wave 1 (20 agents):
[################----] 16/20 complete
| Agent | Status | Processed | Success | Failed | Duration |
|---|---|---|---|---|---|
| swarm-01 | DONE | 50 | 48 | 2 | 45s |
| swarm-02 | DONE | 50 | 50 | 0 | 38s |
| swarm-03 | RUNNING | -- | -- | -- | -- |
| ... | ... | ... | ... | ... | ... |
Cumulative: 800/1247 items processed (64%)
Step 5: Collect and Aggregate Results
5a. Gather Agent Outputs
As each agent completes, collect its JSON output. Parse and validate:
- Schema validation -- Does each result match the output schema?
- Completeness check -- Does the agent's result count match its batch size?
- Duplicate detection -- Check for duplicate item IDs across agents.
- Error extraction -- Pull out all failed/skipped items for the retry queue.
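A sketch of these four checks, assuming each agent returns the Step 3 result envelope and each result row carries a unique item_id (the id field name is an assumption):

```python
def validate_outputs(agent_outputs, expected_total):
    """Schema-count, completeness, duplicate, and error-extraction checks."""
    seen, duplicates, retry_queue = set(), [], []
    for out in agent_outputs:
        # Completeness: the agent's result count must match what it claims
        assert len(out["results"]) == out["totalProcessed"], \
            f'{out["agentId"]}: result count != totalProcessed'
        for r in out["results"]:
            if r["item_id"] in seen:          # duplicate detection across agents
                duplicates.append(r["item_id"])
            seen.add(r["item_id"])
            if r["processing_status"] in ("failed", "skipped"):
                retry_queue.append(r)         # error extraction for retries
    return {"missing": expected_total - len(seen),
            "duplicates": duplicates,
            "retry_queue": retry_queue}
```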
5b. Merge Results
Combine all agent results into a single unified output:
```python
# Conceptual merge logic
merged_results = []
failed_items = []
skipped_items = []

for agent_output in all_agent_outputs:
    for result in agent_output["results"]:
        if result["processing_status"] == "success":
            merged_results.append(result)
        elif result["processing_status"] == "failed":
            failed_items.append(result)
        elif result["processing_status"] == "skipped":
            skipped_items.append(result)

# Sort by original item order
merged_results.sort(key=lambda x: x["original_index"])
```
5c. Validate Completeness
Aggregation Report
- Total items in input: 1,247
- Total results received: 1,247
- Successful: 1,198 (96.1%)
- Failed: 34 (2.7%)
- Skipped: 15 (1.2%)
Coverage Check
- Items with results: 1,247 / 1,247 (100% coverage)
- Missing items: 0
- Duplicate results: 0
Failure Analysis
| Failure Reason | Count |
|---|---|
| Malformed input | 12 |
| Missing required field | 8 |
| Ambiguous data | 14 |
Proceed to retry failed items? (Y / skip / manual review)
Step 6: Error Recovery
6a. Retry Queue
Collect all failed and skipped items into a retry batch:
Retry batch: 49 items (34 failed + 15 skipped)
Retry strategy: Single agent with enhanced instructions

6b. Enhanced Retry Brief
The retry agent gets special instructions:
You are the retry agent. These items failed or were skipped in the first pass.
For each item, I am providing the original item AND the failure reason from the first attempt.
Your job:
1. Try harder -- use more creative interpretation for ambiguous items
2. For truly malformed items, extract whatever you can and note what is missing
3. For items that failed due to missing fields, infer the field if possible or mark as "unrecoverable"
The bar is lower for retries: partial results are better than no results.

6c. Retry Limits
- Maximum retries: 2 (original attempt + 2 retries = 3 total attempts)
- After max retries: Mark item as "unrecoverable" and include in the final report
- Unrecoverable threshold: If > 10% of items are unrecoverable, flag to the user for manual review
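These limits imply a simple control loop. A minimal sketch, where process_batch stands in for a swarm or retry agent invocation (a hypothetical callable returning successes and failures):

```python
MAX_ATTEMPTS = 3           # original attempt + 2 retries
UNRECOVERABLE_FLAG = 0.10  # flag for manual review above 10%

def run_with_retries(items, process_batch):
    """process_batch(items) -> (successes, failures); retry failures up to the cap."""
    successes, pending = [], list(items)
    for attempt in range(MAX_ATTEMPTS):
        if not pending:
            break
        ok, pending = process_batch(pending)
        successes.extend(ok)
    unrecoverable = pending  # whatever is left after the final attempt
    if items and len(unrecoverable) / len(items) > UNRECOVERABLE_FLAG:
        print(f"WARNING: {len(unrecoverable)} unrecoverable items -- manual review advised")
    return successes, unrecoverable
```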
Step 7: Write Final Output
Based on the user's requested output format:
CSV Output
```bash
# Write header + all successful results as CSV
# Include a separate failures.csv for failed items
```
JSON Output
```json
{
  "metadata": {
    "task": "description of what was processed",
    "totalItems": 1247,
    "successfulItems": 1210,
    "failedItems": 22,
    "unrecoverableItems": 15,
    "processingDate": "2026-04-10T12:00:00Z",
    "swarmSize": 25,
    "waves": 2
  },
  "results": [...],
  "failures": [...],
  "summary": {
    "key_patterns": "...",
    "notable_findings": "...",
    "data_quality_notes": "..."
  }
}
```
Markdown Report
```markdown
# Processing Results: [Task Name]

## Summary
- Processed: 1,247 items
- Success rate: 97%
- Key findings: [aggregated insights]

## Results Table
| Item | Result Field 1 | Result Field 2 | Status |
|---|---|---|---|
| ... | ... | ... | ... |

## Failures
| Item | Reason | Attempted Retries |
|---|---|---|
| ... | ... | ... |
```
Individual Files
For tasks where each item produces a standalone document (e.g., generating 500 blog post outlines):
```
output/
  001-item-name.md
  002-item-name.md
  ...
  500-item-name.md
  _summary.md
  _failures.md
```
Step 8: Summary Report
Always end with a comprehensive summary:
Swarm Processing Complete
Execution Summary
- Task: [description]
- Data source: [source]
- Total items: 1,247
- Swarm size: 25 agents across 2 waves
- Total processing time: ~3 minutes
Results
- Successful: 1,210 (97.0%)
- Failed (recovered on retry): 22 (1.8%)
- Unrecoverable: 15 (1.2%)
- Output written to: [path]
Quality Metrics
- Schema compliance: 100% of successful results match output schema
- Average confidence: 0.82
- Items flagged for review: 37 (low confidence < 0.5)
Patterns Observed
- [Any interesting patterns the swarm noticed across the data]
- [Data quality issues found]
- [Recommendations for future processing]
Cost
- Agents deployed: 25 (Waves 1-2) + 1 (retry) = 26
- Estimated tokens consumed: ~1.2M input + ~400K output
Swarm Configurations for Common Tasks
Sentiment Analysis (1000 reviews)
```yaml
task: Classify each review as positive/negative/neutral with confidence score
batch_size: 100   # Reviews are short, pack more per agent
swarm_size: 10
output_schema:
  review_id: string
  sentiment: enum(positive, negative, neutral, mixed)
  confidence: float(0-1)
  key_phrases: string[]
  summary: string(1 sentence)
```
Lead Scoring (2000 contacts)
```yaml
task: Score each lead 1-100 based on ICP fit criteria
batch_size: 40    # Each lead needs more analysis context
swarm_size: 50
waves: 3
output_schema:
  lead_id: string
  score: int(1-100)
  icp_match: object
    company_size: bool
    industry: bool
    tech_stack: bool
    title_seniority: bool
  buying_signals: string[]
  recommended_action: enum(hot, warm, nurture, disqualify)
```
Content Generation (500 product descriptions)
```yaml
task: Write a 100-word product description from product data
batch_size: 25    # Output is longer, needs more generation tokens
swarm_size: 20
output_schema:
  product_id: string
  title: string
  description: string(100 words)
  key_features: string[3]
  seo_keywords: string[5]
```
Document Summarization (300 papers)
```yaml
task: Summarize each paper in 3 bullet points with key findings
batch_size: 15    # Papers are long, fewer per agent
swarm_size: 20
output_schema:
  paper_id: string
  title: string
  summary_bullets: string[3]
  key_finding: string
  methodology: string
  relevance_score: float(0-1)
```
Error Handling
| Error | Cause | Response |
|---|---|---|
| Agent returns no output | Agent timeout or crash | Re-deploy that batch with a fresh agent |
| Agent returns partial results | Context overflow or mid-processing failure | Identify processed items, re-deploy unprocessed items |
| Agent returns malformed JSON | Output parsing failure | Attempt to extract results from raw text, re-deploy if impossible |
| Duplicate results across agents | Batch overlap miscalculation | Deduplicate by item ID, keep first occurrence |
| All agents fail | Systemic issue (bad brief, impossible task) | Abort, report to user, suggest task reformulation |
| Retry agent also fails | Item is truly unprocessable | Mark as unrecoverable, include raw input in failure report |
| Data source unavailable | File missing, permission denied | Abort before deploying swarm, report to user |
| Output file write fails | Disk space, permissions | Attempt alternative location, or return results in conversation |
Scaling Guidelines
| Total Items | Recommended Batch Size | Swarm Size | Waves | Notes |
|---|---|---|---|---|
| 10-50 | 10-25 | 2-5 | 1 | Small job, minimal overhead |
| 50-200 | 25-50 | 4-10 | 1 | Standard processing |
| 200-500 | 40-60 | 5-15 | 1 | Moderate scale |
| 500-1000 | 50-80 | 10-20 | 1-2 | Large scale, may need waves |
| 1000-5000 | 50-100 | 20+ | 2-5 | Multi-wave deployment |
| 5000+ | 100-200 | 50+ | 5+ | Enterprise scale, consider chunking |
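The table can be encoded as a lookup. A sketch with thresholds copied from the rows above; the specific batch-size pick within each recommended range is my own simplification:

```python
from math import ceil

# (max_items, batch_size, notes) rows distilled from the scaling table
SCALING = [
    (50,           15,  "Small job, minimal overhead"),
    (200,          40,  "Standard processing"),
    (500,          50,  "Moderate scale"),
    (1000,         50,  "Large scale, may need waves"),
    (5000,         50,  "Multi-wave deployment"),
    (float("inf"), 150, "Enterprise scale, consider chunking"),
]

def recommend(total_items: int) -> dict:
    for max_items, batch, notes in SCALING:
        if total_items <= max_items:
            swarm = ceil(total_items / batch)
            return {"batch_size": batch, "swarm_size": swarm,
                    "waves": ceil(swarm / 20), "notes": notes}

# The running example: 1,247 items -> batch 50, 25 agents, 2 waves
```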
Anti-Patterns to Avoid
- Do not use swarm for sequential tasks -- If item N depends on the result of item N-1, a swarm is the wrong tool. Use a chain instead.
- Do not deploy 100 agents for 100 items -- One item per agent wastes overhead. Batch them.
- Do not skip the schema definition -- Without a schema, merging results from 25 agents becomes a nightmare.
- Do not ignore failures -- Even at 99% success rate, 1% of 10,000 items is 100 failures. Always run retries.
- Do not deploy without a sample run -- Process 5 items manually first to validate the task definition and output quality before scaling.