# MassGen Log Analyzer
This skill provides a structured workflow for running MassGen experiments and analyzing the resulting traces and logs using Logfire.
## Purpose
The log-analyzer skill helps you:
- Run MassGen experiments with proper instrumentation
- Query and analyze traces hierarchically
- Debug agent behavior and coordination patterns
- Measure performance and identify bottlenecks
- Improve the logging structure itself
- Generate markdown analysis reports saved to the log directory
## CLI Quick Reference
The CLI provides quick access to log analysis via `massgen logs`.

### List Logs with Analysis Status
```bash
uv run massgen logs list              # Show all recent logs with analysis status
uv run massgen logs list --analyzed   # Only logs with ANALYSIS_REPORT.md
uv run massgen logs list --unanalyzed # Only logs needing analysis
uv run massgen logs list --limit 20   # Show more logs
```
### Generate Analysis Prompt
Run from within your coding CLI (e.g., Claude Code) so it sees the output:

```bash
uv run massgen logs analyze                # Analyze latest turn of latest log
uv run massgen logs analyze --log-dir PATH # Analyze specific log
uv run massgen logs analyze --turn 1       # Analyze specific turn
```

The prompt output tells your coding CLI to use this skill on the specified log directory.

### Multi-Agent Self-Analysis
```bash
uv run massgen logs analyze --mode self               # Run 3-agent analysis team (prompts if report exists)
uv run massgen logs analyze --mode self --force       # Overwrite existing report without prompting
uv run massgen logs analyze --mode self --turn 2      # Analyze specific turn
uv run massgen logs analyze --mode self --config PATH # Use custom config
```

Self-analysis mode runs MassGen with multiple agents to analyze logs from different perspectives (correctness, efficiency, behavior) and produces a combined ANALYSIS_REPORT.md.
## Multi-Turn Sessions
MassGen log directories support multiple turns (coordination sessions). Each turn has its own directory with attempts inside:

```text
log_YYYYMMDD_HHMMSS/
├── turn_1/                    # First coordination session
│   ├── ANALYSIS_REPORT.md     # Report for turn 1
│   ├── attempt_1/             # First attempt
│   └── attempt_2/             # Retry if orchestration restarted
├── turn_2/                    # Second coordination session (if multi-turn)
│   ├── ANALYSIS_REPORT.md     # Report for turn 2
│   └── attempt_1/
```

When analyzing, the `--turn` flag specifies which turn to analyze. Without it, the latest turn is analyzed.

## When to Use Logfire vs Local Logs
**Use Local Log Files When:**

- Analyzing command patterns and repetition (commands are in `streaming_debug.log`)
- Checking detailed tool arguments and outputs (in `coordination_events.json`)
- Reading vote reasoning and agent decisions (in `agent_*/*/vote.json`)
- Viewing the coordination flow table (in `coordination_table.txt`)
- Getting cost/token summaries (in `metrics_summary.json`)
**Use Logfire When:**
- You need precise timing data with millisecond accuracy
- Analyzing span hierarchy and parent-child relationships
- Finding exceptions and error stack traces
- Creating shareable trace links for collaboration
- Querying across multiple sessions (e.g., "find all sessions with errors")
- Real-time monitoring of running experiments
**Rate Limiting:** If Logfire returns a rate limit error, wait up to 60 seconds and retry rather than falling back to local logs. The rate limit resets quickly, and Logfire data is worth waiting for when timing/hierarchy analysis is needed.
**Key Local Log Files:**

| File | Contains |
|---|---|
| `status.json` | Real-time status with agent reliability metrics (enforcement events, buffer loss) |
| `metrics_summary.json` | Cost, tokens, tool stats, round history |
| `coordination_events.json` | Full event timeline with tool calls |
| `coordination_table.txt` | Human-readable coordination flow |
| `streaming_debug.log` | Raw streaming data including command strings |
| `agent_*/*/vote.json` | Vote reasoning and context |
| `execution_trace.md` | Full tool calls, arguments, results, and reasoning - invaluable for debugging |
| `execution_metadata.yaml` | Config and session metadata |
**Execution Traces (`execution_trace.md`):**

These are the most detailed debug artifacts. Each agent snapshot includes an execution trace with:

- Complete tool calls with full arguments (not truncated)
- Full tool results (not truncated)
- Reasoning/thinking blocks from the model
- Timestamps and round markers
Use execution traces when you need to understand exactly what an agent did and why - they capture everything the agent saw and produced during that answer/vote iteration.
**Enforcement Reliability (`status.json`):**

The `status.json` file includes per-agent reliability metrics that track workflow enforcement events:

```json
{
  "agents": {
    "agent_a": {
      "reliability": {
        "enforcement_attempts": [
          {
            "round": 0,
            "attempt": 1,
            "max_attempts": 3,
            "reason": "no_workflow_tool",
            "tool_calls": ["search", "read_file"],
            "error_message": "Must use workflow tools",
            "buffer_preview": "First 500 chars of lost content...",
            "buffer_chars": 1500,
            "timestamp": 1736683468.123
          }
        ],
        "by_round": {"0": {"count": 2, "reasons": ["no_workflow_tool", "invalid_vote_id"]}},
        "unknown_tools": ["execute_command"],
        "workflow_errors": ["invalid_vote_id"],
        "total_enforcement_retries": 2,
        "total_buffer_chars_lost": 3000,
        "outcome": "ok"
      }
    }
  }
}
```

**Enforcement Reason Codes:**
| Reason | Description |
|---|---|
| `no_workflow_tool` | Agent called tools but none were workflow tools |
| | Agent provided text-only response, no tools called |
| `invalid_vote_id` | Agent voted for non-existent agent ID |
| | Agent tried to vote when no answers exist |
| | Agent used both answer and vote in the same response |
| | Agent hit max answer count limit |
| | Answer too similar to existing answers |
| | Exact duplicate of existing answer |
| | API/streaming error (e.g., "peer closed connection") |
| | API stream ended early, recovered with preserved context |
| | MCP server disconnected mid-session (e.g., "Server 'X' not connected") |
This data is invaluable for understanding why agents needed retries and how much content was lost due to enforcement restarts.
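As a sketch, the reliability data above can be rolled up per agent. Field names follow the example `status.json` shown earlier; the helper name is illustrative, not part of MassGen:

```python
# Minimal sketch: summarize enforcement reliability per agent from a
# status.json-style structure (field names as in the example above).
def summarize_reliability(status):
    summary = {}
    for agent_id, agent in status.get("agents", {}).items():
        rel = agent.get("reliability", {})
        summary[agent_id] = {
            "retries": rel.get("total_enforcement_retries", 0),
            "chars_lost": rel.get("total_buffer_chars_lost", 0),
            "reasons": [a["reason"] for a in rel.get("enforcement_attempts", [])],
            "outcome": rel.get("outcome"),
        }
    return summary

status = {
    "agents": {
        "agent_a": {
            "reliability": {
                "enforcement_attempts": [{"reason": "no_workflow_tool"}],
                "total_enforcement_retries": 2,
                "total_buffer_chars_lost": 3000,
                "outcome": "ok",
            }
        }
    }
}
print(summarize_reliability(status))
```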
## Logfire Setup
Before using this skill, you need to set up Logfire for observability.
### Step 1: Install MassGen with Observability Support

```bash
pip install "massgen[observability]"
```

Or with uv:

```bash
uv pip install "massgen[observability]"
```
### Step 2: Create a Logfire Account
Go to https://logfire.pydantic.dev/ and create a free account.
### Step 3: Authenticate with Logfire

This creates `~/.logfire/credentials.json`:

```bash
uv run logfire auth
```

Or set the token directly as an environment variable:

```bash
export LOGFIRE_TOKEN=your_token_here
```
### Step 4: Get Your Read Token for the MCP Server
- Go to https://logfire.pydantic.dev/ and log in
- Navigate to your project settings
- Create a Read Token (this is different from the write token used for authentication)
- Copy the token for use in Step 5
### Step 5: Add the Logfire MCP Server

```bash
claude mcp add logfire -e LOGFIRE_READ_TOKEN="your-read-token-here" -- uvx logfire-mcp@latest
```

Then restart Claude Code and re-invoke this skill.
## Prerequisites

**Logfire MCP Server (Optional but Recommended):**

The Logfire MCP server provides enhanced analysis with precise timing data and cross-session queries. If `LOGFIRE_READ_TOKEN` is not set, self-analysis mode will automatically disable the Logfire MCP and fall back to local log files only.

When configured, the MCP server provides these tools:

- `mcp__logfire__arbitrary_query` - Run SQL queries against Logfire data
- `mcp__logfire__schema_reference` - Get the database schema
- `mcp__logfire__find_exceptions_in_file` - Find exceptions in a file
- `mcp__logfire__logfire_link` - Create links to traces in the UI

**Required Flags:**

- `--automation` - Clean output for programmatic parsing; see the massgen-develops-massgen skill for more info on this flag
- `--logfire` - Enable Logfire tracing (optional, but required to populate Logfire data)
## Part 1: Running MassGen Experiments

### Basic Command Format

```bash
uv run massgen --automation --logfire --config [config_file] "[question]"
```

### Running in Background (Recommended)

Use `run_in_background: true` (or however you run tasks in the background) to run experiments asynchronously so you can monitor progress and end early if needed.

Expected output (first lines):

```text
LOG_DIR: .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff
STATUS: .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/turn_1/attempt_1/status.json
QUESTION: Your task here
[Coordination in progress - monitor status.json for real-time updates]
```

Parse the LOG_DIR - you'll need it for file-based analysis.
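The header lines can be parsed mechanically. This sketch assumes only the `KEY: value` format shown above; the helper name is illustrative:

```python
# Sketch: pull LOG_DIR, STATUS, and QUESTION out of the automation
# output header (line format as shown in the expected output above).
def parse_automation_header(output):
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("LOG_DIR", "STATUS", "QUESTION"):
            fields[key] = value.strip()
    return fields

sample = (
    "LOG_DIR: .massgen/massgen_logs/log_20250101_120000_000000\n"
    "STATUS: .massgen/massgen_logs/log_20250101_120000_000000/turn_1/attempt_1/status.json\n"
    "QUESTION: Your task here\n"
)
header = parse_automation_header(sample)
print(header["LOG_DIR"])
```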
### Monitoring Progress

Check `status.json` for progress:

```bash
cat [log_dir]/turn_1/attempt_1/status.json
```

Key fields to monitor:

- `coordination.completion_percentage` - 0-100
- `coordination.phase` - "initial_answer", "enforcement", "presentation"
- `results.winner` - null while running, agent_id when complete
- `agents[].status` - "waiting", "streaming", "answered", "voted", "error"
- `agents[].error` - null if ok, error details if failed
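A simple polling loop over those fields can wait for completion. This is a sketch assuming the field names above; the helper and timeout values are illustrative:

```python
import json
import time
from pathlib import Path

# Sketch: poll status.json until results.winner is set, then return it.
def wait_for_winner(status_path, poll_seconds=5.0, timeout=600.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = json.loads(Path(status_path).read_text())
        winner = status.get("results", {}).get("winner")
        if winner is not None:
            return winner
        time.sleep(poll_seconds)
    raise TimeoutError("coordination did not finish in time")
```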
### Reading Final Results

After completion (exit code 0), read the final answer:

```bash
cat [log_dir]/turn_1/attempt_1/final/[winner]/answer.txt
```

**Other useful files:**

- `execution_metadata.yaml` - Full config and execution details
- `coordination_events.json` - Complete event log
- `coordination_table.txt` - Human-readable coordination summary

## Part 2: Querying Logfire
### Database Schema

The main table is `records`, with these key columns:

| Column | Description |
|---|---|
| `span_name` | Name of the span (e.g., "agent.agent_a.round_0") |
| `span_id` | Unique identifier for this span |
| `parent_span_id` | ID of the parent span (null for root) |
| `trace_id` | Groups all spans in a single trace |
| `duration` | Time in seconds |
| `start_timestamp` | When the span started |
| `end_timestamp` | When the span ended |
| `attributes` | JSON blob with custom attributes |
| `message` | Log message |
| `is_exception` | Boolean for errors |
| `exception_type` | Type of exception if any |
| `exception_message` | Exception message |
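Once rows are fetched, the `span_id`/`parent_span_id` links can be used to rebuild the nesting locally. A sketch with illustrative row dicts:

```python
# Sketch: render an indented tree from (span_id, parent_span_id) rows,
# such as those returned by a trace-hierarchy query on `records`.
def build_tree(rows):
    children = {}
    for row in rows:
        children.setdefault(row["parent_span_id"], []).append(row)

    def render(row, depth=0):
        lines = ["  " * depth + row["span_name"]]
        for child in children.get(row["span_id"], []):
            lines.extend(render(child, depth + 1))
        return lines

    # Roots are rows whose parent_span_id is null/None.
    return [line for root in children.get(None, []) for line in render(root)]

rows = [
    {"span_id": "1", "parent_span_id": None, "span_name": "coordination.session"},
    {"span_id": "2", "parent_span_id": "1", "span_name": "agent.agent_a.round_0"},
    {"span_id": "3", "parent_span_id": "2", "span_name": "llm.openrouter.stream"},
]
print("\n".join(build_tree(rows)))
```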
### MassGen Span Hierarchy

MassGen creates hierarchical spans:

```text
coordination.session (root)
├── Coordination event: coordination_started
├── agent.agent_a.round_0
│   ├── llm.openrouter.stream
│   ├── mcp.filesystem.write_file
│   └── Tool execution: mcp__filesystem__write_file
├── agent.agent_b.round_0
├── Agent answer: agent1.1
├── agent.agent_a.round_1 (voting round)
├── Agent vote: agent_a -> agent1.1
├── Coordination event: winner_selected
└── agent.agent_a.presentation
    ├── Winner selected: agent1.1
    ├── llm.openrouter.stream
    └── Final answer from agent_a
```
### Custom Attributes
MassGen spans include these custom attributes (access via `attributes->'key'`):

| Attribute | Description |
|---|---|
| `massgen.agent_id` | Agent identifier (agent_a, agent_b) |
| | Current iteration number |
| `massgen.round` | Round number for this agent |
| `massgen.round_type` | "initial_answer", "voting", or "presentation" |
| | Backend provider name |
| | Number of answers in context |
| | True for presentation spans |
| | "vote", "answer", or "error" (set after round completes) |
| | Agent ID voted for (only set for votes) |
| | Answer label voted for (e.g., "agent1.1", only set for votes) |
| | Answer label assigned (e.g., "agent1.1", only set for answers) |
| | Error message (only set when outcome is "error") |
| | Input token count |
| | Output token count |
| | Reasoning token count |
| | Cached input token count |
| | Estimated cost in USD |
## Part 3: Common Analysis Queries

### 1. View Trace Hierarchy

```sql
SELECT span_name, span_id, parent_span_id, duration, start_timestamp
FROM records
WHERE trace_id = '[YOUR_TRACE_ID]'
ORDER BY start_timestamp
LIMIT 50
```

### 2. Find Recent Sessions
```sql
SELECT span_name, trace_id, duration, start_timestamp
FROM records
WHERE span_name = 'coordination.session'
ORDER BY start_timestamp DESC
LIMIT 10
```

### 3. Agent Round Performance
```sql
SELECT
  span_name,
  duration,
  attributes->>'massgen.agent_id' as agent_id,
  attributes->>'massgen.round' as round,
  attributes->>'massgen.round_type' as round_type
FROM records
WHERE span_name LIKE 'agent.%'
ORDER BY start_timestamp DESC
LIMIT 20
```

### 4. Tool Call Analysis
```sql
SELECT
  span_name,
  duration,
  parent_span_id,
  start_timestamp
FROM records
WHERE span_name LIKE 'mcp.%' OR span_name LIKE 'Tool execution:%'
ORDER BY start_timestamp DESC
LIMIT 30
```

### 5. Find Errors
```sql
SELECT
  span_name,
  exception_type,
  exception_message,
  trace_id,
  start_timestamp
FROM records
WHERE is_exception = true
ORDER BY start_timestamp DESC
LIMIT 20
```

### 6. LLM Call Performance
```sql
SELECT
  span_name,
  duration,
  attributes->>'gen_ai.request.model' as model,
  start_timestamp
FROM records
WHERE span_name LIKE 'llm.%'
ORDER BY start_timestamp DESC
LIMIT 30
```

### 7. Full Trace with Hierarchy (Nested View)
```sql
SELECT
  CASE
    WHEN parent_span_id IS NULL THEN span_name
    ELSE '  └─ ' || span_name
  END as hierarchy,
  duration,
  span_id,
  parent_span_id
FROM records
WHERE trace_id = '[YOUR_TRACE_ID]'
ORDER BY start_timestamp
```

### 8. Coordination Events Timeline
```sql
SELECT span_name, message, duration, start_timestamp
FROM records
WHERE span_name LIKE 'Coordination event:%'
   OR span_name LIKE 'Agent answer:%'
   OR span_name LIKE 'Agent vote:%'
   OR span_name LIKE 'Winner selected:%'
ORDER BY start_timestamp DESC
LIMIT 30
```

## Part 4: Analysis Workflow
### Step 1: Run Experiment

```bash
uv run massgen --automation --logfire --config [config] "[prompt]" 2>&1
```

### Step 2: Find the Trace

Query for recent sessions:

```sql
SELECT trace_id, duration, start_timestamp
FROM records
WHERE span_name = 'coordination.session'
ORDER BY start_timestamp DESC
LIMIT 5
```

### Step 3: Analyze Hierarchy
Get the full trace structure:

```sql
SELECT span_name, span_id, parent_span_id, duration
FROM records
WHERE trace_id = '[trace_id_from_step_2]'
ORDER BY start_timestamp
```

### Step 4: Investigate Specific Issues
Slow tool calls:

```sql
SELECT span_name, duration, parent_span_id
FROM records
WHERE trace_id = '[trace_id]' AND span_name LIKE 'mcp.%'
ORDER BY duration DESC
```

Agent comparison:

```sql
SELECT
  attributes->>'massgen.agent_id' as agent,
  COUNT(*) as rounds,
  SUM(duration) as total_time,
  AVG(duration) as avg_round_time
FROM records
WHERE trace_id = '[trace_id]' AND span_name LIKE 'agent.%'
GROUP BY attributes->>'massgen.agent_id'
```
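The same per-agent aggregation can be done locally on rows already fetched. A sketch, with illustrative row dicts carrying `agent_id` and `duration` (seconds):

```python
from collections import defaultdict

# Sketch: per-agent round count, total time, and average round time,
# mirroring the GROUP BY query above.
def agent_comparison(rows):
    stats = defaultdict(lambda: {"rounds": 0, "total_time": 0.0})
    for row in rows:
        s = stats[row["agent_id"]]
        s["rounds"] += 1
        s["total_time"] += row["duration"]
    for s in stats.values():
        s["avg_round_time"] = s["total_time"] / s["rounds"]
    return dict(stats)

rows = [
    {"agent_id": "agent_a", "duration": 12.0},
    {"agent_id": "agent_a", "duration": 8.0},
    {"agent_id": "agent_b", "duration": 15.0},
]
print(agent_comparison(rows))
```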
### Step 5: Create Trace Link
Use the MCP tool to create a viewable link:

```
mcp__logfire__logfire_link(trace_id="[your_trace_id]")
```

## Part 5: Improving the Logging Structure
### Current Span Types

| Span Pattern | Source | Description |
|---|---|---|
| `coordination.session` | coordination_tracker.py | Root session span |
| `agent.{id}.round_{n}` | orchestrator.py | Agent execution round |
| `agent.{id}.presentation` | orchestrator.py | Winner's final presentation |
| `mcp.{server}.{tool}` | mcp_tools/client.py | MCP tool execution |
| `llm.{provider}.stream` | backends | LLM streaming call |
| `Tool execution: ...` | base_with_custom_tool.py | Tool wrapper |
| `Coordination event: ...` | coordination_tracker.py | Coordination events |
| `Agent answer: ...` | coordination_tracker.py | Answer submission |
| `Agent vote: ...` | coordination_tracker.py | Vote cast |
### Adding New Spans

Use the tracer from `massgen.structured_logging`:

```python
from massgen.structured_logging import get_tracer

tracer = get_tracer()

with tracer.span("my_operation", attributes={
    "massgen.custom_key": "value",
}):
    do_work()
```
### Context Propagation Notes
**Known limitation:** When multiple agents run concurrently via `asyncio.create_task`, child spans may not nest correctly under agent round spans. This is an OpenTelemetry context propagation issue with concurrent async code. The presentation phase works correctly because only one agent runs.

**Workaround:** For accurate nesting in concurrent scenarios, explicit context passing with `contextvars.copy_context()` would be needed.
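An illustrative sketch of what `copy_context()` does: it snapshots the current contextvars (the mechanism OpenTelemetry uses to track the active span) so a callable can run under the parent's context. The variable name here is hypothetical:

```python
import contextvars

# Illustrative context variable standing in for "the active span".
current_agent = contextvars.ContextVar("current_agent", default=None)

def who_am_i():
    return current_agent.get()

current_agent.set("agent_a")
# copy_context() captures the value set above; ctx.run executes the
# callable inside that snapshot.
ctx = contextvars.copy_context()
result = ctx.run(who_am_i)
print(result)
```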
## Logfire Documentation Reference
Main Documentation: https://logfire.pydantic.dev/docs/
### Key Pages to Know
| Topic | URL | Description |
|---|---|---|
| Getting Started | https://logfire.pydantic.dev/docs/ | Overview, setup, and core concepts |
| Manual Tracing | https://logfire.pydantic.dev/docs/guides/onboarding-checklist/add-manual-tracing/ | Creating spans, adding attributes |
| SQL Explorer | https://logfire.pydantic.dev/docs/guides/web-ui/explore/ | Writing SQL queries in the UI |
| Live View | https://logfire.pydantic.dev/docs/guides/web-ui/live/ | Real-time trace monitoring |
| Query API | https://logfire.pydantic.dev/docs/how-to-guides/query-api/ | Programmatic access to data |
| OpenAI Integration | https://logfire.pydantic.dev/docs/integrations/llms/openai/ | LLM call instrumentation |
### Logfire Concepts

**Spans vs Logs:**

- Spans represent operations with measurable duration (use `with logfire.span():`)
- Logs capture point-in-time events (use `logfire.info()`, `logfire.error()`, etc.)
- Spans and logs inside a span block become children of that span

**Span Names vs Messages:**

- `span_name` = the first argument (used for filtering, keep low-cardinality)
- `message` = formatted result with attribute values interpolated
- Example: `logfire.info('Hello {name}', name='Alice')` → span_name="Hello {name}", message="Hello Alice"

**Attributes:**

- Keyword arguments become structured JSON attributes
- Access in SQL via `attributes->>'key'` or `attributes->'key'`
- Cast when needed: `(attributes->'cost')::float`
### Live View Features

The Logfire Live View UI (https://logfire.pydantic.dev/) provides:

- Real-time streaming of traces as they arrive
- SQL search pane (press `/` to open) with auto-complete
- Natural language to SQL - describe what you want and get a query
- Timeline histogram showing span counts over time
- Trace details panel with attributes, exceptions, and OpenTelemetry data
- Cross-linking between SQL results and trace view via trace_id/span_id
### SQL Explorer Tips

The Explore page uses Apache DataFusion SQL syntax (similar to Postgres):

```sql
-- Subqueries and CTEs work
WITH recent AS (
  SELECT * FROM records
  WHERE start_timestamp > now() - interval '1 hour'
)
SELECT * FROM recent WHERE is_exception;

-- Access nested JSON
SELECT attributes->>'massgen.agent_id' as agent FROM records;

-- Cast JSON values
SELECT (attributes->'token_count')::int as tokens FROM records;

-- Time filtering is efficient
WHERE start_timestamp > now() - interval '30 minutes'
```
### LLM Instrumentation
Logfire auto-instruments OpenAI calls when configured:
- Captures conversation display, token usage, response metadata
- Creates separate spans for streaming requests vs responses
- Works with both sync and async clients
MassGen's backends use this for `llm.{provider}.stream` spans.

## Reference Documentation
**Logfire:**

- Main docs: https://logfire.pydantic.dev/docs/
- Live View: https://logfire.pydantic.dev/docs/guides/web-ui/live/
- SQL Explorer: https://logfire.pydantic.dev/docs/guides/web-ui/explore/
- Query API: https://logfire.pydantic.dev/docs/how-to-guides/query-api/
- Manual tracing: https://logfire.pydantic.dev/docs/guides/onboarding-checklist/add-manual-tracing/
- OpenAI integration: https://logfire.pydantic.dev/docs/integrations/llms/openai/
- Schema reference: use the `mcp__logfire__schema_reference` tool

**MassGen:**

- Automation mode: `AI_USAGE.md`
- Status file reference: `docs/source/reference/status_file.rst`
- Structured logging: `massgen/structured_logging.py`
## Tips for Effective Analysis

- Always use both flags: `--automation --logfire` together
- Run in background for long tasks to monitor progress
- Query by trace_id to isolate specific sessions
- Check parent_span_id to understand hierarchy
- Use duration to identify bottlenecks
- Look at attributes for MassGen-specific context
- Create trace links to share findings with team

Note that you may get an error like this:

```text
Error: Error executing tool arbitrary_query: b'{"detail":"Rate limit exceeded for organization xxx: per minute limit reached."}'
```

In this case, sleep (for up to a minute) and try again.
## Part 6: Comprehensive Log Analysis Report

When asked to analyze a MassGen log run, generate a markdown report saved to `[log_dir]/turn_N/ANALYSIS_REPORT.md`, where N is the turn being analyzed. Each turn (coordination session) gets its own analysis report as a sibling to the attempt directories. The report must cover the Standard Analysis Questions below.

### Important: Ground Truth and Correctness
**CRITICAL:** Do not assume any agent's answer is "correct" unless the user explicitly provides ground truth.
- Report what each agent claimed/produced without asserting correctness
- Note when agents agree or disagree, but don't claim agreement = correctness
- If agents produce different answers, present both neutrally
- Only mark answers as correct/incorrect if user provides the actual answer
- Phrases to avoid: "correctly identified", "got the right answer", "solved correctly"
- Phrases to use: "claimed", "produced", "submitted", "arrived at"
Standard Analysis Questions
Every analysis report MUST answer these questions:
1. Correctness
- Did coordination complete successfully?
- Did all agents submit answers?
- Did voting occur correctly?
- Was a winner selected and did they provide a final answer?
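Most of these can be checked locally before touching Logfire. A sketch that tallies event types in `coordination_events.json`, using the same `.events[].event_type` shape shown in Analysis Commands below (exact event names vary by version; `agent_timeout` is one known type):

```bash
# Sketch: count coordination events by type to eyeball the flow.
# Assumes the .events[].event_type shape; event names vary by version.
tally_events() {
  jq -r '.events[].event_type' "$1" | sort | uniq -c | sort -rn
}

# Usage: tally_events coordination_events.json
```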
2. Efficiency & Bottlenecks
- What was the total duration and breakdown by phase?
- What were the slowest operations?
- Which tools took the most time?
3. Command Pattern Analysis
- Were there frequently repeated commands that could be avoided? (e.g., `openskills read`, `npm install`, `ls -R`)
- What commands produced unnecessarily long output? (e.g., skill docs, directory listings)
- What were the slowest `execute_command` patterns? (e.g., web scraping, package installs)
4. Work Duplication Analysis
- Was expensive work (like image generation) unnecessarily redone?
- Did both agents generate similar/identical assets?
- Were assets regenerated after restarts instead of being reused?
- Could caching or sharing have saved time/cost?
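One way to start answering the first two questions locally is to look for repeated `generate_media` prompt payloads in `streaming_debug.log`. A sketch building on the grep pattern used in Analysis Commands (matches within a single log line):

```bash
# Sketch: flag prompt payloads that appear more than once in the debug log -
# repeated identical prompts hint at duplicated expensive generation work.
find_duplicate_prompts() {
  grep -o '"prompts": \[[^]]*\]' "$1" | sort | uniq -c | awk '$1 > 1'
}
```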
5. Agent Behavior & Decision Making
- How did agents evaluate previous answers? What reasoning did they provide?
- How did agents decide between voting vs providing a new answer?
- Did agents genuinely build upon each other's work or work in isolation?
- Were there timeouts or incomplete rounds?
6. Cost & Token Analysis
- Total cost and breakdown by agent
- Token usage (input, output, reasoning, cached)
- Cache hit rate
7. Errors & Issues
- Any exceptions or failures?
- Any timeouts?
- Any agent errors?
8. Agent Reasoning & Behavior Analysis (CRITICAL)
This is the most important section. Analyzing how agents think and act reveals root causes of successes and failures.
Data Sources:
- `agent_outputs/agent_*.txt` - Full output including reasoning (if available)
- `agent_*/*/execution_trace.md` - Complete tool calls with arguments and results
- `streaming_debug.log` - Raw streaming chunks
Note: Some models don't emit explicit reasoning traces. For these, analyze tool call patterns and content instead - the sequence of actions still reveals decision-making.
For EACH agent, analyze:
- Strategy - What approach did they take? (from reasoning OR tool sequence)
- Tool Responses - How did they handle successes/failures/inconsistencies?
- Error Recovery - Did they detect problems? Implement workarounds?
- Decision Quality - Logical errors? Over/under-verification? Analysis paralysis?
- Cross-Agent Comparison - Which had best reasoning? What patterns led to success?
Key Patterns:
| Pattern | Good Sign | Bad Sign |
|---|---|---|
| Failure detection | Pivots after 2-3 failures | Repeats broken approach 6+ times |
| Result validation | Cross-validates outputs | Accepts first result blindly |
| Inconsistency handling | Investigates conflicts | Ignores contradictions |
| Workarounds | Creative alternatives when stuck | Gives up or loops |
| Time management | Commits when confident | Endless verification, no answer |
Extract Key Evidence: For each agent, include 2-3 quotes (if reasoning available) OR describe key tool sequences that illustrate their decision quality.
9. Tool Reliability Analysis
Analyze tool behavior patterns beyond simple error listing:
- Consistency - Same input, same output? Document variance.
- False Positives/Negatives - Tools reporting wrong success/failure status?
- Root Cause Hypotheses - For each failure pattern, propose likely causes (path issues, rate limits, model limitations, etc.)
10. Enforcement & Workflow Reliability Analysis
Data Source: `status.json` → `agents[].reliability`

Check if agents needed retries due to workflow violations. Key metrics:
- `total_enforcement_retries` - How many times the agent was forced to retry
- `total_buffer_chars_lost` - Content discarded due to restarts
- `unknown_tools` - Hallucinated tool names
- `by_round` - Which rounds had issues

Red Flags: >=2 retries per round, >5000 chars lost, populated `unknown_tools` list.

See "Enforcement Reliability" in the Key Local Log Files section for the full schema and reason codes.
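These thresholds can be checked mechanically. A sketch using the fields above; the top-level `agents` array follows the `agents[].reliability` path, while the `id` key is an assumption about the exact `status.json` layout:

```bash
# Sketch: print agents whose reliability metrics trip the red-flag thresholds.
# Field names follow agents[].reliability above; .id is an assumed key.
enforcement_red_flags() {
  jq -r '
    .agents[]
    | select(.reliability != null)
    | select(
        (.reliability.total_enforcement_retries // 0) >= 2
        or (.reliability.total_buffer_chars_lost // 0) > 5000
        or ((.reliability.unknown_tools // []) | length) > 0
      )
    | "\(.id // "?"): retries=\(.reliability.total_enforcement_retries // 0) chars_lost=\(.reliability.total_buffer_chars_lost // 0) unknown_tools=\((.reliability.unknown_tools // []) | join(","))"
  ' "$1"
}
```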
Data Sources for Each Question
| Question | Primary Source | Secondary Source |
|---|---|---|
| Correctness | | Logfire coordination events |
| Efficiency | | Logfire duration queries |
| Command patterns | | - |
| Work duplication | | |
| Agent decisions | | Logfire vote spans |
| Cost/tokens | | Logfire usage attributes |
| Errors | | Logfire |
| Enforcement | | - |
Analysis Commands
Find repeated commands:

```bash
grep -o '"command": "[^"]*"' streaming_debug.log | sed 's/"command": "//;s/"$//' | sort | uniq -c | sort -rn | head -30
```

Find generate_media prompts (to check for duplication):

```bash
grep -o '"prompts": \[.*\]' streaming_debug.log
```

Check vote reasoning:

```bash
cat agent_*/*/vote.json | jq '.reason'
```

Find timeout events:

```bash
cat coordination_events.json | jq '.events[] | select(.event_type == "agent_timeout")'
```
Report Template
Save this report to `[log_dir]/turn_N/ANALYSIS_REPORT.md` (where N is the turn number being analyzed):

MassGen Log Analysis Report
Session: [log_dir name]
Trace ID: [trace_id if available]
Generated: [timestamp]
Logfire Link: [link if available]
Executive Summary
[2-3 sentence summary of the run: what was the task, did it succeed, key findings]
Session Overview
| Metric | Value |
|---|---|
| Duration | X minutes |
| Agents | [list] |
| Winner | [agent_id] |
| Total Cost | $X.XX |
| Total Answers | X |
| Total Votes | X |
| Total Restarts | X |
1. Correctness Analysis
Coordination Flow
[Timeline of key events]
Status
- All phases completed
- All agents submitted answers
- Voting completed correctly
- Winner selected
- Final answer delivered
Issues Found
[List any correctness issues]
2. Efficiency Analysis
Phase Duration Breakdown
| Phase | Count | Avg (s) | Max (s) | Total (s) | % of Total |
|---|---|---|---|---|---|
| initial_answer | | | | | |
| voting | | | | | |
| presentation | | | | | |
Top Bottlenecks
- [Operation] - X seconds (X% of total)
- [Operation] - X seconds
- [Operation] - X seconds
3. Command Pattern Analysis
Frequently Repeated Commands
| Command | Times Run | Issue | Recommendation |
|---|---|---|---|
| `openskills read` | X | Long output (~5KB) re-read after restarts | Cache skill docs |
| `npm install` | X | Reinstalled after each restart | Persist node_modules |
| ... | | | |
Commands with Excessive Output
| Command | Output Size | Issue |
|---|---|---|
Slowest Command Patterns
| Pattern | Max Time | Avg Time | Notes |
|---|---|---|---|
| Web scraping (crawl4ai) | Xs | Xs | |
| npm install | Xs | Xs | |
| PPTX pipeline | Xs | Xs | |
4. Work Duplication Analysis
Duplicated Work Found
| Work Type | Times Repeated | Wasted Time | Wasted Cost |
|---|---|---|---|
| Image generation | X | X min | $X.XX |
| Research/scraping | X | X min | - |
| Package installs | X | X min | - |
Specific Examples
[List specific examples of duplicated work with prompts/commands]
Recommendations
- [Specific recommendation to avoid duplication]
- [Specific recommendation]
5. Agent Behavior Analysis
Answer Progression
| Label | Agent | Time | Summary |
|---|---|---|---|
| agent1.1 | agent_a | HH:MM | [brief description] |
| agent2.1 | agent_b | HH:MM | [brief description] |
| ... |
Voting Analysis
| Voter | Voted For | Reasoning Summary |
|---|---|---|
| agent_b | agent1.1 | "[key quote from reasoning]" |
Vote vs New Answer Decisions
[Explain how agents decided whether to vote or provide new answers]
Agent Collaboration Quality
- Did agents read each other's answers? [Yes/No with evidence]
- Did agents build upon previous work? [Yes/No with evidence]
- Did agents provide genuine evaluation? [Yes/No with evidence]
Timeouts/Incomplete Rounds
[List any timeouts with context]
6. Cost & Token Analysis
Cost Breakdown
| Agent | Input Tokens | Output Tokens | Reasoning | Cost |
|---|---|---|---|---|
| agent_a | | | | $X.XX |
| agent_b | | | | $X.XX |
| Total | | | | $X.XX |
Cache Efficiency
- Cached input tokens: X (X% cache hit rate)
Tool Cost Impact
| Tool | Calls | Est. Time Cost | Notes |
|---|---|---|---|
| generate_media | X | X min | |
| command_line | X | X min |
7. Errors & Issues
Exceptions
[List any exceptions with type and message]
Failed Tool Calls
[List any failed tools]
Agent Errors
[List any agent-level errors]
Timeouts
[List any timeouts with duration and context]
8. Recommendations
High Priority
- [Issue]: [Specific actionable recommendation]
- [Issue]: [Specific actionable recommendation]
Medium Priority
- [Issue]: [Recommendation]
Low Priority / Future Improvements
- [Issue]: [Recommendation]
9. Suggested Linear Issues
Based on the analysis, the following issues are suggested for tracking. If you have access to the Linear project and the session is interactive, present these to the user for approval before creating them. Regardless of access, write them up in a section like the one below; the point is to learn from the logs so we can propose, and later solve, concrete issues:
| Priority | Title | Description | Labels |
|---|---|---|---|
| High | [Short title] | [1-2 sentence description] | log-analysis, [area] |
| Medium | [Short title] | [Description] | log-analysis, [area] |
After user approval, create issues in Linear with:
- Project: MassGen
- Label: `log-analysis` (to identify issues from log analysis)
- Additional labels as appropriate (e.g., `performance`, `agent-behavior`, `tooling`)
Appendix
Configuration Used
[Key config settings from execution_metadata.yaml]
Files Generated
[List of output files in the workspace]
Workflow for Generating Report
- Read local files first (metrics_summary.json, coordination_table.txt, coordination_events.json)
- Query Logfire for trace_id and timing data (if available; wait and retry on rate limits)
- Analyze streaming_debug.log for command patterns
- Check vote.json files for agent reasoning
- Generate the report using the template
- Save to `[log_dir]/turn_N/ANALYSIS_REPORT.md` (N = turn number being analyzed)
- Print a summary to the user
- Suggest Linear issues based on findings; present them to the user for approval if the session is interactive
- Create approved issues in Linear with the `log-analysis` label
Part 7: Quick Reference - SQL Queries
Correctness Queries
```sql
-- Check coordination flow
SELECT span_name, start_timestamp, duration
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND (span_name LIKE 'Coordination event:%'
       OR span_name LIKE 'Agent answer:%'
       OR span_name LIKE 'Agent vote:%'
       OR span_name LIKE 'Winner selected:%'
       OR span_name LIKE 'Final answer%')
ORDER BY start_timestamp
```
Efficiency Queries
```sql
-- Phase duration breakdown
SELECT
  CASE
    WHEN span_name LIKE 'agent.%.round_0' THEN 'initial_answer'
    WHEN span_name LIKE 'agent.%.round_%' THEN 'voting'
    WHEN span_name LIKE 'agent.%.presentation' THEN 'presentation'
    ELSE 'other'
  END as phase,
  COUNT(*) as count,
  ROUND(AVG(duration)::numeric, 2) as avg_duration_sec,
  ROUND(MAX(duration)::numeric, 2) as max_duration_sec,
  ROUND(SUM(duration)::numeric, 2) as total_duration_sec
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND span_name LIKE 'agent.%'
GROUP BY 1
ORDER BY total_duration_sec DESC
```
Error Queries
```sql
-- Find all exceptions
SELECT span_name, exception_type, exception_message, start_timestamp
FROM records
WHERE trace_id = '[TRACE_ID]' AND is_exception = true
ORDER BY start_timestamp
```
Cost Queries
```sql
-- Token usage by agent
SELECT
  attributes->>'massgen.agent_id' as agent,
  SUM((attributes->'massgen.usage.input')::int) as total_input_tokens,
  SUM((attributes->'massgen.usage.output')::int) as total_output_tokens,
  SUM((attributes->'massgen.usage.cost')::float) as total_cost_usd
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND span_name LIKE 'agent.%'
  AND attributes->>'massgen.usage.input' IS NOT NULL
GROUP BY attributes->>'massgen.agent_id'
```