MassGen Log Analyzer


This skill provides a structured workflow for running MassGen experiments and analyzing the resulting traces and logs using Logfire.

Purpose


The log-analyzer skill helps you:
  • Run MassGen experiments with proper instrumentation
  • Query and analyze traces hierarchically
  • Debug agent behavior and coordination patterns
  • Measure performance and identify bottlenecks
  • Improve the logging structure itself
  • Generate markdown analysis reports saved to the log directory

CLI Quick Reference


The `massgen logs` CLI provides quick access to log analysis:

List Logs with Analysis Status

```bash
uv run massgen logs list                    # Show all recent logs with analysis status
uv run massgen logs list --analyzed         # Only logs with ANALYSIS_REPORT.md
uv run massgen logs list --unanalyzed       # Only logs needing analysis
uv run massgen logs list --limit 20         # Show more logs
```

Generate Analysis Prompt

```bash
# Run from within your coding CLI (e.g., Claude Code) so it sees output
uv run massgen logs analyze                 # Analyze latest turn of latest log
uv run massgen logs analyze --log-dir PATH  # Analyze specific log
uv run massgen logs analyze --turn 1        # Analyze specific turn
```

The prompt output tells your coding CLI to use this skill on the specified log directory.

Multi-Agent Self-Analysis

```bash
uv run massgen logs analyze --mode self                 # Run 3-agent analysis team (prompts if report exists)
uv run massgen logs analyze --mode self --force         # Overwrite existing report without prompting
uv run massgen logs analyze --mode self --turn 2        # Analyze specific turn
uv run massgen logs analyze --mode self --config PATH   # Use custom config
```
Self-analysis mode runs MassGen with multiple agents to analyze logs from different perspectives (correctness, efficiency, behavior) and produces a combined ANALYSIS_REPORT.md.

Multi-Turn Sessions

MassGen log directories support multiple turns (coordination sessions). Each turn has its own `turn_N/` directory with attempts inside:
```text
log_YYYYMMDD_HHMMSS/
├── turn_1/                    # First coordination session
│   ├── ANALYSIS_REPORT.md     # Report for turn 1
│   ├── attempt_1/             # First attempt
│   └── attempt_2/             # Retry if orchestration restarted
├── turn_2/                    # Second coordination session (if multi-turn)
│   ├── ANALYSIS_REPORT.md     # Report for turn 2
│   └── attempt_1/
```
When analyzing, the `--turn` flag specifies which turn to analyze. Without it, the latest turn is analyzed.

When to Use Logfire vs Local Logs

Use Local Log Files When:
  • Analyzing command patterns and repetition (commands are in `streaming_debug.log`)
  • Checking detailed tool arguments and outputs (in `coordination_events.json`)
  • Reading vote reasoning and agent decisions (in `agent_*/*/vote.json`)
  • Viewing the coordination flow table (in `coordination_table.txt`)
  • Getting cost/token summaries (in `metrics_summary.json`)
Use Logfire When:
  • You need precise timing data with millisecond accuracy
  • Analyzing span hierarchy and parent-child relationships
  • Finding exceptions and error stack traces
  • Creating shareable trace links for collaboration
  • Querying across multiple sessions (e.g., "find all sessions with errors")
  • Real-time monitoring of running experiments
Rate Limiting: If Logfire returns a rate limit error, wait up to 60 seconds and retry rather than falling back to local logs. The rate limit resets quickly and Logfire data is worth waiting for when timing/hierarchy analysis is needed.
Key Local Log Files:

| File | Contains |
|------|----------|
| `status.json` | Real-time status with agent reliability metrics (enforcement events, buffer loss) |
| `metrics_summary.json` | Cost, tokens, tool stats, round history |
| `coordination_events.json` | Full event timeline with tool calls |
| `coordination_table.txt` | Human-readable coordination flow |
| `streaming_debug.log` | Raw streaming data including command strings |
| `agent_*/*/vote.json` | Vote reasoning and context |
| `agent_*/*/execution_trace.md` | Full tool calls, arguments, results, and reasoning - invaluable for debugging |
| `execution_metadata.yaml` | Config and session metadata |
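As one example of working with these files, the command-repetition check on `streaming_debug.log` can start as a simple line-frequency pass. This sketch treats each non-empty line as a candidate command, which is an approximation — the real log interleaves other streaming data:

```python
# Sketch: surface lines repeated in streaming_debug.log (a cheap repetition check).
# Exact parsing of the streaming format is intentionally left out here.
from collections import Counter
from pathlib import Path

def repeated_lines(log_path: str, threshold: int = 3) -> dict:
    """Map each line that appears >= threshold times to its count."""
    stripped = (line.strip() for line in Path(log_path).read_text().splitlines())
    counts = Counter(line for line in stripped if line)
    return {line: n for line, n in counts.items() if n >= threshold}
```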
Execution Traces (`execution_trace.md`):
These are the most detailed debug artifacts. Each agent snapshot includes an execution trace with:
  • Complete tool calls with full arguments (not truncated)
  • Full tool results (not truncated)
  • Reasoning/thinking blocks from the model
  • Timestamps and round markers
Use execution traces when you need to understand exactly what an agent did and why - they capture everything the agent saw and produced during that answer/vote iteration.
Enforcement Reliability (`status.json`):
The `status.json` file includes per-agent reliability metrics that track workflow enforcement events:
```json
{
  "agents": {
    "agent_a": {
      "reliability": {
        "enforcement_attempts": [
          {
            "round": 0,
            "attempt": 1,
            "max_attempts": 3,
            "reason": "no_workflow_tool",
            "tool_calls": ["search", "read_file"],
            "error_message": "Must use workflow tools",
            "buffer_preview": "First 500 chars of lost content...",
            "buffer_chars": 1500,
            "timestamp": 1736683468.123
          }
        ],
        "by_round": {"0": {"count": 2, "reasons": ["no_workflow_tool", "invalid_vote_id"]}},
        "unknown_tools": ["execute_command"],
        "workflow_errors": ["invalid_vote_id"],
        "total_enforcement_retries": 2,
        "total_buffer_chars_lost": 3000,
        "outcome": "ok"
      }
    }
  }
}
```
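This structure lends itself to a quick per-agent rollup. A minimal sketch follows; field names mirror the example above and may differ across versions, and the function name is illustrative:

```python
# Sketch: roll up enforcement reliability per agent from status.json.
# Field names follow the example above; treat them as illustrative.
import json
from pathlib import Path

def reliability_summary(status_path: str) -> dict:
    status = json.loads(Path(status_path).read_text())
    out = {}
    for agent_id, agent in status.get("agents", {}).items():
        rel = agent.get("reliability", {})
        out[agent_id] = {
            "retries": rel.get("total_enforcement_retries", 0),
            "chars_lost": rel.get("total_buffer_chars_lost", 0),
            "reasons": sorted({a["reason"] for a in rel.get("enforcement_attempts", [])}),
        }
    return out
```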
Enforcement Reason Codes:
| Reason | Description |
|--------|-------------|
| `no_workflow_tool` | Agent called tools but none were `vote` or `new_answer` |
| `no_tool_calls` | Agent provided text-only response, no tools called |
| `invalid_vote_id` | Agent voted for non-existent agent ID |
| `vote_no_answers` | Agent tried to vote when no answers exist |
| `vote_and_answer` | Agent used both `vote` and `new_answer` in same response |
| `answer_limit` | Agent hit max answer count limit |
| `answer_novelty` | Answer too similar to existing answers |
| `answer_duplicate` | Exact duplicate of existing answer |
| `api_error` | API/streaming error (e.g., "peer closed connection") |
| `connection_recovery` | API stream ended early, recovered with preserved context |
| `mcp_disconnected` | MCP server disconnected mid-session (e.g., "Server 'X' not connected") |
This data is invaluable for understanding why agents needed retries and how much content was lost due to enforcement restarts.

Logfire Setup

Before using this skill, you need to set up Logfire for observability.

Step 1: Install MassGen with Observability Support

```bash
pip install "massgen[observability]"

# Or with uv
uv pip install "massgen[observability]"
```

Step 2: Create a Logfire Account

Go to https://logfire.pydantic.dev/ and create a free account.

Step 3: Authenticate with Logfire

```bash
# This creates ~/.logfire/credentials.json
uv run logfire auth

# Or set the token directly as an environment variable
export LOGFIRE_TOKEN=your_token_here
```

Step 4: Get Your Read Token for the MCP Server

  1. Go to https://logfire.pydantic.dev/ and log in
  2. Navigate to your project settings
  3. Create a Read Token (this is different from the write token used for authentication)
  4. Copy the token for use in Step 5

Step 5: Add the Logfire MCP Server

```bash
claude mcp add logfire -e LOGFIRE_READ_TOKEN="your-read-token-here" -- uvx logfire-mcp@latest
```
Then restart Claude Code and re-invoke this skill.

Prerequisites

Logfire MCP Server (Optional but Recommended): The Logfire MCP server provides enhanced analysis with precise timing data and cross-session queries. If `LOGFIRE_READ_TOKEN` is not set, self-analysis mode will automatically disable the Logfire MCP and fall back to local log files only.
When configured, the MCP server provides these tools:
  • `mcp__logfire__arbitrary_query` - Run SQL queries against logfire data
  • `mcp__logfire__schema_reference` - Get the database schema
  • `mcp__logfire__find_exceptions_in_file` - Find exceptions in a file
  • `mcp__logfire__logfire_link` - Create links to traces in the UI
Required Flags:
  • `--automation` - Clean output for programmatic parsing; see the massgen-develops-massgen skill for more info on this flag
  • `--logfire` - Enable Logfire tracing (optional, but required to populate Logfire data)

Part 1: Running MassGen Experiments


Basic Command Format

```bash
uv run massgen --automation --logfire --config [config_file] "[question]"
```

Running in Background (Recommended)

Use `run_in_background: true` (or however you run tasks in the background) to run experiments asynchronously so you can monitor progress and end early if needed.
Expected Output (first lines):
```text
LOG_DIR: .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff
STATUS: .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/turn_1/attempt_1/status.json
QUESTION: Your task here
[Coordination in progress - monitor status.json for real-time updates]
```
Parse the LOG_DIR - you'll need this for file-based analysis!

Monitoring Progress

`status.json` updates every 2 seconds; use that to track progress.
```bash
cat [log_dir]/turn_1/attempt_1/status.json
```
Key fields to monitor:
  • `coordination.completion_percentage` (0-100)
  • `coordination.phase` - "initial_answer", "enforcement", "presentation"
  • `results.winner` - null while running, agent_id when complete
  • `agents[].status` - "waiting", "streaming", "answered", "voted", "error"
  • `agents[].error` - null if ok, error details if failed
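These fields can drive a simple polling loop. A sketch that waits for `results.winner` to be set; the interval matches the documented 2-second update cadence, and the function name is illustrative:

```python
# Sketch: poll status.json until a winner is recorded or we time out.
import json
import time
from pathlib import Path

def wait_for_winner(status_path: str, timeout: float = 600.0, poll: float = 2.0) -> str:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = json.loads(Path(status_path).read_text())
        winner = status.get("results", {}).get("winner")
        if winner is not None:
            return winner  # agent_id of the winning agent
        time.sleep(poll)
    raise TimeoutError(f"no winner in {status_path} after {timeout}s")
```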

Reading Final Results

After completion (exit code 0):
```bash
# Read the final answer
cat [log_dir]/turn_1/attempt_1/final/[winner]/answer.txt
```

**Other useful files:**
- `execution_metadata.yaml` - Full config and execution details
- `coordination_events.json` - Complete event log
- `coordination_table.txt` - Human-readable coordination summary

Part 2: Querying Logfire


Database Schema


The main table is `records`, with these key columns:

| Column | Description |
|--------|-------------|
| `span_name` | Name of the span (e.g., "agent.agent_a.round_0") |
| `span_id` | Unique identifier for this span |
| `parent_span_id` | ID of the parent span (null for root) |
| `trace_id` | Groups all spans in a single trace |
| `duration` | Time in seconds |
| `start_timestamp` | When the span started |
| `end_timestamp` | When the span ended |
| `attributes` | JSON blob with custom attributes |
| `message` | Log message |
| `is_exception` | Boolean for errors |
| `exception_type` | Type of exception if any |
| `exception_message` | Exception message |

MassGen Span Hierarchy

MassGen creates hierarchical spans:
```text
coordination.session (root)
├── Coordination event: coordination_started
├── agent.agent_a.round_0
│   ├── llm.openrouter.stream
│   ├── mcp.filesystem.write_file
│   └── Tool execution: mcp__filesystem__write_file
├── agent.agent_b.round_0
├── Agent answer: agent1.1
├── agent.agent_a.round_1 (voting round)
├── Agent vote: agent_a -> agent1.1
├── Coordination event: winner_selected
└── agent.agent_a.presentation
    ├── Winner selected: agent1.1
    ├── llm.openrouter.stream
    └── Final answer from agent_a
```

Custom Attributes

MassGen spans include these custom attributes (access via `attributes->'key'`):

| Attribute | Description |
|-----------|-------------|
| `massgen.agent_id` | Agent identifier (agent_a, agent_b) |
| `massgen.iteration` | Current iteration number |
| `massgen.round` | Round number for this agent |
| `massgen.round_type` | "initial_answer", "voting", or "presentation" |
| `massgen.backend` | Backend provider name |
| `massgen.num_context_answers` | Number of answers in context |
| `massgen.is_winner` | True for presentation spans |
| `massgen.outcome` | "vote", "answer", or "error" (set after round completes) |
| `massgen.voted_for` | Agent ID voted for (only set for votes) |
| `massgen.voted_for_label` | Answer label voted for (e.g., "agent1.1", only set for votes) |
| `massgen.answer_label` | Answer label assigned (e.g., "agent1.1", only set for answers) |
| `massgen.error_message` | Error message (only set when outcome is "error") |
| `massgen.usage.input` | Input token count |
| `massgen.usage.output` | Output token count |
| `massgen.usage.reasoning` | Reasoning token count |
| `massgen.usage.cached_input` | Cached input token count |
| `massgen.usage.cost` | Estimated cost in USD |
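The `massgen.usage.*` keys can be aggregated client-side once query results are in hand. A sketch, assuming each row's `attributes` JSON has already been decoded into a Python dict (the function name is illustrative):

```python
# Sketch: aggregate token/cost usage from decoded span attribute dicts.
def usage_totals(attribute_rows) -> dict:
    totals = {"input": 0, "output": 0, "cost": 0.0}
    for attrs in attribute_rows:
        # Keys may be absent on spans without usage data; default to zero.
        totals["input"] += int(attrs.get("massgen.usage.input", 0) or 0)
        totals["output"] += int(attrs.get("massgen.usage.output", 0) or 0)
        totals["cost"] += float(attrs.get("massgen.usage.cost", 0.0) or 0.0)
    return totals
```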

Part 3: Common Analysis Queries


1. View Trace Hierarchy

```sql
SELECT span_name, span_id, parent_span_id, duration, start_timestamp
FROM records
WHERE trace_id = '[YOUR_TRACE_ID]'
ORDER BY start_timestamp
LIMIT 50
```

2. Find Recent Sessions

```sql
SELECT span_name, trace_id, duration, start_timestamp
FROM records
WHERE span_name = 'coordination.session'
ORDER BY start_timestamp DESC
LIMIT 10
```

3. Agent Round Performance

```sql
SELECT
  span_name,
  duration,
  attributes->>'massgen.agent_id' as agent_id,
  attributes->>'massgen.round' as round,
  attributes->>'massgen.round_type' as round_type
FROM records
WHERE span_name LIKE 'agent.%'
ORDER BY start_timestamp DESC
LIMIT 20
```

4. Tool Call Analysis

```sql
SELECT
  span_name,
  duration,
  parent_span_id,
  start_timestamp
FROM records
WHERE span_name LIKE 'mcp.%' OR span_name LIKE 'Tool execution:%'
ORDER BY start_timestamp DESC
LIMIT 30
```

5. Find Errors

```sql
SELECT
  span_name,
  exception_type,
  exception_message,
  trace_id,
  start_timestamp
FROM records
WHERE is_exception = true
ORDER BY start_timestamp DESC
LIMIT 20
```

6. LLM Call Performance

```sql
SELECT
  span_name,
  duration,
  attributes->>'gen_ai.request.model' as model,
  start_timestamp
FROM records
WHERE span_name LIKE 'llm.%'
ORDER BY start_timestamp DESC
LIMIT 30
```

7. Full Trace with Hierarchy (Nested View)

```sql
SELECT
  CASE
    WHEN parent_span_id IS NULL THEN span_name
    ELSE '  └─ ' || span_name
  END as hierarchy,
  duration,
  span_id,
  parent_span_id
FROM records
WHERE trace_id = '[YOUR_TRACE_ID]'
ORDER BY start_timestamp
```
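The SQL above indents only one level; for arbitrary depth, rows of (span_name, span_id, parent_span_id) can be rendered recursively instead. A sketch, assuming rows arrive in start_timestamp order and root spans have a null parent:

```python
# Sketch: render a span tree from (span_name, span_id, parent_span_id) rows.
def render_hierarchy(rows) -> str:
    children = {}
    for name, sid, parent in rows:
        children.setdefault(parent, []).append((name, sid))
    lines = []
    def walk(parent, depth):
        for name, sid in children.get(parent, []):
            lines.append("  " * depth + ("└─ " if depth else "") + name)
            walk(sid, depth + 1)
    walk(None, 0)  # roots have parent_span_id = None
    return "\n".join(lines)
```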

8. Coordination Events Timeline

```sql
SELECT span_name, message, duration, start_timestamp
FROM records
WHERE span_name LIKE 'Coordination event:%'
   OR span_name LIKE 'Agent answer:%'
   OR span_name LIKE 'Agent vote:%'
   OR span_name LIKE 'Winner selected:%'
ORDER BY start_timestamp DESC
LIMIT 30
```

Part 4: Analysis Workflow


Step 1: Run Experiment

```bash
uv run massgen --automation --logfire --config [config] "[prompt]" 2>&1
```

Step 2: Find the Trace

Query for recent sessions:
```sql
SELECT trace_id, duration, start_timestamp
FROM records
WHERE span_name = 'coordination.session'
ORDER BY start_timestamp DESC
LIMIT 5
```

Step 3: Analyze Hierarchy

Get full trace structure:
```sql
SELECT span_name, span_id, parent_span_id, duration
FROM records
WHERE trace_id = '[trace_id_from_step_2]'
ORDER BY start_timestamp
```

Step 4: Investigate Specific Issues

Slow tool calls:
```sql
SELECT span_name, duration, parent_span_id
FROM records
WHERE trace_id = '[trace_id]' AND span_name LIKE 'mcp.%'
ORDER BY duration DESC
```
Agent comparison:
```sql
SELECT
  attributes->>'massgen.agent_id' as agent,
  COUNT(*) as rounds,
  SUM(duration) as total_time,
  AVG(duration) as avg_round_time
FROM records
WHERE trace_id = '[trace_id]' AND span_name LIKE 'agent.%'
GROUP BY attributes->>'massgen.agent_id'
```

Step 5: Create Trace Link

Use the MCP tool to create a viewable link: `mcp__logfire__logfire_link(trace_id="[your_trace_id]")`

Part 5: Improving the Logging Structure


Current Span Types


| Span Pattern | Source | Description |
|--------------|--------|-------------|
| `coordination.session` | coordination_tracker.py | Root session span |
| `agent.{id}.round_{n}` | orchestrator.py | Agent execution round |
| `agent.{id}.presentation` | orchestrator.py | Winner's final presentation |
| `mcp.{server}.{tool}` | mcp_tools/client.py | MCP tool execution |
| `llm.{provider}.stream` | backends | LLM streaming call |
| `Tool execution: {name}` | base_with_custom_tool.py | Tool wrapper |
| `Coordination event: *` | coordination_tracker.py | Coordination events |
| `Agent answer: {label}` | coordination_tracker.py | Answer submission |
| `Agent vote: {from} -> {to}` | coordination_tracker.py | Vote cast |

Adding New Spans

Use the tracer from structured_logging:
```python
from massgen.structured_logging import get_tracer

tracer = get_tracer()
with tracer.span("my_operation", attributes={
    "massgen.custom_key": "value",
}):
    do_work()
```

Context Propagation Notes

Known limitation: When multiple agents run concurrently via `asyncio.create_task`, child spans may not nest correctly under agent round spans. This is an OpenTelemetry context propagation issue with concurrent async code. The presentation phase works correctly because only one agent runs.
Workaround: For accurate nesting in concurrent scenarios, explicit context passing with `contextvars.copy_context()` would be needed.

Logfire Documentation Reference


Key Pages to Know


| Topic | URL | Description |
|-------|-----|-------------|
| Getting Started | /docs/ | Overview, setup, and core concepts |
| Manual Tracing | /docs/guides/onboarding-checklist/add-manual-tracing/ | Creating spans, adding attributes |
| SQL Explorer | /docs/guides/web-ui/explore/ | Writing SQL queries in the UI |
| Live View | /docs/guides/web-ui/live/ | Real-time trace monitoring |
| Query API | /docs/how-to-guides/query-api/ | Programmatic access to data |
| OpenAI Integration | /docs/integrations/llms/openai/ | LLM call instrumentation |

Logfire Concepts


Spans vs Logs:
  • Spans represent operations with measurable duration (use `with logfire.span():`)
  • Logs capture point-in-time events (use `logfire.info()`, `logfire.error()`, etc.)
  • Spans and logs inside a span block become children of that span
Span Names vs Messages:
  • `span_name` = the first argument (used for filtering, keep low-cardinality)
  • `message` = formatted result with attribute values interpolated
  • Example: `logfire.info('Hello {name}', name='Alice')` → span_name="Hello {name}", message="Hello Alice"
Attributes:
  • Keyword arguments become structured JSON attributes
  • Access in SQL via `attributes->>'key'` or `attributes->'key'`
  • Cast when needed: `(attributes->'cost')::float`

Live View Features


The Logfire Live View UI (https://logfire.pydantic.dev/) provides:
  • Real-time streaming of traces as they arrive
  • SQL search pane (press `/` to open) with auto-complete
  • Natural language to SQL - describe what you want and get a query
  • Timeline histogram showing span counts over time
  • Trace details panel with attributes, exceptions, and OpenTelemetry data
  • Cross-linking between SQL results and trace view via trace_id/span_id

SQL Explorer Tips


The Explore page uses Apache DataFusion SQL syntax (similar to Postgres):
```sql
-- Subqueries and CTEs work
WITH recent AS (
  SELECT * FROM records
  WHERE start_timestamp > now() - interval '1 hour'
)
SELECT * FROM recent WHERE is_exception;

-- Access nested JSON
SELECT attributes->>'massgen.agent_id' as agent FROM records;

-- Cast JSON values
SELECT (attributes->'token_count')::int as tokens FROM records;

-- Time filtering is efficient
WHERE start_timestamp > now() - interval '30 minutes'
```

LLM Instrumentation


Logfire auto-instruments OpenAI calls when configured:
  • Captures conversation display, token usage, response metadata
  • Creates separate spans for streaming requests vs responses
  • Works with both sync and async clients
MassGen's backends use this for `llm.{provider}.stream` spans.

Reference Documentation


Logfire:
MassGen:
  • Automation mode: AI_USAGE.md
  • Status file reference: docs/source/reference/status_file.rst
  • Structured logging: massgen/structured_logging.py

Tips for Effective Analysis


  1. Always use both flags: --automation --logfire together
  2. Run in background for long tasks to monitor progress
  3. Query by trace_id to isolate specific sessions
  4. Check parent_span_id to understand hierarchy
  5. Use duration to identify bottlenecks
  6. Look at attributes for MassGen-specific context
  7. Create trace links to share findings with team
Note that you may get an error like so:
bash
Error: Error executing tool arbitrary_query: b'{"detail":"Rate limit exceeded for organization xxx: per minute
     limit reached."}'
In this case, please sleep (for up to a minute) and try again.
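The sleep-and-retry advice can be sketched as a small backoff helper (a minimal sketch; `RuntimeError` stands in for however your tooling surfaces the rate-limit error):

```python
import time

def with_backoff(fn, max_attempts=3, base_sleep=20):
    """Retry fn, sleeping longer after each rate-limit style failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RuntimeError:  # stand-in for the rate-limit error
            if attempt == max_attempts:
                raise
            time.sleep(base_sleep * attempt)  # backs off up to ~1 minute total
```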

Part 6: Comprehensive Log Analysis Report


When asked to analyze a MassGen log run, generate a markdown report saved to
[log_dir]/turn_N/ANALYSIS_REPORT.md
where N is the turn being analyzed. Each turn (coordination session) gets its own analysis report as a sibling to the attempt directories. The report must cover the Standard Analysis Questions below.

Important: Ground Truth and Correctness


CRITICAL: Do not assume any agent's answer is "correct" unless the user explicitly provides ground truth.
  • Report what each agent claimed/produced without asserting correctness
  • Note when agents agree or disagree, but don't claim agreement = correctness
  • If agents produce different answers, present both neutrally
  • Only mark answers as correct/incorrect if user provides the actual answer
  • Phrases to avoid: "correctly identified", "got the right answer", "solved correctly"
  • Phrases to use: "claimed", "produced", "submitted", "arrived at"

Standard Analysis Questions


Every analysis report MUST answer these questions:

1. Correctness


  • Did coordination complete successfully?
  • Did all agents submit answers?
  • Did voting occur correctly?
  • Was a winner selected and did they provide a final answer?
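A quick way to start answering these from coordination_events.json is to tally events by type (a sketch; the events/event_type layout matches the jq examples in the Analysis Commands section, but treat specific event_type values other than agent_timeout as assumptions about your log version):

```python
import collections
import json

def event_counts(events_path):
    """Count coordination events by event_type from coordination_events.json."""
    with open(events_path) as f:
        events = json.load(f).get("events", [])
    return collections.Counter(e.get("event_type", "unknown") for e in events)
```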

2. Efficiency & Bottlenecks


  • What was the total duration and breakdown by phase?
  • What were the slowest operations?
  • Which tools took the most time?

3. Command Pattern Analysis


  • Were there frequently repeated commands that could be avoided? (e.g., openskills read, npm install, ls -R)
  • What commands produced unnecessarily long output? (e.g., skill docs, directory listings)
  • What were the slowest execute_command patterns? (e.g., web scraping, package installs)

4. Work Duplication Analysis


  • Was expensive work (like image generation) unnecessarily redone?
  • Did both agents generate similar/identical assets?
  • Were assets regenerated after restarts instead of being reused?
  • Could caching or sharing have saved time/cost?

5. Agent Behavior & Decision Making


  • How did agents evaluate previous answers? What reasoning did they provide?
  • How did agents decide between voting vs providing a new answer?
  • Did agents genuinely build upon each other's work or work in isolation?
  • Were there timeouts or incomplete rounds?

6. Cost & Token Analysis


  • Total cost and breakdown by agent
  • Token usage (input, output, reasoning, cached)
  • Cache hit rate

7. Errors & Issues


  • Any exceptions or failures?
  • Any timeouts?
  • Any agent errors?

8. Agent Reasoning & Behavior Analysis (CRITICAL)


This is the most important section. Analyzing how agents think and act reveals root causes of successes and failures.
Data Sources:
  • agent_outputs/agent_*.txt - Full output including reasoning (if available)
  • agent_*/*/execution_trace.md - Complete tool calls with arguments and results
  • streaming_debug.log - Raw streaming chunks
Note: Some models don't emit explicit reasoning traces. For these, analyze tool call patterns and content instead - the sequence of actions still reveals decision-making.
For EACH agent, analyze:
  1. Strategy - What approach did they take? (from reasoning OR tool sequence)
  2. Tool Responses - How did they handle successes/failures/inconsistencies?
  3. Error Recovery - Did they detect problems? Implement workarounds?
  4. Decision Quality - Logical errors? Over/under-verification? Analysis paralysis?
  5. Cross-Agent Comparison - Which had best reasoning? What patterns led to success?
Key Patterns:
| Pattern | Good Sign | Bad Sign |
|---|---|---|
| Failure detection | Pivots after 2-3 failures | Repeats broken approach 6+ times |
| Result validation | Cross-validates outputs | Accepts first result blindly |
| Inconsistency handling | Investigates conflicts | Ignores contradictions |
| Workarounds | Creative alternatives when stuck | Gives up or loops |
| Time management | Commits when confident | Endless verification, no answer |
Extract Key Evidence: For each agent, include 2-3 quotes (if reasoning available) OR describe key tool sequences that illustrate their decision quality.

9. Tool Reliability Analysis


Analyze tool behavior patterns beyond simple error listing:
  1. Consistency - Same input, same output? Document variance.
  2. False Positives/Negatives - Tools reporting wrong success/failure status?
  3. Root Cause Hypotheses - For each failure pattern, propose likely causes (path issues, rate limits, model limitations, etc.)

10. Enforcement & Workflow Reliability Analysis


Data Source: status.json → agents[].reliability
Check if agents needed retries due to workflow violations. Key metrics:
  • total_enforcement_retries - How many times the agent was forced to retry
  • total_buffer_chars_lost - Content discarded due to restarts
  • unknown_tools - Hallucinated tool names
  • by_round - Which rounds had issues
Red Flags: >=2 retries per round, >5000 chars lost, a populated unknown_tools list.
See "Enforcement Reliability" in the Key Local Log Files section for the full schema and reason codes.
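The red-flag check can be automated as a small sketch (field names follow the schema described in this section; the agent "id" key is an assumption — adjust to what your status.json actually contains):

```python
import json

def enforcement_red_flags(status_path, max_retries=2, max_chars_lost=5000):
    """Return agent ids whose reliability metrics cross the red-flag thresholds."""
    with open(status_path) as f:
        status = json.load(f)
    flagged = []
    for agent in status.get("agents", []):
        rel = agent.get("reliability", {})
        if (rel.get("total_enforcement_retries", 0) >= max_retries
                or rel.get("total_buffer_chars_lost", 0) > max_chars_lost
                or rel.get("unknown_tools")):  # non-empty list of hallucinated tools
            flagged.append(agent.get("id", "unknown"))
    return flagged
```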

Data Sources for Each Question


| Question | Primary Source | Secondary Source |
|---|---|---|
| Correctness | coordination_events.json, coordination_table.txt | Logfire coordination events |
| Efficiency | metrics_summary.json | Logfire duration queries |
| Command patterns | streaming_debug.log (grep for "command":) | - |
| Work duplication | streaming_debug.log (grep for tool prompts/args) | metrics_summary.json tool counts |
| Agent decisions | agent_*/*/vote.json, coordination_events.json | Logfire vote spans |
| Cost/tokens | metrics_summary.json | Logfire usage attributes |
| Errors | coordination_events.json, metrics_summary.json | Logfire is_exception=true |
| Enforcement | status.json agents[].reliability | - |

Analysis Commands


Find repeated commands:
bash
grep -o '"command": "[^"]*"' streaming_debug.log | sed 's/"command": "//;s/"$//' | sort | uniq -c | sort -rn | head -30
Find generate_media prompts (to check for duplication):
bash
grep -o '"prompts": \[.*\]' streaming_debug.log
Check vote reasoning:
bash
cat agent_*/*/vote.json | jq '.reason'
Find timeout events:
bash
cat coordination_events.json | jq '.events[] | select(.event_type == "agent_timeout")'
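The repeated-command grep above can also be done in a single pass (a sketch; it relies on the same `"command": "..."` pattern assumption about streaming_debug.log as the grep):

```python
import collections
import re

def top_commands(log_path, n=30):
    """Count occurrences of "command": "..." values in streaming_debug.log."""
    counts = collections.Counter()
    pattern = re.compile(r'"command": "([^"]*)"')
    with open(log_path) as f:
        for line in f:
            counts.update(pattern.findall(line))
    return counts.most_common(n)
```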

Report Template


Save this report to [log_dir]/turn_N/ANALYSIS_REPORT.md (where N is the turn number being analyzed):

MassGen Log Analysis Report


Session: [log_dir name]
Trace ID: [trace_id if available]
Generated: [timestamp]
Logfire Link: [link if available]

Executive Summary


[2-3 sentence summary of the run: what was the task, did it succeed, key findings]

Session Overview


| Metric | Value |
|---|---|
| Duration | X minutes |
| Agents | [list] |
| Winner | [agent_id] |
| Total Cost | $X.XX |
| Total Answers | X |
| Total Votes | X |
| Total Restarts | X |

1. Correctness Analysis


Coordination Flow


[Timeline of key events]

Status


  • All phases completed
  • All agents submitted answers
  • Voting completed correctly
  • Winner selected
  • Final answer delivered

Issues Found


[List any correctness issues]

2. Efficiency Analysis


Phase Duration Breakdown


| Phase | Count | Avg (s) | Max (s) | Total (s) | % of Total |
|---|---|---|---|---|---|
| initial_answer | | | | | |
| voting | | | | | |
| presentation | | | | | |

Top Bottlenecks


  1. [Operation] - X seconds (X% of total)
  2. [Operation] - X seconds
  3. [Operation] - X seconds

3. Command Pattern Analysis


Frequently Repeated Commands


| Command | Times Run | Issue | Recommendation |
|---|---|---|---|
| openskills read pptx | X | Long output (~5KB) re-read after restarts | Cache skill docs |
| npm install ... | X | Reinstalled after each restart | Persist node_modules |
| ... | | | |

Commands with Excessive Output


| Command | Output Size | Issue |
|---|---|---|

Slowest Command Patterns


| Pattern | Max Time | Avg Time | Notes |
|---|---|---|---|
| Web scraping (crawl4ai) | Xs | Xs | |
| npm install | Xs | Xs | |
| PPTX pipeline | Xs | Xs | |

4. Work Duplication Analysis


Duplicated Work Found


| Work Type | Times Repeated | Wasted Time | Wasted Cost |
|---|---|---|---|
| Image generation | X | X min | $X.XX |
| Research/scraping | X | X min | - |
| Package installs | X | X min | - |

Specific Examples


[List specific examples of duplicated work with prompts/commands]

Recommendations


  1. [Specific recommendation to avoid duplication]
  2. [Specific recommendation]

5. Agent Behavior Analysis


Answer Progression


| Label | Agent | Time | Summary |
|---|---|---|---|
| agent1.1 | agent_a | HH:MM | [brief description] |
| agent2.1 | agent_b | HH:MM | [brief description] |
| ... | | | |

Voting Analysis


| Voter | Voted For | Reasoning Summary |
|---|---|---|
| agent_b | agent1.1 | "[key quote from reasoning]" |

Vote vs New Answer Decisions


[Explain how agents decided whether to vote or provide new answers]

Agent Collaboration Quality


  • Did agents read each other's answers? [Yes/No with evidence]
  • Did agents build upon previous work? [Yes/No with evidence]
  • Did agents provide genuine evaluation? [Yes/No with evidence]

Timeouts/Incomplete Rounds


[List any timeouts with context]

6. Cost & Token Analysis


Cost Breakdown


| Agent | Input Tokens | Output Tokens | Reasoning | Cost |
|---|---|---|---|---|
| agent_a | | | | $X.XX |
| agent_b | | | | $X.XX |
| Total | | | | $X.XX |

Cache Efficiency


  • Cached input tokens: X (X% cache hit rate)

Tool Cost Impact


| Tool | Calls | Est. Time Cost | Notes |
|---|---|---|---|
| generate_media | X | X min | |
| command_line | X | X min | |

7. Errors & Issues


Exceptions


[List any exceptions with type and message]

Failed Tool Calls


[List any failed tools]

Agent Errors


[List any agent-level errors]

Timeouts


[List any timeouts with duration and context]

8. Recommendations


High Priority


  1. [Issue]: [Specific actionable recommendation]
  2. [Issue]: [Specific actionable recommendation]

Medium Priority


  1. [Issue]: [Recommendation]

Low Priority / Future Improvements


  1. [Issue]: [Recommendation]

9. Suggested Linear Issues


Based on the analysis, the following issues are suggested for tracking. If you have access to the Linear project and the session is interactive, present these to the user for approval before creating them. Even without access, write them out in a section like the table below, since the goal is to learn from the logs and turn findings into concrete, solvable issues:

| Priority | Title | Description | Labels |
|---|---|---|---|
| High | [Short title] | [1-2 sentence description] | log-analysis, [area] |
| Medium | [Short title] | [Description] | log-analysis, [area] |
After user approval, create issues in Linear with:
  • Project: MassGen
  • Label: log-analysis (to identify issues from log analysis)
  • Additional labels as appropriate (e.g., performance, agent-behavior, tooling)

Appendix


Configuration Used


[Key config settings from execution_metadata.yaml]

Files Generated


[List of output files in the workspace]
undefined

Workflow for Generating Report


  1. Read local files first (metrics_summary.json, coordination_table.txt, coordination_events.json)
  2. Query Logfire for trace_id and timing data (if available; wait and retry on rate limits)
  3. Analyze streaming_debug.log for command patterns
  4. Check vote.json files for agent reasoning
  5. Generate the report using the template
  6. Save to [log_dir]/turn_N/ANALYSIS_REPORT.md (N = turn number being analyzed)
  7. Print summary to the user
  8. Suggest Linear issues based on findings - present to user for approval, if session is interactive
  9. Create approved issues in Linear with the log-analysis label
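The save step can be sketched as a small path helper (a minimal sketch; the turn_N directory layout follows this document):

```python
from pathlib import Path

def report_path(log_dir, turn=None):
    """Resolve [log_dir]/turn_N/ANALYSIS_REPORT.md, defaulting to the latest turn."""
    log_dir = Path(log_dir)
    if turn is None:
        turns = sorted(int(p.name.split("_")[1]) for p in log_dir.glob("turn_*"))
        if not turns:
            raise FileNotFoundError(f"no turn_* directories in {log_dir}")
        turn = turns[-1]  # latest turn
    return log_dir / f"turn_{turn}" / "ANALYSIS_REPORT.md"
```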

Part 7: Quick Reference - SQL Queries


Correctness Queries


sql
-- Check coordination flow
SELECT span_name, start_timestamp, duration
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND (span_name LIKE 'Coordination event:%'
       OR span_name LIKE 'Agent answer:%'
       OR span_name LIKE 'Agent vote:%'
       OR span_name LIKE 'Winner selected:%'
       OR span_name LIKE 'Final answer%')
ORDER BY start_timestamp

Efficiency Queries


sql
-- Phase duration breakdown
SELECT
  CASE
    WHEN span_name LIKE 'agent.%.round_0' THEN 'initial_answer'
    WHEN span_name LIKE 'agent.%.round_%' THEN 'voting'
    WHEN span_name LIKE 'agent.%.presentation' THEN 'presentation'
    ELSE 'other'
  END as phase,
  COUNT(*) as count,
  ROUND(AVG(duration)::numeric, 2) as avg_duration_sec,
  ROUND(MAX(duration)::numeric, 2) as max_duration_sec,
  ROUND(SUM(duration)::numeric, 2) as total_duration_sec
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND span_name LIKE 'agent.%'
GROUP BY 1
ORDER BY total_duration_sec DESC

Error Queries


sql
-- Find all exceptions
SELECT span_name, exception_type, exception_message, start_timestamp
FROM records
WHERE trace_id = '[TRACE_ID]' AND is_exception = true
ORDER BY start_timestamp

Cost Queries


sql
-- Token usage by agent
SELECT
  attributes->>'massgen.agent_id' as agent,
  SUM((attributes->'massgen.usage.input')::int) as total_input_tokens,
  SUM((attributes->'massgen.usage.output')::int) as total_output_tokens,
  SUM((attributes->'massgen.usage.cost')::float) as total_cost_usd
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND span_name LIKE 'agent.%'
  AND attributes->>'massgen.usage.input' IS NOT NULL
GROUP BY attributes->>'massgen.agent_id'