MassGen Log Analyzer


This skill provides a structured workflow for running MassGen experiments and analyzing the resulting traces and logs using Logfire.

Purpose


The log-analyzer skill helps you:
  • Run MassGen experiments with proper instrumentation
  • Query and analyze traces hierarchically
  • Debug agent behavior and coordination patterns
  • Measure performance and identify bottlenecks
  • Improve the logging structure itself
  • Generate markdown analysis reports saved to the log directory

CLI Quick Reference


The `massgen logs` CLI provides quick access to log analysis:

List Logs with Analysis Status

```bash
uv run massgen logs list                    # Show all recent logs with analysis status
uv run massgen logs list --analyzed         # Only logs with ANALYSIS_REPORT.md
uv run massgen logs list --unanalyzed       # Only logs needing analysis
uv run massgen logs list --limit 20         # Show more logs
```

Generate Analysis Prompt

```bash
# Run from within your coding CLI (e.g., Claude Code) so it sees output
uv run massgen logs analyze                 # Analyze latest turn of latest log
uv run massgen logs analyze --log-dir PATH  # Analyze specific log
uv run massgen logs analyze --turn 1        # Analyze specific turn
```

The prompt output tells your coding CLI to use this skill on the specified log directory.

Multi-Agent Self-Analysis

```bash
uv run massgen logs analyze --mode self                 # Run 3-agent analysis team (prompts if report exists)
uv run massgen logs analyze --mode self --force         # Overwrite existing report without prompting
uv run massgen logs analyze --mode self --turn 2        # Analyze specific turn
uv run massgen logs analyze --mode self --config PATH   # Use custom config
```
Self-analysis mode runs MassGen with multiple agents to analyze logs from different perspectives (correctness, efficiency, behavior) and produces a combined ANALYSIS_REPORT.md.

Multi-Turn Sessions

MassGen log directories support multiple turns (coordination sessions). Each turn has its own `turn_N/` directory with attempts inside:
```text
log_YYYYMMDD_HHMMSS/
├── turn_1/                    # First coordination session
│   ├── ANALYSIS_REPORT.md     # Report for turn 1
│   ├── attempt_1/             # First attempt
│   └── attempt_2/             # Retry if orchestration restarted
├── turn_2/                    # Second coordination session (if multi-turn)
│   ├── ANALYSIS_REPORT.md     # Report for turn 2
│   └── attempt_1/
```
When analyzing, the `--turn` flag specifies which turn to analyze. Without it, the latest turn is analyzed.

When to Use Logfire vs Local Logs

Use Local Log Files When:
  • Analyzing command patterns and repetition (commands are in `streaming_debug.log`)
  • Checking detailed tool arguments and outputs (in `coordination_events.json`)
  • Reading vote reasoning and agent decisions (in `agent_*/*/vote.json`)
  • Viewing the coordination flow table (in `coordination_table.txt`)
  • Getting cost/token summaries (in `metrics_summary.json`)
Use Logfire When:
  • You need precise timing data with millisecond accuracy
  • Analyzing span hierarchy and parent-child relationships
  • Finding exceptions and error stack traces
  • Creating shareable trace links for collaboration
  • Querying across multiple sessions (e.g., "find all sessions with errors")
  • Real-time monitoring of running experiments
Rate Limiting: If Logfire returns a rate limit error, wait up to 60 seconds and retry rather than falling back to local logs. The rate limit resets quickly and Logfire data is worth waiting for when timing/hierarchy analysis is needed.
Key Local Log Files:

| File | Contains |
|------|----------|
| `status.json` | Real-time status with agent reliability metrics (enforcement events, buffer loss) |
| `metrics_summary.json` | Cost, tokens, tool stats, round history |
| `coordination_events.json` | Full event timeline with tool calls |
| `coordination_table.txt` | Human-readable coordination flow |
| `streaming_debug.log` | Raw streaming data including command strings |
| `agent_*/*/vote.json` | Vote reasoning and context |
| `agent_*/*/execution_trace.md` | Full tool calls, arguments, results, and reasoning - invaluable for debugging |
| `execution_metadata.yaml` | Config and session metadata |
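As one example of working with these files, the command-repetition check on `streaming_debug.log` can start as a simple line-frequency pass. This sketch treats each non-empty line as a candidate command, which is an approximation — the real log interleaves other streaming data:

```python
# Sketch: surface lines repeated in streaming_debug.log (a cheap repetition check).
# Exact parsing of the streaming format is intentionally left out here.
from collections import Counter
from pathlib import Path

def repeated_lines(log_path: str, threshold: int = 3) -> dict:
    """Map each line that appears >= threshold times to its count."""
    stripped = (line.strip() for line in Path(log_path).read_text().splitlines())
    counts = Counter(line for line in stripped if line)
    return {line: n for line, n in counts.items() if n >= threshold}
```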
Execution Traces (`execution_trace.md`):
These are the most detailed debug artifacts. Each agent snapshot includes an execution trace with:
  • Complete tool calls with full arguments (not truncated)
  • Full tool results (not truncated)
  • Reasoning/thinking blocks from the model
  • Timestamps and round markers
Use execution traces when you need to understand exactly what an agent did and why - they capture everything the agent saw and produced during that answer/vote iteration.
Enforcement Reliability (`status.json`):
The `status.json` file includes per-agent reliability metrics that track workflow enforcement events:
```json
{
  "agents": {
    "agent_a": {
      "reliability": {
        "enforcement_attempts": [
          {
            "round": 0,
            "attempt": 1,
            "max_attempts": 3,
            "reason": "no_workflow_tool",
            "tool_calls": ["search", "read_file"],
            "error_message": "Must use workflow tools",
            "buffer_preview": "First 500 chars of lost content...",
            "buffer_chars": 1500,
            "timestamp": 1736683468.123
          }
        ],
        "by_round": {"0": {"count": 2, "reasons": ["no_workflow_tool", "invalid_vote_id"]}},
        "unknown_tools": ["execute_command"],
        "workflow_errors": ["invalid_vote_id"],
        "total_enforcement_retries": 2,
        "total_buffer_chars_lost": 3000,
        "outcome": "ok"
      }
    }
  }
}
```
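This structure lends itself to a quick per-agent rollup. A minimal sketch follows; field names mirror the example above and may differ across versions, and the function name is illustrative:

```python
# Sketch: roll up enforcement reliability per agent from status.json.
# Field names follow the example above; treat them as illustrative.
import json
from pathlib import Path

def reliability_summary(status_path: str) -> dict:
    status = json.loads(Path(status_path).read_text())
    out = {}
    for agent_id, agent in status.get("agents", {}).items():
        rel = agent.get("reliability", {})
        out[agent_id] = {
            "retries": rel.get("total_enforcement_retries", 0),
            "chars_lost": rel.get("total_buffer_chars_lost", 0),
            "reasons": sorted({a["reason"] for a in rel.get("enforcement_attempts", [])}),
        }
    return out
```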
Enforcement Reason Codes:
| Reason | Description |
|--------|-------------|
| `no_workflow_tool` | Agent called tools but none were `vote` or `new_answer` |
| `no_tool_calls` | Agent provided text-only response, no tools called |
| `invalid_vote_id` | Agent voted for non-existent agent ID |
| `vote_no_answers` | Agent tried to vote when no answers exist |
| `vote_and_answer` | Agent used both `vote` and `new_answer` in same response |
| `answer_limit` | Agent hit max answer count limit |
| `answer_novelty` | Answer too similar to existing answers |
| `answer_duplicate` | Exact duplicate of existing answer |
| `api_error` | API/streaming error (e.g., "peer closed connection") |
| `connection_recovery` | API stream ended early, recovered with preserved context |
| `mcp_disconnected` | MCP server disconnected mid-session (e.g., "Server 'X' not connected") |
This data is invaluable for understanding why agents needed retries and how much content was lost due to enforcement restarts.

Logfire Setup

Before using this skill, you need to set up Logfire for observability.

Step 1: Install MassGen with Observability Support

```bash
pip install "massgen[observability]"

# Or with uv
uv pip install "massgen[observability]"
```

Step 2: Create a Logfire Account

Go to https://logfire.pydantic.dev/ and create a free account.

Step 3: Authenticate with Logfire

```bash
# This creates ~/.logfire/credentials.json
uv run logfire auth

# Or set the token directly as an environment variable
export LOGFIRE_TOKEN=your_token_here
```

Step 4: Get Your Read Token for the MCP Server

  1. Go to https://logfire.pydantic.dev/ and log in
  2. Navigate to your project settings
  3. Create a Read Token (this is different from the write token used for authentication)
  4. Copy the token for use in Step 5

Step 5: Add the Logfire MCP Server

```bash
claude mcp add logfire -e LOGFIRE_READ_TOKEN="your-read-token-here" -- uvx logfire-mcp@latest
```
Then restart Claude Code and re-invoke this skill.

Prerequisites

Logfire MCP Server (Optional but Recommended): The Logfire MCP server provides enhanced analysis with precise timing data and cross-session queries. If `LOGFIRE_READ_TOKEN` is not set, self-analysis mode will automatically disable the Logfire MCP and fall back to local log files only.
When configured, the MCP server provides these tools:
  • `mcp__logfire__arbitrary_query` - Run SQL queries against logfire data
  • `mcp__logfire__schema_reference` - Get the database schema
  • `mcp__logfire__find_exceptions_in_file` - Find exceptions in a file
  • `mcp__logfire__logfire_link` - Create links to traces in the UI
Required Flags:
  • `--automation` - Clean output for programmatic parsing; see the massgen-develops-massgen skill for more info on this flag
  • `--logfire` - Enable Logfire tracing (optional, but required to populate Logfire data)

Part 1: Running MassGen Experiments


Basic Command Format

```bash
uv run massgen --automation --logfire --config [config_file] "[question]"
```

Running in Background (Recommended)

Use `run_in_background: true` (or however you run tasks in the background) to run experiments asynchronously so you can monitor progress and end early if needed.
Expected Output (first lines):
```text
LOG_DIR: .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff
STATUS: .massgen/massgen_logs/log_YYYYMMDD_HHMMSS_ffffff/turn_1/attempt_1/status.json
QUESTION: Your task here
[Coordination in progress - monitor status.json for real-time updates]
```
Parse the LOG_DIR - you'll need this for file-based analysis!

Monitoring Progress

`status.json` updates every 2 seconds; use that to track progress.
```bash
cat [log_dir]/turn_1/attempt_1/status.json
```
Key fields to monitor:
  • `coordination.completion_percentage` (0-100)
  • `coordination.phase` - "initial_answer", "enforcement", "presentation"
  • `results.winner` - null while running, agent_id when complete
  • `agents[].status` - "waiting", "streaming", "answered", "voted", "error"
  • `agents[].error` - null if ok, error details if failed
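These fields can drive a simple polling loop. A sketch that waits for `results.winner` to be set; the interval matches the documented 2-second update cadence, and the function name is illustrative:

```python
# Sketch: poll status.json until a winner is recorded or we time out.
import json
import time
from pathlib import Path

def wait_for_winner(status_path: str, timeout: float = 600.0, poll: float = 2.0) -> str:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = json.loads(Path(status_path).read_text())
        winner = status.get("results", {}).get("winner")
        if winner is not None:
            return winner  # agent_id of the winning agent
        time.sleep(poll)
    raise TimeoutError(f"no winner in {status_path} after {timeout}s")
```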

Reading Final Results

After completion (exit code 0):
```bash
# Read the final answer
cat [log_dir]/turn_1/attempt_1/final/[winner]/answer.txt
```

**Other useful files:**
- `execution_metadata.yaml` - Full config and execution details
- `coordination_events.json` - Complete event log
- `coordination_table.txt` - Human-readable coordination summary

Part 2: Querying Logfire


Database Schema


The main table is `records`, with these key columns:

| Column | Description |
|--------|-------------|
| `span_name` | Name of the span (e.g., "agent.agent_a.round_0") |
| `span_id` | Unique identifier for this span |
| `parent_span_id` | ID of the parent span (null for root) |
| `trace_id` | Groups all spans in a single trace |
| `duration` | Time in seconds |
| `start_timestamp` | When the span started |
| `end_timestamp` | When the span ended |
| `attributes` | JSON blob with custom attributes |
| `message` | Log message |
| `is_exception` | Boolean for errors |
| `exception_type` | Type of exception if any |
| `exception_message` | Exception message |

MassGen Span Hierarchy

MassGen creates hierarchical spans:
```text
coordination.session (root)
├── Coordination event: coordination_started
├── agent.agent_a.round_0
│   ├── llm.openrouter.stream
│   ├── mcp.filesystem.write_file
│   └── Tool execution: mcp__filesystem__write_file
├── agent.agent_b.round_0
├── Agent answer: agent1.1
├── agent.agent_a.round_1 (voting round)
├── Agent vote: agent_a -> agent1.1
├── Coordination event: winner_selected
└── agent.agent_a.presentation
    ├── Winner selected: agent1.1
    ├── llm.openrouter.stream
    └── Final answer from agent_a
```

Custom Attributes

MassGen spans include these custom attributes (access via `attributes->'key'`):

| Attribute | Description |
|-----------|-------------|
| `massgen.agent_id` | Agent identifier (agent_a, agent_b) |
| `massgen.iteration` | Current iteration number |
| `massgen.round` | Round number for this agent |
| `massgen.round_type` | "initial_answer", "voting", or "presentation" |
| `massgen.backend` | Backend provider name |
| `massgen.num_context_answers` | Number of answers in context |
| `massgen.is_winner` | True for presentation spans |
| `massgen.outcome` | "vote", "answer", or "error" (set after round completes) |
| `massgen.voted_for` | Agent ID voted for (only set for votes) |
| `massgen.voted_for_label` | Answer label voted for (e.g., "agent1.1", only set for votes) |
| `massgen.answer_label` | Answer label assigned (e.g., "agent1.1", only set for answers) |
| `massgen.error_message` | Error message (only set when outcome is "error") |
| `massgen.usage.input` | Input token count |
| `massgen.usage.output` | Output token count |
| `massgen.usage.reasoning` | Reasoning token count |
| `massgen.usage.cached_input` | Cached input token count |
| `massgen.usage.cost` | Estimated cost in USD |
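The `massgen.usage.*` keys can be aggregated client-side once query results are in hand. A sketch, assuming each row's `attributes` JSON has already been decoded into a Python dict (the function name is illustrative):

```python
# Sketch: aggregate token/cost usage from decoded span attribute dicts.
def usage_totals(attribute_rows) -> dict:
    totals = {"input": 0, "output": 0, "cost": 0.0}
    for attrs in attribute_rows:
        # Keys may be absent on spans without usage data; default to zero.
        totals["input"] += int(attrs.get("massgen.usage.input", 0) or 0)
        totals["output"] += int(attrs.get("massgen.usage.output", 0) or 0)
        totals["cost"] += float(attrs.get("massgen.usage.cost", 0.0) or 0.0)
    return totals
```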

Part 3: Common Analysis Queries


1. View Trace Hierarchy

```sql
SELECT span_name, span_id, parent_span_id, duration, start_timestamp
FROM records
WHERE trace_id = '[YOUR_TRACE_ID]'
ORDER BY start_timestamp
LIMIT 50
```

2. Find Recent Sessions

```sql
SELECT span_name, trace_id, duration, start_timestamp
FROM records
WHERE span_name = 'coordination.session'
ORDER BY start_timestamp DESC
LIMIT 10
```

3. Agent Round Performance

```sql
SELECT
  span_name,
  duration,
  attributes->>'massgen.agent_id' as agent_id,
  attributes->>'massgen.round' as round,
  attributes->>'massgen.round_type' as round_type
FROM records
WHERE span_name LIKE 'agent.%'
ORDER BY start_timestamp DESC
LIMIT 20
```

4. Tool Call Analysis

```sql
SELECT
  span_name,
  duration,
  parent_span_id,
  start_timestamp
FROM records
WHERE span_name LIKE 'mcp.%' OR span_name LIKE 'Tool execution:%'
ORDER BY start_timestamp DESC
LIMIT 30
```

5. Find Errors

```sql
SELECT
  span_name,
  exception_type,
  exception_message,
  trace_id,
  start_timestamp
FROM records
WHERE is_exception = true
ORDER BY start_timestamp DESC
LIMIT 20
```

6. LLM Call Performance

```sql
SELECT
  span_name,
  duration,
  attributes->>'gen_ai.request.model' as model,
  start_timestamp
FROM records
WHERE span_name LIKE 'llm.%'
ORDER BY start_timestamp DESC
LIMIT 30
```

7. Full Trace with Hierarchy (Nested View)

```sql
SELECT
  CASE
    WHEN parent_span_id IS NULL THEN span_name
    ELSE '  └─ ' || span_name
  END as hierarchy,
  duration,
  span_id,
  parent_span_id
FROM records
WHERE trace_id = '[YOUR_TRACE_ID]'
ORDER BY start_timestamp
```
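The SQL above indents only one level; for arbitrary depth, rows of (span_name, span_id, parent_span_id) can be rendered recursively instead. A sketch, assuming rows arrive in start_timestamp order and root spans have a null parent:

```python
# Sketch: render a span tree from (span_name, span_id, parent_span_id) rows.
def render_hierarchy(rows) -> str:
    children = {}
    for name, sid, parent in rows:
        children.setdefault(parent, []).append((name, sid))
    lines = []
    def walk(parent, depth):
        for name, sid in children.get(parent, []):
            lines.append("  " * depth + ("└─ " if depth else "") + name)
            walk(sid, depth + 1)
    walk(None, 0)  # roots have parent_span_id = None
    return "\n".join(lines)
```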

8. Coordination Events Timeline

```sql
SELECT span_name, message, duration, start_timestamp
FROM records
WHERE span_name LIKE 'Coordination event:%'
   OR span_name LIKE 'Agent answer:%'
   OR span_name LIKE 'Agent vote:%'
   OR span_name LIKE 'Winner selected:%'
ORDER BY start_timestamp DESC
LIMIT 30
```

Part 4: Analysis Workflow


Step 1: Run Experiment

```bash
uv run massgen --automation --logfire --config [config] "[prompt]" 2>&1
```

Step 2: Find the Trace

Query for recent sessions:
```sql
SELECT trace_id, duration, start_timestamp
FROM records
WHERE span_name = 'coordination.session'
ORDER BY start_timestamp DESC
LIMIT 5
```

Step 3: Analyze Hierarchy

Get full trace structure:
```sql
SELECT span_name, span_id, parent_span_id, duration
FROM records
WHERE trace_id = '[trace_id_from_step_2]'
ORDER BY start_timestamp
```

Step 4: Investigate Specific Issues

Slow tool calls:
```sql
SELECT span_name, duration, parent_span_id
FROM records
WHERE trace_id = '[trace_id]' AND span_name LIKE 'mcp.%'
ORDER BY duration DESC
```
Agent comparison:
```sql
SELECT
  attributes->>'massgen.agent_id' as agent,
  COUNT(*) as rounds,
  SUM(duration) as total_time,
  AVG(duration) as avg_round_time
FROM records
WHERE trace_id = '[trace_id]' AND span_name LIKE 'agent.%'
GROUP BY attributes->>'massgen.agent_id'
```

Step 5: Create Trace Link

Use the MCP tool to create a viewable link: `mcp__logfire__logfire_link(trace_id="[your_trace_id]")`

Part 5: Improving the Logging Structure


Current Span Types


| Span Pattern | Source | Description |
|--------------|--------|-------------|
| `coordination.session` | coordination_tracker.py | Root session span |
| `agent.{id}.round_{n}` | orchestrator.py | Agent execution round |
| `agent.{id}.presentation` | orchestrator.py | Winner's final presentation |
| `mcp.{server}.{tool}` | mcp_tools/client.py | MCP tool execution |
| `llm.{provider}.stream` | backends | LLM streaming call |
| `Tool execution: {name}` | base_with_custom_tool.py | Tool wrapper |
| `Coordination event: *` | coordination_tracker.py | Coordination events |
| `Agent answer: {label}` | coordination_tracker.py | Answer submission |
| `Agent vote: {from} -> {to}` | coordination_tracker.py | Vote cast |

Adding New Spans

Use the tracer from structured_logging:
```python
from massgen.structured_logging import get_tracer

tracer = get_tracer()
with tracer.span("my_operation", attributes={
    "massgen.custom_key": "value",
}):
    do_work()
```

Context Propagation Notes

Known limitation: When multiple agents run concurrently via `asyncio.create_task`, child spans may not nest correctly under agent round spans. This is an OpenTelemetry context propagation issue with concurrent async code. The presentation phase works correctly because only one agent runs.
Workaround: For accurate nesting in concurrent scenarios, explicit context passing with `contextvars.copy_context()` would be needed.

Logfire Documentation Reference


Key Pages to Know


| Topic | URL | Description |
|-------|-----|-------------|
| Getting Started | /docs/ | Overview, setup, and core concepts |
| Manual Tracing | /docs/guides/onboarding-checklist/add-manual-tracing/ | Creating spans, adding attributes |
| SQL Explorer | /docs/guides/web-ui/explore/ | Writing SQL queries in the UI |
| Live View | /docs/guides/web-ui/live/ | Real-time trace monitoring |
| Query API | /docs/how-to-guides/query-api/ | Programmatic access to data |
| OpenAI Integration | /docs/integrations/llms/openai/ | LLM call instrumentation |

Logfire Concepts


Spans vs Logs:
  • Spans represent operations with measurable duration (use `with logfire.span():`)
  • Logs capture point-in-time events (use `logfire.info()`, `logfire.error()`, etc.)
  • Spans and logs inside a span block become children of that span
Span Names vs Messages:
  • `span_name` = the first argument (used for filtering, keep low-cardinality)
  • `message` = formatted result with attribute values interpolated
  • Example: `logfire.info('Hello {name}', name='Alice')` → span_name="Hello {name}", message="Hello Alice"
Attributes:
  • Keyword arguments become structured JSON attributes
  • Access in SQL via `attributes->>'key'` or `attributes->'key'`
  • Cast when needed: `(attributes->'cost')::float`

Live View Features


The Logfire Live View UI (https://logfire.pydantic.dev/) provides:
  • Real-time streaming of traces as they arrive
  • SQL search pane (press `/` to open) with auto-complete
  • Natural language to SQL - describe what you want and get a query
  • Timeline histogram showing span counts over time
  • Trace details panel with attributes, exceptions, and OpenTelemetry data
  • Cross-linking between SQL results and trace view via trace_id/span_id

SQL Explorer Tips


The Explore page uses Apache DataFusion SQL syntax (similar to Postgres):
```sql
-- Subqueries and CTEs work
WITH recent AS (
  SELECT * FROM records
  WHERE start_timestamp > now() - interval '1 hour'
)
SELECT * FROM recent WHERE is_exception;

-- Access nested JSON
SELECT attributes->>'massgen.agent_id' as agent FROM records;

-- Cast JSON values
SELECT (attributes->'token_count')::int as tokens FROM records;

-- Time filtering is efficient
WHERE start_timestamp > now() - interval '30 minutes'
```

LLM Instrumentation


Logfire auto-instruments OpenAI calls when configured:
  • Captures conversation display, token usage, response metadata
  • Creates separate spans for streaming requests vs responses
  • Works with both sync and async clients
MassGen's backends use this for `llm.{provider}.stream` spans.

Reference Documentation


Logfire:
MassGen:
  • Automation mode: AI_USAGE.md
  • Status file reference: docs/source/reference/status_file.rst
  • Structured logging: massgen/structured_logging.py

Tips for Effective Analysis


  1. Always use both flags: --automation --logfire together
  2. Run in background for long tasks to monitor progress
  3. Query by trace_id to isolate specific sessions
  4. Check parent_span_id to understand hierarchy
  5. Use duration to identify bottlenecks
  6. Look at attributes for MassGen-specific context
  7. Create trace links to share findings with team
Note that you may get an error like so:
bash
Error: Error executing tool arbitrary_query: b'{"detail":"Rate limit exceeded for organization xxx: per minute
     limit reached."}'
In this case, please sleep (for up to a minute) and try again.
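The sleep-and-retry advice can be sketched as a small backoff helper (a minimal sketch; `RuntimeError` stands in for however your tooling surfaces the rate-limit error):

```python
import time

def with_backoff(fn, max_attempts=3, base_sleep=20):
    """Retry fn, sleeping longer after each rate-limit style failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RuntimeError:  # stand-in for the rate-limit error
            if attempt == max_attempts:
                raise
            time.sleep(base_sleep * attempt)  # backs off up to ~1 minute total
```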

Part 6: Comprehensive Log Analysis Report


When asked to analyze a MassGen log run, generate a markdown report saved to
[log_dir]/turn_N/ANALYSIS_REPORT.md
where N is the turn being analyzed. Each turn (coordination session) gets its own analysis report as a sibling to the attempt directories. The report must cover the Standard Analysis Questions below.

Important: Ground Truth and Correctness


CRITICAL: Do not assume any agent's answer is "correct" unless the user explicitly provides ground truth.
  • Report what each agent claimed/produced without asserting correctness
  • Note when agents agree or disagree, but don't claim agreement = correctness
  • If agents produce different answers, present both neutrally
  • Only mark answers as correct/incorrect if user provides the actual answer
  • Phrases to avoid: "correctly identified", "got the right answer", "solved correctly"
  • Phrases to use: "claimed", "produced", "submitted", "arrived at"

Standard Analysis Questions


Every analysis report MUST answer these questions:

1. Correctness


  • Did coordination complete successfully?
  • Did all agents submit answers?
  • Did voting occur correctly?
  • Was a winner selected and did they provide a final answer?
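A quick way to start answering these from coordination_events.json is to tally events by type (a sketch; the events/event_type layout matches the jq examples in the Analysis Commands section, but treat specific event_type values other than agent_timeout as assumptions about your log version):

```python
import collections
import json

def event_counts(events_path):
    """Count coordination events by event_type from coordination_events.json."""
    with open(events_path) as f:
        events = json.load(f).get("events", [])
    return collections.Counter(e.get("event_type", "unknown") for e in events)
```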

2. Efficiency & Bottlenecks


  • What was the total duration and breakdown by phase?
  • What were the slowest operations?
  • Which tools took the most time?

3. Command Pattern Analysis


  • Were there frequently repeated commands that could be avoided? (e.g., openskills read, npm install, ls -R)
  • What commands produced unnecessarily long output? (e.g., skill docs, directory listings)
  • What were the slowest execute_command patterns? (e.g., web scraping, package installs)

4. Work Duplication Analysis


  • Was expensive work (like image generation) unnecessarily redone?
  • Did both agents generate similar/identical assets?
  • Were assets regenerated after restarts instead of being reused?
  • Could caching or sharing have saved time/cost?

5. Agent Behavior & Decision Making


  • How did agents evaluate previous answers? What reasoning did they provide?
  • How did agents decide between voting vs providing a new answer?
  • Did agents genuinely build upon each other's work or work in isolation?
  • Were there timeouts or incomplete rounds?

6. Cost & Token Analysis


  • Total cost and breakdown by agent
  • Token usage (input, output, reasoning, cached)
  • Cache hit rate

7. Errors & Issues


  • Any exceptions or failures?
  • Any timeouts?
  • Any agent errors?

8. Agent Reasoning & Behavior Analysis (CRITICAL)


This is the most important section. Analyzing how agents think and act reveals root causes of successes and failures.
Data Sources:
  • agent_outputs/agent_*.txt - Full output including reasoning (if available)
  • agent_*/*/execution_trace.md - Complete tool calls with arguments and results
  • streaming_debug.log - Raw streaming chunks
Note: Some models don't emit explicit reasoning traces. For these, analyze tool call patterns and content instead - the sequence of actions still reveals decision-making.
For EACH agent, analyze:
  1. Strategy - What approach did they take? (from reasoning OR tool sequence)
  2. Tool Responses - How did they handle successes/failures/inconsistencies?
  3. Error Recovery - Did they detect problems? Implement workarounds?
  4. Decision Quality - Logical errors? Over/under-verification? Analysis paralysis?
  5. Cross-Agent Comparison - Which had best reasoning? What patterns led to success?
Key Patterns:
| Pattern | Good Sign | Bad Sign |
|---|---|---|
| Failure detection | Pivots after 2-3 failures | Repeats broken approach 6+ times |
| Result validation | Cross-validates outputs | Accepts first result blindly |
| Inconsistency handling | Investigates conflicts | Ignores contradictions |
| Workarounds | Creative alternatives when stuck | Gives up or loops |
| Time management | Commits when confident | Endless verification, no answer |
Extract Key Evidence: For each agent, include 2-3 quotes (if reasoning available) OR describe key tool sequences that illustrate their decision quality.

9. Tool Reliability Analysis


Analyze tool behavior patterns beyond simple error listing:
  1. Consistency - Same input, same output? Document variance.
  2. False Positives/Negatives - Tools reporting wrong success/failure status?
  3. Root Cause Hypotheses - For each failure pattern, propose likely causes (path issues, rate limits, model limitations, etc.)

10. Enforcement & Workflow Reliability Analysis


Data Source: status.json → agents[].reliability
Check if agents needed retries due to workflow violations. Key metrics:
  • total_enforcement_retries - How many times the agent was forced to retry
  • total_buffer_chars_lost - Content discarded due to restarts
  • unknown_tools - Hallucinated tool names
  • by_round - Which rounds had issues
Red Flags: >=2 retries per round, >5000 chars lost, a populated unknown_tools list.
See "Enforcement Reliability" in the Key Local Log Files section for the full schema and reason codes.
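The red-flag check can be automated as a small sketch (field names follow the schema described in this section; the agent "id" key is an assumption — adjust to what your status.json actually contains):

```python
import json

def enforcement_red_flags(status_path, max_retries=2, max_chars_lost=5000):
    """Return agent ids whose reliability metrics cross the red-flag thresholds."""
    with open(status_path) as f:
        status = json.load(f)
    flagged = []
    for agent in status.get("agents", []):
        rel = agent.get("reliability", {})
        if (rel.get("total_enforcement_retries", 0) >= max_retries
                or rel.get("total_buffer_chars_lost", 0) > max_chars_lost
                or rel.get("unknown_tools")):  # non-empty list of hallucinated tools
            flagged.append(agent.get("id", "unknown"))
    return flagged
```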

Data Sources for Each Question


| Question | Primary Source | Secondary Source |
|---|---|---|
| Correctness | coordination_events.json, coordination_table.txt | Logfire coordination events |
| Efficiency | metrics_summary.json | Logfire duration queries |
| Command patterns | streaming_debug.log (grep for "command":) | - |
| Work duplication | streaming_debug.log (grep for tool prompts/args) | metrics_summary.json tool counts |
| Agent decisions | agent_*/*/vote.json, coordination_events.json | Logfire vote spans |
| Cost/tokens | metrics_summary.json | Logfire usage attributes |
| Errors | coordination_events.json, metrics_summary.json | Logfire is_exception=true |
| Enforcement | status.json agents[].reliability | - |

Analysis Commands


Find repeated commands:
bash
grep -o '"command": "[^"]*"' streaming_debug.log | sed 's/"command": "//;s/"$//' | sort | uniq -c | sort -rn | head -30
Find generate_media prompts (to check for duplication):
bash
grep -o '"prompts": \[.*\]' streaming_debug.log
Check vote reasoning:
bash
cat agent_*/*/vote.json | jq '.reason'
Find timeout events:
bash
cat coordination_events.json | jq '.events[] | select(.event_type == "agent_timeout")'
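The repeated-command grep above can also be done in a single pass (a sketch; it relies on the same `"command": "..."` pattern assumption about streaming_debug.log as the grep):

```python
import collections
import re

def top_commands(log_path, n=30):
    """Count occurrences of "command": "..." values in streaming_debug.log."""
    counts = collections.Counter()
    pattern = re.compile(r'"command": "([^"]*)"')
    with open(log_path) as f:
        for line in f:
            counts.update(pattern.findall(line))
    return counts.most_common(n)
```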

Report Template


Save this report to [log_dir]/turn_N/ANALYSIS_REPORT.md (where N is the turn number being analyzed):

MassGen Log Analysis Report


Session: [log_dir name]
Trace ID: [trace_id if available]
Generated: [timestamp]
Logfire Link: [link if available]

Executive Summary


[2-3 sentence summary of the run: what was the task, did it succeed, key findings]

Session Overview


| Metric | Value |
|---|---|
| Duration | X minutes |
| Agents | [list] |
| Winner | [agent_id] |
| Total Cost | $X.XX |
| Total Answers | X |
| Total Votes | X |
| Total Restarts | X |

1. Correctness Analysis


Coordination Flow


[Timeline of key events]

Status


  • All phases completed
  • All agents submitted answers
  • Voting completed correctly
  • Winner selected
  • Final answer delivered

Issues Found


[List any correctness issues]

2. Efficiency Analysis


Phase Duration Breakdown


| Phase | Count | Avg (s) | Max (s) | Total (s) | % of Total |
|---|---|---|---|---|---|
| initial_answer | | | | | |
| voting | | | | | |
| presentation | | | | | |

Top Bottlenecks


  1. [Operation] - X seconds (X% of total)
  2. [Operation] - X seconds
  3. [Operation] - X seconds

3. Command Pattern Analysis


Frequently Repeated Commands


| Command | Times Run | Issue | Recommendation |
|---|---|---|---|
| openskills read pptx | X | Long output (~5KB) re-read after restarts | Cache skill docs |
| npm install ... | X | Reinstalled after each restart | Persist node_modules |
| ... | | | |

Commands with Excessive Output


| Command | Output Size | Issue |
|---|---|---|

Slowest Command Patterns


| Pattern | Max Time | Avg Time | Notes |
|---|---|---|---|
| Web scraping (crawl4ai) | Xs | Xs | |
| npm install | Xs | Xs | |
| PPTX pipeline | Xs | Xs | |

4. Work Duplication Analysis


Duplicated Work Found


| Work Type | Times Repeated | Wasted Time | Wasted Cost |
|---|---|---|---|
| Image generation | X | X min | $X.XX |
| Research/scraping | X | X min | - |
| Package installs | X | X min | - |

Specific Examples


[List specific examples of duplicated work with prompts/commands]

Recommendations


  1. [Specific recommendation to avoid duplication]
  2. [Specific recommendation]

5. Agent Behavior Analysis


Answer Progression


| Label | Agent | Time | Summary |
|---|---|---|---|
| agent1.1 | agent_a | HH:MM | [brief description] |
| agent2.1 | agent_b | HH:MM | [brief description] |
| ... | | | |

Voting Analysis


| Voter | Voted For | Reasoning Summary |
|---|---|---|
| agent_b | agent1.1 | "[key quote from reasoning]" |

Vote vs New Answer Decisions


[Explain how agents decided whether to vote or provide new answers]

Agent Collaboration Quality


  • Did agents read each other's answers? [Yes/No with evidence]
  • Did agents build upon previous work? [Yes/No with evidence]
  • Did agents provide genuine evaluation? [Yes/No with evidence]

Timeouts/Incomplete Rounds


[List any timeouts with context]

6. Cost & Token Analysis


Cost Breakdown


| Agent | Input Tokens | Output Tokens | Reasoning | Cost |
|---|---|---|---|---|
| agent_a | | | | $X.XX |
| agent_b | | | | $X.XX |
| Total | | | | $X.XX |

Cache Efficiency


  • Cached input tokens: X (X% cache hit rate)

Tool Cost Impact


| Tool | Calls | Est. Time Cost | Notes |
|---|---|---|---|
| generate_media | X | X min | |
| command_line | X | X min | |

7. Errors & Issues


Exceptions


[List any exceptions with type and message]

Failed Tool Calls


[List any failed tools]

Agent Errors


[List any agent-level errors]

Timeouts


[List any timeouts with duration and context]

8. Recommendations


High Priority


  1. [Issue]: [Specific actionable recommendation]
  2. [Issue]: [Specific actionable recommendation]

Medium Priority


  1. [Issue]: [Recommendation]

Low Priority / Future Improvements


  1. [Issue]: [Recommendation]

9. Suggested Linear Issues


Based on the analysis, the following issues are suggested for tracking. If you have access to the Linear project and the session is interactive, present these to the user for approval before creating them. Even without access, write them out in a section like the table below, since the goal is to learn from the logs and turn findings into concrete, solvable issues:

| Priority | Title | Description | Labels |
|---|---|---|---|
| High | [Short title] | [1-2 sentence description] | log-analysis, [area] |
| Medium | [Short title] | [Description] | log-analysis, [area] |
After user approval, create issues in Linear with:
  • Project: MassGen
  • Label: log-analysis (to identify issues from log analysis)
  • Additional labels as appropriate (e.g., performance, agent-behavior, tooling)

Appendix


Configuration Used


[Key config settings from execution_metadata.yaml]

Files Generated


[List of output files in the workspace]
undefined

Workflow for Generating Report


  1. Read local files first (metrics_summary.json, coordination_table.txt, coordination_events.json)
  2. Query Logfire for trace_id and timing data (if available; wait and retry on rate limits)
  3. Analyze streaming_debug.log for command patterns
  4. Check vote.json files for agent reasoning
  5. Generate the report using the template
  6. Save to [log_dir]/turn_N/ANALYSIS_REPORT.md (N = turn number being analyzed)
  7. Print summary to the user
  8. Suggest Linear issues based on findings - present to user for approval, if session is interactive
  9. Create approved issues in Linear with the log-analysis label
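The save step can be sketched as a small path helper (a minimal sketch; the turn_N directory layout follows this document):

```python
from pathlib import Path

def report_path(log_dir, turn=None):
    """Resolve [log_dir]/turn_N/ANALYSIS_REPORT.md, defaulting to the latest turn."""
    log_dir = Path(log_dir)
    if turn is None:
        turns = sorted(int(p.name.split("_")[1]) for p in log_dir.glob("turn_*"))
        if not turns:
            raise FileNotFoundError(f"no turn_* directories in {log_dir}")
        turn = turns[-1]  # latest turn
    return log_dir / f"turn_{turn}" / "ANALYSIS_REPORT.md"
```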

Part 7: Quick Reference - SQL Queries


Correctness Queries


sql
-- Check coordination flow
SELECT span_name, start_timestamp, duration
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND (span_name LIKE 'Coordination event:%'
       OR span_name LIKE 'Agent answer:%'
       OR span_name LIKE 'Agent vote:%'
       OR span_name LIKE 'Winner selected:%'
       OR span_name LIKE 'Final answer%')
ORDER BY start_timestamp

Efficiency Queries


sql
-- Phase duration breakdown
SELECT
  CASE
    WHEN span_name LIKE 'agent.%.round_0' THEN 'initial_answer'
    WHEN span_name LIKE 'agent.%.round_%' THEN 'voting'
    WHEN span_name LIKE 'agent.%.presentation' THEN 'presentation'
    ELSE 'other'
  END as phase,
  COUNT(*) as count,
  ROUND(AVG(duration)::numeric, 2) as avg_duration_sec,
  ROUND(MAX(duration)::numeric, 2) as max_duration_sec,
  ROUND(SUM(duration)::numeric, 2) as total_duration_sec
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND span_name LIKE 'agent.%'
GROUP BY 1
ORDER BY total_duration_sec DESC

Error Queries


sql
-- Find all exceptions
SELECT span_name, exception_type, exception_message, start_timestamp
FROM records
WHERE trace_id = '[TRACE_ID]' AND is_exception = true
ORDER BY start_timestamp

Cost Queries


sql
-- Token usage by agent
SELECT
  attributes->>'massgen.agent_id' as agent,
  SUM((attributes->'massgen.usage.input')::int) as total_input_tokens,
  SUM((attributes->'massgen.usage.output')::int) as total_output_tokens,
  SUM((attributes->'massgen.usage.cost')::float) as total_cost_usd
FROM records
WHERE trace_id = '[TRACE_ID]'
  AND span_name LIKE 'agent.%'
  AND attributes->>'massgen.usage.input' IS NOT NULL
GROUP BY attributes->>'massgen.agent_id'