trace-annotation-tool

Trace Annotation Tool Generator

Generate a custom local web application for open coding of LLM traces — the first qualitative pass of error analysis in the Analyze phase of the evaluation lifecycle.

Core Workflow

Step 1: Understand the User's Trace Data

Ask the user to point to their trace data file (CSV, JSONL, JSON, or any structured format).
Read a sample of the data to understand its structure: field names, nesting depth, which fields represent the user query, intermediate steps, tool calls, and final output.
Identify a unique trace identifier field (or generate sequential IDs if none exists).
Confirm the structure with the user: "I see fields X, Y, Z — which represent the trace steps, and which is the user query?"
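
As a rough illustration of this step, the sketch below loads a trace file and assigns sequential IDs when no identifier field exists. The function name `load_traces`, the `trace_id` key, the `trace-0001` ID format, and the `"traces"` wrapper key for JSON files are all assumptions made for the example, not part of the skill itself.

```python
import csv
import json
from pathlib import Path


def load_traces(path, id_field=None):
    """Load traces from CSV, JSONL, or JSON and return a list of dicts.

    `id_field` names the unique identifier field, if the data has one.
    Every returned trace carries a string "trace_id" key.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
    elif suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
    elif suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: a top-level list, or a dict wrapping the list under "traces".
        rows = data if isinstance(data, list) else data.get("traces", [])
    else:
        raise ValueError(f"Unsupported trace file type: {suffix}")

    traces = []
    for i, row in enumerate(rows, start=1):
        trace = dict(row)
        if id_field and trace.get(id_field) not in (None, ""):
            trace["trace_id"] = str(trace[id_field])
        else:
            # No usable identifier: fall back to a sequential ID.
            trace["trace_id"] = f"trace-{i:04d}"
        traces.append(trace)
    return traces
```

Printing the keys of the first few loaded traces is usually enough to draft the confirmation question for the user.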

Step 2: Ask About Additional Features

The tool includes these features by default:

Trace viewer: One trace at a time, with tailored visual rendering of the trace structure
Freeform notes: Text field for open coding observations
Pass / Fail / Defer: Binary judgment with a defer option for uncertain traces
Keyboard shortcuts: Navigation and annotation hotkeys
Progress indicator: "17 / 100 reviewed" with pass/fail/defer counts
Auto-save: Annotations saved to a separate JSONL file on every action

Ask the user: "These are the default features. Do you want anything else before I generate the tool?" Then incorporate any additional requests.

Step 3: Generate the Application

Generate a single-directory Python web application with this structure:

trace-annotator/
├── app.py          # FastHTML application (single file, all routes)
├── requirements.txt # Dependencies (python-fasthtml)
└── README.md        # Brief usage instructions

Technology Stack

FastHTML for the web framework (HTMX is built in)
TailwindCSS via CDN (`<script src="https://cdn.tailwindcss.com">`) for styling
Vanilla JavaScript only for keyboard shortcut bindings

Application Architecture

`app.py` is a single-file FastHTML app with these routes:

`GET /`: main annotation view showing the current trace, annotation form, and progress
`POST /annotate`: save the annotation (notes + pass/fail/defer) and advance to the next trace
`GET /trace/{n}`: navigate to a specific trace (used by prev/next and keyboard nav)
`GET /progress`: return progress stats (for HTMX partial updates)
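
The numbers behind the progress route can be computed by a small framework-independent helper. The sketch below is one possible shape, assuming annotations are held in memory as a dict keyed by trace ID; the names `progress_stats` and `progress_label` and the colon separator in the label are illustrative choices.

```python
from collections import Counter

STATUSES = ("pass", "fail", "defer")


def progress_stats(total_traces, annotations):
    """Count reviewed traces by status.

    `annotations` maps trace_id -> record, where each record has a "status" key.
    """
    counts = Counter(rec["status"] for rec in annotations.values())
    stats = {status: counts.get(status, 0) for status in STATUSES}
    stats["reviewed"] = sum(stats[s] for s in STATUSES)
    stats["total"] = total_traces
    return stats


def progress_label(stats):
    """Format the stats as the text shown in the progress indicator."""
    return (
        f"{stats['reviewed']} / {stats['total']} reviewed: "
        f"{stats['pass']} pass, {stats['fail']} fail, {stats['defer']} defer"
    )
```

A route handler would then only need to call these two functions and wrap the label in whatever markup the top bar uses.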

Data flow:

On startup, read the trace data file from a path specified via command-line argument or environment variable.
Load existing annotations from `annotations.jsonl` (if it exists) to preserve prior work.
On each annotation action, append/update the entry in `annotations.jsonl` immediately.
The annotations file is separate from the source data — the original file is never modified.
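
For the startup step, one way to resolve the trace file path is sketched below. The environment variable name `TRACE_FILE` and the function name `resolve_trace_path` are assumptions for illustration; the source only says the path comes from a command-line argument or an environment variable.

```python
import os
import sys
from pathlib import Path


def resolve_trace_path(argv=None, environ=None):
    """Return the trace file path from the first CLI argument or TRACE_FILE.

    The command-line argument wins when both are present.
    """
    argv = sys.argv[1:] if argv is None else argv
    environ = os.environ if environ is None else environ
    raw = argv[0] if argv else environ.get("TRACE_FILE")  # hypothetical variable name
    if not raw:
        raise SystemExit("Usage: python app.py path/to/traces.jsonl (or set TRACE_FILE)")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise SystemExit(f"Trace file not found: {path}")
    return path
```

Failing fast with a usage message keeps the tool from starting with an empty trace list.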

Annotations file format (`annotations.jsonl`):

{"trace_id": "abc-123", "status": "fail", "notes": "SQL query missed the pet-friendly constraint", "timestamp": "2025-01-15T10:32:00Z"}
{"trace_id": "abc-124", "status": "pass", "notes": "", "timestamp": "2025-01-15T10:32:45Z"}
{"trace_id": "abc-125", "status": "defer", "notes": "Not sure if tone is appropriate for investor", "timestamp": "2025-01-15T10:33:12Z"}
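
A minimal sketch of reading and writing this file follows. It assumes an append-only strategy in which an update is written as a new line and the last line for a given `trace_id` wins on load; the source says only "append/update", so rewriting the file in place would be an equally valid choice. The function names are illustrative.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

VALID_STATUSES = {"pass", "fail", "defer"}


def load_annotations(path):
    """Return {trace_id: record}; later lines override earlier ones."""
    path = Path(path)
    latest = {}
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    latest[record["trace_id"]] = record
    return latest


def save_annotation(path, trace_id, status, notes=""):
    """Append one annotation record to the JSONL file and return it."""
    if status not in VALID_STATUSES:
        raise ValueError(f"Unknown status: {status!r}")
    record = {
        "trace_id": trace_id,
        "status": status,
        "notes": notes,
        # UTC timestamp in the same format as the examples above.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Appending keeps each save cheap and leaves a history of changed judgments in the file, at the cost of duplicate lines for re-annotated traces.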

Trace Rendering

This is the most important part of the tool. Tailor the HTML rendering to the user's specific trace structure. Apply these principles from HCI research on LLM review interfaces:

Visual hierarchy: Emphasize the user query and final output. Use distinct visual blocks (background colors, borders, indentation) for different trace components.
Collapsible sections: For multi-step traces, make intermediate steps (tool calls, reasoning, retrieval) collapsible — expanded by default for the first trace, then respecting the user's toggle state.
Domain-appropriate rendering: If the trace contains emails, render them like emails. If it contains SQL, syntax-highlight the SQL. If it contains JSON tool calls, format them as structured blocks. Match the visual presentation to the content type.
Readable text: Use comfortable line lengths (max-w-prose or similar), adequate spacing, and readable font sizes. Traces can be long — don't cram them.
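
To make these principles concrete, the sketch below renders a trace as plain HTML strings with collapsible intermediate steps. It is only an illustration: the field names `query`, `steps`, `type`, `content`, and `output` are hypothetical and should be replaced with the user's actual trace structure, and a real app would likely build the markup with the framework's own components.

```python
import json
from html import escape


def render_step(step, open_by_default=True):
    """Render one intermediate step as a collapsible <details> block."""
    kind = escape(str(step.get("type", "step")))
    content = step.get("content", "")
    if not isinstance(content, str):
        # Structured payloads (e.g. tool call arguments) are pretty-printed.
        content = json.dumps(content, indent=2, ensure_ascii=False)
    open_attr = " open" if open_by_default else ""
    return (
        f'<details{open_attr} class="border-l-4 border-gray-300 pl-4 my-2">'
        f'<summary class="cursor-pointer font-mono text-sm">{kind}</summary>'
        f'<pre class="whitespace-pre-wrap text-sm">{escape(content)}</pre>'
        "</details>"
    )


def render_trace(trace, open_by_default=True):
    """Render the query and final output prominently, with steps in between."""
    query = escape(str(trace.get("query", "")))
    output = escape(str(trace.get("output", "")))
    parts = ['<article class="max-w-prose mx-auto space-y-4">']
    parts.append(
        '<section class="bg-blue-50 p-4 rounded">'
        f'<h2 class="font-bold">User query</h2><p>{query}</p></section>'
    )
    parts.extend(render_step(s, open_by_default) for s in trace.get("steps", []))
    parts.append(
        '<section class="bg-green-50 p-4 rounded">'
        f'<h2 class="font-bold">Final output</h2><p>{output}</p></section>'
    )
    parts.append("</article>")
    return "\n".join(parts)
```

Escaping every value before interpolation matters here, since traces often contain HTML, SQL, or JSON that would otherwise break the page.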

Keyboard Shortcuts

Bind these shortcuts via a small inline `<script>` block. Display them in a help tooltip or footer so the user can reference them.

| Key | Action |
| --- | --- |
| `p` | Mark as Pass and advance |
| `f` | Mark as Fail and advance |
| `d` | Mark as Defer and advance |
| `n` | Next trace (without annotating) |
| `b` | Previous trace (back) |
| `e` | Focus the notes text field |
| `?` | Toggle keyboard shortcut help |

Shortcuts must be suppressed when the notes text field is focused (so the user can type normally). Re-enable them on blur.
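
One way to produce that inline block from Python is sketched below: a key-to-action mapping is serialized to JSON and embedded in a small script string. The element IDs (`btn-pass`, `notes`, and so on) are hypothetical and must match whatever IDs the generated page actually uses.

```python
import json

# key -> (DOM method to call, element id); the ids are placeholders.
SHORTCUTS = {
    "p": ("click", "btn-pass"),
    "f": ("click", "btn-fail"),
    "d": ("click", "btn-defer"),
    "n": ("click", "btn-next"),
    "b": ("click", "btn-prev"),
    "e": ("focus", "notes"),
    "?": ("click", "btn-help"),
}


def shortcut_script(shortcuts=SHORTCUTS):
    """Return an inline <script> element that binds the shortcuts."""
    mapping = json.dumps(shortcuts)
    return f"""<script>
(function () {{
  var shortcuts = {mapping};
  document.addEventListener('keydown', function (event) {{
    var tag = event.target.tagName;
    // Do nothing while the user is typing in the notes field or another input.
    if (tag === 'TEXTAREA' || tag === 'INPUT') return;
    // Leave browser shortcuts such as Ctrl+F alone.
    if (event.ctrlKey || event.metaKey || event.altKey) return;
    var action = shortcuts[event.key];
    if (!action) return;
    var el = document.getElementById(action[1]);
    if (!el) return;
    event.preventDefault();
    el[action[0]]();
  }});
}})();
</script>"""
```

Because the handler checks the event target on every keypress, the shortcuts are suppressed while the notes field has focus and work again once it loses focus, without separate focus and blur listeners.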

UI Layout

Use a clean, minimal layout with TailwindCSS:

Top bar: Progress indicator ("17 / 100 reviewed — 12 pass, 3 fail, 2 defer"), trace navigation (prev/next buttons), and keyboard shortcut help toggle.
Main area: The rendered trace, taking up most of the viewport. Scrollable if the trace is long.
Bottom panel (sticky): Annotation controls — the notes text field, and pass/fail/defer buttons. Always visible so the user can annotate without scrolling back up.

Styling Guidelines

Use TailwindCSS utility classes. The visual design should be:

Clean and minimal — this is a productivity tool, not a marketing page
High contrast for readability during long annotation sessions
Distinct visual treatment for different trace components (user input vs. LLM output vs. tool calls vs. metadata)
Responsive but optimized for desktop — this is a sit-down-and-work tool

Step 4: Provide Usage Instructions

After generating the tool, tell the user how to run it:

cd trace-annotator
pip install -r requirements.txt
python app.py path/to/traces.jsonl

Then explain the workflow:

Open the browser (FastHTML will print the local URL)
Read each trace carefully, noting the point of first failure (the most upstream issue)
Write a short freeform note describing the observation
Mark as pass, fail, or defer
Use keyboard shortcuts to move quickly through traces
Annotations are saved automatically — you can close and resume anytime

Mention that annotations are saved to `annotations.jsonl` in the same directory.

What Open Coding Is (and Isn't)

Open coding is the qualitative, exploratory first pass through trace data. The user reads traces and jots down raw observations about what's going wrong — without trying to categorize or structure the observations yet. The goal is to surface a broad, honest view of system behavior before imposing any taxonomy.

What to annotate: Focus on the point of first failure in each trace — the most upstream issue. In multi-step traces, a single early error often cascades into multiple downstream failures. Fixing the first error frequently resolves the entire chain.

When to stop: Continue until at least 20 failing traces are labeled and no fundamentally new failure patterns are appearing (theoretical saturation).

What comes next: Once the user has a body of freeform annotations, they move to axial coding: clustering those observations into structured, binary failure modes. This is covered by the `failure-taxonomy` skill.

Anti-Patterns to Avoid

Over-engineering the tool: The annotation tool is a means to an end. Generate a working tool quickly and let the user start annotating. Don't add features they didn't ask for.
Premature structure: Don't add structured failure mode checkboxes or tag systems to the initial tool. Open coding is deliberately unstructured; the taxonomy emerges later. See `references/beyond-open-coding.md` for when and how to add structure.
Generic trace rendering: Don't just dump raw JSON. Take the time to understand the trace format and render it in a way that makes failures easy to spot.
Ignoring keyboard shortcuts: The textbook is emphatic that annotation speed directly correlates with engineering velocity. Hotkeys are not optional.

Connecting to Next Steps

After open coding, the user's workflow typically continues with:

Failure taxonomy (the `failure-taxonomy` skill): Cluster freeform annotations into structured, binary failure modes via axial coding.
LLM-as-Judge evaluators (the `llm-as-a-judge` skill): Once failure modes are defined, build automated evaluators for each one.
Extending the tool: The generated annotation tool can be extended to support structured failure tags after the taxonomy is built. See `references/beyond-open-coding.md` for guidance.

Mention these next steps when the tool is delivered.
