# W&B Primary Skill
This skill covers everything an agent needs to work with Weights & Biases:

- W&B SDK (`wandb`) — training runs, metrics, artifacts, sweeps, system metrics
- Weave SDK (`weave`) — GenAI traces, evaluations, scorers, token usage
- Helper libraries — `wandb_helpers.py` and `weave_helpers.py` for common operations
- High-level Weave API (`weave_tools.weave_api`) — agent-friendly wrappers for Weave queries
## When to use what
| I need to... | Use |
|---|---|
| Query training runs, loss curves, hyperparameters | W&B SDK (`wandb`) |
| Query GenAI traces, calls, evaluations | High-level Weave API (`weave_tools.weave_api`) |
| Convert Weave wrapper types to plain Python | `unwrap()` |
| Build a DataFrame from training runs | `runs_to_dataframe()` |
| Extract eval results for analysis | `eval_results_to_dicts()` |
| Do something the high-level API doesn't cover | Raw Weave SDK (`weave`) |
## Bundled files

### Helper libraries

```python
import sys
sys.path.insert(0, "skills/wandb-primary/scripts")
```

#### Weave helpers (traces, evals, GenAI)
```python
from weave_helpers import (
    unwrap,                 # Recursively convert Weave types -> plain Python
    get_token_usage,        # Extract token counts from a call's summary
    eval_results_to_dicts,  # predict_and_score calls -> list of result dicts
    pivot_solve_rate,       # Build task-level pivot table across agents
    results_summary,        # Print compact eval summary
    eval_health,            # Extract status/counts from Evaluation.evaluate calls
    eval_efficiency,        # Compute tokens-per-success across eval calls
)
```
#### W&B helpers (training runs, metrics)
```python
from wandb_helpers import (
    runs_to_dataframe,  # Convert runs to a clean pandas DataFrame
    diagnose_run,       # Quick diagnostic summary of a training run
    compare_configs,    # Side-by-side config diff between two runs
)
```
### Reference docs
Read these as needed — they contain full API surfaces and recipes:

- `references/WEAVE_API.md` — High-level Weave API (`CallsView`, `Project`, `Eval`). Start here for Weave queries.
- `references/WANDB_SDK.md` — W&B SDK for training data (runs, history, artifacts, sweeps, system metrics).
- `references/WEAVE_SDK_RAW.md` — Low-level Weave SDK (`CallsFilter`, `client.get_calls()`). Use only when the high-level API isn't enough.
## Critical rules

### Treat traces and runs as DATA
Weave traces and W&B run histories can be enormous. Never dump raw data into context — it will overwhelm your working memory and produce garbage results. Always:
- Inspect structure first — look at column names, dtypes, row counts
- Load into pandas/numpy — compute stats programmatically
- Summarize, don't dump — print computed statistics and tables, not raw rows
```python
import pandas as pd
import numpy as np

# BAD: prints thousands of rows into context
for row in run.scan_history(keys=["loss"]):
    print(row)

# GOOD: load into numpy, compute stats, print summary
losses = np.array([r["loss"] for r in run.scan_history(keys=["loss"])])
print(f"Loss: {len(losses)} steps, min={losses.min():.4f}, "
      f"final={losses[-1]:.4f}, mean_last_10%={losses[-len(losses)//10:].mean():.4f}")
```
### Always deliver a final answer
Do not end your work mid-analysis. Every task must conclude with a clear, structured response:
- Query the data (1-2 scripts max)
- Extract the numbers you need
- Present: table + key findings + direct answers to each sub-question
If you catch yourself saying "now let me build the final analysis" — stop and present what you have.
### Use `unwrap()` for unknown Weave data

When you encounter Weave output and aren't sure of its type (WeaveDict? WeaveObject? ObjectRef?), unwrap it first:
```python
from weave_helpers import unwrap
import json

output = unwrap(call.output)
print(json.dumps(output, indent=2, default=str))
```

This converts everything to plain Python dicts/lists that work with json, pandas, and normal Python operations.
## Environment setup

The sandbox has Python 3.13, `uv`, `wandb`, `weave`, `pandas`, and `numpy` pre-installed.

```python
import os

entity = os.environ["WANDB_ENTITY"]
project = os.environ["WANDB_PROJECT"]
```

### Installing extra packages
```bash
uv pip install matplotlib seaborn rich tabulate
```

### Running scripts
bash
uv run script.py # always use uv run, never bare python
uv run --with rich python -c "import rich; rich.print('hello')"bash
uv run script.py # always use uv run, never bare python
uv run --with rich python -c "import rich; rich.print('hello')"Quick starts
快速入门
### W&B SDK — training runs
```python
import wandb
import pandas as pd

api = wandb.Api()
path = f"{entity}/{project}"
runs = api.runs(path, filters={"state": "finished"}, order="-created_at")

# Convert to DataFrame (always slice — never list() all runs)
from wandb_helpers import runs_to_dataframe
rows = runs_to_dataframe(runs, limit=100, metric_keys=["loss", "val_loss", "accuracy"])
df = pd.DataFrame(rows)
print(df.describe())
```

For the full W&B SDK reference (filters, history, artifacts, sweeps), read `references/WANDB_SDK.md`.

### Weave — high-level API (preferred)
```python
import sys
sys.path.insert(0, "skills/wandb-primary/scripts")
from weave_tools.weave_api import init, Project

init(f"{entity}/{project}")
project = Project.current()
print(project.summary())  # start here — shows ops, objects, evals, feedback
```

For the full high-level API reference, read `references/WEAVE_API.md`.

### Weave — raw SDK (when you need low-level access)
```python
import weave

client = weave.init(f"{entity}/{project}")  # positional string, NOT keyword arg
calls = client.get_calls(limit=10)
```

For raw SDK patterns (`CallsFilter`, `Query`, advanced filtering), read `references/WEAVE_SDK_RAW.md`.

## Key patterns
### Weave eval inspection
Evaluation calls follow this hierarchy:

```
Evaluation.evaluate (root)
├── Evaluation.predict_and_score (one per dataset row x trials)
│   ├── model.predict (the actual model call)
│   ├── scorer_1.score
│   └── scorer_2.score
└── Evaluation.summarize
```

Extract per-task results into a DataFrame:
```python
from weave_helpers import eval_results_to_dicts, results_summary

# pas_calls: a list of predict_and_score call objects
results = eval_results_to_dicts(pas_calls, agent_name="my-agent")
print(results_summary(results))
df = pd.DataFrame(results)
print(df.groupby("passed")["score"].mean())
```
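A `pivot_solve_rate`-style table can also be built directly with pandas. The rows below are hypothetical and only mirror the shape of `eval_results_to_dicts` output (the field names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical result rows; real ones come from eval_results_to_dicts()
results = [
    {"agent": "agent-a", "task": "t1", "passed": True},
    {"agent": "agent-a", "task": "t2", "passed": False},
    {"agent": "agent-b", "task": "t1", "passed": True},
    {"agent": "agent-b", "task": "t2", "passed": True},
]
df = pd.DataFrame(results)
# Mean of booleans = solve rate per task x agent
pivot = df.pivot_table(index="task", columns="agent", values="passed", aggfunc="mean")
print(pivot.to_string())
```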
### Eval health and efficiency
```python
from weave_helpers import eval_health, eval_efficiency

health = eval_health(eval_calls)
df = pd.DataFrame(health)
print(df.to_string(index=False))

efficiency = eval_efficiency(eval_calls)
print(pd.DataFrame(efficiency).to_string(index=False))
```

### Token usage
```python
from weave_helpers import get_token_usage

usage = get_token_usage(call)
print(f"Tokens: {usage['total_tokens']} (in={usage['input_tokens']}, out={usage['output_tokens']})")
```

### Cost estimation
```python
call_with_costs = client.get_call("id", include_costs=True)
costs = call_with_costs.summary.get("weave", {}).get("costs", {})
```

### Run diagnostics
```python
from wandb_helpers import diagnose_run

run = api.run(f"{path}/run-id")
diag = diagnose_run(run)
for k, v in diag.items():
    print(f"  {k}: {v}")
```

### Error analysis — open coding to axial coding
For structured failure analysis on eval results:

- Understand data shape — use `project.summary()`, `calls.input_shape()`, `calls.output_shape()`
- Open coding — write a Weave Scorer that journals what went wrong per failing call
- Axial coding — write a second Scorer that classifies notes into a taxonomy
- Summarize — count primary labels with `collections.Counter`

See `references/WEAVE_API.md` for the full `run_scorer` API.
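The summarize step reduces to counting labels. A minimal sketch with hypothetical taxonomy labels (the label names are made up; real ones come from the axial-coding Scorer):

```python
from collections import Counter

# Hypothetical primary labels emitted by the axial-coding scorer
labels = ["tool_misuse", "hallucination", "tool_misuse",
          "timeout", "tool_misuse", "hallucination"]
counts = Counter(labels)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```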
### W&B Reports
```bash
uv pip install "wandb[workspaces]"
```

```python
from wandb.apis import reports as wr
import wandb_workspaces.expr as expr

report = wr.Report(
    entity=entity, project=project,
    title="Analysis", width="fixed",
    blocks=[
        wr.H1(text="Results"),
        wr.PanelGrid(
            runsets=[wr.Runset(entity=entity, project=project)],
            panels=[wr.LinePlot(title="Loss", x="_step", y=["loss"])],
        ),
    ],
)
```
uv pip install "wandb[workspaces]"python
from wandb.apis import reports as wr
import wandb_workspaces.expr as expr
report = wr.Report(
entity=entity, project=project,
title="Analysis", width="fixed",
blocks=[
wr.H1(text="Results"),
wr.PanelGrid(
runsets=[wr.Runset(entity=entity, project=project)],
panels=[wr.LinePlot(title="Loss", x="_step", y=["loss"])],
),
],
)report.save(draft=True) # only when asked to publish
report.save(draft=True) # only when asked to publish
Use `expr.Config("lr")`, `expr.Summary("loss")`, `expr.Tags().isin([...])` for runset filters — not dot-path strings.

---

## Gotchas
### Weave API
| Gotcha | Wrong | Right |
|---|---|---|
| weave.init args | keyword arg | positional string: `weave.init("entity/project")` |
| Parent filter | | |
| WeaveObject access | dict-style `obj["key"]` | `getattr(obj, "key")` |
| Nested output | | `unwrap(call.output)` |
| ObjectRef comparison | | |
| CallsFilter import | | |
| Query import | | |
| Eval status path | | |
| Eval success count | | |
| When in doubt | Guess the type | `unwrap()` it first |
### WeaveDict vs WeaveObject

- WeaveDict: dict-like, supports `[]`, `.get()`, `.keys()`. Used for: `call.inputs`, `call.output` dicts, `scores`
- WeaveObject: attribute-based, use `getattr()`. Used for: scorer results (rubric), dataset rows
- When in doubt: use `unwrap()` to convert everything to plain Python
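To make the conversion concrete, here is a minimal sketch of what unwrapping does, using a fake wrapper class in place of real Weave types (the actual `unwrap` in `weave_helpers` handles more cases, e.g. refs):

```python
import json

class FakeWeaveObject:
    """Stand-in for an attribute-based Weave wrapper type."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)

def unwrap_sketch(value):
    # Recursively convert wrappers and containers to plain Python
    if isinstance(value, FakeWeaveObject):
        return {k: unwrap_sketch(v) for k, v in value.__dict__.items()}
    if isinstance(value, dict):
        return {k: unwrap_sketch(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_sketch(v) for v in value]
    return value

nested = FakeWeaveObject(score=0.9, rubric=FakeWeaveObject(passed=True))
plain = unwrap_sketch(nested)
print(json.dumps(plain))  # plain dicts now serialize cleanly
```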
### W&B API
| Gotcha | Wrong | Right |
|---|---|---|
| Summary access | | `run.summary_metrics.get("loss")` |
| Loading all runs | `list(api.runs(path))` | slice: `runs[:100]` |
| History — all fields | `run.history()` | `run.history(keys=[...])` |
| scan_history — no keys | `run.scan_history()` | `run.scan_history(keys=["loss"])` |
| Raw data in context | print raw rows | Load into DataFrame, compute stats |
| Metric at step N | iterate entire history | |
| Cache staleness | reading live run | |
### Package management
| Gotcha | Wrong | Right |
|---|---|---|
| Installing packages | `pip install X` | `uv pip install X` |
| Running scripts | `python script.py` | `uv run script.py` |
| Quick one-off | | `uv run --with rich python -c "..."` |
### Weave logging noise

Weave prints version warnings to stderr. Suppress with:

```python
import logging
logging.getLogger("weave").setLevel(logging.ERROR)
```

## Quick reference
```python
# --- Weave: How many traces? ---
from weave_tools.weave_api import init, Project
init(f"{entity}/{project}")
project = Project.current()
print(project.summary())

# --- Weave: Recent evals ---
evals = project.evals(limit=10)
for ev in evals:
    print(ev.summarize())

# --- Weave: Failed calls ---
calls = project.calls(op="predict")
failed = calls.limit(1000).filter(lambda c: c.status == "error")

# --- W&B: Best run by loss ---
best = api.runs(path, filters={"state": "finished"}, order="+summary_metrics.loss")[:1]
print(f"Best: {best[0].name}, loss={best[0].summary_metrics.get('loss')}")

# --- W&B: Loss curve to numpy ---
losses = np.array([r["loss"] for r in run.scan_history(keys=["loss"])])
print(f"min={losses.min():.6f}, final={losses[-1]:.6f}, steps={len(losses)}")

# --- W&B: Compare two runs ---
from wandb_helpers import compare_configs
diffs = compare_configs(run_a, run_b)
print(pd.DataFrame(diffs).to_string(index=False))
```