section-mapper

Section Mapper

Create a paper→subsection map that supports evidence building and later synthesis.

Good mapping is diverse (avoids reusing the same paper everywhere) and explainable (short semantic “why”, not just keyword overlap).

When to use

- You have `outline/outline.yml` and `papers/core_set.csv`, and need coverage per subsection.
- You want to identify weak-signal subsections early (so you can adjust scope or add papers).

Inputs

- `papers/core_set.csv`
- `outline/outline.yml`

Outputs

- `outline/mapping.tsv`
- `outline/mapping_report.md` (diagnostics: reuse hotspots, weak-signal subsections)

Freeze marker (explicit)

To prevent accidental overwrites after you refine mapping rationales, create `outline/mapping.refined.ok`.

If you rerun the script without this marker, it will back up the previous mapping to a timestamped file:

```
outline/mapping.tsv.bak.<timestamp>
```
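
The marker and backup behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual `run.py` logic: the function name `prepare_mapping_write`, the timestamp format, and the assumption that a present marker makes the script leave `mapping.tsv` untouched are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def prepare_mapping_write(workspace: Path) -> bool:
    """Return True if a fresh outline/mapping.tsv may be written.

    Hypothetical helper: the real script may behave differently.
    """
    outline = workspace / "outline"
    mapping = outline / "mapping.tsv"
    marker = outline / "mapping.refined.ok"

    if marker.exists():
        # Assumption: the marker freezes the refined mapping, so do nothing.
        return False

    if mapping.exists():
        # No marker: keep a timestamped backup before overwriting.
        # The exact timestamp format is a guess.
        stamp = time.strftime("%Y%m%d-%H%M%S")
        shutil.copy2(mapping, mapping.with_name(f"mapping.tsv.bak.{stamp}"))
    return True
```

Under these assumptions, creating the marker with `touch outline/mapping.refined.ok` inside the workspace is enough to protect hand-edited `why` rationales from a later rerun.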

Workflow (heuristic)

Start from the outline subsections (each subsection should be “mappable”).
For each subsection, pick enough papers to support evidence-first writing (A150++ default: 28; smaller runs: ~12–20; lightweight: ~3–6) that are:
- representative (canonical / frequently-cited)
- complementary (different design choices, different eval setups)
- not overly reused elsewhere unless truly foundational
Fill `why` with a short semantic rationale (one line is enough), e.g.:
- mechanism: “decouples planner/executor; tool calling API”
- evaluation: “interactive web tasks; strong tool error analysis”
- safety: “agentic jailbreak surface; mitigation study”
After initial mapping, scan for:
- subsections with <3 papers → either broaden, merge, or expand retrieval
- a few papers mapped everywhere → diversify; reserve “foundational” papers for only the truly relevant parts
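
The post-mapping scan can be approximated with a few lines of Python. This is only a sketch: the column names `section_id` and `paper_id` and the `reuse_threshold` parameter are assumptions, since the actual `outline/mapping.tsv` schema is not specified here.

```python
import csv
import io
from collections import Counter, defaultdict


def scan_mapping(tsv_text, min_papers=3, reuse_threshold=5):
    """Return (weak_subsections, heavily_reused_papers) from TSV text.

    Assumes a header row with hypothetical columns section_id and paper_id.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    papers_by_section = defaultdict(set)
    for row in rows:
        papers_by_section[row["section_id"]].add(row["paper_id"])

    # Count in how many distinct subsections each paper appears.
    reuse = Counter(p for papers in papers_by_section.values() for p in papers)

    weak = sorted(s for s, papers in papers_by_section.items()
                  if len(papers) < min_papers)
    hot = sorted(p for p, n in reuse.items() if n >= reuse_threshold)
    return weak, hot
```

Note that a subsection with no mapped papers at all never appears in the TSV, so a complete check would also compare against the subsection list in `outline/outline.yml`.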

Quality checklist

- `outline/mapping.tsv` exists and is non-empty.
- Most subsections have ≥3 mapped papers (or a clear exception noted in `why`).
- `why` is semantic (not just `matched_terms=...`).
- No single paper dominates unrelated subsections.
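
As a rough illustration of the third check, a rationale can be flagged as generic when it is empty or is nothing more than a `matched_terms=...` string. The helper name `is_generic_why` is hypothetical, and the real quality gate may apply stricter rules.

```python
def is_generic_why(why: str) -> bool:
    """Flag rationales that are empty or only a keyword-overlap record.

    Hypothetical heuristic: the prefix test mirrors the
    `matched_terms=...` pattern mentioned in the checklist.
    """
    text = why.strip()
    return not text or text.startswith("matched_terms=")
```

A one-line rationale such as "decouples planner/executor; tool calling API" passes this check, while `matched_terms=agent,tool` does not.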

Helper script (optional)

Quick Start

python .codex/skills/section-mapper/scripts/run.py --help

python .codex/skills/section-mapper/scripts/run.py --workspace <workspace_dir> --per-subsection 28

All Options

- `--per-subsection <n>`: target mapped papers per subsection
- `--diversity-penalty <float>`: penalize repeated reuse of the same paper across many subsections
- `--soft-limit <n>` / `--hard-limit <n>`: caps for per-paper reuse (0 = auto)
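
To give a feel for how these options might interact, here is a small greedy selection sketch. It is not the script's actual scoring: the relevance `scores`, the linear penalty, and the extra deduction past the soft limit are all illustrative assumptions, and `pick_papers` is a hypothetical name.

```python
def pick_papers(scores, reuse_counts, k,
                diversity_penalty=0.25, soft_limit=6, hard_limit=10):
    """Pick up to k papers for one subsection.

    scores: paper_id -> relevance to this subsection (hypothetical input)
    reuse_counts: paper_id -> subsections already using it (updated in place)
    """
    def adjusted(pid):
        used = reuse_counts.get(pid, 0)
        score = scores[pid] - diversity_penalty * used
        if used >= soft_limit:
            score -= 1.0  # illustrative extra discouragement past the soft cap
        return score

    # Papers at the hard cap are never eligible.
    candidates = [p for p in scores if reuse_counts.get(p, 0) < hard_limit]
    # Highest adjusted score first; ties broken by paper id for determinism.
    chosen = sorted(candidates, key=lambda p: (-adjusted(p), p))[:k]
    for pid in chosen:
        reuse_counts[pid] = reuse_counts.get(pid, 0) + 1
    return chosen
```

In this sketch a higher `diversity_penalty` lets a slightly less relevant but unused paper outrank one that is already mapped to several subsections, which is the effect the option description aims for.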

Examples

Higher diversity (reduce over-reuse):

python .codex/skills/section-mapper/scripts/run.py --workspace <ws> --per-subsection 4 --diversity-penalty 0.25

Tighter reuse caps:

python .codex/skills/section-mapper/scripts/run.py --workspace <ws> --per-subsection 3 --soft-limit 6 --hard-limit 10

Notes

- Writes `outline/mapping_report.md` diagnostics.
- In `pipeline.py --strict`, mapping may be blocked until generic `why` rationales are replaced with semantic ones.

Troubleshooting

Common Issues

Issue: `outline/mapping.tsv` is empty or low-coverage

Symptom:

Mapping has few rows, or many subsections have <3 papers.

Causes:

Core set is too small or outline is too fine-grained.

Solutions:

- Increase core set size (rerun `dedupe-rank` with a larger `--core-size`).
- Merge weak-signal subsections or broaden the scope/queries.

Issue: Mapping over-reuses the same papers

Symptom:

Quality gate reports repeated papers across many unrelated subsections.

Causes:

Diversity penalty too low; limited core set.

Solutions:

- Raise `--diversity-penalty` and/or set tighter `--soft-limit` / `--hard-limit`.
- Manually diversify mappings for unrelated sections.

Recovery Checklist

- Each subsection has ≥3 mapped papers (target).
- `why` column contains semantic rationale (not just token overlap).
