section-mapper
Section Mapper
Create a paper→subsection map that supports evidence building and later synthesis.
Good mapping is diverse (avoids reusing the same paper everywhere) and explainable (short semantic “why”, not just keyword overlap).
When to use
- You have `papers/core_set.csv` and `outline/outline.yml` and need coverage per subsection.
- You want to identify weak-signal subsections early (so you can adjust scope or add papers).
Inputs
- `papers/core_set.csv`
- `outline/outline.yml`
Outputs
- `outline/mapping.tsv`
- `outline/mapping_report.md` (diagnostics: reuse hotspots, weak-signal subsections)
Freeze marker (explicit)
To prevent accidental overwrites after you refine mapping rationales:
- Create `outline/mapping.refined.ok`.
If you rerun the script without this marker, it backs up the previous mapping to a timestamped file:
`outline/mapping.tsv.bak.<timestamp>`
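The freeze-marker behavior described above can be sketched roughly as follows. This is a minimal illustration, not the helper script's actual code: the function name `backup_unless_frozen`, its return labels, and the timestamp format are all assumptions, and it assumes that the presence of the marker means the mapping is left untouched.

```python
import shutil
import time
from pathlib import Path


def backup_unless_frozen(workspace: Path) -> str:
    """Hypothetical helper: back up outline/mapping.tsv unless the marker exists."""
    outline = workspace / "outline"
    mapping = outline / "mapping.tsv"
    marker = outline / "mapping.refined.ok"

    if marker.exists():
        # Marker present: treat the refined mapping as frozen and leave it alone.
        return "frozen"
    if not mapping.exists():
        # First run: there is no previous mapping to protect.
        return "nothing_to_back_up"

    # Assumed timestamp format; the real script may use a different one.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    shutil.copy2(mapping, outline / f"mapping.tsv.bak.{stamp}")
    return "backed_up"
```

In this sketch a caller would regenerate `outline/mapping.tsv` only when the result is not `"frozen"`.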
Workflow (heuristic)
- Start from the outline subsections (each subsection should be “mappable”).
- For each subsection, pick enough papers to support evidence-first writing (A150++ default: 28; smaller runs: ~12–20; lightweight: ~3–6) that are:
  - representative (canonical / frequently cited)
  - complementary (different design choices, different eval setups)
  - not overly reused elsewhere unless truly foundational
- Fill `why` with a short semantic rationale (one line is enough), e.g.:
  - mechanism: “decouples planner/executor; tool calling API”
  - evaluation: “interactive web tasks; strong tool error analysis”
  - safety: “agentic jailbreak surface; mitigation study”
- After initial mapping, scan for:
  - subsections with <3 papers → either broaden, merge, or expand retrieval
  - a few papers mapped everywhere → diversify; reserve “foundational” papers for only the truly relevant parts
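The final scan step can be approximated with a few lines of Python. The sketch below assumes `outline/mapping.tsv` has a header row with columns named `section_id`, `paper_id`, and `why`; only `why` is named in this document, so the other column names, the function name `scan_mapping`, and the `reuse_threshold` default are hypothetical.

```python
import csv
import io
from collections import defaultdict


def scan_mapping(tsv_text: str, min_papers: int = 3, reuse_threshold: int = 6):
    """Return (weak_subsections, reuse_hotspots) for a mapping given as TSV text.

    Column names `section_id` and `paper_id` are assumptions for illustration.
    """
    papers_by_section = defaultdict(set)
    sections_by_paper = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        papers_by_section[row["section_id"]].add(row["paper_id"])
        sections_by_paper[row["paper_id"]].add(row["section_id"])

    # Subsections with fewer than `min_papers` distinct papers are weak-signal.
    weak = sorted(s for s, p in papers_by_section.items() if len(p) < min_papers)
    # Papers mapped to at least `reuse_threshold` subsections are reuse hotspots.
    hotspots = sorted(p for p, s in sections_by_paper.items() if len(s) >= reuse_threshold)
    return weak, hotspots
```

Typical use would be `weak, hotspots = scan_mapping(Path("outline/mapping.tsv").read_text(encoding="utf-8"))`. Subsections with no rows at all never appear in the TSV, so they would have to be found by comparing against `outline/outline.yml`.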
Quality checklist
- `outline/mapping.tsv` exists and is non-empty.
- Most subsections have ≥3 mapped papers (or a clear exception noted in `why`).
- `why` is semantic (not just `matched_terms=...`).
- No single paper dominates unrelated subsections.
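One way to spot non-semantic rationales is a simple pattern check on the `why` value. The sketch below is an assumption about what "generic" means in practice: it flags empty values and values that begin with `matched_terms=`. The function name `is_generic_why` is hypothetical, and the real quality gate may apply different rules.

```python
import re

# Assumed marker of an auto-generated rationale, based on the checklist above.
GENERIC_PREFIX = re.compile(r"^matched_terms\s*=", re.IGNORECASE)


def is_generic_why(why: str) -> bool:
    """Return True when a rationale is empty or only lists matched terms."""
    text = why.strip()
    return not text or bool(GENERIC_PREFIX.match(text))
```

A rationale such as “mechanism: decouples planner/executor; tool calling API” passes this check, while `matched_terms=agent,tool` does not.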
Helper script (optional)
Quick Start
- `python .codex/skills/section-mapper/scripts/run.py --help`
- `python .codex/skills/section-mapper/scripts/run.py --workspace <workspace_dir> --per-subsection 28`
All Options
- `--per-subsection <n>`: target mapped papers per subsection
- `--diversity-penalty <float>`: penalize repeated reuse of the same paper across many subsections
- `--soft-limit <n>` / `--hard-limit <n>`: caps for per-paper reuse (0 = auto)
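To build intuition for how these options might interact, here is a toy greedy selector. It is not the script's algorithm, which this document does not describe: the scoring rule, the extra penalty at the soft limit, and the function name `assign_papers` are all assumptions made for illustration.

```python
from collections import Counter


def assign_papers(scores, per_subsection, diversity_penalty=0.25,
                  soft_limit=6, hard_limit=10):
    """Toy selector. `scores` maps section -> {paper: relevance}.

    Assumed rules: each prior use of a paper costs `diversity_penalty`,
    reaching `soft_limit` uses costs an extra 1.0, and a paper that has
    reached `hard_limit` uses is no longer eligible.
    """
    reuse = Counter()
    mapping = {}
    for section, candidates in scores.items():
        eligible = [p for p in candidates if reuse[p] < hard_limit]

        def adjusted(paper, candidates=candidates):
            soft_cost = 1.0 if reuse[paper] >= soft_limit else 0.0
            return candidates[paper] - diversity_penalty * reuse[paper] - soft_cost

        # Highest adjusted score first; ties broken by paper id for determinism.
        chosen = sorted(eligible, key=lambda p: (-adjusted(p), p))[:per_subsection]
        for paper in chosen:
            reuse[paper] += 1
        mapping[section] = chosen
    return mapping
```

In this sketch, raising `diversity_penalty` makes a paper that was already chosen less attractive for later subsections, while `hard_limit` removes it from consideration entirely once the cap is reached.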
Examples
- Higher diversity (reduce over-reuse):
python .codex/skills/section-mapper/scripts/run.py --workspace <ws> --per-subsection 4 --diversity-penalty 0.25
- Tighter reuse caps:
python .codex/skills/section-mapper/scripts/run.py --workspace <ws> --per-subsection 3 --soft-limit 6 --hard-limit 10
Notes
- Writes `outline/mapping_report.md` diagnostics.
- In `pipeline.py --strict`, mapping may be blocked until generic `why` rationales are replaced with semantic ones.
Troubleshooting
Common Issues
Issue: `outline/mapping.tsv` is empty or low-coverage
Symptom:
- Mapping has few rows, or many subsections have <3 papers.
Causes:
- Core set is too small or outline is too fine-grained.
Solutions:
- Increase core set size (rerun `dedupe-rank` with larger `--core-size`).
- Merge weak-signal subsections or broaden the scope/queries.
Issue: Mapping over-reuses the same papers
Symptom:
- Quality gate reports repeated papers across many unrelated subsections.
Causes:
- Diversity penalty too low; limited core set.
Solutions:
- Raise `--diversity-penalty` and/or set tighter `--soft-limit`/`--hard-limit`.
- Manually diversify mappings for unrelated sections.
Recovery Checklist
- Each subsection has ≥3 mapped papers (target).
- `why` column contains semantic rationale (not just token overlap).