paragraph-curator

Paragraph Curator (select -> evaluate -> subset -> fuse)

Purpose: turn “keep rewriting and getting longer” into a controlled convergence step.
This skill adds a decision layer between “draft paragraphs” and “polish voice”:
  • keep the best paragraphs
  • merge redundant ones
  • rewrite for clearer argument moves
  • expand only when coverage is missing (using existing evidence cards)
This is a content-structure pass (not a style pass). Run style-harmonizer and opener-variator after curation.

Inputs

Required:
  • sections/ (especially H3 bodies: sections/S<sub_id>.md)
  • outline/writer_context_packs.jsonl (what each H3 must cover + allowed citations)
  • output/ARGUMENT_SKELETON.md (single source of truth for terminology + premises)
Recommended:
  • output/SECTION_ARGUMENT_SUMMARIES.jsonl (paragraph moves + outputs)
  • output/SECTION_LOGIC_REPORT.md (paragraph linkage risks)
  • output/WRITER_SELFLOOP_TODO.md (style smells / scope/citation warnings)
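Before curating, it can help to confirm that the inputs listed above exist. The following is a minimal Python sketch, assuming all paths are relative to a pipeline root directory and that H3 bodies match sections/S*.md; the function name check_inputs is hypothetical and not part of the skill.

```python
from pathlib import Path

# Paths copied from the Inputs list; matching H3 bodies with the "S*.md"
# glob is an assumption about the sections/S<sub_id>.md naming.
REQUIRED_FILES = [
    "outline/writer_context_packs.jsonl",
    "output/ARGUMENT_SKELETON.md",
]
RECOMMENDED_FILES = [
    "output/SECTION_ARGUMENT_SUMMARIES.jsonl",
    "output/SECTION_LOGIC_REPORT.md",
    "output/WRITER_SELFLOOP_TODO.md",
]


def check_inputs(root: Path) -> tuple[list[str], list[str]]:
    """Return (missing_required, missing_recommended), relative to root."""
    missing_required = [p for p in REQUIRED_FILES if not (root / p).is_file()]
    sections = root / "sections"
    # Check is_dir() first so a missing sections/ folder is reported, not raised.
    if not (sections.is_dir() and any(sections.glob("S*.md"))):
        missing_required.append("sections/S<sub_id>.md")
    missing_recommended = [p for p in RECOMMENDED_FILES if not (root / p).is_file()]
    return missing_required, missing_recommended
```

One reasonable policy (an assumption, not stated by the skill): stop if anything required is missing, and proceed without the recommended files by building the paragraph inventory from the text alone.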

Outputs

  • Updated sections/*.md (same filenames; body-only; no headings)
  • output/PARAGRAPH_CURATION_REPORT.md (short; PASS/FAIL + what changed)
  • Create sections/paragraphs_curated.refined.ok when done (empty file; pipeline contract signal)
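A minimal sketch of producing the report and the marker, assuming a one-line-per-H3 report layout (the skill only asks for a short report with PASS/FAIL and what changed). write_curation_outputs and its per_h3 dictionary are hypothetical; writing the marker only on PASS, and removing a stale one on FAIL, is also an assumption about the pipeline contract.

```python
from pathlib import Path


def write_curation_outputs(root: Path, status: str, per_h3: dict) -> None:
    """Write the curation report and, on PASS, the empty .ok marker.

    per_h3 maps a hypothetical sub_id (e.g. "S3_1") to a dict with keys
    "before", "after" and optional "fused", "gaps", "candidates".
    """
    if status not in ("PASS", "FAIL"):
        raise ValueError("status must be PASS or FAIL")
    lines = [f"- Status: {status}"]
    for sub_id in sorted(per_h3):
        info = per_h3[sub_id]
        lines.append(
            f"- {sub_id}: {info['before']} -> {info['after']} paragraphs; "
            f"fused: {info.get('fused') or 'none'}; "
            f"gaps: {info.get('gaps') or 'none'}; "
            f"candidates tried: {info.get('candidates', 'n/a')}"
        )
    report = root / "output" / "PARAGRAPH_CURATION_REPORT.md"
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Assumption: only a passing run should emit the contract signal.
    marker = root / "sections" / "paragraphs_curated.refined.ok"
    if status == "PASS":
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.touch()  # empty file; its existence is the signal
    else:
        marker.unlink(missing_ok=True)  # drop a stale marker from an earlier run
```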

What this skill optimizes (rubric)

You are not trying to “shorten”. You are trying to increase information density while keeping the section verifiable.
Score each paragraph on a simple 0-2 rubric:

| Criterion | 0 (bad) | 1 (ok) | 2 (good) |
| --- | --- | --- | --- |
| Coverage | does not match any required axis/card | matches one axis, but thin | directly executes a must-use card/comparison |
| Novelty | repeats nearby content | partially redundant | adds a distinct comparison/insight |
| Move clarity | unclear what it does | move exists, weak output | clear move + reusable output |
| Consistency | premise/term drift vs. skeleton | minor mismatch | fully aligned with the Consistency Contract |
| Citation hygiene | uncited when it should be cited; cite-dump vibe | acceptable | citations are local and anchored (not just at the tail) |
| Fusion readiness | cannot merge; tangled | mergeable with edits | clean unit that can be fused or kept |

Decision labels:
  • KEEP: keep mostly as-is
  • REWRITE: keep content, rewrite for clearer move/output
  • FUSE: merge with neighbor(s) and rewrite into one stronger paragraph
  • REPLACE: keep the slot, but rewrite using existing evidence cards (when coverage is missing)
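The rubric can be recorded per paragraph and turned into a first-pass label. This is an illustrative Python sketch: the criterion keys, the order of the checks, and the total >= 10 cut-off for KEEP are assumptions, not rules from the skill, and the final label remains an editorial decision.

```python
from dataclasses import dataclass

CRITERIA = ("coverage", "novelty", "move_clarity", "consistency",
            "citation_hygiene", "fusion_readiness")


@dataclass
class ParagraphScore:
    pid: str       # e.g. "P3"
    scores: dict   # criterion -> 0 (bad), 1 (ok) or 2 (good)

    def __post_init__(self):
        for c in CRITERIA:
            if self.scores.get(c) not in (0, 1, 2):
                raise ValueError(f"{self.pid}: {c} must be scored 0-2")

    def total(self) -> int:
        return sum(self.scores[c] for c in CRITERIA)


def suggest_label(p: ParagraphScore, repeats_neighbor_axis: bool = False) -> str:
    """First-pass label; check order and the KEEP cut-off are illustrative."""
    s = p.scores
    if s["coverage"] == 0:
        return "REPLACE"  # keep the slot, rewrite from existing evidence cards
    if repeats_neighbor_axis or s["novelty"] == 0:
        return "FUSE"     # merge redundancy instead of deleting it
    if all(s[c] >= 1 for c in CRITERIA) and p.total() >= 10:
        return "KEEP"
    return "REWRITE"
```

Checking coverage first mirrors the REPLACE definition (coverage missing), and sending redundancy to FUSE rather than deletion follows the skill's preference for fusing over deleting.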

Paragraph budget (profile-aware)

Default per-H3 target:
  • draft_profile=survey: 10-12 paragraphs
  • draft_profile=deep: 11-13 paragraphs
If you exceed the budget, do not delete content blindly. Prefer FUSE (merge redundancy) and make the fused paragraph denser.
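A quick budget check might look like the following sketch. It assumes paragraphs in a body-only section file are separated by blank lines; the "under budget" hint is an assumption, since the skill only says what to do when a section exceeds its budget.

```python
import re

# Budgets from the section above: (min, max) paragraphs per H3.
BUDGETS = {"survey": (10, 12), "deep": (11, 13)}


def count_paragraphs(body: str) -> int:
    """Count blank-line-separated paragraphs in a body-only section file."""
    return sum(1 for block in re.split(r"\n\s*\n", body) if block.strip())


def budget_status(draft_profile: str, n_paragraphs: int) -> str:
    low, high = BUDGETS[draft_profile]
    if n_paragraphs > high:
        return f"over by {n_paragraphs - high}: FUSE redundant paragraphs"
    if n_paragraphs < low:
        return f"under by {low - n_paragraphs}: check must-have coverage"
    return "within budget"
```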

Must-have coverage checklist (per H3)

Each H3 must contain at least:
  • 1x Definition/Setup paragraph (only if this H3 introduces a new term/protocol field)
  • 2x concrete Contrast paragraphs (A-vs-B comparisons; not just "many papers do...")
  • 1x Evaluation anchor paragraph (task + metric + constraint/budget/tool access; cite-backed)
  • 1x cross-paper Synthesis paragraph (what generalizes, what does not; cite-backed)
  • 1x Boundary/Failure paragraph (limitations; threats to validity; cite-backed when possible)
  • 1x Local conclusion (a reusable takeaway used downstream)
If any item is missing, use REPLACE to write that paragraph from the writer context pack (do not invent new facts).
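The checklist can be expressed as minimum counts per move. In this sketch the move labels (contrast, evaluation_anchor, and so on) are hypothetical tags you would assign during the inventory step; a paragraph tagged with two moves counts toward both, which is a lenient assumption.

```python
from collections import Counter

# Minimum counts from the checklist; the snake_case labels are hypothetical.
REQUIRED_MOVES = {
    "contrast": 2,
    "evaluation_anchor": 1,
    "synthesis": 1,
    "boundary_failure": 1,
    "local_conclusion": 1,
}


def missing_moves(paragraph_moves: list, introduces_new_term: bool) -> dict:
    """Return {move: how many more paragraphs are needed} for one H3."""
    # A paragraph counts at most once per move, even if it is tagged twice.
    counts = Counter(m for moves in paragraph_moves for m in set(moves))
    required = dict(REQUIRED_MOVES)
    if introduces_new_term:
        required["definition_setup"] = 1
    return {m: n - counts[m] for m, n in required.items() if counts[m] < n}
```

Every key in the result is a slot to fill with REPLACE from the writer context pack.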

Workflow (minimal)

  1. Pick the target set
  • Start with the H3 bodies listed in output/SECTION_LOGIC_REPORT.md, plus any H3 flagged in output/WRITER_SELFLOOP_TODO.md as repetitive/template-y, plus any H3 that keeps growing across edits.
  • Work file-by-file: each target is a concrete sections/S<sub_id>.md.
  2. Build a paragraph inventory (scratch only; do not paste into the paper)
  • If output/SECTION_ARGUMENT_SUMMARIES.jsonl exists, use its per-paragraph moves/output as the first draft of your inventory, then reconcile it with the actual text.
  • For each paragraph, write one line:
    P<i> :: move(s) -> output (1 sentence) :: citations (keys)
  3. Apply the rubric and label each paragraph
  • Mark each paragraph KEEP/REWRITE/FUSE/REPLACE.
  • If two adjacent paragraphs repeat the same axis, mark them FUSE.
  • For any paragraph you plan to change (REWRITE/REPLACE/FUSE), draft 2-3 candidate rewrites in parallel (different angles: contrast-first / protocol-first / synthesis-first).
    • Score the candidates quickly with the rubric; keep one winner (or fuse two if they cover complementary axes).
    • Keep citation keys unchanged while sampling; you are choosing surface form + structure, not changing the evidence set.
  4. Construct the curated set
  • Use outline/writer_context_packs.jsonl to enforce must-have coverage (paragraph_plan/must_use/comparison_cards/limitation_hooks) without inventing new content.
  • Enforce the must-have coverage checklist.
  • Enforce the paragraph budget by fusing redundancy rather than deleting substance.
  5. Fuse + rewrite (keep citation keys fixed)
  Rules that keep the pipeline stable:
  • Do not add/remove citation keys; when fusing, carry citations forward and re-anchor them to the right sentence.
  • Do not move citations across subsections.
  • Avoid adjacent citation blocks (e.g., [@a] [@b]) and duplicate keys in one block (e.g., [@a; @a]).
  • When fusing, it is often faster to write two fused candidates (one contrast-heavy, one synthesis-heavy) and pick the better one.
  6. Write the report + marker
  • output/PARAGRAPH_CURATION_REPORT.md should be short and actionable:
    • a status line: - Status: PASS|FAIL
    • per H3: paragraph count before/after; what was fused; any remaining gaps
    • (minimal) how many candidates you tried for the main rewrites (e.g., 2-3), so future passes can see whether this was a real selection step
  • Create sections/paragraphs_curated.refined.ok.
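Step 2 of the workflow can be bootstrapped with a small script. This is a sketch under stated assumptions: each line of output/SECTION_ARGUMENT_SUMMARIES.jsonl is taken to hold one H3 with a sub_id field and a paragraphs list whose items carry moves and output (only moves and output are named by the skill), paragraphs are separated by blank lines, and citations use Pandoc-style [@key; @key] blocks. All function names are hypothetical.

```python
import json
import re
from typing import Optional

CITE_BLOCK = re.compile(r"\[(@[^\[\]]+)\]")


def cite_keys(text: str) -> list:
    """Keys from Pandoc-style blocks such as [@a] or [@b; @c]."""
    keys = []
    for block in CITE_BLOCK.findall(text):
        keys += [k.strip().lstrip("@") for k in block.split(";") if k.strip()]
    return keys


def load_summary(path: str, sub_id: str) -> Optional[dict]:
    """Find the record for one H3 (assumes a "sub_id" field per JSONL line)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                if record.get("sub_id") == sub_id:
                    return record
    return None


def inventory_lines(body: str, summary: Optional[dict] = None) -> list:
    """One 'P<i> :: move(s) -> output :: citations' line per paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    planned = (summary or {}).get("paragraphs", [])  # assumed field name
    lines = []
    for i, text in enumerate(paragraphs, start=1):
        info = planned[i - 1] if i <= len(planned) else {}
        moves = "/".join(info.get("moves", [])) or "?"
        output = info.get("output") or "?"
        # Citations always come from the actual text, not the summary.
        keys = ", ".join(cite_keys(text)) or "-"
        lines.append(f"P{i} :: {moves} -> {output} :: {keys}")
    return lines
```

A "?" in the output marks a paragraph the summary did not describe, which is where reconciling with the actual text starts.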

Routing rules

  • If you cannot fill a missing must-have paragraph without new evidence: stop and route upstream (evidence-selfloop / C3-C4). Do not pad.
  • If you feel forced to change a definition or evaluation premise: first update the "# Consistency Contract" section of output/ARGUMENT_SKELETON.md, then rerun argument-selfloop.
  • If the only issue is surface cadence/openers: do not overwork curation; run style-harmonizer / opener-variator.
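The routing rules can be read as a short decision procedure. The sketch below is illustrative only; the boolean inputs, the function name next_step, and the precedence order (evidence gaps first, then premise changes, then surface issues) are assumptions.

```python
def next_step(needs_new_evidence: bool, premise_must_change: bool,
              only_surface_issues: bool) -> str:
    """Map the routing rules to an action (rule precedence is an assumption)."""
    if needs_new_evidence:
        return "stop: route upstream to evidence-selfloop / C3-C4 (do not pad)"
    if premise_must_change:
        return ("update the Consistency Contract in output/ARGUMENT_SKELETON.md, "
                "then rerun argument-selfloop")
    if only_surface_issues:
        return "skip deep curation: run style-harmonizer / opener-variator"
    return "continue curation"
```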

Done checklist

  • Each targeted H3 stays within its paragraph budget (survey 10-12; deep 11-13) without losing required moves.
  • Redundant paragraphs are fused into denser, clearer ones (not just deleted).
  • No citation keys were added/removed; citation shape is reader-facing (no adjacent blocks, no dup keys).
  • output/PARAGRAPH_CURATION_REPORT.md exists and is understandable.
  • sections/paragraphs_curated.refined.ok exists.
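The citation items on this checklist are mechanical enough to verify with a script. This sketch compares one section body before and after curation, assuming Pandoc-style blocks such as [@a] or [@a; @b]; the function names are hypothetical. Keys are compared as sets, so a fused paragraph that cites a key once instead of twice is not flagged.

```python
import re

CITE_BLOCK = re.compile(r"\[(@[^\[\]]+)\]")
ADJACENT_BLOCKS = re.compile(r"\[@[^\[\]]+\]\s*\[@")


def block_keys(block: str) -> list:
    return [k.strip().lstrip("@") for k in block.split(";") if k.strip()]


def all_keys(text: str) -> set:
    return {k for b in CITE_BLOCK.findall(text) for k in block_keys(b)}


def citation_problems(before: str, after: str) -> list:
    """Compare one sections/S<sub_id>.md body before and after curation."""
    problems = []
    old, new = all_keys(before), all_keys(after)
    if new - old:
        problems.append(f"added keys: {sorted(new - old)}")
    if old - new:
        problems.append(f"removed keys: {sorted(old - new)}")
    if ADJACENT_BLOCKS.search(after):
        problems.append("adjacent citation blocks")
    for b in CITE_BLOCK.findall(after):
        keys = block_keys(b)
        if len(keys) != len(set(keys)):
            problems.append(f"duplicate keys in [{b}]")
    return problems
```

Running it per file also catches citations moved across subsections: a key that moves from one section file to another shows up as removed in one and added in the other.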