draft-polisher

Draft Polisher (Audit-style editing)

Goal: turn a first-pass draft into readable survey prose without breaking the evidence contract.
This is a local polish pass: de-template + coherence + terminology + redundancy pruning.
Note: if the main issue is structural redundancy accumulated across sections, push the change upstream to sections/ and run paragraph-curator before merging. draft-polisher should not be the primary place where you decide which paragraphs to keep.

Role cards (use explicitly)

Style Harmonizer (editor)

Mission: remove generator voice and make prose read like one author wrote it.
Do:
  • Delete narration openers and slide navigation; replace with argument bridges.
  • Vary rhythm; remove repeated template stems.
  • Collapse repeated disclaimers into one front-matter methodology paragraph.
Avoid:
  • Adding or removing citation keys.
  • Moving citations across subsections.

Evidence Contract Guard (skeptic)

Mission: prevent polishing from inflating claims beyond evidence.
Do:
  • Keep quantitative statements scoped (task/metric/constraint) or weaken them.
  • Treat missing evidence as a failure signal; route upstream rather than rewriting around gaps.
Avoid:
  • Overconfident language when evidence is abstract-only.

Role prompt: Style Harmonizer (editor expert)

You are the style and coherence editor for a technical survey.

Your goal is to make the draft read like one careful author wrote it, without changing the evidence contract.

Hard constraints:
- do not add/remove citation keys
- do not move citations across ### subsections
- do not strengthen claims beyond what existing citations support

High-leverage edits:
- delete generator voice (This subsection..., Next we move..., We now turn...)
- replace navigation with argument bridges (content-bearing handoffs)
- collapse repeated disclaimers into one methodology paragraph in front matter
- keep quantitative statements well-scoped (task/metric/constraint in the same sentence)

Working style:
- rewrite sentences so they carry content, not process
- vary rhythm, but avoid “template stems” repeating across H3s

Inputs

  • output/DRAFT.md
  • Optional context (read-only; helps avoid “polish drift”):
    • outline/outline.yml
    • outline/subsection_briefs.jsonl
    • outline/evidence_drafts.jsonl
    • citations/ref.bib

Outputs

  • output/DRAFT.md (in-place refinement)
  • output/citation_anchors.prepolish.jsonl (baseline, generated by the script on its first run)

Non-negotiables (hard rules)

  1. Citation keys are immutable
  • Do not add new [@BibKey] keys.
  • Do not delete citation markers.
  • If citations/ref.bib exists, do not introduce any key that is not defined there.
  2. Citation anchoring is immutable
  • Do not move citations across ### subsections.
  • If you must restructure across subsections, stop, push the change upstream (outline/briefs/evidence), and regenerate.
  3. No evidence inflation
  • If a sentence sounds stronger than its evidence level supports (e.g., abstract-only evidence), rewrite it as a qualified statement.
  • When in doubt, check the subsection’s evidence pack in outline/evidence_drafts.jsonl and keep claims aligned to its snippets.
  4. Citation shape normalization
  • Merge adjacent citation blocks in the same sentence (avoid [@a] [@b]).
  • Deduplicate keys inside one block (avoid [@a; @a]).
  • Avoid tail-only citation dumps: keep some citations in the claim sentence itself (mid-sentence), not only at the paragraph end.
  5. Quantitative claim hygiene
  • If you keep a number, ensure the sentence also states (without guessing) the task type, the metric definition, and the relevant constraint (budget/cost/tool access), and that the citation is embedded in that sentence.
  • Avoid ambiguous model naming (e.g., “GPT-5”) unless the cited paper uses that exact label; otherwise use the paper’s naming or a neutral description.
  6. No pipeline voice
  • Remove scaffolding phrases like:
    • “We use the following working claim …”
    • “The main axes we track are …”
    • “abstracts are treated as verification targets …”
    • “Method note (evidence policy): …” (avoid labels; rewrite as plain survey methodology)
    • “this run is …” (rewrite as survey methodology: “This survey is …”)
    • “Scope and definitions / Design space / Evaluation practice …”
    • “Next, we move from …”
    • “We now turn to …”
    • “From <X> to <Y>, ...” (title narration; rewrite as an argument bridge)
    • “In the next section/subsection …”
    • “Therefore/As a result, survey synthesis/comparisons should …” (rewrite as literature-facing observation)
  • Also remove generator-like thesis openers that read like outline narration:
    • “This subsection surveys …”
    • “This subsection argues …”
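The key-immutability and citation-shape rules are mechanical enough to spot-check with a script. The sketch below is illustrative and is not the logic inside run.py. It assumes Pandoc-style citation blocks that hold bare keys (no page locators) and BibTeX entries that begin a line. The function names cited_keys, check_keys, and normalize_citations are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: Pandoc-style citation blocks such as [@a] or [@a; @b],
# containing bare keys only (no page locators or prefixes).
CITE_BLOCK = re.compile(r"\[@[^\]]+\]")
CITE_KEY = re.compile(r"@([\w:.\-]+)")
# Assumption: BibTeX entries start a line, e.g. "@article{Key2023,".
BIB_ENTRY = re.compile(r"^@\w+\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)
# Two citation blocks separated only by spaces or tabs.
ADJACENT = re.compile(r"\[(@[^\]]+)\][ \t]*\[(@[^\]]+)\]")


def cited_keys(text: str) -> set[str]:
    """Collect every key that appears inside a citation block."""
    keys: set[str] = set()
    for block in CITE_BLOCK.findall(text):
        keys.update(CITE_KEY.findall(block))
    return keys


def check_keys(before: str, after: str, bib: str | None = None) -> list[str]:
    """Report added keys, removed keys, and keys missing from ref.bib."""
    old, new = cited_keys(before), cited_keys(after)
    problems = [f"added key: {k}" for k in sorted(new - old)]
    problems += [f"removed key: {k}" for k in sorted(old - new)]
    if bib is not None:
        defined = set(BIB_ENTRY.findall(bib))
        problems += [f"undefined key: {k}" for k in sorted(new - defined)]
    return problems


def normalize_citations(text: str) -> str:
    """Merge adjacent blocks, then drop duplicate keys inside each block."""
    # Repeat until stable so "[@a] [@b] [@c]" collapses into one block.
    while ADJACENT.search(text):
        text = ADJACENT.sub(r"[\1; \2]", text)

    def dedupe(match: re.Match) -> str:
        keys = CITE_KEY.findall(match.group(0))
        unique = list(dict.fromkeys(keys))  # keeps first-seen order
        return "[" + "; ".join("@" + k for k in unique) + "]"

    return CITE_BLOCK.sub(dedupe, text)
```

After any polish edit, check_keys(original, polished, bib_text) should return an empty list. normalize_citations only reshapes blocks, so it never changes the key set that check_keys compares.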

Three passes (recommended)

Pass 1 — Subsection polish (structure + de-template)

Best-of-2 micro-polish (recommended):
  • For any sentence/paragraph you touch, draft 2 candidate rewrites, then keep the better one.
  • Choose with a simple rubric: more clarity, no template stem, citations stay anchored, and citation shape stays reader-facing (no adjacent cite blocks or duplicate keys).
  • Do not keep both candidates. Pick one and move on (the goal is convergence, not endless rewriting).
Role split:
  • Editor: rewrite sentences for clarity and flow.
  • Skeptic: delete any generic/template sentence.
Targets:
  • Each H3 reads like: tension → contrast → evidence → limitation.
  • Remove repeated “disclaimer paragraphs”; keep evidence-policy in one place (prefer a single paragraph in Introduction or Related Work phrased as survey methodology, not as pipeline/execution logs).
  • Use outline/outline.yml (if present) to avoid heading drift during edits.
  • If present, use outline/subsection_briefs.jsonl to keep each H3’s scope/RQ consistent while improving flow.
  • Do a quick “pattern sweep” (semantic, not mechanical):
    • delete outline narration: “This subsection ...”, “In this subsection ...”
    • delete slide navigation: “Next, we move from ...”, “We now turn to ...”, “In the next section ...”
    • delete title narration: “From <X> to <Y>, ...”
    • replace with: content claims + argument bridges + organization sentences (no new facts/citations)
  • If citation-injector was used, smooth any budget-injection sentences so they read like a paper:
    • Keep the citation keys unchanged.
    • Avoid list-injection stems (e.g., “A few representative references include …”, “Notable lines of work include …”, “Concrete examples ... include ...”).
    • Prefer integrating the added citations into an existing argument sentence, or rewrite them as a short parenthetical “e.g., ...” clause tied to the subsection’s lens (no new facts).
    • Vary phrasing; avoid repeating the same opener stem across many H3s.
  • Tone: keep it calm and academic; remove hype words and repeated opener labels (e.g., a literal “Key takeaway:” across many H3s).
  • Reduce repeated synthesis stems (e.g., many paragraphs starting with “Taken together, ...”); vary synthesis phrasing and keep it content-bearing.
    • Treat repeated "Taken together," as a generator-voice smell. If it appears more than twice (or clusters in one chapter), rewrite to vary phrasing and keep each synthesis sentence content-specific.
    • Vary synthesis openings: "In summary," "Across these studies," "The pattern that emerges," "A key insight," "Collectively," "The evidence suggests," or directly state the conclusion without a synthesis marker.
    • Each synthesis opening should be content-specific, not a template label.
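The pattern sweep is semantic, so a script can only shortlist lines for a human or model to rewrite. Here is a minimal sketch. It assumes each paragraph sits on one line of a Markdown draft, so a ^ anchor marks a paragraph opener. The STEMS table is a hypothetical starting list built from the phrases named above. The threshold of two mirrors the “more than twice” rule for “Taken together,”.

```python
from __future__ import annotations

import re

# Hypothetical shortlist built from the generator-voice phrases listed above.
STEMS = {
    "outline narration": re.compile(r"\b(?:This subsection|In this subsection)\b"),
    "slide navigation": re.compile(
        r"\b(?:Next, we move from|We now turn to|In the next (?:section|subsection))\b"
    ),
    "title narration": re.compile(r"^From .+? to .+?,"),
    "label opener": re.compile(r"^Key takeaway:"),
}
SYNTHESIS_STEM = "Taken together,"
MAX_SYNTHESIS_STEMS = 2  # the "more than twice" threshold


def sweep(draft: str) -> tuple[list[tuple[int, str, str]], int]:
    """Return (line number, pattern name, line) hits and the 'Taken together,' count."""
    hits = []
    for lineno, line in enumerate(draft.splitlines(), start=1):
        text = line.strip()
        for name, pattern in STEMS.items():
            if pattern.search(text):
                hits.append((lineno, name, text))
    return hits, draft.count(SYNTHESIS_STEM)
```

Treat each hit as a prompt for a content-bearing rewrite, not as an automatic deletion. A count above MAX_SYNTHESIS_STEMS means the synthesis openers need varying.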
Rewrite recipe for subsection openers (paper voice, no new facts):
  • Delete: “This subsection surveys/argues...” / “In this subsection, we...”
  • Replace with a compact opener that does 2–3 of these (no labels; vary across subsections):
    • Content claim: the subsection-specific tension/trade-off (optionally with 1–2 embedded citations)
    • Why it matters: link the claim to evaluation/engineering constraints (benchmark/protocol/cost/tool access)
    • Preview: what you will contrast next and on what lens (A vs B; then evaluation anchors; then limitations)
  • Example skeletons (paraphrase; don’t reuse verbatim):
    • Tension-first: “A central tension is ...; ...; we contrast ...”
    • Decision-first: “For builders, the crux is ...; ...”
    • Lens-first: “Seen through the lens of ..., ...”

Pass 2 — Terminology normalization

Role split:
  • Taxonomist: chooses canonical terms and synonym policy.
  • Integrator: applies consistent replacements across the draft.
Targets:
  • One concept = one name across sections.
  • Headings, tables, and prose use the same canonical terms.
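Once the Taxonomist has fixed a synonym policy, the Integrator can apply the replacements consistently to headings, tables, and prose. Here is a minimal sketch. It assumes a hand-written mapping from variant to canonical term; the terms in SYNONYMS are hypothetical examples. It matches whole words only and copies [@...] blocks verbatim, so citation keys are never touched.

```python
from __future__ import annotations

import re

# Hypothetical synonym policy chosen by the Taxonomist: variant -> canonical term.
SYNONYMS = {
    "LLM agent": "LLM-based agent",
    "language-model agent": "LLM-based agent",
    "tool-use": "tool use",
}

CITE_BLOCK = re.compile(r"\[@[^\]]+\]")


def normalize_terms(text: str, synonyms: dict[str, str]) -> str:
    """Replace whole-word variants with canonical terms outside [@...] blocks."""
    # Longest variants first, so a longer variant wins over a shorter one it contains.
    variants = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b")

    def canonicalize(segment: str) -> str:
        return pattern.sub(lambda m: synonyms[m.group(0)], segment)

    pieces, last = [], 0
    for match in CITE_BLOCK.finditer(text):
        pieces.append(canonicalize(text[last:match.start()]))
        pieces.append(match.group(0))  # citation keys are copied verbatim
        last = match.end()
    pieces.append(canonicalize(text[last:]))
    return "".join(pieces)
```

Matching is case-sensitive and does not handle plurals. If the policy needs those variants, list them explicitly.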

Pass 3 — Redundancy pruning (global repetition)

Role split:
  • Compressor: collapses repeated boilerplate.
  • Narrative keeper: ensures removing repetition does not break the argument chain.
Targets:
  • Cross-section repeated intros/outros are removed.
  • Only subsection-specific content remains inside subsections.
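The Compressor can shortlist cross-section boilerplate by normalizing sentences and counting how many ### subsections each one appears in. Here is a minimal sketch under these assumptions: Markdown headings delimit subsections; sentences end at ., ! or ? followed by whitespace; citation blocks are ignored when comparing. The default threshold of three subsections is an arbitrary choice.

```python
from __future__ import annotations

import re
from collections import defaultdict

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
CITE_BLOCK = re.compile(r"\s*\[@[^\]]+\]")


def split_subsections(draft: str) -> list[tuple[str, str]]:
    """Return (### title, body) pairs; headings of level <= 3 end a subsection."""
    sections: list[tuple[str, list[str]]] = []
    inside_h3 = False
    for line in draft.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            if level <= 3:
                inside_h3 = level == 3
                if inside_h3:
                    sections.append((match.group(2).strip(), []))
            continue  # heading lines are never treated as sentences
        if inside_h3:
            sections[-1][1].append(line)
    return [(title, "\n".join(lines)) for title, lines in sections]


def normalize(sentence: str) -> str:
    """Drop citation blocks, lowercase, and collapse whitespace."""
    return " ".join(CITE_BLOCK.sub("", sentence).lower().split())


def repeated_sentences(draft: str, min_sections: int = 3) -> dict[str, list[str]]:
    """Map each sentence found in >= min_sections subsections to their titles."""
    seen: dict[str, set[int]] = defaultdict(set)
    sections = split_subsections(draft)
    for index, (_, body) in enumerate(sections):
        for sentence in SENTENCE_END.split(body):
            key = normalize(sentence)
            if key:
                seen[key].add(index)
    return {
        key: [sections[i][0] for i in sorted(indices)]
        for key, indices in seen.items()
        if len(indices) >= min_sections
    }
```

The Narrative keeper still decides which occurrence, if any, survives. The script only reports where the repetition lives.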

Script

Quick Start

  • python .codex/skills/draft-polisher/scripts/run.py --help
  • python .codex/skills/draft-polisher/scripts/run.py --workspace workspaces/<ws>

All Options

  • --workspace <dir>: workspace root
  • --unit-id <U###>: unit id (optional; for logs)
  • --inputs <semicolon-separated>: override inputs (rare; prefer defaults)
  • --outputs <semicolon-separated>: override outputs (rare; prefer defaults)
  • --checkpoint <C#>: checkpoint id (optional; for logs)

Examples

  • First polish pass (creates the anchoring baseline output/citation_anchors.prepolish.jsonl):
    • python .codex/skills/draft-polisher/scripts/run.py --workspace workspaces/<ws>
  • Reset the anchoring baseline (only if you intentionally accept citation drift):
    • Delete output/citation_anchors.prepolish.jsonl, then rerun the polisher.
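Conceptually, the baseline records which heading each citation sat under before polishing, so a later run can flag keys that moved. The actual schema of output/citation_anchors.prepolish.jsonl is defined by run.py and is not documented here. This sketch assumes a hypothetical layout of one {"key": ..., "subsection": ...} record per line and anchors each key to its nearest #, ## or ### heading. The helper names are also hypothetical.

```python
from __future__ import annotations

import json
import re

HEADING = re.compile(r"^(#{1,3})\s+(.*)$")
CITE_BLOCK = re.compile(r"\[@[^\]]+\]")
CITE_KEY = re.compile(r"@([\w:.\-]+)")


def anchors(draft: str) -> set[tuple[str, str]]:
    """Collect (citation key, nearest heading of level <= 3) pairs."""
    pairs: set[tuple[str, str]] = set()
    current = "(preamble)"
    for line in draft.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(2).strip()
            continue
        for block in CITE_BLOCK.findall(line):
            for key in CITE_KEY.findall(block):
                pairs.add((key, current))
    return pairs


def write_baseline(draft: str, path: str) -> None:
    """Hypothetical JSONL layout: one {"key", "subsection"} record per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for key, subsection in sorted(anchors(draft)):
            fh.write(json.dumps({"key": key, "subsection": subsection}) + "\n")


def load_baseline(path: str) -> set[tuple[str, str]]:
    """Read the hypothetical JSONL baseline back into (key, heading) pairs."""
    with open(path, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return {(r["key"], r["subsection"]) for r in records}


def drift(baseline: set[tuple[str, str]], draft: str) -> list[str]:
    """Report keys that now sit under a heading the baseline never had them in."""
    return sorted(f"{key} moved into '{sub}'" for key, sub in anchors(draft) - baseline)
```

In this sketch, deleting the baseline and writing a fresh one plays the role of the reset described above: drift is then measured against the restructured draft.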

Acceptance checklist

  • No TODO/TBD/FIXME/(placeholder) markers.
  • No truncation marked by an ellipsis (… or ...).
  • No repeated boilerplate sentence across many subsections.
  • Citation anchoring passes (no cross-subsection drift).
  • Each H3 has at least one cross-paper synthesis paragraph (>=2 citations).
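Three of these items can be screened automatically before a human read. Here is a minimal sketch. It assumes Markdown ### subsections and paragraphs separated by blank lines. It covers only the placeholder, truncation, and synthesis-paragraph checks. The truncation pattern will also hit intentional ellipses, so treat every hit as something to review.

```python
from __future__ import annotations

import re

PLACEHOLDER = re.compile(r"\b(?:TODO|TBD|FIXME)\b|\(placeholder\)")
TRUNCATION = re.compile(r"…|\.\.\.")
CITE_BLOCK = re.compile(r"\[@[^\]]+\]")
CITE_KEY = re.compile(r"@([\w:.\-]+)")


def distinct_keys(paragraph: str) -> set[str]:
    """Distinct citation keys used anywhere in one paragraph."""
    return {key for block in CITE_BLOCK.findall(paragraph) for key in CITE_KEY.findall(block)}


def acceptance_report(draft: str) -> list[str]:
    """List checklist failures: markers, truncation, thin subsections."""
    problems = []
    for lineno, line in enumerate(draft.splitlines(), start=1):
        if PLACEHOLDER.search(line):
            problems.append(f"line {lineno}: placeholder marker")
        if TRUNCATION.search(line):
            problems.append(f"line {lineno}: possible truncation")
    # Each ### subsection needs one paragraph that cites at least two distinct keys.
    for chunk in re.split(r"^### ", draft, flags=re.MULTILINE)[1:]:
        title, _, body = chunk.partition("\n")
        paragraphs = re.split(r"\n\s*\n", body)
        if not any(len(distinct_keys(p)) >= 2 for p in paragraphs):
            problems.append(f"### {title.strip()}: no paragraph with >= 2 citations")
    return problems
```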

Troubleshooting

Issue: polishing causes citation drift across subsections

Fix:
  • Keep citations inside the same ### subsection; if restructuring is intentional, delete output/citation_anchors.prepolish.jsonl and regenerate a new baseline.

Issue: draft polishing is requested before writing approval

Fix:
  • Record the relevant approval in DECISIONS.md (typically Approve C2) before making prose-level edits.