bibliography-builder

Bibliography Builder

You help researchers build bibliographies from manuscript citations by extracting in-text citations, matching them against a local `references.bib` file, identifying issues, and generating a formatted reference list.

Project Integration

This skill reads from `project.yaml` when available:
```yaml
# From project.yaml
paths:
  drafts: drafts/sections/
```

**Project type:** This skill works for **all project types**. Bibliography building is essential for any academic manuscript.

Updates `progress.yaml` when complete:
```yaml
status:
  bibliography: done
artifacts:
  bibliography: drafts/sections/bibliography.md
```

File Management

This skill uses git to track progress across phases. Before modifying any output file at a new phase:
  1. Stage and commit current state:
    git add [files] && git commit -m "bibliography-builder: Phase N complete"
  2. Then proceed with modifications.
Do NOT create version-suffixed copies (e.g., `-v2`, `-final`, `-working`). The git history serves as the version trail.

What This Skill Does

This is a utility skill that automates bibliography creation:
  1. Extract all in-text citations from a document. Both Pandoc `[@citationKey]` format and legacy `(Author Year)` format are supported.
  2. Match each citation against the local `references.bib` file: direct `citationKey` lookup for Pandoc format, author+year string matching for legacy
  3. Review for issues: missing items, ambiguous matches, duplicates
  4. Generate a properly formatted bibliography in the requested style

When to Use This Skill

Use this skill when you have:
  • A manuscript with in-text citations, in either Pandoc `[@citationKey]` format (from our writing skills) or legacy `(Author Year)` format
  • A `references.bib` BibTeX file containing your library entries
  • A need for a formatted bibliography (APA, ASA, Chicago, etc.)
Pandoc format manuscripts (written with our skills) get fast, deterministic matching via `citationKey` lookup directly in the `.bib` file. Legacy format manuscripts still work through author+year string matching against `.bib` entries.

Requirements

  • A `references.bib` BibTeX file containing the project's library entries. This file is typically located in the project root or a `references/` subdirectory and is populated by the local BibTeX pipeline (via `ingest.py`).
  • `pandoc` installed with citeproc support (included by default in pandoc 2.11+). Used in Phase 4 to generate the formatted reference list.
  • A CSL style file for the target citation style. Common styles are listed under Output Formats.

Workflow Phases

Phase 0: Intake

Goal: Read the document and confirm citation style.
Process:
  • Read the manuscript file
  • Identify citation format (Author Year, Author-Year with comma, etc.)
  • Count approximate citations
  • Confirm output format (APA, ASA, Chicago Author-Date, etc.)
Output: Citation inventory with format confirmation.
Pause: User confirms citation style and desired output format.

Phase 1: Citation Extraction

Goal: Parse all in-text citations from the document.
Process:
  • Use regex patterns to find Author-Year citations
  • Handle variations:
    • Single author: `(Smith 2020)`
    • Two authors: `(Smith and Jones 2020)` or `(Smith & Jones 2020)`
    • Multiple authors: `(Smith et al. 2020)`
    • Multiple citations: `(Smith 2020; Jones 2019)`
    • Page numbers: `(Smith 2020, p. 45)` or `(Smith 2020: 45)`
    • Narrative citations: `Smith (2020) argues...`
  • Deduplicate and sort alphabetically
  • Create citation list with frequency counts
  • Verify with grep: Run shell commands to independently confirm extraction caught all citations (catches edge cases like McAdam, hyphenated names, accented characters)
Output: Extraction results presented in conversation (not saved to a file).
Pause: User reviews extracted citations for accuracy.
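
The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the pattern names and the `extract_citations` function are hypothetical, and the patterns cover only the plain ASCII variations listed in this phase (no hyphens, particles, or accents).

```python
import re
from collections import Counter

# Hypothetical building blocks; they cover only the simple forms listed above.
YEAR = r"\d{4}"
NAME = r"[A-Z][a-z]+"
AUTHORS = rf"{NAME}(?:\s+(?:and|&)\s+{NAME}|\s+et\s+al\.?)?"

# One "Author Year" item inside parentheses, with an optional page locator
# such as ", p. 45" or ": 45".
ITEM = re.compile(rf"^({AUTHORS})\s+({YEAR})(?:,?\s*(?:pp?\.\s*|:\s*)[\d\-]+)?$")
# Narrative form, e.g. "Smith (2020) argues..."
NARRATIVE = re.compile(rf"({AUTHORS})\s+\(({YEAR})(?:[,:][^)]*)?\)")
# Any parenthesised group that contains a four-digit number.
PAREN = re.compile(r"\(([^()]*\d{4}[^()]*)\)")


def extract_citations(text: str) -> Counter:
    """Return a Counter mapping (authors, year) to the number of occurrences."""
    counts = Counter()
    # Parenthetical groups: split on semicolons, then parse each item.
    for group in PAREN.findall(text):
        for part in re.split(r";\s*", group):
            match = ITEM.match(part.strip())
            if match:
                counts[(match.group(1), match.group(2))] += 1
    # Narrative citations are counted separately.
    for match in NARRATIVE.finditer(text):
        counts[(match.group(1), match.group(2))] += 1
    return counts
```

Calling `sorted(extract_citations(text))` gives the deduplicated, alphabetically ordered list, and the counter values are the frequency counts. Names such as McAdam or García-López are not matched by these simple patterns, which is why the independent grep check still matters.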

Phase 2: BibTeX Matching

Goal: Find each citation in the local `references.bib` file.
Process:
  • Read `references.bib` into memory
  • For each extracted citation:
    • Pandoc format: look up `citationKey` directly against BibTeX entry keys
    • Legacy format: match author surname(s) and year against BibTeX `author` and `year` fields
    • Record match status: Found, Ambiguous, Not Found
  • Build match table with BibTeX entry keys
Output: Match results presented in conversation (not saved to a file).
Pause: User reviews matches, especially ambiguous/missing items.
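
As a rough sketch of the matching logic, assume the `.bib` file has already been parsed into a list of dictionaries with `key`, `author`, and `year` fields. That entry shape, and the names `surname_set` and `match_citation`, are assumptions made for illustration; the source does not specify how `references.bib` is parsed.

```python
from __future__ import annotations

import re


def surname_set(author_field: str) -> set[str]:
    """Lower-cased surnames from a BibTeX author field ("Last, First and ...")."""
    surnames = set()
    for person in re.split(r"\s+and\s+", author_field.strip()):
        if "," in person:
            last = person.split(",", 1)[0]      # "Smith, John" -> "Smith"
        else:
            tokens = person.split()
            last = tokens[-1] if tokens else ""  # "Mary Jones" -> "Jones"
        surnames.add(last.strip().lower())
    return surnames


def match_citation(citation: dict, entries: list[dict]) -> tuple[str, list[str]]:
    """Return (status, matching entry keys) for one extracted citation."""
    if "key" in citation:
        # Pandoc format: direct lookup against entry keys.
        hits = [e["key"] for e in entries if e["key"] == citation["key"]]
    else:
        # Legacy format: every cited surname must appear in the entry, same year.
        wanted = {s.lower() for s in citation["surnames"]}
        hits = [
            e["key"]
            for e in entries
            if e.get("year") == citation["year"]
            and wanted <= surname_set(e.get("author", ""))
        ]
    status = "Found" if len(hits) == 1 else "Ambiguous" if hits else "Not Found"
    return status, hits
```

The subset test means an "et al." citation, which carries only the first surname, can still match a multi-author entry. It also means a single-surname citation matches every entry by that author in that year, which is exactly the Ambiguous case the user is asked to review.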

Phase 3: Issue Review

Goal: Identify and resolve problems.
Process:
  • Flag issues:
    • Missing: Citations not found in `references.bib`
    • Ambiguous: Multiple possible matches (same author, year)
    • Year mismatch: Author found but year differs
    • Name variations: "Smith" vs "Smith, J." vs "Smith, John"
  • Generate issue report with suggested actions
  • User provides resolutions for ambiguous cases
Output: Issues presented in conversation (not saved to a file).
Pause: User resolves any remaining issues.
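
One way the issue flags could be derived is sketched below. It assumes entries are dictionaries with `author` in "Last, First and Last, First" form plus a `year` field; that shape and the names `surname_of` and `classify` are hypothetical, chosen only to illustrate the four categories above.

```python
from __future__ import annotations


def surname_of(name: str) -> str:
    """Reduce "Smith", "Smith, J." or "Smith, John" to a bare lower-cased surname."""
    return name.split(",", 1)[0].strip().lower()


def classify(surname: str, year: str, entries: list[dict]) -> str:
    """Label one legacy citation as OK, Ambiguous, Year mismatch, or Missing."""
    target = surname_of(surname)
    # Entries where the cited surname is one of the listed authors.
    same_author = [
        e for e in entries
        if target in {surname_of(a) for a in e["author"].split(" and ")}
    ]
    exact = [e for e in same_author if e["year"] == year]
    if len(exact) == 1:
        return "OK"
    if len(exact) > 1:
        return "Ambiguous"      # same author, same year, several works
    if same_author:
        return "Year mismatch"  # author exists, but under a different year
    return "Missing"
```

Normalizing through `surname_of` is what lets "Smith", "Smith, J." and "Smith, John" compare as equal. It does not try to decide whether two such entries are the same person, so that judgment stays with the user.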

Phase 4: Bibliography Generation

Goal: Produce the formatted bibliography using pandoc with citeproc.
Process:
  1. Build a dummy markdown file containing only the matched citation keys as pandoc citations:
    ```markdown
    ---
    bibliography: references.bib
    csl: american-sociological-association.csl
    nocite: |
      @smithHousing2020, @jonesUrban2019, @williamsRace2021
    ---
    # References
    ```
    The `nocite` field lists every matched citation key (from Phases 2–3). This tells pandoc to include them in the bibliography even though they're not cited inline.
  2. Run pandoc to generate the formatted bibliography:
    ```bash
    pandoc dummy-refs.md --citeproc -o bibliography.md -t markdown
    ```
    Adjust the CSL file path as needed. Common styles:
    • ASA: `--csl american-sociological-association.csl`
    • APA 7th: `--csl apa.csl`
    • Chicago: `--csl chicago-author-date.csl`
  3. Clean up: Remove the dummy file. Review pandoc's warnings on stderr and the resulting `bibliography.md` for problems (missing fields, unresolved keys).
  4. Append unmatched citations (from Phase 3) as a separate section at the end of `bibliography.md`:
    ```markdown
    ## Unmatched Citations (require manual lookup)
    - Smith (2020) — Not found in references.bib
    ```
Output: `bibliography.md` with pandoc/citeproc-formatted references.
Why pandoc? Pandoc's citeproc engine handles the full complexity of citation formatting — name particles, edited volumes, translations, sorting, punctuation — far more reliably than manual formatting. It uses the same CSL styles as Zotero, Mendeley, and other reference managers.
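
The generation steps can be scripted along these lines. This is a sketch under stated assumptions: the function names are hypothetical, `references.bib` and the CSL file are assumed to sit in the working directory, and the pandoc invocation simply mirrors the command shown in step 2.

```python
import subprocess
from pathlib import Path


def build_dummy_markdown(keys, bib="references.bib",
                         csl="american-sociological-association.csl"):
    """Return the text of the dummy file: YAML metadata plus a heading."""
    nocite = ", ".join(f"@{key}" for key in keys)
    return (
        "---\n"
        f"bibliography: {bib}\n"
        f"csl: {csl}\n"
        "nocite: |\n"
        f"  {nocite}\n"
        "---\n"
        "# References\n"
    )


def pandoc_command(dummy="dummy-refs.md", out="bibliography.md"):
    """Argument list equivalent to the shell command in step 2."""
    return ["pandoc", dummy, "--citeproc", "-o", out, "-t", "markdown"]


def generate(keys, workdir="."):
    """Write the dummy file, run pandoc, remove the dummy file, return stderr."""
    dummy = Path(workdir) / "dummy-refs.md"
    dummy.write_text(build_dummy_markdown(keys), encoding="utf-8")
    try:
        done = subprocess.run(pandoc_command(), cwd=workdir,
                              capture_output=True, text=True, check=True)
    finally:
        dummy.unlink()  # step 3: clean up even if pandoc fails
    return done.stderr  # warnings to review before accepting the output
```

Returning stderr keeps pandoc's warnings available for the review in step 3 rather than discarding them.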

Citation Pattern Reference

Pandoc Format (Primary — from our writing skills)

| Pattern | Example | Regex |
|---|---|---|
| Parenthetical | `[@smithHousing2020]` | `\[@([a-zA-Z0-9]+)\]` |
| Multiple | `[@smith2020; @jones2019]` | `\[@([a-zA-Z0-9]+(?:;\s*@[a-zA-Z0-9]+)*)\]` |
| With page | `[@smith2020, p. 45]` | `\[@([a-zA-Z0-9]+),\s*p\.\s*\d+\]` |
| Narrative | `@smithHousing2020 argues` | `(?<!\[)@([a-zA-Z0-9]+)(?!\])` |
| Suppress author | `[-@smith2020]` | `\[-@([a-zA-Z0-9]+)\]` |
| String modifiers | `[see @key1; cf. @key2]` | `\[(?:see\|e\.g\.,\|cf\.)\s*@` |
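
When these patterns are combined in practice, one caveat is worth illustrating: applied to `[@smith2020; @jones2019]`, the narrative pattern backtracks and captures the partial key `jones201`. A possible workaround, sketched below with hypothetical names, is to collect keys from bracketed groups first and search for narrative citations only in the text that remains. It keeps the same `[a-zA-Z0-9]+` key alphabet as the table.

```python
import re

# Any bracketed group that contains at least one "@".
BRACKET = re.compile(r"\[[^\[\]]*@[^\[\]]*\]")
# A citation key, using the same character set as the table above.
KEY = re.compile(r"@([a-zA-Z0-9]+)")
# Narrative key: "@" not preceded by a word character or dot (skips e-mail addresses).
NARRATIVE_KEY = re.compile(r"(?<![\w.])@([a-zA-Z0-9]+)")


def extract_pandoc_keys(text: str):
    """Return (bracketed_keys, narrative_keys) in order of appearance."""
    bracketed = []
    for group in BRACKET.findall(text):
        # Handles multiple keys, page locators, "-@key" and "see"/"cf." prefixes.
        bracketed.extend(KEY.findall(group))
    # Blank out bracketed groups so their keys are not re-read as narrative.
    remainder = BRACKET.sub(" ", text)
    narrative = NARRATIVE_KEY.findall(remainder)
    return bracketed, narrative
```

Because every key found this way is looked up directly in `references.bib` during Phase 2, a stray false positive surfaces as Not Found rather than silently entering the bibliography.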

Legacy Format (Fallback — for manuscripts not written with our skills)

| Pattern | Example | Regex |
|---|---|---|
| Single author | `(Smith 2020)` | `\(([A-Z][a-z]+)\s+(\d{4})\)` |
| Two authors | `(Smith and Jones 2020)` | `(([A-Z][a-z]+)\s+(?:and` … |
| Et al. | `(Smith et al. 2020)` | `\(([A-Z][a-z]+)\s+et\s+al\.?\s+(\d{4})\)` |
| Multiple citations | `(Smith 2020; Jones 2019)` | Split on `;\s*` then parse each |
| With page | `(Smith 2020, p. 45)` | `\(([A-Z][a-z]+)\s+(\d{4}),?\s*p?p?\.?\s*\d+\)` |
| Narrative | `Smith (2020)` | `([A-Z][a-z]+)\s+\((\d{4})\)` |

Edge Cases (Legacy Format)

  • Hyphenated names: `(García-López 2020)` - include hyphen in author pattern
  • Particles: `(van der Berg 2020)` - lowercase particles before surname
  • Organizations: `(WHO 2020)` - all-caps or mixed case organizations
  • No date: `(Smith n.d.)` - handle "n.d." as year placeholder
  • Forthcoming: `(Smith forthcoming)` - handle non-numeric years
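
A single-author pattern widened to cover these edge cases might look like the sketch below. The particle list, the accepted year placeholders, and the function name are assumptions for illustration, not part of the skill. Python's built-in `re` module has no uppercase-letter class for accented characters, so the capitalization check is done in code after matching.

```python
import re

# Any Unicode letter: a word character that is neither a digit nor an underscore.
LETTER = r"[^\W\d_]"
# Hypothetical, non-exhaustive list of lowercase name particles.
PARTICLES = r"(?:(?:van|von|de|der|den|del|della|di|du|la|le)\s+)*"
# Optional particles, then a name that may contain hyphens (also matches WHO, McAdam).
SURNAME = rf"{PARTICLES}{LETTER}+(?:-{LETTER}+)*"
# Four-digit year with optional suffix letter, or a non-numeric placeholder.
YEAR = r"(?:\d{4}[a-z]?|n\.d\.|forthcoming|in press)"
CITE = re.compile(rf"\(({SURNAME})\s+({YEAR})\)")


def find_edge_case_citations(text: str):
    """Return (surname, year) pairs for single-author parenthetical citations."""
    results = []
    for match in CITE.finditer(text):
        surname, year = match.group(1), match.group(2)
        # Require the final name token to start with a capital letter,
        # which filters out phrases such as "(see 2020)".
        if surname.split()[-1][0].isupper():
            results.append((surname, year))
    return results
```

Treating "n.d." and "forthcoming" as year values keeps these citations in the same (surname, year) shape as the rest, so later matching only needs to compare strings.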

Output Formats

Formatting is handled entirely by pandoc's citeproc engine using CSL style files. You do not need to manually format entries. Simply specify the correct CSL file for the target style:
| Style | CSL file | Notes |
|---|---|---|
| ASA | `american-sociological-association.csl` | Default for sociology journals |
| APA 7th | `apa.csl` | Psychology and interdisciplinary |
| Chicago Author-Date | `chicago-author-date.csl` | History, some social sciences |
Download CSL files from https://github.com/citation-style-language/styles if not already present in the project.

File Structure

project/
├── manuscript.md           # Input: document with citations
├── bibliography/
│   └── bibliography.md     # Final output (Phase 4)
Phases 1–3 produce conversation output only. No intermediate files are saved.

Key Reminders

  • `references.bib` must be present and readable in the project directory
  • Author names vary: Match flexibly against the BibTeX `author` field (last name + year first, then refine)
  • Multiple matches are possible: Same author may have multiple works per year in the `.bib` file
  • Missing items: User may need to add entries to `references.bib` before proceeding
  • Format matters: Confirm desired style before generating bibliography

Starting the Process

When the user is ready to begin:
  1. Ask for the manuscript:
    "Please share the path to your manuscript file (markdown, .docx, or .txt)."
  2. Confirm citation style:
    "I'll extract Author-Year citations. What bibliography format do you need? (APA, ASA, Chicago, other)"
  3. Locate `references.bib`:
    "Let me verify the `references.bib` file is present. Is it in the project root or a `references/` subdirectory?"
  4. Proceed with Phase 0 to read the document and inventory citations.