bibliography-builder
# Bibliography Builder
You help researchers build bibliographies from manuscript citations by extracting in-text citations, matching them against a local `references.bib` file, identifying issues, and generating a formatted reference list.

## Project Integration
This skill reads from `project.yaml` when available:

```yaml
# From project.yaml
paths:
  drafts: drafts/sections/
```

**Project type:** This skill works for **all project types**. Bibliography building is essential for any academic manuscript.

Updates `progress.yaml` when complete:

```yaml
status:
  bibliography: done
artifacts:
  bibliography: drafts/sections/bibliography.md
```

## File Management
This skill uses git to track progress across phases. Before modifying any output file at a new phase:

- Stage and commit the current state: `git add [files] && git commit -m "bibliography-builder: Phase N complete"`
- Then proceed with modifications.

Do NOT create version-suffixed copies (e.g., `-v2`, `-final`, `-working`). The git history serves as the version trail.

## What This Skill Does
This is a utility skill that automates bibliography creation:
- Extract all in-text citations from a document; supports both Pandoc `[@citationKey]` format and legacy `(Author Year)` format
- Match each citation against the local `references.bib` file: direct `citationKey` lookup for Pandoc format, author+year string matching for legacy format
- Review for issues: missing items, ambiguous matches, duplicates
- Generate a properly formatted bibliography in the requested style

## When to Use This Skill
Use this skill when you have:
- A manuscript with in-text citations, either in Pandoc `[@citationKey]` format (from our writing skills) or in legacy `(Author Year)` format
- A **`references.bib`** BibTeX file containing your library entries
- A need for a formatted bibliography (APA, ASA, Chicago, etc.)

Pandoc-format manuscripts (written with our skills) get fast, deterministic matching via direct `citationKey` lookup in the `.bib` file. Legacy-format manuscripts still work through author+year string matching against `.bib` entries.

## Requirements
- A **`references.bib`** BibTeX file containing the project's library entries. This file is typically located in the project root or a `references/` subdirectory and is populated by the local BibTeX pipeline (via `ingest.py`).
- **`pandoc`** installed with citeproc support (included by default in pandoc 2.11+). Used in Phase 4 to generate the formatted reference list.
- A CSL style file for the target citation style. Common styles:
  - ASA: `american-sociological-association.csl`
  - APA 7th: `apa.csl`
  - Chicago Author-Date: `chicago-author-date.csl`

  Download from https://github.com/citation-style-language/styles if not already present. Place it in the project root or a `csl/` subdirectory.

## Workflow Phases

### Phase 0: Intake
Goal: Read the document and confirm citation style.
Process:
- Read the manuscript file
- Identify citation format (Pandoc `[@citationKey]`, Author Year, Author-Year with comma, etc.)
- Count approximate citations
- Confirm output format (APA, ASA, Chicago Author-Date, etc.)
Output: Citation inventory with format confirmation.
Pause: User confirms citation style and desired output format.
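
A rough Phase 0 inventory could look like the following minimal Python sketch. It assumes the manuscript is plain text or markdown (a `.docx` would need converting first), and the two regexes are simplified illustrations rather than patterns this skill prescribes; `inventory_citations` is a hypothetical helper name. The counts are approximate by design, which is why the user confirms the format before Phase 1.

```python
import re

# Pandoc keys: "@" not preceded by a word character or "." (skips e-mail
# addresses), then word characters with optional internal punctuation.
PANDOC_KEY = re.compile(r"(?<![\w.])@(\w+(?:[:.#$%&+?<>~/-]\w+)*)")
# Legacy citations: years 1500-2099 (optional "a"/"b" suffix) that appear
# inside parentheses, e.g. "(Smith 2020; Jones 2019)" or "Smith (2020)".
PAREN = re.compile(r"\(([^()]*)\)")
YEAR = re.compile(r"\b(?:1[5-9]|20)\d{2}[a-z]?\b")


def inventory_citations(text: str) -> dict:
    """Rough inventory: which citation format is used and how often."""
    pandoc = len(PANDOC_KEY.findall(text))
    legacy = sum(len(YEAR.findall(group)) for group in PAREN.findall(text))
    if pandoc and legacy:
        fmt = "mixed"  # ask the user which format is primary
    elif pandoc:
        fmt = "pandoc"
    elif legacy:
        fmt = "legacy"
    else:
        fmt = "none"
    return {"format": fmt, "pandoc_citations": pandoc, "legacy_citations": legacy}
```

A `"mixed"` result is a cue to ask the user which format should drive matching in Phase 2.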

### Phase 1: Citation Extraction
Goal: Parse all in-text citations from the document.
Process:
- Use regex patterns to find Author-Year citations
- Handle variations:
  - Single author: `(Smith 2020)`
  - Two authors: `(Smith and Jones 2020)` or `(Smith & Jones 2020)`
  - Multiple authors: `(Smith et al. 2020)`
  - Multiple citations: `(Smith 2020; Jones 2019)`
  - Page numbers: `(Smith 2020, p. 45)` or `(Smith 2020: 45)`
  - Narrative citations: `Smith (2020) argues...`
- Deduplicate and sort alphabetically
- Create citation list with frequency counts
- Verify with grep: Run shell commands to independently confirm extraction caught all citations (catches edge cases like McAdam, hyphenated names, accented characters)
Output: Extraction results presented in conversation (not saved to a file).
Pause: User reviews extracted citations for accuracy.
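
As an illustration of Phase 1 for legacy manuscripts, here is a minimal Python sketch that handles the variations listed above, treats `&` and `and` alike, deduplicates, and counts frequencies. The regexes and helper names (`extract_legacy`, `normalize`) are assumptions, not the skill's own patterns. They skip lowercase prefixes such as "see" or "van der", which is one reason the grep cross-check still matters.

```python
import re
from collections import Counter

PAREN = re.compile(r"\(([^()]*)\)")
# One citation inside a parenthetical: capitalised author text (letters,
# spaces, "&", ".", "'", "-"), then a four-digit year such as 2020 or 2020a.
# Anything after the year (", p. 45" or ": 45") is ignored.
PAREN_CITE = re.compile(
    r"^\s*(?P<authors>[A-Z](?:[^\W\d_]|[\s&.'-])*?)\s+(?P<year>\d{4}[a-z]?)\b"
)
# Narrative form: Smith (2020), Smith and Jones (2020), Smith et al. (2020, p. 4)
NARRATIVE = re.compile(
    r"\b(?P<authors>[A-Z][\w'-]+(?:\s+(?:and|&)\s+[A-Z][\w'-]+|\s+et al\.)?)"
    r"\s+\((?P<year>\d{4}[a-z]?)(?:[,:][^)]*)?\)"
)


def normalize(authors: str) -> str:
    """Treat "&" and "and" alike and collapse whitespace."""
    return " ".join(authors.replace("&", "and").split())


def extract_legacy(text: str) -> list[tuple[str, str, int]]:
    """Return (authors, year, count) tuples, deduplicated and sorted."""
    counts: Counter = Counter()
    for group in PAREN.findall(text):
        for part in group.split(";"):  # multiple citations in one parenthetical
            m = PAREN_CITE.match(part)
            if m:
                counts[(normalize(m["authors"]), m["year"])] += 1
    for m in NARRATIVE.finditer(text):
        counts[(normalize(m["authors"]), m["year"])] += 1
    return sorted((a, y, n) for (a, y), n in counts.items())
```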

### Phase 2: BibTeX Matching
Goal: Find each citation in the local `references.bib` file.
Process:
- Read `references.bib` into memory
- For each extracted citation:
  - Pandoc format: look up `citationKey` directly against BibTeX entry keys
  - Legacy format: match author surname(s) and year against the BibTeX `author` and `year` fields
  - Record match status: Found, Ambiguous, Not Found
- Build a match table with BibTeX entry keys
Output: Match results presented in conversation (not saved to a file).
Pause: User reviews matches, especially ambiguous/missing items.
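
The matching step can be sketched in Python as below. This is a simplified illustration, assuming one `field = value` pair per line in `references.bib` and no literal " and " inside braced author names; a dedicated BibTeX parser would be more robust. `parse_bib`, `surnames`, and `match_citation` are hypothetical names, and the first-author-prefix rule for legacy citations is one reasonable choice, not the skill's prescribed algorithm.

```python
import re

ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# One "name = value" field per line; value in braces, quotes, or bare (year = 2020).
FIELD = re.compile(r'^\s*(\w+)\s*=\s*(?:\{(.*)\}|"(.*)"|(\w+))\s*,?\s*$', re.M)


def parse_bib(text: str) -> dict[str, dict[str, str]]:
    """Map each entry key to its lower-cased fields (simplified parser)."""
    entries = {}
    starts = list(ENTRY_START.finditer(text))
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        fields = {}
        for name, braced, quoted, bare in FIELD.findall(text[m.end():end]):
            value = braced or quoted or bare
            fields[name.lower()] = value.replace("{", "").replace("}", "").strip()
        entries[m.group(2)] = fields
    return entries


def surnames(author_field: str) -> list[str]:
    """'Smith, John and Mary Jones' -> ['Smith', 'Jones']."""
    result = []
    for name in re.split(r"\s+and\s+", author_field.strip()):
        name = name.strip()
        if name:
            result.append(name.split(",")[0].strip() if "," in name else name.split()[-1])
    return result


def match_citation(citation, entries):
    """Pandoc citations are key strings; legacy citations are (authors, year)."""
    if isinstance(citation, str):
        hits = [citation] if citation in entries else []
    else:
        authors, year = citation
        cited = [s.casefold() for s in re.split(r"\s+and\s+", authors.replace("et al.", "").strip())]
        hits = [
            key for key, f in entries.items()
            if f.get("year", "")[:4] == year[:4]
            and [s.casefold() for s in surnames(f.get("author", ""))][: len(cited)] == cited
        ]
    status = "Found" if len(hits) == 1 else "Ambiguous" if hits else "Not Found"
    return status, hits
```

Two entries sharing a surname and year (for example 2020a and 2020b works) come back as `Ambiguous`, which is exactly what Phase 3 needs to flag.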

### Phase 3: Issue Review
Goal: Identify and resolve problems.
Process:
- Flag issues:
  - Missing: Citations not found in `references.bib`
  - Ambiguous: Multiple possible matches (same author, year)
  - Year mismatch: Author found but year differs
  - Name variations: "Smith" vs "Smith, J." vs "Smith, John"
- Generate issue report with suggested actions
- User provides resolutions for ambiguous cases
Output: Issues presented in conversation (not saved to a file).
Pause: User resolves any remaining issues.
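
One way to sort citations into the Phase 3 categories is sketched below in Python. It assumes Phase 2 has reduced each entry to a (first-author surname, year) pair, and it treats a match that only succeeds after stripping accents and case as a name variation. The helper names, the order of checks, and the suggestion wording are all illustrative assumptions.

```python
import unicodedata


def fold(name: str) -> str:
    """Drop accents and case so "García" and "garcia" compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


SUGGESTIONS = {
    "Ambiguous": "ask the user which entry is meant",
    "Name variation": "confirm the spelling against the .bib entry",
    "Year mismatch": "check the year in the manuscript and in references.bib",
    "Missing": "add the entry to references.bib or list it as unmatched",
}


def classify(surname: str, year: str, index: dict[str, tuple[str, str]]):
    """index maps BibTeX keys to (first-author surname, year)."""
    exact = [k for k, (s, y) in index.items() if s == surname and y == year]
    if exact:
        return ("OK" if len(exact) == 1 else "Ambiguous"), exact
    folded = [k for k, (s, y) in index.items() if fold(s) == fold(surname) and y == year]
    if folded:
        return "Name variation", folded
    same_author = [k for k, (s, _) in index.items() if fold(s) == fold(surname)]
    if same_author:
        return "Year mismatch", same_author
    return "Missing", []


def issue_report(citations, index):
    """One line per problem citation, with a suggested action."""
    report = []
    for surname, year in citations:
        kind, keys = classify(surname, year, index)
        if kind != "OK":
            candidates = ", ".join(keys) or "no candidates"
            report.append(f"{kind}: {surname} ({year}) -> {candidates}; {SUGGESTIONS[kind]}")
    return report
```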

### Phase 4: Bibliography Generation
Goal: Produce the formatted bibliography using pandoc with citeproc.
Process:

- Build a dummy markdown file containing only the matched citation keys as pandoc citations:

  ```markdown
  ---
  bibliography: references.bib
  csl: american-sociological-association.csl
  nocite: |
    @smithHousing2020, @jonesUrban2019, @williamsRace2021
  ---

  # References
  ```

  The `nocite` field lists every matched citation key (from Phases 2–3). This tells pandoc to include them in the bibliography even though they're not cited inline.

- Run pandoc to generate the formatted bibliography:

  ```bash
  pandoc dummy-refs.md --citeproc -o bibliography.md -t markdown
  ```

  Adjust the CSL file path as needed. Common styles:
  - ASA: `--csl american-sociological-association.csl`
  - APA 7th: `--csl apa.csl`
  - Chicago: `--csl chicago-author-date.csl`

- Clean up: Remove the dummy file. Review pandoc's console output and `bibliography.md` for any citeproc warnings (missing fields, unresolved keys).

- Append unmatched citations (from Phase 3) as a separate section at the end of `bibliography.md`:

  ```markdown
  ## Unmatched Citations (require manual lookup)

  - Smith (2020): Not found in references.bib
  ```
Output: `bibliography.md` with pandoc/citeproc-formatted references.

Why pandoc? Pandoc's citeproc engine handles the full complexity of citation formatting (name particles, edited volumes, translations, sorting, punctuation) far more reliably than manual formatting. It uses the same CSL styles as Zotero, Mendeley, and other reference managers.
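
The Phase 4 steps can be scripted. This Python sketch assumes pandoc 2.11+ is on the `PATH` and that the matched keys and unmatched citations come from Phases 2–3; `dummy_markdown`, `unmatched_section`, and `build_bibliography` are hypothetical helpers. The pandoc invocation mirrors the command above, and since pandoc prints its warnings to stderr, the function returns that text for review.

```python
import subprocess
from pathlib import Path


def dummy_markdown(keys, bib="references.bib", csl="american-sociological-association.csl"):
    """YAML-only markdown whose nocite field lists every matched key."""
    nocite = ", ".join(f"@{key}" for key in keys)
    return (
        "---\n"
        f"bibliography: {bib}\n"
        f"csl: {csl}\n"
        "nocite: |\n"
        f"  {nocite}\n"
        "---\n\n"
        "# References\n"
    )


def unmatched_section(unmatched):
    """Markdown section listing citations that need manual lookup."""
    lines = ["", "## Unmatched Citations (require manual lookup)", ""]
    lines += [f"- {citation}: Not found in references.bib" for citation in unmatched]
    return "\n".join(lines) + "\n"


def build_bibliography(keys, unmatched, out="bibliography.md", **paths):
    """Write the dummy file, run pandoc, clean up, append unmatched citations."""
    dummy = Path("dummy-refs.md")
    dummy.write_text(dummy_markdown(keys, **paths), encoding="utf-8")
    try:
        result = subprocess.run(
            ["pandoc", str(dummy), "--citeproc", "-o", out, "-t", "markdown"],
            capture_output=True, text=True, check=True,
        )
    finally:
        dummy.unlink(missing_ok=True)  # clean-up step
    if unmatched:
        with open(out, "a", encoding="utf-8") as fh:
            fh.write(unmatched_section(unmatched))
    return result.stderr  # citeproc warnings, if any, appear here
```

For example, `build_bibliography(["smithHousing2020", "jonesUrban2019"], ["Smith (2020)"], csl="apa.csl")` would switch the style to APA while keeping the rest of the workflow unchanged.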

## Citation Pattern Reference

### Pandoc Format (Primary: from our writing skills)
| Pattern | Example | Regex |
|---|---|---|
| Parenthetical | | |
| Multiple | | |
| With page | | |
| Narrative | | |
| Suppress author | | |
| String modifiers | | |
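
The pattern types in this table can be recognised with a small Python sketch like the one below. The regexes approximate Pandoc's key syntax (a key starts with a letter, digit, or `_` and may contain internal punctuation), and the labels are our own; `pandoc_citations` is a hypothetical helper. String modifiers such as `see` or `pp. 33-35` are simply skipped, so those citations are reported as parenthetical.

```python
import re

# A bracketed citation group must contain at least one "@".
BRACKET = re.compile(r"\[([^\[\]]*@[^\[\]]*)\]")
# Optional "-" (suppress author), then "@key"; "@" must not follow a word
# character or "." so e-mail addresses are skipped.
KEY = re.compile(r"(?<![\w.])(-?)@(\w+(?:[:.#$%&+?<>~/-]\w+)*)")


def pandoc_citations(text: str) -> list[tuple[str, str]]:
    """Return (key, pattern) pairs: parenthetical, suppress-author, or narrative."""
    found = []
    for group in BRACKET.finditer(text):
        for item in group.group(1).split(";"):  # multiple citations
            m = KEY.search(item)  # prefixes ("see") and locators ("p. 45") are skipped
            if m:
                found.append((m.group(2), "suppress-author" if m.group(1) else "parenthetical"))
    # Narrative citations sit outside brackets: blank out bracketed groups first.
    outside = BRACKET.sub(lambda g: " " * len(g.group(0)), text)
    for m in KEY.finditer(outside):
        found.append((m.group(2), "narrative"))
    return found
```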

### Legacy Format (Fallback: for manuscripts not written with our skills)
| Pattern | Example | Regex |
|---|---|---|
| Single author | | |
| Two authors | | `(([A-Z][a-z]+)\s+(?:and |
| Et al. | | |
| Multiple citations | | Split on |
| With page | | |
| Narrative | | |

### Edge Cases (Legacy Format)
- Hyphenated names, e.g. `(García-López 2020)`: include the hyphen in the author pattern
- Particles, e.g. `(van der Berg 2020)`: allow lowercase particles before the surname
- Organizations, e.g. `(WHO 2020)`: allow all-caps or mixed-case organization names
- No date, e.g. `(Smith n.d.)`: handle "n.d." as a year placeholder
- Forthcoming, e.g. `(Smith forthcoming)`: handle non-numeric years
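
A minimal Python sketch of an author pattern covering these edge cases follows. It handles single-author parenthetical citations only, and the particle list is an assumption to extend for the names in a given library; `edge_case_citations` is a hypothetical helper. Multi-word organization names such as "World Health Organization" would need their own rule.

```python
import re

# Assumed particle list; extend it for the names in your library.
PARTICLES = r"(?:(?:van|von|der|den|de|del|da|di|la|le)\s+)*"
# Capitalised surname, optionally hyphenated (García-López); also covers
# single-word all-caps organisations such as WHO.
SURNAME = r"[A-Z][\w'’]*(?:-[A-Z][\w'’]*)*"
# Four-digit year (2020, 2020a), "n.d.", or "forthcoming".
YEAR = r"(?:\d{4}[a-z]?|n\.d\.|forthcoming)"
CITE = re.compile(rf"\((?P<author>{PARTICLES}{SURNAME})\s+(?P<year>{YEAR})\)")


def edge_case_citations(text: str) -> list[tuple[str, str]]:
    """Single-author parenthetical citations, including the edge cases above."""
    return [(m["author"], m["year"]) for m in CITE.finditer(text)]
```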

## Output Formats
Formatting is handled entirely by pandoc's citeproc engine using CSL style files. You do not need to manually format entries. Simply specify the correct CSL file for the target style:
| Style | CSL file | Notes |
|---|---|---|
| ASA | `american-sociological-association.csl` | Default for sociology journals |
| APA 7th | `apa.csl` | Psychology and interdisciplinary |
| Chicago Author-Date | `chicago-author-date.csl` | History, some social sciences |

Download CSL files from https://github.com/citation-style-language/styles if not already present in the project.

## File Structure
```
project/
├── manuscript.md            # Input: document with citations
├── bibliography/
│   └── bibliography.md      # Final output (Phase 4)
```

Phases 1–3 produce conversation output only. No intermediate files are saved.

## Key Reminders
- `references.bib` must be present and readable in the project directory
- Author names vary: match flexibly against the BibTeX `author` field (last name + year first, then refine)
- Multiple matches are possible: the same author may have multiple works in the same year in the `.bib` file
- Missing items: the user may need to add entries to `references.bib` before proceeding
- Format matters: confirm the desired style before generating the bibliography

## Starting the Process
When the user is ready to begin:

- Ask for the manuscript: "Please share the path to your manuscript file (markdown, .docx, or .txt)."
- Confirm citation style: "I'll extract Author-Year citations. What bibliography format do you need? (APA, ASA, Chicago, other)"
- Locate `references.bib`: "Let me verify the `references.bib` file is present. Is it in the project root or a `references/` subdirectory?"
- Proceed with Phase 0 to read the document and inventory citations.