bib-search-citation

Bib Search Citation

Overview

Use this skill when the user provides a `.bib` file and wants research-oriented retrieval rather than just a single citation key lookup. This skill is designed for large bibliographies with mixed standard and custom fields, including fields such as `shorttitle`, `annotation`, `keywords`, `abstract`, and `file`.
Follow this workflow:
  1. Identify the `.bib` file to use.
  2. If `rtk` is available, prefer it for exploratory steps such as locating `.bib` files and inspecting representative fields.
  3. Translate the user's request into either a JSON search spec or a compact query expression.
  4. Run `scripts/search_bib.py` on the `.bib` file and keep its JSON output uncompressed.
  5. Optionally pipe the JSON into `scripts/preview_bib_search.py` for a compact human-readable summary.
  6. Review the results and present the best matches clearly.
  7. Include LaTeX and/or Typst citation snippets whenever the user asks for them or would benefit from them.
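Steps 4 and 5 of the workflow can be driven from Python as well as from the shell. The sketch below is only an illustration: the helper names `build_search_command`, `run_search`, and `run_preview` are hypothetical, and it assumes `search_bib.py` writes its JSON to standard output and that `preview_bib_search.py` reads JSON from standard input, as the piping described in this document suggests.

```python
import subprocess
from pathlib import Path


def build_search_command(bib_path: str, query: str) -> list[str]:
    """Build the argv for step 4 using the --bib and --query flags."""
    return ["python", "scripts/search_bib.py", "--bib", bib_path, "--query", query]


def run_search(bib_path: str, query: str) -> str:
    """Run the search and return its JSON output as raw, unmodified text."""
    if not Path(bib_path).is_file():
        raise FileNotFoundError(f"No .bib file at {bib_path}")
    completed = subprocess.run(
        build_search_command(bib_path, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def run_preview(search_json: str) -> str:
    """Step 5: feed the untouched JSON text to the preview helper on stdin."""
    completed = subprocess.run(
        ["python", "scripts/preview_bib_search.py"],
        input=search_json,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Keeping the search output as a plain string until something actually needs to parse it avoids accidentally reformatting or truncating the JSON between the two steps.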

Input expectations

The typical input is:
  • one `.bib` file provided by the user
  • a natural-language research query
  • optional structured filters such as year range, entry type, author, DOI presence, code availability, or custom field matches
  • optional compact filters such as `author:cheng year>=2024 has:code type:article`
  • optional output preferences such as `latex`, `typst`, `both`, or raw BibTeX
If the user gives a natural-language request only, infer a reasonable search spec and say what assumptions you made. If the user writes a compact filter expression directly, preserve it as closely as possible instead of converting it into vague prose.

Search planning

Before running the script, map the request into a search spec.

Common spec fields

  • `query`: free-text topic query
  • `filters.year_min`, `filters.year_max`, `filters.years_in`, `filters.exclude_years`
  • `filters.author_contains`, `filters.author_excludes`
  • `filters.type_in`, `filters.exclude_type_in`
  • `filters.has`, `filters.exclude_has`
  • `filters.field_contains`, `filters.field_excludes`
  • `sort`: `relevance`, `year_desc`, `year_asc`, or `title`
  • `limit`: default 5 unless the user asks for more
  • `return_fields`: fields to expose in the answer
  • `include_raw_bib`: `true` when the user asks for the original entry or when exact export matters
  • `citation_mode`: `latex`, `typst`, `both`, or `none`
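As an illustration, a spec that combines several of these fields might be assembled as below. This is a sketch under assumptions: nesting the `filters.*` names under a `filters` object, and the value shapes (lists for `author_contains`, `type_in`, and `has`; a field-to-text mapping for `field_contains`), are guesses from the dotted names rather than confirmed behaviour, so check `references/query-syntax.md` before relying on them.

```python
import json

# Hypothetical spec; the nesting and value shapes are assumptions (see above).
spec = {
    "query": "mamba time series forecasting",
    "filters": {
        "year_min": 2024,
        "author_contains": ["cheng"],
        "type_in": ["article", "misc"],
        "has": ["code"],
        "field_contains": {"annotation": "CodeAvailable"},
    },
    "sort": "relevance",
    "limit": 5,
    "return_fields": ["key", "title", "year", "doi"],
    "include_raw_bib": False,
    "citation_mode": "both",
}

# Serialize for --spec-json, or write the same text to a file for --spec-file.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```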

Heuristics for natural-language requests

Use these defaults unless the user says otherwise:
  • research discovery request -> `sort: relevance`
  • no explicit limit -> `limit: 5`
  • no explicit field list -> return the research-oriented default fields: `key`, `title`, `shorttitle`, `author`, `year`, `venue`, `doi`, `eprint`, `keywords`, `annotation`, `abstract`
  • asks for "original", "full entry", or "bib" -> `include_raw_bib: true`
  • asks for both LaTeX and Typst, or just says "citation" in a mixed writing workflow -> `citation_mode: both`
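These defaults can be applied mechanically to a partial spec. The following is a minimal sketch, not part of the skill's scripts: `apply_defaults` and the keyword cue are hypothetical, and the text matching is deliberately crude (it only covers the explicit "both LaTeX and Typst" case, not the judgement call about mixed writing workflows).

```python
import re

DEFAULT_FIELDS = [
    "key", "title", "shorttitle", "author", "year", "venue",
    "doi", "eprint", "keywords", "annotation", "abstract",
]

# Crude cue for "the user wants the raw entry". The lookbehind skips ".bib",
# so a request that merely mentions "my .bib file" does not trigger it.
RAW_BIB_CUE = re.compile(r"original|full entry|(?<![.\w])bib(?!\w)", re.IGNORECASE)


def apply_defaults(spec: dict, user_request: str = "") -> dict:
    """Fill in the documented defaults without overriding explicit choices."""
    result = dict(spec)
    result.setdefault("sort", "relevance")
    result.setdefault("limit", 5)
    result.setdefault("return_fields", list(DEFAULT_FIELDS))
    result.setdefault("include_raw_bib", bool(RAW_BIB_CUE.search(user_request)))
    text = user_request.lower()
    if "citation_mode" not in result and "latex" in text and "typst" in text:
        result["citation_mode"] = "both"
    return result
```

Because every step uses `setdefault` or an explicit membership check, anything the user stated directly always wins over the heuristic.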

Compact query language

The script can parse direct query expressions inside `--query`, and it can also parse them when they appear inside `spec.query`.
Supported compact operators:
  • `author:cheng`
  • `year>=2024`
  • `year:2024` or `year:2023,2024`
  • `type:article,misc`
  • `-type:misc`
  • `has:code,doi`
  • `-has:pdf`
  • `annotation:CodeAvailable`
  • `keywords:mamba`
  • `sort:year_desc`
  • `limit:10`
  • `fields:key,title,year,doi`
  • `cite:latex`, `cite:typst`, or `cite:both`
  • `raw:true`
Unstructured tokens that do not match the compact syntax remain part of the topic query.
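To make the split between operators and topic text concrete, here is a small sketch of how such a query could be tokenized. It is not the parser in `search_bib.py`: `parse_compact` is a hypothetical helper, it recognizes only the operator names and the `:` and `>=` forms listed above, and it splits on whitespace, so quoted multi-word values are not handled.

```python
import re

# Optional leading "-", an operator name, ":" or ">=", then the value text.
TOKEN = re.compile(r"^(?P<neg>-)?(?P<name>[A-Za-z_]+)(?P<op>>=|:)(?P<value>.+)$")

# Operator names taken from the list in this document.
KNOWN = {
    "author", "year", "type", "has", "annotation", "keywords",
    "sort", "limit", "fields", "cite", "raw",
}


def parse_compact(query: str) -> dict:
    """Split a compact query into recognized operators and leftover topic text."""
    operators = []
    topic = []
    for token in query.split():
        match = TOKEN.match(token)
        if match and match["name"] in KNOWN:
            operators.append({
                "name": match["name"],
                "op": match["op"],
                "values": match["value"].split(","),
                "negated": match["neg"] is not None,
            })
        else:
            # Anything that is not a known operator stays in the topic query.
            topic.append(token)
    return {"topic": " ".join(topic), "operators": operators}
```

Echoing the `operators` list back to the user is one way to show how a compact query was interpreted, which matters most when negation is involved.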

Supported `has` values

The script supports these useful `has` values:
  • `doi`
  • `abstract`
  • `keywords`
  • `annotation`
  • `shorttitle`
  • `eprint`
  • `pdf`
  • `code`
`code` is inferred from fields such as `url`, `abstract`, `keywords`, `annotation`, `note`, or `howpublished` that mention GitHub, GitLab, code, repository, or source.
For more examples, see `references/query-syntax.md`.
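The `code` inference can be pictured roughly as follows. This is a sketch of the idea only: `has_code` is a hypothetical function, and case-insensitive substring matching is an assumption. Substring matching also fires on words such as "decoder", so the real script may well be stricter.

```python
import re

# Fields and terms as listed in this document.
CODE_FIELDS = ("url", "abstract", "keywords", "annotation", "note", "howpublished")
CODE_TERMS = re.compile(r"github|gitlab|code|repository|source", re.IGNORECASE)


def has_code(entry: dict) -> bool:
    """Return True if any inspected field mentions a code-related term."""
    return any(CODE_TERMS.search(entry.get(name) or "") for name in CODE_FIELDS)
```

Under this reading an entry annotated `CodeAvailable` counts as having code, while an entry whose title alone mentions source code does not, because `title` is not among the inspected fields.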

Running the script

Run the script with a JSON spec, a spec file, or a compact query.

RTK Fast Path

If `rtk` is available, prefer it only for model-facing exploration:
  • locate bibliography files with `rtk find . -name "*.bib"`
  • inspect a representative slice with `rtk read /path/to/library.bib -l aggressive -m 80`
  • confirm whether fields such as DOI, keywords, annotation, or eprint are present with `rtk grep "doi|keywords|annotation|eprint" /path/to/library.bib`
Keep machine-readable search results on the raw script path:
  • use raw `python scripts/search_bib.py ...` whenever another tool or script needs JSON
  • do not wrap `search_bib.py` output with RTK compression
  • use `python scripts/preview_bib_search.py` only after JSON has already been produced

Inline JSON example

```bash
python scripts/search_bib.py \
  --bib /path/to/library.bib \
  --spec-json '{
    "query": "mamba time series forecasting author:Cheng year>=2024 has:code",
    "sort": "relevance",
    "limit": 5,
    "citation_mode": "both",
    "include_raw_bib": false
  }'
```

Compact query example

```bash
python scripts/search_bib.py \
  --bib /path/to/library.bib \
  --query 'mamba time series forecasting author:Cheng year>=2024 has:code type:article,misc cite:both limit:5'
```

Spec file example

```bash
python scripts/search_bib.py --bib /path/to/library.bib --spec-file /path/to/spec.json
```

Human-readable preview example

```bash
python scripts/search_bib.py \
  --bib /path/to/library.bib \
  --query 'mamba time series forecasting author:Cheng year>=2024 has:code type:article,misc cite:both limit:5' \
| python scripts/preview_bib_search.py
```
If the user uploads a `.bib` file into the conversation, first make sure you know its local path in the execution environment, then run the script against that file.

Output expectations

When presenting results to the user, prefer this order:
  1. brief summary of how many strong matches were found
  2. top matches with the requested research fields
  3. citation snippets in the requested format
  4. raw BibTeX only when requested or materially useful
For each selected entry, usually include:
  • citation key
  • title and optional shorttitle
  • authors
  • year and venue
  • DOI and/or eprint when present
  • the most relevant supporting fields for the query, such as keywords, annotation, or a short abstract excerpt
If the user asked for a compact query, it is helpful to echo the interpreted filters briefly, especially when negation or multiple field filters are involved.
When using the preview helper:
  • treat it as a compact rendering of the JSON, not as a separate search engine
  • keep `search_bib.py` as the source of truth for filtering, scoring, sorting, and citations
  • do not rely on the preview output when exact raw BibTeX preservation matters
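The per-entry layout described above could be rendered along these lines. This is an illustrative sketch only: `format_entry` is hypothetical, and it assumes each result is a flat dictionary keyed by the default field names, which this document does not confirm. The sample entry in the usage note is a made-up placeholder, not a real reference.

```python
def format_entry(entry: dict, excerpt_chars: int = 200) -> str:
    """Render one result: key and title first, then authors, venue, identifiers."""
    header = f"[{entry['key']}] {entry.get('title', '(no title)')}"
    if entry.get("shorttitle"):
        header += f" ({entry['shorttitle']})"
    lines = [header]
    if entry.get("author"):
        lines.append(f"  Authors: {entry['author']}")
    year_venue = ", ".join(
        str(entry[name]) for name in ("year", "venue") if entry.get(name)
    )
    if year_venue:
        lines.append(f"  Published: {year_venue}")
    labelled = (
        ("DOI", "doi"), ("eprint", "eprint"),
        ("Keywords", "keywords"), ("Annotation", "annotation"),
    )
    for label, name in labelled:
        if entry.get(name):
            lines.append(f"  {label}: {entry[name]}")
    abstract = entry.get("abstract") or ""
    if abstract:
        # Show only a short excerpt, marking truncation explicitly.
        excerpt = abstract[:excerpt_chars].rstrip()
        suffix = "..." if len(abstract) > excerpt_chars else ""
        lines.append(f"  Abstract: {excerpt}{suffix}")
    return "\n".join(lines)
```

For example, `format_entry({"key": "cheng2024", "title": "A Title", "year": 2024, "venue": "ICML"})` yields a two-line summary with the key and title on the first line and the year and venue on the second; fields that are absent are simply skipped.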

Citation formatting rules

LaTeX

When `citation_mode` includes `latex`, expose:
  • `\cite{key}`
  • `\parencite{key}`
  • `\textcite{key}`
These are intended for `biblatex` workflows. If the user only wants one form, show only that form.
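Generating the three forms for a given key is a one-liner. The helper below is a hypothetical sketch rather than the script's own formatter; its `forms` parameter covers the case where the user wants only one form.

```python
def latex_citations(key: str, forms=("cite", "parencite", "textcite")) -> dict:
    """Map each requested biblatex command name to its snippet for this key."""
    # "\\" is a single literal backslash; "{{" and "}}" are literal braces.
    return {form: f"\\{form}{{{key}}}" for form in forms}
```

Calling `latex_citations("cheng2024", forms=("textcite",))` returns just the `\textcite{cheng2024}` snippet.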

Typst

When `citation_mode` includes `typst`, expose:
  • `@key` when the key is simple enough for shorthand usage
  • `#cite(<key>)` when shorthand is fine
  • `#cite(label("key"))` when the key contains characters that make shorthand fragile
If the script reports `typst.needs_label = true`, prefer the explicit `label("...")` form instead of shorthand.
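The choice between shorthand and the explicit label form can be sketched as below. The rule for what counts as a "simple" key (only letters, digits, underscores, and hyphens) is an assumption made for this example; the script's own `typst.needs_label` flag should take precedence whenever it is available. `typst_citations` is a hypothetical helper.

```python
import re

# Assumed definition of a shorthand-safe key; the real script may differ.
SIMPLE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def typst_citations(key: str) -> dict:
    """Pick Typst citation snippets depending on how simple the key is."""
    if SIMPLE_KEY.fullmatch(key) is None:
        # Escape backslashes and double quotes for the Typst string literal.
        escaped = key.replace("\\", "\\\\").replace('"', '\\"')
        return {"needs_label": True, "preferred": f'#cite(label("{escaped}"))'}
    return {
        "needs_label": False,
        "preferred": f"@{key}",
        "function": f"#cite(<{key}>)",
    }
```

With this rule a key such as `cheng2024` gets `@cheng2024`, while `doe:2024/a` falls back to `#cite(label("doe:2024/a"))`.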

Result quality checks

Before answering:
  • make sure the returned entries satisfy the user's explicit filters
  • do not overclaim topic relevance; if results are only approximate, say so
  • when several entries are similar, explain the difference briefly
  • preserve raw BibTeX exactly when quoting the original entry

Error handling

Parse errors

If the `.bib` file contains malformed entries (unbalanced braces, encoding issues, truncated fields), the script skips those entries silently and processes the rest. When a file fails to parse entirely, check the encoding (the script assumes UTF-8) and look for obvious structural corruption such as missing closing braces.
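Both checks are easy to run before blaming the search itself. The sketch below is a standalone diagnostic, not part of `search_bib.py`; `check_bib_bytes` is a hypothetical helper, and its brace counting is naive in that it also counts escaped braces such as `\{`.

```python
def check_bib_bytes(raw: bytes) -> list[str]:
    """Report encoding and brace-balance problems found in raw .bib content."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 at byte offset {exc.start}"]
    problems = []
    depth = 0
    for line_number, line in enumerate(text.splitlines(), start=1):
        for char in line:
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth < 0:
                    problems.append(f"unexpected closing brace on line {line_number}")
                    depth = 0  # reset so later lines are still checked
    if depth > 0:
        problems.append(f"{depth} opening brace(s) never closed")
    return problems
```

A typical call is `check_bib_bytes(Path("/path/to/library.bib").read_bytes())` using `pathlib.Path`; an empty list means neither problem was detected.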

Empty result sets

When zero entries match, suggest broadening the search:
  • remove `has:` constraints (e.g. `has:code` excludes many entries)
  • widen the year range or drop it entirely
  • use fewer or shorter topic keywords
  • check author name spelling or try partial matches
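The first two suggestions can be turned into concrete retry queries automatically. This is a rough sketch: `broaden` is a hypothetical helper, it only understands the `has:`, `year:`, and `year>=` forms shown in this document, and it leaves keyword and author adjustments to human judgement.

```python
def broaden(query: str) -> list[str]:
    """Return progressively broader variants of a compact query."""
    suggestions = []
    current = query.split()
    # Relax one family of constraints at a time: has: first, then year.
    for prefixes in (("has:",), ("year:", "year>=")):
        relaxed = [token for token in current if not token.startswith(prefixes)]
        if relaxed != current:
            suggestions.append(" ".join(relaxed))
            current = relaxed
    return suggestions
```

Each suggestion builds on the previous one, so the list runs from the smallest relaxation to the largest.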

Large file performance

The script is pure Python with a linear scan and no external dependencies. For typical academic libraries (up to ~10,000 entries) it completes in seconds. For very large files (50,000+ entries), expect proportionally longer runtimes but no functional issues.

Resources

  • `scripts/search_bib.py`: parses `.bib` files, applies filters, ranks results, and formats citation snippets
  • `scripts/preview_bib_search.py`: renders `search_bib.py` JSON into a compact human-readable summary
  • `references/query-syntax.md`: examples for mapping user requests into structured search specs and compact expressions