kb-retriever

Local Knowledge Base Retrieval Skill (kb-retriever)

Knowledge Base Directory Description

The knowledge base lives under a single root directory and contains multiple file types (such as `.md`/`.txt`, `.pdf`, `.xlsx`), usually split into multi-level subdirectories by type or business purpose.
It uses hierarchical directory index files:
- The root directory has a `data_structure.md` that describes the main "domain directories" and their purposes.
- Each domain directory can have its own `data_structure.md`, listing the subdirectories and files it contains and what each is for.
- Deeper subdirectories can also have a `data_structure.md`, forming a multi-level index tree.
Knowledge base root directory conventions:
- By default, the knowledge base is assumed to be in the `knowledge/` directory under the current project root.
- If the user explicitly specifies another path in the conversation (e.g., "My knowledge base is in /data/kb" or "Use ./docs as the knowledge base directory"), use the user-specified path as the root directory.
- When the default path `knowledge/` does not exist or cannot be accessed, confirm the actual knowledge base root directory with the user instead of guessing.
Individual business files may be large:
- Do not read an entire file directly with Read.
- For PDF and Excel files, first run structured processing with the corresponding Skill, then use grep and partial reads for fine-grained retrieval.
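
The layered index described above could be discovered with a few lines of Python. This is only a minimal sketch: the helper name `find_index_files` is hypothetical and not part of the Skill, and it assumes the index files are always named `data_structure.md`.

```python
from pathlib import Path


def find_index_files(root: str = "knowledge") -> list[Path]:
    """Return every data_structure.md under root, shallowest first."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Follow the convention above: do not guess another location.
        raise FileNotFoundError(f"knowledge base root not found: {root}")
    hits = root_path.rglob("data_structure.md")
    # Sort by depth so the root index comes before domain-level indexes.
    return sorted(hits, key=lambda p: (len(p.parts), str(p)))
```

Sorting by depth means the root index is read first, which matches the top-down drilling order the Skill asks for.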

Locate the `knowledge` Root Directory

Prefer the user-specified root directory: if the user provides a path (such as `./docs` or `./knowledge-personal`), use that path directly.
Default root directory: otherwise, by convention the root directory is `knowledge/` under the current project.
- Check explicitly with a shell command whether the directory exists: prefer `test -d knowledge`, or fall back to `ls -d knowledge`.
- Note: do not use patterns like `Glob "knowledge" in .` to decide whether the directory exists. `Glob` returns only file paths, not directories themselves, so an empty result cannot distinguish "directory does not exist" from "directory exists but is empty".
Use Glob to search inside the directory only after the root has been confirmed to exist via `test -d` or a similar check, and pass the directory as `path`, for example:
- Index files: `pattern="**/data_structure.md"`, `path="knowledge"`
- All Markdown files: `pattern="**/*.md"`, `path="knowledge"`
If the default `knowledge/` does not exist (`test -d` fails): do not guess other directories. Tell the user clearly that the default root directory was not found, and ask them to specify the actual knowledge base path.
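
The same decision order (user path first, then the default, never a guess) can be sketched in Python. The function name `resolve_kb_root` is an assumption made for illustration; the Skill itself only prescribes the shell check `test -d`.

```python
from pathlib import Path
from typing import Optional


def resolve_kb_root(user_path: Optional[str] = None, default: str = "knowledge") -> Path:
    """Pick the knowledge base root: user-specified path first, then the default."""
    candidate = Path(user_path) if user_path else Path(default)
    # Path.is_dir() plays the role of `test -d`: it checks the directory itself,
    # so an existing but empty directory still counts as present.
    if not candidate.is_dir():
        raise FileNotFoundError(
            f"Knowledge base root '{candidate}' was not found; "
            "ask the user for the actual path."
        )
    return candidate
```

Raising an error instead of falling back to another directory keeps the "ask the user, do not guess" rule explicit in the calling code.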

Key Principle: Learn First, Then Process

Mandatory Checklist When Encountering PDF or Excel Files:

✅ Read the corresponding references document to learn the processing method
✅ Understood the recommended tools and commands
✅ Finished processing the file (extraction/conversion)
⏭️ Retrieval can now begin

Prohibited Actions:

❌ Attempting to process a PDF without first reading pdf_reading.md
❌ Attempting to process an Excel file without first reading excel_reading.md
❌ Skipping the file processing step and searching the raw PDF/Excel file directly

Overall Process

Understand User Requirements
- Read the user's question and extract:
  - Topic/domain keywords (e.g., "sales report", "system architecture", "interface documentation")
  - Time or scope constraints (e.g., "Q1 2023", "latest version")
  - Required output type (explanation, summary, specific field values, etc.)
- Determine the knowledge base root directory:
  - First check whether the user specified a knowledge base path in the question.
  - Otherwise, use the default root directory `knowledge/`.
  - If the default root directory does not exist or its structure looks abnormal, ask the user to confirm instead of making assumptions.
Walk the Directory Index `data_structure.md` Level by Level
- Use the concept of a "current working directory":
  - Start from the user-specified knowledge base root directory by default; if the user did not specify one, use the current directory.
- If `data_structure.md` exists in the current working directory:
  - Use Read to read the first lines of the file (e.g., limit=300), continuing in segments if necessary.
  - Objectives:
    - Learn which subdirectories and files the current directory contains
    - Understand the stated purpose of each subdirectory/file
  - Based on the user's question, select the most relevant subdirectories or files to form a candidate set.
- For candidate subdirectories:
  - Recursively enter the subdirectory, treat it as the new "current working directory", look for its `data_structure.md`, and repeat the process above.
  - During recursion, avoid descending into every branch at once; drill down first along the path most relevant to the question.
- For candidate business files (md/text, PDF, Excel, etc.):
  - After the necessary directory levels have been explored, collect these files as the final list of retrieval targets.
- When prioritizing:
  - First choose domain directories and files whose purpose descriptions closely match the question topic
  - Then consider constraints such as time/version (if reflected in the index)
  - Give general explanatory documents (such as README.md or overall design documents) lower priority
Learn File Processing Methods (Mandatory When Encountering PDF/Excel)
- Before processing PDF files:
  - You must first read references/pdf_reading.md (note that this directory is under the Skills directory, not the Knowledge directory) to learn the extraction methods
  - Focus on: the pdftotext command, pdfplumber usage, table extraction methods
- Before processing Excel files:
  - You must first read references/excel_reading.md to learn the reading methods
  - You must first read references/excel_analysis.md to learn the analysis methods
  - Focus on: reading with pandas, column filtering, data filtering
- Purpose: ensure the correct tools and methods are used, and avoid blind retrieval
Execute Processing and Retrieval by File Type
- Process files using the methods just learned (extraction, conversion, structuring)
- For each type of candidate file, follow the "Markdown/Text", "PDF", or "Excel" strategy below
- General principles:
  - Start with the most relevant and most precise files
  - Within each file, retrieve progressively and partially; avoid loading the full content at once
  - If the current file does not yield satisfactory information, switch to the next candidate file
Iterative Retrieval
- All file types use a unified "multi-round iterative retrieval mechanism" (see Public Retrieval Principles below)
Answer Organization and Traceability
- Consolidate the context gathered across retrieval rounds and answer the user's question from it.
- Try to:
  - Give a clear, direct answer
  - Name the files used (with approximate locations where needed, such as the section or approximate line/page numbers)
- If the answer is based on inference or incomplete information:
  - Clearly mark assumptions and uncertainties
  - Suggest that the user supply a more specific file range or keywords

Public Retrieval Principles

Keyword Selection Strategy

Extract 3-8 keywords from the user's question (including possible English abbreviations, synonyms, and broader/narrower terms)
Keywords may be combined into phrases (e.g., "sales report", "API interface timeout")
Where necessary, include business terms, technical terms, and common abbreviations (e.g., "uv", "pv", "GMV")
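
One way to turn such a keyword list into a single search pattern is sketched below. The helper `build_pattern` is hypothetical; the Skill does not prescribe how keywords are combined, so treat the alternation and the longest-first ordering as assumptions.

```python
import re


def build_pattern(keywords: list[str]) -> re.Pattern:
    """Combine keywords and synonyms into one case-insensitive alternation."""
    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(k.strip() for k in keywords if k.strip()))
    if not unique:
        raise ValueError("at least one keyword is required")
    # Longer phrases first so "sales report" is preferred over "sales".
    unique.sort(key=len, reverse=True)
    # re.escape keeps characters such as "." or "+" literal.
    return re.compile("|".join(re.escape(k) for k in unique), re.IGNORECASE)
```

Calling `build_pattern(["GMV", "sales report", "sales"])` yields one pattern that can be reused across every candidate file in a retrieval round.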

Basic grep Retrieval Principles

Always specify include and path as precisely as possible; avoid searching the entire directory
For pattern, try the core nouns and terms from the question first, then synonyms
For each hit, read only the region near the match (a few lines above and below)
Save "file name + location information + text snippet"
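
A rough Python equivalent of "grep, then keep file name + location + snippet" might look like the following. The function name `grep_with_context` and the record layout are assumptions for illustration, not the Grep tool's actual interface.

```python
import re
from pathlib import Path


def grep_with_context(path: str, pattern: str, context: int = 3) -> list[dict]:
    """Return one record per matching line: file name, line number, nearby text."""
    regex = re.compile(pattern, re.IGNORECASE)
    # The file is scanned locally; only the short snippets are returned.
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    records = []
    for idx, line in enumerate(lines):
        if regex.search(line):
            start = max(0, idx - context)
            end = min(len(lines), idx + context + 1)
            records.append({
                "file": path,
                "line": idx + 1,  # 1-based, like grep -n
                "snippet": "\n".join(lines[start:end]),
            })
    return records
```

Keeping the line number in each record makes it easy to cite the approximate location later when the answer is assembled.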

Multi-round Iterative Retrieval Mechanism (Max 5 Times)

All file types adopt the same iterative strategy:

Iteration Control
- Maintain a "number of retrieval attempts" count, maximum 5 times
- Increment the count after each retrieval
Process for Each Iteration
1. Generate/update retrieval keywords based on the question (may include synonyms and expanded terms)
2. Select files or parts of files that have not yet been searched thoroughly
3. Execute retrieval (grep/partial read/dedicated Skill call)
4. Analyze the obtained context snippets
5. Judge whether the information is sufficient to answer the question
Termination Conditions
- Found sufficient context to support the answer; or
- Reached 5 attempts without finding suitable information
Handling Insufficient Information
- Clearly inform the user that information is missing or may not be in the current knowledge base
- Provide the closest information found and explain the uncertainty
- Prompt the user how to narrow down the scope (more specific file names, keywords, time ranges, etc.)
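
The control flow of the mechanism above can be sketched as a bounded loop. The names `iterative_retrieve`, `search`, and `is_sufficient` are hypothetical placeholders: in practice the search step is a grep, a partial read, or a Skill call, and the sufficiency check is a judgment about the collected context.

```python
from typing import Callable, Iterable

MAX_ATTEMPTS = 5  # limit stated in the mechanism above


def iterative_retrieve(
    targets: Iterable[str],
    search: Callable[[str], list[str]],
    is_sufficient: Callable[[list[str]], bool],
) -> tuple[list[str], int, bool]:
    """Search targets one by one until the context suffices or 5 attempts are used."""
    context: list[str] = []
    attempts = 0
    for target in targets:
        if attempts >= MAX_ATTEMPTS:
            break
        context.extend(search(target))
        attempts += 1  # increment the count after each retrieval
        if is_sufficient(context):
            return context, attempts, True
    # Not enough information: the caller should report the gap to the user.
    return context, attempts, False
```

Returning the partial context together with a `False` flag lets the caller present the closest information found while stating the uncertainty.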

Notes

Do not make `Glob "knowledge" in .` your first call, or make any other call that tries to use Glob to determine whether a directory exists. Check directory existence with a shell command (such as `test -d`).
When using this Skill to query the knowledge base, do not use web search or other tools to obtain knowledge.

Specific Strategies for Different File Types

1. Markdown / Text Files (.md, .txt, .log, etc.)

Candidate File Selection
- Judge relevance from `data_structure.md`, file names, and paths
- Search heading and table-of-contents style files first (such as summary documents and design overviews)
grep Positioning and Partial Reading
- Run the Grep tool on the specified candidate files, using include to restrict the suffix (e.g., "*.md")
- For files with matches, use Read to read only the region near the match:
  - Control the read with a line offset and limit (e.g., a few dozen lines before and after the matching line)
  - Avoid reading the entire file
Special Handling
- If the content is only a table of contents or headings, follow the links or section names to locate the detailed content
- Apply the "multi-round iterative retrieval mechanism" (see Public Retrieval Principles above)
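
The offset-and-limit read can be illustrated with a small helper that loads only a window of lines around a match. `read_window` is a hypothetical name and the 30-line defaults are assumptions; the Read tool's own offset and limit parameters serve the same purpose.

```python
from itertools import islice


def read_window(path: str, match_line: int, before: int = 30, after: int = 30) -> list[tuple[int, str]]:
    """Read only the lines around a 1-based match line, without loading the whole file."""
    start = max(1, match_line - before)
    stop = match_line + after  # last line to include
    with open(path, encoding="utf-8", errors="replace") as handle:
        # islice consumes lines lazily, so only the window is kept in memory.
        window = islice(handle, start - 1, stop)
        return [(start + i, line.rstrip("\n")) for i, line in enumerate(window)]
```

Each returned pair carries the original line number, so the snippet can be cited as "near line N" in the final answer.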

2. PDF File Retrieval Strategy

Workflow:

First: Read the Processing Method Guide
- Before processing any PDF, you must first read references/pdf_reading.md (note that this directory is under the Skills directory, not the Knowledge directory)
- Focus on: the pdftotext command, pdfplumber usage, table extraction methods, the quick decision table
Select Candidate PDFs
- Select the 1-3 most relevant files based on the descriptions in `data_structure.md`
- If the user names a specific PDF file, use that file first
Extract Text Using the Learned Methods
- Use the tools recommended in pdf_reading.md (prefer pdftotext or pdfplumber)
- Important: use `pdftotext input.pdf output.txt` to extract the text to a file; do not write it directly to stdout (this avoids consuming a large number of tokens)
- If tables need to be extracted, use pdfplumber's table extraction feature
Run Retrieval on the Extracted Results
- Use grep to search the extracted text for keywords
- For each hit, extract the surrounding context (a few dozen lines above and below, or the adjacent pages)
- Save "file name + page number/approximate location + text snippet"
- Apply the "multi-round iterative retrieval mechanism" (see Public Retrieval Principles above)
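
The extract-then-search steps above might be wired together as follows. This is a sketch under two assumptions: the `pdftotext` command-line tool is installed, and its output uses the default form feed character between pages. The helper names are hypothetical, and references/pdf_reading.md remains the authority on the recommended commands.

```python
import subprocess
from pathlib import Path


def extract_pdf_text(pdf_path: str, txt_path: str) -> Path:
    """Run `pdftotext input.pdf output.txt` so the text goes to a file, not stdout."""
    subprocess.run(["pdftotext", pdf_path, txt_path], check=True)
    return Path(txt_path)


def search_pages(txt_path: str, keyword: str) -> list[tuple[int, str]]:
    """Return (page number, matching line) pairs from the extracted text."""
    text = Path(txt_path).read_text(encoding="utf-8", errors="replace")
    hits = []
    # Assumption: pages are separated by a form feed ("\f"), pdftotext's default.
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line in page.splitlines():
            if keyword.lower() in line.lower():
                hits.append((page_no, line.strip()))
    return hits
```

Splitting on the form feed gives an approximate page number for each hit, which covers the "file name + page number + text snippet" record the strategy asks for.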

3. Excel File Retrieval Strategy

Workflow:

First: Read the Processing Method Guides
- Before processing any Excel file, you must first read:
  - references/excel_reading.md - learn how to read worksheets (note that this directory is under the Skills directory, not the Knowledge directory)
  - references/excel_analysis.md - learn how to analyze data (note that this directory is under the Skills directory, not the Knowledge directory)
- Focus on: pandas reading methods, column filtering, data filtering, aggregation
Select Candidate Excel Files
- Select the most relevant sheets based on `data_structure.md` and the file/worksheet names
- Prefer workbooks/worksheets whose names contain keywords such as "report", "statistics", "log", "configuration", or "mapping"
- If the user names a specific Excel file, use that file first
Explore the Structure Using the Learned Methods
- Use pandas to read the first 10-50 rows (limit with the `nrows` parameter)
- Focus on: column/field names, data types (numeric, date, text), and key fields
- Compare the column names with the user's question to identify likely key fields (e.g., "revenue", "sales", "error_code")
Execute Data Retrieval and Analysis
- Use the learned pandas methods for filtering and aggregation (e.g., `df[df['column'] == value]`)
- Read only the data near matching rows each time; avoid reading the whole table at once
- If the question includes a time range, add a time filter to the retrieval
- Apply the "multi-round iterative retrieval mechanism" (see Public Retrieval Principles above)
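
A minimal pandas sketch of the preview and filter steps is shown below. The function names and the optional date-range arguments are assumptions for illustration; references/excel_reading.md and references/excel_analysis.md remain the source of the recommended methods.

```python
import pandas as pd


def preview_sheet(xlsx_path: str, sheet_name=0, nrows: int = 20) -> pd.DataFrame:
    """Read only the first rows of a worksheet to inspect column names and dtypes."""
    return pd.read_excel(xlsx_path, sheet_name=sheet_name, nrows=nrows)


def filter_rows(df: pd.DataFrame, column: str, value,
                date_column=None, start=None, end=None) -> pd.DataFrame:
    """Keep rows where column == value, optionally restricted to a date range."""
    mask = df[column] == value
    if date_column is not None:
        dates = pd.to_datetime(df[date_column])
        if start is not None:
            mask &= dates >= pd.Timestamp(start)
        if end is not None:
            mask &= dates <= pd.Timestamp(end)
    return df[mask]
```

A typical use would be to call `preview_sheet` first to learn the column names, then pass the relevant column and any time range from the question to `filter_rows`.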

Collaboration with Other Tools

PDF Processing

Before handling a PDF, you must first read references/pdf_reading.md to learn the processing method
Use pdfplumber/pypdf for text extraction, table extraction, and metadata reading
Prefer the pdftotext command-line tool for fast text extraction

Excel Processing

Before handling Excel files, you must first read:
- references/excel_reading.md - Learn reading methods
- references/excel_analysis.md - Learn analysis methods
Use pandas for data exploration, preview, filtering and analysis

Tool Usage Principles

Grep: finds line numbers and matching snippets by keyword in the specified files; always specify include and path as precisely as possible
Read: used only for partial reads; always set a reasonable limit (e.g., 200-500 lines) and an appropriate offset
For any file that may be large:
- Never read it straight through from start to end
- Always narrow the scope first via the index, table of contents, or keywords, and only then read

Answer Style and Error Handling

Answer Style
- Try to answer in the language the user asked in (Chinese/English).
- Give the conclusion first, then a brief justification.
- If needed, list the referenced files and approximate locations at the end, for example:
  - Source: design/api_gateway.md near line 100
  - Source: reports/2023_Q1_sales.xlsx Summary worksheet
When Information is Missing or Uncertain
- State clearly that no fully matching information was found in the current knowledge base, or that only a partial answer is possible.
- Do not fabricate facts.
- Tell the user how they can help narrow the scope:
  - Specify more specific directories/files
  - Provide more precise keywords or field names
  - Specify time/version ranges