datasheet-intelligence
Datasheet Intelligence
Purpose
Ingest multi-format datasheets into a single Markdown-first knowledge base so agents can answer hardware questions using consistent structure.
Prerequisites
Check `uv` first:

    uv --version

If `uv` is not installed, install it by OS:

    # macOS (Homebrew)
    brew install uv

    # Linux (official installer)
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # Windows (WinGet)
    winget install --id=astral-sh.uv -e

After installation, restart the shell and verify `uv --version` again.
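
The same check can be scripted before an ingestion run. The following is a minimal Python sketch using only the standard library; the helper name `uv_version` is hypothetical and is not one of this skill's scripts. It returns the text printed by `uv --version` when `uv` is on `PATH`, and `None` otherwise.

```python
import shutil
import subprocess
from typing import Optional


def uv_version(executable: str = "uv") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if unavailable.

    Hypothetical helper for illustration; not part of the skill's scripts.
    """
    # Look the executable up on PATH without running anything.
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a non-zero exit status, a timeout, or a launch failure.
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = uv_version()
    if version is None:
        print("uv not found; install it with the command for your OS")
    else:
        print(f"found {version}")
```

A `None` result is the cue to run the matching install command and then restart the shell.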
Workflow
- Place source datasheets in a folder (for example `docs/datasheets/`).
- Run `scripts/ingest_docs.py` with `uv run --with docling python3`.
- Read outputs from `.context/knowledge/`:
  - `<doc>/<doc>.md`: normalized markdown
  - `<doc>/<doc>.sections.jsonl`: section chunks for retrieval
  - `<doc>/<doc>.tables.md`: table-focused markdown
  - `<doc>/<doc>.meta.json`: conversion metadata and validation info
  - `<doc>/<doc>.docling.json`: raw Docling structured export
  - `<doc>/_images/*`: extracted images
  - `knowledge.index.json`: corpus manifest
- Use search/read helpers:
  - `scripts/search_docs.py` for corpus search
  - `scripts/read_docs.py` for focused section reads
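
The output layout above can be checked after a run. The following is a rough Python sketch, assuming only the file names listed in the workflow; the helper names `expected_artifacts` and `missing_artifacts` are hypothetical, and the sketch does not inspect file contents (the schema is described in `references/output-contract.md`). The `_images/` folder is left out on the assumption that a document without images may not produce one.

```python
from pathlib import Path
from typing import Dict, List

# Per-document artifact suffixes, taken from the output list in the workflow.
ARTIFACT_SUFFIXES = {
    "markdown": ".md",
    "sections": ".sections.jsonl",
    "tables": ".tables.md",
    "meta": ".meta.json",
    "docling": ".docling.json",
}


def expected_artifacts(knowledge_dir: str, doc: str) -> Dict[str, Path]:
    """Map each artifact kind to the path <knowledge_dir>/<doc>/<doc><suffix>."""
    base = Path(knowledge_dir) / doc
    return {
        kind: base / f"{doc}{suffix}"
        for kind, suffix in ARTIFACT_SUFFIXES.items()
    }


def missing_artifacts(knowledge_dir: str, doc: str) -> List[str]:
    """Return the artifact kinds whose files do not exist on disk."""
    return [
        kind
        for kind, path in expected_artifacts(knowledge_dir, doc).items()
        if not path.is_file()
    ]


if __name__ == "__main__":
    # "exynos_spi_v1" is the document name used in the command examples.
    missing = missing_artifacts(".context/knowledge", "exynos_spi_v1")
    print("missing:", ", ".join(missing) if missing else "none")
```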
Commands

    # Ingest all supported datasheets from a directory
    uv run --with docling python3 scripts/ingest_docs.py docs/datasheets --output-dir .context/knowledge

    # Ingest only top-level files and skip OCR for faster runs
    uv run --with docling python3 scripts/ingest_docs.py docs/datasheets --non-recursive --no-ocr

    # Search and focused read
    uv run python3 scripts/search_docs.py "SPI0 address" --knowledge-dir .context/knowledge
    uv run python3 scripts/read_docs.py exynos_spi_v1 --anchor section-4-2

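
When an agent assembles the ingestion command programmatically, building it as an argument list avoids shell-quoting problems. The following is a minimal Python sketch; the function `build_ingest_command` is hypothetical, and it assumes the three flags shown above (`--output-dir`, `--non-recursive`, `--no-ocr`) can be combined freely. It only builds the command and does not run it.

```python
import shlex
from typing import List


def build_ingest_command(
    source_dir: str,
    output_dir: str = ".context/knowledge",
    recursive: bool = True,
    ocr: bool = True,
) -> List[str]:
    """Build the argument list for scripts/ingest_docs.py.

    Hypothetical helper; the flags are the ones shown in the commands above.
    """
    command = [
        "uv", "run", "--with", "docling", "python3",
        "scripts/ingest_docs.py", source_dir,
        "--output-dir", output_dir,
    ]
    if not recursive:
        command.append("--non-recursive")  # top-level files only
    if not ocr:
        command.append("--no-ocr")  # skip OCR for faster runs
    return command


if __name__ == "__main__":
    # Print shell-safe command lines for review before running them.
    print(shlex.join(build_ingest_command("docs/datasheets")))
    print(shlex.join(build_ingest_command("docs/datasheets", recursive=False, ocr=False)))
```

The resulting list can be passed to `subprocess.run` as-is, which keeps paths containing spaces intact.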
Operational Rules
- Use command-first guidance (above) instead of low-level converter internals.
- Prefer `uv run` commands and do not assume a global `python` alias.
- Keep output location at `.context/knowledge` unless user asks otherwise.
- Before the first run, ensure generated artifacts are ignored by git (`.context/` or `.context/knowledge/` in `.gitignore`).
- Preserve image presence in markdown with Markdown image entries.
- Keep section anchors and normalize textual references like `See Table 4.2` into markdown links when anchor targets exist.
- Run on all provided files and summarize failures without stopping the whole batch unless explicitly requested.
- For detailed CLI options and execution presets, read `references/execution-options.md`.
- If `uv` is unavailable, follow the fallback instructions in `references/execution-options.md`.
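
The reference-normalization rule above can be illustrated with a short Python sketch. This is not the pipeline's actual implementation: the function `link_references` is hypothetical, and the anchor scheme (`table-4-2`, by analogy with the `section-4-2` anchor used in the commands) is an assumption. References whose anchor is not in the known set are left untouched, matching the "when anchor targets exist" condition.

```python
import re
from typing import Set

# Matches references such as "See Table 4.2" or "see Section 10.3.1".
REFERENCE = re.compile(
    r"\b(See)\s+(Table|Section|Figure)\s+(\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def link_references(text: str, anchors: Set[str]) -> str:
    """Turn textual references into markdown links when the anchor exists.

    Assumed anchor scheme: "<kind>-<number with dots replaced by dashes>".
    """

    def replace(match: re.Match) -> str:
        lead, kind, number = match.group(1), match.group(2), match.group(3)
        anchor = f"{kind.lower()}-{number.replace('.', '-')}"
        if anchor not in anchors:
            # No known target: keep the original text unchanged.
            return match.group(0)
        return f"{lead} [{kind} {number}](#{anchor})"

    return REFERENCE.sub(replace, text)


if __name__ == "__main__":
    known = {"table-4-2"}
    print(link_references("See Table 4.2 for timing.", known))
    print(link_references("See Table 9.1 for pinout.", known))
```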
Resources
- `scripts/ingest_docs.py`: main ingestion pipeline
- `scripts/search_docs.py`: chunk-level search helper
- `scripts/read_docs.py`: focused reader helper
- `references/output-contract.md`: output schema and retrieval contract
- `references/execution-options.md`: detailed runtime flags and command presets