datasheet-intelligence


Datasheet Intelligence


Purpose


Ingest multi-format datasheets into a single Markdown-first knowledge base so agents can answer hardware questions using consistent structure.

Prerequisites


Check `uv` first:

    uv --version

If `uv` is not installed, install it by OS:

macOS (Homebrew):

    brew install uv

Linux (official installer):

    curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (WinGet):

    winget install --id=astral-sh.uv -e

After installation, restart the shell and verify `uv --version` again.

Workflow


  1. Place source datasheets in a folder (for example `docs/datasheets/`).
  2. Run `scripts/ingest_docs.py` with `uv run --with docling python3`.
  3. Read outputs from `.context/knowledge/`:
    • `<doc>/<doc>.md`: normalized markdown
    • `<doc>/<doc>.sections.jsonl`: section chunks for retrieval
    • `<doc>/<doc>.tables.md`: table-focused markdown
    • `<doc>/<doc>.meta.json`: conversion metadata and validation info
    • `<doc>/<doc>.docling.json`: raw Docling structured export
    • `<doc>/_images/*`: extracted images
    • `knowledge.index.json`: corpus manifest
  4. Use search/read helpers:
    • `scripts/search_docs.py` for corpus search
    • `scripts/read_docs.py` for focused section reads

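The output layout in step 3 can be checked from a script. The sketch below is an illustration only and is not part of the skill's scripts: `expected_artifacts` and `missing_artifacts` are hypothetical helper names, and the code relies solely on the file names listed above, not on the contents of any file. The `_images/` folder is left out because a document may have no images.

```python
from pathlib import Path

# Suffixes copied from the output list above; each file is named <doc><suffix>.
ARTIFACT_SUFFIXES = (".md", ".sections.jsonl", ".tables.md", ".meta.json", ".docling.json")


def expected_artifacts(knowledge_dir: str, doc: str) -> list[Path]:
    """Return the per-document files that ingestion is described as producing."""
    doc_dir = Path(knowledge_dir) / doc
    return [doc_dir / f"{doc}{suffix}" for suffix in ARTIFACT_SUFFIXES]


def missing_artifacts(knowledge_dir: str, doc: str) -> list[str]:
    """List the expected files that are absent (hypothetical sanity check)."""
    return [p.name for p in expected_artifacts(knowledge_dir, doc) if not p.is_file()]
```

For example, `missing_artifacts(".context/knowledge", "exynos_spi_v1")` returns an empty list when every listed file for that document is present.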
Commands


Ingest all supported datasheets from a directory:

    uv run --with docling python3 scripts/ingest_docs.py docs/datasheets --output-dir .context/knowledge

Ingest only top-level files and skip OCR for faster runs:

    uv run --with docling python3 scripts/ingest_docs.py docs/datasheets --non-recursive --no-ocr

Search and focused read:

    uv run python3 scripts/search_docs.py "SPI0 address" --knowledge-dir .context/knowledge
    uv run python3 scripts/read_docs.py exynos_spi_v1 --anchor section-4-2

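When an agent needs to launch ingestion from Python rather than a shell, the commands above can be assembled as an argument list. This is a minimal sketch: `build_ingest_command` and `run_ingest` are hypothetical names, and only the flags that appear in the commands above (`--output-dir`, `--non-recursive`, `--no-ocr`) are used.

```python
import subprocess


def build_ingest_command(source_dir, output_dir=None, recursive=True, ocr=True):
    """Assemble the ingest command using only flags shown in the commands above."""
    cmd = ["uv", "run", "--with", "docling", "python3", "scripts/ingest_docs.py", source_dir]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if not recursive:
        cmd.append("--non-recursive")
    if not ocr:
        cmd.append("--no-ocr")
    return cmd


def run_ingest(source_dir, **options):
    """Run the assembled command and return the process exit code."""
    return subprocess.run(build_ingest_command(source_dir, **options)).returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.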
Operational Rules


  • Use command-first guidance (above) instead of low-level converter internals.
  • Prefer `uv run` commands and do not assume a global `python` alias.
  • Keep the output location at `.context/knowledge` unless the user asks otherwise.
  • Before the first run, ensure generated artifacts are ignored by git (`.context/` or `.context/knowledge/` in `.gitignore`).
  • Preserve image presence in markdown with `![image_ref](path)` entries.
  • Keep section anchors and normalize textual references like `See Table 4.2` into markdown links when anchor targets exist.
  • Run on all provided files and summarize failures without stopping the whole batch unless explicitly requested.
  • For detailed CLI options and execution presets, read `references/execution-options.md`.
  • If `uv` is unavailable, follow the fallback instructions in `references/execution-options.md`.
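The git-ignore rule can be checked before the first run. The sketch below is one possible approach under stated assumptions: the helper names are hypothetical, and it only looks for a literal `.context/` or `.context/knowledge/` line, so equivalent patterns written differently (for example `/.context`) are not recognized.

```python
from pathlib import Path

# The two entries named in the rule above.
IGNORE_ENTRIES = {".context/", ".context/knowledge/"}


def is_knowledge_ignored(gitignore_path=".gitignore"):
    """Return True if .gitignore contains one of the entries verbatim."""
    path = Path(gitignore_path)
    if not path.is_file():
        return False
    lines = {line.strip() for line in path.read_text(encoding="utf-8").splitlines()}
    return bool(lines & IGNORE_ENTRIES)


def ensure_knowledge_ignored(gitignore_path=".gitignore"):
    """Append .context/ when neither entry is present; return True if the file changed."""
    if is_knowledge_ignored(gitignore_path):
        return False
    path = Path(gitignore_path)
    existing = path.read_text(encoding="utf-8") if path.is_file() else ""
    # Add a newline first if the file exists and does not already end with one.
    separator = "" if existing.endswith("\n") or not existing else "\n"
    path.write_text(existing + separator + ".context/\n", encoding="utf-8")
    return True
```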

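The rule about running on all files and summarizing failures follows a common batch pattern. The sketch below illustrates that pattern in isolation; it does not describe how `scripts/ingest_docs.py` is implemented, and `run_batch`, `summarize`, and the `convert` callable are hypothetical.

```python
def run_batch(paths, convert):
    """Apply convert to every path, collecting failures instead of stopping."""
    succeeded, failed = [], {}
    for path in paths:
        try:
            convert(path)
            succeeded.append(path)
        except Exception as exc:  # keep going; record the reason for the summary
            failed[path] = f"{type(exc).__name__}: {exc}"
    return succeeded, failed


def summarize(succeeded, failed):
    """Build a short report: one count line, then one line per failed file."""
    lines = [f"{len(succeeded)} succeeded, {len(failed)} failed"]
    lines += [f"  {path}: {reason}" for path, reason in failed.items()]
    return "\n".join(lines)
```

A caller that wants the batch to stop on the first error can simply call `convert` in a plain loop without the `try` block.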
Resources


  • `scripts/ingest_docs.py`: main ingestion pipeline
  • `scripts/search_docs.py`: chunk-level search helper
  • `scripts/read_docs.py`: focused reader helper
  • `references/output-contract.md`: output schema and retrieval contract
  • `references/execution-options.md`: detailed runtime flags and command presets