doc-to-text
Document to Text Converter
Overview
This skill is a general-purpose converter that extracts plain text and structured data from binary and other complex file formats. It lets Gemini "read" files that it could not otherwise access.
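A common way for one entry point to cover many formats is to dispatch on the file extension. The sketch below illustrates that idea only; the handler names (`extractPdf`, `ocrImage`, `parseEml`, and so on) are hypothetical placeholders, since this page does not describe how `extract.cjs` is organized internally.

```javascript
const path = require('path');

// Hypothetical mapping from file extension to handler name; the real
// extract.cjs may organize its dispatch differently.
const HANDLERS = {
  '.pdf': 'extractPdf',
  '.xlsx': 'extractXlsx',
  '.docx': 'extractDocx',
  '.pptx': 'extractPptx',
  '.png': 'ocrImage',
  '.jpg': 'ocrImage',
  '.jpeg': 'ocrImage',
  '.webp': 'ocrImage',
  '.eml': 'parseEml',
  '.zip': 'readZipEntries',
};

// Pick a handler from the (case-insensitive) extension, or fail loudly.
function pickHandler(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const handler = HANDLERS[ext];
  if (!handler) {
    throw new Error(`Unsupported file type: ${ext || '(no extension)'}`);
  }
  return handler;
}

console.log(pickHandler('report.PDF')); // prints "extractPdf"
```

Failing with an explicit error for unknown extensions makes it obvious to the caller that a file type is unsupported, rather than silently returning empty text.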
Capabilities
1. Document Extraction
- PDF (`.pdf`): Extracts plain text.
- Excel (`.xlsx`): Converts sheets to CSV and performs OCR on embedded images.
- Word (`.docx`): Extracts text and performs OCR on embedded images.
- PowerPoint (`.pptx`): Extracts slide text and performs OCR on embedded images.
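Converting a sheet to CSV ultimately means serializing rows of cell values. The sketch below shows only that final step, assuming the sheet has already been read into an array of rows (by a spreadsheet library this page does not name); `csvField` and `rowsToCsv` are hypothetical helpers. It follows RFC 4180: a field containing a comma, double quote, or line break is wrapped in double quotes, with embedded quotes doubled.

```javascript
// Quote one cell value per RFC 4180 when it contains a comma,
// a double quote, or a line break; embedded quotes are doubled.
function csvField(value) {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize a 2D array (rows of cells) into CSV text with CRLF line endings.
function rowsToCsv(rows) {
  return rows.map((row) => row.map(csvField).join(',')).join('\r\n');
}

const csv = rowsToCsv([
  ['Name', 'Note'],
  ['Widget', 'says "hi", twice'],
  ['Gadget', null],
]);
console.log(csv);
```

Empty or missing cells become empty fields, so every row keeps the same number of columns.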
2. Image OCR
- Images (`.png`, `.jpg`, `.jpeg`, `.webp`): Uses Tesseract.js to perform OCR (Optical Character Recognition) and extract text from images. Supports English and Japanese.
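A minimal sketch of the OCR step with Tesseract.js, assuming tesseract.js v5 or later, where `createWorker` takes the language codes directly (several languages joined with `+`); `eng` and `jpn` are Tesseract's codes for English and Japanese. The function name `ocrImage` is hypothetical, and `createWorker` is accepted as an optional parameter only so the sketch can be exercised without the package; by default it is loaded from `tesseract.js`.

```javascript
// Hypothetical helper; assumes tesseract.js v5+, whose createWorker(langs)
// loads the listed language data and returns an initialized worker.
async function ocrImage(imagePath, createWorker) {
  // Load tesseract.js lazily so callers can inject a stand-in for testing.
  const create = createWorker || require('tesseract.js').createWorker;
  const worker = await create('eng+jpn'); // English + Japanese
  try {
    const { data } = await worker.recognize(imagePath);
    return data.text;
  } finally {
    await worker.terminate(); // release the worker even if recognition fails
  }
}

// Usage (requires `npm install tesseract.js`):
// ocrImage('error.png').then((text) => console.log(text));
```

Creating and terminating a worker per call keeps the sketch simple; a batch job would more likely reuse one worker across many images.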
3. Data & Archives
- Email (`.eml`): Parses headers (From, To, Subject) and body text.
- ZIP Archive (`.zip`): Lists the files in the archive and extracts the contents of text-based files in memory, without extracting them to disk.
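Header parsing for `.eml` files follows RFC 5322: the header block ends at the first empty line, and a long header may be folded onto continuation lines that start with a space or tab. The sketch below (hypothetical `parseEml`) handles only that basic structure; a real email parser must also decode MIME multipart bodies, quoted-printable and Base64 transfer encodings, and RFC 2047 encoded-word headers, none of which this sketch attempts.

```javascript
// Minimal RFC 5322 header/body split for .eml text; not a full MIME parser.
function parseEml(raw) {
  const text = raw.replace(/\r\n/g, '\n'); // normalize CRLF line endings
  const sep = text.indexOf('\n\n'); // the first empty line ends the header block
  const headerBlock = sep === -1 ? text : text.slice(0, sep);
  const body = sep === -1 ? '' : text.slice(sep + 2);

  // Unfold: a line break followed by a space or tab continues the previous header.
  const unfolded = headerBlock.replace(/\n(?=[ \t])/g, '');
  const headers = new Map();
  for (const line of unfolded.split('\n')) {
    const colon = line.indexOf(':');
    if (colon <= 0) continue; // skip lines that are not "Name: value"
    const name = line.slice(0, colon).trim().toLowerCase();
    // Keep the first occurrence if a header is repeated.
    if (!headers.has(name)) headers.set(name, line.slice(colon + 1).trim());
  }
  return {
    from: headers.get('from'),
    to: headers.get('to'),
    subject: headers.get('subject'),
    body,
  };
}

const msg = parseEml(
  'From: Alice <alice@example.com>\r\n' +
  'To: Bob <bob@example.com>\r\n' +
  'Subject: Build failed\r\n on main\r\n' +
  '\r\n' +
  'See the attached log.\r\n'
);
console.log(msg.subject); // prints "Build failed on main"
```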
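Reading a ZIP archive without writing anything to disk is possible because the format stores a central directory at the end of the file that lists every entry and where its data starts. The sketch below (hypothetical `readZipEntries`) uses only Node's built-in `zlib`: it locates the End of Central Directory record, walks the central directory, and decompresses Deflate (method 8) entries in memory. It assumes a non-ZIP64, unencrypted archive with UTF-8 entry names and does not verify CRC-32 checksums; how `extract.cjs` itself reads archives is not stated on this page.

```javascript
const zlib = require('zlib');

// Read every entry of a ZIP archive held in a Buffer, entirely in memory.
// Assumptions: no ZIP64, no encryption, entry names decoded as UTF-8,
// and CRC-32 checksums are not verified.
function readZipEntries(buf) {
  // 1. Locate the End of Central Directory record (signature 0x06054b50).
  //    Search backwards, since an archive comment of up to 65535 bytes may follow it.
  let eocd = -1;
  const stop = Math.max(0, buf.length - 22 - 0xffff);
  for (let i = buf.length - 22; i >= stop; i--) {
    if (buf.readUInt32LE(i) === 0x06054b50) {
      eocd = i;
      break;
    }
  }
  if (eocd === -1) throw new Error('Not a ZIP archive: end of central directory not found');

  const count = buf.readUInt16LE(eocd + 10); // total number of entries
  let p = buf.readUInt32LE(eocd + 16); // offset where the central directory starts
  const entries = [];

  // 2. Walk the central directory: a 46-byte fixed header plus variable fields per entry.
  for (let n = 0; n < count; n++) {
    if (buf.readUInt32LE(p) !== 0x02014b50) throw new Error('Corrupt central directory');
    const method = buf.readUInt16LE(p + 10);
    const compressedSize = buf.readUInt32LE(p + 20);
    const nameLen = buf.readUInt16LE(p + 28);
    const extraLen = buf.readUInt16LE(p + 30);
    const commentLen = buf.readUInt16LE(p + 32);
    const localOffset = buf.readUInt32LE(p + 42);
    const name = buf.toString('utf8', p + 46, p + 46 + nameLen);
    p += 46 + nameLen + extraLen + commentLen;

    // 3. Skip the 30-byte local file header and its own name/extra fields to reach the data.
    const dataStart =
      localOffset + 30 + buf.readUInt16LE(localOffset + 26) + buf.readUInt16LE(localOffset + 28);
    const raw = buf.subarray(dataStart, dataStart + compressedSize);

    let data = null; // stays null for directories and unsupported compression methods
    if (!name.endsWith('/')) {
      if (method === 0) data = raw; // stored (no compression)
      else if (method === 8) data = zlib.inflateRawSync(raw); // Deflate
    }
    entries.push({ name, method, data });
  }
  return entries;
}
```

In practice the buffer would come from something like `fs.readFileSync('archive.zip')`, and the caller would decode only the entries it treats as text (for example, by extension) with `data.toString('utf8')`.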
Usage
To read a file, execute the `extract.cjs` script with the file path:

```bash
node scripts/extract.cjs <path/to/file>
```

Example:
User: "What does the error screenshot say?"
Action:
```bash
node scripts/extract.cjs error.png
```
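When the converter is called from another Node.js program rather than from a shell, one option is to run it as a child process and capture its standard output. The sketch below assumes `extract.cjs` writes the extracted text to stdout, which this page does not state explicitly; the helper name `runExtract` is hypothetical.

```javascript
const { execFileSync } = require('child_process');

// Run the extractor on one file and return whatever it prints to stdout.
// execFileSync passes arguments without a shell, so paths with spaces are safe,
// and it throws if the script exits with a non-zero status.
function runExtract(filePath, scriptPath = 'scripts/extract.cjs') {
  return execFileSync(process.execPath, [scriptPath, filePath], {
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024, // allow large extracted documents
  });
}

// Usage: console.log(runExtract('error.png'));
```

Passing an argument array to `execFileSync`, rather than building a shell command string, avoids quoting problems with file names that contain spaces or shell metacharacters.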
Dependencies
This skill requires Node.js packages. Run `npm install` in the skill directory before using it.
Knowledge Protocol
- This skill adheres to the protocol defined in `knowledge/orchestration/knowledge-protocol.md`. It automatically integrates the Public, Confidential (Company/Client), and Personal knowledge tiers, prioritizing the most specific secrets while ensuring that none of them leak into public outputs.
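The tier rule above can be pictured as a lookup that consults the most specific tier first. The sketch below is purely illustrative: the tier names come from this page, but the ordering (Personal before Confidential before Public), the `resolveKnowledge` function, and its `audience` parameter are assumptions, not the contents of `knowledge-protocol.md`.

```javascript
// Assumed specificity order, most specific first.
const TIERS = ['personal', 'confidential', 'public'];

// Return { tier, value } for `key` from the most specific tier that defines it,
// or null if no allowed tier has it. For public output, only the public tier
// is consulted, so confidential or personal values cannot leak into it.
function resolveKnowledge(key, sources, audience) {
  const allowed = audience === 'public' ? ['public'] : TIERS;
  for (const tier of allowed) {
    const store = sources[tier];
    if (store && Object.prototype.hasOwnProperty.call(store, key)) {
      return { tier, value: store[key] };
    }
  }
  return null;
}

const sources = {
  public: { apiHost: 'api.example.com' },
  confidential: { apiHost: 'internal.example.corp' },
};
console.log(resolveKnowledge('apiHost', sources, 'internal').value); // prints "internal.example.corp"
console.log(resolveKnowledge('apiHost', sources, 'public').value); // prints "api.example.com"
```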