syncfusion-dotnet-smart-data-extraction
Smart Data Extractor — Syncfusion
Overview
Extracts complete document structures from PDFs and image files using the Syncfusion SmartDataExtractor Library.
This skill supports one operational mode — generating C# code for the user's project.
Key Capabilities
- Document structure extraction: Identify text elements, images, headers, footers, and tables (including regions, header rows, columns, cell boundaries, and merged cells).
- File format support: Works with PDF documents and common image formats such as JPEG and PNG.
- Table extraction: Specialized capability to extract tabular data.
- Form recognition: Detects and processes structured form data.
- Page-level control: Extract data from specific pages or defined page ranges.
- Confidence threshold: Results are filtered based on a configurable confidence score (0.0–1.0).
Prerequisites
- Install required runtime and library packages from NuGet before running extraction.
- Syncfusion License: `LICENSE.txt` or env var `SYNCFUSION_LICENSE_KEY`
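The two license sources listed above could be resolved with a small helper like the one below. This is a minimal sketch, not part of the Syncfusion library: the `LicenseKeyResolver` class and its `Resolve` method are hypothetical names, and the precedence (environment variable first, then `LICENSE.txt`) is an assumption, since the source lists both options without stating an order. The registration call itself is deliberately left out; it should be taken from the Register License section in `references/nuget-packages.md`.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a Syncfusion API): locates the license key.
public static class LicenseKeyResolver
{
    // Assumption: the environment variable wins over LICENSE.txt.
    // Returns null when neither source provides a non-empty key.
    public static string Resolve(string workspaceRoot)
    {
        string fromEnv = Environment.GetEnvironmentVariable("SYNCFUSION_LICENSE_KEY");
        if (!string.IsNullOrWhiteSpace(fromEnv))
        {
            return fromEnv.Trim();
        }

        string path = Path.Combine(workspaceRoot, "LICENSE.txt");
        if (File.Exists(path))
        {
            string fromFile = File.ReadAllText(path).Trim();
            if (fromFile.Length > 0)
            {
                return fromFile;
            }
        }

        return null; // no key found; the caller decides how to report this
    }
}
```

Returning `null` rather than throwing keeps the decision about a missing key with the caller, which can then prompt the user for one of the two sources.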
Quick Start Examples
Example: Generate Code
User: "Write Program.cs code to extract the data from pdf and save as JSON."
Result: C# code snippet displayed (no files created)
Mode
Mode 1: Generate C# Code for the User's Project (default)
Use this mode when the user wants to view, write, review, refactor, or modify C# code related to Smart Data Extractor processing.
Trigger keywords: "show me how", "how to", "how can I", "how do I", "provide code", "provide an example", "give an example", "demonstrate", "code snippet", "sample code", "example", "sample", "give me", "show me", "Program.cs", "example code", "generate code for", "codesnippet".
Workflow:
Step 1 — Detect Application Type and Suggest Required NuGet Packages
- Inspect the workspace project files (`.csproj`, `web.config`, `App.config`, `Startup.cs`, `Program.cs`, etc.) and use the detection signals table in `references/nuget-packages.md` to determine the application type.
- Based on the detected application type, identify the correct NuGet package(s) from `references/nuget-packages.md` and instruct the user to install them before generating any code. ONLY use package IDs and versions listed in `references/nuget-packages.md`; do not suggest, look up, or infer package names from external sources or common naming conventions.
- Note: If the user's request is explicitly table-only (asks only to extract table data), recommend only the Table Extractor package listed in `references/nuget-packages.md` and review the ExtractTable section for the detected application type. Do not recommend or add the broader `SmartDataExtractor` package unless the user requests non-table extraction or JSON conversion features.
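The detection step above can be pictured as a classifier over the project file text. The sketch below is illustrative only: `AppTypeDetector`, `SnippetVariant`, and the three markers it checks are assumptions based on common MSBuild conventions, not the detection signals table from `references/nuget-packages.md`, which remains the authoritative source and is not reproduced here.

```csharp
using System;

// Hypothetical names for illustration; not part of any Syncfusion package.
public enum SnippetVariant { WindowsSpecific, CrossPlatform }

public static class AppTypeDetector
{
    // Assumed signals: UseWPF / UseWindowsForms properties, or the
    // TargetFrameworkVersion element used by classic .NET Framework projects.
    // Anything else is treated as cross-platform in this sketch.
    public static SnippetVariant Classify(string csprojText)
    {
        bool Has(string marker) =>
            csprojText.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;

        bool windowsSpecific =
            Has("<UseWPF>true</UseWPF>") ||
            Has("<UseWindowsForms>true</UseWindowsForms>") ||
            Has("<TargetFrameworkVersion>");

        return windowsSpecific
            ? SnippetVariant.WindowsSpecific
            : SnippetVariant.CrossPlatform;
    }
}
```

A real implementation would also weigh the other files named above (`web.config`, `App.config`, `Startup.cs`, `Program.cs`) as the reference table directs.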
Step 2 — Generate Code from Reference Files Only
Do NOT invent, guess, or suggest any API, method, property, class, or namespace not explicitly present in the reference files.
- Read the relevant `references/*.md` file(s) for the requested feature
- Build C# code strictly from the APIs and snippets found in those files
- Select the correct snippet variant based on the app type detected in Step 1:
  - Windows-specific apps (WinForms, WPF, .NET Framework Console) → use Windows-specific snippets
  - Cross-platform apps (ASP.NET Core, .NET Core/.NET 5+ Console, Blazor, MAUI) → use cross-platform / `.Net.Core` snippets
- After the `using` / namespace lines at the top of the generated code, always insert the license registration block from the Register License section in `references/nuget-packages.md`
- Do not create or run any `.csx` script
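The placement rule for the license block (directly after the leading `using` lines) might be mechanized as below. This is a rough sketch under stated assumptions: `LicenseBlockInserter` is a hypothetical helper, the license block is passed in as opaque text taken from `references/nuget-packages.md`, and the scan only looks at the top of the file, so a `using` declaration statement appearing as the very first code line would be misread as a directive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: places a license block after the leading using directives.
public static class LicenseBlockInserter
{
    public static string InsertAfterUsings(string code, string licenseBlock)
    {
        var lines = new List<string>(code.Replace("\r\n", "\n").Split('\n'));
        int lastUsing = -1;

        for (int i = 0; i < lines.Count; i++)
        {
            string trimmed = lines[i].Trim();
            bool isUsing = trimmed.StartsWith("using ", StringComparison.Ordinal)
                           && trimmed.EndsWith(";", StringComparison.Ordinal);

            if (isUsing)
            {
                lastUsing = i;
            }
            else if (trimmed.Length > 0 && !trimmed.StartsWith("//", StringComparison.Ordinal))
            {
                break; // first real code line: stop scanning
            }
        }

        // With no using lines, lastUsing stays -1 and the block goes first.
        lines.Insert(lastUsing + 1, licenseBlock);
        return string.Join("\n", lines);
    }
}
```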
Code References
All templates and snippets are in the `references/` folder:

| File | Contents |
|---|---|
| document-structure.md | Quick extractor setup and usage snippets |
| extract-data.md | Examples: ExtractDataAsJson, ExtractDataAsPdfStream, ExtractDataAsPdfDocument, async variants |
| extract-table.md | Table extraction examples (ExtractTableAsJson) |
| recognize-forms.md | Form field recognition examples: FormRecognizeOptions, RecognizeFormAsPdfDocument, RecognizeFormAsPdfStream, RecognizeFormAsJson, async variants |
| data-options.md | Explanation of |
Rules
- Output files go in the `./output/` directory
- Use the license key from `LICENSE.txt` at the workspace root
- Do not use any API that is not in the reference files
- Only use NuGet package IDs and versions defined in `references/nuget-packages.md` when recommending or adding packages.
- For table-only extraction requests, recommend/install only the table extractor package from `references/nuget-packages.md` for the detected application type.
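For the output-directory rule, generated code could route every result file through one helper. A minimal sketch, assuming `./output/` is resolved relative to the current working directory; `OutputWriter` and `SaveJson` are hypothetical names, and the JSON string stands in for whatever the extraction snippets from the reference files produce.

```csharp
using System.IO;

// Hypothetical helper: writes result files under ./output/.
public static class OutputWriter
{
    public static string SaveJson(string json, string fileName)
    {
        string outputDir = Path.Combine(Directory.GetCurrentDirectory(), "output");
        Directory.CreateDirectory(outputDir); // no-op if the directory already exists

        // GetFileName drops any directory part so the file stays inside ./output/.
        string path = Path.Combine(outputDir, Path.GetFileName(fileName));
        File.WriteAllText(path, json);
        return path;
    }
}
```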