doc-pipeline
Doc Pipeline Skill
Overview
This skill builds document processing pipelines: it chains multiple operations (extract, transform, convert) into reusable workflows, with the output of each stage passed as input to the next.
How to Use
- Describe what you want to accomplish
- Provide any required input data or files
- I'll execute the appropriate operations
Example prompts:
- "PDF → Extract Text → Translate → Generate DOCX"
- "Image → OCR → Summarize → Create Report"
- "Excel → Analyze → Generate Charts → Create PPT"
- "Multiple inputs → Merge → Format → Output"
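The arrow notation in the example prompts maps naturally onto an ordered list of steps. The following is a minimal sketch of that mapping, not part of the skill itself; the function name `parse_pipeline_prompt` and the support for an ASCII `->` arrow are assumptions made for illustration.

```python
import re


def parse_pipeline_prompt(prompt: str) -> list[str]:
    """Split an arrow-separated prompt into an ordered list of step names.

    Hypothetical helper: the skill does not define this function.
    """
    # Accept the Unicode arrow used in the example prompts and an ASCII "->".
    parts = [part.strip() for part in re.split(r"→|->", prompt)]
    # Drop empty fragments, for example those left by a trailing arrow.
    return [part for part in parts if part]


steps = parse_pipeline_prompt("PDF → Extract Text → Translate → Generate DOCX")
print(steps)  # ['PDF', 'Extract Text', 'Translate', 'Generate DOCX']
```

The first item names the input and the remaining items name the operations to run in order.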
Domain Knowledge
Pipeline Architecture
  Stage 1        Stage 2        Stage 3        Stage 4
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ Extract │ →  │Transform│ →  │   AI    │ →  │ Output  │
│   PDF   │    │  Data   │    │ Analyze │    │  DOCX   │
└────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘
     │              │              │              │
     └──────────────┴──────────────┴──────────────┘
                       Data Flow
Pipeline DSL (Domain Specific Language)
yaml
# pipeline.yaml
name: contract-review-pipeline
description: Extract, analyze, and report on contracts
stages:
  - name: extract
    operation: pdf-extraction
    input: $input_file
    output: $extracted_text
  - name: analyze
    operation: ai-analyze
    input: $extracted_text
    prompt: "Review this contract for risks..."
    output: $analysis
  - name: report
    operation: docx-generation
    input: $analysis
    template: templates/review_report.docx
    output: $output_file
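One way to interpret a definition like the one above is to treat each `$name` as a key in a shared context and look up each `operation` in a registry of callables. The sketch below illustrates that idea under stated assumptions: the `OPERATIONS` registry, the `run_config` function, and the two stand-in operations are hypothetical, and the dictionary is written out by hand to mirror what a YAML parser (for example PyYAML's `yaml.safe_load`) would return for the first two stages.

```python
from typing import Any, Callable

# The pipeline definition as a parsed dictionary, trimmed to two stages.
config = {
    "name": "contract-review-pipeline",
    "stages": [
        {"name": "extract", "operation": "pdf-extraction",
         "input": "$input_file", "output": "$extracted_text"},
        {"name": "analyze", "operation": "ai-analyze",
         "input": "$extracted_text",
         "prompt": "Review this contract for risks...",
         "output": "$analysis"},
    ],
}


# Hypothetical stand-ins; real operations would read PDFs and call a model.
def fake_pdf_extraction(value: Any, **options: Any) -> str:
    return f"text of {value}"


def fake_ai_analyze(value: Any, prompt: str = "", **options: Any) -> str:
    return f"analysis of [{value}]"


# Assumed registry mapping DSL operation names to callables.
OPERATIONS: dict[str, Callable[..., Any]] = {
    "pdf-extraction": fake_pdf_extraction,
    "ai-analyze": fake_ai_analyze,
}

RESERVED_KEYS = {"name", "operation", "input", "output"}


def run_config(config: dict, variables: dict[str, Any]) -> dict[str, Any]:
    """Run each stage in order, storing outputs under their $variable names."""
    context = dict(variables)
    for stage in config["stages"]:
        operation = OPERATIONS[stage["operation"]]
        # Any extra keys (prompt, template, ...) are passed through as options.
        options = {k: v for k, v in stage.items() if k not in RESERVED_KEYS}
        context[stage["output"]] = operation(context[stage["input"]], **options)
    return context


result = run_config(config, {"$input_file": "contract.pdf"})
print(result["$analysis"])  # analysis of [text of contract.pdf]
```

Because every stage output stays in the returned context, intermediate values such as `$extracted_text` remain available for inspection after the run.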
Python Implementation
python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    """A named step that wraps a single callable."""
    name: str
    operation: Callable


class Pipeline:
    def __init__(self, name: str):
        self.name = name
        self.stages: list[Stage] = []

    def add_stage(self, name: str, operation: Callable) -> "Pipeline":
        self.stages.append(Stage(name, operation))
        return self  # Fluent API

    def run(self, input_data: Any) -> Any:
        # Feed each stage's output into the next stage.
        data = input_data
        for stage in self.stages:
            print(f"Running stage: {stage.name}")
            data = stage.operation(data)
        return data


# Example usage
# extract_pdf_text, analyze_with_ai, and create_docx_report are stage
# functions you supply; each takes the previous stage's output.
pipeline = Pipeline("contract-review")
pipeline.add_stage("extract", extract_pdf_text)
pipeline.add_stage("analyze", analyze_with_ai)
pipeline.add_stage("generate", create_docx_report)
result = pipeline.run("/path/to/contract.pdf")
Advanced: Conditional Pipelines
python
class ConditionalPipeline(Pipeline):
    def add_conditional_stage(self, name: str, condition: Callable,
                              if_true: Callable, if_false: Callable):
        # Wrap both branches in one operation so the stage list stays linear.
        def conditional_op(data):
            if condition(data):
                return if_true(data)
            return if_false(data)
        return self.add_stage(name, conditional_op)


# Usage
# The pipeline must be a ConditionalPipeline; run_ocr is a function you supply.
pipeline = ConditionalPipeline("contract-review")
pipeline.add_conditional_stage(
    "ocr_if_needed",
    condition=lambda d: d.get("has_images"),
    if_true=run_ocr,
    if_false=lambda d: d,  # Pass the data through unchanged
)
Best Practices
- Keep stages focused (single responsibility)
- Use intermediate outputs for debugging
- Implement stage-level error handling
- Make pipelines configurable via YAML/JSON
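Two of the practices above, keeping intermediate outputs and handling errors per stage, can be combined in a small runner. This is a rough sketch rather than the skill's implementation; `StageError` and `run_stages` are hypothetical names, and the string operations stand in for real document stages.

```python
from typing import Any, Callable


class StageError(Exception):
    """Raised when a stage fails; records which stage it was."""

    def __init__(self, stage_name: str, cause: Exception):
        super().__init__(f"Stage '{stage_name}' failed: {cause}")
        self.stage_name = stage_name


def run_stages(stages: list[tuple[str, Callable[[Any], Any]]],
               input_data: Any) -> tuple[Any, dict[str, Any]]:
    """Run stages in order; return the final result and each stage's output."""
    intermediates: dict[str, Any] = {}
    data = input_data
    for name, operation in stages:
        try:
            data = operation(data)
        except Exception as exc:
            # Re-raise with the stage name so the failure is easy to locate.
            raise StageError(name, exc) from exc
        # Keep every intermediate output for later debugging.
        intermediates[name] = data
    return data, intermediates


# Toy stages standing in for extract/transform steps.
stages = [("strip", str.strip), ("upper", str.upper)]
result, intermediates = run_stages(stages, "  draft contract  ")
print(result)         # DRAFT CONTRACT
print(intermediates)  # {'strip': 'draft contract', 'upper': 'DRAFT CONTRACT'}
```

Wrapping the original exception keeps the full traceback available through `__cause__` while the message identifies the failing stage.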
Installation
bash
# Install required dependencies
pip install python-docx openpyxl python-pptx reportlab jinja2