doc-pipeline

Doc Pipeline Skill

Overview

This skill builds document processing pipelines: it chains multiple operations (extract, transform, convert) into reusable workflows in which each stage's output becomes the next stage's input.

How to Use

1. Describe what you want to accomplish
2. Provide any required input data or files
3. I'll execute the appropriate operations

Example prompts:

"PDF → Extract Text → Translate → Generate DOCX"
"Image → OCR → Summarize → Create Report"
"Excel → Analyze → Generate Charts → Create PPT"
"Multiple inputs → Merge → Format → Output"
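The arrow notation in these prompts maps naturally onto an ordered list of step names. As a minimal sketch, assuming `→` is the only separator used (the helper `parse_prompt` is hypothetical and not part of the skill):

```python
def parse_prompt(prompt: str) -> list[str]:
    """Split an arrow-separated prompt into ordered, trimmed step names."""
    steps = [step.strip() for step in prompt.split("→")]
    # Drop empty fragments left by a leading or trailing arrow.
    return [step for step in steps if step]


steps = parse_prompt("PDF → Extract Text → Translate → Generate DOCX")
source, operations = steps[0], steps[1:]
print(source)      # PDF
print(operations)  # ['Extract Text', 'Translate', 'Generate DOCX']
```

The first element names the input type and the rest name the operations to run in order.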

Domain Knowledge

Pipeline Architecture

  Stage 1        Stage 2        Stage 3        Stage 4
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ Extract │ →  │Transform│ →  │   AI    │ →  │ Output  │
│   PDF   │    │  Data   │    │ Analyze │    │  DOCX   │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
     │              │              │              │
     └──────────────┴──────────────┴──────────────┘
                       Data Flow

Pipeline DSL (Domain Specific Language)

yaml

# pipeline.yaml
name: contract-review-pipeline
description: Extract, analyze, and report on contracts

stages:
  - name: extract
    operation: pdf-extraction
    input: $input_file
    output: $extracted_text

  - name: analyze
    operation: ai-analyze
    input: $extracted_text
    prompt: "Review this contract for risks..."
    output: $analysis

  - name: report
    operation: docx-generation
    input: $analysis
    template: templates/review_report.docx
    output: $output_file
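The document does not say how this DSL is executed. One plausible reading, sketched here under assumptions, is that each `$name` is a key in a shared context: a stage reads its `input` variable and writes its `output` variable. The `config` dict stands for the parsed YAML (what PyYAML's `yaml.safe_load` would return for the stage list, with the `prompt` and `template` fields left out for brevity). The `OPERATIONS` registry and its placeholder functions are hypothetical.

```python
from typing import Any, Callable

# Parsed form of pipeline.yaml (assumed; prompt/template fields omitted).
config = {
    "name": "contract-review-pipeline",
    "stages": [
        {"name": "extract", "operation": "pdf-extraction",
         "input": "$input_file", "output": "$extracted_text"},
        {"name": "analyze", "operation": "ai-analyze",
         "input": "$extracted_text", "output": "$analysis"},
        {"name": "report", "operation": "docx-generation",
         "input": "$analysis", "output": "$output_file"},
    ],
}

# Hypothetical stand-ins for the real operations, keyed by operation name.
OPERATIONS: dict[str, Callable[[Any], Any]] = {
    "pdf-extraction": lambda path: f"text of {path}",
    "ai-analyze": lambda text: f"analysis of [{text}]",
    "docx-generation": lambda analysis: f"report.docx <- {analysis}",
}


def run_config(config: dict, variables: dict[str, Any]) -> dict[str, Any]:
    """Run each stage in order, reading and writing named $variables."""
    context = dict(variables)
    for stage in config["stages"]:
        operation = OPERATIONS[stage["operation"]]
        value = context[stage["input"]]  # KeyError if the variable is unset
        context[stage["output"]] = operation(value)
    return context


context = run_config(config, {"$input_file": "contract.pdf"})
print(context["$output_file"])
```

Because every intermediate variable stays in `context`, the output of any stage can be inspected after the run.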

Python Implementation

python

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    """A named step that transforms the data passed to it."""
    name: str
    operation: Callable[[Any], Any]


class Pipeline:
    """Runs stages in order, feeding each stage's output to the next."""

    def __init__(self, name: str):
        self.name = name
        self.stages: list[Stage] = []

    def add_stage(self, name: str, operation: Callable[[Any], Any]) -> "Pipeline":
        self.stages.append(Stage(name, operation))
        return self  # Fluent API: allows chained add_stage calls

    def run(self, input_data: Any) -> Any:
        data = input_data
        for stage in self.stages:
            print(f"Running stage: {stage.name}")
            data = stage.operation(data)
        return data

# Example usage
pipeline = Pipeline("contract-review")
pipeline.add_stage("extract", extract_pdf_text)
pipeline.add_stage("analyze", analyze_with_ai)
pipeline.add_stage("generate", create_docx_report)

result = pipeline.run("/path/to/contract.pdf")
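The usage above relies on `extract_pdf_text`, `analyze_with_ai`, and `create_docx_report`, none of which are defined in this document. For a dry run, the sketch below substitutes placeholder functions that return marker values instead of doing real work. Their signatures and return shapes are assumptions. The `Pipeline` class is repeated in condensed form so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    name: str
    operation: Callable[[Any], Any]


class Pipeline:
    """Condensed copy of the Pipeline class (no logging)."""

    def __init__(self, name: str):
        self.name = name
        self.stages: list[Stage] = []

    def add_stage(self, name: str, operation: Callable[[Any], Any]) -> "Pipeline":
        self.stages.append(Stage(name, operation))
        return self

    def run(self, input_data: Any) -> Any:
        data = input_data
        for stage in self.stages:
            data = stage.operation(data)
        return data


# Placeholder stages (hypothetical): each returns a marker value.
def extract_pdf_text(path: str) -> str:
    return f"<text extracted from {path}>"


def analyze_with_ai(text: str) -> dict:
    return {"source_text": text, "risks": []}


def create_docx_report(analysis: dict) -> str:
    return f"report.docx ({len(analysis['risks'])} risks)"


# add_stage returns the pipeline, so the calls can be chained.
pipeline = (
    Pipeline("contract-review")
    .add_stage("extract", extract_pdf_text)
    .add_stage("analyze", analyze_with_ai)
    .add_stage("generate", create_docx_report)
)
result = pipeline.run("/path/to/contract.pdf")
print(result)  # report.docx (0 risks)
```

Swapping a placeholder for a real implementation only requires that it accept the previous stage's output type.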

Advanced: Conditional Pipelines

python

class ConditionalPipeline(Pipeline):
    def add_conditional_stage(self, name: str, condition: Callable, 
                               if_true: Callable, if_false: Callable):
        def conditional_op(data):
            if condition(data):
                return if_true(data)
            return if_false(data)
        return self.add_stage(name, conditional_op)

# Usage (the pipeline must be a ConditionalPipeline)
pipeline = ConditionalPipeline("contract-review")
pipeline.add_conditional_stage(
    "ocr_if_needed",
    condition=lambda d: d.get("has_images"),
    if_true=run_ocr,
    if_false=lambda d: d,
)
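The same branching can also be expressed without subclassing, by wrapping the two operations in a single callable. This is an alternative sketch, not the approach shown in this document: the helper `branch` and the placeholder `run_ocr` are hypothetical, and the data is assumed to be a dict with an optional `has_images` key.

```python
from typing import Any, Callable

Operation = Callable[[Any], Any]


def branch(condition: Callable[[Any], Any], if_true: Operation,
           if_false: Operation = lambda d: d) -> Operation:
    """Return an operation that applies if_true or if_false per input."""
    def conditional_op(data: Any) -> Any:
        return if_true(data) if condition(data) else if_false(data)
    return conditional_op


# Placeholder OCR step: adds recognized text and clears the flag.
def run_ocr(doc: dict) -> dict:
    return {**doc, "ocr_text": "<recognized text>", "has_images": False}


ocr_if_needed = branch(lambda d: d.get("has_images"), run_ocr)

scanned = ocr_if_needed({"id": 1, "has_images": True})
plain = ocr_if_needed({"id": 2, "text": "already text"})
print(scanned)
print(plain)
```

The resulting callable takes one argument and returns one value, so it can be registered like any other stage operation.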

Best Practices

Keep stages focused (single responsibility)
Use intermediate outputs for debugging
Implement stage-level error handling
Make pipelines configurable via YAML/JSON
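Two of these practices, intermediate outputs and stage-level error handling, can be combined in the run loop. The following is a minimal sketch under assumptions: `StageError` and `run_stages` are hypothetical names, and stages are given as plain `(name, callable)` pairs.

```python
from typing import Any, Callable


class StageError(Exception):
    """Raised when a stage fails; records which stage it was."""

    def __init__(self, stage_name: str, cause: Exception):
        super().__init__(f"stage '{stage_name}' failed: {cause}")
        self.stage_name = stage_name
        self.cause = cause


def run_stages(stages: list[tuple[str, Callable[[Any], Any]]],
               input_data: Any) -> tuple[Any, dict[str, Any]]:
    """Run stages in order; return the final result and every intermediate."""
    intermediates: dict[str, Any] = {}
    data = input_data
    for name, operation in stages:
        try:
            data = operation(data)
        except Exception as exc:
            raise StageError(name, exc) from exc
        intermediates[name] = data  # keep each stage's output for inspection
    return data, intermediates


stages = [
    ("strip", str.strip),
    ("upper", str.upper),
    ("to_int", int),  # fails on non-numeric text
]

try:
    run_stages(stages, "  abc  ")
    failed_stage = None
except StageError as err:
    failed_stage = err.stage_name
print(failed_stage)  # to_int

result, intermediates = run_stages(stages[:2], "  abc  ")
print(intermediates)  # {'strip': 'abc', 'upper': 'ABC'}
```

Wrapping the original exception with the stage name makes it clear where a long pipeline broke, and the `intermediates` dict shows what each stage produced up to that point.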

Installation

bash

# Install required dependencies
pip install python-docx openpyxl python-pptx reportlab jinja2

