layout-analyzer

Layout Analyzer Skill

Overview

This skill performs document layout analysis with surya, a document understanding toolkit. It detects text blocks, tables, figures, and headings, and determines the reading order of complex documents.

How to Use

  1. Provide the document image or PDF
  2. Specify what layout elements to detect
  3. I'll analyze the structure and return detected regions

Example prompts:
  • "Analyze the layout of this document page"
  • "Detect all tables and text blocks in this image"
  • "Determine the reading order for this PDF page"
  • "Find headings and paragraphs in this document"

Domain Knowledge

surya Fundamentals

python
from surya.detection import DetectionPredictor
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from PIL import Image

# Load image
image = Image.open("document.png")

# Detect layout elements
layout_predictor = LayoutPredictor()
layout_result = layout_predictor([image])

Layout Element Types

| Element | Description |
|---|---|
| Text | Regular paragraph text |
| Title | Document/section titles |
| Section-header | Section headings |
| List-item | Bulleted/numbered items |
| Table | Tabular data |
| Figure | Images/diagrams |
| Caption | Figure/table captions |
| Footnote | Footnotes |
| Formula | Mathematical equations |
| Page-header | Headers |
| Page-footer | Footers |
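
The element types listed above are plain string labels, so downstream code usually starts by grouping or filtering on them. The sketch below is one possible way to do that. It assumes each detected element has already been copied into a dict with a `label` key; the helper names and the `MARGIN_LABELS` set are illustrative and not part of surya.

```python
from collections import defaultdict

# Hypothetical set: labels treated as running page furniture rather than body content.
MARGIN_LABELS = {"Page-header", "Page-footer"}


def group_by_label(elements):
    """Group element dicts (each with a 'label' key) by their label."""
    groups = defaultdict(list)
    for element in elements:
        groups[element["label"]].append(element)
    return dict(groups)


def drop_margin_elements(elements):
    """Return only the elements that are not page headers or footers."""
    return [e for e in elements if e["label"] not in MARGIN_LABELS]


if __name__ == "__main__":
    sample = [
        {"label": "Page-header", "bbox": [0, 0, 600, 30]},
        {"label": "Title", "bbox": [50, 50, 550, 90]},
        {"label": "Text", "bbox": [50, 100, 550, 300]},
        {"label": "Text", "bbox": [50, 310, 550, 500]},
    ]
    for label, items in group_by_label(drop_margin_elements(sample)).items():
        print(label, len(items))
```

Dropping headers and footers before further processing is a design choice, useful when only body content matters; keep them if page numbers or running titles are needed.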

Text Detection

python
from surya.detection import DetectionPredictor
from PIL import Image

# Initialize detector
detector = DetectionPredictor()

# Load image
image = Image.open("document.png")

# Detect text regions
results = detector([image])

# Access results
for page_result in results:
    for bbox in page_result.bboxes:
        print(f"Text region: {bbox.bbox}")
        print(f"Confidence: {bbox.confidence}")

Layout Analysis

python
from surya.layout import LayoutPredictor
from PIL import Image

# Initialize layout predictor
layout_predictor = LayoutPredictor()

# Analyze layout
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Process results
for page_result in layout_results:
    for element in page_result.bboxes:
        print(f"Type: {element.label}")
        print(f"Bbox: {element.bbox}")
        print(f"Confidence: {element.confidence}")

Reading Order Detection

python
from surya.reading_order import ReadingOrderPredictor
from surya.layout import LayoutPredictor
from PIL import Image

# Get layout first
layout_predictor = LayoutPredictor()
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Determine reading order
reading_order_predictor = ReadingOrderPredictor()
order_results = reading_order_predictor([image], layout_results)

# Access ordered elements
for page_result in order_results:
    for i, element in enumerate(page_result.ordered_bboxes):
        print(f"{i+1}. {element.label}: {element.bbox}")
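
For comparison, or as a fallback when a reading-order model is not available, a purely geometric ordering can be sketched as follows. This is not how surya determines reading order; it is a simple top-to-bottom, left-to-right heuristic that only suits single-column pages. The function name and the `row_tolerance` parameter are assumptions made for illustration.

```python
def geometric_reading_order(boxes, row_tolerance=10):
    """Sort [x1, y1, x2, y2] boxes top-to-bottom, then left-to-right.

    Boxes whose top edges fall into the same `row_tolerance`-pixel band
    are treated as one row. This is a heuristic, not a learned model.
    """
    return sorted(boxes, key=lambda b: (int(b[1] // row_tolerance), b[0]))


if __name__ == "__main__":
    boxes = [
        [300, 12, 500, 40],   # right half of the first row
        [10, 10, 200, 40],    # left half of the first row
        [10, 100, 500, 140],  # second row
    ]
    for position, box in enumerate(geometric_reading_order(boxes), start=1):
        print(position, box)
```

Because rows are formed by fixed-height bands, two boxes that straddle a band boundary can end up in different rows; multi-column pages need the model-based ordering shown above.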

OCR with Layout

python
from surya.ocr import OCRPredictor
from surya.layout import LayoutPredictor
from PIL import Image

# Helper (not part of surya): True if two [x1, y1, x2, y2] boxes intersect
def boxes_overlap(box_a, box_b):
    return (box_a[0] < box_b[2] and box_b[0] < box_a[2]
            and box_a[1] < box_b[3] and box_b[1] < box_a[3])

# Initialize predictors
ocr_predictor = OCRPredictor()
layout_predictor = LayoutPredictor()

# Load image
image = Image.open("document.png")

# Get layout
layout_results = layout_predictor([image])

# Run OCR
ocr_results = ocr_predictor([image])

# Combine results
for layout, ocr in zip(layout_results, ocr_results):
    for layout_elem in layout.bboxes:
        print(f"Element: {layout_elem.label}")
        # Find OCR text within this layout element
        for text_line in ocr.text_lines:
            if boxes_overlap(layout_elem.bbox, text_line.bbox):
                print(f"  Text: {text_line.text}")
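
A plain intersection test assigns a text line to every layout element it touches, which can duplicate text along shared borders. One possible refinement, sketched here under the assumption that all boxes are `[x1, y1, x2, y2]` lists, is to assign each line to the single element that contains the largest share of its area. The function names and the `min_fraction` threshold are illustrative choices, not surya APIs.

```python
def overlap_fraction(inner, outer):
    """Return the fraction of `inner`'s area that lies inside `outer`."""
    inter_w = max(0, min(inner[2], outer[2]) - max(inner[0], outer[0]))
    inter_h = max(0, min(inner[3], outer[3]) - max(inner[1], outer[1]))
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (inter_w * inter_h) / area if area > 0 else 0.0


def assign_lines(element_boxes, line_boxes, min_fraction=0.5):
    """Map each line index to the index of its best-matching element.

    A line maps to None when no element covers at least `min_fraction`
    of its area.
    """
    assignment = {}
    for line_idx, line in enumerate(line_boxes):
        best_idx, best_frac = None, 0.0
        for elem_idx, elem in enumerate(element_boxes):
            frac = overlap_fraction(line, elem)
            if frac > best_frac:
                best_idx, best_frac = elem_idx, frac
        assignment[line_idx] = best_idx if best_frac >= min_fraction else None
    return assignment


if __name__ == "__main__":
    elements = [[0, 0, 100, 50], [0, 50, 100, 100]]
    lines = [[10, 10, 90, 30], [10, 45, 90, 65], [200, 200, 210, 210]]
    print(assign_lines(elements, lines))
```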

Processing PDFs

python
from surya.layout import LayoutPredictor
from pdf2image import convert_from_path

def analyze_pdf_layout(pdf_path):
    """Analyze layout of all pages in PDF."""
    
    # Convert PDF to images
    images = convert_from_path(pdf_path)
    
    # Initialize predictor
    layout_predictor = LayoutPredictor()
    
    # Analyze all pages
    results = layout_predictor(images)
    
    document_structure = []
    
    for page_num, page_result in enumerate(results):
        page_elements = []
        
        for element in page_result.bboxes:
            page_elements.append({
                'type': element.label,
                'bbox': element.bbox,
                'confidence': element.confidence
            })
        
        document_structure.append({
            'page': page_num + 1,
            'elements': page_elements
        })
    
    return document_structure

structure = analyze_pdf_layout("document.pdf")
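
The nested list returned by a function like `analyze_pdf_layout` is convenient to persist as JSON. The sketch below assumes the dictionary shape used above (`page`, `elements`, `type`, `bbox`, `confidence`) and converts values to built-in types first, since predictor outputs may not be JSON-serializable as they are. The function name `structure_to_json` and the rounding precision are assumptions made for illustration.

```python
import json


def structure_to_json(document_structure, ndigits=1):
    """Serialize a page/element structure to a JSON string.

    Coordinates are rounded to `ndigits` decimals and every value is cast
    to a built-in type so that json.dumps accepts it.
    """
    pages = []
    for page in document_structure:
        elements = []
        for el in page["elements"]:
            confidence = el.get("confidence")
            elements.append({
                "type": str(el["type"]),
                "bbox": [round(float(v), ndigits) for v in el["bbox"]],
                "confidence": None if confidence is None else round(float(confidence), 3),
            })
        pages.append({"page": int(page["page"]), "elements": elements})
    return json.dumps(pages, indent=2)


if __name__ == "__main__":
    sample = [{
        "page": 1,
        "elements": [
            {"type": "Table", "bbox": (10.04, 20.0, 110.26, 220.0), "confidence": 0.98765},
        ],
    }]
    print(structure_to_json(sample))
```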

Visualization

python
from surya.layout import LayoutPredictor
from PIL import Image, ImageDraw, ImageFont

def visualize_layout(image_path, output_path):
    """Visualize detected layout elements."""
    
    image = Image.open(image_path).convert("RGB")  # RGB so outline colors render on grayscale or palette inputs
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    # Create drawing context
    draw = ImageDraw.Draw(image)
    
    # Color mapping for element types
    colors = {
        'Text': 'blue',
        'Title': 'red',
        'Table': 'green',
        'Figure': 'purple',
        'Section-header': 'orange',
        'List-item': 'cyan',
    }
    
    for element in results[0].bboxes:
        bbox = element.bbox
        color = colors.get(element.label, 'gray')
        
        # Draw rectangle
        draw.rectangle(bbox, outline=color, width=2)
        
        # Add label
        draw.text((bbox[0], bbox[1] - 15), 
                  f"{element.label} ({element.confidence:.2f})",
                  fill=color)
    
    image.save(output_path)
    return output_path

Best Practices

  1. Use High-Quality Images: 150+ DPI for best results
  2. Preprocess if Needed: Deskew rotated documents
  3. Validate Results: Check confidence scores
  4. Handle Multi-page: Process pages individually
  5. Combine with OCR: Get text within detected regions
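
As an illustration of the "Validate Results" practice, the sketch below separates detections into those kept and those flagged for review based on a confidence cutoff. It assumes elements have been copied into dicts with a `confidence` key, as in the PDF example. The default threshold of 0.5 is an arbitrary starting point, not a value recommended by surya.

```python
def split_by_confidence(elements, threshold=0.5):
    """Split element dicts into (kept, flagged) by their 'confidence' value."""
    kept, flagged = [], []
    for element in elements:
        if element["confidence"] >= threshold:
            kept.append(element)
        else:
            flagged.append(element)
    return kept, flagged


if __name__ == "__main__":
    detections = [
        {"type": "Table", "confidence": 0.93},
        {"type": "Figure", "confidence": 0.41},
        {"type": "Text", "confidence": 0.50},
    ]
    kept, flagged = split_by_confidence(detections)
    print(f"kept={len(kept)} flagged={len(flagged)}")
```

Flagged elements are returned rather than discarded so they can be reviewed manually or re-run on a higher-resolution render of the page.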

Common Patterns

Document Structure Extraction

python
def extract_document_structure(image_path):
    """Extract hierarchical document structure."""
    
    from PIL import Image
    from surya.layout import LayoutPredictor
    from surya.reading_order import ReadingOrderPredictor
    
    image = Image.open(image_path)
    
    # Get layout
    layout_predictor = LayoutPredictor()
    layout_results = layout_predictor([image])
    
    # Get reading order
    order_predictor = ReadingOrderPredictor()
    order_results = order_predictor([image], layout_results)
    
    structure = {
        'title': None,
        'sections': [],
        'tables': [],
        'figures': []
    }
    
    current_section = None
    
    for element in order_results[0].ordered_bboxes:
        if element.label == 'Title':
            structure['title'] = element
        elif element.label == 'Section-header':
            current_section = {'header': element, 'content': []}
            structure['sections'].append(current_section)
        elif element.label == 'Table':
            structure['tables'].append(element)
        elif element.label == 'Figure':
            structure['figures'].append(element)
        elif current_section and element.label in ['Text', 'List-item']:
            current_section['content'].append(element)
    
    return structure

Table Region Extraction

python
def extract_table_regions(image_path):
    """Extract table regions from document."""
    
    from PIL import Image
    from surya.layout import LayoutPredictor
    
    image = Image.open(image_path)
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    tables = []
    
    for element in results[0].bboxes:
        if element.label == 'Table':
            bbox = element.bbox
            
            # Crop table region
            table_image = image.crop(bbox)
            
            tables.append({
                'bbox': bbox,
                'image': table_image,
                'confidence': element.confidence
            })
    
    return tables
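
Detected boxes can sit tight against table borders, so it may help to pad a box slightly before cropping. The following sketch computes a padded integer crop box clamped to the image bounds. The function name `pad_box` and the 10-pixel default are assumptions made for illustration; the result can be passed to `image.crop(...)` in place of the raw `bbox`.

```python
import math


def pad_box(bbox, image_size, padding=10):
    """Expand an [x1, y1, x2, y2] box by `padding` pixels, clamped to the image.

    `image_size` is (width, height), matching PIL's `Image.size`.
    """
    width, height = image_size
    x1, y1, x2, y2 = bbox
    return (
        max(0, math.floor(x1) - padding),
        max(0, math.floor(y1) - padding),
        min(width, math.ceil(x2) + padding),
        min(height, math.ceil(y2) + padding),
    )


if __name__ == "__main__":
    print(pad_box((5.5, 20.2, 95.1, 60.0), image_size=(100, 200)))
```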

Examples

Example 1: Academic Paper Analysis

python
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from pdf2image import convert_from_path

def analyze_academic_paper(pdf_path):
    """Analyze structure of academic paper."""
    
    images = convert_from_path(pdf_path)
    
    layout_predictor = LayoutPredictor()
    order_predictor = ReadingOrderPredictor()
    
    paper_structure = {
        'pages': [],
        'element_counts': {
            'Title': 0,
            'Section-header': 0,
            'Text': 0,
            'Table': 0,
            'Figure': 0,
            'Formula': 0,
            'Footnote': 0
        }
    }
    
    layout_results = layout_predictor(images)
    order_results = order_predictor(images, layout_results)
    
    for page_num, (layout, order) in enumerate(zip(layout_results, order_results)):
        page_structure = {
            'page': page_num + 1,
            'elements': []
        }
        
        for element in order.ordered_bboxes:
            page_structure['elements'].append({
                'type': element.label,
                'bbox': element.bbox,
                'order': element.position
            })
            
            # Count element types
            if element.label in paper_structure['element_counts']:
                paper_structure['element_counts'][element.label] += 1
        
        paper_structure['pages'].append(page_structure)
    
    return paper_structure

paper = analyze_academic_paper('research_paper.pdf')
print(f"Total tables: {paper['element_counts']['Table']}")
print(f"Total figures: {paper['element_counts']['Figure']}")

Example 2: Form Field Detection

python
from surya.layout import LayoutPredictor
from PIL import Image

def detect_form_fields(image_path):
    """Detect form fields and labels."""
    
    image = Image.open(image_path)
    
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])
    
    form_fields = []
    
    for element in results[0].bboxes:
        # Look for text elements that might be labels
        if element.label == 'Text':
            # Check if there's a box/line nearby (potential input field)
            form_fields.append({
                'type': 'potential_label',
                'bbox': element.bbox,
                'confidence': element.confidence
            })
    
    return form_fields

fields = detect_form_fields('form.png')
print(f"Found {len(fields)} potential form elements")
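
The example above only collects candidate labels; the "box/line nearby" check is left as a comment. One way that check might look is sketched below. It assumes the input-field boxes come from some separate detector (hypothetical here, since the layout labels listed earlier do not include form fields) and pairs each label with the closest field that starts to its right on the same row. The function name and `max_gap` are illustrative.

```python
def pair_labels_with_fields(label_boxes, field_boxes, max_gap=200):
    """Pair each label box with the nearest field box to its right.

    Boxes are [x1, y1, x2, y2]. A field qualifies when the label's vertical
    midpoint falls inside the field and the horizontal gap is within
    `max_gap` pixels. Returns a list of (label_index, field_index or None).
    """
    pairs = []
    for label_idx, label in enumerate(label_boxes):
        mid_y = (label[1] + label[3]) / 2
        best_idx, best_gap = None, max_gap
        for field_idx, field in enumerate(field_boxes):
            gap = field[0] - label[2]  # distance from the label's right edge
            same_row = field[1] <= mid_y <= field[3]
            if same_row and 0 <= gap <= best_gap:
                best_idx, best_gap = field_idx, gap
        pairs.append((label_idx, best_idx))
    return pairs


if __name__ == "__main__":
    labels = [[10, 10, 60, 30], [10, 50, 60, 70]]
    fields = [[70, 8, 200, 32], [300, 8, 400, 32]]
    print(pair_labels_with_fields(labels, fields))
```

Forms that place fields below their labels would need a vertical variant of the same test.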

Example 3: Multi-column Article

python
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from PIL import Image

def process_multicolumn_article(image_path):
    """Process multi-column article layout."""
    
    image = Image.open(image_path)
    
    layout_predictor = LayoutPredictor()
    order_predictor = ReadingOrderPredictor()
    
    layout_results = layout_predictor([image])
    order_results = order_predictor([image], layout_results)
    
    # Group elements by column
    image_width = image.width
    column_threshold = image_width / 2
    
    columns = {
        'left': [],
        'right': [],
        'full_width': []
    }
    
    for element in order_results[0].ordered_bboxes:
        bbox = element.bbox
        element_center = (bbox[0] + bbox[2]) / 2
        element_width = bbox[2] - bbox[0]
        
        # Determine column
        if element_width > column_threshold * 1.5:
            columns['full_width'].append(element)
        elif element_center < column_threshold:
            columns['left'].append(element)
        else:
            columns['right'].append(element)
    
    return {
        'layout': 'multi-column',
        'columns': columns,
        'reading_order': order_results[0].ordered_bboxes
    }

article = process_multicolumn_article('newspaper_page.png')
print(f"Left column: {len(article['columns']['left'])} elements")
print(f"Right column: {len(article['columns']['right'])} elements")
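
The left/right split above hard-codes two columns. A possible generalization to any number of equal-width columns is sketched below. The function name, the `n_columns` parameter, and the assumption that columns are equally wide are all illustrative; the 0.75 full-width ratio mirrors the `column_threshold * 1.5` test in the example.

```python
def column_index(bbox, image_width, n_columns=2, full_width_ratio=0.75):
    """Return the 0-based column of an [x1, y1, x2, y2] box, or None if it
    spans more than `full_width_ratio` of the page width.

    Assumes the page is divided into `n_columns` equal-width columns.
    """
    x1, _, x2, _ = bbox
    if (x2 - x1) > image_width * full_width_ratio:
        return None
    center = (x1 + x2) / 2
    column_width = image_width / n_columns
    return min(n_columns - 1, max(0, int(center // column_width)))


if __name__ == "__main__":
    page_width = 1000
    for box in ([50, 0, 450, 100], [550, 0, 950, 100], [50, 0, 950, 100]):
        print(box, column_index(box, page_width))
```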

Limitations

  • Layout detection on handwritten documents may be inaccurate
  • Very small text regions may be missed
  • Complex nested layouts are challenging
  • A GPU is recommended for batch processing
  • Support varies by language
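
When processing many pages, memory use can be kept bounded by feeding the predictor fixed-size chunks instead of the whole document at once. The sketch below shows the pattern with a generic `predict` callable standing in for a predictor such as `layout_predictor`. The helper names and the default batch size of 4 are assumptions to tune for the available hardware.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(predict, pages, batch_size=4):
    """Call `predict` on each batch of pages and concatenate the results.

    `predict` must accept a list of pages and return one result per page.
    """
    results = []
    for batch in iter_batches(pages, batch_size):
        results.extend(predict(batch))
    return results


if __name__ == "__main__":
    # Stand-in predictor for demonstration: returns the length of each "page".
    fake_predict = lambda batch: [len(page) for page in batch]
    print(run_in_batches(fake_predict, ["ab", "cde", "f"], batch_size=2))
```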

Installation

bash
pip install surya-ocr

# For PDF processing (pdf2image also requires the Poppler utilities on the system)
pip install pdf2image

Resources
