layout-analyzer
Layout Analyzer Skill
Overview
This skill performs document layout analysis using surya, a document understanding system. It detects text blocks, tables, figures, and headings, and determines the reading order in complex documents.
How to Use
- Provide the document image or PDF
- Specify what layout elements to detect
- I'll analyze the structure and return detected regions
Example prompts:
- "Analyze the layout of this document page"
- "Detect all tables and text blocks in this image"
- "Determine the reading order for this PDF page"
- "Find headings and paragraphs in this document"
Domain Knowledge
surya Fundamentals
```python
# Import paths as given in this skill; surya's module layout has changed
# between releases, so check them against the installed version.
from surya.detection import DetectionPredictor
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from PIL import Image

# Load image
image = Image.open("document.png")

# Detect layout elements
layout_predictor = LayoutPredictor()
layout_result = layout_predictor([image])
```
Layout Element Types
| Element | Description |
|---|---|
| Text | Regular paragraph text |
| Title | Document/section titles |
| Section-header | Section headings |
| List-item | Bulleted/numbered items |
| Table | Tabular data |
| Figure | Images/diagrams |
| Caption | Figure/table captions |
| Footnote | Footnotes |
| Formula | Mathematical equations |
| Page-header | Headers |
| Page-footer | Footers |
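
When post-processing results it can help to collapse these labels into a few coarse groups. The sketch below is one possible grouping; the group names (`heading`, `body`, `float`, `furniture`) and the `coarse_category` helper are illustrative assumptions, not part of surya.

```python
# Hypothetical grouping of the labels listed in the table above.
# The group names are illustrative and do not come from surya.
LABEL_GROUPS = {
    "heading": {"Title", "Section-header"},
    "body": {"Text", "List-item"},
    "float": {"Table", "Figure", "Caption", "Formula"},
    "furniture": {"Page-header", "Page-footer", "Footnote"},
}


def coarse_category(label: str) -> str:
    """Map a layout label to a coarse group, or 'other' if unknown."""
    for group, labels in LABEL_GROUPS.items():
        if label in labels:
            return group
    return "other"
```

For example, `coarse_category("Caption")` falls into the `float` group, and any label missing from the table falls into `other`.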
Text Detection
```python
from surya.detection import DetectionPredictor
from PIL import Image

# Initialize detector
detector = DetectionPredictor()

# Load image
image = Image.open("document.png")

# Detect text regions
results = detector([image])

# Access results
for page_result in results:
    for bbox in page_result.bboxes:
        print(f"Text region: {bbox.bbox}")
        print(f"Confidence: {bbox.confidence}")
```
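
Detected regions carry a confidence score, so a common follow-up is to drop weak detections. The sketch below works on plain `(bbox, confidence)` pairs rather than surya objects; the `filter_by_confidence` name and the `0.5` default threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# A region is represented here as ([x0, y0, x1, y1], confidence).
Region = Tuple[Sequence[float], float]


def filter_by_confidence(regions: Sequence[Region],
                         min_confidence: float = 0.5) -> List[Region]:
    """Keep only regions whose confidence is at least min_confidence."""
    return [(bbox, conf) for bbox, conf in regions if conf >= min_confidence]
```

With the detection loop above, the pairs could be built as `[(b.bbox, b.confidence) for b in page_result.bboxes]` before filtering.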
Layout Analysis
```python
from surya.layout import LayoutPredictor
from PIL import Image

# Initialize layout predictor
layout_predictor = LayoutPredictor()

# Analyze layout
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Process results
for page_result in layout_results:
    for element in page_result.bboxes:
        print(f"Type: {element.label}")
        print(f"Bbox: {element.bbox}")
        print(f"Confidence: {element.confidence}")
```
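
Printing every element gets noisy on dense pages. A per-label summary is often easier to read; the sketch below assumes the elements have been reduced to `(label, confidence)` pairs, and `summarize_layout` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize_layout(elements):
    """Summarize (label, confidence) pairs as count and mean confidence per label."""
    totals = defaultdict(lambda: [0, 0.0])  # label -> [count, confidence sum]
    for label, confidence in elements:
        totals[label][0] += 1
        totals[label][1] += confidence
    return {
        label: {"count": count, "mean_confidence": conf_sum / count}
        for label, (count, conf_sum) in totals.items()
    }
```

The pairs could be produced from the loop above with `[(e.label, e.confidence) for e in page_result.bboxes]`.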
Reading Order Detection
```python
from surya.reading_order import ReadingOrderPredictor
from surya.layout import LayoutPredictor
from PIL import Image

# Get layout first
layout_predictor = LayoutPredictor()
image = Image.open("document.png")
layout_results = layout_predictor([image])

# Determine reading order
reading_order_predictor = ReadingOrderPredictor()
order_results = reading_order_predictor([image], layout_results)

# Access ordered elements
for page_result in order_results:
    for i, element in enumerate(page_result.ordered_bboxes):
        print(f"{i+1}. {element.label}: {element.bbox}")
```
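
For comparison, or as a fallback on simple single-column pages, a purely geometric ordering can be sketched without any model. This is a naive heuristic and not how surya determines reading order; `naive_reading_order` and its `row_tolerance` parameter are assumptions made for this example.

```python
def naive_reading_order(bboxes, row_tolerance=10.0):
    """Sort [x0, y0, x1, y1] boxes top-to-bottom, then left-to-right.

    Boxes whose top edges fall into the same row_tolerance-high band are
    treated as one row. This breaks down on multi-column pages.
    """
    def sort_key(bbox):
        x0, y0 = bbox[0], bbox[1]
        return (int(y0 // row_tolerance), x0)

    return sorted(bboxes, key=sort_key)
```

On multi-column pages this heuristic interleaves the columns, which is the case a learned reading-order model is meant to handle.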
OCR with Layout
```python
from surya.ocr import OCRPredictor
from surya.layout import LayoutPredictor
from PIL import Image


def boxes_overlap(box_a, box_b):
    """Return True if two [x0, y0, x1, y1] boxes intersect.

    Helper not defined in the original example; a simple intersection test.
    """
    return (box_a[0] < box_b[2] and box_b[0] < box_a[2]
            and box_a[1] < box_b[3] and box_b[1] < box_a[3])


# Initialize predictors
ocr_predictor = OCRPredictor()
layout_predictor = LayoutPredictor()

# Load image
image = Image.open("document.png")

# Get layout
layout_results = layout_predictor([image])

# Run OCR
ocr_results = ocr_predictor([image])

# Combine results
for layout, ocr in zip(layout_results, ocr_results):
    for layout_elem in layout.bboxes:
        print(f"Element: {layout_elem.label}")
        # Find OCR text within this layout element
        for text_line in ocr.text_lines:
            if boxes_overlap(layout_elem.bbox, text_line.bbox):
                print(f"  Text: {text_line.text}")
```
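
A plain overlap test can attach one text line to several neighbouring elements. One way to avoid that is to assign each line to the element that covers the largest share of it. The sketch below works on plain `[x0, y0, x1, y1]` boxes; `assign_lines` and the `min_fraction` threshold are hypothetical names chosen for this example.

```python
def intersection_area(box_a, box_b):
    """Area of the intersection of two [x0, y0, x1, y1] boxes (0 if disjoint)."""
    width = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    height = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    return max(width, 0) * max(height, 0)


def assign_lines(element_boxes, line_boxes, min_fraction=0.5):
    """Map each line index to the element index covering most of the line.

    Lines covered by less than min_fraction of their own area are left out.
    """
    assignment = {}
    for line_idx, line in enumerate(line_boxes):
        line_area = (line[2] - line[0]) * (line[3] - line[1])
        if line_area <= 0:
            continue  # skip degenerate boxes
        best_idx, best_fraction = None, 0.0
        for elem_idx, elem in enumerate(element_boxes):
            fraction = intersection_area(elem, line) / line_area
            if fraction > best_fraction:
                best_idx, best_fraction = elem_idx, fraction
        if best_idx is not None and best_fraction >= min_fraction:
            assignment[line_idx] = best_idx
    return assignment
```

The returned dictionary can then be used to collect the text of each layout element exactly once.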
Processing PDFs
```python
from surya.layout import LayoutPredictor
from pdf2image import convert_from_path


def analyze_pdf_layout(pdf_path):
    """Analyze layout of all pages in PDF."""
    # Convert PDF to images
    images = convert_from_path(pdf_path)

    # Initialize predictor
    layout_predictor = LayoutPredictor()

    # Analyze all pages
    results = layout_predictor(images)

    document_structure = []
    for page_num, page_result in enumerate(results):
        page_elements = []
        for element in page_result.bboxes:
            page_elements.append({
                'type': element.label,
                'bbox': element.bbox,
                'confidence': element.confidence
            })
        document_structure.append({
            'page': page_num + 1,
            'elements': page_elements
        })
    return document_structure


structure = analyze_pdf_layout("document.pdf")
```
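
The list returned by `analyze_pdf_layout` is plain dictionaries, so it can be saved for later processing. The sketch below serializes such a structure to JSON; `structure_to_json` is a hypothetical helper, and the explicit casts are a precaution in case values arrive as non-native numeric types.

```python
import json


def structure_to_json(document_structure, indent=2):
    """Serialize a list of {'page', 'elements'} dicts to a JSON string."""
    serializable = []
    for page in document_structure:
        serializable.append({
            "page": int(page["page"]),
            "elements": [
                {
                    "type": str(element["type"]),
                    "bbox": [float(v) for v in element["bbox"]],
                    "confidence": float(element["confidence"]),
                }
                for element in page["elements"]
            ],
        })
    return json.dumps(serializable, indent=indent)
```

The resulting string can be written to disk with an ordinary `open(path, "w")` call.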
Visualization
```python
from surya.layout import LayoutPredictor
from PIL import Image, ImageDraw


def visualize_layout(image_path, output_path):
    """Visualize detected layout elements."""
    # Convert to RGB so named colors draw correctly on any input mode
    image = Image.open(image_path).convert("RGB")
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])

    # Create drawing context
    draw = ImageDraw.Draw(image)

    # Color mapping for element types
    colors = {
        'Text': 'blue',
        'Title': 'red',
        'Table': 'green',
        'Figure': 'purple',
        'Section-header': 'orange',
        'List-item': 'cyan',
    }

    for element in results[0].bboxes:
        bbox = element.bbox
        color = colors.get(element.label, 'gray')

        # Draw rectangle
        draw.rectangle(bbox, outline=color, width=2)

        # Add label
        draw.text((bbox[0], bbox[1] - 15),
                  f"{element.label} ({element.confidence:.2f})",
                  fill=color)

    image.save(output_path)
    return output_path
```
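
The label is drawn 15 pixels above each box, which places it off-canvas for elements at the very top of the page. A small helper can keep it visible; `label_anchor` and its `text_height` parameter are assumptions made for this sketch.

```python
def label_anchor(bbox, text_height=15):
    """Return (x, y) for a label: above the box if it fits, else at the box top."""
    x0, y0 = bbox[0], bbox[1]
    y = y0 - text_height
    if y < 0:
        y = y0  # not enough room above, so draw inside the box
    return (x0, y)
```

The result could be passed as the first argument of `draw.text` in place of `(bbox[0], bbox[1] - 15)`.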
Best Practices
- Use High-Quality Images: 150+ DPI for best results
- Preprocess if Needed: Deskew rotated documents
- Validate Results: Check confidence scores
- Handle Multi-page: Process pages individually
- Combine with OCR: Get text within detected regions
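
To make the 150 DPI guideline concrete, the arithmetic below converts between physical page size and pixel dimensions. Both function names are hypothetical, and page sizes are given in inches.

```python
def pixels_at_dpi(width_in, height_in, dpi=150):
    """Pixel dimensions of a page of the given size (inches) at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def estimated_dpi(pixel_width, page_width_in):
    """Estimate the DPI of a scan from its pixel width and physical width."""
    return pixel_width / page_width_in
```

A US Letter page (8.5 x 11 in) at 150 DPI is 1275 x 1650 pixels, so a scan of that page much narrower than 1275 pixels is below the guideline. When rasterizing PDFs, `pdf2image.convert_from_path` accepts a `dpi` argument that controls this resolution.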
Common Patterns
Document Structure Extraction
```python
def extract_document_structure(image_path):
    """Extract hierarchical document structure."""
    from PIL import Image
    from surya.layout import LayoutPredictor
    from surya.reading_order import ReadingOrderPredictor

    image = Image.open(image_path)

    # Get layout
    layout_predictor = LayoutPredictor()
    layout_results = layout_predictor([image])

    # Get reading order
    order_predictor = ReadingOrderPredictor()
    order_results = order_predictor([image], layout_results)

    structure = {
        'title': None,
        'sections': [],
        'tables': [],
        'figures': []
    }

    current_section = None
    for element in order_results[0].ordered_bboxes:
        if element.label == 'Title':
            structure['title'] = element
        elif element.label == 'Section-header':
            # Start a new section; later text is attached to it
            current_section = {'header': element, 'content': []}
            structure['sections'].append(current_section)
        elif element.label == 'Table':
            structure['tables'].append(element)
        elif element.label == 'Figure':
            structure['figures'].append(element)
        elif current_section and element.label in ['Text', 'List-item']:
            current_section['content'].append(element)
    return structure
```
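
Note that the function above discards `Text` and `List-item` elements that appear before the first `Section-header`. If that preamble matters (an abstract, for instance), one variant keeps it in a leading section with no header. The sketch below operates on a plain list of labels in reading order; `group_into_sections` is a hypothetical name.

```python
def group_into_sections(labels):
    """Group element indices into sections by label, keeping the preamble.

    The first entry has header None and collects body elements that come
    before any Section-header.
    """
    sections = [{"header": None, "content": []}]
    for index, label in enumerate(labels):
        if label == "Section-header":
            sections.append({"header": index, "content": []})
        elif label in ("Text", "List-item"):
            sections[-1]["content"].append(index)
    return sections
```

The indices refer back to positions in the ordered element list, so the original objects can be looked up afterwards.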
Table Region Extraction
```python
def extract_table_regions(image_path):
    """Extract table regions from document."""
    from PIL import Image
    from surya.layout import LayoutPredictor

    image = Image.open(image_path)
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])

    tables = []
    for element in results[0].bboxes:
        if element.label == 'Table':
            bbox = element.bbox
            # Crop table region
            table_image = image.crop(bbox)
            tables.append({
                'bbox': bbox,
                'image': table_image,
                'confidence': element.confidence
            })
    return tables
```
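
Detected table boxes can clip borders or the outermost cells, so it is common to pad the box slightly before cropping. The sketch below computes a padded box clamped to the image bounds; `padded_crop_box` and the 10-pixel default are assumptions for illustration.

```python
import math


def padded_crop_box(bbox, image_size, padding=10):
    """Expand [x0, y0, x1, y1] by padding pixels, clamped to the image.

    image_size is (width, height), as returned by PIL's Image.size.
    """
    width, height = image_size
    x0, y0, x1, y1 = bbox
    return (
        max(0, math.floor(x0) - padding),
        max(0, math.floor(y0) - padding),
        min(width, math.ceil(x1) + padding),
        min(height, math.ceil(y1) + padding),
    )
```

The returned tuple could be passed to `image.crop` in place of the raw `bbox`.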
Examples
Example 1: Academic Paper Analysis
```python
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from pdf2image import convert_from_path


def analyze_academic_paper(pdf_path):
    """Analyze structure of academic paper."""
    images = convert_from_path(pdf_path)
    layout_predictor = LayoutPredictor()
    order_predictor = ReadingOrderPredictor()

    paper_structure = {
        'pages': [],
        'element_counts': {
            'Title': 0,
            'Section-header': 0,
            'Text': 0,
            'Table': 0,
            'Figure': 0,
            'Formula': 0,
            'Footnote': 0
        }
    }

    layout_results = layout_predictor(images)
    order_results = order_predictor(images, layout_results)

    for page_num, (layout, order) in enumerate(zip(layout_results, order_results)):
        page_structure = {
            'page': page_num + 1,
            'elements': []
        }
        for element in order.ordered_bboxes:
            page_structure['elements'].append({
                'type': element.label,
                'bbox': element.bbox,
                'order': element.position
            })
            # Count element types
            if element.label in paper_structure['element_counts']:
                paper_structure['element_counts'][element.label] += 1
        paper_structure['pages'].append(page_structure)
    return paper_structure


paper = analyze_academic_paper('research_paper.pdf')
print(f"Total tables: {paper['element_counts']['Table']}")
print(f"Total figures: {paper['element_counts']['Figure']}")
```
Example 2: Form Field Detection
```python
from surya.layout import LayoutPredictor
from PIL import Image


def detect_form_fields(image_path):
    """Detect form fields and labels."""
    image = Image.open(image_path)
    layout_predictor = LayoutPredictor()
    results = layout_predictor([image])

    form_fields = []
    for element in results[0].bboxes:
        # Look for text elements that might be labels
        if element.label == 'Text':
            # Check if there's a box/line nearby (potential input field)
            form_fields.append({
                'type': 'potential_label',
                'bbox': element.bbox,
                'confidence': element.confidence
            })
    return form_fields


fields = detect_form_fields('form.png')
print(f"Found {len(fields)} potential form elements")
```
Example 3: Multi-column Article
```python
from surya.layout import LayoutPredictor
from surya.reading_order import ReadingOrderPredictor
from PIL import Image


def process_multicolumn_article(image_path):
    """Process multi-column article layout."""
    image = Image.open(image_path)
    layout_predictor = LayoutPredictor()
    order_predictor = ReadingOrderPredictor()

    layout_results = layout_predictor([image])
    order_results = order_predictor([image], layout_results)

    # Group elements by column
    image_width = image.width
    column_threshold = image_width / 2
    columns = {
        'left': [],
        'right': [],
        'full_width': []
    }

    for element in order_results[0].ordered_bboxes:
        bbox = element.bbox
        element_center = (bbox[0] + bbox[2]) / 2
        element_width = bbox[2] - bbox[0]

        # Determine column
        if element_width > column_threshold * 1.5:
            columns['full_width'].append(element)
        elif element_center < column_threshold:
            columns['left'].append(element)
        else:
            columns['right'].append(element)

    return {
        'layout': 'multi-column',
        'columns': columns,
        'reading_order': order_results[0].ordered_bboxes
    }


article = process_multicolumn_article('newspaper_page.png')
print(f"Left column: {len(article['columns']['left'])} elements")
print(f"Right column: {len(article['columns']['right'])} elements")
```
Limitations
- Layout detection may be inaccurate on handwritten documents
- Very small text regions may be missed
- Complex nested layouts are challenging
- A GPU is recommended for batch processing
- Multi-language support varies
Installation
```bash
pip install surya-ocr

# For PDF processing
pip install pdf2image
```
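
After installing, a quick check that the modules are importable can save a confusing traceback later. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and the module names `surya` and `pdf2image` are the import names used in the examples in this document.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["surya", "pdf2image", "PIL"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Note that `pdf2image` also relies on the Poppler utilities being installed on the system, which this check does not cover.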