Legal Text Formatting Tool
Overview
Convert legal texts (legal provisions or legal cases) into standardized Markdown: remove promotional and redundant material, identify the text type automatically, and apply the matching formatting rules.
Core responsibility: formatting and content cleaning only. This skill has no content crawling capability.
Collaboration with Other Skills
Typical Workflow
Scenario: User requests to format legal text on a web page
User request → AI determines the source → a crawling skill fetches the content → legal-text-format formats it
Example workflow:
- User provides a WeChat Official Account link → AI uses wechat-article-fetch to crawl the content → AI calls legal-text-format to format it
- User provides a regular web page link → AI uses other tools to crawl the content → AI calls legal-text-format to format it
- User pastes text directly → AI calls legal-text-format directly
Skill Responsibility Boundary:
- wechat-article-fetch / other crawling tools: responsible for obtaining the original text content from various sources
- legal-text-format: responsible for formatting and cleaning the obtained text
Core Principles
Content integrity guarantee: apart from format adjustments and the removal of promotional content, all substantive content of legal cases and provisions must be retained in full, with no omissions.
Workflow
Step 1: Analyze Text Type
Use an LLM to analyze the input text:
- Decide whether it is a legal provision or a legal case
- Identify text structural features (chapters, clauses, case numbers, etc.)
- Determine the appropriate formatting strategy
- Extract the theme for file naming
Reference Analysis Prompt:
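Analyze the following text and determine its type:
- If it is a legal provision: identify the chapter (章), section (节), and article (条) structure
- If it is a legal case: identify the case title, case number, facts, judgment result, typical significance, etc.
- Extract the theme for file naming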
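A cheap heuristic can cross-check the LLM's answer before formatting starts. The sketch below is only an assumption about how such a check might look: the function name `guess_text_type`, the article pattern, and the marker list are hypothetical, and the markers are taken from the case fields named in this document.

```python
import re

# Hypothetical patterns: article numbers such as 第一条 or 第12条.
ARTICLE_RE = re.compile(r"第[一二三四五六七八九十百千零〇0-9]+条")
# Section labels that this document lists for legal cases.
CASE_MARKERS = ("案号", "基本案情", "裁判结果", "典型意义")


def guess_text_type(text: str) -> str:
    """Return 'provision', 'case', or 'unknown' from simple marker counts."""
    article_hits = len(ARTICLE_RE.findall(text))
    case_hits = sum(text.count(marker) for marker in CASE_MARKERS)
    if article_hits == 0 and case_hits == 0:
        return "unknown"
    # Ties go to 'case' because cases often quote a few articles.
    return "provision" if article_hits > case_hits else "case"
```

A case that quotes many articles can still be misread by this count, so the result is best treated as a sanity check on the LLM's classification rather than a replacement for it.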
Step 2: Save Original Content
Save the input original content as a local Markdown file:
- File location: archive/{YYYYMMDD_HHMMSS}_{主题}/ (where {主题} is the theme extracted in Step 1)
- File naming: {YYYYMMDD}_{主题}_raw.md
- Purpose: provide a traceable baseline for content comparison and verification
Example Archive Directory Structure:
archive/20250122_153400_个人信息保护检察公益诉讼典型案例/
├── 20250122_个人信息保护检察公益诉讼典型案例_raw.md # original content
├── 20250122_个人信息保护检察公益诉讼典型案例_formatted.md # formatted content (generated in Step 4)
└── meta.json # metadata (optional)
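Saving the untouched input before any formatting keeps a baseline for later comparison. A minimal sketch, assuming the theme has already been extracted and cleaned; the function name `save_raw` and its parameters are hypothetical:

```python
from datetime import datetime
from pathlib import Path


def save_raw(content: str, theme: str, root: Path, now: datetime) -> Path:
    """Write the unmodified input under root/{YYYYMMDD_HHMMSS}_{theme}/ and return the file path."""
    directory = root / f"{now:%Y%m%d_%H%M%S}_{theme}"
    directory.mkdir(parents=True, exist_ok=True)
    raw_path = directory / f"{now:%Y%m%d}_{theme}_raw.md"
    # UTF-8 is assumed so that Chinese text round-trips unchanged.
    raw_path.write_text(content, encoding="utf-8")
    return raw_path
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside, lets the same timestamp be reused when the formatted file is written in Step 4.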
Step 3: Format Text
Important: process the complete text in a single pass; do not split it into segments.
Formatting prompt (see the detailed examples in examples.md):
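Format the following legal text as standard Markdown.
# Formatting rules for legal provisions
- Put a second-level Markdown heading marker (##) before each chapter
- Add a blank line between provisions
- Within a single provision, line breaks must not introduce extra blank lines
- Bold "第X条" (**第X条**)
- If a paragraph does not end with a period or semicolon, delete the line break after it
- Keep all provision content complete; do not omit any clause
# Formatting rules for legal cases
- Replace English punctuation with Chinese punctuation (including parentheses, commas, periods, colons, semicolons, etc.)
- Put a second-level Markdown heading marker (##) before each case number or name; the case name must follow the number immediately
- Put a third-level Markdown heading marker (###) before each section of a case
- Within a section of a case, allow at most 1 blank line (no more than 2 consecutive line breaks)
- Remove extra consecutive blank lines while keeping appropriate separation between paragraphs
- Convert digits to half-width
- Content scope:
  - Keep only the content from the first case to the last case
  - Delete the article introduction, author information, foreword, table of contents, etc. that precede the cases
  - Delete the promotional content, QR codes, official account introductions, related article recommendations, etc. at the bottom
  - Retention standard: keep only the content of the legal cases themselves, such as case title, case number, basic facts, judgment result, and typical significance
- Keep all substantive case content complete, including facts, judgment, significance, and every other part
# Reference examples
See references/examples.md, which contains 4 complete formatting examples.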
Summary of Legal Provision Formatting Rules
- Add a second-level Markdown heading marker (##) before chapters
- Add a blank line between different provisions
- Do not leave extra blank lines where a single provision wraps across lines
- Bold "Article X" (**第X条**)
- If a paragraph does not end with a period or semicolon, delete the line break after it
- Keep all provision content complete; no clause may be omitted
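The provision rules above are mechanical enough to apply, or to re-check after the LLM pass, with a small deterministic routine. The following is a sketch under stated assumptions, not the skill's actual implementation: `format_provisions` and both patterns are hypothetical, and a chapter or article is assumed to start at the beginning of a line.

```python
import re

NUM = "一二三四五六七八九十百千零〇0-9"
CHAPTER_RE = re.compile(rf"^第[{NUM}]+章")
ARTICLE_RE = re.compile(rf"^(第[{NUM}]+条)")


def format_provisions(text: str) -> str:
    """Apply the heading, bolding, line-joining, and blank-line rules."""
    blocks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are re-created when blocks are joined
        if CHAPTER_RE.match(line):
            blocks.append(f"## {line}")
        elif ARTICLE_RE.match(line):
            blocks.append(ARTICLE_RE.sub(r"**\1**", line, count=1))
        elif blocks and not blocks[-1].startswith("## "):
            if blocks[-1].endswith(("。", ";")):
                blocks[-1] += "\n" + line  # new paragraph, same provision
            else:
                blocks[-1] += line  # previous line was cut mid-sentence
        else:
            blocks.append(line)
    # Exactly one blank line between chapters and provisions.
    return "\n\n".join(blocks)
```

A continuation line that happens to begin with an article reference would be mistaken for a new provision here, which is one reason to compare the result against the raw file afterwards.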
Summary of Legal Case Formatting Rules
- Replace English punctuation marks with Chinese punctuation marks
- Add a second-level Markdown heading marker (##) before each case serial number or name; the case name must follow the serial number immediately
- Add a third-level Markdown heading marker (###) before each section of a case
- Allow at most 1 blank line within each section of a case (no more than 2 consecutive line breaks)
- Remove extra consecutive blank lines while keeping appropriate separation between paragraphs
- Convert digits to half-width
- Content scope:
  - Retain only the content from the first case to the last case
  - Delete preceding material such as the article introduction, author information, foreword, and table of contents
  - Delete trailing material such as promotional content, QR codes, official account introductions, and related article recommendations
  - Retention standard: retain only the content of the legal case itself, including case title, case number, basic facts, judgment result, typical significance, etc.
- Keep all substantive content of the case complete, including facts, judgment, significance, and all other parts
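The punctuation, digit, and blank-line rules for cases can be sketched as a conservative normalizer. Everything here is an assumption for illustration: the name `normalize_case_text` is hypothetical, and the "skip a mark when an ASCII letter, digit, or slash follows" guard is a simplification that protects values such as 1,000 or 15:30 at the cost of missing some marks.

```python
import re

# Full-width digits and ASCII parentheses mapped to their target forms.
TABLE = str.maketrans("0123456789()", "0123456789()")
MARKS = {",": ",", ";": ";", ":": ":", "?": "?", "!": "!"}


def normalize_case_text(text: str) -> str:
    text = text.translate(TABLE)
    # Swap a mark unless an ASCII letter, digit, or slash follows it.
    text = re.sub(r"[,;:?!](?![0-9A-Za-z/])", lambda m: MARKS[m.group(0)], text)
    # A period becomes 。 only when no ASCII letter or digit touches it.
    text = re.sub(r"(?<![0-9A-Za-z])\.(?![0-9A-Za-z])", "。", text)
    # Three or more newlines collapse to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Quotation marks are left alone because choosing between opening and closing forms needs context that a character table cannot provide.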
Step 4: Save and Verify
Save the formatted document:
- File location: archive/{YYYYMMDD_HHMMSS}_{主题}/
- File naming: {YYYYMMDD}_{主题}_formatted.md
- Save it in the same archive directory as the original file ({YYYYMMDD}_{主题}_raw.md)
Content Integrity Verification:
- Compare the character count of the original document with that of the formatted document (a difference of up to ±10% is allowed, because promotional content is removed)
- Confirm that every legal provision or case title is present
- Confirm that key content (case number, court, judgment gist, etc.) is fully retained
- Record the verification result in the metadata section of the output document
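The length comparison can be computed directly. A minimal sketch, assuming that length means the number of non-whitespace characters (the text does not define the counting method) and using the hypothetical name `verify_length`:

```python
def verify_length(raw: str, formatted: str, tolerance: float = 0.10) -> dict:
    """Compare non-whitespace character counts against the allowed tolerance."""
    raw_count = len("".join(raw.split()))
    formatted_count = len("".join(formatted.split()))
    # Relative change in percent; an empty original is treated as no change.
    diff = (formatted_count - raw_count) * 100 / raw_count if raw_count else 0.0
    return {
        "raw_count": raw_count,
        "formatted_count": formatted_count,
        "diff_percent": round(diff, 1),
        "passed": abs(diff) <= tolerance * 100,
    }
```

A result with `passed` set to `False` does not prove that content was lost; it only signals that a manual comparison with the raw file is needed.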
Reference Documents
Formatting Examples
For detailed formatting examples and comparisons, please refer to examples.md, which includes:
- 4 complete legal case formatting examples
- Each example includes a comparison between original text and formatted text
- Summary of formatting points (punctuation processing, title hierarchy, blank line processing, content scope, etc.)
Accuracy Requirements
- Do not change the original meaning: Format adjustment shall not change the original meaning of the legal text
- Retain key information: Key information such as case number, court name, parties, etc. must be retained
- Keep serial numbers: Case serial numbers and provision serial numbers shall not be modified
Output Document Structure
Archive Directory Organization
All formatting results are archived and stored by timestamp:
archive/
├── {YYYYMMDD_HHMMSS}_{文档主题}/
│ ├── {YYYYMMDD}_{主题}_raw.md # original crawled content
│ ├── {YYYYMMDD}_{主题}_formatted.md # formatted content
│ └── meta.json # metadata (optional)
Naming Rules:
- Directory name: {YYYYMMDD_HHMMSS}_{主题} (example: 20250122_153400_个人信息保护典型案例)
- File name:
  - Original file: {YYYYMMDD}_{主题}_raw.md (example: 20250122_个人信息保护典型案例_raw.md)
  - Formatted file: {YYYYMMDD}_{主题}_formatted.md (example: 20250122_个人信息保护典型案例_formatted.md)
- Theme limit: {主题} is the core theme extracted from the text, limited to 30 characters
- Date format: YYYYMMDD (example: 20250122)
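The naming rules can be captured in one helper so the directory and both files always agree. This is a sketch only: `clean_theme` and `archive_names` are hypothetical names, and stripping path-unsafe characters is an added assumption that the rules above do not state.

```python
import re
from datetime import datetime

MAX_THEME_CHARS = 30  # limit stated in the naming rules


def clean_theme(theme: str) -> str:
    # Assumption: drop whitespace and characters that are unsafe in file names.
    safe = re.sub(r'[\\/:*?"<>|\s]+', "", theme)
    return safe[:MAX_THEME_CHARS]


def archive_names(theme: str, now: datetime) -> dict:
    """Build the directory name and both file names from one theme and timestamp."""
    theme = clean_theme(theme)
    day = now.strftime("%Y%m%d")
    return {
        "directory": f"{now.strftime('%Y%m%d_%H%M%S')}_{theme}",
        "raw": f"{day}_{theme}_raw.md",
        "formatted": f"{day}_{theme}_formatted.md",
    }
```

For example, `archive_names("个人信息保护典型案例", datetime(2025, 1, 22, 15, 34, 0))["directory"]` gives `20250122_153400_个人信息保护典型案例`, matching the example in the rules.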
formatted.md Content Structure
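# {document title}
## Metadata
- **Source**: {original web page URL or "pasted by user"}
- **Processed at**: {timestamp}
- **Text type**: {legal provision / legal case}
- **Original file**: [{YYYYMMDD}_{主题}_raw.md]({YYYYMMDD}_{主题}_raw.md)
## Content verification
- **Original character count**: {character count of the original document}
- **Formatted character count**: {character count of the formatted document}
- **Difference**: {difference as a percentage}%
- **Number of cases/provisions**: {number of cases or provisions identified}
- **Integrity check**: ✅ Passed / ⚠️ Needs manual review
---
{formatted body text}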
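Filling the header from computed values, rather than by hand, keeps the verification block consistent with the actual counts. The sketch below is illustrative: `render_header` and its parameters are hypothetical, the field labels are written in English for the example and should follow the template's own labels in practice, and the ±10% threshold is the one given in Step 4.

```python
def render_header(title: str, source: str, processed_at: str, text_type: str,
                  raw_name: str, raw_count: int, formatted_count: int,
                  item_count: int, tolerance: float = 0.10) -> str:
    """Build the metadata and verification header that precedes the body."""
    diff = (formatted_count - raw_count) * 100 / raw_count if raw_count else 0.0
    passed = abs(diff) <= tolerance * 100
    status = "✅ Passed" if passed else "⚠️ Needs manual review"
    lines = [
        f"# {title}",
        "## Metadata",
        f"- **Source**: {source}",
        f"- **Processed at**: {processed_at}",
        f"- **Text type**: {text_type}",
        f"- **Original file**: [{raw_name}]({raw_name})",
        "## Content verification",
        f"- **Original character count**: {raw_count}",
        f"- **Formatted character count**: {formatted_count}",
        f"- **Difference**: {diff:+.1f}%",
        f"- **Number of cases/provisions**: {item_count}",
        f"- **Integrity check**: {status}",
        "---",
    ]
    return "\n".join(lines)
```

The formatted body is appended after the returned header, so the verification result travels with the document it describes.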
Quality Standards
- Unified punctuation: All punctuation marks use Chinese punctuation
- Unified number format: Numbers use half-width characters
- Clear hierarchical structure: second- and third-level headings are used correctly
- Standard blank lines: paragraphs are separated by blank lines, with none missing and none in excess
- Content integrity: all legally relevant content is retained and irrelevant promotional information is removed
Applicable Scenarios
- Organizing legal provision compilations
- Standardizing legal case collections
- Preparing legal learning materials
- Building a legal text knowledge base
- Cleaning legal texts crawled from web pages
Input Requirements
This skill accepts the following types of input:
- Crawled text content: Text obtained by other skills (such as wechat-article-fetch)
- User pasted text: Text content directly provided by the user
- Local files: Saved Markdown/text files
Not accepted: web page links (a link must first be processed by a dedicated crawling skill)
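The input boundary can be enforced with a small guard placed in front of the formatter. A minimal sketch, assuming that an input consisting of nothing but a single URL counts as a link; `accept_input` and the pattern are hypothetical:

```python
import re

# Matches input that is only one http(s) URL, ignoring surrounding whitespace.
URL_ONLY_RE = re.compile(r"^\s*https?://\S+\s*$", re.IGNORECASE)


def accept_input(text: str) -> str:
    """Return the text unchanged, or raise if it is empty or a bare link."""
    if not text.strip():
        raise ValueError("empty input")
    if URL_ONLY_RE.match(text):
        raise ValueError("links must be fetched by a crawling skill first")
    return text
```

Text that merely contains a URL, such as a case citing its source, still passes, because only a bare link indicates that crawling has not happened yet.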