legal-text-format

Legal Text Formatting Tool

Overview

Converts legal texts (statutory provisions or legal cases) into standardized Markdown, removes promotional and other redundant content, automatically identifies the text type, and applies the matching formatting rules.

Core responsibility: formatting and content cleanup only. This skill does not fetch content.

Collaboration with Other Skills

Typical Workflow

Scenario: the user asks to format legal text from a web page

User request → AI determines the source → a fetching skill retrieves the content → `legal-text-format` formats it

Example workflow:

User provides a WeChat Official Account link → AI uses `wechat-article-fetch` to fetch the content → AI calls `legal-text-format` to format it
User provides a regular web page link → AI fetches the content with another tool → AI calls `legal-text-format` to format it
User pastes text directly → AI calls `legal-text-format` to format it directly

Skill responsibility boundary:

`wechat-article-fetch` / other fetching tools: obtain the raw text from its various sources
`legal-text-format`: formats and cleans the text once it has been obtained
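The routing described above can be sketched as a small dispatcher. This is a hypothetical illustration, not part of either skill. `route_request` and the `other-fetch-tool` label are made-up names. Recognizing WeChat Official Account articles by the `mp.weixin.qq.com` host is an assumption about typical article links.

```python
from urllib.parse import urlparse

# Hypothetical orchestration sketch: the skill names come from this document,
# "other-fetch-tool" is a placeholder, and matching WeChat links by the
# mp.weixin.qq.com host is an assumption.
def route_request(user_input: str) -> list[str]:
    """Return the ordered list of skills/tools to run for one request."""
    text = user_input.strip()
    if text.lower().startswith(("http://", "https://")):
        host = urlparse(text).netloc.lower()
        if host == "mp.weixin.qq.com":
            # WeChat Official Account article: fetch with the dedicated skill
            return ["wechat-article-fetch", "legal-text-format"]
        # Any other web page: fetch with some other tool first
        return ["other-fetch-tool", "legal-text-format"]
    # Pasted text needs no fetching; format it directly
    return ["legal-text-format"]
```

For example, `route_request("https://mp.weixin.qq.com/s/abc")` returns `["wechat-article-fetch", "legal-text-format"]`. Pasted text returns only `["legal-text-format"]`.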

Core Principles

Content integrity guarantee: apart from formatting adjustments and the removal of promotional content, all substantive content of the legal cases and provisions must be retained in full, with nothing omitted.

Workflow

Step 1: Analyze Text Type

Use an LLM to analyze the input text:

Determine whether it is a legal provision or a legal case
Identify its structural features (chapters, articles, case numbers, etc.)
Choose the appropriate formatting strategy
Extract a theme for file naming

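Reference analysis prompt. The prompt is kept in Chinese because it is sent to the model verbatim. It asks the model to do three things:

Decide the text type
Identify the chapter/section/article structure of provisions, or the title, case number, facts, judgment result, and typical significance of cases
Extract a theme for the file name

```text
分析以下文本，判断其类型：
- 如果是法律条文：识别章、节、条的结构
- 如果是法律案例：识别案例标题、案号、案情、裁判结果、典型意义等
- 提取主题用于文件命名
```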
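The skill leaves classification to the LLM. A regex heuristic could serve as a cheap pre-check or fallback. The sketch below is an assumption, not the skill's method. It counts line-initial article markers (第X条) as evidence of provisions. It counts case numbers such as （2023）京01民初123号, plus typical section names (基本案情, 裁判结果, 典型意义), as evidence of cases. `guess_text_type` and all patterns are hypothetical.

```python
import re

# Assumed patterns, not taken from the skill itself
NUM = r"[一二三四五六七八九十百千零〇\d]+"
# Article markers such as 第一条 / 第12条 at the start of a line
ARTICLE_RE = re.compile(rf"^\s*第{NUM}条", re.M)
# Case numbers such as （2023）京01民初123号 or (2023)京01民初123号
CASE_NO_RE = re.compile(r"[（(]\d{4}[）)][\u4e00-\u9fff\d]{2,20}?号")
# Section names that typically appear in case write-ups
CASE_SECTIONS = ("基本案情", "裁判结果", "典型意义")


def guess_text_type(text: str) -> str:
    """Heuristically return 'provision', 'case', or 'unknown'."""
    articles = len(ARTICLE_RE.findall(text))
    case_hits = len(CASE_NO_RE.findall(text)) + sum(s in text for s in CASE_SECTIONS)
    if articles == 0 and case_hits == 0:
        return "unknown"
    # Ties go to 'case': case write-ups often quote articles,
    # while statutes rarely contain case numbers
    return "provision" if articles > case_hits else "case"
```

Article markers are anchored to the start of a line because case write-ups often cite 第X条 mid-sentence.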

Step 2: Save Original Content

Save the original input as a local Markdown file:

File location: `archive/{YYYYMMDD_HHMMSS}_{主题}/`, where `{主题}` is the theme extracted in Step 1
File naming: `{YYYYMMDD}_{主题}_raw.md`
Purpose: provides a traceable record for comparing the formatted output against the original and verifying it

Example archive directory structure:

```text
archive/20250122_153400_个人信息保护检察公益诉讼典型案例/
├── 20250122_个人信息保护检察公益诉讼典型案例_raw.md        # original content
├── 20250122_个人信息保护检察公益诉讼典型案例_formatted.md  # formatted content (generated in Step 4)
└── meta.json                                                # metadata (optional)
```

Step 3: Format Text

Important: process the complete text in a single pass; do not split it into segments.

Formatting prompt. The prompt is in Chinese, and English summaries of its rules follow it. See examples.md for detailed examples.

```text
请将以下法律文本格式化为规范的 Markdown 格式。

法律条文格式化规则

章前面添加二级 markdown 格式（##）
不同条文之间添加空行
每一条条文内部换行时不应有多余空行
"第X条"进行加粗（**第X条**）
如果一段文字的最后没有句号或分号，则删除后方的回车
保持所有条文内容完整，不得遗漏任何条款

法律案例格式化规则

把英文标点符号替换成中文标点符号（包括括号、逗号、句号、冒号、分号等）
案例序号或名称前添加二级 markdown 格式（##），序号后要紧跟案例名称
每个案例的章节前添加三级 markdown 格式（###）
每个案例的章节内部不应有大于1个的空行（连续换行数不超过2个）
清理多余的连续空行，保持段落间适当的分隔
把数字格式调整为半角
内容范围限定：
- 仅保留从第一个案例到最后一个案例的内容
- 删除前面的文章介绍、作者信息、引言、目录等
- 删除底部的宣传推广内容、二维码、公众号介绍、相关文章推荐等
- 保留标准：只保留案例标题、案号、基本案情、裁判结果、典型意义等法律案例本身的内容
保持所有案例实质内容完整，包括案情、裁判、意义等所有部分

参考示例

详见 references/examples.md 文件，其中包含4个完整的格式化示例。
```

Summary of Legal Provision Formatting Rules

Add a level-2 Markdown heading (`##`) before each chapter
Add a blank line between provisions
Add no extra blank lines where a single provision wraps onto several lines
Bold each article label "第X条" ("Article X"), i.e. `**第X条**`
If a line does not end with a period or semicolon, delete the line break after it so the wrapped sentence is rejoined
Keep all provision content complete; no clause may be omitted
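The provision rules are regular enough to approximate with regular expressions. The sketch below assumes that chapters are lines starting with 第X章 and articles are lines starting with 第X条. It also treats a line ending in a colon as closing a paragraph. That is an added assumption, so that enumerated items such as （一）… are not merged into the line that introduces them. Sections (节), tables, and other statute features are not handled, and all names are hypothetical.

```python
import re

NUM = r"[一二三四五六七八九十百千零〇\d]+"
CHAPTER_RE = re.compile(rf"^第{NUM}章")            # e.g. 第一章 总则
ARTICLE_RE = re.compile(rf"^(第{NUM}条)\s*(.*)$")  # e.g. 第一条 为了……
# A line ending with one of these closes a paragraph. Period and semicolon come from
# the rule above; treating colons as closing too is an added assumption so that
# enumerated items after "……下列情形之一的：" stay on their own lines.
END_MARKS = ("。", "；", ".", ";", "：", ":")


def join_wrapped(lines: list[str]) -> list[str]:
    """Rejoin lines that were broken in the middle of a sentence."""
    paras: list[str] = []
    for line in (raw_line.strip() for raw_line in lines):
        if not line:
            continue  # blank lines are re-added when rendering
        starts_unit = CHAPTER_RE.match(line) or ARTICLE_RE.match(line)
        if (paras and not starts_unit and not paras[-1].endswith(END_MARKS)
                and not CHAPTER_RE.match(paras[-1])):
            paras[-1] += line  # no period/semicolon at the end: drop the line break
        else:
            paras.append(line)
    return paras


def format_provisions(raw: str) -> str:
    units: list[list[str]] = []  # one chapter heading, or all paragraphs of one article
    for para in join_wrapped(raw.splitlines()):
        if CHAPTER_RE.match(para):
            units.append([f"## {para}"])
        elif m := ARTICLE_RE.match(para):
            units.append([f"**{m.group(1)}** {m.group(2)}".rstrip()])
        elif units and not units[-1][0].startswith("## "):
            units[-1].append(para)  # further paragraph of the same article: no blank line
        else:
            units.append([para])  # preamble, or text directly under a chapter heading
    # Blank line between units, single line break inside an article
    return "\n\n".join("\n".join(unit) for unit in units)
```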

Summary of Legal Case Formatting Rules

Replace English punctuation with Chinese punctuation (including parentheses, commas, periods, colons, semicolons, etc.)
Add a level-2 Markdown heading (`##`) before each case number or case name; the case name must immediately follow the number
Add a level-3 Markdown heading (`###`) before each section of a case
Allow no more than one blank line inside a section of a case (no more than two consecutive line breaks)
Remove excess consecutive blank lines while keeping appropriate separation between paragraphs
Convert digits to half-width
Content scope:
- Keep only the content from the first case through the last case
- Delete preceding material such as the article introduction, author information, foreword, and table of contents
- Delete trailing material such as promotional content, QR codes, official account introductions, and related-article recommendations
- Retention standard: keep only the content of the legal cases themselves, such as case titles, case numbers, basic facts, judgment results, and typical significance
Keep all substantive content of every case complete, including the facts, judgment, significance, and all other parts
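The mechanical case rules (punctuation, half-width digits, blank lines) can be sketched as plain text transformations. Headings and content-scope trimming still need the model's judgment. This is an illustrative sketch, not the skill's implementation, which relies on the LLM prompt. The context checks are assumptions meant to leave URLs, decimals such as 1.5, and embedded English untouched. Only the basic CJK ideograph range is treated as Chinese text.

```python
import re

CJK = "\u4e00-\u9fff"  # basic CJK Unified Ideographs range (assumed sufficient)
PUNCT = {",": "，", ";": "；", ":": "：", "?": "？", "!": "！", ".": "。"}
HALF_DIGITS = str.maketrans({chr(0xFF10 + i): str(i) for i in range(10)})  # ０-９ -> 0-9


def _to_full_parens(m: re.Match) -> str:
    s, start, end = m.string, m.start(), m.end()
    nearby = m.group(1) + s[start - 1:start] + s[end:end + 1]
    # Convert only parentheses inside or next to Chinese text, e.g. (2023)京01民初123号
    if re.search(f"[{CJK}]", nearby):
        return f"（{m.group(1)}）"
    return m.group(0)


def normalize_case_text(text: str) -> str:
    text = text.translate(HALF_DIGITS)                       # full-width digits -> half-width
    text = re.sub(r"\(([^()\n]*)\)", _to_full_parens, text)  # ( ) -> （ ） near Chinese text
    # , ; : ? ! . become Chinese marks only after a Chinese character or "）",
    # so URLs, decimals such as 1.5, and embedded English are left untouched
    text = re.sub(rf"(?<=[{CJK}）])[,;:?!.]", lambda m: PUNCT[m.group(0)], text)
    text = re.sub(r"[ \t\u3000]+$", "", text, flags=re.M)   # trailing spaces
    text = re.sub(r"\n{3,}", "\n\n", text)                   # at most one blank line
    return text.strip()
```

For example, `normalize_case_text("(2023)京01民初１２３号,被告不服.")` yields `（2023）京01民初123号，被告不服。`.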

Step 4: Save and Verify

Save the formatted document:

File location: `archive/{YYYYMMDD_HHMMSS}_{主题}/`
File naming: `{YYYYMMDD}_{主题}_formatted.md`
Save it in the same archive directory as the `_raw.md` file

Content integrity verification:

Compare the character counts (字数) of the original and formatted documents; a difference of up to ±10% is allowed because promotional content is removed
Confirm that every legal provision / case title is present
Confirm that key content (case numbers, courts, key holdings, etc.) is fully retained
Record the verification result in the metadata section of the output document
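The verification checks could be sketched as follows, under three assumptions:

字数 is counted as non-whitespace characters, ignoring the `#` and `*` markers that formatting adds.
The titles and key items to check (case titles, case numbers, court names) are supplied as a list, for example from the Step 1 analysis.
Comparison applies Unicode NFKC normalization and ignores punctuation, so that punctuation and digit-width conversions do not register as missing content.

The names `verify` and `Verification` are hypothetical.

```python
import re
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Verification:
    raw_chars: int
    formatted_chars: int
    diff_percent: float
    missing: list[str] = field(default_factory=list)
    passed: bool = False


def char_count(text: str) -> int:
    # Assumed definition of 字数: characters excluding whitespace and the
    # Markdown markers (# and *) that formatting adds
    return len(re.sub(r"[\s#*]", "", text))


def _key(text: str) -> str:
    # NFKC folds full-width digits/brackets to ASCII; then drop punctuation and spaces
    return re.sub(r"[\W_]", "", unicodedata.normalize("NFKC", text))


def verify(raw: str, formatted: str, required: list[str], tolerance: float = 10.0) -> Verification:
    """Check the ±tolerance % length rule and that each required item survived."""
    raw_n, fmt_n = char_count(raw), char_count(formatted)
    diff = (fmt_n - raw_n) / raw_n * 100 if raw_n else 0.0
    body = _key(formatted)
    missing = [item for item in required if _key(item) not in body]
    passed = abs(diff) <= tolerance and not missing
    return Verification(raw_n, fmt_n, round(diff, 1), missing, passed)
```

The returned fields map directly onto the 内容验证 lines of the output template.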

Reference Documents

Formatting Examples

For detailed formatting examples and comparisons, see examples.md, which contains:

4 complete legal case formatting examples
A comparison of the original and formatted text for each example
A summary of the key formatting points (punctuation handling, heading hierarchy, blank-line handling, content scope, etc.)

Accuracy Requirements

Do not change the original meaning: formatting adjustments must not alter the meaning of the legal text
Retain key information: case numbers, court names, parties, and other key details must be kept
Preserve numbering: case sequence numbers and article numbers must not be changed

Output Document Structure

Archive Directory Organization

All formatting results are archived by timestamp:

```text
archive/
├── {YYYYMMDD_HHMMSS}_{文档主题}/
│   ├── {YYYYMMDD}_{主题}_raw.md        # original fetched content
│   ├── {YYYYMMDD}_{主题}_formatted.md  # formatted content
│   └── meta.json                       # metadata (optional)
```

Naming rules:

Directory name: `{YYYYMMDD_HHMMSS}_{主题}` (example: `20250122_153400_个人信息保护典型案例`)
File names:
- Original file: `{YYYYMMDD}_{主题}_raw.md` (example: `20250122_个人信息保护典型案例_raw.md`)
- Formatted file: `{YYYYMMDD}_{主题}_formatted.md` (example: `20250122_个人信息保护典型案例_formatted.md`)
Theme limit: the core theme (`{主题}`) extracted from the text, at most 30 characters
Date format: `YYYYMMDD` (example: 20250122)

formatted.md Content Structure

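The template keeps its Chinese labels because they appear verbatim in the output. It has four parts:

The document title
A metadata section (元信息) with the source (original URL, or "用户粘贴" for pasted text), processing time, text type (legal provisions or legal case), and original file name
A content verification section (内容验证) with the original and formatted character counts, their percentage difference, the number of cases or provisions identified, and the integrity check result (✅ passed / ⚠️ needs manual review)
The formatted body

```markdown
{文档标题}

元信息

来源：{原网页URL或"用户粘贴"}
处理时间：{时间戳}
文本类型：{法律条文/法律案例}
原始文件：{YYYYMMDD}_{主题}_raw.md

内容验证

原始字数：{原始文档字数}
格式化后字数：{格式化文档字数}
字数差异：{差异百分比}%
案例/条文数量：{识别到的案例或条文数量}
完整性检查：✅ 通过 / ⚠️ 需人工复核

{格式化后的正文内容}
```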
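Filling the template from the verification results might look like the sketch below. The field labels come from the template. The `Meta` fields and `render_formatted_md` are hypothetical names. The template's exact Markdown markup is not recoverable here, so the sketch emits plain lines. In rendered Markdown, consecutive lines merge into one paragraph, so prefixing each field with `- ` may be preferable (also an assumption).

```python
from dataclasses import dataclass


@dataclass
class Meta:
    title: str
    source: str          # original URL, or "用户粘贴" for pasted text
    processed_at: str    # processing timestamp
    text_type: str       # "法律条文" or "法律案例"
    raw_file: str        # e.g. 20250122_个人信息保护典型案例_raw.md
    raw_chars: int
    formatted_chars: int
    diff_percent: float
    unit_count: int      # number of cases or provisions identified
    passed: bool


def render_formatted_md(meta: Meta, body: str) -> str:
    """Fill the formatted.md template as plain lines (markup is an assumption)."""
    check = "✅ 通过" if meta.passed else "⚠️ 需人工复核"
    lines = [
        meta.title, "",
        "元信息", "",
        f"来源：{meta.source}",
        f"处理时间：{meta.processed_at}",
        f"文本类型：{meta.text_type}",
        f"原始文件：{meta.raw_file}", "",
        "内容验证", "",
        f"原始字数：{meta.raw_chars}",
        f"格式化后字数：{meta.formatted_chars}",
        f"字数差异：{meta.diff_percent}%",
        f"案例/条文数量：{meta.unit_count}",
        f"完整性检查：{check}", "",
        body.strip(),
    ]
    return "\n".join(lines) + "\n"
```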

Quality Standards

Unified punctuation: use Chinese punctuation throughout
Unified number format: use half-width digits
Clear hierarchy: use level-2 and level-3 headings correctly
Blank-line discipline: keep appropriate blank lines between paragraphs, neither too many nor missing
Content integrity: retain all legally relevant content and remove unrelated promotional material
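A post-formatting lint for these standards could flag problems for manual review without changing the text. This heuristic is an assumption, not part of the skill. It checks only what regular expressions can see:

ASCII punctuation touching Chinese characters
Full-width digits
Runs of blank lines
Level-3 headings with no level-2 heading

```python
import re

CJK = "\u4e00-\u9fff"  # basic CJK ideograph range (assumed sufficient)


def quality_issues(md: str) -> list[str]:
    """Flag (but do not fix) likely violations of the quality standards."""
    issues = []
    # ASCII punctuation touching Chinese text, e.g. "不服," or "(2023)京"
    if re.search(rf"[{CJK}][,;:?!.()]|[()][{CJK}]", md):
        issues.append("ASCII punctuation next to Chinese text")
    if re.search(r"[\uff10-\uff19]", md):  # full-width digits ０-９
        issues.append("full-width digits")
    if re.search(r"\n[ \t]*\n[ \t]*\n", md):  # two or more blank lines in a row
        issues.append("more than one consecutive blank line")
    if re.search(r"^###\s", md, re.M) and not re.search(r"^##\s", md, re.M):
        issues.append("level-3 headings without any level-2 heading")
    return issues
```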

Applicable Scenarios

Organizing legal provision compilations
Standardizing legal case collections
Preparing legal learning materials
Building a legal text knowledge base
Cleaning legal texts crawled from web pages

Input Requirements

This skill accepts the following types of input:

Fetched text: text obtained by other skills (such as `wechat-article-fetch`)
Pasted text: text supplied directly by the user
Local files: saved Markdown or plain-text files

Not accepted: web page links (links must first be handled by a dedicated fetching skill)
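Input handling could be sketched as below:

Links are rejected.
A short single-line argument naming an existing `.md` or `.txt` file is read as a local file.
Everything else is treated as pasted text.

The function name and the file-detection rule are assumptions.

```python
from pathlib import Path


def load_input(source: str) -> tuple[str, str]:
    """Return (text, origin). Web links are rejected; fetch them with another skill first."""
    stripped = source.strip()
    if stripped.lower().startswith(("http://", "https://")):
        raise ValueError("web links are not accepted; use a fetching skill such as wechat-article-fetch")
    # Assumption: a short single-line input naming an existing .md/.txt file is a local file
    if "\n" not in stripped and len(stripped) < 256 and Path(stripped).suffix.lower() in {".md", ".txt"}:
        path = Path(stripped)
        if path.is_file():
            return path.read_text(encoding="utf-8"), str(path)
    # Anything else is pasted text; "用户粘贴" matches the template's source field
    return source, "用户粘贴"
```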