document-extraction
# Document Extraction Skill
Extract requirements from existing documentation sources for systematic requirement mining.
## When to Use This Skill
Keywords: extract requirements, document mining, PDF requirements, transcript analysis, parse document, existing documentation, legacy requirements, competitive analysis
Invoke this skill when:
- Mining requirements from existing documents
- Processing meeting transcripts for requirements
- Extracting requirements from competitor products
- Analyzing regulatory documents for compliance requirements
- Converting legacy documentation to structured requirements
## Supported Document Types
| Type | Extension | Extraction Method |
|---|---|---|
| Markdown | .md | Direct Read |
| Text | .txt | Direct Read |
| PDF | .pdf | Read tool (PDF support) |
| Word | .docx | Read tool |
| Web Page | URL | WebFetch tool |
| Meeting Notes | .md, .txt | Transcript patterns |
| Specification | .md, .docx | Requirement patterns |
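Here is a minimal Python sketch of the reader choice in the table above. `choose_extraction_method` and `METHOD_BY_EXTENSION` are hypothetical names. The returned strings are labels, not calls into the Read or WebFetch tools. The sketch picks only the reader. Whether a `.md`, `.txt`, or `.docx` file gets transcript patterns or requirement patterns depends on its content, not its extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Reader per file extension, mirroring the table above. The values are
# illustrative labels, not calls into the agent's tools.
METHOD_BY_EXTENSION = {
    ".md": "Direct Read",
    ".txt": "Direct Read",
    ".pdf": "Read tool",
    ".docx": "Read tool",
}


def choose_extraction_method(source: str) -> str:
    """Return the extraction method label for a local path or an http(s) URL."""
    if urlparse(source).scheme in ("http", "https"):
        return "WebFetch tool"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix not in METHOD_BY_EXTENSION:
        raise ValueError(f"unsupported document type: {source!r}")
    return METHOD_BY_EXTENSION[suffix]
```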
## Extraction Workflow
### Step 1: Document Assessment
Analyze the document to determine extraction strategy:
```yaml
document_assessment:
  path: "{file path or URL}"
  type: "{detected document type}"
  size: "{approximate size}"
  structure:
    has_sections: true|false
    has_lists: true|false
    has_tables: true|false
  quality:
    formal_language: true|false
    clear_requirements: true|false
    needs_interpretation: true|false
```
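Simple line heuristics can approximate the assessment. This sketch assumes Markdown or plain-text input that has already been read into a string. `assess_document` and its heuristics are illustrative assumptions, not part of the skill. One example is treating first- or second-person wording as informal.

```python
import re
from typing import Any, Dict

# Heuristics below are assumptions for illustration, not part of the skill spec.
HEADING = re.compile(r"#{1,6}\s")
LIST_ITEM = re.compile(r"(?:[-*+]|\d+[.)])\s")
MODAL = re.compile(r"\b(?:shall|must)\b", re.IGNORECASE)
FIRST_PERSON = re.compile(r"\b(?:I|we|you)\b", re.IGNORECASE)


def assess_document(path: str, doc_type: str, text: str) -> Dict[str, Any]:
    """Fill the document_assessment template from Markdown/plain-text content."""
    lines = [ln.strip() for ln in text.splitlines()]
    clear = bool(MODAL.search(text))
    return {
        "document_assessment": {
            "path": path,
            "type": doc_type,
            "size": f"~{len(text.split())} words",
            "structure": {
                "has_sections": any(HEADING.match(ln) for ln in lines),
                "has_lists": any(LIST_ITEM.match(ln) for ln in lines),
                "has_tables": any(
                    len(ln) > 1 and ln.startswith("|") and ln.endswith("|") for ln in lines
                ),
            },
            "quality": {
                # Formal specifications rarely use first- or second-person wording.
                "formal_language": not FIRST_PERSON.search(text),
                "clear_requirements": clear,
                "needs_interpretation": not clear,
            },
        }
    }
```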
### Step 2: Pattern Matching
Apply requirement detection patterns:
Explicit Requirement Markers:
```text
- "The system shall..."
- "The system must..."
- "Users should be able to..."
- "REQ-XXX:"
- Numbered requirements (1.1, 1.2, etc.)
```

EARS Patterns:
```text
- "When [trigger], the [system] shall [response]"
- "While [state], the [system] shall [behavior]"
- "Where [feature], the [system] shall [behavior]"
- "If [condition], then the [system] shall [response]"
```

Implicit Requirement Indicators:
```text
- "It is important that..."
- "We need..."
- "The goal is to..."
- "Users expect..."
- "Performance should..."
```
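One way to apply these lists is a small regex classifier that runs on one sentence at a time. The regular expressions are an assumed rendering of the markers above. `classify_sentence` is a hypothetical helper, and sentence splitting is out of scope. EARS patterns are checked before the explicit markers because an EARS sentence usually also contains "shall". The confidence values follow the Quality Indicators section of this skill.

```python
import re
from typing import Optional, Tuple

# Regex renderings of the marker lists above (an assumption; tune per corpus).
EARS = [
    re.compile(r"^(?:when|while|where)\b.*,\s*the\b.*\bshall\b", re.IGNORECASE),
    re.compile(r"^if\b.*,\s*then\s+the\b.*\bshall\b", re.IGNORECASE),
]
EXPLICIT = [
    re.compile(r"\bthe system (?:shall|must)\b", re.IGNORECASE),
    re.compile(r"\busers should be able to\b", re.IGNORECASE),
    re.compile(r"^REQ-\w+:"),
    re.compile(r"^\d+(?:\.\d+)+\s"),  # numbered requirement such as "1.2 ..."
]
IMPLICIT = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bit is important that\b", r"\bwe need\b", r"\bthe goal is to\b",
              r"\busers expect\b", r"\bperformance should\b")
]


def classify_sentence(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (pattern_family, confidence), or None if no pattern matches."""
    s = sentence.strip()
    # EARS first: EARS sentences usually also match the explicit "shall" markers.
    if any(p.search(s) for p in EARS):
        return ("ears", "high")
    if any(p.search(s) for p in EXPLICIT):
        return ("explicit", "high")
    if any(p.search(s) for p in IMPLICIT):
        return ("implicit", "medium")
    return None
```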
### Step 3: Requirement Extraction
For each identified requirement:
```yaml
extracted_requirement:
  id: REQ-{sequence}
  text: "{cleaned requirement statement}"
  source: document
  source_file: "{file path}"
  source_location: "{section/page/line}"
  original_text: "{exact text from document}"
  type: functional|non-functional|constraint|assumption
  confidence: high|medium|low
  extraction_method: explicit|pattern|inferred
  needs_review: true|false
  review_notes: "{why review needed}"
```
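This sketch models the record as a Python dataclass, with a helper that derives `text` from `original_text`. Several parts are assumptions:

- the cleaning rules, which collapse whitespace and strip bullets, `REQ-` prefixes, and section numbers;
- the rule that low-confidence or inferred items need review;
- the zero-padded ID format.

```python
import re
from dataclasses import dataclass, asdict


@dataclass
class ExtractedRequirement:
    """Mirror of the extracted_requirement template above."""
    id: str
    text: str
    source_file: str
    source_location: str
    original_text: str
    type: str = "functional"            # functional|non-functional|constraint|assumption
    confidence: str = "medium"          # high|medium|low
    extraction_method: str = "pattern"  # explicit|pattern|inferred
    source: str = "document"
    needs_review: bool = False
    review_notes: str = ""


def clean_statement(raw: str) -> str:
    """Collapse whitespace, drop list/ID/number prefixes, end with a period."""
    text = re.sub(r"\s+", " ", raw).strip()
    text = re.sub(r"^(?:[-*]\s+|REQ-\w+:\s*|\d+(?:\.\d+)+\s+)", "", text)
    if text and text[-1] not in ".!?":
        text += "."
    return text


def make_requirement(seq: int, raw: str, source_file: str, location: str,
                     confidence: str, method: str,
                     req_type: str = "functional") -> ExtractedRequirement:
    # Hypothetical review rule: anything low-confidence or inferred goes to a human.
    needs_review = confidence == "low" or method == "inferred"
    return ExtractedRequirement(
        id=f"REQ-{seq:03d}",
        text=clean_statement(raw),
        source_file=source_file,
        source_location=location,
        original_text=raw,
        type=req_type,
        confidence=confidence,
        extraction_method=method,
        needs_review=needs_review,
        review_notes="Low confidence or inferred; confirm with the document owner."
        if needs_review else "",
    )
```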
### Step 4: Categorization
Categorize extracted requirements:
```yaml
categories:
  functional:
    - features
    - behaviors
    - interactions
  non_functional:
    - performance
    - security
    - usability
    - reliability
    - scalability
  constraints:
    - technical
    - business
    - regulatory
  assumptions:
    - environmental
    - user_behavior
    - dependencies
```
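Ordered keyword rules that map onto the categories above can serve as a first-pass categorizer. The keyword lists and the `functional`/`features` default are hypothetical. Their output only seeds a human review, because real categorization depends on context.

```python
from typing import Tuple

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("constraints", "regulatory", ("gdpr", "hipaa", "regulation", "compliance")),
    ("constraints", "technical", ("must use", "must run on", "compatible with")),
    ("constraints", "business", ("budget", "deadline", "license")),
    ("non_functional", "performance", ("latency", "response time", "throughput", "seconds")),
    ("non_functional", "security", ("encrypt", "authenticat", "authoriz", "password")),
    ("non_functional", "usability", ("accessib", "intuitive", "wcag")),
    ("non_functional", "reliability", ("uptime", "availability", "failover", "backup")),
    ("non_functional", "scalability", ("concurrent", "scalab", "scale to")),
    ("assumptions", "dependencies", ("depends on", "relies on")),
    ("assumptions", "user_behavior", ("users will", "users typically")),
    ("assumptions", "environmental", ("assume", "assuming")),
]


def categorize(text: str) -> Tuple[str, str]:
    """Return (category, subcategory) using substring keyword matches."""
    lowered = text.lower()
    for category, subcategory, keywords in RULES:
        if any(k in lowered for k in keywords):
            return category, subcategory
    # Default: treat as a functional feature until a reviewer says otherwise.
    return "functional", "features"
```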
### Step 5: Deduplication
Identify and merge duplicate requirements:
```yaml
deduplication:
  strategy: semantic_similarity
  threshold: 0.8
  action: merge|flag_for_review
  merged_requirements:
    - id: REQ-merged-001
      sources: [REQ-001, REQ-015]
      text: "{consolidated requirement}"
```
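The grouping step might look like the sketch below, with one substitution. `difflib.SequenceMatcher` measures lexical similarity, not semantic similarity. Here it stands in for the `semantic_similarity` strategy; an embedding model would be the closer match. Each candidate is compared with the first member of a group. The longest wording is kept as a placeholder for the consolidated text, and a reviewer should rewrite it.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; a stand-in for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(reqs: Dict[str, str], threshold: float = 0.8) -> List[dict]:
    """Group requirements at least `threshold` similar to a group's first member."""
    ids = list(reqs)
    used = set()
    merged = []
    for i, anchor in enumerate(ids):
        if anchor in used:
            continue
        group = [anchor] + [
            other for other in ids[i + 1:]
            if other not in used and similarity(reqs[anchor], reqs[other]) >= threshold
        ]
        if len(group) > 1:
            used.update(group)
            merged.append({
                "id": f"REQ-merged-{len(merged) + 1:03d}",
                "sources": group,
                # Placeholder consolidation: keep the longest wording for review.
                "text": max((reqs[r] for r in group), key=len),
            })
    return merged
```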
## Document-Specific Strategies
### Meeting Transcripts
```yaml
transcript_extraction:
  focus_on:
    - Action items
    - Decisions made
    - Requirements discussed
    - Concerns raised
  patterns:
    - "We decided that..."
    - "The requirement is..."
    - "Action item:"
    - "TODO:"
    - "Need to..."
  speaker_context:
    - Note who said what
    - Weight by speaker role
```
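This sketch mines a transcript, assuming each utterance sits on its own line as `Name (Role): text` or `Name: text`. The cue regexes come from the `patterns` list above. The role weights in `ROLE_WEIGHTS` and the 0.5 fallback for unknown roles are hypothetical values.

```python
import re
from typing import List

# Transcript cues from the patterns list above.
CUES = {
    "decision": re.compile(r"\bwe decided that\b", re.IGNORECASE),
    "requirement": re.compile(r"\bthe requirement is\b", re.IGNORECASE),
    "action_item": re.compile(r"\baction item:", re.IGNORECASE),
    "todo": re.compile(r"\bTODO:"),
    "need": re.compile(r"\bneed to\b", re.IGNORECASE),
}
# Hypothetical role weights; unknown roles fall back to 0.5.
ROLE_WEIGHTS = {"product owner": 1.0, "architect": 0.9, "developer": 0.7}
# Assumed line format: "Name (Role): utterance" or "Name: utterance".
LINE = re.compile(r"^(?P<speaker>[^:()]+?)\s*(?:\((?P<role>[^)]*)\))?:\s*(?P<text>.+)$")


def extract_from_transcript(transcript: str) -> List[dict]:
    """Return cue-bearing utterances with speaker, role weight, and line number."""
    items = []
    for lineno, raw in enumerate(transcript.splitlines(), start=1):
        match = LINE.match(raw.strip())
        if not match:
            continue
        text = match.group("text")
        kinds = [name for name, cue in CUES.items() if cue.search(text)]
        if not kinds:
            continue
        role = (match.group("role") or "").strip().lower()
        items.append({
            "speaker": match.group("speaker").strip(),
            "role": role or None,
            "weight": ROLE_WEIGHTS.get(role, 0.5),
            "kinds": kinds,
            "text": text,
            "source_location": f"line {lineno}",
        })
    return items
```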
transcript_extraction:
focus_on:
- Action items
- Decisions made
- Requirements discussed
- Concerns raised
patterns:
- "We decided that..."
- "The requirement is..."
- "Action item:"
- "TODO:"
- "Need to..."
speaker_context:
- Note who said what
### Regulatory Documents
```yaml
regulatory_extraction:
  focus_on:
    - Mandatory requirements ("shall", "must")
    - Prohibited actions ("shall not", "must not")
    - Conditional requirements ("if...then")
  compliance_mapping:
    - Reference section numbers
    - Note effective dates
    - Track version/revision
```
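This sketch applies the mandatory, prohibited, and conditional split to one numbered clause at a time. It assumes each clause starts with its section number, optionally prefixed by §. The caller passes in the effective date and revision, because these usually come from the document header. `classify_clause` is a hypothetical helper. It tests prohibitions first, since "shall not" also matches the mandatory pattern.

```python
import re
from typing import Optional

SECTION = re.compile(r"^(?P<section>(?:§\s*)?\d+(?:\.\d+)*)\s+(?P<body>.+)$")
PROHIBITED = re.compile(r"\b(?:shall|must)\s+not\b", re.IGNORECASE)
MANDATORY = re.compile(r"\b(?:shall|must)\b", re.IGNORECASE)
CONDITIONAL = re.compile(r"\bif\b.+\bthen\b", re.IGNORECASE)


def classify_clause(line: str, effective_date: Optional[str] = None,
                    revision: Optional[str] = None) -> Optional[dict]:
    """Classify one numbered clause; return None for non-normative text."""
    line = line.strip()
    m = SECTION.match(line)
    section, body = (m.group("section"), m.group("body")) if m else (None, line)
    # Check prohibitions first: "shall not" would also match the mandatory pattern.
    if PROHIBITED.search(body):
        kind = "prohibited"
    elif MANDATORY.search(body):
        kind = "mandatory"
    else:
        return None
    return {
        "section": section,  # compliance mapping: reference section number
        "kind": kind,
        "conditional": bool(CONDITIONAL.search(body)),
        "text": body,
        "effective_date": effective_date,
        "revision": revision,
    }
```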
regulatory_extraction:
focus_on:
- Mandatory requirements ("shall", "must")
- Prohibited actions ("shall not", "must not")
- Conditional requirements ("if...then")
compliance_mapping:
- Reference section numbers
- Note effective dates
### Competitor Analysis
```yaml
competitor_extraction:
  focus_on:
    - Feature descriptions
    - User capabilities
    - Unique selling points
  output:
    - Feature requirements
    - Differentiation opportunities
    - Gap identification
  confidence: low  # Based on external observation
```
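Gap identification reduces to a set difference between observed competitor features and the current feature list. This sketch matches feature names exactly, ignoring case, so synonyms still need a human. The `REQ-COMP-` ID scheme and the `competitor_gaps` name are hypothetical. Every gap is emitted with `confidence: low`, matching the configuration above.

```python
from typing import Iterable, List


def competitor_gaps(competitor_features: Iterable[str], our_features: Iterable[str],
                    competitor: str) -> List[dict]:
    """List competitor features we lack, as low-confidence candidate requirements."""
    ours = {f.strip().lower() for f in our_features}
    gaps = []
    for feature in competitor_features:
        name = feature.strip()
        if name.lower() in ours:  # exact match only; synonyms need human judgment
            continue
        gaps.append({
            "id": f"REQ-COMP-{len(gaps) + 1:03d}",  # hypothetical ID scheme
            "text": name,
            "observed_in": competitor,
            "confidence": "low",  # based on external observation
            "needs_review": True,
        })
    return gaps
```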
competitor_extraction:
focus_on:
- Feature descriptions
- User capabilities
- Unique selling points
output:
- Feature requirements
- Differentiation opportunities
- Gap identification
### Legacy Specifications
```yaml
legacy_extraction:
  focus_on:
    - Existing requirements
    - System behaviors
    - Integration points
  modernization:
    - Update terminology
    - Convert to EARS format
    - Flag deprecated requirements
```
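Here is a minimal sketch of the modernization pass, assuming a project glossary exists. `TERMINOLOGY` here is a hypothetical stand-in for it. The pass does three things:

- it applies glossary substitutions;
- it rewrites "the system will/must" to "the system shall", one small step toward EARS wording;
- it flags statements that mention deprecation.

Full EARS conversion still needs a human to identify the trigger or state.

```python
import re

# Hypothetical glossary mapping legacy terms to current ones.
TERMINOLOGY = {"operator": "administrator", "client app": "mobile app"}
DEPRECATED = re.compile(
    r"\b(?:deprecated|obsolete|superseded|no longer supported)\b", re.IGNORECASE
)


def modernize(statement: str) -> dict:
    """Apply glossary updates, normalize the modal to 'shall', flag deprecated text."""
    updated = statement
    for old, new in TERMINOLOGY.items():
        updated = re.sub(rf"\b{re.escape(old)}\b", new, updated, flags=re.IGNORECASE)
    # One small step toward EARS wording: "the system will/must" -> "the system shall".
    updated = re.sub(r"\b(the system)\s+(?:will|must)\b", r"\1 shall", updated,
                     flags=re.IGNORECASE)
    return {
        "text": updated,
        "original_text": statement,
        "deprecated": bool(DEPRECATED.search(statement)),
    }
```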
legacy_extraction:
focus_on:
- Existing requirements
- System behaviors
- Integration points
modernization:
- Update terminology
- Convert to EARS format
## Output Format
### Per-Document Output
```yaml
extraction_result:
  source:
    file: "{path or URL}"
    type: "{document type}"
  extraction_date: "{ISO-8601}"
  confidence: high|medium|low
  statistics:
    total_candidates: {number}
    extracted: {number}
    filtered: {number}
    needs_review: {number}
  requirements:
    - id: REQ-{number}
      text: "{requirement}"
      type: functional|non-functional|constraint
      source_location: "{section/page}"
      confidence: high|medium|low
      original_text: "{exact source text}"
  review_items:
    - requirement_id: REQ-{number}
      reason: "{why review needed}"
      suggestion: "{proposed action}"
  metadata:
    sections_processed: {number}
    extraction_patterns_used: ["{pattern names}"]
```
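Assembling this structure is mostly bookkeeping, and the sketch below rests on three assumptions:

- `candidates` is a list of dicts in which `kept` marks the candidates that survived filtering;
- the overall confidence is the weakest confidence among the kept items;
- the review suggestion is placeholder text.

The resulting dict can then be written out as YAML, for example with PyYAML's `yaml.safe_dump(result, sort_keys=False)`.

```python
from datetime import datetime, timezone
from typing import List, Optional

RANK = {"low": 0, "medium": 1, "high": 2}
FIELDS = ("id", "text", "type", "source_location", "confidence", "original_text")


def build_extraction_result(file: str, doc_type: str, candidates: List[dict],
                            patterns_used: List[str], sections_processed: int,
                            extracted_at: Optional[datetime] = None) -> dict:
    """Assemble the per-document extraction_result; 'kept' marks surviving candidates."""
    kept = [c for c in candidates if c["kept"]]
    flagged = [c for c in kept if c.get("needs_review")]
    when = extracted_at or datetime.now(timezone.utc)
    # Assumption: overall confidence is the weakest confidence among kept items.
    overall = min((c["confidence"] for c in kept), key=RANK.__getitem__, default="low")
    return {
        "extraction_result": {
            "source": {"file": file, "type": doc_type},
            "extraction_date": when.isoformat(timespec="seconds"),
            "confidence": overall,
            "statistics": {
                "total_candidates": len(candidates),
                "extracted": len(kept),
                "filtered": len(candidates) - len(kept),
                "needs_review": len(flagged),
            },
            "requirements": [{k: c[k] for k in FIELDS} for c in kept],
            "review_items": [
                {"requirement_id": c["id"],
                 "reason": c.get("review_notes", ""),
                 "suggestion": "Confirm wording with the document owner."}  # placeholder
                for c in flagged
            ],
            "metadata": {
                "sections_processed": sections_processed,
                "extraction_patterns_used": sorted(set(patterns_used)),
            },
        }
    }
```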
extraction_result:
source:
file: "{path or URL}"
type: "{document type}"
extraction_date: "{ISO-8601}"
confidence: high|medium|low
statistics:
total_candidates: {number}
extracted: {number}
filtered: {number}
needs_review: {number}
requirements:
- id: REQ-{number}
text: "{requirement}"
type: functional|non-functional|constraint
source_location: "{section/page}"
confidence: high|medium|low
original_text: "{exact source text}"
review_items:
- requirement_id: REQ-{number}
reason: "{why review needed}"
suggestion: "{proposed action}"
metadata:
sections_processed: {number}
## Autonomy Levels
### Guided Mode
```yaml
guided_behavior:
  document_selection: Human selects
  extraction_strategy: AI suggests, human approves
  each_requirement: AI highlights, human confirms
  categorization: AI suggests, human validates
```
guided_behavior:
document_selection: Human selects
extraction_strategy: AI suggests, human approves
each_requirement: AI highlights, human confirms
### Semi-Autonomous Mode
```yaml
semi_auto_behavior:
  document_selection: AI suggests priority, human approves list
  extraction_strategy: AI chooses autonomously
  requirements: AI extracts all, human reviews in batches
  categorization: AI categorizes, human spot-checks
```
semi_auto_behavior:
document_selection: AI suggests priority, human approves list
extraction_strategy: AI chooses autonomously
requirements: AI extracts all, human reviews in batches
### Fully Autonomous Mode
```yaml
full_auto_behavior:
  document_selection: AI processes all relevant
  extraction_strategy: AI optimizes per document
  requirements: AI extracts, deduplicates, categorizes
  output: Full extraction report for final review
```
full_auto_behavior:
document_selection: AI processes all relevant
extraction_strategy: AI optimizes per document
requirements: AI extracts, deduplicates, categorizes
## Quality Indicators
### High Confidence Extraction
- Explicit requirement markers ("shall", "must")
- EARS-pattern matches
- Numbered requirement lists
- Clear imperative statements
### Medium Confidence Extraction
- Implicit indicators ("should", "needs to")
- Context-dependent interpretation
- Partial pattern matches
- Requires domain knowledge
### Low Confidence Extraction
- Inferred from descriptions
- Narrative text interpretation
- Competitive analysis
- Assumptions based on context
## Delegation
For related tasks, delegate to:
- gap-analysis: Check extracted requirements for completeness
- domain-research: Research unfamiliar terms or concepts
- elicitation-methodology: Route back for technique selection
## Output Location
Save extraction results to:
```text
.requirements/{domain}/documents/DOC-{filename}-{timestamp}.yaml
```
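This sketch builds that path. `output_path` is a hypothetical helper. The timestamp format (compact UTC, `YYYYMMDDTHHMMSSZ`) and the slugging of the file name are assumptions. `astimezone` treats a naive datetime as local time before converting it to UTC.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def output_path(domain: str, source: str, when: Optional[datetime] = None) -> PurePosixPath:
    """Build .requirements/{domain}/documents/DOC-{filename}-{timestamp}.yaml."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stem = PurePosixPath(source.rstrip("/")).stem
    # Slugify the file name so spaces and punctuation do not leak into the path.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-").lower() or "document"
    stamp = when.strftime("%Y%m%dT%H%M%SZ")  # assumed compact UTC timestamp format
    return PurePosixPath(".requirements", domain, "documents", f"DOC-{slug}-{stamp}.yaml")
```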
## Related
- elicitation-methodology: Parent hub skill
- gap-analysis: Post-extraction completeness checking
- interview-conducting: Clarify extracted requirements with stakeholders