sec-edgar-pipeline

SEC EDGAR Pipeline

Overview

This pipeline is centered on `edgar-analyzer` and the EDGAR data sources. The core loop is: configure credentials, create a project with examples, analyze patterns, generate code, run extraction, and export reports.

Setup (Keys + User Agent)

Use the setup wizard to configure required keys:

```bash
python -m edgar_analyzer setup
```

or

```bash
edgar-analyzer setup
```

Required entries:

- `OPENROUTER_API_KEY`
- (Optional) `JINA_API_KEY`
- `EDGAR` user agent string ("Name email@example.com")
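
The user agent entry follows the "Name email@example.com" shape. As a minimal sketch, a script could sanity-check that string before sending any request. The helper name `build_sec_headers` and the regex are illustrative assumptions, not part of `edgar-analyzer`:

```python
import re

# Loose pattern for "Name email@example.com": some name text, whitespace,
# then an email-like token. Illustrative only; this is not the validation
# that edgar-analyzer itself performs.
_USER_AGENT_RE = re.compile(r"^\S.*\s\S+@\S+\.\S+$")


def build_sec_headers(user_agent: str) -> dict:
    """Return HTTP headers carrying the user agent, or raise if it looks malformed."""
    user_agent = user_agent.strip()
    if not _USER_AGENT_RE.match(user_agent):
        raise ValueError('user agent should look like "Name email@example.com"')
    return {"User-Agent": user_agent}
```

Failing early here is cheaper than debugging rejected requests later.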

End-to-End CLI Workflow

```bash
# 1. Create project
edgar-analyzer project create my_project --template minimal

# 2. Add examples + project.yaml
# projects/my_project/examples/*.json

# 3. Analyze examples
edgar-analyzer analyze-project projects/my_project

# 4. Generate extraction code
edgar-analyzer generate-code projects/my_project

# 5. Run extraction
edgar-analyzer run-extraction projects/my_project --output-format csv
```

Outputs land in `projects/<name>/output/`.

EDGAR-Specific Conventions

- CIK values are 10-digit, zero-padded (e.g., `0000320193`).
- Rate limit: SEC API allows 10 requests/sec. Scripts use ~0.11s delays.
- User agent is mandatory; include name + email.
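
The CIK padding and request spacing conventions can be sketched in a few lines of Python. The names `normalize_cik` and `Throttle` are hypothetical helpers for illustration, not identifiers from the pipeline's scripts:

```python
import time


def normalize_cik(cik) -> str:
    """Zero-pad a CIK (int or string) to the 10-digit form used by EDGAR."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


class Throttle:
    """Enforce a minimum gap between calls (0.11 s stays under 10 requests/sec)."""

    def __init__(self, min_interval: float = 0.11):
        self.min_interval = min_interval
        self._last = None  # monotonic time of the previous call, if any

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `throttle.wait()` before each request keeps a loop over many companies under the limit without sleeping longer than needed.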

Scripted Example (Apple DEF 14A)

`edgar/scripts/fetch_apple_def14a.py` shows the direct flow:

1. Fetch latest DEF 14A metadata
2. Download HTML
3. Parse Summary Compensation Table (SCT)
4. Save raw HTML + extracted JSON + ground truth
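
As a rough illustration of the first two steps (not the actual contents of `fetch_apple_def14a.py`), the sketch below picks the newest DEF 14A out of a parsed response from SEC's submissions endpoint (`https://data.sec.gov/submissions/CIK<10-digit CIK>.json`) and builds the download URL for its primary document. It assumes the response lists recent filings as parallel arrays under `filings.recent`; the function names are hypothetical:

```python
def latest_filing(submissions: dict, form_type: str = "DEF 14A"):
    """Return metadata for the most recent filing of `form_type`, or None.

    `submissions` is assumed to be the parsed JSON from SEC's submissions
    endpoint, where filings.recent holds parallel arrays.
    """
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["accessionNumber"],
               recent["filingDate"], recent["primaryDocument"])
    # Exact match: amendments such as "DEF 14A/A" are deliberately excluded.
    matches = [r for r in rows if r[0] == form_type]
    if not matches:
        return None
    # Filing dates are ISO strings (YYYY-MM-DD), so string order is date order.
    form, accession, filed, document = max(matches, key=lambda r: r[2])
    return {"form": form, "accession": accession, "filed": filed, "document": document}


def filing_url(cik: str, filing: dict) -> str:
    """Build the Archives URL for the filing's primary document."""
    folder = filing["accession"].replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/{filing['document']}"
```

Because the match on `form_type` is exact, a company that only filed an amended proxy (`DEF 14A/A`) yields `None`, which is worth checking before concluding that no filing exists.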

Recipe-Driven Extraction

`edgar/recipes/sct_extraction/config.yaml` defines a multi-step pipeline:

1. Fetch DEF 14A filings by company list
2. Extract SCT tables with `SCTAdapter`
3. Validate with `sct_validator`
4. Write results to `output/sct`
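
To give a feel for what a validation step might check, here is a minimal sketch that tests whether the component columns of a Summary Compensation Table row add up to its reported total. It is not the logic of `sct_validator`; the function name, the row keys, and the tolerance are all assumptions made for illustration:

```python
# Component columns of a Summary Compensation Table row.
# The key names are illustrative, not the pipeline's actual schema.
COMPONENTS = ("salary", "bonus", "stock_awards", "option_awards",
              "non_equity_incentive", "pension_change", "other_compensation")


def validate_sct_row(row: dict, tolerance: float = 1.0) -> list:
    """Return a list of problems found in one SCT row (empty means it passed)."""
    problems = []
    if not row.get("name"):
        problems.append("missing executive name")
    # Treat absent or None components as zero.
    values = [row.get(key, 0) or 0 for key in COMPONENTS]
    total = row.get("total")
    if total is None:
        problems.append("missing total")
    elif abs(sum(values) - total) > tolerance:
        problems.append(f"components sum to {sum(values)}, total says {total}")
    return problems
```

A small tolerance absorbs rounding in the filing; a larger mismatch usually means a column was shifted or dropped during table parsing.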

Report Generation

`edgar/scripts/create_csv_reports.py` converts JSON results into:

- `executive_compensation_<timestamp>.csv`
- `top_25_executives_<timestamp>.csv`
- `company_summary_<timestamp>.csv`
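
The JSON-to-CSV step can be approximated with the standard library. This is a sketch under assumptions rather than the code in `create_csv_reports.py`: the helper name `write_report`, the flat-dict record shape, and the timestamp format are all hypothetical:

```python
import csv
from datetime import datetime
from pathlib import Path


def write_report(records: list, out_dir, prefix: str, timestamp: str = None) -> Path:
    """Write a list of flat dicts to <prefix>_<timestamp>.csv and return the path."""
    # The timestamp format is an assumption; the real script may use another one.
    timestamp = timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{prefix}_{timestamp}.csv"
    # Use the union of keys (in first-seen order) so sparse records still fit.
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return path
```

For example, `write_report(rows, "output", "executive_compensation")` would produce a file named in the same pattern as the first report listed above.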

Troubleshooting

- No filings found: confirm CIK formatting and filing type (DEF 14A vs DEF 14A/A).
- API errors: slow down requests and confirm user-agent is set.
- Extraction errors: regenerate code or use manual ground truth in POC scripts.
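
One way to "slow down requests" after an API error is to retry with exponentially growing pauses. The wrapper below is a generic sketch, not something the pipeline is known to ship; `with_backoff` and its parameters are hypothetical:

```python
import time


def with_backoff(call, retries: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Run `call()`; on an exception, wait base_delay * 2**attempt and try again."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the retry schedule can be tested without actually waiting.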
