2026-legal-research-agent

2026 Legal Research Agent

When to Use This Skill

Use this skill when you need to:

Find authoritative legal sources for a specific state's expungement laws
Configure Firecrawl jobs to scrape court systems, legislatures, or legal aid sites
Validate scraped data for accuracy and completeness
Research 2026 law changes including Clean Slate acts and marijuana expungement
Build URL patterns for systematic state-by-state data collection
Identify gaps in existing scraped data coverage

Do NOT use this skill for:

Interpreting what laws mean (use `national-expungement-expert`)
Building user interfaces or components
Providing legal advice to users
General web scraping unrelated to legal data

Core Instructions

1. Authoritative Source Hierarchy

When researching expungement laws, prioritize sources in this order:

Tier 1 (Primary Authority):
├── State Legislature websites (statute text)
├── State Court Administrative Office
└── State Attorney General publications

Tier 2 (Official Secondary):
├── State Bar Association guides
├── Court self-help centers
└── Public law databases (public.law, justia.com)

Tier 3 (Tertiary but Valuable):
├── Legal aid organizations (LSC grantees)
├── Law school clinics
└── Reentry organizations (CCRC, NACDL)

Tier 4 (Verification Only):
├── Commercial legal databases
├── News articles about law changes
└── Attorney blog posts

Shibboleth: A novice scrapes the first Google result. An expert knows that `courts.{state}.gov` contains the self-help forms while `legislature.{state}.gov` contains the statute text, and that both are needed.
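
The tier ordering above can be applied mechanically once each candidate source has been assigned a tier. The following is a minimal sketch under that assumption; the `CandidateSource` shape and both function names are hypothetical and are not part of the project code.

```typescript
// Tier numbers mirror the hierarchy above: 1 = primary authority, 4 = verification only.
type SourceTier = 1 | 2 | 3 | 4;

// Hypothetical shape for a source that a researcher has already classified.
interface CandidateSource {
  url: string;
  tier: SourceTier;
}

// Order candidates so that primary authority comes first.
// Sources in the same tier keep their original relative order.
function rankSources(sources: CandidateSource[]): CandidateSource[] {
  return sources
    .map((source, index) => ({ source, index }))
    .sort((a, b) => a.source.tier - b.source.tier || a.index - b.index)
    .map(({ source }) => source);
}

// Tier 4 sources are only used to cross-check facts found elsewhere.
function isVerificationOnly(source: CandidateSource): boolean {
  return source.tier === 4;
}
```

Keeping the tier as data rather than inferring it from the hostname avoids guessing: the same domain can host both official and unofficial material.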

2. URL Pattern Knowledge by State Type

States organize their legal resources differently. Know the patterns:

Unified Court Systems (courts own everything):

California: courts.ca.gov/selfhelp-expungement.htm
Oregon: courts.oregon.gov/programs/exp/Pages/default.aspx
Washington: courts.wa.gov/forms/?fa=forms.contribute&formID=101

Split Systems (legislature + court separate):

Texas: txcourts.gov (forms) + texas.public.law (statutes)
New York: nycourts.gov (forms) + nysenate.gov/legislation/laws (statutes)
Florida: flcourts.gov (forms) + leg.state.fl.us/statutes (statutes)

Public.law States (excellent statute hosting):

oregon.public.law, california.public.law, texas.public.law
michigan.public.law, washington.public.law

Shibboleth: Knowing that `apps.leg.wa.gov/RCW/` is Washington's statute database while `leg.wa.gov` is the general legislature site. The law text itself lives under the `/RCW/` path on the `apps` subdomain, not on the main site.
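
One way to encode the unified versus split distinction is a small lookup keyed by state code. This is a sketch only: the `StateSources` shape and `urlsToScrape` are hypothetical, the entries copy the hosts listed above, and prefixing `https://` is an assumption that should be checked per site.

```typescript
// Hypothetical shape: unified systems have one forms entry, split systems add a statute host.
interface StateSources {
  system: "unified" | "split";
  forms: string;
  statutes?: string;
}

// Entries copied from the patterns listed above.
const stateSources = new Map<string, StateSources>([
  ["CA", { system: "unified", forms: "courts.ca.gov/selfhelp-expungement.htm" }],
  ["OR", { system: "unified", forms: "courts.oregon.gov/programs/exp/Pages/default.aspx" }],
  ["TX", { system: "split", forms: "txcourts.gov", statutes: "texas.public.law" }],
  ["NY", { system: "split", forms: "nycourts.gov", statutes: "nysenate.gov/legislation/laws" }],
  ["FL", { system: "split", forms: "flcourts.gov", statutes: "leg.state.fl.us/statutes" }],
]);

// Return every URL that needs scraping for a state, or an empty list for unknown codes.
function urlsToScrape(code: string): string[] {
  const entry = stateSources.get(code.toUpperCase());
  if (!entry) return [];
  const targets = entry.statutes ? [entry.forms, entry.statutes] : [entry.forms];
  // Assumption: every listed host is served over https.
  return targets.map((target) => `https://${target}`);
}
```

For a split-system state the function returns two URLs, which makes it harder to forget the statute host when only the forms page was on hand.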

3. 2026 Legal Landscape Awareness

As of 2026, these major changes affect research:

Clean Slate States (automatic expungement passed):

Pennsylvania (2018), Utah (2019), New Jersey (2019), Michigan (2020)
California (2020), Connecticut (2021), Delaware (2021), Virginia (2021)
Oklahoma (2022), Colorado (2022), New York (2023), Minnesota (2023)
Maryland (2024), Illinois (2024), Oregon (2025)

Marijuana Expungement (specific statutes):

Most states now have separate marijuana expungement provisions
Search for "cannabis conviction" alongside "expungement"
Check for retroactive application dates

2025-2026 Law Changes to Verify:

Oregon HB 2316 (expanded eligibility)
California AB 1076 (automatic relief expansion)
Check CCRC's Restoration of Rights Project for current status

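Shibboleth: Knowing that "automatic expungement" doesn't mean immediate. Pennsylvania's Clean Slate law, for example, attaches multi-year waiting periods (10 years for qualifying misdemeanor convictions as originally enacted) that vary by offense and record type. Research must capture these nuances.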
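
The Clean Slate list above can be kept as a lookup so that research for those states is routed differently. This is a sketch: the years are copied from the list above and still need verification against primary sources, and the function names and track labels are hypothetical.

```typescript
// Years copied from the Clean Slate list above; verify each against primary sources.
const cleanSlateEnacted = new Map<string, number>([
  ["PA", 2018], ["UT", 2019], ["NJ", 2019], ["MI", 2020],
  ["CA", 2020], ["CT", 2021], ["DE", 2021], ["VA", 2021],
  ["OK", 2022], ["CO", 2022], ["NY", 2023], ["MN", 2023],
  ["MD", 2024], ["IL", 2024], ["OR", 2025],
]);

// Year the state is listed as passing automatic expungement, or null if it is not listed.
function cleanSlateYear(code: string): number | null {
  return cleanSlateEnacted.get(code.toUpperCase()) ?? null;
}

// Hypothetical routing label: listed states need the automatic-relief rules
// (including waiting periods) researched in addition to any petition process.
function researchTrack(code: string): "clean-slate" | "petition-only" {
  return cleanSlateYear(code) === null ? "petition-only" : "clean-slate";
}
```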

4. Firecrawl Configuration Expertise

When configuring scrape jobs:

Extraction Schema Design:

typescript

// For statute pages, extract:
const statutePageSchema = {
  statuteCitation: "string",   // e.g., "ORS 137.225"
  title: "string",             // e.g., "Setting aside conviction"
  fullText: "string",          // Complete statute text
  effectiveDate: "string",     // When current version took effect
  lastAmended: "string",       // Most recent amendment date
  subsections: "array",        // Parsed subsections
};

// For court self-help pages, extract:
const courtSelfHelpSchema = {
  stateName: "string",
  expungementPageUrl: "string",
  formsLibraryUrl: "string",
  selfHelpUrl: "string",
  contactPhone: "string",
  feeScheduleUrl: "string",
};

// For forms, extract:
const formSchema = {
  formNumber: "string",        // e.g., "MC-440"
  formTitle: "string",
  pdfUrl: "string",
  applicableTo: "array",       // ["misdemeanor", "arrest"]
  lastUpdated: "string",
};

Rate Limiting for Government Sites:

typescript

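// Illustrative wrapper name; merge these values into the job config.
const govSiteSettings = {
  rateLimit: 2,     // 2 requests/second max for .gov sites
  timeout: 90000,   // Milliseconds; government sites can be slow
  maxRetries: 3,    // Retry on timeout
  waitFor: 3000,    // Milliseconds to wait for JavaScript on modern court sites
};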
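
To make the 2 requests/second cap concrete, the sketch below turns a rate into start offsets for a batch of URLs. It is a planning helper only and does not call Firecrawl; `minIntervalMs` and `planBatch` are hypothetical names.

```typescript
// Minimum spacing between request starts for a given requests-per-second cap.
function minIntervalMs(requestsPerSecond: number): number {
  if (requestsPerSecond <= 0) {
    throw new RangeError("requestsPerSecond must be positive");
  }
  return Math.ceil(1000 / requestsPerSecond);
}

// Assign each URL a start offset (milliseconds from the start of the batch)
// so that no two requests begin closer together than the cap allows.
function planBatch(
  urls: string[],
  requestsPerSecond: number,
): Array<{ url: string; startAtMs: number }> {
  const interval = minIntervalMs(requestsPerSecond);
  return urls.map((url, index) => ({ url, startAtMs: index * interval }));
}
```

At a cap of 2 requests/second the interval is 500 ms, so a 50-URL batch takes roughly 25 seconds to dispatch before any retries.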

Shibboleth: Knowing to set `onlyMainContent: true` for statute pages (to skip navigation chrome) but `onlyMainContent: false` for forms pages (where the form links are often in sidebars).
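
The page-type rule can be captured in one helper so that the choice is not repeated per job. This is a sketch that only builds a plain options object from the values given in this section; `PageKind` and `scrapeOptionsFor` are hypothetical names, and how the object is passed to Firecrawl depends on the project's job runner.

```typescript
// Hypothetical page categories used by this sketch.
type PageKind = "statute" | "forms";

// Subset of the scrape settings discussed in this section.
interface PageScrapeOptions {
  onlyMainContent: boolean;
  timeout: number; // milliseconds
  waitFor: number; // milliseconds
}

// Statute pages drop navigation chrome; forms pages keep sidebars,
// because form links often live there.
function scrapeOptionsFor(kind: PageKind): PageScrapeOptions {
  return {
    onlyMainContent: kind === "statute",
    timeout: 90000,
    waitFor: 3000,
  };
}
```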

5. Data Validation Checklist

After scraping, validate:

□ Statute citations match official format (e.g., "ORS" not "Or. Rev. Stat.")
□ Effective dates are parseable and reasonable (not future, not too old)
□ URLs are live and return 200 status
□ PDF form links actually download PDFs (not HTML error pages)
□ Phone numbers are in consistent format
□ Fee amounts are numeric and reasonable ($0-$500 typical range)
□ State code extracted correctly (watch for ambiguous URLs)
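
Several of these checks are easy to automate. The sketch below covers the date, fee, and citation items only; the `ScrapedRecord` shape is hypothetical, and the $0-$500 bound is the typical range quoted above rather than a legal limit.

```typescript
// Hypothetical flattened record combining fields from the extraction schemas.
interface ScrapedRecord {
  statuteCitation: string;
  effectiveDate: string;
  feeAmount: number;
}

// Return a list of human-readable problems; an empty list means the record passed.
function validateRecord(record: ScrapedRecord, now: Date = new Date()): string[] {
  const problems: string[] = [];

  const effective = new Date(record.effectiveDate);
  if (Number.isNaN(effective.getTime())) {
    problems.push("effectiveDate is not parseable");
  } else if (effective.getTime() > now.getTime()) {
    problems.push("effectiveDate is in the future");
  }

  if (!Number.isFinite(record.feeAmount) || record.feeAmount < 0 || record.feeAmount > 500) {
    problems.push("feeAmount is outside the typical $0-$500 range");
  }

  if (record.statuteCitation.trim() === "") {
    problems.push("statuteCitation is empty");
  }

  return problems;
}
```

Returning every problem instead of failing on the first one lets a single pass over a scrape report show all the cleanup work needed.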

Common Extraction Errors:

"oregon.public.law" matching "la" (Louisiana, via the "la" in "public.law") instead of "or" (Oregon)
Statute text truncated at 10,000 characters (increase the extraction limit)
Form "last updated" dates in inconsistent formats
County-specific URLs mistaken for state-level URLs
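
The first error in this list usually comes from substring matching on the whole URL. A sketch of a safer approach for public.law hosts follows: parse the hostname and look up the full subdomain. The function name and the mapping table are hypothetical and cover only the public.law states named earlier.

```typescript
// Full subdomain names mapped to state codes; only the public.law states listed earlier.
const publicLawSubdomains = new Map<string, string>([
  ["oregon", "OR"],
  ["california", "CA"],
  ["texas", "TX"],
  ["michigan", "MI"],
  ["washington", "WA"],
]);

// Derive the state code from a {state}.public.law URL by matching the whole
// subdomain, never a two-letter substring. Returns null for any other host.
function stateCodeFromPublicLaw(url: string): string | null {
  const host = new URL(url).hostname.toLowerCase();
  const match = /^([a-z-]+)\.public\.law$/.exec(host);
  if (!match) return null;
  const subdomain = match[1] ?? "";
  return publicLawSubdomains.get(subdomain) ?? null;
}
```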

6. Gap Analysis Process

To identify missing data for a state:

bash

# Check what we have
ls src/data/scraped/states/{state}/

# Expected files for complete coverage:
# - statutes.json (eligibility rules from statute text)
# - court-system.json (court URLs, contacts, forms links)
# - forms/ (actual PDF forms)
# - fees.json (filing fee amounts)
# - counties/ (county-specific court data)

# Cross-reference with state data file
# (-E enables the "|" alternation; plain grep treats it as a literal character)
grep -lE "waitingPeriods|eligibilityRules" src/data/states/{state}.ts


**Priority order for filling gaps**:
1. Statutes (foundation for all rules)
2. Court forms (what users actually need to file)
3. Fee information (users need to budget)
4. County contacts (where to file)
5. Spanish resources (accessibility)
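
The expected-files list and the priority order can be combined into a small gap check. This is a sketch: it takes directory entries as input rather than reading the filesystem, `findGaps` is a hypothetical name, and placing `court-system.json` after `forms` is an assumption because the priority list does not mention it explicitly.

```typescript
// Expected artifacts per state, ordered to follow the gap-filling priority above.
// Assumption: court-system.json is ranked directly after forms.
const expectedArtifacts = [
  "statutes.json",
  "forms",
  "court-system.json",
  "fees.json",
  "counties",
];

// Given the names found in a state's scraped directory, return what is
// missing, highest priority first. Trailing slashes on directories are ignored.
function findGaps(existingEntries: string[]): string[] {
  const present = new Set(existingEntries.map((name) => name.replace(/\/$/, "")));
  return expectedArtifacts.filter((name) => !present.has(name));
}
```

Feeding it the output of a directory listing for each state gives a per-state to-do list that is already sorted by priority.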

---

Anti-Patterns

Never Do These:

Scrape without rate limiting - Government sites will block you
Trust secondary sources for statute text - Always verify against primary
Assume URL patterns are consistent - Each state is different
Ignore effective dates - Laws change; scraped data needs timestamps
Scrape county sites without state context - County rules supplement, not replace, state law
Skip the self-help sections - Often have the clearest eligibility summaries
Treat all states the same - Clean Slate states have fundamentally different processes

Common Mistakes:

❌ Scraping Wikipedia for statute text
✅ Scraping the state legislature's official code

❌ Using findlaw.com as primary source
✅ Using findlaw.com to find the citation, then scraping the official source

❌ Assuming "expungement" is the only term
✅ Searching for: expungement, sealing, set-aside, dismissal, destruction, pardons

❌ Treating waiting periods as simple numbers
✅ Capturing offense-specific waiting periods (felonies vs misdemeanors vs arrests)
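
The last two pairs translate directly into data structures. The sketch below expands a search across the relief terms listed above and models waiting periods per record type instead of as a single number. All names are hypothetical, and any year values passed in are the caller's data, not statements of law.

```typescript
// Relief terms copied from the list above.
const reliefTerms = [
  "expungement",
  "sealing",
  "set-aside",
  "dismissal",
  "destruction",
  "pardons",
] as const;

// Build one search query per relief term for a given state name.
function buildQueries(state: string): string[] {
  return reliefTerms.map((term) => `${state} ${term}`);
}

type RecordType = "felony" | "misdemeanor" | "arrest";

// One rule per record type, so a state can carry several waiting periods.
interface WaitingPeriodRule {
  recordType: RecordType;
  years: number;
}

// Look up the waiting period for a record type; null means no rule was captured.
function waitingYears(rules: WaitingPeriodRule[], recordType: RecordType): number | null {
  const rule = rules.find((candidate) => candidate.recordType === recordType);
  return rule ? rule.years : null;
}
```

Returning null for a missing rule keeps "not yet researched" distinct from a waiting period of zero years.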

Project-Specific Context

This skill is designed for the National Expungement Guide project:

Existing Infrastructure

Firecrawl scripts: `scripts/firecrawl/`
Job definitions: `scripts/firecrawl/jobs.ts` (P0-P4 priority jobs)
URL config: `scripts/firecrawl/config.ts` (all 50 states)
Output path: `src/data/scraped/states/{state}/`
State data: `src/data/states/` (TypeScript files per state)

Running Scrapes

bash

# Set API key first
export FIRECRAWL_API_KEY=your_key

# Run P0 (state statutes + courts) - ~$0.20 cost
npx tsx scripts/firecrawl/run-p0.ts

# Dry run to preview
npx tsx scripts/firecrawl/run-p0.ts --dry-run

# Check reports
cat scripts/firecrawl/reports/p0-*.json

Data Flow

Firecrawl scrape → src/data/scraped/states/{state}/*.json
       ↓
Manual review + cleanup
       ↓
Integrated into src/data/states/{state}.ts
       ↓
Used by eligibility wizard + PDF generator

References

See the `references/` folder for:

`url-patterns-by-state.md` - Complete URL patterns for all 50 states
`clean-slate-timeline.md` - When each Clean Slate law passed and took effect
`firecrawl-schemas.md` - All extraction schemas used

Example Workflow

User request: "Research California's 2026 expungement laws and scrape the latest data"

Agent workflow:

1. Check existing data: `ls src/data/scraped/states/ca/`
2. Verify current statute version at `california.public.law`
3. Check for 2025-2026 law changes via CCRC or news search
4. Update `scripts/firecrawl/config.ts` if URLs changed
5. Run targeted scrape: add CA-specific URLs to P0 job
6. Validate extracted data against known statute citations
7. Document any gaps or changes found
