2026-legal-research-agent


2026 Legal Research Agent




When to Use This Skill


Use this skill when you need to:
  • Find authoritative legal sources for a specific state's expungement laws
  • Configure Firecrawl jobs to scrape court systems, legislatures, or legal aid sites
  • Validate scraped data for accuracy and completeness
  • Research 2026 law changes including Clean Slate acts and marijuana expungement
  • Build URL patterns for systematic state-by-state data collection
  • Identify gaps in existing scraped data coverage
Do NOT use this skill for:
  • Interpreting what laws mean (use `national-expungement-expert`)
  • Building user interfaces or components
  • Providing legal advice to users
  • General web scraping unrelated to legal data


Core Instructions


1. Authoritative Source Hierarchy


When researching expungement laws, prioritize sources in this order:
Tier 1 (Primary Authority):
├── State Legislature websites (statute text)
├── State Court Administrative Office
└── State Attorney General publications

Tier 2 (Official Secondary):
├── State Bar Association guides
├── Court self-help centers
└── Public law databases (public.law, justia.com)

Tier 3 (Tertiary but Valuable):
├── Legal aid organizations (LSC grantees)
├── Law school clinics
└── Reentry organizations (CCRC, NACDL)

Tier 4 (Verification Only):
├── Commercial legal databases
├── News articles about law changes
└── Attorney blog posts
Shibboleth: A novice scrapes the first Google result. An expert knows that `courts.{state}.gov` contains the self-help forms while `legislature.{state}.gov` contains the statute text, and that both are needed.
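The tier ordering above can be made explicit in code. The following is a minimal sketch, not the project's implementation: the `LegalSource` shape, the function names, and the example URLs are all hypothetical.

```typescript
// Hypothetical model of the four-tier hierarchy; all names are illustrative.
type SourceTier = 1 | 2 | 3 | 4;

interface LegalSource {
  url: string;
  tier: SourceTier;
  kind: string; // e.g. "legislature", "public law database", "news"
}

// Return a copy sorted so the most authoritative source comes first.
function rankSources(sources: LegalSource[]): LegalSource[] {
  return [...sources].sort((a, b) => a.tier - b.tier);
}

// Statute text is only accepted from Tier 1; lower tiers are used for
// discovery and cross-checking, never as the text of record.
function pickStatuteSource(sources: LegalSource[]): LegalSource | undefined {
  return rankSources(sources).find((s) => s.tier === 1);
}

// Placeholder URLs, not real endpoints.
const candidates: LegalSource[] = [
  { url: "https://news.example.test/clean-slate", tier: 4, kind: "news" },
  { url: "https://statutes.example.test/137.225", tier: 2, kind: "public law database" },
  { url: "https://legislature.example.test/137.225", tier: 1, kind: "legislature" },
];

const chosen = pickStatuteSource(candidates);
console.log(chosen ? chosen.kind : "no Tier 1 source found");
```

If no Tier 1 source is available, `pickStatuteSource` returns `undefined` rather than falling back to a lower tier, so the gap is visible instead of silently filled.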

2. URL Pattern Knowledge by State Type


States organize their legal resources differently. Know the patterns:
Unified Court Systems (courts own everything):
California: courts.ca.gov/selfhelp-expungement.htm
Oregon: courts.oregon.gov/programs/exp/Pages/default.aspx
Washington: courts.wa.gov/forms/?fa=forms.contribute&formID=101
Split Systems (legislature + court separate):
Texas: txcourts.gov (forms) + texas.public.law (statutes)
New York: nycourts.gov (forms) + nysenate.gov/legislation/laws (statutes)
Florida: flcourts.gov (forms) + leg.state.fl.us/statutes (statutes)
Public.law States (excellent statute hosting):
oregon.public.law, california.public.law, texas.public.law
michigan.public.law, washington.public.law
Shibboleth: Knowing that `apps.leg.wa.gov/RCW/` is Washington's statute database while `leg.wa.gov` is the general legislature site. The actual law text lives under the `/RCW/` path on the `apps` subdomain.
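One way to encode the unified/split distinction is a small per-state record. This is a sketch under assumptions: the `StateSources` shape and `urlsToScrape` are hypothetical and may not match `scripts/firecrawl/config.ts`, and the `https://` scheme is added to the hosts listed above for illustration.

```typescript
// Illustrative shape only; the project's real URL config may differ.
type CourtSystemKind = "unified" | "split";

interface StateSources {
  state: string;         // two-letter postal code
  kind: CourtSystemKind;
  courtUrl: string;      // forms and self-help
  statuteUrl?: string;   // separate statute host, required for split systems
}

const stateSources: StateSources[] = [
  { state: "CA", kind: "unified", courtUrl: "https://courts.ca.gov/selfhelp-expungement.htm" },
  { state: "TX", kind: "split", courtUrl: "https://txcourts.gov", statuteUrl: "https://texas.public.law" },
  { state: "NY", kind: "split", courtUrl: "https://nycourts.gov", statuteUrl: "https://nysenate.gov/legislation/laws" },
];

// List every URL a state needs scraped, failing loudly when a split-system
// state is missing its statute host.
function urlsToScrape(s: StateSources): string[] {
  if (s.kind === "split" && !s.statuteUrl) {
    throw new Error(`${s.state}: split system needs a statuteUrl`);
  }
  return s.statuteUrl ? [s.courtUrl, s.statuteUrl] : [s.courtUrl];
}

for (const s of stateSources) {
  console.log(s.state, urlsToScrape(s));
}
```

Making `statuteUrl` mandatory for split systems turns "each state is different" into a check that runs before any scrape is queued.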

3. 2026 Legal Landscape Awareness


As of 2026, these major changes affect research:
Clean Slate States (automatic expungement passed):
  • Pennsylvania (2018), Utah (2019), New Jersey (2019), Michigan (2020)
  • California (2020), Connecticut (2021), Delaware (2021), Virginia (2021)
  • Oklahoma (2022), Colorado (2022), New York (2023), Minnesota (2023)
  • Maryland (2024), Illinois (2024), Oregon (2025)
Marijuana Expungement (specific statutes):
  • Most states now have separate marijuana expungement provisions
  • Search for "cannabis conviction" alongside "expungement"
  • Check for retroactive application dates
2025-2026 Law Changes to Verify:
  • Oregon HB 2316 (expanded eligibility)
  • California AB 1076 (automatic relief expansion)
  • Check CCRC's Restoration of Rights Project for current status
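Shibboleth: Knowing that "automatic expungement" doesn't mean immediate. Pennsylvania's Clean Slate law, as enacted in 2018, required 10 years without a new conviction before eligible misdemeanor convictions were sealed, while non-conviction records followed a much shorter timeline; later amendments changed some periods, so verify against the current statute. Research must capture these nuances.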

4. Firecrawl Configuration Expertise


When configuring scrape jobs:
Extraction Schema Design:
```typescript
// Constant names are illustrative; only the field lists come from the project.

// For statute pages, extract:
const statutePageSchema = {
  statuteCitation: "string",   // e.g., "ORS 137.225"
  title: "string",             // e.g., "Setting aside conviction"
  fullText: "string",          // Complete statute text
  effectiveDate: "string",     // When current version took effect
  lastAmended: "string",       // Most recent amendment date
  subsections: "array",        // Parsed subsections
};

// For court self-help pages, extract:
const courtSelfHelpSchema = {
  stateName: "string",
  expungementPageUrl: "string",
  formsLibraryUrl: "string",
  selfHelpUrl: "string",
  contactPhone: "string",
  feeScheduleUrl: "string",
};

// For forms, extract:
const formSchema = {
  formNumber: "string",        // e.g., "MC-440"
  formTitle: "string",
  pdfUrl: "string",
  applicableTo: "array",       // ["misdemeanor", "arrest"]
  lastUpdated: "string",
};
```
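Rate Limiting for Government Sites:
```typescript
// Illustrative wrapper name; the settings themselves are unchanged.
const govSiteSettings = {
  rateLimit: 2,    // 2 requests/second max for .gov sites
  timeout: 90000,  // Government sites can be slow (milliseconds)
  maxRetries: 3,   // Retry on timeout
  waitFor: 3000,   // Wait for JavaScript on modern court sites (milliseconds)
};
```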
Shibboleth: Knowing to set `onlyMainContent: true` for statute pages (to skip navigation chrome) but `onlyMainContent: false` for forms pages (where the form links are often in sidebars).
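The page-type rule above can be captured in a small helper that builds per-page options. This is a sketch under assumptions: `PageType`, `buildScrapeOptions`, and `minDelayMs` are hypothetical names, and treating court self-help pages like forms pages is a guess, since the text only covers statute and forms pages.

```typescript
// Hypothetical page categories matching the three extraction schemas.
type PageType = "statute" | "court-self-help" | "forms";

interface ScrapeOptions {
  onlyMainContent: boolean;
  timeout: number; // milliseconds
  waitFor: number; // milliseconds
}

// Statute pages drop navigation chrome; forms pages keep the full page
// because form links often sit in sidebars.
// Assumption: court self-help pages are handled like forms pages.
function buildScrapeOptions(pageType: PageType): ScrapeOptions {
  return {
    onlyMainContent: pageType === "statute",
    timeout: 90000,
    waitFor: 3000,
  };
}

// Convert a requests-per-second cap into the minimum gap between requests.
function minDelayMs(requestsPerSecond: number): number {
  if (requestsPerSecond <= 0) {
    throw new Error("requestsPerSecond must be positive");
  }
  return 1000 / requestsPerSecond;
}

console.log(buildScrapeOptions("statute"), minDelayMs(2));
```

A cap of 2 requests per second means waiting at least `minDelayMs(2)` milliseconds between consecutive requests to the same host.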

5. Data Validation Checklist


After scraping, validate:
□ Statute citations match official format (e.g., "ORS" not "Or. Rev. Stat.")
□ Effective dates are parseable and reasonable (not future, not too old)
□ URLs are live and return 200 status
□ PDF form links actually download PDFs (not HTML error pages)
□ Phone numbers are in consistent format
□ Fee amounts are numeric and reasonable ($0-$500 typical range)
□ State code extracted correctly (watch for ambiguous URLs)
Common Extraction Errors:
  • "oregon.public.law" matching "la" (Louisiana), for example via the ".law" suffix, instead of "or" (Oregon)
  • Statute text truncated at 10,000 characters (increase limit)
  • Form "last updated" dates in inconsistent formats
  • County-specific URLs mistaken for state-level
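A few of these checks are easy to automate. The sketch below is illustrative, not the project's validator: all function names are hypothetical, the state table covers only the public.law hosts named earlier, and the date check handles the "not future" rule but not "not too old".

```typescript
// Resolve the state from the first hostname label instead of substring
// matching, so ".law" can never be read as "la".
const STATE_BY_LABEL: Record<string, string> = {
  oregon: "OR",
  california: "CA",
  texas: "TX",
  michigan: "MI",
  washington: "WA",
};

// Extract a lowercase hostname with or without a scheme prefix.
function hostnameOf(url: string): string {
  const m = /^(?:[a-z]+:\/\/)?([^\/:?#]+)/i.exec(url);
  return m ? m[1].toLowerCase() : "";
}

function stateFromPublicLawUrl(url: string): string | undefined {
  const host = hostnameOf(url);
  if (!host.endsWith(".public.law")) return undefined;
  return STATE_BY_LABEL[host.split(".")[0]];
}

// Fees must be numeric and inside the typical $0-$500 range.
function isReasonableFee(fee: unknown): boolean {
  return typeof fee === "number" && Number.isFinite(fee) && fee >= 0 && fee <= 500;
}

// Dates must parse and must not lie in the future. ISO strings such as
// "2024-01-01" parse reliably; other formats should be normalized first.
function isReasonableEffectiveDate(value: string, now: Date = new Date()): boolean {
  const t = Date.parse(value);
  return !Number.isNaN(t) && t <= now.getTime();
}

console.log(stateFromPublicLawUrl("https://oregon.public.law/statutes"));
```

Records that fail any check are better flagged for manual review than dropped, since an out-of-range fee may still be correct.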

6. Gap Analysis Process


To identify missing data for a state:
```bash
# Check what we have
ls src/data/scraped/states/{state}/

# Expected files for complete coverage:
# - statutes.json (eligibility rules from statute text)
# - court-system.json (court URLs, contacts, forms links)
# - forms/ (actual PDF forms)
# - fees.json (filing fee amounts)
# - counties/ (county-specific court data)

# Cross-reference with state data file
# (-E is needed so that | is treated as alternation)
grep -lE "waitingPeriods|eligibilityRules" src/data/states/{state}.ts
```

**Priority order for filling gaps**:
1. Statutes (foundation for all rules)
2. Court forms (what users actually need to file)
3. Fee information (users need to budget)
4. County contacts (where to file)
5. Spanish resources (accessibility)
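
The expected-files list and the priority order can be combined into a simple gap check. This is a hedged sketch: `findGaps` is a hypothetical helper, placing `court-system.json` next to the forms entry is an assumption, and Spanish resources are omitted because the list above names no file for them.

```typescript
// Expected artifacts, ordered by the fill priority (statutes first).
// Assumption: court-system.json is grouped with forms, since it holds
// the forms links.
const EXPECTED_IN_PRIORITY_ORDER = [
  "statutes.json",
  "court-system.json",
  "forms/",
  "fees.json",
  "counties/",
] as const;

// Compare names with any trailing slash removed, so "forms" and "forms/"
// are treated as the same entry.
function stripSlash(name: string): string {
  return name.replace(/\/$/, "");
}

// Return the expected entries missing from a directory listing,
// highest priority first.
function findGaps(present: string[]): string[] {
  const have = new Set(present.map(stripSlash));
  return EXPECTED_IN_PRIORITY_ORDER.filter((e) => !have.has(stripSlash(e)));
}

// `present` would normally come from listing the state's output directory.
console.log(findGaps(["statutes.json", "forms"]));
```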

---

Anti-Patterns


Never Do These:


  1. Scrape without rate limiting - Government sites will block you
  2. Trust secondary sources for statute text - Always verify against primary
  3. Assume URL patterns are consistent - Each state is different
  4. Ignore effective dates - Laws change; scraped data needs timestamps
  5. Scrape county sites without state context - County rules supplement, not replace, state law
  6. Skip the self-help sections - Often have the clearest eligibility summaries
  7. Treat all states the same - Clean Slate states have fundamentally different processes

Common Mistakes:


❌ Scraping Wikipedia for statute text
✅ Scraping the state legislature's official code

❌ Using findlaw.com as primary source
✅ Using findlaw.com to find the citation, then scraping the official source

❌ Assuming "expungement" is the only term
✅ Searching for: expungement, sealing, set-aside, dismissal, destruction, pardons

❌ Treating waiting periods as simple numbers
✅ Capturing offense-specific waiting periods (felonies vs misdemeanors vs arrests)
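
The last two pairs suggest a data shape and a query list. The sketch below is one possible way to model them: every type and function name is hypothetical, and the sample numbers are made-up placeholders, not real law.

```typescript
type RecordType = "felony" | "misdemeanor" | "arrest";

// One entry per record type instead of a single number per state.
interface WaitingPeriod {
  recordType: RecordType;
  years: number | null; // null when the statute gives no fixed period
  notes?: string;       // conditions the number alone cannot express
}

interface StateWaitingPeriods {
  state: string;
  periods: WaitingPeriod[];
}

function waitingPeriodFor(s: StateWaitingPeriods, t: RecordType): WaitingPeriod | undefined {
  return s.periods.find((p) => p.recordType === t);
}

// Relief terms taken from the list above; states use different vocabulary.
const RELIEF_TERMS = ["expungement", "sealing", "set-aside", "dismissal", "destruction", "pardons"];

function searchQueries(stateName: string): string[] {
  return RELIEF_TERMS.map((term) => `${stateName} ${term}`);
}

// Placeholder state with invented values, for illustration only.
const example: StateWaitingPeriods = {
  state: "XX",
  periods: [
    { recordType: "felony", years: 7, notes: "placeholder value" },
    { recordType: "misdemeanor", years: 3, notes: "placeholder value" },
    { recordType: "arrest", years: 1, notes: "placeholder value" },
  ],
};

console.log(waitingPeriodFor(example, "arrest"), searchQueries("Oregon"));
```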


Project-Specific Context


This skill is designed for the National Expungement Guide project:

Existing Infrastructure


  • Firecrawl scripts: `scripts/firecrawl/`
  • Job definitions: `scripts/firecrawl/jobs.ts` (P0-P4 priority jobs)
  • URL config: `scripts/firecrawl/config.ts` (all 50 states)
  • Output path: `src/data/scraped/states/{state}/`
  • State data: `src/data/states/` (TypeScript files per state)

Running Scrapes

```bash
# Set API key first
export FIRECRAWL_API_KEY=your_key

# Run P0 (state statutes + courts) - ~$0.20 cost
npx tsx scripts/firecrawl/run-p0.ts

# Dry run to preview
npx tsx scripts/firecrawl/run-p0.ts --dry-run

# Check reports
cat scripts/firecrawl/reports/p0-*.json
```

Data Flow

Firecrawl scrape → src/data/scraped/states/{state}/*.json
→ Manual review + cleanup
→ Integrated into src/data/states/{state}.ts
→ Used by eligibility wizard + PDF generator

References


See the `references/` folder for:
  • `url-patterns-by-state.md` - Complete URL patterns for all 50 states
  • `clean-slate-timeline.md` - When each Clean Slate law passed and took effect
  • `firecrawl-schemas.md` - All extraction schemas used


Example Workflow


User request: "Research California's 2026 expungement laws and scrape the latest data"
Agent workflow:
  1. Check existing data: `ls src/data/scraped/states/ca/`
  2. Verify current statute version at `california.public.law`
  3. Check for 2025-2026 law changes via CCRC or news search
  4. Update `scripts/firecrawl/config.ts` if URLs changed
  5. Run targeted scrape: add CA-specific URLs to P0 job
  6. Validate extracted data against known statute citations
  7. Document any gaps or changes found