2026-legal-research-agent
2026 Legal Research Agent
When to Use This Skill
Use this skill when you need to:
- Find authoritative legal sources for a specific state's expungement laws
- Configure Firecrawl jobs to scrape court systems, legislatures, or legal aid sites
- Validate scraped data for accuracy and completeness
- Research 2026 law changes including Clean Slate acts and marijuana expungement
- Build URL patterns for systematic state-by-state data collection
- Identify gaps in existing scraped data coverage
Do NOT use this skill for:
- Interpreting what laws mean (use national-expungement-expert)
- Building user interfaces or components
- Providing legal advice to users
- General web scraping unrelated to legal data
Core Instructions
1. Authoritative Source Hierarchy
When researching expungement laws, prioritize sources in this order:
Tier 1 (Primary Authority):
├── State Legislature websites (statute text)
├── State Court Administrative Office
└── State Attorney General publications
Tier 2 (Official Secondary):
├── State Bar Association guides
├── Court self-help centers
└── Public law databases (public.law, justia.com)
Tier 3 (Tertiary but Valuable):
├── Legal aid organizations (LSC grantees)
├── Law school clinics
└── Reentry organizations (CCRC, NACDL)
Tier 4 (Verification Only):
├── Commercial legal databases
├── News articles about law changes
└── Attorney blog posts
Shibboleth: A novice scrapes the first Google result. An expert knows that courts.{state}.gov contains the self-help forms while legislature.{state}.gov contains the statute text, and both are needed.
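One way to make the hierarchy above usable in code is to tag each candidate source with a kind and rank by tier. The following is a minimal sketch under that assumption; the kind labels and the `CandidateSource` shape are illustrative and not taken from the project.

```typescript
// Source kinds named after the entries in the hierarchy above.
// The kind labels themselves are illustrative, not part of the project.
type SourceKind =
  | "legislature" | "court-admin" | "attorney-general"        // Tier 1
  | "bar-association" | "court-self-help" | "public-law-db"   // Tier 2
  | "legal-aid" | "law-school-clinic" | "reentry-org"         // Tier 3
  | "commercial-db" | "news" | "attorney-blog";               // Tier 4

type Tier = 1 | 2 | 3 | 4;

const TIER_BY_KIND: Record<SourceKind, Tier> = {
  "legislature": 1, "court-admin": 1, "attorney-general": 1,
  "bar-association": 2, "court-self-help": 2, "public-law-db": 2,
  "legal-aid": 3, "law-school-clinic": 3, "reentry-org": 3,
  "commercial-db": 4, "news": 4, "attorney-blog": 4,
};

interface CandidateSource {
  url: string;
  kind: SourceKind;
}

// Returns a new array ordered from most to least authoritative.
// Array.prototype.sort is stable, so sources in the same tier keep their input order.
function rankSources(sources: CandidateSource[]): CandidateSource[] {
  return [...sources].sort((a, b) => TIER_BY_KIND[a.kind] - TIER_BY_KIND[b.kind]);
}

// Tier 4 sources may confirm a finding but should not be scraped as the record of truth.
function isVerificationOnly(source: CandidateSource): boolean {
  return TIER_BY_KIND[source.kind] === 4;
}
```

Ranking by an explicit kind, rather than guessing the tier from the hostname, keeps the judgment call with the researcher, since official sites do not follow one naming scheme.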
2. URL Pattern Knowledge by State Type
States organize their legal resources differently. Know the patterns:
Unified Court Systems (courts own everything):
California: courts.ca.gov/selfhelp-expungement.htm
Oregon: courts.oregon.gov/programs/exp/Pages/default.aspx
Washington: courts.wa.gov/forms/?fa=forms.contribute&formID=101
Split Systems (legislature + court separate):
Texas: txcourts.gov (forms) + texas.public.law (statutes)
New York: nycourts.gov (forms) + nysenate.gov/legislation/laws (statutes)
Florida: flcourts.gov (forms) + leg.state.fl.us/statutes (statutes)
Public.law States (excellent statute hosting):
oregon.public.law, california.public.law, texas.public.law
michigan.public.law, washington.public.law
Shibboleth: Knowing that apps.leg.wa.gov/RCW/ is Washington's statute database while leg.wa.gov is the general legislature site. The RCW path on the apps subdomain is where the actual law text lives.
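The unified/split distinction above maps naturally onto a discriminated union. The sketch below assumes a hypothetical `STATE_SOURCES` table; the URLs are copied from the lists above, but the record shape and function name are illustrative and may not match scripts/firecrawl/config.ts.

```typescript
// A state either publishes everything through its court site, or splits
// forms and statutes across separate sites.
type StateSources =
  | { kind: "unified"; courtUrl: string }
  | { kind: "split"; formsUrl: string; statutesUrl: string };

// Entries copied from the lists above; the table itself is a hypothetical shape.
const STATE_SOURCES: Record<string, StateSources> = {
  ca: { kind: "unified", courtUrl: "courts.ca.gov/selfhelp-expungement.htm" },
  or: { kind: "unified", courtUrl: "courts.oregon.gov/programs/exp/Pages/default.aspx" },
  tx: { kind: "split", formsUrl: "txcourts.gov", statutesUrl: "texas.public.law" },
  ny: { kind: "split", formsUrl: "nycourts.gov", statutesUrl: "nysenate.gov/legislation/laws" },
};

// Returns every URL a scrape job needs for one state, or an empty list
// when the state has not been configured yet.
function urlsToScrape(stateCode: string): string[] {
  const sources = STATE_SOURCES[stateCode];
  if (!sources) return [];
  return sources.kind === "unified"
    ? [sources.courtUrl]
    : [sources.formsUrl, sources.statutesUrl];
}
```

Because the union forces a split state to declare both URLs, a job cannot silently scrape forms without the matching statute source.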
3. 2026 Legal Landscape Awareness
As of 2026, these major changes affect research:
Clean Slate States (automatic expungement passed):
- Pennsylvania (2018), Utah (2019), New Jersey (2019), Michigan (2020)
- California (2020), Connecticut (2021), Delaware (2021), Virginia (2021)
- Oklahoma (2022), Colorado (2022), New York (2023), Minnesota (2023)
- Maryland (2024), Illinois (2024), Oregon (2025)
Marijuana Expungement (specific statutes):
- Most states now have separate marijuana expungement provisions
- Search for "cannabis conviction" alongside "expungement"
- Check for retroactive application dates
2025-2026 Law Changes to Verify:
- Oregon HB 2316 (expanded eligibility)
- California AB 1076 (automatic relief expansion)
- Check CCRC's Restoration of Rights Project for current status
Shibboleth: Knowing that "automatic expungement" doesn't mean immediate. Pennsylvania's Clean Slate law, as enacted in 2018, sealed eligible misdemeanor convictions only after 10 conviction-free years, and waiting periods vary by offense. Research must capture these nuances.
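To capture the nuance described above, waiting periods can be stored per record type instead of as one number per state. This is a minimal sketch; the type names are hypothetical and the example rows are placeholders for a made-up state, not real law.

```typescript
type RecordType = "arrest" | "misdemeanor" | "felony";

interface WaitingPeriod {
  recordType: RecordType;
  years: number | null;   // null means "not yet researched", never a guess
  automatic: boolean;     // true when relief needs no petition
  citation: string;       // statute section the value was read from
}

// Placeholder rows for a made-up state; these numbers are NOT real law.
const examplePeriods: WaitingPeriod[] = [
  { recordType: "arrest", years: 1, automatic: true, citation: "Example Code § 1" },
  { recordType: "misdemeanor", years: 5, automatic: true, citation: "Example Code § 2" },
  { recordType: "felony", years: null, automatic: false, citation: "" },
];

// Look up the period for one record type instead of storing a single number per state.
function waitingYears(periods: WaitingPeriod[], recordType: RecordType): number | null {
  const row = periods.filter((p) => p.recordType === recordType)[0];
  return row ? row.years : null;
}

// Record types whose waiting period still needs research.
function unresearched(periods: WaitingPeriod[]): RecordType[] {
  const all: RecordType[] = ["arrest", "misdemeanor", "felony"];
  return all.filter((t) => waitingYears(periods, t) === null);
}
```

Keeping `null` distinct from `0` separates "no waiting period" from "not yet verified", which matters when automatic relief and petition-based relief coexist in one state.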
4. Firecrawl Configuration Expertise
When configuring scrape jobs:
Extraction Schema Design:
typescript
// Field values are shorthand type markers; the const names are illustrative.
// For statute pages, extract:
const statuteSchema = {
  statuteCitation: "string", // e.g., "ORS 137.225"
  title: "string", // e.g., "Setting aside conviction"
  fullText: "string", // Complete statute text
  effectiveDate: "string", // When current version took effect
  lastAmended: "string", // Most recent amendment date
  subsections: "array", // Parsed subsections
};
// For court self-help pages, extract:
const courtSelfHelpSchema = {
  stateName: "string",
  expungementPageUrl: "string",
  formsLibraryUrl: "string",
  selfHelpUrl: "string",
  contactPhone: "string",
  feeScheduleUrl: "string",
};
// For forms, extract:
const formSchema = {
  formNumber: "string", // e.g., "MC-440"
  formTitle: "string",
  pdfUrl: "string",
  applicableTo: "array", // ["misdemeanor", "arrest"]
  lastUpdated: "string",
};
Rate Limiting for Government Sites:
typescript
rateLimit: 2, // 2 requests/second max for .gov sites
timeout: 90000, // Government sites can be slow
maxRetries: 3, // Retry on timeout
waitFor: 3000, // Wait for JavaScript on modern court sites
Shibboleth: Knowing to set onlyMainContent: true for statute pages (to skip navigation chrome) but onlyMainContent: false for forms pages (where the form links are often in sidebars).
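The page-type rule above can be folded into one helper so each job picks its options consistently. This is a sketch only: the option names mirror the snippet above, but the `ScrapeOptions` shape is an assumption about how the project's job config is organized and is not Firecrawl's documented request format. Treating self-help pages like forms pages is also an assumption.

```typescript
type PageType = "statute" | "court-self-help" | "forms";

// Hypothetical options shape; field names follow the snippet above.
interface ScrapeOptions {
  rateLimit: number;        // requests per second
  timeout: number;          // milliseconds
  maxRetries: number;
  waitFor: number;          // milliseconds to wait for JavaScript rendering
  onlyMainContent: boolean;
}

// Conservative defaults for government sites, taken from the values above.
const GOV_DEFAULTS = {
  rateLimit: 2,
  timeout: 90000,
  maxRetries: 3,
  waitFor: 3000,
};

function scrapeOptionsFor(pageType: PageType): ScrapeOptions {
  return {
    ...GOV_DEFAULTS,
    // Statute pages: drop navigation chrome. Forms pages: keep sidebars,
    // where form links often live. Self-help pages follow the forms rule
    // here, which is an assumption rather than something the source states.
    onlyMainContent: pageType === "statute",
  };
}
```

For example, `scrapeOptionsFor("forms")` keeps the same rate limit and timeout as a statute job but leaves `onlyMainContent` off.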
5. Data Validation Checklist
After scraping, validate:
□ Statute citations match official format (e.g., "ORS" not "Or. Rev. Stat.")
□ Effective dates are parseable and reasonable (not future, not too old)
□ URLs are live and return 200 status
□ PDF form links actually download PDFs (not HTML error pages)
□ Phone numbers are in consistent format
□ Fee amounts are numeric and reasonable ($0-$500 typical range)
□ State code extracted correctly (watch for ambiguous URLs)
Common Extraction Errors:
- "oregon.public.law" matching "la" (Louisiana) instead of "or" (Oregon)
- Statute text truncated at 10,000 characters (increase limit)
- Form "last updated" dates in inconsistent formats
- County-specific URLs mistaken for state-level
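Several of the checks and errors above can be automated. The sketch below covers three of them: effective dates, fee ranges, and the state-code mismatch. The `ScrapedRecord` shape, the `PUBLIC_LAW_STATES` table, and the function names are assumptions for illustration, not the project's actual types.

```typescript
interface ScrapedRecord {
  sourceUrl: string;
  effectiveDate: string; // expected as ISO 8601, e.g. "2024-01-01"
  feeAmount: number;     // US dollars
}

// Hypothetical lookup from the first hostname label to a state code;
// a complete table would cover every state hosted on public.law.
const PUBLIC_LAW_STATES: Record<string, string> = {
  oregon: "or", california: "ca", texas: "tx", michigan: "mi", washington: "wa",
};

// Derive the state from the whole first label of the hostname. Substring
// matching is what turns "oregon.public.law" into "la" (via ".law").
function stateCodeFromUrl(url: string): string | null {
  const host = (url.replace(/^[a-z]+:\/\//i, "").split("/")[0] ?? "").toLowerCase();
  const match = /^([a-z]+)\.public\.law$/.exec(host);
  if (!match) return null;
  return PUBLIC_LAW_STATES[match[1] ?? ""] ?? null;
}

// Returns a list of human-readable problems; an empty list means the record passed.
function validateRecord(record: ScrapedRecord, now: Date = new Date()): string[] {
  const problems: string[] = [];

  const effective = Date.parse(record.effectiveDate);
  if (isNaN(effective)) {
    problems.push("effectiveDate is not parseable");
  } else if (effective > now.getTime()) {
    problems.push("effectiveDate is in the future");
  }

  if (!isFinite(record.feeAmount) || record.feeAmount < 0 || record.feeAmount > 500) {
    problems.push("feeAmount is outside the typical $0-$500 range; review manually");
  }

  if (stateCodeFromUrl(record.sourceUrl) === null) {
    problems.push("state code could not be derived from sourceUrl");
  }
  return problems;
}
```

The fee check flags a record for review instead of rejecting it, since the $0-$500 range is described only as typical. Checks that need network access (live URLs, real PDF downloads) are left out of this sketch.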
6. Gap Analysis Process
To identify missing data for a state:
bash
# Check what we have
ls src/data/scraped/states/{state}/
# Expected files for complete coverage:
# - statutes.json (eligibility rules from statute text)
# - court-system.json (court URLs, contacts, forms links)
# - forms/ (actual PDF forms)
# - fees.json (filing fee amounts)
# - counties/ (county-specific court data)
# Cross-reference with state data file (-E enables the | alternation)
grep -lE "waitingPeriods|eligibilityRules" src/data/states/{state}.ts
Priority order for filling gaps:
1. Statutes (foundation for all rules)
2. Court forms (what users actually need to file)
3. Fee information (users need to budget)
4. County contacts (where to file)
5. Spanish resources (accessibility)
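The manual check above can be expressed as a small pure function over a directory listing. This is a sketch under stated assumptions: the function names are hypothetical, the position of court-system.json in the ordering is a guess (the priority list does not rank it), and Spanish resources are not checked because the source gives no file name for them.

```typescript
// Expected entries from the checklist above, ordered by the stated fill priority.
// Placing court-system.json after forms/ is an assumption.
const EXPECTED_ENTRIES: string[] = [
  "statutes.json",
  "forms/",
  "court-system.json",
  "fees.json",
  "counties/",
];

// Strip a trailing slash so "forms" and "forms/" compare equal.
function stripSlash(entry: string): string {
  return entry.replace(/\/$/, "");
}

// `present` is the directory listing for one state, e.g. the output of `ls`.
// Returns the missing entries, highest priority first.
function findGaps(present: string[]): string[] {
  const have = present.map(stripSlash);
  return EXPECTED_ENTRIES.filter((e) => have.indexOf(stripSlash(e)) === -1);
}
```

Given a listing that contains only statutes.json and forms, the function reports court-system.json, fees.json, and counties/ as gaps, in that order.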
Anti-Patterns
Never Do These:
- Scrape without rate limiting - Government sites will block you
- Trust secondary sources for statute text - Always verify against primary
- Assume URL patterns are consistent - Each state is different
- Ignore effective dates - Laws change; scraped data needs timestamps
- Scrape county sites without state context - County rules supplement, not replace, state law
- Skip the self-help sections - Often have the clearest eligibility summaries
- Treat all states the same - Clean Slate states have fundamentally different processes
Common Mistakes:
❌ Scraping Wikipedia for statute text
✅ Scraping the state legislature's official code
❌ Using findlaw.com as primary source
✅ Using findlaw.com to find the citation, then scraping the official source
❌ Assuming "expungement" is the only term
✅ Searching for: expungement, sealing, set-aside, dismissal, destruction, pardons
❌ Treating waiting periods as simple numbers
✅ Capturing offense-specific waiting periods (felonies vs misdemeanors vs arrests)
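The terminology point above lends itself to a small query builder, so that no relief term is skipped during a search pass. A minimal sketch follows; the term list comes from the text, while the function name and query format are illustrative assumptions.

```typescript
// Relief terms listed above; states use different words for similar remedies.
const RELIEF_TERMS: string[] = [
  "expungement", "sealing", "set-aside", "dismissal", "destruction", "pardons",
];

// Builds one query per relief term, plus one quoted-phrase query per extra
// phrase (for example "cannabis conviction") paired with "expungement".
function buildQueries(stateName: string, extraPhrases: string[] = []): string[] {
  const base = RELIEF_TERMS.map((term) => `${stateName} ${term}`);
  const extra = extraPhrases.map((phrase) => `${stateName} "${phrase}" expungement`);
  return [...base, ...extra];
}
```

Calling `buildQueries("Oregon", ["cannabis conviction"])` yields seven queries: six relief-term queries followed by the quoted cannabis phrase.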
Project-Specific Context
This skill is designed for the National Expungement Guide project:
Existing Infrastructure
- Firecrawl scripts: scripts/firecrawl/
- Job definitions: scripts/firecrawl/jobs.ts (P0-P4 priority jobs)
- URL config: scripts/firecrawl/config.ts (all 50 states)
- Output path: src/data/scraped/states/{state}/
- State data: src/data/states/ (TypeScript files per state)
Running Scrapes
bash
# Set API key first
export FIRECRAWL_API_KEY=your_key
# Run P0 (state statutes + courts) - ~$0.20 cost
npx tsx scripts/firecrawl/run-p0.ts
# Dry run to preview
npx tsx scripts/firecrawl/run-p0.ts --dry-run
# Check reports
cat scripts/firecrawl/reports/p0-*.json
Data Flow
Firecrawl scrape → src/data/scraped/states/{state}/*.json
↓
Manual review + cleanup
↓
Integrated into src/data/states/{state}.ts
↓
Used by eligibility wizard + PDF generator
References
See the references/ folder for:
- url-patterns-by-state.md - Complete URL patterns for all 50 states
- clean-slate-timeline.md - When each Clean Slate law passed and took effect
- firecrawl-schemas.md - All extraction schemas used
Example Workflow
User request: "Research California's 2026 expungement laws and scrape the latest data"
Agent workflow:
- Check existing data: ls src/data/scraped/states/ca/
- Verify current statute version at california.public.law
- Check for 2025-2026 law changes via CCRC or news search
- Update scripts/firecrawl/config.ts if URLs changed
- Run targeted scrape: add CA-specific URLs to P0 job
- Validate extracted data against known statute citations
- Document any gaps or changes found