firecrawl-research
FireCrawl Research
Overview
Enrich research documents by automatically searching and scraping web sources using the FireCrawl API. Extract research topics from markdown files and generate comprehensive research documents with source material.
When to Use This Skill
Use this skill when the user:
- Says "Research this topic using FireCrawl"
- Requests to enrich notes or documents with web sources
- Wants to gather information about topics listed in a markdown file
- Needs to search and scrape multiple topics systematically
How It Works
1. Topic Extraction
The script automatically extracts research topics from markdown files using two methods:
**Method 1: Headers**
```markdown
## Spatial Reasoning in AI
### Computer Vision Applications
```
Both `Spatial Reasoning in AI` and `Computer Vision Applications` become research topics.
**Method 2: Research Tags**
```markdown
- [research] Large Language Models for robotics
- [search] Theory of Mind in autonomous driving
```
Both tagged items become research topics.
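The two extraction methods can be pictured with a short sketch. This is not the script's actual code: the function name `extract_topics` and both regular expressions are assumptions, chosen so that `##`/`###` headers and `- [research]`/`- [search]` items are picked up while a top-level `#` title is ignored.

```python
import re

# Assumed patterns: level-2/3 headers and "- [research]" / "- [search]" list items.
HEADER_RE = re.compile(r"^#{2,3}\s+(.+?)\s*$")
TAG_RE = re.compile(r"^\s*-\s*\[(?:research|search)\]\s+(.+?)\s*$", re.IGNORECASE)


def extract_topics(markdown_text: str) -> list[str]:
    """Return topics in document order, skipping duplicates."""
    topics: list[str] = []
    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line) or TAG_RE.match(line)
        if match and match.group(1) not in topics:
            topics.append(match.group(1))
    return topics


sample = """# AI Research Topics
## Spatial Reasoning in AI
### Computer Vision Applications
- [research] Large Language Models for robotics
- [search] Theory of Mind in autonomous driving
"""
print(extract_topics(sample))
```

Skipping duplicates is a choice made for this sketch; the real script may keep repeated topics.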
2. Search and Scrape
For each topic:
- Searches FireCrawl with the topic as query
- Retrieves up to N results (default: 5)
- Automatically scrapes full content from each result
- Extracts markdown-formatted content (main content only)
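As a rough illustration of the search step, the sketch below prepares (but does not send) a search request for one topic. The host in `SEARCH_URL`, the `query` and `limit` field names, and the bearer-token header are assumptions; this document only names the `/v1/search` path and the two `scrapeOptions` settings. It uses the standard library's `urllib` instead of `requests` so it runs without extra dependencies.

```python
import json
import urllib.request

# Assumed endpoint URL; the source only names the /v1/search path.
SEARCH_URL = "https://api.firecrawl.dev/v1/search"


def build_search_payload(topic: str, limit: int = 5) -> dict:
    """Request body: search for the topic and scrape each hit as markdown."""
    return {
        "query": topic,          # assumed field name
        "limit": limit,          # assumed field name
        "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
    }


def build_search_request(topic: str, api_key: str, limit: int = 5) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for one topic."""
    body = json.dumps(build_search_payload(topic, limit)).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return urllib.request.Request(SEARCH_URL, data=body, headers=headers, method="POST")


request = build_search_request("Embodied AI for robotics", "fc-your-actual-api-key", limit=3)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; that step is left out so the sketch makes no network calls.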
3. Output Generation
Creates new markdown files in the specified output directory:
- One file per topic
- Filename: `{topic}_{timestamp}.md`
- Contains: title, date, sources count, full scraped content
- Each source includes: title, URL, markdown content
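A minimal sketch of how a `{topic}_{timestamp}.md` filename might be built. The helper name `topic_filename`, the sanitizing rule, and the `%Y%m%d_%H%M%S` timestamp format are assumptions inferred from the filenames shown in the workflow example, not taken from the script itself.

```python
import re
from datetime import datetime


def topic_filename(topic: str, when: datetime) -> str:
    """Build '{topic}_{timestamp}.md' with a filesystem-friendly topic part."""
    # Assumed rule: keep letters, digits, '_' and '-', collapse everything else to '_'.
    safe_topic = re.sub(r"[^\w-]+", "_", topic).strip("_")
    return f"{safe_topic}_{when.strftime('%Y%m%d_%H%M%S')}.md"


print(topic_filename("Spatial Reasoning in Vision-Language Models",
                     datetime(2025, 11, 22, 14, 5, 30)))
```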
Usage
Basic Usage
```bash
python scripts/firecrawl_research.py research.md
```
Outputs to the current directory.
Specify Output Directory
```bash
python scripts/firecrawl_research.py research.md ./output
```
Creates files in the `./output/` folder.
Limit Results Per Topic
```bash
python scripts/firecrawl_research.py research.md ./output 3
```
Retrieves a maximum of 3 results per topic.
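The three invocations above suggest positional arguments with defaults. The sketch below shows one way they might be interpreted; `parse_cli_args` is a hypothetical helper, and the real script may parse its arguments differently.

```python
def parse_cli_args(argv: list[str]) -> tuple[str, str, int]:
    """Interpret: <input.md> [output_dir] [max_results]."""
    if not argv:
        raise SystemExit("usage: firecrawl_research.py <input.md> [output_dir] [max_results]")
    input_file = argv[0]
    output_dir = argv[1] if len(argv) > 1 else "."       # default: current directory
    max_results = int(argv[2]) if len(argv) > 2 else 5    # default: 5 results per topic
    return input_file, output_dir, max_results


print(parse_cli_args(["research.md"]))
print(parse_cli_args(["research.md", "./output", "3"]))
```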
Configuration
API Key Setup
- Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```
- Add FireCrawl API key:
```
FIRECRAWL_API_KEY=fc-your-actual-api-key
```
The script automatically loads the API key from the skill's `.env` file.
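For readers curious what loading the key involves, here is a dependency-free sketch. The script itself relies on `python-dotenv` (see Script Details); the hand-rolled parser and the name `read_api_key` below are stand-ins for illustration only, and the fallback to the process environment is an assumption.

```python
import os
import tempfile
from pathlib import Path


def read_api_key(env_path: Path) -> str:
    """Read FIRECRAWL_API_KEY from a KEY=VALUE file, falling back to the environment."""
    if env_path.exists():
        for raw_line in env_path.read_text(encoding="utf-8").splitlines():
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            if key.strip() == "FIRECRAWL_API_KEY":
                return value.strip().strip("\"'")
    value = os.environ.get("FIRECRAWL_API_KEY")
    if not value:
        raise RuntimeError("API key not found")
    return value


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("FIRECRAWL_API_KEY=fc-your-actual-api-key\n", encoding="utf-8")
    print(read_api_key(env_file))
```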
Rate Limiting
The script includes automatic rate limiting for FireCrawl's free tier:
- Free tier limit: 5 requests/minute
- Built-in delay: 12 seconds between topics
- Prevents API errors and credit exhaustion
When processing multiple topics, expect:
- 5 topics: ~1 minute
- 10 topics: ~2 minutes
- 20 topics: ~4 minutes
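The 12-second figure follows from the free-tier limit: 60 seconds divided by 5 requests. The sketch below shows one way to space out topics and estimate total time. The names `process_topics` and `estimated_minutes` are hypothetical, and the estimate budgets one delay slot per topic, which matches the rounded figures above.

```python
import time
from typing import Callable, Iterable

DELAY_SECONDS = 12  # 60 s / 5 requests per minute on the free tier


def process_topics(
    topics: Iterable[str],
    handle: Callable[[str], None],
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call handle(topic) for each topic, pausing between consecutive topics."""
    for index, topic in enumerate(topics):
        if index > 0:
            sleep(delay)  # wait before every topic except the first
        handle(topic)


def estimated_minutes(topic_count: int, delay: float = DELAY_SECONDS) -> float:
    """Upper-bound estimate that budgets one delay slot per topic."""
    return topic_count * delay / 60


# Record pauses instead of sleeping so the demo finishes instantly.
pauses: list[float] = []
process_topics(["a", "b", "c"], handle=print, sleep=pauses.append)
print(pauses, estimated_minutes(10))
```

Passing `sleep` as a parameter keeps the pacing logic testable without real waits.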
Workflow Example
User request: "Research these AI topics using FireCrawl"
Input file (`ai-research.md`):
```markdown
# AI Research Topics
## Spatial Reasoning in Vision-Language Models
- [research] Embodied AI for robotics
- [research] Computer Use Agents
```
**Command:**
```bash
python scripts/firecrawl_research.py ai-research.md ./research_output 5
```
**Output:**
```
research_output/
├── Spatial_Reasoning_in_Vision-Language_Models_20251122_140530.md
├── Embodied_AI_for_robotics_20251122_140542.md
└── Computer_Use_Agents_20251122_140554.md
```
Each file contains:
- Topic title
- Timestamp
- Source count
- Full scraped content from up to 5 sources
- Source URLs
Common Patterns
Pattern 1: Quick Research
Extract topics from existing notes, research them, and save the results to the current folder:
```bash
python scripts/firecrawl_research.py my-notes.md
```
Pattern 2: Organized Research
Create a dedicated output folder for research results:
```bash
python scripts/firecrawl_research.py topics.md ./research_results
```
Pattern 3: Deep Dive
Increase results per topic for comprehensive coverage:
```bash
python scripts/firecrawl_research.py topics.md ./deep_research 10
```
Pattern 4: Obsidian Vault Integration
Direct output to the vault's research folder:
```bash
python scripts/firecrawl_research.py topics.md ~/Brains/brain/Research
```
Error Handling
"API key not found"
Create a `.env` file in the skill folder with `FIRECRAWL_API_KEY=...`
"Rate limit exceeded"
- Free tier: 5 req/min
- Script has 12s delay built-in
- If still hitting limit, reduce topics or wait between runs
"Insufficient credits"
- Check FireCrawl account credits
- Upgrade plan or wait for credit reset
"No topics found"
Add topics to the markdown file using any of these formats:
- `## Header`
- `- [research] Topic`
- `- [search] Topic`
Script Details
Location: `scripts/firecrawl_research.py`
Dependencies:
- `python-dotenv` - Environment variable management
- `requests` - HTTP requests to FireCrawl API
Install dependencies:
```bash
pip install python-dotenv requests
```
FireCrawl Features Used:
- `/v1/search` endpoint - Search with automatic scraping
- `scrapeOptions.formats: ['markdown']` - Markdown output
- `scrapeOptions.onlyMainContent: true` - Filter noise
Academic Writing Templates
This skill includes templates for writing scientific papers in markdown format.
Available Templates
1. Pandoc Scholarly Paper (`assets/templates/pandoc-scholarly-paper.md`)
- Standard academic paper format
- Compatible with Pandoc converter
- Supports citations via BibTeX
- Exports to PDF, DOCX, HTML
2. MyST Scientific Paper (`assets/templates/myst-scientific-paper.md`)
- MyST (Markedly Structured Text) format
- Advanced cross-referencing
- Professional scientific publishing
- Multi-format export (PDF, LaTeX, DOCX)
Using Templates
Copy template to your project:
```bash
cp assets/templates/pandoc-scholarly-paper.md my-paper.md
# or
cp assets/templates/myst-scientific-paper.md my-paper.md
```
**Edit content:**
- Update YAML frontmatter (title, authors, affiliations)
- Write your content in sections
- Add citations using `[@AuthorYear]` (Pandoc) or `` {cite}`AuthorYear` `` (MyST)
**Convert to PDF/DOCX:**
```bash
python scripts/convert_academic.py my-paper.md pdf
python scripts/convert_academic.py my-paper.md docx
python scripts/convert_academic.py my-paper.md pdf --myst  # For MyST
```
Bibliography Generation
Convert FireCrawl research results into BibTeX bibliography entries:
```bash
python scripts/generate_bibliography.py research_output/*.md -o references.bib
```
What it does:
- Extracts URLs and titles from FireCrawl markdown files
- Generates BibTeX `@misc` entries
- Creates citation keys automatically
- Adds access dates
Example workflow:
```bash
# 1. Research topics
python scripts/firecrawl_research.py topics.md ./research
# 2. Generate bibliography
python scripts/generate_bibliography.py research/*.md -o refs.bib
# 3. Copy template
cp assets/templates/pandoc-scholarly-paper.md paper.md
# 4. Edit paper.md (add content, cite sources)
# 5. Convert to PDF
python scripts/convert_academic.py paper.md pdf
```
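To make the generated entries concrete, here is a sketch of formatting one web source as a BibTeX `@misc` entry. The citation-key scheme, the choice of `howpublished` and `note` fields, and both function names are assumptions; the output of `generate_bibliography.py` may differ.

```python
import re
from datetime import date
from urllib.parse import urlparse


def make_citation_key(title: str, url: str, year: int) -> str:
    """Hypothetical key scheme: site name + first title word + year."""
    host = urlparse(url).netloc.removeprefix("www.").split(".")[0]
    words = re.findall(r"[A-Za-z0-9]+", title)
    first_word = words[0].lower() if words else "untitled"
    return f"{host}_{first_word}_{year}"


def bibtex_misc(title: str, url: str, accessed: date) -> str:
    """Format one web source as a BibTeX @misc entry with an access date."""
    key = make_citation_key(title, url, accessed.year)
    return (
        f"@misc{{{key},\n"
        f"  title = {{{title}}},\n"
        f"  howpublished = {{\\url{{{url}}}}},\n"
        f"  note = {{Accessed: {accessed.isoformat()}}}\n"
        f"}}"
    )


print(bibtex_misc("Embodied AI Overview", "https://www.example.com/embodied-ai", date(2025, 11, 22)))
```

The title and URL in the demo call are placeholders, not real search results.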
Citation Examples
Pandoc syntax:
```markdown
Recent research [@Smith2024] shows...
Multiple studies [@Jones2023; @Brown2024] indicate...
```
MyST syntax:
```markdown
Recent research {cite}`Smith2024` shows...
Multiple studies {cite}`Jones2023,Brown2024` indicate...
```
Example Bibliography File
An example bibliography is provided in `assets/references.bib` with common entry types:
- Journal articles (`@article`)
- Conference papers (`@inproceedings`)
- Books (`@book`)
- PhD theses (`@phdthesis`)
- Web resources (`@misc`)
- Preprints (`@article` with arXiv)
Tips
- Organize topics hierarchically - Use `##` for main topics, `###` for subtopics
- Use descriptive names - The topic text becomes the filename, so make it clear
- Batch processing - Group related topics in one file for efficiency
- Output organization - Create separate folders for different research projects
- Content review - Results are truncated at 3000 chars/source for readability
- Academic workflow - Use the bibliography generator to cite research sources in papers
- Template customization - Modify templates for your field's citation style
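The 3000-character cap mentioned under content review could look like the sketch below. `truncate_content` is a hypothetical helper, and appending an ellipsis at the cut is an assumption rather than documented behavior.

```python
def truncate_content(text: str, limit: int = 3000) -> str:
    """Cap text at `limit` characters, marking the cut with an ellipsis."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


print(len(truncate_content("x" * 5000)))
```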
Limitations
- No summarization - Returns raw scraped content, not summaries
- No deduplication - Duplicate sources may appear across topics
- No quality ranking - All results treated equally
- New files only - Does not append to existing files
- Free tier constraints - Rate limiting affects processing speed