firecrawl-research

FireCrawl Research


Overview


Enrich research documents by automatically searching and scraping web sources using the FireCrawl API. Extract research topics from markdown files and generate comprehensive research documents with source material.

When to Use This Skill


Use this skill when the user:
  • Says "Research this topic using FireCrawl"
  • Requests to enrich notes or documents with web sources
  • Wants to gather information about topics listed in a markdown file
  • Needs to search and scrape multiple topics systematically

How It Works


1. Topic Extraction


The script automatically extracts research topics from markdown files using two methods:

**Method 1: Headers**
```markdown
## Spatial Reasoning in AI
## Computer Vision Applications
```
Both `Spatial Reasoning in AI` and `Computer Vision Applications` become research topics.

**Method 2: Research Tags**
```markdown
- [research] Large Language Models for robotics
- [search] Theory of Mind in autonomous driving
```
Both tagged items become research topics.
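
The two extraction methods above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `extract_topics` and both regular expressions are assumptions, including the choice to accept level-2 and level-3 headers while skipping the level-1 title.

```python
import re

# Assumed patterns (not taken from the script): "##"/"###" headers and
# list items tagged with [research] or [search].
HEADER_RE = re.compile(r"^#{2,3}\s+(.+?)\s*$")
TAG_RE = re.compile(r"^\s*[-*]\s*\[(?:research|search)\]\s+(.+?)\s*$", re.IGNORECASE)


def extract_topics(markdown_text: str) -> list[str]:
    """Return header and tagged-item topics in document order."""
    topics: list[str] = []
    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line) or TAG_RE.match(line)
        if match:
            topics.append(match.group(1))
    return topics


sample = """# Notes
## Spatial Reasoning in AI
- [research] Large Language Models for robotics
- [search] Theory of Mind in autonomous driving
"""
print(extract_topics(sample))
```

Matching line by line keeps the sketch simple; it does not try to skip headers that appear inside fenced code blocks.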

2. Search and Scrape


For each topic, the script:
  1. Searches FireCrawl with the topic as query
  2. Retrieves up to N results (default: 5)
  3. Automatically scrapes full content from each result
  4. Extracts markdown-formatted content (main content only)
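
A search request along the lines of the steps above might be assembled as in the sketch below. The `/v1/search` path and the `scrapeOptions` fields come from this document; the base URL `https://api.firecrawl.dev`, the `query` and `limit` field names, the Bearer authorization header, and the helper name `build_search_request` are assumptions about FireCrawl's public API, not details confirmed by the script.

```python
import json

# Assumed base URL; only the "/v1/search" path is named in this document.
FIRECRAWL_SEARCH_URL = "https://api.firecrawl.dev/v1/search"


def build_search_request(topic: str, api_key: str, limit: int = 5) -> dict:
    """Assemble the URL, headers, and JSON body for one topic search."""
    return {
        "url": FIRECRAWL_SEARCH_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "query": topic,    # the topic text is used as the search query
            "limit": limit,    # up to N results (default: 5)
            "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
        },
    }


request = build_search_request("Spatial Reasoning in AI", "fc-your-actual-api-key", limit=3)
print(json.dumps(request["json"], indent=2))
```

With the `requests` library, the resulting dictionary could be sent via `requests.post(request["url"], headers=request["headers"], json=request["json"])`; the sketch stops short of the network call so it runs without an API key.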

3. Output Generation


Creates new markdown files in the specified output directory:
  • One file per topic
  • Filename: `{topic}_{timestamp}.md`
  • Contains: title, date, sources count, full scraped content
  • Each source includes: title, URL, markdown content
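
One way the filename and file body described above could be produced is sketched below. The helper names, the sanitization rule, and the exact layout of the rendered markdown are assumptions for illustration; only the `{topic}_{timestamp}.md` pattern and the listed fields come from this document.

```python
import re
from datetime import datetime


def build_filename(topic: str, now: datetime) -> str:
    """Build '{topic}_{timestamp}.md' for one topic."""
    # Assumption: keep word characters and hyphens, turn anything else into "_".
    safe_topic = re.sub(r"[^\w-]+", "_", topic).strip("_")
    return f"{safe_topic}_{now.strftime('%Y%m%d_%H%M%S')}.md"


def render_document(topic: str, sources: list[dict], now: datetime) -> str:
    """Render title, date, source count, then each source's title, URL, and content."""
    lines = [
        f"# {topic}",
        "",
        f"Date: {now.strftime('%Y-%m-%d %H:%M:%S')}",
        f"Sources: {len(sources)}",
        "",
    ]
    for source in sources:
        lines += [f"## {source['title']}", "", f"URL: {source['url']}", "", source["markdown"], ""]
    return "\n".join(lines)


stamp = datetime(2025, 11, 22, 14, 5, 30)
print(build_filename("Spatial Reasoning in Vision-Language Models", stamp))
```

Passing the timestamp in as a parameter, rather than calling `datetime.now()` inside the helper, keeps the output reproducible.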

Usage


Basic Usage


```bash
python scripts/firecrawl_research.py research.md
```
Outputs to the current directory.

Specify Output Directory


```bash
python scripts/firecrawl_research.py research.md ./output
```
Creates files in the `./output/` folder.

Limit Results Per Topic


```bash
python scripts/firecrawl_research.py research.md ./output 3
```
Retrieves a maximum of 3 results per topic.
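
The three positional arguments shown in the usage examples could be handled as in the following sketch. Whether the real script uses `argparse` or reads `sys.argv` directly is not stated here, so the parser, the argument names, and the help strings are all hypothetical; the defaults (current directory, 5 results) follow this document.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse: <input_file> [output_dir] [max_results]."""
    parser = argparse.ArgumentParser(prog="firecrawl_research.py")
    parser.add_argument("input_file", help="markdown file containing topics")
    parser.add_argument("output_dir", nargs="?", default=".",
                        help="directory for generated files (default: current directory)")
    parser.add_argument("max_results", nargs="?", type=int, default=5,
                        help="maximum results per topic (default: 5)")
    return parser.parse_args(argv)


args = parse_args(["research.md", "./output", "3"])
print(args.input_file, args.output_dir, args.max_results)
```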

Configuration


API Key Setup


  1. Copy `.env.example` to `.env`:
    ```bash
    cp .env.example .env
    ```
  2. Add your FireCrawl API key to `.env`:
    ```
    FIRECRAWL_API_KEY=fc-your-actual-api-key
    ```
The script automatically loads the API key from the skill's `.env` file.

Rate Limiting


The script includes automatic rate limiting for FireCrawl's free tier:
  • Free tier limit: 5 requests/minute
  • Built-in delay: 12 seconds between topics
  • Prevents API errors and credit exhaustion
When processing multiple topics, expect:
  • 5 topics: ~1 minute
  • 10 topics: ~2 minutes
  • 20 topics: ~4 minutes
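
The 12-second delay follows from the free-tier limit (60 seconds / 5 requests), and the time estimates above are simply the topic count multiplied by that delay. The sketch below shows the arithmetic and one possible throttled loop. The function names are hypothetical, and the estimate ignores the time each request itself takes.

```python
import time
from typing import Callable, Iterable

REQUESTS_PER_MINUTE = 5                    # free tier limit stated above
DELAY_SECONDS = 60 / REQUESTS_PER_MINUTE   # 12.0 seconds between topics


def estimate_minutes(topic_count: int) -> float:
    """Approximate run time in minutes, counting only the built-in delay."""
    return topic_count * DELAY_SECONDS / 60


def process_with_delay(
    topics: Iterable[str],
    handle: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call handle(topic) for each topic, pausing between consecutive topics."""
    for index, topic in enumerate(topics):
        if index > 0:
            sleep(DELAY_SECONDS)   # no pause is needed before the first topic
        handle(topic)


for count in (5, 10, 20):
    print(count, "topics:", estimate_minutes(count), "minutes")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without actually waiting.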

Workflow Example


User request: "Research these AI topics using FireCrawl"

Input file (`ai-research.md`):
```markdown
# AI Research Topics

## Spatial Reasoning in Vision-Language Models

- [research] Embodied AI for robotics
- [research] Computer Use Agents
```

**Command:**
```bash
python scripts/firecrawl_research.py ai-research.md ./research_output 5
```

**Output:**
```
research_output/
├── Spatial_Reasoning_in_Vision-Language_Models_20251122_140530.md
├── Embodied_AI_for_robotics_20251122_140542.md
└── Computer_Use_Agents_20251122_140554.md
```

Each file contains:
  • Topic title
  • Timestamp
  • Source count
  • Full scraped content from up to 5 sources
  • Source URLs

Common Patterns


Pattern 1: Quick Research


Extract topics from existing notes, research them, and save the results to the current folder:
```bash
python scripts/firecrawl_research.py my-notes.md
```

Pattern 2: Organized Research


Create a dedicated output folder for research results:
```bash
python scripts/firecrawl_research.py topics.md ./research_results
```

Pattern 3: Deep Dive


Increase results per topic for comprehensive coverage:
```bash
python scripts/firecrawl_research.py topics.md ./deep_research 10
```

Pattern 4: Obsidian Vault Integration


Write output directly to the vault's research folder:
```bash
python scripts/firecrawl_research.py topics.md ~/Brains/brain/Research
```

Error Handling

"API key not found"

Create a `.env` file in the skill folder containing `FIRECRAWL_API_KEY=...`.

"Rate limit exceeded"

  • Free tier: 5 requests/minute
  • The script has a built-in 12-second delay
  • If you still hit the limit, reduce the number of topics or wait between runs

"Insufficient credits"

  • Check your FireCrawl account credits
  • Upgrade your plan or wait for the credit reset

"No topics found"

Add topics to the markdown file using one of:
  • `## Header` format
  • `- [research] Topic` format
  • `- [search] Topic` format

Script Details


Location: `scripts/firecrawl_research.py`

Dependencies:
  • `python-dotenv` - Environment variable management
  • `requests` - HTTP requests to FireCrawl API

Install dependencies:
```bash
pip install python-dotenv requests
```

FireCrawl Features Used:
  • `/v1/search` endpoint - Search with automatic scraping
  • `scrapeOptions.formats: ['markdown']` - Markdown output
  • `scrapeOptions.onlyMainContent: true` - Filter noise

Academic Writing Templates


This skill includes templates for writing scientific papers in markdown format.

Available Templates


1. Pandoc Scholarly Paper (`assets/templates/pandoc-scholarly-paper.md`)
  • Standard academic paper format
  • Compatible with Pandoc converter
  • Supports citations via BibTeX
  • Exports to PDF, DOCX, HTML
2. MyST Scientific Paper (`assets/templates/myst-scientific-paper.md`)
  • MyST (Markedly Structured Text) format
  • Advanced cross-referencing
  • Professional scientific publishing
  • Multi-format export (PDF, LaTeX, DOCX)

Using Templates


**Copy template to your project:**
```bash
cp assets/templates/pandoc-scholarly-paper.md my-paper.md
# or
cp assets/templates/myst-scientific-paper.md my-paper.md
```

**Edit content:**
- Update YAML frontmatter (title, authors, affiliations)
- Write your content in sections
- Add citations using `[@AuthorYear]` (Pandoc) or `` {cite}`AuthorYear` `` (MyST)

**Convert to PDF/DOCX:**
```bash
python scripts/convert_academic.py my-paper.md pdf
python scripts/convert_academic.py my-paper.md docx
python scripts/convert_academic.py my-paper.md pdf --myst  # For MyST
```

Bibliography Generation


Convert FireCrawl research results into BibTeX bibliography entries:
```bash
python scripts/generate_bibliography.py research_output/*.md -o references.bib
```
What it does:
  • Extracts URLs and titles from FireCrawl markdown files
  • Generates BibTeX `@misc` entries
  • Creates citation keys automatically
  • Adds access dates
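
To make the steps above concrete, the sketch below formats one `@misc` entry from a title, a URL, and an access date. It is only an illustration: the citation-key scheme (first word of the title plus the access year), the use of a `note` field for the access date, and both function names are assumptions, not the behavior of `generate_bibliography.py`.

```python
import re
from datetime import date


def make_citation_key(title: str, year: int) -> str:
    """Assumed key scheme: first alphanumeric word of the title plus the year."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    first = words[0].capitalize() if words else "Source"
    return f"{first}{year}"


def make_misc_entry(title: str, url: str, accessed: date) -> str:
    """Format a BibTeX @misc entry with title, URL, and access date."""
    key = make_citation_key(title, accessed.year)
    return (
        f"@misc{{{key},\n"
        f"  title = {{{title}}},\n"
        f"  url = {{{url}}},\n"
        f"  note = {{Accessed: {accessed.isoformat()}}}\n"
        f"}}"
    )


print(make_misc_entry("Spatial reasoning survey", "https://example.com/a", date(2025, 11, 22)))
```

A key built this way can collide when two sources share a first word and year, so a real generator would likely need a disambiguating suffix.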
Example workflow:
```bash
# 1. Research topics
python scripts/firecrawl_research.py topics.md ./research

# 2. Generate bibliography
python scripts/generate_bibliography.py research/*.md -o refs.bib

# 3. Copy template
cp assets/templates/pandoc-scholarly-paper.md paper.md

# 4. Edit paper.md (add content, cite sources)

# 5. Convert to PDF
python scripts/convert_academic.py paper.md pdf
```

Citation Examples


Pandoc syntax:
```markdown
Recent research [@Smith2024] shows...
Multiple studies [@Jones2023; @Brown2024] indicate...
```
MyST syntax:
```markdown
Recent research {cite}`Smith2024` shows...
Multiple studies {cite}`Jones2023,Brown2024` indicate...
```

Example Bibliography File


An example bibliography is provided in `assets/references.bib` with common entry types:
  • Journal articles (`@article`)
  • Conference papers (`@inproceedings`)
  • Books (`@book`)
  • PhD theses (`@phdthesis`)
  • Web resources (`@misc`)
  • Preprints (`@article` with arXiv)

Tips


  1. Organize topics hierarchically - Use `##` for main topics, `###` for subtopics
  2. Use descriptive names - Topic text becomes the filename, so make it clear
  3. Batch processing - Group related topics in one file for efficiency
  4. Output organization - Create separate folders for different research projects
  5. Content review - Results are truncated at 3000 chars/source for readability
  6. Academic workflow - Use the bibliography generator to cite research sources in papers
  7. Template customization - Modify templates for your field's citation style

Limitations


  • No summarization - Returns raw scraped content, not summaries
  • No deduplication - Duplicate sources may appear across topics
  • No quality ranking - All results treated equally
  • New files only - Does not append to existing files
  • Free tier constraints - Rate limiting affects processing speed