firecrawl-research

FireCrawl Research

Overview

Enrich research documents by automatically searching and scraping web sources using the FireCrawl API. Extract research topics from markdown files and generate comprehensive research documents with source material.

When to Use This Skill

Use this skill when the user:

Says "Research this topic using FireCrawl"
Requests to enrich notes or documents with web sources
Wants to gather information about topics listed in a markdown file
Needs to search and scrape multiple topics systematically

How It Works

1. Topic Extraction

The script automatically extracts research topics from markdown files using two methods:

**Method 1: Headers**
```markdown
## Spatial Reasoning in AI

## Computer Vision Applications
```

Both `Spatial Reasoning in AI` and `Computer Vision Applications` become research topics.

**Method 2: Research Tags**
```markdown
- [research] Large Language Models for robotics
- [search] Theory of Mind in autonomous driving
```

Both tagged items become research topics.
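
The two extraction methods could be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `extract_topics`, the regular expressions, and the choice to treat `##` and deeper headers (but not a top-level `#` title) as topics are all assumptions.

```python
import re

# Assumed patterns: level-2+ headers and "- [research]" / "- [search]" list items.
HEADER_RE = re.compile(r"^#{2,6}\s+(.+?)\s*$")
TAG_RE = re.compile(r"^\s*[-*]\s*\[(?:research|search)\]\s+(.+?)\s*$", re.IGNORECASE)


def extract_topics(markdown_text: str) -> list[str]:
    """Return topics in document order, skipping exact repeats."""
    topics: list[str] = []
    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line) or TAG_RE.match(line)
        if match and match.group(1) not in topics:
            topics.append(match.group(1))
    return topics


sample = """# AI Research Topics

## Spatial Reasoning in AI

- [research] Large Language Models for robotics
- [search] Theory of Mind in autonomous driving
"""
print(extract_topics(sample))
```

In this sketch the `# AI Research Topics` title is skipped, while the `##` header and both tagged items are returned as topics.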

2. Search and Scrape

For each topic:

Searches FireCrawl with the topic as query
Retrieves up to N results (default: 5)
Automatically scrapes full content from each result
Extracts markdown-formatted content (main content only)

3. Output Generation

Creates new markdown files in the specified output directory:

One file per topic
Filename: `{topic}_{timestamp}.md`
Contains: title, date, sources count, full scraped content
Each source includes: title, URL, markdown content
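
One way the `{topic}_{timestamp}.md` filename might be built is sketched below. The helper name `make_output_filename` and the sanitization rules are assumptions; the timestamp layout (`%Y%m%d_%H%M%S`) is inferred from the sample filenames shown in this document's workflow example, not confirmed from the script.

```python
import re
from datetime import datetime


def make_output_filename(topic: str, when: datetime) -> str:
    """Build `{topic}_{timestamp}.md` with a filesystem-friendly topic part."""
    # Assumed sanitization: whitespace -> underscores, then drop anything
    # that is not a word character or a hyphen.
    safe_topic = re.sub(r"\s+", "_", topic.strip())
    safe_topic = re.sub(r"[^\w\-]", "", safe_topic)
    timestamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{safe_topic}_{timestamp}.md"


print(make_output_filename("Embodied AI for robotics", datetime(2025, 11, 22, 14, 5, 42)))
# Embodied_AI_for_robotics_20251122_140542.md
```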

Usage

Basic Usage

```bash
python scripts/firecrawl_research.py research.md
```

Outputs to the current directory.

Specify Output Directory

```bash
python scripts/firecrawl_research.py research.md ./output
```

Creates files in the `./output/` folder.

Limit Results Per Topic

```bash
python scripts/firecrawl_research.py research.md ./output 3
```

Retrieves at most 3 results per topic.
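
The three invocation forms share one positional layout: input file, optional output directory, optional result limit. A minimal sketch of parsing that layout is shown below; `parse_cli` is a hypothetical helper, and the real script may handle its arguments differently.

```python
from pathlib import Path


def parse_cli(argv: list[str]) -> tuple[Path, Path, int]:
    """Parse `<input.md> [output_dir] [max_results]` (script name excluded)."""
    if not 1 <= len(argv) <= 3:
        raise SystemExit("usage: firecrawl_research.py <input.md> [output_dir] [max_results]")
    input_file = Path(argv[0])
    # Defaults follow the documented behavior: current directory, 5 results.
    output_dir = Path(argv[1]) if len(argv) > 1 else Path(".")
    max_results = int(argv[2]) if len(argv) > 2 else 5
    return input_file, output_dir, max_results


print(parse_cli(["research.md", "./output", "3"]))
```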

Configuration

API Key Setup

Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```

Add FireCrawl API key:

`FIRECRAWL_API_KEY=fc-your-actual-api-key`

The script automatically loads the API key from the skill's `.env` file.
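
For illustration, the key lookup could look roughly like the sketch below. It uses a small standard-library parser as a stand-in for `python-dotenv` so that it runs without extra packages; the helper names `load_env_file` and `get_api_key` are assumptions, not the script's real functions.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_path: Path) -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("FIRECRAWL_API_KEY")
    if not key and env_path.is_file():
        key = load_env_file(env_path).get("FIRECRAWL_API_KEY")
    if not key:
        raise RuntimeError("API key not found: set FIRECRAWL_API_KEY in .env")
    return key


# Demo with a throwaway .env file so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("FIRECRAWL_API_KEY=fc-your-actual-api-key\n", encoding="utf-8")
    print(load_env_file(env_path))
```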

Rate Limiting

The script includes automatic rate limiting for FireCrawl's free tier:

Free tier limit: 5 requests/minute
Built-in delay: 12 seconds between topics
Prevents API errors and credit exhaustion

When processing multiple topics, expect:

5 topics: ~1 minute
10 topics: ~2 minutes
20 topics: ~4 minutes
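
The 12-second figure follows from the limit: 60 seconds / 5 requests = 12 seconds per request. A sketch of a fixed-delay loop and the resulting wait estimate is shown below; `throttled` and `minimum_wait_seconds` are illustrative names, and the estimate counts only the pauses, not network or scraping time.

```python
import time
from typing import Callable, Iterable, Iterator

DELAY_SECONDS = 12.0  # 60 s / 5 requests per minute on the free tier


def throttled(topics: Iterable[str], delay: float = DELAY_SECONDS,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield topics one at a time, pausing `delay` seconds between them."""
    for index, topic in enumerate(topics):
        if index > 0:
            sleep(delay)  # no pause before the first topic
        yield topic


def minimum_wait_seconds(topic_count: int, delay: float = DELAY_SECONDS) -> float:
    """Total pause time only; request time comes on top of this."""
    return max(topic_count - 1, 0) * delay


for count in (5, 10, 20):
    print(count, "topics:", minimum_wait_seconds(count) / 60, "minutes of waiting")
```

The pauses alone come to about 0.8, 1.8, and 3.8 minutes for 5, 10, and 20 topics, which matches the rounded estimates once request time is added.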

Workflow Example

**User request:** "Research these AI topics using FireCrawl"

**Input file (`ai-research.md`):**
```markdown
# AI Research Topics

## Spatial Reasoning in Vision-Language Models

- [research] Embodied AI for robotics
- [research] Computer Use Agents
```

**Command:**
```bash
python scripts/firecrawl_research.py ai-research.md ./research_output 5
```

**Output:**
```
research_output/
├── Spatial_Reasoning_in_Vision-Language_Models_20251122_140530.md
├── Embodied_AI_for_robotics_20251122_140542.md
└── Computer_Use_Agents_20251122_140554.md
```

Each file contains:

Topic title
Timestamp
Source count
Full scraped content from up to 5 sources
Source URLs

Common Patterns

Pattern 1: Quick Research

Extract topics from existing notes, research them, save to current folder:

```bash
python scripts/firecrawl_research.py my-notes.md
```

Pattern 2: Organized Research

Create dedicated output folder for research results:

```bash
python scripts/firecrawl_research.py topics.md ./research_results
```

Pattern 3: Deep Dive

Increase results per topic for comprehensive coverage:

```bash
python scripts/firecrawl_research.py topics.md ./deep_research 10
```

Pattern 4: Obsidian Vault Integration

Direct output to vault's research folder:

```bash
python scripts/firecrawl_research.py topics.md ~/Brains/brain/Research
```

Error Handling

"API key not found"

Create a `.env` file in the skill folder with `FIRECRAWL_API_KEY=...`

"Rate limit exceeded"

Free tier: 5 req/min
Script has 12s delay built-in
If still hitting limit, reduce topics or wait between runs

"Insufficient credits"

Check FireCrawl account credits
Upgrade plan or wait for credit reset

"No topics found"

Add topics to the markdown file using any of these formats:

`## Header format`
`- [research] Topic format`
`- [search] Topic format`

Script Details

Location: `scripts/firecrawl_research.py`

Dependencies:

`python-dotenv` - Environment variable management
`requests` - HTTP requests to FireCrawl API

Install dependencies:

```bash
pip install python-dotenv requests
```

FireCrawl Features Used:

`/v1/search` endpoint - Search with automatic scraping
`scrapeOptions.formats: ['markdown']` - Markdown output
`scrapeOptions.onlyMainContent: true` - Filter noise
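
To make these options concrete, here is a hedged sketch of how a single search request might be assembled. Only the `/v1/search` path and the two `scrapeOptions` fields come from this document; the host name, the `query` and `limit` field names, the bearer-token header, and the helper `build_search_request` are assumptions to check against the FireCrawl API reference. The sketch builds the request without sending it.

```python
import json

# Assumption: the public FireCrawl API host; the text only names the /v1/search path.
SEARCH_URL = "https://api.firecrawl.dev/v1/search"


def build_search_request(topic: str, api_key: str, limit: int = 5) -> tuple[str, dict, dict]:
    """Return (url, headers, json_body) for one topic search with scraping enabled."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "query": topic,   # assumed field name for the search text
        "limit": limit,   # assumed field name for the result cap
        "scrapeOptions": {
            "formats": ["markdown"],   # markdown output
            "onlyMainContent": True,   # filter navigation and other noise
        },
    }
    return SEARCH_URL, headers, body


url, headers, body = build_search_request("Embodied AI for robotics", "fc-your-actual-api-key")
print(url)
print(json.dumps(body, indent=2))
```

With the `requests` dependency installed, the tuple could be sent as `requests.post(url, headers=headers, json=body, timeout=60)`.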

Academic Writing Templates

This skill includes templates for writing scientific papers in markdown format.

Available Templates

1. Pandoc Scholarly Paper (`assets/templates/pandoc-scholarly-paper.md`)

Standard academic paper format
Compatible with Pandoc converter
Supports citations via BibTeX
Exports to PDF, DOCX, HTML

2. MyST Scientific Paper (`assets/templates/myst-scientific-paper.md`)

MyST (Markedly Structured Text) format
Advanced cross-referencing
Professional scientific publishing
Multi-format export (PDF, LaTeX, DOCX)

Using Templates

**Copy template to your project:**
```bash
cp assets/templates/pandoc-scholarly-paper.md my-paper.md
# or
cp assets/templates/myst-scientific-paper.md my-paper.md
```

**Edit content:**
- Update YAML frontmatter (title, authors, affiliations)
- Write your content in sections
- Add citations using `[@AuthorYear]` (Pandoc) or ``{cite}`AuthorYear` `` (MyST)

**Convert to PDF/DOCX:**
```bash
python scripts/convert_academic.py my-paper.md pdf
python scripts/convert_academic.py my-paper.md docx
python scripts/convert_academic.py my-paper.md pdf --myst  # For MyST
```

Bibliography Generation

Convert FireCrawl research results into BibTeX bibliography entries:

```bash
python scripts/generate_bibliography.py research_output/*.md -o references.bib
```

What it does:

Extracts URLs and titles from FireCrawl markdown files
Generates BibTeX `@misc` entries
Creates citation keys automatically
Adds access dates

Example workflow:

```bash
# 1. Research topics
python scripts/firecrawl_research.py topics.md ./research

# 2. Generate bibliography
python scripts/generate_bibliography.py research/*.md -o refs.bib

# 3. Copy template
cp assets/templates/pandoc-scholarly-paper.md paper.md

# 4. Edit paper.md (add content, cite sources)

# 5. Convert to PDF
python scripts/convert_academic.py paper.md pdf
```
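
As a rough picture of what a generated `@misc` entry with an access date could look like, consider the sketch below. The function `make_misc_entry`, the citation-key scheme, the choice of `howpublished` and `note` fields, and the sample title and URL are all hypothetical; the actual output of `generate_bibliography.py` may differ.

```python
import re
from datetime import date


def make_misc_entry(title: str, url: str, accessed: date) -> str:
    """Format one web source as a BibTeX @misc entry."""
    # Hypothetical key scheme: first alphanumeric word of the title plus the access year.
    words = re.findall(r"[A-Za-z0-9]+", title)
    key = f"{words[0] if words else 'source'}{accessed.year}"
    return (
        f"@misc{{{key},\n"
        f"  title = {{{title}}},\n"
        f"  howpublished = {{\\url{{{url}}}}},\n"
        f"  note = {{Accessed: {accessed.isoformat()}}}\n"
        f"}}"
    )


print(make_misc_entry("Embodied AI: A Survey", "https://example.com/embodied-ai", date(2025, 11, 22)))
```

An entry keyed this way would be cited as `[@Embodied2025]` in Pandoc syntax.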

Citation Examples

Pandoc syntax:

```markdown
Recent research [@Smith2024] shows...
Multiple studies [@Jones2023; @Brown2024] indicate...
```

MyST syntax:

```markdown
Recent research {cite}`Smith2024` shows...
Multiple studies {cite}`Jones2023,Brown2024` indicate...
```

Example Bibliography File

An example bibliography is provided in `assets/references.bib` with common entry types:

Journal articles (`@article`)
Conference papers (`@inproceedings`)
Books (`@book`)
PhD theses (`@phdthesis`)
Web resources (`@misc`)
Preprints (`@article` with arXiv)

Tips

Organize topics hierarchically - Use `##` for main topics, `###` for subtopics
Use descriptive names - Topic text becomes the filename, so make it clear
Batch processing - Group related topics in one file for efficiency
Output organization - Create separate folders for different research projects
Content review - Results are truncated at 3000 chars/source for readability
Academic workflow - Use the bibliography generator to cite research sources in papers
Template customization - Modify templates for your field's citation style

Limitations

No summarization - Returns raw scraped content, not summaries
No deduplication - Duplicate sources may appear across topics
No quality ranking - All results treated equally
New files only - Does not append to existing files
Free tier constraints - Rate limiting affects processing speed
