firecrawl-research-index

Firecrawl Research Index

Find the research papers that answer a research query. Some questions have a single answer, but many have several. When in doubt, lean toward returning the fuller relevant set (most relevant first) rather than narrowing to one: a reader is better served by seeing the neighboring methods and papers than by having them silently dropped.
There is no fixed recipe. Read the query, decide what kind it is, and choose the matching approach below. Some queries need a single search; others need heavy structural/semantic expansion. Don't run machinery a query doesn't call for.

The tools, and what each is uniquely good at

  • MCP:
    firecrawl_research_search_papers(query, k?)
    CLI:
    firecrawl research search-papers <query> [--k <number>]
    Semantic (HyDE) search over abstracts. The natural first move for almost any query. If results look thin or all-alike, re-run with a different framing (sibling domain, rival method, dataset/benchmark name) rather than giving up.
  • MCP:
    firecrawl_research_related_papers(seed_ids, intent, mode?, k?)
    CLI:
    firecrawl research related-papers <seedIds...> --intent <intent> [--mode <similar|citers|references>] [--k <number>]
    Semantic and structural expansion, ranked to your intent. This reaches papers that semantic search cannot, and it's how you turn one good hit into the rest of a set. Modes: mode=similar → niche siblings; citers → who uses/builds on the seeds; references → what the seeds build on / compare against.
  • MCP:
    firecrawl_research_inspect_paper(id)
    CLI:
    firecrawl research inspect-paper <id>
    Canonical metadata for one paper: title, abstract, authors, categories, source ids, and dates. Use it after search_papers or related_papers when you need the complete citation/metadata for a candidate, or when you have an id from elsewhere and need to confirm which paper it resolves to. This does not read the paper body; use read_paper for specific full-text questions.
  • MCP:
    firecrawl_research_read_paper(id, question)
    CLI:
    firecrawl research read-paper <id> --question <question>
    Retrieves in-body passages from one paper, to verify a load-bearing constraint (a method actually used, a score actually reported, an affiliation, what a paper compares to). Use it to settle a specific doubt, not as a routine check on every candidate.
  • MCP:
    firecrawl_search(query) / firecrawl_scrape(url)
    CLI:
    firecrawl search <query> / firecrawl scrape <url>
    General web search and page fetch, for facts that don't live in paper abstracts: benchmark leaderboards, rankings, "who scores best / is largest / is most used." Find the ranking on the web, then map the top entries back to papers with search_papers. Reach for these only when the corpus can't answer the question on its own.
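The CLI forms above map directly onto argument lists. As a minimal sketch, assuming you drive the CLI from Python, the helpers below only build the argv for each research command using the arguments and flags listed above. The function names are hypothetical, and nothing here runs the command or parses its output, since the output format isn't described in this guide.

```python
from typing import Optional, Sequence

# Hypothetical helpers: each returns the argv for one `firecrawl research` command,
# using only the positional arguments and flags listed in the tool descriptions.


def search_papers_argv(query: str, k: Optional[int] = None) -> list[str]:
    argv = ["firecrawl", "research", "search-papers", query]
    if k is not None:
        argv += ["--k", str(k)]
    return argv


def related_papers_argv(
    seed_ids: Sequence[str],
    intent: str,
    mode: Optional[str] = None,
    k: Optional[int] = None,
) -> list[str]:
    if not seed_ids:
        raise ValueError("related-papers needs at least one seed id")
    if mode is not None and mode not in ("similar", "citers", "references"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Seed ids are variadic positionals; --intent is required.
    argv = ["firecrawl", "research", "related-papers", *seed_ids, "--intent", intent]
    if mode is not None:
        argv += ["--mode", mode]
    if k is not None:
        argv += ["--k", str(k)]
    return argv


def inspect_paper_argv(paper_id: str) -> list[str]:
    return ["firecrawl", "research", "inspect-paper", paper_id]


def read_paper_argv(paper_id: str, question: str) -> list[str]:
    return ["firecrawl", "research", "read-paper", paper_id, "--question", question]
```

Passing one of these lists to `subprocess.run(argv, check=True)` would execute the command, assuming the firecrawl CLI is installed and on PATH. Building a list rather than a shell string keeps multi-word queries and questions intact without any quoting.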

Match the approach to the query

  • Single named paper ("the Qwen3 report") → one search_papers call, done. This is the only case that truly wants exactly one paper.
  • Paper by description / by method or technique ("the paper that introduced X", "training-free N-gram detection of AI text") → find the best match, then assume there's a family: expand with related_papers and include the closely related methods/papers too. Even when one paper is the exact literal match, surface and keep its neighbors; don't narrow to the single best hit and argue the rest away. Treat it as a one-answer query only if the query names a specific paper.
  • Enumeration / method-family ("papers that do X", "alternatives to Adam", "benchmarks for Y") → the answer is a set, and this is where related_papers earns its keep: expand several strong anchors with mode=similar, then re-seed from new strong hits. One search is never enough here.
  • Exhibiting ("papers that use / exhibit property P") → the relevant papers apply P, but their abstracts may not describe it. Go outward from P's defining paper via citers / references, and use read_paper to confirm that a candidate actually uses P.
  • Superlative / leaderboard ("best on benchmark X", "largest", "most popular") → the ranking lives on leaderboards and the web, not in any single abstract. Use firecrawl_search / firecrawl_scrape to find the benchmark's leaderboard or rankings, read off the top models/papers, then run search_papers on each to get its paper. As a fallback, search for the benchmark and use read_paper on candidates to check their reported numbers. This is the hardest kind of query, so cast wide.
  • Org / author filtered ("from <org>", "by <author>") → a topical match isn't enough; verify the affiliation/authorship (via inspect_paper metadata or read_paper) before keeping a paper.
  • Compare-against ("what does paper X benchmark against / build on") → the answer is inside paper X: use read_paper(X, ...) or related_papers([X], ..., mode="references").
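For enumeration and method-family queries, the search, expand, re-seed loop described above can be sketched as follows. This is a minimal illustration of the strategy, not how the tools work internally. `search` and `related` are hypothetical stand-ins: `search` for search_papers, and `related` for related_papers with a fixed intent and mode=similar. Both are assumed to return (paper_id, score) pairs. The score scale, the "strong" threshold, the anchor count, and the round limit are all illustrative assumptions.

```python
from typing import Callable

# (paper_id, relevance score). The score scale is an assumption; the tools'
# actual output format isn't specified in this guide.
Hit = tuple[str, float]


def expand_family(
    query: str,
    search: Callable[[str], list[Hit]],         # hypothetical stand-in for search_papers
    related: Callable[[list[str]], list[Hit]],  # hypothetical stand-in for related_papers(mode="similar")
    strong: float = 0.7,   # assumed cut-off for a hit worth re-seeding from
    anchors: int = 3,      # how many strong hits to expand per round
    max_rounds: int = 3,   # at most this many expansion calls
) -> list[str]:
    """Search once, then repeatedly expand from strong hits; keep everything found."""
    best: dict[str, float] = {}

    def absorb(hits: list[Hit]) -> None:
        # Record each paper, keeping the highest score it has been seen with.
        for pid, score in hits:
            best[pid] = max(score, best.get(pid, score))

    absorb(search(query))
    expanded: set[str] = set()
    for _ in range(max_rounds):
        # Anchors: the strongest hits that haven't been expanded yet.
        ranked = sorted(best, key=lambda p: (-best[p], p))
        seeds = [p for p in ranked if best[p] >= strong and p not in expanded][:anchors]
        if not seeds:
            break  # no new strong hits to re-seed from
        expanded.update(seeds)
        absorb(related(seeds))
    # When in doubt, include: return every paper found, most relevant first.
    return sorted(best, key=lambda p: (-best[p], p))
```

Note that weak hits are never expanded, but they are never dropped either. The threshold only decides where to spend further related_papers calls, not what reaches the reader.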

Principles

  • When in doubt, include. For any topic / method / comparison question, return the relevant family, not just the single best match; err toward keeping a plausibly relevant paper rather than dropping it. The neighboring methods are part of a good answer; don't argue close work away just because one paper is the most exact match.
  • Follow the literature, and keep what you find. The seminal source, the competing methods, and the close neighbors are usually one hop away: use related_papers and include them, not just the first hit. Stopping at one good result is the most common way to leave the reader with half an answer.
  • Verify to exclude, not to gatekeep. Use read_paper to rule a paper out when a hard constraint clearly fails (wrong org/author, doesn't actually report the score). When a paper is plausibly relevant, lean toward keeping it rather than demanding proof.
  • Only drop the clearly off-topic. Don't pad the answer with papers you're confident are unrelated, but being confident a paper is unrelated is a high bar; most plausibly relevant work should make the cut.
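The "verify to exclude, not to gatekeep" rule works like a three-valued check. A paper is dropped only when a hard constraint clearly fails. A constraint that is unchecked or inconclusive keeps the paper. Below is a minimal sketch of that filter. The `Verdict` type and the `authored_by` check are hypothetical illustrations, not part of the tools. The author data is assumed to be metadata you have already gathered, for example from inspect_paper.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASSES = "passes"
    FAILS = "fails"       # clear evidence the hard constraint is violated
    UNKNOWN = "unknown"   # not checked yet, or the evidence is inconclusive


def filter_candidates(
    candidates: list[str],
    checks: list[Callable[[str], Verdict]],
) -> list[str]:
    """Drop a paper only if some hard constraint clearly FAILS; keep the rest, in order."""
    return [
        paper
        for paper in candidates
        if not any(check(paper) is Verdict.FAILS for check in checks)
    ]


def authored_by(name: str, known_authors: dict[str, set[str]]) -> Callable[[str], Verdict]:
    """Hypothetical author constraint over metadata gathered earlier."""
    def check(paper: str) -> Verdict:
        if paper not in known_authors:
            return Verdict.UNKNOWN  # no metadata: absence of proof is not grounds to exclude
        return Verdict.PASSES if name in known_authors[paper] else Verdict.FAILS
    return check
```

Because `any()` stops at the first failing check, listing cheap metadata checks before read_paper-backed ones would keep full-text reads to a minimum.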