apify-link-prospecting-outreach

Link Prospecting Outreach

Turn a goal + a target keyword + a URL the user wants to promote into a tiered, ready-to-send outreach list: SERP-ranking prospects with Ahrefs-scored authority, the strongest pitch angle per prospect, an outreach-type-matched email draft, and a copy-paste-ready link placement.

Prerequisites

(No need to check it upfront)
  • `.env` file with `APIFY_TOKEN`
  • Ahrefs MCP available (the skill calls `mcp__claude_ai_Ahrefs__*` tools for prospect scoring)
  • Node.js 20.6+ (for native `--env-file` support)
  • One-time setup inside the skill's `scripts/` folder: `npm install`

Helper scripts (one config, four steps)

After Step 1–2 inputs are collected, write them to a single `campaign.json` (schema in `campaign.json.example`). Every downstream script reads `--config campaign.json`, so the agent doesn't fork per-campaign copies. Sequence:
bash
# 1. Run the Actor (writes {base}.json + sub-Actor sidecars when --fetch-sub-datasets)
node --env-file=.env scripts/run_actor.js --actor "apify/link-prospecting-tool" --input '<json>' --timeout 1800 --fetch-sub-datasets --output {base}.json --format json

# 2. Build unified prospect table from the sidecars
python3 scripts/build_prospects.py --config campaign.json

# 3. (After Step 5 Ahrefs MCP calls → save to {base}_ahrefs_domain.json + {base}_ahrefs_page.json)
python3 scripts/enrich_prospects.py --config campaign.json

# 4. (After Step 8 sub-agents write outputs to /tmp/placement_outputs/row_*.json)
python3 scripts/merge_subagent_outputs.py --config campaign.json --outputs-dir /tmp/placement_outputs

# 5. Write the final xlsx + metadata sidecar
python3 scripts/write_xlsx.py --config campaign.json

If the runner's client-side wait elapses with the Actor still running on Apify, use `scripts/fetch_run_artifacts.js --run-id <id> --output {base}.json` instead of restarting. If the parent run is missing `SUB_ACTOR_RESULTS` (post-2026-05-20 Actor schema), `scripts/fetch_subactors_from_log.js` resolves sub-Actor runIds from the parent log.

Workflow

Copy this checklist and track progress:
Task Progress:
- [ ] Step 1: Collect required anchor inputs incl. goal (block on these)
- [ ] Step 2: Collect brand voice, partnership type, output format
- [ ] Step 3: Run apify/link-prospecting-tool
- [ ] Step 4: Pull leads, mentions, authors, and sub-Actor datasets
- [ ] Step 5: Enrich every domain with Ahrefs metrics, assign Prospect Tier
- [ ] Step 6: Run skip pass — flag rows to drop before drafting
- [ ] Step 7: Compute "Why This Prospect" tag per surviving row
- [ ] Step 8: Compose per-row 3-artifact placement + outreach-type-aware email
- [ ] Step 9: Render output in chosen format

Step 1: Required Anchor Inputs (ask FIRST, before anything else)

Do NOT proceed to Step 2 until every required input is answered. Surface them as the very first interaction. The dedup input (#7) is optional but must still be explicitly asked.
  1. Concrete goal for this campaign: pick one preset or supply custom text. The goal drives skip-pass filtering, outreach-type template selection, and Prospect Tier thresholds. Required.

    | Preset | Effect downstream |
    | --- | --- |
    | Recover unlinked brand mentions | Skip pass drops every row where `brand_mentioned_in_source` is `false`. Default outreach type = `unlinked-mention-claim`. |
    | Replace competitor links | Skip pass drops every row not tagged `Links to competitor`. Default outreach type = `competitor-link-replacement`. |
    | Topical authority links to specific URL | No filter. Tier thresholds tighten (DR ≥ 50 for tier A). Default outreach type chosen per-row from `Why This Prospect`. |
    | Maximum link volume from any relevant site | No filter. Tier thresholds relax (DR ≥ 30 for tier A). Default outreach type chosen per-row. |
    | Custom | User-supplied paragraph; biases email tone and tier weights. No automatic skip filter. |

  2. Target keyword(s): one or more keywords the user wants their link to appear next to. The skill prospects the SERP for each. At least one required.
  3. Brand name: the user's brand or product name. The Actor will not run without this (it is the `brand` input field).
  4. Product/category description: one or two sentences describing what the user sells, who they sell to, and what category their product fits in. Example: "Apify: web scraping platform that runs serverless scrapers as APIs. We sell to developers and data teams who need scraped data without managing infrastructure." Required. Used in Step 6 (topical-fit gate) and Step 7 (adversarial-mention detection) to recognise prospects who are in the same product category; those won't link no matter the pitch. Without this, the skill cannot distinguish a genuine editorial opportunity from a competitor's blog.
  5. URL of content to link to: the destination URL that will be inserted into partner articles. Required.
  6. Competitors: anyone in the user's product category who would publish an "ours vs theirs" comparison page on their own site. Frame the ask this way explicitly: "List every company that would write an X-vs-YourBrand comparison page. These won't link to you no matter what; small competitors count too." Encourage 10+ entries; most users default to listing 3–5 obvious ones and miss the long tail. Mapped to `competitorDomains` on the Actor and reused in Steps 6 (adversarial-mention skip) and 7 (`Links to competitor` Why-tag).
    After the user answers, offer (do not push) an Ahrefs auto-pull of organic competitors: "Want me to pull your top organic competitors from Ahrefs and add them to this list? Adds ~50 API units and surfaces smaller competitors you may have missed." If the user says yes and Ahrefs MCP is available, call `mcp__claude_ai_Ahrefs__site-explorer-organic-competitors` on the user's domain (extracted from input #5) and merge results into `competitorDomains`. If Ahrefs is unavailable or the user declines, proceed with the user-supplied list only.
  7. Already-pitched domains (optional): domains the user has already contacted in past campaigns. Accept a comma-separated list, a CSV/Sheet path, or "none". The skill drops these in the skip pass so the user doesn't double-pitch. Not required to proceed.
  8. Number of organic results per keyword: how many Google organic SERP results to prospect per keyword. Default 10 if the user is unsure, but ask the question so the user knows the lever exists. Mapped to `organicResult`.
  9. LLM sources to track: multi-select. Each enabled engine queries an additional AI search/chat surface and adds Google Search Scraper sub-Actor cost per result fetched. Default: all enabled. Mapping to Actor input flags:

    | Option | Actor flag | Cost impact |
    | --- | --- | --- |
    | ChatGPT Search | `enableChatGpt` | Per-result Google Search Scraper cost |
    | Gemini | `enableGemini` | Per-result Google Search Scraper cost |
    | Copilot (Microsoft / Bing) | `enableCopilot` | Per-result Google Search Scraper cost |
    | Perplexity | `enablePerplexity` | Per-result Google Search Scraper cost |
    | Google AI Mode | `enableAiMode` | Per-result Google Search Scraper cost |
    | Google AI Overviews | `enableAiOverviews` | Free: parsed from the SERP already fetched. Keep on regardless of budget. |

    Surface the multi-select to the user with all six pre-checked. Disabling individual engines is the main cost-cutting lever short of dropping `organicResult`; recommend keeping ChatGPT + Gemini on at minimum (they capture the largest share of LLM-driven discovery traffic in 2026).
  10. Run email verification? Boolean. Default: `yes`. Mapped to `enableEmailVerification` on the Actor. When enabled, the Actor verifies every email returned by the Contact Details Scraper sub-Actor and tags each lead with a verification status (`verified` / `catch-all` / `risky` / `invalid` / `unknown`). The skill uses the status in Step 6 (invalid emails get auto-skipped) and surfaces it as the `Email Verification` column in the output. Disable only if the user is rate-limited on verification quota or running cost-tight smoke tests.
Once 1–6 and 8–10 are captured (7 is optional), move on.
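As a rough illustration of how inputs #8–#10 could be turned into Actor input fields, the sketch below maps the user's multi-select onto the six `enable*` flags. The option labels and flag names come from the table above; the helper name `build_optional_flags` and its signature are hypothetical and are not part of the skill's scripts.

```python
# Option label -> Actor flag, copied from the Step 1 input #9 table.
LLM_SOURCE_FLAGS = {
    "ChatGPT Search": "enableChatGpt",
    "Gemini": "enableGemini",
    "Copilot (Microsoft / Bing)": "enableCopilot",
    "Perplexity": "enablePerplexity",
    "Google AI Mode": "enableAiMode",
    "Google AI Overviews": "enableAiOverviews",
}


def build_optional_flags(selected_sources, verify_emails=True, organic_results=10):
    """Hypothetical helper: translate Step 1 inputs #8-#10 into Actor input fields."""
    unknown = set(selected_sources) - set(LLM_SOURCE_FLAGS)
    if unknown:
        raise ValueError(f"Unknown LLM source(s): {sorted(unknown)}")
    # Every engine gets an explicit boolean, so deselected engines are passed as False.
    flags = {flag: (label in selected_sources) for label, flag in LLM_SOURCE_FLAGS.items()}
    flags["enableEmailVerification"] = bool(verify_emails)
    flags["organicResult"] = int(organic_results)
    return flags


if __name__ == "__main__":
    print(build_optional_flags({"ChatGPT Search", "Gemini", "Google AI Overviews"}))
```

The resulting dictionary can be merged into the rest of the Actor payload once the required inputs (brand, keywords, domains) are in place.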

Step 2: Secondary Inputs

Ask these next:
  1. Brand info and voice: a short paragraph describing the product/brand and the tone for outreach (e.g., "casual and helpful", "formal B2B", "founder-led"). Used verbatim to shape every generated email.
  2. Partnership type: the offer the user is willing to make. Determines the offer paragraph substituted into the per-row email. Outreach-type template selection happens separately, per-row, in Step 8.

    | Option | What it offers |
    | --- | --- |
    | ABC link exchange | Three-way link swap: partner links to user, user links to a third party, third party links to partner. |
    | Direct A B link exchange | Two-way link swap: partner links to user, user links to partner. |
    | Resource page / list inclusion | Ask to be added to an existing curated list or roundup. No reciprocal link offered. |
    | Unilateral ask (no reciprocal) | User asks for the link without offering anything in return. Appropriate for unlinked-mention claims and broken-link replacements. |
    | Other | User types their own offer (paid placement, free product, co-authored content, etc.). |

  3. Output format:

    | Format | Behavior |
    | --- | --- |
    | `xlsx` | `scripts/write_xlsx.py` writes a styled spreadsheet to disk. |
    | `markdown` | Agent renders the table inline in chat with email drafts beneath each row. |

Step 3: Run the Actor

The Actor ID is `apify/link-prospecting-tool`. Full input schema lives in `reference/apify-actor-usage.md`.
Recommended call payload for this skill (defaults chosen for outreach-first workflow):
json
{
  "queries": "<keyword 1>\n<keyword 2>",
  "brand": "<user's brand name>",
  "ownDomains": ["<user-domain.com>"],
  "competitorDomains": [],
  "ignoreDomains": [
    "wikipedia.org", "github.com", "stackoverflow.com", "stackexchange.com",
    "reddit.com", "quora.com", "youtube.com", "twitter.com", "x.com",
    "linkedin.com", "facebook.com", "medium.com", "archive.org",
    "chromewebstore.google.com", "addons.mozilla.org", "apps.apple.com",
    "play.google.com", "microsoftedge.microsoft.com", "marketplace.visualstudio.com"
  ],
  "organicResult": 10,
  "maxContactsPerDomain": 3,
  "department": ["marketing"],
  "searchAuthorName": true,
  "includeMention": true,
  "enableChatGpt": true,
  "enableGemini": true,
  "enableCopilot": true,
  "enablePerplexity": true,
  "enableAiMode": true,
  "enableAiOverviews": true,
  "enableEmailVerification": true
}
The six `enable*` LLM-source flags map 1:1 to the user's Step 1 input #9 multi-select. Pass `false` for any engine the user deselected. `enableEmailVerification` maps to Step 1 input #10.
The `ignoreDomains` default includes two groups:
  • Giants and UGC (wikipedia, github, stackoverflow, reddit, etc.): too broad to pitch as editorial partners.
  • App / extension marketplaces (Chrome Web Store, Firefox Add-ons, Apple App Store, Google Play, VS Code Marketplace, etc.): product directory listings, no editorial decision-makers.
Do NOT auto-add to `ignoreDomains` (let the user decide):
  • UGC/community sites like `kaggle.com`, `dev.to`, `substack.com`, `producthunt.com`, `g2.com`, `capterra.com`, `trustpilot.com`: some users get real value pitching these.
  • API directories like `rapidapi.com`, `programmableweb.com`, `publicapis.dev`: relevant for some products (especially developer-tool brands), irrelevant for others. Surface these as candidates only if the user wants to add them.
The URL-pattern skip rules in Step 6 catch the per-row noise (subdomain prefixes, path patterns) that `ignoreDomains` can't express.
`department` defaults to `["marketing"]` only. The skill prioritises editorial-leaning contacts within the returned `marketing` department during row composition (see Step 8). Only add `sales` if the user explicitly wants BD-style partnership pitches. Only add `c_suite` if the prospect domains are very small (1–5 person shops) where the founder may also be the editor.
Call the runner script:
bash
node --env-file=.env ${CLAUDE_PLUGIN_ROOT}/scripts/run_actor.js \
  --actor "apify/link-prospecting-tool" \
  --input 'JSON_INPUT' \
  --timeout 1800 \
  --fetch-sub-datasets \
  --output YYYY-MM-DD_outreach.json \
  --format json
Notes:
  • `--timeout 1800` is the recommended client-side wait. The Actor itself runs 15–50+ min depending on keyword count, LLM-engine fan-out, and `enableEmailVerification`. Past calibration runs land in the 20–55 min range. Bumping the default avoids the partial-result situation where the runner gives up but the Actor keeps going.
  • If the client-side wait still elapses with the Actor still running on Apify (status `RUNNING` or `READY` when the runner exits), do not restart the Actor. Use `scripts/fetch_run_artifacts.js --run-id <id> --output <file>` to poll the existing run and download all artifacts; the output shape is the same as `run_actor.js --fetch-sub-datasets`.
  • `--fetch-sub-datasets` downloads sibling files alongside the main output: `*_mentions.json`, `*_authors.json`, `*_serp.json`, `*_wcc.json`. You need all of them to populate every output column.

Step 4: Access All Datasets

The Actor's output schema changed on or before 2026-05-20. The build_prospects script must handle the new shape; older skill versions that joined a separate MENTIONS dataset are broken.
Current schema (verified 2026-05-20):
| File written by runner / fetcher | Source | Populates |
| --- | --- | --- |
| `*_output.json` (main) | "All leads" dataset | `Contact Full Name`, `Contact Job Title`, `Department`, `Seniority`, `Contact Email`, `Email Verification` (when `enableEmailVerification: true`), `Contact LinkedIn`, `Company`, `Domain`. Each lead's `source_url[]` array contains the article URLs that produced this contact, each with a `brand_mentioned_in_source` boolean; this is the new home of the per-(URL, contact) mention data. |
| `*_serp.json` | Google Search Results Scraper sub-Actor (one item per `(query × engine)` combination) | `SERP Position`, `Article Title`, `Publish Date` (via `organicResults[]`), and engine attribution per URL (Google Organic, ChatGPT, Gemini, Copilot, Perplexity, Google AI Mode) by joining `aiModeResult.sources[]`, `perplexitySearchResult.sources[]`, `chatGptSearchResult.sources[]`, `geminiSearchResult.sources[]`, `copilotSearchResult.sources[]`. URLs from ChatGPT carry a `?utm_source=chatgpt.com` query suffix; normalise URLs (strip tracking params) before joining. |
| `*_wcc.json` | Website Content Crawler sub-Actor | `Placement Source Sentence`, `Placement With Link`, `Placement New Insertion`, `Article Author` cross-check, outbound-link inspection for `Links to competitor` and `Resource / roundup page` tags. Canonical URL list for building rows: every URL that got body-crawled appears here, including ones that didn't yield a lead. |
| `*_authors.json` | AI Web Scraper sub-Actor (when `searchAuthorName: true`) | `Article Author`, `Author Source` (set to `searchAuthorName`). Note: this sub-Actor frequently times out at its 300s default; partial results are still saved. |
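The URL normalisation and engine-attribution join described for `*_serp.json` could look roughly like the sketch below. It assumes that each entry in `organicResults[]` and in the `*.sources[]` arrays carries a `url` key, and that only `utm_*` parameters count as tracking params; both are assumptions, not something confirmed by the Actor schema. The function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Engine label -> SERP item field holding that engine's result (field names from the table above).
ENGINE_FIELDS = {
    "Google AI Mode": "aiModeResult",
    "Perplexity": "perplexitySearchResult",
    "ChatGPT": "chatGptSearchResult",
    "Gemini": "geminiSearchResult",
    "Copilot": "copilotSearchResult",
}


def normalise_url(url):
    """Lowercase scheme/host, drop utm_* params, the fragment, and a trailing slash."""
    parts = urlsplit(url.strip())
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.lower().startswith("utm_")]  # assumption: utm_* is the only tracking family
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def engines_by_url(serp_items):
    """Map each normalised URL to the set of engines that surfaced it."""
    attribution = {}
    for item in serp_items:
        for result in item.get("organicResults") or []:
            if result.get("url"):  # assumption: organic results expose a 'url' key
                attribution.setdefault(normalise_url(result["url"]), set()).add("Google Organic")
        for engine, field in ENGINE_FIELDS.items():
            for source in (item.get(field) or {}).get("sources") or []:
                if source.get("url"):  # assumption: sources expose a 'url' key
                    attribution.setdefault(normalise_url(source["url"]), set()).add(engine)
    return attribution
```

Normalising before the join is what lets a ChatGPT-sourced URL with the `?utm_source=chatgpt.com` suffix land on the same key as its organic counterpart.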
What changed (vs. pre-2026-05-20 runs):
  1. No separate `MENTIONS` / `AUTHORS` / `DOMAINS_WITH_LEADS` named datasets. Mention info is folded into `main_leads[i].source_url[]`.
  2. No `SUB_ACTOR_RESULTS` record in the parent run's key-value store. Sub-Actor runIds are now only discoverable from the parent run log via regex `\[apify\.<slug> runId:([A-Za-z0-9]+)\]`. The runner script's `--fetch-sub-datasets` flag now falls back to log-parsing when the KV index is missing; the standalone `scripts/fetch_subactors_from_log.js` does the same for runs whose runner already exited.
  3. The mentions schema is reduced: `source_url[i]` carries only `{domain, brand_mentioned_in_source, url}`, with no per-engine flags like the old `ChatGPT_mention` / `Perplexity_mention`. Engine attribution must be reconstructed from the SERP sub-dataset's LLM-result sub-fields (see SERP row above).
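The log-parsing fallback in item 2 can be sketched as follows. This is not the code in `scripts/fetch_subactors_from_log.js`; it is a minimal Python illustration of the same regex, with the `<slug>` placeholder widened to a capture group. The assumption that slugs consist of letters, digits, hyphens, and underscores is mine, and the sample log lines in the usage note are invented.

```python
import re

# The documented pattern, with <slug> captured. Slug character class is an assumption.
SUB_RUN_PATTERN = re.compile(r"\[apify\.([A-Za-z0-9_-]+) runId:([A-Za-z0-9]+)\]")


def sub_actor_run_ids(log_text):
    """Return {slug: [runId, ...]} in first-seen order, without duplicate runIds."""
    runs = {}
    for slug, run_id in SUB_RUN_PATTERN.findall(log_text):
        seen = runs.setdefault(slug, [])
        if run_id not in seen:
            seen.append(run_id)
    return runs


if __name__ == "__main__":
    sample = (
        "INFO [apify.website-content-crawler runId:AbC123] started\n"
        "INFO [apify.website-content-crawler runId:AbC123] finished\n"
    )
    print(sub_actor_run_ids(sample))
```

Deduplicating by runId matters because a sub-Actor run is likely to be logged more than once (start, progress, finish) in the parent log.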
If a column's source is missing, write `"Not found"` and add a manual-lookup hint in `Notes`. Never fabricate.

Step 5: Ahrefs Enrichment and Prospect Tier

For every unique domain that survived the Actor's filtering, fetch authority and traffic metrics via Ahrefs MCP. Call all three tools in parallel per domain (and across domains: batch parallelise to keep this step under a minute for typical 20–50 prospect lists):

| Ahrefs tool | Used for | Column it populates |
| --- | --- | --- |
| `mcp__claude_ai_Ahrefs__site-explorer-domain-rating` (target = domain) | Domain Rating | `Domain DR` |
| `mcp__claude_ai_Ahrefs__site-explorer-metrics` (target = article URL, mode = `exact`) | Page-level organic traffic (last 30 days) | `Page Traffic` |
| `mcp__claude_ai_Ahrefs__site-explorer-backlinks-stats` (target = domain) | Referring domains count | `Referring Domains` |

If Ahrefs returns no data (domain not indexed, page too new), set the column to `"-"` and add a `Notes` hint `"Ahrefs has no data — verify manually before pitching"`. Do not fabricate values.
Assign `Prospect Tier` using the thresholds matching the user's goal:

| Goal | Tier A | Tier B | Tier C |
| --- | --- | --- | --- |
| Topical authority links to specific URL | DR ≥ 50 AND Page Traffic ≥ 300/mo | DR 30–49 OR Page Traffic 50–299 | everything below |
| Maximum link volume from any relevant site | DR ≥ 30 AND Page Traffic ≥ 100/mo | DR 15–29 OR Page Traffic 20–99 | everything below |
| Recover unlinked brand mentions | DR ≥ 40 | DR 20–39 | DR < 20 |
| Replace competitor links | DR ≥ 50 | DR 30–49 | DR < 30 |
| Custom | use the `Topical authority` thresholds | | |

For `Recover unlinked brand mentions`, Page Traffic is irrelevant: every mention is worth claiming, so tier by DR alone. `Replace competitor links` is also tiered by DR alone.
Surface tier breakdown to the user before Step 8 and let them confirm whether to draft emails for all tiers or only A/B.
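The tier thresholds above can be expressed as a small lookup, sketched below. Two points are interpretations rather than facts from the source: "DR 30–49 OR Page Traffic 50–299" is read as "either metric reaches the Tier B floor" (so a high-DR page with low traffic lands in B, not C), and a missing Ahrefs value such as `"-"` is treated as failing every threshold. The function name `assign_tier` is hypothetical.

```python
# goal -> (tier A DR, tier A traffic, tier B DR, tier B traffic).
# A traffic value of None means the goal is tiered by DR alone.
THRESHOLDS = {
    "Topical authority links to specific URL": (50, 300, 30, 50),
    "Maximum link volume from any relevant site": (30, 100, 15, 20),
    "Recover unlinked brand mentions": (40, None, 20, None),
    "Replace competitor links": (50, None, 30, None),
}
# Custom goals reuse the Topical authority thresholds.
THRESHOLDS["Custom"] = THRESHOLDS["Topical authority links to specific URL"]


def _number(value):
    """Assumption: non-numeric values such as "-" count as 0 (no data)."""
    return value if isinstance(value, (int, float)) else 0


def assign_tier(goal, dr, page_traffic=None):
    a_dr, a_traffic, b_dr, b_traffic = THRESHOLDS[goal]
    dr, page_traffic = _number(dr), _number(page_traffic)
    if a_traffic is None:  # DR-only goals
        if dr >= a_dr:
            return "A"
        return "B" if dr >= b_dr else "C"
    if dr >= a_dr and page_traffic >= a_traffic:
        return "A"
    # Interpretation: either metric reaching the Tier B floor is enough for B.
    if dr >= b_dr or page_traffic >= b_traffic:
        return "B"
    return "C"
```

Rows with no Ahrefs data fall into Tier C under this sketch; they still carry the manual-verification hint in `Notes`, so the user can promote them by hand.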

Step 6: Skip Pass

Before drafting any email, walk every row and apply skip rules. Skipped rows get `Outreach Status = "Skip"`, a one-line reason in `Notes`, and no email or placement is generated (saves tokens and user review time).
Skip rules (in order):
  1. Goal mismatch. If the goal is `Recover unlinked brand mentions` and the row's Mentions data shows `brand_mentioned_in_source: false`, skip. If the goal is `Replace competitor links` and the row's WCC body has no outbound link to any `competitorDomains` entry, skip.
  2. Already pitched. If the row's domain matches an entry in the optional already-pitched list from Step 1 input #7, skip.
  3. Own / competitor domain leak. The Actor should already filter these, but double-check: if the row's domain matches `ownDomains` or `competitorDomains`, skip.
  4. Stale content. If `Publish Date` is older than 5 years, skip (low chance the editor will update the post).
  5. URL-pattern skip. Skip rows whose URL matches any of these patterns:
    • Subdomain prefixes: `developers.*`, `docs.*`, `support.*`, `helpcenter.*`, `legacy.*`, `dsarequests.*`, `connectivity.*`, `community.*`, `dev.*` (when used as a doc subdomain, e.g. `dev.example.com/api/`), `api.*` only when followed by a path that's clearly documentation (`/reference/`, `/docs/`, `/spec/`). Do NOT skip `rapidapi.com` or other API-directory domains by this rule alone: `api.*` is a subdomain check, not a substring check.
    • Path patterns: `/api-docs/`, `/reference/`, `/marketplace/`, `/extensions/`, `/profile/`, `/users/`, `/free-tools/`, `/spec/`, `/content/privacy`, `/content/terms`, `/content/dma`, `/content/how_we_work`, `/legal/`, `/_redirects`, `/sitemap`.
    • Vendor product page patterns: URL ends in `-scraper.php`, `-scraping.php`, contains `-data-scraper.`, `-data-scraping.`, `/bots/`, `/extension/`, `/detail/` (extension detail pages).
  6. Non-editorial page type. Inspect the WCC page body. Skip vendor product pages, pricing pages, login walls, sign-up pages, terms/legal pages, and pages with fewer than 400 words of body text (most editorial articles run 800+ words).
  7. UGC slipped through. If the page URL contains `/forum/`, `/thread/`, `/comments/`, `/answers/`, `/q/`, `/topic/`, `/discussion/`, or the WCC body is structured as discussion replies, skip.
  8. Category-fit gate (loose). Extract 4–6 category keywords from the user's product description (Step 1 input #4). These describe the product category, not the specific subject of the user's URL. Examples for a web-scraping product: `scrape`, `scraping`, `scraper`, `crawl`, `extract`, `data extraction`. For a CMS product: `cms`, `headless`, `content`, `editorial`. The row's WCC body must contain at least 1 of these category keywords. If not, skip with reason `Article isn't in user's product category (no '<kw>' match)`. This kills recipe blogs, finance articles, and other off-category content that slipped through SERP filtering.
    For non-English campaigns, include both source-language and English keywords in the category set, since many Czech/German/French articles cite English brand names and product categories inline. Example for a Czech water-filtration brand: `{filtr, filtrace, vod, voda, filter, filtration, water}`. A pure-Czech keyword set would miss articles by Czech authors who write in mixed CS/EN.
    Known false negatives this rule can't catch (the per-row sub-agent in Step 8 must catch them):
    • Local e-commerce competitors selling the exact same product category. Past campaigns have seen multiple regional e-shops survive the mechanical pass, typically platform-based stores (e.g. Shoptet, Shopify) with "add to cart" buttons embedded in the article body. The sub-agents correctly skipped them, but the wasted compute is a smell. Future versions of this rule should detect platform fingerprints (platform bundle URLs, locale-specific add-to-cart strings, `/eshop/`, embedded product cards with prices in body) and pre-skip.
    • Category-name homonyms. "filtr" in Czech also means "filter" in the photography or coffee sense, so a coffee-filter or camera-filter blog would pass this gate but isn't a real fit. Sub-agent catches these by reading the body context.
    The category gate is intentionally loose. It is a category check, not a subject check: fine-grained "does this specific article fit my specific URL?" is delegated to the per-row sub-agent in Step 8. Example: for a user URL specifically about scraping a single travel site, a general "python web scraping" guide that never mentions that travel site passes this gate because it's in the user's category. The Step 8 sub-agent then decides whether to draft a placement (e.g., an additive line that names the specific travel site) or to recommend a content-based skip.
    Surface the extracted category keyword list to the user at the start of Step 6 and let them add/remove before the pass runs.
  9. Adversarial-mention detection. When
    brand_mentioned_in_source: true
    , scan ±100 characters around the brand mention in the WCC body for negative-context tokens:
    vs
    ,
    versus
    ,
    alternative to
    ,
    alternatives to
    ,
    compared to
    ,
    compared with
    ,
    instead of
    ,
    better than
    ,
    worse than
    ,
    pros and cons
    ,
    comparison
    ,
    review of
    . If any of those appear within the window, skip with reason
    Adversarial mention (likely competitor comparison page) — won't link
    . This catches "ScrapeHero vs Apify" footer mentions, "alternatives to YourBrand" listicles, and similar non-link contexts. Critical — without this rule, the
    unlinked-mention-claim
    outreach type fires on dozens of false positives.
  10. No contact AND no editorial path. If
    Contact Email = "Not found"
    AND no
    Article Author
    AND no domain-level contact page found in WCC outbound links, skip — there is no one to pitch.
  11. Invalid email (from verification). When
    enableEmailVerification: true
    ran, inspect each row's
    Email Verification
    status. If the primary contact's status is
    invalid
    , try the alternate contacts first before skipping the row — past runs have lost Tier A candidates because the primary contact's email was invalid but a verified alternate existed on the same domain. Only skip with reason
    Email failed verification (invalid address)
    when no alternate has a verified or unchecked email. Statuses
    catch-all
    ,
    risky
    ,
    unknown
    are informational only (not auto-skipped) — surface them in the
    Email Verification
    column. Status
    verified
    ships as-is. If verification didn't run for this campaign, the column shows
    -
    and this rule is a no-op.
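The URL-pattern rule (rule 5) is mechanical enough to sketch. What follows is a minimal Python illustration under stated assumptions, not the skill's actual implementation: the function name `url_skip_reason` and the reason strings are hypothetical, a host counts as a subdomain only when it has three or more labels, and `dev.*` is treated as a doc subdomain only when the path contains `/api/` or one of the documentation paths.

```python
from urllib.parse import urlparse

# Subdomain prefixes from rule 5 that are skipped unconditionally.
ALWAYS_SKIP_SUBDOMAINS = {
    "developers", "docs", "support", "helpcenter",
    "legacy", "dsarequests", "connectivity", "community",
}
# Paths that mark an api.* or dev.* host as documentation.
DOC_PATHS = ("/reference/", "/docs/", "/spec/")
SKIP_PATHS = (
    "/api-docs/", "/reference/", "/marketplace/", "/extensions/", "/profile/",
    "/users/", "/free-tools/", "/spec/", "/content/privacy", "/content/terms",
    "/content/dma", "/content/how_we_work", "/legal/", "/_redirects", "/sitemap",
)
VENDOR_SUFFIXES = ("-scraper.php", "-scraping.php")
VENDOR_PARTS = ("-data-scraper.", "-data-scraping.", "/bots/", "/extension/", "/detail/")


def url_skip_reason(url):
    """Return a (hypothetical) skip reason for rule 5, or None when the URL passes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()
    labels = host.split(".")
    # Leftmost label only: a subdomain check, not a substring check,
    # so rapidapi.com is not caught by the api.* pattern.
    sub = labels[0] if len(labels) >= 3 else ""

    if sub in ALWAYS_SKIP_SUBDOMAINS:
        return f"URL pattern: {sub}.* subdomain"
    if sub == "api" and any(p in path for p in DOC_PATHS):
        return "URL pattern: api.* documentation path"
    if sub == "dev" and any(p in path for p in DOC_PATHS + ("/api/",)):
        return "URL pattern: dev.* documentation path"
    for part in SKIP_PATHS:
        if part in path:
            return f"URL pattern: path contains {part}"
    if path.endswith(VENDOR_SUFFIXES) or any(p in path for p in VENDOR_PARTS):
        return "URL pattern: vendor product page"
    return None
```

Returning the reason rather than a boolean keeps the one-line `Notes` entry next to the check that produced it.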
Never suggest external lookup services or workaround tools in Notes — no
hunter.io
, no
LinkedIn search
, no third-party verification services. The skill's job is to surface what we found, factually. When information is missing (no email, no author, etc.), state the gap and stop. The user knows where to look; suggesting their tools back at them is condescending and clutters the output.
A row failing any rule above is skipped before Step 7. Skipped rows still appear in the final output (so the user can see what was filtered) but with empty placement and email cells and
Outreach Status = "Skip"
plus the reason in
Notes
.
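The adversarial-mention check (rule 9) is a window scan around each brand occurrence. A minimal sketch, assuming the WCC body is available as a plain string and that tokens should match on word boundaries (the rule itself does not say whether `vs` may match inside a longer word); the function name `is_adversarial_mention` is hypothetical:

```python
import re

NEGATIVE_TOKENS = [
    "vs", "versus", "alternative to", "alternatives to", "compared to",
    "compared with", "instead of", "better than", "worse than",
    "pros and cons", "comparison", "review of",
]
# Word boundaries keep short tokens such as "vs" from matching inside other words.
TOKEN_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in NEGATIVE_TOKENS) + r")\b",
    re.IGNORECASE,
)


def is_adversarial_mention(body, brand, window=100):
    """True when a negative-context token sits within `window` chars of any brand mention."""
    for match in re.finditer(re.escape(brand), body, re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(body), match.end() + window)
        if TOKEN_RE.search(body[start:end]):
            return True
    return False
```

A row for which this returns `True` would be skipped with the reason string given in rule 9.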

Step 7: "Why This Prospect" Tags

For every surviving row, compute one or two
Why This Prospect
tags, prioritised by which makes the strongest pitch. These tags drive the outreach-type template selection in Step 8.
| Tag | Trigger | Source of truth |
|---|---|---|
| `Mentions brand, no backlink` | `brand_mentioned_in_source: true` AND `backlink_in_source: false` in Mentions dataset | Mentions dataset |
| `Links to competitor [domain]` | WCC page body contains an outbound link whose host matches any `competitorDomains` entry | WCC dataset |
| `Top-3 SERP for [keyword]` | `SERP Position` is 1, 2, or 3 for any keyword | Google Search Scraper sub-dataset |
| `Resource / roundup page` | WCC page body has 10+ outbound links AND the page title or H1 matches `/(best\|top\|list\|roundup\|tools\|resources\|guide to)/i` | WCC dataset |
| `Outdated content` | `Publish Date` is older than 24 months AND newer than 5 years (5+ years was already skipped) | Google Search Scraper / WCC |
A row may carry up to two tags. Order them by pitch strength using this priority:
Mentions brand, no backlink
>
Links to competitor
>
Resource / roundup page
>
Top-3 SERP
>
Outdated content
. If no tag fits, leave the column as
"-"
— the row still gets pitched, just without a special angle.
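The tag triggers and the priority order above can be sketched as follows. This is an illustration under assumptions, not the skill's code: the row keys (`outbound_links`, `title`, `h1`, `serp_position`, `keyword`, `publish_date`) are hypothetical names for the merged sub-dataset fields, and age is approximated in whole calendar months.

```python
import re
from datetime import date
from urllib.parse import urlparse

ROUNDUP_RE = re.compile(r"(best|top|list|roundup|tools|resources|guide to)", re.IGNORECASE)


def why_tags(row, competitor_domains, today):
    """Return up to two tags, already in pitch-strength order (empty list means "-")."""
    tags = []
    # 1. Mentions brand, no backlink
    if row.get("brand_mentioned_in_source") and not row.get("backlink_in_source"):
        tags.append("Mentions brand, no backlink")
    # 2. Links to competitor [domain]
    hosts = [(urlparse(u).hostname or "").lower() for u in row.get("outbound_links", [])]
    for domain in competitor_domains:
        if any(h == domain or h.endswith("." + domain) for h in hosts):
            tags.append(f"Links to competitor {domain}")
            break
    # 3. Resource / roundup page
    heading = f"{row.get('title', '')} {row.get('h1', '')}"
    if len(hosts) >= 10 and ROUNDUP_RE.search(heading):
        tags.append("Resource / roundup page")
    # 4. Top-3 SERP for [keyword]
    if row.get("serp_position") in (1, 2, 3):
        tags.append(f"Top-3 SERP for {row.get('keyword', '')}")
    # 5. Outdated content: older than 24 months, newer than 5 years
    published = row.get("publish_date")
    if published is not None:
        age_months = (today.year - published.year) * 12 + (today.month - published.month)
        if 24 < age_months < 60:
            tags.append("Outdated content")
    return tags[:2]
```

Because the checks run in priority order, truncating to the first two entries yields the two strongest angles without a separate sort.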

Step 8: Compose Per-Row Placement and Email

Each surviving row gets three placement artifacts plus one email draft. Apply these quality rules without exception:
  1. No fabrication. If the article author or contact email is unknown, set the field to
    "Not found"
    and leave a one-line factual note (e.g.,
    "No email found for this contact"
    or
    "No author detected"
    ). Do not suggest external lookup tools or workarounds in Notes (the note that closes Step 6, after rule 11, gives the rationale). Just state the fact and stop.
  2. Prioritise editorial-leaning contacts. When the All leads dataset returned multiple contacts for the same domain, prefer the one whose
    jobTitle
    matches
    /editor|content|writer|managing|editorial|blog|copy/i
    , demote anyone whose
    jobTitle
    matches
    /ceo|cfo|cto|founder|chief|vp\b|president/i
    unless the company is a 1–5 person shop. Surface the chosen contact in the row; keep alternates in
    Notes
    as
    "Alternate contacts: <name1> (<title>), <name2> (<title>)"
    .
  3. Three placement artifacts: try strategies in priority order. Use the WCC sub-dataset's page text and try strategies 1 → 2 → 3, stopping at the first one that produces a clean fit. Record which strategy was used by starting the
    Notes
    field with
    Placement: drop-in
    /
    Placement: additive
    /
    Placement: new insertion
    .
    Strategy 1 — drop-in (preferred). Find a sentence in the article where the user's URL can be added to existing words without changing any of the surrounding prose. The link goes on an existing word or short phrase the author already wrote. Output:
    • Placement Source Sentence
      = the verbatim sentence as it appears in the article.
    • Placement With Link
      = the same sentence with the link inserted on an existing word/phrase. No new prose, no rewording, no deletions. Example: source =
      "Tools like Octoparse and BeautifulSoup work well for hotel data."
      → with-link =
      "Tools like Octoparse and **[BeautifulSoup](URL)** work well for hotel data."
      (link added to existing word). The editor doesn't have to approve any new wording — just a hyperlink.
    • Placement New Insertion
      =
      "-"
      .
    Drop-in works when the article already names a brand, tool, or technique that maps cleanly to the user's URL. It's the lowest-friction ask of any outreach pattern: "could you add a hyperlink to a word you already wrote?"
    Strategy 2 — additive (second choice). When no drop-in target exists but the article has a sentence the user's URL would naturally follow, keep the original sentence intact and add one new sentence after it. The new sentence introduces an adjacent reader-need that the article doesn't already cover and that the user's URL addresses. Output:
    • Placement Source Sentence
      = the verbatim original sentence.
    • Placement With Link
      = original sentence kept verbatim, followed by
      and a one-sentence follow-on containing the link. Example: source =
      "By integrating with Acme Travel's APIs, developers can enrich their platforms with hotel data."
      → with-link =
      "By integrating with Acme Travel's APIs, developers can enrich their platforms with hotel data. → ...with hotel data. In need of competitor pricing data the API doesn't expose? Then you need a [hotel-data scraper](URL)."
      Keep the original sentence verbatim; the follow-on is the only new prose.
    • Placement New Insertion
      =
      "-"
      .
    The follow-on must (a) raise a reader need the existing sentence doesn't address, (b) connect that need to the user's URL, (c) be one sentence, ≤25 words, written in the article's voice.
    Strategy 3 — new insertion (last resort). Only when neither drop-in nor additive works (e.g., no relevant sentence exists in the article body). Draft a fully new 1–2 sentence paragraph in the article's voice with a precise insertion location:
    • Placement Source Sentence
      =
      "-"
      .
    • Placement With Link
      =
      "-"
      .
    • Placement New Insertion
      = the drafted paragraph + the exact anchor (
      "insert as a new paragraph immediately after the sentence ending in '…X.' in the section under H2 'Y'."
      ).
    If even a new insertion can't be drafted (the article is the wrong topic for the user's URL), set
    Outreach Status = "Skip"
    and add
    Notes: "No natural placement — article topic mismatch"
    . However, the category-fit gate in Step 6 rule 8 should have caught this case already; if a row makes it to Step 8 and can't get a placement, treat that as a hint that the gate needs more keywords.
    Every surviving row goes through a sub-agent, not just Tier A/B/mention-only rows. The mechanical skip pass (Step 6) cuts the obviously bad prospects (competitor domains, doc subdomains, policy pages, dead-contact rows, off-category articles). Everything that passes is by definition a candidate worth real consideration, and the sub-agent makes the final fit call: read the article, attempt a placement (drop-in → additive → new insertion), and either draft the email or return
    placement_strategy = "skip"
    with a content-specific reason. Python templates, regexes, and keyword scoring are not acceptable for the final draft: they produce mechanical splices that read awkwardly in context (this has failed in practice on real campaigns).
    Spawn sub-agents in parallel: one per surviving row, each given the WCC text, user URL context, contact info, brand voice, and partnership offer. The output schema (placement strategy, the three placement column values, email subject + body, skip recommendation, notes) is what gets merged back into the spreadsheet row.
    A row that the sub-agent decides to skip after content review gets
    Outreach Status = "Skip"
    and the agent's reason in Notes — same shape as a Step 6 mechanical skip, just with a more nuanced rationale.
  4. Determine
    Outreach Type
    per row
    from the
    Why This Prospect
    tags + user goal:
    | Trigger | Outreach Type |
    |---|---|
    | Tag `Mentions brand, no backlink` present | `unlinked-mention-claim` |
    | Tag `Links to competitor` present | `competitor-link-replacement` |
    | Tag `Resource / roundup page` present | `resource-page-inclusion` |
    | Tag `Outdated content` present | `outdated-content-replacement` |
    | None of the above OR only `Top-3 SERP` tag | `topical-niche-edit` |
    Pull the matching template from
    reference/email-templates.md
    . The user's Step 2
    Partnership type
    answer substitutes into the
    {{offer_paragraph}}
    placeholder inside the template — the outreach type determines structure and opening hook, the partnership type determines the offer.
  5. Suggested Email Copy
    must use the user's brand voice.
    Apply the voice paragraph verbatim per the voice substitution rules in
    reference/email-templates.md
    . If voice input was skipped, use the generic-professional default and note this in
    Notes
    .
5a. The email MUST include the exact placement wording — verbatim. The recipient should never have to ask "what's the wording you're suggesting?" or click through to a separate cell to see the proposed text. Embed the proposal directly:
  • For drop-in: quote both the source sentence and the linked version inline. Example:
    "In your line 'Tools like X and Y work well for Z', would you turn 'Y' into a hyperlink to <URL>?"
  • For additive: quote the anchor sentence verbatim AND the exact follow-on sentence you're proposing. Example:
    "Right after your sentence 'X happens because Y.', would you add: 'For the Z case specifically, see <URL>.'?"
  • For new insertion: quote the anchor sentence the new paragraph should follow, then the full proposed paragraph inline. Example:
    "In the 'Honorable mentions' section, after 'each platform has its own trade-offs.', would you add this paragraph: '<full paragraph with link>'?"
The email is the ask. If the wording isn't in the email, the ask is incomplete. Vague phrasing like "happy to draft it for you" / "happy to send exact wording" / "a follow-on sentence linking to..." is a content-skill bug — always rewrite to include the verbatim proposal.
  6. Word cap: emails are 150 words or fewer (subject + body combined).
  7. Personalisation is mandatory. Every email must open with a concrete reference to the specific article (title + a one-line takeaway from its content). No generic "I loved your article" openers.
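The mechanical parts of the rules above (contact choice, outreach-type lookup, and the word cap) can be sketched in Python. This is a hedged illustration only: the function names are hypothetical, contacts are assumed to be dicts with a `jobTitle` key, "first matching table row wins" is an assumption for rows that carry two tags, and the placement wording and email body still come from the per-row sub-agent.

```python
import re

# Regexes copied from rule 2. Note they are substring matches, so "cto" also
# matches inside "Director"; a stricter pattern would be a change to the rule.
EDITORIAL_RE = re.compile(r"editor|content|writer|managing|editorial|blog|copy", re.IGNORECASE)
EXEC_RE = re.compile(r"ceo|cfo|cto|founder|chief|vp\b|president", re.IGNORECASE)

# Trigger order from the Outreach Type table in rule 4.
OUTREACH_BY_TAG = [
    ("Mentions brand, no backlink", "unlinked-mention-claim"),
    ("Links to competitor", "competitor-link-replacement"),
    ("Resource / roundup page", "resource-page-inclusion"),
    ("Outdated content", "outdated-content-replacement"),
]


def pick_contact(contacts, small_company=False):
    """Return (chosen, alternates): editorial titles first, executives last."""
    def rank(contact):
        title = contact.get("jobTitle") or ""
        if EDITORIAL_RE.search(title):
            return 0
        if EXEC_RE.search(title) and not small_company:
            return 2  # demoted unless the company is a 1-5 person shop
        return 1
    ordered = sorted(contacts, key=rank)  # stable sort keeps dataset order within a rank
    return (ordered[0], ordered[1:]) if ordered else (None, [])


def outreach_type(tags):
    """Map Why This Prospect tags to an outreach type; assumed: first table row wins."""
    for prefix, kind in OUTREACH_BY_TAG:
        if any(tag.startswith(prefix) for tag in tags):
            return kind
    return "topical-niche-edit"


def within_word_cap(subject, body, cap=150):
    """Rule 6: subject and body combined must not exceed the cap."""
    return len(f"{subject} {body}".split()) <= cap
```

The alternates returned by `pick_contact` are what would be written to `Notes` in the `"Alternate contacts: ..."` format.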

Step 9: Render Output

Markdown format — agent renders inline in chat:
  1. A header line with the Apify run ID and tier breakdown (
    A: 8, B: 15, C: 7, Skipped: 12
    ).
  2. One Markdown table row per prospect with the most actionable columns (tier, why, contact, placement summary). Skipped rows render in a separate collapsed section at the bottom.
  3. Below the table, one fenced code block per non-skipped row containing the email draft (subject + body), labeled with the row index, tier, and outreach type.
xlsx format:
scripts/write_xlsx.py --config campaign.json
writes a two-sheet workbook after Steps 5–8 finish:
  • Outreach
    — active rows only, full 30 columns. This is the send-ready deliverable. Sorted by
    Prospect Tier
    ascending (A first), then by
    Domain DR
    descending, then by
    SERP Position
    ascending.
  • Skipped
    — skipped rows with reduced columns:
    Domain
    ,
    Article URL
    ,
    Article Title
    ,
    Skip Reason
    (extracted from Notes),
    Source Engines
    ,
    Why This Prospect
    . This sheet exists for auditing what was filtered without cluttering the main view. Missing Ahrefs columns aren't visible here, so the empty-data confusion goes away.
The user opens the file and lands on
Outreach
by default — only actionable prospects. They can switch to
Skipped
to audit. This pattern replaces the older single-sheet-with-red-rows approach.
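The three-level sort on the Outreach sheet can be expressed as a single sort key. A minimal sketch, assuming each row is a dict keyed by the column names and that rows with `"-"` in a numeric column (for example when Ahrefs was unavailable) should sort last; `outreach_sort_key` is a hypothetical name, not a function from `write_xlsx.py`:

```python
TIER_ORDER = {"A": 0, "B": 1, "C": 2}


def outreach_sort_key(row):
    """Tier ascending (A first), then Domain DR descending, then SERP Position ascending."""
    tier = TIER_ORDER.get(row.get("Prospect Tier"), len(TIER_ORDER))
    dr = row.get("Domain DR")
    position = row.get("SERP Position")
    return (
        tier,
        -dr if isinstance(dr, (int, float)) else 1,  # missing DR sorts after every real DR
        position if isinstance(position, (int, float)) else float("inf"),
    )


rows = [
    {"Domain": "c.com", "Prospect Tier": "B", "Domain DR": 80, "SERP Position": 1},
    {"Domain": "a.com", "Prospect Tier": "A", "Domain DR": 50, "SERP Position": 4},
    {"Domain": "b.com", "Prospect Tier": "A", "Domain DR": 70, "SERP Position": 9},
    {"Domain": "d.com", "Prospect Tier": "A", "Domain DR": 70, "SERP Position": 2},
]
ordered = sorted(rows, key=outreach_sort_key)
```

With these sample rows the Tier A rows come first, the two DR 70 rows are separated by SERP position, and the Tier B row lands last despite having the highest DR.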
Both formats also produce a sidecar
run_metadata.json
:
{ runId, actorId, startedAt, finishedAt, inputs, datasetIds, tierCounts, skipCounts }
. Drop it next to the main output file.

Output Row Schema (30 columns)

SERP Position
,
Source Engines
,
Keyword
,
Article Title
,
Article URL
,
Domain
,
Domain DR
,
Page Traffic
,
Referring Domains
,
Prospect Tier
,
Why This Prospect
,
Article Author
,
Author Source
,
Publish Date
,
Contact Full Name
,
Contact Job Title
,
Department
,
Seniority
,
Contact Email
,
Email Verification
(one of
verified
/
catch-all
/
risky
/
invalid
/
unknown
/
-
),
Contact LinkedIn
,
Company
,
Outreach Type
,
Partnership Offer
,
Placement Source Sentence
,
Placement With Link
,
Placement New Insertion
,
Suggested Email Copy
,
Outreach Status
(default
"Not started"
,
"Skip"
for skipped rows),
Notes
(auto-flags + skip reason + manual hints + alternate contacts).
Full schema with types and source datasets per column is in
reference/output-formats.md
.

Error Handling

| Error / symptom | What to do |
|---|---|
| `APIFY_TOKEN not found` | Ask user to create `.env` with `APIFY_TOKEN=your_token`. Get one at console.apify.com/account/integrations. |
| Ahrefs MCP unavailable | Skip Step 5. Set `Domain DR`, `Page Traffic`, `Referring Domains`, `Prospect Tier` to `"-"` and add a one-line note in the output header explaining tiers were not computed. Continue with the rest of the workflow. |
| `Cannot find module 'xlsx'` | Run `npm install` inside the skill's `scripts/` folder. |
| `Error: 'brand' is required` | The user skipped Step 1 anchor #3. Re-ask brand name. |
| Actor run `TIMED-OUT` (client-side, Actor still running on Apify) | Do not restart. Use `node --env-file=.env scripts/fetch_run_artifacts.js --run-id <runId> --output YYYY-MM-DD_outreach.json --timeout 1800` to poll the existing run and download all datasets when it terminates. Same output shape as the runner. |
| Actor run `TIMED-OUT` (Actor itself ran past its `timeoutSecs`) | See `reference/troubleshooting.md`. Lower `organicResult`, cut keywords, or disable some LLM engines. This is almost never the cause; usually it's the client-side wait. |
| `Author = Not found` | Expected for ~30% of pages without bylines. Before writing `"Not found"`, the agent should try the fallbacks: WCC `metadata.author` / openGraph `article:author` / JSON-LD `Person.name`. If none yields a name, the skill writes `"Not found"` and adds a one-line factual note to `Notes` (e.g., `"No author detected"`). Do not fabricate. The AI Web Scraper sub-Actor frequently times out at its 300s default mid-crawl (past campaigns have lost author data for ~7 high-DR sites at a time this way); when this happens the `_authors.json` dataset still contains the partial results that finished before the timeout. |
| Sub-Actor datasets missing (no `_mentions.json` / `_authors.json` / `_serp.json` / `_wcc.json` after `--fetch-sub-datasets`) | Actor's output schema changed on or before 2026-05-20: `SUB_ACTOR_RESULTS` is no longer in the KV store, and the `MENTIONS` / `AUTHORS` / `DOMAINS_WITH_LEADS` named datasets are gone. The runner script now falls back to log-parsing automatically; if it didn't, run `node --env-file=.env scripts/fetch_subactors_from_log.js --run-id <id> --base <prefix>` to populate `_serp.json`, `_wcc.json`, `_authors.json` from the sub-Actor runIds visible in the parent log. Mention data is now embedded in `main_leads[i].source_url[]`, not a separate file. |
| `Contact Email = Not found` | Contact Details Scraper sub-Actor missed the site. Set the field to `"Not found"` and leave a one-line factual `Notes` entry (`"No email found for this contact"`). Do not fabricate, do not suggest external tools. If both contact and author are missing, the skip pass (Step 6, rule 10) will drop the row. |
| All rows skipped by goal filter | The user's goal is too narrow for the SERP results. Suggest broadening the goal (e.g., `Topical authority` instead of `Recover unlinked brand mentions`) or expanding keywords. |
| `0 leads returned` | Keyword too narrow, or `ownDomains` / `competitorDomains` filtered out all SERP results. Broaden keyword, narrow exclusions, raise `organicResult`. |
| Costs higher than expected | Sub-Actor fan-out (Google Search Scraper, WCC, Contact Details Scraper, AI Web Scraper) stacks. See cost section in `reference/apify-actor-usage.md`. To shrink: drop `searchAuthorName`, disable AI platforms (`enableChatGpt: false`, etc.), lower `maxContactsPerDomain` to 1, lower `organicResult` to 5. |
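The author fallback order from the `Author = Not found` row can be sketched as a small chain. This is an illustration under assumptions: the WCC item is assumed to expose `metadata.author` as a string, `metadata.openGraph` as a list of `{property, content}` entries, and `metadata.jsonLd` as a list of JSON-LD objects; the function names and the returned source labels are hypothetical.

```python
def _person_name(node):
    """Find a Person name in a JSON-LD node, following nested `author` fields."""
    if not isinstance(node, dict):
        return None
    if node.get("@type") == "Person" and node.get("name"):
        return str(node["name"]).strip()
    author = node.get("author")
    candidates = author if isinstance(author, list) else [author]
    for candidate in candidates:
        found = _person_name(candidate)
        if found:
            return found
    return None


def fallback_author(wcc_item):
    """Return (author, source); ("Not found", "-") only after every fallback fails."""
    meta = wcc_item.get("metadata") or {}

    author = meta.get("author")
    if isinstance(author, str) and author.strip():
        return author.strip(), "WCC metadata.author"

    for tag in meta.get("openGraph") or []:
        if tag.get("property") == "article:author" and tag.get("content"):
            return str(tag["content"]).strip(), "openGraph article:author"

    for node in meta.get("jsonLd") or []:
        name = _person_name(node)
        if name:
            return name, "JSON-LD Person.name"

    return "Not found", "-"
```

The second element of the tuple is one possible value for the `Author Source` column; nothing here fabricates a name when all three lookups come back empty.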